Automated theorem proving

Automated theorem proving (also known as ATP or automated deduction) is a subfield of

mathematical theorems by computer programs. Automated reasoning over mathematical proof was a major impetus for the development of computer science

.

Logical foundations

While the roots of formalised

Herbrand universe and a Herbrand interpretation that allowed (un)satisfiability of first-order formulas (and hence the validity of a theorem) to be reduced to (potentially infinitely many) propositional satisfiability problems.^[5]

In 1929,

language was true or false.^[6]^[7]

However, shortly after this positive result, Kurt Gödel published On Formally Undecidable Propositions of Principia Mathematica and Related Systems (1931), showing that in any sufficiently strong axiomatic system there are true statements that cannot be proved in the system. This topic was further developed in the 1930s by Alonzo Church and Alan Turing, who on the one hand gave two independent but equivalent definitions of computability, and on the other gave concrete examples of undecidable questions.

First implementations

Shortly after

propositional logic of the Principia Mathematica, developed by Allen Newell, Herbert A. Simon and J. C. Shaw. Also running on a JOHNNIAC, the Logic Theorist constructed proofs from a small set of propositional axioms and three deduction rules: modus ponens, (propositional) variable substitution, and the replacement of formulas by their definition. The system used heuristic guidance, and managed to prove 38 of the first 52 theorems of the Principia.^[7]

The "heuristic" approach of the Logic Theorist tried to emulate human mathematicians, and could not guarantee that a proof could be found for every valid theorem even in principle. In contrast, other, more systematic algorithms achieved, at least theoretically,

Herbrand universe. The propositional formulas could then be checked for unsatisfiability using a number of methods. Gilmore's program used conversion to disjunctive normal form, a form in which the satisfiability of a formula is obvious.^[7]^[9]

Decidability of the problem

Depending on the underlying logic, the problem of deciding the validity of a formula varies from trivial to impossible. For the common case of

computably enumerable

: given unbounded resources, any valid formula can eventually be proven. However, invalid formulas (those that are not entailed by a given theory), cannot always be recognized.

The above applies to first-order theories, such as

Gödel's incompleteness theorem, we know that any consistent theory whose axioms are true for the natural numbers cannot prove all first-order statements true for the natural numbers, even if the list of axioms is allowed to be infinite enumerable. It follows that an automated theorem prover will fail to terminate while searching for a proof precisely when the statement being investigated is undecidable in the theory being used, even if it is true in the model of interest. Despite this theoretical limit, in practice, theorem provers can solve many hard problems, even in models that are not fully described by any first-order theory (such as the integers

).

Applications

Commercial use of automated theorem proving is mostly concentrated in

floating point units of modern microprocessors have been designed with extra scrutiny. AMD, Intel and others use automated theorem proving to verify that division and other operations are correctly implemented in their processors.^[12]

Other uses of theorem provers include program synthesis, constructing programs that satisfy a formal specification.^[13] Automated theorem provers have been integrated with proof assistants, including Isabelle/HOL.^[14]

First-order theorem proving

In the late 1960s agencies funding research in automated deduction began to emphasize the need for practical applications.^[

program verification whereby first-order theorem provers were applied to the problem of verifying the correctness of computer programs in languages such as Pascal, Ada, etc. Notable among early program verification systems was the Stanford Pascal Verifier developed by David Luckham at Stanford University.^[15]^[16]^[17] This was based on the Stanford Resolution Prover also developed at Stanford using John Alan Robinson's resolution principle. This was the first automated deduction system to demonstrate an ability to solve mathematical problems that were announced in the Notices of the American Mathematical Society before solutions were formally published.^{[citation needed}

]
First-order theorem proving is one of the most mature subfields of automated theorem proving. The logic is expressive enough to allow the specification of arbitrary problems, often in a reasonably natural and intuitive way. On the other hand, it is still semi-decidable, and a number of sound and complete calculi have been developed, enabling fully automated systems.^[18] More expressive logics, such as higher-order logics, allow the convenient expression of a wider range of problems than first-order logic, but theorem proving for these logics is less well developed.^[19]^[20]

Relationship with SMT

There is substantial overlap between first-order automated theorem provers and SMT solvers. Generally, automated theorem provers focus on supporting full first-order logic with quantifiers, whereas SMT solvers focus more on supporting various theories (interpreted predicate symbols). ATPs excel at problems with lots of quantifiers, whereas SMT solvers do well on large problems without quantifiers.^[21] The line is blurry enough that some ATPs participate in SMT-COMP, while some SMT solvers participate in CASC.^[22]

Benchmarks, competitions, and sources

The quality of implemented systems has benefited from the existence of a large library of standard benchmark examples—the Thousands of Problems for Theorem Provers (TPTP) Problem Library^[23]—as well as from the CADE ATP System Competition (CASC), a yearly competition of first-order systems for many important classes of first-order problems.
Some important systems (all have won at least one CASC competition division) are listed below.

E is a high-performance prover for full first-order logic, but built on a purely equational calculus, originally developed in the automated reasoning group of Technical University of Munich under the direction of Wolfgang Bibel, and now at Baden-Württemberg Cooperative State University in Stuttgart
.

Mace4
.

SETHEO is a high-performance system based on the goal-directed model elimination calculus, originally developed by a team under direction of Wolfgang Bibel. E and SETHEO have been combined (with other systems) in the composite theorem prover E-SETHEO.

Vampire was originally developed and implemented at Manchester University by Andrei Voronkov and Kryštof Hoder. It is now developed by a growing international team. It has won the FOF division (among other divisions) at the CADE ATP System Competition regularly since 2001.^[24]

Waldmeister is a specialized system for unit-equational first-order logic developed by Arnim Buch and Thomas Hillenbrand. It won the CASC UEQ division for fourteen consecutive years (1997–2010).

Max Planck Institute for Computer Science
.

The Theorem Prover Museum [25] is an initiative to conserve the sources of theorem prover systems for future analysis, since they are important cultural/scientific artefacts. It has the sources of many of the systems mentioned above.

Popular techniques

This article is in
prose. You can help by converting this article, if appropriate. Editing help
is available. (December 2023)

unification

Model elimination

Method of analytic tableaux

Superposition and term rewriting

Model checking

Mathematical induction^[26]

Binary decision diagrams

DPLL

Higher-order unification

Quantifier elimination^[27]

Software systems

See also: Proof assistant § Comparison, and Category:Theorem proving software systems

Comparison
Name License type Web service Library Standalone Last update (
YYYY-mm-dd format
)

ACL2
3-clause BSD
No No Yes May 2019

Prover9/Otter Public Domain Via System on TPTP Yes No 2009

Jape
GPLv2
Yes Yes No May 15, 2015

PVS
GPLv2
No Yes No January 14, 2013

EQP ? No Yes No May 2009

PhoX ? No Yes No September 28, 2017

E

GPL
Via System on TPTP No Yes July 4, 2017

SNARK
Mozilla Public License 1.1 No Yes No 2012

Vampire
Vampire License Via System on TPTP Yes Yes December 14, 2017

Theorem Proving System (TPS) TPS Distribution Agreement No Yes No February 4, 2012

SPASS
FreeBSD license
Yes Yes Yes November 2005

IsaPlanner
GPL
No Yes Yes 2007

KeY
GPL
Yes Yes Yes October 11, 2017

Z3 Theorem Prover MIT License Yes Yes Yes November 19, 2019

Free software

Alt-Ergo

Automath

CVC

E

IsaPlanner

LCF

Mizar

NuPRL

Paradox

Prover9

PVS

SPARK (programming language)

Twelf

Z3 Theorem Prover

Proprietary software

CARINE

Wolfram Mathematica

ResearchCyc

See also

Curry–Howard correspondence

Symbolic computation

Ramanujan machine

Computer-aided proof

Formal verification

Logic programming

Proof checking

Model checking

Proof complexity

Computer algebra system

Program analysis (computer science)

General Problem Solver

Metamath language for formalized mathematics

De Bruijn factor

Notes

^ Frege, Gottlob (1879). Begriffsschrift. Verlag Louis Neuert.

^ Frege, Gottlob (1884). Die Grundlagen der Arithmetik (PDF). Breslau: Wilhelm Kobner. Archived from the original (PDF) on 2007-09-26. Retrieved 2012-09-02.

^ Russell, Bertrand; Whitehead, Alfred North (1910–1913). Principia Mathematica (1st ed.). Cambridge University Press.

^ Russell, Bertrand; Whitehead, Alfred North (1927). Principia Mathematica (2nd ed.). Cambridge University Press.

^ Herbrand, J. (1930). Recherches sur la théorie de la démonstration (PhD) (in French). University of Paris.

^ Presburger, Mojżesz (1929). "Über die Vollständigkeit eines gewissen Systems der Arithmetik ganzer Zahlen, in welchem die Addition als einzige Operation hervortritt". Comptes Rendus du I Congrès de Mathématiciens des Pays Slaves. Warszawa: 92–101.

^ ^a ^b ^c ^d Davis, Martin (2001). "The Early History of Automated Deduction". Robinson & Voronkov 2001. Archived from the original on 2012-07-28. Retrieved 2012-09-08.

^ Bibel, Wolfgang (2007). "Early History and Perspectives of Automated Deduction" (PDF). Ki 2007. LNAI (4667). Springer: 2–18. Archived (PDF) from the original on 2022-10-09. Retrieved 2 September 2012.

doi:10.1147/rd.41.0028
.

S2CID 30847540
.

^ Kolata, Gina (December 10, 1996). "Computer Math Proof Shows Reasoning Power". The New York Times. Retrieved 2008-10-11.

ISBN 978-981-15-6401-7
, retrieved 2024-02-10

CiteSeerX 10.1.1.62.4976
.

S2CID 7716709
.

^ Luckham, David C.; Suzuki, Norihisa (Mar 1976). Automatic Program Verification V: Verification-Oriented Proof Rules for Arrays, Records, and Pointers (Technical Report AD-A027 455). Defense Technical Information Center. Archived from the original on August 12, 2021.

S2CID 10088183
.

^ Luckham, D.; German, S.; von Henke, F.; Karp, R.; Milne, P.; Oppen, D.; Polak, W.; Scherlis, W. (1979). Stanford Pascal verifier user manual (Technical report). Stanford University. CS-TR-79-731.

S2CID 14361631
.

^ Kerber, Manfred. "How to prove higher order theorems in first order logic." (1999).

^ Benzmüller, Christoph, et al. "LEO-II-a cooperative automatic theorem prover for classical higher-order logic (system description)." International Joint Conference on Automated Reasoning. Berlin, Germany and Heidelberg: Springer, 2008.

S2CID 5389933
. ATPs and SMT solvers have complementary strengths. The former handle quantifiers more elegantly, whereas the latter excel on large, mostly ground problems.

doi:10.3233/SAT190123
. In recent years, we have seen a blurring of lines between SMT-COMP and CASC with SMT solvers competing in CASC and ATPs competing in SMT-COMP.

^ Sutcliffe, Geoff. "The TPTP Problem Library for Automated Theorem Proving". Retrieved 15 July 2019.

^ "History". vprover.github.io.

^ "The Theorem Prover Museum". Michael Kohlhase. Retrieved 2022-11-20.

hdl:1842/3394
.

^ Gabbay, Dov M., and Hans Jürgen Ohlbach. "Quantifier elimination in second-order predicate logic." (1992).

References

Chang, Chin-Liang; Lee, Richard Char-Tung (2014) [1973]. Symbolic Logic and Mechanical Theorem Proving. Elsevier.
ISBN 9780080917283
.

ISBN 9781483296777
.

ISBN 978-1461396871
.

ISBN 978-0-486-78082-5
. This material may be reproduced for any educational purpose, ...

Duffy, David A. (1991). Principles of Automated Theorem Proving. Wiley.
ISBN 9780471927846
.

ISBN 9780079112514
.

ISBN 9780262182232
.

ISBN 9781461223603
.

External links

A list of theorem proving tools

v
t
e
Mathematical logic
General

Axiom
list

Cardinality

First-order logic

Formal proof

Formal semantics

Foundations of mathematics

Information theory

Lemma

Logical consequence

Model

Theorem

Theory

Type theory

Theorems (list)
and paradoxes

Gödel's completeness and incompleteness theorems

Tarski's undefinability

Banach–Tarski paradox

Cantor's theorem, paradox and diagonal argument

Compactness

Halting problem

Lindström's

Löwenheim–Skolem

Russell's paradox

Logics
Traditional

Classical logic

Logical truth

Tautology

Proposition

Inference

Logical equivalence

Consistency
Equiconsistency

Argument

Soundness

Validity

Syllogism

Square of opposition

Venn diagram

Propositional

Boolean algebra

Boolean functions

Logical connectives

Propositional calculus

Propositional formula

Truth tables

Many-valued logic
3

finite

∞

Predicate

First-order
list

Second-order
Monadic

Higher-order

Fixed-point

Free

Quantifiers

Predicate

Monadic predicate calculus

Set theory

Set
hereditary

Class

(Ur-)Element

Ordinal number

Extensionality

Forcing

Relation
equivalence

partition

Set operations:
intersection

union

complement

Cartesian product

power set

identities

Types of sets

Countable

Uncountable

Empty

Inhabited

Singleton

Finite

Infinite

Transitive

Ultrafilter

Recursive

Fuzzy

Universal

Universe
constructible

Grothendieck

Von Neumann

Maps and cardinality

Function/Map
domain

codomain

image

In/Sur/Bi-jection

Schröder–Bernstein theorem

Isomorphism

Gödel numbering

Enumeration

Large cardinal
inaccessible

Aleph number

Operation
binary

Set theories

Zermelo–Fraenkel
axiom of choice

continuum hypothesis

General

Kripke–Platek

Morse–Kelley

Naive

New Foundations

Tarski–Grothendieck

Von Neumann–Bernays–Gödel

Ackermann

Constructive

Formal systems (list),
language and syntax

Alphabet

Arity

Automata

Axiom schema

Expression
ground

Extension
by definition

conservative

Relation

Formation rule

Grammar

Formula
atomic

closed

ground

open

Free/bound variable

Language

Metalanguage

Logical connective
¬

∨

∧

→

↔

=

Predicate
functional

variable

propositional variable

Proof

Quantifier
∃

!

∀

rank

Sentence
atomic

spectrum

Signature

String

Substitution

Symbol
function

logical/constant

non-logical

variable

Term

Theory
list

Example axiomatic
systems (list)

of arithmetic:
Peano

second-order

elementary function

primitive recursive

Robinson

Skolem

of the real numbers
Tarski's axiomatization

of
Boolean algebras

canonical

minimal axioms

of geometry:
Euclidean:
Elements

Hilbert's

Tarski's

non-Euclidean

Principia Mathematica

Proof theory

Formal proof

Natural deduction

Logical consequence

Rule of inference

Sequent calculus

Theorem

Systems
axiomatic

deductive

Hilbert
list

Complete theory

Independence (from ZFC)

Proof of impossibility

Ordinal analysis

Reverse mathematics

Self-verifying theories

Model theory

Interpretation
function

of models

Model
equivalence

finite

saturated

spectrum

submodel

Non-standard model
of arithmetic

Diagram
elementary

Categorical theory

Model complete theory

Satisfiability

Semantics of logic

Strength

Theories of truth

semantic

Tarski's

Kripke's

T-schema

Transfer principle

Truth predicate

Truth value

Type

Ultraproduct

Validity

Computability theory

Church encoding

Church–Turing thesis

Computably enumerable

Computable function

Computable set

Decision problem
decidable

undecidable

P

NP

P versus NP problem

Kolmogorov complexity

Lambda calculus

Primitive recursive function

Recursion

Recursive set

Turing machine

Type theory

Related

Abstract logic

Algebraic logic

Automated theorem proving

Category theory

Concrete/Abstract category

Category of sets

History of logic

History of mathematical logic

timeline

Logicism

Mathematical object

Philosophy of mathematics

Supertask

Mathematics portal

Authority control databases: National

Israel

United States

Retrieved from "https://en.wikipedia.org/w/index.php?title=Automated_theorem_proving&oldid=1205993341"

[1] Frege, Gottlob (1879). Begriffsschrift. Verlag Louis Neuert.

[2] Frege, Gottlob (1884). Die Grundlagen der Arithmetik (PDF). Breslau: Wilhelm Kobner. Archived from the original (PDF) on 2007-09-26. Retrieved 2012-09-02.

[3] Russell, Bertrand; Whitehead, Alfred North (1910–1913). Principia Mathematica (1st ed.). Cambridge University Press.

[4] Russell, Bertrand; Whitehead, Alfred North (1927). Principia Mathematica (2nd ed.). Cambridge University Press.

[5] Herbrand, J. (1930). Recherches sur la théorie de la démonstration (PhD) (in French). University of Paris.

[6] Presburger, Mojżesz (1929). "Über die Vollständigkeit eines gewissen Systems der Arithmetik ganzer Zahlen, in welchem die Addition als einzige Operation hervortritt". Comptes Rendus du I Congrès de Mathématiciens des Pays Slaves. Warszawa: 92–101.

[Davis2001-7] Davis, Martin (2001). "The Early History of Automated Deduction". Robinson & Voronkov 2001. Archived from the original on 2012-07-28. Retrieved 2012-09-08.

[Bibel2007-8] Bibel, Wolfgang (2007). "Early History and Perspectives of Automated Deduction" (PDF). Ki 2007. LNAI (4667). Springer: 2–18. Archived (PDF) from the original on 2022-10-09. Retrieved 2 September 2012.

[9] :10.1147/rd.41.0028
.

[10] S2CID 30847540
.

[11] Kolata, Gina (December 10, 1996). "Computer Math Proof Shows Reasoning Power". The New York Times. Retrieved 2008-10-11.

[12] ISBN 978-981-15-6401-7
, retrieved 2024-02-10

[13] CiteSeerX 10.1.1.62.4976
.

[14] S2CID 7716709
.

[15] Luckham, David C.; Suzuki, Norihisa (Mar 1976). Automatic Program Verification V: Verification-Oriented Proof Rules for Arrays, Records, and Pointers (Technical Report AD-A027 455). Defense Technical Information Center. Archived from the original on August 12, 2021.

[16] S2CID 10088183
.

[17] Luckham, D.; German, S.; von Henke, F.; Karp, R.; Milne, P.; Oppen, D.; Polak, W.; Scherlis, W. (1979). Stanford Pascal verifier user manual (Technical report). Stanford University. CS-TR-79-731.

[18] S2CID 14361631
.

[19] Kerber, Manfred. "How to prove higher order theorems in first order logic." (1999).

[20] Benzmüller, Christoph, et al. "LEO-II-a cooperative automatic theorem prover for classical higher-order logic (system description)." International Joint Conference on Automated Reasoning. Berlin, Germany and Heidelberg: Springer, 2008.

[21] S2CID 5389933
. ATPs and SMT solvers have complementary strengths. The former handle quantifiers more elegantly, whereas the latter excel on large, mostly ground problems.

[22] :10.3233/SAT190123
. In recent years, we have seen a blurring of lines between SMT-COMP and CASC with SMT solvers competing in CASC and ATPs competing in SMT-COMP.

[23] Sutcliffe, Geoff. "The TPTP Problem Library for Automated Theorem Proving". Retrieved 15 July 2019.

[24] "History". vprover.github.io.

[25] "The Theorem Prover Museum". Michael Kohlhase. Retrieved 2022-11-20.

[26] :1842/3394
.

[27] Gabbay, Dov M., and Hans Jürgen Ohlbach. "Quantifier elimination in second-order predicate logic." (1992).

[5]

[6]

[7]

[9]

[10]

[11]

[12]

[13]

[14]

[15]

[16]

[17]

[18]

[19]

[20]

[21]

[22]

[23]

[24]

[26]

[27]

Logical foundations

First implementations

Decidability of the problem

Related problems

Applications

First-order theorem proving

Relationship with SMT

Benchmarks, competitions, and sources

Popular techniques

Software systems

Free software

Proprietary software

See also

Notes

References

External links