E-graph

In computer science, an e-graph is a data structure that stores an equivalence relation over terms of some language.

Definition and operations

Let $\Sigma$ be a set of uninterpreted functions, where $\Sigma _{n}$ is the subset of $\Sigma$ consisting of functions of arity $n$ . Let $\mathbb {id}$ be a countable set of opaque identifiers that may be compared for equality, called e-class IDs. The application of $f\in \Sigma _{n}$ to e-class IDs $i_{1},i_{2},\ldots ,i_{n}\in \mathbb {id}$ is denoted $f(i_{1},i_{2},\ldots ,i_{n})$ and called an e-node.

The e-graph then represents equivalence classes of e-nodes, using the following data structures:^[1]

- A union-find structure $U$ representing equivalence classes of e-class IDs, with the usual operations $\mathrm {find}$ , $\mathrm {add}$ and $\mathrm {merge}$ . An e-class ID $e$ is canonical if $\mathrm {find} (U,e)=e$ ; an e-node $f(i_{1},\ldots ,i_{n})$ is canonical if each $i_{j}$ is canonical ( $j$ in $1,\ldots ,n$ ).
- An association of e-class IDs with sets of e-nodes, called e-classes. This consists of
  - a hashcons $H$ (i.e. a mapping) from canonical e-nodes to e-class IDs, and
  - an e-class map $M$ that maps e-class IDs to e-classes, such that $M$ maps equivalent IDs to the same set of e-nodes: $\forall i,j\in \mathbb {id} ,M[i]=M[j]\Leftrightarrow \mathrm {find} (U,i)=\mathrm {find} (U,j)$ .
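
The three components can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the layout of any particular library: the names `ENode`, `UnionFind`, `EGraph`, `canonicalize` and `is_canonical` are hypothetical, e-class IDs are modelled as consecutive integers, and the e-class map is keyed by canonical ID only, so a lookup is expected to go through `find` first.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ENode:
    op: str             # function symbol f
    args: tuple = ()    # child e-class IDs i_1, ..., i_n


class UnionFind:
    def __init__(self):
        self.parent = []

    def add(self):
        """Create a fresh e-class ID that is its own representative."""
        self.parent.append(len(self.parent))
        return len(self.parent) - 1

    def find(self, i):
        """Return the canonical ID of i, halving paths along the way."""
        while self.parent[i] != i:
            self.parent[i] = self.parent[self.parent[i]]
            i = self.parent[i]
        return i

    def merge(self, a, b):
        """Union the classes of a and b; the root of a survives."""
        a, b = self.find(a), self.find(b)
        if a != b:
            self.parent[b] = a
        return a


class EGraph:
    def __init__(self):
        self.U = UnionFind()  # equivalence over e-class IDs
        self.H = {}           # hashcons: canonical ENode -> e-class ID
        self.M = {}           # e-class map: canonical ID -> set of ENodes

    def canonicalize(self, node):
        """Replace every child ID by its canonical representative."""
        return ENode(node.op, tuple(self.U.find(a) for a in node.args))

    def is_canonical(self, node):
        return node == self.canonicalize(node)
```

Keying `M` by canonical ID is one way to satisfy the condition that equivalent IDs map to the same set; storing a shared set under every ID would be another.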

Invariants

In addition to the above structure, a valid e-graph conforms to several data structure invariants.^[2] Two e-nodes are equivalent if they are in the same e-class. The congruence invariant states that an e-graph must ensure that equivalence is closed under congruence, where two e-nodes $f(i_{1},\ldots ,i_{n}),f(j_{1},\ldots ,j_{n})$ are congruent when $\mathrm {find} (U,i_{k})=\mathrm {find} (U,j_{k}),k\in \{1,\ldots ,n\}$ . The hashcons invariant states that the hashcons maps canonical e-nodes to their e-class ID.

Operations

E-graphs expose wrappers around the $\mathrm {add}$ , $\mathrm {find}$ , and $\mathrm {merge}$ operations from the union-find that preserve the e-graph invariants. A further operation, e-matching, is described below.
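
A hedged sketch of such wrappers is given below. It assumes e-nodes are plain `(op, args)` tuples and restores the congruence and hashcons invariants with a naive fixpoint after every merge. The method names `canon`, `_union` and `rebuild` are illustrative, and the quadratic rebuild strategy is chosen for brevity rather than taken from any cited implementation.

```python
class EGraph:
    def __init__(self):
        self.parent = []    # union-find U over integer e-class IDs
        self.hashcons = {}  # H: canonical e-node -> e-class ID
        self.classes = {}   # M: canonical ID -> set of e-nodes

    def find(self, i):
        while self.parent[i] != i:
            i = self.parent[i]
        return i

    def canon(self, node):
        op, args = node
        return (op, tuple(self.find(a) for a in args))

    def add(self, op, *args):
        """Return the e-class of op(args), creating it if it is new."""
        node = self.canon((op, args))
        if node in self.hashcons:
            return self.find(self.hashcons[node])
        cid = len(self.parent)
        self.parent.append(cid)
        self.hashcons[node] = cid
        self.classes[cid] = {node}
        return cid

    def _union(self, a, b):
        a, b = self.find(a), self.find(b)
        if a == b:
            return False
        self.parent[b] = a
        self.classes[a] |= self.classes.pop(b)
        return True

    def merge(self, a, b):
        """Assert a = b, then re-establish the invariants."""
        self._union(a, b)
        self.rebuild()
        return self.find(a)

    def rebuild(self):
        """Merge congruent e-nodes until no pass changes anything."""
        changed = True
        while changed:
            changed = False
            self.hashcons = {}
            for cid in list(self.classes):
                # cid may have been merged away earlier in this pass
                for node in list(self.classes.get(cid, ())):
                    owner = self.hashcons.setdefault(self.canon(node), cid)
                    if self._union(owner, cid):
                        changed = True
        self.classes = {c: {self.canon(n) for n in ns}
                        for c, ns in self.classes.items()}


g = EGraph()
a, b = g.add("a"), g.add("b")
fa, fb = g.add("f", a), g.add("f", b)
print(g.find(fa) == g.find(fb))  # False: f(a) and f(b) are unrelated
g.merge(a, b)
print(g.find(fa) == g.find(fb))  # True: congruence propagated upward
```

After `merge(a, b)` the e-nodes `f(a)` and `f(b)` become congruent, so the rebuild step places them in one e-class.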

E-matching

Let $V$ be a set of variables and let $\mathrm {Term} (\Sigma ,V)$ be the smallest set that includes the 0-arity function symbols (also called constants), includes the variables, and is closed under application of the function symbols. In other words, $\mathrm {Term} (\Sigma ,V)$ is the smallest set such that $V\subset \mathrm {Term} (\Sigma ,V)$ , $\Sigma _{0}\subset \mathrm {Term} (\Sigma ,V)$ , and when $x_{1},\ldots ,x_{n}\in \mathrm {Term} (\Sigma ,V)$ and $f\in \Sigma _{n}$ , then $f(x_{1},\ldots ,x_{n})\in \mathrm {Term} (\Sigma ,V)$ . A term containing variables is called a pattern; a term without variables is called ground.

An e-graph $E$ represents a ground term $t\in \mathrm {Term} (\Sigma ,\emptyset )$ if one of its e-classes represents $t$ . An e-class $C$ represents $t$ if some e-node $f(i_{1},\ldots ,i_{n})\in C$ does. An e-node $f(i_{1},\ldots ,i_{n})\in C$ represents a term $g(j_{1},\ldots ,j_{n})$ if $f=g$ and each e-class $M[i_{k}]$ represents the term $j_{k}$ ( $k$ in $1,\ldots ,n$ ).
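
The recursive definition of "represents" translates directly into code. The sketch below assumes a hand-built e-graph given as a dictionary from canonical e-class IDs to sets of `(op, args)` e-nodes, and ground terms written as nested tuples such as `("f", ("a",))`. Both encodings and the function name `represents` are assumptions made for illustration.

```python
def represents(classes, cid, term):
    """Return True if e-class `cid` represents the ground term `term`."""
    op, *kids = term
    return any(
        node_op == op
        and len(args) == len(kids)
        and all(represents(classes, a, k) for a, k in zip(args, kids))
        for node_op, args in classes[cid]
    )


# E-class 0 holds the equivalent constants a and b; e-class 1 holds f(0).
classes = {
    0: {("a", ()), ("b", ())},
    1: {("f", (0,))},
}

print(represents(classes, 1, ("f", ("a",))))  # True
print(represents(classes, 1, ("f", ("b",))))  # True
print(represents(classes, 1, ("f", ("c",))))  # False
```

The recursion follows the structure of the term rather than the e-graph, so it terminates even when the e-graph contains cycles.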

E-matching is an operation that takes a pattern $p\in \mathrm {Term} (\Sigma ,V)$ and an e-graph $E$ , and yields all pairs $(\sigma ,C)$ where $\sigma \subset V\times \mathbb {id}$ is a substitution mapping the variables in $p$ to e-class IDs and $C\in \mathbb {id}$ is an e-class ID such that each term $\sigma (p)$ is represented by $C$ . Several algorithms for e-matching are known.^[3]^[4] The relational e-matching algorithm is based on worst-case optimal joins and is itself worst-case optimal.^[5]
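
As an illustration of the specification only, the following sketch performs e-matching by plain backtracking search. It is not one of the cited algorithms. It assumes the e-graph is a dictionary from canonical e-class IDs to sets of `(op, args)` e-nodes, that pattern variables are strings such as `"?x"`, and that applications are tuples `(op, subpattern, ...)`. The function names are hypothetical.

```python
def ematch(classes, pattern):
    """Yield every (substitution, e-class ID) pair matching `pattern`."""
    for cid in classes:
        for subst in match_class(classes, pattern, cid, {}):
            yield subst, cid


def match_class(classes, pat, cid, subst):
    if isinstance(pat, str):            # pattern variable
        if pat not in subst:
            yield {**subst, pat: cid}   # bind it to this e-class
        elif subst[pat] == cid:
            yield subst                 # consistent with earlier binding
        return
    op, *kids = pat
    for node_op, args in classes[cid]:
        if node_op == op and len(args) == len(kids):
            yield from match_args(classes, kids, args, subst)


def match_args(classes, kids, args, subst):
    if not kids:
        yield subst
        return
    for s in match_class(classes, kids[0], args[0], subst):
        yield from match_args(classes, kids[1:], args[1:], s)


classes = {
    0: {("a", ())},
    1: {("b", ())},
    2: {("+", (0, 1)), ("+", (1, 0))},  # a + b and b + a are equivalent
    3: {("+", (0, 0))},                 # a + a
}

print(list(ematch(classes, ("+", "?x", "?x"))))  # [({'?x': 0}, 3)]
```

The non-linear pattern `?x + ?x` matches only e-class 3, because both occurrences of `?x` must be bound to the same e-class ID.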

Complexity

An e-graph with $n$ equalities can be constructed in $O(n\log n)$ time.^[6]

Equality saturation

Equality saturation is a technique for building optimizing compilers using e-graphs.^[7] It operates by applying a set of rewrites using e-matching until the e-graph is saturated, a timeout is reached, an e-graph size limit is reached, a fixed number of iterations is exceeded, or some other halting condition is reached. After rewriting, an optimal term is extracted from the e-graph according to some cost function, usually related to AST size or performance considerations.
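
The loop described above can be sketched end to end in a few dozen lines. Everything in this example is an assumption made for illustration: the deliberately naive `EGraph` class (a linear scan stands in for the hashcons), the encoding of patterns as nested tuples with `"?x"`-style variables, the iteration limit, the AST-size cost function, and the three rewrite rules, which ignore division by zero.

```python
class EGraph:
    def __init__(self):
        self.parent, self.classes = [], {}  # union-find, e-class map

    def find(self, i):
        while self.parent[i] != i:
            i = self.parent[i]
        return i

    def canon(self, node):
        return (node[0], tuple(self.find(a) for a in node[1]))

    def add(self, op, *args):
        node = self.canon((op, args))
        for cid, nodes in self.classes.items():  # stands in for a hashcons
            if node in nodes:
                return cid
        cid = len(self.parent)
        self.parent.append(cid)
        self.classes[cid] = {node}
        return cid

    def merge(self, a, b):
        a, b = self.find(a), self.find(b)
        if a == b:
            return False
        self.parent[b] = a
        self.classes[a] |= self.classes.pop(b)
        return True

    def rebuild(self):
        changed = True
        while changed:  # naive fixpoint restoring congruence
            changed, seen = False, {}
            for cid in list(self.classes):
                for node in list(self.classes.get(cid, ())):
                    changed |= self.merge(seen.setdefault(self.canon(node), cid), cid)
            self.classes = {c: {self.canon(n) for n in ns}
                            for c, ns in self.classes.items()}


def match(eg, pat, cid, s):
    if isinstance(pat, str):  # pattern variable
        if s.get(pat, cid) == cid:
            yield {**s, pat: cid}
        return
    op, *kids = pat
    for node_op, args in eg.classes[cid]:
        if node_op == op and len(args) == len(kids):
            substs = [s]
            for k, a in zip(kids, args):
                substs = [s2 for s1 in substs for s2 in match(eg, k, a, s1)]
            yield from substs


def instantiate(eg, pat, s):
    if isinstance(pat, str):
        return s[pat]
    op, *kids = pat
    return eg.add(op, *(instantiate(eg, k, s) for k in kids))


def saturate(eg, rules, max_iters=10):
    for _ in range(max_iters):  # halting condition: iteration limit
        found = [(rhs, s, cid) for lhs, rhs in rules
                 for cid in eg.classes for s in match(eg, lhs, cid, {})]
        changed = False
        for rhs, s, cid in found:
            changed |= eg.merge(instantiate(eg, rhs, s), cid)
        eg.rebuild()
        if not changed:  # halting condition: saturation
            break


def extract(eg, cid):
    best, changed = {}, True  # best: class ID -> (AST size, term)
    while changed:
        changed = False
        for c, nodes in eg.classes.items():
            for op, args in nodes:
                if all(a in best for a in args):
                    cost = 1 + sum(best[a][0] for a in args)
                    if c not in best or cost < best[c][0]:
                        best[c] = (cost, (op, *(best[a][1] for a in args)))
                        changed = True
    return best[eg.find(cid)][1]


rules = [
    (("/", ("*", "?x", "?y"), "?z"), ("*", "?x", ("/", "?y", "?z"))),
    (("/", "?x", "?x"), ("1",)),
    (("*", "?x", ("1",)), "?x"),
]
eg = EGraph()
root = instantiate(eg, ("/", ("*", ("a",), ("2",)), ("2",)), {})
saturate(eg, rules)
print(extract(eg, root))  # ('a',)
```

Starting from `(a * 2) / 2`, the rules successively add `a * (2 / 2)`, `a * 1` and `a` to the root e-class, and extraction by AST size then selects `a`. Matches are collected before any rewrite is applied, so each iteration reads a consistent e-graph and writes to it only afterwards.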

Applications

E-graphs are used in automated theorem proving. In SMT solvers, they are used to decide the empty theory by computing the congruence closure of a set of equalities, and e-matching is used to instantiate quantifiers.^[9] In DPLL(T)-based solvers that use conflict-driven clause learning (also known as non-chronological backtracking), e-graphs are extended to produce proof certificates.^[10] E-graphs are also used in the Simplify theorem prover of ESC/Java.^[11]

Equality saturation is used in specialized optimizing compilers and in translation validation applied to the LLVM toolchain.^[15]

E-graphs have been applied to several problems in program analysis, including fuzzing,^[16] abstract interpretation,^[17]^[18] and library learning.^[19]

References

[1] (Willsey et al. 2021)
[2] (Willsey et al. 2021)
[3] (de Moura & Bjørner 2007)
[4] ISSN 1571-0661.
[5] S2CID 236924583.
[6] (Flatt et al. 2022, p. 2)
[7] (Tate et al. 2009)
[8] ISBN 978-3-540-78800-3.
[9] ISBN 978-3-642-28717-6.
[10] (Flatt et al. 2022, p. 2)
[11] S2CID 9613854.
[12] ISSN 0362-1340.
[13] arXiv:2101.01332 [cs.AI].
[14] arXiv:2002.07951 [cs.DB].
[15] ISBN 978-3-642-22110-1.
[16] "Wasm-mutate: Fuzzing WebAssembly Compilers with E-Graphs (EGRAPHS 2022) - PLDI 2022". pldi22.sigplan.org. Retrieved 2023-02-03.
[17] arXiv:2203.09191.
[18] arXiv:2205.14989.
[19] S2CID 254536022.

de Moura, Leonardo; Bjørner, Nikolaj (2007). "Efficient E-Matching for SMT Solvers". In Pfenning, Frank (ed.). Automated Deduction – CADE-21. Lecture Notes in Computer Science. Vol. 4603. Berlin, Heidelberg: Springer. pp. 183–198. ISBN 978-3-540-73595-3.

Willsey, Max; Nandi, Chandrakana; Wang, Yisu Remy; Flatt, Oliver; Tatlock, Zachary; Panchekha, Pavel (2021-01-04). "egg: Fast and extensible equality saturation". Proceedings of the ACM on Programming Languages. 5 (POPL): 23:1–23:29. S2CID 226282597.

Tate, Ross; Stepp, Michael; Tatlock, Zachary; Lerner, Sorin (2009-01-21). "Equality saturation". Proceedings of the 36th annual ACM SIGPLAN-SIGACT symposium on Principles of programming languages. POPL '09. Savannah, GA, USA: Association for Computing Machinery. pp. 264–276. S2CID 2138086.

Flatt, Oliver; Coward, Samuel; Willsey, Max; Tatlock, Zachary; Panchekha, Pavel (October 2022). "Small Proofs from Congruence Closure". In A. Griggio; N. Rungta (eds.). Proceedings of the 22nd Conference on Formal Methods in Computer-Aided Design – FMCAD 2022. TU Wien Academic Press. pp. 75–83. S2CID 252118847.

External links

The Egg Project

A Colab notebook explaining e-graphs

