E-graph
In computer science, an e-graph is a data structure that stores an equivalence relation over terms of some language.
Definition and operations
Let be a set of uninterpreted functions, where is the subset of consisting of functions of arity . Let be a countable set of opaque identifiers that may be compared for equality, called e-class IDs. The application of to e-class IDs is denoted and called an e-node.
The e-graph then represents equivalence classes of e-nodes, using the following data structures:[1]
- A union-find structure representing equivalence classes of e-class IDs, with the usual operations , and . An e-class ID is canonical if ; an e-node is canonical if each is canonical ( in ).
- An association of e-class IDs with sets of e-nodes, called e-classes. This consists of
- a hashcons (i.e. a mapping) from canonical e-nodes to e-class IDs, and
- an e-class map that maps e-class IDs to e-classes, such that maps equivalent IDs to the same set of e-nodes:
Invariants
In addition to the above structure, a valid e-graph conforms to several data structure invariants.[2] Two e-nodes are equivalent if they are in the same e-class. The congruence invariant states that an e-graph must ensure that equivalence is closed under congruence, where two e-nodes are congruent when . The hashcons invariant states that the hashcons maps canonical e-nodes to their e-class ID.
Operations
This section needs expansion. You can help by adding to it. (June 2021) |
E-graphs expose wrappers around the , , and operations from the union-find that preserve the e-graph invariants. The last operation, e-matching, is described below.
E-matching
Let be a set of variables and let be the smallest set that includes the 0-arity function symbols (also called constants), includes the variables, and is closed under application of the function symbols. In other words, is the smallest set such that , , and when and , then . A term containing variables is called a pattern, a term without variables is called ground.
An e-graph represents a ground term if one of its e-classes represents . An e-class represents if some e-node does. An e-node represents a term if and each e-class represents the term ( in ).
e-matching is an operation that takes a pattern and an e-graph , and yields all pairs where is a substitution mapping the variables in to e-class IDs and is an e-class ID such that each term is represented by . There are several known algorithms for e-matching,[3][4] the relational e-matching algorithm is based on worst-case optimal joins and is worst-case optimal.[5]
Complexity
- An e-graph with n equalities can be constructed in O(n log n) time.[6]
Equality saturation
This section needs expansion. You can help by adding to it. (June 2021) |
Equality saturation is a technique for building optimizing compilers using e-graphs.[7] It operates by applying a set of rewrites using e-matching until the e-graph is saturated, a timeout is reached, an e-graph size limit is reached, a fixed number of iterations is exceeded, or some other halting condition is reached. After rewriting, an optimal term is extracted from the e-graph according to some cost function, usually related to AST size or performance considerations.
Applications
E-graphs are used in
Equality saturation is used in specialized
E-graphs have been applied to several problems in program analysis, including fuzzing,[16] abstract interpretation,[17][18] and library learning.[19]
References
- ^ (Willsey et al. 2021)
- ^ (Willsey et al. 2021)
- ^ (de Moura & Bjørner 2007)
- ISSN 1571-0661.
- S2CID 236924583.
- ^ (Flatt et al. 2022, p. 2)
- ^ (Tate et al. 2009)
- ISBN 978-3-540-78800-3.
- )
- ^ (Flatt et al. 2022, p. 2)
- S2CID 9613854.
- ISSN 0362-1340.
- arXiv:2101.01332 [cs.AI].
- arXiv:2002.07951 [cs.DB].
- ISBN 978-3-642-22110-1.
- ^ "Wasm-mutate: Fuzzing WebAssembly Compilers with E-Graphs (EGRAPHS 2022) - PLDI 2022". pldi22.sigplan.org. Retrieved 2023-02-03.
- )
- )
- S2CID 254536022.
- de Moura, Leonardo; Bjørner, Nikolaj (2007). "Efficient E-Matching for SMT Solvers". In Pfenning, Frank (ed.). Automated Deduction – CADE-21. Lecture Notes in Computer Science. Vol. 4603. Berlin, Heidelberg: Springer. pp. 183–198. ISBN 978-3-540-73595-3.
- Willsey, Max; Nandi, Chandrakana; Wang, Yisu Remy; Flatt, Oliver; Tatlock, Zachary; Panchekha, Pavel (2021-01-04). "egg: Fast and extensible equality saturation". Proceedings of the ACM on Programming Languages. 5 (POPL): 23:1–23:29. S2CID 226282597.
- Tate, Ross; Stepp, Michael; Tatlock, Zachary; Lerner, Sorin (2009-01-21). "Equality saturation". Proceedings of the 36th annual ACM SIGPLAN-SIGACT symposium on Principles of programming languages. POPL '09. Savannah, GA, USA: Association for Computing Machinery. pp. 264–276. S2CID 2138086.
- Flatt, Oliver; Coward, Samuel; Willsey, Max; Tatlock, Zachary; Panchekha, Pavel (October 2022). "Small Proofs from Congruence Closure". In A. Griggio; N. Rungta (eds.). Proceedings of the 22nd Conference on Formal Methods in Computer-Aided Design – FMCAD 2022. TU Wien Academic Press. pp. 75–83. S2CID 252118847.