Zaddy
Designed by | Corbin |
---|---|
Appeared in | 2025 |
Computational class | pushdown automata |
Reference implementation | [1] |
Influenced by | CHR, META II, OMeta, Parsley, Tmg, Zephyr ASDL |
META Restricted ACE Zaddy, or Zaddy for short, is a living language for describing compilers. It is described via its own metacompiler, which compiles itself to RPython. Zaddy was forked from META II following Neighbors' approach.[1]
Formally, Zaddy is the smallest language accepted by the reference implementation and compiling itself to a fixed point, rather than a language generated by some grammar.
Overview
A Zaddy program contains multiple sections which each describe a portion of a compiler pipeline. The available section types are:
.signature
sections which declare tree algebras- A
.rules
section which pattern-match elements of algebras without preserving structure .grammar
sections which declare parsing rules that map syntax to algebraic actions
The overall pipeline involves parsing the input into algebraic actions, applying rewrite rules to optimize them, and extracting expressions matching a target language. Multiple sections are allowed and namespaced, except for .rules
which are global.
Packrat PEG parsing
Zaddy implements Ford's Packrat algorithm for fast recursive-descent parsing with backtracking.[2] Zaddy supports Unicode; input and output must be encoded with UTF-8.
Character classes — that is, sets of characters — may be declared with the class
keyword. For example, the following classes are useful for parsing languages with whitespace, alphanumeric identifiers, numbers, and/or single-quoted strings:
.grammar IDENTIFIERS class Whitespace = 9 | 10 | 13 | 32 ; class Digit = .range(48:57) ; class Upper = .range(65:90) ; class Lower = .range(97:122) ; class Alpha = Upper | Lower ; class Alphanumeric = Alpha | Digit ; class Quoted = ~( 10 | 13 | 39 | 92 ) ;
Tree algebras
Zaddy supports definitions of tree algebras using inline Zephyr ASDL. For example, the following abstract syntax models a Python-like target language with indented blocks of text:
.signature py stmt = Statement(string line) | Compound(string head, stmt* block) | Conditional(string test, stmt* block) | Handler(stmt* block, stmt* handler)
Zephyr tree algebras provide Herbrand universes over strings and lists. Via rewrite rules, all of the main concepts in compilation are possible, including desugaring, multiple intermediate languages, abstract interpretation, optimization, instruction selection, and textual output. For example, the following rules map the above signature to a pretty-printer:
.rules pyStatement @ py.Statement(L) ==> [.line(L)] ; pyCompPass @ py.Compound(H, []) ==> [.line(H ': pass')] ; pyCompBlock @ py.Compound(H, B), .ne(B) ==> [.line(H ':'), .block(B)] ; pyCondEmpty @ py.Conditional(T, []) ==> [] ; pyCondBlock @ py.Conditional(T, B), .ne(B) ==> [.line('if ' T ':'), .block(B)] ; pyHandEmpty @ py.Handler(B, []) ==> [] ; pyHandBlock @ py.Handler(B, H), .ne(H) ==> [.line('try:'), .block(B), .line('except ParseError:'), .block(H)] ;
The following rules (and following extension to the signature) augment it with some Python-specific sugar:
.signature py statement = ... | Imp(string module, string names) | Ret(string expr) | RaiseIf(string test) .rules pyImp @ py.Imp(M, Ns) ==> py.Statement('from ' M ' import ' Ns) ; pyRet @ py.Ret(E) ==> py.Statement('return ' E) ; pyRaiseIf @ py.RaiseIf(T) ==> py.Statement('if ' T ': raise ParseError()') ;
These rules use the builtin tree algebra, which is specialized for writing line-oriented languages with indented blocks, for output. This builtin algebra effectively has the signature:
.signature builtin builtin = Line(string line) | Block(builtin* bs)
E-matching
Instead of graphs, DAGs, or multisets, Zaddy uses e-graph semantics to rewrite tree algebras. This means that a rewrite rule acts upon a partially-ordered equivalence class ("e-class") of algebraic expressions by adding new possibilities to the set. Rewrite rules are given in a syntax which mirrors Zephyr ASDL but includes Prolog-style logic variables.
Intuitively, a rewriting rule establishes a preference for one expression over another. The following guarantees are made about rewriting in a rule X ==> Y
, supposing that all logic variables have been bound and the rule is applied:
- Any e-class containing X can have Y extracted from it later; in particular, if X and Y are structs and X has no side conditions then all e-classes with X's constructor will also have Y's constructor.
- Any e-class where both X and Y are elements will always prefer to extract Y rather than X; in particular, if X and Y are both structs of the same Zephyr type, regardless of whether they are the same constructor, Y will be preferred to X.
Extraction breaks any ties or ordering conflicts by lexicographical order for strings and order of appearance for Zephyr constructors. In the Python example, py.Statment
will be preferred to py.Compound
.
Zaddy does not fully support associative operations, but it does support pattern-matching heads and tails of lists. For example, the following rule for an Algebraic Brainfuck peephole optimizer removes redundant zeroing of memory:
.rules zeroZero @ [Xs…, bf.Zero, bf.Zero, Ys…] ==> [Xs…, bf.Zero, Ys…] ;
Zaddy does not have explicit support for commutative operations, but they can be added via one rule per commutator. For example, the following hypothetical rules could optimize commutative multiplication by two into addition. The rule mulComm
doesn't cause an infinite loop because e-graphs eventually saturate, becoming full with results and reaching a fixed point.
.rules mulComm @ ops.Mul(X, Y) ==> ops.Mul(Y, X) ; mul2Add @ ops.Mul(X, '2') ==> ops.Add(X, X) ;
Note that currently only strings are supported; there are no numbers yet, and there are no Prolog-style constant terms, although ASDL allows constructors with no arguments. Also the implementation is, to put it politely, brittle jank.
The e-graph implementation and pattern-matching strategy are based on the generic-join algorithm for conjunctive database queries.[3] Pattern-matching syntax was inspired by Prolog and CHR; in particular, Zaddy's semantics are a simplified version of CHR's standard semantics.
Examples
The following example compiles Brainfuck to C. It is a handwritten translation of this program in Tmg by Andriy Makukha which is based on this program in C by Cody Burgett. It is able to compile and run Jon Ripley's Lost Kingdom, generating over 187 thousand lines of C and taking two minutes to compile with GCC.
.signature c statement = Include(string h) | Statement(string l) | Block(string head, statement* cs) .signature bf program = Program(program* ps) | Right(int n) | Left(int n) | Plus(int n) | Minus(int n) | Zero | Set(int n) | ScaleAdd(int n, int k) | Output | Input | Loop(program* exprs) .rules cInc @ c.Include(h) ==> [.line('#include ' h)] ; cStat @ c.Statement(l) ==> [.line(l ';')] ; cBlock @ c.Block(head, cs) ==> [.line(head ' {'), .block(cs), .line('}')] ; bfRight @ bf.Right(n) ==> [c.Statement('ptr += ' n)] ; bfLeft @ bf.Left(n) ==> [c.Statement('ptr -= ' n)] ; bfPlus @ bf.Plus(n) ==> [c.Statement('*ptr += ' n)] ; bfMinus @ bf.Minus(n) ==> [c.Statement('*ptr -= ' n)] ; bfZero @ bf.Zero ==> [c.Statement('*ptr = 0')] ; bfSet @ bf.Set(n) ==> [c.Statement('*ptr = ' n)] ; bfSA @ bf.ScaleAdd(n, k) ==> [c.Statement('*(ptr + ' n ') += *ptr * ' k)] bf.Zero ; bfOut @ bf.Output ==> [c.Statement('putchar(*ptr)')] ; bfIn @ bf.Input ==> [c.Statement('*ptr = getchar()')] ; bfLoop @ bf.Loop(exprs) ==> [c.Block('while (*ptr)', exprs)] ; bfProg @ bf.Program(p) ==> [c.Include('<stdio.h>'), c.Include('<stdlib.h>'), c.Block('int main(int argc, char **argv)', [ c.Statement('unsigned char arr[30 * 1024] = { 0 }'), c.Statement('unsigned char *ptr = arr') ] p [c.Statement('return 0')])] ; zeroZero @ [Xs…, bf.Zero, bf.Zero, Ys…] ==> [Xs…, bf.Zero, Ys…] ; zeroPlus @ [Xs…, bf.Zero, bf.Plus(N), Ys…] ==> [Xs…, bf.Set(N), Ys…] ; scaleAddDec @ bf.Loop([bf.Minus('1'), bf.Right(N), bf.Plus(K), bf.Left(N)]) ==> bf.ScaleAdd(N, K) ; scaleAddInc @ bf.Loop([bf.Right(N), bf.Plus(K), bf.Left(N), bf.Minus('1')]) ==> bf.ScaleAdd(N, K) ; loopMinusZero @ bf.Loop([bf.Minus('1')]) ==> bf.Zero ; loopPlusZero @ bf.Loop([bf.Plus('1')]) ==> bf.Zero ; .grammar BF class Right = 62 ; class Left = 60 ; class Plus = 43 ; class Minus = 45 ; class Output = 46 ; class Input = 44 ; class LoopOpen = 91 ; class LoopClose = 93 ; class Comment = ~( Right | Left | Plus | Minus | Output | Input | LoopOpen | LoopClose ) ; token Right = Right+ ; token Left = Left+ ; token Plus = Plus+ ; token Minus = Minus+ ; token Output = Comment* Output ; token Input = Comment* Input ; token LoopOpen = Comment* LoopOpen ; token LoopClose = Comment* LoopClose ; token Comment = Comment* ; BF := SQN:p Comment -> bf.Program(p) ; SQN := OP*:ops -> *ops ; OP := Comment Right:cs -> bf.Right(#cs) / Comment Left:cs -> bf.Left(#cs) / Comment Plus:cs -> bf.Plus(#cs) / Comment Minus:cs -> bf.Minus(#cs) / Output -> bf.Output / Input -> bf.Input / LoopOpen SQN:exprs LoopClose -> bf.Loop(exprs) ;
References
- ↑ Tutorial: Metacompilers Part 1, James M. Neighbors. 2008. https://www.bayfronttechnologies.com/mc_tutorial.html
- ↑ Bryan Ford. 2002. Packrat parsing: simple, powerful, lazy, linear time, functional pearl. In Proceedings of the seventh ACM SIGPLAN international conference on Functional programming (ICFP '02). Association for Computing Machinery, New York, NY, USA, 36–47. https://doi.org/10.1145/581478.581483
- ↑ Yihong Zhang, Yisu Remy Wang, Max Willsey, and Zachary Tatlock. 2022. Relational e-matching. Proc. ACM Program. Lang. 6, POPL, Article 35 (January 2022), 22 pages. https://doi.org/10.1145/3498696