Zaddy
Designed by | Corbin |
---|---|
Appeared in | 2025 |
Computational class | pushdown automata |
Reference implementation | [1] |
Influenced by | CHR, META II, OMeta, Parsley, Tmg, Zephyr ASDL |
META Restricted CHR(Zaddy), or Zaddy for short, is a living language for describing automatically-described compilers. It is described via its own metacompiler, which compiles itself to RPython. Zaddy was forked from META II following Neighbors' approach.[1]
Formally, Zaddy is the smallest language accepted by the reference implementation and compiling itself to a fixed point, rather than a language generated by some grammar.
Packrat PEG parsing
Zaddy implements Ford's Packrat algorithm for fast recursive-descent parsing with backtracking.[2]
Tree algebras
Zaddy supports definitions of tree algebras using inline Zephyr ASDL. For example, the following abstract syntax models a Python-like indentation-sensitive basic-block target language:
.signature py stmt = Statement(string line) | Compound(string head, stmt* block) | Conditional(string test, stmt* block) | Handler(stmt* block, stmt* handler)
Zephyr tree algebras provide Herbrand universes over strings and lists. A homomorphism between tree algebras can be defined as a functor; the recursion scheme will always be a hylomorphism.
.functor py Statement(line) -> [.line(line)] ; Compound(head, block) -> .if (block) .then [.line(head ':'), .block(block)] .else [.line(head ': pass')] ; Conditional(test, block) -> .if (block) .then [.line('if ' test ':'), .block(block)] .else [] ; Handler(block, handler) -> .if (block) .then .if (handler) .then [.line('try:'), .block(block), .line('except ParseError:'), .block(handler)] .else (*block) .else [] ;
This functor uses the builtin tree algebra, which is specialized for writing line-oriented languages with indented blocks, for output. This builtin algebra has the signature:
.signature builtin builtin = Line(string) | Block(builtin*)
Constraint Handling Rules
Zaddy embeds Constraint Handling Rules (CHR), including a pure fragment of Prolog, for describing non-homomorphic rewriting of tree algebras. For example, the following rules implement the greater-than and maximum relations over natural numbers:
.rules gte(N, zero). gte(N, M) :- succ(X, N), succ(Y, M), gte(X, Y). maxLeft @ max(N, zero, M) <=> N = M. maxRight @ max(N, M, zero) <=> N = M. maxSucc @ succ(X, L), succ(Y, N), succ(Z, M) \ max(L, N, M) <=> max(X, Y, Z).
Building on top of those relations, the following rules[3] implement an efficient union-find data structure with user-defined equivalence:
.rules make @ make(A) <=> root(A, zero). union @ union(A, B) <=> find(A, X), find(B, Y), link(X, Y). findNode @ branch(A, B), find(A, X) <=> find(B, X), branch(A, X). findRoot @ root(A, _) \ find(A, X) <=> X = A. linkEq @ link(A, A) <=> true. linkLeft @ link(A, B), root(A, N), root(B, M) <=> gte(N, M) | branch(B, A), succ(M, M1), max(R, N, M1), root(A, R). linkRight @ link(B, A), root(A, N), root(B, M) <=> gte(N, M) | branch(B, A), succ(M, M1), max(R, N, M1), root(A, R).
The reference implementation compiles CHR to a miniKanren-style persistent constraint store[4] encapsulated in a simple logic monad.[5] Rules refine the constraint store iteratively in a process similar to the loop-oriented compilation of CHR(Java).[6]
Binding-time Hoare logic
Zaddy unifies the concepts of wikipedia:Hoare logic and binding-time improvement[7] into a single class of semantic actions. The parser becomes an abstract interpreter over a specified semantic domain.
For example, consider the following construction from Zaddy's reference implementation, where a parsing flag "pf" controls some internal state. We can model the flag as a Boolean value:
.domain pf : bool ;
Then, we may annotate the parser with Hoare triples. The annotations must be precise; Zaddy will compile them down to approximations which will be automatically simulated. For example, to set the flag, we have a precondition that the flag is the complement of being enabled (and so may be either disabled or unknown) and a postcondition that the flag is enabled:
SET = .pre { ~+pf } { .out('self.pf = True' .nl) } .post { +pf } ;
This can be seen as a Hoare specification of Schorre's original "SET" assembly instruction from META II. When compiled, Zaddy will generate optimizing parsers which only set the parse flag when it must. This can create Jones improvements — binding-time improvements in the generated parsers and their generated outputs. For example, in this fragment, an error guard is removed if it statically cannot be entered, and since the .error()
method happens to throw an exception, we can know that the flag is set:
.pre { ~+pf } { .out('if not self.pf: self.error()' .nl) } .post { +pf }
We can perform a similar improvement when parsing tokens that cannot fail:
$( TX3 .pre { ~+pf } { .out('if not self.pf: return' .nl) } .post { +pf } )
For correctness, the parsing machine can reset its abstract domain with the .top
command, as shown here at the top-level rule for tokens:
TR = ID .out('def parse' * '(self):' .nl .lm+) ':' .top TX1 ';' .out(.lm-) ;
Examples
The hello-world of parsing languages is wikipedia:S-expressions over some simple collection of atoms. The following grammar generates a parser which compiles S-expressions to RPython code.
.syntax SEXP SEXP = .out('def parseSExpOntoDomain(d):' .nl .lm+ 'return ') EXP .out(.nl) ; EXP = LIST / ATOM .out('d.atom("' * '")') ; LIST = OPEN ( CLOSE .out('d.sexp([])') / .out('d.sexp([' .nl .lm+) $( EXP .out(',' .nl) ) CLOSE .out(.lm- '])') ) ; .tokens ATOM : WS .token CHAR $CHAR .tokout ; OPEN : WS .any('() ; CLOSE : WS .any(')) ; CHAR : .any('a:'z) ; WS : $.any(10!13!' ) ; .domain .end
The following example compiles Brainfuck to C. It is a handwritten translation of this program in Tmg by Andriy Makukha which is based on this program in C by Cody Burgett. It is able to compile and run Jon Ripley's Lost Kingdom, generating over 187 thousand lines of C and taking two minutes to compile with GCC.
.functor c Include(h) -> [.line('#include' h)] ; Statement(l) -> [.line(l ';')] ; Block(head, cs) -> [.line(head ' {'), .block(cs), .line('}')] ; .functor bf Right(n) -> [c.Statement('ptr += ' n)] ; Left(n) -> [c.Statement('ptr -= ' n)] ; Plus(n) -> [c.Statement('*ptr += ' n)] ; Minus(n) -> [c.Statement('*ptr -= ' n)] ; Output -> [c.Statement('putchar(*ptr)')] ; Input -> [c.Statement('*ptr = getchar()')] ; Loop(exprs) -> [c.Block('while (*ptr)', exprs)] ; Program(p) -> [c.Include('<stdio.h>'), c.Include('<stdlib.h>'), c.Block('int main(int argc, char **argv)', [ c.Statement('unsigned char *ptr = calloc(30 * 1000, 1)'), c.Block('if (!ptr)', [ c.Statement('fprintf(stderr, "Error allocating memory.\\n")'), c.Statement('return 1') ])] p [ c.Statement('free(ptr)'), c.Statement('return 0') ])] ; .grammar BF BF := SQN:p COMMENT -> bf.Program(p) ; SQN := ( COMMENT OP )*:ops -> *ops ; OP := '>'+:cs -> bf.Right(#cs) / '<'+:cs -> bf.Left(#cs) / '+'+:cs -> bf.Plus(#cs) / '-'+:cs -> bf.Minus(#cs) / '.' -> bf.Output / ',' -> bf.Input / '[' SQN:exprs ']' -> bf.Loop(exprs) ; COMMENT := ( !( '>' / '<' / '+' / '-' / '.' / ',' / '[' / ']' ) .any )* ;
References
- ↑ Tutorial: Metacompilers Part 1, James M. Neighbors. 2008. https://www.bayfronttechnologies.com/mc_tutorial.html
- ↑ Bryan Ford. 2002. Packrat parsing: simple, powerful, lazy, linear time, functional pearl. In Proceedings of the seventh ACM SIGPLAN international conference on Functional programming (ICFP '02). Association for Computing Machinery, New York, NY, USA, 36–47. https://doi.org/10.1145/581478.581483
- ↑ Schrijvers, T., & Frühwirth, T.W. (2004). Implementing and Analysing Union-Find in CHR. https://exia.informatik.uni-ulm.de/fruehwirth/Papers/wclp-union-find.pdf
- ↑ Bender, D. & Kuper, L. & Byrd, W. & Friedman, D. (2015). Efficient representations for triangular substitutions: A comparison in miniKanren. https://users.soe.ucsc.edu/~lkuper/papers/walk.pdf
- ↑ Hemann, J., & Friedman, D. P. (2013). µKanren: A Minimal Functional Core for Relational Programming. Workshop on Scheme and Functional Programming, Alexandria, United States. http://webyrd.net/scheme-2013/papers/HemannMuKanren2013.pdf
- ↑ Wolf, A. (2001). Adaptive Constraint Handling with CHR in Java. In: Walsh, T. (eds) Principles and Practice of Constraint Programming — CP 2001. CP 2001. Lecture Notes in Computer Science, vol 2239. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45578-7_18 https://github.com/endario/references/blob/master/Adaptive%20Constraint%20Handling%20with%20CHR%20in%20Java%20Wolf.pdf
- ↑ N.D. Jones, C.K. Gomard, and P. Sestoft, Partial Evaluation and Automatic Program Generation. With chapters by L.O. Andersen and T. Mogensen. Prentice Hall International, June 1993. xii + 415 pages. ISBN 0-13-020249-5. https://studwww.itu.dk/~sestoft/pebook/