Zaddy

From Esolang
Jump to navigation Jump to search
Zaddy
Designed by Corbin
Appeared in 2025
Computational class pushdown automata
Reference implementation [1]
Influenced by CHR, META II, OMeta, Parsley, Tmg, Zephyr ASDL

META Restricted ACE Zaddy, or Zaddy for short, is a living language for describing compilers. It is described via its own metacompiler, which compiles itself to RPython. Zaddy was forked from META II following Neighbors' approach.[1]

Formally, Zaddy is the smallest language accepted by the reference implementation and compiling itself to a fixed point, rather than a language generated by some grammar.

Overview

A Zaddy program contains multiple sections which each describe a portion of a compiler pipeline. The available section types are:

  • .signature sections which declare tree algebras
  • A .rules section which pattern-match elements of algebras without preserving structure
  • .grammar sections which declare parsing rules that map syntax to algebraic actions

The overall pipeline involves parsing the input into algebraic actions, applying rewrite rules to optimize them, and extracting expressions matching a target language. Multiple sections are allowed and namespaced, except for .rules which are global.

Packrat PEG parsing

Zaddy implements Ford's Packrat algorithm for fast recursive-descent parsing with backtracking.[2] Zaddy supports Unicode; input and output must be encoded with UTF-8.

Character classes — that is, sets of characters — may be declared with the class keyword. For example, the following classes are useful for parsing languages with whitespace, alphanumeric identifiers, numbers, and/or single-quoted strings:

.grammar IDENTIFIERS
class Whitespace = 9 | 10 | 13 | 32 ;
class Digit = .range(48:57) ;
class Upper = .range(65:90) ;
class Lower = .range(97:122) ;
class Alpha = Upper | Lower ;
class Alphanumeric = Alpha | Digit ;
class Quoted = ~( 10 | 13 | 39 | 92 ) ;

Tree algebras

Zaddy supports definitions of tree algebras using inline Zephyr ASDL. For example, the following abstract syntax models a Python-like target language with indented blocks of text:

.signature py
stmt = Statement(string line)
     | Compound(string head, stmt* block)
     | Conditional(string test, stmt* block)
     | Handler(stmt* block, stmt* handler)

Zephyr tree algebras provide Herbrand universes over strings and lists. Via rewrite rules, all of the main concepts in compilation are possible, including desugaring, multiple intermediate languages, abstract interpretation, optimization, instruction selection, and textual output. For example, the following rules map the above signature to a pretty-printer:

.rules
pyStatement @ py.Statement(L)               ==> [.line(L)] ;
pyCompPass  @ py.Compound(H, [])            ==> [.line(H ': pass')] ;
pyCompBlock @ py.Compound(H, B), .ne(B)     ==> [.line(H ':'), .block(B)] ;
pyCondEmpty @ py.Conditional(T, [])         ==> [] ;
pyCondBlock @ py.Conditional(T, B), .ne(B)  ==> [.line('if ' T ':'), .block(B)] ;
pyHandEmpty @ py.Handler(B, [])             ==> [] ;
pyHandBlock @ py.Handler(B, H), .ne(H)      ==> [.line('try:'), .block(B), .line('except ParseError:'), .block(H)] ;

The following rules (and following extension to the signature) augment it with some Python-specific sugar:

.signature py
statement = ...
          | Imp(string module, string names)
          | Ret(string expr)
          | RaiseIf(string test)
.rules
pyImp      @ py.Imp(M, Ns)  ==> py.Statement('from ' M ' import ' Ns) ;
pyRet      @ py.Ret(E)      ==> py.Statement('return ' E) ;
pyRaiseIf  @ py.RaiseIf(T)  ==> py.Statement('if ' T ': raise ParseError()') ;

These rules use the builtin tree algebra, which is specialized for writing line-oriented languages with indented blocks, for output. This builtin algebra effectively has the signature:

.signature builtin
builtin = Line(string line) | Block(builtin* bs)

E-matching

Instead of graphs, DAGs, or multisets, Zaddy uses e-graph semantics to rewrite tree algebras. This means that a rewrite rule acts upon a partially-ordered equivalence class ("e-class") of algebraic expressions by adding new possibilities to the set. Rewrite rules are given in a syntax which mirrors Zephyr ASDL but includes Prolog-style logic variables.

Intuitively, a rewriting rule establishes a preference for one expression over another. The following guarantees are made about rewriting in a rule X ==> Y, supposing that all logic variables have been bound and the rule is applied:

  1. Any e-class containing X can have Y extracted from it later; in particular, if X and Y are structs and X has no side conditions then all e-classes with X's constructor will also have Y's constructor.
  2. Any e-class where both X and Y are elements will always prefer to extract Y rather than X; in particular, if X and Y are both structs of the same Zephyr type, regardless of whether they are the same constructor, Y will be preferred to X.

Extraction breaks any ties or ordering conflicts by lexicographical order for strings and order of appearance for Zephyr constructors. In the Python example, py.Statment will be preferred to py.Compound.

Zaddy does not fully support associative operations, but it does support pattern-matching heads and tails of lists. For example, the following rule for an Algebraic Brainfuck peephole optimizer removes redundant zeroing of memory:

.rules
zeroZero @ [Xs…, bf.Zero, bf.Zero, Ys…] ==> [Xs…, bf.Zero, Ys…] ;

Zaddy does not have explicit support for commutative operations, but they can be added via one rule per commutator. For example, the following hypothetical rules could optimize commutative multiplication by two into addition. The rule mulComm doesn't cause an infinite loop because e-graphs eventually saturate, becoming full with results and reaching a fixed point.

.rules
mulComm @ ops.Mul(X, Y) ==> ops.Mul(Y, X) ;
mul2Add @ ops.Mul(X, '2') ==> ops.Add(X, X) ;

Note that currently only strings are supported; there are no numbers yet, and there are no Prolog-style constant terms, although ASDL allows constructors with no arguments. Also the implementation is, to put it politely, brittle jank.

The e-graph implementation and pattern-matching strategy are based on the generic-join algorithm for conjunctive database queries.[3] Pattern-matching syntax was inspired by Prolog and CHR; in particular, Zaddy's semantics are a simplified version of CHR's standard semantics.

Examples

The following example compiles Brainfuck to C. It is a handwritten translation of this program in Tmg by Andriy Makukha which is based on this program in C by Cody Burgett. It is able to compile and run Jon Ripley's Lost Kingdom, generating over 187 thousand lines of C and taking two minutes to compile with GCC.

.signature c
statement = Include(string h)
          | Statement(string l)
          | Block(string head, statement* cs)
.signature bf
program = Program(program* ps)
        | Right(int n)
        | Left(int n)
        | Plus(int n)
        | Minus(int n)
        | Zero
        | Set(int n)
        | ScaleAdd(int n, int k)
        | Output | Input
        | Loop(program* exprs)
.rules
cInc    @ c.Include(h)      ==> [.line('#include ' h)] ;
cStat   @ c.Statement(l)    ==> [.line(l ';')] ;
cBlock  @ c.Block(head, cs) ==> [.line(head ' {'), .block(cs), .line('}')] ;
bfRight @ bf.Right(n)       ==> [c.Statement('ptr += ' n)] ;
bfLeft  @ bf.Left(n)        ==> [c.Statement('ptr -= ' n)] ;
bfPlus  @ bf.Plus(n)        ==> [c.Statement('*ptr += ' n)] ;
bfMinus @ bf.Minus(n)       ==> [c.Statement('*ptr -= ' n)] ;
bfZero  @ bf.Zero           ==> [c.Statement('*ptr = 0')] ;
bfSet   @ bf.Set(n)         ==> [c.Statement('*ptr = ' n)] ;
bfSA    @ bf.ScaleAdd(n, k) ==> [c.Statement('*(ptr + ' n ') += *ptr * ' k)] bf.Zero ;
bfOut   @ bf.Output         ==> [c.Statement('putchar(*ptr)')] ;
bfIn    @ bf.Input          ==> [c.Statement('*ptr = getchar()')] ;
bfLoop  @ bf.Loop(exprs)    ==> [c.Block('while (*ptr)', exprs)] ;
bfProg  @ bf.Program(p)     ==> [c.Include('<stdio.h>'),
                                 c.Include('<stdlib.h>'),
                                 c.Block('int main(int argc, char **argv)', [
                                     c.Statement('unsigned char arr[30 * 1024] = { 0 }'),
                                     c.Statement('unsigned char *ptr = arr')
                                 ] p [c.Statement('return 0')])] ;
zeroZero @ [Xs…, bf.Zero, bf.Zero, Ys…] ==> [Xs…, bf.Zero, Ys…] ;
zeroPlus @ [Xs…, bf.Zero, bf.Plus(N), Ys…] ==> [Xs…, bf.Set(N), Ys…] ;
scaleAddDec @ bf.Loop([bf.Minus('1'), bf.Right(N), bf.Plus(K), bf.Left(N)])
          ==> bf.ScaleAdd(N, K) ;
scaleAddInc @ bf.Loop([bf.Right(N), bf.Plus(K), bf.Left(N), bf.Minus('1')])
          ==> bf.ScaleAdd(N, K) ;
loopMinusZero @ bf.Loop([bf.Minus('1')]) ==> bf.Zero ;
loopPlusZero  @ bf.Loop([bf.Plus('1')])  ==> bf.Zero ;
.grammar BF
class Right = 62 ;
class Left = 60 ;
class Plus = 43 ;
class Minus = 45 ;
class Output = 46 ;
class Input = 44 ;
class LoopOpen = 91 ;
class LoopClose = 93 ;
class Comment = ~( Right | Left | Plus | Minus | Output | Input | LoopOpen | LoopClose ) ;
token Right = Right+ ;
token Left = Left+ ;
token Plus = Plus+ ;
token Minus = Minus+ ;
token Output = Comment* Output ;
token Input = Comment* Input ;
token LoopOpen = Comment* LoopOpen ;
token LoopClose = Comment* LoopClose ;
token Comment = Comment* ;
BF := SQN:p Comment -> bf.Program(p) ;
SQN := OP*:ops -> *ops ;
OP := Comment Right:cs             -> bf.Right(#cs)
    / Comment Left:cs              -> bf.Left(#cs)
    / Comment Plus:cs              -> bf.Plus(#cs)
    / Comment Minus:cs             -> bf.Minus(#cs)
    / Output                       -> bf.Output
    / Input                        -> bf.Input
    / LoopOpen SQN:exprs LoopClose -> bf.Loop(exprs) ;

References

  1. Tutorial: Metacompilers Part 1, James M. Neighbors. 2008. https://www.bayfronttechnologies.com/mc_tutorial.html
  2. Bryan Ford. 2002. Packrat parsing: simple, powerful, lazy, linear time, functional pearl. In Proceedings of the seventh ACM SIGPLAN international conference on Functional programming (ICFP '02). Association for Computing Machinery, New York, NY, USA, 36–47. https://doi.org/10.1145/581478.581483
  3. Yihong Zhang, Yisu Remy Wang, Max Willsey, and Zachary Tatlock. 2022. Relational e-matching. Proc. ACM Program. Lang. 6, POPL, Article 35 (January 2022), 22 pages. https://doi.org/10.1145/3498696