Zaddy
Designed by | Corbin |
---|---|
Appeared in | 2025 |
Computational class | pushdown automata |
Reference implementation | [1] |
Influenced by | META II, Parsley, Tmg |
META Restricted Zaddy, or Zaddy for short, is a living language for describing automatically-described compilers. It is described via its own metacompiler, which compiles itself to RPython. Zaddy was forked from META II.
Formally, Zaddy is the smallest language accepted by the reference implementation and compiling itself to a fixed point, rather than a language generated by some grammar.
Binding-time Hoare logic
Zaddy unifies the concepts of wikipedia:Hoare logic and binding-time improvement (The Book) into a single class of semantic actions. The parser becomes an abstract interpreter over a specified semantic domain.
For example, consider the following construction from Zaddy's reference implementation, where a parsing flag "pf" controls some internal state. We can model the flag as a Boolean value:
.domain pf : bool ;
Then, we may annotate the parser with Hoare triples. The annotations must be precise; Zaddy will compile them down to approximations which will be automatically simulated. For example, to set the flag, we have a precondition that the flag is the complement of being enabled (and so may be either disabled or unknown) and a postcondition that the flag is enabled:
SET = .pre { ~+pf } { .out('self.pf = True' .nl) } .post { +pf } ;
This can be seen as a Hoare specification of Schorre's original "SET" assembly instruction from META II. When compiled, Zaddy will generate optimizing parsers which only set the parse flag when it must. This can create Jones improvements — binding-time improvements in the generated parsers and their generated outputs. For example, in this fragment, an error guard is removed if it statically cannot be entered, and since the .error()
method happens to throw an exception, we can know that the flag is set:
.pre { ~+pf } { .out('if not self.pf: self.error()' .nl) } .post { +pf }
We can perform a similar improvement when parsing tokens that cannot fail:
$( TX3 .pre { ~+pf } { .out('if not self.pf: return' .nl) } .post { +pf } )
For correctness, the parsing machine can reset its abstract domain with the .top
command, as shown here at the top-level rule for tokens:
TR = ID .out('def parse' * '(self):' .nl .lm+) ':' .top TX1 ';' .out(.lm-) ;
Examples
The hello-world of parsing languages is wikipedia:S-expressions over some simple collection of atoms. The following grammar generates a parser which compiles S-expressions to RPython code.
.syntax SEXP SEXP = .out('def parseSExpOntoDomain(d):' .nl .lm+ 'return ') EXP .out(.nl) ; EXP = LIST / ATOM .out('d.atom("' * '")') ; LIST = OPEN ( CLOSE .out('d.sexp([])') / .out('d.sexp([' .nl .lm+) $( EXP .out(',' .nl) ) CLOSE .out(.lm- '])') ) ; .tokens ATOM : WS .token CHAR $CHAR .tokout ; OPEN : WS .any('() ; CLOSE : WS .any(')) ; CHAR : .any('a:'z) ; WS : $.any(10!13!' ) ; .domain .end
The following example compiles Brainfuck to C. It is a handwritten translation of this program in Tmg by Andriy Makukha which is based on this program in C by Cody Burgett. While not perfect, it is able to compile and somewhat run Jon Ripley's Lost Kingdom, generating over 2 megalines of C and taking several minutes to compile with GCC.
.syntax BF BF = .out('#include <stdio.h>' .nl '#include <stdlib.h>' .nl 'int main(int argc, char **argv) {' .nl .lm+ 'unsigned char *ptr = calloc(30 * 1000, 1);' .nl 'if (!ptr) {' .nl .lm+ 'fprintf(stderr, "Error allocating memory.\\n");' .nl 'return 1;' .nl .lm- '}' .nl) SQN .out('free(ptr);' .nl 'return 0;' .nl .lm- '}' .nl) ; SQN = $( LOOP / OP ) ; LOOP = WHILE .out('while (*ptr) {' .nl .lm+) SQN END .out(.lm- '};' .nl) ; OP = RIGHT .out('--ptr;' .nl) / LEFT .out('++ptr;' .nl) / PLUS .out('++*ptr;' .nl) / MINUS .out('--*ptr;' .nl) / OUT .out('putchar(*ptr);' .nl) / IN .out('*ptr = getchar();' .nl) ; .tokens RIGHT : COMMENT .token .any('>) .tokout ; LEFT : COMMENT .token .any('<) .tokout ; PLUS : COMMENT .token .any('+) .tokout ; MINUS : COMMENT .token .any('-) .tokout ; OUT : COMMENT .token .any('.) .tokout ; IN : COMMENT .token .any(',) .tokout ; WHILE : COMMENT .token .any('[) .tokout ; END : COMMENT .token .any(']) .tokout ; COMMENT : $.not('>!'<!'+!'-!'.!',!'[!']) ; .domain .end