Abstract syntax tree

An abstract syntax tree (AST) is an applicative tree encoding the wikipedia:abstract syntax of a program.

Motivation

In parsing theory, the grammar of a language equips every sentence with a concrete syntax tree (CST) whose vertices are productions. However, these trees generally do not have nice graph homomorphisms to the structures used in compilers for tasks like optimization or instruction selection. To simplify the problem, we can select an abstract grammar which is wikipedia:adjoint to the concrete grammar, and use the adjunction to forget the concrete details of the CST, resulting in an abstract tree.

Applications

Note that the above adjunction also comes with a free mapping which generates concrete syntax representing a given AST, known as the serializer of abstract syntax.

The primary application for ASTs prior to the wide availability of the Internet was in compiler engineering, where abstract syntax simplifies almost every piece of the traditional compiler pipeline. In particular, wikipedia:instruction selection is always witnessed by a tree homomorphism from abstract syntax to templated sequences of machine code, and algorithms like NOLTIS^[1] are merely quick searches for acceptable witnesses. Additionally, serialization serves as useful debugging and pretty-printing.

With the advent of large highly-networked rooms with many computers and the transition from data warehouses to data centers, perhaps more understandable as "warehouse-scale machines"^[2], ASTs became a way to represent messages passed between computers. In particular, the serializer is used to produce network-safe messages of concrete syntax which are easy to parse, and the abstract syntax is exposed as an API over those messages.

Abstract binding trees

It can be useful to have generic bindings for ASTs which are not language-dependent. For any abstract grammar, we may add two additional productions: one for variable names and one for a one-variable binder. We also must wrap the original AST with variable-management annotations. In Zephyr ASDL, given an abstract syntax ast, we could have:

ast = …
abt = Var(str name) | Abs(str name, abt term) | Tm(str* free, abt tree) | Pure(ast tree)

This construction gives the type of an abstract binding tree (ABT)^[3]. Formal capture-avoiding substitution can be defined on ABTs up to alpha-equivalence without knowledge of the underlying syntax; in pseudocode:

-- XXX exercise for reader: write newNameNotIn, rename
substitute value name body = case body of
 Var(n) -> if n == name then value else Var(n)
 -- NB: usually more complex than this!
 Abs(x, t) -> let x' = newNameNotIn t, t' = rename x x' t in Abs(x', substitute value name t')
 Tm(xs, t) -> Tm(xs, if name in xs then substitute value name t else t)
 Pure(t) -> Pure(t)

Any sort of binding within an abstract syntax can be lifted to an abstract binding. As such, ABTs generalize lambda calculi. For example, ABTs were originally introduced by Aczel in 1978^[4] to prove that confluence generalizes beyond lambda calculus.

Theorem (Church–Rosser for ABTs). Every ABT has at most one normal form up to alpha-equivalence. (Aczel, 1978)^[5]

Note that ABTs themselves are only coherent up to alpha-equivalence; while names can be chosen by users, they may not be predictably preserved after substitution.

Herbrand structures

Any type of ASTs can be thought of as a wikipedia:Herbrand structure. Readers fluent in Prolog may think of ASTs as ground values generated by functors and atoms, for example. Similarly, a type of ABTs with substitution can be thought of as a wikipedia:Herbrand universe with logic variables which reify each substitution action via unification.

References

↑ David Ryan Koes and Seth Copen Goldstein. 2008. Near-optimal instruction selection on dags. In Proceedings of the 6th annual IEEE/ACM international symposium on Code generation and optimization (CGO '08). Association for Computing Machinery, New York, NY, USA, 45–54. https://doi.org/10.1145/1356058.1356065 https://llvm.org/pubs/2008-CGO-DagISel.pdf
↑ Luiz André Barroso and Jimmy Clidaras and Urs Hölzle, 2013. The Datacenter as a Computer: An Introduction to the Design of Warehouse-Scale Machines. http://dx.doi.org/10.2200/S00516ED2V01Y201306CAC024 https://pages.cs.wisc.edu/~shivaram/cs744-readings/dc-computer-v3.pdf
↑ R. Harper, 2012. Practical Foundations for Programming Languages. https://www.cs.cmu.edu/~rwh/pfpl.html https://moss.cs.iit.edu/cs440/readings/harper.pdf
↑ J. Sterling, D. Morrison. 2016. Syntax and Semantics of Abstract Binding Trees. https://arxiv.org/abs/1601.06298
↑ P. Aczel. A general Church-Rosser theorem. Technical report, University of Manchester, 1978. https://www.ens-lyon.fr/LIP/REWRITING/MISC/AGeneralChurch-RosserTheorem.pdf

[1] David Ryan Koes and Seth Copen Goldstein. 2008. Near-optimal instruction selection on dags. In Proceedings of the 6th annual IEEE/ACM international symposium on Code generation and optimization (CGO '08). Association for Computing Machinery, New York, NY, USA, 45–54. https://doi.org/10.1145/1356058.1356065 https://llvm.org/pubs/2008-CGO-DagISel.pdf

[2] Luiz André Barroso and Jimmy Clidaras and Urs Hölzle, 2013. The Datacenter as a Computer: An Introduction to the Design of Warehouse-Scale Machines. http://dx.doi.org/10.2200/S00516ED2V01Y201306CAC024 https://pages.cs.wisc.edu/~shivaram/cs744-readings/dc-computer-v3.pdf

[3] R. Harper, 2012. Practical Foundations for Programming Languages. https://www.cs.cmu.edu/~rwh/pfpl.html https://moss.cs.iit.edu/cs440/readings/harper.pdf

[4] J. Sterling, D. Morrison. 2016. Syntax and Semantics of Abstract Binding Trees. https://arxiv.org/abs/1601.06298

[5] P. Aczel. A general Church-Rosser theorem. Technical report, University of Manchester, 1978. https://www.ens-lyon.fr/LIP/REWRITING/MISC/AGeneralChurch-RosserTheorem.pdf

[1]

[2]

[3]

[4]

[5]

Abstract syntax tree

Contents

Motivation

Applications

Abstract binding trees

Herbrand structures

See also

References

Navigation menu

Abstract syntax tree

Motivation

Applications

Abstract binding trees

Herbrand structures

See also

References

Navigation menu

Search