User:Orby/Term rewriting metalanguage
Thue is a string rewriting matrioshka language. That is, it is possible to define TC string rewriting languages within it. Many of the languages I have been tinkering with recently can be thought of term rewriting languages (Underload, Unlambda, etc.). This got me thinking about developing a term rewriting matrioshka language. The term rewriting language would be defined in three steps:
- Define the the terms (this could be done in BNF or something similar)
- Define the metavariables used in the term rewriting rules
- Define the term rewriting rules
with these three pieces of information, we could generate an interpreter for the language.
There are various ways the aforementioned steps could be defined.
term comb "s" | "k" | "`" comb comb ;
The term command would be followed by a token which names the term we are defining. A pipe delineated list would define the strings which the term could be expanded to (potentially recursively). The line would be terminated by a semicolon. Alternately, we could omit the pipe and put each form in a separate term line.
var comb x y z ;
The var command would be used to define variables used in rules. The first token would define the term type as previously defined by a term command. All following tokens would define variable names that would span that type.
rule "``k" x y -> x ; rule "```s" x y z -> "``" x z "`" y z ;
The rule command would define rewrite rules. Terms would be matched to variables preceding defined by the var command. The string preceding the -> token would be rewritten to the string following the -> token.
Other ideas for syntax
[t comb "s" ] [t comb "k" ] [t comb "`" comb comb ] [v comb x ] [v comb y ] [v comb z ] [r "``k" x y : x ] [r "```s" x y z : "``" x z "`" y z ] ```skkk
Meta-statements are enclosed in brackets. There are three types of meta-statements:
- [t ]: Term meta-statements
- [v ]: Variable meta-statements
- [r ]: Rule meta-statements
Spaces between tokens imply concatenation. Rules are applied left to right. Term matching is greedy.
Ideas for reduction
If there is only one type of term, then the syntax can be reduced greatly. Variable definitions are no longer necessary; unquoted tokens in rules can be assumed to be variables of the lone term type and can be m matched accordingly.
- How would recursion in term definitions be represented? Unquoted tokens could be treated as term labels. This allows for repetition of identical terms to be represented in term definitions.
- Is there a way to combine term and rule definitions?
The ":" token with an empty LHS could represent a term definition. Otherwise, the LHS and RHS of ":" could represent a rewrite rule.
: "s" : "k" : "`" a b "``k" a b : a "```s" a b c : "``" a c "`" b c
Alternately, variables could be quoted and all other strings could represent string literals
:s :k :`(a)(b) ``k(a)(b):(a) ```s(a)(b)(c):``(a)(c)`(b)(c)
We could use "\:", "\(", "\)", and "\\" to represent the literal characters ":", "(", ")", and "\" respectively. "\n" could be used as EOL.
- Lines which contain ":" are meta-commands. If the LHS is empty, it is a term description. Otherwise it is a rule.
:s :k :`(a)(b) ``k(a)(b):(a) ```s(a)(b)(c):``(a)(c)`(b)(c) ```skkk
As a Thue extension
The language could be treated as a Thue extension if we use "::=" instead of ":".
- If the LHS of a rule is empty, it defines the RHS as a term.
- If LHS and RHS are empty, it delineates the end of the rule list.
- Otherwise it defines a rewrite rule.
- Strings contained in parentheses represent meta-variables that range over terms.
- All white-space is ignored with the exception of \n, which denotes the end of a rule.
::= s ::= k ::= i ::= `(a)(b) ``k(a)(b) ::= (a) ```s(a)(b)(c) ::= ``(a)(c)`(b)(c) `i(a) ::= (a) ::= ```skkk
Advantage: Thue would be a subset of the language (this is not strictly true because Thue is non-deterministic and we haven't decided whether or not this language will be non-deterministic)
Better not to overload ::= for term definitions. Maybe no colon for term definitions, a colon for rewrite rules, and a semi-colon for end of statement. An
empty statement (";") "$" could represent end of meta-data. Like
s; k; `(a)(b); ``k(a)(b) : (a); ```s(a)(b)(c) : ``(a)(c)`(b)(c); $ ```skkk
Use \(, \), \;, \\ and \: to represent the literal characters. All white-space (including newlines) is ignored.
- Is there any way to elegantly include input and output while only supporting a single type of term?
- How do all of these thoughts tie into type theory?
- Matrioshka language vs. meta-language?
- At the very least, it makes sense to separate the meta-commands and commands into different files? No. Empty line is horrible hack. Use $.
- Rules are applied left to right top to bottom (the first rule to appear in a file takes precedence over later rules).
- Should syntax be closer to standard notation?
- Single characters are easier to escape.
- Parentheses are often used in term rewriting languages. Maybe it would be better to use a different quote character in the meta-language?
\(, \), \;, \:, \$, \\ to escape commands? This would complicate implementations unnecessarily. Better to be able to split on ; and :.