grammar

grammar
Paradigm(s): String-rewriting Paradigm
Designed by: User:Citrons
Appeared in: 2022
Dimensions: one-dimensional
Computational class: Turing complete
Reference implementation: Unimplemented
Influenced by: BNF, thue
File extension(s): .gram, .grm

grammar is a language created by User:Citrons. a grammar program consists of a list of symbol replacement rules and an initial sequence of symbols. the execution of the program consists of repeatedly applying the rules in order until no replacements can be made, at which point the program ends.
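the execution model described above can be sketched in python. this is only a sketch under assumptions the article does not state: patterns are plain lists of symbols, a rule replaces every non-overlapping match from left to right, and a full pass over all rules is repeated until a pass changes nothing. the names Rule, apply_rule, run, and max_passes are hypothetical, since the language has no reference implementation.

```python
from typing import List, Tuple

# (replacement, pattern); both are plain symbol lists in this sketch
Rule = Tuple[List[str], List[str]]


def apply_rule(seq: List[str], rule: Rule) -> Tuple[List[str], bool]:
    """Replace every non-overlapping match of the pattern, left to right."""
    replacement, pattern = rule
    out: List[str] = []
    changed = False
    i = 0
    while i < len(seq):
        if pattern and seq[i:i + len(pattern)] == pattern:
            out.extend(replacement)
            i += len(pattern)
            changed = True
        else:
            out.append(seq[i])
            i += 1
    return out, changed


def run(rules: List[Rule], start: List[str], max_passes: int = 1000) -> List[str]:
    """Apply the rules in order, pass after pass, until a pass changes nothing."""
    seq = list(start)
    for _ in range(max_passes):
        any_change = False
        for rule in rules:
            seq, changed = apply_rule(seq, rule)
            any_change = any_change or changed
        if not any_change:
            return seq  # no replacement could be made: the program ends
    # assumption: a pass limit keeps non-terminating programs from hanging the sketch
    raise RuntimeError("no fixed point reached within max_passes")
```

the max_passes guard is there because a grammar program does not have to terminate; a rule whose replacement contains its own pattern will match forever.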

syntax

the grammar of grammar can be represented as BNF (with certain obvious definitions omitted) like so:

<opt-whitespace> = " " | "\t" | ""
<opt-newline> = "\n" | <opt-whitespace> 

<name-char> = <alpha> | <digit> | "_" | "-"
<name> = <name-char> | <name> <name-char>

<escape-sequence> = "\\" <any-char>
<string-char> = <any-char-except-double-quotes>
<string-chars> = <string-char> <string-chars> | <string-char>
<char-literal> = "'" <string-char>
<string-literal> = "\"" <string-chars> "\""

<symbol> = <name> | <char-literal> | <string-literal>

<replacement-symbol> = <symbol> | "?" 
<nonempty-replacement> = <opt-whitespace> <replacement-symbol> <opt-whitespace>
	| <nonempty-replacement> <replacement-symbol> <opt-whitespace>
<replacement> = <nonempty-replacement> | ""

<repetition-oper> = "*" | "+"
<repetition> = <expr> <repetition-oper> <opt-whitespace>

<expr> = <symbol> <opt-whitespace> 
	| "[" <opt-whitespace> <pattern> "]" <opt-whitespace>
	| "{" <opt-whitespace> <pattern> "}" <opt-whitespace>
	| "(" <opt-whitespace> <pattern> ")" <opt-whitespace> 
	| <repetition>

<pattern> = <expr> <pattern> | <expr> "|" <opt-whitespace> <pattern> | <expr>

<rule> = <replacement> "=" <pattern>
<end-rule> = "\n" | ";" | ""
<rules> = <rule> <end-rule> | <rule> <end-rule> <rules>

<start-sequence> = <opt-newline> <symbol> <opt-newline>
	| <start-sequence> <symbol> <opt-newline>

<program> = <rules> "." <start-sequence>

comments begin with the # character; it and all following characters are ignored until the end of the line.

patterns

the right side of a rule specifies a pattern of symbols to match for the rule. the simplest form of pattern is a list of symbols, e.g.:

foo = bar baz etc

this rule would replace every part of the sequence in which the symbols bar, baz, and etc appear consecutively with the single symbol foo.

the symbol _ in a pattern will match any symbol.
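matching a plain symbol list that may contain _ can be sketched in python as follows. the function name find_match and the choice to return the leftmost match are assumptions made for illustration, not something this article specifies.

```python
from typing import List, Optional


def find_match(seq: List[str], pattern: List[str]) -> Optional[int]:
    """Return the index of the leftmost match of pattern in seq, or None."""
    n = len(pattern)
    for i in range(len(seq) - n + 1):
        window = seq[i:i + n]
        # "_" matches any symbol; every other symbol must match exactly
        if all(p == "_" or p == s for p, s in zip(pattern, window)):
            return i
    return None
```

for example, the pattern bar _ etc matches the symbols at index 1 of x bar qux etc, but does not match bar etc, because _ must consume exactly one symbol.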

operators

the | operator is used for alternation. it can be read as "or". for example, this rule would replace all instances of the symbol bar OR any sequence of bee followed by apioform with the symbol foo:

foo = bar | bee apioform

the postfix * and + operators allow patterns to match arbitrary repetitions of symbols. + matches one or more instances of the expression, whereas * also matches when there are no instances. this example would match any sequence of foo followed by any number of bar, including 0 bars:

foo = foo bar*

grouping

parentheses can be used to enclose an expression to change precedence. for instance:

foo = (bar | bee) apioform
foo = bar | bee apioform

these are two different rules. the former only matches a sequence if it ends with apioform, whereas the latter only matches apioform if the first symbol matched is bee.

square brackets enclose an optional expression. if the expression within square brackets does not match, it does not prevent the pattern as a whole from matching.
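the operators and grouping forms above can be modelled with a few backtracking matcher combinators. this is a python sketch, not the language's implementation: the names sym, cat, alt, opt, plus, star, and first_end are hypothetical, and trying longer repetitions first (greedy matching) is an assumption, since the article does not say which match is preferred.

```python
from typing import Callable, Iterator, List, Optional

# a matcher takes (sequence, start index) and yields every possible end index
Matcher = Callable[[List[str], int], Iterator[int]]


def sym(name: str) -> Matcher:
    def m(seq: List[str], i: int) -> Iterator[int]:
        if i < len(seq) and (name == "_" or seq[i] == name):
            yield i + 1
    return m


def cat(*ms: Matcher) -> Matcher:
    """Juxtaposition: match each expression in turn. cat() matches nothing."""
    def m(seq: List[str], i: int) -> Iterator[int]:
        if not ms:
            yield i
            return
        for j in ms[0](seq, i):
            yield from cat(*ms[1:])(seq, j)
    return m


def alt(*ms: Matcher) -> Matcher:
    """The | operator: try each alternative in order."""
    def m(seq: List[str], i: int) -> Iterator[int]:
        for x in ms:
            yield from x(seq, i)
    return m


def opt(x: Matcher) -> Matcher:
    """Square brackets: the expression, or nothing at all."""
    return alt(x, cat())


def plus(x: Matcher) -> Matcher:
    """Postfix +: one or more repetitions, longest first (assumed greedy)."""
    def m(seq: List[str], i: int) -> Iterator[int]:
        for j in x(seq, i):
            if j > i:  # skip zero-width repetitions to avoid looping forever
                yield from plus(x)(seq, j)
            yield j
    return m


def star(x: Matcher) -> Matcher:
    """Postfix *: zero or more repetitions."""
    return alt(plus(x), cat())


def first_end(m: Matcher, seq: List[str]) -> Optional[int]:
    """End index of the first match starting at index 0, or None."""
    return next(m(seq, 0), None)


grouped = cat(alt(sym("bar"), sym("bee")), sym("apioform"))    # (bar | bee) apioform
ungrouped = alt(sym("bar"), cat(sym("bee"), sym("apioform")))  # bar | bee apioform
```

with these definitions, ungrouped matches the lone symbol bar while grouped does not, which mirrors the difference between the two rules in the grouping example.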

captures

the pattern of a rule may contain expressions enclosed in curly brackets. these expressions "capture" the sequence of symbols they match. a question mark included in the replacement on the left side of the rule will expand to the captured symbols. the nth question mark in the replacement expands to the (n % number of captures)th capture of the pattern. for example:

bee ? utterly ? = utterly apioformic {bees | apioforms | you | everyone} {char+}

this will translate the sequence of symbols utterly apioformic bees "abcdef" to bee bees utterly "abcdef".
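the expansion of question marks can be sketched in python. the function name expand is hypothetical, question marks are counted from zero here (an assumption; the article does not say where counting starts), and raising an error when there are no captures is likewise an assumption. literal symbols are written as ' followed by one character purely for illustration.

```python
from typing import List


def expand(replacement: List[str], captures: List[List[str]]) -> List[str]:
    """Expand each "?" in the replacement to a captured symbol sequence."""
    out: List[str] = []
    n = 0  # index of the current question mark, counted from zero (assumed)
    for symbol in replacement:
        if symbol == "?":
            if not captures:
                # assumption: the article does not define this case
                raise ValueError("'?' used but the pattern has no captures")
            out.extend(captures[n % len(captures)])
            n += 1
        else:
            out.append(symbol)
    return out


captures = [["bees"], ["'a", "'b", "'c", "'d", "'e", "'f"]]
result = expand(["bee", "?", "utterly", "?"], captures)
# result is: bee bees utterly 'a 'b 'c 'd 'e 'f
```

because of the modulo, a third question mark in the same replacement would wrap around and expand to the first capture again.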

literal symbols

literal symbols represent single ASCII characters. literal symbols can be specified in the program using character literals ('c), or an entire sequence of literal symbols can be specified as a string literal ("Hello, world!"). an empty string ("") is an error.

there are three special symbols for manipulating literal symbols. these symbols will not behave normally in a program and will instead do special magic.

  • char, when used in a pattern, will match any literal symbol. using it in a replacement is an error.
  • stdin, when used in a replacement, will read a character from standard input and place it as a literal symbol in its position. standard input is read once per rule containing stdin such that if the pattern has multiple matches, or if a replacement contains multiple instances of stdin, it will resolve to the same character. if the standard input stream has reached EOF, then the matches are instead replaced with the symbol eof. using stdin in a pattern is an error.
  • stdout, when used in a replacement, will traverse the symbols of each match in order. for each symbol, if it is a literal symbol, it will write it to standard output; otherwise, it does nothing. stdout does not appear in the sequence. using stdout in a pattern is an error.
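the behaviour of stdin and stdout described above can be sketched in python. the function names and the representation of a literal symbol as ' followed by one character are assumptions made for this sketch.

```python
import io
from typing import List, TextIO


def is_literal(symbol: str) -> bool:
    # assumption: a literal symbol is stored as "'" followed by one character
    return len(symbol) == 2 and symbol.startswith("'")


def read_stdin_symbol(stream: TextIO) -> str:
    """Read one character; return it as a literal symbol, or eof at end of input."""
    ch = stream.read(1)
    return "'" + ch if ch else "eof"


def write_stdout_match(match: List[str], stream: TextIO) -> None:
    """Write the literal symbols of one match in order; other symbols do nothing."""
    for symbol in match:
        if is_literal(symbol):
            stream.write(symbol[1])


demo = io.StringIO()
write_stdout_match(["'H", "foo", "'i"], demo)
print(demo.getvalue())  # prints "Hi"; the non-literal symbol foo is skipped
```

an interpreter would call read_stdin_symbol once per application of a rule containing stdin and reuse the result for every match and every stdin in that replacement, as described above.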

example programs

hello, world

stdout = char+
."Hello, world!"

cat program

stdin = char+
stdout = char+
.'x

truth machine

stdin = start
stdout = '0
stdout '1 = '1
.start

Bitwise Cyclic Tag

program ? 0 data ? = program 0 {(0|1)*} data (0|1) {(0|1)*}
program ? 1 0 data 1 ? 0 = program 1 0 {(0|1)*} data 1 {(0|1)*}
program ? 1 1 data 1 ? 1 = program 1 1 {(0|1)*} data 1 {(0|1)*}
program ? 1 0 data ? = program 1 0 {(0|1)*} data 0 {(0|1)*}
program ? 1 1 data ? = program 1 1 {(0|1)*} data 0 {(0|1)*}
.
program 0 0 1 1 1 
data 1 0 1
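for comparison, the usual definition of Bitwise Cyclic Tag can be sketched directly in python: command 0 deletes the leftmost data bit, command 1x appends x to the data if the leftmost data bit is 1, the program is read cyclically, and execution halts when the data is empty. the function name bct and the max_steps limit are hypothetical. the sketch can be used to check a trace of the rules above by hand.

```python
def bct(program: str, data: str, max_steps: int) -> str:
    """Run up to max_steps commands of Bitwise Cyclic Tag; return the data string."""
    pc = 0  # position in the cyclic program
    for _ in range(max_steps):
        if not data:
            break  # halt: the data string is empty
        if program[pc] == "0":
            data = data[1:]  # command 0: delete the leftmost data bit
            pc = (pc + 1) % len(program)
        else:
            bit = program[(pc + 1) % len(program)]
            if data[0] == "1":
                data += bit  # command 1x: append x when the data starts with 1
            pc = (pc + 2) % len(program)
    return data


# the program and data from the example above
print(bct("00111", "101", 1))  # 01
print(bct("00111", "101", 3))  # 11
```

the first step agrees with the first rule above: program 0 0 1 1 1 data 1 0 1 becomes program 0 1 1 1 0 data 0 1, with the consumed command rotated to the end of the program.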

todo

  • make more examples
  • implement the language
  • potentially improve the language