User:Icecream17/Grammar Explanation
This is based on https://tc39.es/ecma262/#sec-notational-conventions, which is similar to https://orteil.dashnet.org/randomgen/?do=create
{{User:Icecream17/Grammar| grammar text }}
when specifying things
{{User:Icecream17/Sans serif| text }}
for sans-serif
}} text {{User:Icecream17/Grammar|
in the following:
- tables
Templates
See User:Icecream17/Variable, User:Icecream17/Algorithm, User:Icecream17/Sans serif, User:Icecream17/Grammar.
The following content amazed me because it took me so long to understand that it felt like genius.
From the amazing ECMAScript specification:
5.1.1 Context-Free Grammars
A context-free grammar consists of a number of productions. Each production has an abstract symbol called a nonterminal as its left-hand side, and a sequence of zero or more nonterminal and terminal symbols as its right-hand side. For each grammar, the terminal symbols are drawn from a specified alphabet.
A chain production is a production that has exactly one nonterminal symbol on its right-hand side along with zero or more terminal symbols.
Starting from a sentence consisting of a single distinguished nonterminal, called the goal symbol, a given context-free grammar specifies a language, namely, the (perhaps infinite) set of possible sequences of terminal symbols that can result from repeatedly replacing any nonterminal in the sequence with a right-hand side of a production for which the nonterminal is the left-hand side.
Back to the article
Here a `terminal` is just `literal text`; that's what "drawn from a specified alphabet" means.
A `production` has a `left-hand-side`, or `nonterminal`. Let's actually call it the `name` of a production.
A `production` also has a `right-hand-side`, which is zero or more `name`s and `literal text`s.
A `grammar` is a bunch of `productions`. (You don't need to worry about `context-free` because I was too lazy to describe grammars that have context.)
So the last sentence above is now:
Starting from the `name` of a single distinguished `production`, called the goal symbol, a given grammar specifies a language, namely, the (perhaps infinite) set of possible sequences of `literal text` that can result from repeatedly replacing any `name` in the sequence with the right-hand-side of a `production` for which the `name` is the `left-hand-side`.
In this notation, however, a `production` can have multiple `right-hand-side`s, called `alternatives`, and its `name` can be replaced by any one of them.
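The replace-names-until-only-literal-text-remains process can be sketched in Python. This is a minimal sketch, not anything from the specification: the toy grammar and the names `GRAMMAR`, `derive` and `max_len` are made up for illustration, and it assumes no alternative is empty.

```python
from collections import deque

# Hypothetical toy grammar: each name maps to its alternatives, and each
# alternative is a list of names and literal text.
GRAMMAR = {
    "Greeting": [["hi", "Tail"]],
    "Tail": [["!"], ["!", "Tail"]],
}


def derive(grammar, goal, max_len):
    """Return every all-literal text reachable from `goal`.

    Sequences longer than `max_len` symbols are dropped, because the
    language may be infinite. Assumes no alternative is empty.
    """
    results = set()
    start = (goal,)
    queue = deque([start])
    seen = {start}
    while queue:
        sentence = queue.popleft()
        # Find the leftmost name (anything that is a key of the grammar).
        index = next(
            (i for i, symbol in enumerate(sentence) if symbol in grammar), None
        )
        if index is None:
            # Only literal text is left, so this is part of the language.
            results.add("".join(sentence))
            continue
        # Replace the name with each of its alternatives in turn.
        for alternative in grammar[sentence[index]]:
            replaced = sentence[:index] + tuple(alternative) + sentence[index + 1:]
            if len(replaced) <= max_len and replaced not in seen:
                seen.add(replaced)
                queue.append(replaced)
    return sorted(results)
```

With the toy grammar, `derive(GRAMMAR, "Greeting", 4)` gives `hi!`, `hi!!` and `hi!!!`. Without the cutoff the set would be infinite, which is the "(perhaps infinite)" part of the quote.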
The explanation
Literal text is bolded, for example: if
You can also replace a character with its name, as long as you surround it in angle brackets: <Space>
Here are some additional literals:
<Empty> | an empty string of text |
<Any> | any character |
<Newline> | a line terminator sequence, see https://tc39.es/ecma262/#prod-LineTerminatorSequence |
<EOF> | end of file (e.g. no more source code) |
A literal can also just be a sans-serif description. For example, you could say: Any capital letter.
Each production has an italic name (in square brackets), one or more distinct alternatives, and optional context (in parentheses).
Each alternative is a list of zero or more names and literals.
Productions can also have context passed to them, but I don't need that yet, so I'm not defining it.
Let's define a comment. Python's comments are pretty simple:
[Python comment]
- # [Remaining text of line]
[Remaining text of line]
- (Any but not [Endline]) [Remaining text of line]
- (Any but not [Endline]) followed by [Endline]
Here, [Python comment] is a hash sign (#) followed by [Remaining text of line].
As you can see, [Remaining text of line] has a recursive definition in alternative 1.
This makes [Remaining text of line] become a bunch of (Any but not [Endline]),
where the last one is (Any but not [Endline]) followed by [Endline]
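To see the recursion at work, here is a rough Python recognizer that follows the two productions one alternative at a time. It is only a sketch: the function names are invented, [Endline] is taken to be a newline or the end of the input, and only "\n" and "\r" are treated as newline characters.

```python
def at_endline(text, pos):
    """[Endline]: <Newline> or <EOF>. Assumption: only "\\n" and "\\r" are newlines."""
    return pos >= len(text) or text[pos] in "\r\n"


def match_remaining_text_of_line(text, pos):
    """Return the position just after [Remaining text of line], or None."""
    # Both alternatives start with (Any but not [Endline]).
    if at_endline(text, pos):
        return None
    # Alternative 1: one character, then [Remaining text of line] again.
    rest = match_remaining_text_of_line(text, pos + 1)
    if rest is not None:
        return rest
    # Alternative 2: one character followed by [Endline].
    # "followed by" only looks ahead; the [Endline] itself is not consumed.
    if at_endline(text, pos + 1):
        return pos + 1
    return None


def match_python_comment(text, pos=0):
    """Return the position just after [Python comment], or None."""
    if pos < len(text) and text[pos] == "#":
        return match_remaining_text_of_line(text, pos + 1)
    return None
```

For `"# hi\nx = 1"` this returns 4, so the matched comment is `# hi` without the newline. A bare `#` with nothing after it is not matched, because both alternatives of [Remaining text of line] need at least one character. Since the code recurses once per character, a very long line would hit Python's recursion limit; a loop would be the practical choice.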
I'm going to introduce even more stuff
but not | matches any A that is not B |
followed by | matches A as long as it's followed by B |
one of | used when you don't feel like making a list |
[Endline] can be used in all grammars, even ones that don't explain what it is.
Its alternatives are:
- <Newline>
- <EOF>
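The three operators and [Endline] can also be written as small matcher functions in Python. This is one possible reading, not a definition: every name below is made up, a matcher is assumed to be a function that takes `(text, pos)` and returns the position after the match or `None`, and "A but not B" is read as "A matches here and B does not match at the same position".

```python
def literal(s):
    """Match the exact literal text `s`."""
    def match(text, pos):
        return pos + len(s) if text.startswith(s, pos) else None
    return match


def any_char(text, pos):
    """<Any>: any single character."""
    return pos + 1 if pos < len(text) else None


def eof(text, pos):
    """<EOF>: matches (without consuming anything) at the end of the text."""
    return pos if pos >= len(text) else None


def alternatives(*matchers):
    """Try each alternative in order and return the first match."""
    def match(text, pos):
        for matcher in matchers:
            end = matcher(text, pos)
            if end is not None:
                return end
        return None
    return match


def one_of(*texts):
    """one of: shorthand for a list of literal alternatives."""
    return alternatives(*(literal(t) for t in texts))


def but_not(a, b):
    """A but not B (assumed reading: B must not match at the same position)."""
    def match(text, pos):
        end = a(text, pos)
        if end is None or b(text, pos) is not None:
            return None
        return end
    return match


def followed_by(a, b):
    """A followed by B: B is only checked after A, not consumed."""
    def match(text, pos):
        end = a(text, pos)
        if end is None or b(text, end) is None:
            return None
        return end
    return match


# [Endline] and the two pieces used by [Remaining text of line].
# Assumption: <Newline> is "\r\n", "\n" or "\r".
endline = alternatives(one_of("\r\n", "\n", "\r"), eof)
not_endline = but_not(any_char, endline)
last_char = followed_by(not_endline, endline)
```

For example, `not_endline("a\n", 0)` matches the `a`, `not_endline("a\n", 1)` fails on the newline, and `last_char("ab", 1)` matches the `b` because the end of the text comes right after it.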
Eventually, with this unnecessary syntax, you'll define every token. Bam, a language.