Metalanguage
A metalanguage is a language which describes other languages. Formally, a language is the set of all terminal words generated by a formal grammar. Strictly speaking, any language which describes the structure or nature of any language is a metalanguage, meaning that the languages used to specify formal grammars are examples of metalanguages. In the context of programming, the term metalanguage refers to programming languages which describe programming languages, or equivalently, programming languages in which compilers are implemented.
Formal language theory
As an example, consider a simple language: the integers written in base 10, such as 123, 322, 0, -12, and so on, called number literals. Without additional meaning applied to them, these literals are simply strings, and so they represent nothing except themselves. We can specify the exact structure of this language (the set of all strings which are integers) with a regular expression, like 0|-?[1-9][0-9]*. This matches zero and all positive and negative integers written without leading zeros. Note that we used one language to describe another: we described the language of number literals with a regular expression, which is a word in the language of all regular expressions. This means that the language of regular expressions is a metalanguage.
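To make the idea of a regular expression describing a language concrete, here is a minimal Python sketch of a membership test for number literals. The names INTEGER_LITERAL and is_integer_literal are made up for illustration, and the pattern is assumed to include a separate alternative for 0 so that the literal 0 is accepted.

```python
import re

# Assumed pattern: "0", or an optional minus sign followed by a nonzero
# digit and any further digits (so no leading zeros and no "-0").
INTEGER_LITERAL = re.compile(r"0|-?[1-9][0-9]*")

def is_integer_literal(word: str) -> bool:
    """Return True if the whole of `word` is a base-10 integer literal."""
    # fullmatch requires the pattern to cover the entire string,
    # which is what membership in a language means.
    return INTEGER_LITERAL.fullmatch(word) is not None
```

The regular expression is the word in the metalanguage; the function merely asks whether a given string belongs to the language that word describes.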
Most useful programming languages are (strict) metalanguages, since recognizing languages is a fundamental concept in computation. For instance, brainfuck is a metalanguage since one can write a brainfuck program which determines whether its input is an alternating string of a's and b's, which corresponds to this formal grammar:
- S → aB
- S → bA
- A → aB
- A → ε
- B → bA
- B → ε
In general, writing a program which determines whether its input matches some criterion, or which generates output matching some criterion, requires a metalanguage, because such a program is able to determine whether words are part of a given language.
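As an illustration, the alternating-string grammar above can be turned into a small recognizer. The following Python sketch treats each nonterminal as a state and each rule as a transition. It assumes that a derivation may stop once it has reached A or B (that is, A → ε and B → ε), since otherwise the grammar would generate no finite words. The names RULES, ACCEPTING and is_alternating are hypothetical.

```python
# Each nonterminal maps the next input symbol to the nonterminal that
# follows it, mirroring S → aB, S → bA, A → aB and B → bA.
RULES = {
    "S": {"a": "B", "b": "A"},
    "A": {"a": "B"},
    "B": {"b": "A"},
}

# Assumption: the derivation may end at A or B (A → ε, B → ε).
ACCEPTING = {"A", "B"}

def is_alternating(word: str) -> bool:
    """Return True if `word` is a nonempty alternating string of a's and b's."""
    state = "S"
    for symbol in word:
        state = RULES[state].get(symbol)
        if state is None:  # no rule applies, so the word is rejected
            return False
    return state in ACCEPTING
```

Under these assumptions, is_alternating("abab") and is_alternating("b") hold, while the empty string, "aab" and "abc" are rejected.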
Programming languages which cannot perform any conditional behavior whatsoever are not metalanguages. Non-metalanguages include markup languages, data description languages like JSON, and no-code esolangs. In this way, being a metalanguage is similar to the colloquial notion of being a programming language: being able to describe and specify more than exactly what is written down. To further illustrate the point, a metalanguage can be distinguished from a simple language by checking whether any word in the potential metalanguage describes a language which has more than one word. That is, a metalanguage is one in which there is a non-constant program: a program which, when given input, can generate multiple different outputs or go down multiple different paths.
Compiler theory
In the context of compiler theory, metalanguage has a more specific definition. Instead of referring to languages which can describe other languages, the term refers to programming languages which are used to implement other programming languages. Usually this reflects the design purpose of the language, rather than its capability. Programs in these metalanguages describe programming languages and their transformations; in other words, the metalanguage describes compilers. A popular metalanguage is Standard ML, which is frequently used to build compilers. Another popular metalanguage is Backus-Naur form, which cannot compile programs by itself; however, extensions which produce output when text is matched allow it to describe compilers.
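To show what "producing output when text is matched" can look like, here is a hand-written Python sketch, not the output of any particular BNF tool. It assumes a hypothetical grammar <sum> ::= <number> | <sum> "+" <number> and attaches an output action to each rule, emitting instructions for an imaginary stack machine. The function name compile_sum and the PUSH and ADD instructions are made up for this example.

```python
import re

def compile_sum(source: str) -> list[str]:
    """Translate text such as "1 + 22+3" into hypothetical stack-machine code."""
    # Tokens are digit runs, plus signs, or any other non-space character.
    tokens = re.findall(r"[0-9]+|\+|\S", source)
    output = []
    pos = 0

    def number() -> None:
        nonlocal pos
        if pos >= len(tokens) or not ("0" <= tokens[pos][0] <= "9"):
            raise SyntaxError("expected a number")
        output.append(f"PUSH {tokens[pos]}")  # action for <number>
        pos += 1

    number()
    while pos < len(tokens) and tokens[pos] == "+":
        pos += 1
        number()
        output.append("ADD")  # action for <sum> "+" <number>
    if pos != len(tokens):
        raise SyntaxError("unexpected token")
    return output
```

Each matched rule contributes output, so the grammar together with its actions describes a (very small) compiler rather than only a recognizer.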
See also
Example metalanguages (compiler theory definition):