Genera Tag

From Esolang
Jump to navigation Jump to search

Genera Tag is a generalisation of tag systems and cyclic tag systems that User:ais523 almost discovered in 2018 (Genera Tag is very similar to, but not quite the same as, An Odd Rewriting System), and then subsequently discovered twice in 2023. ais523 thinks that a language that is randomly discovered multiple times is likely to be fairly fundamental, and thus worth documenting.

Compared to other sorts of tag systems, Genera Tag is more complicated to define, but easier to reason about (there are some elegant meaning-preserving transformations on programs available in Genera Tag that would not work in a regular tag system, which makes refactoring programs much easier); ais523 finds Genera Tag programs easier to write than those in other sorts of tag system. Additionally, productions in Genera Tag are typically shorter than in other sorts of tag system (e.g. traditional tag systems require length-3 productions to be Turing complete, whereas Genera Tag can do it with length-2 productions); this means that it can be useful when a Turing-completeness proof requires the ability to implement a tag-system-like language in a confined space.

Semantics

A Genera Tag program is defined over a modulus m and an alphabet of symbols. The program specifies, for each symbol:

  • a production map, which maps each of the m integers modulo m to a string of symbols; and
  • a width, which is an integer modulo m.

To run a program, you must provide an initial string of symbols, which cannot be empty. In addition to the strings used as state, a running program stores one integer, the position, which is an integer modulo m (initially 0).

One generation of program execution is a function from strings to strings, that modifies the position as a side effect, and has the following behaviour:

  • Start with an empty output string;
  • For each symbol in the input string, in order from start to end:
    • Append the string that that symbol's production map maps the position to to the output string;
    • Add the symbol's width to the position, modulo m.

Program execution consists of performing a generation from the input string, then repeatedly iterating this function on the output of the previous generation (thus, the first generation is the output of the generation function on the input string, the second generation is the output of the generation function on the first generation, and so on).

One symbol is specified to be the halt symbol; its production map and width are irrelevant. If a generation ever produces a string that contains exactly one copy of the halt symbol, the program halts. It is undefined behaviour for the string to contain more than one copy of the halt symbol; additionally, it is undefined behaviour if, when hypothetically running a generation from the prefix of the string that precedes the halt symbol, the next generation would also contain a halt symbol. If the string does not contain the halt symbol, program execution continues by calculating the next generation, and does not halt (in particular, this means that if a generation produces an empty string, the interpreter will go into an infinite loop).

Syntax

Genera Tag's standard syntax is based on that of An Odd Rewriting System. In this syntax, each symbol is represented by a Unicode codepoint with the "cased letter" property (exception: the halt symbol is represented as $, and does not have its width nor production rules explicitly specified because they do not matter). By convention, symbols with width 0 are lowercase letters, and symbols with nonzero width are not lowercase (the original interpreter does not require this, but will give warnings if the cases are incorrect). It is legal to use both a capital letter and the corresponding lowercase letter for unrelated purposes – the implementation cares only about the case of the letter, not what its counterpart letter of the other case is.

Genera Tag source files are encoded in UTF-8. Comments, which are lines that have a # as their first non-whitespace character, are ignored.

The first non-comment line of input to an interpreter specifies the symbols of the initial string.

The remainder of the input is the program. This consists of a number of definitions, each of which is separated by whitespace and does not contain whitespace internally. (The direction of the whitespace does not matter: definitions can be separated by horizontal whitespace, vertical whitespace, or a mix of bot.) There are two types of definition:

  • A production definition consists of a number written in big-endian decimal in ASCII (representing a position), followed by a symbol, a colon :, and a string of symbols. (For example, 12a:aBc.) This specifies that the symbol's (in this example, symbol a) production map maps the given position (in this example, position 12) to the specified string (in this example, aBc).
  • A width definition consists of a symbol, an at sign @, and an integer written in big-endian decimal in ASCII. (For example, B@3.) The integer specifies the width of the specified symbol.

Programs are expected to explicitly specify all the productions, and the widths, of every symbol. (Among other things, this allows implementations to determine the value of m, and the alphabet of symbols in use, without them needing to be specified separately.) If the widths are not specified, implementations may guess at the widths based on the case of the symbols, for compatibility with An Odd Rewriting System, but do not have to.

Example

Here's an example program with modulus 2 and alphabet ABCDEXy, preceded by its initial string:

ABCDE
0A:y  1A:   A@1
0B:   1B:   B@1
0C:X  1C:   C@1
0D:$  1D:   D@1
0E:A  1E:   E@1
0X:$  1X:   X@1
0y:X  1y:A  y@0

and here's what happens upon running it (positions have been added in parentheses):

(0) A (1) B (0) C (1) D (0) E (1)
(1) y (1) X (0) A (1)
(1) A (0) y (0)
(0) X (1)
(1) $ (halt)

Computational class and programming tips

The full version of Genera Tag is almost trivially Turing complete, because it embeds multiple other Turing-complete languages:

  • A traditional tag system is Genera Tag restricted so that all symbols have width 1 and all production maps map positions other than 0 to the empty string (and because 2-tag is Turing complete, this means that Genera Tag is Turing complete with m=2);
  • A cyclic tag system is Genera Tag with only two symbols, both of which have width 1, and for which one of the symbols maps all positions to the empty string.

(The "almost trivially" here is because the halt state in Genera Tag is different from the normal ones. However, it is fairly easy to intentionally trigger using a carefully written program, and does not end up affecting the Turing-completeness of the language.)

Unlike either traditional tag or cyclic tag, Genera Tag is Turing-complete even if productions are limited to being at most two symbols long:

  • The following transformation does not change the meaning of a Genera Tag program: double the size of the alphabet, creating a new symbol ś corresponding to each original symbol s; change every production for the old symbols to map to the corresponding string of new symbols, rather than old symbols; and give each new symbol a production that maps all positions to the corresponding old symbol, and a width of 0. This basically just doubles the number of generations, with execution alternating between performing a generation and replacing symbols with their new counterparts, and then replacing the symbols back with their old counterparts.
  • It is possible to extend that transformation to add a new symbol for every ordered pair of old symbols (in addition to adding a new symbol for each of the old symbols individually). The productions for the newly added pair-symbols map to length-2 strings, that produce the two old symbols that they represent in the correct order.
  • It is now possible to halve all the old productions in length (rounding up) by replacing pairs of symbols in the productions with the correct pair-symbol.

This transformation halves the length of all the existing productions, whilst adding new length-1 and length-2 productions, without changing the meaning of the program. As such, it can be repeated on any Genera Tag program until all the productions have length 2.

Another interesting transformation that is possible on Genera Tag programs is the "substring to symbol" transformation: in any Genera Tag program, it is possible to add an additional symbol to represent any string of symbols, with its width being the total width of the symbols it contains, and its productions based on the productions of the symbols it contains (allowing for the positions that they will be seen at after the earlier symbols have been processed). For example, if 0A:xy 1A:z 2A: A@1 0E:ij 1E:k 2E:l E@1, it would be possible to add a symbol Æ with definitions 0Æ:xyk 1Æ:zl 2Æ:ij Æ@2, and Æ would act identically in all contexts to the two-symbol string AE. This transformation makes Genera Tag programs easier to refactor, and its existence tends to make them easier to reason about, as you can think about symbols as a group.

One interesting consequence of this transformation is that you can create a symbol that acts in all contexts like an empty string. This can be written either along the lines of 0x: 1x: x@0, or along the lines of 0x:x 1x:x x@0 or even 0x:xx 1x:xx x@0. This in turn means that for any Genera Tag program, it is possible to "pad out" the productions in order to create a program in which all the productions have the same length (which can be any chosen value 2 or greater); and because none of the transformations listed above change m, this means that m can also be any chosen value 2 or greater. This fact can make Genera Tag easier to implement in languages with very restricted control flow, because they no longer have to be able to handle productions that contain varying numbers of symbols (which can be one of the hardest parts of implementing a tag system in a tarpit); instead of using symbol count as the primary method of communicating information from one part of the program to another, Genera Tag uses symbol width instead.

Relationship to other languages and implementation tips

Genera Tag with m=2 is very nearly the same language as An Odd Rewriting System. There is only one difference: in Genera Tag the position is a global variable, and running a generation changes it as a side effect; whereas in An Odd Rewriting System, the position resets to 0 in between every generation, and thus is effectively a local variable from the point of view of a generation. This difference has a major effect on the ease of programming in the languages: in Genera Tag, the position can be used to communicate information from symbols near the end of one generation to symbols near the start of the next, whereas in An Odd Rewriting System, information can only ever be sent rightwards. (An Odd Rewriting System was intentionally designed to have the limitation in question, in an attempt to explore the consequences; but it does not affect tag systems more generally, so Genera Tag was designed without it.)

Genera Tag compiles directly into cyclic tag, except for the halt state. The basic idea is to assign symbol a distinct index y, where 0 ≤ y < ymax; then a Genera Tag symbol can be represented as a cyclic tag substring that contains exactly one 1, at position y (with the leftmost position being 0), and has length wymax mod mymax (where w is the symbol's width). The cyclic tag productions then just consist of all strings that position 0 is mapped to by the Genera Tag productions (in index order), then all strings that position 1 is mapped to, and so on. Handling the halting condition is not hard either: replace every 0 with 00 and every 1 with 10 everywhere, add an empty cyclic tag production after each existing cyclic tag production, and then a halt symbol can be represented as a single 0 (which will push all the 1s into positions that produce nothing). One potential use of this transformation is that it allows a way to produce an "empty queue" halt state from a program that has a "explicit halt symbol" halt state (so long as the halt symbol is only produced once).

In the definition given above, Genera Tag is not a queue-based language. However, although the generational point of view makes the language easier to think about, the semantics of the language do not actually require running it one generation at a time: instead of generating from one string into another, you could alternatively read symbols from the head of a queue and enqueue productions onto its tail, and the language would behave in exactly the same way. As such, Genera Tag is conceptually a queue-based language, in much the same way other tag-system-like languages are.

See also

External resources