Thubi

Thubi is a matrioshka language by User:Ihope127 inspired by Thue. Its name is a pun on Thue's: the name "Thue" is pronounced roughly like "2A", so "Thubi" is pronounced roughly like "2B", or TOO-bee.

Thubi, like Thue, is based on string manipulation. Unlike Thue strings, however, Thubi strings often consist mostly of programmer-defined symbols. All printable ASCII characters except the backslash are also "hard-wired" as symbols, so they may be manipulated as well.

An example Thubi program, which uses only predefined symbols:

    :\bT\s
    =T\s
    :\bF\s
    =F\s
    :FF
    =F
    :FT
    =T
    :TF
    =T
    :TT
    =F

    TFFTFTTFTFTTF

The following Thue program is equivalent to (in that it does the same thing as, but isn't exactly a "word-for-word translation" of) the Thubi program above:

    {T}::=~T
    {F}::=~F
    FF::=F
    FT::=T
    TF::=T
    TT::=F
    ::=
    {TFFTFTTFTFTTF}

In Thubi, ":this" on one line and "=that" on the next is equivalent to "this::=that" in Thue, and the first blank line is equivalent to just "::=" in Thue: it marks the division between the rules and the initial program state. The text written there is not used verbatim as the initial string, however. It is read in C string format, with \e added for the ESC character, without \" and \', and without surrounding quotes. Then \b (for "begin") is put at the beginning of the string, and \s (for "stop") is put at the end. So the above Thubi string simply becomes \bTFFTFTTFTFTTF\s, and replacement rules are then followed just like in Thue.

One additional "rule" is built in: whenever a character symbol is at the left end of the string, that symbol may be removed and the corresponding character output. (This is treated as a normal replacement rule: if other rules are applicable, either this one or one of those may be chosen.) When \s reaches the left end of the string, the program terminates. \b and programmer-defined symbols do nothing at the left end of the string.
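The program layout described above can be sketched in Python. This is only an illustration under stated assumptions, not a reference implementation: the function name `parse_thubi` is hypothetical, symbol declaration lines are not handled, C-style escapes in the initial state are left unexpanded, and the initial state is assumed to be the single line after the first blank line.

```python
# The parity example, written as a raw string so backslashes stay literal.
PARITY = r""":\bT\s
=T\s
:\bF\s
=F\s
:FF
=F
:FT
=T
:TF
=T
:TT
=F

TFFTFTTFTFTTF"""


def parse_thubi(program):
    """Split a Thubi program into (rules, initial_string).

    Assumptions of this sketch: no declaration lines, and the initial
    state is the single line that follows the first blank line.
    """
    lines = program.split("\n")
    rules = []
    i = 0
    # Rules come in ':lhs' / '=rhs' line pairs until the first blank line.
    while i < len(lines) and lines[i] != "":
        if (i + 1 >= len(lines)
                or not lines[i].startswith(":")
                or not lines[i + 1].startswith("=")):
            raise ValueError(f"expected a ':lhs' / '=rhs' pair at line {i + 1}")
        rules.append((lines[i][1:], lines[i + 1][1:]))
        i += 2
    initial = lines[i + 1] if i + 1 < len(lines) else ""
    # \b is put at the beginning and \s at the end of the initial string.
    return rules, "\\b" + initial + "\\s"


rules, state = parse_thubi(PARITY)
print(state)
```

Each rule is kept as a plain pair of strings here; an interpreter would still need to split those strings into symbols before matching.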

So here is one way to execute the above Thubi program:

    \bTFFTFTTFTFTTF\s
    \bTFFTFFFTFTTF\s
    \bTFTFFFTFTTF\s
    \bTFTFFFTTTF\s
    \bTFTFFFFTF\s
    \bTFTFFFFT\s
    \bTTFFFFT\s
    \bTTFFFT\s
    \bTTFFT\s
    \bTTFT\s
    \bFFT\s
    \bFT\s
    \bT\s
    T\s

After this, both the T and the \s are allowed to slide off the left end of the program, outputting T and ending the program.
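The choice of rule at each step is free, so the execution shown is only one of several possible runs. The following Python sketch picks one fixed strategy (the first listed rule that matches, at its leftmost position, and output only when no rule applies) and reaches the same result. The helper names `symbols`, `find`, and `run` are hypothetical, and only one-character symbols and two-character backslash escapes are handled.

```python
def symbols(text):
    """Split text into symbols: a backslash plus one character, or one character."""
    out, i = [], 0
    while i < len(text):
        step = 2 if text[i] == "\\" else 1
        out.append(text[i:i + step])
        i += step
    return out


def find(state, pattern):
    """Index of the leftmost occurrence of pattern in state, or -1."""
    for i in range(len(state) - len(pattern) + 1):
        if state[i:i + len(pattern)] == pattern:
            return i
    return -1


def run(rules, state):
    """Rewrite until \\s (or anything that is not a character) is stuck at the left end."""
    state, output = list(state), []
    while state:
        for lhs, rhs in rules:
            at = find(state, lhs)
            if at >= 0:
                state[at:at + len(lhs)] = rhs
                break
        else:
            # No rule applies: fall back to the built-in output rule.
            if len(state[0]) != 1:
                break  # \s halts; \b would wait for input in a real interpreter
            output.append(state.pop(0))
    return "".join(output)


RULES = [(symbols(lhs), symbols(rhs)) for lhs, rhs in [
    (r"\bT\s", r"T\s"), (r"\bF\s", r"F\s"),
    ("FF", "F"), ("FT", "T"), ("TF", "T"), ("TT", "F"),
]]
STATE = symbols(r"\bTFFTFTTFTFTTF\s")
print(run(RULES, STATE))
```

The four two-letter rules behave like exclusive-or on T and F, so every order of application leaves the same single letter between \b and \s.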

As is said above, all printable ASCII characters except the backslash are built-in symbols. The following symbols are also built in, and all but two of them (\b and \s) also represent characters:

    \\    backslash
    \n    newline
    \r    carriage return
    \t    tab
    \f    form feed
    \a    bell
    \v    vertical tab
    \e    escape
    \nnn  octal nnn
    \xnn  hexadecimal nn
    \b    beginning of program
    \s    end of program/input

Symbols which represent the same character are considered the same symbol: \n, \012, and \x0A are all considered to be the same. \s is not the same as the EOF character.
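One way an interpreter might treat such spellings as the same symbol is to reduce every built-in symbol to a canonical form before comparing. The Python sketch below does this for the built-in escapes only; the name `canonical` is hypothetical, and programmer-defined symbols are deliberately not handled (they raise `ValueError`).

```python
# Named escapes that stand for a single character.
NAMED = {
    "\\\\": "\\", "\\n": "\n", "\\r": "\r", "\\t": "\t",
    "\\f": "\f", "\\a": "\a", "\\v": "\v", "\\e": "\x1b",
}


def canonical(symbol):
    """Map a built-in symbol to the character it represents.

    \\b and \\s are markers rather than characters, so they map to themselves.
    """
    if symbol in ("\\b", "\\s"):
        return symbol
    if symbol in NAMED:
        return NAMED[symbol]
    if symbol.startswith("\\x"):
        return chr(int(symbol[2:], 16))   # \xnn: hexadecimal
    if symbol.startswith("\\"):
        return chr(int(symbol[1:], 8))    # \nnn: octal
    return symbol                          # plain printable character


print(canonical(r"\n") == canonical(r"\012") == canonical(r"\x0A"))
```

Keeping \s as its own marker reflects the statement that \s is not the same as any EOF character.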

Programmer-defined symbols can contain any printable ASCII characters, including spaces and backslashes. A symbol is defined in a program simply by declaring it, that is, by listing its name on a line of its own. An example use of symbols:

    \Foo
    :f
    =\Foo
    :\b\Foo
    =F

    f

This outputs the uppercase letter F: the initial string \bf\s becomes \b\Foo\s, then F\s, and F is then output. The set of symbols defined at any given time must be prefix-free: if a symbol called \Foo is defined, trying to define \Foobar is an error, and vice versa. \Foobar and \Foobaz can coexist, but both prohibit \Foo. Because every printable ASCII character other than the backslash is already a symbol, this also makes it impossible to define a symbol that does not start with a backslash. It is also advisable to start every custom escape code with a capital letter or a symbol, as this will mostly prevent clashes with the built-in escapes.
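The prefix-free requirement can be checked with a short helper. The Python sketch below is an illustration only: `clashes` and `BUILTIN_ESCAPES` are hypothetical names, and treating `\x` and `\0` to `\7` as reserved prefixes is an assumption made here to model the hexadecimal and octal escapes.

```python
# Built-in backslash escapes, with \x and \0..\7 modelled as reserved prefixes
# (an assumption of this sketch).
BUILTIN_ESCAPES = [
    r"\\", r"\n", r"\r", r"\t", r"\f", r"\a", r"\v", r"\e", r"\x", r"\b", r"\s",
] + ["\\" + digit for digit in "01234567"]


def clashes(name, defined):
    """Return the defined names that make `name` illegal under the prefix-free rule."""
    return sorted(
        other for other in defined
        if other != name and (other.startswith(name) or name.startswith(other))
    )


print(clashes(r"\Foo", {r"\Foobar", r"\Foobaz"}))   # both longer names block \Foo
print(clashes(r"\Foobaz", {r"\Foobar"}))            # siblings do not clash
print(clashes(r"\nope", BUILTIN_ESCAPES))           # lowercase start hits \n
print(clashes(r"\Nope", BUILTIN_ESCAPES))           # capital start avoids it
```

The last two calls show why a capital first letter is the safer choice: `\nope` begins with the built-in `\n`, while `\Nope` begins with no built-in escape.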

Symbols can also be undefined. Undefining a symbol is done in exactly the same way as defining it: simply list its name again. (Built-in symbols cannot be undefined.) The symbol is then available only between the two instances of the name. If a symbol is defined, undefined, and redefined, the two "versions" of the symbol are not considered the same symbol. An example:

    \Foo
    :f
    =\Foo
    \Foo
    \Foo
    :\b\Foo
    =F

    f

This does not output anything: the rule :f =\Foo produces the first version of \Foo, while the rule :\b\Foo =F matches only the second. It acts the same as the following program:

    \Foo
    :f
    =\Foo
    \Foo
    \Bar
    :\b\Bar
    =F

    f
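The define, undefine, and redefine behaviour can be modelled by giving each definition of a name its own version number. The Python sketch below is one possible bookkeeping scheme, not something prescribed by the language; the class `SymbolTable` and its methods are hypothetical, and the prefix-free check is left out.

```python
class SymbolTable:
    """Tracks which programmer-defined symbols are currently defined."""

    def __init__(self):
        self.active = {}    # name -> (name, version) for currently defined symbols
        self.versions = {}  # name -> number of times the name has been defined

    def declare(self, name):
        """Toggle a name: define it if undefined, undefine it if defined."""
        if name in self.active:
            del self.active[name]
            return None
        self.versions[name] = self.versions.get(name, 0) + 1
        self.active[name] = (name, self.versions[name])
        return self.active[name]

    def resolve(self, name):
        """Return the version of `name` visible at this point in the program."""
        return self.active[name]


table = SymbolTable()
first = table.declare(r"\Foo")    # defined
table.declare(r"\Foo")            # undefined
second = table.declare(r"\Foo")   # redefined: a different symbol
print(first, second, first == second)
```

A rule written while the first version is active would store `first`, and a rule written after the redefinition would store `second`, so the two never match each other.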

Input in Thubi is done whenever no rules can be applied and nothing can be output: an interpreter only inputs a character as a "last resort". Whatever is input is simply put at the right end of the current string. If EOF is reached, \s will be put on the end of the current string, and once no rules are applicable, the program is simply halted as if it had output \s. So this is a cat program in Thubi:

    :\b\s
    =

(Both blank lines at the end are required.) Since the second part of the program is initially empty, the initial string is \b\s. The replacement rule turns this into the empty string, then a character is input, then that character is output, and so on. This continues until the input reaches EOF: the string is then \s, which is "output", thus ending the program.
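The "last resort" input behaviour can be simulated in Python. This is a sketch under assumptions: the names `find` and `run_with_input` are hypothetical, input is supplied as a string rather than read from a stream, each input character is stored as a one-character symbol, and rewriting is always preferred over output.

```python
def find(state, pattern):
    """Index of the leftmost occurrence of pattern in state, or -1."""
    for i in range(len(state) - len(pattern) + 1):
        if state[i:i + len(pattern)] == pattern:
            return i
    return -1


def run_with_input(rules, state, text):
    """Run rules on state, reading from text only when nothing else can happen."""
    state, output = list(state), []
    pending = list(text)
    eof_seen = False
    while True:
        for lhs, rhs in rules:
            at = find(state, lhs)
            if at >= 0:
                state[at:at + len(lhs)] = rhs
                break
        else:
            if state and len(state[0]) == 1:
                output.append(state.pop(0))      # built-in output rule
            elif state and state[0] == "\\s":
                break                            # \s at the left end: terminate
            elif pending:
                state.append(pending.pop(0))     # last resort: input one character
            elif not eof_seen:
                state.append("\\s")              # EOF puts \s on the right end
                eof_seen = True
            else:
                break                            # stuck after EOF: halt
    return "".join(output)


CAT_RULES = [(["\\b", "\\s"], [])]   # the single rule :\b\s / = (empty)
print(run_with_input(CAT_RULES, ["\\b", "\\s"], "hi"))
```

With the input "hi", the string goes from \b\s to empty, then to h, empty, i, empty, and finally \s, which ends the run.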

Thubi is Turing-complete, as Thue programs that don't perform I/O can easily be translated into it.
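The rule-by-rule part of such a translation can be sketched in Python. This is an illustration under assumptions rather than a proven-equivalent translator: the name `thue_to_thubi` is hypothetical, the Thue initial state is assumed to be the single line after the "::=" terminator, and backslashes are escaped as \\ because the backslash is not a plain symbol in Thubi.

```python
def thue_to_thubi(source):
    """Translate a Thue program without I/O rules into Thubi rule syntax."""

    def escape(text):
        # A literal backslash must be written as the \\ symbol in Thubi.
        return text.replace("\\", "\\\\")

    lines = source.split("\n")
    out = []
    for number, line in enumerate(lines):
        if line.strip() == "::=":
            # Terminator: the next line is taken as the initial state.
            initial = lines[number + 1] if number + 1 < len(lines) else ""
            return "\n".join(out + ["", escape(initial)])
        if not line.strip():
            continue
        if "::=" not in line:
            raise ValueError(f"not a Thue rule: {line!r}")
        lhs, rhs = line.split("::=", 1)
        if rhs.startswith("~") or rhs == ":::":
            raise ValueError(f"I/O rule not supported: {line!r}")
        # "lhs::=rhs" becomes ":lhs" on one line and "=rhs" on the next.
        out += [":" + escape(lhs), "=" + escape(rhs)]
    raise ValueError("missing '::=' terminator line")


print(thue_to_thubi("FF::=F\nFT::=T\n::=\nFFT"))
```

The translated state is still wrapped in \b and \s at run time, so nothing is output unless a rule removes \b; that matches a Thue program that performs no I/O.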