Thue

From Esolang
Jump to navigation Jump to search

Thue is an esoteric programming language invented by John Colagioia in 2000. It is a matrioshka language based on nondeterministic string rewriting which the author describes as a constraint-programming version of a Turing tarpit.

Computational model

Thue is based on a semi-Thue grammar, which is a generalized form of Thue system in which each association between a pair of strings is only one-way.

Etymology

The language is presumably named after either Norwegian mathematician Axel Thue, or the grammar which bears his name (which the language employs).

It is worth noting that while the mathematician's name is pronounced approximately "TOO-eh", the associated documentation explicitly gives the pronounciation of the name of the language as "TOO-ay".

Language description

Each Thue program consists of two parts: a list of substitution rules of the form

<original string>::=<replacement string>

which is terminated with a line having both <original string> and <replacement string> either empty, or consisting entirely of whitespace:

::=

followed by a string representing the initial program state, which may extend over multiple lines.

Execution consists of picking, from the list of rules, an arbitrary rule whose original string exists as a substring somewhere in the program state, and replacing that substring by the rule's replacement string. This process repeats until there are no rules that can be applied, at which point, the program ends.

Edge cases

The case where a substitution rule has nothing but whitespace or an empty string on its left-hand side, but something other than whitespace or an empty string on its right-hand side, is not defined. Some implementations consider this to be a syntax error, while others (including the original interpreter) consider it to terminate the list of rules, just as a rule with whitespace on both sides of the ::= would.

It is also possible (and arguably reasonable) for an implementation to interpret it as a rule which rewrites the given whitespace (or zero-length string) to the right-hand side, when applied. However, no such implementations have been brought to the attention of this article.

Input and output

Input is performed using a special rule like this:

<original string>::=:::

When triggered, instead of replacing the original string with the string ::: it will be replaced with one line from the input stream.

Similarly, output is triggered by a line containing a rule like this:

<original string>::=~<string to output>

When triggered, the original string is deleted (i.e. replaced by the null string) and the string to the right of the ~ character is sent to the output stream.

Criticisms

Laurent Vogel complains that handling of newlines is not well defined in the specifications and proposes the following convention: when a string is sent to the output stream, no newline is printed at the end, except if the string is empty, in which case a newline is all that is printed.

Ørjan Johansen complains that input is a lousy hack that has no way of handling arbitrary strings (think Wikipedia:code injection here) and no well-defined behavior on end of file. This makes an otherwise powerful language useless for implementing I/O-supporting programs properly in.

Chris Pressey complains that the spec does not agree with the reference implementation on the matter of input. The spec defines ::: and ~ as "strings" or "symbols" which "trigger implicit rules", while the implementation treats them as special rule forms. This leaves open questions such as whether a Thue program with no rules, but with initial data containing :::, is intended to replace that string with user input.

Sample programs

Here's the traditional "Hello World!" in Thue:

a::=~Hello World!
::=
a


The following Thue program performs an increment of a binary number entered as the initial state surrounded by "_" characters, in this case the number 1111111111:

1_::=1++
0_::=1

01++::=10
11++::=1++0

_0::=_
_1++::=10

::=

_1111111111_


The following sample program is to demonstrate Thue's nondeterminism (and to show an example of an infinite loop, besides). The program outputs bits in an undefined (and quite possibly random) sequence.

b::=~0
b::=~1
ac::=abc
::=
abc

The following program is a typical truth machine:

*::=:::
1::=1_
_::=~1
0::=~0
::=
*

The following program shows the fallacy of attributing to Thue the same behavior as Markov algorithms. It is supposed to convert a group of * to Roman numerals and it does work correctly as a Markov algorithm. However, due to Thue's non-determinism, it does not work on Thue (Talk:Thue):

*::=I
IIIII::=V
IIII::=IV
VV::=X
VIV::=IX
XXXXX::=L
XXXX::=XL
LL::=C
LXL::=XC
CCCCC::=D
CCCC::=CD
DD::=M
DCD::=CM
::=
******************

This program is an InterpretMe interpreter:

*::=:::
::=
*

The following program is a multiplier and multiplies 3x7. The result is on the state string.

(*::=a(
*)::=)b
(x)::=
ab::=ba*
*b::=b*
a*::=*a
{b::={
a}::=}
*}::=}*
{}::=
::=
{(***x*******)}

Computational class

By showing that there is a reduction to Thue from Type-0 languages in the Chomsky hierarchy (unrestricted grammars), which are computationally equivalent to Turing machines, we can show that Thue is Turing-complete.

Let some arbitrary set of non-zero-length potential substrings of the Thue state be called the nonterminals; the only restriction on these is that they don't show up in the final state of the program. All other substrings are terminals.

Both sides of every Thue substitution rule thus contain an arbitrary combination of terminals and nonterminals. Each Thue substitution rule is in this way equivalent to a production in an unrestricted grammar, and the Thue program is equivalent to the grammar itself.

See also

External resources