Thutu

Thutu is an esoteric programming language created by User:ais523 and released in 2007. It was based on Thue; although it was created independently of Thubi, the languages evolved along similar lines. Thutu resembles Thue and ///, but uses regular expressions rather than just strings, and has a more sophisticated flow-control system (lines execute nondeterministically in Thue and sequentially in ///, whereas in Thutu the order of flow can be controlled using command characters).

Syntax

Each line of a Thutu program must take one of the following forms:

Comments

A comment consists of nothing but spaces and tab characters, followed by the # symbol, followed by anything up to the end of the line. Comments have no meaning in Thutu and must be ignored by an interpreter (they can be used to provide metadata to a surrounding environment if that environment accepts it, such as the #! notation in Unix, but this is not an issue for the language).

Statements

All Thutu statements start with 0 or more horizontal whitespace characters (spaces and tabs). A tab is always interpreted as being exactly equivalent to 8 spaces, even though this is not the behaviour that a tab normally designates. The whitespace is followed by 0 or more regular expressions, and then either a command character or a slash-terminated replacement string at the end; each statement must contain either a command character or at least 1 regexp followed by a replacement. If there are any regular expressions, a slash is written between each pair of regexps and also at the start and end of the list of regexps, so the following forms are possible:

# Command character, no regexps
.
# Regexp and replacement
/foo/bar/
# Regexps and command character
/baz/qux/quux/!
# Multiple regexps and a replacement
/corge/grault/garply/waldo/

(Note that it is not necessary for metasyntactic variables used in Thutu examples to follow the list in RFC 3092, but this practice is recommended.)
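The four statement forms above can be told apart mechanically. The following Python sketch is illustrative only and is not taken from the reference compiler: the name parse_statement and its return shape are hypothetical. It splits a statement on unescaped slashes and treats whatever follows the final slash as the command character.

```python
def parse_statement(line):
    """Split one Thutu statement into (regexps, replacement, command).

    Hypothetical helper for illustration; not part of any Thutu tool.
    Exactly one of `replacement` and `command` is not None.
    """
    body = line.rstrip("\n").lstrip(" \t")
    if not body.startswith("/"):
        # No regexps at all: the statement is just a command character.
        return [], None, body

    parts, current = [], []
    i = 1  # skip the slash that opens the list of regexps
    while i < len(body):
        ch = body[i]
        if ch == "\\" and i + 1 < len(body):
            # A backslash protects the next character, even a slash.
            current.append(body[i:i + 2])
            i += 2
        elif ch == "/":
            parts.append("".join(current))
            current = []
            i += 1
        else:
            current.append(ch)
            i += 1

    tail = "".join(current)  # text after the final unescaped slash
    if tail:
        return parts, None, tail  # guards followed by a command character
    if len(parts) < 2:
        raise ValueError("a replacement line needs a regexp and a replacement")
    return parts[:-1], parts[-1], None  # last slash-delimited part is the replacement
```

For example, parse_statement("/baz/qux/quux/!") yields three regexps and the command character !, while parse_statement("/foo/bar/") yields one regexp and the replacement bar.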

Regexp syntax

The following syntax is used for regexps:

\ (When the next character is punctuation) Remove any special meaning the next character might have, even if it's a slash or backslash. Removing the special meaning of a slash prevents it delimiting the end of a regexp, making it possible to write slashes within a regexp.
? Match the previous group or character 1 (preferred) or 0 times.
?? Match the previous group or character 0 (preferred) or 1 times.
+ Match the previous group or character 1 or more times (preferred as many as possible).
+? Match the previous group or character 1 (preferred) or more times.
* Match the previous group or character 0 or more times (preferred as many as possible).
*? Match the previous group or character 0 (preferred) or more times.
(...) Match the regexp inside the brackets. This creates a group which can be referenced by the replacement (if any) or within the regexp, and the entire group can be modified with ?*+ and combinations.
| Match either what's to the left or to the right of the vertical bar, up to the boundaries of the containing group.
\number Match the numberth group (i.e. both groups must have the same content to match). This operation prevents the regexps used by Thutu from being strictly regular, but it can make programming easier. Note that \ followed by a letter is undefined.
[characters] Match any one of the characters inside the square brackets.
[^characters] Match any character that isn't inside the square brackets.
. Match any character.
^ Match the start of the main string.
$ Match the end of the main string.

A preferred match will always be made in preference to a less preferred match if the repeated/optional character would be in the same place in the string; however, (ab)* might match either the first or second set of a's and b's in "ababcdefabababab", because they are in different places in the string. (It couldn't match nothing or just the first ab, for instance, whereas (ab)+? would match just one occurrence of ab.)
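The preference rules can be tried out with Python's re module, whose quantifiers use the same notation. This is only an analogy, on the assumption that Python's engine behaves like Thutu's for these operators; in particular, re.match always anchors at the start of the string, whereas Thutu leaves the position of the match open.

```python
import re

s = "ababcd"

# Greedy forms prefer as many repetitions as possible at a given position.
greedy_plus = re.match(r"(ab)+", s).group(0)
greedy_opt = re.match(r"(ab)?", s).group(0)

# Forms with a trailing ? prefer as few repetitions as possible.
lazy_plus = re.match(r"(ab)+?", s).group(0)
lazy_star = re.match(r"(ab)*?", s).group(0)
lazy_opt = re.match(r"(ab)??", s).group(0)

# \1 must repeat exactly what group 1 matched.
backref = re.search(r"(a|b)\1", "abba").group(0)

print(greedy_plus, greedy_opt, lazy_plus, repr(lazy_star), repr(lazy_opt), backref)
# prints: abab ab ab '' '' bb
```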

Replacements are just strings, except that \ can be used to remove the special meaning from punctuation characters, and $number represents whatever was in the numberth group from the last regexp before the replacement.

The role of whitespace

Whitespace is significant in Thutu. Each of the command characters @^!* creates a block of code, which lasts until the next line with the same amount of whitespace at its start. The block includes neither the line ending with the command character (the block marker line) nor the line with the same indentation that ends it. Blocks can be nested by using multiple different indentation levels. All lines within the block must have more whitespace than the block marker line, and there must be some line with the same whitespace as the block marker line to end the block; the nop command (.) is useful for this purpose, as well as for adding vertical space to make programs more readable. The entire program is also considered to be a block, with an imaginary block marker line that consists of nothing but @ and has a negative amount of whitespace at its start. Also, the first line of the program must not start with whitespace, and the amount of whitespace at the start of a line can't be increased or decreased except to delimit a block. (Therefore, the language itself enforces good indentation practices on programmers.)
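As a rough illustration of these rules, the Python sketch below measures indentation (counting each tab as exactly 8 spaces) and finds the line that ends a block. The function names indent_width, is_comment and block_end are hypothetical, and the sketch checks only the rules stated above, not every constraint a real implementation might enforce.

```python
def indent_width(line):
    """Leading whitespace of a line, with a tab counted as exactly 8 spaces."""
    width = 0
    for ch in line:
        if ch == " ":
            width += 1
        elif ch == "\t":
            width += 8
        else:
            break
    return width


def is_comment(line):
    """A comment is optional spaces/tabs followed by #."""
    return line.lstrip(" \t").startswith("#")


def block_end(lines, marker):
    """Index of the line that ends the block opened by lines[marker]."""
    base = indent_width(lines[marker])
    for i in range(marker + 1, len(lines)):
        if is_comment(lines[i]):
            continue  # comments are ignored entirely
        width = indent_width(lines[i])
        if width == base:
            return i
        if width < base:
            raise ValueError("line %d is indented less than its block marker" % i)
    raise ValueError("no line with matching indentation ends the block")


program = [
    "/foo/*",
    "    /foo/bar/",
    "    # a comment",
    "    /bar/@",
    "        /bar/baz/",
    "    .",
    ".",
]
print(block_end(program, 0), block_end(program, 3))  # prints: 6 5
```

In this example the two . lines act purely as dedentation markers: the first closes the inner @ block and the second closes the outer * block.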

Execution

The entire state of a Thutu program is contained in the instruction pointer, and one string (the main string) of unlimited length. At the start of the program, the main string is "=1" (the quotes are not part of the initial string). After the imaginary @-block in which the program is contained exits, the string is examined. If it contains "=x" anywhere, everything before the =x is unescaped and printed to standard output, and the =x and everything before it is removed from the string. If the string contains "=9" anywhere, the program exits. (If it contains both =9 and =x, the string is printed and then the program ends if the =x comes first, but if the =9 comes first, the behaviour is undefined.) If the string didn't contain =9, one line of input, minus the terminating newline, is read, and that string is escaped, "=x" is appended to it, and then inserted at the start of the main string. Should the input be at EOF, instead "=9" is inserted at the start of the string. Then after the main string is modified in this way, the @-block at the start of the program is restarted and the program can act on the new input. This method is the only way to do I/O in Thutu.
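The I/O cycle described above can be sketched in Python as a single step that runs each time the program's outer block exits. The names io_step, read_line, escape and unescape are hypothetical. The sketch assumes that the first occurrence of =x is the one that counts, that =9 is checked after any output has been removed, and that read_line returns None at EOF; the text above does not settle the first two points.

```python
def io_step(main, read_line, escape, unescape):
    """One pass of the I/O cycle run after the outer block exits.

    Returns (new_main, output, finished). `output` is None if nothing
    was printed. Hypothetical sketch, not the reference implementation.
    """
    output = None
    pos = main.find("=x")  # assumption: the first =x is the one used
    if pos != -1:
        output = unescape(main[:pos])
        main = main[pos + 2:]  # drop the =x and everything before it

    if "=9" in main:
        return main, output, True  # the program exits

    line = read_line()  # assumption: None signals EOF
    if line is None:
        main = "=9" + main
    else:
        main = escape(line) + "=x" + main
    return main, output, False


def identity(text):
    # Stand-in for real escaping, so the example stays self-contained.
    return text


step1 = io_step("=1", lambda: "hi", identity, identity)
step2 = io_step(step1[0], lambda: None, identity, identity)
step3 = io_step(step2[0], lambda: None, identity, identity)
print(step1, step2, step3)
```

Under these assumptions a program that never changes the main string copies each input line to output: the first step turns "=1" into "hi=x=1", the second prints hi and inserts =9 at EOF, and the third sees =9 and stops.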

Escaping / Unescaping

To make it easier to read and/or output special characters, and to let the program use working strings that it knows cannot appear in the input, the input is escaped and the output is unescaped using = characters.

Unescaped Escaped
Tab =t
Newline =n
Carriage return =r
Form feed =f
Alert character (the one that goes beep when printed) =a
Escape character =e
= =q
Any other punctuation character = followed by that character
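The table can be expressed as a pair of Python functions. This is a sketch under two assumptions that the table does not settle: "punctuation" is taken to mean the ASCII punctuation in Python's string.punctuation, and an = followed by a letter or digit that is not in the table (such as =x or =9) is left unchanged when unescaping.

```python
import string

# Escapes listed by name in the table above.
NAMED = {
    "\t": "=t",
    "\n": "=n",
    "\r": "=r",
    "\f": "=f",
    "\a": "=a",
    "\x1b": "=e",
    "=": "=q",
}
NAMED_REVERSE = {code[1]: char for char, code in NAMED.items()}


def escape(text):
    out = []
    for ch in text:
        if ch in NAMED:
            out.append(NAMED[ch])
        elif ch in string.punctuation:  # assumption: ASCII punctuation only
            out.append("=" + ch)
        else:
            out.append(ch)
    return "".join(out)


def unescape(text):
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == "=" and i + 1 < len(text):
            nxt = text[i + 1]
            if nxt in NAMED_REVERSE:
                out.append(NAMED_REVERSE[nxt])
            elif nxt in string.punctuation:
                out.append(nxt)
            else:
                out.append(ch + nxt)  # assumption: unknown codes pass through
            i += 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)


print(escape("a=b, c!"))  # prints: a=qb=, c=!
```

Because every = and every other punctuation character in the input gains an = prefix, a sequence such as =x or =9 can never arise from escaped input, which is what makes those sequences safe to use as markers.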

Commands

All Thutu commands are guardable except ., the nop statement (which can never have regexps before it); the regexps at the start of each command (apart from the last regexp and replacement string if there isn't a command character) are guards which control when the statement has any effect. For most command characters, if any of the guards don't match, the command isn't run and any block it might have is skipped, although some act differently.

No command character (regexps and replacement)

A line of the form /guard regexp/guard regexp/guard regexp/regexp/replacement/ (with no command character) is a replacement line; such lines make up much of most Thutu programs. If any of the guards or the regexp don't match the main string anywhere, the replacement line has no effect. Otherwise, one match of the last regexp is removed from the main string and replaced by the replacement given. Any $number codes in the replacement insert the contents of the numberth group of the last regexp, as matched in the text that was removed; for instance, /(.)/$1$1/ will double some character in the main string, and /(.*)/$1$1/ will double the entire main string. Then, program flow returns to either the block marker line for the current block, or the start of the current block (depending on what the block marker line is).
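The effect of a replacement line on the main string can be approximated in Python. This sketch makes several assumptions: Python's re stands in for Thutu's regexp engine, the leftmost match is the one replaced (the description says only "one match"), and a backslash in the replacement makes the next character literal whatever it is. The names expand and apply_replacement are hypothetical, and the flow-control part of the statement is not modelled.

```python
import re


def expand(replacement, match):
    """Build the replacement text, handling \\-escapes and $number codes."""
    out = []
    i = 0
    while i < len(replacement):
        ch = replacement[i]
        if ch == "\\" and i + 1 < len(replacement):
            out.append(replacement[i + 1])  # next character taken literally
            i += 2
        elif ch == "$" and i + 1 < len(replacement) and replacement[i + 1].isdigit():
            j = i + 1
            while j < len(replacement) and replacement[j].isdigit():
                j += 1
            # A group that took no part in the match contributes nothing.
            out.append(match.group(int(replacement[i + 1:j])) or "")
            i = j
        else:
            out.append(ch)
            i += 1
    return "".join(out)


def apply_replacement(main, guards, pattern, replacement):
    """Return (new_main, changed) for one replacement line."""
    if not all(re.search(guard, main) for guard in guards):
        return main, False  # some guard failed: no effect
    match = re.search(pattern, main)  # assumption: leftmost match is used
    if match is None:
        return main, False
    new_main = main[:match.start()] + expand(replacement, match) + main[match.end():]
    return new_main, True


print(apply_replacement("abc", [], r"(.)", "$1$1"))   # prints: ('aabc', True)
print(apply_replacement("abc", [], r"(.*)", "$1$1"))  # prints: ('abcabc', True)
```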

. (nop)

The . statement has no effect. It's useful to make lines that you'd otherwise leave blank for style reasons syntactically legal, and to close blocks when several blocks in a row need to be closed (as its indentation still matters); when used to close blocks, it's sometimes called a dedentation marker.

< (reloop) and > (deloop)

If all the guards are met, < transfers program flow to the block marker line for or start of the current block (depending on the block marker line itself). Likewise, if all the guards are met, > transfers program flow to just after the end of the current block. Neither command does anything if any guards aren't met.

Block marker commands: @^!*

There are four similar commands which are used only at the end of block marker lines.

  • * enters the block if all the guards are met, and < and replacements transfer flow back to the block marker line.
  • @ enters the block if all the guards are met, and < and replacements transfer flow back to the first line in the block.
  • ! enters the block if none of the guards are met, and < and replacements transfer flow back to the block marker line.
  • ^ enters the block if none of the guards are met, and < and replacements transfer flow back to the first line in the block.

In all cases, if the end of the block is reached, program flow continues to the next line of source code rather than continuing within the block. When unguarded, all four of these command characters have the same effect (that of looping until no replacements in the block they mark match).
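The four block marker commands differ along two axes, which the following Python sketch tabulates. It uses Python's re as a stand-in for Thutu regexps, and the names BLOCK_MARKERS, enters_block and loop_target are hypothetical. With an empty guard list, "all guards met" and "no guards met" are both vacuously true, which is why the unguarded forms always enter the block.

```python
import re

# command -> (enter when guards match?, where < and replacements jump to)
BLOCK_MARKERS = {
    "*": (True, "marker line"),
    "@": (True, "first line of block"),
    "!": (False, "marker line"),
    "^": (False, "first line of block"),
}


def enters_block(command, guards, main):
    """Decide whether a block marker line lets control into its block."""
    wants_match, _ = BLOCK_MARKERS[command]
    hits = [re.search(guard, main) is not None for guard in guards]
    if wants_match:
        return all(hits)   # * and @: every guard must match
    return not any(hits)   # ! and ^: no guard may match


def loop_target(command):
    """Where < and replacement lines inside the block send control."""
    return BLOCK_MARKERS[command][1]


print(enters_block("*", ["a", "b"], "ab"), enters_block("!", ["a", "z"], "ab"))
# prints: True False
```

Returning to the marker line (* and !) means the guards are tested again on every iteration, whereas returning to the first line of the block (@ and ^) tests them only on entry.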

Computational class

An Underload interpreter (that messes up quoting using ", which is not required for Turing-completeness) has been written in Thutu (see the External Resources), proving Turing-completeness. The fact that a Thutu to Perl compiler exists demonstrates that Thutu is not uncomputable.

External resources

  • Thutu at The Esoteric File Archive (includes a reference compiler written in Perl, a slightly buggy interactive Underload interpreter written in Thutu as a Turing-completeness demonstration, and example programs)
  • The reference compiler by ais523 for Thutu2, a wimpmode of Thutu with built-in arithmetic and string functions