2C

From Esolang
Jump to navigation Jump to search

2C is an esoteric programming language created by User:ais523 in 2018. It is something of a cross between Thue, a cellular automaton, and a tag system.

Semantics

The state of a running 2C program is a string of characters (other than newlines and / characters), initially 1 (i.e. a single 1 character). The string is considered to implicitly have an infinite number of 0 characters prepended to it (thus, 00045 is considered to be a substring of 456).

The program itself specifies a number of search/replace string pairs (rules), consisting of two strings; however, in each search/replace pair, the search string and its replacement must be the same length, and identical apart from the last character. (So for example, "abc"→"abd" would be a valid replacement, but "axc"→"abc" would not be.) Additionally, a search string may not be a (contiguous) substring of another search string, and may not consist entirely of 0 (except in the degenerate case where the replacement equals the search string).

To run the program, the implementation identifies all occurrences of search strings in the state string, and simultaneously replaces them all with the corresponding replacement string (as only one character is actually replaced in each case, and it's a different character for each such replacement, it's possible to apply all the replacements simultaneously without any ambiguity or confusion). Then it appends a 0 character to the end of the state string, and repeats this process indefinitely or until the state string contains a $ character.

If at any point there is exactly one $ character in the state string, the program halts. If there is ever more than one $ character in the state string, this is undefined behaviour (although halting the program would be a reasonable response).

This specification does not define which character set the characters belong to (although ASCII and Unicode would be good choices).

Syntax

A 2C program is fully defined by the list of search/replace pairs. These are given by writing the search string, optionally a / character (for clarity), the last character of the replacement string, and a newline (concatenating all the pairs in question).

Variants

Consistent 2C

In Consistent 2C, all search strings in the program are the same length. A 2C program can be trivially converted to Consistent 2C via padding the search strings out to the length of the longest by enumerating all possible prefixes. Programs can be made even more consistent via explicitly listing all possible strings of the given length as search strings (with the replacement string equalling the search string).

Ignorant 2C

Instead of abcd/x meaning "replace d with x when it's preceded with abc", an alternative view is to replace the character after the search string instead; thus abcd/x would mean "replace any character with x if preceded by abcd". In this version of the language, known as Ignorant 2C, two 0 characters are appended every cycle, rather than one.

In the case of consistent programs that list all possible search strings, Ignorant 2C is exactly equivalent to the original language (the state string simply shifts one step to the right every cycle, with a leading 0 inserted, but it has implicitly many leading 0s anyway).

01-2C

01-2C is a variant of (full) 2C in which the only characters allowed in strings are 0 and 1 (the minimum possible alphabet).

Couplet 2C

Couplet 2C is a variant of Consistent 2C in which every string has length 2.

Computational class

Unlimited alphabet and string length

With an unlimited alphabet of characters, 2C can implement a 1-dimensional cellular automaton in which the state of each cell on cycle n+1 is a function of its state, and those of its two neighbours, on cycle n; and for which the (infinite) initial tape of cells has only a single cell as nonzero, and zero cells with zero neighbours stay zero; but which can have arbitrarily many states for the cells. (Having arbitrarily many states means that it's possible to simulate finitely-many-nonzero or periodic-arbitrary-periodic initial conditions via causing the initial cell to expand into a pair of state machines, which proceed to simulate each end of the tape; when moving at the speed of light through an arbitrary-then-periodic state, the cells you encounter will provably eventually enter a cycle, and thus have a finite amount of state which can be simulated using a finite state machine. This means that it's possible to simulate, e.g., the Turing-completeness construction for Rule 110, making 2C Turing-complete. It is likely more efficient to implement something like cyclic tag directly, though.)

In order to do this, each 2C character is used to represent a pair of consecutive cellular automaton states, in an overlapping way. (For example, if the cellular automaton state is …000123000…, the corresponding 2C state would be ⟨01⟩⟨12⟩⟨23⟩⟨30⟩, where xy means "the character representing the pair consisting of x and y".) ⟨00⟩ is represented as 0; 1 and 2 are not used to represent any pair (because they're used in the startup code).

To start up the simulation, the following rules are used (where i is the nonzero initial symbol of the cellular automaton):

01/2
02/⟨0i⟩
20/⟨i0⟩

For each triple of states wx⟩⟨xy⟩⟨yz, for which the cellular automaton has rules wxya and xyzb, the following rule is used in the 2C program:

wx⟩⟨xy⟩⟨yz⟩/⟨ab

The 2C program now simulates the cellular automaton program, with its state shifting one step to the right each cycle (compared to the cellular automaton). It's also possible to emulate the halting behaviour of the cellular automaton, if it has a halt state (and never generates two halt cells at the same time), via using $ as the representation of every pair whose second element is a halt state (but not pairs whose first element is a halt state).

Because full 2C programs can be trivially compiled to Consistent 2C and thus Ignorant 2C, Ignorant 2C is Turing-complete as well.

As a side note, a considerably simpler compilation method is available when all the cellular automaton's rules of the form x00 produce 0: a rule xyzb can be compiled into xyz/b, using 1 as the initial symbol and 0 as the infinite whitespace surrounding it. For example, a Rule 110 interpreter starting from the …0001000… initial condition (note: this does not prove Turing-completeness by itself because this initial condition is not known to be Turing-complete):

000/0
001/1
010/1
011/1
100/0
101/1
110/1
111/0

Unlimited alphabet, limited string length

The technique of representing two consecutive characters x and y as a single character xy can also be used to compile Consistent 2C into itself. For example, if rules wxy/a and xyz/b exist in the original program, the result of the compilation would have a rule wx⟩⟨yz⟩/⟨ab. (When using this technique, any character representing two consecutive characters, one of which is a $, is represented as $, regardless of whether it's the first or second character.) This technique compiles a Consistent 2C program with string length 2n-1 into a Consistent 2C program with string length n (and squares the size of the character set).

If this technique is used repeatedly, it can therefore convert any Consistent 2C program into an equivalent Couplet 2C program, and thus Couplet 2C is Turing-complete.

Limited alphabet, unlimited string length

01-2C is also Turing-complete, which can be demonstrated via compiling Ignorant 2C into it. This means that any alphabet (that contains 0 and 1, which are mandatory for the padding and initial string respectively), even a limited one, is sufficient to make 2C Turing-complete.

Choose a as the smallest integer greater than or equal to 10 such that 2a is greater than the number of distinct characters used in the Ignorant 2C program. Assign each of the characters that are used a number, with 1 being assigned 0, and larger integers being used for the other characters (a different integer for each character), including 0. In the 01-2C program, we represent a character c from the 2C program as the following string:

  • 1; the cycle number (0-based) mod 2a+3, in a+3 bits of little-endian binary; 1 (unless c=0, in which case this can legally be either 0 or 1); c, in a bits of little-endian binary; 2a-5 copies of 0.

The character 0 may alternatively be represented using 4a 0 bits. The initial state of the 01-2C program thus represents the string 1 in the Ignorant 2C program, as required.

At all points during the execution of the program, the only places at which a 1 is preceded by at least a+5 0s is at the start of a character, and non-0 characters always start with a 1 preceded by at least a+5 0s (as a+5 ≤ 2a-5). As such, because every search sequence in the Ignorant 2C program contains a non-0 character, its translation will include this "start of character" sequence somewhere, and thus match only at character boundaries; it's therefore possible to translate an Ignorant 2C rule into a set of 01-2C rules, one for each of the 4a positions within each character (except for positions within the counter and the 0 padding at the end), that implement it directly. (In the case where the search string includes a 0 or 1, a separate rule will need to be generated for each of the two encodings of the 0 or 1 character respectively.) The rules are written so as to only match when the cycle number is −1 mod 2a+3; this means that 8a 0 (i.e. two Ignorant 2C 0 characters) are appended every Ignorant 2C cycle, as required. (Note that there will always be a counter visible somewhere to the left to match against, except in the degenerate case where the search string is all-0s, the replacement is 0, and the character being replaced is 0 using the all-zeroes representation, in which the replacement is degenerate and thus can safely be done every cycle.)

The only remaining thing to do is to update the counters, but a cell within a counter is always aware that it's within a counter (because it can see the 1 preceded by a+5 zeroes closely to its left), and all it needs to do to update itself is to toggle if all cells to its left, as far as the start of the character, are 1 cells. (This is the reason that the counters use little-endian notation; it's to make updates possible even though only a partial counter may be visible.) Thus, counter-updating rules can be written to work correctly regardless of the surrounding state (and when a 0 becomes nonzero, the counter will already have the correct value as the cycle number after the replacement will be 0 mod 2a+3, thus even if the all-bits-zero representation of the 0 was in use the non-replaced zero bits will happen to form a correct representation of a counter).

See also