2C
2C is an esoteric programming language created by User:ais523 in 2018. It is something of a cross between Thue, a cellular automaton, and a tag system.
Contents
Semantics
The state of a running 2C program is a string of characters (other than newlines and /
characters), initially 1
(i.e. a single 1
character). The string is considered to implicitly have an infinite number of 0
characters prepended to it (thus, 00045
is considered to be a substring of 456
).
The program itself specifies a number of search/replace string pairs (rules), consisting of two strings; however, in each search/replace pair, the search string and its replacement must be the same length, and identical apart from the last character. (So for example, "abc"→"abd" would be a valid replacement, but "axc"→"abc" would not be.) Additionally, a search string may not be a (contiguous) substring of another search string, and may not consist entirely of 0
(except in the degenerate case where the replacement equals the search string).
To run the program, the implementation identifies all occurrences of search strings in the state string, and simultaneously replaces them all with the corresponding replacement string (as only one character is actually replaced in each case, and it's a different character for each such replacement, it's possible to apply all the replacements simultaneously without any ambiguity or confusion). Then it appends a 0
character to the end of the state string, and repeats this process indefinitely or until the state string contains a $
character.
If at any point there is exactly one $
character in the state string, the program halts. If there is ever more than one $
character in the state string, this is undefined behaviour (although halting the program would be a reasonable response).
This specification does not define which character set the characters belong to (although ASCII and Unicode would be good choices).
Syntax
A 2C program is fully defined by the list of search/replace pairs. These are given by writing the search string, optionally a /
character (for clarity), the last character of the replacement string, and a newline (concatenating all the pairs in question).
Variants
Consistent 2C
In Consistent 2C, all search strings in the program are the same length. A 2C program can be trivially converted to Consistent 2C via padding the search strings out to the length of the longest by enumerating all possible prefixes. Programs can be made even more consistent via explicitly listing all possible strings of the given length as search strings (with the replacement string equalling the search string).
Ignorant 2C
Instead of abcd/x
meaning "replace d
with x
when it's preceded with abc
", an alternative view is to replace the character after the search string instead; thus abcd/x
would mean "replace any character with x
if preceded by abcd
". In this version of the language, known as Ignorant 2C, two 0
characters are appended every cycle, rather than one.
In the case of consistent programs that list all possible search strings, Ignorant 2C is exactly equivalent to the original language (the state string simply shifts one step to the right every cycle, with a leading 0
inserted, but it has implicitly many leading 0
s anyway).
012C
012C is a variant of (full) 2C in which the only characters allowed in strings are 0
and 1
(the minimum possible alphabet).
Computational class
Unlimited alphabets
With an unlimited alphabet of characters, 2C can implement a 1dimensional cellular automaton in which the state of each cell on cycle n+1 is a function of its state, and those of its two neighbours, on cycle n; and for which the (infinite) initial tape of cells has only a single cell as nonzero, and zero cells with zero neighbours stay zero; but which can have arbitrarily many states for the cells. (Having arbitrarily many states means that it's possible to simulate finitelymanynonzero or periodicarbitraryperiodic initial conditions via causing the initial cell to expand into a pair of state machines, which proceed to simulate each end of the tape; when moving at the speed of light through an arbitrarythenperiodic state, the cells you encounter will provably eventually enter a cycle, and thus have a finite amount of state which can be simulated using a finite state machine. This means that it's possible to simulate, e.g., the Turingcompleteness construction for Rule 110, making 2C Turingcomplete. It is likely more efficient to implement something like cyclic tag directly, though.)
In order to do this, each 2C character is used to represent a pair of consecutive cellular automaton states, in an overlapping way. (For example, if the cellular automaton state is …000123000…
, the corresponding 2C state would be ⟨01⟩⟨12⟩⟨23⟩⟨30⟩
, where ⟨xy⟩
means "the character representing the pair consisting of x and y".) ⟨00⟩
is represented as 0
; 1
and 2
are not used to represent any pair (because they're used in the startup code).
To start up the simulation, the following rules are used (where i
is the nonzero initial symbol of the cellular automaton):
01/2 02/⟨0i⟩ 20/⟨i0⟩
For each triple of states ⟨wx⟩⟨xy⟩⟨yz⟩
, for which the cellular automaton has rules wxy→a
and xyz→b
, the following rule is used in the 2C program:
⟨wx⟩⟨xy⟩⟨yz⟩/⟨ab⟩
The 2C program now simulates the cellular automaton program, with its state shifting one step to the right each cycle (compared to the cellular automaton). It's also possible to emulate the halting behaviour of the cellular automaton, if it has a halt state (and never generates two halt cells at the same time), via using $
as the representation of every pair whose second element is a halt state (but not pairs whose first element is a halt state).
Because full 2C programs can be trivially compiled to Consistent 2C and thus Ignorant 2C, Ignorant 2C is Turingcomplete as well.
Limited alphabet
012C is also Turingcomplete, which can be demonstrated via compiling Ignorant 2C into it. This means that any alphabet (that contains 0
and 1
, which are mandatory for the padding and initial string respectively), even a limited one, is sufficient to make 2C Turingcomplete.
Choose a as the smallest integer greater than equal to 10 such that 2^{a} is greater than the number of distinct characters used in the Ignorant 2C program. Assign each of the characters that are used a number, with 1
being assigned 1, and larger integers being used for the other characters (a different integer for each character), including 0
. In the 012C program, we represent a character c from the 2C program as the following string:

1
; the cycle number (0based) mod 2^{a+3}, in a+3 bits of littleendian binary;1
(unless c=1, in which case this can legally be either0
or1
); c, in a bits of littleendian binary; 2a5 copies of0
.
The character 0
may alternatively be represented using 4a 0
bits. The initial state of the 012C program thus represents the string 1
in the Ignorant 2C program, as required.
At all points during the execution of the program, the only places at which a 1
is preceded by at least a+5 0
s is at the start of a character, and non0
characters always start with a 1
preceded by at least a+5 0
s (as a+5 ≤ 2a5). As such, because every search sequence in the Ignorant 2C program contains a non0
character, its translation will include this "start of character" sequence somewhere, and thus match only at character boundaries; it's therefore possible to translate an Ignorant 2C rule into a set of 012C rules, one for each of the 4a positions within each character (except for positions within the counter and the 0
padding at the end), that implement it directly. (In the case where the search string includes a 0
or 1
, a separate rule will need to be generated for each of the two encodings of the 0
or 1
character respectively.) The rules are written so as to only match when the cycle number is −1 mod 2^{a+3}; this means that 8a 0
(i.e. two Ignorant 2C 0
characters) are appended every Ignorant 2C cycle, as required. (Note that there will always be a counter visible somewhere to the left to match against, except in the degenerate case where the search string is all0
s, the replacement is 0
, and the character being replaced is 0
using the allzeroes representation, in which the replacement is degenerate and thus can safely be done every cycle.)
The only remaining thing to do is to update the counters, but a cell within a counter is always aware that it's within a counter (because it can see the 1
preceded by a+5 zeroes closely to its left), and all it needs to do to update itself is to toggle if all cells to its left, as far as the start of the character, are 1
cells. (This is the reason that the counters use littleendian notation; it's to make updates possible even though only a partial counter may be visible.) Thus, counterupdating rules can be written to work correctly regardless of the surrounding state (and when a 0
becomes nonzero, the counter will already have the correct value as the cycle number after the replacement will be 0 mod 2^{a+3}, thus even if the allbitszero representation of the 0
was in use the nonreplaced zero bits will happen to form a correct representation of a counter).
See also
 This language was originally created for use in a Turingcompleteness proof of An Odd Rewriting System.