Transceternal
Paradigm(s) | Graph-rewriting paradigm |
---|---|
Designed by | User:Hakerh400 |
Appeared in | 2019 |
Computational class | Turing complete |
Major implementations | Interpreter |
File extension(s) | .txt |
Transceternal is very similar to Realm, but while Realm stores some data outside the graph (source code, instruction pointer, nested level, etc), this programming language stores everything in the memory.
Memory
Memory representation is the same as in Realm. We suggest reading it first, because in this article we do not explain things that are explained there.
Syntax
All strings are valid Transceternal source code. The first thing that parser does is tokenizing the source code. There are three cases:
- If the source code is empty, or consists only of whitespace characters, the list of tokens contains three identical tokens (irrelevant which).
- If the source code does not contain any whitespace characters, the list of tokens contains all non-whitespace characters in the same order that appear in the source code.
- If the source code contains at least one whitespace character, the list of tokens contains all continuous sequences of non-whitespace characters in the same order that appear in the source code.
For example, empty program is tokenized to ["0", "0", "0"]
(0
can be replaced by any other sequence of non-whitespace characters). Program abc
is tokenized to ["a", "b", "c"]
. Program abc defgh xyz
is tokenized to ["abc", "defgh", "xyz"]
.
Initial memory construction
After converting the source code to a list of tokens, we create a new node for each token. Then, iterate over tokens and for each token get the corresponding node and assign the next nodes (represented by the next tokens) to the 0-pointer and 1-pointer of the current node. We do it in the depth-first manner.
Here is an example. Given the following source code:
012032004051025
Each character is a token (because there is no whitespace). We start with the first token (digit 0
) and it represents the root node. We add it to the stack (underscores mean that pointers are missing):
Stack: 0: _ _
On the stack we have a single node and it corresponds to the token 0
. It is missing two pointers. Now, for each token, we add it to the first missing pointer of the node at the top of the stack, then we remove that node if it has two pointers, and if we have not seen the new token before, add a new entry to the top of the stack. In this example, The next token is 1
, we add it to the first missing pointer of the root node, we do not remove the root node (bacause it has still 1 missing pointer) and we add 1
to the stack, because it is a new token:
Stack: 0: 1 _ 1: _ _
We repeat the process until we consume all tokens. At the end, the stack will be empty and we will have the following graph (node tokens on the left and pointers on the right):
0: 1 4 1: 2 0 2: 0 3 3: 2 0 4: 0 5 5: 1 0
This is our initial memory. We still have two tokens left: 2
and 5
, but since we have already formed the graph, we ignore them. On the other hand, if we had missing some tokens to fill all pointers, for each missing pointer we assign the node itself (so that each missing pointer points to the node that contains the pointer).
Initializing input string
After constructing a graph from the list of tokens, the graph is not yet ready for computation. We need to inject the input string first, since all information must be stored in the graph (the graph uniquely represents execution state). To do that, we convert the given input string, represented as an array of bytes, to an array of bits (8 bits for each byte, bits are from lowest to highest). For example, string abc
is converted to the following array of bits:
100001100100011011000110
Then we convert the array of bits to a graph in the following way. For each bit we create a new node. 1-pointer of the node points to the next node (corresponding to the next bit in the array) and the 1-pointer of the last node points to B0
node (we will explain later what that node means). 0-pointer of each node points to B0
node or B1
node if the corresponding bit is 0
or 1
, respectively (we will also explain what B1
node is).
The reverse of this algorithm (for converting a graph to an array of bits) we use later in program execution. After constructing a graph that corresponds to the input string, we create a new node and assign the old root node (from the step Initial memory construction) to the 0-pointer and we assign the input string graph to the 1-pointer. Then we make this new node the root node.
Node B0
is the node at address 000
(see addressing in Realm) and node B1
is the node at address 001
after step Initializing input string. Or, in other words, nodes at addresses 00
and 01
, respectively, before step Initializing input string.
The graph is now ready and the computation can begin.
Parsing address from the given node
We do not apply this algorithm immediately, but we explain how it works, because it is used later in the computation. Given a node in the graph, we parse an array of bits from it in the following way.
Let the array be empty. Repeat the following until the current node becomes the node at address 000
, or if we reached the same node two times while parsing. In the loop, if the 0-pointer of the current node points to the node at address 000
, then add bit 0
to the sequence, otherwise add bit 1
to the sequence. Update the current node to point to the 1-pointer of the previous 0-node. After the loop ends, return the array of bits.
Note that this algorithm is somewhat a reverse of the algorithm for converting an array of bits to a graph. If we apply this algorithm to the graph that we obtained from the input array of bits, we will get the input array of bits back (this does not work always - for example if nodes at addresses 000
and 001
are equal, we will get all zeros in the sequence).
Computation
This is the main loop. If the node at address 01
is equal to the node at address 000
, then terminate the program. Otherwise, do the following cases (we check cases in order, if a case matches, we skip other cases in that iteration).
Case 1
If the node at address 0100
is equal to the node at address 000
, then let A1
be the parsed address at node 01010
and A2
be the parsed address at node 01011
, then assign the node at address A2 to the address A1 and after that assign the node at address 011
to the address 01
.
Case 2
If the node at address 0100
is equal to the node at address 001
, then let A1
be the parsed address at node 01010
, A2
be the parsed address at node 010110
and A3
be the parsed address at node 010111
, then create a new node that points to nodes from addresses A2 and A3 (0-pointer and 1-pointer, respectively) and assign it to the address A1. Like in the previous case, after that assign the node at address 011
to the address 01
.
Case 3
If no other cases can be applied, let A1
be the parsed address at node 010100
and A2
be the parsed address at node 010101
. If the node at address A1 is equal to the node at address A2, then assign the node at address 01011
to the address 01
, otherwise assign the node at address 011
to the address 01
.
Obtaining the output
After the main loop ends, we obtain the output array of bits by parsing the address from the node at address 1
, then we convert it to array of bytes (append zeros if necessary to form a byte) and it represents the output string.
Similarities to Realm
Case 1 corresponds to the assignment in Realm, Case 2 corresponds to allocating a new node and Case 3 corresponds to loops. Any Realm program can be converted to a Transceternal program.
Examples
Empty program
If the source code is empty, or consists only of whitespace characters, the list of tokens will contain three identical tokens (we say tokens are a
for simplicity). Then we construct our initial graph. It is a graph that contains a single node whose both pointers point to itself:
Initial graph: a: a a
Then we inject the input and switch roots:
Graph ready for computation: root: a input a: a a input: ...
where input
is the root node of the graph obtained by converting the input string of bits to a graph. Now we start the main loop. It says that if the node at address 01
is eqaul to the node at address 000
, then break the main loop. The node at address 01
is a
and the node at address 000
is also a
, so the main loop has zero iterations. Now we need to extract the output. We parse the sequence of bits from the node at address 1
(which is node input
). Since nodes at addresses 000
and 001
are identical (it is node a
), all bits in the output are zeros, but the number of bits do not change.
Therefore, the empty program converts all ones from the input string of bits to zeros. In other words, it outputs the NULL
character (character with char code 00
) repeated the number of times equal to the length of the input string.
Cat
catacat
First, we construct the graph:
Graph: root: c input c: a t a: t a t: a c input: ...
The main loop starts. The node at address 01
is t
and the node at address 000
is also t
, so the main loop ends without doing any iterations. Then we parse the output string of bits. The node at address 000
is t
and the node at address 001
is a
, so the output string is identical to the input string.
Output a digit
This program ignores the input and outputs ASCII digit 3
0122233445262778889A2B9C2A2
The main loop has a single iteration and it is the Case 1.
Hello, World!
Ignores the input and outputs the Hello, World!
string:
00 01 02 02 02 03 02 04 02 05 02 06 00 07 02 08 02 09 00 0A 02 0B 00 0C 02 0D 00 0E 02 0F 02 10 00 11 00 12 02 13 02 14 02 15 00 16 00 17 02 18 00 19 00 1A 02 1B 02 1C 02 1D 00 1E 00 1F 02 20 00 21 00 22 02 23 00 24 00 25 00 26 00 27 02 28 00 29 00 2A 02 2B 02 2C 02 2D 00 2E 00 2F 02 30 00 31 02 32 02 33 02 34 02 35 02 36 02 37 02 38 00 39 02 3A 02 3B 00 3C 00 3D 00 3E 02 3F 00 40 02 41 00 42 02 43 00 44 00 45 00 46 00 47 02 48 00 49 00 4A 02 4B 02 4C 00 4D 02 4E 02 4F 00 50 00 51 00 52 02 53 02 54 02 55 00 56 00 57 02 58 00 59 00 5A 02 5B 02 5C 02 5D 00 5E 02 5F 02 60 00 61 00 62 02 63 00 64 02 65 02 66 02 67 02 68 00 69 02 6A 02 02 6B 6C 02 6D 6B 6E 02 6C 02
Note: this is not the hexadecimal representation of the source code, but the literal text. We could write it in a lot of different ways.