# Transceternal

Paradigm(s) Graph-rewriting paradigm User:Hakerh400 2019 Turing complete Interpreter `.txt`

Transceternal is very similar to Realm, but while Realm stores some data outside the graph (source code, instruction pointer, nested level, etc), this programming language stores everything in the memory.

## Memory

Memory representation is the same as in Realm. We suggest reading it first, because in this article we do not explain things that are explained there.

## Syntax

All strings are valid Transceternal source code. The first thing that parser does is tokenizing the source code. There are three cases:

• If the source code is empty, or consists only of whitespace characters, the list of tokens contains three identical tokens (irrelevant which).
• If the source code does not contain any whitespace characters, the list of tokens contains all non-whitespace characters in the same order that appear in the source code.
• If the source code contains at least one whitespace character, the list of tokens contains all continuous sequences of non-whitespace characters in the same order that appear in the source code.

For example, empty program is tokenized to `["0", "0", "0"]` (`0` can be replaced by any other sequence of non-whitespace characters). Program `abc` is tokenized to `["a", "b", "c"]`. Program `abc defgh xyz` is tokenized to `["abc", "defgh", "xyz"]`.

## Initial memory construction

After converting the source code to a list of tokens, we create a new node for each token. Then, iterate over tokens and for each token get the corresponding node and assign the next nodes (represented by the next tokens) to the 0-pointer and 1-pointer of the current node. We do it in the depth-first manner.

Here is an example. Given the following source code:

```012032004051025
```

Each character is a token (because there is no whitespace). We start with the first token (digit `0`) and it represents the root node. We add it to the stack (underscores mean that pointers are missing):

```Stack:
0: _ _
```

On the stack we have a single node and it corresponds to the token `0`. It is missing two pointers. Now, for each token, we add it to the first missing pointer of the node at the top of the stack, then we remove that node if it has two pointers, and if we have not seen the new token before, add a new entry to the top of the stack. In this example, The next token is `1`, we add it to the first missing pointer of the root node, we do not remove the root node (bacause it has still 1 missing pointer) and we add `1` to the stack, because it is a new token:

```Stack:
0: 1 _
1: _ _
```

We repeat the process until we consume all tokens. At the end, the stack will be empty and we will have the following graph (node tokens on the left and pointers on the right):

```0: 1 4
1: 2 0
2: 0 3
3: 2 0
4: 0 5
5: 1 0
```

This is our initial memory. We still have two tokens left: `2` and `5`, but since we have already formed the graph, we ignore them. On the other hand, if we had missing some tokens to fill all pointers, for each missing pointer we assign the node itself (so that each missing pointer points to the node that contains the pointer).

## Initializing input string

After constructing a graph from the list of tokens, the graph is not yet ready for computation. We need to inject the input string first, since all information must be stored in the graph (the graph uniquely represents execution state). To do that, we convert the given input string, represented as an array of bytes, to an array of bits (8 bits for each byte, bits are from lowest to highest). For example, string `abc` is converted to the following array of bits:

```100001100100011011000110
```

Then we convert the array of bits to a graph in the following way. For each bit we create a new node. 1-pointer of the node points to the next node (corresponding to the next bit in the array) and the 1-pointer of the last node points to `B0` node (we will explain later what that node means). 0-pointer of each node points to `B0` node or `B1` node if the corresponding bit is `0` or `1`, respectively (we will also explain what `B1` node is).

The reverse of this algorithm (for converting a graph to an array of bits) we use later in program execution. After constructing a graph that corresponds to the input string, we create a new node and assign the old root node (from the step Initial memory construction) to the 0-pointer and we assign the input string graph to the 1-pointer. Then we make this new node the root node.

Node `B0` is the node at address `000` (see addressing in Realm) and node `B1` is the node at address `001` after step Initializing input string. Or, in other words, nodes at addresses `00` and `01`, respectively, before step Initializing input string.

The graph is now ready and the computation can begin.

## Parsing address from the given node

We do not apply this algorithm immediately, but we explain how it works, because it is used later in the computation. Given a node in the graph, we parse an array of bits from it in the following way.

Let the array be empty. Repeat the following until the current node becomes the node at address `000`, or if we reached the same node two times while parsing. In the loop, if the 0-pointer of the current node points to the node at address `000`, then add bit `0` to the sequence, otherwise add bit `1` to the sequence. Update the current node to point to the 1-pointer of the previous 0-node. After the loop ends, return the array of bits.

Note that this algorithm is somewhat a reverse of the algorithm for converting an array of bits to a graph. If we apply this algorithm to the graph that we obtained from the input array of bits, we will get the input array of bits back (this does not work always - for example if nodes at addresses `000` and `001` are equal, we will get all zeros in the sequence).

## Computation

This is the main loop. If the node at address `01` is equal to the node at address `000`, then terminate the program. Otherwise, do the following cases (we check cases in order, if a case matches, we skip other cases in that iteration).

### Case 1

If the node at address `0100` is equal to the node at address `000`, then let `A1` be the parsed address at node `01010` and `A2` be the parsed address at node `01011`, then assign the node at address A2 to the address A1 and after that assign the node at address `011` to the address `01`.

### Case 2

If the node at address `0100` is equal to the node at address `001`, then let `A1` be the parsed address at node `01010`, `A2` be the parsed address at node `010110` and `A3` be the parsed address at node `010111`, then create a new node that points to nodes from addresses A2 and A3 (0-pointer and 1-pointer, respectively) and assign it to the address A1. Like in the previous case, after that assign the node at address `011` to the address `01`.

### Case 3

If no other cases can be applied, let `A1` be the parsed address at node `010100` and `A2` be the parsed address at node `010101`. If the node at address A1 is equal to the node at address A2, then assign the node at address `01011` to the address `01`, otherwise assign the node at address `011` to the address `01`.

## Obtaining the output

After the main loop ends, we obtain the output array of bits by parsing the address from the node at address `1`, then we convert it to array of bytes (append zeros if necessary to form a byte) and it represents the output string.

## Similarities to Realm

Case 1 corresponds to the assignment in Realm, Case 2 corresponds to allocating a new node and Case 3 corresponds to loops. Any Realm program can be converted to a Transceternal program.

## Examples

### Empty program

If the source code is empty, or consists only of whitespace characters, the list of tokens will contain three identical tokens (we say tokens are `a` for simplicity). Then we construct our initial graph. It is a graph that contains a single node whose both pointers point to itself:

```Initial graph:
a: a a
```

Then we inject the input and switch roots:

```Graph ready for computation:
root: a input
a: a a
input: ...
```

where `input` is the root node of the graph obtained by converting the input string of bits to a graph. Now we start the main loop. It says that if the node at address `01` is eqaul to the node at address `000`, then break the main loop. The node at address `01` is `a` and the node at address `000` is also `a`, so the main loop has zero iterations. Now we need to extract the output. We parse the sequence of bits from the node at address `1` (which is node `input`). Since nodes at addresses `000` and `001` are identical (it is node `a`), all bits in the output are zeros, but the number of bits do not change.

Therefore, the empty program converts all ones from the input string of bits to zeros. In other words, it outputs the `NULL` character (character with char code `00`) repeated the number of times equal to the length of the input string.

### Cat

```catacat
```

First, we construct the graph:

```Graph:
root: c input
c: a t
a: t a
t: a c
input: ...
```

The main loop starts. The node at address `01` is `t` and the node at address `000` is also `t`, so the main loop ends without doing any iterations. Then we parse the output string of bits. The node at address `000` is `t` and the node at address `001` is `a`, so the output string is identical to the input string.

### Output a digit

This program ignores the input and outputs ASCII digit `3`

```0122233445262778889A2B9C2A2
```

The main loop has a single iteration and it is the Case 1.

### Hello, World!

Ignores the input and outputs the `Hello, World!` string:

```00 01 02 02 02 03 02 04 02 05 02 06 00 07 02 08 02 09 00 0A 02 0B 00 0C 02 0D 00 0E 02 0F 02 10
00 11 00 12 02 13 02 14 02 15 00 16 00 17 02 18 00 19 00 1A 02 1B 02 1C 02 1D 00 1E 00 1F 02 20
00 21 00 22 02 23 00 24 00 25 00 26 00 27 02 28 00 29 00 2A 02 2B 02 2C 02 2D 00 2E 00 2F 02 30
00 31 02 32 02 33 02 34 02 35 02 36 02 37 02 38 00 39 02 3A 02 3B 00 3C 00 3D 00 3E 02 3F 00 40
02 41 00 42 02 43 00 44 00 45 00 46 00 47 02 48 00 49 00 4A 02 4B 02 4C 00 4D 02 4E 02 4F 00 50
00 51 00 52 02 53 02 54 02 55 00 56 00 57 02 58 00 59 00 5A 02 5B 02 5C 02 5D 00 5E 02 5F 02 60
00 61 00 62 02 63 00 64 02 65 02 66 02 67 02 68 00 69 02 6A 02 02 6B 6C 02 6D 6B 6E 02 6C 02
```

Note: this is not the hexadecimal representation of the source code, but the literal text. We could write it in a lot of different ways.