# Malbolge programming

The Malbolge language has eluded many attempts to use it for some years. That was before Lou Scheffer published his cryptanalysis of the encryption algorithm it uses. This article walks the way he opened, explaining many of his findings in detail and other caveats that must be taken into account not mentioned by him.

## Weaknesses

Malbolge's probably biggest piece of evilness comes from the encryption of instructions as they are executed. That encryption, fortunately, has a few weaknesses that make writing Malbolge programs feasible.

### Jumps are not encrypted

First of all let's examine the order in which instruction fetching and execution is performed:

1. Fetch the instruction from `[C]` and decrypt it.
2. Execute the instruction.
3. Encrypt the byte at `[C]`.
4. Increment `C`.

In the case of the jump instruction, whose execution has the effect of changing the value of the code pointer `C` in step 2 above, the memory position that gets encrypted via step 3 is the final value of the code pointer, i.e. the address previous to the target of the jump, instead of the jump instruction itself which remains unencrypted. That's a great advantage as we will see later.

### Cycles of length 2

Another weakness that can be exploited is the existence of length-2 cycles in the encryption step. Every possible instruction can be written as a 2-cycle in which the other instruction in the cycle is a NOP, that is, when an instruction is executed it turns into a NOP and when that NOP is executed it turns into the original instruction.

Here's a table of these memory addresses modulo 94 allowing period-2 instructions where the second is a NOP, together with the instruction itself (there are exactly two possible addresses for each of these instructions, so both are listed):

25 29 `<`
43 47 `/`
59 63 `*`
60 64 `j`
82 86 `p`

The remaining three instructions are irrelevant here: `v` is executed at most once per program, `i` is not encrypted as explained above, and `o` is part of the immutable NOPs mentioned below and the cycle length is not important.

### Immutable NOPs

The last weakness that makes it practical to write Malbolge programs is the existence of at least one immutable NOP for every possible memory address. An immutable NOP is an instruction that always behaves as a NOP no matter the number of times it is executed. However very few addresses allow to directly enter immutable NOPs into the code because the range of allowed values in the source code is very limited. (Lou Scheffer notes that such values can always be entered by using a bug in the reference interpreter which allows values greater than 126 to be entered, then rotating with the `*` instruction, but other authors consider that making use of that bug is cheating.)

Note the distinction between the `o` instruction and any of the other NOPs: the former may be entered directly into the code while the latter can only be obtained by indirect means. Both can be immutable NOPs, though, depending on the address. The following addresses modulo 94 allow to directly enter an immutable NOP into the code by using the `o` instruction:

```6, 8, 10, 17, 26, 27, 37, 42, 48, 51, 82, 86, 88, 92
```

In memory addresses other than these the `o` (NOP) instructions will mutate into a non-NOP instruction after a variable number of executions. For example, an `o` executed at position 0 modulo 94 will become a `j` after 29 executions.

The values 70 and 74 form the 2-cycle: an instruction with value 70 becomes 74 when executed and 74 becomes 70 and this is independent of the memory address (what is dependent is the meaning of that value). They are immutable NOPs for most addresses modulo 94. The following table lists all the addresses that are an exception to this rule, together with sample values which are immutable NOPs for them:

7 60
11 58
24 60
25 60
28 60
29 60
43 60
47 58
59 60
60 58
63 60
64 60
82 62, 80
86 60, 76

The importance of immutable NOPs can easily be seen by realizing that the data pointer `D` is incremented with every instruction executed, just as the code pointer, so in order to position the data pointer as required it's sometimes necessary to advance it by inserting NOP instructions in the code.

## Putting all together

We'll see now how to take advantage of these weaknesses to write actual programs.

### Program flux

Let's assume we want to execute the following code (as in the cat program):

1. Input a character.
2. Output the character.

Easy as it looks, it's not so easy to write in Malbolge. To start with, this must be translated to the following:

1. Input a character (at address 43 mod 94)
3. Output a character (at address 25 mod 94)
5. Reload the data pointer to start again.

Since the instructions are at addresses which form a 2-cycle, in the first execution all instructions will perform their function while in the second they will act as NOPs, then the cycle will start again.

Now it starts to look more promising, doesn't it? Unfortunately that's not the whole story.

### Store operation

Crazy as it is, the crazy operator can be very useful. At a minimum it provides a mechanism for storing values into memory, which is necessary for almost any operation except the most basic ones as the above. For example, checking for EOF before printing the character is not possible without using at least a partial version of the store operation, because there's no way to examine the contents of the accumulator without altering it.

It has to be made in several steps. Prior to the store itself the memory address <addr> in which it is performed must be initialized to 1111111111 ternary (29524 decimal). This can be done in this way:

1. Set `A` to 1111111111t (29524 decimal).

After step 3, the memory word at <addr> will contain 1111111111t no matter the initial value it contained. Now <addr> is ready for a value to be stored in it.

The store of the accumulator `A` into memory now consists of two steps: first OP the value into a location already containing 1111111111t; (this will swap the 0's and 1's in `A`); then OP the resulting value into <addr> (this will unswap the 0's and 1's, leaving the original value stored in both `A` and <addr>). Note that this method makes it necessary to perform the above 3 steps into just another memory address. That's part of the fun of writing Malbolge programs.

There are two simple(ish) ways of loading a value from memory into `A`; a straightforward one is to perform ten rotates, but this is very costly. There's another one using the crazy operator: load `A` with 2222222222t (this requires just one rotation), then OP into the address with the value to load, then load `A` again with 2222222222t, then OP again into that address. After these operations the accumulator will hold the value originally present in that address.

## Not so easy as it looks

Now that we've reached the point where Scheffer stops let's continue from there, pointing out the problems that are encountered on the way. For now we'll assume that the memory can be preloaded at will, even if that's not the case.

### Data pointer load operations need fixup

Let's remember the algorithm for writing a cat program which does not stop on EOF:

1. Input a character (at address 43 mod 94)
3. Output a character (at address 25 mod 94)
5. Reload the data pointer to start again.

The first point to note is that step 5 can't be performed every two cycles: it must take effect immediately or otherwise the jump addresses have to be repeated again, meaning an avoidable waste of preinitialized data space. The solution to this problem is to revert the `j` instruction at step 5 just after it's executed. The way of doing this is to re-execute or "pseudo-execute" it (a pseudo-execution means to jump to the instruction immediately following it, so that the encryption step is performed on it anyway). We add a step 5b executed after step 5:

5b. Encrypt the `j` instruction of step 5.

This will complete the simplified version of the cat program. The design involves working with both the data and the code section at the same time, keeping track especially of the value of the `D` register, as follows (the choice of 97 as the start of the data area is arbitrary):

Data:

97 60
98 42
99 (unused)
100 24
101 (unused)
102 59
103 96

Code:

Step `C` register `D` register Content of `[D]` Ins. `C` after `D` after
1 43 99 - `/` - -
2 44 100 24 `i` 24 -
3 25 101 - `<` - -
4 26 102 59 `i` 59 -
5 60 103 96 `j` - 96
5b 61 97 60 `i` 60 -
6 61 98 42 `i` 42 -

(Remember that both `C` and `D` are incremented between steps, and that the final value of `C` prior to the increment is the position that gets encrypted.)

This will loop forever; every two iterations it will input a byte and output it, and execute two NOPs at the next iteration. It does not stop on EOF; instead it outputs 2222222222t mod 256 = 168 repeatedly.

Now it should be apparent that it's possible to achieve execution of arbitrary code by means of jumping to the places where the instructions are to be executed, but there are still a few problems that need to be addressed: the overhead due to data manipulation and the initialization of memory with the desired values.

### Initialization of memory

We've assumed that the memory can be preloaded at will, even if that's not the case. Why? Because it's possible (in principle) to create an initialization routine that is executed only once, with no loops or repetitions, so that there's no need to care about self-encryption. That routine's mission is to set up the memory just as needed. The problem is that the routine can be very expensive in terms of number of instructions used: many instructions may be needed for one single memory word to reach the desired value. Fortunately it's much easier in Malbolge to write execute-once code than reusable code.

### Overhead due to data manipulation

With the programming method exposed above, when a memory word at address <addr> needs to be manipulated (e.g. for a store operation), for each instruction that acts on it there's a need of:

• A data pointer that positions the data register before it via a `j` instruction (it can be before or after <addr>).
• A program pointer to fix up that `j` instruction.
• A program pointer that jumps to the instruction that manipulates <addr>.
• A program pointer that jumps to the next instruction, which is probably a `j` if more operations on <addr> are needed.
• A certain number of immutable NOPs around the instruction that make the data register skip other words used to manipulate the value at <addr> (if it's the only instruction manipulating <addr>, then these are not necessary).

As we've explained things, it's not possible to reuse an instruction which already has the desired number of immutable NOPs around it in the same turn of the main loop, because the second time it's executed it acts as a NOP. This means that either of these two approaches has to be taken: either the instructions have to be allocated (in different addresses which have the same value mod 94) or they have to be fixed up on the fly in the same manner as the `j` instruction was. The second approach is by far the most desirable one because it means just one extra program pointer per instruction, while the first one implies initialization of lots of immutable NOPs and allocation of many extra instructions. So we add one extra word of overhead to the ones already stated:

• A program pointer to fix up the data manipulation instruction for reuse.

Now it may be more apparent why the approach of making ten rotate operations for one load was not the preferred one, as that would mean ten times the overhead explained above plus the initialization of it all.

### Example: How to store the accumulator into memory (long)

Here's an example of a subprogram that stores the accumulator into a memory position, for the purpose of illustrating how the influence of the overhead mentioned above applies to Malbolge programs. Please note that the code is untested and there might be bugs; it's more an example than anything else. It's possible to reduce the total size, but not much. The code pointers use labels while the data addresses are expressed numerically. Address 200 is chosen as the target of the store/load operation and address 300 is chosen as the ancillary address that has to be filled with 1111111111t as part of the store operation. All instructions except `i` are assumed to be length-2 cycles. The labels for code pointers represent the address of the memory word prior to the address where the label points to. The labels starting with "R_" are the restoring addresses of the corresponding instructions. Thus, code pointers with this prefix directly point to the command's address instead of the memory word before the command. In general, special care must be taken to ensure that the address prior to the target of a jump always contains a character in the printable ASCII range (33-126) or the reference interpreter may crash. Execution starts with data register pointing to address 400.

Code:

Label Command
OPR p
i
Label Command
ROT *
i
Label Command
MOVD j
i
Label Command
LOOP j
i
Label Command
RETURN_FLAG_1 j
i
Label Command
RETURN_FLAG_2 j
i

Data starting at address 199 (target variable):

199 OPR Access target variable
200 (initial value is irrelevant) Target varible
201 R_OPR Restore OPR
203 RETURN_FLAG_1 Test first return flag (the `j` command is intact iff the flag is set)
205 R_RETURN_FLAG_1 Destroy return flag 1
206 RETURN_FLAG_2 Test second return flag
208 R_RETURN_FLAG_2 Destroy return flag 2 (only necessary if further return flags follow)

Data starting at address 299 (temporary variable):

299 OPR
300 (initial value is irrelevant)
301 R_OPR
302 R_MOVD
303 RETURN_FLAG_1
304 407
305 R_RETURN_FLAG_1
306 RETURN_FLAG_2
307 502
308 R_RETURN_FLAG_2

Data starting at address 400 (initialization):

400 R_RETURN_FLAG_1 Unset flags (destroy `j` command)
401 R_RETURN_FLAG_2
403 1111111111t
404 R_ROT
405 R_RETURN_FLAG_1 Store D register by setting return flag 1
406 MOVD
407 298 Write A register into tmp
408 LOOP Write A register into tmp again
409 404
410 R_RETURN_FLAG_1 Store D register by setting return flag 1
411 MOVD
412 198 Write A register into val
413 LOOP Write A register into tmp again
414 409
415 (fetch value that should be stored)

Data starting at address 500 (storing the value):

500 R_RETURN_FLAG_2 Store D register by setting return flag 2
501 MOVD
502 298 Write A register into tmp
503 R_RETURN_FLAG_2 Store D register by setting return flag 2
504 MOVD
505 198 Write A register into val
506 (continuation of program)

Trace:

 Label D register Content of [D] Ins. Comment ... 400 R_RETURN_FLAG_1 `i` Entry R_RETURN_FLAG_1 401 R_RETURN_FLAG_2 `i` Initialize return flags R_RETURN_FLAG_2 402 ROT `i` ROT 403 1111111111t `*` Load A with all 1's R_ROT 404 R_ROT `i` R_ROT 405 R_RETURN_FLAG1 `i` R_RETURN_FLAG1 406 MOVD `i` Save D register MOVD 407 298 `j` R_MOVD 299 OPR `i` OPR 300 ??????????t `p` Opr all 1's into [300] R_OPR 301 R_OPR `i` R_OPR 302 R_MOVD `i` R_MOVD 303 RETURN_FLAG_1 `i` RETURN_FLAG_1 304 407 `j` Restore D register R_RETURN_FLAG_1 408 LOOP `i` LOOP 409 404 `j` Test loop condition -> opr A into [300] again R_LOOP 405 R_RETURN_FLAG1 `i` R_RETURN_FLAG1 406 MOVD `i` Save D register MOVD 407 298 `j` R_MOVD 299 OPR `i` OPR 300 ??????????t `p` Opr into [300], resulting in [300] containing all 1's R_OPR 301 R_OPR `i` R_OPR 302 R_MOVD `i` R_MOVD 303 RETURN_FLAG_1 `i` RETURN_FLAG_1 304 407 `j` Restore D register R_RETURN_FLAG_1 408 LOOP `i` LOOP 409 404 `o` Test loop condition -> exit loop R_LOOP 410 R_RETURN_FLAG_1 `i` R_RETURN_FLAG1 411 MOVD `i` MOVD 412 198 `j` R_MOVD 199 OPR `i` OPR 200 ??????????t `p` Opr all 1's into [200] R_OPR 201 R_OPR `i` R_OPR 202 R_MOVD `i` R_MOVD 203 RETURN_FLAG_1 `i` RETURN_FLAG_1 204 412 `j` Restore D register R_RETURN_FLAG_1 413 LOOP `i` LOOP 414 409 `j` Test loop condition -> opr A into [200] again R_LOOP 410 R_RETURN_FLAG1 `i` R_RETURN_FLAG1 411 MOVD `i` Save D register MOVD 412 198 `j` R_MOVD 199 OPR `i` OPR 200 ??????????t `p` Opr into [200], resulting in [200] containing all 1's R_OPR 201 R_OPR `i` R_OPR 202 R_MOVD `i` R_MOVD 203 RETURN_FLAG_1 `i` RETURN_FLAG_1 204 412 `j` Restore D register R_RETURN_FLAG_1 413 LOOP `i` LOOP 414 409 `o` Test loop condition -> exit loop R_LOOP 415 ... `i` Do something that loads the accumulator with the value to store. ... 500 R_RETURN_FLAG_2 `i` R_RETURN_FLAG_2 501 MOVD `i` Save D register MOVD 502 298 `j` R_MOVD 299 OPR `i` OPR 300 1111111111t `p` Opr A into [300] R_OPR 301 R_OPR `i` R_OPR 302 R_MOVD `i` R_MOVD 303 RETURN_FLAG_1 `i` RETURN_FLAG_1 304 407 `o` Return flag 1 was not set R_RETURN_FLAG_1 305 R_RETURN_FLAG_1 `i` R_RETURN_FLAG_1 306 RETURN_FLAG_2 `i` Unset return flag 1 again RETURN_FLAG_2 307 502 `j` Restore D register R_RETURN_FLAG_2 503 R_RETURN_FLAG_2 `i` R_RETURN_FLAG_2 504 MOVD `i` Save D register MOVD 505 198 `j` R_MOVD 199 OPR `i` OPR 200 1111111111t `p` Opr A into [200] R_OPR 201 R_OPR `i` R_OPR 202 R_MOVD `i` R_MOVD 203 RETURN_FLAG_1 `i` RETURN_FLAG_1 204 412 `o` Return flag 1 was not set R_RETURN_FLAG_1 205 R_RETURN_FLAG_1 `i` R_RETURN_FLAG_1 206 RETURN_FLAG_2 `i` Unset return flag 1 again RETURN_FLAG_2 207 505 `j` Restore D register R_RETURN_FLAG_2 506 ... `i` Go on with program logic

## Reducing the number of immutable NOPs

Instead of controlling the program flow with the code register, which means that all commands are written in their correct order filled with immutable NOPs, it is possible to control the program flow using the data register. This technique has been developed by Hisashi Iizawa et al. and has been used in their 99 bottles of beer program.

Using this technique every instruction in Malbolge (except Jmp) will only occur once in the whole program at a position where the encryption function has a 2-cycle, followed by a Jmp instruction (there may be some exceptions, but this is the basic idea). Thus we know that after the execution of every instruction a Jmp instruction will follow.

 Code pointer to rotate instruction. 0t1111111111 Code pointer to rotate instruction + 1 (restore)

The next data word will be a pointer to another Malbolge command, e.g. the crazy instruction, followed by the cell used by it.

When we control the program flow with the D register, branches can be executed by MoveD. If we use a MoveD instruction with a 2-cycle of the encryption function without a restore operation, we can build a loop that will be executed two times.

Using this technique it's possible to save many immutable NOPs, so the number of cells that have to be initialized decreases dramatically. Simple programs in Malbolge still need many memory cells. However, with this technique programs like a cat that halts on EOF are possible.

Programming with memory addresses is very confusing, and it may be time-consuming to manually write code in Malbolge to initialize the memory cells that are needed. To avoid this one can use a Malbolge assembler. Such an assembler can be found in the external resources: LMAO (Low-level Malbolge Assembler, Ooh!). LMAO comes with some examples written in the Malbolge assembler language HeLL (Hellish Low-level Language), so the user can understand the way HeLL works. One of the examples is a cat program that halts on EOF.

### Introducing assembly language for Malbolge

Let's consider the example above and introduce labels for data pointers.

Variable val (offset 199):

Label Value
opr_val OPR
val (initial value is irrelevant)
R_OPR
R_MOVD
FLAG1
val_back1
R_RETURN_FLAG_1
FLAG2
val_back2
R_RETURN_FLAG_2

Variable tmp (offset 299):

Label Value
opr_tmp OPR
tmp (initial value is irrelevant)
R_OPR
R_MOVD
FLAG1
tmp_back1
R_RETURN_FLAG_1
FLAG2
tmp_back2
R_RETURN_FLAG_2

Initialization (offset 400):

Label Value
ENTRY ROT
1111111111t
R_ROT
clear_tmp R_RETURN_FLAG_1
MOVD
opr_tmp
tmp_back1 LOOP
clear_tmp
clear_val R_RETURN_FLAG_1
MOVD
opr_val
val_back1 LOOP
clear_val
(fetch value that should be stored)

Write value (offset 500):

Label Value
write R_RETURN_FLAG_2
MOVD
opr_tmp
tmp_back2 R_RETURN_FLAG_2
MOVD
opr_val
val_back2 (continuation of program)

We just introduced a Malbolge assembly language and got rid of absolute positions for code or data sections. Thus, the code above can be extended easily. A Malbolge program can be created from the assembly program above as follows. We must choose positions of code and data sections, replace labels by the address of the memory word prior to the address where the label points to, and write use-once Malbolge code to initialize everything. However, one may use a Malbolge assembler in order to automatize this process.