Malbolge programming
The Malbolge language has eluded many attempts to use it for some years. That was before Lou Scheffer published his cryptanalysis of the encryption algorithm it uses. This article walks the way he opened, explaining many of his findings in detail and other caveats that must be taken into account not mentioned by him.
Weaknesses
Malbolge's probably biggest piece of evilness comes from the encryption of instructions as they are executed. That encryption, fortunately, has a few weaknesses that make writing Malbolge programs feasible.
Jumps are not encrypted
First of all let's examine the order in which instruction fetching and execution is performed:
- Fetch the instruction from
[C]
and decrypt it. - Execute the instruction.
- Encrypt the byte at
[C]
. - Increment
C
.
In the case of the jump instruction, whose execution has the effect of changing the value of the code pointer C
in step 2 above, the memory position that gets encrypted via step 3 is the final value of the code pointer, i.e. the address previous to the target of the jump, instead of the jump instruction itself which remains unencrypted. That's a great advantage as we will see later.
Cycles of length 2
Another weakness that can be exploited is the existence of length-2 cycles in the encryption step. Every possible instruction can be written as a 2-cycle in which the other instruction in the cycle is a NOP, that is, when an instruction is executed it turns into a NOP and when that NOP is executed it turns into the original instruction.
Here's a table of these memory addresses modulo 94 allowing period-2 instructions where the second is a NOP, together with the instruction itself (there are exactly two possible addresses for each of these instructions, so both are listed):
Address (mod 94) | Alternative address (mod 94) | Instruction |
---|---|---|
25 | 29 | <
|
43 | 47 | /
|
59 | 63 | *
|
60 | 64 | j
|
82 | 86 | p
|
The remaining three instructions are irrelevant here: v
is executed at most once per program, i
is not encrypted as explained above, and o
is part of the immutable NOPs mentioned below and the cycle length is not important.
Immutable NOPs
The last weakness that makes it practical to write Malbolge programs is the existence of at least one immutable NOP for every possible memory address. An immutable NOP is an instruction that always behaves as a NOP no matter the number of times it is executed. However very few addresses allow to directly enter immutable NOPs into the code because the range of allowed values in the source code is very limited. (Lou Scheffer notes that such values can always be entered by using a bug in the reference interpreter which allows values greater than 126 to be entered, then rotating with the *
instruction, but other authors consider that making use of that bug is cheating.)
Note the distinction between the o
instruction and any of the other NOPs: the former may be entered directly into the code while the latter can only be obtained by indirect means. Both can be immutable NOPs, though, depending on the address. The following addresses modulo 94 allow to directly enter an immutable NOP into the code by using the o
instruction:
6, 8, 10, 17, 26, 27, 37, 42, 48, 51, 82, 86, 88, 92
In memory addresses other than these the o
(NOP) instructions will mutate into a non-NOP instruction after a variable number of executions. For example, an o
executed at position 0 modulo 94 will become a j
after 29 executions.
The values 70 and 74 form the 2-cycle: an instruction with value 70 becomes 74 when executed and 74 becomes 70 and this is independent of the memory address (what is dependent is the meaning of that value). They are immutable NOPs for most addresses modulo 94. The following table lists all the addresses that are an exception to this rule, together with sample values which are immutable NOPs for them:
Address (mod 94) | Value |
---|---|
7 | 60 |
11 | 58 |
24 | 60 |
25 | 60 |
28 | 60 |
29 | 60 |
43 | 60 |
47 | 58 |
59 | 60 |
60 | 58 |
63 | 60 |
64 | 60 |
82 | 62, 80 |
86 | 60, 76 |
The importance of immutable NOPs can easily be seen by realizing that the data pointer D
is incremented with every instruction executed, just as the code pointer, so in order to position the data pointer as required it's sometimes necessary to advance it by inserting NOP instructions in the code.
Putting all together
We'll see now how to take advantage of these weaknesses to write actual programs.
Program flux
Let's assume we want to execute the following code (as in the cat program):
- Input a character.
- Output the character.
- Jump to step 1.
Easy as it looks, it's not so easy to write in Malbolge. To start with, this must be translated to the following:
- Input a character (at address 43 mod 94)
- Jump to address 25 mod 94.
- Output a character (at address 25 mod 94)
- Jump to address 60 mod 94.
- Reload the data pointer to start again.
- Jump to address 43 mod 94.
Since the instructions are at addresses which form a 2-cycle, in the first execution all instructions will perform their function while in the second they will act as NOPs, then the cycle will start again.
Now it starts to look more promising, doesn't it? Unfortunately that's not the whole story.
Store operation
Crazy as it is, the crazy operator can be very useful. At a minimum it provides a mechanism for storing values into memory, which is necessary for almost any operation except the most basic ones as the above. For example, checking for EOF before printing the character is not possible without using at least a partial version of the store operation, because there's no way to examine the contents of the accumulator without altering it.
It has to be made in several steps. Prior to the store itself the memory address <addr> in which it is performed must be initialized to 1111111111 ternary (29524 decimal). This can be done in this way:
- Set
A
to 1111111111t (29524 decimal). - Op into <addr>.
- Op into <addr>.
After step 3, the memory word at <addr> will contain 1111111111t no matter the initial value it contained. Now <addr> is ready for a value to be stored in it.
The store of the accumulator A
into memory now consists of two steps: first OP the value into a location already containing 1111111111t; (this will swap the 0's and 1's in A
); then OP the resulting value into <addr> (this will unswap the 0's and 1's, leaving the original value stored in both A
and <addr>). Note that this method makes it necessary to perform the above 3 steps into just another memory address. That's part of the fun of writing Malbolge programs.
Load operation
There are two simple(ish) ways of loading a value from memory into A
; a straightforward one is to perform ten rotates, but this is very costly. There's another one using the crazy operator: load A
with 2222222222t (this requires just one rotation), then OP into the address with the value to load, then load A
again with 2222222222t, then OP again into that address. After these operations the accumulator will hold the value originally present in that address.
Not so easy as it looks
Now that we've reached the point where Scheffer stops let's continue from there, pointing out the problems that are encountered on the way. For now we'll assume that the memory can be preloaded at will, even if that's not the case.
Data pointer load operations need fixup
Let's remember the algorithm for writing a cat program which does not stop on EOF:
- Input a character (at address 43 mod 94)
- Jump to address 25 mod 94.
- Output a character (at address 25 mod 94)
- Jump to address 60 mod 94.
- Reload the data pointer to start again.
- Jump to address 43 mod 94.
The first point to note is that step 5 can't be performed every two cycles: it must take effect immediately or otherwise the jump addresses have to be repeated again, meaning an avoidable waste of preinitialized data space. The solution to this problem is to revert the j
instruction at step 5 just after it's executed. The way of doing this is to re-execute or "pseudo-execute" it (a pseudo-execution means to jump to the instruction immediately following it, so that the encryption step is performed on it anyway). We add a step 5b executed after step 5:
- 5b. Encrypt the
j
instruction of step 5.
This will complete the simplified version of the cat program. The design involves working with both the data and the code section at the same time, keeping track especially of the value of the D
register, as follows (the choice of 97 as the start of the data area is arbitrary):
Data:
Address | Content |
---|---|
97 | 60 |
98 | 42 |
99 | (unused) |
100 | 24 |
101 | (unused) |
102 | 59 |
103 | 96 |
Code:
Step | C register
|
D register
|
Content of [D]
|
Ins. | C after
|
D after
|
---|---|---|---|---|---|---|
1 | 43 | 99 | - | /
|
- | - |
2 | 44 | 100 | 24 | i
|
24 | - |
3 | 25 | 101 | - | <
|
- | - |
4 | 26 | 102 | 59 | i
|
59 | - |
5 | 60 | 103 | 96 | j
|
- | 96 |
5b | 61 | 97 | 60 | i
|
60 | - |
6 | 61 | 98 | 42 | i
|
42 | - |
(Remember that both C
and D
are incremented between steps, and that the final value of C
prior to the increment is the position that gets encrypted.)
This will loop forever; every two iterations it will input a byte and output it, and execute two NOPs at the next iteration. It does not stop on EOF; instead it outputs 2222222222t mod 256 = 168 repeatedly.
Now it should be apparent that it's possible to achieve execution of arbitrary code by means of jumping to the places where the instructions are to be executed, but there are still a few problems that need to be addressed: the overhead due to data manipulation and the initialization of memory with the desired values.
Initialization of memory
We've assumed that the memory can be preloaded at will, even if that's not the case. Why? Because it's possible (in principle) to create an initialization routine that is executed only once, with no loops or repetitions, so that there's no need to care about self-encryption. That routine's mission is to set up the memory just as needed. The problem is that the routine can be very expensive in terms of number of instructions used: many instructions may be needed for one single memory word to reach the desired value. Fortunately it's much easier in Malbolge to write execute-once code than reusable code.
Overhead due to data manipulation
With the programming method exposed above, when a memory word at address <addr> needs to be manipulated (e.g. for a store operation), for each instruction that acts on it there's a need of:
- A data pointer that positions the data register before it via a
j
instruction (it can be before or after <addr>). - A program pointer to fix up that
j
instruction. - A program pointer that jumps to the instruction that manipulates <addr>.
- A program pointer that jumps to the next instruction, which is probably a
j
if more operations on <addr> are needed. - A certain number of immutable NOPs around the instruction that make the data register skip other words used to manipulate the value at <addr> (if it's the only instruction manipulating <addr>, then these are not necessary).
As we've explained things, it's not possible to reuse an instruction which already has the desired number of immutable NOPs around it in the same turn of the main loop, because the second time it's executed it acts as a NOP. This means that either of these two approaches has to be taken: either the instructions have to be allocated (in different addresses which have the same value mod 94) or they have to be fixed up on the fly in the same manner as the j
instruction was. The second approach is by far the most desirable one because it means just one extra program pointer per instruction, while the first one implies initialization of lots of immutable NOPs and allocation of many extra instructions. So we add one extra word of overhead to the ones already stated:
- A program pointer to fix up the data manipulation instruction for reuse.
Now it may be more apparent why the approach of making ten rotate operations for one load was not the preferred one, as that would mean ten times the overhead explained above plus the initialization of it all.
Example: How to store the accumulator into memory (long)
Here's an example of a subprogram that stores the accumulator into a memory position, for the purpose of illustrating how the influence of the overhead mentioned above applies to Malbolge programs. Please note that the code is untested and there might be bugs; it's more an example than anything else. It's possible to reduce the total size, but not much.
The code pointers use labels while the data addresses are expressed numerically. Address 200 is chosen as the target of the store/load operation and address 300 is chosen as the ancillary address that has to be filled with 1111111111t as part of the store operation. All instructions except i
are assumed to be length-2 cycles. The labels for code pointers represent the address of the memory word prior to the address where the label points to. The labels starting with "R_" are the restoring addresses of the corresponding instructions. Thus, code pointers with this prefix directly point to the command's address instead of the memory word before the command. In general, special care must be taken to ensure that the address prior to the target of a jump always contains a character in the printable ASCII range (33-126) or the reference interpreter may crash.
Execution starts with data register pointing to address 400.
Code:
Label | Command |
---|---|
OPR | p |
i |
Label | Command |
---|---|
ROT | * |
i |
Label | Command |
---|---|
MOVD | j |
i |
Label | Command |
---|---|
LOOP | j |
i |
Label | Command |
---|---|
RETURN_FLAG_1 | j |
i |
Label | Command |
---|---|
RETURN_FLAG_2 | j |
i |
Data starting at address 199 (target variable):
Address | Content | Comment |
---|---|---|
199 | OPR | Access target variable |
200 | (initial value is irrelevant) | Target varible |
201 | R_OPR | Restore OPR |
202 | R_MOVD | Restore MOVD that has been used to jump to address 199 |
203 | RETURN_FLAG_1 | Test first return flag (the j command is intact iff the flag is set)
|
204 | 412 | Return to address 412 if return flag 1 was set |
205 | R_RETURN_FLAG_1 | Destroy return flag 1 |
206 | RETURN_FLAG_2 | Test second return flag |
207 | 505 | Return to address 505 if return flag 1 was set |
208 | R_RETURN_FLAG_2 | Destroy return flag 2 (only necessary if further return flags follow) |
Data starting at address 299 (temporary variable):
Address | Content |
---|---|
299 | OPR |
300 | (initial value is irrelevant) |
301 | R_OPR |
302 | R_MOVD |
303 | RETURN_FLAG_1 |
304 | 407 |
305 | R_RETURN_FLAG_1 |
306 | RETURN_FLAG_2 |
307 | 502 |
308 | R_RETURN_FLAG_2 |
Data starting at address 400 (initialization):
Address | Content | Comment |
---|---|---|
400 | R_RETURN_FLAG_1 | Unset flags (destroy j command)
|
401 | R_RETURN_FLAG_2 | |
402 | ROT | Load 1111111111t |
403 | 1111111111t | |
404 | R_ROT | |
405 | R_RETURN_FLAG_1 | Store D register by setting return flag 1 |
406 | MOVD | |
407 | 298 | Write A register into tmp |
408 | LOOP | Write A register into tmp again |
409 | 404 | |
410 | R_RETURN_FLAG_1 | Store D register by setting return flag 1 |
411 | MOVD | |
412 | 198 | Write A register into val |
413 | LOOP | Write A register into tmp again |
414 | 409 | |
415 | (fetch value that should be stored) |
Data starting at address 500 (storing the value):
Address | Content | Comment |
---|---|---|
500 | R_RETURN_FLAG_2 | Store D register by setting return flag 2 |
501 | MOVD | |
502 | 298 | Write A register into tmp |
503 | R_RETURN_FLAG_2 | Store D register by setting return flag 2 |
504 | MOVD | |
505 | 198 | Write A register into val |
506 | (continuation of program) |
Trace:
Label | D register | Content of [D] | Ins. | Comment |
... | 400 | R_RETURN_FLAG_1 | i
|
Entry |
R_RETURN_FLAG_1 | 401 | R_RETURN_FLAG_2 | i
|
Initialize return flags |
R_RETURN_FLAG_2 | 402 | ROT | i
|
|
ROT | 403 | 1111111111t | *
|
Load A with all 1's |
R_ROT | 404 | R_ROT | i
|
|
R_ROT | 405 | R_RETURN_FLAG1 | i
|
|
R_RETURN_FLAG1 | 406 | MOVD | i
|
Save D register |
MOVD | 407 | 298 | j
|
|
R_MOVD | 299 | OPR | i
|
|
OPR | 300 | ??????????t | p
|
Opr all 1's into [300] |
R_OPR | 301 | R_OPR | i
|
|
R_OPR | 302 | R_MOVD | i
|
|
R_MOVD | 303 | RETURN_FLAG_1 | i
|
|
RETURN_FLAG_1 | 304 | 407 | j
|
Restore D register |
R_RETURN_FLAG_1 | 408 | LOOP | i
|
|
LOOP | 409 | 404 | j
|
Test loop condition -> opr A into [300] again |
R_LOOP | 405 | R_RETURN_FLAG1 | i
|
|
R_RETURN_FLAG1 | 406 | MOVD | i
|
Save D register |
MOVD | 407 | 298 | j
|
|
R_MOVD | 299 | OPR | i
|
|
OPR | 300 | ??????????t | p
|
Opr into [300], resulting in [300] containing all 1's |
R_OPR | 301 | R_OPR | i
|
|
R_OPR | 302 | R_MOVD | i
|
|
R_MOVD | 303 | RETURN_FLAG_1 | i
|
|
RETURN_FLAG_1 | 304 | 407 | j
|
Restore D register |
R_RETURN_FLAG_1 | 408 | LOOP | i
|
|
LOOP | 409 | 404 | o
|
Test loop condition -> exit loop |
R_LOOP | 410 | R_RETURN_FLAG_1 | i
|
|
R_RETURN_FLAG1 | 411 | MOVD | i
|
|
MOVD | 412 | 198 | j
|
|
R_MOVD | 199 | OPR | i
|
|
OPR | 200 | ??????????t | p
|
Opr all 1's into [200] |
R_OPR | 201 | R_OPR | i
|
|
R_OPR | 202 | R_MOVD | i
|
|
R_MOVD | 203 | RETURN_FLAG_1 | i
|
|
RETURN_FLAG_1 | 204 | 412 | j
|
Restore D register |
R_RETURN_FLAG_1 | 413 | LOOP | i
|
|
LOOP | 414 | 409 | j
|
Test loop condition -> opr A into [200] again |
R_LOOP | 410 | R_RETURN_FLAG1 | i
|
|
R_RETURN_FLAG1 | 411 | MOVD | i
|
Save D register |
MOVD | 412 | 198 | j
|
|
R_MOVD | 199 | OPR | i
|
|
OPR | 200 | ??????????t | p
|
Opr into [200], resulting in [200] containing all 1's |
R_OPR | 201 | R_OPR | i
|
|
R_OPR | 202 | R_MOVD | i
|
|
R_MOVD | 203 | RETURN_FLAG_1 | i
|
|
RETURN_FLAG_1 | 204 | 412 | j
|
Restore D register |
R_RETURN_FLAG_1 | 413 | LOOP | i
|
|
LOOP | 414 | 409 | o
|
Test loop condition -> exit loop |
R_LOOP | 415 | ... | i
|
Do something that loads the accumulator with the value to store. |
... | 500 | R_RETURN_FLAG_2 | i
|
|
R_RETURN_FLAG_2 | 501 | MOVD | i
|
Save D register |
MOVD | 502 | 298 | j
|
|
R_MOVD | 299 | OPR | i
|
|
OPR | 300 | 1111111111t | p
|
Opr A into [300] |
R_OPR | 301 | R_OPR | i
|
|
R_OPR | 302 | R_MOVD | i
|
|
R_MOVD | 303 | RETURN_FLAG_1 | i
|
|
RETURN_FLAG_1 | 304 | 407 | o
|
Return flag 1 was not set |
R_RETURN_FLAG_1 | 305 | R_RETURN_FLAG_1 | i
|
|
R_RETURN_FLAG_1 | 306 | RETURN_FLAG_2 | i
|
Unset return flag 1 again |
RETURN_FLAG_2 | 307 | 502 | j
|
Restore D register |
R_RETURN_FLAG_2 | 503 | R_RETURN_FLAG_2 | i
|
|
R_RETURN_FLAG_2 | 504 | MOVD | i
|
Save D register |
MOVD | 505 | 198 | j
|
|
R_MOVD | 199 | OPR | i
|
|
OPR | 200 | 1111111111t | p
|
Opr A into [200] |
R_OPR | 201 | R_OPR | i
|
|
R_OPR | 202 | R_MOVD | i
|
|
R_MOVD | 203 | RETURN_FLAG_1 | i
|
|
RETURN_FLAG_1 | 204 | 412 | o
|
Return flag 1 was not set |
R_RETURN_FLAG_1 | 205 | R_RETURN_FLAG_1 | i
|
|
R_RETURN_FLAG_1 | 206 | RETURN_FLAG_2 | i
|
Unset return flag 1 again |
RETURN_FLAG_2 | 207 | 505 | j
|
Restore D register |
R_RETURN_FLAG_2 | 506 | ... | i
|
Go on with program logic |
Reducing the number of immutable NOPs
Instead of controlling the program flow with the code register, which means that all commands are written in their correct order filled with immutable NOPs, it is possible to control the program flow using the data register. This technique has been developed by Hisashi Iizawa et al. and has been used in their 99 bottles of beer program.
Using this technique every instruction in Malbolge (except Jmp) will only occur once in the whole program at a position where the encryption function has a 2-cycle, followed by a Jmp instruction (there may be some exceptions, but this is the basic idea). Thus we know that after the execution of every instruction a Jmp instruction will follow.
Example: Loading 0t1111111111 into the A register
Code pointer to rotate instruction. |
0t1111111111 |
Code pointer to rotate instruction + 1 (restore) |
The next data word will be a pointer to another Malbolge command, e.g. the crazy instruction, followed by the cell used by it.
When we control the program flow with the D register, branches can be executed by MoveD. If we use a MoveD instruction with a 2-cycle of the encryption function without a restore operation, we can build a loop that will be executed two times.
Using this technique it's possible to save many immutable NOPs, so the number of cells that have to be initialized decreases dramatically. Simple programs in Malbolge still need many memory cells. However, with this technique programs like a cat that halts on EOF are possible.
Programming with memory addresses is very confusing, and it may be time-consuming to manually write code in Malbolge to initialize the memory cells that are needed. To avoid this one can use a Malbolge assembler. Such an assembler can be found in the external resources: LMAO (Low-level Malbolge Assembler, Ooh!). LMAO comes with some examples written in the Malbolge assembler language HeLL (Hellish Low-level Language), so the user can understand the way HeLL works. One of the examples is a cat program that halts on EOF.
Introducing assembly language for Malbolge
Let's consider the example above and introduce labels for data pointers.
Variable val (offset 199):
Label | Value |
---|---|
opr_val | OPR |
val | (initial value is irrelevant) |
R_OPR | |
R_MOVD | |
FLAG1 | |
val_back1 | |
R_RETURN_FLAG_1 | |
FLAG2 | |
val_back2 | |
R_RETURN_FLAG_2 |
Variable tmp (offset 299):
Label | Value |
---|---|
opr_tmp | OPR |
tmp | (initial value is irrelevant) |
R_OPR | |
R_MOVD | |
FLAG1 | |
tmp_back1 | |
R_RETURN_FLAG_1 | |
FLAG2 | |
tmp_back2 | |
R_RETURN_FLAG_2 |
Initialization (offset 400):
Label | Value |
---|---|
ENTRY | ROT |
1111111111t | |
R_ROT | |
clear_tmp | R_RETURN_FLAG_1 |
MOVD | |
opr_tmp | |
tmp_back1 | LOOP |
clear_tmp | |
clear_val | R_RETURN_FLAG_1 |
MOVD | |
opr_val | |
val_back1 | LOOP |
clear_val | |
(fetch value that should be stored) |
Write value (offset 500):
Label | Value |
---|---|
write | R_RETURN_FLAG_2 |
MOVD | |
opr_tmp | |
tmp_back2 | R_RETURN_FLAG_2 |
MOVD | |
opr_val | |
val_back2 | (continuation of program) |
We just introduced a Malbolge assembly language and got rid of absolute positions for code or data sections. Thus, the code above can be extended easily. A Malbolge program can be created from the assembly program above as follows. We must choose positions of code and data sections, replace labels by the address of the memory word prior to the address where the label points to, and write use-once Malbolge code to initialize everything. However, one may use a Malbolge assembler in order to automatize this process.