Malbolge programming

The Malbolge language has eluded many attempts to use it for some years. That was before Lou Scheffer published his cryptanalysis of the encryption algorithm it uses. This article walks the way he opened, explaining many of his findings in detail and other caveats that must be taken into account not mentioned by him.

Weaknesses

Malbolge's probably biggest piece of evilness comes from the encryption of instructions as they are executed. That encryption, fortunately, has a few weaknesses that make writing Malbolge programs feasible.

Jumps are not encrypted

First of all let's examine the order in which instruction fetching and execution is performed:

Fetch the instruction from [C] and decrypt it.
Execute the instruction.
Encrypt the byte at [C].
Increment C.

In the case of the jump instruction, whose execution has the effect of changing the value of the code pointer C in step 2 above, the memory position that gets encrypted via step 3 is the final value of the code pointer, i.e. the address previous to the target of the jump, instead of the jump instruction itself which remains unencrypted. That's a great advantage as we will see later.

Cycles of length 2

Another weakness that can be exploited is the existence of length-2 cycles in the encryption step. Every possible instruction can be written as a 2-cycle in which the other instruction in the cycle is a NOP, that is, when an instruction is executed it turns into a NOP and when that NOP is executed it turns into the original instruction.

Here's a table of these memory addresses modulo 94 allowing period-2 instructions where the second is a NOP, together with the instruction itself (there are exactly two possible addresses for each of these instructions, so both are listed):

Address (mod 94)	Alternative address (mod 94)	Instruction
25	29	`<`
43	47	`/`
59	63	*``**
60	64	`j`
82	86	`p`

The remaining three instructions are irrelevant here: v is executed at most once per program, i is not encrypted as explained above, and o is part of the immutable NOPs mentioned below and the cycle length is not important.

Immutable NOPs

The last weakness that makes it practical to write Malbolge programs is the existence of at least one immutable NOP for every possible memory address. An immutable NOP is an instruction that always behaves as a NOP no matter the number of times it is executed. However very few addresses allow to directly enter immutable NOPs into the code because the range of allowed values in the source code is very limited. (Lou Scheffer notes that such values can always be entered by using a bug in the reference interpreter which allows values greater than 126 to be entered, then rotating with the * instruction, but other authors consider that making use of that bug is cheating.)

Note the distinction between the o instruction and any of the other NOPs: the former may be entered directly into the code while the latter can only be obtained by indirect means. Both can be immutable NOPs, though, depending on the address. The following addresses modulo 94 allow to directly enter an immutable NOP into the code by using the o instruction:

6, 8, 10, 17, 26, 27, 37, 42, 48, 51, 82, 86, 88, 92

In memory addresses other than these the o (NOP) instructions will mutate into a non-NOP instruction after a variable number of executions. For example, an o executed at position 0 modulo 94 will become a j after 29 executions.

The values 70 and 74 form the 2-cycle: an instruction with value 70 becomes 74 when executed and 74 becomes 70 and this is independent of the memory address (what is dependent is the meaning of that value). They are immutable NOPs for most addresses modulo 94. The following table lists all the addresses that are an exception to this rule, together with sample values which are immutable NOPs for them:

Address (mod 94)	Value
7	60
11	58
24	60
25	60
28	60
29	60
43	60
47	58
59	60
60	58
63	60
64	60
82	62, 80
86	60, 76

The importance of immutable NOPs can easily be seen by realizing that the data pointer D is incremented with every instruction executed, just as the code pointer, so in order to position the data pointer as required it's sometimes necessary to advance it by inserting NOP instructions in the code.

Putting all together

We'll see now how to take advantage of these weaknesses to write actual programs.

Program flux

Let's assume we want to execute the following code (as in the cat program):

Input a character.
Output the character.
Jump to step 1.

Easy as it looks, it's not so easy to write in Malbolge. To start with, this must be translated to the following:

Input a character (at address 43 mod 94)
Jump to address 25 mod 94.
Output a character (at address 25 mod 94)
Jump to address 60 mod 94.
Reload the data pointer to start again.
Jump to address 43 mod 94.

Since the instructions are at addresses which form a 2-cycle, in the first execution all instructions will perform their function while in the second they will act as NOPs, then the cycle will start again.

Now it starts to look more promising, doesn't it? Unfortunately that's not the whole story.

Store operation

Crazy as it is, the crazy operator can be very useful. At a minimum it provides a mechanism for storing values into memory, which is necessary for almost any operation except the most basic ones as the above. For example, checking for EOF before printing the character is not possible without using at least a partial version of the store operation, because there's no way to examine the contents of the accumulator without altering it.

It has to be made in several steps. Prior to the store itself the memory address <addr> in which it is performed must be initialized to 1111111111 ternary (29524 decimal). This can be done in this way:

Set A to 1111111111t (29524 decimal).
Op into <addr>.
Op into <addr>.

After step 3, the memory word at <addr> will contain 1111111111t no matter the initial value it contained. Now <addr> is ready for a value to be stored in it.

The store of the accumulator A into memory now consists of two steps: first OP the value into a location already containing 1111111111t; (this will swap the 0's and 1's in A); then OP the resulting value into <addr> (this will unswap the 0's and 1's, leaving the original value stored in both A and <addr>). Note that this method makes it necessary to perform the above 3 steps into just another memory address. That's part of the fun of writing Malbolge programs.

Load operation

There are two simple(ish) ways of loading a value from memory into A; a straightforward one is to perform ten rotates, but this is very costly. There's another one using the crazy operator: load A with 2222222222t (this requires just one rotation), then OP into the address with the value to load, then load A again with 2222222222t, then OP again into that address. After these operations the accumulator will hold the value originally present in that address.

Not so easy as it looks

Now that we've reached the point where Scheffer stops let's continue from there, pointing out the problems that are encountered on the way. For now we'll assume that the memory can be preloaded at will, even if that's not the case.

Data pointer load operations need fixup

Let's remember the algorithm for writing a cat program which does not stop on EOF:

Input a character (at address 43 mod 94)
Jump to address 25 mod 94.
Output a character (at address 25 mod 94)
Jump to address 60 mod 94.
Reload the data pointer to start again.
Jump to address 43 mod 94.

The first point to note is that step 5 can't be performed every two cycles: it must take effect immediately or otherwise the jump addresses have to be repeated again, meaning an avoidable waste of preinitialized data space. The solution to this problem is to revert the j instruction at step 5 just after it's executed. The way of doing this is to re-execute or "pseudo-execute" it (a pseudo-execution means to jump to the instruction immediately following it, so that the encryption step is performed on it anyway). We add a step 5b executed after step 5:

5b. Encrypt the j instruction of step 5.

This will complete the simplified version of the cat program. The design involves working with both the data and the code section at the same time, keeping track especially of the value of the D register, as follows (the choice of 97 as the start of the data area is arbitrary):

Data:

Address	Content
97	60
98	42
99	(unused)
100	24
101	(unused)
102	59
103	96

Code:

Step	`C` register	`D` register	Content of `[D]`	Ins.	`C` after	`D` after
1	43	99	-	`/`	-	-
2	44	100	24	`i`	24	-
3	25	101	-	`<`	-	-
4	26	102	59	`i`	59	-
5	60	103	96	`j`	-	96
5b	61	97	60	`i`	60	-
6	61	98	42	`i`	42	-

(Remember that both C and D are incremented between steps, and that the final value of C prior to the increment is the position that gets encrypted.)

This will loop forever; every two iterations it will input a byte and output it, and execute two NOPs at the next iteration. It does not stop on EOF; instead it outputs 2222222222t mod 256 = 168 repeatedly.

Now it should be apparent that it's possible to achieve execution of arbitrary code by means of jumping to the places where the instructions are to be executed, but there are still a few problems that need to be addressed: the overhead due to data manipulation and the initialization of memory with the desired values.

Initialization of memory

We've assumed that the memory can be preloaded at will, even if that's not the case. Why? Because it's possible (in principle) to create an initialization routine that is executed only once, with no loops or repetitions, so that there's no need to care about self-encryption. That routine's mission is to set up the memory just as needed. The problem is that the routine can be very expensive in terms of number of instructions used: many instructions may be needed for one single memory word to reach the desired value. Fortunately it's much easier in Malbolge to write execute-once code than reusable code.

Overhead due to data manipulation

With the programming method exposed above, when a memory word at address <addr> needs to be manipulated (e.g. for a store operation), for each instruction that acts on it there's a need of:

A data pointer that positions the data register before it via a j instruction (it can be before or after <addr>).
A program pointer to fix up that j instruction.
A program pointer that jumps to the instruction that manipulates <addr>.
A program pointer that jumps to the next instruction, which is probably a j if more operations on <addr> are needed.
A certain number of immutable NOPs around the instruction that make the data register skip other words used to manipulate the value at <addr> (if it's the only instruction manipulating <addr>, then these are not necessary).

As we've explained things, it's not possible to reuse an instruction which already has the desired number of immutable NOPs around it in the same turn of the main loop, because the second time it's executed it acts as a NOP. This means that either of these two approaches has to be taken: either the instructions have to be allocated (in different addresses which have the same value mod 94) or they have to be fixed up on the fly in the same manner as the j instruction was. The second approach is by far the most desirable one because it means just one extra program pointer per instruction, while the first one implies initialization of lots of immutable NOPs and allocation of many extra instructions. So we add one extra word of overhead to the ones already stated:

A program pointer to fix up the data manipulation instruction for reuse.

Now it may be more apparent why the approach of making ten rotate operations for one load was not the preferred one, as that would mean ten times the overhead explained above plus the initialization of it all.

Example: How to store the accumulator into memory (long)

Here's an example of a subprogram that stores the accumulator into a memory position, for the purpose of illustrating how the influence of the overhead mentioned above applies to Malbolge programs. Please note that the code is untested and there might be bugs; it's more an example than anything else. It's possible to reduce the total size, but not much. The code pointers use labels while the data addresses are expressed numerically. Address 200 is chosen as the target of the store/load operation and address 300 is chosen as the ancillary address that has to be filled with 1111111111t as part of the store operation. All instructions except i are assumed to be length-2 cycles. The labels for code pointers represent the address of the memory word prior to the address where the label points to. The labels starting with "R_" are the restoring addresses of the corresponding instructions. Thus, code pointers with this prefix directly point to the command's address instead of the memory word before the command. In general, special care must be taken to ensure that the address prior to the target of a jump always contains a character in the printable ASCII range (33-126) or the reference interpreter may crash. Execution starts with data register pointing to address 400.

Code:

Label	Command
OPR	p
	i

Label	Command
ROT	*
	i

Label	Command
MOVD	j
	i

Label	Command
LOOP	j
	i

Label	Command
RETURN_FLAG_1	j
	i

Label	Command
RETURN_FLAG_2	j
	i

Data starting at address 199 (target variable):

Address	Content	Comment
199	OPR	Access target variable
200	(initial value is irrelevant)	Target varible
201	R_OPR	Restore OPR
202	R_MOVD	Restore MOVD that has been used to jump to address 199
203	RETURN_FLAG_1	Test first return flag (the `j` command is intact iff the flag is set)
204	412	Return to address 412 if return flag 1 was set
205	R_RETURN_FLAG_1	Destroy return flag 1
206	RETURN_FLAG_2	Test second return flag
207	505	Return to address 505 if return flag 1 was set
208	R_RETURN_FLAG_2	Destroy return flag 2 (only necessary if further return flags follow)

Data starting at address 299 (temporary variable):

Address	Content
299	OPR
300	(initial value is irrelevant)
301	R_OPR
302	R_MOVD
303	RETURN_FLAG_1
304	407
305	R_RETURN_FLAG_1
306	RETURN_FLAG_2
307	502
308	R_RETURN_FLAG_2

Data starting at address 400 (initialization):

Address	Content	Comment
400	R_RETURN_FLAG_1	Unset flags (destroy `j` command)
401	R_RETURN_FLAG_2
402	ROT	Load 1111111111t
403	1111111111t
404	R_ROT
405	R_RETURN_FLAG_1	Store D register by setting return flag 1
406	MOVD
407	298	Write A register into tmp
408	LOOP	Write A register into tmp again
409	404
410	R_RETURN_FLAG_1	Store D register by setting return flag 1
411	MOVD
412	198	Write A register into val
413	LOOP	Write A register into tmp again
414	409
415	(fetch value that should be stored)

Data starting at address 500 (storing the value):

Address	Content	Comment
500	R_RETURN_FLAG_2	Store D register by setting return flag 2
501	MOVD
502	298	Write A register into tmp
503	R_RETURN_FLAG_2	Store D register by setting return flag 2
504	MOVD
505	198	Write A register into val
506	(continuation of program)

Trace:

Label	D register	Content of [D]	Ins.	Comment
...	400	R_RETURN_FLAG_1	`i`	Entry
R_RETURN_FLAG_1	401	R_RETURN_FLAG_2	`i`	Initialize return flags
R_RETURN_FLAG_2	402	ROT	`i`
ROT	403	1111111111t	*``**	Load A with all 1's
R_ROT	404	R_ROT	`i`
R_ROT	405	R_RETURN_FLAG1	`i`
R_RETURN_FLAG1	406	MOVD	`i`	Save D register
MOVD	407	298	`j`
R_MOVD	299	OPR	`i`
OPR	300	??????????t	`p`	Opr all 1's into [300]
R_OPR	301	R_OPR	`i`
R_OPR	302	R_MOVD	`i`
R_MOVD	303	RETURN_FLAG_1	`i`
RETURN_FLAG_1	304	407	`j`	Restore D register
R_RETURN_FLAG_1	408	LOOP	`i`
LOOP	409	404	`j`	Test loop condition -> opr A into [300] again
R_LOOP	405	R_RETURN_FLAG1	`i`
R_RETURN_FLAG1	406	MOVD	`i`	Save D register
MOVD	407	298	`j`
R_MOVD	299	OPR	`i`
OPR	300	??????????t	`p`	Opr into [300], resulting in [300] containing all 1's
R_OPR	301	R_OPR	`i`
R_OPR	302	R_MOVD	`i`
R_MOVD	303	RETURN_FLAG_1	`i`
RETURN_FLAG_1	304	407	`j`	Restore D register
R_RETURN_FLAG_1	408	LOOP	`i`
LOOP	409	404	`o`	Test loop condition -> exit loop
R_LOOP	410	R_RETURN_FLAG_1	`i`
R_RETURN_FLAG1	411	MOVD	`i`
MOVD	412	198	`j`
R_MOVD	199	OPR	`i`
OPR	200	??????????t	`p`	Opr all 1's into [200]
R_OPR	201	R_OPR	`i`
R_OPR	202	R_MOVD	`i`
R_MOVD	203	RETURN_FLAG_1	`i`
RETURN_FLAG_1	204	412	`j`	Restore D register
R_RETURN_FLAG_1	413	LOOP	`i`
LOOP	414	409	`j`	Test loop condition -> opr A into [200] again
R_LOOP	410	R_RETURN_FLAG1	`i`
R_RETURN_FLAG1	411	MOVD	`i`	Save D register
MOVD	412	198	`j`
R_MOVD	199	OPR	`i`
OPR	200	??????????t	`p`	Opr into [200], resulting in [200] containing all 1's
R_OPR	201	R_OPR	`i`
R_OPR	202	R_MOVD	`i`
R_MOVD	203	RETURN_FLAG_1	`i`
RETURN_FLAG_1	204	412	`j`	Restore D register
R_RETURN_FLAG_1	413	LOOP	`i`
LOOP	414	409	`o`	Test loop condition -> exit loop
R_LOOP	415	...	`i`	Do something that loads the accumulator with the value to store.
...	500	R_RETURN_FLAG_2	`i`
R_RETURN_FLAG_2	501	MOVD	`i`	Save D register
MOVD	502	298	`j`
R_MOVD	299	OPR	`i`
OPR	300	1111111111t	`p`	Opr A into [300]
R_OPR	301	R_OPR	`i`
R_OPR	302	R_MOVD	`i`
R_MOVD	303	RETURN_FLAG_1	`i`
RETURN_FLAG_1	304	407	`o`	Return flag 1 was not set
R_RETURN_FLAG_1	305	R_RETURN_FLAG_1	`i`
R_RETURN_FLAG_1	306	RETURN_FLAG_2	`i`	Unset return flag 1 again
RETURN_FLAG_2	307	502	`j`	Restore D register
R_RETURN_FLAG_2	503	R_RETURN_FLAG_2	`i`
R_RETURN_FLAG_2	504	MOVD	`i`	Save D register
MOVD	505	198	`j`
R_MOVD	199	OPR	`i`
OPR	200	1111111111t	`p`	Opr A into [200]
R_OPR	201	R_OPR	`i`
R_OPR	202	R_MOVD	`i`
R_MOVD	203	RETURN_FLAG_1	`i`
RETURN_FLAG_1	204	412	`o`	Return flag 1 was not set
R_RETURN_FLAG_1	205	R_RETURN_FLAG_1	`i`
R_RETURN_FLAG_1	206	RETURN_FLAG_2	`i`	Unset return flag 1 again
RETURN_FLAG_2	207	505	`j`	Restore D register
R_RETURN_FLAG_2	506	...	`i`	Go on with program logic

Reducing the number of immutable NOPs

Instead of controlling the program flow with the code register, which means that all commands are written in their correct order filled with immutable NOPs, it is possible to control the program flow using the data register. This technique has been developed by Hisashi Iizawa et al. and has been used in their 99 bottles of beer program.

Using this technique every instruction in Malbolge (except Jmp) will only occur once in the whole program at a position where the encryption function has a 2-cycle, followed by a Jmp instruction (there may be some exceptions, but this is the basic idea). Thus we know that after the execution of every instruction a Jmp instruction will follow.

Example: Loading 0t1111111111 into the A register

Code pointer to rotate instruction.
0t1111111111
Code pointer to rotate instruction + 1 (restore)

The next data word will be a pointer to another Malbolge command, e.g. the crazy instruction, followed by the cell used by it.

When we control the program flow with the D register, branches can be executed by MoveD. If we use a MoveD instruction with a 2-cycle of the encryption function without a restore operation, we can build a loop that will be executed two times.

Using this technique it's possible to save many immutable NOPs, so the number of cells that have to be initialized decreases dramatically. Simple programs in Malbolge still need many memory cells. However, with this technique programs like a cat that halts on EOF are possible.

Programming with memory addresses is very confusing, and it may be time-consuming to manually write code in Malbolge to initialize the memory cells that are needed. To avoid this one can use a Malbolge assembler. Such an assembler can be found in the external resources: LMAO (Low-level Malbolge Assembler, Ooh!). LMAO comes with some examples written in the Malbolge assembler language HeLL (Hellish Low-level Language), so the user can understand the way HeLL works. One of the examples is a cat program that halts on EOF.

Introducing assembly language for Malbolge

Let's consider the example above and introduce labels for data pointers.

Variable val (offset 199):

Label	Value
opr_val	OPR
val	(initial value is irrelevant)
	R_OPR
	R_MOVD
	FLAG1
	val_back1
	R_RETURN_FLAG_1
	FLAG2
	val_back2
	R_RETURN_FLAG_2

Variable tmp (offset 299):

Label	Value
opr_tmp	OPR
tmp	(initial value is irrelevant)
	R_OPR
	R_MOVD
	FLAG1
	tmp_back1
	R_RETURN_FLAG_1
	FLAG2
	tmp_back2
	R_RETURN_FLAG_2

Initialization (offset 400):

Label	Value
ENTRY	ROT
	1111111111t
	R_ROT
clear_tmp	R_RETURN_FLAG_1
	MOVD
	opr_tmp
tmp_back1	LOOP
	clear_tmp
clear_val	R_RETURN_FLAG_1
	MOVD
	opr_val
val_back1	LOOP
	clear_val
	(fetch value that should be stored)

Write value (offset 500):

Label	Value
write	R_RETURN_FLAG_2
	MOVD
	opr_tmp
tmp_back2	R_RETURN_FLAG_2
	MOVD
	opr_val
val_back2	(continuation of program)

We just introduced a Malbolge assembly language and got rid of absolute positions for code or data sections. Thus, the code above can be extended easily. A Malbolge program can be created from the assembly program above as follows. We must choose positions of code and data sections, replace labels by the address of the memory word prior to the address where the label points to, and write use-once Malbolge code to initialize everything. However, one may use a Malbolge assembler in order to automatize this process.

External resources

Malbolge programming

Contents

Weaknesses

Jumps are not encrypted

Cycles of length 2

Immutable NOPs

Putting all together

Program flux

Store operation

Load operation

Not so easy as it looks

Data pointer load operations need fixup

Initialization of memory

Overhead due to data manipulation

Example: How to store the accumulator into memory (long)

Reducing the number of immutable NOPs

Introducing assembly language for Malbolge

External resources

Navigation menu

Malbolge programming

Weaknesses

Jumps are not encrypted

Cycles of length 2

Immutable NOPs

Putting all together

Program flux

Store operation

Load operation

Not so easy as it looks

Data pointer load operations need fixup

Initialization of memory

Overhead due to data manipulation

Example: How to store the accumulator into memory (long)

Reducing the number of immutable NOPs

Introducing assembly language for Malbolge

External resources

Navigation menu

Search