User:The-Ennemy/asm2bf-tutorial
About this tutorial
I started writing this tutorial to help the amazing User:Palaiologos bring new users into the language, as her work has been a source of much inspiration and fun for me. This tutorial is meant to be a how-to guide with specific problems and solutions, introducing `asm2bf` features and components gradually as the code grows in complexity and functionality. It starts off very simple, with printing a single letter to the screen, and culminates in a complex codebase with procedures and argument passing.
This tutorial is not meant to be a definitive guide to `asm2bf`; for that you should always refer to the author's documentation and manual, or ask her yourself.
About Brainfuck
About asm2bf
Installing and "Hello World!"
Installing `asm2bf` is fairly simple. You'll always find the latest instructions on the GitHub repo of the project. Palaiologos says:
- 1. Pull the repository from github:
`git clone https://github.com/kspalaiologos/asmbf`
- 2. Run the configure script and build asm2bf:
`./configure && sudo make all install`
- ...
- To _configure_ this project on Windows, you need either Cygwin or MinGW (msys). The build should look exactly the same like the Linux one.
- Note that building the project on Windows and targets other than `release' and `auto' is unsupported.
This should install `asm2bf` on your computer. You can now run it from the command line by typing `bfmake`, which calls the entire toolchain in order. If this command fails here, it means some part of the installation didn't succeed. If, on the other hand, it doesn't give you an error and silently goes to a new line, you did it! You now have the `asm2bf` toolchain on your device.
- Note: since this software is in active development, be sure to check the GitHub repo from time to time and update your copy. When you encounter a bug, make sure you're running an up-to-date version, or as close to one as you can get, because the bug may already have been fixed.
Using `bfmake` is fairly easy: you pass it the name of the file you want it to assemble, and it outputs a file with the same name and the `.b` file extension. For example, the call `bfmake ./t7` creates the output file `t7.b`. The output file is what passes for machine code for `asm2bf`: it's a Brainfuck file that you can pass to a Brainfuck interpreter.
As for getting your code to run, `asm2bf` includes a lot of tools (including a Brainfuck debugger!) but you can use practically any interpreter. I heartily recommend you use something with 16 or more bits per cell (or, even better, one whose cell size you can control), because having 8 bits per cell is extremely limiting, even if traditional. I personally use tritium for a lot of work, since it has a bunch of handy features, and I'll be using it in this tutorial because it's convenient and I'm used to it.
To check that everything is working as it should, assemble the following snippet:
    stk 3
    org 0
    out .H
    out .e
    out .l
    out .l
    out .o
    out 10
When you run the output, you should see `Hello` followed by a newline. This means you did it! Now it's time to do some low-level programming!
Basic concepts
When writing assembly for `asm2bf`, you're writing assembly for a theoretical processor that draws on mnemonics and some architecture choices from both the x86 family and RISC processors, but the actual code is preprocessed by a handful of tools and then compiled to Brainfuck. We will, for the most part, be interacting with the assembly, a bit of Lua macros, and the theoretical CPU.
There are no types as far as `asm2bf` is concerned: everything we work with is a raw memory word that has its numerical value.
As far as memory goes, instead of only dealing with the classical Brainfuck tape, we're dealing with a more traditional machine memory model. The three types of useful memory we have on hand are the registers, the taperam and the stack. For now we'll be dealing only with the registers: they're the fastest and cleanest to access, and (almost!) all instructions work directly with registers.
The CPU has six general-purpose registers and four internal registers. The six general registers are `r1`, `r2`, `r3`, `r4`, `r5`, `r6`, and the internal registers are `f1`, `f2`, `f3`. The general-purpose registers are safe to work with: you'll be the only one setting and working with them, while the internal registers are also used by instructions "under the hood". Registers are quick and cheap to access, and you should use them in code as much as you can, even later in the tutorial when working with other kinds of memory.
Each register (and stack/ram unit) stores exactly one Brainfuck cell: if your interpreter is working with 8-bit cell sizes, your CPU will address 8 bits of memory and work on 8-bit words.
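To make the effect of cell size a bit more concrete, here is a minimal Python sketch of what happens to a value stored in a fixed-width cell. It assumes the interpreter wraps around on overflow, which many Brainfuck implementations do but which is not guaranteed; `store_in_cell` is a made-up helper for illustration, not part of `asm2bf`.

```python
def store_in_cell(value, bits):
    """Model writing `value` into one cell of the given width.

    Assumption: the interpreter wraps around on overflow (modulo 2**bits).
    """
    return value % (1 << bits)


# 512 does not fit into an 8-bit cell, but fits comfortably into 16 bits.
print(store_in_cell(512, 8))   # 0
print(store_in_cell(512, 16))  # 512
print(store_in_cell(300, 8))   # 44
```

Under that assumption, something like `out 512` only makes sense on an interpreter with cells wider than 8 bits.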
Each instruction can take zero, one or two arguments. We separate the first and second argument with commas. Examples of these three kinds:
    mov r1, .t  ; Here we move the (numerical value of the) character "t"
                ; into register r1. The "mov" instruction takes two arguments
    out r1      ; Here we tell the code to print out the value stored in
                ; register r1. The "out" instruction takes one argument
    end         ; And this is the end of the line! "end" ends the program
                ; while obviously taking zero arguments
There are two "types" of arguments: you can pass an instruction either a register or an immediate. An immediate is either a raw number (for example: `out 512`) or a character literal: a single symbol preceded by a period (for example: the `out .H` that we saw in a previous section). Both the character and the number are essentially the same; it's just more convenient to write a character literal than to look up the ASCII number for a letter or symbol you want to pass. Do note that you can only prefix the dot to ASCII characters, and that the program will not accept Unicode dotted characters as immediates.
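The equivalence between the two kinds of immediate can be sketched in a few lines of Python. The helper `immediate_value` is hypothetical and only mirrors the rule described above; it is not how the `asm2bf` toolchain actually parses its input.

```python
def immediate_value(token):
    """Return the number an immediate stands for.

    Hypothetical helper for illustration; it mirrors the rule described
    in the text, not asm2bf's actual parser.
    """
    if token.startswith(".") and len(token) == 2:
        char = token[1]
        if ord(char) > 127:
            raise ValueError("only ASCII characters can follow the dot")
        return ord(char)
    return int(token)


print(immediate_value(".H"))   # 72
print(immediate_value("72"))   # 72
print(immediate_value("512"))  # 512
```

So `out .H` and `out 72` ask for the same thing; the dotted form just saves you a trip to the ASCII table.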
For now, we will be working with a fairly rigid argument structure: our instructions can only take specific kinds of arguments. If the instruction takes two arguments, the first has to be a register while the second can be anything; if it takes only one argument, it will (depending on the instruction) take an immediate, a register or either. Easier to see:
# | ins. | 1st | 2nd | Comment |
---|---|---|---|---|
two | mov | r2 | 5 | write 5 into r2 |
two | add | r2 | r3 | add r3 to r2 |
one | out | .H | | write H to console |
one | in | r5 | | input data into r5 |
zero | end | | | end program |
Let's formally introduce the first three instructions, and the first comment, that we'll be seeing a lot of:
ins. | 1st | 2nd | Comment |
---|---|---|---|
in | r1 | | input data into a register |
out | .H & r5 | | outputs the data stored in its argument |
end | | | ends the program |
; huh… | | | single-line comment |
Note that `in` must write to a register, whereas `out` can write either an immediate or the data stored in a register. The instruction `end` is not strictly necessary, as a program will terminate once it reaches the bottom of the code, but it's good practice to include it, especially since later on our code will grow significantly.
Let's also introduce some functionality into our code, using some two-argument and one-argument instructions:
ins. | 1st | 2nd | Comment |
---|---|---|---|
mov | r2 | .H & r5 | copy data to register |
add | r2 | .H & r5 | add data to register |
sub | r2 | .H & r5 | subtract data from register |
mul | r2 | .H & r5 | multiply register by data |
div | r2 | .H & r5 | divide register by data |
inc | r2 | | increment register |
dec | r2 | | decrement register |
These modify the first register and preserve the second argument; most instructions work like that, though this isn't true for all of them.
We can now write some rudimentary code:
    ; read data into registers 1 and 2
    in r1
    in r2
    ; subtract r2 from r1, then add 3
    sub r1, r2
    add r1, 3
    ; output r1 and 3 to console
    out r1
    out 3
    ; decrement r2 and output it to console
    dec r2
    out r2
    end
The code will now ask for two inputs, perform math, and output the result. Beware that, if it's printing in character mode (the usual behaviour), you may not see anything. Using tritium, you can tell it to output numerical values instead of characters using `tritium -fintio -b32 [filepath]`; other interpreters have other methods.
If you input `t` (value 116) and `$` (value 36), your interpreter should output `S` (value 83), a non-printing control character (value 3), and `#` (value 35).
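If you want to double-check those numbers without running the toolchain, the same steps can be traced by hand in Python. This is only a model of the register program above, with each line annotated with the instruction it stands for; it assumes the subtraction never goes below zero, so cell wraparound doesn't come into play.

```python
def run_example(first, second):
    """Trace the register program above for two input characters."""
    r1, r2 = ord(first), ord(second)  # in r1 / in r2
    output = []
    r1 -= r2                          # sub r1, r2
    r1 += 3                           # add r1, 3
    output.append(r1)                 # out r1
    output.append(3)                  # out 3
    r2 -= 1                           # dec r2
    output.append(r2)                 # out r2
    return output


values = run_example("t", "$")
print(values)                    # [83, 3, 35]
print([chr(v) for v in values])  # ['S', '\x03', '#']
```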
Conditionals
The `asm2bf` instruction set has instructions that check whether a condition is true, and instructions that execute only if the condition evaluated as true. Using these two types of instructions you can get a lot of functionality. The condition check instructions:
ins. | 1st | 2nd | Comment |
---|---|---|---|
ceq | r2 | .H & r5 | check if register is equal (==) to data |
cgt | r2 | .H & r5 | check if register is greater (>) than data |
cge | r2 | .H & r5 | check if register is greater/equal (≥) to data |
clt | r2 | .H & r5 | check if register is less (<) than data |
cle | r2 | .H & r5 | check if register is less/equal (≤) to data |
cne | r2 | .H & r5 | check if register is not equal to (≠) data |
These condition instructions store their result (zero if false, one if true) in the internal register `f1`. The conditional instructions below will execute only if the `f1` register is set to a value other than zero:
ins. | 1st | 2nd | Comment |
---|---|---|---|
cmov, cmo | r2 | .H & r5 | copy data to register |
cadd, cad | r2 | .H & r5 | add data to register |
csub, csu | r2 | .H & r5 | subtract data from register |
cmul, cmu | r2 | .H & r5 | multiply register by data |
cdiv, cdi | r2 | .H & r5 | divide register by data |
The four-letter prefixed instructions have three-letter aliases, if you want to use only three-letter instructions. Do note that there are no conditional increment or decrement instructions: you should instead use the conditional add and subtract. You can also use `cout`/`cou` and `cin`, which work much the same. Later on, I will introduce instructions with their aliases and conditional variants at once, so you have everything in one place.
If you instead want to store the result of the condition instructions in the register you are evaluating, you can use the unprefixed versions of the conditions:
ins. | 1st | 2nd | Comment |
---|---|---|---|
eq, eq_ | r2 | .H & r5 | check if register is equal (==) to data |
gt, gt_ | r2 | .H & r5 | check if register is greater (>) than data |
ge, ge_ | r2 | .H & r5 | check if register is greater/equal (≥) to data |
lt, lt_ | r2 | .H & r5 | check if register is less (<) than data |
le, le_ | r2 | .H & r5 | check if register is less/equal (≤) to data |
ne, ne_ | r2 | .H & r5 | check if register is not equal to (≠) data |
These store the results in the first argument, and also have three-character aliases: this time with an underscore added to the end.
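The difference between the prefixed and unprefixed condition instructions is easiest to see side by side. Below is a small Python model of `ceq`, `eq` and `cadd` acting on a dictionary of registers. It is a sketch of the behaviour described in this section, and it additionally assumes that the unprefixed `eq` leaves `f1` alone, which the text suggests but doesn't state outright.

```python
def ceq(regs, reg, data):
    """Prefixed form: the result goes to the flag register f1."""
    regs["f1"] = 1 if regs[reg] == data else 0


def eq(regs, reg, data):
    """Unprefixed form: the result overwrites the compared register.

    Assumption: f1 is left untouched.
    """
    regs[reg] = 1 if regs[reg] == data else 0


def cadd(regs, reg, data):
    """Conditional add: only runs when f1 is nonzero."""
    if regs["f1"] != 0:
        regs[reg] += data


regs = {"r1": 5, "r2": 5, "f1": 0}
ceq(regs, "r1", 5)     # r1 keeps its value, f1 becomes 1
cadd(regs, "r1", 10)   # executes because f1 == 1
eq(regs, "r2", 7)      # r2 itself becomes 0
print(regs)            # {'r1': 15, 'r2': 0, 'f1': 1}
```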
Some more useful instructions:
ins. | 1st | 2nd | Comment |
---|---|---|---|
mod; cmod, cmd | r2 | .H & r5 | calculate register modulo data |
gcd; cgcd, cgc | r2 | .H & r5 | calculates GCD of register and data |
swp; cswp, csw | r2 | r5 | swaps two registers |
par; cpar, cpa | r2 | | sets register to zero if odd number of bits, otherwise to 1 |
cflip | | | inverts the f1 register |
Since the `f1` register is just like any other register, you can even bypass the condition instructions by doing something silly like `mov f1, 3`, which is practically guaranteed to make your code buggy: you will currently find that, among other things, using `cflip` no longer toggles the conditional register, and instead increments it even higher.
Finally, you can do full branching, as `asm2bf` also includes a form of if-clause: code between the instructions `cbegin`/`cbs` and `cend`/`cbe` will be executed, as a whole, only if `f1` is nonzero. Think of it as the equivalent of a C-style `if (f1) { ... }` conditional block.
We can use conditionals to do some fancier data manipulation:
    ; read data into register 1, copy it to register 2
    in_ r1
    mov r2, r1
    ; check whether input is odd
    mod r2, 2
    ceq r2, 1
    ; write stuff if odd
    cbegin
    out .O
    out .d
    out .d
    out 32 ; a space
    out .n
    out .u
    out .m
    out .!
    cend
    ; output the input if even
    cflip
    cout r1
    end
See what you can do with this! Apart from handling input and output, you're still limited to writing programs without flow control, and you can't access memory beyond the 6+4 registers, but there's still a lot you can make.
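To see how the flag drives that program, here is a rough Python model of it. It is not `asm2bf`, just the same steps written out with the flag as an ordinary variable, and it assumes that the `out` instructions inside the block leave `f1` unchanged.

```python
def odd_or_echo(char):
    """Model the odd/even program above for one input character."""
    r1 = ord(char)            # in_ r1
    r2 = r1                   # mov r2, r1
    r2 %= 2                   # mod r2, 2
    f1 = 1 if r2 == 1 else 0  # ceq r2, 1
    out = ""
    if f1:                    # cbegin ... cend
        out += "Odd num!"
    f1 = 0 if f1 else 1       # cflip
    if f1:                    # cout r1
        out += chr(r1)
    return out


print(odd_or_echo("a"))  # 'a' is 97, odd  -> Odd num!
print(odd_or_echo("b"))  # 'b' is 98, even -> b
```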
Intro to memory: stack and taperam
Apart from the registers, the CPU has two additional types of memory storage borrowed from traditional CPU designs: it implements a stack (a first-in-last-out data structure) and a linear flat memory like traditional RAM called the taperam (after Brainfuck's tape). All of this is an abstraction on top of the Brainfuck tape: the registers, stack and taperam are all just separate ways of interacting with the underlying Brainfuck cells in predictable, more familiar ways.
Unlike in most modern CPUs, all the `asm2bf` code you write is totally divorced from your program data: it can't modify itself or jump to arbitrary locations in the code. There is nothing like a traditional instruction pointer that will tell you where you are, since we're working on a significantly higher level of abstraction relative to Brainfuck.
While the permagen (where the registers and other internals are) is fixed in size, and the taperam is practically infinite (limited only by your Brainfuck implementation), the stack size is determined at compilation, and this is used to locate the taperam accordingly. The stack here grows upwards and away from the permagen.
For our purposes, we can say for now that if you want to work with data that isn't in registers, you will usually have to move it out of the stack or taperam into registers, operate on it, and then possibly write it back. Exceptions to this do exist (look for `vxcall` and `amp`, `smp` in the manual, for example), but more on that later.
Stack access
Using the stack requires a bit more attention than working with registers: while the registers are forgiving and most operations work with them, the stack requires you to keep track of what you put in, when, and how much. The CPU model doesn't care if you put more elements on the stack than you told it you would, and will gladly overwrite memory as it grows into the taperam.
The size of the stack is set at compile time using the instruction `stk 5` (or any other positive immediate integer, within reason!), which you should follow with `org 0`, which tells the compiler that the taperam's address #0 starts after the last element of the stack. Thus, a skeleton for programs that use the stack would be:
    stk 3
    org 0
    ; your code goes here
    ; lots and lots of code!
    ; whee!
    end
    ; a bit of code can also go here!
The two instructions that provide for basic stack functionality handle adding and removing elements from the stack, or in programmer parlance pushing and popping, and their conditional variants:
ins. | 1st | Comment |
---|---|---|
push, psh; cpush, cpsh, cps | r2, 5 | push data to stack |
pop; cpop, cpo | r4 | pop to register |
When pushing to the stack from a register, the data in the register is preserved and added to the stack, but when popping from the stack, the top element is removed from it and put into the target register.
You can also change the number of elements on the stack by manipulating the top element using `dup` and `dsc`, which take no arguments:
ins. | Comment |
---|---|
dup | duplicate top stack element |
dsc | discard top stack element |
When you already have data on the stack, you might also want to swap the two topmost elements, and `asm2bf` provides an instruction for this along with its conditional variants:
ins. | Comment |
---|---|
srv; csrv, crv | swaps ("swerves") top two stack elements |
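If the stack instructions so far feel abstract, it may help to model them on a plain Python list, with the end of the list acting as the top of the stack. The function names below simply borrow the mnemonics; this is an illustration of the described behaviour, not a description of how `asm2bf` lays the stack out on the tape.

```python
stack = []


def push(value):
    stack.append(value)          # push / psh


def pop():
    return stack.pop()           # pop: removes and returns the top element


def dup():
    stack.append(stack[-1])      # dup: duplicate the top element


def dsc():
    stack.pop()                  # dsc: discard the top element


def srv():
    stack[-1], stack[-2] = stack[-2], stack[-1]  # srv: swap the top two


push(1)
push(2)
srv()             # stack is now [2, 1]
dup()             # [2, 1, 1]
dsc()             # [2, 1]
r4 = pop()        # r4 = 1, stack is [2]
print(r4, stack)  # 1 [2]
```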
You can also do more sophisticated stack stores using `sgt`, `spt` and `tps`, which can move data to and from the stack without modifying its size:
ins. | 1st | 2nd | Comment |
---|---|---|---|
sgt | r2 | 3 & r5 | write to register from the stack element referenced by data |
spt | r2 | 3 & r5 | write register contents to the stack element referenced by data |
tps | r2 | 3 & r5 | write data to the stack element referenced by register |
The first one, the "reading" instruction `sgt`, is fairly simple: it takes the contents of the stack element pointed to by its second argument and writes that to the register given in the first argument.
The "writing" instructions are a bit more complicated; `spt` and `tps` are inverses of each other. The "direct" one, `spt`, moves data from the register in the first argument to the stack element referenced by the second argument, while the "reverse" `tps` moves the data from the second argument into the stack element referenced by the first argument. To give a bit more context, consider the following program:
    stk 3
    org 0
    psh 0
    psh 0
    psh 0
    mov r1, 3
    mov r2, 1
    spt r1, r2
    tps r1, r2
    sgt r3, 1
    out r3
    sgt r3, 2
    out r3
    sgt r3, 3
    out r3
Running this program will output `301` (if you're getting garbage output, your Brainfuck implementation is in character output mode; in that case, add `add r3, .0` after every `sgt`). Essentially this means that the first store instruction `spt r1, r2` stored the value `r1 = 3` to the stack address `r2 = 1`, and the second instruction `tps r1, r2` stored the value `r2 = 1` to the stack address `r1 = 3`. The reverse instructions (and there are a few more) are workarounds for the inability to use immediate values as the first argument to the straightforward instructions; sometimes you really want to write `spt 3, r2` without doing any extra tomfoolery, and instructions like `tps` give you a way out.
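The two store directions are easy to mix up, so here is the same program traced in Python. The stack is modelled as a dictionary from address to value; that stack elements 1 to 3 are the ones addressed here is an assumption taken from the example itself, not from the manual.

```python
def run_stack_example():
    # Assumption: stack elements are addressed by number and addresses 1..3
    # exist after three pushes, as the example program suggests.
    stack = {1: 0, 2: 0, 3: 0}   # psh 0, three times
    r1, r2 = 3, 1                # mov r1, 3 / mov r2, 1
    stack[r2] = r1               # spt r1, r2: value r1 -> address r2
    stack[r1] = r2               # tps r1, r2: value r2 -> address r1
    out = ""
    for address in (1, 2, 3):
        r3 = stack[address]      # sgt r3, address
        out += str(r3)           # out r3 (numeric output mode)
    return out


print(run_stack_example())  # 301
```

Note how `spt` and `tps` take the very same arguments yet write in opposite directions.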
There are some other stack instructions, such as those for floating-point/fractional mathematics. It might be useful to note that the zero-argument `fcpush`/`fps` and `fcpop`/`fpo` are basically shorthands for `push f1` and `pop f1`, for preserving the conditional flag.
This all leads us to one final stack instruction, and the reason why we had to cover the stack before control flow:
ins. | Comment |
---|---|
ret; cret, cre | pop from stack and jump to popped label |
Jumps and control flow
Control flow is the stuff magic is made of: without the ability to control the flow of your programs, you don't have Turing completeness. Control flow in `asm2bf` is implemented using good old jumps and conditionals on one hand, and labels on the other. Unlike in, say, x86 assembly, you cannot just jump to a random location in memory and continue execution from there: all your jumps have to, in one way or another, be linked to a label.
There are two ways to create labels: the old system using the `lbl` instruction (details of which you can find in the manual), which you shouldn't use, and code anchors, which you should. The syntax is `@name` to anchor a place in the code and `%name` to reference it. You can move references into registers and perform arithmetic on them as if they were immediates, because under the hood they get translated to immediates at compile time and are a layer of convenience over naked labels. Consider the following example:
    stk 3
    org 0
    mov r1, %ref_1
    mov r2, %ref_2
    mov r3, %ref_3
    add r1, r2
    add r1, r3
    add r1, .0
    out r1
    end
    @ref_1
    @ref_2
    @ref_3
This will output `3`, since `%ref_1` evaluates to `0`, `%ref_2` to `1` and `%ref_3` to `2` (0+1+2=3).
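The arithmetic can be replayed in Python to show that references really do behave like plain numbers. The numbering scheme used here (anchors counted from 0 in order of appearance) is an assumption read off the values quoted above, not something taken from the manual.

```python
# Assumption: anchors are numbered from 0 in order of appearance, which is
# what the values quoted in the text suggest.
anchors = ["ref_1", "ref_2", "ref_3"]
reference = {name: index for index, name in enumerate(anchors)}

r1 = reference["ref_1"]      # mov r1, %ref_1
r2 = reference["ref_2"]      # mov r2, %ref_2
r3 = reference["ref_3"]      # mov r3, %ref_3
r1 += r2                     # add r1, r2
r1 += r3                     # add r1, r3
r1 += ord("0")               # add r1, .0
print(chr(r1))               # 3
```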
Having said all that, the jumps themselves are a bit underwhelming! There is the unconditional jump `jmp`; the jumps that check whether the conditional flag `f1` is true or false, `cjnz`/`cjn` and `cjz`; and the check-if-register-is-zero jumps `jz_`/`jz` and `jnz`. There is also the instruction we mentioned in the previous section, `ret`, which takes zero arguments. These can all be summarized like this:
# | ins. | 1st | 2nd | Comment |
---|---|---|---|---|
two | jnz | r2 | r1, %add | if r2 != 0 jump to %add |
two | jz_, jz | r2 | r1, %add | if r2 == 0 jump to %add |
one | cjnz, cjn | r1, %add | | if f1 != 0 jump to %add |
one | cjz | r1, %add | | if f1 == 0 jump to %add |
one | jmp | r1, %add | | jump to %add |
zero | ret; cret, cre | | | pop from stack and jump to it |
The last instruction, `ret`, is particularly useful with the first macro we are going to use. Macros in `asm2bf` are Lua code that is expanded during preprocessing, and we will deal with those somewhat later. For now, we can use the `$(call("abc"))` macro, which performs something like a function call or a BASIC `GOSUB`. It creates a label, pushes it to the stack and jumps to the argument given (in this case we told it to jump to `@abc`); the canonical way to return to where we jumped from is to use `ret`, which takes the top element of the stack (which in this case is the location from which we called the subroutine) and jumps to it unconditionally. A small example of this:
    stk 3
    org 0
    out .x
    out 10
    ; we enter the subroutine from here
    $(call("abc"))
    ; and we return from it to here
    out .x
    out 10
    end
    @abc
    out .H
    out .e
    out .l
    out .l
    out .o
    out 10
    ret
This should produce three lines of output for you, written here on one line: `x\nHello\nx\n`. Because of the way it stores the return address, you shouldn't use `$(call(""))` if you haven't set up a stack: every call does a push! You will overwrite your data if you don't keep track of your calls.
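The push-then-jump dance is worth seeing in slow motion. The Python sketch below models the example program as a little state machine; the label names `main`, `after` and `halt` are made up for the illustration (the real macro generates its own return label), so treat this as a picture of the idea and not of the macro's actual expansion.

```python
def run_call_example():
    """Model of the call/ret program above; label names are illustrative."""
    stack, out = [], []
    pc = "main"
    while pc != "halt":
        if pc == "main":
            out.append("x\n")        # out .x / out 10
            stack.append("after")    # the call macro pushes a return label...
            pc = "abc"               # ...and jumps to @abc
        elif pc == "abc":
            out.append("Hello\n")    # body of the subroutine
            pc = stack.pop()         # ret: pop a label and jump to it
        elif pc == "after":
            out.append("x\n")        # code following the call
            pc = "halt"              # end
    return "".join(out), stack


text, leftover = run_call_example()
print(repr(text))   # 'x\nHello\nx\n'
print(leftover)     # []
```

The stack is empty again once the subroutine has returned, which is exactly why every call needs a free stack slot while it is in flight.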
With these labels and conditional and unconditional jumps, you can implement pretty much anything. They are the building blocks you can use to make loops and implement Brainfuck in itself for all that it's worth. I personally like keeping my "subroutines" below the `end` instruction in the source, in true BASIC fashion: that way you can make sure you'll never accidentally wander into a subroutine from the main flow of the program, and it keeps the different parts of the same source file separated and more easily manageable.
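As a parting sketch, here is how a loop falls out of nothing more than an anchor and a `jnz`. The Python below is a tiny interpreter for a made-up subset of instructions (a label marker, `out`, `dec`, `jnz` and `end`), written only to show the control flow; the tuple encoding and the `run` function are my own invention, not anything from the `asm2bf` toolchain.

```python
def run(program, regs):
    """Tiny interpreter for a made-up subset: label, out, dec, jnz, end."""
    labels = {ins[1]: i for i, ins in enumerate(program) if ins[0] == "label"}
    out, pc = [], 0
    while pc < len(program):
        op, *args = program[pc]
        pc += 1
        if op == "out":
            out.append(regs[args[0]])
        elif op == "dec":
            regs[args[0]] -= 1
        elif op == "jnz" and regs[args[0]] != 0:
            pc = labels[args[1]]     # jump only when the register is nonzero
        elif op == "end":
            break
    return out


countdown = [
    ("label", "loop"),      # @loop
    ("out", "r1"),          # out r1
    ("dec", "r1"),          # dec r1
    ("jnz", "r1", "loop"),  # jnz r1, %loop
    ("end",),               # end
]
print(run(countdown, {"r1": 3}))  # [3, 2, 1]
```

The loop body runs until the register hits zero, at which point `jnz` falls through to `end`.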