Asm2bf

From Esolang
Jump to navigation Jump to search

asm2bf is the only true brainfuck assembler toolkit. It produces quite small brainfuck code if you're able to optimize the assembly. With C preprocessor wired up, named labels and all the features you'd expect from a normal assembly, it's ready for your use.

The standard library is currently being constructed and you may contribute to make it even better ;). Github repository.

This article will go through most commonly used tools in the toolkit.

bconv

bconv is a program to convert brainfuck into equivalent 32-bit code. The conversion schema has been adapted from Brainfuck bitwidth conversions page on the wiki.

You can write a program for 16-bit celled interpreter, then pass it thru bconv, and the range of integers available will be doubled! This way, the program will run around three times slower, but on non-standard minimalistic 8-bit interpreters you will be able to create bfasm-compatible programs (as bfasm requires 16bit cells), and on 16-bit or 32-bit programs will allow respectively store 32-bit and 64-bit numbers.

BCONV uses premade snippets to generate the code required. You can study them yourself, but the very important thing that you might need to note is the fact that the program will require six times up to eight times as much of memory as it used before, and it might run three to nine times slower than before.

The I/O might be biased, so my general advice is avoiding to print numbers larger than a single cell (you could theoretically assume that you can print 16-bit numbers, as the specification says so, but it's recommended to stick to printing out just ASCII in any case as a rule of thumb).

NOTE: bconv does not work with RLE variant of brainfuck. Please decompress the code beforehand.

bfasm

bfasm is the core of the asm2bf toolkit. It uses a very low-level representation of certain code made strictly for asm2bf.

Register model

There are four general purpose registers – r1, r2, r3, and r4. Size of these registers depends on interpreter or compiler being used, but they are by definition at least 16 bits wide. There are no flags and there is no accessible instruction pointer, stack pointer nor base pointer. The assembler is case sensitive, so the registers should always be written with a lowercase "r". You can't change the segment, origin, or stack boundaries dynamically.

Memory

bfasm allows for a stack (of definable size) and memory. Memory can be accessed indirectly for pointer use. You don't have to set up a stack using stk command if you won't use memory (rcl/sto/db_/txt). Operations on memory follow a very simple pattern.

When setting an origin, it will define a point where the subsequent txt and db_ operations will place their data. When setting a segment though (using seg mnemonic), all the operations on memory will be offseted towards given whence and offset. Clever use of segment allows to read temporary registers, instruction pointer or deduce the stack pointer.

Stack doesn't have an explicit pointer, because it is not needed. Only available operations are psh, pop and srv that will (in order) push data onto the stack, pop data from stack, or reverse two topmost elements on the stack.

Comments

Comments begin with the semicolon character. They can be placed either at beginning of a line or after an instruction. Comments continue until the end of the line. Additionally, newline characters and spaces are ignored (but not in string literals). If bfasm is combined with other tools in the toolkit, the following may serve as a block comment:

#if 0
A very cool comment!
#endif

Operand types

Operand types supported by bfasm:

  • the registers
  • immediate values – numbers, characters, strings.
  • label references
Opcode Example
"string" "Hello, world!"
register r3
immediate 1453
character constant .K
Opcode combination Example
register, register r2, r4
register, immediate r2, 8
register, char constant r4, .X

Strings are used just with the "txt" instruction. There are no escape sequences, use "db_" when needed. Characters are converted to the appropriate ASCII immediate value and db_'d into memory. Immediate values can be any size (it depends) and are automatically substituted as if they were a register value. Therefore, immediate values can be used in place of a register. There can be only one immediate value per instruction. Operands are evaluated separately from instructions, so it is possible to use the wrong number of operands for an instruction with no error reported. However, doing this will lead to a crash when the assembled program is run.

Instructions are case-sensitive and must be given in lowercase letters. The returned values of a boolean operation are 0 (false) or 1 (true). This applies to: `and, eq_, ge_, gt_, le_, lt_, ne_, or_`. Only one instruction is allowed per line. Note that lbl (label) is an instruction, but it's not recommended (I would say forbidden, but using it is legal, yet you better tighten your seatbelts before doing that) to use when sticking with asm2bf toolkit.

Instruction set

Instruction Description Operation performed
add regA,regB Add regA = regA + regB
add reg,immed Add reg = reg + immed
and regA,regB AND gate regA = regA AND regB
asr regA Arithmetic shift right regA = regA >> 1
asl regA Arithmetic shift left regA = regA << 1
clr reg Clear reg = 0
db_ immed Put constant puts immed at the next memory position (see org)
dec reg Decrement reg = reg - 1
div regA,regB Divide regA = regA / regB
div regA,immed Divide regA = regA / immed
end End (Exit) program jmp 0 (exit)
eq_ regA,regB Equal to? regA = regA == regB
eq_ regA,immed Equal to? regA = regA == immed
ge_ regA,regB Greater than or equal? regA = regA >= regB
ge_ regA,immed Greater than or equal? regA = regA >= immed
gt_ regA,regB Greater than? regA = regA > regB
gt_ regA,immed Greater than? regA = regA > immed
in_ reg Input reg = input from STDIN (EOF = 0)
inc reg Increment reg = reg + 1
jmp reg Jump jumps to a line label
jmp immed Jump jumps to a line label
jnz regA,regB Jump if not zero jmp's to regB if regA != 0
jnz regA,immed Jump if not zero jmp's to immed if regA != 0
jz_ regA,regB Jump if zero jmp's to regB if regA == 0
jz_ regA,immed Jump if zero jmp's to immed if regA == 0
lbl immed Label defines a line label. do NOT use with asm2bf toolkit
le_ regA,regB Less than or equal to? regA = regA <= regB
le_ regA,immed Less than or equal to? regA = regA <= immed
log regA Change to logic value regA = regA >= 1
lt_ regA,regB Less than? regA = regA < regB
lt_ regA,immed Less than? regA = regA < immed
mod regA,regB Modulus regA = regA % regB
mod regA,immed Modulus regA = regA % immed
mov regA,regB Move (copy) regA = regB
mov reg,immed Move (set) reg = immed
mul regA,regB Multiply regA = regA * regB
mul regA,immed Multiply regA = regA * immed
ne_ regA,regB Not equal to? regA = regA != regB
ne_ regA,immed Not equal to? regA = regA != immed
neg reg Negate reg = -reg
not reg Logical NOT reg = !reg
or_ regA,regB Boolean OR regA = regA OR regB
or_ regA,immed Boolean OR regA = regA OR immed
org immed Origin sets memory position of the next db_ or txt output (default is 0, needs to be set again after stk is used)
out reg Output output to STDOUT = reg
out immed Output output to STDOUT = immed
pop reg Pop pops reg from the stack
pow regA, regB Power regA = regA ^ regB
pow regA, immed Power regA = regA ^ immed
psh reg Push pushes reg onto the stack
psh immed Push push immediate onto the stack
raw char Raw output writes char directly as code
rcl regA,regB Recall (from memory) regA = *regB (mov regA,[regB])
rcl regA,immed Recall (from memory) regA = *immed (mov regA,[immed])
ret Return pops label and jmp's to it
seg Set segment Set segment used by all peek/poke functions.
srv Swap top stack elems Swap top stack elements with each other. Requires two elements on stack minimum.
stk immed Stack size Set the maximum number of items on the stack (default is 0)
sto regA,regB Store (in memory) *regA = regB (mov [regA],regB)
sto regA,immed Store (in memory) *regA = immed (mov [regA],immed)
sub regA,regB Subtract regA = regA - regB
sub reg,immed Subtract reg = reg - immed
swp regA,regB Swap regA = regB, regB = regA
txt string Text converts string to immediate values and performs db_'s

Tips and tricks

There is no call instruction. A pure-bfasm call can be generated as follows:

psh 23          ; value of return label
jmp 1000        ; label of subroutine
lbl 23          ; return position
                ; some code
lbl 1000        ; subroutine
                ; some code
ret             ; return

Labels are numbered from 1. Jumping to label 0 is the same as invoking end.

ALL instructions have length of 3 and whenever it's needed, an underscore (_) is added at the end.

Debugging

Use bfintd, a special version of bfi. To place a breakpoint, use raw .* Then it's convenient to place in_ r1 (or any other register) to pause execution while the bug is investigated.

Run-length encoding

Run-length encoding is a feature of asm2bf v1.1.2. It allows to compress your program by merging sequences of continuing characters. You can enable this by issuing the following command before building bfasm:

setenv OPTIONS -DRLE

You can run RLE-encoded programs using bfi-rle. You can also change the RLE variant by defining RLE_POSTFIX.

Project examples

In this section I'll try to cover example programs in pure bfasm – no label preprocessor, no preprocessor at all, no fancy asm2bf stuff.

Note: All longer programs (including the following ones) are licensed under terms of MIT license. I think they are a valuable knowledge source, so I'm sharing them here.

Project example: A JSON formatter

I'll try to present and explain a JSON formatter written in pure bfasm. Note this program doesn't utilize the preprocessor, labels, memory allocator and all that sweet stuff coming with the distribution. It's probably also the most portable one, as it will run even on pre-release versions of asm2bf toolkit. You can view entire source code and all good stuff omitted in this article in the Github repository.


; JSON Formatter in Brainfuck webservice.
; Copyright (C) by Krzysztof Szewczyk, 2019.
; Have fun!            ~~ Palaiologos/MENACE

; This file contains the separate urldecode utility to decode an url.
; You can try fiddling with it on your own.

; Remove blocks between
; -----------
; and
; -----------
; If you don't need parameter amount checking code.
; (you probably don't)

stk 0
org 0

mov r4, .0

lbl 1
	in_ r1
	jz_ r1, 0
	mov r2, r1
	eq_ r2, .%
	jnz r2, 2
	; -----------
	mov r2, r1
	eq_ r2, .&
	jnz r2, 0
	; -----------
	mov r2, r1
	eq_ r2, .+
	jnz r2, 7
	out r1
	jmp 1
lbl 2
	in_ r1
	in_ r2
	mov r3, r1
	ge_ r3, .A
	jnz r3, 3
	jmp 4
lbl 3
	sub r1, 7
lbl 4
	sub r1, r4
	mov r3, r2
	ge_ r3, .A
	jnz r3, 5
	jmp 6
lbl 5
	sub r2, 7
lbl 6
	sub r2, r4
	mul r1, 16
	add r1, r2
	out r1
	jmp 1
lbl 7
	out 32
	jmp 1

; JSON Formatter in Brainfuck webservice.
; Copyright (C) by Krzysztof Szewczyk, 2019.
; Have fun!            ~~ Palaiologos/MENACE

; This file contains the formatter itself.
; The code may come from the first POST parameter of any name.
; Please note that this program doesn't take the parameter per se,
; the data incoming has to be piped into this tool with urldecoder
; applied beforehand.

stk 10
org 10

txt "Content-Type: application/json"
db_ 10
db_ 0

mov r1, 10

lbl 15
	rcl r2, r1
	jz_ r2, 16
	out r2
	inc r1
	jmp 15
lbl 16
	out 10

clr r1
clr r2

lbl 17
	in_ r1
	eq_ r1, .=
	jz_ r1, 17

psh 0
lbl 1
	in_ r1
	jz_ r1, 0
	mov r3, r1
	eq_ r3, 13
	jnz r3, 1
	mov r3, r1
	eq_ r3, 10
	jnz r3, 1
	mov r3, r1
	eq_ r3, 58
	jnz r3, 6
	mov r3, r1
	eq_ r3, 32
	jnz r3, 4
	mov r3, r1
	eq_ r3, 34
	jnz r3, 5
	mov r3, r1
	eq_ r3, .\
	jnz r3, 7
	mov r3, r1
	eq_ r3, .,
	jnz r3, 8
	mov r3, r1
	eq_ r3, .]
	psh r4
	mov r4, r1
	eq_ r4, .}
	or_ r3, r4
	pop r4
	jnz r3, 9
	mov r3, r1
	eq_ r3, .{
	psh r4
	mov r4, r1
	eq_ r4, .[
	or_ r3, r4
	pop r4
	jnz r3, 14
	pop r3
	jz_ r3, 10
	clr r3
lbl 10
	psh r3
	out r1
	jmp 1
lbl 2
	clr r3
lbl 11
	psh r3
	ge_ r3, r2
	jnz r3, 12
	out 9
	pop r3
	inc r3
	jmp 11
lbl 12
	pop r3
	ret
lbl 6
	out r1
	jnz r4, 1
	out 32
	jmp 1
lbl 4
	jz_ r4, 1
	jmp 3
lbl 5
	pop r3
	psh r3
	jnz r3, 3
	not r4
lbl 3
	out r1
	jmp 1
lbl 7
	pop r3
	not r3
	psh r3
	out r1
	jmp 1
lbl 8
	out r1
	jnz r4, 1
	out 10
	psh 1
	jmp 2
lbl 9
	jnz r4, 3
	jz_ r2, 13
	dec r2
lbl 13
	out 10
	psh 3
	jmp 2
lbl 14
	out r1
	jnz r4, 1
	inc r2
	out 10
	psh 1
	jmp 2

Now, let's glue these together using a simple bash script:

#!/bin/sh
asmbf/bfi urldecode.b | asmbf/bfi jsonformatter.b

And it's done! You can try fiddling with it and checking it's working. Example data for you to test:

{"test": "test : tests \\test \"test",["simple","as","that"]}

Code example: Subleq emulator

Subleq emulator isn't that hard a task. The following really nicely commented code has been written for bfasm per se, it doesn't involve any asm2bf toolkit action as well. It seems a little bit bugged though. It allows you to either input the program or program it into the emulator making it a ROM-executing machine.

; Subleq emulator in asm2bfv1b
; Copyright (C) by Krzysztof Szewczyk
; Licensed under MIT license.
; Have fun!   ~~ Palaiologos/MENACE

; Note it doesn't even use stack!

org 0
stk 0

; db code if you want to create ROM based machine

; Application entry point (1).
lbl 1
    ; Read cell 0 from memory
    mov r2, 0
    rcl r1, r2

    ; If the cell is equal to zero, jump to 2 (read)
    mov r2, 2
    jz_ r1, r2

    ; Else, jump to 4 (execute)
    jmp 4
lbl 2
    ; Ensure that r4 is 1.
    inc r4, 1

    ; Trash all the registers here.
    in_ r1
    mov r3, r1

    ; Nothing to read, jump to 5 (end)
    mov r2, 5
    jz_ r1, r2
lbl 3
    ; Start reading. Current index: r1
    ; First, check is r1 equal to 0. If so, go to 4 (execute).
    mov r2, 4
    jz_ r1, r2
    
    ; Note that reading is done in inverse order.
    in_ r2
    sto r2, r1

    ; Decrement r1 and loop
    dec r1
    jmp 3
lbl 4
    ; Execution loop.
    ; If r4 is equal to 1, we need to reverse the memory.
    ; R3 is keeping copy of ending index.
    ; R2 contains some crap now.
    ; R1 is equal to zero.

    inc r1
    eq_ r1, r4
    
    ; Now, r1 is 0 if there is nothing to change.
    mov r2, 6
    jz_ r1, r2
    clr r4
    jmp 7
lbl 5
    end
lbl 6
    ; Normal execution of code.
    ; Let's assume all registers are trashed now.

    ; while(ip <= 0)
    clr r4
    mov r3, 8
    le_ r4, r1
    jnz r4, r3
    jmp 5
lbl 7
    ; Reverse memory from 0 to r3 and jump to 6.
    
    ; r4 = current start index
    ; r3 = current end index
    ; r2 and r1 are spare!

    rcl r1, r4
    rcl r2, r3
    swp r1, r2
    sto r1, r4
    sto r2, r3

    ; So now two values are swapped.
    ; First, check difference between r3 and r4.
    ; Whenever it's 1, swapping finished.
    ; If it's bigger, increment r4 and decrement r3

    mov r4, r2
    mov r3, r1
    sub r1, r2
    mov r2, 1

    ; Now r1 is 1 or bigger
    eq_ r1, r2

    ; If r1 is 1, done reversing.
    mov r2, 6
    jnz r1, r2
    jmp 7

lbl 8
    ; Main execution loop
    ; r1 - IP

    ; r2 = a
    rcl r2, r1
    inc r1

    ; r3 = -1
    clr r3
    dec r3

    ; r4 = lbl 9
    mov r4, 9

    ; A = -1?
    eq_ r3, r2
    jnz r3, r4

    ; Let's check for B
    rcl r3, r1
    inc r1
    clr r4
    dec r4
    eq_ r4, r3
    mov r3, 10
    jnz r4, r3

    ; We are here for C.
    
    ; Point context A
    dec r1
    dec r1
    rcl r2, r1
    inc r1
    rcl r3, r1
    sub r3, r2

    ; Only r3 is needed now.
    clr r4
    mov r3, 11
    le_ r4, r3
    
    ; Critical point, increment r1 (don't forget about it, we need to point C!)
    inc r1

    ; Continue the jump
    jnz r4, r3

    ; No branches, loop!
    jmp 6

lbl 9
    ; load context B
    rcl r2, r1
    inc r1

    ; Print it out
    out r2

    ; Increment (to point context C)
    inc r1
    jmp 8

lbl 10
    ; point context A
    dec r1

    ; read and load to context A
    in_ r2
    sto r1, r2

    ; point context B
    inc r1

    ; Increment (to point context C)
    inc r1
    jmp 8

lbl 11
    ; r1 = [r1], when r1 points C.
    rcl r4, r1
    mov r1, r4
    jmp 8

bfi-rle

bfi-rle is a tool executing RLE-encoded brainfuck. It's supporting either postfix or prefix notation, but it has to be chosen upfront when compiling the tool. bfi-rle itself is a simple tool which extracts the brainfuck to temp directory and runs bfi over it. It doesn't provide any speed advantage (I'd say it does it the other way). It can be implemented in a better way, but a state-of-the-art version is sufficient. Although if you, dear reader, would like to improve the interpreter, please go ahead and do so.

bfi

Generic, non-optimizing (not counting clear loops) brainfuck interpreter, capable of executing asm2bf programs compiliant with the entire toolchain specification draft. General properties of the interpreter:

  • Possibly infinite memory cell amount support
  • Loading code of arbitrary size
  • Returning 0 on EOF
  • Written in portable C89
  • Around 120 SLOC

bfintd

Basically the same as bfi, but it's bundled with a debugging instruction onboard. * will dump register contents, stack contents, memory contents and all that useful information one may need to debug asm2bf programs. Please note it's not compliant with the toolchain per se (because stripping is done on the resulting brainfuck, so breakpoint locations are long gone by then). The newest v1.1.5 version though, will keep the breakpoints so you no longer need to fiddle with the toolchain.

bfmake

bfmake will preprocess and assemble a given program. It's intended to be used inside files as a shebang.

First step in building is passing the program thru bfpp. Then, everything is assembled and temporary files are mostly removed.

There isn't very much about bfmake, please refer to examples/example-*.asm in the repository to check out its practical use.

#!/bin/bfmake

stk 2
org 0

; Build brainfuck file by executing me!

#include <example-lib.asm>

psh 2
jmp %print_zero
lbl 2
	end

To build the program, simply execute it and the brainfuck file will appear by. Please note the program won't work as is, you need a complete distribution and you need to run the existing program inside examples, because preprocessor will complain about missing include files.

bfpp

bfpp is an asm2bf preprocessor based on gcc -E.

It behaves just like normal, but the first line is being cut to get rid of shebang that will be eventually parsed by gcc and make it bail out. Also, the includes are tweaked so the lib/ and . directories are added to the include path.

All the lines containing hash (#) from output are stripped later by the following sed program:

sed '/^#/ d' < "$1.i" > "$1.p"

The shebang is removed using the following snippet:

cat "$1" | sed -e "1!b" -e '/#/d'

Stray files might be left over if the program is used incorrectly. Please refer to examples/ example-*.asm in the repository.

bfpp is usually glued together with bfmake for usage simplicity.

bfpp allows you to place newlines inside macros. For example:

#define very_fun mov r1, .A NEWLINE out r1
very_fun

will result in the following output:

mov r1, .A
out r1

derle

derle decodes brainfuck code produced by bfasm in RLE mode. It's crucial in implementing the interpreter for rle-compressed brainfuck, but it might be useful as a standalone tool. Implementation is pretty simple and straightforward.

#!/usr/bin/perl -p

my ($style) = @ARGV;

s/\d+(.)/$1x$&/ge if $style eq "postfix" or $style == undef;
s/(.)\d+/$1x$&/ge if $style eq "prefix";

strip

strip is just a simple utility to shorten the code to a certain extent. It's recommended to run it on the code until there is no observable improvement.

It's a simple regex-utilizing program to remove no-ops in brainfuck and reduce the comments when possible. As every regexp is applied just once, it's recommended to run this program until no observable improvement can be noted.

The regexes are pretty straightforward to follow.

NOTE: strip will corrupt RLE-encoded brainfuck programs. Please decompress them before using the tool.

labels

Very useful program. It allows labels to be inserted into asm2bf programs. The script is pretty trivial.

#!/usr/bin/perl

# Released to the public domain by Krzysztof Szewczyk.

# asm2bf label preprocessor.
# C'mon, who doesn't love regexes?
# Have fun! ~~ Palaiologos/MENACE

$n=0;%o=();$_=do{local$/;<>};s/^[ \t]*@([A-Za-z_]+[A-Za-z0-9_]*).*$/$o{$1}=++$n;'lbl '.$n."\n";/gem;
s/(%([A-Za-z_]+[A-Za-z0-9_]*)|"[^"\n]*%([A-Za-z_]+[A-Za-z0-9_]*))/substr($1,0,1)eq'"'?$1:%o{substr$1,1}/ge;print;

You refer to a label using percent and its name, and you declare a label using at and the newly created label's name. For instance, let's review the stages of the code being built:

#define character .A
@loop
    out character
    jmp %loop
@loop
    out .A
    jmp %loop
lbl 1
    out .A
    jmp 1
+>+[>>>+<<+<<[>>->+<<<-]>>>[<<<+>>>-]<[->+<<[>>>-<<+<-]>[<+>-]>>[<->[-]]<[<<<+>>>-]<]>>[-]<<<<[>>+>+<<<-]>>[<<+>>-]>[[-]>>>>>>+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++.[-]+<<<<<<<<<[-]>[-]>>>>>>>>[<<<<<<<<+>+>>>>>>>-]<<<<<<<[>>>>>>>+<<<<<<<-]>>>>>>>[-]<<<<<<]<<<[>>+>+<<<-]>>[<<+>>-]>[[-]<<<[-]>[-]>>]<<]

Installation

All C source files can be linked independently, therefore installing and building shouldn't be a problem. All programs don't rely on other libraries than the Standard C Library. You can use Make and build targets `all' and `install'. Please note the best and the most useful command is make all install - it builds everything and puts it in /bin. If you want to put it somewhere else, just use make all and put the bin folder somewhere in PATH. It's very important for proper functioning of the toolkit. Also, you may want to refer to #!/anywhere/you/placed/it/bfmake instead of #!/bin/bfmake.

Gisa

Gisa is a programming language similiar to lisp and forth at the same time. It's using switchable postfix / prefix notation. It's able to output poorly-performant asm2bf code for quick prototyping of programs. There is no official documentation on it, but you may try using it on your own and experimenting with the syntax.

This example code will display ASCII(33) - on your system it's possibly an exclamantion mark.

(
	(outb [0])
	(= 0 (+ 1 [0]))
	(= 0 (* 2 [0]))
	(= 0 (* {2 2 +} (+ 2 2)))
)

It's demonstarating the use of outb and three different bracket types. The ( bracket will evaluate all statements inside them, but backwards. Curly brackets will do the same, but the other way round (postfix notation).

Gisa doesn't use new features of asm2bf, so you may need to implement them using inline assembly. It doesnt have the label preprocessor as a dependency, so the code generated will have fixed labels. Using labels inside inline assembly is discouraged, because it may break existing programs really eaisly.

(
	(exit)
	(outb ((6) square invoke))

	({defun square (
		* dup
	)} exports)

	((
		@"end"
	) exit defmacro)
	
	({
		@"pop r2"
		@"psh r2"
		@"psh r2"
	} dup defmacro)
)

This example is utilizing the macro defining possibilities, inline assembly, function declaration and invocation. This program will define dup macro to duplicate two elements on the stack. The next macro defined will exit the program using inline assembly and the end/jmp 0 feature. Square function will square the parameter on the stack.

Another example program, this one demonstrates the looping capabilities:

/* This program, when ran, will display following text: 0123456789 */

{
	(= 0 48)
	{loop {
		(outb [0])
		(= 0 (+ [0] 1))
	} (
		neq [0] (+ 48 10)
	) while}
}

It's very simple to comprehend. It's utilizing neq that is basically a wrapper over ne_ instruction. The loop though does have inversed order of parameters. It has the body at the top, increment routine in the middle, and check at the very end. You can try reading the code. It lies in separate Github repository.