asm2bf

From Esolang
Jump to navigation Jump to search
asm2bf
Designed by Krzysztof Szewczyk
Appeared in 2017
Memory system register,cell,stack
Computational class Turing-complete
Reference implementation asmbf
Influenced by Brainfuck
File extension(s) .asm, .s


asm2bf is the only true brainfuck assembler toolkit. It produces quite small brainfuck code if you're able to optimize the assembly. With a Lua-based preprocessor wired up, named labels and all the features you'd expect from a normal assembly, it's ready for your use.

You can join the Discord server discussing asm2bf.

The article has been written in a quite technical manner, so if you spot anything unclear please mention it on the asm2bf talk page, or open an issue in the github repo, or ask me on the discord server, on #esoteric, or state your thoughts on my user page.

The toolkit description

This part of article will go through most commonly used tools in the toolkit.

bconv

bconv is a program to convert brainfuck into equivalent 32-bit code. The conversion schema has been adapted from Brainfuck bitwidth conversions page on the wiki.

You can write a program for 16-bit celled interpreter, then pass it thru bconv, and the range of integers available will be doubled! This way, the program will run around three times slower, but on non-standard minimalistic 8-bit interpreters you will be able to create bfasm-compatible programs (as bfasm requires 16bit cells), and on 16-bit or 32-bit programs will allow respectively store 32-bit and 64-bit numbers.

BCONV uses premade snippets to generate the code required. You can study them yourself, but the very important thing that you might need to note is the fact that the program will require six times up to eight times as much of memory as it used before, and it might run three to nine times slower than before.

The I/O might be biased, so my general advice is avoiding to print numbers larger than a single cell (you could theoretically assume that you can print 16-bit numbers, as the specification says so, but it's recommended to stick to printing out just ASCII in any case as a rule of thumb).

NOTE: bconv does not work with RLE variant of brainfuck. Please decompress the code beforehand.

bfasm

bfasm is the core of the asm2bf toolkit. It uses a very low-level representation of certain code made strictly for asm2bf.

Register model

There are six general purpose registers – r1, r2, r3, r4, r5 and r6. Size of these registers depends on interpreter or compiler being used, but they are by definition at least 16 bits wide. There are no flags and there is no accessible instruction pointer, stack pointer nor base pointer. The assembler is case sensitive, so the registers should always be written with a lowercase "r". You can't change the segment, origin, or stack boundaries dynamically.

There are two additional registers, r5 and r6. They are mostly made for memory and stack I/O (due to close proximity of them in memory to the stack segment). They are 16-bits wide aswell.

Memory

bfasm allows for a stack (of definable size) and memory. Memory can be accessed indirectly for pointer use. You don't have to set up a stack using stk command if you won't use memory (rcl/sto/db_/txt). Operations on memory follow a very simple pattern.

When setting an origin, it will define a point where the subsequent txt and db_ operations will place their data. When setting a segment though (using seg mnemonic), all the operations on memory will be offseted towards given whence and offset. Clever use of segment allows to read temporary registers, instruction pointer or deduce the stack pointer.

Stack doesn't have an explicit pointer, because it is not needed. Only available operations are psh, pop and srv that will (in order) push data onto the stack, pop data from stack, or reverse two topmost elements on the stack.

Comments

Comments begin with the semicolon character. They can be placed either at beginning of a line or after an instruction. Comments continue until the end of the line. Additionally, newline characters and spaces are ignored (but not in string literals).

Operand types

Operand types supported by bfasm:

  • the registers
  • immediate values – numbers, characters, strings.
  • label references
Opcode Example
"string" "Hello, world!"
register r3
immediate 1453
character constant .K
Opcode combination Example
register, register r2, r4
register, immediate r2, 8
register, char constant r6, .X

Strings are used just with the "txt" instruction. There are no escape sequences, use "db_" when needed. Characters are converted to the appropriate ASCII immediate value and db_'d into memory. Immediate values can be any size (it depends) and are automatically substituted as if they were a register value. Therefore, immediate values can be used in place of a register. There can be only one immediate value per instruction. Operands are evaluated separately from instructions, so it is possible to use the wrong number of operands for an instruction with no error reported. However, doing this will lead to a crash when the assembled program is run.

Instructions are case-sensitive and must be given in lowercase letters. The returned values of a boolean operation are 0 (false) or 1 (true). This applies to: `and, eq_, ge_, gt_, le_, lt_, ne_, or_`. Only one instruction is allowed per line. Note that lbl (label) is an instruction, but it's not recommended (I would say forbidden, but using it is legal, yet you better tighten your seatbelts before doing that) to use when sticking with asm2bf toolkit.

Instruction set

Instruction Description Operation performed
add regA,regB Add regA = regA + regB
add reg,immed Add reg = reg + immed
and regA,regB AND gate regA = regA AND regB
amp regA,regB Dereference and add *regA += regB
amp regA,immed Dereference and add *regA += immed
asr regA Arithmetic shift right regA = regA >> 1
asl regA Arithmetic shift left regA = regA << 1
clr reg Clear reg = 0
db immed Put constant puts immed at the next memory position (see org)
dec reg Decrement reg = reg - 1
div regA,regB Divide regA = regA / regB
div regA,immed Divide regA = regA / immed
end End (Exit) program jmp 0 (exit)
eq regA,regB Equal to? regA = regA == regB
eq regA,immed Equal to? regA = regA == immed
ge regA,regB Greater than or equal? regA = regA >= regB
ge regA,immed Greater than or equal? regA = regA >= immed
gt regA,regB Greater than? regA = regA > regB
gt regA,immed Greater than? regA = regA > immed
in reg Input reg = input from STDIN (EOF = 0)
inc reg Increment reg = reg + 1
jmp reg Jump jumps to a line label
jmp immed Jump jumps to a line label
jnz regA,regB Jump if not zero jmp's to regB if regA != 0
jnz regA,immed Jump if not zero jmp's to immed if regA != 0
jz regA,regB Jump if zero jmp's to regB if regA == 0
jz regA,immed Jump if zero jmp's to immed if regA == 0
lbl immed Label defines a line label. do NOT use with asm2bf toolkit
le regA,regB Less than or equal to? regA = regA <= regB
le regA,immed Less than or equal to? regA = regA <= immed
log regA Change to logic value regA = regA >= 1
lt regA,regB Less than? regA = regA < regB
lt regA,immed Less than? regA = regA < immed
mod regA,regB Modulus regA = regA % regB
mod regA,immed Modulus regA = regA % immed
mov regA,regB Move (copy) regA = regB
mov reg,immed Move (set) reg = immed
mul regA,regB Multiply regA = regA * regB
mul regA,immed Multiply regA = regA * immed
ne regA,regB Not equal to? regA = regA != regB
ne regA,immed Not equal to? regA = regA != immed
nav regA Navigate to regA A low level command. Sets the memory pointer at the register.
neg reg Negate reg = -reg
not reg Logical NOT reg = !reg
or regA,regB Boolean OR regA = regA OR regB
or regA,immed Boolean OR regA = regA OR immed
org immed Origin sets memory position of the next db_ or txt output (default is 0, needs to be set again after stk is used)
out reg Output output to STDOUT = reg
out immed Output output to STDOUT = immed
pop reg Pop pops reg from the stack
pow regA, regB Power regA = regA ^ regB
pow regA, immed Power regA = regA ^ immed
psh reg Push pushes reg onto the stack
psh immed Push push immediate onto the stack
raw char Raw output A low level command. Writes char directly as code
rcl regA,regB Recall (from memory) regA = *regB (mov regA,[regB])
rcl regA,immed Recall (from memory) regA = *immed (mov regA,[immed])
ret Return pops label and jmp's to it
seg Set segment Set segment used by all peek/poke functions.
shl regA, regB/immed Shift left Repeat asl regA regB/immed times.
shr regA, regB/immed Shift right Repeat asr regA regB/immed times.
smp regA,regB Dereference and sub *regA -= regB
smp regA,immed Dereference and sub *regA -= immed
srv Swap top stack elems Swap top stack elements with each other. Requires two elements on stack minimum.
stk immed Stack size Set the maximum number of items on the stack (default is 0)
sto regA,regB Store (in memory) *regA = regB (mov [regA],regB)
sto regA,immed Store (in memory) *regA = immed (mov [regA],immed)
sub regA,regB Subtract regA = regA - regB
sub reg,immed Subtract reg = reg - immed
swp regA,regB Swap regA = regB, regB = regA
txt string Text converts string to immediate values and performs db_'s

Conditional instructions

Note: This feature will work only with v1.3.5 and up.

Since then, asm2bf has an internal flag register (inaccessible via normal means). The flag register can be set by the following: ceq cne cle clt cge cgt (equivalent to eq, ne, le, lt, ge, gt). For instance:

mov r1, 23
ceq r1, 23
; will set the flag register to 1.
ceq r1, 0
; nope, didn't succeed, now clears the flag register.


New jump types are ready to use: cjn cjz, that depend on the state of the flag register. For example:

; Old way (trashes r3):
mov r3, r1
eq_ r3, 10
jnz r3, %target

; New way (no trashes):
ceq r1, 10
cjn %target

All of these conditional instructions execute, whenever the flag register is set to 1:

Conditional variant: cad csu cmu cdi cmd csl csr cpw cps cpo csw crv cmo crc cst cam csm
Classic variant:     add sub mul div mod asl asr pow psh pop swp srv mov rcl sto amp smp

Tips and tricks

There is no call instruction. A pure-bfasm call can be generated as follows:

psh 23          ; value of return label
jmp 1000        ; label of subroutine
lbl 23          ; return position
                ; some code
lbl 1000        ; subroutine
                ; some code
ret             ; return

Labels are numbered from 1. Jumping to label 0 is the same as invoking end.

ALL instructions have length of 3 and whenever it's needed, an underscore (_) is added at the end. Note: As of v1.3.4, you can omit the underscore.

When using bfasm alongside with the rest of asm2bf toolchain, avoid jumping to numeric labels and avoid defining them using lbl instruction.

Debugging

Use bfintd, a special version of bfi. To place a breakpoint, use raw .* Then it's convenient to place in_ r1 (or any other register) to pause execution while the bug is investigated.

Run-length encoding

Run-length encoding is a feature of asm2bf v1.1.2. It allows to compress your program by merging sequences of continuing characters. You can enable this by issuing the following command before building bfasm:

setenv OPTIONS -DRLE

You can run RLE-encoded programs using bfi-rle. You can also change the RLE variant by defining RLE_POSTFIX.

Extended instructions

Note: All these extended instructions can be accessed using xNN names (if you don't use constpp).

band - x00 - bit and
bor - x01 - bit or
bneg - x02 - bit negation
bxor - x03 - bit xor
cflip - x04 - flip the conditional execution flag register.

Project examples

In this section I'll try to cover example programs in pure bfasm – no label preprocessor, no preprocessor at all, no fancy asm2bf stuff.

Note 2: All these programs have been written before asm2bf hit it's prime. Nowadays, we don't use numerical labels (lbl n), you should resort to the bflabels instead. The same goes with comparisons (they can be replaced using conditional execution). Because the interest around asm2bf isn't quite as big as I'd like it to be, the code examples will sadly remain outdated.

Modern code example: Sierpinski triangles

This code example is quite up to date with current asm2bf features. It's utilizing bflabels and conditional instrutions.

@main
	clr r2
	@b
		psh r1
		mov r4, r2
	@c
		mov r5, r1
		mod r5, 2
		mov r6, r4
		mod r6, 2
		mul r5, r6
		cge r5, 1
		cmo r5, 1
		cjn %d
		asr r1
		asr r4
		jnz r1, %c
		jnz r4, %c
		clr r3
	@d
		cge r3, 1
		mov r3, .*
		cmo r3, 32
		out r3
		pop r1
		inc r2
		cge r2, 64
		cjz %b
		out 10
	inc r1
	cge r1, 64
	cjz %main

Project example: A JSON formatter

I'll try to present and explain a JSON formatter written in pure bfasm. Note this program doesn't utilize the preprocessor, labels, memory allocator and all that sweet stuff coming with the distribution. It's probably also the most portable one, as it will run even on pre-release versions of asm2bf toolkit. You can view entire source code and all good stuff omitted in this article in the Github repository.


; JSON Formatter in Brainfuck webservice.
; Copyright (C) by Krzysztof Szewczyk, 2019.
; Have fun!            ~~ Palaiologos/MENACE

; This file contains the separate urldecode utility to decode an url.
; You can try fiddling with it on your own.

; Remove blocks between
; -----------
; and
; -----------
; If you don't need parameter amount checking code.
; (you probably don't)

stk 0
org 0

mov r4, .0

lbl 1
	in r1
	jz r1, 0
	mov r2, r1
	eq r2, .%
	jnz r2, 2
	; -----------
	mov r2, r1
	eq r2, .&
	jnz r2, 0
	; -----------
	mov r2, r1
	eq r2, .+
	jnz r2, 7
	out r1
	jmp 1
lbl 2
	in r1
	in r2
	mov r3, r1
	ge r3, .A
	jnz r3, 3
	jmp 4
lbl 3
	sub r1, 7
lbl 4
	sub r1, r4
	mov r3, r2
	ge r3, .A
	jnz r3, 5
	jmp 6
lbl 5
	sub r2, 7
lbl 6
	sub r2, r4
	mul r1, 16
	add r1, r2
	out r1
	jmp 1
lbl 7
	out 32
	jmp 1

; JSON Formatter in Brainfuck webservice.
; Copyright (C) by Krzysztof Szewczyk, 2019.
; Have fun!            ~~ Palaiologos/MENACE

; This file contains the formatter itself.
; The code may come from the first POST parameter of any name.
; Please note that this program doesn't take the parameter per se,
; the data incoming has to be piped into this tool with urldecoder
; applied beforehand.

stk 10
org 10

txt "Content-Type: application/json"
db_ 10
db_ 0

mov r1, 10

lbl 15
	rcl r2, r1
	jz r2, 16
	out r2
	inc r1
	jmp 15
lbl 16
	out 10

clr r1
clr r2

lbl 17
	in r1
	eq r1, .=
	jz r1, 17

psh 0
lbl 1
	in r1
	jz r1, 0
	mov r3, r1
	eq r3, 13
	jnz r3, 1
	mov r3, r1
	eq r3, 10
	jnz r3, 1
	mov r3, r1
	eq r3, 58
	jnz r3, 6
	mov r3, r1
	eq r3, 32
	jnz r3, 4
	mov r3, r1
	eq r3, 34
	jnz r3, 5
	mov r3, r1
	eq r3, .\
	jnz r3, 7
	mov r3, r1
	eq r3, .,
	jnz r3, 8
	mov r3, r1
	eq r3, .]
	psh r4
	mov r4, r1
	eq r4, .}
	or r3, r4
	pop r4
	jnz r3, 9
	mov r3, r1
	eq r3, .{
	psh r4
	mov r4, r1
	eq r4, .[
	or r3, r4
	pop r4
	jnz r3, 14
	pop r3
	jz r3, 10
	clr r3
lbl 10
	psh r3
	out r1
	jmp 1
lbl 2
	clr r3
lbl 11
	psh r3
	ge r3, r2
	jnz r3, 12
	out 9
	pop r3
	inc r3
	jmp 11
lbl 12
	pop r3
	ret
lbl 6
	out r1
	jnz r4, 1
	out 32
	jmp 1
lbl 4
	jz r4, 1
	jmp 3
lbl 5
	pop r3
	psh r3
	jnz r3, 3
	not r4
lbl 3
	out r1
	jmp 1
lbl 7
	pop r3
	not r3
	psh r3
	out r1
	jmp 1
lbl 8
	out r1
	jnz r4, 1
	out 10
	psh 1
	jmp 2
lbl 9
	jnz r4, 3
	jz r2, 13
	dec r2
lbl 13
	out 10
	psh 3
	jmp 2
lbl 14
	out r1
	jnz r4, 1
	inc r2
	out 10
	psh 1
	jmp 2

Now, let's glue these together using a simple bash script:

#!/bin/sh
asmbf/bfi urldecode.b | asmbf/bfi jsonformatter.b

And it's done! You can try fiddling with it and checking it's working. Example data for you to test:

{"test": "test : tests \\test \"test",["simple","as","that"]}

Code example: Subleq emulator

Subleq emulator isn't that hard a task. The following really nicely commented code has been written for bfasm per se, therefore it doesn't involve any asm2bf toolkit action. It seems a little bit bugged though. It allows you to either input the program or program it into the emulator making it a ROM-executing machine.

; Subleq emulator in asm2bfv1b
; Copyright (C) by Krzysztof Szewczyk
; Licensed under MIT license.
; Have fun!   ~~ Palaiologos/MENACE

; Note it doesn't even use stack!

org 0
stk 0

; db code if you want to create ROM based machine

; Application entry point (1).
lbl 1
    ; Read cell 0 from memory
    mov r2, 0
    rcl r1, r2

    ; If the cell is equal to zero, jump to 2 (read)
    mov r2, 2
    jz r1, r2

    ; Else, jump to 4 (execute)
    jmp 4
lbl 2
    ; Ensure that r4 is 1.
    inc r4, 1

    ; Trash all the registers here.
    in r1
    mov r3, r1

    ; Nothing to read, jump to 5 (end)
    mov r2, 5
    jz r1, r2
lbl 3
    ; Start reading. Current index: r1
    ; First, check is r1 equal to 0. If so, go to 4 (execute).
    mov r2, 4
    jz r1, r2
    
    ; Note that reading is done in inverse order.
    in r2
    sto r2, r1

    ; Decrement r1 and loop
    dec r1
    jmp 3
lbl 4
    ; Execution loop.
    ; If r4 is equal to 1, we need to reverse the memory.
    ; R3 is keeping copy of ending index.
    ; R2 contains some crap now.
    ; R1 is equal to zero.

    inc r1
    eq r1, r4
    
    ; Now, r1 is 0 if there is nothing to change.
    mov r2, 6
    jz r1, r2
    clr r4
    jmp 7
lbl 5
    end
lbl 6
    ; Normal execution of code.
    ; Let's assume all registers are trashed now.

    ; while(ip <= 0)
    clr r4
    mov r3, 8
    le r4, r1
    jnz r4, r3
    jmp 5
lbl 7
    ; Reverse memory from 0 to r3 and jump to 6.
    
    ; r4 = current start index
    ; r3 = current end index
    ; r2 and r1 are spare!

    rcl r1, r4
    rcl r2, r3
    swp r1, r2
    sto r1, r4
    sto r2, r3

    ; So now two values are swapped.
    ; First, check difference between r3 and r4.
    ; Whenever it's 1, swapping finished.
    ; If it's bigger, increment r4 and decrement r3

    mov r4, r2
    mov r3, r1
    sub r1, r2
    mov r2, 1

    ; Now r1 is 1 or bigger
    eq r1, r2

    ; If r1 is 1, done reversing.
    mov r2, 6
    jnz r1, r2
    jmp 7

lbl 8
    ; Main execution loop
    ; r1 - IP

    ; r2 = a
    rcl r2, r1
    inc r1

    ; r3 = -1
    clr r3
    dec r3

    ; r4 = lbl 9
    mov r4, 9

    ; A = -1?
    eq r3, r2
    jnz r3, r4

    ; Let's check for B
    rcl r3, r1
    inc r1
    clr r4
    dec r4
    eq r4, r3
    mov r3, 10
    jnz r4, r3

    ; We are here for C.
    
    ; Point context A
    dec r1
    dec r1
    rcl r2, r1
    inc r1
    rcl r3, r1
    sub r3, r2

    ; Only r3 is needed now.
    clr r4
    mov r3, 11
    le r4, r3
    
    ; Critical point, increment r1 (don't forget about it, we need to point C!)
    inc r1

    ; Continue the jump
    jnz r4, r3

    ; No branches, loop!
    jmp 6

lbl 9
    ; load context B
    rcl r2, r1
    inc r1

    ; Print it out
    out r2

    ; Increment (to point context C)
    inc r1
    jmp 8

lbl 10
    ; point context A
    dec r1

    ; read and load to context A
    in r2
    sto r1, r2

    ; point context B
    inc r1

    ; Increment (to point context C)
    inc r1
    jmp 8

lbl 11
    ; r1 = [r1], when r1 points C.
    rcl r4, r1
    mov r1, r4
    jmp 8

bfi-rle

bfi-rle is a tool executing RLE-encoded brainfuck. It's supporting either postfix or prefix notation, but it has to be chosen upfront when compiling the tool. bfi-rle itself is a simple tool which extracts the brainfuck to temp directory and runs bfi over it. It doesn't provide any speed advantage (I'd say it does it the other way). It can be implemented in a better way, but a state-of-the-art version is sufficient. Although if you, dear reader, would like to improve the interpreter, please go ahead and do so.

bfi

Generic, non-optimizing (not counting clear loops) brainfuck interpreter, capable of executing asm2bf programs compiliant with the entire toolchain specification draft. General properties of the interpreter:

  • Possibly infinite memory cell amount support
  • Loading code of arbitrary size
  • Returning 0 on EOF
  • Written in portable C89
  • Around 120 SLOC

bfintd

Basically the same as bfi, but it's bundled with a debugging instruction onboard. * will dump register contents, stack contents, memory contents and all that useful information one may need to debug asm2bf programs. Please note it's not compliant with the toolchain per se (because stripping is done on the resulting brainfuck, so breakpoint locations are long gone by then). The newest v1.1.5 version though, will keep the breakpoints so you no longer need to fiddle with the toolchain.

bfmake

bfmake will preprocess and assemble a given program. It's intended to be used inside files as a shebang.

First step in building is passing the program thru bfpp. Then, everything is assembled and temporary files are mostly removed.

There isn't very much about bfmake, please refer to examples/example-*.asm in the repository to check out its practical use.

#!/bin/asmbf/bfmake

stk 2
org 0

; Build brainfuck file by executing me!

#include("example-lib.asm")

psh 2
jmp %print_zero
lbl 2
	end

To build the program, simply execute it and the brainfuck file will appear by. Please note the program won't work as is, you need a complete distribution and you need to run the existing program inside examples, because preprocessor will complain about missing include files.

You can check also

% bfmake -h

for list of available options.

bfpp

bfpp is a asm2bf preprocessor based on Lua. All the lines starting with hash (#), are executed as Lua code during compile-time. To embed the value of an expression to an arbitrary place in your code, you can use $() syntax. Lua makes it really easy to make rich compile-time macros simplifying software development using asm2bf.

The foundation macro list as of 8.04.2020:

* #include("filename.asm") - include a file.
* #alloc(): signature:
; Allocate a page and pass the pointer in target_register.
; This function has a side effect on register passed in clear.
; It's getting, simply, cleared.
#function alloc(target_register, clear)

Note alloc() will allocate just a single page.

* #free(): signature:
; Free the page in target_register.
#function free(target_register)

Note free() will free just a single page.

Note that alloc and free are inlined, so if you plan on using alloc and free a lot, make your own stubs.

Legacy bfpp

WARNING: THIS IS OUTDATED SINCE V1.3.0

bfpp is an asm2bf preprocessor based on gcc -E.

It behaves just like normal, but the first line is being cut to get rid of shebang that will be eventually parsed by gcc and make it bail out. Also, the includes are tweaked so the lib/ and . directories are added to the include path.

All the lines containing hash (#) from output are stripped later by the following sed program:

sed '/^#/ d' < "$1.i" > "$1.p"

The shebang is removed using the following snippet:

cat "$1" | sed -e "1!b" -e '/#/d'

Stray files might be left over if the program is used incorrectly. Please refer to examples/ example-*.asm in the repository.

bfpp is usually glued together with bfmake for usage simplicity.

bfpp allows you to place newlines inside macros. For example:

#define very_fun mov r1, .A NEWLINE out r1
very_fun

will result in the following output:

mov r1, .A
out r1

derle

derle decodes brainfuck code produced by bfasm in RLE mode. It's crucial in implementing the interpreter for rle-compressed brainfuck, but it might be useful as a standalone tool. Implementation is pretty simple and straightforward.

#!/usr/bin/perl -p

my ($style) = @ARGV;

s/\d+(.)/$1x$&/ge if $style eq "postfix" or $style == undef;
s/(.)\d+/$1x$&/ge if $style eq "prefix";

note: derle has been replaced with bfderle written in C sinve v1.2.7. The behaviour is left unchanged.

bfstrip

bfstrip is just a simple utility to shorten the code to a certain extent. It's recommended to run it on the code until there is no observable improvement.

It's a simple regex-utilizing program to remove no-ops in brainfuck and reduce the comments when possible. As every regexp is applied just once, it's recommended to run this program until no observable improvement can be noted. As of v1.2.5, the regexes are applied over and over again.

As of newer versions (v1.3.0), bfstrip is now implemented in C.

NOTE: bfstrip will purge RLE and debugging data, if you'd like it to do otherwise, use --keep-rle

bflabels

A very useful program. It allows labels to be inserted into asm2bf programs. At first, it has been implemented in asm2bf, but now the implementation relies on lex.

@loop
    out .A
    jmp %loop
lbl 1
    out .A
    jmp 1
+>+[>>>+<<+<<[>>->+<<<-]>>>[<<<+>>>-]<[->+<<[>>>-<<+<-]>[<+>-]>>[<->[-]]<[<<<+>>>-]<]>>[-]<<<<[>>+>+<<<-]>>[<<+>>-]>[[-]>>>>>>+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++.[-]+<<<<<<<<<[-]>[-]>>>>>>>>[<<<<<<<<+>+>>>>>>>-]<<<<<<<[>>>>>>>+<<<<<<<-]>>>>>>>[-]<<<<<<]<<<[>>+>+<<<-]>>[<<+>>-]>[[-]<<<[-]>[-]>>]<<]

Installation

All C source files can be linked independently, therefore installing and building shouldn't be a problem. All programs don't rely on other libraries than the Standard C Library. You can use Make and build targets `all' and `install'. Please note the best and the most useful command is make all install - it builds everything and puts it in /bin. If you want to put it somewhere else, you may want to refer to #!/anywhere/you/placed/it/bfmake instead of #!/bin/asmbf/bfmake. To set the install path, set ASMBF_DIR environmental variable (by default, it's equal to /bin).

Gisa

Gisa is a programming language similiar to lisp and forth at the same time. It's using switchable postfix / prefix notation. It's able to output poorly-performant asm2bf code for quick prototyping of programs. There is no official documentation on it, but you may try using it on your own and experimenting with the syntax.

This example code will display ASCII(33) - on your system it's possibly an exclamantion mark.

(
	(outb [0])
	(= 0 (+ 1 [0]))
	(= 0 (* 2 [0]))
	(= 0 (* {2 2 +} (+ 2 2)))
)

It's demonstarating the use of outb and three different bracket types. The ( bracket will evaluate all statements inside them, but backwards. Curly brackets will do the same, but the other way round (postfix notation).

Gisa doesn't use new features of asm2bf, so you may need to implement them using inline assembly. It doesnt have the label preprocessor as a dependency, so the code generated will have fixed labels. Using labels inside inline assembly is discouraged, because it may break existing programs really eaisly.

(
	(exit)
	(outb ((6) square invoke))

	({defun square (
		* dup
	)} exports)

	((
		@"end"
	) exit defmacro)
	
	({
		@"pop r2"
		@"psh r2"
		@"psh r2"
	} dup defmacro)
)

This example is utilizing the macro defining possibilities, inline assembly, function declaration and invocation. This program will define dup macro to duplicate two elements on the stack. The next macro defined will exit the program using inline assembly and the end/jmp 0 feature. Square function will square the parameter on the stack.

Another example program, this one demonstrates the looping capabilities:

/* This program, when ran, will display following text: 0123456789 */

{
	(= 0 48)
	{loop {
		(outb [0])
		(= 0 (+ [0] 1))
	} (
		neq [0] (+ 48 10)
	) while}
}

It's very simple to comprehend. It's utilizing neq that is basically a wrapper over ne_ instruction. The loop though does have inversed order of parameters. It has the body at the top, increment routine in the middle, and check at the very end. You can try reading the code. It lies in separate Github repository.

data_labels

This information is valid as of v1.2.8a.

data_labels is a separate preprocessor allowing to refer to certain adresses in memory adress space. Usage example:

stk 2
org 0

&variable
db_ 23

&hello
txt "Hello, world!"
db_ 0

&another
db_ 34

mov r1, *hello
out r1               ; outputs ASCII 1

mov r2, *another
out r1               ; outputs ASCII 15

rave

Rave is an (unfinished) implementation of BFVM concept - an interpreter made specifically to execute asm2bf code efficiently. It can be used the same way bfi is used, albeit it doesn't support commandline switches, debugging and AST dumps as of now.

constpp

constpp is a preprocessor introduced along v1.3.8. It allows to create aliases for instructions and registers. For example:

?myout=out

push r1
xchg r1, r2
myout r2

Alias is created by placing ? at the beginning of the line, followed by an alphanumeric string (must start with a character) terminated by first occurence of =. Then, the 2nd part of the alias (from = to the end of the declaration) is treated like the replacement, therefore:

?myout=out
; replace all occurences of myout with out.
; replacing occurs outside quotes.
; W A R N I N G: DO NOT define one letter aliases, otherwise character constants (i.e. .K) may
;                not work correctly.

List of builtin aliases:

?push=psh
?xchg=swp
?cadd=cad
?csub=csu
?cmul=cmu
?cdiv=cdi
?cmod=cmd
?casl=csl
?casr=csr
?cpow=cpw
?cpush=cps
?cpsh=cps
?cpop=cpo
?cxchg=csw
?cswp=csw
?csrv=crv
?cmov=cmo
?crcl=crc
?csto=cst

b2bf

b2bf is a B compiler targetting asm2bf. The code isn't of bad quality (just 3 times bigger than hand-written optimized assembly), the compiler is quite fast but may still have some bugs, so it may not be the best idea to jump into asm2bf development using just b2bf (at least not now). Example code:

getnumb() {
    auto a, c, x, z 1, s;
    s = &s + 1;
    while (isdigit(c = getchar()) && a < 10) s[a++] = c;
    for (0; a--; z =* 10) x =+ z * (s[a] - 48);
    return x;
}

versioning

Contents of the VERSIONING document; pasted for various reasons, one of which is uniformity.

As most of us already know, there have been many asm2bf iterations, the lastest one being asm2bfv4c. As the versioning scheme usually featured asm2bf [version] [optional v] [letter; the higher the newer], many version numbers are already taken.

Declaring this, refurbished asm2bf with a few new instructions asm2bf5 (skipping 4 numbers inbetween!) doesn't seem like a good idea to me, so I've switched into another versioning scheme.

From now on, the Izmit-series asm2bf (usually optimized to be used by the compilers), will be using legacy, non-semver naming. The asm2bf4c or asm2bfv2a are simple examples of the old naming scheme.

The new asm2bfv1/1 versions will be from now on defined using semver. This way, the first release before log branch is merged (as you read it, the branch has probably been merged), can be called either asm2bf v1 or asm2bf v1.0. This way we keep backwards compatibility. As asm2bf will get so advanced, calling this version as asm2bf v1 will be a not-very-good idea (merely looking by the progress), the next version will be called asm2bf v2.0. The .0 is the suffix separating each version. I think it just adds up to the confusion, but at least we have a measurable matter over asm2bf versioning.

Also, who remembers about asm2bfv2/3/4? The information about the assemblers is really tough one to get, because they publically exist just formally. They obviously correspond to the Izmit version (v2=1, v3=2, v4=3&4). As Izmit5 will be just a mere wraparound adding C++ compilation possibility, the core might stay the same. I might some time switch to asm2bf as it becomes more and more mature to be worthy adding Izmit backend.

Izmit will keep running on asm2bfv4 for a some time. I'll strive to find old Izmit versions and hopefully host them somewhere.

Changelog

v0.9.0: First appeared publicly: 28 Oct 2017

  • Introduce basic (smallest common denominator) instruction set:
add and dec div eq_ ge_ gt_ in_ inc jmp jnz jz_ lbl le_ lt_ mod mov mul ne_ neg not or_ out pop psh rcl sto sub swp clr ret end stk org db_ txt raw
  • Current (as of late December of 2019) version of the self compiler.
  • Modulus is broken.
  • Labels are introduced just using lbl keyword.
  • No preprocessor.

v0.9.1: First appeared publicly: 19 Oct 2018

  • No changes to instruction set
  • Publicly available documentation for toolchain
  • Hello world example bundled in the source tree.

v0.9.2: First appeared publicly: 20 Oct 2018

  • Brainfuck apache2 module (to be used with asm2bf).
  • Overall formalization of the project
  • The self compiler appeared publicly.

v0.9.3: First appeared publicly: 31 Oct 2018

  • Debugging tools

v0.9.4: First appeared publicly: 31 Mar 2019

  • bfasm-experimental version featuring a new register.
  • Basic preprocessor

v1.0.0: First appeared publicly: 23 Apr 2019

  • rcl and sto with reg, immed construct.
  • Documentation tweaks.

v1.0.1: First appeared publicly: 10 May 2019

  • Inclusion system.
  • Deprecate dbgasm
  • mod and puts procedures in the trunk.
  • Extended interpreter for extended possibilities.
  • "Broken modulus is no more".
  • Self compiler yet again works.

v1.0.2: First appeared publicly: 10 Aug 2019

  • Gisa toolkit birth.
  • bfpp deprecation
  • Rust translation of the toolchain
  • (broken) Malbolge interpreter.
  • Hex counter example.

v1.1.0: First appeared publicly: 2 Oct 2019

  • New tools: strip.pl, bconv
  • New instructions: log, srv
  • Instruction buffer fix.
  • Versioning information.
  • First swallow of newest changes.

v1.1.1: First appeared publicly 12 Oct 2019

  • New instructions: asl, asr
  • Toolchain programs merged into single make, install script.
  • Label preprocessor.
  • Repository separated - background files and the documentation moved out of the root.

v1.1.2: First appeared publicly: 16 Nov 2019

  • Full-featured RLE support.
  • New toolchain tool (derle.pl)
  • Prefix/postfix compression
  • bfi now accepts path starting with / or - (no longer treats them like an option).

v1.1.3: First appeared publicly: 17 Nov 2019

  • Offsets.
  • New instruction: seg
  • Unit testbed
  • -DNO_CHECKS bfi variant
  • Remove experimental version of bfasm.

v1.1.4: First appeared publicly: 21 Nov 2019

  • Digits allowed in label names.
  • cpp is now the bfasm preprocessor.
  • Newlines allowed inside macros.
  • strip no longer trashes breakpoints.
  • New instructions: amp, smp

v1.2.0: First appeared publicly: 25 Nov 2019

  • Debugger simplified
  • More recent hello world example.

v1.2.4: First appeared publicly: 28 Dec 2019

  • New instruction: nav
  • r5 - new GPR
  • Enhanced stripper.
  • Immediate optimalization.

v1.2.5: First appeared publicly: 29 Dec 2019

  • New example: Branchless interpreter bitness check.
  • pow optimalization
  • -DDISABLE_OPT - disable immediate optimalization.
  • Uppercase instruction names.

v1.2.6: First appeared publicly: 30 Dec 2019

  • New bfi.
  • Bugfix: register uppercase names (e.g. R2) now work.

v1.2.7: First appeared publicly: 31 Dec 2019

  • Two new registers.

v1.2.8a: First appeared publicly: 23 Jan 2020

  • Alpha version.
  • Separate preprocessor for data labels. Not merged to the mainline yet.
  • `seg` now clears previous `org` value, setting it to 0. Can be treated like a bugfix from v1.1.3.

v1.2.9: First appeared publicly: 29 Feb 2020

  • Important bugfixes regarding registers - r5 and r6 temporairly removed.

v1.3.0: First appeared publicly: 13 Mar 2020

  • Milestone release, bugfixes.

v1.3.1: First appeared publicly: 10 Apr 2020

  • Milestone release, bugfixes.

v1.3.2: First appeared publicly: 13 Apr 2020

  • Minor project structure cleanup.
  • Documentation files removed from the repository.
  • Repository contains a mirror of the article.

v1.3.3: First appeared publicly: 14 Apr 2020

  • bfvm (rave) - a brainfuck interpreter targeted at optimizing, running and debugging asm2bf code at high speed.
  • new installation system (fixes a critical bug occuring without a ASMBF_DIR set)

v1.3.4: First appeared publicly: 16 Apr 2020

  • the floor character (_) at the end of an instruction is optional.

v1.3.5: First appeared publicly: 17 Apr 2020

  • Now, asm2bf has an internal flag register (inaccessible via normal means),
  • The flag register can be set by the following: ceq cne cle clt cge cgt (equivalent to eq, ne, le, lt, ge, gt). For instance:
mov r1, 23
ceq r1, 23
; will set the flag register to 1.
ceq r1, 0
; nope, didn't succeed, now clears the flag register.
  • New jumps: cjn cjz, that depend on the state of the flag register. For example:
; Old way (trashes r3):
mov r3, r1
eq_ r3, 10
jnz r3, %target

; New way (no trashes):
ceq r1, 10
cjn %target
  • New instructions! All of these execute, whenever the flag register is set to 1:
Conditional variant: cad csu cmu cdi cmd csl csr cpw cps cpo csw crv cmo crc cst
Classic variant:     add sub mul div mod asl asr pow psh pop swp srv mov rcl sto
  • b2bf (coined by comrade Maviek) - a B to asm2bf compiler. It's an initial release, so a few things in b2bf may be rigged, but it's able to produce decent code (just three times bigger than the handcoded and optimized assembly equivalent).

v1.3.6: First appeared publicly: 18 Apr 2020

  • purely bugfix-related update.
  • optimized modulus
  • (now working) r5 and r6 due to a programming error introduced while implementing conditional instructions.

v1.3.7: First appeared publicly: 15 May 2020

  • constpp is a new preprocessor introduced alongside asm2bf. It can be used to alias instruction and register names, like so:
?r6=sp
; and now
sub sp, 4
; is equal to
sub r6, 4
; No spaces are allowed after and before =, no spaces are allowed inside aliases,
; no spaces are allowed after the question mark (?).
  • Rave has now a 16-bit compatibility option.
  • Raw left and right bracket fix for Rave (oops)
  • Now the bitness example has the correct brackets used to create a while loop.
  • smp and amp micropatch
  • Removed dependency on -lfl

v1.3.8: First appeared publicly: 15 May 2020

  • This release features a tool to alias registers and instruction names. Example:
?bp=r1
?sp=r2
mov bp, sp
  • List of aliases:
?push=psh
?xchg=swp
?cadd=cad
?csub=csu
?cmul=cmu
?cdiv=cdi
?cmod=cmd
?casl=csl
?casr=csr
?cpow=cpw
?cpush=cps
?cpsh=cps
?cpop=cpo
?cxchg=csw
?cswp=csw
?csrv=crv
?cmov=cmo
?crcl=crc
?csto=cst
  • Instruction alias example:
push r1
xchg r1, r2
out r2

v1.3.9: First appeared publicly: 16 May 2020

  • New instructions:
band - bit and
bor - bit or
bxor - bit xor
shl - shift left (e.g. shl r1, 3)
shr - shift right (e.g. shr r1, 3)
cflip - flip the conditional execution flag register.

Programming techniques

This part of article will go through common programming techniques worked out before.

Portability

The future of asm2bf targets isn't clear yet. I might spend some of my time on creating another language targets. If this becomes the case, low level instructions and possibly low level constructs (for instance raw, nav, or sometimes seg) may behave differently.

The registers are by definition 16 bits wide. If you wish to run asm2bf-built code on another non-compiliant interpreter, you may want to utilize bconv as a part of the build process.

In case of 8-bit interpreters, the following limitations apply:

  • 255 or 127 cells of adressable memory - the rest has to be done either on stack or registers.
  • registers limited to range of -127 - 127 or 0 - 255.
  • labels up to 255.

Compact loops

Instead of using labels for very simple (branchless) loops, you can glue the following at the top of your asm2bf code.

#function bf_while(reg) print("nav " .. reg); print("raw .["); end
#function bf_wend(reg) print("nav " .. reg); print("raw .["); end

Note: This code is very low-level and possibly unportable assembly.

As an usage example, let's add all character ascii codes from stdin and print the result as a character.

; Standard crap, may be omitted
stk 0
org 0
seg 0

; Place the defines here.

in_ r1
$(bf_while("r1"))
    add r2, r1
    in_ r1
$(bf_wend("r1"))
out r2

What to do if asm2bf breaks?

First of all, you need to examine which stage of asm2bf broke. One of these may have failed:

  • Lua preprocessor -> Usually spits up message that stands out.
  • Label preprocessor -> Doesn't tell you about failure.
  • bfasm core -> Hard to debug, will get to that later
  • bfi -> Very unlikely

Labels preprocessor is orders of magnitude harder to debug, but it becomes quite obvious too. All you need to do is run the preprocessor standalone on the file.

/bin/asmbf/bflabels < "source.asm"

Will output contents on stdin - the ready-to-build bfasm core source. Examine it for weird label statements and jumps to undefined labels. Helpful tip: try to seek for *lbl 0*, or *lbl* followed by non-digits.

asm2bf core can fail in multiple ways. First of all, build it again and disable optimalizations and run-length encoding. Build the resulting .p file, don't rely on bfmake as it will make the process harder. If the output has NUL characters in it, you possibly passed incorrect amount of parameters to an instruction. If the output ends with hash (#), an syntax/semantical error occured near and the assembler was halted. You can use raw instructions to approximate the location (very bad error reporting system will be changed soon, hopefully). If you are using a self compiler, please stop as it's outdated. The last step is bfi. If your code executes fine on an interpreter with *unsigned short int* cells, please report it in the issues tab on the github repository or on the talk page (I look at issues more frequently though, so there is a higher chance of me noticing and fixing it).

Useful listings

lib-bfm.lua

lib-bfm.lua is a file prepended to most asm2bf programs. It contains some useful macros for development. The listing below will outline the contents of it and (hopefully) all symbols it defines.

$(

function include(path)
    ; Code truncated (because it's too large), but this function's
    ; duty is including other files into the current file.
    ; For example, #include('file.asm') will pump the contents of
    ; file.asm in place of the #include directive.
    ; The file will be fully preprocessed.

    ; This file (lib-bfm.lua) is included automatically.
end
	
)

; call given label (provide only the name! without a % sign)
$(
RET_ID = 0
function call(x)
	print("psh %_return_" .. RET_ID)
	print("jmp %" .. x)
	print("@_return_" .. RET_ID)
	RET_ID = RET_ID + 1
end
)

#MM_BASE = 0
#PAGE_SIZE = 16

; Allocate a page and pass the pointer in target_register.
; This function has a side effect on register passed in clear.
; It's getting, simply, cleared.
#function alloc(target_register, clear)
#	print("psh " .. clear)
#	print("mov " .. target_register .. ", " .. tostring(MM_BASE))
#	print("mov " .. clear .. ", 1")
#	print("nav " .. clear)
#	print("raw .[")
#	print("rcl " .. clear .. ", " .. target_register)
#	print("add " .. target_register .. ", " .. tostring(PAGE_SIZE))
#	print("inc " .. target_register)
#	print("nav " .. clear)
#	print("raw .]")
#	print("sub " .. target_register .. ", " .. tostring(PAGE_SIZE))
#	print("dec " .. target_register)
#	print("sto " .. target_register .. ", 1")
#	print("inc " .. target_register)
#	print("pop " .. clear)
#end

; Free the page in target_register.
#function free(target_register)
#	print("dec " .. target_register)
#	print("sto " .. target_register .. ", 0")
#end

; Repeat str n times
#function times(str, n)
#	for i=1,n,1 do print(str) end
#end

Regular expressions for preprocessors

bflabels

^[ \t]*\@([A-Za-z_][A-Za-z0-9_]*) { addlabel(yytext); }
(%([A-Za-z_][A-Za-z0-9_]*)|"[^"\n]*%([A-Za-z_][A-Za-z0-9_]*)) { getlabel(yytext); }
. { putchar(yytext[0]); }

data-labels

^[ \t]*\&([A-Za-z_][A-Za-z0-9_]*) { addlabel(yytext); }
(\*([A-Za-z_][A-Za-z0-9_]*)|"[^"\n]*\*([A-Za-z_][A-Za-z0-9_]*)) { getlabel(yytext); }
^[ \t]*db_ { origin++; printf("%s", yytext); }
^[ \t]*txt[ \t]*\".*\" { origin += strlen(strchr(yytext, '"') + 1) - 1; printf("%s", yytext); }
^[ \t]*seg[ \t]*([0-9]+) { segment = atoi(strpbrk(yytext, "0123456789")); printf("%s", yytext); }
^[ \t]*org[ \t]*([0-9]+) { origin = atoi(strpbrk(yytext, "0123456789")); printf("%s", yytext); }
. { putchar(yytext[0]); }

constpp

^[ \t]*\?([A-Za-z_][A-Za-z0-9_]*)\=([A-Za-z_][A-Za-z0-9_]*) { push_def(yytext); }
(([A-Za-z_][A-Za-z0-9_]*)|"[^"\n]*([A-Za-z_][A-Za-z0-9_]*)) { pop_def(yytext); }
. { putchar(yytext[0]); }