Minimal assembly language
|Memory system||Variables, dynamic memory, etc.|
|Computational class||Turing complete|
|Reference implementation||Original C implementation|
|Influenced by||Assembly languages (especially x86)|
Minimal assembly language (MASM) is a very low-level esoteric programming language designed by me (User:Peter). The language has eight different instructions, each taking one argument and, when compiled, occupying one byte of space. ".masm" is the file extension for written minimal assembly files, and ".mexe" is the extension for computer-generated executable files.
While all variables have to be allocated on the heap in MASM, there is a stack of up to thirty-two pointers called the symbol table. The symbol table can be looked at as a list of identifiers. Five of the pointers are built-in, but the other twenty-seven can be used to store the location of identifiers. Half the symbol table, including the five built-in pointers, is shared between all files in a program. The other half is unique to each file.
The first element in the symbol table is the value pointer, or vpt. The value pointer is read, written to, or both, by all the instructions in MASM (except for fre; more on that later). It acts as a central station for transportation of data.
The second element in the symbol table is the instruction pointer, or ipt. The instruction pointer runs a minimal assembly program by executing one instruction at a time. It always points to the instruction after the one that's currently being executed, because it moves before executing a command. The instruction pointer can be changed just like any other element in the symbol table, and that's how functions, loops and conditional jumps are made in MASM. The instruction pointer never stops executing commands, so all programs should have an infinite loop at the end to keep ipt from moving to illegal memory addresses.
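The "advance first, then execute" behaviour described above can be sketched in Python. This is only a toy model, not the reference implementation: the names run, record_ipt, log and jump_to are hypothetical, and the step limit exists only so the sketch terminates.

```python
def run(program, max_steps):
    """Toy fetch-execute loop: ipt advances before the instruction runs."""
    state = {"ipt": 0, "trace": []}
    for _ in range(max_steps):
        instruction = program[state["ipt"]]
        state["ipt"] += 1        # ipt moves first...
        instruction(state)       # ...then the fetched instruction executes
    return state["trace"]

def record_ipt(state):
    # Records what ipt holds while this instruction is executing.
    state["trace"].append(state["ipt"])

def log(text):
    return lambda state: state["trace"].append(text)

def jump_to(target):
    # Overwriting ipt is all a jump is in this model.
    def instruction(state):
        state["ipt"] = target
    return instruction

# The last instruction jumps to itself, which is the "infinite loop at the
# end" that keeps ipt inside the program.
program = [record_ipt, log("b"), jump_to(2)]
trace = run(program, max_steps=10)
```

The first trace entry is 1 rather than 0, because ipt already points past the instruction that is recording it.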
The third element of the symbol table is the input, or in. The input is a null-terminated string containing the user input, which is prompted for before the program starts to run.
The fourth element of the symbol table is the output, or out. It constantly prints what it's pointing to as a null-terminated string. Since out only prints one continuous block of memory and it's impossible to allocate more than 255 bytes at once (alc only takes one byte as an argument, after all), the output can't usually be longer than 255 bytes without causing the program to crash by printing unallocated memory. The only exceptions to this rule are printing the input or a label, both of which can have a theoretically infinite length. Creating a long label can actually be a pretty useful, if a little inelegant and cheating, way to allocate large blocks of memory in general.
The fifth element of the symbol table is one. It points to a byte with a value of 1, and can be used to assign values to variables.
There are eight different types of instructions in MASM, each taking one argument, for a total of one byte of space. The instruction itself takes three bits of space (2³ = 8), and the other five bits are used for the argument, which is an index in the symbol table (2⁵ = 32, which is the maximum length of the symbol table).
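The three-bit/five-bit split can be illustrated with a small Python sketch. The text only fixes the sizes of the two fields, so the opcode numbering and the choice to put the instruction in the high bits are assumptions made for illustration, as are the names OPCODES, encode and decode.

```python
# Assumed numbering of the eight instructions (not confirmed by the text).
OPCODES = ["get", "set", "alc", "fre", "adr", "drf", "nnd", "rot"]

def encode(mnemonic, index):
    """Pack an instruction and a symbol table index into one byte."""
    if not 0 <= index < 32:
        raise ValueError("symbol table index must fit in five bits")
    # Assumption: instruction in the top three bits, argument in the low five.
    return (OPCODES.index(mnemonic) << 5) | index

def decode(byte):
    """Split one byte back into (mnemonic, symbol table index)."""
    return OPCODES[byte >> 5], byte & 0b11111

encoded = encode("nnd", 4)
```

Under these assumptions every value from 0 to 255 decodes to some valid instruction, which is why no byte of a compiled file is wasted.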
get x command will move a single byte from where x is pointing to, to where vpt is pointing to.
set x command will make x point to the same memory location as vpt.
alc x command will allocate x bytes and make vpt point to the beginning of the newly allocated memory.
fre x command frees the allocated memory x is pointing to.
adr x command will move nine bytes to vpt. These bytes will point towards x, and can later be dereferenced by the drf command. The last eight bytes store the actual memory address, and the first byte is simply an unsigned offset from that address.
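The nine-byte layout can be sketched with Python's struct module. The byte order is not stated in the text, so little-endian is an assumption here, as is the idea that the target location is the address plus the offset. The function names are hypothetical.

```python
import struct

# "<BQ" = one unsigned byte followed by one unsigned 64-bit integer,
# little-endian with no padding, which gives exactly nine bytes.
POINTER_FORMAT = "<BQ"

def pack_pointer(address, offset):
    """Build the nine bytes: offset byte first, then the eight address bytes."""
    return struct.pack(POINTER_FORMAT, offset, address)

def unpack_pointer(raw):
    """Recover (address, offset) from nine bytes."""
    offset, address = struct.unpack(POINTER_FORMAT, raw)
    return address, offset

def target_of(raw):
    # Assumption: the location pointed at is address + offset.
    address, offset = unpack_pointer(raw)
    return address + offset

raw = pack_pointer(0x1000, 3)
```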
drf x command will make vpt point to where the first nine bytes of x point to. It reverses the adr command.
nnd x command will perform a bitwise nand operation on one byte from x and move the result to vpt.
rot x command will perform a rotational bitshift x times to the right.
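A rotational shift to the right moves every bit down one place and wraps the lowest bit around to the top. A minimal Python sketch for a single byte follows; it only shows the arithmetic, and how rot obtains the byte and the count in MASM is not modelled. The name rotate_right is hypothetical.

```python
def rotate_right(byte, times):
    """Rotate an 8-bit value to the right, wrapping low bits to the top."""
    times %= 8  # eight rotations bring a byte back to where it started
    return ((byte >> times) | (byte << (8 - times))) & 0xFF

wrapped = rotate_right(0b00000001, 1)   # the low bit wraps to the top
swapped = rotate_right(0b10010110, 4)   # the two nibbles trade places
```

Because the rotation wraps, rotating right by n is the same as rotating left by 8 - n, so a single rotate instruction covers both directions.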
Every command must be on its own line, and there's very little syntactic sugar in MASM. cmt at the beginning of a line will cause the rest of the line to be a comment.
Labels in MASM are simply pointers to commands. A label can be defined by writing lab x, where x is the name of the label, with its instructions written beneath it. The compiler will automatically allocate enough space for the instructions written below such labels. drf can be used to get the memory address of such labels, and set ipt can be used afterwards to move execution. Execution always starts in a label named main in a file named "main.masm".
Identifiers in minimal assembly are simply indices in the symbol table. There's no way to discern between functions, variables or pointers.
Half the symbol table is unique to each file. The names of these identifiers must be written at the top of a file to be usable within it. Files can import labels (but not regular variables) from other files in this way. A variable can have an "alias", meaning that it has two names. In that case, both names must be written on the same line.
Labels imported from other files aren't shared in the sense that they always have the same value. They simply start out pointing to the same memory address, but changing where they're pointing to using set in one file won't impact any other file.
Global variables are the same for all files, and must be declared in a file named globals.masm in the same way that local variables are declared. Changing the location a global variable points to in one file will change it for all other files too. Built-in identifiers count as global variables, so you can't have more than eleven (sixteen minus five built-ins) user-defined globals.
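The slot arithmetic for the symbol table can be written out as a short Python sketch. The text says the built-ins are the first five elements and that they belong to the shared half; placing the whole shared half in the lower sixteen indices is an assumption, and all names below are hypothetical.

```python
TABLE_SIZE = 32
BUILT_INS = ("vpt", "ipt", "in", "out", "one")

SHARED_SLOTS = TABLE_SIZE // 2                # shared between all files
LOCAL_SLOTS = TABLE_SIZE - SHARED_SLOTS       # unique to each file
USER_GLOBALS = SHARED_SLOTS - len(BUILT_INS)  # left over for globals.masm
USER_SLOTS = TABLE_SIZE - len(BUILT_INS)      # everything not built-in

def scope_of(index):
    """Classify a symbol table index (assumes the shared half comes first)."""
    if not 0 <= index < TABLE_SIZE:
        raise IndexError("the symbol table has only 32 entries")
    if index < len(BUILT_INS):
        return "built-in"
    if index < SHARED_SLOTS:
        return "global"
    return "local"
```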
Even the simplest programs in minimal assembly require quite a few lines of code, which is why the example programs below are links to other pages.
The following program prints out the input. This is quite simple in MASM as you can just make out point to in.
This program first constructs the number 65 (the ASCII value of 'A'). Then, it creates a two-byte buffer where it stores 65 in the first byte and a null terminator in the second. Finally, it makes out point to this buffer so it can be printed.
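One way the number 65 could be built from the byte 1 using only rotation and nand is sketched below in Python. This is an illustration of the idea, not a transcription of the linked program, which may construct the value differently. The helper names are hypothetical.

```python
def nand(a, b):
    return ~(a & b) & 0xFF

def rotate_right(byte, times):
    times %= 8
    return ((byte >> times) | (byte << (8 - times))) & 0xFF

def bitwise_or(a, b):
    # OR built from nand alone: a | b == nand(not a, not b)
    return nand(nand(a, a), nand(b, b))

one = 1                                 # the byte the built-in "one" points to
sixty_four = rotate_right(one, 2)       # 0b00000001 -> 0b01000000
letter = bitwise_or(one, sixty_four)    # 0b01000001

buffer = bytes([letter, 0])             # the character plus a null terminator
text = buffer.split(b"\0")[0].decode("ascii")
```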
The nand gate is functionally complete, so any other boolean (or in this case, bitwise) gate can be created with it. The following programs are all the possible bitwise operators for up to two unordered arguments. The first two bytes of input (only one in bitwise not) are used as arguments, and the result will be printed to the screen (although the result won't make much sense to human eyes).
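The functional completeness claim can be checked with a short Python sketch that derives the common bitwise operators from nand alone. These are the standard textbook constructions on 8-bit values, and the linked programs may arrange the steps differently. The function names are hypothetical.

```python
def nand(a, b):
    """Bitwise nand on two bytes."""
    return ~(a & b) & 0xFF

def bit_not(a):
    return nand(a, a)

def bit_and(a, b):
    return bit_not(nand(a, b))

def bit_or(a, b):
    return nand(bit_not(a), bit_not(b))

def bit_nor(a, b):
    return bit_not(bit_or(a, b))

def bit_xor(a, b):
    shared = nand(a, b)
    return nand(nand(a, shared), nand(b, shared))

def bit_xnor(a, b):
    return bit_not(bit_xor(a, b))
```

Each derived operator costs between one and five nand steps, which gives a feel for why even simple MASM programs run to many lines.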
This program contains an if-else label as well as print-B, both of which can be used as arguments in
This program uses an if/else statement and logic similar to that of the bitwise operators to add two numbers together. It will add the first number of input to the second one and print the result. For instance, 45 will output 9, because 4 + 5 equals 9. It only works with numbers whose sum has only one digit.
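Addition built from bitwise logic plus a conditional loop can be sketched in Python as a carry-propagating adder. This shows the general technique rather than the linked program: the left shift is ordinary Python and the digit handling is an assumption about how the ASCII input is turned into a printable result. The names add_bytes and add_digits are hypothetical.

```python
def nand(a, b):
    return ~(a & b) & 0xFF

def add_bytes(a, b):
    """Add two bytes modulo 256 using nand-derived AND/XOR and a loop."""
    while b:                                 # the "if": stop once no carry is left
        shared = nand(a, b)
        carry = (nand(shared, shared) << 1) & 0xFF   # (a AND b) moved up one bit
        a = nand(nand(a, shared), nand(b, shared))   # a XOR b: sum without carries
        b = carry
    return a

def add_digits(first, second):
    """Add two ASCII digit characters and return the ASCII digit of the sum."""
    total = add_bytes(ord(first), ord(second))
    # Remove one surplus '0' offset by adding its two's complement.
    return chr(add_bytes(total, 0x100 - ord("0")))

result = add_digits("4", "5")
```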
This program is similar to addition. The subtract function simply makes arg1 negative before adding it to arg. It will subtract the second character of input from the first one and print the result to the screen. For example, 97 will output 2, because 9 - 7 = 2. It only works for single-digit numbers with a positive difference.
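Subtracting by negating and then adding can be sketched in Python using 8-bit two's complement. The negation method is an assumption, since the text only says the second argument is made negative, and ordinary Python addition stands in for the adder. The function names are hypothetical.

```python
def negate(value):
    """Two's complement negation of a byte: invert every bit, then add one."""
    return ((value ^ 0xFF) + 1) & 0xFF

def subtract(a, b):
    """Compute a - b modulo 256 by adding the negation of b."""
    return (a + negate(b)) & 0xFF

def subtract_digits(first, second):
    """Subtract two ASCII digit characters; assumes the difference is 0 to 9."""
    difference = subtract(ord(first), ord(second))
    return chr(difference + ord("0"))   # put the ASCII '0' offset back

result = subtract_digits("9", "7")
```

The single-digit restriction follows from the last step: a negative difference wraps around to a large byte value that no longer maps to a digit character.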
MASM is probably Turing complete due to the functional completeness of the nand gate, the possibility to create theoretically unbounded arrays or linked lists with pointers, and the if- and goto statements that can be combined to create different types of control flow.
- My (User:Peter's) original C compiler (or interpreter, not sure what definition it goes under): https://github.com/Peter919/Minimal-assembly-language