Minimal assembly language

From Esolang
Minimal Assembly Language
Paradigm(s): Imperative
Designed by: User:Peter
Appeared in: 2022
Memory system: Variables, dynamic memory, etc.
Computational class: Turing complete
Reference implementation: Original C implementation
Influenced by: Assembly languages (especially x86)
File extension(s): .masm, .mexe

Minimal assembly language is a very low-level esoteric programming language designed by me (User:Peter). The language has eight different instructions, each taking one argument and occupying one byte of space when compiled. ".masm" is the file extension for handwritten minimal assembly files, and ".mexe" is the extension for computer-generated executable files.

Symbol table

While all variables have to be allocated on the heap in MASM, there is a stack of up to thirty-two pointers called the symbol table. The symbol table can be thought of as a list of identifiers. Five of the pointers are built-in, and the other twenty-seven can be used to store the locations of identifiers. Half of the symbol table (sixteen pointers, including the five built-in ones) is shared between all files in a program. The other half is unique to each file.
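The arithmetic of the table layout can be sketched in Python. This is only a model: it assumes the shared half occupies indices 0 to 15 (with the built-ins first, in the order this article lists them) and the file-local half occupies indices 16 to 31. The article does not state which indices belong to which half, and the helper name classify is hypothetical.

```python
# Hypothetical model of the 32-entry MASM symbol table.
# Assumption: indices 0-15 are shared between files, 16-31 are file-local.
BUILTINS = ["vpt", "ipt", "in", "out", "one"]  # built-ins, in the order described
TABLE_SIZE = 32                                 # 5-bit operand -> 2**5 entries
SHARED = TABLE_SIZE // 2                        # 16 entries shared by every file


def classify(index: int) -> str:
    """Return which part of the (assumed) symbol-table layout an index falls in."""
    if not 0 <= index < TABLE_SIZE:
        raise ValueError("a symbol-table index must fit in 5 bits (0-31)")
    if index < len(BUILTINS):
        return "builtin:" + BUILTINS[index]
    if index < SHARED:
        return "user global"
    return "file-local"


user_slots = TABLE_SIZE - len(BUILTINS)  # pointers free for identifiers
user_globals = SHARED - len(BUILTINS)    # shared slots left after the built-ins
```

Under these assumptions, user_slots is 27 and user_globals is 11, matching the figures given in this article.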

Value pointer

The first element in the symbol table is the value pointer, or vpt. The value pointer is read, written to, or both, by all the instructions in MASM (except for fre; more on that later). It acts as a central station for transportation of data.

Instruction pointer

The second element in the symbol table is the instruction pointer, or ipt. The instruction pointer runs a minimal assembly program by executing its instructions one at a time. It always points to the instruction after the one currently being executed, because it advances before executing a command. The instruction pointer can be changed just like any other element in the symbol table, and that's how functions, loops and conditional jumps are made in MASM. The instruction pointer never stops executing commands, so every program should end in an infinite loop to keep ipt from moving to illegal memory addresses.
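The "advance first, then execute" rule can be illustrated with a toy fetch loop. This is not the reference implementation: instructions are modelled as Python callables, and the names run, record and loop_forever are hypothetical. The last instruction mimics a jump back to itself, the kind of terminating infinite loop described above.

```python
# Toy fetch-advance-execute loop (a sketch, not the C reference implementation).
def run(program, steps):
    state = {"ipt": 0, "trace": []}
    for _ in range(steps):
        instr = program[state["ipt"]]
        state["ipt"] += 1  # ipt moves *before* the instruction runs
        instr(state)
    return state


def record(state):
    # Record the value of ipt as seen while this instruction executes.
    state["trace"].append(state["ipt"])


def loop_forever(state):
    # Jump back onto this very instruction, so execution never runs off the end.
    state["ipt"] -= 1


program = [record, record, loop_forever]
final = run(program, 6)
```

Each record call sees ipt already pointing one past itself, so final["trace"] is [1, 2], and after that ipt stays parked on the looping instruction at index 2.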

Input

The third element of the symbol table is the input, or in. It points to a null-terminated string containing the user input, which is prompted for before the program starts to run.

Output

The fourth element of the symbol table is the output, or out. It constantly prints what it's pointing to as a null-terminated string. Since out only prints one contiguous block of memory, and it's impossible to allocate more than 255 bytes at once (alc only takes one byte as an argument, after all), the output usually can't be longer than 255 bytes without crashing the program by printing unallocated memory. The only exceptions to this rule are printing the input or a label, both of which can have a theoretically infinite length. Creating a long label can actually be a pretty useful, if somewhat inelegant and cheat-like, way to allocate large blocks of memory in general.
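Printing a null-terminated string means reading bytes until the first zero byte. The Python sketch below models memory as a bytes object (read_c_string is a hypothetical helper). A missing terminator raises ValueError here, standing in for MASM reading past its allocation. It also shows that a maximal 255-byte buffer holds at most 254 visible characters once the terminator is counted.

```python
# Sketch of how out interprets memory: a C-style, null-terminated string.
def read_c_string(memory: bytes, start: int) -> str:
    """Return the text from `start` up to, but not including, the first NUL byte."""
    # bytes.index raises ValueError if no terminator exists, which stands in
    # for MASM printing unallocated memory past the end of the buffer.
    end = memory.index(0, start)
    return memory[start:end].decode("ascii")


MAX_ALLOC = 255  # largest single alc, since its size argument is one byte

# A maximal buffer: 254 printable characters followed by the null terminator.
buffer = b"x" * (MAX_ALLOC - 1) + b"\0"
text = read_c_string(buffer, 0)
```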

One

The fifth element of the symbol table is one. It points to a byte with a value of 1, and can be used to assign values to variables.

Instructions

There are eight different instructions in MASM, each taking one argument, for a total of one byte of space per instruction. The instruction itself takes three bits (2³ = 8), and the other five bits hold the argument, which is an index in the symbol table (2⁵ = 32, the maximum length of the symbol table).
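The 3 + 5 bit split can be expressed as a pair of encode/decode functions. Two details are assumptions, since this article does not specify them: the opcode is placed in the high three bits, and opcodes are numbered in the order the instructions are listed below (get = 0 through rot = 7). The function names are illustrative.

```python
# Sketch of packing one MASM instruction into a byte.
# Assumptions: opcode in the high 3 bits, numbered in the order listed in this article.
OPCODES = ["get", "set", "alc", "fre", "adr", "drf", "nnd", "rot"]


def encode(mnemonic: str, operand: int) -> int:
    """Pack a mnemonic and a 5-bit symbol-table index into one byte."""
    if not 0 <= operand < 32:
        raise ValueError("the operand is a 5-bit symbol-table index (0-31)")
    return (OPCODES.index(mnemonic) << 5) | operand


def decode(byte: int) -> tuple:
    """Split a byte back into (mnemonic, symbol-table index)."""
    return OPCODES[byte >> 5], byte & 0b11111
```

Under this layout, every one of the 256 byte values decodes to a valid instruction, and for example encode("rot", 31) is 0xFF.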

Get

The get x command copies a single byte from where x is pointing to where vpt is pointing.

Set

The set x command will make x point to the same memory location as vpt.

Allocate

The alc x command will allocate x bytes and make vpt point to the beginning of the newly allocated memory.

Free

The fre x command frees the allocated memory x is pointing to.

Address

The adr x command writes nine bytes to where vpt is pointing. These bytes point towards x and can later be dereferenced by the drf command. The last eight bytes store the actual memory address, and the first byte is an unsigned offset from that address.

Dereference

The drf x command makes vpt point to the address stored in the first nine bytes that x points to. It reverses the adr command.
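The nine-byte pointer format used by adr and drf can be modelled with Python's struct module. This article gives the field order (one offset byte, then an eight-byte address) but not the byte order or exactly how the offset is applied, so this sketch assumes little-endian storage and a target of address + offset. The helper names adr_bytes and drf_target are hypothetical.

```python
import struct

# "<" = little-endian with no padding; "B" = 1 unsigned byte, "Q" = 8-byte unsigned int.
# Assumption: little-endian storage, target = address + offset.
POINTER_FORMAT = "<BQ"  # 1 + 8 = 9 bytes


def adr_bytes(address: int, offset: int = 0) -> bytes:
    """Build the 9 bytes adr would write: offset byte followed by the address."""
    return struct.pack(POINTER_FORMAT, offset, address)


def drf_target(encoded: bytes) -> int:
    """Recover the address drf would make vpt point to from the first 9 bytes."""
    offset, address = struct.unpack(POINTER_FORMAT, encoded[:9])
    return address + offset
```

Round-tripping adr_bytes(0x1000, 3) through drf_target gives 0x1003, mirroring how drf reverses adr.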

Nand

The nnd x command performs a bitwise nand on one byte from vpt and one byte from x, and stores the result where vpt is pointing.
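A byte-level model of nnd is shown below. Python integers are unbounded, so the result of ~ is masked back to eight bits. In step_nnd (a hypothetical helper), vpt and x are plain indices into a bytearray standing in for the addresses those symbols point to.

```python
def nnd(a: int, b: int) -> int:
    """Bitwise NAND of two bytes; the mask keeps the result within 8 bits."""
    return ~(a & b) & 0xFF


def step_nnd(memory: bytearray, vpt: int, x: int) -> None:
    """Model of `nnd x`: the byte at vpt becomes (byte at vpt) NAND (byte at x)."""
    memory[vpt] = nnd(memory[vpt], memory[x])


mem = bytearray([0b1100, 0b1010])
step_nnd(mem, 0, 1)  # mem[0] becomes 0b11110111 (0xF7); mem[1] is unchanged
```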

Rotate

The rot x command rotates the byte vpt is pointing to one bit to the right, x times.
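An 8-bit right rotation moves bits shifted off the low end back in at the high end. The sketch below treats the rotation count as an ordinary integer. Rotating eight times returns the original byte, so the count is taken modulo 8.

```python
def rot(value: int, times: int) -> int:
    """Rotate an 8-bit value right by `times` positions."""
    times %= 8  # eight rotations bring a byte back to where it started
    return ((value >> times) | (value << (8 - times))) & 0xFF


# Rotating 1 right by one step moves the low bit to the top: 0b10000000.
top_bit = rot(1, 1)
```

Since rotating right by 7 is the same as rotating left by 1, a left shift can be emulated as rot(v, 7) with the low bit cleared afterwards.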

Syntax

Each command must be on a separate line, and there's very little syntactic sugar in MASM.

Comments

Writing cmt at the beginning of a line will cause the rest of the line to be a comment.

Labels

Labels in MASM are simply pointers to commands. A label can be defined by writing lab x, where x is the name of the label and instructions are written beneath the label. The compiler will automatically allocate enough space for the instructions written below such labels. drf can be used to get the memory address of such labels, and set ipt can be used afterwards to move execution. Execution always starts in a label named main in a file named "main.masm".

Identifiers

Identifiers in minimal assembly are simply indices in the symbol table. There's no way to distinguish between functions, variables or pointers.

Files

Half of the symbol table is unique to each file. The names of these identifiers must be written at the top of a file to be usable within it. Files can import labels (but not regular variables) from other files this way. A variable can have an "alias", meaning that it has two names. In that case, both names must be written on the same line.

Labels imported from other files aren't shared in the sense that they always have the same value. They simply start pointing to the same memory address, but changing where they're pointing to using set in one file won't impact any other file.

Global variables

Global variables are the same for all files, and must be declared in a file named globals.masm in the same way that local variables are declared. Changing the location a global variable points to in one file will change it for all other files too. Built-in identifiers count as global variables, so you can't have more than eleven (sixteen minus five built-ins) user defined globals.

Example programs

Even the simplest programs in minimal assembly require quite a few lines of code, which is why the example programs below are links to other pages.

Cat program

The following program prints out the input. This is quite simple in MASM, as you can just make out point to in.

Cat program

Print 'A'

This program first constructs the number 65 (the ASCII value of 'A'). Then it creates a two-byte buffer, storing 65 in the first byte and a null terminator in the second. Finally, it makes out point to this buffer so that it gets printed.

Print A
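The linked program's exact steps aren't reproduced here, but one possible way to reach 65 from the built-in one using only nand and rotation is shown below as a Python model: rotate 1 right by two places to get 64, then OR in 1, with OR built from nand. The zero terminator is derived the same way. This is an illustrative sketch, not the program on the linked page.

```python
def nnd(a: int, b: int) -> int:
    """Bitwise NAND of two bytes."""
    return ~(a & b) & 0xFF


def rot(value: int, times: int) -> int:
    """Rotate an 8-bit value right by `times` positions."""
    times %= 8
    return ((value >> times) | (value << (8 - times))) & 0xFF


def bit_or(a: int, b: int) -> int:
    """a OR b == (NOT a) NAND (NOT b), with NOT x == x NAND x."""
    return nnd(nnd(a, a), nnd(b, b))


ONE = 1                                  # the byte the built-in `one` points to
sixty_four = rot(ONE, 2)                 # 0b00000001 -> 0b01000000
sixty_five = bit_or(sixty_four, ONE)     # 0b01000001 == ord("A")
all_ones = nnd(ONE, nnd(ONE, ONE))       # 1 NAND (NOT 1) == 0xFF
zero = nnd(all_ones, all_ones)           # NOT 0xFF == 0, the null terminator
buffer = bytes([sixty_five, zero])       # two-byte, null-terminated "A"
```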

Bitwise operators

The nand gate is functionally complete, so any other boolean (or in this case, bitwise) gate can be created with it. The following programs implement all the possible bitwise operators for up to two unordered arguments. The first two bytes of input (only one in bitwise not) are used as arguments, and the result is printed to the screen (although the result won't make much sense to human eyes).

Bitwise not

Bitwise or

Bitwise and

Bitwise nor

Bitwise xnor

Bitwise xor
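The standard nand constructions behind these operators can be checked in Python. Each function below uses only nnd, mirroring how a MASM program would chain nnd instructions. The MASM programs on the linked pages may order or reuse intermediate results differently.

```python
def nnd(a: int, b: int) -> int:
    """Bitwise NAND of two bytes."""
    return ~(a & b) & 0xFF


def bit_not(a: int) -> int:
    return nnd(a, a)                      # NOT a == a NAND a


def bit_and(a: int, b: int) -> int:
    return bit_not(nnd(a, b))             # AND is NAND followed by NOT


def bit_or(a: int, b: int) -> int:
    return nnd(bit_not(a), bit_not(b))    # De Morgan: (NOT a) NAND (NOT b)


def bit_nor(a: int, b: int) -> int:
    return bit_not(bit_or(a, b))


def bit_xor(a: int, b: int) -> int:
    t = nnd(a, b)                         # classic four-nand XOR
    return nnd(nnd(a, t), nnd(b, t))


def bit_xnor(a: int, b: int) -> int:
    return bit_not(bit_xor(a, b))
```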

If/else statement

This program contains an if-else label as well as print-A and print-B, both of which can be used as arguments in if-else.

If/else statement
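Since MASM has no conditional jump instruction, a branch has to be built from data. One general technique (not necessarily the one the linked program uses) is a branchless select: spread any set bit of the condition byte across the whole byte with OR and rotations, then use that mask to pick between two byte strings, such as two stored addresses. In MASM, the chosen address could then be dereferenced and assigned to ipt. The Python model below uses only nand and rotation. The names to_mask and select are hypothetical.

```python
def nnd(a: int, b: int) -> int:
    return ~(a & b) & 0xFF


def rot(value: int, times: int) -> int:
    times %= 8
    return ((value >> times) | (value << (8 - times))) & 0xFF


def bit_not(a: int) -> int:
    return nnd(a, a)


def bit_and(a: int, b: int) -> int:
    return bit_not(nnd(a, b))


def bit_or(a: int, b: int) -> int:
    return nnd(bit_not(a), bit_not(b))


def to_mask(cond: int) -> int:
    """0x00 if cond is zero, else 0xFF: OR-fold the byte with rotations of 1, 2 and 4."""
    m = cond
    for step in (1, 2, 4):
        m = bit_or(m, rot(m, step))
    return m


def select(cond: int, if_true: bytes, if_false: bytes) -> bytes:
    """Pick if_true when cond is nonzero, else if_false, byte by byte."""
    mask = to_mask(cond)
    return bytes(
        bit_or(bit_and(t, mask), bit_and(f, bit_not(mask)))
        for t, f in zip(if_true, if_false)
    )
```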

Addition

This program uses an if/else statement and logic similar to that of the bitwise operators to add two numbers together. It adds the first digit of the input to the second one and prints the result. For instance, 45 outputs 9, because 4 + 5 equals 9. It only works with numbers whose sum has a single digit.

Addition
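The linked program's exact steps aren't reproduced here, but the general idea can be sketched in Python: a ripple-carry adder built only from nand and rotation. XOR gives the sum bits, AND gives the carries, and a left shift is emulated by rotating right seven times and clearing the low bit. Because the inputs are ASCII digits, the sketch subtracts ord("0") once from their sum, adding 0xD0, which equals -0x30 modulo 256. The names add and add_digits are hypothetical.

```python
def nnd(a: int, b: int) -> int:
    return ~(a & b) & 0xFF


def rot(value: int, times: int) -> int:
    times %= 8
    return ((value >> times) | (value << (8 - times))) & 0xFF


def bit_not(a: int) -> int:
    return nnd(a, a)


def bit_and(a: int, b: int) -> int:
    return bit_not(nnd(a, b))


def bit_xor(a: int, b: int) -> int:
    t = nnd(a, b)
    return nnd(nnd(a, t), nnd(b, t))


def shift_left(v: int) -> int:
    """Rotating right 7 times == rotating left once; then clear the wrapped-in bit."""
    return bit_and(rot(v, 7), 0xFE)


def add(a: int, b: int) -> int:
    """8-bit addition (mod 256) using only gate-level operations."""
    while b:
        carry = bit_and(a, b)   # positions that overflow
        a = bit_xor(a, b)       # sum without carries
        b = shift_left(carry)   # carries move one bit up
    return a


def add_digits(text: str) -> str:
    """'45' -> '9': add two ASCII digits, then remove one ord('0') bias."""
    total = add(ord(text[0]), ord(text[1]))
    return chr(add(total, 0xD0))  # 0xD0 == -0x30 (mod 256)
```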

Subtraction

This program is similar to the addition program. The subtract function simply makes arg1 negative before adding it to arg. It subtracts the second character of input from the first one and prints the result to the screen. For example, 97 outputs 2, because 9 - 7 = 2. It only works for single-digit numbers with a positive difference.

Subtraction
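"Making a number negative" on bytes usually means two's complement: invert every bit and add 1. The Python sketch below assumes that is the negation used and reuses a gate-level adder like the one in the addition example. Subtracting two ASCII digits cancels their common ord("0") bias, so the sketch adds it back before printing. The names neg, sub and subtract_digits are hypothetical.

```python
def nnd(a: int, b: int) -> int:
    return ~(a & b) & 0xFF


def rot(value: int, times: int) -> int:
    times %= 8
    return ((value >> times) | (value << (8 - times))) & 0xFF


def bit_not(a: int) -> int:
    return nnd(a, a)


def bit_and(a: int, b: int) -> int:
    return bit_not(nnd(a, b))


def bit_xor(a: int, b: int) -> int:
    t = nnd(a, b)
    return nnd(nnd(a, t), nnd(b, t))


def add(a: int, b: int) -> int:
    """8-bit ripple-carry addition (mod 256)."""
    while b:
        carry = bit_and(a, b)
        a = bit_xor(a, b)
        b = bit_and(rot(carry, 7), 0xFE)  # shift carries left by one
    return a


def neg(b: int) -> int:
    """Two's-complement negation: invert all bits, then add 1 (assumed method)."""
    return add(bit_not(b), 1)


def sub(a: int, b: int) -> int:
    return add(a, neg(b))


def subtract_digits(text: str) -> str:
    """'97' -> '2': the ord('0') biases cancel, so add one back for printing."""
    return chr(add(sub(ord(text[0]), ord(text[1])), ord("0")))
```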

Computational class

MASM is probably Turing complete due to the functional completeness of the nand gate, the possibility to create theoretically unbounded arrays or linked lists with pointers, and the if- and goto statements that can be combined to create different types of control flow.

External resources