ByteByteJump

ByteByteJump is an extremely simple One Instruction Set Computer (OISC). Its single instruction copies 1 byte from one memory location to another, and then performs an unconditional jump.

An instruction consists of 3 addresses stored consecutively in memory: A, B, C. A is the source address, B is the destination address, and C is the jump address. N.B.: ByteByteJump uses byte addressing.

ByteByteJump has no ALU, but arithmetic operations and conditional jumps can still be performed by using self-modifying code and lookup tables (see Example). Despite its apparent simplicity, ByteByteJump actually belongs to the computational class of real microprocessors: the Linear bounded automaton.

WordWordJump is the larger family of machines to which ByteByteJump belongs. An X*Y-bit WordWordJump machine has Y-bit data words and X*Y-bit address words, where X must be ≥ 2 for the machine to be able to compute. The optimal value for X (as explained below) seems to be 3.

ByteByte/Jump is ByteByteJump's sister machine. It splits the single instruction of ByteByteJump into two for improved code density.

Virtual Machine implementation
Here's an implementation in C of a ByteByteJump VM with 4-byte addresses:

    uint8_t mem[MEMSIZE];
    uint32_t *pc = (uint32_t *)mem;

    for (;;) {
        mem[pc[1]] = mem[pc[0]];           /* copy one byte from A to B */
        pc = (uint32_t *)(mem + pc[2]);    /* jump to C */
    }

The VM above will run forever. Instead, we could quit upon reaching a "do-nothing-forever" instruction, i.e. one that:
 * has the same address for both source and destination, and
 * jumps back to itself.
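A minimal sketch of such a halting check, assuming 4-byte addresses in host byte order as in the VM above. The names `bbj_run` and `read_addr`, the bounds checks, and the `UINT32_MAX` error value are illustrative additions for this sketch, not part of any reference implementation.

```c
#include <stdint.h>
#include <string.h>

/* Read a 4-byte address in host byte order.
 * memcpy is used so that unaligned addresses are safe. */
static uint32_t read_addr(const uint8_t *mem, uint32_t at)
{
    uint32_t a;
    memcpy(&a, mem + at, sizeof a);
    return a;
}

/* Run from address `pc` until a "do-nothing-forever" instruction is reached.
 * Returns the address of that instruction, or UINT32_MAX if an address
 * falls outside memory (a safety check assumed for this sketch). */
uint32_t bbj_run(uint8_t *mem, uint32_t memsize, uint32_t pc)
{
    for (;;) {
        if (memsize < 12 || pc > memsize - 12)
            return UINT32_MAX;

        uint32_t src = read_addr(mem, pc);
        uint32_t dst = read_addr(mem, pc + 4);
        if (src >= memsize || dst >= memsize)
            return UINT32_MAX;

        mem[dst] = mem[src];

        /* The jump address is read after the move, because the move
         * may have modified it (self-modifying code). */
        uint32_t next = read_addr(mem, pc + 8);

        /* Same source and destination, and a jump back to itself. */
        if (src == dst && next == pc)
            return pc;

        pc = next;
    }
}
```

A caller would load a program into `mem`, call `bbj_run(mem, memsize, 0)`, and inspect memory once the function returns.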

WordWordJump
ByteByteJump can be considered to be a member of the larger family of WordWordJump machines. The simplest possible WordWordJump machine is BitBitJump which moves 1 bit at a time. We define an X*Y-bit WordWordJump machine to have X words per address, and Y bits per word. Thus, a 32-bit ByteByteJump machine could also be referred to as a 4*8-bit WordWordJump machine.

Optimal number of words per address?
How many data words should an address word optimally consist of? I.e. what should X be for an optimal X*Y-bit WordWordJump machine? Since we have no ALU, arithmetic and logic operations have to be performed by way of table lookups, which requires X to be ≥ 2 for the machine to function at all, since a table indexed by two full operand words needs addresses at least two words wide. Having at least 3 words per address simplifies things further by allowing us to directly index into a 3-dimensional array of type [opcode][operand][operand]. On the other hand, X should be as small as possible to maximize code density. So it seems 3 is the magic number, which means that an (in this sense) optimal ByteByteJump machine is one with 24-bit (3-byte) addresses.

It would perhaps seem more natural to have 4-byte addresses for a ByteByteJump VM running on a 32-bit host, and with 24-bit addresses we can only access 16 MiB (vs 4 GiB using 32-bit addresses). But if we need a bigger address range, we can instead keep the 3-words-per-address format and increase the word size. A 3*10-bit WordWordJump machine, for example, can address 1 GiWords, and a 3*12-bit one can address 64 GiWords.
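The arithmetic above can be checked with a short sketch. The function names `table_addr` and `addressable_words` are hypothetical helpers introduced here for illustration. The first shows that a 3-word address, most significant word first, is the same thing as an index into a [opcode][operand][operand] array.

```c
#include <stdint.h>

/* Flat address of table entry [op][a][b] on a 3*Y-bit machine, where each
 * index is one Y-bit word and the address is stored most significant
 * word first. Assumes 3 * ybits < 64. */
uint64_t table_addr(unsigned ybits, uint64_t op, uint64_t a, uint64_t b)
{
    return (op << (2 * ybits)) | (a << ybits) | b;
}

/* Number of words an X*Y-bit machine can address: 2^(X*Y).
 * Assumes x * y < 64. */
uint64_t addressable_words(unsigned x, unsigned y)
{
    return (uint64_t)1 << (x * y);
}
```

For instance, `table_addr(8, 0x03, 0x12, 0x34)` yields 031234h, and `addressable_words(3, 10)` yields 2^30, i.e. 1 GiWords.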

Here's a 3*10-bit WordWordJump VM:

    uint16_t *mem, *pc;
    ...
    for (;;) {
        /* each address is three 10-bit words, most significant word first */
        mem[pc[3]<<20 | pc[4]<<10 | pc[5]] = mem[pc[0]<<20 | pc[1]<<10 | pc[2]];
        pc = mem + (pc[6]<<20 | pc[7]<<10 | pc[8]);
    }

The above machine uses 16 bits to store each 10-bit word. Alternatively, you could pack three 10-bit words into a 32-bit integer.
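One possible packing is sketched below, assuming the most significant word goes in the highest bits and the top two bits of each 32-bit cell stay unused. The names `pack3x10` and `unpack3x10` are hypothetical.

```c
#include <stdint.h>

/* Pack three 10-bit words (most significant first) into bits 29..0. */
uint32_t pack3x10(uint16_t w0, uint16_t w1, uint16_t w2)
{
    return ((uint32_t)(w0 & 0x3FF) << 20) |
           ((uint32_t)(w1 & 0x3FF) << 10) |
            (uint32_t)(w2 & 0x3FF);
}

/* Extract word i (0 = most significant, 2 = least) from a packed cell. */
uint16_t unpack3x10(uint32_t cell, unsigned i)
{
    return (uint16_t)((cell >> (10 * (2 - i))) & 0x3FF);
}
```

With this layout, a whole address fits in one cell when it starts on a cell boundary, but a word address no longer equals an array index: under this sketch's assumptions, word w would live in cell w / 3 at slot w % 3.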

The two-instruction ByteByte/Jump
We could split the aggregate move-and-jump instruction into its constituent parts: move and jump, using a 1-bit opcode. This would result in improved code density, the flipside being a halved address range and some added instruction decoding work.

Let's use the most significant address bit as the opcode: 0=move byte, 1=jump. Since we now have 1 bit less for the source and jump addresses, the destination msb no longer serves any real purpose. But we'll put that bit to good use by adding support for a variable number of destinations. How? By defining a destination msb of 0 to mean that this is the last destination, and a 1 to mean that there are more destinations to follow.

To distinguish this new 2-opcode ByteByteJump dialect from the original opcode-less one, we'll put in a forward slash and call it ByteByte/Jump. We analogously define an X*Y-bit WordWord/Jump machine to have X*Y-bit instruction words (but only X*Y-1 address bits, since the most significant bit is used for the opcode / destination endmarker).

Since we now have separate move and jump instructions, we could simply halt on detecting an infinite one-instruction loop: A: Jump A
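The decoding rules above can be sketched as a small VM. This is one reading of the description, assuming 3-byte big-endian instruction words. The names `bbsj_run` and `read24`, the bounds checks, and the `UINT32_MAX` error value are additions for this sketch.

```c
#include <stdint.h>

#define MSB       0x800000u   /* opcode bit / "more destinations" bit */
#define ADDR_MASK 0x7FFFFFu   /* the remaining 23 address bits */

/* Read a 3-byte big-endian instruction word. */
static uint32_t read24(const uint8_t *m, uint32_t at)
{
    return (uint32_t)m[at] << 16 | (uint32_t)m[at + 1] << 8 | m[at + 2];
}

/* Run from `pc` until a jump-to-self ("A: Jump A") is reached.
 * Returns the address of that jump, or UINT32_MAX if an address
 * falls outside memory (a safety check assumed for this sketch). */
uint32_t bbsj_run(uint8_t *mem, uint32_t memsize, uint32_t pc)
{
    for (;;) {
        if (memsize < 3 || pc > memsize - 3)
            return UINT32_MAX;
        uint32_t w = read24(mem, pc);

        if (w & MSB) {                      /* opcode 1: jump */
            uint32_t target = w & ADDR_MASK;
            if (target == pc)
                return pc;                  /* infinite one-instruction loop */
            pc = target;
            continue;
        }

        if (w >= memsize)                   /* opcode 0: move byte */
            return UINT32_MAX;
        uint8_t value = mem[w];

        uint32_t more;
        do {                                /* one or more destinations */
            pc += 3;
            if (pc > memsize - 3)
                return UINT32_MAX;
            uint32_t d = read24(mem, pc);
            more = d & MSB;                 /* 1 = more destinations follow */
            d &= ADDR_MASK;
            if (d >= memsize)
                return UINT32_MAX;
            mem[d] = value;
        } while (more);

        pc += 3;                            /* continue after last destination */
    }
}
```

The source byte is read once, before any destination is written. The text does not say what happens when a destination overlaps the instruction being executed, so that ordering is a choice made for this sketch.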

Example: Subtract and jump if negative
Suppose we have the following values stored in memory:

    Address        | Value
    ---------------+------
    000800..00087F | 01
    000880..0008FF | 02
    01XXYY         | XX
    02XXYY         | YY
    03XXYY         | XX-YY

Then the following ByteByteJump program (using 3-byte addresses) will take the byte value at address 100h, subtract the byte value at address 200h, store the resulting byte value at address 300h, and jump to address 400h if the result was negative (≥ 80h). The big-endian and little-endian versions differ only in a few destination addresses, namely those that target the first or last byte of a 3-byte address field.

    Big-endian version           | Little-endian version
    -----------------------------+-----------------------------
    000000: 000100 000013 000009 | 000000: 000100 000013 000009
    000009: 000200 000014 000012 | 000009: 000200 000012 000012
    000012: 030000 000300 00001B | 000012: 030000 000300 00001B
    00001B: 000300 000026 000024 | 00001B: 000300 000024 000024
    000024: 000800 00002D 00002D | 000024: 000800 00002F 00002D
    00002D: 003F36 000035 000000 | 00002D: 003F36 000033 000000
    000036: 000000 000000 000400 | 000036: 000000 000000 000400
    00003F: ...... ...... ...... | 00003F: ...... ...... ......
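To see the program work, here is a minimal sketch that builds the lookup tables, loads the big-endian version, and runs it. The function name `sub_jump_if_negative` and the helpers are hypothetical. The listing leaves the code at 3Fh and 400h unspecified, so this sketch assumes a "do-nothing-forever" instruction at each of them in order to detect where execution ends.

```c
#include <stdint.h>
#include <string.h>

enum { MEMSIZE = 0x040000 };    /* large enough for the 03XXYY table */

static uint32_t read24(const uint8_t *m, uint32_t at)
{
    return (uint32_t)m[at] << 16 | (uint32_t)m[at + 1] << 8 | m[at + 2];
}

static void write24(uint8_t *m, uint32_t at, uint32_t v)
{
    m[at]     = (uint8_t)(v >> 16);
    m[at + 1] = (uint8_t)(v >> 8);
    m[at + 2] = (uint8_t)v;
}

/* Store one instruction (source, destination, jump) at address `at`. */
static void put(uint8_t *m, uint32_t at, uint32_t a, uint32_t b, uint32_t c)
{
    write24(m, at, a);
    write24(m, at + 3, b);
    write24(m, at + 6, c);
}

/* Returns 1 if execution ends at 400h (negative result), 0 if it ends
 * at 3Fh, and -1 if an address falls outside memory. */
int sub_jump_if_negative(uint8_t x, uint8_t y, uint8_t *result)
{
    static uint8_t mem[MEMSIZE];
    memset(mem, 0, sizeof mem);

    /* Lookup tables from the table of memory contents. */
    for (unsigned i = 0; i < 0x100; i++)
        mem[0x000800 + i] = (i < 0x80) ? 0x01 : 0x02;
    for (unsigned a = 0; a < 0x100; a++)
        for (unsigned b = 0; b < 0x100; b++) {
            uint32_t ab = a << 8 | b;
            mem[0x010000 | ab] = (uint8_t)a;         /* 01XXYY -> XX    */
            mem[0x020000 | ab] = (uint8_t)b;         /* 02XXYY -> YY    */
            mem[0x030000 | ab] = (uint8_t)(a - b);   /* 03XXYY -> XX-YY */
        }

    /* Program, big-endian column of the listing. */
    put(mem, 0x00, 0x000100, 0x000013, 0x000009);
    put(mem, 0x09, 0x000200, 0x000014, 0x000012);
    put(mem, 0x12, 0x030000, 0x000300, 0x00001B);
    put(mem, 0x1B, 0x000300, 0x000026, 0x000024);
    put(mem, 0x24, 0x000800, 0x00002D, 0x00002D);
    put(mem, 0x2D, 0x003F36, 0x000035, 0x000000);
    put(mem, 0x36, 0x000000, 0x000000, 0x000400);

    /* Assumed for this sketch: halting instructions at both exits. */
    put(mem, 0x3F,  0x000000, 0x000000, 0x00003F);
    put(mem, 0x400, 0x000000, 0x000000, 0x000400);

    mem[0x100] = x;
    mem[0x200] = y;

    uint32_t pc = 0;
    for (;;) {
        if (pc > MEMSIZE - 9)
            return -1;
        uint32_t src = read24(mem, pc), dst = read24(mem, pc + 3);
        if (src >= MEMSIZE || dst >= MEMSIZE)
            return -1;
        mem[dst] = mem[src];
        uint32_t next = read24(mem, pc + 6);   /* read after the move */
        if (src == dst && next == pc)
            break;                             /* do-nothing-forever */
        pc = next;
    }
    *result = mem[0x300];
    return pc == 0x400;
}
```

With x = 05h and y = 07h, the byte stored at 300h is FEh and execution ends at 400h. With x = 09h and y = 04h, the result is 05h and execution ends at 3Fh.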

Below is the previous ByteByteJump example rewritten in ByteByte/Jump machine code. The big-endian and little-endian versions differ only in the destination words that target the first or last byte of a 3-byte instruction word.

    Big-endian version           | Little-endian version
    -----------------------------+-----------------------------
    000000: 000100 00000D        | 000000: 000100 00000D
    000006: 000200 00000E        | 000006: 000200 00000C
    00000C: 030000 800300 000017 | 00000C: 030000 800300 000015
    000015: 000800 00001B        | 000015: 000800 00001D
    00001B: 002724 000023        | 00001B: 002724 000021
    000021: 800000               | 000021: 800000
    000024: 800400               | 000024: 800400
    000027: ......               | 000027: ......

The ByteByteJump version takes up 63 bytes, while the ByteByte/Jump version takes up 39 bytes. At address 00000C in the ByteByte/Jump program, notice the use of multiple (in this case 2) destinations for the move instruction.

X*Y-bit move machines
A similar architecture to the X*Y-bit WordWordJump is the X*Y-bit Move machine. Just as with WordWordJump, X must be greater than or equal to 2 for the machine to be able to work without an ALU. This type of machine has only 2 address operands: source and destination. The jump address (or program counter) is instead mapped to some fixed location in memory.

Eugene Styer created two machines of this type in 1996:
 * 1) Byte Move with 16-bit addresses and byte-addressable memory
 * 2) As above, but with 24-bit addresses

Both of these machines are described on a single page, from which you can also download a simulator with an integrated macro assembler and some example programs.

As mentioned at the bottom of that page: "It should be noted that with the exception of Word Move, none of the machines has any requirement for 'magic' locations except for the program counter and ordinary memory-mapped I/O devices (although Byte Move 24-bit uses some, those are for convience and not necessity). This is in contrast to some Move machine designs that use special addresses for addition, subtraction, etc. (write A to 0010, B to 0020, read A+B from 0030)."

External resources

 * NAP (no ALU processor): the great communicator
 * One Instruction Computers - How Low Can You Go?
 * IBM 1620, a.k.a the CADET - "Can't Add, Doesn't Even Try"