Y86

Y86 is a toy RISC CPU instruction set for educational purposes. It was invented before 1996 as a companion to the book The Art of Assembly Language, to illustrate the basic principles of how a CPU works and how to write programs for it.

In older editions of the book, Y86 was called x86. Apparently the architecture we now know as x86 wasn't called that back then; the book instead calls that architecture 80x86. Later editions of the book mostly call this language Y86, but in a few places the text still uses the old name.

The language is supposedly implemented by four hypothetical CPUs with different performance characteristics, called 886, 8286, 8486, and 8686. The book gives execution times of the instructions (in clock cycles) for the 886, and some information about how much pipelining the other CPUs do.

Architecture
Y86 accesses a single memory of bytes with a 16-bit address space. The CPU is little-endian.
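A minimal Python sketch of this memory model: 65536 bytes, with 16-bit words stored low byte first. The class name `Memory` and its methods are hypothetical, and the behaviour of a word access at address 0xFFFF (here, wrapping around so the high byte comes from address 0) is an assumption, since the book's description does not cover it.

```python
class Memory:
    """Byte-addressed memory with a 16-bit address space (65536 bytes).

    16-bit words are little-endian: low byte at addr, high byte at addr+1.
    """

    SIZE = 1 << 16

    def __init__(self) -> None:
        self.data = bytearray(self.SIZE)

    def read_word(self, addr: int) -> int:
        addr &= 0xFFFF
        # Assumption: the high byte of a word at 0xFFFF comes from address 0.
        hi_addr = (addr + 1) & 0xFFFF
        return self.data[addr] | (self.data[hi_addr] << 8)

    def write_word(self, addr: int, value: int) -> None:
        addr &= 0xFFFF
        hi_addr = (addr + 1) & 0xFFFF  # same wraparound assumption as read_word
        self.data[addr] = value & 0xFF
        self.data[hi_addr] = (value >> 8) & 0xFF
```

For example, after `write_word(0x1000, 0x1234)` the byte at 0x1000 is 0x34 and the byte at 0x1001 is 0x12.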

The Y86 registers include
 * an instruction pointer,
 * a comparison indicator (arithmetic status flags, condition code) whose state is one of above, equal, or below,
 * four 16-bit general purpose registers called AX, BX, CX, DX,
 * and possibly some save registers supporting interrupt handling routines, whose workings I do not know.

Y86 instructions are one or three bytes long: an opcode byte, optionally followed by a 16-bit operand. Instructions are executed sequentially from lower to higher addresses, except when a jump instruction, an interrupt, or a return from interrupt is executed.

Instruction set

 * or reg, reg/mem/imm: computes bitwise or
 * and reg, reg/mem/imm: computes bitwise and
 * cmp reg, reg/mem/imm: compares unsigned integers
 * sub reg, reg/mem/imm: subtracts from a register
 * add reg, reg/mem/imm: adds to a register
 * mov reg, reg/mem/imm: loads into a register
 * mov mem, reg: stores from a register to memory
 * not reg/mem: computes bitwise complement
 * je addr16: jump if equal
 * jne addr16: jump if not equal
 * jb addr16: jump if below
 * jbe addr16: jump if below or equal
 * ja addr16: jump if above
 * jae addr16: jump if above or equal
 * jmp addr16: jump unconditionally
 * brk: pause program execution until continued from debug console
 * iret: return from interrupt service routine
 * halt: end program execution
 * get: wait for an input integer from the debug console and load it into AX
 * put: write AX as output to the debug console

The move and arithmetic instructions work only with 16-bit integer values. The sub instruction subtracts the second (source) operand from the first (destination) register and writes the result to the first (destination) register.

If the first (destination) operand of cmp is greater than the second when both are treated as unsigned integers, cmp sets the comparison indicator to above, so the ja, jae, and jne instructions take the branch. If the second operand is the greater one, cmp sets the indicator to below, so jb, jbe, and jne take the branch. If the two operands are equal, the indicator is set to equal, so jae, jbe, and je take the branch.
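The comparison rule and the branch conditions can be summarized in a short Python sketch. The names `Cond`, `compare`, and `branch_taken` are hypothetical. Both operands are masked to 16 bits, so a value such as 0xFFFF compares above 1, as an unsigned comparison requires.

```python
from enum import Enum


class Cond(Enum):
    """The three possible states of the comparison indicator."""
    BELOW = "below"
    EQUAL = "equal"
    ABOVE = "above"


def compare(dest: int, src: int) -> Cond:
    """cmp dest, src: compare the operands as unsigned 16-bit integers."""
    dest &= 0xFFFF
    src &= 0xFFFF
    if dest > src:
        return Cond.ABOVE
    if dest < src:
        return Cond.BELOW
    return Cond.EQUAL


# Indicator states for which each conditional jump takes its branch.
TAKEN = {
    "je": {Cond.EQUAL},
    "jne": {Cond.ABOVE, Cond.BELOW},
    "jb": {Cond.BELOW},
    "jbe": {Cond.BELOW, Cond.EQUAL},
    "ja": {Cond.ABOVE},
    "jae": {Cond.ABOVE, Cond.EQUAL},
}


def branch_taken(mnemonic: str, state: Cond) -> bool:
    """Return True if the jump `mnemonic` branches in indicator `state`."""
    if mnemonic == "jmp":  # unconditional
        return True
    return state in TAKEN[mnemonic]
```

Note that jne is the only conditional jump taken for both above and below, which is why it appears in two of the three cases above.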

(This description uses the assembly syntax of the older edition of the book, where the destination operand is written first. The newer edition writes the two operands in the opposite order.)

Addressing modes
Ordinary instructions (or, and, cmp, sub, add, mov) have two operands. The first operand is used as both source and destination, and must be a register. The second operand can be a register, one of the three memory addressing modes listed below, or an immediate. Thus, ordinary instructions can load from memory but can't store there.

The second operand of ordinary instructions can have eight forms:
 * AX, BX, CX, DX: one of the registers
 * [BX]: the two adjacent bytes at the memory address given by the BX register
 * [disp16+BX]: the two adjacent bytes at the memory address given by the sum of the BX register and a 16-bit offset encoded in the instruction
 * [addr16]: the two adjacent bytes at a 16-bit direct address encoded in the instruction
 * imm16: 16-bit immediate value encoded in the instruction
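A sketch of how the second operand could be fetched, numbering the eight forms 0 to 7 in the order listed above, which is also the order of the low three bits of the opcode in the encoding table. The function name is hypothetical, `read_word` stands in for a little-endian 16-bit memory read, and wrapping the [disp16+BX] sum modulo 2^16 is an assumption.

```python
from typing import Callable, Dict

REGS = ("AX", "BX", "CX", "DX")


def read_second_operand(
    mmm: int,
    regs: Dict[str, int],
    read_word: Callable[[int], int],
    operand16: int = 0,
) -> int:
    """Return the value of the second operand of an ordinary instruction.

    mmm selects the form: 0-3 = AX, BX, CX, DX; 4 = [BX];
    5 = [disp16+BX]; 6 = [addr16]; 7 = imm16.
    operand16 is the 16-bit word encoded after the opcode (forms 5-7 only).
    """
    if 0 <= mmm < 4:
        return regs[REGS[mmm]]
    if mmm == 4:
        return read_word(regs["BX"])
    if mmm == 5:
        # Assumption: the address sum wraps modulo 2**16.
        return read_word((operand16 + regs["BX"]) & 0xFFFF)
    if mmm == 6:
        return read_word(operand16)
    if mmm == 7:
        return operand16 & 0xFFFF
    raise ValueError(f"invalid addressing mode {mmm}")
```

Only forms 4 to 6 touch memory, and only for reading; this is the sense in which ordinary instructions can load from memory but not store there.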

The exception is that the mov instruction has a store form that has the two operands backwards: the first operand is one of the three memory addressing modes above, and the second operand is a register.

The not instruction works differently: it has a single operand that can be a register or one of the three memory addressing modes, and uses that single operand as both source and destination. This instruction can thus do load-modify-store, unlike the other arithmetic instructions.
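A minimal sketch of `not` as a load-modify-store on a register or memory word, assuming a 65536-byte memory holding little-endian words. The function name is hypothetical, and both the wrapping of the [disp16+BX] sum and the wraparound of the high byte at 0xFFFF are assumptions.

```python
from typing import Dict

REGS = ("AX", "BX", "CX", "DX")


def execute_not(mmm: int, regs: Dict[str, int], mem: bytearray, operand16: int = 0) -> None:
    """Complement a register or memory word in place (Y86 `not`).

    mmm: 0-3 = AX..DX, 4 = [BX], 5 = [disp16+BX], 6 = [addr16].
    mem is a 65536-byte array holding little-endian 16-bit words.
    """
    if 0 <= mmm < 4:
        name = REGS[mmm]
        regs[name] = ~regs[name] & 0xFFFF
        return
    if mmm == 4:
        addr = regs["BX"]
    elif mmm == 5:
        addr = (operand16 + regs["BX"]) & 0xFFFF  # assumption: wraps mod 2**16
    elif mmm == 6:
        addr = operand16 & 0xFFFF
    else:
        raise ValueError("not has no immediate form")
    hi_addr = (addr + 1) & 0xFFFF  # assumption: wraps at the top of memory
    value = mem[addr] | (mem[hi_addr] << 8)  # load
    value = ~value & 0xFFFF                  # modify
    mem[addr] = value & 0xFF                 # store low byte
    mem[hi_addr] = value >> 8                # store high byte
```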

The jump and conditional jump instructions all take a constant 16-bit jump target address. This points to the first byte of the next instruction to execute.

Instruction encoding
00
01
02
03        brk
04        iret
05        halt
06        get
07        put
08 aa aa  je aaaa
09 aa aa  jne aaaa
0A aa aa  jb aaaa
0B aa aa  jbe aaaa
0C aa aa  ja aaaa
0D aa aa  jae aaaa
0E aa aa  jmp aaaa
0F
10        not AX
11        not BX
12        not CX
13        not DX
14        not [BX]
15 dd dd  not [dddd+BX]
16 dd dd  not [dddd]
17
18
19
1A
1B
1C
1D
1E
1F
20        or  AX, AX
21        or  AX, BX
22        or  AX, CX
23        or  AX, DX
24        or  AX, [BX]
25 dd dd  or  AX, [dddd+BX]
26 dd dd  or  AX, [dddd]
27 ii ii  or  AX, iiii
28        or  BX, AX
29        or  BX, BX
2A        or  BX, CX
2B        or  BX, DX
2C        or  BX, [BX]
2D dd dd  or  BX, [dddd+BX]
2E dd dd  or  BX, [dddd]
2F ii ii  or  BX, iiii
30        or  CX, AX
31        or  CX, BX
32        or  CX, CX
33        or  CX, DX
34        or  CX, [BX]
35 dd dd  or  CX, [dddd+BX]
36 dd dd  or  CX, [dddd]
37 ii ii  or  CX, iiii
38        or  DX, AX
39        or  DX, BX
3A        or  DX, CX
3B        or  DX, DX
3C        or  DX, [BX]
3D dd dd  or  DX, [dddd+BX]
3E dd dd  or  DX, [dddd]
3F ii ii  or  DX, iiii
40        and AX, AX
41        and AX, BX
42        and AX, CX
43        and AX, DX
44        and AX, [BX]
45 dd dd  and AX, [dddd+BX]
46 dd dd  and AX, [dddd]
47 ii ii  and AX, iiii
48        and BX, AX
49        and BX, BX
4A        and BX, CX
4B        and BX, DX
4C        and BX, [BX]
4D dd dd  and BX, [dddd+BX]
4E dd dd  and BX, [dddd]
4F ii ii  and BX, iiii
50        and CX, AX
51        and CX, BX
52        and CX, CX
53        and CX, DX
54        and CX, [BX]
55 dd dd  and CX, [dddd+BX]
56 dd dd  and CX, [dddd]
57 ii ii  and CX, iiii
58        and DX, AX
59        and DX, BX
5A        and DX, CX
5B        and DX, DX
5C        and DX, [BX]
5D dd dd  and DX, [dddd+BX]
5E dd dd  and DX, [dddd]
5F ii ii  and DX, iiii
60        cmp AX, AX
61        cmp AX, BX
62        cmp AX, CX
63        cmp AX, DX
64        cmp AX, [BX]
65 dd dd  cmp AX, [dddd+BX]
66 dd dd  cmp AX, [dddd]
67 ii ii  cmp AX, iiii
68        cmp BX, AX
69        cmp BX, BX
6A        cmp BX, CX
6B        cmp BX, DX
6C        cmp BX, [BX]
6D dd dd  cmp BX, [dddd+BX]
6E dd dd  cmp BX, [dddd]
6F ii ii  cmp BX, iiii
70        cmp CX, AX
71        cmp CX, BX
72        cmp CX, CX
73        cmp CX, DX
74        cmp CX, [BX]
75 dd dd  cmp CX, [dddd+BX]
76 dd dd  cmp CX, [dddd]
77 ii ii  cmp CX, iiii
78        cmp DX, AX
79        cmp DX, BX
7A        cmp DX, CX
7B        cmp DX, DX
7C        cmp DX, [BX]
7D dd dd  cmp DX, [dddd+BX]
7E dd dd  cmp DX, [dddd]
7F ii ii  cmp DX, iiii
80        sub AX, AX
81        sub AX, BX
82        sub AX, CX
83        sub AX, DX
84        sub AX, [BX]
85 dd dd  sub AX, [dddd+BX]
86 dd dd  sub AX, [dddd]
87 ii ii  sub AX, iiii
88        sub BX, AX
89        sub BX, BX
8A        sub BX, CX
8B        sub BX, DX
8C        sub BX, [BX]
8D dd dd  sub BX, [dddd+BX]
8E dd dd  sub BX, [dddd]
8F ii ii  sub BX, iiii
90        sub CX, AX
91        sub CX, BX
92        sub CX, CX
93        sub CX, DX
94        sub CX, [BX]
95 dd dd  sub CX, [dddd+BX]
96 dd dd  sub CX, [dddd]
97 ii ii  sub CX, iiii
98        sub DX, AX
99        sub DX, BX
9A        sub DX, CX
9B        sub DX, DX
9C        sub DX, [BX]
9D dd dd  sub DX, [dddd+BX]
9E dd dd  sub DX, [dddd]
9F ii ii  sub DX, iiii
A0        add AX, AX
A1        add AX, BX
A2        add AX, CX
A3        add AX, DX
A4        add AX, [BX]
A5 dd dd  add AX, [dddd+BX]
A6 dd dd  add AX, [dddd]
A7 ii ii  add AX, iiii
A8        add BX, AX
A9        add BX, BX
AA        add BX, CX
AB        add BX, DX
AC        add BX, [BX]
AD dd dd  add BX, [dddd+BX]
AE dd dd  add BX, [dddd]
AF ii ii  add BX, iiii
B0        add CX, AX
B1        add CX, BX
B2        add CX, CX
B3        add CX, DX
B4        add CX, [BX]
B5 dd dd  add CX, [dddd+BX]
B6 dd dd  add CX, [dddd]
B7 ii ii  add CX, iiii
B8        add DX, AX
B9        add DX, BX
BA        add DX, CX
BB        add DX, DX
BC        add DX, [BX]
BD dd dd  add DX, [dddd+BX]
BE dd dd  add DX, [dddd]
BF ii ii  add DX, iiii
C0        mov AX, AX
C1        mov AX, BX
C2        mov AX, CX
C3        mov AX, DX
C4        mov AX, [BX]
C5 dd dd  mov AX, [dddd+BX]
C6 dd dd  mov AX, [dddd]
C7 ii ii  mov AX, iiii
C8        mov BX, AX
C9        mov BX, BX
CA        mov BX, CX
CB        mov BX, DX
CC        mov BX, [BX]
CD dd dd  mov BX, [dddd+BX]
CE dd dd  mov BX, [dddd]
CF ii ii  mov BX, iiii
D0        mov CX, AX
D1        mov CX, BX
D2        mov CX, CX
D3        mov CX, DX
D4        mov CX, [BX]
D5 dd dd  mov CX, [dddd+BX]
D6 dd dd  mov CX, [dddd]
D7 ii ii  mov CX, iiii
D8        mov DX, AX
D9        mov DX, BX
DA        mov DX, CX
DB        mov DX, DX
DC        mov DX, [BX]
DD dd dd  mov DX, [dddd+BX]
DE dd dd  mov DX, [dddd]
DF ii ii  mov DX, iiii
E0
E1
E2
E3
E4        mov [BX], AX
E5 dd dd  mov [dddd+BX], AX
E6 dd dd  mov [dddd], AX
E7
E8
E9
EA
EB
EC        mov [BX], BX
ED dd dd  mov [dddd+BX], BX
EE dd dd  mov [dddd], BX
EF
F0
F1
F2
F3
F4        mov [BX], CX
F5 dd dd  mov [dddd+BX], CX
F6 dd dd  mov [dddd], CX
F7
F8
F9
FA
FB
FC        mov [BX], DX
FD dd dd  mov [dddd+BX], DX
FE dd dd  mov [dddd], DX
FF
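The encoding table follows a regular bit layout: the top three bits of the opcode select the instruction class (0 = special, 1 = or, 2 = and, 3 = cmp, 4 = sub, 5 = add, 6 = mov load, 7 = mov store), the next two bits select a register, and the low three bits select the operand form in the order AX, BX, CX, DX, [BX], [dddd+BX], [dddd], iiii. The following disassembler is a sketch derived from that table; the function names are hypothetical, opcodes the table leaves blank are reported as None and assumed to be one byte long, and the 16-bit operand is assumed to be stored little-endian like the rest of memory.

```python
from typing import Optional, Tuple

REGS = ("AX", "BX", "CX", "DX")
ORDINARY = {1: "or", 2: "and", 3: "cmp", 4: "sub", 5: "add", 6: "mov"}
NO_OPERAND = {3: "brk", 4: "iret", 5: "halt", 6: "get", 7: "put"}
JUMPS = ("je", "jne", "jb", "jbe", "ja", "jae", "jmp")


def _word(code: bytes, ip: int) -> int:
    # Assumption: the 16-bit operand after the opcode is little-endian.
    return code[ip + 1] | (code[ip + 2] << 8)


def _operand(mmm: int, code: bytes, ip: int) -> Tuple[str, int]:
    """Format operand form mmm (0..7); return (text, instruction length)."""
    if mmm < 4:
        return REGS[mmm], 1
    if mmm == 4:
        return "[BX]", 1
    w = _word(code, ip)
    if mmm == 5:
        return f"[{w:04X}+BX]", 3
    if mmm == 6:
        return f"[{w:04X}]", 3
    return f"{w:04X}", 3  # mmm == 7: immediate


def disassemble(code: bytes, ip: int = 0) -> Tuple[Optional[str], int]:
    """Decode the instruction at code[ip] and return (text, length)."""
    op = code[ip]
    iii, rr, mmm = op >> 5, (op >> 3) & 0b11, op & 0b111
    if iii == 0:  # special instructions
        if rr == 0:
            return NO_OPERAND.get(mmm), 1
        if rr == 1 and mmm < 7:
            return f"{JUMPS[mmm]} {_word(code, ip):04X}", 3
        if rr == 2 and mmm < 7:
            text, length = _operand(mmm, code, ip)
            return f"not {text}", length
        return None, 1
    if iii == 7:  # store form: mov mem, reg (memory forms only)
        if 4 <= mmm <= 6:
            text, length = _operand(mmm, code, ip)
            return f"mov {text}, {REGS[rr]}", length
        return None, 1
    src, length = _operand(mmm, code, ip)
    return f"{ORDINARY[iii]} {REGS[rr]}, {src}", length
```

Because execution is sequential, a whole program can be decoded by starting at its first byte and adding each returned length to `ip`. For example, the bytes `25 10 00` decode to `or AX, [0010+BX]` with length 3.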

Unknown details
The description in the book doesn't specify the language completely. The book comes with an interpreter for Win16, and it should be possible to determine the missing details by studying that interpreter.

At least the following details of the operation are unknown:
 * How do instructions other than CMP affect the comparison indicator? (The instruction set has between 0 and 16 NOP instructions, depending on this question.)
 * Does the CMP instruction modify its destination register?
 * How does the interrupt mechanism work? That is, what happens when an interrupt is triggered, and when the IRET instruction is executed?
 * What are the rules for self-modifying code? Do you need a taken jump between modifying an instruction and executing it?
 * How is the CPU initialized?
 * How do you bypass the cache for memory-mapped IO and DMA?