Y86
Y86 is a toy RISC CPU instruction set for educational purposes. It was invented before 1996 as a companion to the book The Art of Assembly Language, to illustrate the basic principles of how a CPU works and how to write programs for it.
In older editions of the book, Y86 was called x86. Apparently the architecture now known as x86 was not called that back then; the book calls that architecture 80x86 instead. Later editions of the book mostly call this language Y86, but a few places in the text still use the old name.
The language is supposedly implemented by four hypothetical CPUs with different performance characteristics, called 886, 8286, 8486, and 8686. The book defines execution times of the instructions (measured in clock cycles) for the 886, and gives some information about how much pipelining the other CPUs do.
Architecture
Y86 accesses a single memory of bytes with a 16-bit address space. The CPU is little-endian.
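As a rough illustration of what little-endian access to this memory means, the following Python sketch stores and loads a 16-bit word as two adjacent bytes, low byte first. The helper names `read_word` and `write_word` are made up for this example, and the wrap-around at address 0xFFFF is an assumption, not something the book is known to specify.

```python
MEM_SIZE = 1 << 16  # 16-bit address space: 65536 bytes


def read_word(mem, addr):
    """Read a 16-bit little-endian word: low byte at addr, high byte at addr+1."""
    lo = mem[addr % MEM_SIZE]
    hi = mem[(addr + 1) % MEM_SIZE]  # ASSUMPTION: addresses wrap around at 0xFFFF
    return lo | (hi << 8)


def write_word(mem, addr, value):
    """Write a 16-bit value as two bytes, low byte first."""
    mem[addr % MEM_SIZE] = value & 0xFF
    mem[(addr + 1) % MEM_SIZE] = (value >> 8) & 0xFF  # same wrap-around assumption


mem = bytearray(MEM_SIZE)
write_word(mem, 0x1000, 0x1234)
# The low byte 0x34 now sits at 0x1000 and the high byte 0x12 at 0x1001.
```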
The Y86 registers include
- an instruction pointer,
- a comparison indicator (arithmetic status flags, condition code) whose state can be one of above, equal, or below,
- four 16-bit general purpose registers called AX, BX, CX, DX,
- and possibly some save registers that support interrupt handling routines, whose workings I do not know.
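The register set listed above could be modelled roughly as follows. This is only a sketch: the class name `Y86State` and its field names are invented here, the initial values are assumptions (how the CPU is initialized is not known), and the possible save registers for interrupts are left out.

```python
from dataclasses import dataclass, field


@dataclass
class Y86State:
    # Instruction pointer; the starting value 0 is an ASSUMPTION.
    ip: int = 0
    # Comparison indicator: one of "above", "equal", "below".
    # The initial state "equal" is an ASSUMPTION.
    flag: str = "equal"
    # Four 16-bit general purpose registers.
    regs: dict = field(
        default_factory=lambda: {"AX": 0, "BX": 0, "CX": 0, "DX": 0}
    )
    # Single byte-addressed memory with a 16-bit address space.
    mem: bytearray = field(default_factory=lambda: bytearray(1 << 16))


state = Y86State()
state.regs["AX"] = 0xFFFF  # largest value a 16-bit register can hold
```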
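In the encoding listed below, every Y86 instruction is either one byte long (the opcode alone) or three bytes long (the opcode followed by a 16-bit operand). Instructions are executed sequentially from lower to higher addresses, except when a jump instruction, an interrupt, or a return from interrupt is executed.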
Instruction set
- or reg, reg/mem/imm
  - computes bitwise or
- and reg, reg/mem/imm
  - computes bitwise and
- cmp reg, reg/mem/imm
  - compare unsigned integers
- sub reg, reg/mem/imm
  - subtracts from register
- add reg, reg/mem/imm
  - adds to a register
- mov reg, reg/mem/imm
  - loads to a register
- mov mem, reg
  - stores from a register to memory
- not reg/mem
  - computes bitwise complement
- je addr16
  - jump if equal
- jne addr16
  - jump if not equal
- jb addr16
  - jump if below
- jbe addr16
  - jump if below or equal
- ja addr16
  - jump if above
- jae addr16
  - jump if above or equal
- jmp addr16
  - jump unconditionally
- brk
  - pause program execution until continued from debug console
- iret
  - return from interrupt service routine
- halt
  - end program execution
- get
  - wait for and get input integer from debug console to AX
- put
  - put AX as output to debug console
The move and arithmetic instructions work only with 16-bit integer values. The sub instruction subtracts the second (source) operand from the first (destination) register and writes the result to the first (destination) register.
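The 16-bit arithmetic described above might be modelled like this in Python. The function names are hypothetical, and wrapping results modulo 2^16 on overflow or underflow is an assumption; the effect of these instructions on the comparison indicator is not modelled at all.

```python
MASK = 0xFFFF  # registers hold 16-bit values


def y86_add(dst, src):
    """add dst, src: the new value of the destination register."""
    return (dst + src) & MASK  # ASSUMPTION: overflow wraps modulo 2**16


def y86_sub(dst, src):
    """sub dst, src: subtract the source from the destination register."""
    return (dst - src) & MASK  # ASSUMPTION: underflow wraps modulo 2**16


def y86_or(dst, src):
    return (dst | src) & MASK


def y86_and(dst, src):
    return (dst & src) & MASK


def y86_not(value):
    """not: bitwise complement restricted to 16 bits."""
    return ~value & MASK
```

Under these assumptions, `y86_sub(5, 7)` gives 0xFFFE and `y86_add(0xFFFF, 1)` gives 0.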
If the cmp instruction finds that its first (destination) operand is unsigned greater than the second, then it sets the comparison indicator to above, thus the ja, jae, jne instructions will take the branch. If the second operand is unsigned greater than the first, then it sets the comparison indicator to below, thus the jb, jbe, jne instructions will take the branch. If the two operands are equal, then the comparison indicator is set to equal, thus the jae, jbe, je instructions will take the branch.
(This description uses the assembly syntax in the older version of the book, where the destination operand is written first. The newer version writes the two operands of instructions swapped.)
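The comparison rules above can be summarized as a small lookup table. In this Python sketch the names `compare`, `TAKEN`, and `branch_taken` are invented for illustration; the operands are treated as unsigned 16-bit integers, as the text describes.

```python
def compare(first, second):
    """Return the comparison indicator state that cmp first, second would set."""
    if first > second:
        return "above"
    if first < second:
        return "below"
    return "equal"


# For each jump mnemonic, the indicator states in which the branch is taken.
TAKEN = {
    "je": {"equal"},
    "jne": {"above", "below"},
    "jb": {"below"},
    "jbe": {"below", "equal"},
    "ja": {"above"},
    "jae": {"above", "equal"},
    "jmp": {"above", "equal", "below"},  # unconditional
}


def branch_taken(mnemonic, indicator):
    return indicator in TAKEN[mnemonic]


# 0xFFFF is compared as the unsigned value 65535, not as -1.
indicator = compare(0xFFFF, 1)
```

Here `indicator` is "above", so `ja`, `jae`, and `jne` would take the branch.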
Addressing modes
Ordinary instructions (or, and, cmp, sub, add, mov) have two operands. The first operand is used as both source and destination, and must be a register. The second operand can be a register, one of the three memory addressing modes listed below, or an immediate. Thus, ordinary instructions can load from memory but can't store there.
The second operand of ordinary instructions can have eight forms:
- AX, BX, CX, DX
  - one of the registers
- [BX]
  - two adjacent bytes at memory address given by the BX register
- [disp16+BX]
  - two adjacent bytes at memory address given by the sum of the BX register and a 16-bit offset encoded in the instruction
- [addr16]
  - two adjacent bytes at 16-bit direct address encoded in the instruction
- imm16
  - 16-bit immediate value encoded in the instruction
The exception is that the mov instruction has a store form that has the two operands backwards: the first operand is one of the three memory addressing modes above, and the second operand is a register.
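As a sketch of how the second operand could be resolved, the following Python code computes the effective address for the three memory modes and fetches the source value. The function names and the string labels for the modes are made up for this example; wrapping the sum disp16+BX to 16 bits and wrapping the second byte's address at 0xFFFF are assumptions.

```python
def effective_address(mode, regs, operand16=0):
    """Address used by one of the three memory addressing modes."""
    if mode == "[BX]":
        return regs["BX"]
    if mode == "[disp16+BX]":
        return (operand16 + regs["BX"]) & 0xFFFF  # ASSUMPTION: sum wraps to 16 bits
    if mode == "[addr16]":
        return operand16
    raise ValueError(f"not a memory addressing mode: {mode}")


def read_source(mode, regs, mem, operand16=0):
    """Value of the second (source) operand of an ordinary instruction."""
    if mode in regs:          # AX, BX, CX, DX
        return regs[mode]
    if mode == "imm16":       # immediate encoded in the instruction
        return operand16
    addr = effective_address(mode, regs, operand16)
    # Two adjacent bytes, little-endian; ASSUMPTION: address wraps at 0xFFFF.
    return mem[addr] | (mem[(addr + 1) & 0xFFFF] << 8)


regs = {"AX": 0, "BX": 0x0100, "CX": 0, "DX": 7}
mem = bytearray(1 << 16)
mem[0x0104], mem[0x0105] = 0x34, 0x12   # the word 0x1234 stored at 0x0104
value = read_source("[disp16+BX]", regs, mem, operand16=4)
```

With BX = 0x0100 and a displacement of 4, the operand is read from address 0x0104, so `value` is 0x1234.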
The not instruction works differently: it has a single operand that can be a register or one of the three memory addressing modes, and uses that single operand as both source and destination. This instruction can thus do load-modify-store, unlike the other arithmetic instructions.
The jump and conditional jump instructions all take a constant 16-bit jump target address. This points to the first byte of the next instruction to execute.
Instruction encoding
00        <invalid>
01        <invalid>
02        <invalid>
03        brk
04        iret
05        halt
06        get
07        put
08 aa aa  je aaaa
09 aa aa  jne aaaa
0A aa aa  jb aaaa
0B aa aa  jbe aaaa
0C aa aa  ja aaaa
0D aa aa  jae aaaa
0E aa aa  jmp aaaa
0F        <invalid>
10        not AX
11        not BX
12        not CX
13        not DX
14        not [BX]
15 dd dd  not [dddd+BX]
16 dd dd  not [dddd]
17        <invalid>
18        <invalid>
19        <invalid>
1A        <invalid>
1B        <invalid>
1C        <invalid>
1D        <invalid>
1E        <invalid>
1F        <invalid>
20        or AX, AX
21        or AX, BX
22        or AX, CX
23        or AX, DX
24        or AX, [BX]
25 dd dd  or AX, [dddd+BX]
26 dd dd  or AX, [dddd]
27 ii ii  or AX, iiii
28        or BX, AX
29        or BX, BX
2A        or BX, CX
2B        or BX, DX
2C        or BX, [BX]
2D dd dd  or BX, [dddd+BX]
2E dd dd  or BX, [dddd]
2F ii ii  or BX, iiii
30        or CX, AX
31        or CX, BX
32        or CX, CX
33        or CX, DX
34        or CX, [BX]
35 dd dd  or CX, [dddd+BX]
36 dd dd  or CX, [dddd]
37 ii ii  or CX, iiii
38        or DX, AX
39        or DX, BX
3A        or DX, CX
3B        or DX, DX
3C        or DX, [BX]
3D dd dd  or DX, [dddd+BX]
3E dd dd  or DX, [dddd]
3F ii ii  or DX, iiii
40        and AX, AX
41        and AX, BX
42        and AX, CX
43        and AX, DX
44        and AX, [BX]
45 dd dd  and AX, [dddd+BX]
46 dd dd  and AX, [dddd]
47 ii ii  and AX, iiii
48        and BX, AX
49        and BX, BX
4A        and BX, CX
4B        and BX, DX
4C        and BX, [BX]
4D dd dd  and BX, [dddd+BX]
4E dd dd  and BX, [dddd]
4F ii ii  and BX, iiii
50        and CX, AX
51        and CX, BX
52        and CX, CX
53        and CX, DX
54        and CX, [BX]
55 dd dd  and CX, [dddd+BX]
56 dd dd  and CX, [dddd]
57 ii ii  and CX, iiii
58        and DX, AX
59        and DX, BX
5A        and DX, CX
5B        and DX, DX
5C        and DX, [BX]
5D dd dd  and DX, [dddd+BX]
5E dd dd  and DX, [dddd]
5F ii ii  and DX, iiii
60        cmp AX, AX
61        cmp AX, BX
62        cmp AX, CX
63        cmp AX, DX
64        cmp AX, [BX]
65 dd dd  cmp AX, [dddd+BX]
66 dd dd  cmp AX, [dddd]
67 ii ii  cmp AX, iiii
68        cmp BX, AX
69        cmp BX, BX
6A        cmp BX, CX
6B        cmp BX, DX
6C        cmp BX, [BX]
6D dd dd  cmp BX, [dddd+BX]
6E dd dd  cmp BX, [dddd]
6F ii ii  cmp BX, iiii
70        cmp CX, AX
71        cmp CX, BX
72        cmp CX, CX
73        cmp CX, DX
74        cmp CX, [BX]
75 dd dd  cmp CX, [dddd+BX]
76 dd dd  cmp CX, [dddd]
77 ii ii  cmp CX, iiii
78        cmp DX, AX
79        cmp DX, BX
7A        cmp DX, CX
7B        cmp DX, DX
7C        cmp DX, [BX]
7D dd dd  cmp DX, [dddd+BX]
7E dd dd  cmp DX, [dddd]
7F ii ii  cmp DX, iiii
80        sub AX, AX
81        sub AX, BX
82        sub AX, CX
83        sub AX, DX
84        sub AX, [BX]
85 dd dd  sub AX, [dddd+BX]
86 dd dd  sub AX, [dddd]
87 ii ii  sub AX, iiii
88        sub BX, AX
89        sub BX, BX
8A        sub BX, CX
8B        sub BX, DX
8C        sub BX, [BX]
8D dd dd  sub BX, [dddd+BX]
8E dd dd  sub BX, [dddd]
8F ii ii  sub BX, iiii
90        sub CX, AX
91        sub CX, BX
92        sub CX, CX
93        sub CX, DX
94        sub CX, [BX]
95 dd dd  sub CX, [dddd+BX]
96 dd dd  sub CX, [dddd]
97 ii ii  sub CX, iiii
98        sub DX, AX
99        sub DX, BX
9A        sub DX, CX
9B        sub DX, DX
9C        sub DX, [BX]
9D dd dd  sub DX, [dddd+BX]
9E dd dd  sub DX, [dddd]
9F ii ii  sub DX, iiii
A0        add AX, AX
A1        add AX, BX
A2        add AX, CX
A3        add AX, DX
A4        add AX, [BX]
A5 dd dd  add AX, [dddd+BX]
A6 dd dd  add AX, [dddd]
A7 ii ii  add AX, iiii
A8        add BX, AX
A9        add BX, BX
AA        add BX, CX
AB        add BX, DX
AC        add BX, [BX]
AD dd dd  add BX, [dddd+BX]
AE dd dd  add BX, [dddd]
AF ii ii  add BX, iiii
B0        add CX, AX
B1        add CX, BX
B2        add CX, CX
B3        add CX, DX
B4        add CX, [BX]
B5 dd dd  add CX, [dddd+BX]
B6 dd dd  add CX, [dddd]
B7 ii ii  add CX, iiii
B8        add DX, AX
B9        add DX, BX
BA        add DX, CX
BB        add DX, DX
BC        add DX, [BX]
BD dd dd  add DX, [dddd+BX]
BE dd dd  add DX, [dddd]
BF ii ii  add DX, iiii
C0        mov AX, AX
C1        mov AX, BX
C2        mov AX, CX
C3        mov AX, DX
C4        mov AX, [BX]
C5 dd dd  mov AX, [dddd+BX]
C6 dd dd  mov AX, [dddd]
C7 ii ii  mov AX, iiii
C8        mov BX, AX
C9        mov BX, BX
CA        mov BX, CX
CB        mov BX, DX
CC        mov BX, [BX]
CD dd dd  mov BX, [dddd+BX]
CE dd dd  mov BX, [dddd]
CF ii ii  mov BX, iiii
D0        mov CX, AX
D1        mov CX, BX
D2        mov CX, CX
D3        mov CX, DX
D4        mov CX, [BX]
D5 dd dd  mov CX, [dddd+BX]
D6 dd dd  mov CX, [dddd]
D7 ii ii  mov CX, iiii
D8        mov DX, AX
D9        mov DX, BX
DA        mov DX, CX
DB        mov DX, DX
DC        mov DX, [BX]
DD dd dd  mov DX, [dddd+BX]
DE dd dd  mov DX, [dddd]
DF ii ii  mov DX, iiii
E0        <invalid>
E1        <invalid>
E2        <invalid>
E3        <invalid>
E4        mov [BX], AX
E5 dd dd  mov [dddd+BX], AX
E6 dd dd  mov [dddd], AX
E7        <invalid>
E8        <invalid>
E9        <invalid>
EA        <invalid>
EB        <invalid>
EC        mov [BX], BX
ED dd dd  mov [dddd+BX], BX
EE dd dd  mov [dddd], BX
EF        <invalid>
F0        <invalid>
F1        <invalid>
F2        <invalid>
F3        <invalid>
F4        mov [BX], CX
F5 dd dd  mov [dddd+BX], CX
F6 dd dd  mov [dddd], CX
F7        <invalid>
F8        <invalid>
F9        <invalid>
FA        <invalid>
FB        <invalid>
FC        mov [BX], DX
FD dd dd  mov [dddd+BX], DX
FE dd dd  mov [dddd], DX
FF        <invalid>
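The opcode table above follows a regular pattern: the top three bits of the opcode select the instruction group, the next two bits select a register, and the low three bits select the operand form. The Python sketch below decodes opcodes according to that pattern, as read off the table. The function names are invented, and storing the 16-bit operand low byte first is an assumption based on the CPU being little-endian.

```python
REGS = ["AX", "BX", "CX", "DX"]
OPS = {1: "or", 2: "and", 3: "cmp", 4: "sub", 5: "add", 6: "mov"}
ZERO_OPERAND = {0x03: "brk", 0x04: "iret", 0x05: "halt", 0x06: "get", 0x07: "put"}
JUMPS = ["je", "jne", "jb", "jbe", "ja", "jae", "jmp"]


def operand_text(mode, word):
    """Render the 3-bit mode field; modes 5 to 7 use the 16-bit operand."""
    if mode < 4:
        return REGS[mode]
    return ["[BX]", f"[{word:04X}+BX]", f"[{word:04X}]", f"{word:04X}"][mode - 4]


def decode(code, pos=0):
    """Return (text, length) for the instruction at code[pos]; text is None if invalid."""
    op = code[pos]
    # ASSUMPTION: the 16-bit operand is stored low byte first.
    word = code[pos + 1] | (code[pos + 2] << 8) if pos + 2 < len(code) else 0
    group, reg, mode = op >> 5, (op >> 3) & 3, op & 7
    length = 3 if mode >= 5 else 1  # modes 5 to 7 carry a 16-bit operand
    if op in ZERO_OPERAND:
        return ZERO_OPERAND[op], 1
    if 0x08 <= op <= 0x0E:
        return f"{JUMPS[op - 0x08]} {word:04X}", 3
    if 0x10 <= op <= 0x16:
        return f"not {operand_text(mode, word)}", length
    if group in OPS:
        return f"{OPS[group]} {REGS[reg]}, {operand_text(mode, word)}", length
    if group == 7 and 4 <= mode <= 6:  # store form of mov
        return f"mov {operand_text(mode, word)}, {REGS[reg]}", length
    return None, 1


def disassemble(code):
    """Decode a byte string instruction by instruction."""
    out, pos = [], 0
    while pos < len(code):
        text, length = decode(code, pos)
        out.append(text or "<invalid>")
        pos += length
    return out


# Hand-assembled from the table: get / add AX, 0001 / put / halt
program = bytes.fromhex("06A701000705")
listing = disassemble(program)
```

For this six-byte program, `listing` is `["get", "add AX, 0001", "put", "halt"]`.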
Unknown details
The description in the book doesn't specify the language completely. The book comes with an interpreter for win16, and it should be possible to find out the missing details by studying that interpreter.
At least the following details of the operation are unknown:
- How do instructions other than CMP affect the comparison indicator? (The instruction set has between 0 and 16 NOP instructions, depending on this question.)
- Does the CMP instruction modify its destination register?
- How does the interrupt mechanism work? That is, what happens when an interrupt is triggered, and when the IRET instruction is executed?
- What are the rules for self-modifying instructions? Do you need a taken jump between modifying an instruction and executing it?
- How is the CPU initialized?
- How do you bypass the cache for memory-mapped IO and DMA?
References
- Homepage of The Art of Assembly Language
- The Art of Assembly Language DOS 16-bit edition, chapter 3, PDF (this chapter defines Y86)
- The Art of Assembly Language DOS 16-bit edition, chapter 3.3, HTML (this starts to define Y86)
- The Art of Assembly Language Windows 32-bit edition, chapter 2.5, PDF (this chapter defines Y86)
- The Art of Assembly Language Windows 32-bit edition, chapter 2.5, HTML (this chapter defines Y86)
- Software for The Art of Assembly Language (includes a debugging Y86 interpreter for win16 and examples of Y86 assembly code in the CH03 directory)