Y86

From Esolang

Y86 is a toy RISC CPU instruction set for educational purposes. It was invented before 1996 as a companion to the book The Art of Assembly Language, to illustrate the basic principles of how a CPU works and how to write programs for it.

In older editions of the book, Y86 was called x86. Apparently the architecture now known as x86 was not called that at the time; the book calls that architecture 80x86 instead. Later editions of the book mostly call this language Y86, but a few places in the text still use the old name.

The language is supposedly implemented by four hypothetical CPUs with different performance characteristics, called 886, 8286, 8486, and 8686. The book defines the execution times of the instructions (measured in clock cycles) for the 886, and gives some information about how much pipelining the other CPUs do.

Architecture

Y86 accesses a single memory of bytes with a 16-bit address space. The CPU is little-endian.
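
As a rough illustration of this memory model, the following Python sketch stores 16-bit words in a 64 KiB byte array in little-endian order. The helper names read16 and write16 are made up for this example, and wrapping the address around at 0xFFFF is an assumption, not something the description above specifies.

```python
MEM_SIZE = 1 << 16  # 16-bit address space, one byte per address


def read16(mem, addr):
    """Read a 16-bit word: low byte at addr, high byte at addr + 1."""
    lo = mem[addr & 0xFFFF]
    hi = mem[(addr + 1) & 0xFFFF]  # assumption: addresses wrap at 0xFFFF
    return lo | (hi << 8)


def write16(mem, addr, value):
    """Write a 16-bit word in little-endian byte order."""
    mem[addr & 0xFFFF] = value & 0xFF
    mem[(addr + 1) & 0xFFFF] = (value >> 8) & 0xFF  # same wrap assumption


mem = bytearray(MEM_SIZE)
write16(mem, 0x1000, 0x1234)
```

After the call to write16, the byte at address 0x1000 holds 0x34 and the byte at 0x1001 holds 0x12.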

The Y86 registers include

  • an instruction pointer,
  • a comparison indicator (arithmetic status flags, condition code) whose state is one of above, equal, or below,
  • four 16-bit general purpose registers called AX, BX, CX, DX,
  • and possibly some save registers that support interrupt handling routines, whose workings are unclear.

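The instructions for Y86 are one to three bytes long; every encoding listed below is either a single opcode byte or an opcode byte followed by a 16-bit operand. Instructions are executed sequentially from lower to higher addresses, except when a jump instruction, an interrupt, or a return from interrupt is executed.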

Instruction set

or reg, reg/mem/imm
computes bitwise or
and reg, reg/mem/imm
computes bitwise and
cmp reg, reg/mem/imm
compare unsigned integers
sub reg, reg/mem/imm
subtracts from register
add reg, reg/mem/imm
adds to a register
mov reg, reg/mem/imm
loads to a register
mov mem, reg
stores from a register to memory
not reg/mem
computes bitwise complement
je addr16
jump if equal
jne addr16
jump if not equal
jb addr16
jump if below
jbe addr16
jump if below or equal
ja addr16
jump if above
jae addr16
jump if above or equal
jmp addr16
jump unconditionally
brk
pause program execution until continued from debug console
iret
return from interrupt service routine
halt
end program execution
get
wait for and get input integer from debug console to AX
put
put AX as output to debug console

The move and arithmetic instructions work only with 16-bit integer values. The sub instruction subtracts the second (source) operand from the first (destination) register and writes the result to the first (destination) register.
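
The effect of the two-operand instructions on the destination register could be sketched in Python as below. The function name alu is hypothetical, and reducing results modulo 2**16 on overflow or underflow is an assumption; the text above only says that the values are 16-bit integers.

```python
MASK = 0xFFFF  # registers hold 16-bit values


def alu(op, dest, src):
    """Return the new destination register value for a two-operand instruction."""
    if op == "or":
        return (dest | src) & MASK
    if op == "and":
        return (dest & src) & MASK
    if op == "sub":
        return (dest - src) & MASK  # assumption: wraps modulo 2**16
    if op == "add":
        return (dest + src) & MASK  # assumption: wraps modulo 2**16
    if op == "mov":
        return src & MASK
    raise ValueError(f"unknown operation: {op}")
```

Under the wraparound assumption, alu("sub", 3, 5) gives 0xFFFE, because the source (5) is subtracted from the destination (3).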

If the cmp instruction finds that its first (destination) operand is greater than the second as an unsigned integer, it sets the comparison indicator to above, so the ja, jae, and jne instructions will take the branch. If the second operand is greater than the first, it sets the comparison indicator to below, so the jb, jbe, and jne instructions will take the branch. If the two operands are equal, it sets the comparison indicator to equal, so the jae, jbe, and je instructions will take the branch.

(This description uses the assembly syntax in the older version of the book, where the destination operand is written first. The newer version writes the two operands of instructions swapped.)
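
The comparison rules above can be summarised in a small Python sketch. The names compare, TAKEN and branch_taken are invented for this illustration; the table simply restates which indicator states make each jump take the branch.

```python
def compare(first, second):
    """Return the comparison indicator state set by cmp first, second."""
    first &= 0xFFFF   # operands are treated as unsigned 16-bit integers
    second &= 0xFFFF
    if first > second:
        return "above"
    if first < second:
        return "below"
    return "equal"


# For each jump mnemonic, the indicator states in which the branch is taken.
TAKEN = {
    "je": {"equal"},
    "jne": {"above", "below"},
    "jb": {"below"},
    "jbe": {"below", "equal"},
    "ja": {"above"},
    "jae": {"above", "equal"},
    "jmp": {"above", "equal", "below"},  # unconditional
}


def branch_taken(mnemonic, indicator):
    """Return True if the given jump takes the branch in this indicator state."""
    return indicator in TAKEN[mnemonic]
```

For example, compare(0xFFFF, 1) yields above because the comparison is unsigned, so ja, jae and jne would take the branch.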

Addressing modes

Ordinary instructions (or, and, cmp, sub, add, mov) have two operands. The first operand is used as both source and destination, and must be a register. The second operand can be a register, one of the three memory addressing modes listed below, or an immediate. Thus, ordinary instructions can load from memory but can't store there.

The second operand of ordinary instructions can have eight forms:

AX, BX, CX, DX
one of the registers
[BX]
two adjacent bytes at memory address given by the BX register
[disp16+BX]
two adjacent bytes at memory address given by the sum of the BX register and a 16-bit offset encoded in the instruction
[addr16]
two adjacent bytes at 16-bit direct address encoded in the instruction
imm16
16-bit immediate value encoded in the instruction

The exception is that the mov instruction has a store form that has the two operands backwards: the first operand is one of the three memory addressing modes above, and the second operand is a register.

The not instruction works differently: it has a single operand that can be a register or one of the three memory addressing modes, and uses that single operand as both source and destination. This instruction can thus do load-modify-store, unlike the other arithmetic instructions.

The jump and conditional jump instructions all take a constant 16-bit jump target address. This points to the first byte of the next instruction to execute.
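
As a minimal sketch of the three memory addressing modes in this section, the following Python function computes the address of the first of the two bytes accessed. The function name effective_address and the mode strings are made up for this example, and wrapping the sum in [disp16+BX] to 16 bits is an assumption.

```python
def effective_address(mode, bx, operand=0):
    """Return the 16-bit address of the first byte a memory operand refers to.

    mode    -- "[BX]", "[disp16+BX]" or "[addr16]"
    bx      -- current value of the BX register
    operand -- the 16-bit value encoded in the instruction, if any
    """
    if mode == "[BX]":
        return bx & 0xFFFF
    if mode == "[disp16+BX]":
        return (bx + operand) & 0xFFFF  # assumption: sum wraps to 16 bits
    if mode == "[addr16]":
        return operand & 0xFFFF
    raise ValueError(f"not a memory addressing mode: {mode}")
```

With BX holding 0x0100, the operand [0x0020+BX] would refer to the two bytes at addresses 0x0120 and 0x0121.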

Instruction encoding

00         <invalid>
01         <invalid>
02         <invalid>
03         brk
04         iret
05         halt
06         get
07         put
08 aa aa   je aaaa
09 aa aa   jne aaaa
0A aa aa   jb aaaa
0B aa aa   jbe aaaa
0C aa aa   ja aaaa
0D aa aa   jae aaaa
0E aa aa   jmp aaaa
0F         <invalid>
10         not AX
11         not BX
12         not CX
13         not DX
14         not [BX]
15 dd dd   not [dddd+BX]
16 dd dd   not [dddd]
17         <invalid>
18         <invalid>
19         <invalid>
1A         <invalid>
1B         <invalid>
1C         <invalid>
1D         <invalid>
1E         <invalid>
1F         <invalid>
20         or  AX, AX
21         or  AX, BX
22         or  AX, CX
23         or  AX, DX
24         or  AX, [BX]
25 dd dd   or  AX, [dddd+BX]
26 dd dd   or  AX, [dddd]
27 ii ii   or  AX, iiii
28         or  BX, AX
29         or  BX, BX
2A         or  BX, CX
2B         or  BX, DX
2C         or  BX, [BX]
2D dd dd   or  BX, [dddd+BX]
2E dd dd   or  BX, [dddd]
2F ii ii   or  BX, iiii
30         or  CX, AX
31         or  CX, BX
32         or  CX, CX
33         or  CX, DX
34         or  CX, [BX]
35 dd dd   or  CX, [dddd+BX]
36 dd dd   or  CX, [dddd]
37 ii ii   or  CX, iiii
38         or  DX, AX
39         or  DX, BX
3A         or  DX, CX
3B         or  DX, DX
3C         or  DX, [BX]
3D dd dd   or  DX, [dddd+BX]
3E dd dd   or  DX, [dddd]
3F ii ii   or  DX, iiii
40         and AX, AX
41         and AX, BX
42         and AX, CX
43         and AX, DX
44         and AX, [BX]
45 dd dd   and AX, [dddd+BX]
46 dd dd   and AX, [dddd]
47 ii ii   and AX, iiii
48         and BX, AX
49         and BX, BX
4A         and BX, CX
4B         and BX, DX
4C         and BX, [BX]
4D dd dd   and BX, [dddd+BX]
4E dd dd   and BX, [dddd]
4F ii ii   and BX, iiii
50         and CX, AX
51         and CX, BX
52         and CX, CX
53         and CX, DX
54         and CX, [BX]
55 dd dd   and CX, [dddd+BX]
56 dd dd   and CX, [dddd]
57 ii ii   and CX, iiii
58         and DX, AX
59         and DX, BX
5A         and DX, CX
5B         and DX, DX
5C         and DX, [BX]
5D dd dd   and DX, [dddd+BX]
5E dd dd   and DX, [dddd]
5F ii ii   and DX, iiii
60         cmp AX, AX
61         cmp AX, BX
62         cmp AX, CX
63         cmp AX, DX
64         cmp AX, [BX]
65 dd dd   cmp AX, [dddd+BX]
66 dd dd   cmp AX, [dddd]
67 ii ii   cmp AX, iiii
68         cmp BX, AX
69         cmp BX, BX
6A         cmp BX, CX
6B         cmp BX, DX
6C         cmp BX, [BX]
6D dd dd   cmp BX, [dddd+BX]
6E dd dd   cmp BX, [dddd]
6F ii ii   cmp BX, iiii
70         cmp CX, AX
71         cmp CX, BX
72         cmp CX, CX
73         cmp CX, DX
74         cmp CX, [BX]
75 dd dd   cmp CX, [dddd+BX]
76 dd dd   cmp CX, [dddd]
77 ii ii   cmp CX, iiii
78         cmp DX, AX
79         cmp DX, BX
7A         cmp DX, CX
7B         cmp DX, DX
7C         cmp DX, [BX]
7D dd dd   cmp DX, [dddd+BX]
7E dd dd   cmp DX, [dddd]
7F ii ii   cmp DX, iiii
80         sub AX, AX
81         sub AX, BX
82         sub AX, CX
83         sub AX, DX
84         sub AX, [BX]
85 dd dd   sub AX, [dddd+BX]
86 dd dd   sub AX, [dddd]
87 ii ii   sub AX, iiii
88         sub BX, AX
89         sub BX, BX
8A         sub BX, CX
8B         sub BX, DX
8C         sub BX, [BX]
8D dd dd   sub BX, [dddd+BX]
8E dd dd   sub BX, [dddd]
8F ii ii   sub BX, iiii
90         sub CX, AX
91         sub CX, BX
92         sub CX, CX
93         sub CX, DX
94         sub CX, [BX]
95 dd dd   sub CX, [dddd+BX]
96 dd dd   sub CX, [dddd]
97 ii ii   sub CX, iiii
98         sub DX, AX
99         sub DX, BX
9A         sub DX, CX
9B         sub DX, DX
9C         sub DX, [BX]
9D dd dd   sub DX, [dddd+BX]
9E dd dd   sub DX, [dddd]
9F ii ii   sub DX, iiii
A0         add AX, AX
A1         add AX, BX
A2         add AX, CX
A3         add AX, DX
A4         add AX, [BX]
A5 dd dd   add AX, [dddd+BX]
A6 dd dd   add AX, [dddd]
A7 ii ii   add AX, iiii
A8         add BX, AX
A9         add BX, BX
AA         add BX, CX
AB         add BX, DX
AC         add BX, [BX]
AD dd dd   add BX, [dddd+BX]
AE dd dd   add BX, [dddd]
AF ii ii   add BX, iiii
B0         add CX, AX
B1         add CX, BX
B2         add CX, CX
B3         add CX, DX
B4         add CX, [BX]
B5 dd dd   add CX, [dddd+BX]
B6 dd dd   add CX, [dddd]
B7 ii ii   add CX, iiii
B8         add DX, AX
B9         add DX, BX
BA         add DX, CX
BB         add DX, DX
BC         add DX, [BX]
BD dd dd   add DX, [dddd+BX]
BE dd dd   add DX, [dddd]
BF ii ii   add DX, iiii
C0         mov AX, AX
C1         mov AX, BX
C2         mov AX, CX
C3         mov AX, DX
C4         mov AX, [BX]
C5 dd dd   mov AX, [dddd+BX]
C6 dd dd   mov AX, [dddd]
C7 ii ii   mov AX, iiii
C8         mov BX, AX
C9         mov BX, BX
CA         mov BX, CX
CB         mov BX, DX
CC         mov BX, [BX]
CD dd dd   mov BX, [dddd+BX]
CE dd dd   mov BX, [dddd]
CF ii ii   mov BX, iiii
D0         mov CX, AX
D1         mov CX, BX
D2         mov CX, CX
D3         mov CX, DX
D4         mov CX, [BX]
D5 dd dd   mov CX, [dddd+BX]
D6 dd dd   mov CX, [dddd]
D7 ii ii   mov CX, iiii
D8         mov DX, AX
D9         mov DX, BX
DA         mov DX, CX
DB         mov DX, DX
DC         mov DX, [BX]
DD dd dd   mov DX, [dddd+BX]
DE dd dd   mov DX, [dddd]
DF ii ii   mov DX, iiii
E0         <invalid>
E1         <invalid>
E2         <invalid>
E3         <invalid>
E4         mov [BX], AX
E5 dd dd   mov [dddd+BX], AX
E6 dd dd   mov [dddd], AX
E7         <invalid>
E8         <invalid>
E9         <invalid>
EA         <invalid>
EB         <invalid>
EC         mov [BX], BX
ED dd dd   mov [dddd+BX], BX
EE dd dd   mov [dddd], BX
EF         <invalid>
F0         <invalid>
F1         <invalid>
F2         <invalid>
F3         <invalid>
F4         mov [BX], CX
F5 dd dd   mov [dddd+BX], CX
F6 dd dd   mov [dddd], CX
F7         <invalid>
F8         <invalid>
F9         <invalid>
FA         <invalid>
FB         <invalid>
FC         mov [BX], DX
FD dd dd   mov [dddd+BX], DX
FE dd dd   mov [dddd], DX
FF         <invalid>
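
The table above is regular: the top three bits of the opcode select the operation, the next two bits select the register, and the low three bits select the form of the other operand. A small Python disassembler built on that observation might look like the sketch below. The function names are hypothetical, and reading the 16-bit operand low byte first is an assumption based on the CPU being little-endian.

```python
REGS = ["AX", "BX", "CX", "DX"]
OPS = {1: "or", 2: "and", 3: "cmp", 4: "sub", 5: "add", 6: "mov"}
NO_OPERAND = {0x03: "brk", 0x04: "iret", 0x05: "halt", 0x06: "get", 0x07: "put"}
JUMPS = {0x08: "je", 0x09: "jne", 0x0A: "jb", 0x0B: "jbe",
         0x0C: "ja", 0x0D: "jae", 0x0E: "jmp"}


def word(code, pos):
    """Read a 16-bit operand (assumption: low byte first)."""
    return code[pos] | (code[pos + 1] << 8)


def disassemble(code, pc=0):
    """Decode the instruction at code[pc].

    Returns (text, length); text is None for an invalid opcode.
    """
    opcode = code[pc]
    if opcode in NO_OPERAND:
        return NO_OPERAND[opcode], 1
    if opcode in JUMPS:
        return f"{JUMPS[opcode]} {word(code, pc + 1):04X}", 3
    group = opcode >> 5            # operation
    reg = REGS[(opcode >> 3) & 3]  # register field
    form = opcode & 7              # form of the other operand
    if group == 0 and not 0x10 <= opcode <= 0x16:
        return None, 1             # invalid opcodes in 00..1F
    if group == 7 and form not in (4, 5, 6):
        return None, 1             # a store needs a memory operand
    if form < 4:
        operand, length = REGS[form], 1
    elif form == 4:
        operand, length = "[BX]", 1
    else:
        value = word(code, pc + 1)
        operand = {5: f"[{value:04X}+BX]", 6: f"[{value:04X}]",
                   7: f"{value:04X}"}[form]
        length = 3
    if group == 0:
        return f"not {operand}", length          # single operand
    if group == 7:
        return f"mov {operand}, {reg}", length   # store form
    return f"{OPS[group]} {reg}, {operand}", length
```

For instance, disassemble(bytes([0xEC])) returns ("mov [BX], BX", 1), matching the table entry for EC.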

Unknown details

The description in the book doesn't specify the language completely. The book comes with an interpreter for win16, and it should be possible to find out the missing details by studying that interpreter.

At least the following details of the operation are unknown:

  • How do instructions other than CMP affect the comparison indicator? (The instruction set has between 0 and 16 NOP instructions, depending on this question.)
  • Does the CMP instruction modify its destination register?
  • How does the interrupt mechanism work? That is, what happens when an interrupt is triggered, and when the IRET instruction is executed?
  • What are the rules for self-modifying instructions? Do you need a taken jump between modifying an instruction and executing it?
  • How is the CPU initialized?
  • How do you bypass the cache for memory-mapped IO and DMA?
