User:Ian/Computer architectures

These are some CPU architectures that I find interesting that have unusual or interesting features.

DEC VAX
The VAX (which you can emulate with the SIMH emulator) has what is probably the most orthogonal instruction set ever invented. It has a single set of registers for integer, floating point, addressing, and all other uses. Like the ARM, but unlike most other computers, the stack pointer and program counter are general purpose registers. In some models of VAX, moves to the PC are the same as jumps (in others, they're reserved operands). Because of the VAX's extreme orthogonality, it allows you to use all addressing modes for all operands for all instructions (which is very rare for any computer, even in orthogonal RISCs you can't use more than one immediate). In fact, because of its orthogonality, in VAX microcode, there's only one routine that decodes all of the specifiers and places their value/address into temporary registers. This allows a single add instruction to have as many addressing modes as an x86 or 6502 with 20,000 different add opcodes!

All VAX math and logical instructions are available in 2 and 3-operand formats for all available data types. I think it's also one of the few CISCs (if not the only CISC) that has 3-operand arithmetic. The VAX has both 2 and 3-operand instructions, with the 2-operand forms responsible for crowding up the opcode space and the "equivalent to...but is 1 byte shorter" notes in the manual. Apparently, in 1977, one byte was pretty valuable. I'm wondering if the VAX is actually the origin of 3-operand arithmetic. MIPS, ALPHA, and ARM were all influenced by the VAX and it's ancestor, the PDP-11. The displacement addressing mode was chosen for MIPS because it was the most common mode on the VAX. The ARM and VAX both use BCC/BCS/BVC/BVS mnemonics derived from the PDP-11 (instead of C/NC and V/NV or O/NO). The Dis virtual machine, used for the Inferno operating system is very VAX-like. Inferno is based on Plan 9, which is derived from Unix, although heavily modified. The VAX was the first 32-bit architecture to run Unix, and the Bell Labs C Machine was also very VAX-like, so it makes sense that they would choose a VAX-like virtual machine.

The VAX's method of organizing opcodes is also very logical. A single mnemonic always corresponds to a single one or two-byte opcode, irrespective of the operands (true orthogonality). All conditionals can be reversed by flipping the lower bit of the opcode. All 2 and 3-operand instructions are the same except for the lower bit, and opcode numbers are grouped by data types. 84 is MULB2, 85 is MULB3, 44 is MULF2, 45 is MULF3, 42 is SUBF2, 82 is SUBB2, etc. G-Floating (extended 64-bit) and H-Floating (128-bit) floating point are the same as the standard 32-bit F-Floating and 64-bit D-Floating, respectively, except for the addition of an FD prefix (eg. 42FD is SUBG2, 43FD is SUBG3). For addressing modes that allow deferral (an extra level of indirection), the low bit of the "mode" nybble in each specifier controls whether the operand is deferred.

It can be used as a stack machine by using the -(sp) (push to stack), (sp)+ (pop from stack), and @(sp)+ (pop from stack indirect) addressing modes, which are just autoincrement/decrement on the stack pointer. Ex. addl2 (sp)+,(sp). With the VAX, you can use immediates for all operands (except for using the Short Literal as a destination). Instructions like addl3 #45,#50,#900 and mull3 a[r2],@(sp)+,5(r7)[r8] are allowed on the VAX. It's the only computer that has 6 general operands that can reference up to 12 registers and modify six in one instruction.

This is perfectly valid VAX assembly code: start: incl i^#0 bneq start On most computers, incrementing a literal like this results in an error. On the VAX, it's perfectly legal. It actually modifies the instruction and sets the flags, so it will become an incl i^#1 the next time it runs. But you need i^#0 instead of #0 because #0 makes a short literal specifier (which has no address, sort of like r0 on CPUs with a zero register, but it has a range of 0-63, or a few positive floating-point numbers). Something like tstl (pc)+ can also be assembled, which is like a literal (same specifier), but reads from part of the following instruction. Technically, the spec[r] indexed mode can have an unlimited number of indexes, but Double-Indexed Mode is prohibited by the microcode.

There are only 4 illegal addressing modes: spec[pc] Index with PC (because multiplying the PC by a shift count isn't very useful) spec[rx1][rx2] Double/Multi-Indexed (because it could take really a long time to decode just one specifier) s^#imm[r] Short Literal Indexed (because it has no address) rn[rx] Register Indexed (because it has no address either) But the PC can be the base register in indexing (with the same restrictions as any other base register), which is the recommended way to do position-independent code. Of course the illegal modes can all be assembled, but they will cause reserved operand exceptions.

A few are "undefined" (work on some VAXen, but not others): -(pc)	PC Autodecrement (because it points to a random part of the previous instruction) pc   PC as destination (probably because of branch prediction or instruction caching) -(rn)[rn] or (rn)+[rn]   Index same as the base, with modify (because the order of modifying and indexing depends on the implementation) r12-r14 (as destination with 64 or 128-bit writes)   SP, AP, or FP (because they would cause PC/r15 to be a destination)

Special registers
r12=Argument Pointer (AP) r13=Frame Pointer (FP) r14=Stack Pointer (SP) r15=Program Counter (PC) Only the PC (just when used as an index or destination) has any special properties for addressing or instruction usage. The other 3 can use any addressing modes. XORing with the stack pointer or using it to hold a floating point number is perfectly fine (unless an interrupt happens).

Some interesting/complex VAX instructions

 * EDITPC converts a number from packed BCD to ASCII using its own mini instruction set (called patterns). It's used extensively in the printf function in VAX BSD for %d formats. Of course, this mini instruction set itself has variable-length instructions!
 * MOVTC: move translated character. Has 6 operands, and uses a table to convert from one encoding to another (basically sed's y/// or the Unix tr command in one instruction). If the destination length is longer than the source length, the fill character (3rd operand) is used.
 * MOVTUC: move translated until character. Has 6 operands, and uses a table like MOVTC, but compares the translated character with an escape character instead of comparing lengths.
 * EDIV: 4 operands. It divides the 64-bit first operand by the 32-bit second operand, and stores the quotient in the 3rd and remainder in the 4th operands. That's two write operands! Both could be immediates, too (though that wouldn't be that useful unless they were accessed as memory operands in another instruction).
 * CASEx has a format like this: selector.rx, base.rx, limit.rx, displ[0].bw, ..., displ[limit].bw The instruction has "limit+3" operands. Not only can the VAX not tell how long the instruction is until it's completely decoded, it can't even tell how many operands it has! The whole thing can be 4GB long. Technically it can be 8GB long, but the VAX only has 4GB of addressing space.
 * DIVP: divrlen.rw, divraddr.ab, divdlen.rw, divdaddr.ab, quolen.rw, quoaddr.ab, {R0-5.wl, -16(SP):-1(SP).wb}! This is the BCD divide instruction, for numbers up to 31 decimal digits plus a sign. It uses 6 operands, 6 registers, and 16 bytes of stack space (the most temporary space out of any VAX instruction). Now that's just crazy.
 * PUSHR and POPR. Like the ARM's LDM and STM, only instead of the register list being part of the instruction, it's an operand specifier. This means it could be immediate, but also on the stack, in a list, in a register, or anywhere else in RAM.
 * EMODH: the longest instruction on the VAX, besides the CASE instructions. As I wrote on Wikipedia, EMODH #5345.1524[r7], @mul_ext_ptr[r0], #3.141592765[r5], @int_table[r1], @frac_table[r2] is 2+18+6+18+6+6, or 56 bytes. The first and third operands consist of a 1-byte specifier, 16-byte H_floating literal, and 1-byte index. The [r5] and [r7] indices are allowed on immediates. The address of the immediate is the base and the given immediate will be used if the index register is 0. The other three operands are 6 bytes each: a 1-byte specifier, 4-byte address, and a 1-byte index. The last 2 operands are also write destinations: the integer portion and fractional portion of the result of the extended multiply.
 * ACBx: limit.rx, add.rx, index.mx, displ.bw. It's an ADDLEQ if the addend is positive, and ADDGEQ if the addend is negative. When the addend is zero it's just a BLEQ. I'm not sure if ACB is Turing Complete because of its change in operation based on sign, but maybe ACB+MNEG (move negated) is Turing Complete.
 * CRC: tbl.ab, inicrc.rl, strlen.rw, stream.ab. Takes a table of coefficients and computes a CRC for up to 64K of data in a single instruction. CRC and EDITPC are so complex that they each make up one of the 12 "major sections" of the VAX instruction set.
 * BBxx: branch on bit. The VAX has instructions for branching based on whether a bit is set or clear, and then optionally clearing or setting the bit regardless of the condition. This is unusual because most non-RISC CPUs with this feature (like the Z80 and 6502) use a separate bit test instruction followed by a conditional branch instead of combining them. Also, the bit is a general specifier, not hardcoded like in the Z80.
 * POLYx: evaluate polynomial. Evaluates a polynomial using a table of coefficients: result = C[0]+x**0 + x*(C[1] + x*(C[2] + ... x*C[d])) and leaves the result in r0-r3. Used for logarithms, trig, square root, and other math functions. It's also an instruction widely cited as an example of the complexity of the VAX.
 * CVTxy: convert. There are instructions available for converting between any of the VAX data types: ASCII to BCD, BCD to integer, integer to floating point, available in all sizes and formats supported by the VAX.
 * ADAWI: add aligned word interlocked. One of only 5 VAX instructions that care about alignment whatsoever (all interlocked).
 * BICx: bit clear. Unlike most computers, the VAX has no logical AND (a&b), just a BIC (a&~b).
 * INSQUE and REMQUE: instructions for inserting and removing elements from a queue (doubly-linked list). Also one of the 3 types of interlocked (multiprocessor-safe) instructions on the VAX (BBxxI and ADAWI are the others).
 * CMPxV, INSV, FFx, and EXTxV: bitfield instructions. The VAX has arithmetic shift, but no logical shift. These bitfield instructions (or a 64-bit arithmetic shift) are used instead. They wrap across into the next register if the field is too big for just one, another odd feature of the VAX.
 * INDEX: subscript.rl, low.rl, high.rl, size.rl, indexin.rl, indexout.wl. Calculates an array index. If subscript is outside of the range, it throws a "subscript range" exception. Indexin is used for multi-dimensional arrays. Commonly cited as an example of how complex instructions can be slower than several simple ones.
 * XFC: extended function call. The user can create their own instruction in microcode and the VAX will decode all the specifiers for them. Some EDITPC patterns are also reserved for users, so it's possible to add a new pattern, which will be treated just like any other pattern, as are some types of specifiers (like PC autodecrement or using the PC as an index register).
 * MOVAx and TSTx: move address, and test. TST is basically a compare with zero that sets the flags on it's source operand. MOVA is used to give the programmer access to the power of addressing modes. Use movaq 5000(r0)[r0],r2 to multiply R0 by 9, add 5000, and store the result in R2. TST is similarly used to increment or decrement a register by a small fixed amount. Use tstl (sp)+ to pop the stack (and set the flags). It's actually more efficient than adding 4 to the SP.
 * CALLG: call with general argument list. Because of confusion by PDP-11 programmers using different methods for calls (saving PC in a register, saving PC on the stack, saving PC in a certain memory location, getting args from the stack, getting args from memory after the call instruction, getting args in specific registers or memory locations, etc.), the VAX has a single calling standard for all operating systems and programming languages, enforced by the hardware. There are two ways to call procedures. In this method, instead of using the stack, arguments are specified in a specific memory location. This makes it easy to implement things like continuations and tail-recursion by keeping an argument list off of the stack where it can be saved for later. Called procedures have no way of knowing (without reading below their own stack frame) whether it was called with CALLG or the normal stack-based CALLS. The argument list can even be copied onto the stack and called with calls (pc)+,procedure where it will be automatically popped.

For a "good" use of the really CISCy VAX instructions, see doprnt.c (actually assembly, not C) from BSD 4.3 (the printf routine). And see libnm for examples of POLY.

Self-replicating MOVC3
movc3 s^#6,b^-4(pc),b^0(pc) This instruction will copy itself forwards through memory, until it hits ROM, I/O space, or other unwritable memory. It's similar to PDP-11's mov -(pc),-(pc) but works forwards in memory instead of backwards. It also alters r0-r5. As far as I know, nobody else has mentioned that the VAX has a self-replicating MOV like this.

Other Neat CPUs
Besides the VAX, there are other CPUs that have interesting, or almost esolang-like features.
 * Hardware objects/malloc/GC: S/38, AS/400 "Future Systems" (128-bit...in 1988!) (check out its instruction set: Materialize User Profile? Free Activation Group-Based Heap Space Storage? Call Program with Variable Length Argument List?), iAPX432, B5000 (stack-based, older than FORTH and inspired it!), hardware Java/.NET/FORTH/LISP processors
 * DSPs: DSP56300 (hardware DO loops), SHARC (hardware COME FROM!!!)
 * Parallel: Transputer, Parallax Propeller, IBM CELL, N64 RCP, SEAForth, typical GPUs
 * Ultra CISC: TRON, NEC V60, z/Architecture, AS/400 "Future Systems" (again), B5000 (again), NS32000, C Machine/Hobbit/CLIPPER, Tahoe (VAX-like CPU, but simpler, second CPU architecture supported by BSD)
 * Really Unusual:
 * PIC (weird bit sizes, "latched" PC, W-register, Harvard architecture with different sized pointers for code and data)
 * NES CIC (LFSR-based PC, meaning the address of the next instruction is determined by what is basically a random number generator!)
 * VAX microcode (every instruction contains a jump, huge VLIW - some variants are 188 bits per instruction!)
 * PDP-8 (12-bit, two main registers - the accumulator and the program counter, one flag - L for "link" (carry), weird names for instructions - TAD for add, DVI for divide, MUY for multiply, JMS for jump to subroutine, DCA for "deposit and clear", the inability to store without also clearing the accumulator, JMS stores return address in the first word of the jump target, 8 memory locations autoincrement and work like stack pointers/counters)

Possible VAX Extensions
These are probably never going to happen (except maybe if I modify an emulator), but I came up with some ideas anyway.
 * Make registers addressable (with an implementation-dependent address) by using MOVAx with a register as a source operand. This would allow code to be run from the registers like in the PDP-11 and would also allow strings and packed BCD to be used directly from registers.
 * Make the PC completely general. Making it possible to do any operation on the PC register would allow any jump to be possible. It's no more dangerous than the JMP and JSR already used by the VAX.
 * Extend the VAX to 64-bit. A quadword instruction can be encoded as a byte prefixed by FD, just like extended floating points are. These would allow extension of up to 256-bit by simply using the FD prefix for byte=quadword, word=octaword, longword=hexadecaword. Like normal VAX instructions, using a smaller bit width consistently ignores the upper part of the register, meaning they can coexist with any other VAX instruction. There can be a bit in the call mask to determine whether 64-bit or more instructions (including current quadword and double-precision floating point) work on 32-bit registers or 64-bit registers.
 * Extend the string length to full bit-width. This includes CRC as well. The VAX uses 16-bit (unsigned) length identifiers for strings, which requires kludges in BSD to do longer memcpy and strlen and other instructions. The extended string size could either be given in instruction name, like MOVCL3/MOVCQ3 instead of MOVC3, or in a PSL bit set by the call mask.
 * Make the call mask bigger. Currently there are 11 free bits in the PSL and 2 free bits in the call mask. One of these bits can be used to specify that the next 16 bits are part of an extended call mask. Another of these bits could control whether an absolute address is 32-bit or 64-bit (with a possibility for more). Another could allow arguments and pushed registers to be 64 bits rather than 32. The CALLx and RET instructions would scan the mask and automatically adjust the PSL and CPU modes. Since every program is called using a call mask, it would be possible to use 64-bit programs from a 32-bit OS and vice-versa without changing a thing (as long as the right argument widths are used).
 * Extend argument limit for functions. A function in the calling standard may only have up to 255 arguments. The CALLS instruction uses the standard argcount portion of the stack frame to specify the number of arguments. Since CALLS takes a longword determining the number of arguments and the upper 24 bits of argcount are reserved, it would be easy to extend this to use all of them (maybe by using a bit in the call mask to determine whether the called procedure supports extended argument counts).
 * Speed up CALLx and POLYx just to anger VAX detractors who don't follow Moore's Law (as in leaving out a barrel shifter or hardware multiplier because it couldn't fit back in the 80's, when the VAX had 64-bit variable arithmetic shift and a full set of 128-bit floating point operations in 1977). Also extend the maximum degree of POLYx from 31 to 65,535 because it already uses a word operand. Since POLYx is a fused multiply-add (it actually uses FMA-like rounding semantics), scientists will be moving to VAX as their platform of choice. ;)
 * Require all VAXen to use a full VAX instruction set. There is enough room on the chip to include MOVTUC and CRC instructions and the entire vector co-processor these days. A full VAX (the V-11, made in 1986) has about 1,183,600 transistors. This is roughly one-thousandth the amount modern processors have. These instructions can also be optimized to fulfill the elegant CISC philosophy of planning ahead for the future.

Almost impossible VAX extensions
Here are some really weird ideas for the VAX that would make it almost esolang-like (and practically impossible to implement on real hardware at the current time).
 * Add an Execute instruction like the S/360 and PDP-10 that executes one instruction at its target and then returns. The target can be a call, in which case the entire target procedure executes, or a jump, in which case it executes one instruction at the jump target and returns. These can be chained indefinitely like in the PDP-10.
 * Add a variadic instruction mode. The VAX already uses variable arguments for CASEx instructions, but those are just bw word branch displacements. Something like CALLX #8,printf,A,B,C,D,E,F,G,H is an example. A CALL in VAX is already defined to be an "extended instruction" that takes a series of arguments and returns a result in R0.
 * Add an Instruction addressing mode. Like Execute, but the idea that an instruction is an operand. The last operand (destination) is not used and instead it returns it as the effective operand of the instruction. Combined with the variadic instructions, it would be something like LISP only in hardware. Ex. MULL3 SUBL3 #4, #5, #2, A would be (set! A (* 2 (- 5 4))) in LISP.
 * Extend the BCD length to 16-bit, including EDITPC. Right now, BCD instructions use 16-bit lengths but only allow lengths of up to 32 digits. This is comparable to 108-bit binary arithmetic. Hardware BCD bignums (with a fast enough speed) would make the VAX ideal for scientific computing. With modern transistor counts compared to the ~1 million transistors used by the VAX, a special BCD engine could be added to the VAX to add many digits per cycle. The VAX Rigel already had a vector unit and IBM's mainframes have a hardware BCD engine already, so it wouldn't be too strange.
 * The VAX stack is already used implicitly by some instructions (like DIVP and CRC) so it could be used to trace Executes and Instruction addressing.