XGCC

From Esolang

Based on the General Compute Coprocessor.

Registers

  • Program counter: The address of the next instruction to execute.
  • Data stack: A stack of data values.
  • Return stack: A stack of return records; see below.
  • Environment: A reference to a frame; this frame is called the current environment.

Data types

  • Integer: A 32-bit integer; whether it is treated as signed or unsigned depends on the instructions being used.
  • Pair: An ordered pair of two values (of any type except stop).
  • Closure: A pair of an instruction address and a frame. There are also special closures, created by the SAVE instruction, which do not follow this format. There is also an I/O closure, which has its own environment but, instead of an instruction address, has implementation-specific behaviour; this I/O closure is not accessible by any of the instructions described below.
  • Frame: A sequence of zero or more values (of any type except stop), and a link to the parent frame; the link is zero if the frame has no parent. A frame is either a dum frame, which has only a length and a parent but no values (reading from or writing to a dum frame is an error, although its parent is still accessible), or a normal frame, which does have values. Frames are handled by reference; a value of this type is a reference to the frame.
  • String: A sequence of zero or more 8-bit unsigned integers. Strings are handled by reference; a value of this type is a reference to the string.
  • Pipe: A pipe stores a queue of values (with some restrictions) and has two sides, the reading side and the writing side, which are separate objects. When given the reading side of a pipe, the following instructions instead use the first value in the queue (without removing it), waiting until a value is available if the queue is empty: CEQ, CGT, CGTE, CGTU, CGTEU, ATOM, SEL, TSEL.
  • Stop: A value which can only exist in the data stack and not in a frame or pair. Only the RTN instruction can remove it from the stack; if any other instruction would remove it or inspect it, it is an error (although a few instructions will be able to check for its presence).

Return stack types

  • Join: An instruction address.
  • Return: An instruction address and environment.
  • Stop: Also an instruction address and environment, although it is handled in a special way. Like in the data stack, only the RTN instruction can remove it; if any other instruction would remove it or inspect it, it is an error (although a few instructions will be able to check for its presence). A stop may be a system stop (which cannot be created by any instructions); its effect when accessed by RTN or TRTN is implementation-dependent.

Instructions

Arithmetic/bitwise

All of these instructions keep only the low 32 bits of the result; higher bits are discarded.

  • LDC $n ( -- n ) : Push a constant number to the stack.
  • INC ( x -- z ) : Increment.
  • ADD ( x y -- z ) : Addition.
  • SUB ( x y -- z ) : Subtraction.
  • MUL ( x y -- z ) : Multiplication.
  • DIV ( x y -- z ) : Signed division. Division by zero is an error. Rounds down toward negative infinity.
  • DIVU ( x y -- z ) : Unsigned division. Division by zero is an error. Rounds down.
  • MOD ( x y -- z ) : Signed modulo. Result has the same sign as the second operand.
  • MODU ( x y -- z ) : Unsigned modulo.
  • AND ( x y -- z ) : Bitwise AND.
  • OR ( x y -- z ) : Bitwise OR.
  • XOR ( x y -- z ) : Bitwise XOR.
  • XORN ( x y -- z ) : Bitwise XOR NOT.
  • POPC ( x -- z ) : Bit population count.
  • SHL ( x y -- z ) : Shift left. The shift amount is y and is treated as unsigned; if it is 32 or more, then all bits are shifted out.
  • SHR ( x y -- z ) : Signed shift right. The shift amount is y and is treated as unsigned; if it is 32 or more, then all bits are shifted out.
  • SHRU ( x y -- z ) : Unsigned shift right. The shift amount is y and is treated as unsigned; if it is 32 or more, then all bits are shifted out.
  • PEXT ( x y -- z ) : Bit select; same as the ~ operator in INTERCAL.
  • MING ( x y -- z ) : Bit interleave; same as the $ operator in INTERCAL.
  • CEQ ( x y -- boolean ) : Check for equality; the result is 1 if x=y and 0 otherwise. It is an error if either operand is a closure or the writing side of a pipe. Otherwise: values of different types are unequal; integers are equal if they have the same value; pairs (a:b) and (c:d) are equal if a=c and b=d; strings are equal if they have the same length and contain the same sequence of values; and frames are equal only if they are the same frame, regardless of the values they contain.
  • CGT ( x y -- boolean ) : Check if x greater than y (signed); 1 if it is or 0 if it is not. (Only works with integers.)
  • CGTU ( x y -- boolean ) : Check if x greater than y (unsigned); 1 if it is or 0 if it is not. (Only works with integers.)
  • CGTE ( x y -- boolean ) : Check if x is greater than or equal to y (signed); 1 if it is or 0 if it is not. (Only works with integers.)
  • CGTEU ( x y -- boolean ) : Check if x is greater than or equal to y (unsigned); 1 if it is or 0 if it is not. (Only works with integers.)
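The arithmetic rules above can be modelled in Python by keeping every value as an unsigned 32-bit pattern and converting to signed only where an instruction calls for it. This is a minimal sketch, not a reference implementation: the helper names (`wrap`, `signed`, `div`, and so on) are hypothetical, filling with the sign bit in SHR and using only the low 16 bits of each MING operand are assumptions, and the behaviour of MOD/MODU with a zero divisor is not specified above.

```python
# Sketch of XGCC 32-bit arithmetic; every helper name here is hypothetical.
MASK = 0xFFFFFFFF

def wrap(x):
    """Keep only the low 32 bits, as the arithmetic instructions do."""
    return x & MASK

def signed(x):
    """Reinterpret a 32-bit pattern as a two's-complement signed integer."""
    x &= MASK
    return x - (1 << 32) if x & 0x80000000 else x

def div(x, y):
    # DIV: signed, rounds toward negative infinity (Python's // floors too).
    if signed(y) == 0:
        raise ZeroDivisionError("DIV by zero is an error")
    return wrap(signed(x) // signed(y))

def divu(x, y):
    # DIVU: unsigned division, rounding down.
    if wrap(y) == 0:
        raise ZeroDivisionError("DIVU by zero is an error")
    return wrap(x) // wrap(y)

def mod(x, y):
    # MOD: the result has the sign of y, which is exactly Python's %.
    # A zero divisor is unspecified; Python raises ZeroDivisionError here.
    return wrap(signed(x) % signed(y))

def modu(x, y):
    return wrap(x) % wrap(y)

def shl(x, y):
    # Shift amount is unsigned; 32 or more shifts every bit out.
    return 0 if wrap(y) >= 32 else wrap(x << wrap(y))

def shru(x, y):
    return 0 if wrap(y) >= 32 else wrap(x) >> wrap(y)

def shr(x, y):
    # ASSUMPTION: vacated bits are filled with the sign bit, so a shift of
    # 32 or more leaves 0 or 0xFFFFFFFF; Python's >> on negatives does this.
    return wrap(signed(x) >> wrap(y))

def xorn(x, y):
    return wrap(~(x ^ y))

def popc(x):
    return bin(wrap(x)).count("1")

def pext(x, y):
    # Like INTERCAL's select (~): take the bits of x where y has 1s and
    # pack them into the low end of the result.
    result, out_bit = 0, 0
    for i in range(32):
        if (y >> i) & 1:
            result |= ((x >> i) & 1) << out_bit
            out_bit += 1
    return result

def ming(x, y):
    # Like INTERCAL's mingle ($). ASSUMPTION: only the low 16 bits of each
    # operand are used; bits of x go to the odd (upper) position of each pair.
    result = 0
    for i in range(16):
        result |= ((x >> i) & 1) << (2 * i + 1)
        result |= ((y >> i) & 1) << (2 * i)
    return result
```

For example, `pext(179, 201)` gives 9, and `ming(0xFFFF, 0)` gives 0xAAAAAAAA.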

Pairs

  • CONS ( x y -- xy ) : Make a pair.
  • CAR ( xy -- x ) : Read left half of pair.
  • CDR ( xy -- y ) : Read right half of pair.

Frames

(The LEN, GET, PUT instructions for strings also work with frames.)

  • LD $level $index ( -- value ) : Read a value from the environment, where $level is the number of parent links to follow (0 means the current environment, 1 means its parent, 2 means the parent of its parent, etc.) and $index is the zero-based index of the value to read.
  • ST $level $index ( value -- ) : Write a value to the environment, at the location selected by $level and $index as for LD.
  • LDA $level $index ( offset -- value ) : Like LD but also add an offset (must be an integer) to the $index.
  • STA $level $index ( offset value -- ) : Like ST but also add an offset (must be an integer) to the $index.
  • ENV ( -- frame ) : Push current environment to stack.
  • USE ( frame -- ) : Set the current environment.
  • PARE ( frame -- parent ) : The parent of a frame.
  • NEW $length ( ... parent -- frame ) : Make a new frame containing $length values taken from the stack, with the specified parent frame (which can be zero to mean no parent).
  • DUM $length ( -- ) : Make a dum frame of a specified length, and set the current environment to the new frame. The new frame's parent is the environment that was set before this instruction was executed.
  • NDUM $length ( frame -- frame ) : Make a dum frame of the specified length, with the specified parent frame (zero if no parent), and push the new frame to the data stack.
  • NNDUM ( length frame -- frame ) : Like NDUM but reads the length from the stack instead of from the operand.
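The way LD, ST, and LDA walk the parent chain, and the rule that a dum frame's parent stays reachable while its own slots do not, can be sketched in Python as below. The `Frame` class and the `walk`, `ld`, `st`, `lda`, and `rap_fill` helpers are hypothetical; treating out-of-range indices as errors, and requiring RAP's argument count to match the dum frame's length, are assumptions rather than rules stated above.

```python
class Frame:
    """A frame: `length` slots plus a parent link (None means no parent).

    `values is None` marks a dum frame: it has a length and a parent
    but no values.
    """

    def __init__(self, length, parent=None, values=None):
        if values is not None and len(values) != length:
            raise ValueError("value count must match the frame length")
        self.length = length
        self.parent = parent
        self.values = None if values is None else list(values)

    def _check(self, index):
        if self.values is None:
            raise RuntimeError("reading from or writing to a dum frame is an error")
        # ASSUMPTION: an out-of-range index is treated as an error.
        if not 0 <= index < self.length:
            raise IndexError(f"index {index} out of range")

    def get(self, index):
        self._check(index)
        return self.values[index]

    def put(self, index, value):
        self._check(index)
        self.values[index] = value


def walk(env, level):
    """Follow `level` parent links; a dum frame's parent is still reachable."""
    frame = env
    for _ in range(level):
        if frame.parent is None:
            raise RuntimeError("no parent frame at this level")
        frame = frame.parent
    return frame


def ld(env, level, index):
    return walk(env, level).get(index)


def st(env, level, index, value):
    walk(env, level).put(index, value)


def lda(env, level, index, offset):
    # LDA adds an integer offset (taken from the stack) to $index.
    return ld(env, level, index + offset)


def rap_fill(frame, args):
    # Hypothetical helper for RAP: the dum frame becomes a normal frame.
    # ASSUMPTION: the number of arguments must equal the frame's length.
    if frame.values is not None or len(args) != frame.length:
        raise RuntimeError("RAP needs a dum frame of matching length")
    frame.values = list(args)
```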

Strings

  • LDS $string ( -- string ) : Load a copy of a string from the current program. This instruction depends on the file format and does not work quite like the other instructions.
  • STR ( length -- string ) : Make a new string with all values being zero.
  • LEN ( object -- length ) : Tell the length of a string or frame.
  • GET ( object index -- value ) : Read from a string or frame, from a specified zero-based index.
  • PUT ( object index value -- ) : Write to a string or frame, at a specified zero-based index. If it is a string, then the value must be an integer, and only the low 8 bits will be used.

Stack operations

  • DIS ( x -- )
  • DUP ( x -- x x )
  • OVER ( x y -- x y x )
  • SWAP ( x y -- y x )
  • ROT ( x y z -- y z x )
  • PICK ( ... index -- ... value ) : Pick from stack; 0 means the top (directly below the index). Cannot pick a stop and cannot pick below a stop.
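A Python sketch of the stack instructions, using a hypothetical `STOP` sentinel so that the restriction on removing, inspecting, or picking past a stop can be shown. The top of the stack is the end of the list; treating an empty stack or a negative PICK index as an error is an assumption.

```python
STOP = object()  # hypothetical sentinel standing in for a stop value


class StackError(Exception):
    pass


def pop(stack):
    # Ordinary instructions may neither remove nor inspect a stop.
    # ASSUMPTION: popping an empty stack is also an error.
    if not stack or stack[-1] is STOP:
        raise StackError("cannot pop: stack empty or stop on top")
    return stack.pop()


def dis(s):
    pop(s)


def dup(s):
    x = pop(s)
    s.extend([x, x])


def over(s):
    y, x = pop(s), pop(s)
    s.extend([x, y, x])


def swap(s):
    y, x = pop(s), pop(s)
    s.extend([y, x])


def rot(s):
    z, y, x = pop(s), pop(s), pop(s)
    s.extend([y, z, x])


def pick(s):
    index = pop(s)
    if index < 0:
        raise StackError("negative pick index")
    # 0 is the item directly below the index; every item down to the picked
    # one must exist and must not be a stop.
    for depth in range(index + 1):
        pos = len(s) - 1 - depth
        if pos < 0 or s[pos] is STOP:
            raise StackError("cannot pick a stop or below a stop")
    s.append(s[len(s) - 1 - index])
```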

Flow controls

  • SEL $ifnonzero $ifzero ( test -- ) : Push a join record to the return stack, then pop a number from the data stack and go to $ifnonzero if it is nonzero or to $ifzero if it is zero.
  • TSEL $ifnonzero $ifzero ( test -- ) : Like SEL but does not affect the return stack.
  • AP $length ( ... closure -- ) : Pushes a return record to the return stack, then sets the instruction pointer and the current environment to those of the closure. It then creates a new environment frame whose parent is the closure's environment, makes it the current environment, and pops $length values from the stack into it.
  • RAP $length ( ... closure -- ) : Like AP, but the closure's environment must be a dum frame, and it must be the same frame as the current environment; instead of creating a new environment frame, RAP uses the current frame, makes it no longer dum, and puts the arguments into it.
  • TAP $length ( ... closure -- ) : Like AP but does not affect the return stack.
  • TRAP $length ( ... closure -- ) : Like RAP but does not affect the return stack.
  • SAP $length ( ... closure -- ) : Like AP but pushes a stop record instead of a return record, and pushes a stop into the data stack after popping the arguments.
  • SRAP $length ( ... closure -- ) : Like RAP but pushes a stop record instead of a return record, and pushes a stop into the data stack after popping the arguments.
  • STAP $length ( ... closure -- ) : Like SAP but will discard everything from the data stack up to but not including the next stop, and the same with the return stack; does not add a stop into the return stack.
  • STRAP $length ( ... closure -- ) : Like SRAP but will discard everything from the data stack up to but not including the next stop, and the same with the return stack; does not add a stop into the return stack.
  • JOIN ( -- ) : Pop a join record from the return stack (it is an error if it is not a join record), and go to the instruction address from the join record.
  • RTN ( -- ) : Pop a record from the return stack; it is an error if it is a join record. Restores the environment and instruction pointer from that record. If it is a stop record, RTN also pops one value from the data stack. If that value is itself a stop, RTN pushes 0. Otherwise, it discards everything on the data stack up to and including the next stop, and then pushes the popped value followed by 1.
  • TJOIN ( -- ) : Like JOIN but the join record will be kept in the return stack.
  • TRTN ( -- ) : Like RTN but keeps the record in the return stack unless it is a stop record (in which case it is removed).
  • STOP ( -- ) : Pop records from the return stack until a stop record is found; the stop record is then handled as for RTN.
  • SAVE $address ( -- closure ) : Makes a new special closure, which stores the current data stack, return stack, next instruction address, and current environment (although it does not remember the values in the environment), and then jumps to the specified instruction address. Using AP, RAP, TAP, etc. with the new closure has the usual effect, except that the current environment is first set to the saved environment; the values placed into the new environment frame (if applicable) are taken from the old stack.
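The handling of stop records by RTN is the subtlest rule in this section, so here is a minimal Python sketch of it. The `Machine` class, the tuple layout of return-stack records, and the `STOP` sentinel are hypothetical; system stops, whose effect is implementation-dependent, are not modelled.

```python
from dataclasses import dataclass, field

STOP = object()  # hypothetical sentinel for a stop value on the data stack


class VMError(Exception):
    pass


@dataclass
class Machine:
    pc: int = 0
    env: object = None
    data: list = field(default_factory=list)
    # Return-stack records are hypothetical tuples:
    #   ("join", addr), ("return", addr, env), ("stop", addr, env)
    ret: list = field(default_factory=list)


def rtn(m):
    if not m.ret:
        raise VMError("RTN with an empty return stack")
    if m.ret[-1][0] == "join":
        raise VMError("RTN on a join record is an error")
    kind, m.pc, m.env = m.ret.pop()
    if kind != "stop":
        return  # a plain return record: nothing else to do
    if not m.data:
        raise VMError("RTN on a stop record needs a value")
    value = m.data.pop()
    if value is STOP:
        m.data.append(0)  # only the stop itself was left: no result
        return
    # Discard everything up to and including the next stop...
    while True:
        if not m.data:
            raise VMError("no stop found on the data stack")
        if m.data.pop() is STOP:
            break
    # ...then push the result followed by the flag 1.
    m.data.extend([value, 1])
```

Read this way, the flag left on top tells the code after the call whether a result value is present beneath it.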

Others

  • ATOM ( value -- boolean ) : Check if the value is a number; the result is 1 if it is or 0 if it is not.
  • TYPE ( value -- integer ) : Tells the type of a value: 0=empty stack (or a stop, which will not be removed), 1=integer, 2=pair, 3=closure, 4=frame, 5=string, 6=reading side of pipe, 7=writing side of pipe.
  • LDF $address ( -- closure ) : Make a new closure, with the specified instruction address and the current environment.
  • LDP ( program frame -- closure ) : Load the specified program (in binary format), and make a new closure which points to the loaded program. The new closure's environment is the specified frame.
  • FORG ( number -- ) : Discard a number of records from the return stack, without doing anything with them (the program continues from here). If a stop is reached, does not discard the stop and does not discard anything below it.
  • PIPE ( -- readingside writingside ) : Make a new pipe (with an empty queue), and make the writing and reading sides of the pipe.
  • SEND ( value writingside -- ) : Send a transformed value to the end of the queue of the corresponding pipe. The value must be a frame, integer, pair, string, or the writing side of a pipe, and is transformed as follows. A frame is copied, and the copy has no parent; if it is not a dum frame, then all values it contains are also transformed in the same way, and none of them may be frames. A string is also copied. A pair may contain only integers, writing sides of pipes, or other pairs (which are restricted in the same way, recursively). If you try to send the reading side of a pipe, SEND waits until there is a value in that pipe's queue (if there isn't one already), removes it from the queue, and sends it instead (unlike SEL, TSEL, etc., which do not remove the received value from the queue).
  • RECV ( readingside -- value ) : If the queue is empty, wait until it is not empty before continuing. Removes the first value from the queue and pushes it to the stack. If multiple processes send to the same queue, then the order in which the values are received is unspecified, although values sent by one process will be received in the same order that they were sent by that process (values sent by other processes might or might not intervene).
  • ASYNC $address ( value -- ) : Start a new process, which may run asynchronously. The new process starts at the specified instruction address. If the value is a frame, the new process's initial environment is that frame transformed in the same way as for SEND; otherwise, the value is transformed as for SEND and placed in a new frame containing only that value, which becomes the initial environment.
  • DBUG ( x -- ) : Used for debugging; implementation dependent; normally does nothing (same as DIS).
  • BRK ( -- ) : Used for debugging; implementation dependent; normally does nothing.
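The transformation applied by SEND (and reused by ASYNC) can be sketched in Python as a recursive function. Everything here is hypothetical scaffolding: `Pair`, `Frame`, `Pipe`, `ReadSide`, `WriteSide`, `transform`, `send`, and `recv` are illustrative names, strings are modelled as `bytearray`, and because the sketch is single-threaded it raises an error where SEND or RECV would wait on an empty queue.

```python
from collections import deque
from dataclasses import dataclass


class SendError(Exception):
    pass


@dataclass(frozen=True)
class Pair:
    car: object
    cdr: object


class Frame:
    # `values is None` marks a dum frame, which only has a length.
    def __init__(self, values=None, parent=None, length=0):
        self.values = None if values is None else list(values)
        self.length = length if values is None else len(self.values)
        self.parent = parent


class ReadSide:
    def __init__(self, pipe):
        self.pipe = pipe


class WriteSide:
    def __init__(self, pipe):
        self.pipe = pipe


class Pipe:
    def __init__(self):
        self.queue = deque()
        self.reading = ReadSide(self)
        self.writing = WriteSide(self)


def check_pair(p):
    # A sendable pair may hold only integers, writing sides, or such pairs.
    for part in (p.car, p.cdr):
        if isinstance(part, Pair):
            check_pair(part)
        elif not isinstance(part, (int, WriteSide)):
            raise SendError("pair contains an unsendable value")


def transform(value, inside_frame=False):
    if isinstance(value, (int, WriteSide)):
        return value
    if isinstance(value, Pair):
        check_pair(value)
        return value  # pairs are immutable here, so no copy is needed
    if isinstance(value, bytearray):
        return bytearray(value)  # strings are copied
    if isinstance(value, Frame):
        if inside_frame:
            raise SendError("a sent frame may not contain frames")
        if value.values is None:
            return Frame(length=value.length)  # dum copy with no parent
        return Frame([transform(v, True) for v in value.values])  # no parent
    if isinstance(value, ReadSide):
        # SEND would wait for a value; this single-threaded sketch raises.
        if not value.pipe.queue:
            raise SendError("queue empty (SEND would wait here)")
        # The dequeued value was already transformed; this only re-copies it.
        return transform(value.pipe.queue.popleft(), inside_frame)
    raise SendError(f"cannot send a {type(value).__name__}")


def send(value, writing_side):
    writing_side.pipe.queue.append(transform(value))


def recv(reading_side):
    # RECV would wait on an empty queue; here popleft() raises IndexError.
    return reading_side.pipe.queue.popleft()
```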

Syntax

With both formats, there will always be an implicit STOP instruction at the end of the program. Both formats are considered as a sequence of bytes (not as characters), although the text format is specified as ASCII characters, so the bytes will be the ASCII codes of those characters.

Text format

Consists of a sequence of tokens. Tokens other than strings consist of printable ASCII characters other than ' " < > \ ; and space, except that ( ) [ ] are always tokens by themselves. Tokens are separated by spaces (multiple spaces are allowed but redundant; the bytes 0x09, 0x0A, 0x0B, 0x0C, 0x0D, and 0x20 count as spaces), except that ( ) [ ] do not require spaces around them (although spaces are allowed anyway). A comment begins with ; and ends at the next carriage return or line feed; it is treated as a space.
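A tokenizer for the text format, following the rules above, might look like this Python sketch. The name `tokenize` is hypothetical, and because the string syntax used by LDS is not described in this section, the sketch rejects the characters ' " < > \ outright instead of parsing strings.

```python
SPACES = b"\t\n\x0b\x0c\r "
BRACKETS = b"()[]"
EXCLUDED = b"'\"<>\\;"  # bytes that may not appear in ordinary tokens


def is_token_byte(b):
    # Printable ASCII (0x21-0x7E), minus excluded bytes and brackets.
    return 0x21 <= b <= 0x7E and b not in EXCLUDED and b not in BRACKETS


def tokenize(src):
    """Split XGCC text-format source (bytes) into tokens; no strings."""
    tokens, i, n = [], 0, len(src)
    while i < n:
        b = src[i]
        if b in SPACES:
            i += 1
        elif b == ord(";"):
            # A comment runs to the next CR or LF and acts as a space.
            while i < n and src[i] not in b"\r\n":
                i += 1
        elif b in BRACKETS:
            tokens.append(chr(b))  # always a token on its own
            i += 1
        elif is_token_byte(b):
            start = i
            while i < n and is_token_byte(src[i]):
                i += 1
            tokens.append(src[start:i].decode("ascii"))
        else:
            # ' " < > \ belong to string syntax, not covered by this sketch.
            raise SyntaxError(f"unexpected byte {b:#04x} at offset {i}")
    return tokens
```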

The following instructions are considered to be terminal instructions: TSEL, TAP, TRAP, STAP, STRAP, JOIN, RTN, TJOIN, TRTN, STOP. Any other instructions are called nonterminal instructions.

An instruction must be written by its name in uppercase as one token, and then the operands (if any).

A numeric token is decimal (with an optional plus or minus sign in the case of the LDC instruction and the second operand of LDA and STA, but other instructions do not allow a sign), or hexadecimal with a preceding $ sign.
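The numeric-token rules can be expressed with two regular expressions, as in this Python sketch. `parse_number` is a hypothetical helper; accepting both upper- and lowercase hexadecimal digits, and allowing a sign only on decimal tokens, are assumptions based on a literal reading of the paragraph above.

```python
import re

DECIMAL = re.compile(r"[0-9]+")
# ASSUMPTION: hexadecimal digits may be upper- or lowercase.
HEXADECIMAL = re.compile(r"\$[0-9A-Fa-f]+")


def parse_number(token, sign_allowed=False):
    """Parse a numeric token.

    sign_allowed should be True only for the LDC operand and the second
    operand of LDA and STA; only decimal tokens may carry a sign here.
    """
    if HEXADECIMAL.fullmatch(token):
        return int(token[1:], 16)
    sign, digits = 1, token
    if sign_allowed and token[:1] in ("+", "-"):
        sign = -1 if token[0] == "-" else 1
        digits = token[1:]
    if DECIMAL.fullmatch(digits):
        return sign * int(digits)
    raise ValueError(f"not a valid numeric token here: {token!r}")
```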

A token ending with : in place of an instruction makes the name (without the final :) refer to the address of the next instruction; this is called a "label name". It can be used anywhere in the same () block in which it is defined, both before and after the definition, as well as in any other () and [] blocks containing it. (If the definition occurs inside a [] block, then its scope is the nearest () block containing it, or the entire file if it is not in a () block.)

A token beginning with % in place of an instruction makes the name (without the initial %) usable in place of the operands of some instructions, with the same scope rules as above; this is called a "variable name". Variable names are numbered consecutively from zero within each () block. The % can optionally be preceded by a decimal or hexadecimal number giving how far the numbering advances after this name (1 is the default, and 0 means the next variable name gets the same index as this one).
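The numbering of variable names within a single () block can be sketched in Python as follows. `number_variables` and `parse_count` are hypothetical helpers that only look at the definition tokens of one block (nesting and scope are ignored), and they read a token such as `3%x` as "this name gets the current index, then the index advances by 3", which is one interpretation of the rule above.

```python
def parse_count(text):
    # Decimal, or hexadecimal with a leading $ (as for numeric tokens).
    return int(text[1:], 16) if text.startswith("$") else int(text, 10)


def number_variables(tokens):
    """Assign indices to the variable names defined in one () block.

    Each definition gets the current index; the index then advances by the
    optional number before the % (default 1, so 0 repeats the index).
    """
    index, names = 0, {}
    for token in tokens:
        prefix, percent, name = token.partition("%")
        if not percent:
            continue  # not a variable definition
        names[name] = index
        index += parse_count(prefix) if prefix else 1
    return names
```

For example, the definitions `%a %b 0%c %d 3%e %f` receive the indices 0, 1, 2, 2, 3, and 6.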

The operand of any instruction that expects an instruction address can be one of:

  • A numeric token, where 0 means the first instruction of the () or [] block it is in, or the first instruction in the file if not in any block; 1 means the next instruction after instruction 0, 2 means the next instruction after instruction 1, etc. Only instructions are counted for this purpose; operands, definitions of names, etc do not count.
  • A label name, to refer to the instruction following its definition.
  • = meaning the address of this instruction.
  • # meaning the address of the next instruction.
  • A list of instructions inside of a () or [] block. This represents the address of the first instruction in that block, appended after the end of the file (after the implicit STOP). If the last instruction inside of such a block is not a terminal instruction, then an instruction is automatically added to the end: RTN for a () block and JOIN for a [] block.
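How () and [] block operands are laid out after the implicit STOP, with RTN or JOIN appended when a block does not end in a terminal instruction, can be sketched in Python as below. The `Block` and `assemble` names are hypothetical; numeric addresses relative to a block are not handled, addresses are absolute indices into the flattened list, and appending blocks in the order their operands are encountered is this sketch's choice rather than something the text format specifies.

```python
from dataclasses import dataclass

TERMINAL = {"TSEL", "TAP", "TRAP", "STAP", "STRAP",
            "JOIN", "RTN", "TJOIN", "TRTN", "STOP"}


@dataclass
class Block:
    kind: str   # "(" or "["
    body: list  # instruction tuples, e.g. ("LDC", 1)


def assemble(main):
    """Lay out nested blocks after the main code; return a flat list.

    Block operands are replaced by the absolute index of the block's
    first instruction.
    """
    out, pending = [], []

    def emit(instructions):
        for name, *operands in instructions:
            slot = len(out)
            out.append([name, *operands])
            for k, op in enumerate(operands, start=1):
                if isinstance(op, Block):
                    pending.append((op, slot, k))

    emit(main)
    out.append(["STOP"])  # the implicit STOP at the end of the program
    while pending:
        block, slot, k = pending.pop(0)
        body = list(block.body)
        if not body or body[-1][0] not in TERMINAL:
            body.append(("RTN",) if block.kind == "(" else ("JOIN",))
        out[slot][k] = len(out)  # the block starts here
        emit(body)
    return [tuple(instr) for instr in out]
```

For instance, `LDC 0 SEL [LDC 1] [LDC 2]` becomes `LDC 0`, `SEL 3 5`, `STOP`, `LDC 1`, `JOIN`, `LDC 2`, `JOIN`.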

The operands of LD, ST, LDA, and STA, can be one of:

  • Two numeric tokens.
  • A variable name. The level is how many () blocks you must go out of to find the definition.
  • A numeric token (the level) and then a variable name. The level of that variable name is added to the explicit level number given.

Any other operands (except LDS) must be a numeric token.

For the LDC instruction, the word "LDC" may be omitted. This is also true of LDF if the operand is a () block.

Binary format

All instruction addresses in the binary format are relative; this makes it possible to prepend code to an existing program.

(TODO)

Examples

Copy strings

PIPE ROT SWAP SEND RECV
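Given a string on top of the data stack, this example replaces it with a copy, relying on SEND copying strings. Step by step: PIPE gives ( s r w ), ROT gives ( r w s ), SWAP gives ( r s w ), SEND leaves ( r ), and RECV leaves ( copy ). The Python sketch below traces the same steps; `run` is a hypothetical mini-interpreter that understands only these five instructions, with strings modelled as `bytearray` and each pipe side as a tagged reference to one shared `deque`.

```python
from collections import deque


def run(program, stack):
    """Interpret the few instructions used by the copy-string example."""
    for op in program.split():
        if op == "PIPE":           # ( -- readingside writingside )
            queue = deque()
            stack.extend([("reading", queue), ("writing", queue)])
        elif op == "ROT":          # ( x y z -- y z x )
            z, y, x = stack.pop(), stack.pop(), stack.pop()
            stack.extend([y, z, x])
        elif op == "SWAP":         # ( x y -- y x )
            y, x = stack.pop(), stack.pop()
            stack.extend([y, x])
        elif op == "SEND":         # ( value writingside -- ); strings are copied
            (_, queue), value = stack.pop(), stack.pop()
            queue.append(bytearray(value))
        elif op == "RECV":         # ( readingside -- value )
            _, queue = stack.pop()
            stack.append(queue.popleft())
        else:
            raise ValueError(f"not handled by this sketch: {op}")
    return stack
```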

Truth machine

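This assumes that the initial environment holds the reading side of an input pipe at index 0 and the writing side of an output pipe at index 1, through which the numbers are received and sent.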

LD 0 0 RECV
x: DUP LD 0 1 SEND
DUP TSEL x #
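Reading the program: LD 0 0 RECV reads a number from the input pipe; the loop at x sends a copy of it to the output pipe, and TSEL jumps back to x while the number is nonzero, or otherwise continues at # (the implicit STOP). So 0 is output once, and a nonzero number is output forever. The Python sketch below hand-assembles the program, with x: at address 2 and # at address 7, and caps the number of outputs so that the infinite loop terminates; `truth_machine`, `PROGRAM`, and the use of a `deque` for each pipe are hypothetical.

```python
from collections import deque

# Hand-assembled truth machine: "x:" is address 2, "#" after TSEL is 7.
PROGRAM = [
    ("LD", 0, 0), ("RECV",),               # 0-1: read n from the input pipe
    ("DUP",), ("LD", 0, 1), ("SEND",),     # 2-4: x: send a copy of n
    ("DUP",), ("TSEL", 2, 7),              # 5-6: loop while n is nonzero
    ("STOP",),                             # 7: the implicit STOP
]


def truth_machine(n, max_outputs=5):
    inp, out = deque([n]), deque()
    env = [inp, out]          # index 0: input pipe, index 1: output pipe
    stack, pc = [], 0
    while len(out) < max_outputs:
        op, *args = PROGRAM[pc]
        pc += 1
        if op == "LD":
            stack.append(env[args[1]])   # only level 0 occurs here
        elif op == "RECV":
            stack.append(stack.pop().popleft())
        elif op == "DUP":
            stack.append(stack[-1])
        elif op == "SEND":
            pipe = stack.pop()
            pipe.append(stack.pop())
        elif op == "TSEL":
            pc = args[0] if stack.pop() != 0 else args[1]
        elif op == "STOP":
            break
    return list(out)
```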