PointerB

Pointer B is an esolang created by Ilari in 2010. It is a stack-based language in which each operation is a single Unicode codepoint.

Valid Unicode codepoints
A Unicode codepoint is considered valid if it passes these checks:
 * It is not in the range 0xD800-0xDFFF (surrogates).
 * Its plane number (floor(codepoint/65536)) is at most 16.
 * Bits 1-15 contain at least one zero bit. This excludes the last two codepoints of each plane (0xFFFE and 0xFFFF within the plane), leaving 65534 codepoints per plane.
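
The three checks can be sketched as a small Python predicate. This is only an illustration, not the reference implementation; the function name and the rejection of negative integers are assumptions made for the example.

```python
def is_valid_codepoint(cp: int) -> bool:
    """Return True if cp passes the three validity checks listed above."""
    if cp < 0:                       # assumption: negative ints are never codepoints
        return False
    if 0xD800 <= cp <= 0xDFFF:       # surrogate range
        return False
    if cp // 65536 > 16:             # plane number must be at most 16
        return False
    if cp & 0xFFFE == 0xFFFE:        # bits 1-15 all set: last two codepoints of a plane
        return False
    return True
```

Under this sketch, 0x41 and 0x10FFFD are accepted, while 0xD800, 0xFFFE, 0x10FFFF and 0x110000 are rejected.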

Code memory
Code memory is an array of Unicode codepoints. Initially the program is loaded into code memory codepoint by codepoint (interpreting the file as UTF-8), resulting in a code memory of exactly the smallest size required to hold the loaded program. LF codepoints are loaded like any other codepoint.

All references to code memory are signed and relative to the IP. (The IP is regarded as pointing just after the currently executing instruction.) Unless an instruction has some special effect on control flow, the IP is incremented by one before each instruction.
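
As a rough illustration of IP-relative addressing, the following Python sketch converts a signed offset into an absolute index. The function name and the use of IndexError are assumptions made for the example; the check against the opening of the Hello, World program is consistent with the rule above.

```python
def resolve_code_address(current: int, offset: int, size: int) -> int:
    """Map a signed, IP-relative offset to an absolute code memory index.

    `current` is the index of the executing instruction; the IP is taken
    to point just after it, so offset 0 names the following codepoint.
    """
    target = current + 1 + offset
    if not 0 <= target < size:
        raise IndexError("code memory access outside code memory bounds")
    return target


# First nine codepoints of the Hello, World program.
code = "118e41OHW"
# The '4' at index 4 runs with 2 on the stack and therefore reads the 'H'.
loaded = code[resolve_code_address(4, 2, len(code))]
# The 'O' at index 6 runs with 1 on the stack and therefore jumps over the 'H'.
jump_target = resolve_code_address(6, 1, len(code))
```

Here `loaded` is "H" and `jump_target` is 8, the index of the 'W' that prints it.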

Words
Words are N bits long (N is at least 21 since words must contain Unicode codepoints as a subset). When words are interpreted as signed they are in 2's complement form. All bit patterns of N bits are valid words.
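
A minimal Python sketch of the word model follows. The value N = 32 is an assumption chosen for the example (the text only requires N to be at least 21), and the helper names are hypothetical.

```python
N = 32                      # assumed word size; any N >= 21 would do
MASK = (1 << N) - 1         # all N bits set


def to_word(x: int) -> int:
    """Reduce an arbitrary integer to an N-bit word (mod 2^N)."""
    return x & MASK


def to_signed(word: int) -> int:
    """Interpret an N-bit word as a two's complement signed integer."""
    if word & (1 << (N - 1)):
        return word - (1 << N)
    return word
```

For instance, `to_word(-1)` yields the all-ones word, and `to_signed` maps that word back to -1.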

Data memory
The program has access to a data memory of 2^N words. Data memory is initialized to random values on program startup. (In practice, N is so large that an implementation cannot fill the memory eagerly and only has to behave as if it had.) Reads of data memory locations are consistent, even if the location has not been written to before (and thus holds its random initial value).
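
One way to get the as-if behaviour is to draw a cell's random value the first time it is touched and remember it afterwards. The Python sketch below assumes this lazy strategy; the class name, the dictionary storage and the optional seed are choices made for the example and are not taken from the reference implementation.

```python
import random


class DataMemory:
    """Sparse data memory of 2^n_bits words with lazily drawn initial values."""

    def __init__(self, n_bits: int, seed=None):
        self._n_bits = n_bits
        self._mask = (1 << n_bits) - 1
        self._cells = {}                      # address -> word, filled on demand
        self._rng = random.Random(seed)

    def read(self, address: int) -> int:
        address &= self._mask
        if address not in self._cells:
            # First access: fix the "initial" random value for this cell.
            self._cells[address] = self._rng.getrandbits(self._n_bits)
        return self._cells[address]

    def write(self, address: int, value: int) -> None:
        self._cells[address & self._mask] = value & self._mask
```

Repeated reads of an unwritten address return the same word, which matches the consistency requirement.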

Stack
The final storage mechanism is the stack. Stack elements are not words but pairs. The first element in a pair is always a word. The second may be a word or the special value NAA (which does not equal any valid word). The first element is called the value and the second is called the address. Usually one of those elements is discarded when the stack is popped ("pop value" or "pop address"). The notation for a stack element is (x,y), where x is the value part and y is the address part.
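
The pair structure can be sketched in Python as follows. Representing NAA as None, and the class and method names, are assumptions made for the example.

```python
from typing import List, NamedTuple, Optional

NAA = None  # assumed sentinel: never equal to any word (words are ints here)


class Element(NamedTuple):
    value: int                  # always a word
    address: Optional[int]      # a word, or NAA


class Stack:
    def __init__(self) -> None:
        self._items: List[Element] = []

    def push(self, value: int, address: Optional[int] = NAA) -> None:
        self._items.append(Element(value, address))

    def pop(self) -> Element:
        if not self._items:
            raise IndexError("attempt to pop empty stack")
        return self._items.pop()

    def pop_value(self) -> int:
        """Pop an element and keep only its value part."""
        return self.pop().value

    def pop_address(self) -> Optional[int]:
        """Pop an element and keep only its address part (may be NAA)."""
        return self.pop().address

    def is_empty(self) -> bool:
        return not self._items
```

In this sketch, popping the address of an element such as (5, NAA) simply returns NAA; an error would only arise when an instruction then tries to use it as an address.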

Extensions
Extensions can define additional instructions. Each extension has a number and some instructions. Each instruction in an extension has a number.

Extension 0 is special, as it refers to built-in instructions. On startup, built-in instructions are mapped with fixed instruction numbers. No instructions from other extensions are mapped (unless explicitly done). Note that extension 0 instructions do not reflect what gets mapped and unmapped by 'c' and 'd' instructions.
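
The separation between extension definitions and the current mapping might be modelled as below. Everything concrete here is hypothetical: the extension numbers, instruction numbers and handler names are placeholders rather than the real built-in numbering, and silently ignoring an unmap of an unmapped codepoint is an assumption the text does not settle.

```python
# Hypothetical extension table: extension number -> {instruction number: handler name}.
EXTENSIONS = {
    0: {0: "builtin_0", 1: "builtin_1"},   # placeholder numbers for built-ins
    7: {0: "ext7_op"},                     # made-up extension for illustration
}


class Mapper:
    """Tracks which (extension, instruction) pair each codepoint is mapped to."""

    def __init__(self) -> None:
        self.mapped = {}                   # codepoint -> (extension, instruction)

    def map(self, ext: int, instr: int, codepoint: int) -> None:
        if ext not in EXTENSIONS:
            raise LookupError("nonexistent extension")
        if instr not in EXTENSIONS[ext]:
            raise LookupError("nonexistent instruction in extension")
        # Overwrites any earlier mapping; EXTENSIONS itself is never modified.
        self.mapped[codepoint] = (ext, instr)

    def unmap(self, codepoint: int) -> None:
        self.mapped.pop(codepoint, None)   # assumption: unmapping nothing is a no-op

    def is_mapped(self, codepoint: int) -> bool:
        return codepoint in self.mapped
```

Remapping a codepoint replaces only the entry in `mapped`, so the definitions inside each extension stay as they were.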

Loading errors
The following errors prevent program loading:
 * Empty source code file
 * Invalid UTF-8 in source code file
 * Invalid codepoint in source code file
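
A loader that enforces these three conditions could look roughly like the Python sketch below. The function name and the use of ValueError are assumptions. Python's strict UTF-8 decoder already rejects surrogates and values above 0x10FFFF, so only the last-two-codepoints-of-a-plane check is done by hand.

```python
from typing import List


def load_program(raw: bytes) -> List[int]:
    """Decode a source file into code memory, one codepoint per cell."""
    if not raw:
        raise ValueError("empty source code file")
    try:
        text = raw.decode("utf-8")         # strict: bad UTF-8 and surrogates fail here
    except UnicodeDecodeError as exc:
        raise ValueError("invalid UTF-8 in source code file") from exc
    code = [ord(ch) for ch in text]
    for cp in code:
        if cp & 0xFFFE == 0xFFFE:          # 0xFFFE / 0xFFFF within any plane
            raise ValueError("invalid codepoint in source code file")
    return code
```

LF is kept like any other codepoint, so a three-byte file `0P\n` loads as three cells.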

Runtime errors
The following errors cause the program to abort, unless the description of an instruction explicitly states that it does not raise the error in question:
 * Unmapped instruction executed.
 * Attempt to pop empty stack.
 * Attempt to transfer control flow outside program (via jump, falling off or searching jumps).
 * Attempt to access code memory outside code memory bounds.
 * Attempt to store invalid Unicode codepoint to code memory.
 * Attempt to reference codepoint or instruction outside valid Unicode codepoint space.
 * Attempt to load nonexistent extension.
 * Attempt to read, write or load address of NAA.
 * Attempt to reference nonexistent instruction in extension.
 * Attempted division by zero.
 * Attempt to print invalid codepoint.

Instructions
The instructions are referred to by their default codepoints:
 * #: Search next LF in code memory and jump to instruction after it.
 * 0: Push (0, NAA) to stack
 * 1: Push (1, NAA) to stack
 * 2: Pop value x from stack, read data address x yielding y and push (y, x). Note that this can't generate an invalid address because the value of the stack element can't be NAA.
 * 3: Pop address x from stack, pop value y from stack. Write y to data memory at address x.
 * 4: Pop value x from stack, and load y from the corresponding address in code memory. Push (y, NAA)
 * 5: Pop value x from stack, pop value y from stack. Store y to code memory address corresponding to x.
 * 6: Pop value x from stack, extend code memory by 1 codepoint and write x to the newly created code memory location. Note: Can't trigger access outside code memory bounds error.
 * 7: Pop value x from stack. If x is signed-negative, push (-1, NAA). If x is zero, push (0, NAA). If x is signed-positive, push (1, NAA).
 * 8: Pop value x from stack, pop value y from stack. Push ((x+y) mod 2^N, NAA).
 * 9: Pop value x from stack, pop value y from stack. Push ((x-y) mod 2^N, NAA).
 * A: Pop value x from stack, pop value y from stack. Push ((x*y) mod 2^N, NAA). The multiplication is signed.
 * B: Pop value x from stack, pop value y from stack. Push (x/y, NAA). The division is signed. The quotient is chosen so that the remainder is always nonnegative.
 * C: Pop value x from stack, pop value y from stack. Push (x%y, NAA). The division is signed. The remainder is always nonnegative.
 * D: Pop value x from stack. Push (-x mod 2^N, NAA).
 * E: Pop value x from stack, pop value y from stack. Push (1, NAA) if x < y, otherwise push (0, NAA).
 * F: Pop value x from stack, pop value y from stack. Push (1, NAA) if x <= y, otherwise push (0, NAA).
 * G: Pop value x from stack, pop value y from stack. Push (1, NAA) if x == y, otherwise push (0, NAA).
 * H: Pop value x from stack, pop value y from stack. Push (1, NAA) if x != y, otherwise push (0, NAA).
 * I: Pop value x from stack, pop value y from stack. Push (1, NAA) if x >= y, otherwise push (0, NAA).
 * J: Pop value x from stack, pop value y from stack. Push (1, NAA) if x > y, otherwise push (0, NAA).
 * K: Pop value x from stack. Push (bitwise-not x, NAA).
 * L: Pop value x from stack, pop value y from stack. Push (x bitwise-and y, NAA).
 * M: Pop value x from stack, pop value y from stack. Push (x bitwise-or y, NAA).
 * N: Pop value x from stack, pop value y from stack. Push (x bitwise-xor y, NAA).
 * O: Pop value x from stack. Jump to code memory location x.
 * P: Pop value x from stack. Quit program with return status x.
 * Q: Pop value x from stack, pop value y from stack. Push ((x*y) mod 2^N, NAA). The multiplication is unsigned.
 * R: Pop value x from stack, pop value y from stack. Push (x/y, NAA). The division is unsigned. The quotient is chosen so that the remainder is always nonnegative.
 * S: Pop value x from stack, pop value y from stack. Push (x%y, NAA). The division is unsigned. The remainder is always nonnegative.
 * T: Pop address x from stack. If x is not NAA, push (1, NAA). Otherwise push (0, NAA).
 * U: Pop address x from stack. If x is not NAA, push (0, NAA). Otherwise push (1, NAA).
 * V: Pop address x from stack. Push (x, NAA) (triggers error if x is NAA).
 * W: Pop value x and print x as a codepoint to stdout.
 * X: Read codepoint x from stdin and push (x, NAA).
 * Y: Pop value x and print the low 8 bits of x as a byte to stdout.
 * Z: Push (0, NAA) or (1, NAA) randomly.
 * a: Pop value x and print x as a codepoint to stderr.
 * b: Pop value x and print the low 8 bits of x as a byte to stderr.
 * c: Pop value x from stack, pop value y from stack, pop value z from stack. Load extension x. Look up instruction y. Then map that instruction as codepoint z. If there already was an instruction at codepoint z, it is overwritten (the definition in the extension it came from won't change).
 * d: Pop value x from stack. Unmap the instruction at codepoint x.
 * e: Pop (x,y) from stack. Push (x,y) to the stack twice (duplicate the topmost stack element).
 * f: Pop value x from stack. Try loading extension x. If it loads, push (1, NAA), otherwise push (0, NAA). Does not generate bad extension error.
 * g: Pop value x from stack, pop value y from stack. Load extension x. If it contains instruction y, push (1, NAA), otherwise push (0, NAA). Can generate bad extension error but not bad instruction error.
 * h: Pop value x from stack. If some instruction is mapped to codepoint x, push (1, NAA), otherwise push (0, NAA). Does not generate bad instruction error.
 * i: If stack is empty, push (1, NAA). Otherwise push (0, NAA). Does not generate popping empty stack error.
 * j: Pop value x. Load extension x and enumerate the instructions in x by pushing the instruction numbers in increasing order.
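
The signed division used by B and C, where the remainder is never negative, can be sketched in Python as below. The function works on ordinary signed integers; converting to and from N-bit words is left out, and the function name is an assumption.

```python
def euclid_divmod(x: int, y: int):
    """Return (q, r) with x == q * y + r and 0 <= r < abs(y)."""
    if y == 0:
        raise ZeroDivisionError("attempted division by zero")
    q, r = divmod(x, y)        # Python floors, so r takes the sign of y
    if r < 0:                  # only possible when y is negative
        q += 1
        r -= y
    return q, r
```

For example, dividing 7 by -2 gives quotient -3 and remainder 1, and dividing -7 by 2 gives quotient -4 and remainder 1.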

Computational class
Pointer B is very likely Turing complete, since code memory can be used as unbounded storage: although the accessible window is limited, code can be copied around and more memory accessed from the new location.

Hello, World
118e41OHWe41OeWe41OlWe41OlWe41OoWe41O,# We41O We41OWWe41OoWe41OrWe41OlWe41OdWe41O!W41O W0P

Cat program
Xe180He8O0PW118eeeQQQ118Q1O 08DO

External resources
Reference implementation ('pointerb/*' branches).