User:Zzo38/Programming languages with unusual features
Here I list various programming languages, VMs, computers, and so on that have some kind of unusual features (and stuff possibly of interest); if you disagree with something, you may change it, provided there is enough agreement to do so. See also: User:Ian/Computer architectures and Prehistory of esoteric programming languages
Apollo Guidance Computer
Some of its features include:
- Like many old computers, it uses ones' complement (with signed zero) rather than the modern two's complement.
- The registers are exposed on I/O ports.
- The instruction to write the value from the accumulator into RAM also checks for overflow. If there is overflow, it skips the next instruction and then sets the accumulator to the carry amount.
- The conditional branch instruction is unusual: It reads its operand and stores it in the accumulator, skips 0 to 3 instructions based on its value (positive nonzero, positive zero, negative nonzero, or negative zero), and then changes the accumulator one step toward zero.
- Some memory addresses perform special operations when accessed, such as returning from interrupts, shifting the data written to them, etc.
- The INDEX instruction adds its operand to the next instruction; both the operand bits and the opcode bits of the next instruction are affected. This is normally used for indexing, but can also be used to change the next instruction into a different kind of instruction.
- It is possible to call the accumulator as a subroutine.
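The ones' complement arithmetic and the four-way classification used by the conditional branch can be sketched as follows. This is only an illustration: the 15-bit word width and all function names are assumptions made for the sketch, and overflow handling and the change to the accumulator are not modelled.

```python
WORD_BITS = 15                   # assumed word width for this sketch
MASK = (1 << WORD_BITS) - 1      # all bits set, which is negative zero
SIGN = 1 << (WORD_BITS - 1)      # sign bit

def encode(n):
    """Encode a Python int as a ones' complement word (0 becomes +0)."""
    return n if n >= 0 else ~(-n) & MASK

def decode(word):
    """Decode a word to a Python int; both zeros decode to 0."""
    return word if not word & SIGN else -(~word & MASK)

def negate(word):
    """Negation is just inverting every bit."""
    return ~word & MASK

def add(a, b):
    """Add two words with end-around carry (overflow is not modelled)."""
    total = a + b
    if total > MASK:
        total = (total & MASK) + 1
    return total & MASK

def skip_count(word):
    """How many instructions the conditional branch skips for this operand."""
    if word == 0:
        return 1                 # positive zero
    if word == MASK:
        return 3                 # negative zero
    return 2 if word & SIGN else 0

minus_zero = add(encode(5), encode(-5))   # 5 + (-5) gives negative zero
```

Note that adding a number to its own negation yields negative zero rather than positive zero, which is why the branch instruction needs four cases instead of three.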
BANCStar is a programming language which was actually in use, and which is very strange (see the article on this wiki for some more information, although some of it is wrong). It was designed for financial systems, so that computer operators could fill in forms on the screen, produce reports, and so on.
Here are some features:
- Each instruction consists of four signed 16-bit numbers, although instead of using bit fields like most instruction sets, the encoding is based on multiples of ten, presumably in order to make it easier to read (the compiler does not accept symbolic names or comments at all).
- The "goto page" instruction points to pages, which are absolute; there is no relative addressing or label names. Pages seem to be introduced immediately after a prompt instruction with a negative screen position is encountered.
- Block statements are (presumably) built into and executed by the VM, rather than being translated by a compiler first.
- There are no local variables; only globals.
- There is a limit of 2000 variables in the entire system; string constants must also be stored in these variables, and so must any non-integer you wish to use, or anything you want to display on the screen or print on a form.
- Some of these 2000 variables are special and are used for return address and other things.
- A separate instruction is used to save the return address and to jump to the subroutine, while returning from the subroutine is done by using a "combination GOTO" to the saved return address.
The GOSUB and RETURN commands can be used both at module level and inside SUB and FUNCTION routines, although only at module level is a RETURN command allowed to specify which label or line number to return to (meaning that it pops the GOSUB stack but goes to the specified label instead of to the point after the call); this can be used to implement something similar to the "DO RESUME #2" of INTERCAL (and I have done this once, as it seemed useful in the particular program I was writing).
You can specify types for variables based on what letter their name starts with. For example with "DEFSTR Q-S" then the variables called "QUARTZ" and "SAVEGAME" are string variables.
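A rough model of how such first-letter typing could be resolved is given below. The table layout, the function names, and the "SINGLE" default type are assumptions made for the sketch; explicit type suffixes on names are not modelled.

```python
def make_default_types():
    """One default type per initial letter ("SINGLE" is an assumed default)."""
    return {chr(c): "SINGLE" for c in range(ord("A"), ord("Z") + 1)}

def def_type(table, type_name, first, last):
    """Model of e.g. DEFSTR Q-S: retype every initial letter in the range."""
    for c in range(ord(first.upper()), ord(last.upper()) + 1):
        table[chr(c)] = type_name

def type_of(table, name):
    """The type of a variable depends only on the first letter of its name."""
    return table[name[0].upper()]

table = make_default_types()
def_type(table, "STRING", "Q", "S")       # like DEFSTR Q-S
quartz_type = type_of(table, "QUARTZ")
savegame_type = type_of(table, "SAVEGAME")
total_type = type_of(table, "TOTAL")      # outside Q-S, so still the default
```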
There is no 8-bit type nor any unsigned types; if you want to store an 8-bit value you must use a single-character string instead.
The dc programming language has the unusual feature that although it does not have arithmetic IF, it would probably be much better if it did; most programming languages work fine with the conditions they already have.
It also allows you to use digits up to F even in bases lower than that, so for example "Adio" will always reset to base ten.
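To see why "Adio" works from any base: "A" pushes ten no matter what the input base is, "d" duplicates it, "i" pops one copy as the input base, and "o" pops the other as the output base. The sketch below models only the digit handling; the function name is made up, and letting every digit of a multi-digit number keep its face value is an assumption (implementations may treat that case differently).

```python
DIGITS = "0123456789ABCDEF"

def parse_number(text, ibase):
    """Accumulate digits in the given input base.

    Each digit contributes its face value even when that value is not
    below ibase (assumed behaviour for multi-digit numbers).
    """
    value = 0
    for ch in text:
        value = value * ibase + DIGITS.index(ch)
    return value

# A lone "A" is ten regardless of the current input base.
ten_from_binary = parse_number("A", 2)
ten_from_hex = parse_number("A", 16)
```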
Like INTERCAL, every register is also a stack, and it is possible to exit out of multiple blocks at once by using a computed number (it doesn't have to be a constant). Unlike INTERCAL, arrays are stored in the same register as single values, and are stashed together with them.
The following program prints "google" if the input is 2:
FurryScript is a domain-specific programming language (which may be considered esoteric). Although the author believes it works well for its purpose, it has a few strange things compared to other programming languages with a similar purpose.
Some of these features include:
- No negative integer literals (although negative numbers are possible)
- No scalar variables; only variables containing lists of scalars
- Text strings can contain subroutine calls, continuations, and references to list variables
- Picking a random entry from a list is done implicitly when a string references it, so you do not need a command to do this explicitly
- Subtraction, but no addition
- The return value of a subroutine is "OK", "bad", or "very bad"
The collection of operations is unusual; here are a few:
- Discard the top value of the stack with a given probability
- Pop two values from the stack, and discard one more if those two don't match
- A sophisticated dice-rolling function
Some features of Glulx are unusual for assembly language, even if some of them are stuff you might ordinarily find in high-level languages. Some of its features include:
- The amount of bit shifting can be out of range and it will work. The bit shift amount is considered to be unsigned, so if it is not in the range 0 to 31, then all bits will be shifted out. This means that, for example, you can write "ushiftr 1,$,$" to convert 0 to 1 and other numbers to 0.
- There are five addressing modes: ROM, RAM, stack, local variables, and immediate. You can write to an immediate if that immediate is zero; doing so will discard the value to be written. (ROM and RAM are actually in the same address space, although they do have separate addressing modes.)
- There are instructions for linear search, binary search, and linked search. You can use the linear search to implement the strlen function of C.
- Glulx has a single calling convention for all programming languages (including assembly language), I suppose similar to what VAX does. There are two ways to call a subroutine, either with arguments on the stack, or with up to three arguments given as operands to the instruction. Each subroutine itself also specifies in the header how it receives its arguments (and how many local variables it has): either on the stack (which can be retrieved using stack operands or using instructions dealing with the stack), or copied into local variables. There is a built-in tail-call instruction (simply jumping won't work).
- All numbers are big-endian in ROM and RAM, but use the native byte order in the stack. The stack is not addressable, so you cannot take the address of the stack or of local variables, and this byte order usually doesn't matter.
- There is support for accelerated functions. You can tell the VM to replace calls to a function in your program with its own implementation.
- You can make VM saves, both in memory (in another separate address space which your program has no access to) and to I/O streams (which you can access). This includes not only the contents of RAM, but also the stack, program counter, etc. (Note that, unlike with Z-machine, you must open and close the I/O streams yourself.)
- Since the ROM and RAM are in the same address space (and code is stored there), self-modifying code is possible. This makes it possible to implement dynamic linking; there is no built-in support for dynamic linking.
- There are instructions for dealing with bit arrays of any length. Counting starts from the low bit of each byte, and then continues with the next byte and so on, so they are little-endian even though everything else is big-endian.
- Most instructions deal with 32-bit data with any addressing mode. However, the copyb and copys instructions are an exception; they deal with 8-bit and 16-bit data being pointed to by ROM and RAM operands, but 32-bit data for stack operands.
- Some instructions use relative branch offsets. If a relative branch offset is 0 or 1, then instead of branching, it will return from the currently executing subroutine.
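Two of the points above, the out-of-range shift amounts and the bit numbering in bit arrays, can be modelled as below. This is a sketch only: the Python function names mirror the opcode names but are otherwise made up, memory is modelled as a plain bytes object, and only non-negative bit numbers are handled.

```python
MASK32 = 0xFFFFFFFF

def ushiftr(value, amount):
    """Unsigned right shift; the amount is treated as unsigned 32-bit,
    so anything outside 0..31 shifts every bit out."""
    value &= MASK32
    amount &= MASK32
    return value >> amount if amount < 32 else 0

def is_zero(n):
    """The "ushiftr 1,$,$" trick: 1 for zero, 0 for anything else."""
    return ushiftr(1, n)

def aloadbit(memory, base, bit):
    """Read one bit of a bit array, counting from the low bit of each byte
    (non-negative bit numbers only in this sketch)."""
    byte = memory[base + bit // 8]
    return (byte >> (bit % 8)) & 1

bits = bytes([0x01, 0x80])   # bit 0 and bit 15 are set
```

Here a shift amount of -1 is read as 4294967295, which is why is_zero gives 0 for negative inputs as well.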
See MMIX (has Knuth read this yet?). Other features:
- The no-operation instruction is not called "NOP" or "NOOP"; it is called "SWYM" instead. ("SWYM" is short for "Sympathize With Your Machinery")
- Forward references are resolved at loading time and not at compile time.
- There is no relocatable code in MMIXAL.
- There are some kinds of exotic stuff such as "sideways add" (a bit population count, but you can omit some bits from the count), "multiple or" (which can be used to convert endianness but can also be used for many other kinds of things), "multiple exclusive-or" (similar to multiple or; can be used for hash functions), and others, which can be useful even if exotic.
- The most unusual feature of MMIX is its register stack.
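As an illustration of "sideways add", the count can be modelled as a population count of the first operand with the bits of the second operand masked out. The function name and the Python modelling are mine; registers are taken to be 64 bits wide.

```python
MASK64 = (1 << 64) - 1

def sideways_add(y, z=0):
    """Count the bits that are set in y but not in z (64-bit operands)."""
    return bin(y & ~z & MASK64).count("1")

plain_count = sideways_add(0b1011)            # ordinary population count
masked_count = sideways_add(0b1011, 0b0010)   # bit 1 is left out of the count
```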
OASYS is an old system for text adventure games. OASYS VM actually has two different programming languages, OAC (the original one) and OAA. The interpreter is OAI, and there is also a disassembler called OAD.
General features and VM features:
- There are no general procedures, only methods that can be called on objects.
- There is no support for arrays whatsoever (although a limited imitation of arrays is possible).
- There are classes of objects, but all classes share the same properties and methods.
- The types of values are integer, string, and object. Integer and string types are actually interchangeable, and variables are not typechecked at runtime.
- The type is used at runtime to determine how to parse the user's input, to save/restore games, and to delete references to objects that have been destroyed.
- Pointer variables cannot be stored in save games (but OAC does not support pointer variables); normally pointer values exist only on the stack.
- The only operation on a string value is to print it (strings are represented as ID numbers into a constant string pool).
- It is a stack machine, although there are no operations like Forth's DUP and DROP and so on. (A conditional branch to the immediately following statement can be a substitute for a DROP though.)
- The standard interpreter crashes if there are no vocabulary words defined or if there are no properties defined.
- Each argument of a method optionally has another method (called a selector) associated with it. If the user's input specifies a class of object as the argument value, the selector will be called on each object of that class, and the first one it returns nonzero for will be the chosen object.
- Each method optionally has a string associated with it, which is displayed if it is being used as a selector and returns zero for all objects of the specified class.
- There are no reserved words; any keyword can be used as the name of a class, variable, method, etc. A variable can even share the name of a method, or the name of a property with the name of a class, or whatever.
- Static type checking is done to distinguish integers from strings even though they are actually interchangeable in the VM.
- Everything must be defined before it is mentioned later in the program.
- There are no delimiters between statements; when a token cannot be part of the current statement it will be a part of the next one.
- No operators are needed for property accessing and method calls, nor are parentheses needed around the argument list or commas in between the arguments; just spaces will do.
- No optimization is done by the compiler. If a string literal occurs more than once, it will compile multiple instances of that string literal into the binary (and there is no way to avoid this either; you can't define a constant). (The included documentation does document this feature.)
- There is no support for include files or for multiple modules linked together.
- If the name of a variable occurs in a phrase, that name will be included in the vocabulary list but the game will not actually understand any uses of that word. (The included documentation does document this feature.)
- Pointer types are not possible at all (the compiler will only generate them temporarily on the stack before a DEREF or ASSIGN instruction; you cannot pass pointers anywhere).
- OAA uses a much more terse (and strange) syntax than OAC, although it does add support for include files and macros, and several other features.
- No type checking is done. A string literal is actually syntactically equivalent to a numeric literal, and is treated exactly like a numeric literal in all cases!
- Names will have a prefix and/or suffix to indicate the types and so on (similar to BASIC and Perl, I suppose).
- The name of a method or property or variable is allowed to be blank (other than the type prefix and/or type suffix) in OAA without causing problems. A method may also be anonymous and have no name at all.
- Unlike in OAC where a class must be defined before it is used, in OAA it is possible to use a class without ever defining it. The same is true of variables and properties. However, methods must be defined (but do not have to be defined before they are used; you can define them in any order).
- Pointer types are possible (and can be passed anywhere), although they should not be used as the types of global variables or types of properties.
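The selector mechanism described among the general features above (an argument given as a class is resolved to the first object the selector accepts, and otherwise the selector's associated string is displayed) could be modelled roughly like this. Everything here is hypothetical: objects are plain dictionaries, and the function and field names are invented for the sketch rather than taken from OASYS.

```python
def resolve_argument(objects, wanted_class, selector, failure_message):
    """Return (chosen_object, None), or (None, failure_message) if the
    selector returns zero for every object of the class."""
    for obj in objects:
        if obj["class"] == wanted_class and selector(obj) != 0:
            return obj, None     # the first nonzero result wins
    return None, failure_message

# Hypothetical game data: the player typed "key" and holds only one of them.
objects = [
    {"name": "rusty key", "class": "key", "held": 0},
    {"name": "brass key", "class": "key", "held": 1},
    {"name": "lamp", "class": "lamp", "held": 1},
]
held = lambda obj: obj["held"]   # stands in for a selector method

chosen, message = resolve_argument(objects, "key", held, "You have no key.")
missing, complaint = resolve_argument(objects, "sword", held, "You have no sword.")
```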
This happened to the author of this page once: I wanted to copy the DOS versions of OAC and OAI from one computer onto another one, but I had no disk nor a C compiler on the target computer, so instead I printed it out and completely rewrote OAC and OAI (and later, OAD and OAA) in BASIC. (Note: I did not invent OASYS, and the inventor did not document the VM; they only documented OAC. OAA is my own invention.)
Some of its features:
- There are no reserved words. No keyword (not even IF or DO) is reserved.
- An extremely sophisticated (and unusual for a non-Lisp system) preprocessor.
- Apparently you can use a SQL statement anywhere a PL/I statement is allowed, and you can use UTF-16 but there is no support for UTF-8.
An early UNIX fortune file mentions the following:
- You can allocate an array and then free the middle third.
- You can multiply a character string by a bit string and assign the result to a float decimal.
PostScript is often used for printable documents, although some users (such as myself) use it as a programming language (especially suitable for graphics). It is a stack based programming language, like Forth and others are, although with some unusual features.
Some of its features:
- There are no variables. Values are normally stored in dictionaries in the dictionary stack, and executing a name will look it up in the dictionary stack and execute whatever it refers to (or just push it to the operand stack, if it is not executable).
- Procedures are just arrays, and can be manipulated just like any other array. Executing it involves executing each object it contains in order; if an object is not executable, it is just pushed to the operand stack.
- Whether or not an object is executable is independent of its type. You can use "cvx" and "cvlit" commands to make an object executable or not executable.
- There is global memory and local memory. There are then VM saves, which save the contents of local memory; when a VM save is restored, all local memory is restored to its previous values while the global memory is left untouched.
- One type of object is a mark, normally put in the stack to later make an array or dictionary from everything to the nearest mark, although it is an object like any other and can be stored in arrays, dictionaries, etc.
- Many tokens have a binary representation. You can even mix text and binary tokens in the same program.
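The way the executable flag, the dictionary stack, and procedures-as-arrays fit together can be sketched with a toy interpreter. This is a heavily simplified model written for illustration, not how any real PostScript interpreter is implemented; the class and function names are made up, and only the "add" and "cvx" operators are modelled.

```python
class Obj:
    """A value plus an executable flag that is independent of its type."""
    def __init__(self, value, executable=False):
        self.value = value
        self.executable = executable

class Name(str):
    """Names are kept distinct from ordinary strings."""

class Interp:
    def __init__(self):
        self.operands = []       # operand stack
        self.dicts = [{}]        # dictionary stack; the last entry is the top

    def lookup(self, name):
        for d in reversed(self.dicts):
            if name in d:
                return d[name]
        raise KeyError(name)

    def execute(self, obj):
        if not obj.executable:
            self.operands.append(obj)             # literal: just push it
        elif isinstance(obj.value, Name):
            self.execute(self.lookup(obj.value))  # name: look up, then execute
        elif isinstance(obj.value, list):
            for item in obj.value:                # procedure: run each element
                if item.executable and isinstance(item.value, list):
                    self.operands.append(item)    # nested procedure is only pushed
                else:
                    self.execute(item)
        elif callable(obj.value):
            obj.value(self)                       # built-in operator
        else:
            self.operands.append(obj)

def op_add(interp):
    b = interp.operands.pop().value
    a = interp.operands.pop().value
    interp.operands.append(Obj(a + b))

def op_cvx(interp):
    old = interp.operands.pop()
    interp.operands.append(Obj(old.value, executable=True))  # same value, new flag

interp = Interp()
interp.dicts[-1][Name("add")] = Obj(op_add, executable=True)
body = [Obj(2), Obj(3), Obj(Name("add"), executable=True)]
interp.execute(Obj(body))        # a literal array is pushed as data
op_cvx(interp)                   # the same array, now executable
proc = interp.operands.pop()
interp.execute(proc)             # runs: 2 3 add
first = interp.operands.pop().value
body[1] = Obj(40)                # edit the procedure like any other array
interp.execute(proc)             # runs: 2 40 add
second = interp.operands.pop().value
```

The array is never copied: making it executable only changes the flag on the object that refers to it, so editing the array afterwards changes what the procedure does.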
Some of its features:
- The program counter is memory-mapped (at address 0).
- The memory is 32768 cells each storing a 16-bit number.
- The only real flow-control instruction is CALL; anything else must be done using other instructions manipulating cell 0 (for example, PUT ,,50 jumps the program flow to address 50).
- Most instructions do more than one thing, for example MUL is actually a multiply, add, and test if zero, all in one instruction, while AND both does a bitwise AND of two numbers, and compares if the result is equal to a third number.
- It uses bankswitching where each bank of the ROM or disk (the disk has bankswitching too in QUACKVM) has a set size and banks may be of any size.
- There is one built-in instruction for decoding Huffman data. (This instruction has not been used yet so far, as far as I know.)
- All instructions can optionally store the result and can optionally have a conditional branch associated with them. For example, ADD with neither will end up doing nothing.
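The effect of keeping the program counter in cell 0 can be shown with a toy machine. The opcode numbers and the three-word instruction layout below are hypothetical and are not the real encoding; the only point is that an ordinary store to cell 0 behaves as a jump.

```python
MEM_SIZE = 32768
HALT, PUT = 0, 1     # hypothetical opcode numbers, not the real encoding

def run(memory, max_steps=1000):
    """Run until HALT; cell 0 of memory is the program counter."""
    for _ in range(max_steps):
        pc = memory[0]
        op = memory[pc]
        if op == HALT:
            return
        if op == PUT:
            addr, value = memory[pc + 1], memory[pc + 2]
            memory[0] = pc + 3               # advance first...
            memory[addr] = value & 0xFFFF    # ...so a store to cell 0 is a jump
        else:
            raise ValueError("unknown opcode %d" % op)
    raise RuntimeError("step limit reached")

memory = [0] * MEM_SIZE
memory[0] = 10                       # execution starts at address 10
memory[10:13] = [PUT, 0, 50]         # store 50 into cell 0: jump to 50
memory[13:16] = [PUT, 100, 111]      # never reached
memory[16] = HALT
memory[50:53] = [PUT, 100, 222]
memory[53] = HALT
run(memory)
```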
TECO is a very unusual text editor. (I have heard that Emacs was originally written in TECO, although now it uses Lisp.)
Some of its features:
- Every character, including control characters, is a command.
- It has 36 global and 36 local registers called "Q-registers".
- Each register stores a string *and* a number.
- There is a mode to display a lot of information whenever an error occurs, which is called "War and Peace mode" (I don't know why).
- Normally it won't read past a CTRL+Z, but in "SUPER TECO mode" it does.
- Strings can use any character you want to as a delimiter.
- Search specifications are like regular expressions, but with control characters and some other worse stuff.
TeX is a (very good) typesetting system, but when you go beyond such things it can begin to get strange. It is guaranteed to be the same on all computers in all time periods; however some computers might run out of memory with some input files and/or execute them very slowly.
A number consists of individual tokens for each digit. Same thing with words; a dimension measured in points consists of a "p" token and a "t" token.
You can change the meaning of any character in the input, and of any control sequence (a token that begins with a control sequence introducer and then several letters). You can also configure which characters are letters, and make it different in different parts of the file; you can change which characters introduce a comment, which delimit parameters/groups, and what character is automatically inserted at the end of each line of input. And it does support trigraphs; you can change which character introduces a trigraph but it is always the same as the character to indicate a superscript in a math formula.
The stack is also unusual compared to other programming languages. Nearly all registers and macros and so on are global; subroutines do not have their own local variables. However, you can begin/end a group anywhere, and any changes made normally persist only until the end of the group, but you can tell it to persist globally.
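The grouping behaviour can be modelled with a save stack: a local assignment remembers the old value so that the end of the group can restore it, while a global assignment throws away any pending restores. The class below is a simplified sketch with invented names, not a description of how TeX itself is implemented.

```python
_UNSET = object()    # marks "had no value before the group"

class Registers:
    def __init__(self):
        self.values = {}
        self.save_stack = []     # one dict of saved values per open group

    def begin_group(self):
        self.save_stack.append({})

    def set(self, name, value, global_=False):
        if global_:
            # A global assignment cancels pending restores in every open group.
            for saved in self.save_stack:
                saved.pop(name, None)
        elif self.save_stack and name not in self.save_stack[-1]:
            # First local change in this group: remember the old value.
            self.save_stack[-1][name] = self.values.get(name, _UNSET)
        self.values[name] = value

    def end_group(self):
        for name, old in self.save_stack.pop().items():
            if old is _UNSET:
                self.values.pop(name, None)
            else:
                self.values[name] = old

regs = Registers()
regs.set("count", 1)
regs.begin_group()
regs.set("count", 2)                   # local: undone at the end of the group
regs.set("total", 10, global_=True)    # global: survives the group
inside = regs.values["count"]
regs.end_group()
```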
For many things it can be useful to take advantage of features which are meant to do something else.
A 6502 assembler. Here are some of its features:
- Nonstandard syntax. Uses square brackets for indirection, and uses a less-than sign to indicate zero-page addressing (rather than being implicit).
- Support for stable unofficial opcodes. The author tends to find this feature useful.
- Support for custom postprocessors and output-routines, written in 6502 assembly language (executed with a slightly modified version of lib6502).
- Normally banks are fixed at 8K and only INCBIN can cross banks, although you can also define multiple banks to have the same name to make them contiguous.
- Macros are text-based, although they can modify their own arguments and "go to" other macros; the only other flow control is "if" blocks.
- Any expression can also ask which pass of the assembler is active (it uses two passes), as well as the number of errors and several other things.
- It is even possible for assembly-language programs to interactively ask the user for input at compile-time.
- Built-in support for NES/Famicom graphics and PC-Engine graphics.
VAX uses a very orthogonal instruction set from what I have read; you can even increment an immediate (this makes perfect sense to me, although perhaps not to everyone).
There is 64K of addressable memory and the rest of it can contain only instructions and text strings (the first 64K also can), and the stack is not stored in this memory space, but registers are (this is the opposite of most CPU architectures).
There are many features Infocom has documented but never used, as well as many optimizations that could be made but weren't.
The ORIGINAL? instruction is supposed to check if the game disk is the original rather than a pirated copy, although it is unclear how the interpreter is possibly supposed to know. Infocom has never used this instruction in any of their games though.
Although Infocom did not use it, the BCOM instruction with an immediate operand would actually be a more efficient way to write a long number (i.e. one that doesn't fit in 8 bits) into a local or global variable (in ZIP and EZIP only, not in XZIP and YZIP). For example, "SET 31,-1" encodes as "CD 4F 1F FF FF" which is 5 bytes long; "BCOM 0 >VAR31" would have the same effect and encode as only three bytes. In XZIP, figuring out the most efficient way may involve the compiler doing prime factorization.
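The trick works because the 16-bit bitwise complement of 0 is the same bit pattern as -1. A small check of that arithmetic is given below; the function names are made up, and only the SET encoding quoted above is reproduced.

```python
def bcom16(value):
    """16-bit bitwise complement, as BCOM would compute it."""
    return ~value & 0xFFFF

def as_signed16(word):
    """Interpret a 16-bit word as a signed number."""
    return word - 0x10000 if word & 0x8000 else word

SET_ENCODING = bytes.fromhex("CD 4F 1F FF FF")   # "SET 31,-1" as quoted above

complement_of_zero = bcom16(0)
same_value = as_signed16(complement_of_zero)     # the value SET would store
```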
The documentation for the PTSIZE instruction (which is used to determine the length of a property) says that it is "guaranteed to return a meaningless answer if given any other kind of table". (It isn't actually true; it simply looks at the byte preceding the address given and performs a simple calculation on it, although this calculation differs between ZIP and EZIP.)
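The calculation on the preceding byte can be sketched as below, following the property size-byte layouts given in the Z-Machine Standards Document for the versions corresponding to ZIP and EZIP. The function names are mine, memory is modelled as a bytes object, and treating a length field of zero as 64 follows revision 1.1 of that standard rather than anything Infocom documented.

```python
def ptsize_zip(memory, addr):
    """ZIP: the top three bits of the preceding byte hold (length - 1)."""
    return (memory[addr - 1] >> 5) + 1

def ptsize_ezip(memory, addr):
    """EZIP: either a six-bit length field, or a single flag for 1 or 2."""
    size_byte = memory[addr - 1]
    if size_byte & 0x80:
        return (size_byte & 0x3F) or 64   # zero taken as 64 (Standard 1.1)
    return 2 if size_byte & 0x40 else 1

# The same calculation "works" on any address, whether or not the byte
# before it is really a property header.
zip_len = ptsize_zip(bytes([0x5F, 0x00]), 1)
ezip_len = ptsize_ezip(bytes([0x85, 0x00]), 1)
```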
There are no absolute jumps, although there is absolute function call. Jumps can only be relative, although unconditional jumps can be computed rather than constant.
Although there is no bitwise XOR instruction (and bitwise shifts were not added until XZIP), the similar "D-machine" (also invented by Infocom, and only ever used for one game!) does have one.