Nellephant

From Esolang
Jump to navigation Jump to search

Nellephant is an esoteric programming language created by User:ais523 in 2019. The intention behind the language is that programming in Nellephant will give you an understanding of how the wikipedia:NL (complexity) class operates; it's not intended to be particularly difficult to program in beyond the restrictions implied by NL, with most of the operations that are possible having simple implementations, but the language is nonetheless weird to program in (mostly, but not entirely, because NL itself is weird).

Specification

The input array

Before a Nellephant program runs, it takes input from the environment. The input is a finite list of unbounded nonnegative integers (implementations must allow these to be input directly as integers, but may offer alternative modes such as, e.g., reading a file and treating its bytes integers, thus the file itself as a list). The implementation will pad the input with trailing 0s until its length becomes a power of 2 (note: 0 is not a power of 2, so the input will be padded to at least a length of 1); then format each integer in the resulting input in binary (big-endian), padding them with leading zero bits so that they're all the same length, and that length is also a power of 2; and concatenating all the resulting strings of bits into a single "input array". (The length of the input array will be a power of 2.) Then, a number of zero bits equal to the length of the input array is appended to the input array (doubling it in length); these additional, appended zeroes are known as the Shadow Zone.

The input array is now fixed for the lifetime of the program and cannot be modified at all. As such, it is shared between all threads.

Threads and pointers

An executing Nellephant program has one or more threads (which all run in parallel, with no requirement on the implementation as to how they are scheduled relative to each other except that wikipedia:Starvation (computer science) should be impossible; note that using OS threads for them is not recommended as the number of threads that could potentially be required is huge, so implementations probably want to implement their own threading system). Each thread has its own set of pointers; pointers always point somewhere within the input array (including the Shadow Zone). The only pointers that exist are those mentioned in the program, but there is no other limit on how many pointers can exist (thus, each individual Nellephant program uses a finite number of pointers). Pointers are named using numbers, and their initial positions (at the start of the original thread) are:

  • Pointer 0: leftmost bit of input array
  • Pointer 1: second-leftmost bit of input array
  • Pointer 2: bit after rightmost bit of the first element of the input array (i.e. this indicates how many bits each of the integers were padded to)
  • Pointer 3: bit after rightmost bit of the last element of the input array (i.e. this indicates how many integers there were, before the list was padded)
  • Pointer 4: leftmost bit of the Shadow Zone
  • Pointer 5: rightmost bit of the Shadow Zone
  • Other pointers: leftmost bit of input array

One thread, the "original thread", is created at the start of program execution (and starts running commands from the start of the program). Other threads are created only in response to another thread crashing/failing (these terms are interchangeable), at handle instructions; each handle instruction references another instruction, and if an instruction referenced by a handle instruction crashes, a new thread is created, with execution starting at that handle instruction, and with pointers copied from the state of the crashed thread just before the offending instruction was run. (Note that although every thread creation involves a thread crash, this is not 1-to-1; it is possible for multiple instructions to handle the same crash, meaning that multiple threads will be created as a consequence.)

Each thread also accumulates an output sequence, a sequence of bits (this is inherited on the crash handling thread after a crash is handled, just like the pointers are). If execution of a thread "falls off" the end of the program (i.e. the last instruction in the program ran without crashing), its output sequence is output, and then the entire program exits (not just the thread, everything exits at once, with all the other threads just disappearing). The output sequence is interpreted as a list of integers by looking at the thread's pointers 2 in effectively the reverse of the input encoding (i.e. the output sequence is split into chunks of bits whose lengths are determined by looking at the position of pointer 2 is from the leftmost end of the input array, then each of those chunks is interpreted as an integer – or whatever type of object was used for the input – and output). As an exception, if pointer 2 is at the leftmost end of the input array, the entire output is interpreted as a single integer, even if the output sequence is longer than the input array even including the Shadow Zone.

Instructions

After preprocessing, the syntax of an instruction is a keyword that specifies the instruction, followed by one or more arguments as numbers (either binary numbers preceded by ', hexadecimal numbers preceded by $, or decimal numbers with no prefix). Each instruction is written on a line of its own. (Blank lines are also allowed, and do nothing; the preprocessor uses these to mark places where comments were removed.) Keywords can be abbreviated to their first letter if desired.

Threads cannot interact with each other once created; any mentions of pointers are talking about the appropriate pointer of the running thread.

handle linenum
Handles crashes in the instruction written on line linenum (where the first line of the program is 1, the second is 2, and so on). This does nothing when run directly, but is used as an entry point for crash-handling threads (i.e. all threads apart from the original thread). You can think of this as a cross between an exception handler and a COME FROM statement.
attract pointer pointer
Moves the second of the mentioned pointers halfway towards the first (e.g. if they're currently a distance of 12 apart, attracting will move the second pointer to a distance of 6 from the first pointer in the same direction). If the pointers are an odd distance apart, the remaining distance is rounded down (i.e. the moved distance is rounded up). As an exception, if the pointers are in the same place, the thread crashes.
repel pointer pointer
The opposite of attract; the second of the mentioned pointers is moved away from the first, by an amount equal to the existing distance between them (e.g. if they're currently a distance of 12 apart, attracting will move the second pointer to a distance of 24 from the first pointer in the same direction). Pointers cannot point outside the input string; if an attempt is made to repel a pointer past either end of the input string, the thread crashes. (However, pointers can point to the Shadow Zone, which is part of the input string, so repelling a pointer there is legal.) Unlike with attract, repelling a pointer from another pointer in the same location is legal (and does nothing).
query pointer
If the given pointer is pointing to a 1 bit, does nothing. If the given pointer is pointing to a 0 bit, crashes.
output number
Appends the bits of the given number to this thread's output string (in big-endian, as normal). The number must be given in binary or hexadecimal, and the number of bits appended will be the number of bits in the number as written (thus, output $4 appends the bits 0100, and output $04 appends the bits 00000100). Remember that these will not actually be output until the program ends, and only if the current thread (or a crash-handling thread for it, including indirectly) ends up being the one that makes it all the way to the end of the program without a crash.

The preprocessor

Nellephant has a preprocessor, that generates the above (fairly hard to write) command sequence from somewhat higher-level code. It has four main features:

Comments
A comment, in Nellephant, goes from # to the end of a line. The preprocessor will remove comments (not including the trailing newline), plus any leading and trailing horizontal whitespace. As such, lines with only comments become blank lines, and # comments can also be placed after commands.
Labels
A label is written as a colon followed by an arbitrary sequence of letters and digits, case-sensitive. The preprocessor will replace each label with a number, with different labels being changed to different numbers; the numbers that are chosen for labels are ones that are not used elsewhere in the program (e.g. as pointer names). This means that pointers can be given descriptive names to make reading a program easier.
Line labels
Writing a label at the start of a line (followed by horizontal whitespace) is a special case: the label is deleted from that location, but the value given to the label is the line number of the line it was written on. So, for example, a command handle :x1 would be referencing a command :x1 attract 0 0 (and if, e.g., the latter command was on line 15, it would preprocess to attract 0 0, and the former command to handle 15). This can force a label to have the same value as a pointer written literally into the program (presumably, this would normally not matter as a line label would typically not also be used to name a pointer, but could be a handy feature for obfuscated Nellephant programming).
Macros
The most complicated preprocessor feature. A macro can be defined using an identifier (i.e. an arbitrary sequence of letters and digits; Nellephant has no "starting with a letter" rule), followed by a sequence of commands within braces (the opening brace is on the same line as the identifier, and the closing brace on a line of its own, because the commands inside the braces each have to be on a line of their own). The macro definition itself is deleted from the program by the processor. However, the macro name effectively becomes a Nellephant command that can be used elsewhere in the program; when it appears elsewhere in the program, it's replaced by the entire sequence of commands that appeared in the macro definition (macro expansion).
When a macro is expanded, this changes the line numbers in the rest of the program. The preprocessor will thus alter references to lines that moved (i.e. lines after the macro definition) so that they continue to reference the same line. Likewise, if the macro referenced any lines within itself, after expansion these references will be changed to reference the corresponding line within the expansion.
Macros can take arguments; if a macro expansion has numbers on the line after the macro name, any sequence %1 in the macro definition will expand to the first such number in the resulting expansion, %2 will expand to the second such number, and so on. (Labels can be used in place of numbers, because they also expand to numbers.)
Macros can be nested (i.e. a macro expansion can appear within a macro definition), and multiple nesting levels are allowed. However, macro recursion is not allowed (a macro expansion cannot appear, even directly, within the definition of that macro), as it would cause the program to become infinitely long.

Computational class

When seen as a computational class, NL programs are those with unrestricted control flow, the ability to make nondeterministic choices, and ability to store data as a fixed number of pointers to a bit of an input, a fixed number of counters whose maximum value is proportional the length of the input, and/or a number of individual bits proportional to the logarithm of the length of the input.

Nellephant can implement all programs in NL. To get the unrestricted control flow, you can implement a goto using two pointers that never move from position 0, attracting one to the other, and handling the resulting failure. Nondeterministic choices can be implemented using a "double goto", with two handlers for the same failure; this effectively works like the double-COME FROM in Threaded INTERCAL. As for data storage, all three types of storage can be implemented using a pair of pointers. For pointers to the input and counters, a pair of Nellephant pointers maintained pointing to adjacent locations can be moved left and right (via repel+attract), read from the input (query), and tested for zero (attempt to move left, this will fail if at the leftmost end). Bits can be implemented using the binary digits of two counters (treating them like a pair of stacks, and making a tape).

Meanwhile, it's clear that Nellephant cannot use any more memory per thread (other than that used to hold the input, which is not writable, and that used to hold the output, which is not readable) than a linear function of the logarithm of the length of the input (as it can only store data in a fixed number of pointers, each of which can only point to the input or the Shadow Zone, a number of possibilities linear in the length of the input). Additionally, the threads cannot interact with each other. Thus, Nellephant falls within the NL complexity class, and all programs within NL can be compiled to it, and thus it is NL-complete.