Esimpl

From Esolang
Jump to navigation Jump to search

Esimpl is an esoteric programming language created by User:ais523 in 2022. Its intended purpose is as an intermediate representation for compilers and interpreters related to low-level esolangs (and Turing tarpits in particular), aiming to be powerful enough to express the typical low-level esolang easily, whilst sufficiently constrained to be easy to implement in esolangs itself. Efficiency was also a consideration in designing the language: many languages used for proving Turing-completeness, such as tag systems, tend to emulate programs written in other languages very slowly (e.g. the standard proof of Turing-completeness for tag systems incurs an exponential slowdown); Esimpl was designed to incur only a linear slowdown when compiling, e.g., brainfuck or a Turing machine into it.

The main inspirations for Esimpl were Advance The Wheel! and cyclic tag, but the resulting language ended up more like a generalized version of StackFlow. As for its name, it can be seen either as an abbreviation of "esoteric implementation", or as a cyclic anagram of "simple".

Data storage

A running Esimpl program stores its data in a number of semideques (technically an output-restricted deque: see wikipedia:Double-ended queue#Distinctions and sub-types). A semideque is a type of list/array/vector which can only be modified via three basic operations: adding one or more elements at the start; adding one or more elements at the end; or removing one element from the start (and branching based on which element was removed). Both queues and stacks are special cases of semideques (you can specialise a semideque into a queue via removing the "add at start" operation, or into a stack by removing the "add at end" operation). The elements stored in a semideque are non-negative integers; in theory these are unbounded, but because all integers added to a semideque must appear literally in the program, any given program will have some maximum for the elements it uses (and it is fairly easy to rewrite an Esimpl program so that it uses only 0 and 1 as semideque elements). Each program uses a fixed number of semideques, named using numbers ("semideque 0", "semideque 1", and so on); these are consecutive nonnegative integers starting from 0.

In addition to the semideques, an Esimpl program also has an instruction pointer, as usual for imperative programming languages; this points to stanzas as a whole rather than individual commands. The instruction pointer does not change implicitly; each stanza must contain a control flow instruction (usually a pop-goto instruction) that specifies which stanza should run next.

An Esimpl program is defined as the list of stanzas that form it. The first of these stanzas, "stanza 0", is special – it specifies the initial state of every semideque (using its push instructions; it must not contain pushback instructions), and the initial instruction pointer (using its pop-goto instruction; note that this does actually pop from the specified semideque, so you will need to add an extra 0 element to the front of that semideque so that it can be popped). The other stanzas hold the code that makes up the program. The contents of stanza 0 are what specifies which semideques are used by the program.

Commands, stanzas and program structure

The code of an Esimpl program is formed of a number of stanzas. The stanzas are implicitly numbered, with the first being 0, the second being 1, and so on.

Each stanza is formed of two parts:

  1. A possibly empty sequence of "push" and "pushback" instructions, which specify elements that should be added to semideques (at the start and end respectively);
  2. A "pop-goto" instruction (or one of its alternatives: see below), which specifies which semideque should have an element removed from its start, and which determines which stanza should run next; the pop-goto instruction specifies which stanza should run next if a 0 was removed, with larger removals causing later stanzas to be run (specifically, the specified stanza number is added to the removed element, and the resulting number is used as the number for the stanza that should run next).

It is undefined behaviour if a pop-goto instruction attempts to run stanza 0, or to specify a stanza past the end of the program, or to pop from an entirely empty semideque. Additionally, for each stanza, every instruction capable of reaching that stanza must be identical (e.g. it is legal for a stanza to be reachable from two different pop-goto instructions, but those instructions must specify the same semideque and stanza number as each other). This means that the list of stanzas can, in effect, be interpreted as a sequence of "jump tables", each dedicated to a particular semideque (although unlike in StackFlow, it is possible to use more than one jump table with a single semideque).

The instructions within a stanza run as a single group, in no defined order (except that the pop-goto instruction always happens last, which can matter if it targets the same semideque as a push instruction). A stanza is not permitted to contain two push instructions targeting the same semideque, nor two pushback instructions targeting the same semideque, meaning that the unspecified order of instruction execution never ends up mattering. (It is, however, permissible for a single push or pushback instruction to push multiple elements onto the same semideque; the instructions are effectively concatenating lists of elements onto the ends of the semideque, rather than consing single elements there.)

Stanzas can also include I/O and halting instructions. The "halt" and "input-goto" instructions replace the pop-goto instruction; halting halts the program, and an input-goto reads input rather than popping a semideque (the value returned as a consequence of reading the input is added to the stated stanza number and used to specify which instruction runs next). The "output" instruction appears in a similar position to push and pushback instructions, and can be mixed with them; there can only be one output instruction per stanza but it can output an arbitrary amount of output. Esimpl implementations do not need to support any instructions other than push and pop-goto in stanza 0.

The I/O convention for Esimpl is to encode each byte read from input in a variant form of unary; if standard input contains a byte with value n, this will cause the input instruction to return 0 n times, followed by 1. There is no requirement for implementations to be able to handle EOF conditions – Esimpl implementations are allowed to halt at end of input, or read garbage data – but for implementations that can reliably handle EOF, the recommended method is to read it as a 2 (as opposed to the 0s and 1s that represent valid data). Output is encoded similarly, and the outputting of a value is allowed to be split across multiple stanzas (e.g. you could output five 0s in one stanza, and five 0s and a 1 in another, to produce an 0x0A byte in the output). It is syntactically illegal to try to output an EOF, and outputting 256 or more 0s in a row is undefined behaviour (upon outputting a 1, this would lead to a byte with value 256 or more being output, which is impossible; and it is undefined even if the 1 never arrives).

Syntax

Esimpl supports three main syntaxes, a text syntax and two binary syntaxes. The binary syntaxes do not contain quite enough information to distinguish between some programs that are conceptually different, but whose behaviour is always either equivalent or undefined, and are intended primarily as internal representations for Esimpl interpreters.

Text syntax

In the text syntax, a command is written as the semideque being operated on, then a space, then the type of command, then a space, then the data (space-separated if the data is a list). Semideque numbers, stanza numbers, and data to be pushed onto semideques are expressed in decimal. For example, a command that pushes 1, 2, and 3 onto semideque 4 would be

4 push 1 2 3

When pushing multiple data onto a semideque at once, the list is ordered such that the first datum listed is the first that will be popped (i.e. in order to write the above command as three separate pushes in three separate stanzas, you would have to push 3, then 2, then 1; but to split up 4 pushback 1 2 3, you would pushback 1, then 2, then 3). Output is similar (but has no stanza number), e.g. to output a newline, you would write

output 0 0 0 0 0 0 0 0 0 0 1

The input-goto command also omits the semideque number. Both pop-goto and input-goto are followed by the stanza number that will be jumped to if 0 is reached ()

Each command's name can be abbreviated to a single letter, if desired. Here are the commands' abbreviated and full names:

o output
p push
q pushback

h halt
i input-goto
j pop-goto

As such, a complete stanza might look like this:

0 p 1 0 2 0
0 q 1 0 3 0
1 q 1 1 1
o 0 0
2 j 100

Commands must be separated by at least one newline; blank lines are permitted, as are comments (which run from # to a newline). Newlines cannot be placed within a command. Stanzas are written in order, and do not need explicit separators (because the pop-goto/input-goto/halt commands uniquely identify the ends of stanzas).

Absolute binary syntax

The binary syntaxes for Esimpl is intended for ease of parsing in low-powered esolangs (rather than for efficiency, like most binary formats are). As such, they use only bytes with small values, and numbers are expressed in unary. Stanzas are "normalized" so that they contain a push and a pushback instruction for every semideque (which pushes or pushbacks an empty list if the stanza would not otherwise add to that semideque). This section describes the "absolute" variant of the binary syntax.

A stanza starts with the data pushed onto semideque 0, then 0x03, then the data pushbacked onto semideque 0, then 0x02, then the data pushed onto semideque 1, then 0x03, and so on – this "pushing section" ends with the 0x03 after the data pushbacked onto the last semideque. The data itself is specified in the usual modified unary used by Esimpl – a datum with value n is specified as n 0x00 bytes followed by an 0x01 byte. For stanza 0, the encoding is different: the initial data of semideque 0, then 0x02, then the initial data of semideque 1, then 0x02, and so on, ending with the 0x02 after the initial state of the last semideque.

This is followed by the data of the output instruction; 0x05 specifies a 0 being output, and 0x06 specifies a 1 being output. (With no output instruction, or an empty output instruction, there are no 0x05 or 0x06 bytes; these two cases can be encoded the same way due to being indistinguishable.)

For stanzas with a pop-goto instruction, the semideque number of the pop-goto instruction is then encoded in unary, with the digit being the two-byte sequence "0x03 0x02" (i.e. nothing appears here when you pop-goto semideque 0, but to pop-goto semideque 3, you would write "0x03 0x02 0x03 0x02 0x03 0x02"). The stanza number of the pop-goto instruction is not listed here, but rather in the stanza's pushing section, by placing that many 0x00 bytes (with no trailing 0x01) at the start of the "push" section for the semideque being popped. (A pop-goto instruction jumps to the stanza whose number is the popped data + the specified stanza number, so what's conceptually happening here is that the "push" instruction is doing the addition of the popped data and the stanza number, because the lack of the trailing 0x01 causes the 0x00 bytes to add onto those of the semideque's first element.) The stanza then finishes with an 0x04 byte.

For stanzas with an input-goto instruction, the stanza number is encoded via pushing it onto semideque 0 (with a trailing 0x01 byte this time, unlike with pop-goto instructions), and the stanza finishes with an 0x07 byte.

For stanzas with a halt instruction, the stanza finishes with an 0x08 byte.

Relative binary syntax

The relative binary syntax is very similar to the absolute binary syntax, but stanza and semideque numbers are relative, rather than being absolute. Stanza numbers are relative to the next stanza (i.e. stanza "0x01" is the stanza after the currently running one, stanza "0x00 0x01" the stanza after that, etc.; the numbers wrap around past the end of the program, with stanza 0 counting). Semideque numbers are relative to the semideque that was most recently popped (so that semideque is listed first in the pushing section); the program must be written in such a way that this is statically known. An input-goto's stanza number is specified using the most-recently-popped semideque (i.e. the semideque whose number is encoded with the zero-length string in pop-goto instructions, just like semideque 0 is encoded with the zero-length string in the absolute binary syntax).

Generally speaking, the absolute syntax is more convenient for implementations that try to track the individual semideques or individual stanzas of a program, whereas the relative syntax will be more convenient for implementations that store all the semideques and stanzas together in one queue (either mixed together, or with one queue for semideques and one queue for stanzas).

Computational class

Esimpl is obviously Turing complete, because many Turing-complete languages can be compiled into it more or less directly. The most obvious is StackFlow, which can be defined as the special case of Esimpl in which pushback instructions are never used, and in which all pop-goto instructions that share a semideque number also share a stanza number, plus a couple of restrictions to statically ensure that there's no undefined behaviour. The Waterfall Model also compiles into Esimpl directly, because it can be compiled directly into StackFlow. Turing machines can be compiled into Esimpl simply by splitting their tape into two stacks (causing every Turing machine state/symbol combination to be directly expressible as an Esimpl stanza). Bignum brainfuck without negative-valued cells also has a fairly simple compilation into Esimpl, via representing the tape as two stacks, expressing the stack elements in unary with separators between them; if you do that, every brainfuck instruction except < and > then directly compiles into an Esimpl stanza (with < and > compiling into simple loops). Brainpocalypse can be compiled into Esimpl using a similar construction.

When compiling a Turing machine into Esimpl, the above construction incurs only a linear slowdown, meaning that Esimpl can be efficient in addition to being general. The pushback instruction is not used in this construction, so it can be removed from the language (making it fully stack-based) whilst still maintaining the same speed. (In fact, any Esimpl program can be modified to remove pushbacks via replacing one semideque with two stacks, which is a very similar construction to the implementation of a queue in terms of two stacks; in the general case, this makes the program a lot longer, but nonetheless incurs only a linear slowdown.) It is also possible to use binary rather than unary encodings of numbers, in order to gain similar efficiency when compiling number-based languages.

Esimpl is still Turing-complete if the push instruction is removed (except in stanza 0), making it a fully queue-based language; this variant of the language can, e.g., implement a tag system directly. When doing this, some efficiency is lost – after storing data on a semideque, it then takes time proportional to the size of that semideque to retrieve it again, which means that some programs that can be implemented in linear time on a Turing machine will have quadratic performance in queue-based Esimpl. (One example is a program that inputs an arbitrary string, and outputs that string in reverse.) However, all purely queue-based languages for which queue elements are drawn from a finite set share this restriction, and it is much easier to run Esimpl at the "queue-based speed limit" than it is for most queue-based esolangs (e.g. simple Turing-completeness constructions for tag systems run in exponential time, because control flow for tag systems is based on a sum of all elements in the queue rather than on elements in specific positions, and this makes it hard to store data other than with counter-like representations). This is the main reason why Esimpl has the otherwise redundant pushback instructions – it allows it to serve as an efficient intermediate representation when compiling into queue-based languages.

Esimpl is also still Turing-complete, with only a linear slowdown, using only 0 and 1 as elements of the semideques. The basic idea is to replace each stanza with two: the second stanza of the pair corresponds to the stanza in the original program, whereas the first stanza of the pair consists of a single pop-goto instruction (specifying the first stanza of the next pair, and the same semideque that was popped to reach it). The data in the stanzas, and in push and pushback instructions, is now encoded in Esimpl's normal modified unary (n 0s followed by a 1), and the resulting program will behave identically to the original.

It is also possible to write an Esimpl program to use only a single semideque (unfortunately, with a loss of performance). The general idea is to store all the semideques in a single queue, using an otherwise unused values as a separator, and split stanzas up into sections that read/write data separated by loops that rotate the queue to move the semideque you want to operate on to the start or end as appropriate (with each section or loop becoming a stanza of its own).

See also