Takeover

From Esolang
Jump to navigation Jump to search

Takeover is an esoteric programming language created by User:ais523 in 2016. The original idea that inspired the language was to create a language in which standard input was appended to the program before it started to execute (thus making the null program a self-interpreter), and the rest of the language was designed around this concept. The resulting language has some similarities to Splinter, and some similarities to the non-esoteric language Forth.

The descriptions below give a definition of the language, and a guide to one way to implement it. However, the "as if" rule applies: you can implement the language via any method that causes all programs to behave identically (in terms of I/O behaviour, not necessarily in terms of efficiency) to their behaviour in a hypothetical implementation that worked exactly as described below.

Data storage

A Takeover program has the following places to store data:

  • The program itself. Much as in Underload, the program is viewed as a stack (the start of the program is the top of the stack), and commands are repeatedly popped from the top of it and executed. Unlike Underload, in which only one command (^) pushes onto the program, it is fairly common for Takeover commands to push onto the program. The program is made of octet snapshots, which consist of an octet, and optionally a positive integer (in this article, an octet snapshot's octet is written in Latin-1, and its integer (if any) is written in decimal as a subscript, e.g. X and y3 are octet snapshots). When the Takeover program starts to run, it loads the program (interpreting it as a sequence of octets), appends standard input to it (interpreting that as a sequence of octets too, and reading it until end-of-file), and uses that as the initial state of the program (with no positive integer associated with any of the octet snapshots in question, so it's initially just a sequence of octets). Each octet snapshot is a command; two different snapshots could potentially have vastly different behaviour even if they're similar (e.g. -1 is not very similar in behaviour to -2).
  • The active definition. This is a list of octet snapshots, which supports only two operations. One operation is appending to it, which is mostly accomplished via the use of commands whose integer portion is 1. The other operation is moving the entire list to an element of the definition matrix, which causes the active definition to be empty again (its previous contents are now elsewhere). The active definition has one other unusual property: when the program ends naturally (which happens once it empties and has no more commands to pop), the active definition is printed to standard output (ignoring the integer part of the octet snapshots that form it, and just printing the octet part). This is the only way to produce output.
  • The definition matrix. This associates definitions with octet snapshots. A definition could either be hardcoded internally to the interpreter, or written as a sequence of octet snapshots, i.e. Takeover commands; in the latter case, it must have been placed there via moving it from the active definition. For each octet, there are only finitely many definitions; the first is associated with 1 (and is always hardcoded), the second is associated with 2 (and is always hardcoded), the third is associated with 3 (and is always hardcoded), and any subsequent definitions are associated with further nonnegative integers in sequence, and must be written as Takeover code. Thus, it makes sense to talk about the number of definitions for a given octet (and the number of definitions for each octet is initially 3). Trying to use an octet snapshot with no associated definition should cause the program to crash. However, executing an octet snapshot with no associated integer is not an error; in such a case, the integer is assumed to be equal to the number of definitions of the octet (i.e. the last, or equivalently the most recent, definition is the one that is used). When a non-hardcoded command runs, its definition is appended to the start of the program, along the lines of Underload's ^ command.
  • The command modification state. This is an operation that's performed on each octet snapshot that's popped from the program, before it's executed, causing a different command to execute from the one stated in the program. (For example, the state could be "set the integer part to 1", in which case if the program started with a4, the command that actually ran when this octet snapshot was popped would be a1.) This is initially a no-op, and in most (but not all) cases, this resets itself to a no-op after one command.

Initial definition matrix

The above section is actually almost a complete definition of Takeover; the only thing missing is a description of the hardcoded commands in the definition matrix.

Simplest is the definitions for octet snapshots with integer part 2. These are all analogous: they append an octet snapshot to the end of the active definition. The octet part of the appended snapshot is equal to the octet part of the command in question, and the integer part is equal to the number of definitions of the octet. (So for example, if @ has five definitions, then @2 will append @5 to the active definition.)

The definitions for octet snapshots with integer part 1 are also all analogous to each other. Each of them adds a new definition for the octet part of the command in question to the definition matrix, via moving it from the active definition. Thus, if the active definition is x4y4 and @ has five definitions, then @1 would empty the active definition, and define @6 as x4y4; now @ would have six definitions. This is the only way to add new definitions for a command.

Most of the definitions for octet snapshots with integer part 3 will push two octet snapshots onto the program. First, .4 is pushed, and then the snapshot whose octet code is lower by 1 is pushed (with no associated integer, and with 0 wrapping to 255). (As usual for a program-stack, the commands will run in the reverse order to that with which they were pushed.) However, there are several exceptions to this general rule, most of which change the command modification state:

  • +3 sets the command modification state to interpret the next command as having an integer larger by 1 than its actual value (thus +3a3 is equivalent to a4). This state resets itself to a no-op after it modifies one command (i.e. before that command executes).
  • -3 sets the command modification state to interpret the next command as having an integer smaller by 1 than its actual value (thus -3a3 is equivalent to a2). This state resets itself to a no-op after it modifies one command (i.e. before that command executes). Note that there is no means via which you can get an integer component of an octet that's smaller than 3 into the program, so there's no way for this to create an invalid octet snapshot.
  • >3 sets the command modification state to interpret the next command as having an integer equal to 1 (thus >3a3 is equivalent to a1). This state resets itself to a no-op after it modifies one command (i.e. before that command executes).
  • <3 sets the command modification state to interpret the next command as having an integer equal to the number of defintions that exist for its octet (thus <3a3 would be equivalent to a7 if a had seven definitions). This state resets itself to a no-op after it modifies one command (i.e. before that command executes).
  • ,3 sets the command modification state to interpret the next command as follows: the octet code is higher by 1 than its actual octet code (with 255 wrapping to 0); and the integer portion is equal to 2 (thus ,3a3 is equivalent to b2). This state resets itself to a no-op after it modifies one command (i.e. before that command executes).
  • [3 sets the command modification state to interpret each command as having an integer portion equal to 2. Unusually, this state does not reset itself after it modifies one command. Instead:
    • If this state modifies a [ (with any subscript), it turns it into a [2 as expected, but also increases an internal counter (that's initially 0).
    • If this state modifies a ] (with any subscript), then:
      • If the internal counter in question is not 0, it decreases it, and turns the command into a ]2 as expected.
      • If the internal counter in question is 0, it's left at 0, and the command modification state is reset to a no-op immediately. The ] is thus left as is and so does whatever it would have done without the [ command.
  • ]3 does nothing.

Programming in Takeover

Hello, world!

Let's start out simple, with an attempt to write a hello world program. Placing "Hello, world!" onto the active definition (so that it can be output) is trivial:

[Hello, world!]

If standard input were empty, this would print "Hello, world!" as expected (in fact, you could leave off the trailing ] and the program would still work, as there's no requirement to end with a no-op command modification state). However, a correct hello world program should print the string in question regardless of what's on standard input, making things rather more complex.

An obvious first plan is to define all command's last definitions to do nothing (meaning that nothing further will happen during program execution). The problem here is that defining a command always empties the active definition, and with no commands available, you couldn't push onto the active definition. However, this is easy to work around, via use of the integer part of the octet snapshots in the program. This leads to a hello world program that looks like this:

[>+>->[>]><>>>,>.[Hello, world!]]>++

The seven octets which have unusual integer-part-3 definitions are given a no-op definition (] doesn't need redefining as it's already a no-op, but redefining it anyway keeps brackets matched), as is . (otherwise an unknown character will expand to something containing the non-defined .4 and crash the program), and then the active definition is changed to Hello, world! as expected. All this is done using octet snapshots with an explicit integer part of 3 (introduced via adding a definition for + – which has explicit integer parts, because of the way that integer-part-2 commands work – then immediately executing it); at the time when the active definition is changed, [4 has been defined to a no-op, but the [ that runs is still [3, as that's how it was added to the active definition. (Now you can see why they're called "octet snapshots": they give the command as of a particular moment in time, rather than the most recent version of a command.)

This sort of technique can be used to print any string. The only mildly tricky characters to print are [ and ], but you can escape those using -[ and -] (note that the escaping needs to be done on two levels, or the unmatched brackets will throw off the original attempt to define +); a -3 can change an associated integer from 3 to 2 just as well as a [3 can. For example, foo]bar[baz can be printed as follows:

[>+>->[>]><>>>,>.[foo]-]-][[bar]-]-[[[baz]]>++

Alternatively, you could use , to escape problematic octets by replacing them with the less problematic octets with adjacent octet codes:

[>+>->[>]><>>>,>.[foo],\[bar],Z[baz]]>++

Cat program

A cat program copies its input to its output. One possible approach to this would therefore involve defining every command to append its own octet to the active definition. You could accomplish this with code similar to the following:

[[[a]]>a[[b]]>b[[c]]>c ... ]>++

with every octet redefined as shown above. (The square brackets could be handled as suggested in the previous section. Note that as all the commands that run during the redefinitions have an explicit integer part of 3, you don't have to worry about redefining them from underneath yourself.)

However, this causes the program to contain a bunch of non-ASCII characters, and also to be quite long. A more general technique would be to implement cat via octet code manipulation. The basic idea will be as follows: all characters but . will be given, as their most recent definition, a default-integer-part-3 definition (i.e. x.4 where x is the octet with code one lower. We have two places for "internal storage", called "last" and "rest", both initially empty. . (unsubscripted) will empty the active definition, append "last" to "rest", set "last" to . (the integer part doesn't matter), and set the active definition to "rest" concatenated with "last". .4 will empty the active definition, increment the octet code of "last", and set the active definition to "rest" concatenated with "last". This means that whenever input (and thus the program) happens to end, we already have a copy of it sitting on the active definition ready to be output.

The obvious question here is where we're going to store "last" and "rest". The answer is to store them in the second-most-recent definition of some octet (which one doesn't really matter, so we can use l and r to make the code clearer), and thus be able to access them via -3 (which in turn implies that both . definitions need to be defined before - is redefined, and thus that there are only finitely many . definitions; most reasonably they would be .4 and .5).

The following program is a first attempt at a working cat along these lines. The Greek letters represent unprintable ascii characters, which it is assumed will not appear in the input, except for ω which is EOF. λ is instead of l and ρ instead of r in the description above. π is a pointer, which allows us to redefine .4 on the fly. δ is simply there to keep -3 within easy reach. τ is just a temporary variable, and it is assumed τ has an octet code one lower than ψ. The letters ω, ... , τ, ψ occupy octet codes 0 to 7, and the expected input uses codes from 8 onwards (so ψ has the highest octet code that is not permitted). Here is the code:

-->δ[>λ<λ>τ-δ,/>π]>f[<π]>.[f<τ[δ,]-]>g-g>π[f<ρ]>ω[-δ,/>]-][-δ,Z<ρ<λ-δ,/>ρ-δ,/>]-][[δg]>π]>ψ>ρ[.e]>f[.f]>g[.*]>+[.,]>-[.;]><[.+]>,[.=]>>[.Z]>[δ[.-]>.δ[δ./]>]

Readers may be concerned by the amount of Greek in use here. If so, they are invited to look at the talk page to see some of the problems it solves, and then, if they think they can do better, to attempt the problem themselves.

This particular line of thought is unfinished, but should hopefully give an idea of how to read characters by character code in Takeover. For writing cat itself, an alternative technique is to use [3's command modification state, and redefine ] to return to that state immediately upon exiting it. This makes it possible to write a program like this:

[,Z]>[-,-\--,Z>]-[

Computational class

Takeover can easily be proved to be at least a push-down automaton via compiling Splinter minus output into it:

  • x{y} in Splinter becomes [y]>x in Takeover (where the same Takeover-to-Splinter compilation is recursively done on y);
  • x not followed by { in Splinter becomes <x in Takeover.

(You might also want to add code at the end of the program to cause the appended standard input to become a no-op, which is easy as none of the six unusual commands, nor ., have been redefined.)

This effectively causes Takeover to act like Splinter via only ever taking the most recent definition for a command. However, Takeover has much more functionality available than this relatively rudimentary programming technique. It thus seems very likely that Takeover is Turing complete, but nobody has proved this yet. (It cannot be brainfuck complete because it reads all input before producing any output.)

If Takeover is Turing complete, an obvious next question is as to which integer-part-3 commands are necessary. > is clearly needed for any form of flow control (you have no other way to access integer-part-1 commands). Without at least some other form of command modification state, you can prove that the program always terminates (you can only have finitely many unsubscripted commands in the program, and each subscripted command can only expand into other subscripted commands that were defined earlier), meaning that you also need at least + or <. You also need some means for appending to the active definition, thus at least one of [, -, or ,. ([ also needs ] or there's no way to do anything with your active definition). It seems plausible that the language could be Turing complete with just one command from each of these sets, though.

See also

External resources