Flow of Holes

From Esolang
Jump to navigation Jump to search

Flow of Holes is an esoteric programming language created by User:ais523 in 2018. Its purpose is to act as a reversible counter machine that's relatively high-level/unrestricted as esolangs go, thus hopefully making computational class proofs for reversible languages easier. Its main unusual feature is that data and control flow move in opposite directions (thus the name). Other unusual features include the fact that the syntax is primarily specified in terms of how it would be represented in a GUI IDE (rather than as text), and support for formal verification that functions obey their specifications.

Data storage

A Flow of Holes memory state is a graph, i.e. effectively consists of a number of nodes joined by a number of connections. There are two sorts of node: data nodes and control nodes; all control nodes are identical, but each data node stores a value, which is a non-negative integer. There are also two sorts of connection, primary connections and secondary connections. A connection always links a data node and a control node (data nodes cannot be linked to each other, nor can control nodes). Connections are directional (a direction simply specifies which of the two connected nodes the connection "points towards"); every connection has a data direction, and primary connections also have a control direction, which is always the opposite of the data direction. When representing the memory state as an image, data nodes look like a circle or ellipse with their value written inside, control nodes like a small filled circle, primary connections as solid lines, secondary connections as dashed lines, and directions are shown as arrowheads; a filled triangle for a control direction, a hollow triangle for a data direction.

Over the lifetime of the program execution, only two things in this memory state change: the values of the data nodes, and which control node is active. (At any given moment, there's always one active control node, although it's interesting to imagine what a "multithreaded" version with multiple active nodes could be like.) Active nodes are represented graphically by recolouring them. A Flow of Holes program is basically an initial state of memory (including a specification of which node is active); because the actual graph structure never changes, it can alternatively be viewed as the graph, combined with initial values for the data nodes.

There is a restriction on the graph structure: each data node must have exactly two primary connections (one control-connected to it, one control-connected from it). A program that does not obey this restriction is syntactically incorrect. (Additionally, two nodes cannot have multiple direct connections between them; it's possible to define semantics for this case which are consistent with the other cases, but would be rather more complex than the rest of the language and thus somewhat inelegant.) There is also a restriction on the data node values: out of all data nodes control-connected to the active node, exactly one must have the value 0. ("To" here – and elsewhere on this page – is directional, i.e. this doesn't count data nodes control-connected from the active node.) This latter restriction is what makes the language reversible (as it makes it possible to uniquely identify where the control flow came from).

Program execution

One step of a Flow of Holes program execution proceeds as follows:

  1. Identify the lowest value among the data nodes control-connected from the active node. (If there are no such nodes, the program terminates.)
  2. Subtract this value from every data node data-connected to the active node (either via a primary or secondary connection). If this would bring a node value below 0, this is illegal behaviour.
  3. Add this value to every data node data-connected from the active node (again, either via a primary or secondary connection).
  4. Identify the unique data node that's control-connected from the active node, and has value 0. If there is more than one such node, this is illegal behaviour. (Mathematically, there must be at least one such node.)
  5. Find the control node that's control-connected from that data node. This becomes the new active node.
  6. If the newly active node has more than one data node with value 0 control-connected to it, this is illegal behaviour. Otherwise, the step is finished.

An implementation repeatedly runs execution steps until the program terminates, or illegal behaviour occurs. Correct Flow of Holes programs must ensure that illegal behaviour never occurs. Fully-featured Flow of Holes implementations should report an error when illegal behaviour occurs. Less fully-featured implementations may choose to instead interpret illegal behaviour as undefined behaviour, in which case their behaviour will only be defined on correct Flow of Holes programs.

It can be seen that control moves in the control direction, because the choice of active node is determined by following primary connections in the control direction. Meanwhile, data gets removed from data nodes and added to data nodes that are two steps away from them in the data direction, and thus is effectively moving the other way round the graph. (It should be noted that the common case of the above rules is to zero a data node "ahead" of the active node, and to add its value to a zeroed data node "behind" the active node, so control flowing along a linked chain of nodes effectively moves all the values one step in the opposite direction.)

Reversibility

Because there's always a 0 "behind" the active node, that records where the activity came from, and any situation where there were two such 0s would be illegal behaviour, it's always possible to determine where the control flow came from in the previous step of execution. Likewise, because there was always exactly one 0 behind the active node before the transfer of data, and negative numbers would be illegal behaviour, you can determine the size of the transfer by looking at the value of the smallest such node. As such, it's always possible to deduce the state of the program immediately before a step was run, meaning that a step can be reversed (and by repeatedly reversing steps you can run the program in reverse, getting back to the start of the program).

More interestingly, it turns out that you can change a program into its reverse version simply by reversing every connection, as the algorithm you'd use to reverse the behaviour of a step turns out to be the same as the algorithm to run a step forwards, simply with every connection reversed. This makes the language symmetrical in a sense, and also eases development (as it's often easier to write the reverse of the function you want, then reverse every connection to produce the behaviour you actually wanted).

Extensions

Several aspects of Flow of Holes will exist in fully-featured implementations, but may be omitted from simpler implementations, either because they can easily be mechanically removed from a program without changing its semantics, or because they can be implemented by a separate processor.

Output

A data node with no primary connections (which would otherwise be illegal) can be used to represent an output device; in this case, adding a value to that data node (via a secondary connection) would cause the added amount to be output (either as a number or as a codepoint, depending on implementation settings).

Functions

The above description of a language represents a "compiled" or "inlined" form of the program. However, actual Flow of Holes source code is typically written in a modular form, with reusable "functions", sections of graph which are duplicated in multiple places. The way these work is that a function is defined in one section of the source code, which contains some insufficiently connected data nodes (i.e. they have less than the required two control connections); each of these nodes is named. The function can then be "instantiated" any number of times, which produce copies of the entire function section; each instantiation has a name, and the instantiation name and data node name are combined to form names for the insufficiently connected nodes in the instantiations. Those names can then be used to identify the nodes in question when connecting them to the rest of the graph, so that the program as a whole has no insufficiently connected nodes.

It's possible for functions to themselves contain function instantiations, but recursion (even indirectly) is not allowed. It's also possible to instantiate a function in reverse (i.e. with all its connections reversed), via appending ' to its name when instantiating it.

Functions are typically implemented entirely by a preprocessor, rather than existing at runtime; in other words, they can be seen as syntactic sugar for inlining. This means that less fully-featured implementations can handle them using the help of a separate preprocessor, rather than needing to do the preprocessing themselves.

When drawing a Flow of Holes program (or the state of a running program) as an image, functions are represented as large rectangles (with function and node names labelled), containing the code of the function; and function instantiations are also represented as rectangles, hiding their contents (i.e. displaying just the name of the instantiation and which function it instantiates). (While an instantiation contains an active node, the rectangle containing the function in question changes to reflect the state of that instantiation.)

Testing / verification

Within a function, each data node is associated with a quiescent value. This is a constraint that relates the value of the data node to the function's history of execution, and acts as a comment or specification of how the function works internally: this represents a claim that whenever no control node within the function is active, each data node will have a current value that's correctly described by its quiescent value.

The simplest case of a quiescent value is an integer, specifying that the data node's value should always be that integer. (For example, many data nodes may be expected to always have the value 0 while the function isn't actively running.) However, quiescent values can also take on more complex forms, related to things like how many times specific nodes within the function have become active. They can also be inexact, e.g. "any value" is a legal quiescent value (which can be used as a method of disabling the verification). There is currently no fixed syntax for quiescent values; this is an area in which implementations could reasonably support their own ideas and extensions.

In addition to inlining the function instantiations, the preprocessor also performs "test runs" of functions from each possible entry point (i.e. each data node that's missing its inbound control connection), by connecting each entry point from, and each exit point to, a (distinct) hypothetical control node, then working out what would happen when each of the control nodes control-connected to entry points started as active, if each data node started at some valid quiescent value for that node. (As such, this is effectively "symbolic computation" of a Flow of Holes program.) The function passes verification if, by the time that control reaches an exit point (a data node that's missing its outbound control connection), every node within the function is again described by its quiescent value. This thus proves that the program is reusable, and will conform to its specification each time it's run. Preprocessors should produce errors when such a test run causes a function to exit in a way that doesn't match the quiescent values, or can be proven to enter an infinite loop, or produces illegal behaviour; and warnings when a function can't be proven to exit (e.g. because it's still executing after a given number of steps of the simulation).

It's permissible to write functions purely for triggering the verification behaviour, without any instantiations of those functions. These can be referred to as "test cases".

Computational class

It's fairly easy to implement most of the operations of The Amnesiac From Minsk level 1 in Flow of Holes. The control graph of the Minsky machine can become a literal control graph of primary connections within the Flow of Holes program (using, e.g., 2 for all data nodes other than the one via which control was just transferred), and secondary connections can be used to add and subtract 2 from data nodes which act as counters (which store twice the value of the corresponding TAFM counter), so there are only two complex operations: a zero-test, and a "merge of control" in which a control node can be reached via either of two data nodes (and thus indirectly via either of two different control nodes).

A zero-test is fairly easy, given that (as we're implementing TAFM, not a full Minsky machine) only one zero-test is required per data node: activate a control node which has a primary connection control-to both the counter, and a data node with value 1. If the counter is zero, execution will proceed to the counter (without changing data), and its control connection can be used to move to a new section of the program. If the counter is nonzero (meaning its value is 2 or more), its value will be reduced by 1, and control flow will continue on the data node with value 1. Both of these should be followed by a short chain of data nodes with value 1 which allow "repairing" the state of the data nodes (some of which will now have moved away from their quiescent values) via using secondary connections to adjust the value of any node that has an incorrect value. The transition back to data nodes with value 2 is accomplished by subtracting 1 from a data node "ahead" of the control flow, so that when control flow reaches it, a 1 (2 - 1) will flow back into the section of 1s, whereas the data node in question will become 2 as it takes a 2 from the rest of the control flow graph.

The merge of control is more complex, as it effectively has to implement a bit bucket to remember the sequence of control flow. It's easier to write it in reverse, at which point the component you need is one which effectively stores a "string" of sequences to pass control to, and passes control accordingly. This effectively works via repeatedly zero-testing a counter, except that instead of testing against 1, the numbers subtracted are taken (in reverse for the reversed control merge, forwards for the forwards control merge) from a sequence that grows at least as fast as 2x (and the value of the counter is not restored afterwards, but left at its subtracted version). This makes it possible to encode any finite string in the counter via adding together elements of the sequence. Although 2x is a natural sequence to use for this purpose, sequences such as "alternate terms of the Fibonacci sequence" may be easier to implement. (Note that although the reverse control merge can only be used finitely many times, a regular control merge works unboundedly many times because it's always possible to encode a longer string.)