Globe

From Esolang
Jump to navigation Jump to search

Globe is an esoteric programming language created by User:ais523 in 2025, while working on the problem of trying to write efficient programs in tag systems (i.e. that did not incur an exponential slowdown). In particular, the intention behind the language is that a Globe program can be compiled into a tag system with no more than an O(n) slowdown (i.e. the program is slower by a factor asymptotically proportional to the length of the Globe symbol string, which is also proportional to the length of the tag system's queue), and that a Turing machine can be compiled into Globe with no more than an O(log n) slowdown (i.e. the program is slower by a factor asymptotically proportional to the logarithm of the portion of the Turing machine's tape that has been visited). The language specification is subject to change if it turns out that either or both of these performance bounds are unreachable, or if either or both could be reached more easily via changing the language.

Globe is a descendant of the language family containing An Odd Rewriting System and Genera Tag, but diverges somewhat from those languages in order to get the desired behaviour.

Semantics

The data operated on by a Globe program consist of a state – one of finitely many possibilities enumerated in the Globe program – plus a possibly empty string of symbols, each of which is drawn from a finite alphabet specified by the Globe program. The program specifies the initial state and the initial string of symbols. (The initial string must be non-empty; this is because, once the string of symbols becomes empty, there is no way for it to become non-empty again.)

Each symbol/state combination is defined by the program as being either odd or even; this quality is known as a width. A symbol's width is always equal to the width of the combination of that symbol and the current state, i.e. unlike An Odd Rewriting System, the width of a symbol is not necessarily intrinsic to the symbol, but can vary according to what the current state is. It is not, however, influenced by the other symbols in the string.

Program execution proceeds by simultaneously a) changing the state to another legal value (leaving it unchanged is allowed, but considered to be changing it to itself), and b) replacing each occurrence of a symbol in the symbol string with 0 or more symbols (these replacements flatten, e.g. if you replace "a" with "cd" and "b" with "e" in "ab", you get "cde", and could potentially be the same as the original). The following factors influence what the state/symbol is replaced with:

  • the current state;
  • for symbols, the symbol that is being replaced;
  • for symbols, whether the total number of odd symbols that appear before it in the symbol string is even or odd (this is the same criterion as in An Odd Rewriting System);
  • for the state, whether the total number of odd symbols in the symbol string is even or odd.

A program specifies all the possible replacements simply by listing the appropriate replacement for every combination of circumstances, e.g. "in state A, if there are an odd number of odd symbols, change to state C", or "in state A, each symbol 'x' with an even number of odd symbols to its left should be changed to 'yz'". There are only finitely many possible circumstances, and thus only finitely many combinations that need to be specified in the program.

One possibility for the state is specified by the program to indicate halting: a program is executed by repeatedly doing the "simultaneously change the state and replace symbols" operation until reaching the halt state. The program does not list replacements for the halt state (or for symbols in the halt state), because the program would already have halted before the replacement could occur.

Syntax

As in Genera Tag, source files are encoded in UTF-8, and any line whose first non-space character is a # is ignored (allowing such lines to be used as comments).

Possible states and symbols are each represented as a string consisting of one Unicode cased letter, followed by zero or more Unicode subscript decimal digits (i.e. characters in ₀₁₂₃₄₅₆₇₈₉). (Exception: $ is the possible state in which the program halts.) Because the width of a symbol can vary based on the current state, there is no expectation that the case of a symbol matches its width. The sets of possible states and symbols are not explicitly specified, but rather inferred by looking at what states and symbols exist in the initial string and in replacements.

The first line of a program specifies the initial string (which is determined simply by concatenating the characters), followed by a space, followed by the initial state.

The rest of the program consists of replacements, each of which consists of a circumstance, followed by a colon, followed by a replacement (replacements must not contain whitespace). Circumstances are expressed as follows:

  • state0symbol for symbol replacements that are used when the symbol is preceded by an even number of odd symbols;
  • state1symbol for symbol replacements that are used when the symbol is preceded by an odd number of odd symbols;
  • state0 for state changes that are used when there are an even number of odd symbols;
  • state1 for state changes that are used when there are an odd number of odd symbols.

If a symbol is odd in a given state, this is represented by appending @1 to all replacements for that symbol/state combination (appending it to some but not all is a syntax error). @0 can be used in the same way to specify that a symbol is even in a symbol/state combination, but this is optional (being even is the default).

As syntactic sugar, - can be used to represent "an arbitrary cased letter" and can be used to represent "an arbitrary subscript" (which includes the null subscript and multi-digit subscripts) in the state part of the circumstance for a symbol replacement; this is equivalent to writing the replacement multiple times, once for each possible cased letter / subscript that could legally appear in that situation. Additionally, replacements that replace a symbol or state with itself (e.g. A0B:B or A0:A) may be omitted (exception: you cannot omit both the replacements for an odd state/symbol pair, because then you would have nowhere to place the oddness annotation) – implementations must use this as a default replacement in circumstances where the program does not specify a replacement explicitly.

I/O extension

A state replacement may be followed (without whitespace) by any sequence made of + and ! characters, which are run like commands when that state replacement is used to change the state; + increments an internal counter (which is initialised to 0), and ! outputs a byte whose value is that counter, resetting the counter to 0. (If the value of the counter ever goes above 255, this is undefined behaviour.)

A state replacement may be followed (without whitespace) by ?; this must come after the + and ! characters if both are present. This is only allowed for replacements for which, in the state that results after the replacement, all symbols have an even width (otherwise it is syntactically invalid). ? acts as follows: a) it reads one byte from input, b) it causes the next n uses of the ? command to be skipped (unless the read failed due to end-of-file), where n is the value of that byte, and c) if the read did not fail due to end-of-file, it causes 0 and 1 replacements to be swapped during the next set of simultaneous replacements.

Computational class

Globe is Turing-complete because it embeds Genera Tag with m=2. The embedding uses five possibilities for the state: one which simulates a Genera Tag generation at position 0, one which simulates a Genera Tag generation at position 1, two which check to see if Genera Tag's halt condition has been reached (the position is remembered via the choice of state), and one which performs the halting. The compilation is very direct, with almost everything translating directly: the only subtlety is that in the "simulate a generation" states, symbol width in Globe is based on symbol width in Genera Tag, whereas in the "check for halting" states, the Genera Tag halt symbol is odd and all other symbols are even (making it possible to detect whether or not the halt symbol is present).

Globe is believed to be able to implement Turing machines efficiently (slower by a factor of O(log n)), but this has not yet been proven.

Compiling into efficient Globe

Although there is not a concrete implementation yet, ais523 believes that it is possible to implement an efficient "test and delete the first symbol that belongs to a given set of symbols in Globe" operation. This is done using a state in which only the symbols of interest are odd (and all other symbols are even), then change the symbols of interest that are preceded by an odd number of odd symbols to a temporary replacement version of the same symbol (which is mostly the same, but is always even). The temporarily replaced symbols are known to not be the first symbol, and about half the symbols are temporarily replaced; so doing this repeatedly eventually causes all the symbols except the first to be temporarily replaced. Then the program would change to a different state, that causes the non-replaced symbol to become odd or even depending on what sort of symbol it is, and/or delete/replace itself, whilst undoing the temporary replacements. This technique allows the first symbol among the symbols of interest to be found, tested and replaced/deleted in O(log(n)) time, which is sufficient to implement an efficient stack or queue (as push/enqueue operations are trivial to implement in Globe, with only the pop/test being non-trivial), and thus allow efficient compilation from stack-based, queue-based, and (because a tape is equivalent to two stacks) tape-based languages.

One subtlety in the above construction is that the program needs some means of tracking how many times the "temporarily replace half the symbols" state should perform its replacements – a constant is not enough, it needs to grow dynamically to always be greater than the log₂ of the number of symbols. However, the loop counter can be stored efficiently even using the usual inefficient representation of integers in 2-tag (i.e. an integer n is stored as a list of symbols of length 2n): this does not bloat memory because the number being stored is a logarithm, and thus taking 2 to the power of the logarithm still produces a number that is proportional to the number of other symbols being used. This sort of number can be efficiently used as a loop counter using the same temporary replacement algorithm as above (because it produces an odd number of odd symbols on, and only on, the cycle where all but 1 symbol has been temporarily replaced, and when the temporary replacements are undone the value of the number will be restored), and can easily be incremented by replacing every symbol in it with two symbols.

Compiling from Globe into tag systems

ais523 believes that Globe (omitting I/O extensions that cannot be represented by the target language) can be efficiently compiled into 2-Genera Tag (and from there into 2-tag, cyclic tag, etc.) via representing each Globe symbol using two Genera Tag symbols for each Globe state: one which handles the symbol replacements, and one which handles the state replacements. At all times, all the Genera Tag symbols encode the same state as each other (which matches the Globe state in the program that was compiled from). During symbol replacements, the Genera Tag symbols have the same width as the Globe symbols, and replace the same way (but they replace into the state-replacement version). During state replacements, all symbols have width 0 (allowing them to observe the position as of the end of the previous cycle – with all the symbols being width 0, the width will not change during the generation).

This compilation scheme has one issue: in Globe, the replacement rule used for a symbol is based on whether there are an odd number of odd symbols to its left (i.e. An Odd Rewriting System's rule), whereas Genera Tag bases it on whether the symbol is at an odd position (which is equivalent to saying that there are an odd number of odd symbols to its left when the position at the start of the generation was even, or an even number of odd symbols to its left when the position at the start of the generation was odd). In other words, the compilation scheme only works if the position at the start of a symbol-replacement generation is 0. One easy way to solve the problem is to add an additional symbol at the end of the state string that's always even in symbol-replacement generations, and temporarily replaces itself with an odd version for the state-replacement generations if it saw oddness in the symbol-replacement generation; this has no effect on the rest of the construction, and yet resets the position to 0 when it needs to be.