Infinite Vector

From Esolang
Jump to navigation Jump to search

Infinite Vector is an esoteric programming language created by User:ais523 in 2015. It is based heavily around the concept of vectors. Contrary to its name, the vectors are not infinite; the "Infinite" in the name is merely to sound cool.

Note: this language is still something of a work in progress. If I ever try to implement it, I may end up changing some details to make doing so easier, and if I ever try to write programs in it, I may end up changing some details to make doing that easier.

Data types and storage

All data types in Infinite Vector are vector types: that is, they are a sequence of elements, each of which has the same data type. Vectors can be arbitrarily long (but must be finitely long). Because all data types are vectors, there is no need to specify "vector of" or the like in the name of a type, it's just assumed.

The types are as follows:

  • `flag`: vector of 1-bit unsigned integers
  • `byte`: vector of 8-bit unsigned integers
  • `word`: vector of 16-bit unsigned integers
  • `sentence`: vector of 32-bit unsigned integers
  • `paragraph`: vector of 64-bit unsigned integers
  • `page`: vector of 128-bit unsigned integers
  • `chapter`: vector of 256-bit unsigned integers
  • `char`: vector of 8-bit signed integers
  • `short`: vector of 16-bit signed integers
  • `long`: vector of 32-bit signed integers
  • `long long`: vector of 64-bit signed integers
  • `longer long`: vector of 128-bit signed integers
  • `long long long`: vector of 256-bit signed integers
  • `sink`: vector of 16-bit floating-point numbers
  • `float`: vector of 32-bit floating-point numbers
  • `double`: vector of 64-bit floating point numbers

A program can use any number of variables (each of which is a vector), each of which has a name (made out of lowercase ASCII characters and underscores; and any sequence of characters that corresponds to this syntax, if not a type name, is a variable name (or part of one) and is used for no other purpose). A single variable can change type during the execution of the program; whatever type is stored into the variable, the variable changes type to match. However, if a program reads from a variable before its first non-initialization write to that variable, then that variable must have the same type at the end of every iteration of the main loop and at every flow control command. (Because all commands treat types deterministically, the type of each variable at each point in the program can thus be determined statically.)

There are also "implicit vectors" which have all elements the same; the value of their elements is the same as their name (which must be an integer or floating-point number, according to the type). The type and length of an implicit vector is inferred from context (causing a compile error unless exactly one type and length is legal in that context). For example, a = a + 1 would add 1 to every element of a; this is conceptually an element-wise addition that adds corresponding elements of a, and a vector 1 that has the same number and type of elements as a does, all of which have value 1. Implicit vectors cannot be assigned to.

Commands

An Infinite Vector program consists of a sequence of commands. The vast majority of commands operate on vectors; there is almost no flow control. The entire program is placed in an implicit infinite loop (that can only be exited via exiting the program directly). Commands are separated by newlines.

Arithmetic

Arithmetic operations act element-wise on two vectors, which must have the same type and length, and produce a third vector as output (which can be assigned to either of the source variables or to a third variable), with the same type and length as the inputs (except in a few special cases, explained below). Infinite Vector compilers are allowed to reorder and rearrange sequences of arithmetic operations to mathematically equivalent sequences, even if this would produce different results (due to overflow behaviour or floating point rounding); to suppress this, place a non-arithmetic operation between the two operations that you don't want reordered.

a = b + c
a = b +? c

Addition adds corresponding elements of its input vectors. Floating-point overflow gives an appropriate infinity. There are two variants, which differ in their integer overflow behaviour (and work the same way on floating-point numbers): + wraps to the other end of the range (giving correct behaviour modulo 2 to the power of the number of bits), whereas +? returns the largest/smallest integer for positive/negative overflow.

a = b - c
a = b -? c

Subtraction subtracts corresponding elements of its input vectors. It has two variants, analogous to those for addition.

a = b * c
a = b *+ c
a = b *- c

Multiplication multiplies together corresponding elements of its input vectors. On floating-point numbers, all three operations are equivalent. On integers, the three variants differ in how they handle overflow: *- returns the value modulo 2 to the power of the number of bits, *+ returns the value divided by 2 to the power of the number of bits (rounding so that the modulus will work correctly), and plain * unconditionally doubles the width of the data type (regardless of the magnitude of the result) so that overflow is impossible.

a = b / c

Division divides corresponding elements of its input vectors. In most Infinite Vector implementations, inexact results will round towards zero, although an implementation could make the rounding behaviour customizable via command-line options. It only works on floating-point numbers. Division by zero produces NaN.

a = // b

Square root sets each element of the output vector to the square root of the corresponding element of the input vector (the square root of a negative number is NaN). The input must have a floating-point type.

a = /// b

Inverse square root sets each element of the output vector to the reciprocal of the square root of the corresponding element of the input vector (the inverse square root of a negative number is NaN, or possibly negative infinity if using a buggy processor stares at SSE, and of zero is infinity). The input must have a floating-point type.

a = b & c

Bitwise AND combines corresponding bits of corresponding elements of its input vectors, which must have an unsigned integer type, such that two 1 bits combine to form a 1 bit and other combinations combine to form a 0 bit.

a = b | c

Bitwise OR combines corresponding bits of corresponding elements of its input vectors, which must have an unsigned integer type, such that two 0 bits combine to form a 0 bit and other combinations combine to form a 1 bit.

a = b ^ c

Bitwise XOR combines corresponding bits of corresponding elements of its input vectors, which must have an unsigned integer type, such that two equal bits combine to form a 0 bit and two unequal bits combine to form a 1 bit. (This operation can also be used to complement every bit in a vector, as a = 1 ^ b.)

a = b << c

Left shift multiplies each element of its input vector, which must have an integer type, by two to the power of the corresponding element of the second input vector, to produce an output vector. Any overflow is lost (i.e. the result is reduced modulo 2 to the power of the number of bits). With this instruction, and the other shift/rotate instructions, the type of the second input vector is the smallest unsigned integer type that can represent the number of bits in the type of the first input vector.

a = b >> c

Right shift divides each element of its first input vector, which must have an integer type, by two to the power of the corresponding element of the second input vector, to produce an output vector. Rounding is towards negative infinity.

a = b <@< n

Left rotate is similar to a left shift, except that any bits that overflow past the top of the integer are not lost, but rather are divided by 2 to the power of the number of bits and added onto the result. Additionally, the right hand argument isn't a variable or implicit vector; it's just an integer, that is used as the right-hand argument for every element. The input vector must have an unsigned integer type.

a = b >@> n

Right rotate is similar to a right shift or left rotate, just with the obvious changes. It only works on unsigned integer types, and bits shifted off the LSB end of the integer are added on at the MSB end.

a = b <? c

Minimum uses the smaller of the corresponding elements from the input vectors as the output. (On a flag vector, this is equivalent to &.)

a = b ?> c

Maximum uses the larger of the corresponding elements from the input vectors as the output. (On a flag vector, this is equivalent to |.)

Conditionals

These instructions act element-wise on vectors of the same length. Depending on the instruction, some of the vectors are flags, and the others must all have the same type.

a = b ? c : d

The choosing instruction looks at the bits in a flag vector (b in this case), and uses them to determine which of the two possible corresponding elements in the vectors c and d to place in the return vector. If an element of b is 1, then the corresponding element of c will be returned as the corresponding element of a. If an element of b is 0, then the corresponding element of d will be returned as the corresponding element of a.

a = b < c
a = b = c
a = b > c
a = b <= c
a = b => c
a = b <> c
a = b <=> c
a = b !< c
a = b != c
a = b !> c
a = b !<= c
a = b !=> c
a = b !<> c
a = b !<=> c

There are fourteen comparison operations, as shown above. Each compares corresponding elements of two vectors given as arguments, and returns a flag vector. The return bits are 0 for false and 1 for true if there is no ! in the operator name, or 1 for false and 0 for true if there is an ! in the operator name. The comparison itself is true if < appears in the operator name and the first argument is less than the second; or if = appears in the operator name and the first argument is exactly equal to the second; or if > appears in the operator name and the first argument is greater than the second. (Note that NaN is not less than, equal to, or greater than anything, and nothing is less than, equal to, or greater than it. Thus a comparison like <=> is potentially useful, at least in that it won't always return true. It isn't very useful on integers, though.)

Changing types and lengths

In order to be able to mix vectors of different types, and different lengths, the following operations exist:

a = (type)b

The cast operation produces a vector as output that has the same length as the input vector, where corresponding elements are numerically equal (as far as is possible, e.g. inexact results will be rounded) but with a different type (which is specified as part of the operator's syntax). In the case of overflow on integer-to-integer conversions, the value will be reduced modulo 2 to the power of the number of bits in the destination. Overflow in floating-point-to-integer conversions is undefined behaviour, and overflow with floating point targets will produce a suitable infinity.

a = b ++ c
a = b ++? n

The first version of the 'concatenate operation concatenates two vectors of the same type and length to form a vector that's twice as long as the original. The ++? version concatenates a single element n (which is a literal floating point or integer, not a variable or implicit vector) to a vector, and only if the vector contains an odd number of elements (thus making it possible to split it into two halves).

a = b --? n

This operation, which (just to be different) doesn't have a name, entirely deletes the last element of a vector (making it one element shorter) if it has the same value as the single element given; the command may only be used if the vector currently has an even number of elements. As such, it's basically the reverse of ++. Note that you don't have to worry about whether NaN values are equal or not for this, because you have no way to specify them.

a ++ b = c

The deconcatenate operation splits a vector into two equal-length parts, which when concatenated form the original vector. (Obviously, this only works if the input vector has an even length. If its length is odd, this causes undefined behaviour.)

a = b $ c

The mingle operation (from INTERCAL, although it works on types other than flags) interleaves elements in two vectors that have the same type and length; that is, the first element of the output is the first element of the first input, the second element of the output is the first element of the second input, the third element of the output is the second element of the first input, the fourth element of the output is the second element of the second input, and so on.

a $ b = c

The demingle operation is the reverse of the mingle operation; it produces two outputs which, when mingled, give the original input. (Just like deconcatenation, this only works on an even-length vector.)

Control

a !!!

The maybe skip operation !!! takes two vectors as arguments. For each nonzero element in the argument, nothing happens, and the program proceeds as normal. Where an element in the argument is zero, it is flagged as a "skip element", while retaining its value. Any operations that read skip elements (arithmetic, casts, copies, concatenations, etc.) produce skip elements (with the correct value) as output for any calculation that depends on a skip element ("dependency" is interpreted strictly here, like taint in Perl, e.g. 0 bitwise-AND a skip element produces a skip-0.) At the end of the current main loop iteration, all skip elements cease to be skip elements, but might or might not have their value spontaneously set to zero (depending on what is faster/easier for the compiler).

The purpose of this operation is to allow Infinite Vector implementations to entirely skip sequences of operations that would have no useful effect, and just zero the relevant sections of output instead. As such, a few scattered skip elements are unlikely to make any difference, but a long string of them might lead to noticeable performance savings. (That said, the semantics of !!! mean that it could correctly be implemented to do nothing at all, and still comply with the spec; as such, it is best considered to be an optimization hint.)

# command

The initialize modifier # is placed before a command, and causes it to only run on the first iteration of the main loop; it will be ignored in all other iterations.

# a = (type)[1,2,3]

There's also an initialize command; this specifies an initial value for a vector on the first iterator. The type specifies the type of the vector, and the length of the vector is inferred from the given values. Like the initialize modifier, this only runs on the first iteration of the main loop, and is ignored on all other iterations.

a :-(

The exit if zero operation :-( exits the program if every single element of the vector used as an argument is zero. Otherwise it does nothing.

I/O

Input and output only happen in a batch style (all input happens at the start of a program, all output at the end). If the program contains a variable input, it will be initialized by reading input from the environment; it must be used on the right-hand side of exactly one cast expression (and the type that it's cast to gives the type with which the input will be interpreted), and its length will be determined from the input itself. Likewise, a variable named output will have its value output to the environment at program exit.

Computational class

Infinity vector is Turing-complete. We can prove this by simulating a Turing-machine through a one-dimensional cellular automaton. Our representation of the cellular automaton state is rather wasteful, requiring an Infinite Vector vector that's exponentially longer than the state array.

Start from a (one-tape) Turing-machine that we'll simulate. First, make sure that the Turing-machine doesn't have too large an alphabet or too many states. We want less than 2**256 (minus some overhead) symbols plus states. If the alphabet is too large, reduce it by representing each cell as multiple subsequent cells; if there are too many states, add a universal Turing machine that interprets your program.

Next, transform the Turing-machine to a one-dimensional cellular automaton. You can do this by assigning a cellular automaton state to each Turing-machine tape symbol, plus one state for each Turing-machine state. In the cellular automaton state array, store the tape of the Turing-machine with the state inserted immediately before the head.

The cellular automaton state array must be padded with zeroes while the Tape is running. This should not be a problem if zero represents the Turing machine tape's blank symbol (the one found on the unwritten part of the tape). We will run the cellular automaton on a finite but extending cyclical array, one that grows faster than the Turing machine head can move, by zeroes appearing at both the start and end of the array. This way when the Turing machine head moves to a yet unwritten part of the tape, it always finds zeroes. Then modify the cellular automaton such that if the Turing-machine halts, the cellular automaton clears its entire state array to zeroes, and does this faster than the array grows. You can do this by introducing two extra cellular automaton states that the Turing-machine's halting state decays to, then make those two states zero everything to their left and right to a distance of several cells per tick, faster than the array grows, and annihilate to zero when they meet.

We represent the length n state array of the cellular automaton in a length (1<<n) Infinite Vector chapter vector. The element of index k of the cellular automaton is stored at index (1<<k) in the Infinite Vector vector, and all other elements of the Infinite Vector vector (the ones whose index isn't a power of two) are zero. This representation is useful because IV can compute a copy of the state array rotated left or right by one element easily. Indeed, if A is the vector, then A0++A1=A;Ar=A0$A1; computes the roll right into the array Ar, and A0$A1=A;Al=A0++A1; computes the roll left into the array Al. We can extend the cellular automaton state array by inserting a zero to the left or right by A=0$A; and A=A++0; respectively.

We can then encode the transition rules of the cellular automaton into IV, acting on each element of the state array on parallel. Rotate the tape to examine the neighborhood, compare a neighboring element to a constant cellular automaton state with the IV == comparison operator, combined with the bitwise & and | operators on booleans to get the condition of a rule, then use the ?: operator with the desired state and zero as constants on the right hand side. Add the results of all the rules with the + operator to compute the full state transition.

To put the whole IV program together, add an initialization # = statement to initialize the vector to the representation of the starting state of the Turing machine tape. In the implicit loop, extend the vector with zeroes with the desired speed, then apply the cellular automaton state transition function to it. Finally use the :-( operation to halt when the cellular automaton has zeroed its entire state.