Kwert

From Esolang
Jump to navigation Jump to search

Kwert is a language designed by User:Arctenik to correspond closely to certain aspects of the DEFLATE compression format. It involves a self-modifying program that is repeatedly evaluated, with commands that create copies of previous commands and/or cause following commands to be skipped.

Syntax

A Kwert program is a sequence of commands, each enclosed in square brackets. A command consists of zero or more comma-separated copy operations, the final of which may be followed by an optional trailing comma, all followed an optional skip count.

A copy operation is written as a length followed by a distance, separated by whitespace. A skip count consists of a semicolon followed by a number, which may be omitted, in which case the skip count is taken to be zero. Lengths and distances must be positive integers, while skip counts are non-negative integers.

For example, here is a command: [1 2,2 3,1 1;2]

It has three copy operations, 1 2, 2 3, and 1 1. It has a skip count of 2.

Inside commands, any whitespace may be used between syntax elements. Outside commands and ID sections (see below), all characters are ignored and may be used as comments, other than square brackets and backtick.

Command IDs

Commands may be given IDs (in a one-to-one mapping), which allows programs to be represented more succinctly. IDs are written in ID sections, which begin with a backtick and may be terminated by either a line ending, another backtick on the same line, the presence of a command expression, or the end of the program.

An ID section's content consists of a list of IDs, which may or may not be separated by whitespace. All IDs in the program must be of the same length, and may use any characters other than []` and whitespace.

An ID definition, which assigns an ID to a command, is created with an ID section containing a single, unassigned ID, followed by a command expression:

` x [1 1;2]

(This assigns the ID x to the command [1 1;2], and does not represent an actual instance of the command in the program.)

An ID section containing only IDs that have been previously defined represents a section of the actual program; e.g., given the above ID definition, this line:

`xx

Is equivalent to this one:

[1 1;2][1 1;2]

Any other ID section (i.e., one containing multiple IDs including at least one unassigned ID, or one containing a single, unassigned ID that's followed by another ID section or the end of the program) is an error.

Evaluation

The program is self-modifying, and is repeatedly evaluated from start to finish, indefinitely. A command is evaluated as follows:

  • Each copy operation is evaluated, in sequence; this means copying <length> commands one-by-one, starting from the command <distance> spaces before the current command, and inserting each one directly before the current command. A distance that extends past the beginning of the program is an error.
  • The next <skip count> commands are skipped. A skip count that extends beyond the end of the program is an error.
  • At some point in the process, the command being evaluated gets deleted. (It doesn't actually matter when this happens, since the command can't interact with itself.)

For instance, the example command [1 2,2 3,1 1;2] will copy 1 command from 2 before the current command, 2 commands starting from 3 before the current command, and 1 command from directly before the current command. (Note that the command copied by the first copy operation is actually the same as the first command copied by the length-2 copy; the distance in the second copy operation is different because of the additional command that has been inserted.)

Once a cycle is completed (that is, once the end of the program is reached), evaluation loops back around to the beginning of the program.

The first command of the program is always automatically skipped, making it easier to have an unchanging part of the program that can provide commands for following commands to copy.

Relationship to DEFLATE

Background: DEFLATE and DEFLATE quines

DEFLATE is a common compression format, with a decompression process that involves, in part, copying from sections of previously-output data. A DEFLATE stream is composed of blocks, which can be compressed blocks (where the copying is done) or uncompressed blocks (which contain a specified amount of literal data).

It's possible to create a DEFLATE stream that decompresses to itself (essentially a quine). It's also possible to create a quine that has additional, changing data appended to it.

Kwert and DEFLATE

Kwert is designed to correspond closely to certain possible behaviors of an extended DEFLATE quine. If Kwert commands are converted to sections of DEFLATE data that all have the same length, then Kwert's copy operations can be straightforwardly converted to DEFLATE's. Having commands skip some number of following commands can be done by having each converted command end with the header of an uncompressed block, causing some amount of following data to be treated as literal data that just stays in place, rather than as compressed blocks that can perform copies. A converted program can be "run" by repeated inflating the data, corresponding to cycles of the Kwert program.

Limitations

The relatively straightforward method outlined above of compiling Kwert to DEFLATE is limited by the bounded values present in the latter. If a distance or skip count is too long (taking into account the length of a compiled command), this method will fail. (DEFLATE's length values are also bounded, but larger ones can be simulated using multiple copy operations.)

Additionally, it's not entirely straightforward to make all compiled commands have the same length, so whether a program can be compiled might, in theory, be difficult to predict, and may depend on the specific set of commands used in the program. Because of this, if a method of constructing a Kwert program is to be described in a way that guarantees that the resulting program can be compiled to DEFLATE, it's necessary to show that the specific set of commands the program draws from can be compiled. (Technically it also works to show that a superset of the commands used in the program can be compiled when all used together, since any of those commands that are not already used in the program can be inserted at the beginning of the program and made to be always skipped.)

Example programs

Fibonacci words

[1 1;2][1 1;2][1 2,2 3,1 1;2][1 2;2]
[1 2;2][1 2,2 3,1 1;2][1 2;2]

This program simulates a symbol-replacement system where, repeatedly, all instances of A are replaced with AB and all instances of B are replaced with A -- a process which generates Fibonacci words.

The final three commands of the program are the initial symbol, which here is the B symbol. The first command of a symbol determines whether the symbol behaves as A or B. The other two commands are always skipped, and are always equal to the A and B commands, in that order -- they exist to be copied by other commands

The B command just replaces itself with the A command, by copying it from the previous symbol's skipped commands. The A command constructs an entirely new A symbol by copying commands from the previous symbol's skipped commands, and then replaces itself with the B command by copying it from the symbol that was just constructed.

The first four commands in the program stay the same after being evaluated, and are there to act like a preceding symbol for the first simulated symbol in the program.

The Fibonacci number corresponding to a given program cycle can be calculated by taking the number of commands in the program, subtracting 4, and then dividing by 3; additionally, a similar calculation can be performed with the compiled version of the program (given here as base64-encoded raw DEFLATE data):

AAAA//8AAAD//wAAAP//ABQA6/8AAAD//wAAAP//AAAA//8AFADr/8ImBiAAjo0kSZIgCAIiWgEAEgDt/xbZewCtI38ALgDR/8ImBiAAjo0kSZIgCAIiWgEAEgDt/xbZewCtI38ALgDR/8ImBiAAjo0kSZIgCAIiWgEAEgDt/wKLgAUAWwAMAPP/XSMjIyMjIwKLgAUAAAwA8/9CZgMEEEAAABgA5/9CZgMEEEAAABgA5/9CFselBpkNABgA5/9CFgcIIIAAABgA5/9CFgcIIIAAABgA5/9CFselBpkNABgA5/9CFgcIIIAAABgA5/8AAAD//wAAAP//AAAA//8AFADr/wAAAP//AAAA//8AAAD//wAUAOv/wiYGIACOjSRJkiAIAiJaAQASAO3/Ftl7AK0jfwAuANH/wiYGIACOjSRJkiAIAiJaAQASAO3/Ftl7AK0jfwAuANH/wiYGIACOjSRJkiAIAiJaAQASAO3/AouABQBbAQAA//9dIyMjIyMjAouABQABAAD//w==

If the length in bytes is plugged into the formula (x - 358)/36, or the length in base64 characters is plugged into (x - 480)/48, then a Fibonacci number will be produced corresponding to the number of inflations that have been performed.

Thue-Morse sequence

` x [1 1;2]
` 0 [1 2,2 3,1 1;2]
` 1 [1 1,2 3,1 2;2]
`xx01 001

This program works similarly to the Fibonacci program, but is written using command IDs, and simulates a system where 0 is replaced with 01, and 1 is replaced with 10. The generated sequence can be read by looking at every third command, starting from the fifth in the program.

External resources