Fuun DNA
Fuun DNA is a self-modifying string-rewriting language. It was defined for the task of the ICFP contest 2007 (“Morph Endo”).
Description
The DNA (program) of the extraterrestrial Fuun species (the species of Endo) is a string of DNA bases, where a base is one of the following four letters: I, C, F, P. When a DNA is executed (with suitable enzymes), it produces RNA, but unlike in terrestrial species, the DNA also directly modifies its own sequence. The RNA is normally interpreted as a Fuun RNA program.
At each step of execution of the Fuun DNA, the interpreter takes a match pattern and a replacement template from the start of the DNA, removing them, then tries to match the start of the remaining DNA with the match pattern. If the match succeeds, it replaces the matched prefix according to the replacement template and the captures made during the match. However, parsing the pattern and the template can emit RNA bases as a side effect, and these are immediately appended to the RNA output. Pattern matching is a simple process without backtracking. Patterns can contain the following operations:
- match a given base, failing if the next base of the DNA is different,
- match any substring of a given length,
- find the next occurrence of a fixed substring and match everything up to the end of that substring,
- start or end a capture group (there can be any number of capture groups in the pattern but they must be properly nested).
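The pattern operations above can be sketched as a single left-to-right pass over an already parsed pattern. This is a minimal sketch, not the contest's reference implementation: the tuple representation (`"base"`, `"skip"`, `"search"`, `"open"`, `"close"`) and the function name `match` are hypothetical, and it assumes that captures are numbered in the order in which their groups are closed.

```python
from typing import Optional

def match(pattern: list, dna: str) -> Optional[tuple[int, list[str]]]:
    """Match pattern against the start of dna without backtracking.

    Returns (length of matched prefix, captures), or None on failure.
    """
    i = 0               # current position in dna
    starts = []         # stack of start positions of open capture groups
    captures = []       # finished captures, in order of closing (assumption)
    for op in pattern:
        kind = op[0]
        if kind == "base":            # match one given base
            if i < len(dna) and dna[i] == op[1]:
                i += 1
            else:
                return None
        elif kind == "skip":          # match any substring of a given length
            i += op[1]
            if i > len(dna):
                return None
        elif kind == "search":        # match up to the end of the next occurrence
            pos = dna.find(op[1], i)
            if pos < 0:
                return None
            i = pos + len(op[1])
        elif kind == "open":          # start a capture group
            starts.append(i)
        elif kind == "close":         # end the innermost open capture group
            captures.append(dna[starts.pop():i])
        else:
            raise ValueError(f"unknown pattern operation: {kind!r}")
    return i, captures
```

For example, `match([("open",), ("skip", 2), ("close",), ("base", "F")], "ICFP")` returns `(3, ["IC"])`, while `match([("base", "C")], "ICFP")` returns `None`.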
Replacement templates can contain the following operations:
- append a given base,
- append a given capture, possibly quoted a given number of levels,
- append the length of a given capture.
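The template operations above can be sketched as a function that builds the replacement string from an already parsed template and a list of captures. This is an illustrative sketch under stated assumptions: the tuple representation (`"base"`, `"ref"`, `"len"`) and the names `build_replacement`, `quote` and `encode_nat` are hypothetical, a reference to a capture that does not exist is assumed to yield the empty string, and zero bits of a length are written as I (F would decode to the same value).

```python
QUOTE = {"I": "C", "C": "F", "F": "P", "P": "IC"}

def quote(dna: str, levels: int) -> str:
    """Apply the quoting scheme `levels` times (0 leaves dna unchanged)."""
    for _ in range(levels):
        dna = "".join(QUOTE[base] for base in dna)
    return dna

def encode_nat(n: int) -> str:
    """Encode n in binary, least significant digit first, terminated by P."""
    digits = []
    while n > 0:
        digits.append("C" if n & 1 else "I")  # assumption: I (not F) for 0
        n >>= 1
    return "".join(digits) + "P"

def build_replacement(template: list, captures: list[str]) -> str:
    """Build the string that replaces the matched prefix."""
    out = []
    for op in template:
        kind = op[0]
        if kind == "base":        # append a given base
            out.append(op[1])
        elif kind == "ref":       # append capture n, quoted `levels` times
            _, n, levels = op
            cap = captures[n] if n < len(captures) else ""  # assumption
            out.append(quote(cap, levels))
        elif kind == "len":       # append the encoded length of capture n
            n = op[1]
            cap = captures[n] if n < len(captures) else ""  # assumption
            out.append(encode_nat(len(cap)))
        else:
            raise ValueError(f"unknown template operation: {kind!r}")
    return "".join(out)
```

With the single capture `"IC"`, the template `[("base", "P"), ("ref", 0, 0), ("ref", 0, 1), ("len", 0)]` produces `"PICCFICP"`: the literal P, the capture itself, the capture quoted once (`CF`), and its length 2 encoded as `ICP`.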
Fuun DNA encodes integers in binary, least significant digit first. The digits are I or F for 0 and C for 1, and each integer ends in a terminator P. This encoding of integers is used in skip elements in the pattern to tell how many arbitrary bases to match, and a replacement can write the length of a capture in this encoding. Fuun DNA quotes bases the following way:
- I => C
- C => F
- F => P
- P => IC
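As an illustration of the integer encoding and the quoting table above, the following sketch decodes an integer from the start of a DNA string and quotes a string to a given number of levels. The function names `decode_nat` and `quote` are hypothetical, and raising an error on a missing terminator is a choice made for this sketch only.

```python
QUOTE = {"I": "C", "C": "F", "F": "P", "P": "IC"}

def decode_nat(dna: str) -> tuple[int, str]:
    """Decode an integer from the start of dna; return (value, rest of dna)."""
    value = 0
    for i, base in enumerate(dna):
        if base == "P":              # terminator
            return value, dna[i + 1:]
        if base == "C":              # digit 1 at bit position i
            value |= 1 << i
        # I and F both stand for the digit 0
    raise ValueError("integer is not terminated by P")

def quote(dna: str, levels: int = 1) -> str:
    """Quote every base of dna, repeating the process `levels` times."""
    for _ in range(levels):
        dna = "".join(QUOTE[base] for base in dna)
    return dna

print(decode_nat("CICP"))    # (5, '')
print(decode_nat("CFCPIC"))  # (5, 'IC'): F counts as 0 just like I
print(quote("ICFP"))         # CFPIC
print(quote("P", 2))         # CF: P -> IC -> CF
```

Note that only P grows when quoted, so quoting a string with k occurrences of P lengthens it by k bases at the first level.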
This quoting is used to encode literal bases to match in a pattern and literal bases to output in a template, and also to encode the string for a string search operation in a match pattern. Literal bases in a pattern or template aren't marked in any other way, but this is no problem, because the encodings of the other operations in patterns and templates start with an I that is not followed by a C, so they can't be confused with quoted text. The string for a string search operation in a pattern is marked with a prefix, but not with a suffix, so it's not self-terminating: the encoded string simply ends where a quoted base can't be decoded. This implies that a pattern can't encode a literal base match operation directly following a string search operation, but this is no problem, because you can write a no-op into a pattern as a fixed length match with zero as the length. Replacement templates can write captured strings to the replacement unquoted, quoted, double quoted, or quoted to even higher levels.
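The way a quoted search string ends "where a quoted base can't be decoded" can be sketched as follows. The function name `unquote_prefix` is hypothetical. The example with the no-op additionally assumes that a fixed length match is encoded as IP followed by the integer, as in the contest specification; that encoding is not stated in this article.

```python
def unquote_prefix(dna: str) -> tuple[str, str]:
    """Decode quoted bases from the start of dna for as long as possible.

    Returns (decoded string, undecoded rest of dna).
    """
    out = []
    i = 0
    while i < len(dna):
        if dna[i] == "C":
            out.append("I")
            i += 1
        elif dna[i] == "F":
            out.append("C")
            i += 1
        elif dna[i] == "P":
            out.append("F")
            i += 1
        elif dna.startswith("IC", i):
            out.append("P")
            i += 2
        else:
            break  # an I not followed by C: no quoted base starts here
    return "".join(out), dna[i:]

# Search string "IC" (quoted: CF) directly followed by a literal F (quoted: P):
# the literal is swallowed into the search string.
print(unquote_prefix("CFP"))     # ('ICF', '')

# With a zero-length fixed match (assumed encoding: IP + integer 0 = IPP)
# in between, the search string ends where intended.
print(unquote_prefix("CFIPPP"))  # ('IC', 'IPPP')
```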
Execution of the DNA terminates when there is not enough of the DNA left to parse a properly terminated match pattern and replacement template, in which case the rest of the DNA is ignored.