User:Hakerh400/How to write quines

From Esolang
Jump to navigation Jump to search

A quine is a program that prints its own source code. For an inexperienced esolanger it may sometimes seem non-trivial to write a quine program in a given esoteric programming language. There is a "How to write quines" tutorial on the official Quine article, but it seems not very descriptive and provides only an example of a particular programming language rather than focusing on the abstract idea behind the algorithm.

This article aims to provide a deeper understanding of how quines actually work and how to develop them in (almost) any esolang.

Is a quine possible

A beginner developer, when asked to write a program that prints its own source code, usually asks "Is it even possible?". In fact, in some programming languages it is not possible. But mathematicians have proved that in any Turing-complete programming language it is possible to write infinitely many different quine programs. It also implies that if there is only a finite number of quines, the programming language is not Turing-complete. However, if there are infinitely many quines, it does not imply that the language is Turing-complete.

In this article we only consider Turing-complete languages, since we know quines are possible in them.

How to write a quine

If you tried to write a quine yourself for the first time, you have probably succeeded, but you have done it by trial-and-error method - started from a function that prints an emty string and then iteratively adjusted surrounding content to match the source code. It is trivial in any practical language (Java, JavaScript, Python, C, C++, etc). However, it may not be so trivial when we move to esoteric languages. So, we will try to explain an actual algorithm (not a trial-and-error attempt) to construct quines. There are some languages that are fully non-practical for programming and writing a quine in them would probably be a lot harder than you may expect.

Program structure

Quine program structure

In this chapter we are interested only in quine programs that have the following structure (their source code have this structure):

  1. Heading
  2. Code
  3. Mediator
  4. Data
  5. Trail

Data represents a static data structure (array, string, matrix, ...) which somehow encodes the Code segment. In other words, there is an algorithm in the given programming language which reconstructs the Code using the information from the Data segment.

Code contains an algorithm that accesses Data and prints it in two different forms. Firstly, it decodes Data to obtain the Code and prints it, then it prints raw data, as it would appear in the source code.

Heading, Mediator and Trail are code segments that are neither Code nor Data. We must hardcode them in the Code segment, so we tend to minimize the length of these three segments.

Note: Code and Data may switch places, if it makes implementing a quine easier.

Initial draft

The quines works like this: the algorithm that is contained inside the Code segment accessed Data and prints it in two different manners (decoded and raw), and also prints Heading, Mediator and Trail if they are present (but they are anyway hardcoded in the Code segment itself, so we don't need to worry about them).

Given a programming language, the first task is to determine how to encode any string and represent it as a continuous block in the source code. While in practical languages you always have strings and arrays and bytes, in esolangs these data structures may be absent, so you would either need to implement them or to figure out another way of doing so. Here are some problems thay you may face and how to solve each of them.

Problem 1: There are no data structures at all. We cannot represent an arbitrarily long continuos data inside the source code. Source code consists only of instructions.
Solution 1: Use instructions to write the encoded data into memory, then start executing Code. In this case, it may be easier to put Data before Code, but Data is now also a code, but it simply puts the encoded string into memory.

Problem 2: The only data structure is a string, but there is no character escape sequence. We cannot encode string delimiters inside string itself.
Solution 2: Since the language is Turing-complete, it can produce any value. Split your strings into pieces that do not contain string delimiters and between them put a code that produces the required character.

Note: whatever encoding you decide to use, make sure that Data does not perform any complicated computation, because we will generate Data programmatically by processing the Code segment once we finishes the Code. Also, make sure that Data itself does not access the Code segment at runtime.

Implementing a decoder

Once you figured out how Data should be represented so that you can generate it programmatically based on the Code, you can start writing Code segment itself. Basically, it should somehow be able to access the Data segment and print it in two different ways. This will be a lot clearer when we see the first example in the next paragraphs. Here are some common problems and solutions:

Problem 1: There is no way to access data without modifying it. On the second pass, we would not be able to access the original data.
Solution 1: There must be some way to copy each bit (or byte, or whatever data unit is) to another location upon accessing it. Or if you know how data will be modified and it is reversible, then apply the reversed process.

Problem 2: The language does not allow printing directly to the output. Instead, the whole output needs to be in memory when program halts.
Solution 2: That is not really a problem. Instead of calling the print function, call your function that stores the data unit to some memory location where the output is expected. If there are no functions, then simply inline the content that would be in a function.

Implementing a meta-encoder

Now you need to implement an encoder that generates Data segment based on your Code segment. It is called meta-encoder because it is not something that will be present in the final quine. You can write this encoder in any programming language you want, it simply needs to spit out the Data segment that you would inject into your source code to finalize the quine. In this article, we will write meta-encoder in JavaScript for simplicity.

Cheaty quines

We need to define what a cheaty quine is. There are some disagreement among esolangers regarding this topic. To make it clear, in this article we use the following definition of cheaty quines:

Cheating:

  • Accessing file system
  • Accessing the internet
  • Accessing the current date or random number generator (not sure how it would help writing quines though)
  • Accessing the source code of the program through the host operating system API

Not cheating:

  • Stringifying the wrapper function
  • Accessing the source code of the program through the language features itself
  • Reading the assembly code from memory (if langauge supports it) and reconstructing the source code

Basically, every feature that is provided by the language itself and does not require contacting operating system for an external resource is allowed. Some esolangers may disagree that "Stringifying the wrapper function" is not cheating, but if you think about it, we can consider each langauge as a black box Turing-machine. We do not care of its internal algorithms, we are only interested in the source code and the output it generates. As long as the machine (language) does not ask for an external resource, it is not cheating.

Examples with practical languages

It would be too abstract without examples. We will try to provide a lot of different examples in various programming languages (starting from practical and sliding to esoteric). In each example, we would reference the above abstract concepts, problems and solutions that we applied.

Final version of each quine is represented as a code snippet with green background, to make it clear what code you can directly paste in the interpreter and obtain the output that is equivalent to the source code. Note: new lines may be added for readability, but you should strip them.

JavaScript

The simplest language for writing a quine program is probably JavaScript. We assume the reader is probably familiar with it and is able to write the shortest JavaScript quine in several seconds. However, we focus on the steps that are taken to generate the quine, not the length of the final program itself. We can try to optimize it later, once we succeed to create at least one quine.

Example 1 - Byte array

This is the first example of writing a quine in JavaScript. We will not move to another programming language until we study it in details on JavaScript. The first example shows the most intuitive way to write a quine (well, it depends what someone considers to be intuitive, but this quine is obtained by strictly following the rules explained in the prevous chapter).

Step 1 - Draft

In this step we need to decide what encoding we will be using to represent the Code inside the Data segment. Because we are going to write Code in the next step, we don't know what it would contain, so we assume it may contain any valid language character. Therefore using a string to represent the Code is probably a bad idea, since we would face a character escaping problem. We can, however, encode any string as an array of bytes. It is probably the simplest encoding and there is no character escaping problem, so we will go with it in this first example. The Data would be an array containg decimal (base 10) numbers.

Now, we need to decide whether Code or Data will appear first in the source code. Since we first construct Data, it may be easier to first put Data and then put Code.

The quine so far looks like this:

D=[<Data>];<Code>

where <Data> is the content of the Data segment and <Code> is the content of the Code segment. Note that Data does not include even the array brackets. We tend to shrink it as much as possible.

Step 2 - Decoder

Now we need to write the content of the Code segment - that is, an algorithm that is capable of accessing the data from the Data segment and printing it in two different ways. As we decided, Data will contain literal integers representing the ASCII codes, separated by commas. So, we need to construct an algorithm to firstly traverse the array and print each number and commas between. Then it should traverse the array again and convert each number to the corresponding ASCII character, resulting in the source code representation of the Code segment itself. Here is how we can do it:

<Code>::=
  console.log('D=[' + D + '];' + D.map(n => String.fromCharCode(n)).join(''))

It may seem obvious how it works, but anyway here is the explanation. console.log is used to output a string. In some environments (like Node.js) it appends a new line, so process.stdout.write can be used instead, or a new line should literally be appended to the Code segment. The string 'D=[' is the hardcoded Heading segment. Then we literally append data array D. In JavaScript, it will automatically convert it to string by joining decimal representation of each number and separating them by commas. If we did not have such property of array, we would need to iterate over the array and stringify each number in the loop or map function callback.

Then we append the content of Mediator, which is '];'. Finally, we append the data again, but now we decode it so that it results in the Code segment itself. As said, we traverse array two times, the first time we print it to match the Data segment and the second time we print it to match the Code segment. Spaces are added for readability, but we can remove them to reduce the code length:

<Code>::=
  console.log('D=['+D+'];'+D.map(n=>String.fromCharCode(n)).join(''))

Step 3 - Meta-encoder

Finally, we have to implement an algorithm (in any programming language we want) to generate the Data segment based on the Code segment. As said, we will implement all meta-encoders in JavaScript, independently of the language we are writing quine in (you can also do it using a pencil and paper if you want). Currently, our quine looks like this:

D=[<Data>];console.log('D=['+D+'];'+D.map(n=>String.fromCharCode(n)).join(''))

Now we imlement a meta-encoder:

const metaEncoder = code => {
  return code.split('').
    map(char => char.charCodeAt(0)).join(',');
};

Applying this encoder to our Code segment, we obtain the following Data segment (new lines added for readability):

<Data>::=
  99,111,110,115,111,108,101,46,108,111,103,40,39,68,61,91,39,43,68,43,39,93,59,39,43,68,46,109,97,112,
  40,110,61,62,83,116,114,105,110,103,46,102,114,111,109,67,104,97,114,67,111,100,101,40,110,41,41,46,
  106,111,105,110,40,39,39,41,41

Finally, we substitute the Data segment into our quine, resulting in the final quine program (new lines added for readability):

D=[99,111,110,115,111,108,101,46,108,111,103,40,39,68,61,91,39,43,68,43,39,93,59,39,43,68,46,109,97,
112,40,110,61,62,83,116,114,105,110,103,46,102,114,111,109,67,104,97,114,67,111,100,101,40,110,41,41,
46,106,111,105,110,40,39,39,41,41];console.log('D=['+D+'];'+D.map(n=>String.fromCharCode(n)).join(''))

Example 2 - String

The previous example works, but someone may argue that it is excessively long and not very intuitive. Maybe someone could start with a string instead of a byte array. So, in this example, we will use strings instead.

Step 1 - Draft

Code will be encoded as a string inside the Data content. In JavaScript there are three types of string literals: single-quoted string, double-quoted string and template string (uses backticks). However, given that at this step we do not know anything about the Code, we cannot decide which string literal we will be using. We will pick one of the string types and if it turns out that Code contains the string delimiter, then we will do one of the following:

  1. Escape the delimiter inside the Code segment, removing it from the Code source code, or
  2. Escape it in the Data segment, replacing it with another character that is not used inside the Code segment

In this example, we will use the first option. Pick, for example, a single-quoted string and assume that Code will not contain the string delimiters inside itself. Our quine would look like this:

D='<Data>';<Code>

Step 2 - Decoder

We can trivially write a decoder like this:

<Code>::=
  console.log("D='" + D + "';" + D)

or with spaces removed:

<Code>::=
  console.log("D='"+D+"';"+D)

We output D literally two times, because it is really a literal representation of the Code segment, so Data does not encode it in any way.

As you can probably already see, there is a problem. The source code contains literal character ', which is used for string delimiters for the Data segment. So, we need to escape it. The easiest way is to use \x27, which is a string escape sequence for literal character '.

<Code>::=
  console.log("D=\x27"+D+"\x27;"+D)

Now we no longer have single-quote characters in our Code segment. However, we now have a backslash, which is also a special string character. We need to escape the slash itself, but the problem is because escaping a backslash requires another backslash, leading to a new problem.

So, we need to use some other trick. There is a way to generate any character without using literal string. It can be done by calling String.fromCharCode() internal function. We can do it here:

<Code>::=
  console.log("D="+String.fromCharCode(39)+D+String.fromCharCode(39)+";"+D)

It is even simpler if we introduce a new variable:

<Code>::=
  C=String.fromCharCode(39);console.log("D="+C+D+C+";"+D)

Step 3 - Meta-encoder

Meta encoder cannot be simpler:

const metaEncoder = code => {
  return code;
};

So, the Data is identical to the Code:

<Data>::=
  C=String.fromCharCode(39);console.log("D="+C+D+C+";"+D)

After substituting all segments, we obtain:

D='C=String.fromCharCode(39);console.log("D="+C+D+C+";"+D)';C=String.fromCharCode(39);console.log("D="+C+D+C+";"+D)

It is much shorter and readable than the previous exmaple.

Example 3 - String (option 2)

It would not be complete example if we do not explain the second option we had in the previous example. When we faced an issue with escaping character problem, we decided to pick the first option and escape the problematic character inside the Code segment. The second option would be to escape (replace) the problematic character inside the Data segment and then replace it again when printing.

Step 1 - Draft

Like in the previous example, Data is the string content, but all single-quote characters are replaced with some other character (for example with digit 2). Our quine template:

D='<Data>';<Code>

Step 2 - Decoder

Now we can use single-quote characters freely in the Code segment, but must replace 2 with ' when printing the Data for the second time. However, since we decided that 2 is an escape character, we cannot use it in the Code segment. We need to generate it somehow. The easiest way is as a result of expression 1+1.

<Code>::=
  console.log("D='" + D + "';" + D.split(1 + 1).join("'"))

Without spaces:

<Code>::=
  console.log("D='"+D+"';"+D.split(1+1).join("'"))

Basically, what we have done here is surpassing language barrier of obscure escaping mechanism (not obscure itself, only from the quine perspective). We must have at least one forbidden character, but we decided it to be a character that can be generates much simpler than a single-quote character.

Step 3 - Meta-encoder

Simply replace the single-quote character with digit 2.

const metaEncoder = code => {
  return code.replace(/'/g, '2');
};

It generates out Data segment:

<Data>::=
  console.log("D=2"+D+"2;"+D.split(1+1).join("2"))

All segments together:

D='console.log("D=2"+D+"2;"+D.split(1+1).join("2"))';console.log("D='"+D+"';"+D.split(1+1).join("'"))

Example 4 - Function

It is time for more advanced examples. Since Data segment encodes Code segment and Code can be reconstructed from Data, sometimes it may be possible for Data to decode and execute itself. This situation is not always achievable, as the process of decoding would involve updating Data itself, but then the encoded data is different. It may or may not converge (and meta-encoder can even be automatized).

One of the examples where Data acts like Code - it decodes itself and executes the decoded function. It can also be viewed as a reversed process - Code stringifies itself and does not need Data. While the latter point of view may be more intuitive (when we talk about stringifying functions), what actually happens is that Data executes itself. It "effectively" stringifies the function, but if you think about it, the function was already stringified in the source code itself - so Data parses and executes it (interpreter does that for you).

Anyway, the situation when Data is capable of decoding itself is nothing special - we can still separate the source code into five segments, but the only difference is that Code delegates the actions to Data. As said, it can be viewed as Code performing all computation by interpreting the instructions stored inside Data.

In this example, we will define a function that effectively stringifies itself and uses that string to generate the output.

Step 1 - Draft

We know that we need at least one function, so we start with:

D=()=><Data>;<Code>

This code defines a variavle D that is an arrow function.

Step 2 - Decoder

Simply delegate all the computation to function D:

<Code>::=
  D()

Step 3 - Meta-encoder

Unlike in previous examples, here is the core part inside the Data segment, because it contains all the logic.

const metaEncoder = code => {
  return `console.log('D='+D+';${code}')`;
};

It gives the following Data segment:

<Data>::=
  console.log('D='+D+';D()')

Final quine:

D=()=>console.log('D='+D+';D()');D()

Example 5 - Comment

Another obscure way to write a quine is to use a comment to store the Data segment.

Step 1 - Draft

We store all the Data content inside a comment and the comment is inside a function, so that we can access it from the Code segment.

D=()=>/*<Data>*/{};<Code>

Step 2 - Decoder

We have access to variable D, which also can be stringified and the comment can be extracted. We assume the content contains the literal Code segment, so no replacing is needed. We do not even have an illegal character, though we have an illegal sequence of characters: we cannot use literal */ in the Code segment, because it would end the comment in the Data segment. One of the solutions is:

<Code>::=
  console.log('D=' + D + ';' + (D + '').replace(/^.*?\*|\*[^*]*$/g, ''))

With spaces removed:

<Code>::=
  console.log('D='+D+';'+(D+'').replace(/^.*?\*|\*[^*]*$/g,''))

Explanation: we simply stringify the function and remove everything before the first * (including it) and after the last * (including it).

Step 3 - Meta-encoder

const metaEncoder = code => {
  return code;
};

The Data is identical to the Code:

<Data>::=
  console.log('D='+D+';'+(D+'').replace(/^.*?\*|\*[^*]*$/g,''))

Final quine:

D=()=>/*console.log('D='+D+';'+(D+'').replace(/^.*?\*|\*[^*]*$/g,''))*/{};console.log('D='+D+';'+(D+'').replace(/^.*?\*|\*[^*]*$/g,''))

Example 6 - Advanced encoding

All examples so far are more or less readable and we can se what is going on. However, in this example we will demonstrate that the entire Data can be encoded using any reversible algorithm and once it is decoded at runtime, we can delegate the control from Code to Data, making it appear that neither Data nor Code are present in the source code.

For simplicity we will use Base64 encoding. In browsers there are functions btoa (for encoding) and atob (for decoding). In Node.js you can use Buffer.from and Buffer.toString. In this example we will focus on browser only.

Step 1 - Draft

We want our entire Data to be executable (like in the example of stringifying a function), because Code is going to delegate the control flow to Data once it decodes it. We also want Data to be encoded using the Base64 encoding. So, we need something like this:

<Code>(atob`<Data>`)

Basically, the data needs to be decoded and Code is probably going to be a function that will parse and execute the decoded Data. To make the code shorter, we use templated function call.

Step 2 - Decoder

In order to parse a code and execute it, we can use eval:

<Code>::=
  eval

Step 3 - Meta-encoder

This is interesting part and we will do it manually.

The Data should itself be a quine, but instead of outputting itself, it outputs the Base64 encoded itself. Using the template from Example 4 we can easily construct a self-stringifying function:

 atob(<Data>)::=
  (a=b=>console.log('<Code>(atob`'+btoa(`(a=${a})()`)+'`)'))()

Now we substitute the Code segment:

 atob(<Data>)::=
  (a=b=>console.log('eval(atob`'+btoa(`(a=${a})()`)+'`)'))()

And convert it to Base64:

<Data>::=
  KGE9Yj0+Y29uc29sZS5sb2coJ2V2YWwoYXRvYmAnK2J0b2EoYChhPSR7YX0pKClgKSsnYCknKSkoKQ==

Final quine:

eval(atob`KGE9Yj0+Y29uc29sZS5sb2coJ2V2YWwoYXRvYmAnK2J0b2EoYChhPSR7YX0pKClgKSsnYCknKSkoKQ==`)

You can construct a lot more interesting quines, but we think this chapter covered the majority of patterns and we can now move to other languages.

Python

Still going with practical languages, to practice a little more.

Step 1 - Draft

We use a simple array of bytes, like Example 1 from the JavaScript chapter. We do not care about quine length.

D=[<Data>];<Code>

Step 2 - Decoder

<Code>::=
  print('D=['+','.join([str(a) for a in D])+'];'+''.join([chr(a) for a in D]))

Step 3 - Meta-encoder

const metaEncoder = code => {
  return code.split('').
    map(char => char.charCodeAt(0)).join(',');
};

Data:

<Data>::=
  112,114,105,110,116,40,39,68,61,91,39,43,39,44,39,46,106,111,105,110,40,91,115,116,114,40,97,41,32,102,111,
  114,32,97,32,105,110,32,68,93,41,43,39,93,59,39,43,39,39,46,106,111,105,110,40,91,99,104,114,40,97,41,32,102,
  111,114,32,97,32,105,110,32,68,93,41,41

Final quine:

D=[112,114,105,110,116,40,39,68,61,91,39,43,39,44,39,46,106,111,105,110,40,91,115,116,114,40,97,41,32,102,111,
114,32,97,32,105,110,32,68,93,41,43,39,93,59,39,43,39,39,46,106,111,105,110,40,91,99,104,114,40,97,41,32,102,
111,114,32,97,32,105,110,32,68,93,41,41];print('D=['+','.join([str(a) for a in D])+'];'+''.join([chr(a) for a in D]))

Haskell

Step 1 - Draft

import Data.Char
a=[<Data><Code>

Step 2 - Decoder

<Code>::=
  ]
  b(d:a)e=b a(e++(show d)++",")
  b[]e=init e
  c(d:a)e=c a(e++[chr d])
  c[]e=e
  main=putStr("import Data.Char\na=["++(b a"")++(c a""))

Step 3 - Meta-encoder

const metaEncoder = code => {
  return code.split('').
    map(char => char.charCodeAt(0)).join(',');
};
<Data>::=
  93,10,98,40,100,58,97,41,101,61,98,32,97,40,101,43,43,40,115,104,111,119,32,100,41,43,43,34,44,34,41,10,98,91,93,
  101,61,105,110,105,116,32,101,10,99,40,100,58,97,41,101,61,99,32,97,40,101,43,43,91,99,104,114,32,100,93,41,10,99,
  91,93,101,61,101,10,109,97,105,110,61,112,117,116,83,116,114,40,34,105,109,112,111,114,116,32,68,97,116,97,46,67,
  104,97,114,92,110,97,61,91,34,43,43,40,98,32,97,34,34,41,43,43,40,99,32,97,34,34,41,41
import Data.Char
a=[93,10,98,40,100,58,97,41,101,61,98,32,97,40,101,43,43,40,115,104,111,119,32,100,41,43,43,34,44,34,41,10,98,91,93,
101,61,105,110,105,116,32,101,10,99,40,100,58,97,41,101,61,99,32,97,40,101,43,43,91,99,104,114,32,100,93,41,10,99,91,
93,101,61,101,10,109,97,105,110,61,112,117,116,83,116,114,40,34,105,109,112,111,114,116,32,68,97,116,97,46,67,104,97,
114,92,110,97,61,91,34,43,43,40,98,32,97,34,34,41,43,43,40,99,32,97,34,34,41,41]
b(d:a)e=b a(e++(show d)++",")
b[]e=init e
c(d:a)e=c a(e++[chr d])
c[]e=e
main=putStr("import Data.Char\na=["++(b a"")++(c a""))

Other practical languages

You can probably catch the pattern. In almost any practical language it is trivial to write a quine using the same meta-encoder. There is no point of showing more examples - all look almost identical.