COVID-19

From Esolang

Not to be confused with the pandemic.
The Computation-Oriented Virtual Infection of Disks, version 2019 (COVID-19) is an esoteric programming language. It infects the hard drive by making copies of the source file itself (reminiscent of process forking), thereby emulating non-deterministic Turing machines (i.e. COVID-19 can solve NP problems in polynomial time regardless of whether P=NP, as long as it is executed on the correct hardware). COVID-19 is also distinguished from other languages in that the code is written on an RNA strand. Consequently, the simplest COVID-19 programs require only the minimal alphabet {A,U,C,G}, while more complex programs require some non-natural bases.

Note that a fully standard-compliant COVID-19 implementation may be very difficult to achieve, since the interpreter must be able to manipulate the hardware in unusual ways, such as turning off the CPU fan, burning a random resistor or permanently destroying the computer (vide infra). Moreover, the behavior of the program explicitly depends on the CPU temperature, which can also complicate the implementation. Because of this difficulty, a COVID-19 interpreter that does not implement these features can still be called a valid COVID-19 implementation.

The file extension of COVID-19 source code is .sarscov2 (not .covid19, as one might naively assume).

Language overview

General overview of source code format: the RNA strand

The source file of COVID-19 is initially composed of one read-only RNA strand, although more RNA strands may be automatically appended to the file during execution. The RNA strand should only include the following characters (called bases): A, U, C, G, R, and F. While the first four bases are self-documenting, it is worth pointing out that the last two represent Remdesivir and Favipiravir, respectively. Unless otherwise noted, R is treated as A. Each F is treated as A until it has been visited by SEQ or RDRP (vide infra); once it has been visited at least once, it is treated as G if the last visit treated it as A, and as A if the last visit treated it as G.

The RNA strand is composed of the following parts:

  • The non-coding region(s), which can be anywhere in the strand.
  • The coding region(s), which can be anywhere in the strand. Each coding region must start with AUG and end with one of UAG, UGA or UAA.
  • The poly(A) tail (optional). Must be at the end of the RNA strand, and must contain at least 20 consecutive A's. RNA strands without the poly(A) tail will be deleted after the program finishes, so it is very important to append the poly(A) tail to any code that you wish to run more than once.
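The poly(A) tail rule can be sketched as a small check. This is only an illustration, not part of the specification: the function name has_poly_a_tail and the constant MIN_TAIL are hypothetical, and the sketch assumes that only literal A bases (not R or F) count towards the tail.

```python
import re

MIN_TAIL = 20  # minimum number of consecutive A's required by the rule above


def has_poly_a_tail(strand: str) -> bool:
    """Return True if the strand ends with at least MIN_TAIL consecutive A's.

    Assumption: only the literal base A counts; R and F are not treated as A here.
    """
    match = re.search(r"A+$", strand)
    return match is not None and len(match.group()) >= MIN_TAIL
```

Because the check only looks at the trailing run of A's, the A's of a final UAA stop codon count towards the tail.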

The program starts by finding the first occurrence of the string AUG, known as the start codon. Then a process called translation occurs, where the bases are translated three at a time to amino acids. The translation rule is (* denotes an arbitrary base):

UUU/UUC -> F
UUA/UUG/CU* -> L
AUU/AUC/AUA -> I
AUG -> M
GU* -> V
UC*/AGU/AGC -> S
CC* -> P
AC* -> T
GC* -> A
UAU/UAC -> Y
CAU/CAC -> H
CAA/CAG -> Q
AAU/AAC -> N
AAA/AAG -> K
GAU/GAC -> D
GAA/GAG -> E
UGU/UGC -> C
UGG -> W
CG*/AGA/AGG -> R
GG* -> G

When one of the stop codons UAG, UGA or UAA is met, the translation stops. The interpreter examines the string of the amino acids (known as a protein) and performs actions accordingly (see next section).
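The translation step can be sketched as follows. This is a minimal sketch under stated assumptions: the names RULES, CODON_TABLE and translate are hypothetical, positions are 0-based inside the code, and the strand is assumed to contain only the natural bases A, U, C and G (R and F are not handled).

```python
STOP_CODONS = {"UAG", "UGA", "UAA"}

# Compact form of the table above; "*" stands for any of A, U, C, G.
RULES = {
    "F": "UUU UUC", "L": "UUA UUG CU*", "I": "AUU AUC AUA", "M": "AUG",
    "V": "GU*", "S": "UC* AGU AGC", "P": "CC*", "T": "AC*", "A": "GC*",
    "Y": "UAU UAC", "H": "CAU CAC", "Q": "CAA CAG", "N": "AAU AAC",
    "K": "AAA AAG", "D": "GAU GAC", "E": "GAA GAG", "C": "UGU UGC",
    "W": "UGG", "R": "CG* AGA AGG", "G": "GG*",
}

# Expand the wildcard patterns into a codon -> amino acid dictionary.
CODON_TABLE = {}
for amino, patterns in RULES.items():
    for pattern in patterns.split():
        thirds = "AUCG" if pattern.endswith("*") else pattern[2]
        for third in thirds:
            CODON_TABLE[pattern[:2] + third] = amino


def translate(strand: str, start: int = 0) -> str:
    """Translate from the first AUG at or after 0-based index `start`."""
    begin = strand.find("AUG", start)
    if begin < 0:
        raise ValueError("no start codon found")
    protein = []
    for i in range(begin, len(strand) - 2, 3):
        codon = strand[i:i + 3]
        if codon in STOP_CODONS:
            return "".join(protein)
        protein.append(CODON_TABLE[codon])
    raise ValueError("strand exhausted during translation")


print(translate("AUGUCUGAGUAA"))  # prints "MSE"
```

The expanded table has 61 entries, one for each codon that is not a stop codon.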

After that, the interpreter finds the next occurrence of AUG and starts translation again. When no further start codon remains in the strand, the interpreter returns to the first occurrence of AUG in the RNA strand, ad infinitum. If the RNA strand is exhausted during translation (that is, before a stop codon is met), the program throws an error and exits.
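One pass of this scan can be sketched as a generator over coding regions. The name coding_regions is hypothetical, spans are 0-based and half-open, and the sketch assumes that the search for the next AUG resumes immediately after the stop codon, which the description above does not state explicitly. The endless wrap-around is left to the caller.

```python
from typing import Iterator, Tuple

STOP_CODONS = {"UAG", "UGA", "UAA"}


def coding_regions(strand: str) -> Iterator[Tuple[int, int]]:
    """Yield (start, end) spans of coding regions found in a single pass."""
    pos = 0
    while True:
        start = strand.find("AUG", pos)
        if start < 0:
            return  # no further start codon; a real interpreter would wrap around
        i = start
        while strand[i:i + 3] not in STOP_CODONS:
            if i + 3 > len(strand):
                raise ValueError("strand exhausted during translation")
            i += 3
        yield start, i + 3
        pos = i + 3  # assumption: resume searching right after the stop codon


print(list(coding_regions("CCAUGAAAUAGGGAUGUGA")))  # prints [(2, 11), (13, 19)]
```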

Procedures, enzymes and drugs

Once the translation of a protein is complete, the interpreter searches for the following substrings in the protein:

SEQ, PCR, RDRP, RDV, FAV, CQ

Among these, the strings SEQ and PCR are known as procedures, RDRP is known as an enzyme, and RDV, FAV and CQ are drugs. Any amino acids that are neither part of these substrings nor part of their parameters are treated as no-ops.

SEQ

Short for "sequencing". Takes two parameters immediately following the procedure: the first parameter is the start position of the sequencing procedure, expressed as a binary number where 1 is represented by T and 0 is represented by F; the second parameter is the length of the RNA segment to be sequenced, expressed as a binary number where 1 is represented by Y and 0 is represented by N. The "sequenced" RNA segment is transformed into a binary sequence, with G representing 1 and A representing 0 (C and U are ignored), and output to stdout as ASCII characters. If the length of the binary sequence is not a multiple of 8, it is padded with 0s at the end.
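The decoding performed by SEQ can be sketched as follows. The function name seq is hypothetical and takes the two parameters as already-decoded integers. The sketch assumes 1-based positions, reads R as A, and for simplicity always reads F as A (the visit-dependent toggling of F is not modeled).

```python
def seq(strand: str, start: int, length: int) -> str:
    """Sequence `length` bases from 1-based position `start` and decode as ASCII."""
    segment = strand[start - 1:start - 1 + length]
    # G -> 1; A and R -> 0; C and U are skipped. F is read as A here (assumption).
    bits = "".join("1" if base == "G" else "0" for base in segment if base in "GARF")
    bits += "0" * (-len(bits) % 8)  # pad with trailing zeros to a whole byte
    return "".join(chr(int(bits[i:i + 8], 2)) for i in range(0, len(bits), 8))


print(seq("AGAAGAAACC", 1, 10))  # prints "H"
```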

PCR

Synonym "RTPCR". Takes two DNA strings and two code blocks immediately following the procedure as arguments.

The two DNA strings, also known as the primers, use the alphabet {A,T,C,G} (keep in mind, however, that they are the namesake amino acids, not true bases). The PCR procedure determines whether there exist two segments in the RNA strand such that one of them is the same as the first DNA string (U and T are regarded as identical here), and the other is complementary (in the Watson-Crick sense) to the second DNA string. If two such segments are found, we say that the PCR test is positive; otherwise the PCR test is negative. The two DNA strings are delimited by "W", meaning "with". The two primers must not be mutually complementary, otherwise the program throws an error and exits. If the RNA includes the base F, it will match either A or G, regardless of how many times the base has been visited. Thus, GAUFU is considered the same as both GATAT and GATGT, and complementary to both CTATA and CTACA.
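The matching rule can be sketched as follows. All names (base_matches, contains, pcr_positive) are hypothetical. The sketch assumes that complementarity is taken position by position without reversing the strand, which is what the GAUFU example suggests, and that "mutually complementary" means the first primer equals the position-wise complement of the second. R is read as A.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def base_matches(rna_base: str, dna_base: str) -> bool:
    """Compare one RNA base with one DNA base (U equals T, F matches A or G)."""
    if rna_base == "F":
        return dna_base in "AG"
    if rna_base == "R":  # R is treated as A
        rna_base = "A"
    if rna_base == "U":
        rna_base = "T"
    return rna_base == dna_base


def contains(strand: str, dna: str) -> bool:
    """Return True if some segment of the strand matches the DNA string."""
    n = len(dna)
    return any(
        all(base_matches(r, d) for r, d in zip(strand[i:i + n], dna))
        for i in range(len(strand) - n + 1)
    )


def pcr_positive(strand: str, primer1: str, primer2: str) -> bool:
    """Positive if the strand holds primer1 and the complement of primer2."""
    target2 = "".join(COMPLEMENT[b] for b in primer2)  # position-wise complement
    if primer1 == target2:
        raise ValueError("primers must not be mutually complementary")
    return contains(strand, primer1) and contains(strand, target2)
```

For instance, pcr_positive("GAUFU", "GATAT", "CTACA") is positive under these assumptions, because the single F matches A for the first primer and G for the second.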

The two code blocks start with "I" and "H", standing for "Infected" (not "Ill", as some patients infected with COVID-19 may have no symptoms) and "Healthy", respectively, and contain the code that should be executed if the PCR test is positive or negative, respectively. The last code block concludes with a "K", standing for "Keep yourself at home." Note that both code blocks can be empty, and the relative order of the two code blocks is not important.

Example: the following protein

MRTPCRATGWTTTTTISEQTYNNNHSEQTFFTYNNNNK

checks whether the RNA contains the segments AUG and AAAAA; the first segment is trivially present (since it is the start codon; otherwise the protein would not exist in the first place), and the second segment is present whenever the RNA has a poly(A) tail. If the test is positive, the first 8 bases are displayed as ASCII; if the test is negative, the 9th to 24th bases are displayed as ASCII.
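The numbers in this example follow from decoding the binary parameters. A minimal sketch, with a hypothetical helper named decode:

```python
def decode(param: str, one: str, zero: str) -> int:
    """Decode a binary parameter spelled with the amino acids `one` and `zero`."""
    bits = param.replace(one, "1").replace(zero, "0")
    return int(bits, 2)


# Parameters of the two SEQ calls in the example protein.
print(decode("T", "T", "F"), decode("YNNN", "Y", "N"))      # prints "1 8"
print(decode("TFFT", "T", "F"), decode("YNNNN", "Y", "N"))  # prints "9 16"
```

A start position of 9 with a length of 16 covers the 9th to 24th bases, as stated in the example.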

Note: the PCR test is only reliable if the program is executed at least 14 days after the installation of the interpreter; this is known as the incubation period. Otherwise the PCR test may occasionally return "false negative" results, i.e. it is negative when it ought to be positive. The false negative rate decreases linearly from 1 to 0 as the time from interpreter installation to program execution increases from 0 to 14 days. This feature can be used to construct a random number generator.
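The linear decay can be written down directly. The function name false_negative_rate is hypothetical, and clamping the rate to 0 after day 14 is an assumption that follows from the test being described as reliable from then on.

```python
def false_negative_rate(days_since_install: float) -> float:
    """Probability that a truly positive PCR test is reported as negative."""
    return min(1.0, max(0.0, 1.0 - days_since_install / 14.0))


print(false_negative_rate(7))  # prints 0.5
```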

RDRP

Short for "RNA-dependent RNA polymerase". Generates the complementary chain of the last RNA strand in the source file, and appends the new chain as a separate line to the end of the source file.

Special cases arise if either the old or the new RNA strand contains the base R. RDRP aborts after replicating the 5th base after the R. The new chain is thus shorter than the original (template) RNA strand. However, if there are fewer than 5 bases after the R, RDRP continues normally.

After RDRP finishes, the new RNA strand is copied to another file and executed as an independent session, if and only if the new RNA strand is terminated by a poly(A) tail.
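Replication with the R truncation rule can be sketched as follows. The name rdrp is hypothetical. The sketch assumes a position-wise complement (A with U, C with G, no reversal), reads R in the template as A, handles only an R in the template (not one introduced into the new strand), and does not handle F. Appending to the source file and spawning a new session are omitted.

```python
PAIR = {"A": "U", "U": "A", "C": "G", "G": "C"}


def rdrp(template: str) -> str:
    """Return the complementary strand, honouring the R truncation rule."""
    r_index = template.find("R")
    if r_index >= 0 and len(template) - r_index - 1 >= 5:
        # Abort after the 5th base that follows the first R.
        template = template[:r_index + 6]
    # R is read as A; F is not handled in this sketch.
    return "".join(PAIR["A" if base == "R" else base] for base in template)


print(rdrp("AUCG"))       # prints "UAGC"
print(rdrp("ARCCCCCGG"))  # prints "UUGGGGG"
```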

RDV

Short for "Remdesivir". Takes one binary integer m as parameter, where 1 is represented by Y and 0 is represented by N. The first U encountered after position m, in the first RDRP run that follows the administration of the drug, will be replicated to R, rather than A.
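The effect of RDV on the next RDRP run can be sketched as follows. The name replicate_with_rdv is hypothetical. The sketch assumes 1-based positions, takes "after position m" to mean strictly greater than m, uses a position-wise complement, applies the RDRP abort rule to the R in the new strand, and expects a template made only of A, U, C and G.

```python
PAIR = {"A": "U", "U": "A", "C": "G", "G": "C"}


def replicate_with_rdv(template: str, m: int) -> str:
    """Replicate the template, turning the first U after position m into R."""
    out = []
    drug_used = False
    stop_at = None  # 1-based position after which replication aborts
    for pos, base in enumerate(template, start=1):
        if not drug_used and pos > m and base == "U":
            out.append("R")  # R is incorporated instead of A
            drug_used = True
            if len(template) - pos >= 5:
                stop_at = pos + 5  # abort after the 5th base following the R
        else:
            out.append(PAIR[base])
        if pos == stop_at:
            break
    return "".join(out)


print(replicate_with_rdv("AAUGCCGGAAUU", 0))  # prints "UURCGGCC"
print(replicate_with_rdv("AAUGCCGGAAUU", 3))  # prints "UUACGGCCUURA"
```

In the second call the R lands on position 11, leaving only one base after it, so replication continues to the end of the template.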

FAV

Short for "Favipiravir". Takes one binary integer m as parameter, where 1 is represented by Y and 0 is represented by N. The first U or C encountered after position m, in the first RDRP run that follows the administration of the drug, will be replicated to F, rather than A or G. If the F is incorporated into the new RNA strand as a complementary base of U, we say that it has been treated as an A, and will thus be treated as G when it is visited by SEQ or RDRP next time; vice versa if the F acted as a complementary base of C during its incorporation into the new RNA strand.
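The incorporation of F and its subsequent reading can be sketched in the same way. The name replicate_with_fav is hypothetical. The sketch assumes 1-based positions, takes "after position m" to mean strictly greater than m, uses a position-wise complement, and expects a template made only of A, U, C and G.

```python
PAIR = {"A": "U", "U": "A", "C": "G", "G": "C"}


def replicate_with_fav(template: str, m: int):
    """Return (new_strand, next_reading).

    next_reading is how the incorporated F will be read on its next visit
    by SEQ or RDRP ("A" or "G"), or None if no F was incorporated.
    """
    out = []
    next_reading = None
    for pos, base in enumerate(template, start=1):
        if next_reading is None and pos > m and base in "UC":
            out.append("F")
            acted_as = PAIR[base]  # A opposite U, G opposite C
            next_reading = "G" if acted_as == "A" else "A"
        else:
            out.append(PAIR[base])
    return "".join(out), next_reading


print(replicate_with_fav("GAUC", 0))  # prints ('CUFG', 'G')
print(replicate_with_fav("GAUC", 3))  # prints ('CUAF', 'A')
```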

CQ

Short for "Chloroquine". Stops the program. Note that there must not be more than five occurrences of CQ in any given protein, otherwise the computer will self-destruct immediately. This is known as the adverse effect of chloroquine. A drug with fewer adverse effects than CQ, HCQ (hydroxychloroquine), is available, and a protein can include up to 10 occurrences of HCQ before triggering the self-destruction of the computer.
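The dosage limits can be sketched as follows. The name drug_outcome is hypothetical, and two points are assumptions rather than stated rules: since the string HCQ contains CQ, a CQ that is immediately preceded by H is counted as HCQ only, and HCQ is assumed to stop the program just as CQ does.

```python
import re


def drug_outcome(protein: str) -> str:
    """Classify a protein as "continue", "stop" or "self-destruct"."""
    hcq = protein.count("HCQ")
    cq = len(re.findall(r"(?<!H)CQ", protein))  # CQ not preceded by H (assumption)
    if cq > 5 or hcq > 10:
        return "self-destruct"
    if cq or hcq:
        return "stop"
    return "continue"


print(drug_outcome("MSEQTYNNNNNNYCQ"))  # prints "stop"
```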

Language features related to hardware

Since COVID-19 only proves to be superior to most computational languages when it has access to infinite CPU and disk resources (it is only in this limit that it can simulate non-deterministic Turing machines), the COVID-19 interpreter gets really angry when the available CPU and/or disk resources probably cannot support the computation. Therefore one will observe unusual behaviors in this regime, specifically:

A COVID-19 program will abort and delete itself after running for 30 minutes on a CPU whose temperature is 56 degrees Celsius or above. This phenomenon is known as disinfection.

When the disk is 50% full due to the replicating .sarscov2 files, the CPU fan(s) will be turned off automatically. The user should immediately start to cool the CPU using mechanical ventilation to prevent the computer from self-destructing.

When the disk is 90% full due to the replicating .sarscov2 files, one of the resistors of the computer will melt down. The user must immediately find a suitable resistor and connect it to the broken resistor in order to keep the computer running. This process is called the Electric Circuit Maintenance Operation (ECMO).

When the disk is 100% full due to the replicating .sarscov2 files, the computer self-destructs irreversibly.

Example: Hello, world!

AGAAGAAACCAGGAAGAGCCAGGAGGAACCAGGAGGAACCAGGAGGGGUUAAGAGGAAUUAAGAAAAAUUAGGGAGGGCCAGGAGGGGCCAGGGAAGACCAGGAGGAACCAGGAAGAAUUAAGAAAAUGUCUGAGCAAACCUACAAUAAUAAUAACAACAACUACUGCCAGUAAAAAAAAAAAAAAAAAAAA

Note that the last two A's of the stop codon UAA can overlap with the poly(A) tail to save two A's. The first AUG is in position 127. Translation from this point yields the protein

MSEQTYNNNNNNYCQ

which is interpreted as follows

SEQ 1 129
CQ

Therefore, the 129 bases starting from position 1 (inclusive) are sequenced and output to stdout:

AGAAGAAACC = 01001000 = 'H'
AGGAAGAGCC = 01100101 = 'e'
AGGAGGAACC = 01101100 = 'l'
AGGAGGAACC = 01101100 = 'l'
AGGAGGGGUU = 01101111 = 'o'
AAGAGGAAUU = 00101100 = ','
AAGAAAAAUU = 00100000 = ' '
AGGGAGGGCC = 01110111 = 'w'
AGGAGGGGCC = 01101111 = 'o'
AGGGAAGACC = 01110010 = 'r'
AGGAGGAACC = 01101100 = 'l'
AGGAAGAAUU = 01100100 = 'd'
AAGAAAAUG  = 00100001 = '!'

After the SEQ procedure completes, the drug CQ is administered to stop the activities of the virus. The program thus terminates normally.

Predecessor

The COVID-19 language is preceded by a simpler language, SARS (also known as COVID-02). The only differences from COVID-19 are that SARS does not support the bases R and F, nor the drugs RDV, FAV and CQ, and that there is no incubation period for the PCR procedure. The file extension of SARS source code is .sarscov.