MONOD

MONOD is a molecular-biology-influenced esoteric programming language named after the French biologist Jacques Monod. Since I have little better to do, I'll probably start documenting it here now that it's in workable shape. If anything passes for an original design document, it's probably this blog post.

=Instructions=

Wat :||||||
The codewheel above is that for v1.0 of MONOD. It is based on the same principle as the, being one of the better ways of displaying essentially three-dimensional data in two.

The main conceit of MONOD is that the sourcecode is a string over Σ = {G,A,T,C} and so looks like DNA. It is stored as plaintext with a .dna suffix. MONOD, like DNA, is divided into 64 triplets Σ³ = {GGG,GGA...CCC} each of which can code for an instruction/amino acid. The first member of the triplet is the innermost circle, with the second and third being read off, so TAT and TAA both => BIND. The interpreter reads in the string over Σ and converts it into instructions, also inaccurately called "codons" (this properly being the name for the Σ³ triplets).
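As a rough illustration of the triplet scheme, the sketch below enumerates Σ³ in the order given above and chops a source string into triplets. The function names are made up for this page and are not taken from the interpreter. Dropping everything that is not an uppercase G, A, T or C follows the note under null.dna; discarding a trailing partial triplet is an assumption.

```python
from itertools import product

SIGMA = "GATC"  # alphabet order as listed in the text: G, A, T, C


def all_triplets():
    """Return the 64 members of Sigma^3, from GGG through CCC."""
    return ["".join(t) for t in product(SIGMA, repeat=3)]


def split_triplets(source):
    """Keep only uppercase G/A/T/C characters, then cut into groups of three."""
    bases = [ch for ch in source if ch in SIGMA]
    usable = len(bases) - len(bases) % 3  # assumption: drop a trailing partial triplet
    return ["".join(bases[i:i + 3]) for i in range(0, usable, 3)]


if __name__ == "__main__":
    triplets = all_triplets()
    print(len(triplets), triplets[0], triplets[1], triplets[-1])
    print(split_triplets("AAA CCC\nTAT"))
```

Note that this only produces raw triplets; which instruction each triplet codes for depends on the codewheel and is not modelled here.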

Execution Control
Before explaining what the instructions are, a brief explanation of control of execution and a digression into data types in MONOD:

(real) Genomes are not so much programs as file structures from which the cell can pull up programs to execute for a million different purposes. MONOD works on the same principle. By itself, a .dna file is useless -- just as DNA floating around in a test-tube is.

There are three datatypes in MONOD - codons, binding sequences and integers. Codons are represented as four-letter capital strings and at present there are 16 - these are the operations. Integers are just integers. Binding sequences take the form "*aaabbb...nnn", where nnn is a lower case version of Σ³ -- *gagtat, *gggggg, *ggg are valid binding sequences, *ga, *gagagu, *tremme are not. * by itself is also valid. It is binding sequences that are effectively the main agents of control in MONOD.
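The binding sequence rule can be written as a single regular expression. This is a minimal sketch, assuming the rule is exactly "an asterisk followed by zero or more lowercase triplets over g, a, t, c"; `is_binding_sequence` is a hypothetical helper name, not something from the interpreter.

```python
import re

# "*" followed by zero or more lowercase triplets over {g, a, t, c}
BINDING_RE = re.compile(r"\*(?:[gatc]{3})*")


def is_binding_sequence(token):
    """True if the whole token is a well-formed binding sequence."""
    return BINDING_RE.fullmatch(token) is not None


if __name__ == "__main__":
    for token in ["*gagtat", "*gggggg", "*ggg", "*", "*ga", "*gagagu", "*tremme"]:
        print(token, is_binding_sequence(token))
```

The valid and invalid examples from the paragraph above come out as expected: the first four are accepted and the last three are rejected.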

Execution does not happen to notional "DNA" strings -- instead, what is executed are "proteins", which are copies of sections of the .dna sourcecode. Since it is necessary to start somewhere, a protein is added to "prime" execution of the program, this being provided in .rna (or parsed .dna) form. .rna files are plaintext strings of codons, binding sequences and integers that *are* executed upon initiation. For any interesting programs to happen, this priming "Kadmon" protein should contain code that "transcrates" (or TRANs) one or more proteins into being and adds them to the list of proteins in existence (to begin with, just Kadmon). When the interpreter has finished executing the Kadmon protein (or any protein), it moves on to the next protein in the list of proteins produced.
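The "list of proteins" behaviour can be pictured as a simple queue. The sketch below is not the interpreter's code: `run_proteins` and the `execute` callback are hypothetical, and it simplifies matters by taking a protein off the list before running it rather than when it lyses.

```python
from collections import deque


def run_proteins(kadmon, execute):
    """Run proteins in list order until none remain.

    `execute(protein)` is a hypothetical callback that runs one protein and
    returns the new proteins it transcrated, in order.
    """
    proteins = deque([kadmon])  # to begin with, just Kadmon
    order = []
    while proteins:
        current = proteins.popleft()       # simplification: remove before running
        order.append(current)
        proteins.extend(execute(current))  # new proteins join the end of the list
    return order


if __name__ == "__main__":
    # Toy stand-in for execution: only Kadmon transcrates anything.
    spawned = {"Kadmon": ["XhiA", "XagA", "DjtA", "CvjA"]}
    print(run_proteins("Kadmon", lambda name: spawned.get(name, [])))
```

With the toy callback this reproduces the execution order seen in the sample session: Kadmon first, then the four proteins it brought into being.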

Proteins, therefore, can interact with other proteins in two ways: by binding to them, or by TRANSing them into existence (in which case the protein is automatically bound to them). Binding is achieved by matching of binding sequences, so that a protein with the correct codon sequence and the binding sequence *tatgag will look for every occurrence of that binding sequence in the genome. If the protein is attempting to TRANS all *tatgag proteins into existence, it will check whether each instance of *tatgag is followed by a codon indicating that A Protein Starts Here, and if so, TRANSes all relevant proteins into existence. Binding is a similar process.
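Looking for a binding sequence in the genome might be sketched like this. It makes several assumptions that the text does not confirm: that the search runs over the raw DNA string, that matches must sit on triplet boundaries, and that a bare "*" matches nothing. `find_binding_sites` is a hypothetical name, and the check for a protein-start codon is left out because its identity is not given here.

```python
def find_binding_sites(genome, binding_seq):
    """Return triplet indices where binding_seq occurs in genome.

    genome is an uppercase string over G/A/T/C; binding_seq looks like "*tatgag".
    """
    target = binding_seq[1:].upper()  # "*tatgag" -> "TATGAG"
    if not target:
        return []  # assumption: a bare "*" matches nothing in this sketch
    hits = []
    # Step three bases at a time so only triplet-aligned matches count.
    for i in range(0, len(genome) - len(target) + 1, 3):
        if genome[i:i + len(target)] == target:
            hits.append(i // 3)
    return hits


if __name__ == "__main__":
    # The first 27 bases of bettergenome.dna.
    print(find_binding_sites("AAACCCCCCAAATAATATGAGATAGCG", "*tatgag"))
```

On that opening stretch of bettergenome.dna the one match starts at the sixth triplet (index 5 when counting from zero).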

In MONOD, state is stored in two places -- locally, individual protein instantiations have an attribute "phosphorylation" derived from a biochemical term. In MONOD this is simply a (signed!) integer. Globally there is a "chemical array", 64 signed integers large. Proteins can increment or decrement this in several ways, and it is displayed finally when all proteins have been executed.

If a protein is bound to another, then certain operations which would change the state of the protein itself, or change state based on protein state will instead act according to, or upon the bound protein. In this way, proteins can act upon proteins that are yet to be executed and some long-distance control can be wielded.
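To make the two kinds of state and the effect of binding concrete, here is a small model. The behaviour of PHOS, DEPH and META is inferred from the log messages in the sample session further down, not from an instruction table, so treat it as a guess; the `Protein` class and the helper functions are invented for this illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Protein:
    name: str
    phosphorylation: int = 0           # local state: a signed integer
    bound: Optional["Protein"] = None  # the protein currently bound, if any

    def target(self) -> "Protein":
        """State-touching operations act on the bound protein when there is one."""
        return self.bound if self.bound is not None else self


def phos(p: Protein) -> None:
    p.target().phosphorylation += 1


def deph(p: Protein) -> None:
    p.target().phosphorylation -= 1


def meta(p: Protein, n: int, chem: List[int]) -> None:
    # Inferred from "increased register 21 by 3": add phosphorylation to chem[n].
    chem[n] += p.target().phosphorylation


def replay_sample() -> List[int]:
    """Replay part of what XagA and DjtA do in the sample session."""
    chem = [0] * 64  # global state: the chemical array
    xaga, djta = Protein("XagA"), Protein("DjtA")
    phos(xaga)
    meta(xaga, 13, chem)
    deph(xaga)
    xaga.bound = djta      # XagA binds DjtA
    for _ in range(4):
        phos(xaga)         # lands on DjtA, which has not run yet
    xaga.bound = None      # XagA unbinds
    for _ in range(5):
        phos(djta)
    meta(djta, 8, chem)
    return chem


if __name__ == "__main__":
    chem = replay_sample()
    print(chem[13], chem[8])
```

The four increments XagA makes while bound are still there when DjtA runs, which is the "long-distance control" described above.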

If you have gotten this far, you are probably about ready to understand the...

Instruction List
If the operation has a dollar after it, eg. , then any triplet following it is interpreted as an integer, referred to as n. A trailing asterisk eg. indicates that a binding sequence follows, referred to as. This notation is strictly for illustrative purposes and is not present in actual code.

Blank space on the codewheel indicates reserved triplets for future use. triplets are reserved reserved triplets, used when meaningless codons, written ????, are translated back into DNA code.

=Sample Session=

 pthag@pthag-desktop:~/python/monod$ python monod.py -x bettergenome.dna kadmon.rna -e -s
  __  __  ___  _   _  ___  ____
 |  \/  |/ _ \| \ | |/ _ \|  _ \
 | |\/| | | | |  \| | | | | | | |
 | |  | | |_| | |\  | |_| | |_| |
 |_|  |_|\___/|_| \_|\___/|____/
 robert harry nicodemus williams~
 Hello.
 Priming with KadA (Kadmon)!
 ['TRAN', 'BIND', '*tatgag', 'BEND', 'BACK', 'NULL', 'LYSE']
 Starting to execute KadA (Kadmon)
 KadA finds 4 "*tatgag" in genome.
 New Protein XhiA!
 ['PHOS', 'PHOS', 'PHOS', 'META', 21, 'FOOO', 'LYSE'] (genome offset 5)
 New Protein XagA!
 ['PHOS', 'META', 13, 'DEPH', 'JESU', 21, 'JESU', 2, 'BACK', 'BIND', '*catgat', 'BEND', 'PHOS', 'PHOS', 'PHOS', 'PHOS', 'JESU', 21, 'BACK', 'LYSE'] (genome offset 18)
 New Protein DjtA!
 ['BIND', '*catgat', 'BEND', 'PHOS', 'PHOS', 'PHOS', 'PHOS', 'PHOS', 'META', 8, 'LYSE'] (genome offset 46)
 New Protein CvjA!
 ['PHOS', 'PHOS', 'META', 52, 'NULL', 'BIND', '*', 'BEND', 'FOOO', 'FOOO', 'BEND', 'DEPH', 'LYSE'] (genome offset 63)
 KadA does nothing (BEND)
 KadA unbinds CvjA!
 KadA does nothing (NULL)
 KadA lyses itself.
 Finished executing KadA (Kadmon)
 Starting to execute XhiA.
 XhiA increased phosphorylation to 1
 XhiA increased phosphorylation to 2
 XhiA increased phosphorylation to 3
 XhiA increased register 21 by 3.
 XhiA does nothing (FOOO)
 XhiA lyses itself.
 Finished executing XhiA.
 Starting to execute XagA.
 XagA increased phosphorylation to 1
 XagA increased register 13 by 1.
 XagA decreased phosphorylation to 0
 XagA phosphor-sets register 0 to 21
 XagA phosphor-sets register 0 to 2
 XagA unbound nothing (BACK)
 XagA activates binding site *catgat
 XagA binds DjtA through *catgat
 XagA increases phosphorylation of DjtA to 1
 XagA increases phosphorylation of DjtA to 2
 XagA increases phosphorylation of DjtA to 3
 XagA increases phosphorylation of DjtA to 4
 XagA phosphor-sets register 4 by 21 to DjtA
 XagA unbinds DjtA!
 XagA lyses itself.
 Finished executing XagA.
 Starting to execute DjtA.
 DjtA does nothing (BIND)
 DjtA does nothing (BEND)
 DjtA increased phosphorylation to 5
 DjtA increased phosphorylation to 6
 DjtA increased phosphorylation to 7
 DjtA increased phosphorylation to 8
 DjtA increased phosphorylation to 9
 DjtA increased register 8 by 9.
 DjtA lyses itself.
 Finished executing DjtA.
 Starting to execute CvjA.
 CvjA increased phosphorylation to 1
 CvjA increased phosphorylation to 2
 CvjA increased register 52 by 2.
 CvjA does nothing (NULL)
 CvjA does nothing (BIND)
 CvjA does nothing (BEND)
 CvjA does nothing (FOOO)
 CvjA does nothing (FOOO)
 CvjA does nothing (BEND)
 CvjA decreased phosphorylation to 1
 CvjA lyses itself.
 Finished executing CvjA.
 Protein array empty. Stopping.

 Final chemical memory state:
  2  0  0  0 21  0  0  0
  9  0  0  0  0  1  0  0
  0  0  0  0  0  3  0  0
  0  0  0  0  0  0  0  0
  0  0  0  0  0  0  0  0
  0  0  0  0  0  0  0  0
  0  0  0  0  2  0  0  0
  0  0  0  0  0  0  0  0

=Example Programs=

The One In The Sample
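Not terribly exciting, but at least it makes numbers appear. If put on the spot, I guess you could say it sets chem[8] = 4+5 = 9 (four phosphorylations from XagA while DjtA is bound, then five more from DjtA itself, as the sample session shows) and futzes around with chem[4].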

bettergenome.dna
AAACCCCCCAAATAATATGAGATAGCGCCCCCCCCCTACTTTTTTCAGGGCAAATAATATGAGATAGCGCCCTACACTGGGACTTTTACTAAAAATTAACATGATATACC CCCCCCCCCCACTTTTAATCAGGGCAAAACTCCCTAATATGAGATAGCGTAACATGATATACCCCCCCCCCCCCCCTACAGACAGGGCAAATAATATGAGATAGCGCCCCC CTACCTAAAATAAATTTTTTTTATAGGGCAGGGC

kadmon.rna
TRAN BIND *tatgag BEND BACK NULL LYSE

null.dna
this genome is A useful one for debugging purposes - it does Absolutely nothing at All.

Useful, sure, but also enlightening! The interpreter ignores all non-{A,G,C,T} characters, so this is equivalent to AAA.

=The Interpreter=

The interpreter is written poorly in Python, and lives at MONOD/interpreterv1 for want of anywhere else to put it. It goes into py2exe quite nicely, which is available.

Command line options
-s -- prints ASCII equivalents of the chemical array numbers next to them at the end
-e -- prints all messages (rather than some) in English
-h -- is supposed to display help
-t -- mode for converting .rna (easier to write) files into .dna files
-n -- prints off the list of triplet-equivalents for numbers 0 to 64
-x foo.dna bar.rna -- main functionality
-c -- inputs text and spits out a DNA string version - useful for meaningful binding site names
-f -- old school version of -x, asks for filenames and waits for input

If none of -h, -t, -n, -x, -c, -f are given, then the interpreter reverts to debug mode, and executes the genome and Kadmon protein hardcoded into it.