Brainfuck interpreter in Thue

This brainfuck interpreter in Thue was written by User:Koen. It has been tested using Mykola Makhin's Thue interpreter in Java (see below), against a few brainfuck programs and should work properly. Because the program is a little long, doesn't contain any comments, and is hardly readable at all, it has been divided and structured in this page. Please contact Koen if you're interested in the original file.

Features
You might want to call them bugs.
 * The brainfuck program must be submitted from standard input
 * Input must contain characters  exclusively
 * comments or whitespace would produce unspecified results
 * end of input must be signified either with a linefeed, or end of file
 * The tape uses lazy evaluation and is unbounded both to the left and to the right
 * Every cell is one byte long, with wrapping for  and
 * The input instruction  is treated as a no-op
 * The output instruction  requires your thue implementation to
 * interpret  as "replace a with the empty string and print b, not followed by a newline"
 * interpret  as "replace a with the empty string and print a newline"
 * The output instruction  might not work exactly as is usually expected:
 * if the current cell is 10 or in the range 32-126, the matching ASCII character will be outputted
 * if the current cell is 0-9, 11-31, or 127, nothing will be printed
 * if the current cell is in the range 128-255, execution will terminate
 * A Thue implementation cleverly abusing Thue's nondeterminism might significantly boost speed performances:
 * This program contains a lot of substitution rules, but only a few are used at a time. For instance, if the instruction  is currently being executed, then the handful of increment-related substitution rules will be called a lot, while output-related rules will be useless. This means that if the thue implementation tries rules at random, a lot of time will be wasted trying irrelevant rules. This could be avoided by constantly sorting the substitution rules list: instead of randomly picking rules, the implementation should try rules in the order they are listed, and every time a rule is used, that rule is placed on top of the list. This way, recently-used rules will be faster to access.
 * Similarly, the string representing the program state might be quite long, as it contains both the tape and the brainfuck code. But at any time, only small parts of it need to be accessed: if a part of the program state hasn't been modified in a while, then it is likely that it won't be modified soon - in fact, if a substring is not the original string of any rule, then it won't be until it has been modified. Since all rules' original strings are only a few characters long, searching for substrings near the area where the last replaced was made might save a lot of time.

Initial string
BEGIN;(00000000);END $brainfuck The part between  and   is the tape; cells are separated with semicolons , and the current cell is surrounded in brackets. The string  will be replaced at the beginning of execution will the brainfuck program; the   sign indicates the next instruction is ready to be executed.

General purpose rules
brainfuck::=:::

?0::=0? ?1::=1? ?;::=;? ?(::=(? ?)::=)? ?END ::=END $ The first rule takes one line of input for the brainfuck program. This rule will never be triggered again. During execution, the  character is a marker than means "the current instruction has been successfully executed; proceed to next instruction".

Instruction (increment current cell)
$+::={+} 0{+}::={+}0 1{+}::={+}1 ){+}::=+) 0+::={+}1? 1+::=+0 (+::={+}(? ({+}::={+}( BEGIN{+}::=+ BEGIN The symbol  move through the tape, protected inside curly brackets. Once it has reached the current cell, it disposes of the curly brackets and performs the increment. Once it is done, it continues its movement to the left, protected inside curly brackets again, and send the  marker to the next instruction.
 * END {+}::={+};END
 * {+}::={+};

Instruction (decrement current cell)
$-::={-} 0{-}::={-}0 1{-}::={-}1 ){-}::=-) 0-::=-1 1-::={-}0? (-::={-}(? ({-}::={-}( BEGIN{-}::=- BEGIN
 * END {-}::={-};END
 * {-}::={-};

Instruction (move to previous cell)
$<::={<} 0{<}::={<}0 1{<}::={<}1 ){<}::={<}{)<} 0{)<}::={)<}0 1{)<}::={)<}1 0{(<}::={(<}0 1{(<}::={(<}1 BEGIN{<}::=< BEGIN BEGIN{(<}::=< BEGIN;(00000000 Similar to, the symbol   proceeds through the tape inside curly brackets. Once it has reached the current cell, it is divided in two threads: one will move the  one cell to the left, then send a   marker for the next instruction, and the other will move the   one cell to the left, then continue to the left to the already executed instructions.
 * END {<}::={<};END
 * {<}::={<};
 * {)<}::=);?
 * ({<}::={(<};
 * {(<}::={<};(

Instruction (move to next cell)
$>::={>} 0{>}::={>}0 1{>}::={>}1 ){>};::={>};{)>} {)>}0::=0{)>} {)>}1::=1{)>} {)>};::=); {)>}END::=00000000);END {(>}0::=0{(>} {(>}1::=1{(>} {(>};::=;(? BEGIN{>}::=> BEGIN
 * END {>}::={>};END
 * {>}::={>};
 * ({>}::={>};{(>}

Instruction (begin loop)
This instruction is more complicated than the others and will be divided in several parts. ;END $[::={=};END [ 0{=}::={=}0 1{=}::={=}1 ){=}::={(=)}) 0{(=)}::={(=)}0 1{(=)}::=1{!=} ({(=)}::=({==} {==}0::=0{==} {==}1::=1{==} {==})::=){==} {==};::=;{==} {!=}0::=0{!=} {!=}1::=1{!=} {!=})::=){!=} {!=};::=;{!=} {!=}END [::={'[}END $ The first part sends a  signal to test whether the current cell is zero  or nonzero. If it is nonzero, then execution continues with the body of the loop: the  instruction is neutralized as   and the   marker triggers next instruction. Neutralizing instruction  as   means "move instruction   to the left to the already executed instructions, without interacting at all with the tape".
 * {=}::={=};

{==}END [::={'[}END &[&% O[::={'[}I I[::=[O END &[&[::={'[}END &[&I I]::={']}O{?'} O]::=]I &[&]I::=&[&] END &[&]{%}::={']}END $ %+::={'+}% %-::={'-}% %>::={'>}% %<::={'<}% %.::={'.}% %,::={',}% %[::=[% %]::=]{%} {?'}I::=I{?'} {?'}O::=O{?'} {?'}{%}::=% If the current cell is zero, a loop skip operation begins. The loop skip is marked by, when the binary number counts the "depth" of potentially nested loops. The effect will be to skip all characters until the matching  by neutralizing them inside. Instructions  are immediately neutralized, instruction   increments the depth counter then is neutralized, and instruction   interrupts the loop skip operation (by replacing   with  ), decrements the depth counter, then restart the skipping if the counter was nonzero (by sending   to deblock  ), or deletes the loop counter and triggers next instruction with   if it was zero (remark: it was zero iff it contains only 1s after having been decremented). The depth counter is binary but uses bits  and   instead of   and   so that it cannot be confused with the content of the tape.

O{'+}::={'+}O I{'+}::={'+}I END &[&{'+}::={'+}END &[& 0{'+}::={'+}0 1{'+}::={'+}1 ){'+}::={'+}) ({'+}::={'+}( BEGIN{'+}::=+ BEGIN O{'-}::={'-}O I{'-}::={'-}I END &[&{'-}::={'-}END &[& 0{'-}::={'-}0 1{'-}::={'-}1 ){'-}::={'-}) ({'-}::={'-}( BEGIN{'-}::=- BEGIN O{'<}::={'<}O I{'<}::={'<}I END &[&{'<}::={'<}END &[& 0{'<}::={'<}0 1{'<}::={'<}1 ){'<}::={'<}) ({'<}::={'<}( BEGIN{'<}::=< BEGIN O{'>}::={'>}O I{'>}::={'>}I END &[&{'>}::={'>}END &[& 0{'>}::={'>}0 1{'>}::={'>}1 ){'>}::={'>}) ({'>}::={'>}( BEGIN{'>}::=> BEGIN O{'.}::={'.}O I{'.}::={'.}I END &[&{'.}::={'.}END &[& 0{'.}::={'.}0 1{'.}::={'.}1 ){'.}::={'.}) ({'.}::={'.}( BEGIN{'.}::=. BEGIN O{',}::={',}O I{',}::={',}I END &[&{',}::={',}END &[& 0{',}::={',}0 1{',}::={',}1 ){',}::={',}) ({',}::={',}( BEGIN{',}::=, BEGIN 0{'[}::={'[}0 1{'[}::={'[}1 ){'[}::={'[}) ({'[}::={'[}( O{'[}::={'[}O I{'[}::={'[}I END &[&{'[}::={'[}END &[& BEGIN{'[}::=[ BEGIN 0{']}::={']}0 1{']}::={']}1 ){']}::={']}) ({']}::={']}( O{']}::={']}O I{']}::={']}I END &[&{']}::={']}END &[& BEGIN{']}::=] BEGIN This is nothing more than a definition of how neutralized instructions behave. Because Thue doesn't allow for any kind of variables or pattern-matching, it has to be done for every instruction. This is one of the reasons why comments are not supported in the brainfuck program; because deleting comments would require to test for every possible character than can be used for a comment. Another reason is that Thue doesn't have mechanisms to protect it from "code injection": if the brainfuck program contains comments, they might trigger some substitution rules and result in completely undefined behaviour.
 * {'+}::={'+};
 * {'-}::={'-};
 * {'<}::={'<};
 * {'>}::={'>};
 * {'.}::={'.};
 * {',}::={',};
 * {'[}::={'[};
 * {']}::={']};

Instruction (end loop)
This instruction is very similar to. ;END $]::={=};END ] {==}END ]::={']}END $ {!=}END ]::={]}END ] 0{]}::={]}0 1{]}::={]}1 ){]}::={]}) ({]}::={]}( BEGIN{]}::=*&]& BEGIN ]o::=i{]'} ]i::=o] ]&]& BEGIN::=i&]& BEGIN{]'} [i::={'?}o{['} [o::=i[ i[&]&::=[&]& {*}[&]& BEGIN::=[ BEGIN? +*::=*{+'} -*::=*{-'} >*::=*{>'} <*::=*{<'} .*::=*{.'} ,*::=*{,'} [*::={*}[ ]*::=*] i{'?}::={'?}i o{'?}::={'?}o {*}{'?}::=* {+'}0::=0{+'} {+'}1::=1{+'} {+'};::=;{+'} {+'}i::=i{+'} {+'}o::=o{+'} {+'}&]& BEGIN::=&]& BEGIN{+'} {+'}(::=({+'} {+'})::=){+'} {+'}END ::=END + {-'}0::=0{-'} {-'}1::=1{-'} {-'};::=;{-'} {-'}i::=i{-'} {-'}o::=o{-'} {-'}&]& BEGIN::=&]& BEGIN{-'} {-'}(::=({-'} {-'})::=){-'} {-'}END ::=END - {>'}0::=0{>'} {>'}1::=1{>'} {>'};::=;{>'} {>'}i::=i{>'} {>'}o::=o{>'} {>'}&]& BEGIN::=&]& BEGIN{>'} {>'}(::=({>'} {>'})::=){>'} {>'}END ::=END > {<'}0::=0{<'} {<'}1::=1{<'} {<'};::=;{<'} {<'}i::=i{<'} {<'}o::=o{<'} {<'}&]& BEGIN::=&]& BEGIN{<'} {<'}(::=({<'} {<'})::=){<'} {<'}END ::=END < {.'}0::=0{.'} {.'}1::=1{.'} {.'};::=;{.'} {.'}i::=i{.'} {.'}o::=o{.'} {.'}&]& BEGIN::=&]& BEGIN{.'} {.'}(::=({.'} {.'})::=){.'} {.'}END ::=END. {,'}0::=0{,'} {,'}1::=1{,'} {,'};::=;{,'} {,'}i::=i{,'} {,'}o::=o{,'} {,'}&]& BEGIN::=&]& BEGIN{,'} {,'}(::=({,'} {,'})::=){,'} {,'}END ::=END , {['}0::=0{['} {['}1::=1{['} {['};::=;{['} {['}i::=i{['} {['}o::=o{['} {['}&]& BEGIN::=&]& BEGIN{['} {['}(::=({['} {['})::=){['} {['}END ::=END [ {]'}0::=0{]'} {]'}1::=1{]'} {]'};::=;{]'} {]'}i::=i{]'} {]'}o::=o{]'} {]'}&]& BEGIN::=&]& BEGIN{]'} {]'}(::=({]'} {]'})::=){]'} {]'}END ::=END ] This instruction is very similar to. It first sends  to check whether the current cell is zero or not. This "instruction" has already been defined in the implementation of  and needs not be redefined. Then, if the current cell is zero, the  symbol is neutralized and execution proceeds to the next instruction. Otherwise, the  travels all the way to the left to the previously executed instructions, and begins a loop restart operation, very similar to the loop skip operation triggered by. The depth counter uses bits  and   so that it can't be confused with the tape content or the depth counter from a loop skip. The symbol  is replaced with its symmetric equivalent. All instructions are send to the right (neutralized in ), until the matching   is detected. Then the loop restart operation halts and send a  marker to the right to execute the content of the loop (which has just been sent to the right).
 * {]}::={]};

Instruction (output current cell)
$.::={.} 0{.}::={.}0 1{.}::={.}1 ){.}::=.) (.::={.}(? BEGIN{.}::=. BEGIN First, the symbol  move to the current cell. This is the general part - the next part will be to output it as an ascii character, which unfortunately has to be done for every possible value of the current cell (as opposite to instructions like  which are bitwise). 00000000.::=.00000000 00000001.::=.00000001 00000010.::=.00000010 00000011.::=.00000011 00000100.::=.00000100 00000101.::=.00000101 00000110.::=.00000110 00000111.::=.00000111 00001000.::=.00001000 00001001.::=.00001001 00001010.::=.00001010{PRINT_n} 00001011.::=.00001011 00001100.::=.00001100 00001101.::=.00001101 00001110.::=.00001110 00001111.::=.00001111 00010000.::=.00010000 00010001.::=.00010001 00010010.::=.00010010 00010011.::=.00010011 00010100.::=.00010100 00010101.::=.00010101 00010110.::=.00010110 00010111.::=.00010111 00011000.::=.00011000 00011001.::=.00011001 00011010.::=.00011010 00011011.::=.00011011 00011100.::=.00011100 00011101.::=.00011101 00011110.::=.00011110 00011111.::=.00011111 00100000.::=.00100000{PRINT } 00100001.::=.00100001{PRINT!} 00100010.::=.00100010{PRINT"} 00100011.::=.00100011{PRINT#} 00100100.::=.00100100{PRINT$} 00100101.::=.00100101{PRINT%} 00100110.::=.00100110{PRINT&} 00100111.::=.00100111{PRINT'} 00101000.::=.00101000{PRINT(} 00101001.::=.00101001{PRINT)} 00101010.::=.00101010{PRINT*} 00101011.::=.00101011{PRINT+} 00101100.::=.00101100{PRINT,} 00101101.::=.00101101{PRINT-} 00101110.::=.00101110{PRINT.} 00101111.::=.00101111{PRINT/} 00110000.::=.00110000{PRINT0} 00110001.::=.00110001{PRINT1} 00110010.::=.00110010{PRINT2} 00110011.::=.00110011{PRINT3} 00110100.::=.00110100{PRINT4} 00110101.::=.00110101{PRINT5} 00110110.::=.00110110{PRINT6} 00110111.::=.00110111{PRINT7} 00111000.::=.00111000{PRINT8} 00111001.::=.00111001{PRINT9} 00111010.::=.00111010{PRINT:} 00111011.::=.00111011{PRINT;} 00111100.::=.00111100{PRINT<} 00111101.::=.00111101{PRINT=} 00111110.::=.00111110{PRINT>} 00111111.::=.00111111{PRINT?} 01000000.::=.01000000{PRINT@} 01000001.::=.01000001{PRINTA} 01000010.::=.01000010{PRINTB} 01000011.::=.01000011{PRINTC} 01000100.::=.01000100{PRINTD} 01000101.::=.01000101{PRINTE} 01000110.::=.01000110{PRINTF} 01000111.::=.01000111{PRINTG} 01001000.::=.01001000{PRINTH} 01001001.::=.01001001{PRINTI} 01001010.::=.01001010{PRINTJ} 01001011.::=.01001011{PRINTK} 01001100.::=.01001100{PRINTL} 01001101.::=.01001101{PRINTM} 01001110.::=.01001110{PRINTN} 01001111.::=.01001111{PRINTO} 01010000.::=.01010000{PRINTP} 01010001.::=.01010001{PRINTQ} 01010010.::=.01010010{PRINTR} 01010011.::=.01010011{PRINTS} 01010100.::=.01010100{PRINTT} 01010101.::=.01010101{PRINTU} 01010110.::=.01010110{PRINTV} 01010111.::=.01010111{PRINTW} 01011000.::=.01011000{PRINTX} 01011001.::=.01011001{PRINTY} 01011010.::=.01011010{PRINTZ} 01011011.::=.01011011{PRINT[} 01011100.::=.01011100{PRINT\} 01011101.::=.01011101{PRINT]} 01011110.::=.01011110{PRINT^} 01011111.::=.01011111{PRINT_} 01100000.::=.01100000{PRINT`} 01100001.::=.01100001{PRINTa} 01100010.::=.01100010{PRINTb} 01100011.::=.01100011{PRINTc} 01100100.::=.01100100{PRINTd} 01100101.::=.01100101{PRINTe} 01100110.::=.01100110{PRINTf} 01100111.::=.01100111{PRINTg} 01101000.::=.01101000{PRINTh} 01101001.::=.01101001{PRINTi} 01101010.::=.01101010{PRINTj} 01101011.::=.01101011{PRINTk} 01101100.::=.01101100{PRINTl} 01101101.::=.01101101{PRINTm} 01101110.::=.01101110{PRINTn} 01101111.::=.01101111{PRINTo} 01110000.::=.01110000{PRINTp} 01110001.::=.01110001{PRINTq} 01110010.::=.01110010{PRINTr} 01110011.::=.01110011{PRINTs} 01110100.::=.01110100{PRINTt} 01110101.::=.01110101{PRINTu} 01110110.::=.01110110{PRINTv} 01110111.::=.01110111{PRINTw} 01111000.::=.01111000{PRINTx} 01111001.::=.01111001{PRINTy} 01111010.::=.01111010{PRINTz} 01111011.::=.01111011{PRINT{} 01111100.::=.01111100{PRINT|} 01111101.::=.01111101{PRINT}} 01111110.::=.01111110{PRINT~} 01111111.::=.01111111 If the current cell has a printable value (either 10 or between 32 and 126), then a marker {PRINTx} is added. This marker has then to trigger output (which in Thue has always the side effect of replacing the marker with the empty string). Note that the  operator operates from the right of the cell, which holds the less significant bit. This is far from optimal since characters in the range 128-255 can be detected easily by their most significant bit, which is always 1; unfortunately the author only realized that after making all the substitution rules for characters 0-127, and certainly did not feel like doing it again. The result is that characters 128-255 are not considered at all, which means if the current cell does hold a value in the range 128-255 when the instruction  is triggered, no substitution rules can be applied, and execution will immediately halts. {PRINT_n]::=~\n {PRINT }::=~ {PRINT!}::=~! {PRINT"}::=~" {PRINT#}::=~# {PRINT$}::=~$ {PRINT%}::=~% {PRINT&}::=~& {PRINT'}::=~' {PRINT(}::=~( {PRINT)}::=~) {PRINT*}::=~* {PRINT+}::=~+ {PRINT,}::=~, {PRINT-}::=~- {PRINT.}::=~. {PRINT/}::=~/ {PRINT0}::=~0 {PRINT1}::=~1 {PRINT2}::=~2 {PRINT3}::=~3 {PRINT4}::=~4 {PRINT5}::=~5 {PRINT6}::=~6 {PRINT7}::=~7 {PRINT8}::=~8 {PRINT9}::=~9 {PRINT:}::=~: {PRINT;}::=~; {PRINT<}::=~< {PRINT=}::=~= {PRINT>}::=~> {PRINT?}::=~? {PRINT@}::=~@ {PRINTA}::=~A {PRINTB}::=~B {PRINTC}::=~C {PRINTD}::=~D {PRINTE}::=~E {PRINTF}::=~F {PRINTG}::=~G {PRINTH}::=~H {PRINTI}::=~I {PRINTJ}::=~J {PRINTK}::=~K {PRINTL}::=~L {PRINTM}::=~M {PRINTN}::=~N {PRINTO}::=~O {PRINTP}::=~P {PRINTQ}::=~Q {PRINTR}::=~R {PRINTS}::=~S {PRINTT}::=~T {PRINTU}::=~U {PRINTV}::=~V {PRINTW}::=~W {PRINTX}::=~X {PRINTY}::=~Y {PRINTZ}::=~Z {PRINT[}::=~[ {PRINT\}::=~\ {PRINT]}::=~] {PRINT^}::=~^ {PRINT_}::=~_ {PRINT`}::=~` {PRINTa}::=~a {PRINTb}::=~b {PRINTc}::=~c {PRINTd}::=~d {PRINTe}::=~e {PRINTf}::=~f {PRINTg}::=~g {PRINTh}::=~h {PRINTi}::=~i {PRINTj}::=~j {PRINTk}::=~k {PRINTl}::=~l {PRINTm}::=~m {PRINTn}::=~n {PRINTo}::=~o {PRINTp}::=~p {PRINTq}::=~q {PRINTr}::=~r {PRINTs}::=~s {PRINTt}::=~t {PRINTu}::=~u {PRINTv}::=~v {PRINTw}::=~w {PRINTx}::=~x {PRINTy}::=~y {PRINTz}::=~z {PRINT{}::=~{ {PRINT|}::=~| {PRINT}}::=~} {PRINT~}::= The  marker is removed and the appropriate character is printed. It is assumed that  will be recognized by the Thue implementation as a linefeed, and that a   alone represents itself, not an escape sequence. As far as the author is concerned, there are no standards regarding escape sequences in Thue. Feel free to modify this so that it matches your implementation's specs. In fact, if your implementation supports usual escape sequences, it might be a good idea to add substitution rules for printing characters in the range 0-9 11-31 and 126.
 * END {.}::={.};END
 * {.}::={.};

Instruction (unfortunately a no-op)
END $,::={',}END $ If the next instruction to be executed is  it is instead neutralized and the next instruction is executed. Implementing it would require a very dull and boring explicit conversion similar to that of  and as User:Oerjan says, input is a lousy hack in Thue; it will input a whole line at a time rather than just one char. This basically means interactive input/output will not be possible, and the author decided that in those conditions he would not bother writing the dull char-to-binary-value conversion.

External resources

 * Mykola Makhin's Thue interpreter in Java