Brainfuck interpreter in Thue

From Esolang
Jump to navigation Jump to search

This brainfuck interpreter in Thue was written by User:Koen. It has been tested using Mykola Makhin's Thue interpreter in Java (see #External resources below), against a few brainfuck programs and should work properly. Because the program is a little long, doesn't contain any comments, and is hardly readable at all, it has been divided and structured in this page. Please contact Koen if you're interested in the original file.

Features

You might want to call them bugs.

  • The brainfuck program must be submitted from standard input
  • Input must contain characters +-><.,[] exclusively
    • comments or whitespace would produce unspecified results
    • end of input must be signified either with a linefeed, or end of file
  • The tape uses lazy evaluation and is unbounded both to the left and to the right
  • Every cell is one byte long, with wrapping for + and -
  • The input instruction , is treated as a no-op
  • The output instruction . requires your thue implementation to
    • interpret a::=~b as "replace a with the empty string and print b, not followed by a newline"
    • interpret a::=~\n as "replace a with the empty string and print a newline"
  • The output instruction . might not work exactly as is usually expected:
    • if the current cell is 10 or in the range 32-126, the matching ASCII character will be outputted
    • if the current cell is 0-9, 11-31, or 127, nothing will be printed
    • if the current cell is in the range 128-255, execution will terminate
  • A Thue implementation cleverly abusing Thue's nondeterminism might significantly boost speed performances:
    • This program contains a lot of substitution rules, but only a few are used at a time. For instance, if the instruction + is currently being executed, then the handful of increment-related substitution rules will be called a lot, while output-related rules will be useless. This means that if the thue implementation tries rules at random, a lot of time will be wasted trying irrelevant rules. This could be avoided by constantly sorting the substitution rules list: instead of randomly picking rules, the implementation should try rules in the order they are listed, and every time a rule is used, that rule is placed on top of the list. This way, recently-used rules will be faster to access.
    • Similarly, the string representing the program state might be quite long, as it contains both the tape and the brainfuck code. But at any time, only small parts of it need to be accessed: if a part of the program state hasn't been modified in a while, then it is likely that it won't be modified soon - in fact, if a substring is not the original string of any rule, then it won't be until it has been modified. Since all rules' original strings are only a few characters long, searching for substrings near the area where the last replaced was made might save a lot of time.

Source code

Initial string

 BEGIN;(00000000);END $brainfuck

The part between BEGIN and END is the tape; cells are separated with semicolons ;, and the current cell is surrounded in brackets ( ). The string brainfuck will be replaced at the beginning of execution will the brainfuck program; the $ sign indicates the next instruction is ready to be executed.

General purpose rules

brainfuck::=:::

?0::=0?
?1::=1?
?;::=;?
?(::=(?
?)::=)?
?END ::=END $

The first rule takes one line of input for the brainfuck program. This rule will never be triggered again. During execution, the ? character is a marker than means "the current instruction has been successfully executed; proceed to next instruction".

Instruction + (increment current cell)

$+::={+}
;END {+}::={+};END 
0{+}::={+}0
1{+}::={+}1
;{+}::={+};
){+}::=+)
0+::={+}1?
1+::=+0
(+::={+}(?
({+}::={+}(
 BEGIN{+}::=+ BEGIN

The symbol + move through the tape, protected inside curly brackets { }. Once it has reached the current cell, it disposes of the curly brackets and performs the increment. Once it is done, it continues its movement to the left, protected inside curly brackets again, and send the ? marker to the next instruction.

Instruction - (decrement current cell)

$-::={-}
;END {-}::={-};END 
0{-}::={-}0
1{-}::={-}1
;{-}::={-};
){-}::=-)
0-::=-1
1-::={-}0?
(-::={-}(?
({-}::={-}(
 BEGIN{-}::=- BEGIN

Instruction < (move to previous cell)

$<::={<}
;END {<}::={<};END 
0{<}::={<}0
1{<}::={<}1
;{<}::={<};
){<}::={<}{)<}
0{)<}::={)<}0
1{)<}::={)<}1
;{)<}::=);?
;({<}::={(<};
0{(<}::={(<}0
1{(<}::={(<}1
;{(<}::={<};(
 BEGIN{<}::=< BEGIN
 BEGIN{(<}::=< BEGIN;(00000000

Similar to +, the symbol < proceeds through the tape inside curly brackets. Once it has reached the current cell, it is divided in two threads: one will move the ) one cell to the left, then send a ? marker for the next instruction, and the other will move the ( one cell to the left, then continue to the left to the already executed instructions.

Instruction > (move to next cell)

$>::={>}
;END {>}::={>};END 
0{>}::={>}0
1{>}::={>}1
;{>}::={>};
){>};::={>};{)>}
{)>}0::=0{)>}
{)>}1::=1{)>}
{)>};::=);
{)>}END::=00000000);END
;({>}::={>};{(>}
{(>}0::=0{(>}
{(>}1::=1{(>}
{(>};::=;(?
 BEGIN{>}::=> BEGIN

Instruction [ (begin loop)

This instruction is more complicated than the others and will be divided in several parts.

;END $[::={=};END [
0{=}::={=}0
1{=}::={=}1
;{=}::={=};
){=}::={(=)})
0{(=)}::={(=)}0
1{(=)}::=1{!=}
({(=)}::=({==}
{==}0::=0{==}
{==}1::=1{==}
{==})::=){==}
{==};::=;{==}
{!=}0::=0{!=}
{!=}1::=1{!=}
{!=})::=){!=}
{!=};::=;{!=}
{!=}END [::={'[}END $

The first part sends a {=} signal to test whether the current cell is zero ({==}) or nonzero ({!=}). If it is nonzero, then execution continues with the body of the loop: the [ instruction is neutralized as {'[} and the $ marker triggers next instruction. Neutralizing instruction x as {'x} means "move instruction x to the left to the already executed instructions, without interacting at all with the tape".

{==}END [::={'[}END &[&%
O[::={'[}I
I[::=[O
END &[&[::={'[}END &[&I
I]::={']}O{?'}
O]::=]I
&[&]I::=&[&]
END &[&]{%}::={']}END $
%+::={'+}%
%-::={'-}%
%>::={'>}%
%<::={'<}%
%.::={'.}%
%,::={',}%
%[::=[%
%]::=]{%}
{?'}I::=I{?'}
{?'}O::=O{?'}
{?'}{%}::=%

If the current cell is zero, a loop skip operation begins. The loop skip is marked by &[&binary number%, when the binary number counts the "depth" of potentially nested loops. The effect will be to skip all characters until the matching ] by neutralizing them inside {' }. Instructions +-><., are immediately neutralized, instruction [ increments the depth counter then is neutralized, and instruction ] interrupts the loop skip operation (by replacing % with {%}), decrements the depth counter, then restart the skipping if the counter was nonzero (by sending {?'} to deblock %), or deletes the loop counter and triggers next instruction with $ if it was zero (remark: it was zero iff it contains only 1s after having been decremented). The depth counter is binary but uses bits O and I instead of 0 and 1 so that it cannot be confused with the content of the tape.

O{'+}::={'+}O
I{'+}::={'+}I
END &[&{'+}::={'+}END &[&
;{'+}::={'+};
0{'+}::={'+}0
1{'+}::={'+}1
){'+}::={'+})
({'+}::={'+}(
 BEGIN{'+}::=+ BEGIN
O{'-}::={'-}O
I{'-}::={'-}I
END &[&{'-}::={'-}END &[&
;{'-}::={'-};
0{'-}::={'-}0
1{'-}::={'-}1
){'-}::={'-})
({'-}::={'-}(
 BEGIN{'-}::=- BEGIN
O{'<}::={'<}O
I{'<}::={'<}I
END &[&{'<}::={'<}END &[&
;{'<}::={'<};
0{'<}::={'<}0
1{'<}::={'<}1
){'<}::={'<})
({'<}::={'<}(
 BEGIN{'<}::=< BEGIN
O{'>}::={'>}O
I{'>}::={'>}I
END &[&{'>}::={'>}END &[&
;{'>}::={'>};
0{'>}::={'>}0
1{'>}::={'>}1
){'>}::={'>})
({'>}::={'>}(
 BEGIN{'>}::=> BEGIN
O{'.}::={'.}O
I{'.}::={'.}I
END &[&{'.}::={'.}END &[&
;{'.}::={'.};
0{'.}::={'.}0
1{'.}::={'.}1
){'.}::={'.})
({'.}::={'.}(
 BEGIN{'.}::=. BEGIN
O{',}::={',}O
I{',}::={',}I
END &[&{',}::={',}END &[&
;{',}::={',};
0{',}::={',}0
1{',}::={',}1
){',}::={',})
({',}::={',}(
 BEGIN{',}::=, BEGIN
0{'[}::={'[}0
1{'[}::={'[}1
;{'[}::={'[};
){'[}::={'[})
({'[}::={'[}(
O{'[}::={'[}O
I{'[}::={'[}I
END &[&{'[}::={'[}END &[&
 BEGIN{'[}::=[ BEGIN
0{']}::={']}0
1{']}::={']}1
;{']}::={']};
){']}::={']})
({']}::={']}(
O{']}::={']}O
I{']}::={']}I
END &[&{']}::={']}END &[&
 BEGIN{']}::=] BEGIN

This is nothing more than a definition of how neutralized instructions behave. Because Thue doesn't allow for any kind of variables or pattern-matching, it has to be done for every instruction. This is one of the reasons why comments are not supported in the brainfuck program; because deleting comments would require to test for every possible character than can be used for a comment. Another reason is that Thue doesn't have mechanisms to protect it from "code injection": if the brainfuck program contains comments, they might trigger some substitution rules and result in completely undefined behaviour.

Instruction ] (end loop)

This instruction is very similar to [.

;END $]::={=};END ]
{==}END ]::={']}END $
{!=}END ]::={]}END ]
;{]}::={]};
0{]}::={]}0
1{]}::={]}1
){]}::={]})
({]}::={]}(
 BEGIN{]}::=*&]& BEGIN
]o::=i{]'}
]i::=o]
]&]& BEGIN::=i&]& BEGIN{]'}
[i::={'?}o{['}
[o::=i[
i[&]&::=[&]&
{*}[&]& BEGIN::=[ BEGIN?
+*::=*{+'}
-*::=*{-'}
>*::=*{>'}
<*::=*{<'}
.*::=*{.'}
,*::=*{,'}
[*::={*}[
]*::=*]
i{'?}::={'?}i
o{'?}::={'?}o
{*}{'?}::=*
{+'}0::=0{+'}
{+'}1::=1{+'}
{+'};::=;{+'}
{+'}i::=i{+'}
{+'}o::=o{+'}
{+'}&]& BEGIN::=&]& BEGIN{+'}
{+'}(::=({+'}
{+'})::=){+'}
{+'}END ::=END +
{-'}0::=0{-'}
{-'}1::=1{-'}
{-'};::=;{-'}
{-'}i::=i{-'}
{-'}o::=o{-'}
{-'}&]& BEGIN::=&]& BEGIN{-'}
{-'}(::=({-'}
{-'})::=){-'}
{-'}END ::=END -
{>'}0::=0{>'}
{>'}1::=1{>'}
{>'};::=;{>'}
{>'}i::=i{>'}
{>'}o::=o{>'}
{>'}&]& BEGIN::=&]& BEGIN{>'}
{>'}(::=({>'}
{>'})::=){>'}
{>'}END ::=END >
{<'}0::=0{<'}
{<'}1::=1{<'}
{<'};::=;{<'}
{<'}i::=i{<'}
{<'}o::=o{<'}
{<'}&]& BEGIN::=&]& BEGIN{<'}
{<'}(::=({<'}
{<'})::=){<'}
{<'}END ::=END <
{.'}0::=0{.'}
{.'}1::=1{.'}
{.'};::=;{.'}
{.'}i::=i{.'}
{.'}o::=o{.'}
{.'}&]& BEGIN::=&]& BEGIN{.'}
{.'}(::=({.'}
{.'})::=){.'}
{.'}END ::=END .
{,'}0::=0{,'}
{,'}1::=1{,'}
{,'};::=;{,'}
{,'}i::=i{,'}
{,'}o::=o{,'}
{,'}&]& BEGIN::=&]& BEGIN{,'}
{,'}(::=({,'}
{,'})::=){,'}
{,'}END ::=END ,
{['}0::=0{['}
{['}1::=1{['}
{['};::=;{['}
{['}i::=i{['}
{['}o::=o{['}
{['}&]& BEGIN::=&]& BEGIN{['}
{['}(::=({['}
{['})::=){['}
{['}END ::=END [
{]'}0::=0{]'}
{]'}1::=1{]'}
{]'};::=;{]'}
{]'}i::=i{]'}
{]'}o::=o{]'}
{]'}&]& BEGIN::=&]& BEGIN{]'}
{]'}(::=({]'}
{]'})::=){]'}
{]'}END ::=END ]

This instruction is very similar to [. It first sends {=} to check whether the current cell is zero or not. This "instruction" has already been defined in the implementation of [ and needs not be redefined. Then, if the current cell is zero, the ] symbol is neutralized and execution proceeds to the next instruction. Otherwise, the ] travels all the way to the left to the previously executed instructions, and begins a loop restart operation, very similar to the loop skip operation triggered by [. The depth counter uses bits o and i so that it can't be confused with the tape content or the depth counter from a loop skip. The symbol % is replaced with its symmetric equivalent *. All instructions are send to the right (neutralized in { '}), until the matching [ is detected. Then the loop restart operation halts and send a ? marker to the right to execute the content of the loop (which has just been sent to the right).

Instruction . (output current cell)

$.::={.}
;END {.}::={.};END 
0{.}::={.}0
1{.}::={.}1
;{.}::={.};
){.}::=.)
(.::={.}(?
 BEGIN{.}::=. BEGIN

First, the symbol . move to the current cell. This is the general part - the next part will be to output it as an ascii character, which unfortunately has to be done for every possible value of the current cell (as opposite to instructions like + which are bitwise).

00000000.::=.00000000
00000001.::=.00000001
00000010.::=.00000010
00000011.::=.00000011
00000100.::=.00000100
00000101.::=.00000101
00000110.::=.00000110
00000111.::=.00000111
00001000.::=.00001000
00001001.::=.00001001
00001010.::=.00001010{PRINT_n}
00001011.::=.00001011
00001100.::=.00001100
00001101.::=.00001101
00001110.::=.00001110
00001111.::=.00001111
00010000.::=.00010000
00010001.::=.00010001
00010010.::=.00010010
00010011.::=.00010011
00010100.::=.00010100
00010101.::=.00010101
00010110.::=.00010110
00010111.::=.00010111
00011000.::=.00011000
00011001.::=.00011001
00011010.::=.00011010
00011011.::=.00011011
00011100.::=.00011100
00011101.::=.00011101
00011110.::=.00011110
00011111.::=.00011111
00100000.::=.00100000{PRINT }
00100001.::=.00100001{PRINT!}
00100010.::=.00100010{PRINT"}
00100011.::=.00100011{PRINT#}
00100100.::=.00100100{PRINT$}
00100101.::=.00100101{PRINT%}
00100110.::=.00100110{PRINT&}
00100111.::=.00100111{PRINT'}
00101000.::=.00101000{PRINT(}
00101001.::=.00101001{PRINT)}
00101010.::=.00101010{PRINT*}
00101011.::=.00101011{PRINT+}
00101100.::=.00101100{PRINT,}
00101101.::=.00101101{PRINT-}
00101110.::=.00101110{PRINT.}
00101111.::=.00101111{PRINT/}
00110000.::=.00110000{PRINT0}
00110001.::=.00110001{PRINT1}
00110010.::=.00110010{PRINT2}
00110011.::=.00110011{PRINT3}
00110100.::=.00110100{PRINT4}
00110101.::=.00110101{PRINT5}
00110110.::=.00110110{PRINT6}
00110111.::=.00110111{PRINT7}
00111000.::=.00111000{PRINT8}
00111001.::=.00111001{PRINT9}
00111010.::=.00111010{PRINT:}
00111011.::=.00111011{PRINT;}
00111100.::=.00111100{PRINT<}
00111101.::=.00111101{PRINT=}
00111110.::=.00111110{PRINT>}
00111111.::=.00111111{PRINT?}
01000000.::=.01000000{PRINT@}
01000001.::=.01000001{PRINTA}
01000010.::=.01000010{PRINTB}
01000011.::=.01000011{PRINTC}
01000100.::=.01000100{PRINTD}
01000101.::=.01000101{PRINTE}
01000110.::=.01000110{PRINTF}
01000111.::=.01000111{PRINTG}
01001000.::=.01001000{PRINTH}
01001001.::=.01001001{PRINTI}
01001010.::=.01001010{PRINTJ}
01001011.::=.01001011{PRINTK}
01001100.::=.01001100{PRINTL}
01001101.::=.01001101{PRINTM}
01001110.::=.01001110{PRINTN}
01001111.::=.01001111{PRINTO}
01010000.::=.01010000{PRINTP}
01010001.::=.01010001{PRINTQ}
01010010.::=.01010010{PRINTR}
01010011.::=.01010011{PRINTS}
01010100.::=.01010100{PRINTT}
01010101.::=.01010101{PRINTU}
01010110.::=.01010110{PRINTV}
01010111.::=.01010111{PRINTW}
01011000.::=.01011000{PRINTX}
01011001.::=.01011001{PRINTY}
01011010.::=.01011010{PRINTZ}
01011011.::=.01011011{PRINT[}
01011100.::=.01011100{PRINT\}
01011101.::=.01011101{PRINT]}
01011110.::=.01011110{PRINT^}
01011111.::=.01011111{PRINT_}
01100000.::=.01100000{PRINT`}
01100001.::=.01100001{PRINTa}
01100010.::=.01100010{PRINTb}
01100011.::=.01100011{PRINTc}
01100100.::=.01100100{PRINTd}
01100101.::=.01100101{PRINTe}
01100110.::=.01100110{PRINTf}
01100111.::=.01100111{PRINTg}
01101000.::=.01101000{PRINTh}
01101001.::=.01101001{PRINTi}
01101010.::=.01101010{PRINTj}
01101011.::=.01101011{PRINTk}
01101100.::=.01101100{PRINTl}
01101101.::=.01101101{PRINTm}
01101110.::=.01101110{PRINTn}
01101111.::=.01101111{PRINTo}
01110000.::=.01110000{PRINTp}
01110001.::=.01110001{PRINTq}
01110010.::=.01110010{PRINTr}
01110011.::=.01110011{PRINTs}
01110100.::=.01110100{PRINTt}
01110101.::=.01110101{PRINTu}
01110110.::=.01110110{PRINTv}
01110111.::=.01110111{PRINTw}
01111000.::=.01111000{PRINTx}
01111001.::=.01111001{PRINTy}
01111010.::=.01111010{PRINTz}
01111011.::=.01111011{PRINT{}
01111100.::=.01111100{PRINT|}
01111101.::=.01111101{PRINT}}
01111110.::=.01111110{PRINT~}
01111111.::=.01111111

If the current cell has a printable value (either 10 or between 32 and 126), then a marker {PRINTx} is added. This marker has then to trigger output (which in Thue has always the side effect of replacing the marker with the empty string). Note that the . operator operates from the right of the cell, which holds the less significant bit. This is far from optimal since characters in the range 128-255 can be detected easily by their most significant bit, which is always 1; unfortunately the author only realized that after making all the substitution rules for characters 0-127, and certainly did not feel like doing it again. The result is that characters 128-255 are not considered at all, which means if the current cell does hold a value in the range 128-255 when the instruction . is triggered, no substitution rules can be applied, and execution will immediately halts.

{PRINT_n]::=~\n
{PRINT }::=~ 
{PRINT!}::=~!
{PRINT"}::=~"
{PRINT#}::=~#
{PRINT$}::=~$
{PRINT%}::=~%
{PRINT&}::=~&
{PRINT'}::=~'
{PRINT(}::=~(
{PRINT)}::=~)
{PRINT*}::=~*
{PRINT+}::=~+
{PRINT,}::=~,
{PRINT-}::=~-
{PRINT.}::=~.
{PRINT/}::=~/
{PRINT0}::=~0
{PRINT1}::=~1
{PRINT2}::=~2
{PRINT3}::=~3
{PRINT4}::=~4
{PRINT5}::=~5
{PRINT6}::=~6
{PRINT7}::=~7
{PRINT8}::=~8
{PRINT9}::=~9
{PRINT:}::=~:
{PRINT;}::=~;
{PRINT<}::=~<
{PRINT=}::=~=
{PRINT>}::=~>
{PRINT?}::=~?
{PRINT@}::=~@
{PRINTA}::=~A
{PRINTB}::=~B
{PRINTC}::=~C
{PRINTD}::=~D
{PRINTE}::=~E
{PRINTF}::=~F
{PRINTG}::=~G
{PRINTH}::=~H
{PRINTI}::=~I
{PRINTJ}::=~J
{PRINTK}::=~K
{PRINTL}::=~L
{PRINTM}::=~M
{PRINTN}::=~N
{PRINTO}::=~O
{PRINTP}::=~P
{PRINTQ}::=~Q
{PRINTR}::=~R
{PRINTS}::=~S
{PRINTT}::=~T
{PRINTU}::=~U
{PRINTV}::=~V
{PRINTW}::=~W
{PRINTX}::=~X
{PRINTY}::=~Y
{PRINTZ}::=~Z
{PRINT[}::=~[
{PRINT\}::=~\
{PRINT]}::=~]
{PRINT^}::=~^
{PRINT_}::=~_
{PRINT`}::=~`
{PRINTa}::=~a
{PRINTb}::=~b
{PRINTc}::=~c
{PRINTd}::=~d
{PRINTe}::=~e
{PRINTf}::=~f
{PRINTg}::=~g
{PRINTh}::=~h
{PRINTi}::=~i
{PRINTj}::=~j
{PRINTk}::=~k
{PRINTl}::=~l
{PRINTm}::=~m
{PRINTn}::=~n
{PRINTo}::=~o
{PRINTp}::=~p
{PRINTq}::=~q
{PRINTr}::=~r
{PRINTs}::=~s
{PRINTt}::=~t
{PRINTu}::=~u
{PRINTv}::=~v
{PRINTw}::=~w
{PRINTx}::=~x
{PRINTy}::=~y
{PRINTz}::=~z
{PRINT{}::=~{
{PRINT|}::=~|
{PRINT}}::=~}
{PRINT~}::=~~

The {PRINTx} marker is removed and the appropriate character is printed. It is assumed that \n will be recognized by the Thue implementation as a linefeed, and that a \ alone represents itself, not an escape sequence. As far as the author is concerned, there are no standards regarding escape sequences in Thue. Feel free to modify this so that it matches your implementation's specs. In fact, if your implementation supports usual escape sequences, it might be a good idea to add substitution rules for printing characters in the range 0-9 11-31 and 126.

Instruction , (unfortunately a no-op)

END $,::={',}END $

If the next instruction to be executed is , it is instead neutralized and the next instruction is executed. Implementing it would require a very dull and boring explicit conversion similar to that of . and as User:Oerjan says, input is a lousy hack in Thue; it will input a whole line at a time rather than just one char. This basically means interactive input/output will not be possible, and the author decided that in those conditions he would not bother writing the dull char-to-binary-value conversion.

See also

External resources