User:Sgeo/binbf

The whole format is designed to produce something that compresses well with adaptive Huffman coding (FGK). The fixed binary codes below are only used when the NYT (not yet transmitted) escape is required, i.e. the first time a symbol appears.

> = 0000
< = 0001
+ = 0010
- = 0011
. = 0100
, = 0101
[ = 0110
] = 0111

$RLE = $a = 1000
$( = 1001
$# = $c = 1010
$) = 1011
$EOF = $e = 1100
$f = 1101
$g = 1110
$h = 1111
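The two tables above amount to a 16-entry lookup from symbol to 4-bit code. A minimal sketch of that lookup follows. The names `FIXED_CODES`, `fixed_code` and `decode_nibbles` are made up for illustration. How the digit carried by `$#`/`$c` is written in binary is not stated here, so the sketch only covers the bare 16 codes.

```python
# Fixed 4-bit codes, copied from the tables above.
# Only names that appear in the tables are used as keys.
FIXED_CODES = {
    ">": "0000", "<": "0001", "+": "0010", "-": "0011",
    ".": "0100", ",": "0101", "[": "0110", "]": "0111",
    "$RLE": "1000", "$(": "1001", "$#": "1010", "$)": "1011",
    "$EOF": "1100", "$f": "1101", "$g": "1110", "$h": "1111",
}

# Reverse table: 4-bit string -> symbol name.
NIBBLE_TO_SYMBOL = {bits: name for name, bits in FIXED_CODES.items()}


def fixed_code(symbol):
    """Return the 4-bit code for one bare symbol (raises KeyError if unknown)."""
    return FIXED_CODES[symbol]


def decode_nibbles(bits):
    """Split a bit string into 4-bit groups and look each one up."""
    if len(bits) % 4 != 0:
        raise ValueError("bit string length must be a multiple of 4")
    return [NIBBLE_TO_SYMBOL[bits[i:i + 4]] for i in range(0, len(bits), 4)]
```

For example, `decode_nibbles("00101000")` gives `['+', '$RLE']`.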

A character of the form $g$ is considered to be one symbol.
A character of the form $c<#> is considered to be one symbol.
The form $a$c5 is considered to be two symbols: $a and $c5.
$#/$c is considered to start a digit.
5 by itself is meaningless; it needs to be written $c5.
Multi-digit numbers are written like this: $c1$c2$c3.
Digits are expressed in hex, so $cf is 15.

The characters ><+-.,[] are all symbols. $ starts a symbol. Except for $c and $g, all symbols end with the letter, e.g. $a is one symbol. $c lasts one digit, e.g. $c5 is one symbol. $g's behavior is currently reserved for future use.
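The symbol rules above can be sketched as a small tokenizer for the short-letter form. This is only an illustration: the function name `tokenize_hba` is made up, and so are its choices to skip whitespace, accept upper-case hex digits, and reject `$g` because its behavior is reserved.

```python
import string

BF_COMMANDS = set("><+-.,[]")


def tokenize_hba(text):
    """Split short-form text into symbols, e.g. '+$a$c5' -> ['+', '$a', '$c5']."""
    symbols = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch in BF_COMMANDS:
            # Each Brainfuck command is a symbol on its own.
            symbols.append(ch)
            i += 1
        elif ch == "$":
            if i + 1 >= len(text):
                raise ValueError("dangling '$' at end of input")
            letter = text[i + 1]
            if letter == "c":
                # $c lasts exactly one hex digit, so $c5 is one symbol.
                if i + 2 >= len(text) or text[i + 2] not in string.hexdigits:
                    raise ValueError("$c must be followed by one hex digit")
                symbols.append(text[i:i + 3])
                i += 3
            elif letter == "g":
                # Assumption: $g is reserved, so this sketch refuses it.
                raise ValueError("$g is reserved for future use")
            elif letter.isalpha():
                # Every other $-symbol ends with its letter.
                symbols.append(text[i:i + 2])
                i += 2
            else:
                raise ValueError("unexpected character after '$': %r" % letter)
        elif ch.isspace():
            # Assumption: whitespace is only there for readability.
            i += 1
        else:
            # A bare digit such as 5 is meaningless by itself.
            raise ValueError("meaningless character: %r" % ch)
    return symbols
```

With this sketch, `tokenize_hba("+ $a $c1 $c0 $e")` yields `['+', '$a', '$c1', '$c0', '$e']`, and a bare `5` is rejected.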

The file ends with $e = 1100.

$h's don't get nested.
$RLE uses n-4, e.g.
++++ turns into +$RLE$#0
++++++++++ (10 +'s) turns into +$RLE$#6
++++++++++++++++++++ (20 +'s) turns into +$RLE$#1$#0 (+ $RLE $#1 $#0)
With the end-of-file marker, the whole file is + $RLE $#1 $#0 $EOF, or + $a $c1 $c0 $e in the short form.
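The n-4 rule can be sketched as follows. This is a minimal illustration, not a reference encoder. It assumes the input is already stripped of comments, that runs of any command (not only +) are encoded, and that runs shorter than 4 are left alone. The name `rle_encode` is made up.

```python
from itertools import groupby


def rle_encode(bf):
    """Run-length encode comment-free Brainfuck using $RLE and $# digits."""
    out = []
    for ch, group in groupby(bf):
        n = len(list(group))
        if n < 4:
            # Assumption: short runs are written out unchanged.
            out.append(ch * n)
        else:
            # The count is stored as n-4 in hex, one $# per digit.
            digits = format(n - 4, "x")
            out.append(ch + "$RLE" + "".join("$#" + d for d in digits))
    return "".join(out)
```

This reproduces the examples above: 4 +'s give `+$RLE$#0`, 10 give `+$RLE$#6`, and 20 give `+$RLE$#1$#0`, since 20 - 4 = 16 is 10 in hex.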

Levels
As this is meant for compressing Brainfuck source files, it is not intended to be written in directly, although in some cases it can be convenient. First, the .B file is stripped of comments and converted to a .HBR file, which uses the easy-to-remember symbols ($RLE, $#, $EOF). The .HBR file is then converted to a .HBA file, which uses the short forms, e.g. $a, $c, and $e. This .HBA file is then run through an adaptive Huffman encoder using the FGK algorithm, which operates on symbols (defined above) as opposed to letters.
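Two of the levels described above can be sketched as plain string processing: stripping comments from a .B file and renaming .HBR mnemonics to their .HBA letters. The function names are made up, and "comment" is assumed to mean any character that is not one of the eight Brainfuck commands. The run-length step and the FGK encoder are left out.

```python
BF_COMMANDS = "><+-.,[]"

# .HBR mnemonic -> .HBA short form, for the three pairs named in the text.
HBR_TO_HBA = [("$RLE", "$a"), ("$#", "$c"), ("$EOF", "$e")]


def strip_comments(b_source):
    """Keep only the eight Brainfuck commands (assumed meaning of 'comment')."""
    return "".join(ch for ch in b_source if ch in BF_COMMANDS)


def hbr_to_hba(hbr):
    """Replace the easy-to-remember mnemonics with their short letter forms."""
    for mnemonic, short in HBR_TO_HBA:
        hbr = hbr.replace(mnemonic, short)
    return hbr
```

For example, `hbr_to_hba("+$RLE$#1$#0$EOF")` returns `+$a$c1$c0$e`, which is the symbol stream the FGK stage would then consume.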