Program Number System
Program Number System or PNS is a system designed by QTpye (talk) to take every usable program in a language and turn it into a unique number, that can then be converted back. This is done by using a dictionary to replace tokens in said language with numbers assigned in a bijective base N number system where N is the total number of tokens. This should be applicable to any language where the structure is defined by a limited number of text-based tokens. Notable exception include Piet (as its image-based), Folders (as it's based on folder directories), and Length (as it's based on line length, not token content) and pretty much any other Non-textual language.
Example
since Brainfuck is really simple we will use it.
brainfuck has 8 tokens >
<
+
-
.
,
[
]
we assign each of these tokens a number in a base 8 number system starting from one
> 1 < 2 + 3 - 4 . 5 , 6 [ 7 ] 8
so take the hello world program for brainfuck
+[-->-[>>+>-----<<]<--<---]>-.>>>+.>>..+++[.>]<<<<.+++.------.<<-.>>>>+.
and translate each char to its equivalent number giving us
374414711314444422824424448145111351155333751822225333544444452245111135
we then take this number that's in bijective base 8 and convert it into another number system we want, for instance, (non-bijective) base 10
36896804202016496244466664004224860862682828222202820606840868406
this number uniquely identifies this and only this program.
Syntax
I will lay out a simple syntax for defining a system like this:
For a language, you can define a dictionary of tokens as a simple list of space-separated tokens
For brainfuck it would be
> < + - . , [ ]
You don't really need to use this syntax, you can use any as long as its a list
the number system lang resulting from using PNS can be written as
PNS( LANG , DICTIONARY )
so in this case you would write
PNS( brainfuck, > < + - . , [ ] )
the resulting lang is one where the program syntax is an array of numbers but is identical functionally to the original
Use
This can actually be surprisingly useful for compression.
For instance, if you take the brainfuck program above and change it to bytes (base 256) we can take the program and compress it down to 27 bytes of info:
59 B0 F0 08 36 DB 28 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
That's 207 bytes to 27!
This can work even for large languages because you can compress multi-character tokens into a single number.
For a quick example for Java
while(True){ System.out.println("1"); }
you could use tokens such as while(
which would compress compress while(
from 6 characters to a single value.
This means that even languages with things like strings that would need a token for each possible char in them can often still be compressed, because of multi character tokens getting shortened.
Notes
Generally, you should not include comments in the replacements and they should be removed before conversion because comments allow arbitrary bytes, and you would have to include all of those in the dictionary.
another interesting thing:
for any program in the family Trivial brainfuck substitution
given
TrivialBrainfuckSubstitution(a,b,c,d,e,f,g,h)
the lang
PNS( TrivialBrainfuckSubstitution(a,b,c,d,e,f,g,h), a b c d e f g h )
has the same number for programs for any set of a, b, c, d, e, f, g, h.
IE program #36896804202016496244466664004224860862682828222202820606840868406
will print hello world no matter what language from the family you use.
More Mathematical Definition
given LANG A meets specific requirements. (to be worked out and fine-tuned later)
we can generate a DICTIONARY A.
when combined we can say
PSN( LANG A, DICTIONARY A ) defines LANG B
LANG B is directly correlated to LANG A such that it can run every program possible in LANG A, so it is in the same computational class as LANG A
LANG B's syntax is an array of numbers
since LANG B's syntax is an array of numbers ranging from 1 to N (where N is the length of DICTIONARY A) any program in LANG B can thus be treated as a number in a bijective base N number system.
as such it can be converted from a base N bijective number system to a number system with a different base (IE. 10, 256(bytes), etc)
thus any valid program in LANG A can be represented as a unique number in base 10
An Actual Dictionary
Here is a fully working (at the time of writing) dictionary for (NDBall)
( , ) { } | > < 0 1 2 3 4 5 6 7 8 9 + - p P $ % E # K a f q n H s L R Y[ S[ St[ ESt PSt[ ] \n
Whose truth machine code looks like this
(0)>0 (1)% (2)Y[1,>0,>1] (2,1)P (2,2)<1 (3)P (4)E
Transforms into this in the lang PNS( NDBall , ( , ) { } | > < 0 1 2 3 4 5 6 7 8 9 + - p P $ % E # K a f q n H s L R Y[ S[ St[ ESt PSt[ ] \n )
1 9 3 7 9 42 1 10 3 24 42 1 11 3 36 10 2 7 9 2 7 10 41 42 1 11 2 10 3 22 42 1 11 2 11 3 8 1 42 1 12 3 22 42 1 13 3 25
Leading to the bijective base 42 number
1 9 3 7 9 42 1 10 3 24 42 1 11 3 36 10 2 7 9 2 7 10 41 42 1 11 2 10 3 22 42 1 11 2 11 3 8 1 42 1 12 3 22 42 1 13 3 25
This when converted into base 10 (assuming my math is not off) is program number
23860539308579467379463581539340138670700418635087640886070086229953756524467
or in bytes
34 C0 96 2F 9E FB 22 D2 8C 09 E0 B4 67 19 C7 67 D9 C2 8C 8A 4B 74 95 03 A9 D4 25 85 FD BF EF B3
for a total of 32 bytes from the original 42 byte program
Interpreter
At the time of writing, this language does not have an interpreter. However, one will be in the works once I have figure out how to do bijective system number conversion for arbitrarily large numbers