Unarian

From Esolang
Jump to navigation Jump to search
Unarian
Paradigm(s) Functional
Designed by User:Crb233
Appeared in 2021
Computational class Turing complete
Major implementations Original
File extension(s) .unarian, .un

Unarian (pronounced yoo-NAIR-eein) is an esoteric programming language based on the concept that every Unarian function computes a partial unary function over the natural numbers (hence the name Unarian) and that these functions can only be constructed as combinations of existing functions.

The beauty of this language is in its simplicity. There are only two built-in functions: increment and decrement; only two ways to combine existing functions into new ones: composition and alternation; and effectively only one integer that can be accessed by running programs. Despite this simplicity, Unarian is Turing-complete and capable of representing arbitrary computable functions.

See also the Github repository for this language.

Language Specification

Syntax

Line comments start with # and are stripped from the source code before parsing. The remainder of the code is split into tokens: strings of arbitrary non-whitespace characters separated from each other by whitespace. Three tokens are considered special keywords: { (open brace), } (close brace), and | (alternation). A few additional tokens represent built-ins: + (increment), and - (decrement). Some implementations may also include ? (input), ! (output), and @ (stack trace) as additional builtins. All other tokens are considered valid function identifiers.

A Unarian expression is a sequence of alternations |, built-ins, identifiers, and bracketed groups, where a bracketed group consists of an opening brace {, an expression, and a closing brace }. For example, - | + func { - - | } | is an expression and { - { + func | } + } is a bracketed group. A Unarian program consists of a sequence of function declarations, where a function declaration is an identifier (the function name) followed by a bracketed group (containing the function definition). For example, f { - - | + } declares a function named f defined by the expression - - | +, and the following program defines three functions 0, if=0, and main:

0 { - 0 | }
if=0 { { - 0 | + } - }
main { if=0 + | 0 }

Built-ins

There are two primary built-ins: increment + and decrement -. As their names suggest, increment adds one to its input and decrement subtracts one from its input. However, decrement can fail if applied to input 0 (since doing so would not produce a natural number).

Some implementations may add additional built-ins such as: input ?, output !, and stack trace @. At the moment, these are non-standard parts of the language and largely used for debugging purposes.

Functions

Functions are identified by their name and defined (possibly recursively) by an expression consisting of built-ins, functions, compositions, and alternations. To evaluate a function on input x, simply evaluate its definitional expression on input x. For example, if function mod2 is defined by the expression - - mod2 |, then evaluating mod2 on x is semantically equivalent to evaluating - - mod2 | on x.

Composition

Composition is one method of combining existing functions to create new ones. It is an associative binary operator over Unarian functions that is comparable to sequential execution (e.g. a; b) in imperative languages. Syntactically, the composition of functions f and g is written as f g.

Evaluating a composition on input x consists of evaluating each function from left to right on the output of the previous function. The result of the composition is the result of the last function to be evaluated. For example, if ^2 is a function that squares its input, then ^2 + maps x to x^2 + 1 and + ^2 maps x to (x + 1)^2. Observe that this is similar to standard function composition in mathematics, except with the order of evaluation reversed. Significantly, if any function in a composition fails, then the composite function as a whole also fails. For example, - - - fails on input 0, 1, and 2, and returns n - 3 on input n > 2.

Finally, an empty composition is treated as the identity function, which turns out to be the identity element of function composition. Syntactically, an empty composition can be written as an empty group { } or an empty expression .

Alternation

Alternation (formerly called branching) is the second method of combining existing functions. It is an associative binary operator over Unarian functions that is comparable to conditional control flow (e.g. if c then a else b) in imperative languages. Syntactically, the alternation of functions f and g is written as f | g. This operator has a lower precedence than composition, so f g | h is interpreted as the alternation of f g and h (written { f g } | h), and f | g h is interpreted as the alternation of f and g h (written f | { g h }).

Evaluating an alternation on input x consists of evaluating each function from left to right on input x if and only if all previous functions failed. The result of the alternation is the result of the last function to be evaluated. For example, if %2 is a function that fails on odd inputs and leaves all others unchanged, then %2 + | - maps 2x to 2x + 1 and 2x + 1 to 2x (i.e. it toggles the last bit in a binary number). Syntactically, an empty 'branch' of an alternation is considered to be an empty composition. For example, - | is semantically equivalent to both - | { } and - | id, where id is an identity function.

Finally, since there is no way to represent them syntactically, we don't define the behavior of empty alternations (although it seems logical to define an empty alternation as a function that fails on all input, since this is the identity element of function alternation).

Grouping

Bracketed groups within an expression, which are surrounded by braces and can be nested, allow for the formation of expressions that don't follow normal precedence rules. While a b | c is interpreted as the alternation of a b and c, the expression a { b | c } is interpreted as the composition of a and b | c.

Evaluating an exression containing a bracketed group can be done by treating the group as a reference to a new function defined by the contents of the group. Specifically, we can evaluate a { b | c } by defining a new function b|c { b | c } and then evaluating a b|c. In general, for any expression containing a bracketed group { ... }, define a new function z { ... } and replace all instances of { ... } (aside from the definition of z itself) by z. For example, if 0 is a function that maps all x to 0, then the expression - 0 | + - also maps all x to 0. However, by adding braces, we can change this to { - 0 | + } -, which maps 0 to 0 and fails on all other x (i.e. it checks for equality with 0).

Evaluation

When interpreting or compiling a Unarian program, an expression must be chosen as the entry-point. This entry-point defaults to main. Some implementations may allow the user to specify a custom expression as the entry-point, but this is not required. It is considered undefined behavior to have references to undefined functions or multiple definitions of the same function. However, it is recommended for implementations to treat both of these cases as compilation errors.

Finally, a compiled or interpreted program is evaluated by giving it a non-negative integer input. This input is evaluated on the entry-point expression as explained above, and the resulting output, either a non-negative integer or a failure, is returned. Input and output representations are left undefined, but it is recommended for integers to be represented in decimal and for failure to be represented by -. Bounds on integer inputs and outputs, as well as the behavior when these bounds are exceeded, are also left undefined, but it is recommended that implementations support integers up to at least 2^63 - 1 and produce a runtime error when exceeding their maximum value.

Learn by Example

# This is a comment.

# This is a basic function definition.
function_name { function_definition }

# Extra spacing doesn't matter.
example_func {
    extra
    spacing
    doesn't
    matter
}

# Function names can contain any characters except whitespace and '#'.
# Tokens '{', '|', and '}' are special keywords and cannot be function names.
# Tokens '+', '-', '?', '!', and '@' are built-in and cannot be redefined.
*10 { multiply_by_10 }
/10 { divide_by_10 }
^2 { square }

# Functions can call themselves recursively.
infinite_loop { infinite_loop solve_p_vs_np }

# There are two primary builtin functions: '+' and '-'.
# Applying '+' to input x returns x + 1
# Applying '-' to input x returns x - 1 if x > 0 and fails if x = 0
add_1 { + }
add_2 { + + }
add_3 { + + + }
subtract_2_or_fail { - - }

# There are three builtin functions used for debugging: '?', '!', and '@'.
# Applying '?' reads a value from standard input and returns it.
# Applying '!' to input x prints x to standard input and returns x.
# Applying '@' to input x prints the stack trace and returns x.
print_then_add_1 { ! + }
print_stack_trace { @ }

# Functions can have branching execution paths. The special token '|', called
# alternation, is used to separate alternate paths.
do_A_or_B_or_C { A | B | C }

# Empty alternates act as the identity function.
subtract_2_or_do_nothing { - - | }

# A bracketed group starts with '{' and ends with '}'. Such groups are
# evaluated as if their contents had been defined in a separate function.
complex { a { b | c } d | { e | { f } } g }

code_1 { b | c }
code_2 { f }
code_3 { e | code_2 }
less_complex { a code_1 d | code_3 g }   # This is equivalent to 'complex'.

# The main function is the default entry-point for a program. It's evaluated
# when we run this program.
main { get_nth_prime }

Examples

Here is some example code that, given an input n, returns the number of steps for the Collatz sequence starting at n to reach 1:

# Outputs 0.
0 { - 0 | }

# Fails unless input is equal to 0.
if=0 { { - 0 | + } - }

# Fails unless input is greater than 1.
if>1 { - - + + }

# Divides by 2 if divisible by 2. Fails otherwise.
if/2 { - - if/2 + | if=0 }

# Multiplies by 3.
*3 { - *3 + + + | }

# Outputs the number of collatz steps required to reach 1.
collatz { if>1 { if/2 | *3 + } collatz + | - }

main { collatz }

While the Collatz code above is relatively concise, it's much more difficult to perform computations that require more than one number to be stored in memory. Consider the following code which computes Fibonacci numbers:

#========#
# Basics #
#========#

0 { - 0 | }

if=0 { { - 0 | + } - }

if>0 { - + }

#==========================#
# Addition and Subtraction #
#==========================#

+2 { + + }
+8 { + + + + + + + + }
-2 { - - }
-8 { - - - - - - - - }

+000 { }
+001 { + }
+010 { + + }
+011 { + + + }
+100 { + + + + }
+101 { + + + + + }
+110 { + + + + + + }
+111 { + + + + + + + }

-000 { }
-001 { - }
-010 { - - }
-011 { - - - }
-100 { - - - - }
-101 { - - - - - }
-110 { - - - - - - }
-111 { - - - - - - - }

#=============================#
# Multiplication and Division #
#=============================#

*2 { - *2 +2 | }
*8 { - *8 +8 | }

/2 { -2 /2 + | 0 }
/8 { -8 /8 + | 0 }

if/2 { -2 if/2 + | if=0 }

#=====================#
# Modulo Conditionals #
#=====================#

if=000mod8 { -8 if=000mod8 +8 | if=0 }
if=001mod8 { -001 if=000mod8 +001 }
if=010mod8 { -010 if=000mod8 +010 }
if=011mod8 { -011 if=000mod8 +011 }
if=100mod8 { -100 if=000mod8 +100 }
if=101mod8 { -101 if=000mod8 +101 }
if=110mod8 { -110 if=000mod8 +110 }
if=111mod8 { -111 if=000mod8 +111 }

#===============================#
# Getting and Setting Registers #
#===============================#

setR0
    { if=0
    | if/2 setR0 *8
    |   /2 setR0 *8 +
}

getR1
    { if=0
    | if=000mod8 /8 getR1 *2
    | if=001mod8 /8 getR1 *2
    | if=010mod8 /8 getR1 *2 +
    | if=011mod8 /8 getR1 *2 +
    | if=100mod8 /8 getR1 *2
    | if=101mod8 /8 getR1 *2
    | if=110mod8 /8 getR1 *2 +
    | if=111mod8 /8 getR1 *2 +
}

getR2
    { if=0
    | if=000mod8 /8 getR2 *2
    | if=001mod8 /8 getR2 *2
    | if=010mod8 /8 getR2 *2
    | if=011mod8 /8 getR2 *2
    | if=100mod8 /8 getR2 *2 +
    | if=101mod8 /8 getR2 *2 +
    | if=110mod8 /8 getR2 *2 +
    | if=111mod8 /8 getR2 *2 +
}

#=======================#
# Register Conditionals #
#=======================#

ifR0=0
    { if=0
    | if=000mod8 /8 ifR0=0 *8 +000
    | if=010mod8 /8 ifR0=0 *8 +010
    | if=100mod8 /8 ifR0=0 *8 +100
    | if=110mod8 /8 ifR0=0 *8 +110
}

ifR0=1
    { if=001mod8 /8 ifR0=0 *8 +001
    | if=011mod8 /8 ifR0=0 *8 +011
    | if=101mod8 /8 ifR0=0 *8 +101
    | if=111mod8 /8 ifR0=0 *8 +111
}

#=====================#
# Register Operations #
#=====================#

R2=0
    { if=0
    | if=000mod8 /8 R2=0 *8 +000
    | if=001mod8 /8 R2=0 *8 +001
    | if=010mod8 /8 R2=0 *8 +010
    | if=011mod8 /8 R2=0 *8 +011
    | if=100mod8 /8 R2=0 *8 +000
    | if=101mod8 /8 R2=0 *8 +001
    | if=110mod8 /8 R2=0 *8 +010
    | if=111mod8 /8 R2=0 *8 +011
}

R2=1
    { if=0
    | if=000mod8 /8 R2=0 *8 +100
    | if=001mod8 /8 R2=0 *8 +101
    | if=010mod8 /8 R2=0 *8 +110
    | if=011mod8 /8 R2=0 *8 +111
    | if=100mod8 /8 R2=0 *8 +100
    | if=101mod8 /8 R2=0 *8 +101
    | if=110mod8 /8 R2=0 *8 +110
    | if=111mod8 /8 R2=0 *8 +111
}

R0=R0-1
    { if>0 if=000mod8 /8 R0=R0-1 *8 +001
    |      if=001mod8 -
    |      if=010mod8 /8 R0=R0-1 *8 +011
    |      if=011mod8 -
    |      if=100mod8 /8 R0=R0-1 *8 +101
    |      if=101mod8 -
    |      if=110mod8 /8 R0=R0-1 *8 +111
    |      if=111mod8 -
}

R0=R0-2
    { if=000mod8 /8 R0=R0-1 *8 +000
    | if=001mod8 /8 R0=R0-1 *8 +001
    | if=010mod8 /8 R0=R0-1 *8 +010
    | if=011mod8 /8 R0=R0-1 *8 +011
    | if=100mod8 /8 R0=R0-1 *8 +100
    | if=101mod8 /8 R0=R0-1 *8 +101
    | if=110mod8 /8 R0=R0-1 *8 +110
    | if=111mod8 /8 R0=R0-1 *8 +111
}

R1=R1+R2
    { if=0                        +000
    | if=000mod8 /8 R1=R1+R2   *8 +000
    | if=001mod8 /8 R1=R1+R2   *8 +001
    | if=010mod8 /8 R1=R1+R2   *8 +010
    | if=011mod8 /8 R1=R1+R2   *8 +011
    | if=100mod8 /8 R1=R1+R2   *8 +110
    | if=101mod8 /8 R1=R1+R2   *8 +111
    | if=110mod8 /8 R1=R1+R2+1 *8 +100
    | if=111mod8 /8 R1=R1+R2+1 *8 +101
}

R1=R1+R2+1
    { if=0                        +010
    | if=000mod8 /8 R1=R1+R2   *8 +010
    | if=001mod8 /8 R1=R1+R2   *8 +011
    | if=010mod8 /8 R1=R1+R2+1 *8 +000
    | if=011mod8 /8 R1=R1+R2+1 *8 +001
    | if=100mod8 /8 R1=R1+R2+1 *8 +100
    | if=101mod8 /8 R1=R1+R2+1 *8 +101
    | if=110mod8 /8 R1=R1+R2+1 *8 +110
    | if=111mod8 /8 R1=R1+R2+1 *8 +111
}

R2=R1+R2
    { if=0                        +000
    | if=000mod8 /8 R2=R1+R2   *8 +000
    | if=001mod8 /8 R2=R1+R2   *8 +001
    | if=010mod8 /8 R2=R1+R2   *8 +110
    | if=011mod8 /8 R2=R1+R2   *8 +111
    | if=100mod8 /8 R2=R1+R2   *8 +100
    | if=101mod8 /8 R2=R1+R2   *8 +101
    | if=110mod8 /8 R2=R1+R2+1 *8 +010
    | if=111mod8 /8 R2=R1+R2+1 *8 +011
}

R2=R1+R2+1
    { if=0                        +100
    | if=000mod8 /8 R2=R1+R2   *8 +100
    | if=001mod8 /8 R2=R1+R2   *8 +101
    | if=010mod8 /8 R2=R1+R2+1 *8 +010
    | if=011mod8 /8 R2=R1+R2+1 *8 +011
    | if=100mod8 /8 R2=R1+R2+1 *8 +000
    | if=101mod8 /8 R2=R1+R2+1 *8 +001
    | if=110mod8 /8 R2=R1+R2+1 *8 +110
    | if=111mod8 /8 R2=R1+R2+1 *8 +111
}

#===========#
# Fibonacci #
#===========#

# register |  R0  |  R1   |   R2    |
# value    | n-2i | f(2i) | f(2i+1) |

fib'
    { ifR0=0 getR1
    | ifR0=1 getR2
    | R0=R0-2 R1=R1+R2 R2=R1+R2 fib'
}

fib { setR0 R2=1 fib' }

main { fib }

Implementation

See [1] for additional implementations. Here is a simple Python interpreter that takes a Unarian file and evaluates the main function on input given as additional arguments:

#!/usr/bin/env python3

def tokenize(file):
    for line in file:
        i = line.find('#')
        if i >= 0:
            line = line[:i]
        for tok in line.split():
            yield tok

def parse_group(toks, i):
    i += 1
    alternation = []
    composition = []
    while i < len(toks):
        if toks[i] == '}':
            alternation.append(composition)
            return alternation, i
        elif toks[i] == '|':
            alternation.append(composition)
            composition = []
        elif toks[i] == '{':
            group, i = parse_group(toks, i)
            composition.append(group)
        else:
            composition.append(toks[i])
        i += 1

def parse(file):
    toks = list(tokenize(file))
    lib = {}
    i = 0
    while i < len(toks):
        name = toks[i]
        i += 1
        grp, i = parse_group(toks, i)
        lib[name] = grp
        i += 1
    return lib

def evaluate(lib, expr, x):
    stack = [(x, expr, 0, 0)]
    while len(stack) > 0:
        y, expr, i, j = stack.pop(-1)
        if i >= len(expr):
            x = None
        elif x is None:
            x = y
            stack.append((y, expr, i + 1, 0))
        elif j < len(expr[i]):
            stack.append((y, expr, i, j + 1))
            subexpr = expr[i][j]
            if subexpr == '+':
                x += 1
            elif subexpr == '-':
                x = x - 1 if x > 0 else None
            elif isinstance(subexpr, list):
                stack.append((x, subexpr, 0, 0))
            else:
                stack.append((x, lib[subexpr], 0, 0))
    return x

def main():
    import sys
    with open(sys.argv[1]) as file:
        lib = parse(file)
    expr = 'main'
    for x in map(int, sys.argv[2:]):
        print(x, '->', evaluate(lib, expr, x), end='   ')
    print()

if __name__ == '__main__':
    main()

Computational class

It's relatively easy to translate arbitrary Fractran programs into Unarian programs. Since Fractran is known to be Turing complete, Unarian also is. Consider, for example, the following translation of the Fractran prime generating algorithm (3/11 847/45 143/6 7/3 10/91 3/7 36/325 1/2 36/5):

#========#
# Basics #
#========#

# Return 0
0 { - 0 | }

# Conditionals
if=0 {   { - 0 | + } -   }
if=1 { - { - 0 | + } - + }
if>0 {   - +   }
if>1 { - - + + }

#==========#
# Addition #
#==========#

+2  { + + }
+3  { + + + }
+5  { + + + + + }
+7  { + + + + + + + }
+11 { + + + + + + + + + + + }
+13 { + + + + + + + + + + + + + }

#=============#
# Subtraction #
#=============#

-2  { - - }
-3  { - - - }
-5  { - - - - - }
-7  { - - - - - - - }
-11 { - - - - - - - - - - - }
-13 { - - - - - - - - - - - - - }

#================#
# Multiplication #
#================#

*2  { - *2  +2  | }
*3  { - *3  +3  | }
*5  { - *5  +5  | }
*7  { - *7  +7  | }
*11 { - *11 +11 | }
*13 { - *13 +13 | }

#================#
# Exact division #
#================#

if/2  { -2  if/2  + | if=0 }
if/3  { -3  if/3  + | if=0 }
if/5  { -5  if/5  + | if=0 }
if/7  { -7  if/7  + | if=0 }
if/11 { -11 if/11 + | if=0 }
if/13 { -13 if/13 + | if=0 }

#===================#
# Base 10 Functions #
#===================#

+10 { + + + + + + + + + + }
-10 { - - - - - - - - - - }

*10 { - *10 +10 | }
if/10 { -10 if/10 + | if=0 }

exp10 { - exp10 *10 | + }
iflog10 { if>1 if/10 iflog10 + | if=1 - }

#=================#
# Fractran Primes #
#=================#

# Runs the following Fractran algorithm for generating primes (taken from the
# Fractran page on Esolang https://esolangs.org/wiki/Fractran):
#     3/11 847/45 143/6 7/3 10/91 3/7 36/325 1/2 36/5
# If running in debug mode, it will print out every prime it encounters.

print_power_of_10 { iflog10 ! 0 - | }

fractran_primes' {
    {
          if/11           *3
        | if/3 if/3 if/5  *7 *11 *11
        | if/2 if/3       *11 *13
        | if/3            *7
        | if/7 if/13      *2 *5
        | if/7            *3
        | if/5 if/5 if/13 *2 *2 *3 *3
        | if/2
        | if/5            *2 *2 *3 *3
    } print_power_of_10 fractran_primes'
}

fractran_primes { 0 +10 fractran_primes' }

main { fractran_primes }