YEOOIIOOIOA

YEOOIIOOIOA is an esolang based on μ-recursive functions and binary strings, invented in April 7, 2015 and revamped in April 27, 2018 by User:Arseniiv.

The name YEOOIIOOIOA itself means the string "2" (intended meaning) or the integer 0x132, see below.

Overview
The program describes a function from several binary strings to several binary strings. These strings may also be thought of as positive numbers in binary whose leading ‘1’ is dropped (1 → "" (empty), 10 → "0", 11 → "1", 10010 → "0010" etc.).

Functions are expressed as series of operators applied to a bunch of standard functions, in the fashion of primitive recursion framework.

Examples
This prints either the string  or 0x2A:

YEIOIOIOA

This is Cat program:

[H1H1]

(Meow! Alas, it’s a Shrödinger’s cat, as there’s no specification where to grab program arguments from and where to send results.)

This is Hello, world!:

H148656c6c6f2c20776f726c6421

This concatenates its two arguments, following a definition concat(x, "") = x, concat(x, y + c) = concat(x, y) + c:

Id [H1 H1]. Add-"0"-to-3rd Y[H3 H3]OA. Add-"1"-to-3rd Y[H3 H3]IA. U Id Add-"0"-to-3rd Add-"1"-to-3rd A

or, more compact and write-only:

U[H1H1]Y[H3H3]OAY[H3H3]IAA

This inverts its argument:

UEY[H2H2]IAY[H2H2]OAA

Shortest non-terminating programs:,.

''You are free to add more soul. For example, write a binary adder or Befunge interpreter.''

Notes for implementers
You may truncate the language considerably: imports are non-essential, as is any specific I/O mode, just give it at least one. ;)

There’s no reference implementation yet.

Lexical
The whitespace and case-sensitivity is as usual in C-like languages, if only characters  (standard digits and all ASCII printable non-alphanumerics except  ) are treated as small letters, and parentheses are treated as spaces (you will need them, I assure).

There are line comments started with  and terminated by newlines.

Identifiers start with a capital letter and continue with zero or more small letters. So, whitespace is optional between two identifiers:  parses unambiguously as.

Several identifiers are reserved and cannot be used as names of the user-defined functions or imports. These are  and all names starting with. Latter ones represent numeric constants in hexadecimal:  means 0xD0B1. If that constant is encountered as an expression, it’s treated as in I/O (in binary, with leading ‘1’ removed). E. g.  =. Also,  by itself is equivalent to   (or, say,  ) and cannot occur as a valid expression.

High-level
I like EBNF, so there is the grammar of this language in EBNF.

The program is a sequence of imports, then definitions, and then an expression, which would be run when preparations are done:

Program = Import* Definition* Expression Import = "`" Identifier

The definition is a name and an expression being bound to that name. Note this expression cannot contain the name defined and any other names which weren’t yet defined, in this source or in imports.

Definition = Identifier Expression "."

Semantics of expressions will be discussed in the following section.

Expression = "E" | "O" | "I" | HexLiteral | "[" HexLiteral+ "]" | "{" Expression+ "}" | "Y" Expression+ "A" | "U" Expression Expression Expression "A" | "W" Expression

The programs are saved in files with extension e. g.. This only matters when using imports.

Semantics
For each import, definitions of the corresponding file are processed (not the final expression, if any). For example, when encountering,   should be searched for, and if it’s found, the definitions there should be processed.

Each definition is processed from earlier to latter ones, and the names statically bound to the defining expressions. Expressions are checked for validity as usual (see below).

It’s a static error if import source haven’t been found or an expression is invalid.

Expressions
There’s only so many valid expressions. I’ll give each of them a type, f : m → n meaning that f is a function with m inputs and n outputs. Both m and n may be zero. If m is zero, then f is effectively a constant, not needing any inputs to evaluate, and zero n means there are no output (a trivial result, e. g.  in Python,   in C-like languages or   in Haskell). Functions of the latter kind are useful when we need to substitute a constant somewhere a function of one or more arguments is needed. I’ll give the function corresponding to  as [expr].

Firstly, there are direct string manipulation functions.


 * Empty string constant:


 * [E] : 0 → 1
 * [E] = ""


 * Adding a character to the right:


 * [O], [I] : 1 → 1
 * [O](s) = s + "0"
 * [I](s) = s + "1"


 * Recursing through a string (so-called primitive recursion):


 * if [f] : m → n and [gc] : m+1+n → n, then [U f g0 g1 A] : m+1 → n
 * [U f g0 g1 A] = h where
 * h(xs…, "") = [f](xs…)
 * h(xs…, x + c) = [gc](xs…, x, h(xs…, x))


 * Looping through all possible strings lexicographically to find a suitable one (so-called μ recursion):


 * if [f] : m + 1 → n, then [W f] : m → 1
 * [W f](xs…) = for (x ∈ all_strings_ordered_lexicographically) if [f](xs…, x) = ("", …, "") then return x

Note that μ recursion is the only mechanism adding non-totality (and Turing-completeness) to the language. Any valid expression which doesn’t contain  represents a total function.

Also, there are helpers to glue these parts.


 * Projections:


 * if mi ∈ 1..n, then [[ m1 … mk n ]] : n → k
 * [[ m1 … mk n ]](x1, …, xn) = (xm 1, …, xm k )
 * Note that k may be zero.


 * Concatenation of results:


 * if [fi] : m → ni and k ≠ 0, then [{ f1 … fk }] : m → Σi ni
 * [{ f1 … fk }](xs…) = ([f1](xs…), …, [fk](xs…))
 * Allowing zero k here would lead to type ambiguity.


 * Composition:


 * if [fi] : mi → ni, ni = mi+1 and k ≠ 0, then [Y f1 … fk A] : m1 → nk
 * [Y f1 … fk A] = [fk] ∘ … ∘ [f1]
 * That is, the argument is passed to [f1], its result to [f2] etc., and the result of [fk] is the result of the composition.

The order of functions in the composition syntax is selected such that translating string constants into the corresponding code is easy: replace all 0s with Os, 1s with Is, and "…" with YE…A. Also, it’s quite natural in expressing data flow, compared to the traditional f ∘ g (g then f).

I/O semantics
There are many ways to input and output binary strings. The two standard ways are as byte strings or as numbers.

Byte strings
On input, bytes of the sequence are expressed as binary and these expressions concatenated (the below examples assume UTF-8):


 * "2" →  =   →   =  ;
 * "bi" →  =   →   =  ;
 * "ҩба" →  =   →   =.

On output, if there is not enough bits in the string, zeros are added:


 * →  =   → "2";

It is decision of the implementer how to delimit arguments to the program on input and its possibly multiple results on output. (One possibility is for the implementation to support only up to one input and/or output in byte string mode.)

Integers
Input value is converted to binary in usual way, least bits to the right, and then leading ‘1’ is removed. If the integer is zero or negative, or it isn’t an integer at all, the interpreter should signal an error.


 * 1 →  →   =  ;
 * 42 →  →   =.

On output, the conversion is inverse:


 * →  → 100.

The interpreter should probably use hexadecimal notation for integer I/O, as (1) it’s not alien to the source syntax, and (2) it’s more compact than binary and still is “binary-transparent”.