ULEX

From Esolang
Jump to navigation Jump to search
ULEX
Paradigm(s) Functional
String-rewriting
Designed by ULEX-dev
Appeared in 2024
Memory system Cell-based
Computational class Turing complete
Major implementations Original
Influenced by Brainfuck
λ-calculus
File extension(s) .txt

ULEX is a minimalist programming language based on a variation of λ-calculus envisioned in the 80s by Nicolaas Govert de Bruijn.

Language overview

The whole language boils down to a set of valid expressions represented by strings and a function capable of doing reductions on them, following both the string rewriting and functional paradigms.

Etymology

ULEX is both an acronym meaning (U)nnamed (L)ambda (Ex)pressions and a homage to the uppercase "LISP", a family of programming languages based on the functional paradigm. Large programs written in the language may contain many asterisk characters which can resemble the needles of the plant Ulex europaeus, trivia that gave the language its six pointed star symbol.

Author's note:

I admit that FLUX ([F]unctional [L]anguage of [U]nnamed e[X]pressions) would've been a better name choice if it wasn't already picked before I created it. Since ULEX is also a valid english word we'll roll with it!

Specifications

AST
An AST of a *-expression representing the Ω combinator

The following EBNF notation explains how to form valid *-expressions:

expression  = redex | function | placeholder;
redex       = '(', expression, '|', expression, ')';
function    = expression, '<';
placeholder = '*'*;

Once an expression is valid, each character can be interpreted according to the table below:

Character Name Use
< Less than sign Ending functions
( Left parenthesis Starting redexes
| Vertical bar Separating redex functions and arguments
) Right parenthesis Ending redexes
* Asterisk Starting and ending placeholders

The main difference between ULEX and λ-calculus is the way variables are handled. With λ-expressions a finite length name must be assigned to each variable, but if De Bruijn indices are used then each variable is assigned a natural number according to its scope inside function expressions, excluding the need for names altogether. In this sense, ULEX uses a metavariable scheme which disposes the use of types beyond the valid expressions defined previously.

Combinator λ-expression *-expression
I λx.x *<
K λxy.x **<<
S λxyz.xz(yz) ((***|*)|(**|*))<<<
Y λf.(λx.f (x x)) (λx.f (x x)) ((**|(*|*))<|(**|(*|*))<)<

Execution

Executing a piece of ULEX code is not as straight forward as in imperative languages: a reduction might not give the result straight away, instead relying in a sequence of steps much like with instruction registers in modern CPUs. With arbitrary memory to store a copy of the string containing the valid *-expression, successive reductions take place until the result is invariant with relation to the source expression, that is, a reduction no longer changes the string.

Reductions

Redex expressions are defined by an ordered pair of two other expressions, respectively called the redex function and redex argument. Placeholders are strings of asterisks interpreted as natural numbers. An interpreter for ULEX models the reduction function β, which takes an input string s containing a redex expression and yields the reduced string s'. An example of reduction is presented below:

// redex function = *<
// redex argument = ***
s = (*<|***)
β(s) = ***

In *-expressions every placeholder corresponds to a numbered function scope that indicates which metavariable belongs to each function expression. Placeholders not belonging to the outermost function scope will not be replaced by the redex argument during a β-reduction:

s = (*<<|***)
β(s) = *<

Additionally, all placeholders members of redex functions and arguments which are free get rebound to their new scope, property demonstrated by the examples below.

// rebinding a redex function with scope 6 to 5:
s = (******<|*)
β(s) = *****
// rebinding a redex argument from scope 2 to 4:
s = (***<<<|**)
β(s) = ****<<

Generally, β-reductions act as function applications using placeholders as variables, with special cases for scope rebinding. In this fashion, halting can be achieved by stopping computations if s = β(s).

Examples

ULEX computations can be displayed as a top-down list of expression reductions like so:

((((**|*)|**)<<|**<<)|*<<) // starting expression
(((**<<|*)|**<<)<|*<<)     // reduction 1
((**<<|*<<)|**<<)          // reduction 2
(*<<<|**<<)                // reduction 3
*<<                        // reduction 4, end of computation

Cat

A cat program will parrot its input to its output, acting as an identity function with the print side effect. Since ULEX is purely functional there are no side effects - the output string must be interpreted from the resulting reductions of the source code, therefore only the identity function is needed:

*<

Quine

Because all ULEX programs are homoiconic, it is not possible to write a quine program in it without falling in the "cheating" category of the esolang.org definition of quines:

"Often a program that just performs access to its own source code (reading it either from memory or from a disk file) and prints it out is considered cheating."

Hello, World!

By using natural numbers as ascii character representations in place of copies of the asterisk character * it is possible to write a hello world program in ULEX:

(((((((((((((72|101)|108)|108)|111)|44)|32)|87)|111)|114)|108)|100)|33)

Infinite loop

infinite loops are implemented in the language with the help of fixed-point combinators. Here's an example (commonly known as the Ω combinator):

((*|*)<|(*|*)<)
((*|*)<|(*|*)<)   // reduction 1
((*|*)<|(*|*)<)   // reduction 2
((*|*)<|(*|*)<)   // reduction 3 and so on...

Edge cases

Minification

It is possible to reduce the total number of characters needed to assemble a *-expression by considering the following modified EBNF terminal for placeholders instead of the previous definition:

placeholder = "()"*;

This way there are only four symbols in the language:

Character Name Use
< Less than sign Ending functions
( Left parenthesis Starting redexes and placeholders
| Vertical bar Separating redex functions and arguments
) Right parenthesis Ending redexes and placeholders

The first example would look like this under this specific dialect of the language:

((((()()|())|()())<<|()()<<)|()<<) // starting expression
(((()()<<|())|()()<<)<|()<<)       // reduction 1
((()()<<|()<<)|()()<<)             // reduction 2
(()<<<|()()<<)                     // reduction 3
()<<                               // reduction 4, end of computation

Since this dialect uses four characters ['<', '(', '|', ')'] instead of five ['<', '(', '|', ')', '*'], it is possible to write ULEX source code using two-bit words instead of three to represent separate characters. The character list becomes smaller at the expense of longer source code however, since now double the words are needed to represent a placeholder:

**** -> ()()()()
char[4] -> char[8]

Metalanguage

By using a preprocessor and a *-expression catalog it's possible to write and read human-readable code that can be expanded directly into ULEX with the help of extra formatting by placing parentheses in operators which must take precedence, like in the following example:

// FALSE = *<<
// TRUE = **<<
// AND = ((**|*)|**)<<

AND TRUE FALSE             // original string
((**|*)|**)<< **<< *<<     // expanded macros
((((**|*)|**)<<|**<<)|*<<) // inserted parentheses
(((**<<|*)|**<<)<|*<<)     // reduction 1
((**<<|*<<)|**<<)          // reduction 2
(*<<<|**<<)                // reduction 3
*<<                        // reduction 4
FALSE                      // translated macro

Conventions

Following the normal order of application (leftmost innermost redexes are reduced first)

Related languages

  • Brainfuck, another minified character set language
  • DeBruijn, another implementation of De Bruijn indexing into a λ-calculus based language
  • Universal Lambda, another attempt at representing a λ-calculus based language in binary