Stisp

From Esolang
Jump to navigation Jump to search

Stisp is a language created by Tanner Swett in 2022. It is a matrioshka language, where the meta-language is designed to be usable for describing Lisp-like languages in such a way that they can be interpreted with no recursive function calls.

There has been an attempt at implementing Lisp in Stisp.

Structure of a program

In extended Backus–Naur form, the syntax of the language is:

<program> ::= {<equation>} <stack>
<equation> ::= <expr> <frame> "=" <stack> ";"
<stack> ::= <expr> {<frame>}
<frame> ::= "->" lword "[" {<expr> ","} [<expr>] "]"
<expr> ::= uword | lword | "(" <expr> "." <expr> ")" | "(" {<expr>} ")"

An identifier consists of one or more letters, digits, hyphens (-), and/or underscores (_). An lword is an identifier where the first character is a lowercase letter, and a uword is an identifier where the first character is anything other than a lowercase letters.

Consecutive identifiers must be separated by at least one whitespace character. Aside from that, whitespace is irrelevant. A line comment starts with the number sign (#) and continues until the end of the line.

There are four types of S-expressions: variables, which are denoted by lwords; atoms, which are denoted by uwords; cons cells, which are denoted by the (head . tail) syntax; and the empty list, denoted as (). If x1, x2, ..., xn are S-expressions, then the syntax (x1 x2 [...] xn) is syntactic sugar for the S-expression (x1 . (x2 . [...] (xn . ())[...])). For example, (LAMBDA (X) X) is syntactic sugar for (LAMBDA . ((X . ()) . (X . ()))). (This is the same syntactic sugar that is used in most versions of Lisp.)

The adjective "closed" means "containing no variables." Closed S-expressions serve as the main way of representing data at runtime.

A frame expression consists of a block name, which is an lword, and any number of bracketed arguments.

A stack expression consists of an S-expression (the "head") followed by any number of frame expressions. A closed stack expression is a stack expression that contains no variables, or, in other words, a stack expression where the only lwords are block names.

An equation consists of one S-expression and one frame expression on the left-hand side, and one S-expression and any number of frame expressions on the right-hand side. It is illegal for any variable to appear on the right-hand side unless that same variable also appears at least once on the left-hand side. Note that block names are not variables, so the equation h -> b[] = b; is illegal, because b never appears on the left-hand side as a variable.

Finally, a program consists of any number of equations, followed by exactly one closed stack expression.

Execution

At any point in time, the state is a closed stack expression. The current expression is the S-expression at the head of the stack expression, and the call stack consists of all of the frames in the stack expression. The initial state is the stack expression at the end of the program.

To execute a program, perform execution steps until the call stack is empty. When the call stack is empty, execution is complete, and the result of the execution is the current expression.

To perform an execution step, do the following:

  • Look at the current expression and the top (that is, first) frame of the call stack. These are the "immediate state."
  • Find the first equation in the program whose left-hand side matches the immediate state. An expression matches the immediate state if there is some consistent way to substitute variables in that expression to obtain the immediate state. For example, the expression x -> which[x, BLUE] matches the immediate state YELLOW -> which[YELLOW, BLUE] by replacing x with YELLOW.
  • On the right-hand side of the equation we found, substitute every variable in the same way that it was substituted on the left-hand side.
  • Replace the immediate state with the result.

If we ever attempt to perform an execution step, but find that no equation matches, the outcome is undefined.

A simple example

Suppose that we have the program

x -> eq[x] = TRUE;
x -> eq[y] = FALSE;
BLUE -> eq[YELLOW] -> eq[FALSE]

In the first execution step, we find that the immediate state is BLUE -> eq[YELLOW]. The first equation doesn't match the immediate state, because there is no value which can be substituted for x in x -> eq[x] to obtain BLUE -> eq[YELLOW]. However, the second equation does match the immediate state, because the expression x -> eq[y] can be transformed into BLUE -> eq[YELLOW] by replacing x with BLUE and y with YELLOW.

At this point, if the variable x or y appeared on the right-hand side of this equation, then that variable would be replaced with BLUE or YELLOW, respectively. However, since the right-hand side is merely the closed S-expression FALSE, no such replacement occurs.

Next, the immediate state (BLUE -> eq[YELLOW]) is replaced with the substituted version of the right-hand side of the matching equation (FALSE). Thus, the execution state becomes FALSE -> eq[FALSE]. This concludes the first execution step.

In the second execution step, the immediate state is FALSE -> eq[FALSE]. The first equation matches the immediate state, so the immediate state (which is now the entire execution state) is replaced with TRUE.

After these two execution steps, the call stack is empty, and the current expression is TRUE. So, execution finishes, and the result of the program is TRUE.

A more complex example

This is the example which motivated the creation of Stisp in the first place.

(CONS xh xt) -> eval[] = xh -> eval_cons[xt];
(QUOTE x) -> eval[] = x;

xh -> eval_cons[xt] = xh -> eval[] -> eval_cons_2[xt];
h -> eval_cons_2[xt] = xt -> eval[] -> reverse_cons[h];

t -> reverse_cons[h] = (h . t);

(CONS (QUOTE HELLO) (QUOTE ())) -> eval[]

Execution proceeds as follows:

(CONS (QUOTE HELLO) (QUOTE ())) -> eval[]
(QUOTE HELLO) -> eval_cons[(QUOTE ())]
(QUOTE HELLO) -> eval[] -> eval_cons_2[(QUOTE ())]
HELLO -> eval_cons_2[(QUOTE ())]
(QUOTE ()) -> eval[] -> reverse_cons[HELLO]
() -> reverse_cons[HELLO]
(HELLO)

Author's remarks

Lately, I've been investigating ways to implement a Lisp-like programming language. In writing a Lisp interpreter, one finds that in order to evaluate an expression, it's often necessary to evaluate the sub-expressions of that expression. One obvious way to implement that is to have the evaluation function call itself recursively. But how can Lisp be implemented without any recursive calls?

It seems obvious that in order to implement recursive calls without using recursive calls, the evaluation function should manually keep track of a call stack. That raises the question of what a stack frame should look like. Is a stack frame simply an expression? Should a stack frame contain a marker indicating where the result of the current expression should be substituted? Can a stack frame contain multiple such markers? Can the sub-expressions of a stack frame be something more complicated than S-expressions? If a function needs to call multiple other functions, how does the stack frame track where the instruction pointer is in the calling function? Where are local variables stored?

Eventually, I realized that if each function takes one special "return value" parameter, then it's unnecessary to mark where return values need to be substituted. Furthermore, all of the work of keeping track of local variables, and of keeping track of where we are in the execution of a function, can simply be foisted off onto the programmer, who can accomplish the task by splitting one function into several "blocks" (as the eval_cons "function" was implemented using both eval_cons and eval_cons_2 above).

The result is Stisp.

It seems very unlikely that I'm the first person to discover the idea of an abstract machine which works using Stisp-like stack frames, but I have no idea how to figure out who discovered it first. I bet it was in the 1960s, though.