Olvasható

Olvasható (also sometimes called Olvashato) is a general purpose toy language created by b_jonas. The olvasható source is compiled to a Prolog program and a Standard ML program, the two behaving identically.

History
I, B_jonas, created this toy language in 2006 when a homework assignment for a university course required writing a program to solve the same problem in both Prolog and Standard ML. Translating the program by hand would have been faster but more boring, so I wrote a compiler.

The programs for homework were scored in two ways. They're automatically ran on testcase inputs to see if they give the correct answer in a reasonable amount of time. Then the professor teaching the course read them to see that they're readable and well-documented code. The requirements included writing a comment documenting each function in the program, and each argument of that function. They also required that the (Prolog and SML) programs must have lines at most 72 characters long.

The original goal was that the compiler outputs both readable Prolog code and readable Standard ML code. I ran out of time and so ended up failing this. Now the compiler outputs correct Prolog code and correct SML code, but both are difficult to read.

There are very few programs written in Olvasható. The only non-small program is the one for the homework task, and that one isn't too great either, so it only worked on some of the testcases.

The compiler isn't very full-featured, it only has a little bit more features than was necessary for the homework. I abandoned the language and the compiler shortly after submitting the homework, so now it's completely unmaintained. I think the language is still worth documenting as a lesson though.

Source language
The Olvasható language is mostly a simple pure functional language, with basically all features correspond directly to simple features of Standard ML.

The language does not currently directly support mutable data or other side effects of any sort, so in its own it's pure functional. You could use calls to imported foreign functions to manipulate mutable data or anything else the destination languages support. Indeed, the homework program throws and catches an exception this way.

Apart from that, Olvasható differs from Standard ML in that functions aren't treated as freely, which are compromises for easy compilation to Prolog, but don't limit the power of the language much. Anonymous functions are treated differently from named (global) functions, and have different syntax. Also, neither named functions (nor constructors) nor anonymous functions are automatically curried, you must always call them with all arguments together. If you want partial application, you must make a closure for that.

Top-level declarations
Programs consist of mainly type definitions and named function definitions. Type definitions declare algebraic types. These definitions give a list of constructors, which can then be used directly from the Olvasható program. They also give the number and type of arguments for each constructor, and the resulting type. Function definitions have a name, list of parameters, body expression, and optional doctext. When the function is called, the parameter symbols are bound to the arguments, then the body expression is evaluated, and its value returned. The doctext can have special escapes to refer to an argument or local variable of the function, the return value of the function, or a global Olvasható symbol. Mutual function definitions are also possible.

The program can have a few other declarations besides these definitions. You can import algebraic types or functions  defined elsewhere so that the Olvasható program can directly refer to the constructors or functions. You can set Prolog or SML identifiers corresponding to global Olvasható identifiers, which can be used both to import external functions or types under a different name together with the above, or to export types or functions defined in Olvasható so that they can be called from Prolog or SML. Apart from this, you can conditionally compile code for just Prolog or just SML, emit top-level comments, or emit pieces of handwritten Prolog or SML declarations straight into the compiled code.

Named function definitions must appear at the top level, but lambda expressions can be nested at any level.

Expressions
Expressions can be any of the following.


 * Local variables. These expressions get the value of a local variable, which are defined by parameters of the named function, patterns of let expressions, or parameters of lambda expressions.)
 * Calls to global functions (defined or imported foreign functions). These must have arity (number of arguments) matching the function definition or declaration.
 * Calls to a data constructor.
 * Integer literals.
 * Integer arithmetic expressions with the following operators:
 * Boolean valued integer comparison:
 * Deep structural equality comparison:
 * Logical operations on booleans with short-circuiting:
 * Case match expressions. These can bind any value to any variable, destructure values on data constructors, even multiple levels at a time, have multiple arms for matching constructors, and can match multiple values each to corresponding patterns.  Patterns can only include data constructors and newly bound variables.
 * Conditional expressions, which can have multiple conditions for an elseif cascade.
 * Calls to anonymous functions: .  The number of arguments is determined at compile time, so there's one expression for each argument.  The function can be an arbitrary expression.
 * Lambda expressions to create an anonymous function closure that can capture local variables of the surrounding code:
 * Error exit:

A data constructor can be:
 * one defined in Olvasható,
 * a foreign one imported to Olvasható with a declaration,
 * the constructor for an SML tuple, which can have any arity,
 * the list cell constructor,
 * a full bracketed list construction with the number of elements fixed at compile time and an expression for each element listed
 * or  booleans, with zero arguments.

It is worth noting that variable identifiers that appear in  patterns and parameter lists are always separate local variables even if the same identifier occurs multiple times. This can be surprising because Prolog doesn't work that way. This doesn't actually come up in the code I write though.

Concrete syntax
The concrete syntax of Olvasható is S-expressions similar at first sight to lisp code.

Olvasható expressions look somewhat similar to ordinary lisp code, but most of the nontrivial expression types are actually different. Literals, arithmetic, comparison and logical operations, named function calls, lambda expressions, and most data constructor calls including the list ones look quite like the ones in lisp. Anonymous function calls have custom syntax, because I didn't want to implement Scheme's convenient syntax in the compiler, nor use any of Common Lisp's abomination syntaxes. expressions have a custom syntax that doesn't resemble any lisp. This is mostly because lisp code rarely uses algebraic variable destructuring, although something similar exists as macros. expressions also have custom syntax, because the language is mostly side-effectless so I don't need lisp's easy syntax for sequencing multiple expressions (as in the comma operator of C).

Function definitions almost look like Common Lisp  definitions, but not quite, because the doc string is after the body, not before. Other top-level declarations have custom syntax.

Compiler
The compiler for Olvasható can output Prolog code or SML code. Each of those two outputs are complete, running the entire Olvasható program, and standalone, not needing any custom runtime libraries other than the standard libraries of those languages and whatever foreign stuff the program directly uses.

For the homework, the compiled output was ran with Sicstus Prolog 3.12.x and MOSML 2.01 respectively, but it shouldn't be hard to port the compiler to work with any other Prolog or SML implementation. In particular, I believe the output runs without change on recent versions of GNU Prolog and SWI Prolog.

The compiler is implemented in ruby 1.8.

Formatting of the compiled output
The main reason why both the SML and Prolog output still fails to be readable is the formatting. Breaking code to lines at most 72 characters long while also indenting the code in a reasonable way is a difficult problem that would require a multipass algorithm working both downwards and upwards the syntax tree. Originally I was supposed to attempt to implement something like that, but I ran out of time, so the output is just an unreadable mess where lines are broken in a hard to read way with only a single level of indentation used for the entire body of each function. The Prolog output is even less readable than the SML one, both because the code is more verbose, so the 72 character bound is more limiting, and because the compiler cannot produce idiomatic Prolog code at all and misses a lot of Prolog-specific optimizations that could be done.

The compiler generates identifiers in Prolog and SML from just the identifiers in the source code, but modifies the first letter of the symbol to uppercase or lowercase, as required by the language rules of Prolog. The compiler renames identical local variables within a function by appending a numeric suffix. For the result of function calls, these are usually based on the name of the functions. The Olvasható programmer is responsible to make sure that the generated global identifiers don't clash with SML keywords or library symbols of SML or Prolog in a way that breaks the program, although the compiler does have a short list of keywords that it will avoid by renaming.

SML output
All Olvasható constructs are translated in a completely straightforward way to their underlying SML constructs. This should result in mostly readable SML code as output, because I write the Olvasható programs with mostly the mindset of targeting an ordinary pure functional language like SML.

The Olvasható compiler doesn't know about the SML type system at all, but the SML compiler itself will check the types, so the program does have to write a well-typed program. SML types are mentioned in the source code for data type definitions so that the compiler can output the definitions for SML datatypes, and also in foreign type declarations so that the syntax is similar. Types aren't mentioned in function definitions or anywhere else, we just let the SML compiler derive all the types.

Perhaps the only big readability optimization missing from the SML compiler is that it fails to deconstruct arguments in the function head or output multiple function heads with different patterns for the arguments. This could be done mechanically whenever the body of the function is a let expression, and would improve the readability of the SML code a bit, but this isn't too big of a problem. Funnily the Prolog compiler does do this optimization with function heads. Another missing optimization is that currently all  expressions of Olvasható are translated as   expressions in SML, even though the ones with only one arm should be translated to   expressions instead.

Prolog output
In the Prolog output, Olvasható functions are always translated to Prolog predicates where the last parameter outputs the return value, the rest of the parameters are the parameters of the Olvasható function, and the function succeeds exactly once. Foreign functions imported and called from Olvasható code are required to work this way too.

Prolog doesn't know about types at all, so the types aren't output there, and the Prolog interpreter it runs the Prolog program output as a dynamically typed program. Almost all Olvasható values are represented as ground terms in Prolog, with Prolog terms representing the SML terms in a mostly obvious way. Lists are represented as Prolog lists, and tuples are represented with the head.

Function closures are represented as terms with head  and unbound variables inside. These have the form. Here,  are always unbound variables, but the   is actually Prolog code. To call such a closure, stored in the variable, you must first make a renamed copy so that the closure can be reused later, then in the rename copy, unify the   variables to the arguments   of the call, then execute the   as prolog code, then the result of the call is in. This is as simple as  in the Prolog code. If the closure captures any upvalues, then these occur straight inside the  term whereever the closure body uses them. Luckily because of the limited use of non-ground terms, handling upvalues is easy, otherwise the compiler would have to emit extra code to copy each upvalue when compiling any lambda expression, because it would not be able to tell when it's safe to optimize that away.

In addition to the local variables in the Olvasható source code, the Prolog code has to use a temporary variable to store the result of most subexpressions. These don't correspond to any variable in the source code, so the compiler has to just make up names. The compiler tries to keep these generated names mostly sensible, eg. the results of a call to a named function are put in a temporary based on the name of the called function.

Expressions with arithmetic operators and comparisons are compiled properly to reasonable Prolog code, with the whole arithmetic expression written together and evaluated with the same  or comparison predicate. If the comparison appears as the condition of an  conditional expression, then you get idiomatic Prolog code with arrow conditionals or multiple clauses and a cut. You still get somewhat reasonable code if multiple comparisons combined with logic operators appear as the condition of a conditional. If the comparison result is used in any other way, such as bound to a  variable, used as an argument to a function or constructor call, or returned from a function, then the boolean value is reified (represented) as a term   or , resulting in verbose and often ugly code.

The generated prolog code often contains redundant temporary variables that always just get unified with another variable. This happens almost every time you use a  expression. There's also some dead code emitted after the Prolog  call when the Olvasható   primitive is involved. There's a lot of optimizations that could be done in such cases to get more idiomatic and shorter Prolog output.

Links

 * Old webpage about the language, with the compiler, the very few existing programs, and their compiled form.
 * Compiler, implemented in ruby 1.8
 * Homework program: Olvasható source code, compiled to Prolog, compiled to SML