Cammy/Hives
Monoidal Hives with Cammy
Cammy does not specify a transport for expressions, jets, or other essential syntax. Instead, each toolchain is free to implement its own storage mechanism. This page details "hives," a prototype storage format used by the reference toolchain.
Prior Versions
v1
A v1 hive is a directory tree with files. Each file contains a Cammy expression encoded as an S-expression. Expressions are identified using their nested path in the tree.
v2
A v2 hive is a JSON object containing four keys:
- heap
- symbols
- templates
- jets
The heap contains Cammy expressions as trees. Each heap record is either a primitive arrow or a composition of heap records into a composite arrow.
The symbols are named pointers into the heap, identifying expressions of interest from the heap.
The templates and jets are both stored as JSON objects which map template/jet names to template/jet expressions. Templates have de Bruijn numbers representing their holes. Jets must be stored with topological ordering, respecting the underlying partial order.
Objects in git
A v3 hive is a git repository. To properly explain this, we must first explain the structure of objects within git's content-addressed store.
git has four types of objects:
- blob
- tree
- commit
- tag
The `tree` objects form labeled trees, with `blob` objects as leaves. The `commit` objects form a DAG/poset, with each node pointing to a `tree`. The `tag` objects also form a DAG/poset, with each node pointing to any object. These objects are immutable and content-addressed; mutation has copy-on-write semantics. A `tag` object may be mutated in certain ways, but we will adopt the common constraint that they are effectively immutable.
There are additional named types:
- branch
- reference
The `reference` objects form a DAG, and can refer to any of the four unnamed object types. The `branch` objects refer only to `commit` or `tag` objects, and serve only to name them. Named types are inherently mutable.
Encoding of v3 Hives
First-order Expressions
Encoding of standard first-order Cammy expressions is straightforward. Every primitive arrow is encoded as a `blob` containing its symbol as text. For example, the arrow `id` is encoded as a `blob` containing the text "id". Every composite arrow is encoded as a `tree` containing its components as entries, labeled with numbers encoded with ASCII. For example, the arrow `(pr zero succ)` is encoded as a tree which maps "0" to the `blob` containing "pr", "1" to the `blob` containing "zero", and "2" to the `blob` containing "succ".
Since every Cammy expression is a tree, this encoding is complete; trees are mapped to `tree` objects.
First-order Jets
In a certain sense, jets are merely expressions with names. For example, the standard jet `dup` is merely the name "dup" and the expression `(pair id id)`. Each name is given to a `branch` along with the prefix "jets/". For example, `dup` is encoded as a `branch` named "jets/dup" which refers to a `tree` encoding the expression `(dup id id)`.
Note that jet names can be made opaque by omitting them from hives. In that case, it would be implementation-dependent whether the jet is implemented. This compromise is intended for establishing low-level control over assembly output for vectorized or branchless/constant-time operations, not for creating proprietary code; in particular, there might be a future convention for annotating jets which are intended to be opaque while still providing model expressions.
Higher-order Expressions
To encode higher-order expressions with holes, we merely need a way to encode holes as `blob`s. Our technique will be to convert each natural number to its ASCII string, prefixed with "_". For example, the expression `(comp 0 1)` would be encoded as if its components were "comp", "_0", and "_1".
Since templates are merely expressions with holes, this encoding scheme encodes all templates.
Tags and Trails
Tagged expressions are encoded as `commit`s. The name of the expression is given to a `branch` with the prefix "exprs/". The trail is put into the body of the commit message. The expression is encoded as the `tree` which roots the commit.