Abyssal-9
- Note: This page has been generated by ChatGPT and Grok.
Abyssal-9 is an intentionally unreadable esoteric programming language designed so that static analysis and program generation are astronomically difficult under the canonical configuration. Programs are encoded as a sequence of trits (base-3 digits: 0, 1, 2). The runtime decoding of each instruction depends on a program-specific landscape derived from a long prefix checksum, so that reconstructing instruction meaning usually requires full dynamic simulation of the program prefix, microtable rebuilding, CNF/witness verification, and more.
This page is a complete, canonical specification and explanatory reference for Abyssal-9: what it is, how it works, the canonical constants and formulas, recommended serialization, generator strategy, and practical notes for implementers.
Short high-level summary
- Programs are a single stream of trits (`0`,`1`,`2`).
- The first large window of trits (the prefix) seeds per-program artifacts (deinterleaver coefficients, microtables, witness/PoW templates).
- The stream is deinterleaved into k planes; each plane is chunked into fixed-width trit-words of length W (canonical W=23).
- Each decoded word must pass staged verification: candidate derivation → witness computation → CNF & PoW checks → authoritative schema selection (with fallbacks).
- Schemas can require reading overlapping words and implicit backrefs into the runtime tape; CNF fragments couple program regions globally.
- Canonical parameters (ease=0) make generation an astronomically costly global search problem.
Goals and design intent
- Make an esolang that is deterministic yet infeasible to craft by hand in canonical mode.
- Force implementers/generators to perform heavy computation to produce even trivial semantics.
- Allow toy/ease modes for demonstration and testing that drastically reduce hardness while preserving architecture.
Canonical constants (authoritative)
These constants are canonical for ease=0. Implementations may support alternate eases for experiments; canonical values should be used for interoperability tests.
- Word trit-length (W): 23 (allowed alternatives: 19 and 29; 23 is canonical).
- Trit planes (k): 5.
- Prefix checksum window (H0): 4093 trits (or the entire program if shorter).
- Microtable rows: 65537.
- Opcode space (N): 128 (indices 0..127).
- Mix function: Mix128 (canonical 128-bit mixer — see below).
- Address fold constants: P1 = 0x9E3779B97F4A7C15, P2 = 0xC2B2AE3D27D4EB4F.
- PoW trailing zeros (canonical): 20.
- Average CNF clauses (canonical): ~47 clauses per schema row (clause length ≤ 11).
- Overlap multiplicity (r): allowed range 0..7 (varies across schemas).
- Implicit backref depth (canonical max): 13.
Terminology
- trit — a digit in {0,1,2}.
- plane — one of k deinterleaved substreams obtained from the raw trit stream.
- word — consecutive W trits inside a single plane (words do not cross planes).
- prefix — first H0 trits of the program (or whole program if shorter); seeds microtable, deinterleaver, etc.
- microtable — a per-program large table of rows; each row describes opcode permutation, schemas, witness templates and other per-row fields.
- schema — the decoding/argument interpretation/rules for a specific opcode mapping.
- witness — a computed heavy transform used to verify a schema candidate.
- CNF fragment — a small conjunctive normal form boolean formula embedded in a schema that must evaluate true for schema use.
- G — global runtime state (internal interpreter state seed/accumulator).
- IP — instruction pointer (index into plane-word sequence for a given plane).
Mix128 (canonical 128-bit mixer)
The spec uses Mix128 as a deterministic, avalanche-prone 128-bit mixing function. Implementations must implement Mix128 exactly for canonical behavior.
# Reference Mix128 (canonical)
# mix64: deterministic 64-bit mixer; every step is masked to 64 bits
def mix64(v):
    v = ((v ^ (v >> 30)) * 0xBF58476D1CE4E5B9) & ((1 << 64) - 1)
    v = ((v ^ (v >> 27)) * 0x94D049BB133111EB) & ((1 << 64) - 1)
    v = v ^ (v >> 31)
    return v & ((1 << 64) - 1)

# Mix128: fold two 64-bit inputs into two 64-bit outputs
def Mix128(lo_in, hi_in):
    a = mix64(lo_in ^ 0x243F6A8885A308D3)
    b = mix64(hi_in ^ 0x13198A2E03707344)
    lo_out = (a ^ ((b << 1) & ((1 << 64) - 1))) & ((1 << 64) - 1)
    hi_out = (b ^ ((a << 1) & ((1 << 64) - 1))) & ((1 << 64) - 1)
    return lo_out, hi_out
Notes:
- All shifts are logical (unsigned) shifts.
- All arithmetic is on 64-bit unsigned integers with masking to 64 bits.
Program binary model and serialization
Two common distribution formats:
- Plain-text trit file (".aby9"): each trit written as ASCII '0'/'1'/'2' — convenient but 1 byte per trit (inefficient).
- Packed trit binary (".aby9p"): pack trits into bytes. Canonical packing: pack 5 trits per byte (3^5=243 < 256); leftover trits in the final byte are zero-padded in the high bits. Use big-endian within each packed byte for portability.
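A sketch of the packed encoding, reading "big-endian within each packed byte" as first-trit-most-significant and "zero-padded in the high bits" as prepending zero trits to a short final group. This is one plausible reading of the two sentences above; the reference implementation's convention is authoritative.

# Sketch: 5 trits per byte (3^5 = 243 < 256). The first trit of each group is the most
# significant base-3 digit; a short final group is zero-padded in the high positions.
def pack_trits(trits):
    out = bytearray()
    for i in range(0, len(trits), 5):
        group = list(trits[i:i + 5])
        group = [0] * (5 - len(group)) + group   # pad leftover trits in the high positions
        value = 0
        for t in group:
            value = value * 3 + t
        out.append(value)
    return bytes(out)

def unpack_trits(data, trit_count):
    trits = []
    for b in data:
        group = []
        for _ in range(5):
            group.append(b % 3)
            b //= 3
        trits.extend(reversed(group))            # restore most-significant-first order
    pad = (-trit_count) % 5                      # padding trits sit at the start of the last group
    if pad:
        del trits[len(trits) - 5: len(trits) - 5 + pad]
    return trits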
Suggested canonical file header for portable files
(If embedding prefix, microtable, or metadata; optional for toy modes)
ABY9v1 ease=<0|1|2|3> encoding=<raw|packed> prefix_included=<0|1> microtable_included=<0|1> reserved=0
Deinterleaving: planes and polynomial coefficients
The flattened trit stream is an interleaving of k plane-strings. Which plane a raw-stream index t belongs to is determined by a polynomial function whose coefficients are derived from the prefix checksum. This is intentionally nontrivial to make plane membership dependent on the prefix.
Canonical coefficient generation
1. Compute (prefix_check_lo, prefix_check_hi) = Mix128(prefix_low, prefix_high), where prefix_low and prefix_high are the integer interpretations of the lower and upper halves of the first H0 trits.
2. Set seed = (prefix_check_lo, prefix_check_hi).
3. For c in 0..(k*L-1) (L a small loop constant depending on ease), compute (a_lo, a_hi) = Mix128(seed_lo xor c, seed_hi xor c); combine into a large integer A = a_lo + (a_hi << 64), then coef_c = A mod 3.
4. Compose the coefficients into polynomials poly_p and poly_o that map stream index t to plane p and offset o. Implementations must use canonical polynomial evaluation (Horner) and canonical modulo arithmetic.
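A minimal sketch of steps 1–4, assuming the whole coefficient list feeds poly_p via plain Horner evaluation. How the coefficients are split between poly_p and poly_o is not fixed here and is an assumption of this example.

# Sketch: derive base-3 coefficients from the prefix checksum and evaluate poly_p.
# The split of coefficients between poly_p and poly_o is an assumption of this example.
def derive_coefficients(prefix_check_lo, prefix_check_hi, count):
    coefs = []
    for c in range(count):
        a_lo, a_hi = Mix128(prefix_check_lo ^ c, prefix_check_hi ^ c)
        A = a_lo + (a_hi << 64)        # combine the two 64-bit outputs
        coefs.append(A % 3)            # coef_c = A mod 3
    return coefs

def poly_p(t, coef):
    # Horner evaluation of the plane-selection polynomial at stream index t
    acc = 0
    for c in reversed(coef):
        acc = acc * t + c
    return acc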
Deinterleaving algorithm (canonical pseudocode)
# Input: raw_trits (list of trits), coefficients coef, plane count k
# Output: planes[0..k-1], each a list of trits
def deinterleave(raw_trits, coef, k):
    planes = [[] for _ in range(k)]
    for t in range(len(raw_trits)):
        p = poly_p(t, coef) % k        # plane membership depends on the prefix-derived polynomial
        planes[p].append(raw_trits[t])
    return planes
# After mapping, split each plane into words of W trits.
Word layout and interpretation
- Each word is W trits (base-3 big-endian integer). For canonical W=23, word values range 0..3^23-1.
- Words do not cross planes.
- A word can be interpreted differently by schemas: literal integer, signed integer (two's complement-like in base 3 via mask), masked value, or as multiple sub-fields.
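For concreteness, converting a word's trits to its integer value is a straightforward reading of the big-endian rule above:

# Interpret W trits as a base-3 big-endian integer (first trit is most significant).
def word_to_int(word_trits):
    value = 0
    for t in word_trits:
        value = value * 3 + t
    return value          # for W=23 this is always < 3**23, so it fits comfortably in 64 bits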
Microtable: derivation, layout, and parsing
Microtables (size microtable_rows) are generated deterministically from prefix checksum. Each microtable row contains:
- an opcode permutation (128 entries mapping 7-bit candidates to opcode indices),
- a schema table (up to 128 schema entries),
- a row-wide seed/template fields (step_bias or step_template, witness seed base),
- optional CNF template indices and witness template pools.
Canonical microtable generation (precise)
For r in 0..microtable_rows-1:
1. seed_lo, seed_hi = Mix128(prefix_checksum_lo xor r, prefix_checksum_hi xor r).
2. Use an RNG stream derived from iterating Mix128(seed_lo + i, seed_hi + i) for i=0..S-1.
3. From the RNG stream parse:
* 128 unique bytes to form a permutation (use canonical Fisher-Yates with RNG stream).
* For each of the 128 schema slots, parse a fixed field layout:
- arg_base (4 bits),
- arg_mask (variable length, canonical encoding),
- flags (bitfield: overlap_enabled, backref_allowed, CNF_present, witness_present, PoW_present, fallback_count),
- CNF_template_index (if CNF_present),
- witness_template_index (if witness_present),
- fallback_list (sequence of up to M fallback indices),
- step_template/step_bias (32 bits or compact).
4. Store M[r] as the microtable row.
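A sketch of step 2 and the permutation parse in step 3. Assumptions of this example: each Mix128 iteration contributes both of its 64-bit outputs to the stream, and the Fisher–Yates shuffle consumes one stream value per swap.

# Sketch: RNG stream for row r (step 2) and a Fisher-Yates permutation of the 128
# opcode slots (first item of step 3). The draw ordering is an assumption.
def rng_stream(seed_lo, seed_hi):
    i = 0
    while True:
        lo, hi = Mix128((seed_lo + i) & ((1 << 64) - 1), (seed_hi + i) & ((1 << 64) - 1))
        yield lo
        yield hi
        i += 1

def build_permutation(stream):
    perm = list(range(128))
    for j in range(127, 0, -1):        # classic Fisher-Yates, driven by the stream
        r = next(stream) % (j + 1)
        perm[j], perm[r] = perm[r], perm[j]
    return perm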
Microtable row binary layout (canonical bit widths)
A canonical row serialized for inclusion in microtable_included files uses fixed-width fields for reproducibility:
- 128 × 8-bit permutation entries = 1024 bits.
- 128 × schema_descriptor where each descriptor is:
 * 16 bits header (arg_base:4 | flags:6 | fallback_count:6),
 * variable-length payload (CNF idx 16 bits, witness idx 16 bits, step_bias 32 bits, fallback_list fallback_count × 16 bits, arg_mask encoded variable-length).
(Implementers MUST use canonical packing rules; leaving fields ambiguous breaks determinism.)
Two-stage decode (detailed step-by-step)
This section explains step-by-step what the interpreter does when it decodes a word at IP on a given plane.
Stage 0: Preliminaries
- Ensure microtable M is constructed or available.
- Determine micro_idx = (Mix128(prefix_checksum_lo xor G_seed, prefix_checksum_hi) mod microtable_rows) — G_seed is a runtime-derived constant (usually 0 initially).
- Let row = M[micro_idx].
Stage 1: Candidate derivation
1. Read w = word(IP) as a W-trit integer.
2. Split w into canonical 64-bit halves parts_of_w_lo and parts_of_w_hi (if the value needs fewer than 128 bits, the high half is zero).
3. Compute (s_lo, s_hi) = Mix128(parts_of_w_lo, parts_of_w_hi) xor G_fold, where G_fold is the canonical folding of global state.
4. Compute opcode_candidate = fold_to_7bits(s_lo, s_hi) using the canonical folding function (e.g., xor high and low halves, reduce with mask 0x7F).
5. Let opcode = row.op_perm[opcode_candidate] and schema_candidate = row.schema_table[opcode_candidate].
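Steps 1–5 written out as a sketch. The spec only gives fold_to_7bits as an example folding, and applying G_fold half-by-half to the Mix128 output is an assumption of this sketch.

# Sketch of Stage 1. fold_to_7bits follows the example folding (xor halves, mask 0x7F);
# applying G_fold to each half separately is an assumption of this sketch.
MASK64 = (1 << 64) - 1

def fold_to_7bits(s_lo, s_hi):
    return (s_lo ^ s_hi) & 0x7F

def stage1_candidate(word_value, G_fold_lo, G_fold_hi, row):
    w_lo = word_value & MASK64
    w_hi = (word_value >> 64) & MASK64       # zero for canonical W=23
    s_lo, s_hi = Mix128(w_lo, w_hi)
    s_lo, s_hi = s_lo ^ G_fold_lo, s_hi ^ G_fold_hi
    cand = fold_to_7bits(s_lo, s_hi)
    return row.op_perm[cand], row.schema_table[cand], cand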
Stage 2: Witness & verify
6. Compute arg_count = schema_candidate.arg_base + ((G >> (opcode_candidate mod 8)) & 0xF).
7. Compute the witness using the schema's witness_template (if any). Witness templates are deterministic heavy transforms; canonical examples appear below.
8. Evaluate verify_predicate(witness, schema_candidate.witness_predicate). The predicate typically checks:
 * CNF fragment evaluation (if present), and/or
 * PoW: Mix128(witness_lo, witness_hi, small_context) has ≥ t trailing zero bits.
9. If verify_predicate returns True, schema_candidate is authoritative.
10. If it returns False and schema_candidate.fallback_list is non-empty, iterate the fallbacks (repeat steps 6..9 for each fallback candidate). If none succeed, decoding fails (the runtime may treat this as a NOP or an error, per implementer option).
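The fallback iteration of steps 9–10 can be expressed compactly. This is a sketch: it assumes fallback indices select other entries of the same 128-entry schema table, and compute_witness is a hypothetical helper wrapping the row's witness template.

# Sketch: Stage-2 resolution with fallbacks. Fallback indices indexing the schema table
# and the compute_witness helper are assumptions of this example.
def resolve_schema(row, opcode_candidate, state, w):
    candidates = [opcode_candidate] + list(row.schema_table[opcode_candidate].fallback_list)
    for c in candidates:
        schema = row.schema_table[c]
        witness = compute_witness(schema, state, w)            # hypothetical witness-template wrapper
        if verify_predicate(witness, schema.witness_predicate):  # CNF and/or PoW checks from step 8
            return schema
    return None                                                 # decoding fails (NOP or error)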
After authoritative schema is chosen
- Interpret arguments according to arg_types_mask in schema. Argument types include:
 * LITERAL — word value used directly.
 * RELATIVE_ADDR — computed address offset.
 * IMPLICIT_TAPE_FETCH — fetch value from tape at addr_map_128(literal,...).
 * IMPLICIT_BACKREF — fetch from tape at addr_map(word(IP-delta),...) where delta is small.
- If schema.flags includes OVERLAP_MULTIPLICITY=r, compute r additional offsets and fetch those words (they are not consumed).
- Evaluate embedded CNF fragment(s) using runtime bits; if any fail, fallback logic applies.
- Execute opcode semantics (128-slot semantic set).
CNF schema fragments and global SAT coupling
- CNF fragments are compact Boolean formulas in conjunctive normal form.
- Variables for CNFs are derived from:
 * low-order bits of recent TAPE entries in a sliding window (configurable canonical window size),
 * low-order bits of G,
 * low-order bits of prior k words,
 * witness-derived bit slices.
- Clause length up to 11. Clause count varies. Canonical average ~47 per schema row.
- Because CNFs in different rows reference overlapping tape windows, satisfying all schemas across a generated prefix creates a global SAT problem — this is a primary hardness amplifier.
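Verifying a fragment itself is cheap; only finding satisfying assignments across coupled regions is hard. A minimal evaluator, assuming clauses are stored as signed 1-based variable indices (a DIMACS-style encoding, which is an assumption of this example):

# Sketch: evaluate one CNF fragment against already-derived variable bits.
# Signed, 1-based literal encoding (DIMACS-style) is an assumption of this example.
def eval_cnf(clauses, variables):
    for clause in clauses:                      # every clause must contain at least one true literal
        if not any((variables[abs(lit) - 1] if lit > 0 else not variables[abs(lit) - 1])
                   for lit in clause):
            return False
    return True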
Witness templates (examples and canonical forms)
Witness templates are parameterized transforms; the microtable row picks a template index and parameters.
Canonical example templates (illustrative)
- Matrix-Power template: derive a 2×2 matrix A from recent_tape_window and G; compute A^e mod M where e up to 2^20 and M a large modulus derived from microtable seed; extract one or more words (little-endian) as witness bits.
- Modular-root template: compute a large modular root of polynomial f(x) seeded by w and tape bits; fold result.
- Polynomial-Fold template: compose multiple polynomials seeded from local context, evaluate at high-degree points, fold to produce witness.
These templates are intentionally compute-heavy for generator-time.
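As an illustration of the Matrix-Power template's core cost, a straightforward square-and-multiply implementation. How A, e and M are derived from the tape window, G and the microtable seed, and how the result is folded into witness bits, are not shown here.

# Sketch: core of the Matrix-Power witness template (A^e mod M for a 2x2 matrix).
# Derivation of A, e, M from local context and the final fold are not shown.
def mat_mul_2x2(A, B, M):
    return [
        [(A[0][0] * B[0][0] + A[0][1] * B[1][0]) % M, (A[0][0] * B[0][1] + A[0][1] * B[1][1]) % M],
        [(A[1][0] * B[0][0] + A[1][1] * B[1][0]) % M, (A[1][0] * B[0][1] + A[1][1] * B[1][1]) % M],
    ]

def matrix_power_witness(A, e, M):
    R = [[1, 0], [0, 1]]                  # identity
    while e:
        if e & 1:
            R = mat_mul_2x2(R, A, M)
        A = mat_mul_2x2(A, A, M)
        e >>= 1
    return R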
Proof-of-work (PoW) specifics
- PoW predicate is a canonical cheap-to-verify but expensive-to-find constraint.
- Canonical PoW: Mix128(witness_lo, witness_hi, small_context) has at least t trailing zero bits in its 64-bit little-endian representation. Canonical t = 20.
- PoW is used to bias generation cost without affecting runtime verification cost significantly.
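Verifying the PoW predicate is a single Mix128 call plus a mask test; the expensive part is finding a witness that passes. In the sketch below, folding small_context into the low input is an assumption (the text above writes Mix128 with three arguments without fixing the fold).

# Sketch: PoW verification. Folding small_context into the low input is an assumption.
def pow_check(witness_lo, witness_hi, small_context, t=20):
    lo, hi = Mix128(witness_lo ^ small_context, witness_hi)
    return (lo & ((1 << t) - 1)) == 0     # at least t trailing zero bits in the low 64-bit word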
Address mapping (addr_map_128)
Address mapping folds literal big integers into tape keys using prefix checksum and G so that identical literals map to different tape locations across contexts.
# addr_map_128 (canonical)
def addr_map_128(a_bigint, G_lo, G_hi, plane, IP, prefix_checksum_lo, prefix_checksum_hi):
    # plane and IP are part of the canonical signature but unused in this fold
    a_lo = a_bigint & ((1 << 64) - 1)
    a_hi = (a_bigint >> 64) & ((1 << 64) - 1)
    lo, hi = Mix128(a_lo ^ prefix_checksum_lo, a_hi ^ prefix_checksum_hi)
    mapped_lo = (((lo * 0x9E3779B97F4A7C15) + 0xC2B2AE3D27D4EB4F) & ((1 << 64) - 1)) ^ (hi ^ G_lo)
    mapped_hi = (((hi * 0xC2B2AE3D27D4EB4F) + 0x9E3779B97F4A7C15) & ((1 << 64) - 1)) ^ (lo ^ G_hi)
    return (mapped_hi << 64) | mapped_lo
Instruction pointer (IP) update canonical formula
After an instruction executes and produces last_result (a 128-bit pair), update IP via an avalanche mixing formula.
# canonical IP update
# last_result_lo, last_result_hi are the 64-bit halves of a 128-bit last_result
x_lo, x_hi = Mix128(G_lo ^ last_result_lo, prefix_checksum_hi ^ last_result_hi)
step_mix = ((x_lo * 0x5851F42D4C957F2D + x_hi) >> 61) & 0xFFFFFFFF
IP = (IP + 1 + row.step_bias + step_mix) % plane_word_count
Notes:
- plane_word_count = number of words in the current plane.
- row.step_bias is obtained from the authoritative microtable row.
Opcode set and semantics (dense canonical list)
There are 128 semantic opcode slots; microtables permute which numeric candidate maps to which semantic slot. Representative semantics (not exhaustive) include:
- Flow control: NOP, JMP, JZ (jump if zero), JNZ, CALL, RET.
- Stack/Tape: PUSH_LITERAL, POP, DUP, LOAD (addr_map_128), STORE.
- Arithmetic: ADD, SUB, MUL, DIV, DIVMOD, MODPOW.
- Linear algebra: MATRIX_PUSH, MATRIX_POP, MATRIX_MUL, MATRIX_EXP, MATRIX_VECTOR.
- Number theory: POLY_COMPOSE, POLY_EVAL, PRIME_CHECK (Miller–Rabin rounds seeded by G).
- Bit/trit transforms: ROT_TRITS, PERMUTE_BITS, BITOP (AND/OR/XOR), PACK_TRITS.
- Hash/mix: HASH_PUSH (Mix128 fold), HASH_VERIFY.
- SAT/PoW helpers: SAT_ASSIST (propose witness candidates), SAT_VERIFY, PROOF_OF_WORK_VERIFY.
- IO: IO_READ, IO_WRITE.
- Tape ops: BLOCK_COPY, ROTATE_TAPE, COPY_WITH_TRANSFORM.
- Many micro-variants (e.g., MUL_MOD, MUL_FAST, MUL_BIGINT) populate remaining slots.
Argument types and interpretation
A schema's arg_types_mask marks each argument as one of:
- LITERAL — use the raw word value.
- RELATIVE_ADDR — interpret as relative tape offset.
- IMPLICIT_TAPE_FETCH — fetch tape value at computed addr_map.
- IMPLICIT_BACKREF — fetch tape value at addr computed from earlier word(IP-delta).
- COMPUTED — compute argument from witness-derived bytes.
Overlap, shadowing, multi-overlap details
- If schema.flags has OVERLAP_MULTIPLICITY = r, compute r additional offset words (deterministic function of G, w, schema_template). These overlapped words are read but not consumed — they remain available.
- Overlap windows may shadow arguments of other decoded instructions, introducing tangled dependencies and often requiring backtracking generation.
Generator design (detailed, canonical algorithm)
A generator must produce a program prefix such that executing the interpreter on it yields desired semantics (e.g., print a character). This is a search+simulation task with global constraints.
High-level generator loop (canonical)
state = initial_state(seed=0)   # includes G, TAPE, STACK, prefix_trits
emitted_words = []
while not finished:
    target_semantic = next_desired_semantic_operation()
    m = micro_idx_from(state.G, prefix_checksum)
    row = reconstruct_microtable_row(m)
    # Deterministic guided search for a candidate word w
    for candidate_index in deterministic_candidate_sequence(seed_for_row):
        w = index_to_tritword(candidate_index, W)
        # Stage-1 decode simulation
        cand_opcode_candidate = fold_to_7bits(Mix128(parts_of_w_lo, parts_of_w_hi) ^ state.G_fold)
        cand_schema = row.schema_table[cand_opcode_candidate]
        # Stage-2: attempt to satisfy witness/CNF/PoW by searching the small associated
        # space or by appending helper words
        if can_satisfy_schema(cand_schema, state, w):
            # Accept the candidate: append w and any required helper words
            append w and helpers to emitted_words and prefix_trits
            simulate_execution_of_emitted_words(state)   # updates G, TAPE, etc.
            break
    if no candidate found:
        deterministically backtrack or expand search (per canonical backtracking policy)
Key generator subsystems
- Deterministic candidate sequence: walk candidate space in a deterministic pseudo-random but repeatable order seeded from Mix128 (so generator is deterministic).
- can_satisfy_schema(): entails localized search for witness preimages, SAT-assisted local CNF solving, PoW nonce search (bounded), and possible appending of helper words to satisfy overlapped CNFs.
- Backtracking policy: canonical deterministic depth-first with fixed heuristics and pruning; generators must be deterministic.
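One way to realize the deterministic candidate sequence is a counter-mode walk driven by Mix128, sketched below. The exact construction is an assumption; any fixed, reproducible enumeration of the 3^W word space satisfies the determinism requirement.

# Sketch: repeatable pseudo-random enumeration of candidate word indices.
# Counter-mode use of Mix128 here is an assumption of this example.
def deterministic_candidate_sequence(seed_lo, seed_hi, W=23, limit=1 << 20):
    space = 3 ** W
    for i in range(limit):
        lo, hi = Mix128(seed_lo ^ i, seed_hi ^ i)
        yield (lo ^ hi) % space          # candidate index into the 3**W word space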
Heuristics and performance tricks (required for any practical generator on non-huge tasks)
- Cache microtable reconstructions for repeated micro_idx.
- Maintain inverse maps for frequently used microtable rows (map desired opcode → small candidate subset).
- SAT-assisted localized solving: translate overlapping CNF into a small SAT instance and solve with backtracking using canonical branching heuristics.
- Deterministic parallelization (sharding): split candidate_index intervals across deterministic workers, then recombine in canonical order to maintain reproducibility.
Complexity analysis and file-size estimates
Below are order-of-magnitude estimates to convey why tiny semantics can yield massive output in canonical mode.
Per-word candidate space
- For canonical W=23, candidate words per instruction = 3^23 ≈ 9.4 × 10^10.
- If a generator must examine, say, 2^30 (~1.07 × 10^9) candidates on average per emitted instruction (plausible under heavy CNF/witness constraints), cost is huge.
Trit → bytes (text vs packed)
- Text encoding (one ASCII '0'/'1'/'2' per trit): 1 trit = 1 byte.
- Packed encoding: can pack 5 trits per byte (3^5 = 243) → ~0.2 bytes/trit.
- Example: 1 trillion trits ≈ 1 TB (text) or ≈ 200 GB (packed).
Practical file-size scale for a minimal "print '0'" program (very rough)
- If generator outputs billions/trillions of auxiliary trits to satisfy global CNFs and PoW: 10^11–10^13 trits → 100 GB–10 TB (text) or ~20 GB–2 TB (packed).
- Toy ease modes drastically shrink this: ease=3 might produce a few dozen words only (tens to hundreds of bytes).
Ease modes (toy profiles)
Implementers should support toy eases for testing:
- ease=3 (demo): W=11, k=1, microtable_rows=17, H0=61. Minimal CNFs, witness complexity trivial, PoW disabled. Use for test vectors and education.
- ease=2 (medium): W=15, k=2, microtable_rows=257, H0=257. Moderate hardness.
- ease=1 (small): W=19, k=3, microtable_rows=1024, H0=1024. Intermediate difficulty.
- ease=0 (canonical): W=23, k=5, microtable_rows=65537, H0=4093, full CNF/witness/PoW.
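For reference implementations it can help to keep these profiles in one place. The sketch below collects only the parameters listed above; the dictionary layout is an assumption of this example, and the pow_trailing_zeros entries for ease=1 and ease=2 are left as None because this page does not fix them.

# Ease profiles as listed above. None marks values this page does not pin down.
EASE_PROFILES = {
    3: {"W": 11, "k": 1, "microtable_rows": 17,    "H0": 61,   "pow_trailing_zeros": None},  # PoW disabled
    2: {"W": 15, "k": 2, "microtable_rows": 257,   "H0": 257,  "pow_trailing_zeros": None},
    1: {"W": 19, "k": 3, "microtable_rows": 1024,  "H0": 1024, "pow_trailing_zeros": None},
    0: {"W": 23, "k": 5, "microtable_rows": 65537, "H0": 4093, "pow_trailing_zeros": 20},
}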
Example toy programs and interpreter traces (ease=3)
Below are tiny example programs in ease=3 that demonstrate decode flow. Use these as canonical test vectors.
Example A: print "0" in ease=3
(These trit strings are illustrative; exact microtable/permutation depends on prefix, but toy eases are simple enough to demonstrate.)
- Program trits (plane interleaving trivial k=1): `00200121011` (11 trits = 1 word at W=11)
- Packed file size: ~3 bytes (if packed).
- Expected interpreter trace:
 * the program is shorter than H0=61, so the entire program serves as the prefix; microtables use small defaults.
* decode word → resolves to IO_WRITE('0') → prints "0".
Example B: decode trace step-by-step (toy)
IP=0, read w = (value 0x1A3)
Stage1: Mix128(parts_of_w_lo, parts_of_w_hi) => seed -> opcode_candidate = 0x12
Lookup row.schema[0x12] => candidate schema: IO_WRITE, arg_base=1
Stage2: witness template trivial in ease=3 -> verify true
Execute IO_WRITE with literal arg -> output '0'
Update IP via the IP update formula
Halt or continue
(For canonical ease=0, provide full test vectors only alongside a reference interpreter, because microtables depend on prefix checksum.)
Reference implementation guidance
- Use big integers and exact masking to reproduce Mix128 and addr_map_128 behavior.
- Keep deterministic iteration orders (e.g. when parsing microtable RNG stream).
- Publish sample test vectors for ease=3 and ease=2 with expected traces: IP, G, micro_idx, opcode chosen, witness values, tape diffs.
- Implement separate modes: verify-only interpreter (no microtable generation included in file) vs self-contained runnable files that include microtable serialization.
Serialization & recommended distribution practice
- For canonical releases containing full microtables (for offline verification without re-generation), include:
 * canonical file header,
 * prefix trits (H0),
 * packed microtable blob (if disk size is tolerable — massive for ease=0),
 * plane-word stream (packed).
- For generated programs intended for distribution under canonical ease=0: prefer to distribute in packed `.aby9p` format to reduce size by ~5x.
Testing and interoperability
Essential: publish reproducible test vectors and a reference interpreter+generator pair. Minimal required test vectors:
- ease=3: tiny program that prints a character with full trace.
- ease=2: slightly larger example demonstrating overlap or a simple CNF.
- ease=1: moderate example exercising witness template.
- Canonical: sample microtable row dump and a very small verified fragment (if possible) with full trace.
Security & resource notes
- Generators for canonical mode can consume extreme compute and storage — never run untrusted canonical generators on machines with low storage or on metered environments.
- Interpreters must defend against abusive programs that request huge memory via crafted addr_map_128 keys; provide validated resource limits.
FAQ (common questions)
Q: Is the language Turing-complete?
Yes — with tape, conditional jumps and subroutines, it can simulate Minsky machines or Brainfuck.
Q: Why are programs huge for trivial semantics?
Because canonical generators must satisfy globally coupled CNF constraints, witness templates, PoW, and microtable dependencies — the generator emits many auxiliary words to satisfy those constraints.
Q: Can I hand-write programs for ease=0?
Not practically. Hand-writing is feasible only in toy eases (ease=2/3).
Q: Why Mix128?
Mix128 provides strong avalanche behavior: tiny state changes produce large changes in decode mapping, amplifying hardness.
Appendix: copyable canonical snippets
Mix128, addr_map_128, and the IP update formula are all included earlier; they are repeated here for easy copying:
# Mix128 --- copy verbatim for canonical implementations
def mix64(v):
    v = ((v ^ (v >> 30)) * 0xBF58476D1CE4E5B9) & ((1 << 64) - 1)
    v = ((v ^ (v >> 27)) * 0x94D049BB133111EB) & ((1 << 64) - 1)
    v = v ^ (v >> 31)
    return v & ((1 << 64) - 1)

def Mix128(lo_in, hi_in):
    a = mix64(lo_in ^ 0x243F6A8885A308D3)
    b = mix64(hi_in ^ 0x13198A2E03707344)
    lo_out = (a ^ ((b << 1) & ((1 << 64) - 1))) & ((1 << 64) - 1)
    hi_out = (b ^ ((a << 1) & ((1 << 64) - 1))) & ((1 << 64) - 1)
    return lo_out, hi_out
# addr_map_128
def addr_map_128(a_bigint, G_lo, G_hi, plane, IP, prefix_checksum_lo, prefix_checksum_hi):
    # plane and IP are part of the canonical signature but unused in this fold
    a_lo = a_bigint & ((1 << 64) - 1)
    a_hi = (a_bigint >> 64) & ((1 << 64) - 1)
    lo, hi = Mix128(a_lo ^ prefix_checksum_lo, a_hi ^ prefix_checksum_hi)
    mapped_lo = (((lo * 0x9E3779B97F4A7C15) + 0xC2B2AE3D27D4EB4F) & ((1 << 64) - 1)) ^ (hi ^ G_lo)
    mapped_hi = (((hi * 0xC2B2AE3D27D4EB4F) + 0x9E3779B97F4A7C15) & ((1 << 64) - 1)) ^ (lo ^ G_hi)
    return (mapped_hi << 64) | mapped_lo
# IP update
x_lo, x_hi = Mix128(G_lo ^ last_result_lo, prefix_checksum_hi ^ last_result_hi)
step_mix = ((x_lo * 0x5851F42D4C957F2D + x_hi) >> 61) & 0xFFFFFFFF
IP = (IP + 1 + row.step_bias + step_mix) % plane_word_count