Abyssal-8
Abyssal-8 (Deterministic Ultimate Mode) — file extension .aby8 — is a deliberately **extremely unreadable** esoteric language designed so that program text is effectively inscrutable to humans and to heuristic analysis, yet **deterministic**: the *same file bytes always produce the same behavior forever* (no external randomness, no timestamps, no hidden environment entropy). This page is the canonical, final deterministic specification. It raises complexity beyond Malbolge by stacking many independent, deterministic anti-analysis mechanisms while keeping all sources of entropy inside the file itself.
**Design goals (final):**
- Determinism: same bytes → same semantics forever (reproducible).
- Maximal ambiguity by construction (overlapping windows, context sensitivity, self-modifying instruction map, address permutation).
- Practical reproducibility: interpreters deriving seeds only from file bytes (header + first N significant bytes) will produce identical results.
- Usable test examples: a canonical "reference mapping" is included for reproducible examples (Hello, World!, tests).
Summary
- The very first line of every Abyssal-8 file MUST be the exact header:
-[name.aby8: <program_name>]-
- File is read as a **single stream** of significant bytes (printable ASCII 33–126, excluding backtick). **Whitespace and newlines are ignored**.
- Tokenization: sliding overlapping windows of length **1..4** (inclusive), step = 1. Each stream index i produces windows [i], [i..i+1], [i..i+2], [i..i+3]. The map resolves which window length is used, but the lexer ALWAYS advances by 1 (creating heavy aliasing).
- All seeds (instruction map, address permutation, arithmetic scramble, stack-permutation, mapmod seed) are derived deterministically from the file bytes (header + first N significant characters). No external entropy or wall-clock dependence.
- The instruction map M (mapping textual windows → abstract primitives, built from a seeded bijective shuffle of window ids) is deterministic per file and may be mutated by program execution (self-modifying map); mutations are deterministic functions of prior state and the file stream, so runs remain reproducible.
- Default interpreter behavior: when you run `program_name()`, the interpreter resets VM state and executes deterministically. Rerunning the same file later returns the same outputs.
Complete lexical rules
- Significant bytes: printable ASCII characters with decimal codes 33..126 (space, code 32, is whitespace and is ignored) EXCEPT backtick (96). All other bytes are ignored silently.
- Allowed significant characters (for clarity):
  * letters: a–z, A–Z (case-sensitive)
  * digits: 0–9
  * symbols: , . / ? < > ' " : ; \ | } { [ ] = + - _ ( ) * & ^ % $ # @ ! ~
- Whitespace characters (space, tab, CR, LF) are ignored — they are only for human readability.
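The lexical filter can be sketched in a few lines of Python. This is a sketch, not the reference interpreter. It takes the significant range to be 33..126 minus backtick, because space (32) is listed as ignored whitespace. The names `is_significant` and `significant_bytes` are hypothetical.

```python
BACKTICK = 96  # explicitly excluded by the lexical rules


def is_significant(b: int) -> bool:
    """True if byte value b takes part in tokenization."""
    # 33..126 is printable ASCII without space; space is ignored as whitespace.
    return 33 <= b <= 126 and b != BACKTICK


def significant_bytes(data: bytes) -> bytes:
    """Drop whitespace, backtick, control bytes and non-ASCII; keep the rest in order."""
    return bytes(b for b in data if is_significant(b))


# Example: line breaks, tabs, spaces, backticks and the 0xFF byte all vanish.
stream = significant_bytes(b"GbR GbR\r\n\trKe `x`\xff")
```

Here `stream` becomes `b"GbRGbRrKex"`. Every later stage of the pipeline sees only this contiguous stream.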
Mandatory header and invocation
- Header (first line) - exact bytes:
-[name.aby8: program_name]-
- Invocation (appears anywhere after header):
program_name()
program_name(N)    ; N is base-10 decimal; the interpreter resets the VM between runs by default
- If the header is missing or malformed, the interpreter MUST reject the file (interpreters MAY provide a "forgive-header" debug flag, but per the spec, files on the wiki must include the header).
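The header check and invocation lookup could look like the following. This is a hedged sketch only; `parse_header` and `find_invocations` are hypothetical names. The spec does not define which characters a program name may contain, so the identifier pattern below is an assumption. So is tolerating a trailing CR on the first line.

```python
import re
from typing import List, Optional

# Assumption: program names are identifiers (letter/underscore, then letters,
# digits, underscores); the spec does not define the allowed characters.
HEADER_RE = re.compile(r"-\[name\.aby8: ([A-Za-z_][A-Za-z0-9_]*)\]-")


def parse_header(text: str) -> str:
    """Return the program name from the mandatory first line, or raise ValueError."""
    first_line = text.split("\n", 1)[0].rstrip("\r")  # CRLF tolerance is an assumption
    match = HEADER_RE.fullmatch(first_line)
    if match is None:
        raise ValueError("missing or malformed Abyssal-8 header; file rejected")
    return match.group(1)


def find_invocations(body: str, name: str) -> List[Optional[int]]:
    """List the calls name() / name(N) in order; None stands for a call without N."""
    calls = re.findall(re.escape(name) + r"\((\d*)\)", body)
    return [int(n) if n else None for n in calls]
```

For example, `find_invocations("GbRhello_ref()hello_ref(3)", "hello_ref")` gives `[None, 3]`.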
Comments and "comment-duality"
Abyssal-8 intentionally inverts one comment-like form to confuse readers while still keeping strict semantics.
- `[ ... ]`: **Real comment**. Everything between the square brackets MUST be ignored by the interpreter (no effect on seeds, map, or execution).
- `-[ ... ]-`: **Fake comment (executable)**. Text inside `-[ ... ]-` is *not* ignored: those bytes are injected back into the main instruction stream exactly as if they occurred at that position in the file (i.e., they produce sliding windows and contribute to seed-based maps). This is purely textual: the `-[` and `]-` markers are delimiters; what is inside participates in tokenization and mapping.
**Examples:**
[this is a harmless comment]      ; ignored
-[this looks like a comment]-     ; the contents execute (are part of the stream)
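The comment duality can be resolved in a single pass over the significant-byte stream. In this sketch the mandatory header is assumed to be handled separately, since it also has the `-[...]-` shape. The function name `apply_comments` is hypothetical, and the spec leaves several rules open that this sketch fixes by assumption:
- comments do not nest;
- `[` closes at the first following `]`;
- a span counts as a fake comment only when `-` directly precedes `[` and directly follows `]`;
- an unterminated `[` is kept verbatim.

```python
def apply_comments(stream: bytes) -> bytes:
    """Drop real comments [ ... ] and splice fake comments -[ ... ]- back in.

    Assumptions: no nesting; '[' closes at the first following ']'; the span is a
    fake comment only when '-' immediately precedes '[' and follows ']';
    an unterminated '[' is kept verbatim.
    """
    out = bytearray()
    i = 0
    while i < len(stream):
        if stream[i] != ord("["):
            out.append(stream[i])
            i += 1
            continue
        close = stream.find(b"]", i + 1)
        if close == -1:  # unterminated bracket: keep the rest as-is
            out += stream[i:]
            break
        inner = stream[i + 1:close]
        if out.endswith(b"-") and stream[close + 1:close + 2] == b"-":
            out.pop()        # remove the '-' of the opening '-[' delimiter
            out += inner     # fake comment: its bytes execute in place
            i = close + 2    # skip the closing ']-'
        else:
            i = close + 1    # real comment: contents vanish
    return bytes(out)
```

For example, `apply_comments(b"GbR[note]rKe-[eHr]-")` yields `b"GbRrKeeHr"`. The real comment disappears and the fake comment's bytes join the stream.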
How determinism is guaranteed
All random-seeming behavior is computed from the file bytes only:
1. The interpreter builds a canonical list of "significant bytes" (header + first N, where N ≥ 64 is recommended).
2. It computes SHA-256 over a deterministic canonicalization of the file (e.g., the header followed by the first N significant bytes) and derives seeds from that hash:
   * S_instr = SHA-256(header || firstN) (the full 32-byte digest)
   * S_addr = a further 32 bytes from deterministic expansion (SHA-256 yields only 32 bytes, so these come from HKDF expansion or an HMAC keyed with S_instr, not from a "next" slice of the digest)
   * S_arith, S_stack, S_mapmod are derived similarly by deterministic expansion (HKDF-like).
3. All PRNG outputs (permutations, Feistel keys, HMAC operands) are computed from those seeds only.
4. Any "self-modifying" step mutates maps deterministically using the running state and predetermined mixing functions (HMAC-SHA256 or ChaCha20 with the derived seeds). No system time or hardware randomness is used.
Because the whole pipeline uses only file bytes, runs are reproducible and portable between machines/interpreters following the spec.
Tokenization, windows and aliasing (the heart of unreadability)
- At stream index i (0-based on significant bytes), the lexer forms overlapping windows:
  * W1 = [i]
  * W2 = [i..i+1] (if available)
  * W3 = [i..i+2] (if available)
  * W4 = [i..i+3] (if available)
- The instruction map resolution function Resolve(W1,W2,W3,W4, history_hash, M) deterministically selects which window length to treat as the "active" token and returns the mapped primitive and operands. The lexer then advances i by **exactly 1** (not by token length).
- Because every byte participates in up to 10 overlapping windows (k windows of each length k = 1..4), local edits cascade nonlocally; this is the primary confusion mechanism.
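A small sketch of the window generation shows the aliasing concretely. Four candidate windows start at each index. Counting windows of every length that contain a given interior byte gives 1 + 2 + 3 + 4 = 10. Resolve itself is not modelled here. The names `windows_at`, `lex` and `windows_covering` are hypothetical.

```python
from typing import Iterator, List, Tuple


def windows_at(stream: bytes, i: int) -> List[bytes]:
    """Candidate windows W1..W4 starting at index i; windows past the end are omitted."""
    return [stream[i:i + k] for k in range(1, 5) if i + k <= len(stream)]


def lex(stream: bytes) -> Iterator[Tuple[int, List[bytes]]]:
    """Yield (i, candidates) for every index; the lexer advances by exactly 1."""
    for i in range(len(stream)):
        yield i, windows_at(stream, i)


def windows_covering(stream: bytes, j: int) -> int:
    """Count the windows (over all start indices) that contain byte j."""
    return sum(
        1
        for i in range(max(0, j - 3), j + 1)
        for w in windows_at(stream, i)
        if i + len(w) > j
    )
```

On `b"GbRr"`, index 0 offers `G`, `Gb`, `GbR`, `GbRr`, while index 2 offers only `R` and `Rr`. Editing one byte therefore touches every window that covers it.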
Seeds, PRNGs, and deterministic permutations
- Use SHA-256 and HKDF-SHA256 (or ChaCha20 with key derived from SHA-256) for all deterministic cryptographic mixing steps.
- Build the instruction map M by enumerating canonical window encodings (canonicalize each 1..4 byte window to a 32-bit integer id) and shuffling the enumeration with the PRNG seeded by S_instr (Fisher–Yates). The shuffle is a bijection over window ids, and the resulting map is total and single-valued: every window maps to exactly one abstract primitive.
- PERM_ADDR (address permutation) is a bijection over a 64-bit address space constructed from a fixed-round Feistel network keyed by S_addr. Implementations can use a 64-bit-block Feistel network, which is invertible by construction.
- SCR_const and SCR_jump are derived via HMAC-SHA256 keyed by S_arith with context inputs (ASCII bytes, stream index, previous cell values), then truncated to 32 bits. This produces deterministic but data-dependent immediates.
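PERM_ADDR and SCR_const can both be built from HMAC-SHA256 in the standard library. In the sketch below, the round count, the round-function encoding, big-endian truncation, and reading SCR_const as a signed value (so jumps can be negative) are all assumptions. The spec fixes none of them. The function names are hypothetical.

```python
import hashlib
import hmac

MASK32 = 0xFFFFFFFF
ROUNDS = 8  # assumption: the spec only says "fixed-round"


def _round(key: bytes, rnd: int, half: int) -> int:
    """Assumed round function: HMAC-SHA256(key, round || half) truncated to 32 bits."""
    msg = bytes([rnd]) + half.to_bytes(4, "big")
    return int.from_bytes(hmac.new(key, msg, hashlib.sha256).digest()[:4], "big")


def perm_addr(addr: int, s_addr: bytes) -> int:
    """Balanced 64-bit Feistel permutation keyed by S_addr (negative L wraps mod 2**64)."""
    left, right = (addr >> 32) & MASK32, addr & MASK32
    for r in range(ROUNDS):
        left, right = right, left ^ _round(s_addr, r, right)
    return (left << 32) | right


def perm_addr_inv(addr: int, s_addr: bytes) -> int:
    """Inverse permutation: run the same rounds in reverse order."""
    left, right = (addr >> 32) & MASK32, addr & MASK32
    for r in reversed(range(ROUNDS)):
        left, right = right ^ _round(s_addr, r, left), left
    return (left << 32) | right


def scr_const(s_arith: bytes, context: bytes) -> int:
    """HMAC-SHA256(S_arith, context) truncated to 32 bits, read as signed (assumption)."""
    v = int.from_bytes(hmac.new(s_arith, context, hashlib.sha256).digest()[:4], "big")
    return v - (1 << 32) if v >= 1 << 31 else v
```

The Feistel structure is invertible whatever round function is used. A `perm_probe`-style test can therefore check `perm_addr_inv(perm_addr(a, k), k) == a` directly.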
Context-sensitivity (history window)
- Resolve(...) uses a rolling history hash H_history computed from the last K resolved primitives (K = 12 by default). The map selection can consult H_history to produce contextual behavior: same windows reached with different histories may map to different primitives. This increases analysis cost while remaining deterministic because the history hash itself deterministically depends on earlier windows (which themselves depended on earlier history — a well-defined deterministic but analysis-hard system).
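A rolling history over the last K primitives maps naturally onto a bounded deque. This sketch covers only how H_history could be computed, with K = 12 as in the spec. How Resolve consumes the hash is not specified. Hashing NUL-separated primitive names is an assumed encoding, and the class name `History` is hypothetical.

```python
import hashlib
from collections import deque

K = 12  # default history length from the spec


class History:
    """Keeps the last K resolved primitives and hashes them into H_history."""

    def __init__(self, k: int = K) -> None:
        self.recent = deque(maxlen=k)  # oldest entries fall off automatically

    def push(self, primitive: str) -> None:
        self.recent.append(primitive)

    def digest(self) -> bytes:
        # Assumed encoding: primitive names joined with NUL separators, then SHA-256.
        return hashlib.sha256(b"\x00".join(p.encode() for p in self.recent)).digest()
```

Two executions whose last 12 primitives agree produce the same H_history, even if their earlier histories differ. That is what keeps the context sensitivity deterministic.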
Self-modifying instruction map (map evolution)
- Certain primitives (abstract ModMap) mutate M on execution. Mutations are defined as deterministic transformations of M using S_mapmod and the current execution state (e.g., rotate a slice of the bijection, or swap N pairs selected by HMAC with S_mapmod and the step index).
- Because map changes are deterministic (seed + prior state + exact program bytes), repeated runs produce identical map evolution and identical outputs.
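The "swap N pairs selected by HMAC with S_mapmod and the step index" mutation can be sketched as follows. The map is represented as a list indexed by window id. Several details are assumptions:
- `n_pairs = 4`;
- the message layout;
- reducing 64-bit HMAC slices modulo the map size, which has negligible bias for illustration.

The function name `modmap_swap` is hypothetical.

```python
import hashlib
import hmac
from typing import List


def modmap_swap(mapping: List[str], s_mapmod: bytes, step: int, n_pairs: int = 4) -> None:
    """Swap n_pairs entries of M in place; positions come from HMAC(S_mapmod, step || j)."""
    size = len(mapping)
    for j in range(n_pairs):
        msg = step.to_bytes(8, "big") + j.to_bytes(4, "big")
        d = hmac.new(s_mapmod, msg, hashlib.sha256).digest()
        a = int.from_bytes(d[:8], "big") % size
        b = int.from_bytes(d[8:16], "big") % size
        mapping[a], mapping[b] = mapping[b], mapping[a]  # a swap keeps M a permutation
```

Because positions depend only on S_mapmod and the step index, two runs of the same file apply the same swaps in the same order.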
---
Memory model and address permutation
- Abstract VM:
* Infinite tape of 32-bit signed integer cells (wraps two's complement).
* Data pointer P starts at 0.
* Call stack: logical stack for returns and pushes (the stack addressing uses PERM_STACK derived from S_stack, implemented as a bijection over stack indices).
- All memory accesses use PERM_ADDR: logical index L is mapped to physical storage index PERM_ADDR(L, S_addr). This makes neighbor accesses non-obvious unless you compute the permutation.
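The memory model can be sketched as a sparse dictionary keyed by physical index, with every write wrapped to signed 32 bits. The permutation is passed in as a callable. The XOR-based `toy_perm` is only a stand-in bijection for demonstration, not the Feistel-based PERM_ADDR. `wrap32`, `Tape` and `toy_perm` are hypothetical names.

```python
from typing import Callable, Dict


def wrap32(v: int) -> int:
    """Reduce v to signed 32-bit two's complement, as the cell type requires."""
    v &= 0xFFFFFFFF
    return v - (1 << 32) if v & 0x80000000 else v


class Tape:
    """Sparse unbounded tape: logical index L is stored at physical slot perm(L)."""

    def __init__(self, perm: Callable[[int], int]) -> None:
        self.perm = perm
        self.cells: Dict[int, int] = {}  # untouched cells read as 0

    def read(self, logical: int) -> int:
        return self.cells.get(self.perm(logical), 0)

    def write(self, logical: int, value: int) -> None:
        self.cells[self.perm(logical)] = wrap32(value)


def toy_perm(logical: int) -> int:
    """Stand-in bijection (XOR with a constant) for demonstration only; a conformant
    interpreter would use the Feistel-based PERM_ADDR keyed by S_addr."""
    return (logical ^ 0x5A5A) & 0xFFFFFFFFFFFFFFFF
```

Incrementing a cell that holds 2**31 - 1 wraps it to -2**31. Logical cell 0 is physically stored at slot 0x5A5A under this stand-in.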
---
Primitives (abstract)
Abyssal-8 abstracts a modest set of primitives. Concrete windows map to these via M. Implementors must supply these primitives; all maps refer to them:
- IncCell — cell[P] += SCR_const(...)
- DecCell — cell[P] -= SCR_const(...)
- AddCell — cell[P] += cell[PERM_ADDR(P+X)]
- SubCell, MulCell, DivCell, ModCell — arithmetic with scrambled addresses/operands
- MoveP — P := P + SCR_offset(...)
- PermuteP — P := PERM_ADDR(P, S_addr) (explicit remap)
- JZ(offset) — if cell[P] == 0 then stream_index += SCR_jump(...) (relative; can be negative)
- JMP(offset) — unconditional relative jump
- Push — push (P, cell[P]) onto scrambled stack
- Pop — pop -> restore (P, cell[P])
- InASCII — read a byte into cell[P] (deterministic I/O behavior for test harnesses; e.g., tests provide input stream)
- OutASCII — output ASCII of cell[P]
- ModMap — mutate instruction map M deterministically
- NOP — no-op
- Panic/Guard — deterministic encoded-panic primitive that alters map in a predictable, file-determined way if certain invariants fail (used to discourage naive tampering)
Deterministic "anti-analysis" features (still reproducible)
These are all deterministic functions of file bytes and execution state:
1. **Context-sensitive mapping** (K = 12): same window can mean different things depending on last K primitives. Deterministic because history is deterministic.
2. **Temporal map drift**: at deterministic intervals (e.g., every 1024 steps) the mutator mixes H_history into S_mapmod to change M. Since H_history depends only on file and prior steps, this is deterministic.
3. **Trampoline windows**: windows that remap the next R windows via a local temporary map (deterministic selection using S_mapmod).
4. **Encoded payloads**: fake-comments -[enc:BASE64:...]- or -[enc:HEX:...]- are allowed; when the interpreter sees a deterministic guard (e.g., a specific resolved primitive sequence that appears uniquely) it will decode and splice the payload into the stream deterministically. The guard and decoding are both derived from file bytes.
5. **Encrypted immediates**: immediates are computed by HMAC-SHA256(S_arith, context) so the same snippet always yields the same immediate on any conformant interpreter.
All the above *increase analysis difficulty* while still guaranteeing identical semantics across different runs and machines.
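Decoding the encoded payloads of item 4 needs only the standard library. This sketch takes the inner text of a fake comment, without the `-[` and `]-` delimiters. It covers only the two encodings named above. Guard detection, which decides when to splice, is not modelled. The function name `decode_payload` is hypothetical.

```python
import base64
import binascii
from typing import Optional


def decode_payload(fake_comment: bytes) -> Optional[bytes]:
    """Decode the body of -[enc:BASE64:...]- or -[enc:HEX:...]-.

    Returns None when the fake comment is not an encoded payload. Guard
    detection (when to splice) is not modelled here.
    """
    if not fake_comment.startswith(b"enc:"):
        return None
    scheme, sep, data = fake_comment[len(b"enc:"):].partition(b":")
    if not sep:
        return None
    if scheme == b"BASE64":
        return base64.b64decode(data, validate=True)
    if scheme == b"HEX":
        return binascii.unhexlify(data)
    raise ValueError("unknown payload encoding: " + scheme.decode("ascii", "replace"))
```

Both `enc:HEX:476252` and `enc:BASE64:R2JS` decode to the reference token `GbR`. Malformed data raises an exception rather than being silently skipped.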
Exhaustive per-character *role* (general spec)
Below we list what each character class contributes to in the hardened deterministic spec. This is *not* a fixed mapping but documents how characters are used when building windows and contexts. The *Reference Mapping* (next section) gives a concrete mapping used for runnable examples.
- a–z (lowercase): high-probability contributors to arithmetic and pointer-related windows; they appear in many windows that map to arithmetic primitives in typical maps.
- A–Z (uppercase): high-entropy toggles; maps often reserve a proportion of uppercase-containing windows for control-flow or ModMap roles.
- 0–9 (digits): included in SCR_const contexts; in the reference mapping, digits following a `:` immediate marker are parsed (deterministically) as a decimal immediate.
- Symbols (general role, as treated by the reference interpreter):
  * `,`: separator / lightweight NOP window
  * `.`: alignment / small-const producer
  * `/`: often maps to DivCell-like windows
  * `?`: conditional-check contributor
  * `<` / `>`: often map to MoveLeft/MoveRight in many seeded mappings
  * `'` / `"`: quoted-literal toggles in the Reference Mapping only (the hardened spec permits encoded payloads instead)
  * `:`: immediate-value joiner (reference mode)
  * `;` / `_`: padding / alignment
  * `{` / `}`: push/pop stack windows (common)
  * `=`: assignment-like contributor (write)
  * `+` / `-`: arithmetic windows (Inc/Dec or Add/Sub depending on map)
  * `*` / `&` / `^` / `%`: arithmetic / bitwise windows
  * `$` / `#` / `@` / `~` / `!`: high-entropy control windows (often map to ModMap, panic, or loop marks in various seeded maps)
**Note:** these roles are *guides*; the exact mapping is determined per file by the instruction map M built from S_instr.
Reference mapping (concrete deterministic mapping for examples)
To publish reproducible examples on the wiki we provide a canonical *Reference Mapping* and a canonical seed derivation rule. Files that include a protected seed block (optional) or that are intended for the wiki examples must follow this mapping to produce identical results across interpreters.
**Reference seed derivation (canonical):**
- The interpreter forms the canonical significant-bytes vector: header line + first 64 significant bytes after header (pad with zero bytes if fewer).
- Compute SHA-256 of that vector.
- Derive S_instr, S_addr, S_arith, S_stack, S_mapmod by HKDF-SHA256 expansion of that hash with labeled info strings.
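The reference derivation above can be sketched with the standard library alone: hashlib provides SHA-256, and hmac provides an RFC 5869 HKDF (extract-then-expand). The spec says only "labeled info strings". The info labels (`abyssal8/S_instr` and so on) are therefore assumptions, as are leaving the newline off the header line and the function names.

```python
import hashlib
import hmac
from typing import Dict

HASH_LEN = 32
LABELS = ["S_instr", "S_addr", "S_arith", "S_stack", "S_mapmod"]


def canonical_vector(header_line: bytes, significant: bytes, n: int = 64) -> bytes:
    """Header line followed by the first n significant bytes, zero-padded to n."""
    first_n = significant[:n]
    return header_line + first_n + b"\x00" * (n - len(first_n))


def hkdf_sha256(ikm: bytes, info: bytes, length: int = HASH_LEN, salt: bytes = b"") -> bytes:
    """RFC 5869 HKDF with SHA-256; an empty salt means HashLen zero bytes."""
    prk = hmac.new(salt or b"\x00" * HASH_LEN, ikm, hashlib.sha256).digest()  # extract
    okm, block, counter = b"", b"", 1
    while len(okm) < length:  # expand: T(i) = HMAC(PRK, T(i-1) || info || i)
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]


def derive_seeds(header_line: bytes, significant: bytes) -> Dict[str, bytes]:
    """SHA-256 of the canonical vector, expanded into one 32-byte seed per label."""
    digest = hashlib.sha256(canonical_vector(header_line, significant)).digest()
    # Hypothetical info strings; the spec does not publish the exact labels.
    return {label: hkdf_sha256(digest, b"abyssal8/" + label.encode()) for label in LABELS}
```

Only bytes from the file enter `derive_seeds`, so identical files yield identical seeds on any machine.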
**Reference token→primitive subset (for examples):**
(only a SMALL subset of full bijection is shown — interpreters implementing the reference for examples must at minimum implement these)
- `GbR` → IncCell (cell[P] += 1)
- `FnJ` → DecCell (cell[P] -= 1)
- `eHr` → MoveRight(1)
- `kGb` → MoveLeft(1)
- `rKe` → OutASCII
- `NjH` → InASCII
- `bFr` → MulCell (cell[P] *= cell[PERM_ADDR(P-1)])
- `JeB` → DivCell (cell[P] //= cell[PERM_ADDR(P-1)])
- `HrK` → JZ (jump by SCR_jump if cell[P] == 0)
- `@` → loop-start marker (reference convenience)
- `#` → loop-end marker (reference convenience)
- `{` → Push
- `}` → Pop
- `:` → immediate-join (digits following form an immediate)
- `"` → literal mode toggle (reference convenience only)
- `$panic` → deterministic encoded panic (reference-only; mutates map in a file-determined way)
**Important:** The full bijection is large (all 1..4 byte windows). The wiki's reference interpreter deterministically builds the full bijection using S_instr and then maps windows to primitives. The above subset is purely for human-readable examples.
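Building the full map starts with canonical window ids and a seeded Fisher–Yates shuffle. Over the 93 significant bytes there are 93 + 93² + 93³ + 93⁴ = 75,618,300 windows, which fits comfortably in a 32-bit id. The demo below uses the 3-byte alphabet `GbR` (120 windows) to stay small. The spec says only "the PRNG seeded by S_instr". The SHA-256 counter-mode generator here is an assumed stand-in, chosen because its output is fixed by the seed alone. The class and function names are hypothetical. How shuffled ids are assigned to primitives is not spelled out, so that step is omitted.

```python
import hashlib
from typing import List

# The 93 significant bytes: printable ASCII 33..126 without backtick.
ALPHABET = bytes(b for b in range(33, 127) if b != 96)


class CounterPRNG:
    """Assumed deterministic PRNG: SHA-256(seed || 64-bit counter) blocks, read as 32-bit words."""

    def __init__(self, seed: bytes) -> None:
        self.seed, self.counter, self.buf = seed, 0, b""

    def _u32(self) -> int:
        if len(self.buf) < 4:
            self.buf += hashlib.sha256(self.seed + self.counter.to_bytes(8, "big")).digest()
            self.counter += 1
        word, self.buf = self.buf[:4], self.buf[4:]
        return int.from_bytes(word, "big")

    def below(self, n: int) -> int:
        """Uniform integer in [0, n), using rejection sampling to avoid modulo bias."""
        limit = (1 << 32) - (1 << 32) % n
        while True:
            v = self._u32()
            if v < limit:
                return v % n


def window_id(window: bytes, alphabet: bytes = ALPHABET) -> int:
    """Canonical id: all shorter windows first, then the window as a base-|alphabet| number."""
    base = len(alphabet)
    offset = sum(base ** k for k in range(1, len(window)))  # ids used by shorter windows
    digits = 0
    for b in window:
        digits = digits * base + alphabet.index(b)
    return offset + digits


def fisher_yates(items: List[int], rng: CounterPRNG) -> List[int]:
    """Shuffled copy of items (Durstenfeld's Fisher–Yates, applied to a copy)."""
    out = list(items)
    for i in range(len(out) - 1, 0, -1):
        j = rng.below(i + 1)
        out[i], out[j] = out[j], out[i]
    return out
```

With this scheme `window_id(b"GG", b"GbR")` is 3, directly after the three 1-byte windows. Re-running the shuffle with the same seed reproduces the same permutation.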
Reproducible Example: "Hello, World!" (Reference, deterministic)
We provide two deterministic examples that run on any interpreter implementing the Reference Mapping above.
A) Short & reproducible — Reference literal convenience (recommended for quick testing)
This uses the reference interpreter's deterministic literal mode. The literal mode is merely a convenience that is deterministic and reproducible — it is not required by the hardened core, but it appears in the reference interpreter to make demonstration practical.
-[name.aby8: hello_ref]-
-[seed: CANON=0123456789ABCDEF0123456789ABCDEF]-   ; optional, ensures reference seed exactly
"Hello, World!"rKe
hello_ref()
**Why this is deterministic:** the CANON seed line is part of the file bytes and therefore part of the seed derivation; the literal mode of the reference interpreter deterministically outputs each character as OutASCII. Any conformant interpreter using the reference derivation will print the same bytes.
B) Pure-gibberish deterministic Hello-World (no literal mode) — low-level
This shows how to print "Hello, World!" using the IncCell / OutASCII primitives only (reference mapping). This is tedious but fully deterministic and follows the rules above. Below is a compacted single-line stream (whitespace ignored) — interpreters must treat it as a contiguous stream of significant bytes.
> NOTE: For brevity here we show a deterministic, human-compressed view. When pasting into the wiki put the contents in one continuous stream (no linebreak significance).
-[name.aby8: hello_lowlvl]-
-[seed: CANON=C0FFEEBEEF0123456789ABCDEF012345]-
GbRGbRGbRGbRGbRGbRGbRGbRGbRGbRGbRGbRGbRGbR rKe eHr GbR... rKe ...
hello_lowlvl()
**Explanation (how to expand):**
- Each `GbR` increments cell[0] by 1 (IncCell). Repeating it 72 times sets cell[0] = 72 (ASCII 'H'); then `rKe` outputs it. `eHr` moves the pointer right to cell[1], and so on. To produce all bytes in "Hello, World!" the program issues deterministic sequences of `GbR` (or combined operations using SCR_const) and `rKe` with deterministic map-derived offsets. The above is intentionally tedious; it is given to demonstrate pure-token deterministic encoding.
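The expansion just described is mechanical, so a short generator can produce it. The sketch follows the scheme above: one fresh cell per character, `ord(c)` copies of `GbR`, then `rKe`, then `eHr`. It assumes every spelled-out token resolves to its reference primitive. The toy `simulate` check reads the body as back-to-back 3-byte tokens, ignoring sliding-window aliasing, context sensitivity and map drift. It is therefore only a sanity check of the counts, not a conformant run. `lowlvl_body`, `expand_lowlvl` and `simulate` are hypothetical names.

```python
INC, OUT, RIGHT = "GbR", "rKe", "eHr"  # reference tokens: IncCell(+1), OutASCII, MoveRight(1)


def lowlvl_body(message: str) -> str:
    """One fresh cell per character: ord(c) x IncCell, then OutASCII, then MoveRight."""
    return "".join(INC * ord(c) + OUT + RIGHT for c in message)


def expand_lowlvl(message: str, name: str = "hello_lowlvl") -> str:
    """Full file text: header, the example's seed line, the body, then the invocation."""
    seed = "-[seed: CANON=C0FFEEBEEF0123456789ABCDEF012345]-"
    return f"-[name.aby8: {name}]-\n{seed}\n{lowlvl_body(message)}\n{name}()\n"


def simulate(body: str) -> str:
    """Toy check that reads the body as back-to-back 3-byte tokens (ignores aliasing)."""
    cells, p, out = {}, 0, []
    for k in range(0, len(body), 3):
        token = body[k:k + 3]
        if token == INC:
            cells[p] = cells.get(p, 0) + 1
        elif token == OUT:
            out.append(chr(cells.get(p, 0)))
        elif token == RIGHT:
            p += 1
    return "".join(out)
```

For "Hello, World!" the body contains 1129 `GbR` tokens, the sum of the thirteen character codes. The newlines emitted by `expand_lowlvl` are insignificant whitespace.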
Reproducibility guarantees (explicit)
- Given two conformant interpreters that implement the spec (particularly: seed derivation from header + firstN; canonical cryptographic mixing; PERM_ADDR construction; and deterministic map mutation rules) and the same file bytes, they MUST produce identical outputs and identical final VM state (up to implementation-defined limits like maximum integer width which the spec sets to 32-bit two's complement).
- No interpreter should incorporate system clock, file metadata, or other non-file entropy into seed derivation unless explicitly invoked by a non-default debug flag. Such interpreters will not be conformant to the deterministic spec.
Debugging, test-suite and canonical seeds
- The wiki should host a canonical reference interpreter and a test-suite that includes:
  * `hello_ref.aby8` (literal example above)
  * `hw_lowlvl.aby8` (explicitly expanded low-level variant)
  * `seed_probe.aby8`: outputs hash-derived seeds (debug-only)
  * `perm_probe.aby8`: verifies the PERM_ADDR bijection
- Interpreters may expose `--debug-seeds` and `--trace` flags for implementors; the debug output MUST NOT change program semantics.
---
Implementation checklist (for conformance)
1. Read file; canonicalize significant bytes (header + first N where N≥64 recommended).
2. Derive seeds via SHA-256 + HKDF expansion deterministically.
3. Enumerate all 1..4 byte windows and build bijection M via PRNG seeded with S_instr.
4. Implement PERM_ADDR as a deterministic bijection (Feistel) using S_addr.
5. Implement SCR_const / SCR_jump via HMAC-SHA256(S_arith, context) truncated to 32 bits.
6. Implement context-sensitive Resolve(window candidates, history_hash, M).
7. Execute with sliding window step = 1; support self-modifying ModMap primitives using S_mapmod.
8. Implement deterministic decoding for encoded payloads inside -[enc:..]- (no external entropy).
9. Default: reset VM for each top-level invocation program_name().
FAQ
- **Q:** Will the same code print the same thing 5 days from now?
  **A:** Yes. In Deterministic Mode the same bytes produce the same output indefinitely on conformant interpreters.
- **Q:** Are there any hidden time- or OS-dependent seeds?
  **A:** No. The deterministic spec forbids them; all seeds come from the file bytes only.
- **Q:** Can I still make programs harder than Malbolge?
  **A:** Yes. The spec stacks context-sensitivity, overlapping windows up to 4 bytes, deterministic map mutation, scrambled immediates and permuted addresses; together these greatly multiply reverse-engineering cost while staying deterministic.
- **Q:** Are the fake comments `-[ ... ]-` insecure?
  **A:** They are intentional obfuscation tools: they *execute* (their bytes are part of the stream). Use `[ ... ]` for human comments to avoid execution.