Boxy

From Esolang
Jump to navigation Jump to search

Boxy is a Smartboxes derivative by User:Zerk, mainly differing by slightly simpler syntax. For lack of solid grasp on semantics or any sort of implementation, this page is going to be a speculative idiom dumping ground until the former enables the latter. The more it is refined, the more it becomes a bizarre mix of Ax, Forth, LiveScript, Glass, and possibly Smalltalk, so this may take a while.

Syntax

Presented in a mix of EBNF and regular expressions.

 <box>          ::= "{" [ <clauses>(',' | ' ') ] "}"
 <(.*)s>(foo)   ::= <\1> [<foo> <\1s>]
 <clause>       ::= <key> [ ":" <values>(';')  | ":" [<literal>] <assign>]
 <key>          ::= <assign> | <literal>
 <assign>       ::=  "_" <literal> | <literal> { "_" <literal> }
 <literal>      ::= <box> | <symbol>
 <symbol>       ::= /[^\x00-\x20[\]{}():;]+/
 <value>        ::= <literal> [ <value> ] |  "(" <value> ")"

Whitespace is permitted everywhere1. // and /* */ comments are whitespace.

A box is a function/object, that maps boxes and symbols to yet more boxes and symbols. Nil is the simplest box, {}, and yields itself on any input; {a:a} returns a only when given a. Constructions such as {a:10, b:7} can be treated as JSON, though {4:x 25:y} is equally valid.

Symbols are scoped lexically, including all literally declared mappings within the same box; figuratively declared mappings in the current box must be accessed by self-reference, $sym. Clauses are evaluated in turn, and sequential values within them, with a nonexplicit nil indicating match failure, and the last value being returned otherwise. An explicit nil thus acts to signify failure for the entire box.

Literals can be assigned to by prefixing them with an underscore, matching the literal/assignment immediately before the underscore, or default ~~. This binds any symbols within against the key being matched, most frequently just the symbol itself.

1Except inside symbols or assignments, and several #Sugar constructs. But on the parser level, it is(TODO: existence of parser) hidden, if easy to recover from position metadata.

Unification

Assigningment, except for a symbol previously unbound in the current clause, requires prolog-style unification, which can be thought of as enhanced pattern matching. For example, {_{a:a}:a}.{1:1 2:2} will return 1(though this arguably doesn't demonstrate any enhancement over the relevant e.g. Python facilities). TODO: expand

Sugar

Commas are strict whitespace, disabling any other whitespace following them from being interpreted(including as application).

.key, used as fun .key or fun.key(alternately {.key:foo} or {foo:.key}), passes key directly to . as a symbol/box, . being the toplevel parser. For undefined symbols(and boxes containing only such), this is redundant as their lookups would reach it anyway.

{foo bar baz:x} is {foo:_ bar:_ baz:x}, the leading values matching themselves. They are primarily used as annotation(e.g. types), and are ignored by pattern-matching unless explicitly specified as {foo} or {foo:~}.
{foo:x;y :z}, signifies a repeated base key, becoming {foo:x;y foo:z}. (in conjunction with the previous line, this means whitespace is technically no longer optional after keys; {,foo :bar} is still {foo:bar} despite {foo :bar} being {foo:foo foo:bar}, however) {:x;y :z} is a special case that parses as a series of _:foo clauses appended in reverse order to the end, informally meaning "until/unless"; similarly, any sequence of values ending in a semicolon followed implies a return of the full matched value. Multiple keys can also be specified, {a:b:res}={a:res b:res}
[foo bar baz]1 is {0:foo 1:bar 2:baz}, which is a smart box around an underlying representation of {0:foo t:{0:bar t:baz}}: [a [b c]]] = [a b c]. Any baz lacking a .t is assumed to be a raw box. Vectors can also be constructed implicitly from comma-separated values, and closed by dot-prefixed symbols.
|a b|: is _[a b]: (any instances of '|' in a or b must be in parens of some sort)

While .foo is the only prefix that is recognized greedily, any otherwise undefined symbols containing /[\pS\pP]/(unicode-aware symbol/punctuation) have those chunks treated as prefix and infix: ++x is equivalent to (++ x), and odd?str.parse-int to ?(odd, str .parse_int). Parentheses can be used to infix in lieu of whitespace: ( (a b)-(c d) ) is (- (a b), (c d)), but ( (a b) - c d) is ((a b) (- (c d))).
This also works in keys, so {%a %b:foo _} becomes {a:%a b:foo %b}, and {2*a:bar _;_}, {a:bar 2*a;2*a}.

~ is {}, ~~ is {_}. ~[a b c] = [a b [c]], the conventional linked list. ~|foo bar|: becomes _~[foo bar] as you'd expect.
_ refers to the immediate matched key, unless encountered in one: 'same' above can be expressed as [= _ _]. __ match any values, independently.
$ is the current enclosing box, {a:$[b] [b]:c} parses to approximately {:[= _$, {a:$[b] [b]:c}];$_}. $$, $$$, etc. refer to successive outer boxes. $foo or $box.foo as a lone clause, instead of encoding an infinite loop, becomes a binding, foo:$$foo and foo:box.foo respectively.

foo::{$bar baz:4} yields an extended foo, parsing to {:foo._ bar:$$bar baz:4}; this makes foo::bar a logical conjunction, and {bar}::foo the idiomatic way to add post-match annotations. To preserve the intuitive meaning of $$bar(and provide a convenient handle for the box being extended), $$ is replaced with foo, and any multi-$ decremented. Thus, a::{$$$} extends a with a key matching the enclosing $.
As a clause, foo::bar shadows foo: {a:x a::b a::c} is redundant with {a:x a::b::c}, and becomes {a:(x::b)::c}.

arg:foo>, e.g. {foo:->bar}, is a macro, which parses everything until the next less-indented line(or closing brace) as an {arg…} -surrounded map, passing a list of escaped rule-value pairs terminated by .$(the default intepretation, unescaped) to the corresponding >foo>. Foo may contain '>'s, but only in leading or trailing positions(and not exclusively). arg is optional.
Everything else in this section is interpolated within macros, and may eventually be implemented as one, especially if "escaped" becomes strings and not .symbols

1Whitespace is significant within arrays, but commas still imply "same level of nesting as previous element", so this may be overridden by using them everywhere. One-line structures may nest to three levels, e.g.

 [a0 a1  a4 a5    b0 b1  b4 b5]

It is also possible to use sweet-expr

 [
   a1 a2, a4 a5
   b1 b2, b4 b5
 ]

or even YAMLy

 [
     a1
     a2
   , a4
     a5
 ,
     b1
     b2
   , b4
     b5
 ]

Types

29 or 0x1d is an integer, conceptually represented as little-endian bitstring .{u:~[~~ ~ ~~ ~ ~~] int}. Negatives are ::.{int -}

'a' or \a is the utf8 value of 'a': 'hi' is 0x6968. "Foobar" is ~[\F \o \o \b \a \r]::{str}

Salt

Hopefully the compiler will complain if it sees anything like {:$~;}

Catching attempted unification on infinite non-patternmatch functions? (Do those even exist?)

Warnings on particularly convoluted/ambiguous implicit vectoring might be useful. Inappropriate indentation(where significant) is definitely a syntax error.

A box can be extended to only consume and/or produce a certain pattern: foo::{:[.err .not .int _] :_.int;($$_).int;$$_} wraps foo in an integer typecheck(inasmuch as Boxy has types). This could probably be made more elegant.

Toplevel

Standard library, WIP.

Definition

 {
   .:$

Macros

   >->:{:$(_.t) [.$ _]:_}  //just fetch a baked map, usage {_f:|x y|:->f(x, y)} = {_f:{|x y|:f(x, y)}}

Boolean logic

   =:{[_ _]}
   &:|f g|:->{:f g _}
   |:|f g|:->g::f //expected short-circuiting direction
   !:~:->~~
   (!=):(!)&(=)  //xor, when called on ~/~~ canonical booleans

Vector functions

   map:&
   zip:|a b|:->[[a.0 b.0] $[a.t b.t]] //The compiler can detect that $~ is [~ $~] and conclude it's ~. In theory.
   zipf:|f p|:->f&(zip p)
   keys:_f:->{_k:__}=f;[k $(f::k:->~}]}
   vals:_:->map _, keys _  //These will be as infinite as the argument map. TODO: lazy evaluation?

Arithmetic

Bitstring

   .::math
   math:{
     0:~ 1:[~~]
     ++:{[~ _]:[~~ _] [~~ _]:[~ $_]}
     --:{[~~ _]:[~ _] [~ _]:[~~ $_]}
     +:{[_ 0]:_ [0 _]:_ |a b|:zipf[!= a b]$[~ zipf[& a b]]}
     <:{[[~ _] ~~ _]:~~ [[__ _] ~ _]:~
        |[__ a] _ b|:a$b}
     >:|a b|:->(b<a)
     -:{[_ 0]:_, [_ _]:0
         |a b|:(a.-)!=(b.-);a+(-b)
              :a.-;-(-a)$(-b)
              :a<b;-$[b a]
              :zipf[!= a b]$[~ zipf[& (!)&a b]]}  //xor zeroes all 1,1 pairs, subtract carry which is 2* all 0,1 pairs
   
     *:{[_ 0]:0 [_ 1]:_ [0 _]:0 [1 _]:_
        [a ~ b]:[~ a$b] [a ~~ b]:a+[~ a$b]}
     %:{[__ 0]:[.err .% _] [_ 1]:0 [_ _]:0 [[_r _] ~ _d]:[r _$d]
        |a b|:b>a;a :[~ b]>a;a-b
            :(a$[~ b])$b}
      /:{[__ 0]:[.err ./ _] [_ 1]:_ [_ _]:1 [[__ _] ~ _d]:_$d
         |a b|:b>a;0 :[~ b]>a;1
              :c*b=(a-a%b);c} //Let's pretend unifying unmultiplication isn't exponential time and/or memory. 
   }                          //Though I suppose technically linear since our representation is log(N)

Signed

   math::{
     0:.0 1:.1 //unclobber
     
     -:{~ _n::{.-:!n.-}} 
     abs:{::.{-:~}}
   
     ++::_n:->n.-;(- -- -n)
     --::_n:->n.-;(- ++ -n)
     +::|a b|:->(a.-)!=(b.-);a-(-b)  :a.-;-(-a)$(-b)
     -::{_:_.int;($$$- _)
         |a b|:(a.-)!=(b.-);a+(-b)  :a.-;-(-a)$(-b)}
 
     <::|a b|:->(a.-)!=(b.-);a.-  :a.-;(-b)$(-a)
     >:|a b|:->(b<a) //rebind
     
      wr:op:->|a b|:->(op abs a, abs b)::{.-:(a.-)!=(b.-)}
      *::wr(*) /::wr(/) %::wr(%)
   }

Typecast

   math::{:f=($$_);(_n:->{.u:[n.0 n.t] .-:n.- int})&{  //turn back into an int the result of 
            :|a b|:->f[a b] |a b c|:(a$b)$c} vec&_    //left-associatively applying f to
            vec:{:_::a.u}                     //unwrapped ints with saved sign, type
            _n:n.int;f vec n}                        //or just one

TODO: io, string parsing.

 }

Nock Interpeter

 {
    =>:{|g f|:f&g}
    err:{:.err, .nock, _} //Some implementations might decide to have failure(i.e. ~) trigger a stack trace, but for now we roll our own
    wut:{:1
         [__ _]:0}
    lus:{:err[.+ _]
         :_.int;++_}
    tis:{:err[.= _]
         [_ _]:0
         [__ _]:1
    br:{:err[./ _]
        [1 _]:_ [2 _a _]:a [3 _ _b]:b
        |n _|:n.int;((n%2)+2)$(n/2)$_}
    nok:{:[~ __]:[__ ~ __]:err[.* _] //|_ __|:_.-;$[[.-noun _]]
         |a [b c] d|: nok[a b c], nok[a d]
         |a n b|: (op n) a, b}
    op:[{|a b|:br b, a}
        {[__ _]:_}
        {|a b c|: nok(nok[a b], nok[a c])}
        nok=>wut
        nok=>lus
        nok=>tis
        mac=>nok]
    mac:[
      {|b c d|:[2 [0 1] 2 [1 c d] [1 0] 2 [1 2 3] [1 0] 4 4 b]}
      {|b c|:[2 b 1 c]}
      {|b c|:[7 [[7 [0 1] b] 0 1] c]}
      {|b c|:[7 c 2 [0 1] 0 b]}
      {|[b c] d|:[8 c 7 [0 3] d]
       |b c|:c}
    ]=>{_f:|a b|:->a, f b}
    noksym:{:nok (unmap.[br is nok cel up eq if cmp ins arm put]) _
      unmap:{_vec:{
        |h t|:[$h, $t]
        _:_=(vec _a); a}} 
  }.noksym .[[1 3 5] cmp [[up br 2] br 3] ins [cmp [br 3] [br 3]]] // [5 2 3 5]

Misc. Examples

 {[1 _ __]:_ |m a b|:a, $[--m b a+b]} [8 0 1] // [0 1 1 2 3 5 8 13]  
  {[warm cat] [soft cat] [sleepy _]:warm$_;soft$_;} [sleepy _] // [sleepy cat]
  {[_]:_ |a b c|:$[a [b$c]] |a [[b c]]|:|a [b]|: a>b;$[b a c] :[a b$c]} ~[5 1 7 4 3]  //~[1 3 4 5 7]  
 {_n:_i:->[n+i $$$n+i]}  //in lieu of a mutable state macro, the accumulator returns a continuation