SCOOP

Not to be confused with Scoop.

SCOOP (Single-Character Object-Oriented Programming) is a pure, classless, object-oriented esolang using the full Unicode character set where all commands and identifiers/names are a single character. Most printable characters can refer to a user-defined object, method, or data value. A few characters are reserved for primitive instructions, special identifiers, and predefined methods.

Everything in SCOOP is an object, including numbers, strings, methods, and even the program itself. The program object serves as the top-level scope in which all of the “global” definitions are made. Most operations in SCOOP are accomplished by sending messages from one object to another.

Features

SCOOP has the following features:

  • Pure OOP: everything is an object
  • Dynamic creation & composition of objects
  • Duck typing for user-defined objects??
  • Easy syntax for defining flexible number and string objects
  • Control structures are implemented as method calls on numbers and strings
  • “Full” Unicode support
  • (optional) Interactive interpreter (REPL)

And the following "non-features":

  • No classes or prototypes
  • No inheritance
  • No explicit type system; type checking only for built-in types passed as parameters to predefined methods?
  • No expressions
  • (Almost) No reflection
  • No builtin arrays, lists, or other collections
  • No exception handling
  • No comments
  • Number, string, and code literals may only be used in object definitions, not in assignments, method calls, or return instructions

Primitive Instructions

SCOOP has five primitive instructions that are used to write user programs. They are listed in the following table and described in detail below.

Instruction | Symbol | Symbol Name | Unicode
Definition | : | colon | U+003A
Assignment | ← | leftwards arrow | U+2190
Access member | . | period (full stop) | U+002E
Send a message (method call) | ▷ | white right-pointing triangle | U+25B7
Return from method | ⏎ | return symbol | U+23CE

Instructions are not expressions; that is, they do not themselves have values and they cannot be combined together except as allowed below.

Definitions

:<identifier><contents><delimiter>

A colon (:) is used to begin the definition of an object. The character following the colon is the identifier to be defined. This identifier is followed by a string of characters that make up the contents of the definition. The content string ends with a delimiter specific for the type of object being defined.

Type | Delimiter | Symbol Name | Unicode
string | " | double quote | U+0022
number | # | number sign | U+0023
object | □ | white square | U+25A1
method | ƒ | florin sign | U+0192
method | 𝑓 | mathematical italic small F | U+1D453
callback | ✆ | telephone location sign | U+2706

A double quotation mark (") delimiter defines a string. The content is a Unicode string. If the string contains a colon or any of the delimiters listed above, then they must be escaped by preceding them with a backquote character (` Grave Accent U+0060). A literal backquote character must also be escaped.

  • :aHello world!" would define a as the string “Hello world!”.
  • :s`:s```"" would define s as the string “:s`"”.
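
The escaping rule above can be sketched in Python. This is only an illustration, not part of any SCOOP implementation: the helper name read_string_definition is hypothetical, and the sketch assumes that the definition is already known to be a string and that the identifier is a single code point. It does not check that colons and delimiters inside the content were escaped.

```python
# Hypothetical helper for illustration; not taken from a SCOOP implementation.
ESCAPE = "`"            # grave accent, U+0060
STRING_DELIMITER = '"'  # double quote, U+0022


def read_string_definition(source):
    """Parse ':<identifier><contents>"' at the start of source.

    Returns (identifier, value, remaining_source).
    Assumption: the identifier is exactly one code point.
    """
    if len(source) < 2 or source[0] != ":":
        raise ValueError("expected ':' followed by an identifier")
    identifier = source[1]
    value = []
    i = 2
    while i < len(source):
        ch = source[i]
        if ch == ESCAPE:
            # A backquote makes the next character literal.
            if i + 1 >= len(source):
                raise ValueError("dangling escape character")
            value.append(source[i + 1])
            i += 2
        elif ch == STRING_DELIMITER:
            # An unescaped double quote ends the definition.
            return identifier, "".join(value), source[i + 1:]
        else:
            value.append(ch)
            i += 1
    raise ValueError("unterminated string definition")
```

Applied to the two examples above, the sketch yields the identifier a with the value “Hello world!” and the identifier s with the value “:s`"”.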

A number sign (#) delimiter defines a number. The content is one of three literal number formats: integer, fraction, or decimal. See “Number Formats and Operations” below for a complete description.

  • :b42# would define b as the integer 42.
  • :π3.14159265358979# would define π as a decimal approximation of pi.
  • :c4/7# would define c as the fraction 4/7.

A square (□) delimiter creates an arbitrary object encapsulating zero or more existing objects. The content is a list of member identifiers that must exist in the current object scope. The new object will have a member, with the same name, for each identifier in the content list. The new members will refer to the same objects as the originals. The original members in the current scope are not deleted.

  • :dabcs□ would define d as a new object with a, b, c, and s as members.

There are two types of methods that can be defined: standard methods and callbacks. The two types are defined in exactly the same way but differ in which object scope is used when they execute (see “Scope” below). A florin sign (ƒ) delimiter (also known as a “small letter F with hook”) defines a standard method. (The “mathematical italic small F” (𝑓) character may be used as an alternative to define a standard method). Finally, a telephone location sign (✆) delimiter is used to define a callback.

The contents of a method definition can be any sequence of instructions, including nested definitions. When parsing a method definition, SCOOP pairs colons with delimiters in a way similar to nested parentheses to allow definitions to be nested as deeply as needed.

  • :f←y▷c×π←z▷b+yƒ would define f as a method with code “←y▷c×π←z▷b+y” which would assign 4π/7 to y and 42+4π/7 to z.
  • :g▷aⓟ␣ :m The answer is " ▷mⓟ␣ ▷bⓟ␣ƒ would define g as a method that outputs “Hello world! The answer is 42”.
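
The parenthesis-like pairing of colons and delimiters can be sketched in Python as a depth counter. The function name find_definition_end is hypothetical, and the sketch assumes single-code-point identifiers and that a backquote escapes the following character everywhere in the scanned text.

```python
# Hypothetical sketch of colon/delimiter pairing; names are illustrative only.
DELIMITERS = frozenset('"#□ƒ𝑓✆')
ESCAPE = "`"


def find_definition_end(source, start=0):
    """Return the index of the delimiter closing the definition at start."""
    if source[start] != ":":
        raise ValueError("a definition must begin with a colon")
    depth = 0
    i = start
    while i < len(source):
        ch = source[i]
        if ch == ESCAPE:
            i += 2              # skip the escape and the escaped character
        elif ch == ":":
            depth += 1          # a (possibly nested) definition opens
            i += 2              # skip the colon and its identifier
        elif ch in DELIMITERS:
            depth -= 1          # the innermost open definition closes
            if depth == 0:
                return i
            i += 1
        else:
            i += 1
    raise ValueError("unterminated definition")
```

For the definition of g above, the nested string definition :m is closed by the double quote, so the index returned is that of the final ƒ.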

Assignments

←<target id><source id>
←<target id>.<object id><member>
←<target id>▷<object id><message><parameter>

An assignment sets an identifier to refer to an existing object. There are three forms of the assignment instruction. The first just assigns the target identifier to refer to the same object as the source identifier. Both identifiers must belong to the current scope.

The second form uses member access syntax and assigns the target identifier to refer to a member of an object. The identifiers for the target and source objects must belong to the current scope and the member id must exist within the source object.

Since member access instructions cannot be nested within each other, assigning the member of a member of (a member of…) an object to an identifier requires multiple steps.

The third form assigns the target identifier to refer to the object returned by a method call. The syntax following ←<target id> is the same as a normal method call. See below for details. The identifiers for the target of the assignment, the target of the message, and the message parameter must belong to the current scope and the message name must match a method in the target object.

Examples

  • ←ab assigns a to refer to the same object as b.
  • ←a.bc assigns a to refer to the same object as member c of object b.
  • ←a.bc←a.ad←a.ae assigns a to refer to the same object as member e of member d of member c of object b.
  • ←c▷b+a assigns c to refer to the object returned by the method call ▷b+a.
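
As a rough model of the multi-step member access example, the following Python sketch represents each object scope as a dict of references. The helper names assign and assign_member are hypothetical and only mirror the first two forms of the assignment instruction.

```python
# Illustrative model only: each SCOOP object is a dict mapping names to references.
def assign(scope, target, source):
    """Model of ←<target id><source id>."""
    scope[target] = scope[source]


def assign_member(scope, target, object_id, member):
    """Model of ←<target id>.<object id><member>."""
    scope[target] = scope[object_id][member]


e = object()            # the innermost object
d = {"e": e}
c = {"d": d}
b = {"c": c}
scope = {"b": b}

# ←a.bc←a.ad←a.ae : one member access per step
assign_member(scope, "a", "b", "c")
assign_member(scope, "a", "a", "d")
assign_member(scope, "a", "a", "e")
```

After the three steps, a refers to the same object as member e; no objects were copied along the way, and b is unchanged.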

Assignments vs. Definitions

Assignments only copy references to objects that already exist, creating aliases for those objects. They do not copy the objects themselves. In contrast, a definition always creates a new object (unless an implementation reuses immutable objects) and then effectively assigns a reference for that new object to the target identifier.

SCOOP does not limit the reuse of identifiers. Names can be redefined and reassigned within the same scope. And an identifier can be assigned a value without first using it in a definition.

Accessing Members

.<object id><member>

A period (.) is used by one object to access the members of another object. This instruction is usually combined with an assignment or return instruction. By itself, the member access instruction doesn’t do anything useful in a running program. But it can be used in the interactive interpreter to inspect the members of objects.

Member access instructions cannot be chained together to access the members of a member of an object. And they cannot be used in any other context than the three contexts described here.

Sending Messages

▷<target object><message><parameter>

A method of an object can be called by sending a message to the object. The message is a single character matching the name of the method to be executed. An identifier for a single parameter must also be provided as part of this instruction. See “Methods and Messages” below for more details.

The identifiers for the target of the message and the parameter must belong to the current scope and the message name must match an existing method in the target object.

Examples

These examples use the following definitions: :a12# and :b23#.

  • ▷a+b sends the + message to a with the parameter b. This executes the + method, which adds a and b and returns 35 as its result.
  • ▷aⓟ␣ sends the ⓟ message to a, which prints its value, 12, to standard output. ␣ (Null) is passed as the parameter since the method does not require one.

Returning from Methods

⏎<object id>
⏎.<object id><member>

Methods can return values to their callers using the ⏎ instruction. There are two forms of this instruction to return either an object within the current scope or a member of such an object. The ⏎ instruction returns a reference to the original object it is given. The method should make a copy of the object to return if that behavior is preferred.

If a method’s code does not end with an explicit return instruction, then an implied ⏎␣ (return null) is appended to it.

Examples

  • ⏎a returns the object referred to by a.
  • ⏎.ab returns member b of object a.

Special Identifiers

There are four identifiers reserved by SCOOP for special uses.

Symbol | Symbol Name | Unicode | Refers to
$ | dollar sign | U+0024 | “self”
@ | at sign | U+0040 | program object
␣ | open box | U+2423 | Null/None
⒳ | parenthesized small letter X | U+24B3 | method parameter

The dollar sign ($) allows an object to refer to itself ($ = ‘self’). And the “at sign” (@) always refers to the top-level program object (@ = ‘app’).

The open box symbol (␣) is used to refer to a special constant value of “Null” or “None”. This value is a singleton, empty object with no extra properties or methods.

A parenthesized small letter X (⒳) is used within the body of a method to refer to the parameter that was passed to that method. While the ⒳ identifier cannot be redefined or assigned a new value, one can send messages to the object represented by ⒳ in order to make changes to that object.

Special identifiers cannot be the target identifier in either a definition or an assignment. They also may not be included directly in the definition of a new object. E.g. :o$@□ is not allowed. But the objects to which the special identifiers refer can be assigned to different names in order to include them in a new object. E.g. ←p$ ←a@ :opa□

Objects

The members of an object can be read by any other object using the access member (.) instruction. Members may only be written by the object to which they belong.

In SCOOP, properties and methods are not two completely distinct classes of object members. Both are objects that can be copied, passed as parameters, or used as return values. The names of both can be redefined or reassigned.

Methods and Messages

Methods are object members that contain code that is executed when the object receives a matching message.

All methods accept a single object as a parameter and return a single object. Both the parameter and the return value are passed by reference. The Null object (␣) can be used as a parameter or a return value if a method does not need one or the other. The name of the parameter is always the special identifier ⒳ (U+24B3). This is the only temporary variable available to methods. Any definitions or assignments performed by the method change properties within the current object scope (see below). Such changes will persist after the method call is done.

Method calls can be recursive but special care may be needed by the programmer to preserve any “temporary” state used by a method.

Methods are objects, so they can have their own members too. But, since the object scope that a method executes within is not typically the method itself, the members of a method cannot normally be accessed by the code of that method. Here is a short program though showing how it is possible to make a method execute within its own scope.

:m⏎$ƒ
▷$m␣    (returns @, not m)
:r←m⒳ƒ
:xmr□
←ⓜ▷m◰x (ⓜ has ref to m)
▷ⓜm␣    (executes m, returns ⓜ)
▷ⓜrⓜ   (ⓜ now has ref to ⓜ)
▷ⓜm␣    (executes ⓜ, returns ⓜ)

Method objects have one special predefined member. The code of user-defined methods can be retrieved as a string of Unicode characters (the same string used to define the method) by accessing the ⌘ member of a method. (⌘ is the “command key” symbol or “place of interest sign”, U+2318). The code string will be empty for predefined methods.

Note though that the executable code of a method object is an internal and immutable property of the implementation. Merely having a ⌘ member does not make an object a method and trying to reassign or redefine the ⌘ member of a method has undefined behavior(?).

Predefined Methods

The following methods are available for all objects:

Method | Function | Parameter | Symbol Name | Unicode
⧉ | Copy object | (none) | two joined squares | U+29C9
≡ | Identical to | 2nd object | identical to | U+2261
◰ | Extend object | 2nd object | white square with upper left quadrant | U+25F0

⧉ Copy object

Returns a copy of the object that receives this message. For most objects, a new object is returned. Depending on the implementation, numbers and strings may return themselves. The Null object (␣) always returns itself.

This method performs a shallow copy, not a deep copy. That is, only references are copied to the new object. The objects to which the members refer are not duplicated.

≡ Identical to

Used for testing if two names refer to the same object. The first argument is the object on which the method is called. The second argument is the parameter passed to the method. Returns 1 if the two arguments are the same object (not just equal objects) and 0 if not.

◰ Extend object

Returns a new object with copies of all of the members of the object that receives this message (the receiver object) and copies of all of the members of another object passed as the parameter to the method (the extension object). If any members of the two input objects have the same name(s), then the members from the extension object are copied to the new object instead of the members in the receiver object. Only references are copied to the new object. The objects to which the members refer are not duplicated.
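
The three predefined methods above can be modeled in Python, treating a user-defined object as a dict from member names to references. The function names are hypothetical, and the special cases for numbers, strings, and Null are not modeled.

```python
# Illustrative model: a user-defined SCOOP object is a dict of member references.
def copy_object(obj):
    """⧉: shallow copy; members still refer to the same objects."""
    return dict(obj)


def identical(first, second):
    """≡: 1 if both names refer to the same object, otherwise 0."""
    return 1 if first is second else 0


def extend(receiver, extension):
    """◰: new object; extension members win when names collide."""
    combined = dict(receiver)
    combined.update(extension)
    return combined


shared = ["some member object"]
receiver = {"a": shared, "b": "receiver b"}
extension = {"b": "extension b", "c": "extension c"}
combined = extend(receiver, extension)
```

Here combined has members a, b, and c; its b comes from the extension object, its a is the very same object as the receiver's a, and neither input object is modified.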

Scope

Every object in SCOOP is a separate scope. The members of an object are the “local variables” within its scope. Because references to methods can be freely passed around between objects, it is important to understand which object’s scope is used when a method is called.

Standard methods use dynamic scoping. When called, these methods have access to the members of the object that received the message resulting in the method call. In other words, any (non-special) identifiers in the method’s code will be looked up within the receiver object.

Callback methods do not use dynamic scoping. When a callback is defined, a reference to the object that created it is saved within the callback object. Then when the callback is executed, any (non-special) identifiers in the callback’s code will be looked up within its creator object instead of within the object that received the message.

Defining a callback method does not create a closure. None of the values of the creator object’s properties are stored within the callback. When the callback is executed, the current values of the creator object’s properties will be used when referenced, not the values at the time of the callback’s definition.
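
The difference between the two kinds of methods can be sketched in Python. The Method class and the send function are hypothetical stand-ins: a method body is modeled as a Python callable that receives the scope chosen for it, and objects are modeled as dicts.

```python
# Illustrative model of scope selection; not an actual SCOOP implementation.
class Method:
    def __init__(self, body, creator=None):
        self.body = body        # Python callable standing in for SCOOP code
        self.creator = creator  # set only for callbacks


def send(receiver, message, parameter=None):
    """Model of ▷<target object><message><parameter>."""
    method = receiver[message]
    # Standard methods run in the receiver; callbacks run in their creator.
    scope = receiver if method.creator is None else method.creator
    return method.body(scope, parameter)


def read_v(scope, parameter):
    """Stand-in for a method body that returns the member v of its scope."""
    return scope["v"]


creator = {"v": "creator value"}
standard = Method(read_v)                   # like a definition ending in ƒ
callback = Method(read_v, creator=creator)  # like a definition ending in ✆
other = {"v": "receiver value", "s": standard, "c": callback}
```

Sending s to other reads the receiver's v, while sending c reads the creator's v. Because the callback stores only a reference to its creator, a later change to the creator's v is visible the next time the callback runs.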

Numbers

Number objects should support integers of arbitrary size. Ideally, they would also seamlessly support arbitrary-precision decimals, fractions, and “exact” arithmetic.

Number objects are immutable. They are never modified by any methods, and numerical operations generally return new number objects. SCOOP implementations may choose to reuse existing number objects whenever the same value is needed but this is optional and need not be done consistently. It is even permissible for the ⧉ (copy) method of number objects to just return themselves. In any case, SCOOP programs should not rely on this behavior and the ≡ (Identical to) method should not be used to compare values.

Number Formats and Operations

TODO

Predefined Methods

All number objects have predefined methods for arithmetic, comparisons, looping, branching, input, output, etc.

Method | Function | Parameter | Symbol Name | Unicode
+ | Addition | 2nd addend | plus sign | U+002B
- | Subtraction | subtrahend | hyphen-minus | U+002D
× | Multiplication | 2nd factor | multiplication sign | U+00D7
÷ | Division | divisor | division sign | U+00F7
⌊ | Floor | (none) | left floor | U+230A
◯ | Round | number of decimal places | large circle | U+25EF
= | Equal to | 2nd number | equals sign | U+003D
≠ | Not equal to | 2nd number | not equals sign | U+2260
< | Less than | 2nd number | less-than sign | U+003C
≤ | Less than or equal to | 2nd number | less-than-or-equal sign | U+2264
> | Greater than | 2nd number | greater-than sign | U+003E
≥ | Greater than or equal to | 2nd number | greater-than-or-equal sign | U+2265
⥁ | Loop | loop body callback | clockwise closed circle arrow | U+2941
⌥ | If/Else | callback pair (see below) | option key | U+2325
Ⓐ | Number to code point | (none) | circled capital letter A | U+24B6
ⓓ | Input decimal | (none) | circled small letter D | U+24D3
ⓕ | Input fraction | (none) | circled small letter F | U+24D5
ⓘ | Input integer | (none) | circled small letter I | U+24D8
ⓟ | Print (Output) | (none) | circled small letter P | U+24DF

Arithmetic methods

The +, -, ×, and ÷ methods are binary operations. The first argument is the number on which the method is called. The second argument is the parameter passed to the method.

TODO: Describe how methods work on decimals and mixed arguments.

Comparison methods

The =, ≠, <, ≤, >, and ≥ methods are binary operations that work similarly to the arithmetic methods. They return 1 if the comparison is true and 0 if it is false.

TODO: Describe how comparisons work on mixed arguments.

◯ Round

Rounds a decimal number to the number of specified decimal places. If called on a fraction, this method converts the fraction to a decimal and then rounds it. Has no effect on integers.

Returns a new number object without changing the object on which it was called.
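
A possible reading of ◯ in Python, using Fraction and Decimal as stand-ins for SCOOP's fraction and decimal formats. The function name scoop_round is hypothetical, and the rounding mode for ties (half up) is an assumption, since the description above does not specify one.

```python
from decimal import Decimal, ROUND_HALF_UP
from fractions import Fraction


def scoop_round(value, places):
    """Sketch of ◯; always returns a new value, never modifies its input."""
    if isinstance(value, int):
        return value                      # no effect on integers
    if isinstance(value, Fraction):
        # Convert the fraction to a decimal first, then round it.
        value = Decimal(value.numerator) / Decimal(value.denominator)
    quantum = Decimal(1).scaleb(-places)  # e.g. places=2 gives 0.01
    # Assumption: ties round half up; the text does not say.
    return value.quantize(quantum, rounding=ROUND_HALF_UP)
```

For example, rounding the decimal 3.14159 to 2 places gives 3.14, and rounding the fraction 4/7 to 3 places gives 0.571.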

⥁ Loop

The ⥁ message can be sent to a number object that has the positive value x to perform a loop that repeats floor(x) times. The ⥁ method takes a single argument which should be the name of a callback method. This callback is called on each iteration of the loop and the ⥁ method passes the current loop counter to the callback. The loop counter starts at 0 and increases by 1 on each iteration until reaching floor(x)-1 on the final repeat. The callback can optionally return a value to break out of the loop early. If the callback returns a non-zero, non-null value, then the loop exits. If it returns zero or null, then the loop continues.

If ⥁ is called on a number equal to zero, then no iteration occurs and the user callback is never called. If ⥁ is called on a number less than zero, then it will loop forever or until the callback returns a non-zero, non-null value.
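
The loop rules above can be summarized in a Python sketch. The function name scoop_loop is hypothetical. Two points are assumptions: that the counter also starts at 0 and counts upward when the number is negative, and that the ⥁ method itself returns nothing of interest.

```python
import math


def scoop_loop(x, callback):
    """Sketch of ⥁ sent to a number with value x."""

    def should_stop(result):
        # The loop exits only on a non-zero, non-null return value.
        return result is not None and result != 0

    if x < 0:
        # Negative: loop until the callback asks to stop.
        # Assumption: the counter still starts at 0 and increases by 1.
        counter = 0
        while True:
            if should_stop(callback(counter)):
                return
            counter += 1
    # Zero or positive: floor(x) iterations, counter from 0 to floor(x)-1.
    for counter in range(math.floor(x)):
        if should_stop(callback(counter)):
            return
```

With x = 3.7 and a callback that always returns null, the callback receives the counters 0, 1, and 2. With x = 0 it is never called.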

⌥ If/Else

The ⌥ message can be sent to a number object to conditionally execute one of two callback methods. The parameter to ⌥ must be an object with two callback methods named ⊨ (True, U+22A8) and ⊭ (Not True, U+22AD). If the value of the receiver object is nonzero, then the ⊨ (True) method is executed. If the value is zero, then the ⊭ (Not True) method is executed.
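
The branching rule can be modeled in Python with a dict standing in for the callback-pair object. The function name if_else is hypothetical. Passing null to the chosen callback and returning its result are both assumptions, since neither is specified above.

```python
TRUE_BRANCH = "⊨"    # True, U+22A8
FALSE_BRANCH = "⊭"   # Not True, U+22AD


def if_else(value, branches):
    """Sketch of ⌥ sent to a number with the given value."""
    chosen = branches[TRUE_BRANCH] if value != 0 else branches[FALSE_BRANCH]
    # Assumptions: the callback receives null and its result is passed on.
    return chosen(None)


branches = {
    TRUE_BRANCH: lambda parameter: "true branch ran",
    FALSE_BRANCH: lambda parameter: "false branch ran",
}
```

Any non-zero receiver, including a negative one, selects ⊨; only zero selects ⊭.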

Strings

String objects are sequences of Unicode code points.

String objects are immutable. They are never modified by any methods, and operations like index and concatenation generally return new string objects. SCOOP implementations may choose to reuse existing string objects whenever the same string is needed but this is optional and need not be done consistently. It is even permissible for the ⧉ (copy) method of string objects to just return themselves. In any case, SCOOP programs should not rely on this behavior and the ≡ (Identical to) method should not be used to compare strings.

Predefined Methods

All string objects have predefined methods for length, concatenation, comparisons, looping, input, output, etc.

Method | Function | Parameter | Symbol Name | Unicode
↔ | Length | (none) | left right arrow | U+2194
⒤ | Index | position | parenthesized small letter I | U+24A4
№ | Code point to number | position | numero sign | U+2116
+ | Concatenation | 2nd string | plus sign | U+002B
= | Equal to | 2nd string | equals sign | U+003D
≠ | Not equal to | 2nd string | not equals sign | U+2260
< | Less than | 2nd string | less-than sign | U+003C
≤ | Less than or equal to | 2nd string | less-than-or-equal sign | U+2264
> | Greater than | 2nd string | greater-than sign | U+003E
≥ | Greater than or equal to | 2nd string | greater-than-or-equal sign | U+2265
⥁ | Loop | loop body method | clockwise closed circle arrow | U+2941
ⓒ | Input character | (none) | circled small letter C | U+24D2
ⓘ | Input line | (none) | circled small letter I | U+24D8
ⓦ | Input word | (none) | circled small letter W | U+24E6
ⓟ | Print (Output) | (none) | circled small letter P | U+24DF
Ƒ | New method from string | creator object | capital letter F with hook | U+0191

↔ Length

Returns the number of Unicode code points in the string. This value may not be the same as the number of characters.

⒤ Index

Given an integer parameter n, returns a string with the single code point at index n in the string. Positions are numbered from 0 to length-1.

№ Code point to number

Given an integer parameter n, returns the integer value for the code point at index n in the string. Positions are numbered from 0 to length-1.
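
Python strings are also sequences of code points, so the ↔, ⒤, and № methods described above map onto it directly. The function names are hypothetical, and raising an error for an out-of-range position is an assumption; the behavior in that case is not specified here.

```python
def length(s):
    """↔: number of code points, not of user-perceived characters."""
    return len(s)


def _check_position(s, n):
    # Assumption: positions outside 0..length-1 are an error.
    if not 0 <= n < len(s):
        raise IndexError("position out of range")


def index(s, n):
    """⒤: string holding the single code point at position n."""
    _check_position(s, n)
    return s[n]


def code_point(s, n):
    """№: integer value of the code point at position n."""
    _check_position(s, n)
    return ord(s[n])


decomposed = "e\u0301"   # e followed by COMBINING ACUTE ACCENT
```

The decomposed string displays as the single character é but has length 2: position 0 holds U+0065 and position 1 holds U+0301.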

+ Concatenation

Returns a new string with the parameter string appended to the string that received this message.

Comparison methods

The =, ≠, <, ≤, >, and ≥ methods are binary operations that work similarly to the comparison methods for numbers. These methods all compare normalized versions of the string operands. Implementations should first use either Unicode normalization scheme NFC or NFD so that canonically equivalent sequences of code points can be properly compared. After normalization, the two strings are compared one code point at a time using a simple comparison of their numerical values. Two strings are equal if and only if the normalized strings have the same length and identical sequences of code points. Note that two unnormalized strings can be different lengths and still compare as equal. These methods return 1 if the comparison is true and 0 if it is false.
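
A sketch of the normalized comparison in Python, using the standard unicodedata module with NFC. The function names equal and less_than are hypothetical. Python compares strings code point by code point, which matches the rule described above; how a string compares with a longer string that it prefixes is not stated in the text, and the sketch simply inherits Python's behavior there.

```python
import unicodedata


def _normalized(s):
    # NFC is chosen here; the text allows either NFC or NFD.
    return unicodedata.normalize("NFC", s)


def equal(a, b):
    """=: 1 if the normalized strings are identical, otherwise 0."""
    return 1 if _normalized(a) == _normalized(b) else 0


def less_than(a, b):
    """<: code point by code point comparison of the normalized strings."""
    return 1 if _normalized(a) < _normalized(b) else 0


composed = "\u00e9"      # é as a single code point
decomposed = "e\u0301"   # e followed by COMBINING ACUTE ACCENT
```

The composed and decomposed forms have lengths 1 and 2 but compare as equal. Note also that “Z” is less than “a”, because the comparison uses code point values rather than any alphabetical collation.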

Ƒ New method from string

Creates and returns a new method object from the string that receives this message. If a non-null parameter is given, then the new method will be a callback and the parameter is used as the creator object. If the parameter is Null (␣), then a standard method will be created.

Programs

Hello world program

:mHello world!" ▷mⓟ␣

Output the first 15 Fibonacci numbers

:n15# :a0# :b1# :s "
:f▷bⓟ␣▷sⓟ␣←t▷b+a←ab←bt✆
▷n⥁f
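
As a cross-check, here is a Python trace of what this program should do under the loop semantics described earlier. The function name is hypothetical, and the sketch assumes that ⓟ writes a value with no trailing newline.

```python
def fibonacci_program_output(n=15):
    """Mirror of the SCOOP program: :n15# :a0# :b1# and the callback f."""
    a, b, separator = 0, 1, " "
    output = []
    for _ in range(n):            # ▷n⥁f : n iterations
        output.append(str(b))     # ▷bⓟ␣
        output.append(separator)  # ▷sⓟ␣
        t = b + a                 # ←t▷b+a
        a = b                     # ←ab
        b = t                     # ←bt
    return "".join(output)
```

The callback never returns a value, so the implied return of null lets the loop run all 15 iterations, ending with 610.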

Cat program (simple version)

:e" :i-1#
:🐈←c▷eⓒ␣▷cⓟ␣✆
▷i⥁🐈

Cat program (with test for end of input)

:e" :i-1#
:⊨:r1#✆ :⊭←r␣▷cⓟ␣✆
:m⊨⊭□
:🐈←c▷eⓒ␣←b▷c≡␣▷b⌥m⏎r✆
▷i⥁🐈

Read integers from input and use counter objects to count the number of ones and zeros.

:11# :00# :c0# :+←c▷c+1𝑓 :r←c0𝑓
:C10c+r□
←①▷C⧉␣
←⓪▷C⧉␣
:i-1#
:⊨▷⓪+␣✆ :⊭✆
:P⊨⊭□
:⊨▷①+␣✆ :⊭←b▷n=0▷b⌥P✆
:Q⊨⊭□
:⊨:r1#✆ :⊭←r␣←b▷n=1▷b⌥Q✆
:M⊨⊭□
:f←n▷1ⓘ␣←b▷n≡␣▷b⌥M⏎r✆
▷i⥁f

TODO: A linked list type

TODO: Read integers from input and use a linked list of counter objects to count every value in the input. Then print a list of the values and how many times each occurred.

TODO: Example showing how dynamically-created methods can be used with the integer/character conversion methods to use any identifier at runtime and how a sequential range of identifiers may be used like an array.

Appendix

Whitespace

Any characters designated by the Unicode standard as whitespace may be used freely in-between SCOOP instructions, including between instructions in method definitions. All whitespace characters may also be embedded in string definitions. Whitespace may not be used to otherwise break up the characters of a single instruction.

Examples of correct usages of whitespace:

:n2# ▷n×n :a3.14#:b1.618# ▷b+a
:s " :mDon't Panic!" :n
"
←①▷C⧉␣ ←t.Cc ⏎t ⏎.Cc
:⊭ ←b▷n=0 ▷b⌥P ✆

Examples of incorrect usages of whitespace:

:n 2#
▷n × n
:a3.14 #
:b1 . 618#
▷ b+a
: s "
←① ▷C⧉␣
← t. Cc 
⏎ t
⏎ .Cc
: ⊭ ← b▷
n=0 ▷ b ⌥ P✆

The use of whitespace in SCOOP programs is completely optional.

Reserved Symbols, Identifiers, and Unicode

The symbols for the six definition delimiters, the five primitive instructions, the four special identifiers, and the three predefined methods of all objects are reserved and may not be used as identifiers by programs. The backquote character is also reserved. The symbols used by the predefined methods of numbers and strings are not reserved.

Here are all of the reserved symbols:

: ← . ▷ ⏎
# " □ ƒ 𝑓 ✆
$ @ ␣ ⒳
⧉ ≡ ◰ `

Control characters, format characters, whitespace, and other non-printable Unicode characters also may not be used as identifiers. In addition, code points representing combining marks (e.g. accents) or enclosing marks may not be used by themselves as identifiers (but see below). All other printable (graphic) Unicode characters may be used freely by programs.

Note that there are some ambiguities or implementation issues with this “specification” of SCOOP identifiers. I was originally thinking that identifiers would correspond to single Unicode code points. However, many characters can be represented in Unicode with a sequence of two or more code points.

Some of these characters also have single-code point representations that are considered equivalent to their multiple-code point representations. SCOOP programmers may be unable to control which representation is used by their text editors and indeed may be unaware of the potential problems with using such characters.

E.g. é may be either

é U+00E9
LATIN SMALL LETTER E WITH ACUTE

é U+0065 U+0301
LATIN SMALL LETTER E + COMBINING ACUTE ACCENT

On the other hand, not all “characters” that can be displayed as a single glyph can be represented with a single code point in Unicode. For example, I can enter the “character” q̷ — q with a slash — using a sequence of two Unicode characters but there is no single code point assigned to it. (Indeed, I don’t even know if this glyph has ever been used before).

q̷ U+0071 U+0337
LATIN SMALL LETTER Q + COMBINING SHORT SOLIDUS OVERLAY

TODO: Ligatures?

Characters for many non-Western scripts are also formed by combining smaller units…

See section 2.4 Code Points and Characters in the Unicode standard. https://www.unicode.org/versions/Unicode16.0.0/core-spec/chapter-2/

The situation can be even more complex when certain emoji are considered. People emoji may have a “base character” such as 👩 “woman” and be modified by multiple code points specifying skin tone, hair color, or in some cases gender. 👩👩‍🦰👩🏾‍🦰👱‍♀️

[Attach image showing character codes for various women emoji]

TODO: Discuss how normalization can help.

See section 3.11.7 Definition of Normalization Forms in the Unicode standard. https://www.unicode.org/versions/Unicode16.0.0/core-spec/chapter-3/

So which “characters” should be allowed as identifiers? I can think of four possible implementation “levels”.

  1. Only characters that are represented by a single Unicode code point may be used as identifiers. (With the exclusions listed above).
  2. Code point sequences that are canonical equivalents of single code points would also be allowed, using Unicode normalization scheme NFC to determine equivalent identifiers.
  3. Allow any code point sequence consisting of a base character with zero or more combining/enclosing marks. However, emoji variations would not be allowed or at least not considered distinct. Use Unicode algorithms to determine which sequences are equivalent.
  4. All of the above plus allow emoji with various modifiers to be considered different identifiers.
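
Levels 1 and 2 could be checked roughly as follows in Python, using the standard unicodedata module. The function name is_valid_identifier is hypothetical. Excluding the general categories C (control, format, unassigned), Z (separators), and M (marks) is one possible reading of the exclusions listed earlier, not a settled rule.

```python
import unicodedata

# The reserved symbols listed above.
RESERVED = frozenset(':←.▷⏎#"□ƒ𝑓✆$@␣⒳⧉≡◰`')


def is_valid_identifier(text, level=1):
    """Sketch of identifier validation for levels 1 and 2 only."""
    if level >= 2:
        # Level 2: accept sequences canonically equivalent to one code point.
        text = unicodedata.normalize("NFC", text)
    if len(text) != 1:
        return False
    if text in RESERVED:
        return False
    # Assumption: reject control/format (C), separator (Z), and mark (M)
    # categories; everything else counts as printable.
    return unicodedata.category(text)[0] not in "CZM"
```

Under this sketch the decomposed é (U+0065 U+0301) is rejected at level 1 but accepted at level 2, while q̷ (U+0071 U+0337) is rejected at both levels because NFC has no single code point to compose it into.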

Feedback on these possibilities would be appreciated.

Text Encodings

Implementations of SCOOP should support reading program files and program input in the UTF-8 encoding. Other Unicode text encodings may be optionally supported. The encoding of program output should be chosen to match the execution environment.

Program input and output is always performed on Unicode characters, never on individual bytes.

Implementations may represent strings of characters internally in whatever format is convenient.

List of Errors

TODO

TO DO

  • describe number formats & operations
  • scope examples
  • more example programs
  • finish describing predefined methods
  • the input methods of numbers and strings return null(␣) if they encounter invalid input or the end of file
  • describe REPL?

See also