SCOOP
- Not to be confused with Scoop.
SCOOP (Single-Character Object-Oriented Programming) is a pure, classless, object-oriented esolang using the full Unicode character set where all commands and identifiers/names are a single character. Most printable characters can refer to a user-defined object, method, or data value. A few characters are reserved for primitive instructions, special identifiers, and predefined methods.
Everything in SCOOP is an object, including numbers, strings, methods, and even the program itself. The program object serves as the top-level scope in which all of the “global” definitions are made. Most operations in SCOOP are accomplished by sending messages from one object to another.
Features
SCOOP has the following features:
- Pure OOP: everything is an object
- Dynamic creation & composition of objects
- Duck typing for user-defined objects??
- Easy syntax for defining flexible number and string objects
- Control structures are implemented as method calls on numbers and strings
- “Full” Unicode support
- (optional) Interactive interpreter (REPL)
And the following "non-features":
- No classes or prototypes
- No inheritance
- No explicit type system; type checking only for built-in types passed as parameters to predefined methods?
- No expressions
- (Almost) No reflection
- No builtin arrays, lists, or other collections
- No exception handling
- No comments
- Number, string, and code literals may only be used in object definitions, not in assignments, method calls, or return instructions
Primitive Instructions
SCOOP has five primitive instructions that are used to write user programs. They are listed in the following table and described in detail below.
Instruction | Symbol | Symbol Name | Unicode |
---|---|---|---|
Definition | : | colon | U+003A |
Assignment | ← | leftwards arrow | U+2190 |
Access member | . | period (full stop) | U+002E |
Send a message (method call) | ▷ | white right-pointing triangle | U+25B7 |
Return from method | ⏎ | return symbol | U+23CE |
Instructions are not expressions; that is, they do not themselves have values, and they cannot be combined except as allowed below.
Definitions
:<identifier><contents><delimiter>
A colon (:) is used to begin the definition of an object. The character following the colon is the identifier to be defined. This identifier is followed by a string of characters that make up the contents of the definition. The content string ends with a delimiter specific for the type of object being defined.
Type | Delimiter | Symbol Name | Unicode |
---|---|---|---|
string | " | double quote | U+0022 |
number | # | number sign | U+0023 |
object | □ | white square | U+25A1 |
method | ƒ | florin sign | U+0192 |
method | 𝑓 | mathematical italic small F | U+1D453 |
callback | ✆ | telephone location sign | U+2706 |
A double quotation mark (") delimiter defines a string. The content is a Unicode string. If the string contains a colon or any of the delimiters listed above, then they must be escaped by preceding them with a backquote character (` Grave Accent U+0060). A literal backquote character must also be escaped.
:aHello world!"
would define a as the string “Hello world!”.
:s`:s```""
would define s as the string “:s`"”.
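The escaping rule above can be sketched in Python. This is only an illustration, not a reference implementation: the function name and the error handling are hypothetical, it assumes the caller already knows the definition is a string, and it assumes a backquote makes whatever character follows it literal.

```python
# Hypothetical helper; the names below are illustrative, not part of SCOOP.
ESCAPE = "`"
STRING_DELIMITER = '"'
# Characters that must be escaped inside a string definition:
# the colon, the six definition delimiters, and the backquote itself.
MUST_ESCAPE = set(':"#□ƒ𝑓✆`')


def parse_string_definition(source):
    """Parse one string definition; return (identifier, text, rest)."""
    if len(source) < 2 or source[0] != ":":
        raise ValueError("a definition starts with ':' and an identifier")
    identifier = source[1]
    text = []
    i = 2
    while i < len(source):
        ch = source[i]
        if ch == ESCAPE:
            if i + 1 >= len(source):
                raise ValueError("dangling escape character")
            text.append(source[i + 1])  # take the escaped character literally
            i += 2
        elif ch == STRING_DELIMITER:
            return identifier, "".join(text), source[i + 1:]
        elif ch in MUST_ESCAPE:
            raise ValueError(f"unescaped reserved character {ch!r}")
        else:
            text.append(ch)
            i += 1
    raise ValueError("missing closing string delimiter")
```

Applied to the two examples above, this sketch yields the identifiers a and s with the strings shown.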
A number sign (#) delimiter defines a number. The content is one of three literal number formats: integer, fraction, or decimal. See “Number Formats and Operations” below for a complete description.
:b42#
would define b as the integer 42.
:π3.14159265358979#
would define π as a decimal approximation of pi.
:c4/7#
would define c as the fraction 4/7.
A square (□) delimiter creates an arbitrary object encapsulating zero or more existing objects. The content is a list of member identifiers that must exist in the current object scope. The new object will have a member, with the same name, for each identifier in the content list. The new members will refer to the same objects as the originals. The original members in the current scope are not deleted.
:dabcs□
would define d as a new object with a, b, c, and s as members.
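A rough Python model of the □ definition, assuming a scope is represented as a dict from identifiers to objects. The Box class and define_object are hypothetical names used only for this sketch.

```python
class Box:
    """Stand-in for any SCOOP object, so that identity can be observed."""

    def __init__(self, value):
        self.value = value


def define_object(scope, member_ids):
    """Build a new object (a dict here) from identifiers in the current scope."""
    missing = [name for name in member_ids if name not in scope]
    if missing:
        raise KeyError(f"undefined identifiers: {missing}")
    # Copy references only; the member objects themselves are shared.
    return {name: scope[name] for name in member_ids}


scope = {"a": Box("Hello world!"), "b": Box(42), "c": Box("4/7"), "s": Box(':s`"')}
scope["d"] = define_object(scope, "abcs")  # models :dabcs□
```

In this model d has four members that refer to the same objects as a, b, c, and s, and the originals remain in the enclosing scope.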
There are two types of methods that can be defined: standard methods and callbacks. The two types are defined in exactly the same way but differ in which object scope is used when they execute (see “Scope” below). A florin sign (ƒ) delimiter (also known as a “small letter F with hook”) defines a standard method. (The “mathematical italic small F” (𝑓) character may be used as an alternative to define a standard method). Finally, a telephone location sign (✆) delimiter is used to define a callback.
The contents of a method definition can be any sequence of instructions, including nested definitions. When parsing a method definition, SCOOP pairs colons with delimiters in a way similar to nested parentheses to allow definitions to be nested as deeply as needed.
:f←y▷c×π←z▷b+yƒ
would define f as a method with code “←y▷c×π←z▷b+y” which would assign 4π/7 to y and 42+4π/7 to z.
:g▷aⓟ␣ :m The answer is " ▷mⓟ␣ ▷bⓟ␣ƒ
would define g as a method that outputs “Hello world! The answer is 42”.
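One way to pair colons with delimiters, in the parenthesis-matching spirit described above, is a simple depth counter. The following Python sketch is an assumption about how an implementation might do it. The function name is hypothetical, and it relies on reserved characters never being used as identifiers or message names.

```python
DELIMITERS = set('"#□ƒ𝑓✆')  # the six definition delimiters from the table above
ESCAPE = "`"


def find_definition_end(source, start=0):
    """Return the index of the delimiter that closes the definition at start."""
    if source[start] != ":":
        raise ValueError("expected ':' at the start of a definition")
    depth = 0
    i = start
    while i < len(source):
        ch = source[i]
        if ch == ESCAPE:
            i += 2  # skip the escape character and the character it protects
            continue
        if ch == ":":
            depth += 1  # a nested definition opens
        elif ch in DELIMITERS:
            depth -= 1  # the innermost open definition closes
            if depth == 0:
                return i
        i += 1
    raise ValueError("unterminated definition")
```

For the definition of g above, the inner `:m … "` pair is matched first, so the outer definition ends at the final ƒ.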
Assignments
←<target id><source id>
←<target id>.<object id><member>
←<target id>▷<object id><message><parameter>
An assignment sets an identifier to refer to an existing object. There are three forms of the assignment instruction. The first just assigns the target identifier to refer to the same object as the source identifier. Both identifiers must belong to the current scope.
The second form uses member access syntax and assigns the target identifier to refer to a member of an object. The identifiers for the target and source objects must belong to the current scope and the member id must exist within the source object.
Since member access instructions cannot be nested within each other, assigning the member of a member of (a member of…) an object to an identifier requires multiple steps.
The third form assigns the target identifier to refer to the object returned by a method call. The syntax following ←<target id> is the same as a normal method call. See “Sending Messages” below for details. The identifiers for the target of the assignment, the target of the message, and the message parameter must belong to the current scope and the message name must match a method in the target object.
Examples
←ab
assigns a to refer to the same object as b.
←a.bc
assigns a to refer to the same object as member c of object b.
←a.bc←a.ad←a.ae
assigns a to refer to the same object as member e of member d of member c of object b.
←c▷b+a
assigns c to refer to the object returned by the method call ▷b+a.
Assignments vs. Definitions
Assignments only copy references to objects that already exist, creating aliases for those objects. They do not copy the objects themselves. In contrast, a definition always creates a new object (unless an implementation reuses immutable objects) and then effectively assigns a reference for that new object to the target identifier.
SCOOP does not limit the reuse of identifiers. Names can be redefined and reassigned within the same scope. And an identifier can be assigned a value without first using it in a definition.
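The difference between aliasing and defining can be modeled in Python with a dict as the scope. This is a sketch under assumptions: SObject, define_number, and assign are hypothetical names, and this model never reuses immutable objects, although an implementation is permitted to do so.

```python
class SObject:
    """Stand-in for a SCOOP object; `value` is a hypothetical payload field."""

    def __init__(self, value):
        self.value = value


def define_number(scope, identifier, literal):
    # A definition always builds a new object in this sketch.
    scope[identifier] = SObject(int(literal))


def assign(scope, target, source):
    # An assignment only copies a reference; no object is created.
    scope[target] = scope[source]


scope = {}
define_number(scope, "b", "42")            # :b42#
assign(scope, "a", "b")                    # ←ab (a was never defined first)
alias_before = scope["a"] is scope["b"]    # a and b name the same object
define_number(scope, "b", "42")            # :b42# again: b names a new object
alias_after = scope["a"] is scope["b"]     # a still names the old object
```

After the redefinition, a and b hold equal values but are no longer the same object in this model.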
Accessing Members
.<object id><member>
A period (.) is used by one object to access the members of another object. This instruction is usually combined with an assignment or return instruction. By itself, the member access instruction doesn’t do anything useful in a running program. But it can be used in the interactive interpreter to inspect the members of objects.
Member access instructions cannot be chained together to access the members of a member of an object. And they cannot be used in any other context than the three contexts described here.
Sending Messages
▷<target object><message><parameter>
A method of an object can be called by sending a message to the object. The message is a single character matching the name of the method to be executed. An identifier for a single parameter must also be provided as part of this instruction. See “Methods and Messages” below for more details.
The identifiers for the target of the message and the parameter must belong to the current scope and the message name must match an existing method in the target object.
Examples
These examples use the following definitions: :a12# and :b23#.
▷a+b
sends the + message to a with the parameter b. This executes the + method, which adds a and b and returns 35 as its result.
▷aⓟ␣
sends the ⓟ message to a, which prints its value, 12, to standard output. ␣ (Null) is passed as the parameter since the ⓟ method does not require one.
Returning from Methods
⏎<object id>
⏎.<object id><member>
Methods can return values to their callers using the ⏎ instruction. There are two forms of this instruction to return either an object within the current scope or a member of such an object. The ⏎ instruction returns a reference to the original object it is given. The method should make a copy of the object to return if that behavior is preferred.
If a method’s code does not end with an explicit return instruction, then an implied ⏎␣ (return null) is appended to it.
Examples
⏎a
returns the object referred to by a.
⏎.ab
returns member b of object a.
Special Identifiers
There are four identifiers reserved by SCOOP for special uses.
Symbol | Symbol Name | Unicode | Refers to |
---|---|---|---|
$ | dollar sign | U+0024 | “self” |
@ | at sign | U+0040 | program object |
␣ | open box | U+2423 | Null/None |
⒳ | parenthesized small letter X | U+24B3 | method parameter |
The dollar sign ($) allows an object to refer to itself ($ = ‘self’). And the “at sign” (@) always refers to the top-level program object (@ = ‘app’).
The open box symbol (␣) is used to refer to a special constant value of “Null” or “None”. This value is a singleton, empty object with no extra properties or methods.
A parenthesized small letter X (⒳) is used within the body of a method to refer to the parameter that was passed to that method. While the ⒳ identifier cannot be redefined or assigned a new value, one can send messages to the object represented by ⒳ in order to make changes to that object.
Special identifiers cannot be the target identifier in either a definition or an assignment. They also may not be included directly in the definition of a new object. E.g. :o$@□ is not allowed. But the objects to which the special identifiers refer can be assigned to different names in order to include them in a new object. E.g. ←p$ ←a@ :opa□
Objects
The members of an object can be read by any other object using the access member (.) instruction. Members may only be written by the object to which they belong.
In SCOOP, properties and methods are not two completely distinct classes of object members. Both are objects that can be copied, passed as parameters, or used as return values. The names of both can be redefined or reassigned.
Methods and Messages
Methods are object members that contain code that is executed when the object receives a matching message.
All methods accept a single object as a parameter and return a single object. Both the parameter and the return value are passed by reference. The Null object (␣) can be used as a parameter or a return value if a method does not need one or the other. The name of the parameter is always the special identifier ⒳ (U+24B3). This is the only temporary variable available to methods. Any definitions or assignments performed by the method change properties within the current object scope (see below). Such changes will persist after the method call is done.
Method calls can be recursive but special care may be needed by the programmer to preserve any “temporary” state used by a method.
Methods are objects, so they can have their own members too. But, since the object scope that a method executes within is not typically the method itself, the members of a method cannot normally be accessed by the code of that method. Here is a short program though showing how it is possible to make a method execute within its own scope.
:m⏎$ƒ
▷$m␣ (returns @, not m)
:r←m⒳ƒ
:xmr□
←ⓜ▷m◰x (ⓜ has ref to m)
▷ⓜm␣ (executes m, returns ⓜ)
▷ⓜrⓜ (ⓜ now has ref to ⓜ)
▷ⓜm␣ (executes ⓜ, returns ⓜ)
Method objects have one special predefined member. The code of user-defined methods can be retrieved as a string of Unicode characters (the same string used to define the method) by accessing the ⌘ member of a method. (⌘ is the “command key” symbol or “place of interest sign”, U+2318). The code string will be empty for predefined methods.
Note though that the executable code of a method object is an internal and immutable property of the implementation. Merely having a ⌘ member does not make an object a method and trying to reassign or redefine the ⌘ member of a method has undefined behavior(?).
Predefined Methods
The following methods are available for all objects:
Method | Function | Parameter | Symbol Name | Unicode |
---|---|---|---|---|
⧉ | Copy object | (none) | two joined squares | U+29C9 |
≡ | Identical to | 2nd object | identical to | U+2261 |
◰ | Extend object | 2nd object | white square with upper left quadrant | U+25F0 |
⧉ Copy object
Returns a copy of the object that receives this message. For most objects, a new object is returned. Depending on the implementation, numbers and strings may return themselves. The Null object (␣) always returns itself.
This method performs a shallow copy, not a deep copy. I.e. Only references are copied to the new object. The objects to which the members refer are not duplicated.
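As a minimal Python model of the shallow copy, assume an object is a dict mapping member names to other objects. The name copy_object is hypothetical.

```python
def copy_object(obj):
    """Shallow copy: a new member table holding the same references."""
    return dict(obj)


inner = {"n": 1}
original = {"i": inner}
duplicate = copy_object(original)
```

Here duplicate is a distinct object from original, but both have a member i that refers to the very same inner object.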
≡ Identical to
Used for testing if two names refer to the same object. The first argument is the object on which the method is called. The second argument is the parameter passed to the method. Returns 1 if the two arguments are the same object (not just equal objects) and 0 if not.
◰ Extend object
Returns a new object with copies of all of the members of the object that receives this message (the receiver object) and copies of all of the members of another object passed as the parameter to the method (the extension object). If any members of the two input objects have the same name(s), then the members from the extension object are copied to the new object instead of the members in the receiver object. Only references are copied to the new object. The objects to which the members refer are not duplicated.
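The override rule can be sketched in Python, again assuming objects are modeled as dicts of member references. The name extend_object is hypothetical.

```python
def extend_object(receiver, extension):
    """Return a new object; extension members win on name clashes."""
    new_object = dict(receiver)     # references from the receiver
    new_object.update(extension)    # extension members replace same-named ones
    return new_object


first_b, second_b = {"tag": "receiver b"}, {"tag": "extension b"}
receiver = {"a": {"tag": "a"}, "b": first_b}
extension = {"b": second_b, "c": {"tag": "c"}}
combined = extend_object(receiver, extension)
```

In this model combined has members a, b, and c, its b is the extension's b, and neither input object is modified.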
Scope
Every object in SCOOP is a separate scope. The members of an object are the “local variables” within its scope. Because references to methods can be freely passed around between objects, it is important to understand which object’s scope is used when a method is called.
Standard methods use dynamic scoping. When called, these methods have access to the members of the object that received the message resulting in the method call. In other words, any (non-special) identifiers in the method’s code will be looked up within the receiver object.
Callback methods do not use dynamic scoping. When a callback is defined, a reference to the object that created it is saved within the callback object. Then when the callback is executed, any (non-special) identifiers in the callback’s code will be looked up within its creator object instead of within the object that received the message.
Defining a callback method does not create a closure. None of the values of the creator object’s properties are stored within the callback. When the callback is executed, the current values of the creator object’s properties will be used when referenced, not the values at the time of the callback’s definition.
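The two scoping rules can be illustrated with a small Python model. Everything here is an assumption made for illustration: the Method class, its fields, and the use of a Python function in place of SCOOP code are hypothetical.

```python
class Method:
    """Hypothetical method object: `body` is a Python function standing in
    for SCOOP code, and `creator` is set only for callbacks."""

    def __init__(self, body, creator=None):
        self.body = body
        self.creator = creator

    def call(self, receiver, parameter=None):
        # Standard methods resolve identifiers in the receiver;
        # callbacks resolve them in the object that created them.
        scope = receiver if self.creator is None else self.creator
        return self.body(scope, parameter)


def read_v(scope, _parameter):
    return scope["v"]  # looked up at call time, so no value is captured


creator = {"v": "creator value"}
receiver = {"v": "receiver value"}

standard = Method(read_v)
callback = Method(read_v, creator=creator)

standard_result = standard.call(receiver)
callback_before = callback.call(receiver)
creator["v"] = "updated creator value"  # changed after the callback was defined
callback_after = callback.call(receiver)
```

The standard method sees the receiver's v, while the callback sees the creator's v, including the change made after the callback was created, which is why a callback is not a closure.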
Numbers
Number objects should support integers of arbitrary size. Ideally, they would also seamlessly support arbitrary-precision decimals, fractions, and “exact” arithmetic.
Number objects are immutable. They are never modified by any methods, and numerical operations generally return new number objects. SCOOP implementations may choose to reuse existing number objects whenever the same value is needed but this is optional and need not be done consistently. It is even permissible for the ⧉ (copy) method of number objects to just return themselves. In any case, SCOOP programs should not rely on this behavior and the ≡ (Identical to) method should not be used to compare values.
Number Formats and Operations
TODO
Predefined Methods
All number objects have predefined methods for arithmetic, comparisons, looping, branching, input, output, etc.
Method | Function | Parameter | Symbol Name | Unicode |
---|---|---|---|---|
+ | Addition | 2nd addend | plus sign | U+002B |
- | Subtraction | subtrahend | hyphen-minus | U+002D |
× | Multiplication | 2nd factor | multiplication sign | U+00D7 |
÷ | Division | divisor | division sign | U+00F7 |
⌊ | Floor | (none) | left floor | U+230A |
◯ | Round | number of decimal places | large circle | U+25EF |
= | Equal to | 2nd number | equals sign | U+003D |
≠ | Not equal to | 2nd number | not equals sign | U+2260 |
< | Less than | 2nd number | less-than sign | U+003C |
≤ | Less than or equal to | 2nd number | less-than-or-equal sign | U+2264 |
> | Greater than | 2nd number | greater-than sign | U+003E |
≥ | Greater than or equal to | 2nd number | greater-than-or-equal sign | U+2265 |
⥁ | Loop | loop body callback | clockwise closed circle arrow | U+2941 |
⌥ | If/Else | callback pair (see below) | option key | U+2325 |
Ⓐ | Number to code point | (none) | circled capital letter A | U+24B6 |
ⓓ | Input decimal | (none) | circled small letter D | U+24D3 |
ⓕ | Input fraction | (none) | circled small letter F | U+24D5 |
ⓘ | Input integer | (none) | circled small letter I | U+24D8 |
ⓟ | Print (Output) | (none) | circled small letter P | U+24DF |
Arithmetic methods
The +, -, ×, and ÷ methods are binary operations. The first argument is the number on which the method is called. The second argument is the parameter passed to the method.
TODO: Describe how methods work on decimals and mixed arguments.
Comparison methods
The =, ≠, <, ≤, >, and ≥ methods are binary operations that work similarly to the arithmetic methods. They return 1 if the comparison is true and 0 if it is false.
TODO: Describe how comparisons work on mixed arguments.
◯ Round
Rounds a decimal number to the number of specified decimal places. If called on a fraction, this method converts the fraction to a decimal and then rounds it. Has no effect on integers.
Returns a new number object without changing the object on which it was called.
⥁ Loop
The ⥁ message can be sent to a number object that has the positive value x to perform a loop that repeats floor(x) times. The ⥁ method takes a single argument which should be the name of a callback method. This callback is called on each iteration of the loop and the ⥁ method passes the current loop counter to the callback. The loop counter starts at 0 and increases by 1 on each iteration until reaching floor(x)-1 on the final repeat. The callback can optionally return a value to break out of the loop early. If the callback returns a non-zero, non-null value, then the loop exits. If it returns zero or null, then the loop continues.
If ⥁ is called on a number equal to zero, then no iteration occurs and the user callback is never called. If ⥁ is called on a number less than zero, then it will loop forever or until the callback returns a non-zero, non-null value.
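The loop rules above can be modeled in Python as follows. This is a sketch under assumptions: the function name is hypothetical, Python's None stands in for Null (␣), and returning the iteration count is a choice made for this sketch, since the return value of ⥁ is not specified here.

```python
import math


def loop(x, callback):
    """Model of ⥁; returns how many times the callback ran (sketch only)."""
    limit = None if x < 0 else math.floor(x)  # negative: no upper bound
    counter = 0
    while limit is None or counter < limit:
        result = callback(counter)  # the callback receives the loop counter
        counter += 1
        if result is not None and result != 0:
            break  # a non-zero, non-null return value exits the loop early
    return counter
```

A callback that always returns Null runs floor(x) times for positive x and never for zero. For a negative x, the only exit is through the callback's return value.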
⌥ If/Else
The ⌥ message can be sent to a number object to conditionally execute one of two callback methods. The parameter to ⌥ must be an object with two callback methods named ⊨ (True, U+22A8) and ⊭ (Not True, U+22AD). If the value of the receiver object is nonzero, then the ⊨ (True) method is executed. If the value is zero, then the ⊭ (Not True) method is executed.
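A rough Python model of the branch selection, assuming the callback pair is a dict keyed by ⊨ and ⊭. The function name is hypothetical. What ⌥ passes to the chosen callback and what it returns are not stated above, so this sketch passes None (Null) and returns nothing.

```python
def if_else(value, branches):
    """Model of ⌥: run the ⊨ callback for non-zero values, ⊭ for zero."""
    chosen = branches["⊨"] if value != 0 else branches["⊭"]
    chosen(None)  # assumption: Null is passed, the result is ignored


log = []
branches = {
    "⊨": lambda _parameter: log.append("true branch"),
    "⊭": lambda _parameter: log.append("false branch"),
}
if_else(1, branches)  # e.g. the result of a comparison that was true
if_else(0, branches)  # e.g. the result of a comparison that was false
```

Because comparisons return 1 or 0, the result of a comparison can be the receiver of ⌥, as the example programs do.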
Strings
String objects are sequences of Unicode code points.
String objects are immutable. They are never modified by any methods, and operations like index and concatenation generally return new string objects. SCOOP implementations may choose to reuse existing string objects whenever the same string is needed but this is optional and need not be done consistently. It is even permissible for the ⧉ (copy) method of string objects to just return themselves. In any case, SCOOP programs should not rely on this behavior and the ≡ (Identical to) method should not be used to compare strings.
Predefined Methods
All string objects have predefined methods for length, concatenation, comparisons, looping, input, output, etc.
Method | Function | Parameter | Symbol Name | Unicode |
---|---|---|---|---|
↔ | Length | (none) | left right arrow | U+2194 |
⒤ | Index | position | parenthesized small letter I | U+24A4 |
№ | Code point to number | position | numero sign | U+2116 |
+ | Concatenation | 2nd string | plus sign | U+002B |
= | Equal to | 2nd string | equals sign | U+003D |
≠ | Not equal to | 2nd string | not equals sign | U+2260 |
< | Less than | 2nd string | less-than sign | U+003C |
≤ | Less than or equal to | 2nd string | less-than-or-equal sign | U+2264 |
> | Greater than | 2nd string | greater-than sign | U+003E |
≥ | Greater than or equal to | 2nd string | greater-than-or-equal sign | U+2265 |
⥁ | Loop | loop body method | clockwise closed circle arrow | U+2941 |
ⓒ | Input character | (none) | circled small letter C | U+24D2 |
ⓘ | Input line | (none) | circled small letter I | U+24D8 |
ⓦ | Input word | (none) | circled small letter W | U+24E6 |
ⓟ | Print (Output) | (none) | circled small letter P | U+24DF |
Ƒ | New method from string | creator object | capital letter F with hook | U+0191 |
↔ Length
Returns the number of Unicode code points in the string. This value may not be the same as the number of characters.
⒤ Index
Given an integer parameter n, returns a string with the single code point at index n in the string. Positions are numbered from 0 to length-1.
№ Code point to number
Given an integer parameter n, returns the integer value for the code point at index n in the string. Positions are numbered from 0 to length-1.
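Python strings are also indexed by code point, so the length, index, and code point methods map onto it directly. The function names below are hypothetical, and raising IndexError for an out-of-range position is an assumption, since error behavior is not yet specified.

```python
def length(s):
    """Model of ↔: the number of code points, not user-perceived characters."""
    return len(s)


def index(s, n):
    """Model of ⒤: a one-code-point string at position n (0 to length-1)."""
    if not 0 <= n < len(s):
        raise IndexError("position out of range")
    return s[n]


def code_point(s, n):
    """Model of №: the integer value of the code point at position n."""
    return ord(index(s, n))


decomposed = "e\u0301"  # e followed by COMBINING ACUTE ACCENT
```

The decomposed string displays as one character but has length 2, and its second code point is the combining accent.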
+ Concatenation
Returns a new string with the parameter string appended to the string that received this message.
Comparison methods
The =, ≠, <, ≤, >, and ≥ methods are binary operations that work similarly to the comparison methods for numbers. These methods all compare normalized versions of the string operands. Implementations should first use either Unicode normalization scheme NFC or NFD so that canonically equivalent sequences of code points can be properly compared. After normalization, the two strings are compared one code point at a time using a simple comparison of their numerical values. Two strings are equal if and only if the normalized strings have the same length and identical sequences of code points. Note that two unnormalized strings can be different lengths and still compare as equal. These methods return 1 if the comparison is true and 0 if it is false.
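The normalize-then-compare behavior can be sketched in Python with the standard unicodedata module. NFC is chosen here, although NFD would serve equally well. The function names are hypothetical. Python compares strings code point by code point, which matches the simple numerical comparison described above.

```python
import unicodedata


def normalized(s):
    """Apply Unicode normalization form NFC before comparing."""
    return unicodedata.normalize("NFC", s)


def equal_to(a, b):
    """Model of =: 1 if the normalized strings are identical, else 0."""
    return 1 if normalized(a) == normalized(b) else 0


def less_than(a, b):
    """Model of <: code point by code point comparison after normalization."""
    return 1 if normalized(a) < normalized(b) else 0


precomposed = "\u00e9"   # é as a single code point
decomposed = "e\u0301"   # e followed by COMBINING ACUTE ACCENT
```

The two spellings of é have different lengths as raw strings but compare as equal once normalized.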
Ƒ New method from string
Creates and returns a new method object from the string that receives this message. If a non-null parameter is given, then the new method will be a callback and the parameter is used as the creator object. If the parameter is Null (␣), then a standard method will be created.
Programs
Hello world program
:mHello world!" ▷mⓟ␣
Output the first 15 Fibonacci numbers
:n15# :a0# :b1# :s " :f▷bⓟ␣▷sⓟ␣←t▷b+a←ab←bt✆ ▷n⥁f
Cat program (simple version)
:e" :i-1# :🐈←c▷eⓒ␣▷cⓟ␣✆ ▷i⥁🐈
Cat program (with test for end of input)
:e" :i-1# :⊨:r1#✆ :⊭←r␣▷cⓟ␣✆ :m⊨⊭□ :🐈←c▷eⓒ␣←b▷c≡␣▷b⌥m⏎r✆ ▷i⥁🐈
Read integers from input and use counter objects to count the number of ones and zeros.
:11# :00# :c0# :+←c▷c+1𝑓 :r←c0𝑓 :C10c+r□ ←①▷C⧉␣ ←⓪▷C⧉␣ :i-1# :⊨▷⓪+␣✆ :⊭✆ :P⊨⊭□ :⊨▷①+␣✆ :⊭←b▷n=0▷b⌥P✆ :Q⊨⊭□ :⊨:r1#✆ :⊭←r␣←b▷n=1▷b⌥Q✆ :M⊨⊭□ :f←n▷1ⓘ␣←b▷n≡␣▷b⌥M⏎r✆ ▷i⥁f
TODO: A linked list type
TODO: Read integers from input and use a linked list of counter objects to count every value in the input. Then print a list of the values and how many times each occurred.
TODO: Example showing how dynamically-created methods can be used with the integer/character conversion methods to use any identifier at runtime and how a sequential range of identifiers may be used like an array.
Appendix
Whitespace
Any characters designated by the Unicode standard as whitespace may be used freely between SCOOP instructions, including between instructions in method definitions. All whitespace characters may also be embedded in string definitions. Whitespace may not otherwise be used to break up the characters of a single instruction.
Examples of correct usages of whitespace:
:n2# ▷n×n :a3.14#:b1.618# ▷b+a :s " :mDon't Panic!" :n " ←①▷C⧉␣ ←t.Cc ⏎t ⏎.Cc :⊭ ←b▷n=0 ▷b⌥P ✆
Examples of incorrect usages of whitespace:
:n 2# ▷n × n :a3.14 # :b1 . 618# ▷ b+a : s " ←① ▷C⧉␣ ← t. Cc ⏎ t ⏎ .Cc : ⊭ ← b▷ n=0 ▷ b ⌥ P✆
The use of whitespace in SCOOP programs is completely optional.
Reserved Symbols, Identifiers, and Unicode
The symbols for the six definition delimiters, the five primitive instructions, the four special identifiers, and the three predefined methods of all objects are reserved and may not be used as identifiers by programs. The backquote character is also reserved. The symbols used by the predefined methods of numbers and strings are not reserved.
Here are all of the reserved symbols:
: ← . ▷ ⏎ # " □ ƒ 𝑓 ✆ $ @ ␣ ⒳ ⧉ ≡ ◰ `
Control characters, format characters, whitespace, and other non-printable Unicode characters also may not be used as identifiers. In addition, code points representing combining marks (e.g. accents) or enclosing marks may not be used by themselves as identifiers (but see below). All other printable (graphic) Unicode characters may be used freely by programs.
Note that there are some ambiguities or implementation issues with this “specification” of SCOOP identifiers. I was originally thinking that identifiers would correspond to single Unicode code points. However, many characters can be represented in Unicode with a sequence of two or more code points.
Some of these characters also have single-code point representations that are considered equivalent to their multiple-code point representations. SCOOP programmers may be unable to control which representation is used by their text editors and indeed may be unaware of the potential problems with using such characters.
E.g. é may be either
é U+00E9
LATIN SMALL LETTER E WITH ACUTE
é U+0065 U+0301
LATIN SMALL LETTER E + COMBINING ACUTE ACCENT
On the other hand, not all “characters” that can be displayed as a single glyph can be represented with a single code point in Unicode. For example, I can enter the “character” q̷ — q with a slash — using a sequence of two Unicode characters but there is no single code point assigned to it. (Indeed, I don’t even know if this glyph has ever been used before).
q̷ U+0071 U+0337
LATIN SMALL LETTER Q + COMBINING SHORT SOLIDUS OVERLAY
TODO: Ligatures?
Characters for many non-Western scripts are also formed by combining smaller units…
See section 2.4 Code Points and Characters in the Unicode standard. https://www.unicode.org/versions/Unicode16.0.0/core-spec/chapter-2/
The situation can be even more complex when certain emoji are considered. People emoji may have a “base character” such as 👩 “woman” and be modified by multiple code points specifying skin tone, hair color, or in some cases gender. 👩👩🦰👩🏾🦰👱♀️
[Attach image showing character codes for various women emoji]
TODO: Discuss how normalization can help.
See section 3.11.7 Definition of Normalization Forms in the Unicode standard. https://www.unicode.org/versions/Unicode16.0.0/core-spec/chapter-3/
So which “characters” should be allowed as identifiers? I can think of four possible implementation “levels”.
- Only characters that are represented by a single Unicode code point may be used as identifiers. (With the exclusions listed above).
- Code point sequences that are canonical equivalents of single code points would also be allowed, using Unicode normalization scheme NFC to determine equivalent identifiers.
- Allow any code point sequence consisting of a base character with zero or more combining/enclosing marks. However, emoji variations would not be allowed or at least not considered distinct. Use Unicode algorithms to determine which sequences are equivalent.
- All of the above plus allow emoji with various modifiers to be considered different identifiers.
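To make the first two levels concrete, here is a Python sketch using the standard unicodedata module. It is only an approximation: the function names are hypothetical, and the general-category test is an assumed stand-in for “printable (graphic) character” that rejects control, format, unassigned, separator, and mark code points.

```python
import unicodedata

# The reserved symbols listed above.
RESERVED = set(':←.▷⏎#"□ƒ𝑓✆$@␣⒳⧉≡◰`')


def is_level1_identifier(text):
    """Level 1: exactly one code point that is graphic and not reserved."""
    if len(text) != 1 or text in RESERVED:
        return False
    # Reject C* (control, format, unassigned), Z* (separators), M* (marks).
    return unicodedata.category(text)[0] not in "CZM"


def is_level2_identifier(text):
    """Level 2: also accept sequences that NFC composes to one code point."""
    return is_level1_identifier(unicodedata.normalize("NFC", text))
```

Under this sketch, e followed by a combining acute accent fails level 1 but passes level 2, while q followed by a combining short solidus overlay fails both because NFC has no single code point to compose it into.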
Feedback on these possibilities would be appreciated.
Text Encodings
Implementations of SCOOP should support reading program files and program input in the UTF-8 encoding. Other Unicode text encodings may be optionally supported. The encoding of program output should be chosen to match the execution environment.
Program input and output are always performed on Unicode characters, never on individual bytes.
Implementations may represent strings of characters internally in whatever format is convenient.
List of Errors
TODO
TO DO
- describe number formats & operations
- scope examples
- more example programs
- finish describing predefined methods
- the input methods of numbers and strings return null (␣) if they encounter invalid input or the end of file
- describe REPL?