Gemini
- This is still a work in progress. It may be changed in the future.
Not to be confused with the AI of the same name.
Gemini is designed by PSTF.
Intro
Code Blocks
In order to make the program look more layered and clear, we do not use any brackets to distinguish code blocks. All code blocks are enclosed in do ... end, except for structures, enumerators, or classes.
Yes, many programming languages use curly braces to denote code blocks, but this can actually mess up your code and may even lead to the following situations:
}}}}}
{
{some_statements;}}
}
}}
}
Thus, some programming languages provide unique ways to represent code blocks, such as Python's indentation levels.
Inspiration
I drew inspiration from the following programming languages:
- BellBase (or BunnyBell), for the format of the code.
- CangjieLang, for the basic programming language structure.
- SLet, for some data types.
- Python, for the statement types and error message format.
- Many more programming languages.
Some Fun Facts
- It is the T. 284463 equivalent to BellBase/BunnyBell, designed by a user on Vicipaedia Linguarum Programmationis named Khaki_16218_Tracer.
Most Basic Concepts
Identifier
All identifiers must follow this format: an XID_Start character followed by any number of XID_Continue characters, or an underscore followed by at least one XID_Continue character. The definitions of XID_Start and XID_Continue can be found in the Unicode Standard; Gemini uses version 15.0.0.
All identifiers are compared in Normalization Form C (NFC): if two identifiers are equal after normalization, they are considered the same identifier. For example, Kelvin and Kelvin are equal, even though the "uppercase K" in the former is actually the Kelvin sign (U+212A).
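The normalization rule can be checked with a short Python sketch (Python is used here only as a stand-in; the helper name same_identifier is an assumption made for illustration, not part of Gemini):

```python
import unicodedata

def same_identifier(a: str, b: str) -> bool:
    """Compare two identifier spellings after NFC normalization."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)

kelvin_sign = "\u212Aelvin"  # starts with KELVIN SIGN (U+212A)
latin_k = "Kelvin"           # starts with LATIN CAPITAL LETTER K (U+004B)

print(kelvin_sign == latin_k)                 # False: the code points differ
print(same_identifier(kelvin_sign, latin_k))  # True: NFC maps U+212A to U+004B
```

The comparison uses whatever Unicode version the Python runtime ships with, which may differ from the 15.0.0 data that Gemini specifies.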
For example, the following strings are all valid identifiers:
abc a1b2c3 _a a114514 PrySigneToFry 你好 __こんにちは
The following are illegal identifiers:
ab&c # Unacceptable symbol
3abc # Started with Digit
struct # Keyword. But it is a valid raw identifier.
A raw identifier is simply an ordinary identifier surrounded by backtick (`, also known as ASCII grave) symbols. For example, the following strings are all valid raw identifiers:
abc a1b2c3 _a a114514 PrySigneToFry 你好 __こんにちは struct à֮̅̕b H͉̱̜̖̞̓̃͆͆͌e̴̥͔̯̙̝̞̲̒̾̒̕̕̚l̩̹̟̦̖̲̙̏͋̃͐̽̚͢͡ͅl͍͇͉͈͇̮͚͓͔̿̄̏̓͋͡o̹͎͇̭̪͂́̉̎͠ ฉันคือพ่อของคุณ
The following are illegal raw identifiers:
ab&c # Unacceptable symbol
3abc # Started with Digit
Besides, the identifier "_" is a "wildcard".
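As a rough sketch of the identifier rules in this section, the following Python code leans on str.isidentifier(), which applies the same XID_Start/XID_Continue rule. The KEYWORDS set and all function names are assumptions made for illustration; the real keyword list is larger, and Python's Unicode data may not be version 15.0.0.

```python
# Hypothetical keyword subset; the full Gemini keyword list is not given here.
KEYWORDS = {"struct", "class", "func", "let", "var"}

def matches_identifier_format(name: str) -> bool:
    """XID_Start XID_Continue*, or '_' followed by at least one XID_Continue."""
    # str.isidentifier() also accepts a lone underscore, which the format
    # above excludes (Gemini treats "_" as the wildcard).
    return name.isidentifier() and name != "_"

def is_identifier(name: str) -> bool:
    """An ordinary identifier must match the format and must not be a keyword."""
    return matches_identifier_format(name) and name not in KEYWORDS

def is_raw_identifier(text: str) -> bool:
    """A raw identifier is backtick-quoted; keywords are allowed inside."""
    if len(text) < 3 or text[0] != "`" or text[-1] != "`":
        return False
    return matches_identifier_format(text[1:-1])

for candidate in ["abc", "_a", "你好", "3abc", "ab&c", "struct", "_"]:
    print(candidate, is_identifier(candidate))
print("`struct`", is_raw_identifier("`struct`"))
```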
Program Entry
The entry point for a Gemini program is `main`, and the top level of the package in the root directory of the source files can have at most one `main`.
If a module is compiled into an executable file, the compiler will only look for `main` at the top level of the root directory of the source files. If `main` is not found, the compiler will report an error; if `main` is found, the compiler will further check its parameter and return value types. It is important to note that `main` cannot have an access modifier, and when a package is imported, the `main` defined in that package will not be imported.
The `main` that serves as the program entry point can have no parameters or parameters of type `Int64` or `Array<String>`, and the return type can be `Unit` or an integer type.
You can specify your own entry point. To specify the program entry point, add a dollar sign before the statement, like this:
$program_entry_statement; # Program will start from here
other_statements;
other_statements;
other_statements;
other_statements;
func main(argC: Int64, argV: List) do
main_program; # Instead of from here
main_program;
main_program;
main_program;
end
Variables and Constants
In the Gemini programming language, a variable consists of a corresponding variable name, data (value), and several attributes. Developers access the variable's corresponding data through the variable name, but access operations must comply with the constraints of the relevant attributes (such as data type, mutability, and visibility).
The specific form of variable definition is:
modifier variableName: variableType = initialValue
Here, the modifier is used to set various attributes of the variable and can be one or more. Common modifiers include:
- Mutability modifiers: let and var, corresponding to immutable and mutable properties, respectively. Mutability determines whether the value of the variable can be changed after initialization. Gemini variables are thus divided into constants and variables.
- Visibility modifiers: private and public, which affect the scope in which global and member variables can be referenced. For more details, see the relevant sections in later chapters.
- Static modifier: static, which affects the storage and reference method of member variables. For more details, see the relevant sections in later chapters.
When defining a Gemini variable, a mutability modifier is required. Based on this, other modifiers can be added if needed.
The variable name should be a valid Gemini identifier.
The variable type specifies the type of data the variable holds. If the initial value has a clear type, the variable type annotation can be omitted, in which case the compiler can automatically infer the variable type.
The initial value is a Gemini expression used to initialize the variable. If a variable type is annotated, the type of the initial value must match the variable type. When defining global variables or static member variables, an initial value must be specified. When defining local variables or instance member variables, the initial value can be omitted, but the variable type must be specified, and initialization must be completed before the variable is referenced; otherwise, a compilation error will occur.
| Type name | Description | Data Range |
|---|---|---|
| IntX | Limited integer type. X can only be 16, 32, 64 or 128, or be omitted. | [-2^(X-1), 2^(X-1)) if X is given, otherwise ℤ |
| UIntX | Limited unsigned integer type. X can only be 16, 32, 64 or 128, or be omitted. | [0, 2^X) if X is given, otherwise ℕ |
| FloatX | Limited float type. X can only be 16, 32, 64 or 128, or be omitted. | Similar to C++, but if X is omitted the range is ℝ |
| Doc | A string enclosed by double or single quotes, supporting escape sequences (except for multi-line documents, which are enclosed by triple or sextuple quotes) | No range |
| Rune | A string with length of 1 and prefix of r | [0x0000, 0xD7FF] ∪ [0xE000, 0x10FFFF] |
| Bool | A logical value | true or false |
| List | A list of elements in any type enclosed in brackets | No range |
| Func | A function reference | No range |
| Pointer | An address reference | Depends on your memory |
| Pair | A pair of values | No range |
| Key-value bind | Format: "key": value | No range |
| Unit | Just simply a placeholder that indicates the expression is in fact a "code block" | Range not available |
| Nothing | Just nothing | Range not available |
| Everything | Just everything | Range not available |
| Set<T> | As in SLetScript, but with a single element type (format: {$x1, x2, x3, x4}) | Range not available |
| Array<T> | Like Set<T>, but unsorted (format: {x1, x2, x3, x4}) | Range not available |
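The integer ranges in the table can be computed with a small Python sketch. The helper names int_range and fits are assumptions made for illustration, and the bounds are returned as inclusive values, so the half-open upper bound 2^(X-1) appears as 2^(X-1) - 1.

```python
def int_range(bits: int, signed: bool = True) -> tuple[int, int]:
    """Inclusive (min, max) for IntX when signed, UIntX otherwise."""
    if signed:
        return -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    return 0, (1 << bits) - 1

def fits(value: int, bits: int, signed: bool = True) -> bool:
    """True if value is representable in the given fixed-width type."""
    low, high = int_range(bits, signed)
    return low <= value <= high

for bits in (16, 32, 64, 128):
    print(f"Int{bits}: {int_range(bits)}  UInt{bits}: {int_range(bits, signed=False)}")
```

When X is omitted the type is unbounded, which matches Python's own arbitrary-precision int.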
A multi-line document may look like this:
"""
This is a multi-line document.
Hello, world!
Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.
Hwæt, we Gar-Dena in geardagum þeodcyninga þrym gefrunon, hu ða æþelingas ellen fremedon. Oft Scyld Scefing sceaþena þreatum, monegum mægþum meodosetla ofteah; egsode Eorle. Syððan ærest wearð feasceaft funden, (he þæs frofre gebad) weox under wolcnum, weorðmyndum þah, oðþæt him æghwyle þara ymbsittendra ofer hronrade hyan scolde, gomban gyldan: þæt wæs god cyning!
“吹面不寒杨柳风”,不错的,像母亲的手抚摸着你。
风里带来些新翻的泥土的气息,混着青草味儿,还有各种花的香,都在微微润湿的空气里酝酿。
鸟儿将窠巢安在繁花嫩叶当中,高兴起来了,呼朋引伴地卖弄清脆的喉咙,唱出宛转的曲子,与轻风流水应和着。
牛背上牧童的短笛,这时候也成天在嘹亮地响着。
"""
Expressions
Anything that can be evaluated is an expression in Gemini. In particular, a Gemini code block is also an expression.
Here are some examples of valid expressions:
var myList: List = [i ^ 2 for i in interval(1, 10)];
This evaluates to the squares of the numbers from 1 to 10.
var voltage: Float32 = 5.0;
var bit: Bool = false if voltage < 2.5 else true;
Here bit is true because the voltage is greater than or equal to 2.5. This simulates a computer's bit, with low voltage representing 0 and high voltage representing 1.
var num: Int = 1
for (i in interval(1, 100)) do
num *= i
end
print num
This prints 100!, which equals 93326215443944152681699238856266700490715968264381621468592963895217599993229915608941463976156518286253697920827223758251185210916864000000000000000000000000. Even Int128 would overflow on such a big number, so num uses the unbounded Int type.
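The overflow claim can be checked with a short Python sketch, assuming Int128 is a signed 128-bit integer as described in the type table (INT128_MAX is a name introduced here for illustration):

```python
import math

INT128_MAX = 2**127 - 1  # largest value of a signed 128-bit integer

value = math.factorial(100)
print(len(str(value)))     # 158 decimal digits
print(value > INT128_MAX)  # True: 100! does not fit in Int128

# Find the largest n whose factorial still fits in a signed 128-bit integer.
largest_n = 0
while math.factorial(largest_n + 1) <= INT128_MAX:
    largest_n += 1
print(largest_n)
```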
Also, there are many kinds of operators.
| Operator | Name | Description |
|---|---|---|
| - | Negative | For integers or decimals, take their opposite. For boolean values, apply logical NOT. For lists or documents, reverse them. For pairs of numbers, swap the first and second values. |
| ! | Logical NOT | Evaluates to false if the operand is true, and to true otherwise. |
| ! | Named parameter indicator | Indicates that the parameter is named. |
| + | Addition | If both operands are numbers, add them together. If both operands are lists or documents, append the second to the first. |
| - | Subtraction | As in mathematics. |
| * | Multiplication | If both operands are numbers, return their product. If one is a document or list and the other is a natural number n, repeat it n times and return it. |
| / | Division | As in Python. |
| // | Truncated Division | As in Python. |
| % | Modulo | As in Python. |
| ^ | Exponent | As in Xonovile. |
| & | Bitwise AND | Literal meaning. |
| \| | Bitwise OR | Literal meaning. |
| \|\| | Logical OR | Returns true if at least one of the expressions is true. |
| && | Logical AND | Returns true only if both expressions are true. |
| ~ | Bitwise NOT | Literal meaning. |
| @ | Bitwise XOR | Literal meaning. |
| # | Neglection | Inline comment. |
| << | Left Shift | Multiply x by 2^y. |
| >> | Right Shift | Divide x by 2^y and round toward 0. |
| = | Assignment | Assign y to x. |
| <!-- XXXXX --> | Document Comment | Literal meaning. Quite like in HTML, isn't it? |
| == | Equality | Returns true if x is equal to y. |
| != | Inequality | Returns false if x is equal to y. |
| > | Superior | Returns true if x is greater than y. |
| <= | Non-Superior | Returns false if x is greater than y. |
| < | Inferior | Returns true if x is less than y. |
| >= | Non-Inferior | Returns false if x is less than y. |
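Two rows of the table behave differently from what many languages do, so here is a hedged Python sketch of them. The function names negate and shift_right are assumptions made for illustration, and a Pair is modeled as a two-element tuple.

```python
def negate(value):
    """Unary minus as described in the operator table."""
    if isinstance(value, bool):          # check bool first: bool is an int subclass
        return not value
    if isinstance(value, (int, float)):
        return -value
    if isinstance(value, (list, str)):   # lists and documents are reversed
        return value[::-1]
    if isinstance(value, tuple) and len(value) == 2:
        return (value[1], value[0])      # pairs are swapped
    raise TypeError(f"unary minus is not defined for {type(value).__name__}")

def shift_right(x: int, y: int) -> int:
    """Divide x by 2**y and round toward zero, as the table states."""
    magnitude = abs(x) >> y
    return magnitude if x >= 0 else -magnitude

print(negate([1, 2, 3]))   # [3, 2, 1]
print(negate((1, 2)))      # (2, 1)
print(shift_right(-7, 1))  # -3
print(-7 >> 1)             # -4: Python's own >> rounds toward negative infinity
```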
Conditional Jump
The basic form of an if expression is:
if (condition) do
# branch 1
else do
# branch 2
end
Here, "condition" is a boolean expression, and "branch 1" and "branch 2" are two code blocks. An if expression executes according to the following rules:
1. Evaluate the "condition" expression. If the value is true, go to step 2; if false, go to step 3.
2. Execute "branch 1" and then proceed to step 4.
3. Execute "branch 2" and then proceed to step 4.
4. Continue executing the code following the if expression.
In some scenarios, you may only care about what to do when the condition is true, so the else and its corresponding code block are optional.
In many scenarios, when one condition is not met, it may be necessary to check one or more additional conditions and then execute the corresponding actions. Gemini allows a new if expression to follow an else, thereby supporting multi-level conditional judgments and branch execution.
For example,
include Std.Random.*
include Std.TimeDate.*
func main(argC: Int64, argV: List) do
Random.Seed(Time().Time64bit());
let speed = Random().NextFloat64() * 20.0
print "${speed} km/s\n"
if (speed > 16.7) do
print "You're now outside the Solar System."
else if (speed > 11.2) do
print "You're now outside the Earth system."
else if (speed > 7.9) do
print "You're now outside the Earth."
else do
print "Prepare to fly to a deeper place."
end
end
Conditional Loop
The basic form of a while expression is:
while (condition) do
# loop body
end
where "condition" is a boolean expression and "loop body" is a block of code. The while expression executes according to the following rules:
1. Evaluate the "condition" expression. If the value is true, proceed to step 2; if false, go to step 3.
2. Execute the "loop body", then go back to step 1.
3. End the loop and continue executing the code following the while expression.
The basic form of a do-while expression is:
do
# loop body
end while (condition)
where "condition" is a boolean expression, and "loop body" is a block of code. The do-while expression executes according to the following rules:
1. Execute the "loop body" and go to step 2.
2. Evaluate the "condition" expression. If the value is true, go back to step 1; if the value is false, go to step 3.
3. End the loop and continue executing the code after the do-while expression.
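The difference between the two loops is that a do-while body always runs at least once. The Python sketch below illustrates this; Python has no do-while, so it is emulated with a while True loop and a break, and both function names are assumptions made for illustration.

```python
def run_while(start: int) -> int:
    """Models `while (n < 3) do ... end`: the body may run zero times."""
    n, runs = start, 0
    while n < 3:
        n += 1
        runs += 1
    return runs

def run_do_while(start: int) -> int:
    """Models `do ... end while (n < 3)`: the body runs before the first check."""
    n, runs = start, 0
    while True:
        n += 1
        runs += 1
        if not (n < 3):  # the condition is evaluated after the body
            break
    return runs

print(run_while(0), run_do_while(0))  # 3 3
print(run_while(5), run_do_while(5))  # 0 1
```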
Iterative Loop
The for-in expression can iterate over instances of types that implement the Iterable<T> interface. The basic form of a for-in expression is:
for (iterationVariable in sequence [where condition]) do
loopBody
end
Here, "loopBody" is a block of code. The "iterationVariable" is a single identifier or a tuple of multiple identifiers used to bind the data pointed to by the iterator in each iteration, and it can be used as a local variable within the "loopBody." The "sequence" is an expression that is evaluated only once, and the iteration is performed on the value of this expression. Its type must implement the Iterable<T> interface. The where clause is optional; its "condition" is a boolean expression, and the "loopBody" is executed only for elements that satisfy it. The for-in expression is executed according to the following rules:
1. Evaluate the "sequence" expression, use its value as the iterable object, and initialize the iterator of the iterable object.
2. Advance the iterator. If the iterator is exhausted, go to step 4. Otherwise, bind the data currently pointed to by the iterator to the "iterationVariable"; if a "condition" is given and it evaluates to false, repeat step 2, otherwise go to step 3.
3. Execute the "loopBody," then go to step 2.
4. End the loop and continue executing the code after the for-in expression.
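The execution rules above can be sketched in Python using its iterator protocol as a stand-in for Iterable<T>. The function for_in and its parameters are assumptions made for illustration, not Gemini APIs.

```python
def for_in(sequence, body, where=None):
    """Run `body` for each element of `sequence` that satisfies `where`."""
    iterator = iter(sequence)       # evaluate the sequence once, create the iterator
    while True:
        try:
            item = next(iterator)   # advance the iterator and bind the element
        except StopIteration:
            break                   # iterator exhausted: leave the loop
        if where is not None and not where(item):
            continue                # condition not met: advance again
        body(item)                  # execute the loop body

even_squares = []
for_in(range(1, 11), lambda i: even_squares.append(i * i), where=lambda i: i % 2 == 0)
print(even_squares)  # [4, 16, 36, 64, 100]
```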
Break out of Loop
In programs with loop structures, sometimes it is necessary to end a loop early or skip the current iteration based on specific conditions. To address this, Gemini introduced the break and continue expressions. They can appear in the body of a loop expression. The break expression is used to terminate the execution of the current loop expression and proceed to the code after the loop expression, while the continue expression is used to end the current iteration early and move on to the next iteration. Both break and continue expressions have the type Nothing.
Functions
Define a Function
Gemini uses the keyword 'func' to indicate the start of a function definition. Following 'func' are the function name, parameter list, optional function return type, and the function body. The function name can be any valid identifier. The parameter list is defined within a pair of parentheses (with multiple parameters separated by commas), and a colon separates the parameter list and the function return type (if present). The function body is enclosed in do ... end.
For example,
func fibonacci(n: Int128): Int do
if (n < 0) do
return 0;
else if (n == 0) do
return 1;
else do
return fibonacci(n - 1) + fibonacci(n - 2);
end
end
Parameter List
A function can have zero or more parameters, all of which are defined in the function's parameter list. Depending on whether a parameter name needs to be provided when calling the function, parameters in the parameter list can be divided into two types: positional parameters and named parameters.
The definition of a positional parameter is p: T, where p represents the parameter name, and T represents the type of parameter p. The parameter name and its type are connected by a colon. For example, in the previous example, the parameter n of the fibonacci function is a positional parameter.
The definition of a named parameter is p!: T, which differs from a positional parameter in that there is an exclamation mark after the parameter name p.
It should be noted that if you define a named parameter, you cannot define positional parameters after it, and only named parameters can have default values.
Calling a function
The function call has the form f(arg1, arg2, ..., argn). Here, f is the name of the function to be called, and arg1 to argn are n arguments (called actual parameters) provided at the time of the call. Each actual parameter must be of a type that is a subtype of the corresponding parameter type. There can be zero or more actual parameters; when there are no actual parameters, the call is made as f().
Depending on whether the parameters in the function definition are positional or named, the way actual parameters are passed in a function call differs: for positional parameters, the corresponding actual parameter is an expression; for named parameters, the corresponding actual parameter needs to be in the form p: e, where p is the name of the named parameter and e is an expression (i.e., the value passed to parameter p).
For example, this is a function with positional parameters.
func fibonacci(n: Int128): Int do
if (n < 0) do
return 0;
else if (n == 0) do
return 1;
else do
return fibonacci(n - 1) + fibonacci(n - 2);
end
end
func main(argC: Int64, argV: List) do
print fibonacci(cast(input(), Int128));
end
This is a function with named parameters.
func greet(user!: String = "Guest"): Unit do
print("Hello, ${user}!");
end
func main(argC: Int64, argV: List) do
greet(user: input("Input your username: "));
end
We can even define nested functions, that is, functions within functions.
func foo() do
func nestAdd(a: Int64, b: Int64) do
a + b + 3
end
println nestAdd(1, 2); # 6
return nestAdd;
end
func main() do
let f = foo();
let x = f(1, 2);
println "result: ${x}";
end
# Output:
<!--
6
result: 6
-->
Lambda Expressions
A lambda expression is an anonymous function (that is, a function without a name), designed primarily to quickly define short function logic in a program without explicitly declaring a function name. This concept originates from mathematical lambda calculus and has been incorporated into various programming languages (such as C++, Python, C#, etc.) to simplify code and enhance flexibility. Lambda expressions have also been introduced in Gemini, and their usage will be explained in this subsection.
The syntax of a lambda expression is as follows:
{ p1: T1, ..., pn: Tn => expressions | declarations }
Here, the part before => is the parameter list, with multiple parameters separated by commas, and each parameter's name and type separated by a colon. There can also be no parameters before =>. The part after => is the body of the lambda expression, consisting of a sequence of expressions or declarations. The scope of the lambda expression's parameters is the same as that of a function, and within the body of the lambda expression, its scope level can be considered equivalent to variables defined inside a function body.
Whether a lambda expression has parameters or not, the => cannot be omitted, unless it is used as a trailing lambda.
The type annotation of parameters in a lambda expression can be omitted. In the following situations, if the parameter type is omitted, the compiler will try to infer the type. A compilation error will occur if the compiler cannot infer the type:
- When a lambda expression is assigned to a variable, the parameter type is inferred based on the type of the variable;
- When a lambda expression is used as an argument in a function call, the parameter type is inferred based on the type of the corresponding function parameter.
Closure
A function or lambda, together with the variables it captures from the static scope in which it is defined, is called a closure. This allows the closure to function correctly even when it is used outside the scope where it was defined.
Accessing the following types of variables in the definition of a function or lambda is called variable capture:
- Accessing a local variable defined outside the function in the default value of a function parameter;
- Accessing a local variable defined outside the function or lambda within the function or lambda;
- A function or lambda defined inside a class/struct that is not a member function accessing instance member variables or this.
The following types of variable access are not considered variable capture:
- Accessing a local variable defined within the function or lambda itself;
- Accessing the function or lambda parameters;
- Accessing global variables and static member variables;
- Accessing instance member variables within instance member functions or properties. Since instance member functions or properties receive this as a parameter, all instance member variables are accessed through this.
Variable capture occurs at the time the closure is defined, so there are the following rules for variable capture:
- The captured variable must be visible at the time the closure is defined, otherwise a compile error occurs;
- The captured variable must be fully initialized at the time the closure is defined, otherwise a compile error occurs.
To prevent closures that capture var-declared variables from escaping, such closures can only be invoked and cannot be used as first-class citizens. This includes not being able to assign them to variables, use them as arguments or return values, or directly use the closure's name as an expression.
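Variable capture can be sketched in Python, whose closures follow the same basic idea. The names make_adder, offset, and add are assumptions made for illustration; offset is never reassigned, so it plays the role of a let-declared variable, which is the case Gemini allows to escape.

```python
def make_adder(offset: int):
    """Return a function that captures `offset` from the enclosing scope."""
    def add(x: int) -> int:
        return x + offset  # `offset` is a captured variable, not a parameter of add
    return add

add_three = make_adder(3)
# make_adder has already returned, yet the captured value is still available.
print(add_three(4))  # 7
```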
Function Call Syntactic Sugar
Trailing lambda
Trailing lambdas can make function calls look like built-in language syntax, increasing the extensibility of the language.
When the last parameter of a function is of function type, and the argument provided for the function call is a lambda, you can use trailing lambda syntax to place the lambda at the end of the function call, outside the parentheses.
For example, in the code below, a myIf function is defined, where the first parameter is of type Bool and the second parameter is a function type. When the first parameter is true, it returns the result of calling the second parameter; otherwise, it returns 0. When calling myIf, you can either call it like a regular function or use the trailing lambda style.
func myIf(a: Bool, fn: () -> Int64) do
if(a) do
fn();
else do
0;
end
end
func test() do
myIf(true, { => 100 }); # General function call
myIf(true) do # Trailing lambda call
100
end
end
Variadic Parameter
Variable-length arguments are a special function call syntactic sugar. When the last positional parameter is of type List, you can pass a sequence of arguments directly in the corresponding position in the actual arguments instead of a List literal (the number of arguments can be 0 or more). It should be noted that only the last non-named parameter can be used as a variable-length parameter, and named parameters cannot use this syntactic sugar.
Variadic parameters can appear in global functions, static member functions, instance member functions, local functions, constructors, function variables, lambdas, function call operator overloads, and index operator overload call sites. Other operator overloads, composition, and pipeline invocation methods are not supported.
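The desugaring can be sketched in Python, where *numbers plays the role of the trailing List parameter. Both function names are assumptions made for illustration.

```python
def total(label: str, numbers: list) -> str:
    """The underlying function: its last positional parameter is a list."""
    return f"{label}: {sum(numbers)}"

def total_variadic(label: str, *numbers: int) -> str:
    """The sugared form: trailing arguments are collected into a list."""
    return total(label, list(numbers))

print(total("sum", [1, 2, 3]))         # sum: 6
print(total_variadic("sum", 1, 2, 3))  # sum: 6
print(total_variadic("sum"))           # sum: 0 (zero trailing arguments)
```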
Two door gods: Pipeline & Composition
When a series of operations needs to be performed on input data, a pipeline expression can be used to simplify the description. The syntax of a pipeline expression is e1 |> e2, which is syntactic sugar for: let v = e1; e2(v).
Here, e2 is an expression of function type, and the type of e1 is a subtype of the parameter type of e2.
A composition expression represents the composition of two single-argument functions. The syntax of a composition expression is f ~> g, which is equivalent to { x => g(f(x)) }.
Here, both f and g are expressions of function types with only one parameter.
For f and g to be composed, the return type of f(x) must be a subtype of the parameter type of g(...).
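Both operators can be modeled with ordinary functions. In this Python sketch, pipe stands in for |> and compose stands in for ~>; these two helpers, along with double and increment, are assumptions made for illustration.

```python
def pipe(value, fn):
    """Models `e1 |> e2`: evaluate e1 first, then apply e2 to the result."""
    return fn(value)

def compose(f, g):
    """Models `f ~> g`: a new function that applies f first, then g."""
    return lambda x: g(f(x))

def double(x: int) -> int:
    return x * 2

def increment(x: int) -> int:
    return x + 1

print(pipe(pipe(5, double), increment))  # 11, like 5 |> double |> increment
double_then_increment = compose(double, increment)
print(double_then_increment(5))          # 11
print(compose(increment, double)(5))     # 12: the order of composition matters
```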
Overload a Function
If a function name corresponds to multiple function definitions in a scope, this phenomenon is called function overloading.
Only functions introduced by function declarations can be overloaded. The following cases do not constitute overloading, and two names that do not constitute overloading cannot both be defined or declared within the same scope:
- A static member function and an instance member function of a class, interface, or struct type cannot overload each other.
- Constructors, static member functions, and instance member functions of an enum type cannot overload one another.
When calling a function, all callable functions (meaning those that are visible in the current scope and pass type checking) form a candidate set. If there are multiple functions in the candidate set, deciding which function to choose requires function overload resolution, which follows these rules:
- Prefer functions in higher-level scopes. In nested expressions or functions, the more deeply nested a scope is, the higher its scope level.
- If there are still multiple functions in the highest-level scope, choose the best-matching function (for functions f and g and given arguments, if g can be called whenever f can be called, but not vice versa, then f is considered a better match than g). If there is no uniquely best match, an error is reported.
- Subclasses and parent classes are considered the same scope.
For example,
func foo(user: String): Unit do
print("Hello, ${user}!");
end
func foo(a: Int64, b: Int64): Unit do
print("${a} + ${b} = ${a + b}");
end
func main(argC: Int64, argV: List) do
foo("Harry"); # Output: Hello, Harry!
foo(48317283, 74927107); # Output: 48317283 + 74927107 = 123244390
end
Operator overload
If you want to support an operator that a type does not natively support, you can implement it using operator overloading.
If you need to overload an operator for a type, you can do so by defining a function with the same name as the operator for that type. When an instance of that type uses the operator, the operator function will be called automatically.
The definition of an operator function is similar to that of a regular function, with the following differences:
- When defining an operator function, the `operator` modifier must be added before the `func` keyword;
- The number of parameters of the operator function must match the requirements of the corresponding operator (see the appendix on operators for details);
- Operator functions can only be defined inside classes, interfaces, structs, enums, and extensions;
- Operator functions have the semantics of instance member functions, so the `static` modifier is prohibited;
- Operator functions cannot be generic functions.
Additionally, it should be noted that overloading an operator does not change its inherent precedence or associativity.
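Python offers a comparable mechanism through special methods, which makes for a convenient sketch of the idea. The Vec2 type below is hypothetical, and Python's __add__ and __neg__ methods stand in for Gemini's operator func declarations.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Vec2:
    """A hypothetical two-component vector used only for illustration."""
    x: int
    y: int

    def __add__(self, other: "Vec2") -> "Vec2":
        # Binary +: one parameter besides the instance itself.
        return Vec2(self.x + other.x, self.y + other.y)

    def __neg__(self) -> "Vec2":
        # Unary -: no parameter besides the instance itself.
        return Vec2(-self.x, -self.y)

print(Vec2(1, 2) + Vec2(3, 4))   # Vec2(x=4, y=6)
# Precedence is unchanged by overloading: this parses as (-a) + b.
print(-Vec2(1, 2) + Vec2(3, 4))  # Vec2(x=2, y=2)
```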
Structure
Define a struct
The definition of a struct type starts with the keyword struct, followed by the name of the struct, and then the struct body, which runs from the line after the declaration up to the closing \struct. The struct body can define a series of member variables, member properties (see Properties), static initializers, constructors, and member functions.
As shown below:
struct Rectangle
let width: Int64
let height: Int64
public init(width: Int64, height: Int64) do
this.width = width
this.height = height
end
public func area(): Int64 do
return width * height
end
\struct
In the example above, a struct type named "Rectangle" is defined. It has two member variables width and height of type Int64, a constructor with two parameters of type Int64 (defined using the keyword init, and typically used to initialize member variables inside its body), and a member function area (which returns the product of width and height).
Struct member variables are divided into instance member variables and static member variables (modified with the static keyword). The difference in access is that instance member variables can only be accessed through a struct instance (saying that a is an instance of type T means that a is a value of type T), while static member variables can only be accessed through the struct type name.
When defining instance member variables, you may omit an initial value (but the type must be specified, like width and height in the example above), or you can provide an initial value.
Create an Object
After defining a struct type, you can create an instance of the struct by calling its constructor. Outside of the struct definition, you can create an instance of this type by calling the constructor with the struct type name, and you can access instance member variables and instance member functions that meet the visibility modifiers (such as public) through the instance. If you want to modify the values of member variables through a struct instance, you need to define the struct variable as mutable, and the member variables to be modified must also be mutable (defined using var). When assigning or passing as a parameter, the struct instance will be copied (if the member variable is a reference type, only the reference is copied, not the object it refers to), creating a new instance, and modifications to one instance will not affect the other instance.
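The copy behaviour described above can be modeled in Python with a shallow copy. The Point type is hypothetical, and copy.copy stands in for what Gemini does implicitly when a struct is assigned or passed as an argument.

```python
import copy
from dataclasses import dataclass, field

@dataclass
class Point:
    """A hypothetical struct-like record with one reference-typed member."""
    x: int
    tags: list = field(default_factory=list)

a = Point(1, ["origin"])
b = copy.copy(a)         # models `var b = a` for a Gemini struct

b.x = 99                 # changes only the copy
b.tags.append("shared")  # the list is a reference, so both instances see it

print(a.x)     # 1
print(a.tags)  # ['origin', 'shared']
```

Only the reference to the list was copied, which is why the appended tag is visible through both instances.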
A struct supports defining a static initializer, and in the static initializer, static member variables can be initialized through assignment expressions.
The static initializer begins with the keyword combination static init, followed by a parameterless parameter list and a function body, and it cannot be modified by access modifiers. All uninitialized static member variables must be initialized in the function body, otherwise a compilation error will occur.
Structs support two types of constructors: regular constructors and primary constructors.
A regular constructor begins with the keyword init, followed by a parameter list and a function body. In the function body, all uninitialized instance member variables must be initialized (if the parameter name and member variable name cannot be distinguished, you can use this before the member variable to differentiate; this refers to the current instance of the struct), otherwise, a compilation error will occur.
In addition to defining multiple regular constructors named init, a struct can also define (at most) one primary constructor. The primary constructor has the same name as the struct type, and the parameters can be in two forms: regular parameters and member variable parameters (you need to add let or var before the parameter name). Member variable parameters simultaneously serve to define member variables and as constructor parameters.
If a struct definition does not have any custom constructors (including primary constructors), and all instance member variables have initial values, a parameterless constructor will be automatically generated (calling this constructor will create an object where all instance member variables have values equal to their initial values); otherwise, this parameterless constructor will not be automatically generated. For example, for the following struct definition, the comments show the automatically generated parameterless constructor:
struct Rectangle
let width: Int64 = 10
let height: Int64 = 10
<!-- Auto-generated parameterless constructor:
public init() do
end
-->
\struct
Struct member functions are divided into instance member functions and static member functions (modified with the static keyword). The difference between the two is that instance member functions can only be accessed through a struct instance, whereas static member functions can only be accessed through the struct type name. Static member functions cannot access instance member variables or call instance member functions, but instance member functions can access static member variables and static member functions.
Members of a struct (including member variables, member properties, constructors, member functions, and operator functions) are modified with four access modifiers: private, internal, protected, and public, with the default modifier being internal.
- private means visible within the struct definition.
- internal means visible only within the current package and its subpackages (including subpackages of subpackages).
- protected means visible within the current module (see the package section for details).
- public means visible both inside and outside the module.
Recursive and mutually recursive struct definitions are both illegal.
struct User
    public var name: Doc;
    public var ID: Int128;
    public init(name: Doc, ID: Int128) do
        # 'this' refers to the current instance and disambiguates members from parameters.
        this.name = name;
        this.ID = ID;
    end
    public func intro(): Unit do
        print "Hi! I'm ${name}, and my ID is ${ID}.\n";
    end
\struct
let pstf = User("PrySigneToFry", 173926951);
pstf.intro();
Intro to OOP
The class type is a classic concept in object-oriented programming, and Gemini also supports using classes to implement object-oriented programming. The main difference between class and struct is that a class is a reference type, while a struct is a value type, so their behavior differs when assigning or passing them as parameters; classes can inherit from each other, but structs cannot.
This section will sequentially introduce how to define a class type, how to create objects, and class inheritance.
Define a class
The definition of a class type starts with the keyword 'class', followed by the name of the class, and then the class body, and finally '\class'. The class body can define a series of member variables, member properties, static initializers, constructors, member functions, and operator functions.
class Rectangle
    let width: Int64;
    let height: Int64;
    public init(width: Int64, height: Int64) do
        this.width = width;
        this.height = height;
    end
    public func area() do
        width * height;
    end
\class
In the example above, a class type named Rectangle is defined. It has two member variables, width and height, of type Int64, a constructor with two Int64 type parameters, and a member function area (which returns the product of width and height).
A class modified with abstract is an abstract class. Unlike a regular class, in an abstract class, you can define regular functions as well as declare abstract functions (without a function body). The open modifier is optional when defining an abstract class, and you can also use the sealed modifier to declare an abstract class, indicating that it can only be inherited within the same package.
Create an Object
After defining a class type, you can create an object by calling its constructor (using the class type name to call the constructor). Once an object is created, you can access its instance member variables and instance member functions (that are declared public) through the object. If you want to modify the values of member variables through the object (which is not recommended; it's better to modify them through member functions), you need to define the member variables in the class as mutable (i.e., defined with var). Unlike structs, when objects are assigned or passed as parameters, the object is not copied. Multiple variables point to the same object, so modifying a member variable through one variable will also change the corresponding member variable in the other variables.
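The difference between reference and value behavior can be sketched outside Gemini. The following is a Python analogy, not Gemini code: Python objects always behave like Gemini class instances, so the struct-like case is emulated with an explicit copy. The Rectangle type here is a hypothetical stand-in.

```python
import copy
from dataclasses import dataclass


@dataclass
class Rectangle:
    width: int
    height: int


# Class-like (reference) behavior: both names refer to one object.
a = Rectangle(3, 5)
b = a
b.width = 10
print(a.width)  # 10, because a and b are the same object

# Struct-like (value) behavior, emulated here with an explicit copy.
c = Rectangle(3, 5)
d = copy.copy(c)
d.width = 10
print(c.width)  # 3, because d is an independent copy
```

In Gemini the copy in the second case would happen implicitly on assignment of a struct value; the explicit copy.copy call is only how this sketch imitates that.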
A complex example
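For example, the following program defines an abstract class Shape and two subclasses, Roundity and Rectangle, each of which overrides area: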
include Std.Math.*
abstract class Shape
    public init() do
        # Not implemented
    end
    public func area() do
        # Not implemented
    end
\class
class Roundity <: Shape
    let radius: Float64;
    public override init(radius: Float64) do
        this.radius = radius;
    end
    public override func area() do
        # Area of a circle: pi * radius squared.
        Math.pi * (radius ** 2);
    end
\class
class Rectangle <: Shape
    let width: Int64;
    let height: Int64;
    public override init(width: Int64, height: Int64) do
        this.width = width;
        this.height = height;
    end
    public override func area() do
        width * height;
    end
\class
let circle = Roundity(3.0);
let rect = Rectangle(3, 5);
println circle.area();
println rect.area();
Interface
An interface is used to define an abstract type. It does not contain data but can define the behavior of a type. A type that declares that it implements an interface, and that provides implementations for all of the interface's members, is said to implement that interface.
Members of an interface can include:
- Member functions
- Operator overload functions
- Member properties
These members are all abstract, requiring the implementing type to have corresponding member implementations.
A simple interface is defined as follows:
interface I # 'open' modifier is optional.
func f(): Unit
\interface
Interfaces are declared using the keyword 'interface', followed by the interface identifier (here I), the interface members, and a closing '\interface'. Because interfaces and their members are open by default, the 'open' modifier is optional on both.
Once an interface I declares a member function f, any type that implements I must provide a corresponding f function.
A type declares that it implements an interface with <:. For example, class Foo <: I declares that Foo implements the I interface.
Foo must then include implementations for all members declared by I, meaning it needs to define a function f of the same type; otherwise, a compilation error will occur because the interface is not fully implemented. In the code below, the classes Roundity and Rectangle both declare that they implement the Shape interface:
include Std.Math.*
interface Shape
    public init() do
        # Not implemented
    end
    public func area() do
        # Not implemented
    end
\interface
class Roundity <: Shape
    let radius: Float64;
    public override init(radius: Float64) do
        this.radius = radius;
    end
    public override func area() do
        # Area of a circle: pi * radius squared.
        Math.pi * (radius ** 2);
    end
\class
class Rectangle <: Shape
    let width: Int64;
    let height: Int64;
    public override init(width: Int64, height: Int64) do
        this.width = width;
        this.height = height;
    end
    public override func area() do
        width * height;
    end
\class
let circle = Roundity(3.0);
let rect = Rectangle(3, 5);
println circle.area();
println rect.area();
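The rule that an implementing type must provide every interface member can be illustrated with a rough Python analogy built on abstract base classes. This is not Gemini code, and the names Circle and Incomplete are hypothetical. One important difference: Python reports the missing member only when the type is instantiated at run time, whereas the text above describes a compilation error in Gemini.

```python
import math
from abc import ABC, abstractmethod


class Shape(ABC):
    """Plays the role of an interface: behavior only, no data."""

    @abstractmethod
    def area(self) -> float:
        ...


class Circle(Shape):
    def __init__(self, radius: float) -> None:
        self.radius = radius

    def area(self) -> float:
        return math.pi * self.radius ** 2


class Incomplete(Shape):
    """Declares that it is a Shape but does not provide area()."""


print(round(Circle(3.0).area(), 2))  # 28.27

try:
    Incomplete()
    rejected = False
except TypeError:
    # Python refuses to instantiate a type with unimplemented abstract members.
    rejected = True
print(rejected)  # True
```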
An interface can also use the sealed modifier to indicate that it can only be inherited, implemented, or extended within the package where the interface is defined. Sealed already implies the semantics of public/open, so if you provide public/open modifiers when defining a sealed interface, the compiler will issue a warning. Subinterfaces that inherit a sealed interface or abstract classes that implement a sealed interface can still be marked as sealed or not use the sealed modifier. If a subinterface of a sealed interface is marked as public and is not sealed, then its subinterfaces can be inherited, implemented, or extended outside the package. Types that inherit or implement a sealed interface do not need to be marked as public.
In this programming language, we have a built-in interface called the Everything type. All interfaces by default inherit from Everything, and all non-interface types implement the functionality of Everything. Therefore, all types are subtypes of Everything, as shown below:
func main(argC: Int64, argV: list) do
    var any: Everything = 114514;
    any = 1919810.0;
    any = "Programming is fun";
    any = ["明月几时有", "把酒问青天", "不知天上宫阙", "今夕是何年"];
end
Type casting
Gemini requires explicit conversion when performing type casting.
I'll create a separate page for the type casting table.
Generics
In Gemini, generics refer to parameterized types, which are types unknown at the time of declaration and need to be specified when used. Both type declarations and function declarations can be generic. The most common examples are container types like Array<T> and List.
In Gemini, the declarations of function, class, interface, struct, and enum can all declare type parameters, which means they can all be generic.
For the convenience of discussion, the following commonly used terms are defined:
- Type parameter: A type or function declaration may have one or more types that need to be specified at the use site, and these types are called type parameters. When declaring a type parameter, an identifier must be provided so that it can be referenced within the declaration body.
- Type variable: After declaring type parameters, the identifiers used to refer to these types are called type variables.
- Type argument: When specifying generic parameters while using a generic type or function, these parameters are called type arguments.
- Type constructor: A type that requires zero, one, or more types as arguments is called a type constructor.
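These terms can be pointed out in a small generic container. The sketch below uses Python's typing module as an analogy and is not Gemini syntax; the Stack class is a hypothetical example and is not part of Gemini's library.

```python
from typing import Generic, List, TypeVar

# T is the type variable that names the type parameter of Stack.
T = TypeVar("T")


class Stack(Generic[T]):
    """Stack is a type constructor: it needs one type to form a complete type."""

    def __init__(self) -> None:
        self._items: List[T] = []

    def push(self, item: T) -> None:
        self._items.append(item)

    def pop(self) -> T:
        return self._items.pop()


# int is the type argument supplied at the use site.
numbers = Stack[int]()
numbers.push(1)
numbers.push(2)
print(numbers.pop())  # 2
```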
I'll document more about generics on separate pages.
Packages
As a project grows, managing all of its source code in a single large file becomes very difficult. The source code can instead be grouped by functionality, with the code for different functions managed separately. Each independently managed group of code generates an output file. Other code uses a group by importing its output file, and more complex features are built by combining different groups, which makes project management more efficient. Each of these groups is called a package, and a collection of packages is called a module.
A package is the smallest unit of compilation. Each package can independently produce outputs such as AST files, static library files, and dynamic library files. Each package has its own namespace, and it is not allowed to have top-level definitions or declarations with the same name within the same package (function overloading excepted). A package can contain several source files.
A module is a collection of several packages and is the smallest unit released by third-party developers. A module's program entry can only be located in its root directory, and its top level can have at most one main as the program entry. This main has no parameters, or its parameters are of type Int64 or Array<String>, and its return type is either an integer type or Unit type.
Definition of Package
A package consists of one or more source files. Source files in the same package must be in the same directory, and source files in the same directory can only belong to the same package. A package can define sub-packages, forming a tree structure. The directory of a sub-package is a subdirectory of its parent package's directory. A package without a parent is called a root package, and the entire tree formed by the root package and its sub-packages (including sub-packages of sub-packages) is called a module.
demo
├src
│├main.gmn
│└pkg0
│ ├pkg0.gmn
│ ├bar
│ │└bar.gmn
│ ├baz
│ │└baz.gmn
│ └foo
│  └foo.gmn
└GmnPrg.toml
GmnPrg.toml is the configuration file for the current module's workspace, used to define basic information, dependencies, compilation options, and other content. This file is parsed and executed by Gemini's official package management tool, GmnPrg.
Note: For the same module, if you need to configure a valid package for it, the directory containing the package must directly include at least one gmn file, and all of its upstream directories must also be valid packages.
Declare a package
A package declaration begins with the keyword package, followed by the names of all the packages in the path from the root package to the current package, separated by dots. Package names must be valid regular identifiers (excluding raw identifiers). Note: In the current Windows platform version, package names do not yet support the use of Unicode characters; package names must be valid regular identifiers containing only ASCII characters.
The package declaration must be on the first non-empty, non-comment line of the source file, and the package declarations in different source files within the same package must be consistent.
The package name should reflect the path of the current source file relative to the project's source root directory src, with path separators replaced by dots. For example, if the package's source code is located under src/directory_0/directory_1 and the root package name is pkg, then the package declaration in its source code should be package pkg.directory_0.directory_1.
It is important to note that:
- The folder name where the package resides must match the package name.
- The default name of the source root directory is src.
- Packages under the source root directory can have no package declaration; in this case, the compiler will default to assigning them the package name default.
Package declarations cannot cause naming conflicts: a subpackage cannot have the same name as a top-level declaration in the current package.
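The rule that maps a source directory to its package declaration can be sketched as a small helper. This is a Python illustration of the naming rule only, not part of Gemini or its tooling; the function name package_declaration and its parameters are hypothetical.

```python
from pathlib import PurePosixPath


def package_declaration(root_package: str, source_dir: str,
                        source_root: str = "src") -> str:
    """Build the expected package declaration for a source directory.

    The directory path relative to the source root is appended to the
    root package name, with path separators replaced by dots.
    """
    relative = PurePosixPath(source_dir).relative_to(source_root)
    parts = [root_package, *relative.parts]
    return "package " + ".".join(parts)


print(package_declaration("pkg", "src/directory_0/directory_1"))
# package pkg.directory_0.directory_1
print(package_declaration("pkg", "src"))
# package pkg
```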
Visibility of Top-Level Declarations
In Gemini, access modifiers can be used to control the visibility of top-level declarations such as types, variables, and functions. Gemini has four access modifiers: private, internal, protected, and public. When applied to top-level elements, their semantics are as follows:
- private means visible only within the current file. Members of this type cannot be accessed from other files.
- internal means visible only within the current package and its sub-packages (including sub-sub-packages). Members of this type can be accessed within the same package without importing, and can be accessed from sub-packages (including sub-sub-packages) via import.
- protected means visible only within the current module. Files within the same package can access these members without importing, while other packages within the same module can access these members via import. Packages in different modules cannot access these members.
- public means visible both inside and outside the module. Files within the same package can access these members without importing, while other packages can access these members via import.
| Modifier | Current file | Current package and sub-packages | Current module | All packages |
|---|---|---|---|---|
| private | √ | × | × | × |
| internal | √ | √ | × | × |
| protected | √ | √ | √ | × |
| public | √ | √ | √ | √ |
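The table can be read as a decision procedure. The following Python sketch models it under simplifying assumptions: a declaration site and an access site are each described by a module name, a full dotted package name, and a file name. The Location type and the is_visible function are hypothetical helpers for illustration and are not Gemini APIs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Location:
    module: str   # module (root package) name
    package: str  # full dotted package name, e.g. "demo.pkg0.foo"
    file: str     # source file name


def is_visible(modifier: str, declared: Location, accessor: Location) -> bool:
    """Return True if a top-level declaration is visible from the accessor."""
    same_module = declared.module == accessor.module
    same_package = same_module and declared.package == accessor.package
    in_subpackage = same_module and accessor.package.startswith(declared.package + ".")
    if modifier == "private":
        return same_package and declared.file == accessor.file
    if modifier == "internal":
        return same_package or in_subpackage
    if modifier == "protected":
        return same_module
    if modifier == "public":
        return True
    raise ValueError(f"unknown modifier: {modifier}")


decl = Location("demo", "demo.pkg0", "pkg0.gmn")       # where the item is declared
sub = Location("demo", "demo.pkg0.foo", "foo.gmn")     # a sub-package
root = Location("demo", "demo", "main.gmn")            # same module, other package
other = Location("other", "other", "main.gmn")         # a different module

for modifier in ("private", "internal", "protected", "public"):
    print(modifier, [is_visible(modifier, decl, place)
                     for place in (decl, sub, root, other)])
# private [True, False, False, False]
# internal [True, True, False, False]
# protected [True, True, True, False]
# public [True, True, True, True]
```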
Include a package
In Gemini, you can import a top-level declaration or definition from another package using the syntax include fullPackageName.itemName, where fullPackageName is the complete path of the package and itemName is the name of the declaration. The include statement must appear in the source file after the package declaration and before other declarations or definitions.
If multiple itemNames to be imported belong to the same fullPackageName, you can use the syntax include fullPackageName.{itemName[, itemName]*}.
In addition to importing a specific top-level declaration or definition with the include fullPackageName.itemName syntax, you can also import all visible top-level declarations or definitions in a package using the include packageName.* syntax, as in include Std.Math.*.
Exception Handling
Exception? What is it?
Exceptions are a special type of error that programmers can catch and handle. They refer to abnormal behaviors that occur during program execution, such as array out-of-bounds access, division by zero, calculation overflow, and illegal input. To ensure system correctness and robustness, many software systems include a large amount of code for error detection and handling.
Exceptions do not belong to the normal functionality of a program. Once an exception occurs, the program must handle it immediately, transferring control from the normal execution flow to the part that handles the exception. Gemini provides an exception handling mechanism to deal with various exceptions that may occur during program execution.
In Gemini, exception classes include Error and Exception:
- The Error class describes internal system errors and resource exhaustion errors (such as stack overflow or out-of-memory errors) that occur during Gemini runtime. Applications should not throw this type of error. If an internal error occurs, the user can only be notified, and the program should be safely terminated as much as possible.
- The Exception class describes exceptions caused by logical errors or I/O errors during program execution, such as array out-of-bounds access or trying to open a non-existent file. These exceptions need to be caught and handled within the program.
Developers cannot create custom exceptions by inheriting from the built-in Error or its subclasses in Gemini, but they can create custom exceptions by inheriting from the built-in Exception or its subclasses.
Throw and Catch an Exception
The above introduction explained how to customize exceptions. Next, we will learn how to throw and handle exceptions.
- Since exceptions are class types, you can create an exception simply by constructing it like a class object. For example, the expression FatherException() creates an exception of type FatherException.
- Gemini provides the throw keyword to throw exceptions. When using throw, the expression following it must be a subtype of Exception (Error, although also an exception, cannot be manually thrown). For example, throw ArithmeticException("I am an Exception!") will throw an arithmetic exception when executed.
- Exceptions thrown using the throw keyword need to be caught and handled. If an exception is not caught, the system will invoke the default exception handling function.
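The points above (custom exceptions are classes, they are constructed like any other object, and they are thrown and then caught) can be sketched with Python's exception mechanism as an analogy. This is not Gemini syntax: Python spells throw as raise and catch as except. FatherException is the name used in the text, while ChildException and risky are hypothetical.

```python
class FatherException(Exception):
    """A custom exception, created by inheriting from the built-in Exception."""


class ChildException(FatherException):
    """A more specific exception that inherits from FatherException."""


def risky() -> None:
    # Constructing the exception is an ordinary object construction;
    # raise (Gemini: throw) then transfers control to a handler.
    raise ChildException("I am an Exception!")


try:
    risky()
    caught = None
except FatherException as error:
    # A handler for the parent type also matches its subtypes.
    caught = (type(error).__name__, str(error))

print(caught)  # ('ChildException', 'I am an Exception!')
```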
Exception handling is done using the try expression and can be divided into:
- Regular try expressions, which do not involve automatic resource management.
- Try-with-resources expressions, which perform automatic resource management.
Regular Try Expression
A regular try expression consists of up to four parts: the try block, the catch blocks, the else block, and the finally block.
- Try Block: Starts with the keyword try, followed by a block composed of expressions and declarations (enclosed in do ... end, which defines a new local scope and can contain any expressions and declarations; hereinafter referred to as a "block"). The block following try can throw exceptions, which can be caught and handled by the subsequent catch blocks. If there is no catch block or the exception is not caught, it continues to be thrown after the finally block executes.
- Catch Block: A regular try expression can include zero or more catch blocks (if there are no catch blocks, a finally block must be present). Each catch block begins with the keyword catch, followed by a catchPattern and a block. The catchPattern matches the exception to be caught using pattern matching. Once a match is found, the following block handles it, and any subsequent catch blocks are ignored. If the exception types catchable by a catch block can all be caught by a preceding catch block, a "catch block is unreachable" warning is issued.
- Else Block: A regular try expression can contain at most one else block. If the try block throws no exception, so that no catch block has anything to match, the program is considered to have run "successfully," and the else block executes.
- Finally Block: Begins with the keyword finally, followed by a block. In principle, the finally block is mainly used to perform cleanup tasks, such as releasing resources, and should avoid throwing new exceptions. The contents of the finally block are executed regardless of whether an exception occurs (i.e., whether the try block throws an exception). If the exception is not handled, it continues to propagate outward after the finally block executes. A try expression may omit the finally block if it includes a catch block; otherwise, a finally block is required.
The scopes of the try block and of each catch block are independent of one another.
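The order in which these blocks run can be traced with Python's try statement, which happens to have the same four parts (Python spells catch as except). This is an analogy for the control flow only, not Gemini syntax, and the function run is hypothetical.

```python
def run(divisor: int):
    """Record which blocks execute for a given divisor."""
    trace = []
    try:
        trace.append("try")
        result = 10 // divisor
    except ZeroDivisionError:
        # Runs only when the try block raised a matching exception.
        trace.append("catch")
        result = None
    else:
        # Runs only when the try block raised nothing.
        trace.append("else")
    finally:
        # Runs in both cases.
        trace.append("finally")
    return result, trace


print(run(2))  # (5, ['try', 'else', 'finally'])
print(run(0))  # (None, ['try', 'catch', 'finally'])
```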
Try with Resources
The try-with-resources expression is primarily designed for automatically releasing non-memory resources. Unlike a regular try expression, in a try-with-resources expression, both the catch and finally blocks are optional, and between the try keyword and the block, one or more ResourceSpecifications can be inserted to acquire a series of resources (the ResourceSpecifications do not affect the type of the entire try expression). The resources referred to here correspond to objects at the language level, so a ResourceSpecification essentially involves instantiating a series of objects (multiple instantiations are separated by commas).
The names introduced between the try keyword and the do ... end block have the same scope level as the variables introduced within a code block; redeclaring the same name within do ... end will trigger a redefinition error. The types of ResourceSpecifications in a try-with-resources expression must implement the Resource interface.
It should be noted that there is generally no need to include catch or finally blocks in a try-with-resources expression, and developers are not advised to manually release resources (as this would be redundant). However, if it is necessary to explicitly catch and handle exceptions that may be thrown during the try block or the acquisition and release of resources, catch and finally blocks can still be included in a try-with-resources expression.
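A rough Python analogy for try-with-resources is the with statement, in which the objects acquired at the top are released automatically even when the body throws. This sketch is not Gemini syntax. The Resource class below is a hypothetical stand-in for a type implementing Gemini's Resource interface, and the reverse release order shown is Python's behavior, which the text above does not specify for Gemini.

```python
class Resource:
    """A stand-in resource that logs when it is acquired and released."""

    def __init__(self, name: str, log: list) -> None:
        self.name = name
        self.log = log

    def __enter__(self) -> "Resource":
        self.log.append(f"open {self.name}")
        return self

    def __exit__(self, exc_type, exc, traceback) -> bool:
        self.log.append(f"close {self.name}")
        return False  # do not swallow the exception


log = []
try:
    # Several resources are acquired in one statement, separated by commas.
    with Resource("a", log) as a, Resource("b", log) as b:
        log.append("use")
        raise ValueError("boom")
except ValueError:
    # An optional handler can still observe the exception afterwards.
    log.append("caught")

print(log)
# ['open a', 'open b', 'use', 'close b', 'close a', 'caught']
```

No manual release call appears anywhere in the body, which is the point of the construct.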
Some common Exceptions
| Number | Exception | Description |
|---|---|---|
| 0 | SuccessfullyRun | No abnormalities occurred. |
| 1 | IllegalArgumentException | The parameter passed is invalid or incorrect. |
| 2 | ArithmeticException | Some mathematical errors, usually caused by division by zero. |
| 3 | NegativeArraySizeException | An array was defined with a size below 0. |
| 4 | NoneValueException | Value not found. |
| 5 | OverflowException | Value overflow. |
| -1 | ManualExit | The user manually exited the program. |
| 6 | ConcurrentModificationException | An exception caused by concurrent modifications. |
| 3221225473 | StackError | Stack overflow or stack underflow. |
| 3221225474 | OutOfMemory | Not enough memory to run the program. |
| 3221225475 | InternalSystemError | An internal system error occurred. |
| 3221225476 | CompilationError | The compilation failed, usually because there is something in the program that Gemini can't understand. |
| Works | In | Progress |
More about Gemini
Coming Soon.
Examples
I have been working on this page for a very long time. I'll add examples later.