REXS
REXS is an esoteric programming language created by User:Uellenberg, designed to simplify reading and writing regular expressions (especially complex ones).
Description
Every action in REXS is a function call. There are two types of functions: bodied and non-bodied. A function call consists of the function name (which is always lowercase), an opening parenthesis, optional parameters, and a closing parenthesis, for example NAME(PARAMETERS). Bodied functions are followed by braces (NAME(PARAMETERS) {}), and non-bodied functions are followed by a semicolon (NAME(PARAMETERS);).
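Since the two call forms differ only in what follows the closing parenthesis, a statement can be classified by its terminator. The following is a minimal Python sketch of that idea, not the actual REXS parser; the pattern CALL_HEADER and the helper call_kind are hypothetical, and the sketch assumes function names consist only of lowercase ASCII letters.

```python
import re

# Hypothetical pattern for the start of one REXS statement: a lowercase name,
# a parenthesized parameter list, then "{" (bodied) or ";" (non-bodied).
# The lazy (.*?) stops at the first ")" that is followed by "{" or ";", so a
# string parameter that itself contains ") {" or ");" would confuse it.
CALL_HEADER = re.compile(r"([a-z]+)\((.*?)\)\s*([{;])")


def call_kind(statement):
    """Return (name, "bodied" or "non-bodied") for a single REXS statement."""
    m = CALL_HEADER.match(statement.strip())
    if m is None:
        raise ValueError(f"not a REXS function call: {statement!r}")
    name, _params, terminator = m.groups()
    return name, "bodied" if terminator == "{" else "non-bodied"


print(call_kind('match("t");'))      # ('match', 'non-bodied')
print(call_kind("repeat(1, 3) {}"))  # ('repeat', 'bodied')
```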
Specifications
Parameters
The parameters a function accepts depend on the specific function. Multiple parameters are separated by commas (param1, param2). In general, parameters fall into three types, although a given function may not follow these categories exactly:
Control
Control parameters are function-specific keywords that select a particular behavior. For example, assert(START); takes the control parameter START. In general, control parameters consist only of uppercase characters and can be thought of as enum values.
String
String parameters are surrounded by double quotes. For example, match("t"); takes the string parameter "t". When the function uses the parameter, the surrounding double quotes are stripped. A string may contain any character except a double quote. To use a double quote, a function will likely offer a control parameter instead (such as QUOTE).
Integer
Integer parameters are whole numbers. For example, repeat(10) {} takes the integer parameter 10.
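Because the three parameter types differ in their surface form (quoted text, digits, or a bare keyword), they can usually be told apart without knowing the function. A minimal Python sketch under that assumption; the helpers classify_param, string_value, and split_params are hypothetical and ignore any function-specific exceptions.

```python
import re


def classify_param(token):
    """Classify one parameter as "string", "integer", or "control"."""
    token = token.strip()
    if len(token) >= 2 and token[0] == '"' and token[-1] == '"':
        return "string"
    if re.fullmatch(r"[0-9]+", token):
        return "integer"
    # Anything else is treated as a control keyword (START, inf, nongreedy, ...).
    return "control"


def string_value(token):
    """Strip the surrounding double quotes from a string parameter."""
    return token.strip()[1:-1]


def split_params(param_list):
    """Split a list such as '0, inf, nongreedy' on commas (naive: no commas in strings)."""
    return [p.strip() for p in param_list.split(",") if p.strip()]


params = split_params('"t", 10, START')
print([classify_param(p) for p in params])  # ['string', 'integer', 'control']
print(string_value(params[0]))              # t
```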
Functions
This section describes the two classes of functions introduced in the Description in more detail.
Bodied Functions
Bodied functions perform an action on their body (everything between their braces), based on their parameters. The body can contain both bodied and non-bodied functions, although some functions restrict which functions may appear inside their body.
Non-bodied Functions
Non-bodied functions perform an action based on their parameters.
Functions
Match
Match (non-bodied) is one of the most essential functions in REXS. It represents a match of a literal string in a regular expression.
Type | Allowed Values | Descriptions | Required |
---|---|---|---|
String or Control | Strings or Match controls | The value being matched | Yes |
Integer | Integers | An input value for specific controls | No |
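Because match treats its string literally, characters that are special in regular expressions must be escaped in the compiled output (the IRC example below turns " PRIVMSG #" into " PRIVMSG \#"). A minimal Python sketch of the idea, using Python's re.escape as a stand-in; the exact escaping the REXS compiler performs may differ.

```python
import re

# Stand-in for match("a.b"): escaping makes "." match only a literal dot.
literal = re.compile(re.escape("a.b"))
print(bool(literal.fullmatch("a.b")))  # True
print(bool(literal.fullmatch("axb")))  # False

# Without escaping, "." would match any character.
print(bool(re.fullmatch("a.b", "axb")))  # True
```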
Assert
Assert (non-bodied) is another important function in REXS. It is similar to match, except that whatever it asserts is not included in the match result.
Type | Allowed Values | Descriptions | Required |
---|---|---|---|
Control | Assert controls | The value being asserted | Yes |
Flag
Flag (non-bodied) sets a flag in the regular expression.
Type | Allowed Values | Descriptions | Required |
---|---|---|---|
Control | i / g / m / s / u / y | The flag being set | Yes |
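All six flag letters are also JavaScript RegExp flags. As a rough illustration, assuming the REXS letters carry their JavaScript meanings, four of them correspond to Python re flags. Python has no g (global) or y (sticky) flag; the closest equivalents are methods such as finditer and Pattern.match with a pos argument. The table PY_FLAGS and the helper compile_with_flags are hypothetical.

```python
import re

# Hypothetical mapping from REXS/JavaScript flag letters to Python re flags.
# "g" and "y" have no flag equivalent in Python and are left out.
PY_FLAGS = {
    "i": re.IGNORECASE,  # case-insensitive matching
    "m": re.MULTILINE,   # ^ and $ also match at line boundaries
    "s": re.DOTALL,      # . also matches newlines
    "u": re.UNICODE,     # Unicode matching (already the default for str patterns)
}


def compile_with_flags(pattern, letters):
    flags = 0
    for letter in letters:
        flags |= PY_FLAGS[letter]
    return re.compile(pattern, flags)


print(bool(compile_with_flags("^abc$", "").search("x\nABC")))    # False
print(bool(compile_with_flags("^abc$", "im").search("x\nABC")))  # True
```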
Group
Group (bodied) defines a capturing group around everything inside its body.
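In the IRC example below, group() compiles to an ordinary parenthesized capturing group. Assuming that holds generally, and that DIGIT compiles to \d, the effect can be seen in Python's re: the whole match and the captured text are available separately.

```python
import re

# Roughly: match("id="); group() { match(DIGIT); match(DIGIT); }
# assuming group -> (...) and DIGIT -> \d.
pattern = re.compile(r"id=(\d\d)")

m = pattern.search("user id=42;")
print(m.group(0))  # id=42  (the whole match)
print(m.group(1))  # 42     (captured by the group)
```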
Backref
Backref (non-bodied) matches the text captured by a capturing group, identified by its index.
Type | Allowed Values | Descriptions | Required |
---|---|---|---|
Integer | Integers | The index of the capturing group | Yes |
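In standard regex syntax a back-reference to the first group is written \1. A sketch in Python's re, assuming backref(N) compiles to \N: the pattern below finds a character immediately followed by the same character.

```python
import re

# Roughly: group() { match(ANY); } backref(1);
# i.e. any character immediately followed by the same character again.
doubled = re.compile(r"(.)\1")

print(doubled.search("bookkeeper").group(0))  # oo
print(doubled.search("abc"))                  # None
```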
Repeat
Repeat (bodied) repeats its body a specified number of times.
Type | Allowed Values | Descriptions | Required |
---|---|---|---|
Integer | Integers | The minimum number of repetitions required to match | Yes |
Integer or Control | Integers or inf / infinity / forever | The maximum number of repetitions that can be matched | Yes |
Control | nongreedy | If specified, makes the repeat non-greedy, so it repeats as few times as possible | No |
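The IRC example shows repeat(0, inf) becoming * and repeat(0, inf, nongreedy) becoming *?. For other bounds, a natural translation uses the {min,max} quantifier. The helpers quantifier and repeat_pattern below are a hypothetical sketch of that mapping, not the REXS compiler's code; a body longer than one character is wrapped in a non-capturing group so the quantifier applies to all of it.

```python
import re

INFINITE = {"inf", "infinity", "forever"}


def quantifier(minimum, maximum, nongreedy=False):
    """Translate repeat(min, max[, nongreedy]) bounds into a quantifier."""
    unbounded = maximum in INFINITE
    if unbounded and minimum == 0:
        q = "*"
    elif unbounded and minimum == 1:
        q = "+"
    elif unbounded:
        q = "{%d,}" % minimum
    elif minimum == maximum:
        q = "{%d}" % minimum
    else:
        q = "{%d,%d}" % (minimum, maximum)
    return q + ("?" if nongreedy else "")


def repeat_pattern(body, minimum, maximum, nongreedy=False):
    # Wrap multi-character bodies so the quantifier covers the whole body.
    atom = body if len(body) == 1 else "(?:" + body + ")"
    return atom + quantifier(minimum, maximum, nongreedy)


print(quantifier(0, "inf", nongreedy=True))  # *?
print(repeat_pattern("ab", 2, 4))            # (?:ab){2,4}
print(re.match(repeat_pattern("a", 1, 3), "aaaa").group(0))                  # aaa
print(re.match(repeat_pattern("a", 1, 3, nongreedy=True), "aaaa").group(0))  # a
```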
Set
Set (bodied) creates a character set containing its body. Only the match and to functions are allowed inside its body.
To
To (non-bodied) is used inside set and represents the range character (-) in a regex character set.
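Assuming set compiles to a bracketed character class and to() compiles to "-", a set containing a range and a single character behaves like the Python class below. The REXS snippet in the comment is illustrative only.

```python
import re

# Roughly: set() { match("a"); to(); match("z"); match("_"); }
# i.e. one character that is a lowercase letter or an underscore.
ident_char = re.compile(r"[a-z_]")

print(ident_char.findall("a_B9z"))  # ['a', '_', 'z']
```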
Or
Or (bodied) creates an alternation: it matches one of the options in its body. Only orpart functions are allowed inside its body.
OrPart
OrPart (bodied) is used inside or and defines one option that the or can match.
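Each orpart becomes one alternative. The sketch below assumes or wraps its alternatives in a non-capturing group (?:...|...); the actual compiler may group them differently. Anchors are added so that only whole words match.

```python
import re

# Roughly: assert(START);
#          or() { orpart() { match("cat"); } orpart() { match("dog"); } }
#          assert(END);
pet = re.compile(r"^(?:cat|dog)$")

for word in ["cat", "dog", "cow", "catdog"]:
    print(word, bool(pet.match(word)))
# cat True
# dog True
# cow False
# catdog False
```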
Before
Before (bodied) is similar to assert in that it checks for a match without including it in the result. It asserts that its body matches (or does not match) before the main expression.
Type | Allowed Values | Descriptions | Required |
---|---|---|---|
Control | not | If used, asserts that the body does not match instead of asserting that it matches | No |
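In standard regex terms, before behaves like a lookbehind: (?<=...) or, with not, (?<!...). A sketch in Python's re, assuming that mapping. Note that Python only accepts fixed-width lookbehind bodies, so a before body containing an unbounded repeat would not port directly to Python.

```python
import re

text = "$5 #7 $9"

# before() { match("$"); } match(DIGIT);  ->  a digit preceded by "$"
print(re.findall(r"(?<=\$)\d", text))  # ['5', '9']

# before(not) { match("$"); } match(DIGIT);  ->  a digit NOT preceded by "$"
print(re.findall(r"(?<!\$)\d", text))  # ['7']
```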
After
After (bodied) is similar to assert in that it checks for a match without including it in the result. It asserts that its body matches (or does not match) after the main expression.
Type | Allowed Values | Descriptions | Required |
---|---|---|---|
Control | not | If used, asserts that the body does not match instead of asserting that it matches | No |
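Correspondingly, after behaves like a lookahead: (?=...) or, with not, (?!...). A sketch in Python's re under that assumption; the text after the digit is checked but not consumed.

```python
import re

text = "5kg 7lb 9kg"

# match(DIGIT); after() { match("kg"); }  ->  a digit followed by "kg"
print(re.findall(r"\d(?=kg)", text))  # ['5', '9']

# match(DIGIT); after(not) { match("kg"); }  ->  a digit NOT followed by "kg"
print(re.findall(r"\d(?!kg)", text))  # ['7']
```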
Control Sets
Below is a list of the possible control values for specific functions:
Match Controls
- ANY - Matches any character.
- DIGIT - Matches any digit (0-9).
- NON_DIGIT - Matches any non-digit (not 0-9).
- ALPHANUM - Matches any alphanumeric character (0-9, a-z, or A-Z).
- NON_ALPHANUM - Matches any non-alphanumeric character (not 0-9, a-z, or A-Z).
- SPACE - Matches any space character.
- NON_SPACE - Matches any non-space character.
- HTAB - Matches a horizontal tab (\t).
- VTAB - Matches a vertical tab (\v).
- RETURN - Matches a carriage return (\r).
- LINEFEED - Matches a line feed (\n).
- FORMFEED - Matches a form feed (\f).
- BACKSPACE - Matches a backspace (\b).
- NULL - Matches a null character (\0).
- QUOTE - Matches a double quote (").
- *CONTROL - Matches a specific control character. The second parameter of match must be a letter from A to Z specifying which control character to match.
- *HEX - Matches a specific character by its hex code. The second parameter of match must be a hex string of length 2 or 4 giving the code of the character to match.
* = requires the second parameter of match.
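Most of these controls have familiar regex escapes. The table below records the escapes they presumably compile to; only ANY -> "." is confirmed by the IRC example in this article, so the rest are assumptions. The helper hex_escape is a hypothetical sketch of the HEX control using the \xhh and \uhhhh escapes, which Python's re also accepts. CONTROL is omitted because Python's re has no \cX escape.

```python
import re

# Presumed regex escapes for some match controls (assumptions except ANY).
# Note: in Python, \d also matches non-ASCII digits unless re.ASCII is set.
CONTROL_ESCAPES = {
    "ANY": ".",
    "DIGIT": r"\d",
    "NON_DIGIT": r"\D",
    "SPACE": r"\s",
    "NON_SPACE": r"\S",
    "HTAB": r"\t",
    "LINEFEED": r"\n",
    "NULL": r"\0",
    "QUOTE": '"',
}


def hex_escape(code):
    # Sketch of the HEX control: 2 hex digits become \xhh, 4 become \uhhhh.
    int(code, 16)  # raises ValueError if code is not hexadecimal
    if len(code) == 2:
        return "\\x" + code
    if len(code) == 4:
        return "\\u" + code
    raise ValueError("HEX expects a hex string of length 2 or 4")


print(re.findall(CONTROL_ESCAPES["DIGIT"], "a1 b2"))  # ['1', '2']
print(bool(re.fullmatch(hex_escape("41"), "A")))      # True
print(bool(re.fullmatch(hex_escape("0041"), "A")))    # True
```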
Assert Controls
- START - Asserts the start of the string being matched on or a line (depending on the flags).
- END - Asserts the end of the string being matched on or a line (depending on the flags).
- WORD_BOUNDARY - Asserts a word boundary (the position at the start or end of a word).
- NOT_WORD_BOUNDARY - Asserts a position that is not a word boundary.
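The IRC example confirms START -> ^ and END -> $; WORD_BOUNDARY and NOT_WORD_BOUNDARY presumably compile to \b and \B. The sketch below shows these zero-width assertions in Python's re, including how the multiline flag changes what ^ treats as a start.

```python
import re

text = "cat concat\ncat"

print(re.findall(r"\bcat\b", text))             # ['cat', 'cat']  (whole words only)
print(re.findall(r"\Bcat", text))               # ['cat']         (inside "concat")
print(re.findall(r"^cat", text))                # ['cat']         (start of string only)
print(re.findall(r"^cat", text, re.MULTILINE))  # ['cat', 'cat']  (start of each line)
```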
Examples
IRC
Let's take a look at a basic IRC message:
name!email PRIVMSG #channel :message
If we want to parse all of the details from this, we can build a regular expression in REXS like this:
assert(START);
group() { repeat(0, inf, nongreedy) { match(ANY); } } match("!");
group() { repeat(0, inf, nongreedy) { match(ANY); } } match(" PRIVMSG #");
group() { repeat(0, inf, nongreedy) { match(ANY); } } match(" :");
group() { repeat(0, inf) { match(ANY); } }
assert(END);
This code will then be compiled to:
/^(.*?)!(.*?) PRIVMSG \#(.*?) :(.*)$/
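To check what the compiled expression extracts, it can be run directly in Python's re, which accepts this pattern unchanged once the /.../ literal delimiters are dropped. The second message is a made-up input showing that the non-greedy third group stops at the first " :", so later colons remain part of the message.

```python
import re

# The compiled pattern from above; the /.../ regex-literal delimiters
# are not part of the pattern itself.
IRC = re.compile(r"^(.*?)!(.*?) PRIVMSG \#(.*?) :(.*)$")

m = IRC.match("name!email PRIVMSG #channel :message")
print(m.groups())  # ('name', 'email', 'channel', 'message')

m = IRC.match("nick!user@host PRIVMSG #rexs :hi: there")
print(m.groups())  # ('nick', 'user@host', 'rexs', 'hi: there')
```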