REXS

From Esolang

REXS is an esoteric programming language created by User:Uellenberg built to simplify reading and writing regular expressions (especially complex ones).

Description

Every action in REXS is expressed as a function, and there are two types of function: bodied and non-bodied. A function consists of a name (always lowercase), an opening parenthesis, optional parameters, and a closing parenthesis (for example, NAME(PARAMETERS)). A bodied function is followed by braces (NAME(PARAMETERS) {}), while a non-bodied function is followed by a semicolon (NAME(PARAMETERS);).
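
The two call shapes can be told apart by what follows the closing parenthesis. The Python sketch below uses a hypothetical helper, classify_call, which is not part of REXS or its compiler. It assumes one call per line and only inspects the call header.

```python
import re

# Hypothetical pattern: a lowercase name, a parenthesized parameter list,
# then either ";" (non-bodied) or "{" (bodied). One call per line is assumed.
CALL = re.compile(r"^\s*([a-z]+)\((.*)\)\s*([;{])")


def classify_call(line):
    """Return (function name, "bodied" or "non-bodied") for one REXS call."""
    m = CALL.match(line)
    if m is None:
        raise ValueError("not a REXS call: %r" % line)
    name, _params, terminator = m.groups()
    return name, ("bodied" if terminator == "{" else "non-bodied")


print(classify_call('match("!");'))                  # ('match', 'non-bodied')
print(classify_call("repeat(0, inf, nongreedy) {"))  # ('repeat', 'bodied')
```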

Specifications

Parameters

The parameters a function accepts depend on the specific function. Multiple parameters are separated by commas (param1, param2). In general, parameters fall into three types, although some functions may use parameters that do not fit neatly into these categories:

Control

Control parameters are function-specific keywords that select a particular behavior. For example, assert(START); has the parameter START, which is a control parameter. In general, control parameters consist only of uppercase characters and can be thought of as enums.

String

String parameters are, as their name suggests, strings, and are enclosed in double quotes. For example, match("t"); has the parameter "t", which is a string parameter. The enclosing double quotes are stripped before the function uses the value. A string may contain any character except a double quote. To use a double quote, a function will usually offer a control parameter instead, such as match(QUOTE);.

Integer

Integer parameters are integer numbers. For example, repeat(10) {} has the parameter 10, which is an integer parameter.
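
A rough way to sort parameters into these three types is sketched below in Python. The helpers split_params and classify_param are hypothetical. In particular, treating every unquoted, non-numeric token as a control is an assumption, and negative numbers are not handled.

```python
import re


def split_params(text):
    """Split a parameter list on commas that are not inside a string."""
    params, current, in_string = [], "", False
    for ch in text:
        if ch == '"':
            in_string = not in_string
        if ch == "," and not in_string:
            params.append(current.strip())
            current = ""
        else:
            current += ch
    if current.strip():
        params.append(current.strip())
    return params


def classify_param(token):
    """Return (type, value) for one parameter token (hypothetical rules)."""
    if len(token) >= 2 and token.startswith('"') and token.endswith('"'):
        return ("String", token[1:-1])  # the surrounding quotes are stripped
    if re.fullmatch(r"\d+", token):
        return ("Integer", int(token))
    return ("Control", token)           # e.g. START, inf, nongreedy


print([classify_param(p) for p in split_params('0, inf, nongreedy')])
# [('Integer', 0), ('Control', 'inf'), ('Control', 'nongreedy')]
```

Splitting only on commas outside quotes means a string such as "a, b" stays a single parameter.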

Functions

Here, the different classes of functions, discussed in the description, will be explained in greater detail.

Bodied Functions

Bodied functions are used to do some action on their body (which is everything between their braces), based on their parameters. Their body can consist of both bodied and non-bodied functions, although some functions have specific limits on which functions can appear inside their body.

Non-bodied Functions

Non-bodied functions are used to do some action based on their parameters.
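
A bodied function's body is itself a sequence of functions, so a compiler can translate it recursively: compile the body first, then wrap the result. The Python sketch below shows this idea on a tiny, hypothetical tree representation that supports only match, assert, and group. It does not escape literal strings and is not meant to reflect how the actual REXS compiler is structured.

```python
# Hypothetical tree form of a REXS program: a non-bodied call is a
# (name, params) pair and a bodied call is a (name, params, body) triple.
# String parameters keep their surrounding quotes; controls do not.


def compile_node(node):
    name, params = node[0], node[1]
    if len(node) == 3:  # bodied: compile the body, then wrap it
        inner = "".join(compile_node(child) for child in node[2])
        if name == "group":
            return "(" + inner + ")"
        raise ValueError("unsupported bodied function: " + name)
    if name == "match":
        value = params[0]
        # Strings are used as-is (no escaping); ANY is the only control here.
        return value[1:-1] if value.startswith('"') else {"ANY": "."}[value]
    if name == "assert":
        return {"START": "^", "END": "$"}[params[0]]
    raise ValueError("unsupported function: " + name)


program = [
    ("assert", ["START"]),
    ("group", [], [("match", ["ANY"]), ("match", ['"x"'])]),
    ("assert", ["END"]),
]
print("".join(compile_node(n) for n in program))  # ^(.x)$
```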

Functions

Match

Match (non-bodied) is one of the most essential functions in REXS. It represents a match of a literal string in a regular expression.

Parameters
Type | Allowed values | Description | Required
String or Control | Strings or match controls | The value being matched | Yes
Integer | Integers | An input value for specific controls | No
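
A string passed to match is a literal, so characters with a special meaning in regular expressions have to be escaped. The IRC example below shows match(" PRIVMSG #"); compiling to " PRIVMSG \#", but the full set of characters that REXS escapes is not documented here. The Python sketch uses an assumed escape set (regex metacharacters, the / delimiter, and #) and a hypothetical helper name.

```python
import re

# Assumed escape set: regex metacharacters, the "/" that delimits the
# compiled regex literal, and "#" (escaped in the IRC example's output).
SPECIAL = set(r"\^$.|?*+()[]{}/#")


def compile_match_string(value):
    """Hypothetical: match("value"); -> a regex fragment matching value literally."""
    return "".join("\\" + ch if ch in SPECIAL else ch for ch in value)


print(compile_match_string(" PRIVMSG #"))
literal_dot = compile_match_string("a.b")      # a\.b
print(bool(re.fullmatch(literal_dot, "a.b")))  # True
print(bool(re.fullmatch(literal_dot, "axb")))  # False
```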

Assert

Assert (non-bodied) is another important function in REXS. It is similar to match, except that it only checks a condition at the current position and does not include anything in the match result.

Parameters
Type | Allowed values | Description | Required
Control | Assert controls | The value being asserted | Yes

Flag

Flag (non-bodied) sets a flag in the regular expression.

Parameters
Type | Allowed values | Description | Required
Control | i, g, m, s, u, y | The flag being set | Yes
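
The compiled output is written as a regex literal (/pattern/), so a natural place for flags is after the closing slash, as in JavaScript. The sketch below assumes exactly that. The to_literal helper is hypothetical, and dropping repeated flags is a design choice made here because a JavaScript regex literal may not list the same flag twice.

```python
VALID_FLAGS = set("igmsuy")  # the flag controls listed in the table above


def to_literal(pattern, flags):
    """Hypothetical: wrap a compiled pattern and its flags as /pattern/flags."""
    chosen = []
    for flag in flags:
        if flag not in VALID_FLAGS:
            raise ValueError("unknown flag: %r" % flag)
        if flag not in chosen:  # a JavaScript literal may not repeat a flag
            chosen.append(flag)
    return "/" + pattern + "/" + "".join(chosen)


# flag(i); flag(m); flag(i); around a compiled body of ^abc$
print(to_literal("^abc$", ["i", "m", "i"]))  # /^abc$/im
```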

Group

Group (bodied) defines a capturing group around everything inside its body.
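
In regex syntax a capturing group is written with parentheses, which is what the IRC example below produces for group() {}. The Python sketch assumes the body has already been compiled to a pattern string; compile_group is a hypothetical name.

```python
import re


def compile_group(body):
    """Hypothetical: group() { ... } wraps the compiled body in (...)."""
    return "(" + body + ")"


pattern = compile_group("ab") + "c"  # (ab)c
match = re.match(pattern, "abc")
print(match.group(1))                # ab
```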

Backref

Backref (non-bodied) matches the same text that a capturing group captured earlier, with the group identified by its index.

Parameters
Type | Allowed values | Description | Required
Integer | Integers | The index of the capturing group | Yes
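
Assuming REXS passes the index straight through to the compiled regex, backref(1); would become \1, which matches whatever group 1 captured. In regex syntax, capturing groups are numbered from 1 in the order of their opening parentheses. The helper below is hypothetical.

```python
import re


def compile_backref(index):
    """Hypothetical: backref(N); becomes the regex backreference \\N."""
    if index < 1:
        raise ValueError("regex capturing groups are numbered from 1")
    return "\\" + str(index)


# "(a|b)" captures one letter; the backreference requires the same letter again.
doubled = "(a|b)" + compile_backref(1)
print(bool(re.fullmatch(doubled, "aa")))  # True
print(bool(re.fullmatch(doubled, "ab")))  # False
```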

Repeat

Repeat (bodied) repeats its body a specified number of times.

Parameters
Type | Allowed values | Description | Required
Integer | Integers | The minimum number of repetitions needed to match | Yes
Integer | Integers or inf/infinity/forever | The maximum number of repetitions that can be matched | Yes
Control | nongreedy | If specified, makes the repeat non-greedy, so it matches as few repetitions as possible | No
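
The IRC example below shows repeat(0, inf) compiling to * and repeat(0, inf, nongreedy) compiling to *?. A possible general translation into regex quantifiers is sketched below. The other shorthands (+, ?, {n}, {n,m}) and the (?:...) wrapper for multi-character bodies are assumptions, and compile_repeat is a hypothetical helper.

```python
import re

INFINITE = ("inf", "infinity", "forever")


def compile_repeat(body, minimum, maximum, nongreedy=False):
    """Hypothetical: translate repeat(min, max[, nongreedy]) { body }."""
    # A quantifier applies only to the preceding atom, so wrap longer bodies
    # in a non-capturing group (single characters and escapes like \d are atoms).
    single = len(body) == 1 or (len(body) == 2 and body.startswith("\\"))
    atom = body if single else "(?:" + body + ")"
    if maximum in INFINITE:
        quantifier = {0: "*", 1: "+"}.get(minimum, "{%d,}" % minimum)
    elif minimum == maximum:
        quantifier = "{%d}" % minimum
    elif (minimum, maximum) == (0, 1):
        quantifier = "?"
    else:
        quantifier = "{%d,%d}" % (minimum, maximum)
    return atom + quantifier + ("?" if nongreedy else "")


print(compile_repeat(".", 0, "inf", nongreedy=True))             # .*?
print(compile_repeat("ab", 2, 3))                                # (?:ab){2,3}
print(bool(re.fullmatch(compile_repeat("ab", 2, 3), "ababab")))  # True
```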

Set

Set (bodied) creates a character set whose contents are its body. Only the match and to functions are allowed inside its body.

To

To (non-bodied) is used inside set and represents the range character (-) in a regex character set, as in a-z.
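
Inside a character set, - normally forms a range, so a literal - coming from match should be escaped, while to emits a bare -. The Python sketch below models a set body as a list of match and to items. The item format, the compile_set name, the escape set, and the REXS form shown in the comment are assumptions.

```python
import re

# Assumed escape set for characters that are special inside [...].
SET_SPECIAL = set("\\]^-")


def compile_set(body):
    """Hypothetical: set() { match(...); to(); ... } -> [...].

    `body` is a list of ("match", text) and ("to",) items, the only two
    functions REXS allows inside a set.
    """
    parts = []
    for item in body:
        if item[0] == "match":
            parts.append("".join("\\" + c if c in SET_SPECIAL else c for c in item[1]))
        elif item[0] == "to":
            parts.append("-")  # the range operator itself stays unescaped
        else:
            raise ValueError(item[0] + " is not allowed inside set")
    return "[" + "".join(parts) + "]"


# Roughly: set() { match("0"); to(); match("9"); match("a"); to(); match("f"); }
hex_digit = compile_set([("match", "0"), ("to",), ("match", "9"),
                         ("match", "a"), ("to",), ("match", "f")])
print(hex_digit)                           # [0-9a-f]
print(bool(re.fullmatch(hex_digit, "c")))  # True
print(bool(re.fullmatch(hex_digit, "g")))  # False
```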

Or

Or (bodied) creates an alternation (an or-expression) that matches exactly one of the options in its body. Only orpart functions are allowed inside its body.

OrPart

OrPart (bodied) is used inside or and defines one of the options that the or can match.
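
In regex syntax an alternation is written with |. The sketch below assumes each orpart body has already been compiled and wraps the alternation in a non-capturing group, so that | does not split the rest of the expression. Whether REXS emits (?:...) or a capturing group is not stated here, and compile_or is hypothetical.

```python
import re


def compile_or(parts):
    """Hypothetical: or() { orpart() {...} ... } -> (?:part1|part2|...).

    Each element of `parts` is an already-compiled orpart body.
    """
    if not parts:
        raise ValueError("or needs at least one orpart")
    return "(?:" + "|".join(parts) + ")"


plural = compile_or(["cat", "dog"]) + "s"  # (?:cat|dog)s
print(bool(re.fullmatch(plural, "dogs")))  # True
print(bool(re.fullmatch(plural, "cows")))  # False
```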

Before

Before (bodied) is similar to assert in that it matches something but does not include the match. This function can be used to assert that its body matches (or doesn't match) before the main expression.

Parameters
Type | Allowed values | Description | Required
Control | not | If used, asserts that the body does not match instead of asserting that it does | No

After

After (bodied) is similar to assert in that it matches something but does not include the match. This function can be used to assert that its body matches (or doesn't match) after the main expression.

Parameters
Type | Allowed values | Description | Required
Control | not | If used, asserts that the body does not match instead of asserting that it does | No
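
In standard regex syntax, checking what comes immediately before the current position is a lookbehind ((?<=...), or (?<!...) when negated), and checking what comes after is a lookahead ((?=...) or (?!...)). The Python sketch assumes before and after map onto these constructs; compile_lookaround is hypothetical. Many regex engines, including Python's re, require a lookbehind body to have a fixed width.

```python
import re


def compile_lookaround(body, direction, negate=False):
    """Hypothetical: before { body } -> lookbehind, after { body } -> lookahead."""
    if direction == "before":
        prefix = "(?<"
    elif direction == "after":
        prefix = "(?"
    else:
        raise ValueError("direction must be 'before' or 'after'")
    return prefix + ("!" if negate else "=") + body + ")"


# "price" only when not preceded by "old " and when followed by ":".
pattern = (compile_lookaround("old ", "before", negate=True)
           + "price"
           + compile_lookaround(":", "after"))
print(pattern)                                 # (?<!old )price(?=:)
print(re.search(pattern, "price: 5").group())  # price
print(re.search(pattern, "old price: 5"))      # None
```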

Control Sets

Below is a list of the possible control values for specific functions:

Match Controls

  • ANY - Matches any character.
  • DIGIT - Matches any digit (0-9).
  • NON_DIGIT - Matches any non-digit (not 0-9).
  • ALPHANUM - Matches any alphanumeric character (0-9, a-z, or A-Z).
  • NON_ALPHANUM - Matches any non-alphanumeric character (not 0-9, a-z, or A-Z).
  • SPACE - Matches any space ( ).
  • NON_SPACE - Matches any non-space character.
  • HTAB - Matches any horizontal tab (\t).
  • VTAB - Matches any vertical tab (\v).
  • RETURN - Matches any return (\r).
  • LINEFEED - Matches any linefeed (\n).
  • FORMFEED - Matches any formfeed (\f).
  • BACKSPACE - Matches any backspace (\b).
  • NULL - Matches any null (\0).
  • QUOTE - Matches any quote (").
  • *CONTROL - Matches a specific control character. The second parameter of match must be a value A-Z, specifying the control character that this matches.
  • *HEX - Matches a specific character by its hex code. The second parameter of match must be a hex string with a length of either 2 or 4, indicating the hex character being matched.

* = requires the second parameter of match.
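
The controls above correspond closely to standard regex escapes. The Python sketch lists one plausible translation into JavaScript-style syntax; every mapping, and the compile_match_control helper, is an assumption rather than the REXS compiler's documented output. Note that \w and \W also treat "_" as a word character and \s covers all whitespace, so they may be broader than the descriptions above, and that outside a set \b means a word boundary, so a backspace has to be written [\b].

```python
import string

# Assumed translations into JavaScript-style regex escapes.
MATCH_CONTROLS = {
    "ANY": ".", "DIGIT": r"\d", "NON_DIGIT": r"\D",
    "ALPHANUM": r"\w", "NON_ALPHANUM": r"\W",
    "SPACE": r"\s", "NON_SPACE": r"\S",
    "HTAB": r"\t", "VTAB": r"\v", "RETURN": r"\r", "LINEFEED": r"\n",
    "FORMFEED": r"\f", "NULL": r"\0", "QUOTE": '"',
    # Outside a set, \b is a word boundary; backspace is only [\b].
    "BACKSPACE": r"[\b]",
}


def compile_match_control(control, arg=None):
    """Hypothetical: match(CONTROL[, arg]); -> regex fragment."""
    if control == "CONTROL":
        if arg is None or len(arg) != 1 or arg not in string.ascii_uppercase:
            raise ValueError("CONTROL needs a single letter A-Z")
        return r"\c" + arg
    if control == "HEX":
        if arg is None or len(arg) not in (2, 4) or any(
            c not in string.hexdigits for c in arg
        ):
            raise ValueError("HEX needs 2 or 4 hex digits")
        return (r"\x" if len(arg) == 2 else r"\u") + arg
    return MATCH_CONTROLS[control]


print(compile_match_control("DIGIT"))         # \d
print(compile_match_control("HEX", "0A"))     # \x0A
print(compile_match_control("CONTROL", "J"))  # \cJ
```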

Assert Controls

  • START - Asserts the start of the input string, or the start of a line when the m flag is set.
  • END - Asserts the end of the input string, or the end of a line when the m flag is set.
  • WORD_BOUNDARY - Asserts a word boundary (a position between a word character and a non-word character, or between a word character and the start or end of the input).
  • NOT_WORD_BOUNDARY - Asserts a position that is not a word boundary.
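
These controls line up with the standard regex anchors ^, $, \b and \B, which is how the IRC example below compiles START and END. The Python sketch assumes the same mapping for the word-boundary controls and shows how START depends on the multiline flag (m in REXS, re.MULTILINE in Python's re).

```python
import re

# Assumed translations of the assert controls into regex anchors.
ASSERT_CONTROLS = {
    "START": "^",
    "END": "$",
    "WORD_BOUNDARY": r"\b",
    "NOT_WORD_BOUNDARY": r"\B",
}

whole_word = ASSERT_CONTROLS["WORD_BOUNDARY"] + "cat" + ASSERT_CONTROLS["WORD_BOUNDARY"]
print(bool(re.search(whole_word, "the cat sat")))  # True
print(bool(re.search(whole_word, "concatenate")))  # False

# START matches after a newline only when the multiline flag is set.
line_start = ASSERT_CONTROLS["START"] + "b"
print(bool(re.search(line_start, "a\nb")))                # False
print(bool(re.search(line_start, "a\nb", re.MULTILINE)))  # True
```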

Examples

IRC

Let's take a look at a basic IRC message:

 name!email PRIVMSG #channel :message

If we want to parse all of the details from this, we can build a regular expression in REXS like this:

 assert(START);
 
 group() {
   repeat(0, inf, nongreedy) {
     match(ANY);
   }
 }
 
 match("!");
 
 group() {
   repeat(0, inf, nongreedy) {
     match(ANY);
   }
 }
 
 match(" PRIVMSG #");
 group() {
   repeat(0, inf, nongreedy) {
     match(ANY);
   }
 }
 
 match(" :");
 group() {
   repeat(0, inf) {
     match(ANY);
   }
 }
 
 assert(END);

This code will then be compiled to:

 /^(.*?)!(.*?) PRIVMSG \#(.*?) :(.*)$/
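
The compiled expression can be checked against the sample message. The Python sketch below drops the JavaScript /.../ delimiters and uses Python's re module, which accepts this pattern as written (\# is simply an escaped #). Because the last group is greedy, a message that itself contains " :" is still captured whole.

```python
import re

# The compiled pattern from above, without the JavaScript /.../ delimiters.
IRC = re.compile(r"^(.*?)!(.*?) PRIVMSG \#(.*?) :(.*)$")

m = IRC.match("name!email PRIVMSG #channel :message")
print(m.groups())  # ('name', 'email', 'channel', 'message')

# The final group is greedy, so " :" inside the message text is kept.
print(IRC.match("bob!b@x PRIVMSG #c :hi :)").group(4))  # hi :)
```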

External Resources

Compiler and Decompiler