User:Yayimhere/XeReg

XeReg is an esolang based on regex that includes the string operators of Cirt e mys, along with some additional tokens. It was created specifically to be able to implement Cirt e mys. Its distinguishing feature is that matched strings can be "modified" for use by other matches later in the string.

Semantics

A program is a XeReg match pattern. The program takes a single input string and returns the part of that input that was matched. If the match is empty, it returns the program itself instead.

Tokens

The following is the list of tokens in XeReg:

  • .: matches any single character, but not the empty string.
  • (a ~ ...): capture group with the name a (a single letter). It is not matched in sequence within the string. Instead, it matches independently on the input string and is equal to the substring matched. A name cannot be redefined: a reused name is replaced by a new one made by appending a ', repeated until the name is unused. For example, (a ~ o)(a ~ oo)(a ~ ooo) -> (a ~ o)(a' ~ oo)(a ~ ooo) -> (a ~ o)(a' ~ oo)(a' ~ ooo) -> (a ~ o)(a' ~ oo)(a'' ~ ooo). The characters within are remembered.
  • _a: match capture group a.
  • a+: 1 or more consecutive instances of a. This, like all other multi-character matches, matches greedily until the item after it matches.
  • a(x)±: 1 or more instances of a, which may be separated by any characters in x.
  • a*: like a+, but 0 or more.
  • a(x)@: like a(x)±, but 0 or more.
  • ·: equal to .+.
  • ÷: equal to .*.
  • [f ~ x @y]: creates a "function", which is simply shorthand for another regex expression. x is the input and y is the body; the function is referenced as {f.x}. ø may be used in place of an input, in which case every reference is replaced with the body y and the function is called as {f.}. Example: [· ~ ø @.+]. Functions may be recursive.
  • <: start of string.
  • $: end of string.
  • : matches everything not matched by a.
  • a|b: matches either a or b.
  • a': matches capture group a, but with its order of tokens reversed.
  • a^: matches capture group a, but ignoring its first token. Matches the empty string if a has fewer than 2 tokens.
  • a-: matches the first token of capture group a.
  • x+y: x and y concatenated.
  • ?x(y|z): if the most recent character actually matched was x, this returns y and evaluation continues; otherwise it returns z.
  • space: separates tokens.
  • \x: escapes x.
  • /a: matches capture group a, but ignores the order of definition and always chooses the match that covers the most of the string (using ASCII ordering).
  • #: equal to the name of the most recently defined capture group.
  • : immediately stops further matching.
  • {a - x}: begins a captured match that runs the subprogram x with a as its input string. Returns the matched substring.
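
The separated-repetition tokens a(x)± and a(x)@ can be approximated in a conventional regex engine. The sketch below uses Python's re module and rests on assumptions that the list above does not settle: x is read as a set of single separator characters, separators may only appear between instances of a (not before the first or after the last), and the helper name separated is hypothetical.

```python
import re


def separated(item: str, seps: str, at_least_one: bool = True) -> str:
    """Build a Python regex approximating XeReg's item(seps)± or item(seps)@.

    item is an ordinary Python regex fragment; seps is a string whose
    characters are the allowed separators (an assumed reading of "x").
    """
    # Zero or more separator characters between consecutive items.
    sep = "[" + re.escape(seps) + "]*" if seps else ""
    core = f"(?:{item})(?:{sep}(?:{item}))*"
    # ± requires at least one item; @ also accepts zero items.
    return core if at_least_one else f"(?:{core})?"


if __name__ == "__main__":
    pattern = separated("a", ",;")
    print(bool(re.fullmatch(pattern, "a,a;;a")))
    print(bool(re.fullmatch(pattern, "a,b")))
```

With an empty separator set the helper degenerates to plain a+ or a*, which matches how the list describes those tokens relative to ± and @.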

Evaluation goes from left to right. Every other character matches itself.
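
The capture-group rules in the token list can be modelled with a few small functions. This is only a sketch: it assumes a capture group's tokens are its individual characters, and all function names (fresh_name, define_all, reversed_group, tail, head) are hypothetical rather than part of XeReg.

```python
def fresh_name(name, taken):
    """Append ' to name until it no longer collides with a defined group."""
    while name in taken:
        name += "'"
    return name


def define_all(definitions):
    """Define groups left to right, renaming any reused name.

    definitions is a list of (name, matched_text) pairs.
    """
    groups = {}
    for name, text in definitions:
        groups[fresh_name(name, groups)] = list(text)
    return groups


def reversed_group(tokens):
    """a': the group's tokens in reverse order."""
    return tokens[::-1]


def tail(tokens):
    """a^: the group without its first token (empty if fewer than 2 tokens)."""
    return tokens[1:]


def head(tokens):
    """a-: the first token of the group (empty for an empty group, an assumption)."""
    return tokens[:1]


if __name__ == "__main__":
    groups = define_all([("a", "o"), ("a", "oo"), ("a", "ooo")])
    print(list(groups))  # the three names after renaming
```

Applied to the renaming example from the token list, define_all produces the names a, a' and a'' in that order.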

Examples

The following programs do not match any string:

[! ~ ø @.!] !

or:

.*¯

Computational class

XeReg is Turing complete, as shown by compilation from BCT (Bitwise Cyclic Tag).
For a BCT program p, these replacements must first take place:

0 -> (# ~ #^)
1x -> {# ~ 1 ?1((# ~ #+x)|(#))}

with a space between each command. The translated program is then appended to the prefix (a ~ .+) [p ~ ø @{ # - , followed by a space and then the suffix # ?1|0((# ~ #)|≠) p } ]. The input to the resulting XeReg program should be the BCT data string.
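
The command-replacement step can be sketched in Python as follows. The function name translate_bct is hypothetical, the sketch covers only the two replacements and the space-joining (not the surrounding wrapper), and it rejects a program that ends in a lone 1, since the page does not say how that case is handled.

```python
def translate_bct(program: str) -> str:
    """Replace each BCT command with its XeReg fragment, joined by spaces."""
    commands = []
    i = 0
    while i < len(program):
        bit = program[i]
        if bit == "0":
            # 0 -> (# ~ #^)
            commands.append("(# ~ #^)")
            i += 1
        elif bit == "1":
            if i + 1 >= len(program):
                # Assumption: a trailing 1 with no operand is rejected here.
                raise ValueError("command 1 is missing its operand bit")
            x = program[i + 1]
            if x not in "01":
                raise ValueError(f"invalid operand {x!r}")
            # 1x -> {# ~ 1 ?1((# ~ #+x)|(#))}
            commands.append("{# ~ 1 ?1((# ~ #+" + x + ")|(#))}")
            i += 2
        else:
            raise ValueError(f"invalid BCT symbol {bit!r}")
    return " ".join(commands)


if __name__ == "__main__":
    print(translate_bct("011"))
```

For the BCT program 011 this yields the fragment for 0 followed by a space and the fragment for 11.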