ELIP

From Esolang
Jump to navigation Jump to search

ELIP stands for Esoteric Language Implemented in Preprocessor. It's a project with the goal of implementing any of the esolangs on this wiki by using the C Preprocessor. Since the C preprocessor is unable to process arbitrary tokens, they may need to be encoded in some way, like BF_LBRACKET or BF_LOOP() and BF_PLUS for Brainfuck, or UNLAMBDA_APPLY for Unlambda. Anybody is allowed to contribute anything to this project as long as it involves the C preprocessor, even joke languages like HQ9+, but please put language-specific code on sub-pages and link them here under the "Language list" section. Libraries like Chaos and Order or Boost.Preprocessor are allowed and encouraged. The "Techniques and Examples" section lists techniques for using libraries or the preprocessor itself.

Why?

The C preprocessor has libraries that support loops, repetition, lazy evaluation, continuations, lambda functions, arithmetic including powers and arbitrary-precision numbers, several compound data types, partial application, binary search, folding, sorting, map and filter, and higher-order and first-class functions. The Order library even has operators that take a variadic number of arguments and automatically folds them in either a left-to-right order or right-to-left order, depending on how the operators are defined. It should be possible to create an interpreter for any programming language (with some modifications) using only the C preprocessor. The only restriction is that it is unable to output a newline or some other characters, but it can output text like $NEWLINE$ or $CONTROL_A$ that can be post-processed into the desired characters.

Description

The C preprocessor has four main contexts:

  1. Directives, which are line-oriented, beginning with # and ending in a newline.
  2. Comments, which may start with /* and end with */ and may span multiple lines or start with // and run to the next newline.
  3. Source code, which is a series of tokens that may contain macros, character/string literals, identifiers, numbers, and operators/punctuation.
  4. Character literals in single quotes (') and string literals in double quotes ("), which are taken literally except for backslashes and the closing quote.

Anywhere in the file, a line that ends in a backslash (\) or trigraph (??/) escape sequence is considered to be part of the previous line with no break, so an escape sequence followed by a newline can appear even in the middle of a number, string/character literal, or identifier.

Directives

  • #if constant-expr - if the condition is true, processes the file up to the next matching #elif, #else, or #endif
  • #ifdef identifier - equivalent to #if defined
  • #ifndef identifier - equivalent to #if !defined
  • #elif constant-expr - if the preceding conditions were false and this condition is true, processes the file up to the next matching #elif, #else, or #endif
  • #else - if the preceding conditions were all false, processes the file up to the next matching #endif
  • #endif - ends a conditional (#if/#ifdef/#ifndef/#elif/#else) group
  • #include <h-char-sequence> - includes a file from the set of places containing the standard headers (usually the compiler/system's include directory), the name may not contain >
  • #include "q-char-sequence" - includes a file from an implementation-defined set of places (usually the current directory and the program's source or include directory), followed by the <h-char-sequence> places if not found, the name may not contain "
  • #define identifier replacement-list - defines an object-like macro
  • #define identifier(identifier-list) replacement-list - defines a function-like macro, with an argument list that may possibly be empty or optionally end in a ... argument corresponding to the __VA_ARGS__ identifier in the replacement list
  • #undef identifier - undefines the macro, or does nothing if the macro name is currently undefined
  • #line digit-sequence - set the current line number
  • #line digit-sequence "s-char-sequence-opt" - set the current line number and file name
  • #error pp-tokens-opt - print a diagnostic message
  • #pragma pp-tokens-opt - a way of providing implementation-defined and non-standard control to the preprocessor or compiler
  • # - a null preprocessing directive

The directive execution portion of the C preprocessor can be considered an imperative language. Macros are expanded only in #if, #elif, #include, #line, and certain non-standard #pragma directives. A directive can never be formed as a result of macro expansion, even if it resembles one. If a sequence of characters resembling a directive occurs inside of a macro argument list, the behavior is undefined. All directives are ended by newlines (unless escaped), even if it's in the middle of the arguments to a function-like macro. In conditional directives, parts of the file that are skipped are still tokenized, meaning the quoting rules, comments, and trigraphs all still apply. #if may be used to perform arithmetic and then assign the results to a macro by using #defines, as shown by Boost.PP and Chaos's evaluated slots mechanism. #include can be used perform loops and call "subroutines" (also called file iteration and X-Macros).

Operators

  • defined identifier or defined ( identifier ) - in #if and #elif directives, evaluates to 1 if the macro is defined or 0 otherwise.
  • In #if and #elif directives, any of the operators and constants of C that may be used in an integer constant expression (including character literals) may also be used, except for casts and sizeof.
  • _Pragma ( string-literal ) - in C99, the same as #pragma directives, but can be created by macro expansion
  • # - a unary operator of unspecified order of evaluation that converts its parameter into a string literal
  • ## - a binary operator of unspecified order of evaluation that concatenates its left and right operands, and is valid only if the result is a single C token
  • ( - begins an argument list when following the name of a function-like macro
  • , - separates arguments to a function-like macro
  • ) - terminates the argument list of a function-like macro

Compared to other programming languages like C, there are very few operators (outside of #if statements). The ## operator is the most powerful operator since it can create new macro names from arguments, and then cause those names to be passed around and/or expanded as macros. Boost.PP, Chaos, and Order use it for everything from arithmetic (using lookup tables) to control flow. The ( operator can be used to create data types like sequences, arrays, tuples, and lists, all based around the deferred invocation of macros.

Built-in macros

  • __LINE__ - the current line number (numeric constant)
  • __FILE__ - the current source file (string literal)
  • __DATE__ - the month, day, and year in "Mmm dd yyyy" format as returned by asctime()
  • __TIME__ - the current time in "hh:mm:ss" format as returned by asctime()
  • __STDC__ - is 1 if and only if the implementation conforms to the C standard

The __LINE__ and __FILE__ macros can be set by #line directives.

Trigraphs

??=     #
??(     [
??/     \
??)     ]
??'     ^
??<     {
??!     |
??>     }
??-     ~

For certain non-English character sets, some of the standard C characters are unavailable. Thus, trigraphs were created. The trigraph sequences on the left are equivalent to the standard C characters on the right, even in character/string literals or comments. This is because they are all replaced by their equivalent characters during phase 1 of preprocessing (before tokenization).

Digraphs

<:  :>  <%  %>  %:  %:%:
[   ]   {   }   #   ##

Digraphs serve a similar purpose as trigraphs, but unlike trigraphs, digraphs retain their original forms and are only equivalent in meaning as tokens in source code, not in comments or character/string literals. They stringify as their original forms and after preprocessing, will not change appearance. For example, #??= is identical to ##, but #%: is two separate tokens. Another difference is that new digraphs may be formed by using the ## (or equivalent) operator.

Macro expansion

The macro expansion portion of the C preprocessor is purely functional. Unlike function calls in most programming languages, macro expansion proceeds from the outside-in, so arguments are not evaluated before being replaced. This is what makes lazy evaluation possible and the ## operator so powerful.

Identifiers

The most common type of token used in libraries like Boost.PP and Chaos is the identifier. An identifier is a sequence beginning with an alphabetic character, an underscore, or other implementation-defined characters (such as letters with diacritics, Chinese characters, or the dollar sign ($)), and possibly followed by any number of these characters and/or digits. The most common use of identifiers is a macro name. All identifiers except defined, __LINE__, __FILE__, __DATE__, __TIME__, __STDC__, and __VA_ARGS__ are valid as names of user-defined macros.

PP-numbers

Similar to an identifier, a pp-number is a sequence of characters. They must start with a digit or with a period (.) followed by a digit and may be followed by a sequence consisting of any of the characters valid in an identifier, with the addition of the period and signed exponents of the following forms: e+, e-, E+, E-, p+, p-, P+, or P-. In an #if or #elif expression, the beginning integral portion of a pp-number is interpreted as an intmax_t (at least 64-bit, but may be larger, even bignum) of hexadecimal base if it begins with 0x or 0X, octal base if it begins with 0 or decimal base otherwise. If a u or U follows the valid portion of the number, it is interpreted as a uintmax_t, its unsigned equivalent. By using the subset of pp-number characters also valid in identifiers, it may be concatenated to a prefix to form a new identifier. This identifier will expand when used as a macro and may also be invoked, but the pp-number itself is able to be passed around safely even when immediately followed by (, since it is not a valid macro name.

Techniques and Examples

These are some examples of some preprocessor techniques that will be useful in ELIP and already have real-world usage in the Boost.PP, Chaos, and Order preprocessor libraries.

Lazy evaluation of arguments

#define WILL_ERROR() + ## -
#define SECOND(x,...) __VA_ARGS__
SECOND(WILL_ERROR(),5)

Because of the order of macro expansion and the lazy evaluation of macro arguments, the WILL_ERROR() macro (which would cause an error or a warning by attempting to create the invalid token "+-") is not invoked. This particular example isn't very practical, but it can be used to create IF() macros that selectively expand one or more of its arguments (that may possibly be invalid) without causing errors or undefined behavior.

Lazy evaluation with #

#define XSTR(...) #__VA_ARGS__
#define STR(...) XSTR(__VA_ARGS__)
#define Hello Goodbye
XSTR(Hello, world!)
STR(Hello, world!)

Output:

"Hello, world!"
"Goodbye, world!"

These seemingly redundant macros do different things because STR() is eager and XSTR() is lazy.

Lazy evaluation with ##

#define XCAT(x,y) x ## y
#define CAT(x,y) XCAT(x,y)
#define A 1
#define B 2
#define AB 3
XCAT(A,B)
CAT(A,B)

Output:

3
12

Similarly, CAT() is eager and XCAT() is lazy.

Directive in an argument list

A(
#define B 50
B
)

This will behave cause B to be replaced with 50 if A is undefined or an object-like macro, but cause undefined behavior if A is defined as a function-like macro. For example, Plan 9 and mcpp will treat the argument as #define B 50 B (not a directive), but GCC cpp will treat the argument as B and the previous line as a directive, so B will be defined as 50.

PP-numbers

77
0xFACE
199901L
3.141592765
5_
1.2.3.4.5.6.7.8.9
192.168.0.1
1.....0
5VALID_pp-number
0addition_operator

These are all valid pp-numbers. They all may be concatenated to each other and may have identifiers concatenated to the right side of them, forming new pp-numbers. The first three, the fifth, and the last may be concatenated to the right side of an identifier to form a new identifier. This new identifier may be interpreted as a macro name and called. The INC(), REP(), IF(), and HIF() macros below all use this method.

Arithmetic using lookup tables

#define XCAT(x,y) x ## y
#define INC(x) XCAT(INC_,x)
#define INC_0 1
#define INC_1 2
#define INC_2 3
#define INC_3 4
#define INC_4 5
#define INC_5 5 /* saturate to 5 */
INC(2)
INC(INC(INC(INC(INC(INC(0))))))

Output:

3
5

This is the same method of arithmetic that is used by Boost.PP and by Chaos's saturating arithmetic functions. This can be changed to wrapping arithmetic simply by making the highest value in the table expand to 0 instead of to itself.

Higher-order functions

#define REP(n) REP ## n
#define REP256(f,...) REP128(f,__VA_ARGS__) REP128(f,__VA_ARGS__)
#define REP128(f,...) REP64(f,__VA_ARGS__) REP64(f,__VA_ARGS__)
#define REP64(f,...) REP32(f,__VA_ARGS__) REP32(f,__VA_ARGS__)
#define REP32(f,...) REP16(f,__VA_ARGS__) REP16(f,__VA_ARGS__)
#define REP16(f,...) REP8(f,__VA_ARGS__) REP8(f,__VA_ARGS__)
#define REP8(f,...) REP4(f,__VA_ARGS__) REP4(f,__VA_ARGS__)
#define REP4(f,...) REP2(f,__VA_ARGS__) REP2(f,__VA_ARGS__)
#define REP2(f,...) f(__VA_ARGS__) f(__VA_ARGS__)
#define REP1(f,...) f(__VA_ARGS__)
#define A(x) {A: x}
REP(128)(A,test)

The REP() macro takes a number that is a power of two from 1 to 256 and returns a macro name that when invoked will expand the function-like macro given as its first argument that number of times, using the remaining arguments as arguments to that macro. This can easily be extended to include all integers between 1 and 256.

Simple conditional

#define XCAT(x,y) x ## y
#define IF(p,t,f) XCAT(IF_,p)(t,f)
#define IF_0(t,f) f
#define IF_1(t,f) t
IF(0,5,6)
IF(1,7,8)

Output:

6
7

If the first argument is 0, the second argument is returned, otherwise the third argument is returned.

Higher-order conditional

#define XCAT(x,y) x ## y
#define EXPAND(...) __VA_ARGS__
#define EAT(...)
#define HIF(p) XCAT(HIF_,p)
#define HIF_0(...) EXPAND
#define HIF_1(...) __VA_ARGS__ EAT
HIF(0)(true)(false)
HIF(1)(true)(false)

Output:

false
true

This HIF (higher-order IF) takes 3 sets of arguments: a condition that is 0 if false or 1 if true, arguments to expand to if true, and arguments to expand to if false. Because it's higher-order, it can be partially evaluated and then passed around until it receives all the necessary arguments.

Higher-order booleans

#define APPLY_IF(cond, ...) cond(__VA_ARGS__)
#define TRUE(x, y) x
#define FALSE(x, y) y
#define AND(x, y) x(y, FALSE)
#define OR(x, y) x(TRUE, y)
#define NOT(x) x(FALSE, TRUE)
#define XOR(x, y) x(y(FALSE, TRUE), y)
AND(TRUE, FALSE)(AND, OR)(TRUE, FALSE)

Output:

TRUE

These are the higher-order Church booleans as defined by the Lambda Calculus. Unlike the previous IF and HIF macros, these do not use identifier concatenation, but function application. The APPLY_IF macro is actually unnecessary because it simply applies the remaining arguments to the first argument, although it can be used to make the syntax more similar to the previous IF macro. All of these are lazily evaluated and the results of these macros are themselves higher-order functions. Because they are function-like macros, the identifiers TRUE and FALSE can safely be returned from any macro without being expanded.

Deferred evaluation

#define EXPAND(...) __VA_ARGS__
#define EAT(...)
#define DEFER(x) x EMPTY()
#define EMPTY()
#define COMMA ,
#define XCAT(x,y) x ## y
#define A(x,y,z) {1: x}{2: y}{3: z}
#define B 1
#define C 2
#define BC 3

A EAT(dummy, argument, list) (4,5,6)
EXPAND(A EAT(dummy, argument, list) (4,5,6))
/* A(ONE COMMA TWO COMMA THREE) */ /* syntax error */
/* EXPAND(A(ONE COMMA TWO COMMA THREE)) */ /* syntax error */
DEFER(A)(ONE COMMA TWO COMMA THREE)
EXPAND(DEFER(A)(ONE COMMA TWO COMMA THREE))
XCAT(B,C)
EXPAND(XCAT(B,C))
DEFER(XCAT)(B,C)
EXPAND(DEFER(XCAT)(B,C))

Output:

A (4,5,6)
{1: 4}{2: 5}{3: 6}
A (ONE , TWO , THREE)
{1: ONE}{2: TWO}{3: THREE}
3
3
XCAT (1,2)
12

Deferral is a way of preventing the execution of a function-like macro until a particular time by keeping it out of direct contact with its argument list. Combined with EXPAND, this can allow a function's argument list to be fully expanded before the function itself is called (normally arguments are lazy). The COMMA macro used above may instead be replaced by something more complex, like a REP macro. Deferral can also reduce the number of "helper macros." For example, the deferred XCAT in this case behaves identically to CAT.

List functions

#define EXPAND(...) __VA_ARGS__
#define DEFER(x) x EMPTY()
#define EMPTY()
#define TRUE(x,...) x
#define FALSE(x,...) __VA_ARGS__
#define OPEN(x) OPEN_I x
#define OPEN_I(...) __VA_ARGS__
#define CAR(x) CAR_I x
#define CDR(x) (CDR_I x)
#define CAR_I(x,...) x
#define CDR_I(x,...) __VA_ARGS__
#define CONS(x,y) (x,OPEN_I y)
#define RCONS(x,y) (OPEN_I y,x)
#define NIL ()
#define IS_EDIBLE(x) EXPAND(DEFER(IS_EDIBLE_I)(IS_EDIBLE_II x,FALSE))
#define IS_EDIBLE_I(x,y,...) y
#define IS_EDIBLE_II(...) ,TRUE,
#define SCONS(x,y) (x,IS_EDIBLE(y)(OPEN_I y,y))

CONS(1,CONS(2,RCONS(5,RCONS(4,(3)))))
OPEN(CONS(1,CONS(2,RCONS(5,RCONS(4,(3))))))
CAR(CDR(CDR((1,2,3,4,5))))
CONS(CONS(1,CONS(2,NIL)),CONS(3,CONS(4,NIL)))
CONS(CONS(1,2),CONS(3,4))
SCONS(SCONS(1,2),SCONS(3,4))
SCONS(*,SCONS(%,SCONS(^,&)))
IS_EDIBLE(CONS(1,NIL))
IS_EDIBLE(10)
IS_EDIBLE(+++)
IS_EDIBLE()
IS_EDIBLE(())
IS_EDIBLE((1,2,3)(7,8,9))

Output:

(1,2,3,4,5)
1,2,3,4,5
3
((1,2,),3,4,)
((1,OPEN_I 2),3,OPEN_I 4)
((1,2),3,4)
(*,%,^,&)
TRUE
FALSE
FALSE
FALSE
TRUE
TRUE

These macros are for creating and accessing a list data structure called a tuple. CAR returns the first element of a tuple, or nothing if its argument is NIL. CDR returns the remaining elements of a tuple, or NIL if its argument is a one-element tuple or NIL. CONS takes a value and a tuple and constructs a new tuple. The RCONS macro appends the first argument to the end of a tuple rather than the beginning. CONS and RCONS can be used to create other structures like binary trees, not just flat lists. IS_EDIBLE returns a higher-order boolean depending on whether its argument is edible (a list such as a tuple or arbitrary-precision number) or not (an atom or scalar such as an operator, identifier, pp-number, or empty argument). Arguments that are edible can be "eaten" with the EAT macro defined above. OPEN removes the parentheses from a tuple, allowing elements to be added or removed. SCONS is a "safe" version of CONS that detects whether or not its second argument is a tuple, and acts accordingly. This allows it to receive a non-tuple as its second argument without breaking.

Sequences and sequential iteration

#define XCAT(x,y) x ## y
#define CAT(x,y) XCAT(x,y)
#define EMPTY()
#define LPAREN (
#define RPAREN )
#define DEFER(x) x EMPTY()
#define EAT(...)
#define EXPAND(...) __VA_ARGS__
#define TRUE(x,...) x
#define FALSE(x,...) __VA_ARGS__
#define OPEN_I(...) __VA_ARGS__
#define EATSEQ_A(...) EATSEQ_B
#define EATSEQ_B(...) EATSEQ_A
#define EATSEQ_A0END
#define EATSEQ_B0END
#define HSEQ(seq) CAT(HSEQ_I seq,0END)
#define HSEQ_I(...) __VA_ARGS__ EATSEQ_A
#define TSEQ(seq) EAT seq
#define TRANSFORM(seq, ...) CAT(TRANSFORM_I_A seq,0END)(EAT, __VA_ARGS__)
#define TRANSFORM_I_A(...) (OPEN_I, __VA_ARGS__)() TRANSFORM_I_B
#define TRANSFORM_I_B(...) (OPEN_I, __VA_ARGS__)() TRANSFORM_I_A
#define TRANSFORM_I_A0END
#define TRANSFORM_I_B0END
#define RPXFRM(m, ...) m(RPAREN RPXFRM_ID)
#define RPXFRM_ID() RPXFRM
#define INFUSE(seq, ...) INFUSE_V(INFUSE_I(TRANSFORM(seq), __VA_ARGS__))
#define INFUSE_I(xfrm, ...) INFUSE_II xfrm, __VA_ARGS__ RPXFRM xfrm
#define INFUSE_II(m, ...) m(INFUSE_III DEFER(XCAT)(LPA,REN)(__VA_ARGS__), INFUSE_II_ID)
#define INFUSE_II_ID() INFUSE_II
#define INFUSE_III(...) INFUSE_IV(__VA_ARGS__)
#define INFUSE_IV(x, rest, ...) (__VA_ARGS__, OPEN_I x)() rest, __VA_ARGS__
#define INFUSE_V(...) INFUSE_VI(__VA_ARGS__)
#define INFUSE_VI(...) INFUSE_VII(__VA_ARGS__)
#define INFUSE_VII(seq, ...) seq
#define FOREACH(macro, seq) EXPAND(FOREACH_I INFUSE(seq, TRUE, macro)(FALSE,EAT,))
#define FOREACH_I(p, m, ...) m(__VA_ARGS__) p(FOREACH_I_ID)
#define FOREACH_I_ID() FOREACH_I
#define MAP(macro, seq) EXPAND(MAP_I INFUSE(seq, TRUE, macro)(FALSE,,))
#define MAP_I(p, m, ...) p((m(__VA_ARGS__))MAP_I_ID)
#define MAP_I_ID() MAP_I
#define FILTER(macro, seq) EXPAND(FILTER_I INFUSE(seq, TRUE, macro)(FALSE,,))
#define FILTER_I(p, m, ...) p(m(__VA_ARGS__)((__VA_ARGS__))FILTER_I_ID)
#define FILTER_I_ID() FILTER_I
#define REVERSE(seq) EXPAND(REVERSE_I(TRANSFORM(seq)))
#define REVERSE_I(xfrm) REVERSE_II xfrm RPXFRM xfrm
#define REVERSE_II(m, ...) m(REVERSE_III DEFER(XCAT)(LPA,REN)(__VA_ARGS__), REVERSE_II_ID)
#define REVERSE_II_ID() REVERSE_II
#define REVERSE_III(x, rest) REVERSE_IV(rest)x
#define REVERSE_IV(...) __VA_ARGS__
#define NEST(macro, seq, ...) EXPAND(EXPAND(NEST_I INFUSE(seq, TRUE, macro)(FALSE,,) __VA_ARGS__ RPXFRM TRANSFORM(seq)))
#define NEST_I(p, m, ...) p(m DEFER(XCAT)(LPA,REN) NEST_I_ID)
#define NEST_I_ID() NEST_I

HSEQ((1)(2)(3)(4)(5)(6))
TSEQ(TSEQ((1)(2)(3)(4)(5)(6)))
HSEQ(TSEQ(TSEQ((1)(2)(3)(4)(5)(6))))
REVERSE((1)(2)(3)(4))
#define A(...) **__VA_ARGS__**
FOREACH(A, (1)(2)(3)(4)(5))
#define B(n) B_##n
#define B_0 1
#define B_1 2
#define B_2 3
#define B_3 3
MAP(B, (0)(1)(2)(1)(0))
MAP(B, MAP(B, (0)(1)(2)(0)(1)))
#define C(x,...) x
FILTER(C, (TRUE, 1)(FALSE, 2)(TRUE, 3)(TRUE, 4)(FALSE, 5))
NEST(TSEQ, ()()(), (1)(2)(3)(4)(5)(6))

Output:

1
(3)(4)(5)(6)
3
(4)(3)(2)(1)
**1** **2** **3** **4** **5**
(1)(2)(3)(2)(1)
(2)(3)(3)(2)(3)
(TRUE, 1)(TRUE, 3)(TRUE, 4)
(4)(5)(6)

These also require macros from other examples:

#define LENSEQ(seq) NEST(INC, seq, (00))
#define LENLT(s1, s2) IS_EDIBLE(NEST(TSEQ, s1, s2))
#define LENEQ(s1, s2) IS_EDIBLE(NEST(TSEQ, s1, s2))(FALSE, IS_EDIBLE(NEST(TSEQ, s2, s1))(FALSE, TRUE))
LENSEQ((1)(2)(3)(4)(5)(6)(7)(8)(9))
LENLT((1),(2)(3))
LENEQ((1),(2)(3))

Output:

(9)(00)
TRUE
FALSE

A sequence is another data structure. Sequences resemble tuples placed end-to-end. Unlike a tuple, which is better for random access to elements, sequences are better for sequential access to elements where operations on previous elements do not require knowledge of the following elements (such as foreach or map operations). In tuples, these operations would require a more complicated recursion method, but for sequences they happen automatically by using mutual recursion (like the EATSEQ_A and EATSEQ_B defined above). In the Boost and Chaos documentation, this recursion method and derived methods are called sequential iteration. One example of where sequences would be the best would be for arbitrary-precision numbers. Another benefit to sequences is that they allow a structure of indeterminate length to be used in C89 and C++03 (and earlier), which don't support variadic arguments. Most of these examples use unary sequences, the most common type, but sequences can have multiple members, such as the binary sequence (1,2)(3,4)(5,6). In C99 and C++0x, a sequence can have a different number of members for each element (such as (1)(2,3)(4)), which is called a variadic sequence. TRANSFORM creates a binary transform, which simplifies finding the end of a sequence and allows iteration using only a single macro and indirection. REVERSE reverses a sequence. The INFUSE macro allows arbitrary data (such as a macro name) to be attached to a sequence so any macro can be used in looping constructs without worrying about indirection or mutual recursion. By using INFUSE, functions like FOREACH, MAP, and FILTER become incredibly simple. The predicates for FOREACH and MAP can be anything capable of accepting the specific sequence data as an argument. The predicate for FILTER is a higher-order function that returns TRUE/FALSE or other macros with similar behavior. NEST causes a macro to be applied to itself as many times as there are elements in the sequence, using __VA_ARGS__ as the initial arguments. By using an INC macro with a suitable zero value, it can be used to determine the length of a sequence. It can also be used to determine which sequence is longer by using TSEQ and IS_EDIBLE to see which runs out of elements first.

Arbitrary-precision numbers

#define INC(num) INC_I_A num
#define INC_I_A(d) INC_II_A_ ## d
#define INC_I_B(d) INC_II_B_ ## d
#define INC_II_A_0 (1)
#define INC_II_A_1 (2)
#define INC_II_A_2 (3)
#define INC_II_A_3 (4)
#define INC_II_A_4 (5)
#define INC_II_A_5 (6)
#define INC_II_A_6 (7)
#define INC_II_A_7 (8)
#define INC_II_A_8 (9)
#define INC_II_A_9 (0)INC_I_B
#define INC_II_A_00 (1)(00)
#define INC_II_B_0 (1)
#define INC_II_B_1 (2)
#define INC_II_B_2 (3)
#define INC_II_B_3 (4)
#define INC_II_B_4 (5)
#define INC_II_B_5 (6)
#define INC_II_B_6 (7)
#define INC_II_B_7 (8)
#define INC_II_B_8 (9)
#define INC_II_B_9 (0)INC_I_A
#define INC_II_B_00 (1)(00)

INC((00))
INC(INC((00)))
INC((5)(6)(00))
INC(INC(INC(INC(INC((5)(6)(00))))))
INC((9)(9)(9)(9)(9)(00))

Output:

(1)(00)
(2)(00)
(6)(6)(00)
(0)(7)(00)
(0)(0)(0)(0)(0)(1)(00)

This uses a sequence of decimal digits in little-endian terminated by (00) to store an arbitrary-precision number. The INC_II_s_n macros must be repeated because a macro cannot recurse through itself (at least without helper macros), so it needs mutual recursion. It must be terminated with (00) so the macros know when they reach the end of the number. Compared to using fixed pp-numbers like the previous INC example, this allows operations on any number of any length and only needs two macros per digit instead of a single macro per number. This last example would require 100,000 separate macros if it was implemented the other way.

Limitations and Caveats

Compared to other macro languages like m4 and TeX, there are some things that are different or difficult in the C preprocessor. Most of the limitations have to do with the distinction between directives and non-directive text and the impossibility of using directives from within macros. Except for _Pragma, which is the operator form of #pragma, none of the directives can be used in macro definitions.

  • Macros cannot be defined from within macros (no _Define analog to _Pragma).
  • It does not have built-in arithmetic outside of #if and #elif directives (but you can define your own arithmetic macros).
  • Macros that expand to their own names do not automatically cause recursion (it must instead be implemented indirectly).
  • There is no way to extract a substring (either of a token or of a string literal).
  • Concatenation that forms invalid tokens cause errors or undefined behavior, so it's impossible to allow arbitrary input to be passed to a function.
  • Function-like macros are only expanded when followed by a left parenthesis (unlike m4), which allows them to be passed around by name indefinitely without being evaluated (providing a way to do delayed evaluation).

Most of the limitations can be worked around by using special data structures (like a Boost sequence, tuple, list, or array, or a Chaos string), by reimplementing versions of the directives using functional-style macros, by creating indirect macros using concatenation, or by using lambda functions. Chaos and Order use all of these methods.

Language list