CLC-INTERCAL

PLEASE NOTe that this article only discusses the differences with other dialects of INTERCAL: familiarity with INTERCAL or C-INTERCAL may be required to understand this article.

CLC-INTERCAL introduced a small number of new statements, types of expressions, statement variants and data structures; these extensions are listed here in no particular order.

Input/output

There are small differences in the numeric input/output processing, but these are minor and in any case concern the user of a program rather than the programmer.

The binary input/output framework is considerably different from the one implemented in C-INTERCAL. The author of CLC-INTERCAL did not approve of the concept referred to as Turing tapes, which appeared to be a computer emulation of the mechanism found in some types of old line printers. Instead he just went for an arbitrary convention designed to make the programmer's life as difficult as possible.

Binary input/output only works using hybrid registers. The register must be dimensioned before the READ OUT or WRITE IN statement, and is processed one element at a time, with multidimensional arrays being "flattened" to provide a unidimensional stream of elements. When WRITING IN, the number of elements in the array determines the number of bytes to be obtained from standard input. The special "beginning of stream" marker, #172, followed by all these bytes is then fed into the garbling algorithm, which produces the array elements. If end of file is reached before enough bytes have been written in, a zero is stored in the corresponding array element. Note that the garbling algorithm will never produce a zero when processing valid bytes, so there is a simple way to test for this condition.

The garbling algorithm takes two bytes at a time, with overlap. In other words, the first array element is filled by garbling #172 with the first byte; the second array element by garbling the first two bytes, and so on. The easiest way to explain the algorithm is to assume that .1 and .2 contain the two bytes, then the following expression produces the value to be stored in the array element:

   DO :1 <- '.2~.1'¢"'"¥'#65535¢.2'"~"#0¢#65535"'~'"¥'#65535¢.1'"~"#0¢#65535"'"

When READING OUT, the same process is applied in reverse, in the obvious way.
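
The overlapping pairing used when WRITING IN can be sketched in Python (used here only as a neutral illustration language). This is a minimal model under stated assumptions: the function name `write_in` is hypothetical, the garbling step is passed in as a parameter rather than implemented, the order in which the two bytes are handed to it is a guess, and every element after end of file is assumed to be zero.

```python
from typing import Callable, List

BEGIN_OF_STREAM = 172  # the "beginning of stream" marker described above


def write_in(data: bytes, n_elements: int,
             garble: Callable[[int, int], int]) -> List[int]:
    """Fill n_elements array slots from data using overlapping byte pairs.

    `garble` stands in for the INTERCAL expression shown above; its
    argument order (previous byte, current byte) is an assumption.
    """
    elements = []
    previous = BEGIN_OF_STREAM
    for i in range(n_elements):
        if i >= len(data):
            elements.append(0)  # end of file: a zero is stored
            continue
        current = data[i]
        elements.append(garble(previous, current))
        previous = current  # the pair overlaps with the next one
    return elements


# A stand-in garbler, NOT the real algorithm, just to make the pairing visible.
def show_pair(previous: int, current: int) -> int:
    return previous * 256 + current


RESULT = write_in(b"AB", 3, show_pair)
```

With the stand-in garbler, `RESULT` holds the pairs (172, 65) and (65, 66) packed into integers, followed by a zero for the element that could not be filled.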

Input/output is also possible with tails, but this selects the special Alphanumeric mode, rather than binary. This involves conversion to and from something called extended Baudot, which is essentially a five bit code perverted to handle the full complement of letters (both upper- and lower-case), digits and punctuation. On WRITING IN, a line is obtained from standard input and stored in the array after conversion. You get an error if there aren't enough array elements to do this. Note that the use of extended shift codes in Baudot means that the number of elements required to store a string may be up to three times the number of characters. When READING OUT, the conversion is reversed and a newline appended at the end.

Baudot "works" by having two separate character sets, one coding letters and one coding digits and punctuation. Two special codes shift from one to the other: #27 selects the digits/punctuation and #31 selects the letters. Some codes are common to the two sets, for example carriage return, line feed and space. To encode a string containing letters and numbers, one just uses shift codes where appropriate: INTERCAL-72 is coded as #31 (shift to letters) #6 (I) #12 (N) #16 (T) #1 (E) #10 (R) #14 (C) #3 (A) #18 (L) #27 (shift to digits) #3 (-) #7 (7) #19 (2). Note that without the shift codes one would not know whether #3 represented "A" or "-".
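
The two-set shifting scheme can be illustrated with a short Python sketch. It is a partial model only: the tables contain just the characters needed for the INTERCAL-72 example, using the conventional Baudot (ITA2) code values with #27 selecting figures and #31 selecting letters, and the names `LETTERS`, `FIGURES` and `decode` are made up for the illustration.

```python
# Partial tables: only the codes needed for the INTERCAL-72 example.
LETTERS = {1: "E", 3: "A", 6: "I", 10: "R", 12: "N", 14: "C", 16: "T", 18: "L"}
FIGURES = {3: "-", 7: "7", 19: "2"}
SHIFT_FIGURES = 27  # selects digits/punctuation
SHIFT_LETTERS = 31  # selects letters


def decode(codes, start="letters"):
    """Decode a list of 5-bit codes, tracking the currently selected set."""
    table = LETTERS if start == "letters" else FIGURES
    out = []
    for code in codes:
        if code == SHIFT_FIGURES:
            table = FIGURES
        elif code == SHIFT_LETTERS:
            table = LETTERS
        else:
            out.append(table[code])
    return "".join(out)


MESSAGE = [31, 6, 12, 16, 1, 10, 14, 3, 18, 27, 3, 7, 19]
print(decode(MESSAGE))  # INTERCAL-72
```

The same code means different things depending on the set in force: `decode([3], "letters")` gives "A" while `decode([3], "figures")` gives "-".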

The extended Baudot used in CLC-INTERCAL uses four, rather than two, character sets: uppercase, lowercase, figures, symbols; the shift codes change to one or the other set depending on the set already in use: #27 shifts to figures from uppercase/lowercase but to symbols from figures/symbols; #31 shifts to lowercase from uppercase/lowercase but to uppercase from figures/symbols. It is possible to shift to a specified set even without knowing what set you are starting from using two consecutive shift codes: uppercase, #27 #31; lowercase, #31 #31; figures, #31 #27; symbols, #27 #27.
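
The shift rules above form a small state machine, and the claim about two consecutive shift codes can be checked mechanically. The following Python sketch models only the set selection, not the character tables; the names `shift`, `FORCE` and `lands_in` are hypothetical.

```python
UPPER, LOWER, FIGURES, SYMBOLS = "uppercase", "lowercase", "figures", "symbols"
ALL_SETS = (UPPER, LOWER, FIGURES, SYMBOLS)


def shift(current, code):
    """Return the character set selected after seeing one code."""
    in_letters = current in (UPPER, LOWER)
    if code == 27:
        return FIGURES if in_letters else SYMBOLS
    if code == 31:
        return LOWER if in_letters else UPPER
    return current  # ordinary codes do not change the set


# Two-code sequences that reach a given set from any starting set.
FORCE = {UPPER: (27, 31), LOWER: (31, 31), FIGURES: (31, 27), SYMBOLS: (27, 27)}


def lands_in(start, codes):
    """Apply a sequence of codes and return the resulting set."""
    state = start
    for code in codes:
        state = shift(state, code)
    return state


# Every forcing sequence works from every starting set.
ALWAYS_WORKS = all(lands_in(s, FORCE[t]) == t for s in ALL_SETS for t in ALL_SETS)
```

`ALWAYS_WORKS` is true for all sixteen combinations of starting and target set, which is what makes the two-code prefixes safe to use at the beginning of a string.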

The choice of Baudot as a starting point was determined by its obvious benefits: non-consecutive coding (the digits from 0 to 9 are encoded as #22, #23, #19, #1, #10, #16, #21, #7, #6, #24 respectively; the letters are also all over the place); ambiguous coding (if you start a sequence without a shift code you don't know what you will get); and so on.

CLC-INTERCAL also has a modified EBCDIC character set, but this is not used internally. It is possible to provide program source in EBCDIC, mainly for compatibility with the original INTERCAL-72. Most people tend to use ASCII out of habit.

Computed labels

CLC-INTERCAL introduced a form of computed COME FROM. At the time, the INTERCAL community spoke quite loudly against this statement, but it now appears to be incorporated into C-INTERCAL as well, so I guess this concept has now been accepted in the mainstream.

However, just to be awkward, CLC-INTERCAL also supports computed ABSTAIN FROM and REINSTATE, computed NEXT, with the obvious meaning.

CLC-INTERCAL 1.-94 went even further, with COME FROM gerund (similar to ABSTAIN FROM gerund). I personally never used it, apart from writing one small test program to be included with the distribution, but I am sure somebody will find a use for it.

A similar extension allows templates wherever gerunds can be used. For example, if I say DO ABSTAIN FROM ABSTAINING FROM + REINSTATING, I can no longer ABSTAIN FROM anything, but templates give me finer control, so I can instead say:

   PLEASE ABSTAIN FROM ABSTAINING FROM + REINSTATE LABEL
   PLEASE REINSTATE REINSTATING

Now, REINSTATING has been fully REINSTATEd. This is because the ABSTAIN FROM did not touch any REINSTATE statement which used gerunds: only statements using labels are affected.

Finally, the compiler fully supports the use of computed labels everywhere, even in the "label" part of a statement. However, in the interest of sanity (???), this feature is disabled in the compiler as distributed: a comment in the source code marks a single line which, when reinstated, will allow a monster like:

   (.1)  DO .1 <- #666
   (666) DO .1 <- #2
         PLEASE GIVE UP
         DO NEXT FROM (666)
         PLEASE READ OUT .1
         DO RESUME #1

This READs OUT DCLXVI and II, for obvious reasons.

NEXT FROM and duplicated labels

The new NEXT FROM statement has been introduced to replace NEXT. This behaves just like a COME FROM, and has the same syntax; however, the return address is saved as it would be done by the NEXT statement. To assist programmers wanting to implement subroutines, it is now possible to have duplicate labels, as in this fragment:

   (666) DO .1 <- #1
   (666) DO .1 <- #2
         PLEASE GIVE UP
         DO NEXT FROM (666)
         PLEASE READ OUT .1
         DO RESUME #1

This just READs OUT I and II, but it is clear that overuse of NEXT FROM and repeated labels can be a great obfuscation tool.
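
The control flow of that fragment can be traced with a toy interpreter. The Python sketch below is a simplified model, not how CLC-INTERCAL works internally: it assumes that NEXT FROM takes over right after a statement carrying the target label has executed, that the saved return address is the statement following the labelled one, and that only a single register is involved. The tuple encoding and the name `run` are invented for the illustration.

```python
# Each statement is (label, operation, argument); this encoding is hypothetical.
PROGRAM = [
    (666, "set", 1),           # (666) DO .1 <- #1
    (666, "set", 2),           # (666) DO .1 <- #2
    (None, "give_up", None),   #       PLEASE GIVE UP
    (None, "next_from", 666),  #       DO NEXT FROM (666)
    (None, "read_out", None),  #       PLEASE READ OUT .1
    (None, "resume", 1),       #       DO RESUME #1
]


def run(program):
    """Execute the toy program and return the list of values READ OUT."""
    # Map each trapped label to the index of the NEXT FROM that targets it.
    traps = {arg: i for i, (_, op, arg) in enumerate(program) if op == "next_from"}
    register, output, next_stack, pc = 0, [], [], 0
    while True:
        label, op, arg = program[pc]
        following = pc + 1
        if op == "set":
            register = arg
        elif op == "read_out":
            output.append(register)
        elif op == "resume":
            for _ in range(arg):
                following = next_stack.pop()
        elif op == "give_up":
            return output
        if label in traps:
            # NEXT FROM: save the return address, then continue after the NEXT FROM.
            next_stack.append(following)
            following = traps[label] + 1
        pc = following


print(run(PROGRAM))  # [1, 2]
```

Both statements labelled (666) trigger the same NEXT FROM, which is why the single READ OUT runs twice before GIVE UP is finally reached.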

Arbitrary data structures

This section would describe the BELONG TO relation, the ENSLAVE and FREE statements used to build it, and the "ownership path" in register names. Please feel free to write it, otherwise I'll get back to this some day.

See the External resources section below for a link to the CLC-INTERCAL online reference, which may provide some information about this.

Object orientation

CLC-INTERCAL also introduced some sort of class concept, although the idea is to use "Classes and lectures" rather than "Classes and objects" as is more commonly done.

The following statements have something to do with object orientation:

  • ENROL
  • FINISH LECTURE
  • GRADUATES
  • LEARNS
  • STUDY

See the External resources section below for a link to the CLC-INTERCAL online reference, which may provide some information about this.

Quantum INTERCAL

This section only describes the new syntax used to support Quantum programs: see Quantum INTERCAL for an introduction to the subject.

 DO (register) <- (expression) WHILE NOT ASSIGNING TO IT
 DO ABSTAIN FROM (label/gerund/template) WHILE REINSTATING IT/THEM
 DO COME FROM (label/gerund/template) WHILE NOT COMING FROM THERE
 DO CONVERT (template) TO (template) WHILE LEAVING IT UNCHANGED
 DO CREATE (grammar) (symbol) (template) AS (assembler) WHILE NOT CREATING IT
 DO DESTROY (grammar) (symbol) (template) WHILE NOT DESTROYING IT
 DO ENROL (register) TO LEARN (list of expressions) WHILE NOT ENROLLING
 DO ENSLAVE (register) TO (register) WHILE LEAVING IT FREE
 DO FINISH LECTURE WHILE CONTINUING IT
 DO FORGET (expression) WHILE NOT FORGETTING
 DO FREE (register) FROM (register) WHILE LEAVING IT IN SLAVERY
 DO GIVE UP WHILE CONTINUING TO RUN
 DO (register) GRADUATES WHILE REMAINING A STUDENT
 DO (register) LEARNS (expression) WHILE NOT LEARNING IT
 DO IGNORE (list of registers) WHILE REMEMBERING IT/THEM
 DO (label) NEXT WHILE NOT NEXTING
 DO NEXT FROM (label/gerund/template) WHILE NOT NEXTING FROM THERE
 DO REINSTATE (label/gerund/template) WHILE ABSTAINING FROM IT/THEM
 DO REMEMBER (list of registers) WHILE IGNORING IT/THEM
 DO RESUME (expression) WHILE NOT RESUMING
 DO RETRIEVE (list of registers) WHILE NOT RETRIEVING IT/THEM
 DO STASH (list of registers) WHILE NOT STASHING IT/THEM
 DO STUDY (expression) AT (label) IN CLASS (register) WHILE NOT STUDYING IT
 DO SWAP (template) AND (template) WHILE LEAVING THEM UNCHANGED
 DO WRITE IN (list of registers) WHILE NOT WRITING IT/THEM

Where the above list specifies IT/THEM, their use is completely equivalent, even when it results in bad English. A future version of the compiler may change that.

Note that there is no quantum READ OUT. This is because the current implementation has no control over the superposition of states of the rest of the universe.

Compiler modification at runtime

CLC-INTERCAL allows the running program to modify the compiler used to compile it. The result is that the program will be recompiled "on-the-fly" and a different program may continue running. The new statements introduced to achieve this are CREATE, DESTROY, CONVERT and SWAP.

The CREATE statement extends the compiler. This may cause comments to suddenly become meaningful. The compiler in version 1.-94 of CLC-INTERCAL is actually implemented this way: a long list of CREATE statements is prefixed to the program being compiled, and the result is executed to see what happens. A special statement causes the termination of the compile-time execution while saving the state, which then becomes the fully compiled program. This special statement can only be used when writing compilers and does not work in normal programs. This is because the DESTROY statement (see below) removes it just before the end of the compiler, so it stops working at this point.

The DESTROY statement undoes any effect of the corresponding CREATE statement. For example, the following program produces the splat "DO IGNORE .1" because this statement becomes a comment:

   PLEASE DESTROY _1 ?VERB ,IGNORE, ?NAMES ?Q4
   DO IGNORE .1
   DO GIVE UP

Note that the template provided to the DESTROY statement must be identical to the one used to CREATE it; in this case, sick.iacc, the CLC-INTERCAL compiler compiler compiler, contains the line:

   DO CREATE _1 ?VERB ,IGNORE, ?NAMES ?Q4 AS GER + #4 + ?Q4 #1 IGN + #0 + !NAMES #1 + ?NAMES #1

The precise syntax of CREATE and DESTROY has never been documented and I am not about to change this here. The compiler's source code provides a large number of examples.

By contrast, CONVERT and SWAP are a lot simpler. For starters, they use more human-friendly templates so you can use them even if you are not familiar with the compiler's internals. CONVERT makes one statement behave like another one, provided the two are "compatible", in the sense that they use the same elements. For example, ABSTAIN FROM LABEL is a template which specifies a particular type of ABSTAIN FROM; this is "compatible" with "LABEL NEXT", another template. So if you say:

         PLEASE CONVERT ABSTAIN FROM LABEL TO LABEL NEXT
         DO ABSTAIN FROM (666)
         PLEASE GIVE UP
   (666) DO .1 <- #1
         PLEASE READ OUT .1
         DO RESUME .1

The program will READ OUT the number I. This is because, by the time the ABSTAIN FROM gets executed, it is no longer an ABSTAIN FROM but a NEXT. Note that the original meaning of ABSTAIN FROM LABEL is now lost and inaccessible to a program, although one can still ABSTAIN FROM GERUND or ABSTAIN FROM TEMPLATE.

The SWAP statement is similar to CONVERT, but the two meanings are swapped. So, for example:

         DO NOT TRY THIS AT HOME
         PLEASE SWAP ABSTAIN FROM LABEL AND LABEL NEXT
         DO (1) NEXT
   (1)   DO ABSTAIN FROM (666)
         DO ABSTAIN FROM (1)
         PLEASE GIVE UP
   (666) DO .1 <- #1
         PLEASE READ OUT .1
         DO RESUME .1

This does not READ OUT anything; in fact it eventually produces an error when the NEXT stack overflows. The "DO (1) NEXT" is really executed as an "ABSTAIN FROM (1)", so the next statement is not executed at all; however, the statement after that IS executed, and by now it behaves like a "DO (1) NEXT".

Loops and events

Loops and events have a common syntax, looking like "DO (condition) WHILE (body)". The difference between them is that loops use a statement as condition, while events use an expression; both use a statement for the body.

In loops, the condition (first) statement determines the total duration of the action, the body (second) statement is executed again and again to fill in the time available. For example:

     DO .1 <- #1
     DO (1) NEXT WHILE READ OUT .1
     PLEASE GIVE UP
 (1) DO .2 <- #2
     DO READ OUT .2
     PLEASE RESUME #1

This program outputs the number II once; it also outputs the number I, but we don't know how many times. This is because the READ OUT .1 keeps being executed again and again until the subroutine returns - the "execution time" of the NEXT is considered the time it takes to run the subroutine and return. Note that this construct can be used to create threaded programs quite independently of the Threaded INTERCAL framework, which is also supported by CLC-INTERCAL in a special compatibility mode.

Some people have expressed dismay at the fact that the loop condition precedes the body. For this reason, a special version of the CONVERT statement allows you to change this in your program:

   DO CONVERT BODY WHILE CONDITION TO CONDITION WHILE BODY

After this, you use the syntax "DO (body) WHILE (condition)" and it works like the above. For completeness, you can also do the opposite conversion, and even swap the two meanings:

   DO CONVERT CONDITION WHILE BODY TO BODY WHILE CONDITION
   DO SWAP BODY WHILE CONDITION AND CONDITION WHILE BODY
   DO SWAP CONDITION WHILE BODY AND BODY WHILE CONDITION

As mentioned, events differ from loops because the condition is an expression, rather than a statement. If the condition can be executed, nothing else happens (but the expression may have side effects such as overloading so be careful). If the expression cannot be executed, the statement is stored somewhere and executed as soon as possible. For example:

   PLEASE DO ,1 SUB #2 WHILE READ OUT .2
   DO .2 <- #9
   DO ,1 <- #1
   DO .2 <- #6
   DO ,1 <- #2
   PLEASE GIVE UP

This will output the number VI. The initial event cannot be executed because the expression (,1 SUB #2) tries to subscript an array which has not been dimensioned yet. So the event is stored for future reference. The event still cannot be executed after the first time ,1 is dimensioned, because the subscript #2 used in the event is out of range; however, as soon as the array is redimensioned the event becomes executable and will READ OUT .2 which, at that point, contains #6.
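
The deferred behaviour of that event can be modelled in a few lines of Python. This is a sketch under assumptions: "as soon as possible" is taken to mean that pending events are retried after every statement, the array is represented only by its current size, and all names (`run_event_example`, `pending` and so on) are invented for the illustration.

```python
def run_event_example():
    """Model the event program above and return the values READ OUT."""
    size = {}       # array name -> current dimension; absent means undimensioned
    register = {}   # spot registers
    output = []
    pending = []    # events whose condition could not be evaluated yet

    def condition():
        # Models evaluating ",1 SUB #2": needs at least two elements.
        return size.get(",1", 0) >= 2

    def body():
        # Models READ OUT .2
        output.append(register[".2"])

    # The event statement itself: if the condition can be evaluated nothing
    # else happens, otherwise the body is stored for later.
    if not condition():
        pending.append((condition, body))

    statements = [
        lambda: register.__setitem__(".2", 9),  # DO .2 <- #9
        lambda: size.__setitem__(",1", 1),      # DO ,1 <- #1
        lambda: register.__setitem__(".2", 6),  # DO .2 <- #6
        lambda: size.__setitem__(",1", 2),      # DO ,1 <- #2
    ]
    for statement in statements:
        statement()
        for event in pending[:]:
            check, action = event
            if check():
                action()
                pending.remove(event)
    return output


print(run_event_example())  # [6]
```

The body fires only after the second dimensioning, by which time .2 has been changed from 9 to 6, so the stored event sees the later value.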

Note that the keyword WHILE is used for loops, events and quantum statements. This may cause some minor confusion when people are attempting to decipher INTERCAL source code.

There is not, at present, a Quantum loop or a Quantum event, because unfortunately I cannot figure out what they would be supposed to do.

Operand overloading

Operand overloading in CLC-INTERCAL was inspired by similar proposals which were circulating a few years ago (see External resources section below). However, I got carried away and ended up doing things far worse than the original proposal, such as ending up with variable constants. Two new binary operators were introduced, a slat representing overload of a single register and a backslat representing overload of a range. Both evaluate to their left operand.

Overload register has the form REGISTER/EXPRESSION, where REGISTER can be any register or an array element: for example ,1/#2 or ',2SUB#1'/',2SUB#2' are both valid overloads. The side effect of evaluating this expression is that from now on whenever you mention the REGISTER you mean the EXPRESSION instead. After evaluating ,1/#2 you can use ,1 as you would use a constant #2, in particular you can DO .1 <- ,1 and this behaves the same as DO .1 <- #2 - also note that subscripting ,1 is now an error. The other example, ',2SUB#1'/',2SUB#2' simply makes the first two elements of ,2 look identical.

Overload range is similar but takes two expressions, where the left one represents a range of registers, which are all overloaded with the same expression, the right argument to the backslat. The left expression is interpreted as the interleave of two values, representing the first and last register to be overloaded. For example, after #5\.1 all the following registers are overloaded to .1: .1 .2 ,1 ,2 :1 :2 ;1 ;2.

Within the overload expression (the right argument to slat or backslat) it is possible to use the special register .0: this is temporarily enslaved to the register being overloaded: this is most useful with an overload range so you can find out which register the program actually requested. Also, within the overload expression the register being overloaded (that is $.0) is available without overloading: this prevents overloading loops. It also means that after evaluating .1/.1 the register .1 is guaranteed to be free from overloads. You can figure out how to "un-overload" all registers using the backslat.

CLC-INTERCAL departs from other overloading proposals in the way assignment is handled: assigning to an overloaded register assigns to the corresponding expression. For example, after DO .1 <- .1/.2 the value of .1 is unchanged but the original value of .1 is assigned to .2: this is because the expression causes the overload as side-effect and returns .1; then the assignment assigns this to .2 (the overload expression corresponding to .1). Of course, if a register is overloaded to something more complicated than a simple register, the effects of assignment can be more interesting. For example:

   DO .1 <- .2/.&1
   DO .2 <- #4

The (un-overloaded) value of .2 is unchanged, but .1 now contains #12. This is because #&12 is #4 and the assignment really assigns #4 to .&1, so the only way to make things work is to really assign #12, not #4, to .1.
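
The arithmetic behind "#&12 is #4" can be checked with a short Python sketch of the 16-bit unary AND, which combines each bit with its left neighbour, wrapping around at the top. The function name `unary_and` is made up for the illustration. The search for the smallest value whose unary AND is 4 only shows that #12 is a natural choice; whether CLC-INTERCAL actually selects the value this way is an assumption, not something stated here.

```python
def unary_and(x: int) -> int:
    """16-bit INTERCAL unary AND: bit i of the result is x[i] AND x[i+1]."""
    rotated = ((x >> 1) | (x << 15)) & 0xFFFF  # rotate right by one bit
    return x & rotated


print(unary_and(12))  # 4

# Smallest 16-bit value whose unary AND is 4 (illustrative search only).
SMALLEST = min(x for x in range(1 << 16) if unary_and(x) == 4)
print(SMALLEST)  # 12
```

The mapping is not one-to-one (for instance `unary_and(77)` is also 4), which is why assigning through such an overload forces the implementation to pick some suitable value rather than compute a unique inverse.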

I trust the above example has made everything crystal clear, so let me now move to the most confusing feature of overloading:

   DO .1 <- .2/#1
   DO .2 <- #3
   DO .1 <- #1

What happens here? The first assignment introduces overloading of .2, so the second assignment assigns #3 to #1. In other words, from now on every time you say 1 you really mean 3. We have changed the value of a constant, but not just that. Consider the third assignment: this assigns #3 to .3, not #1 to .1, for obvious reasons. This can be a great obfuscation tool.

STASHing a register also stores its current overload status, and RETRIEVing it restores the overload as it was at the time of last STASH, so for example:

   DO STASH .1 + .2
   DO .1 <- .2/#1
   DO .2 <- #3
   PLEASE RETRIEVE .1 + .2

would assign #3 to #1 leaving registers .1 and .2 completely unchanged (and presumably free from overloads). CLC-INTERCAL 1.-94.-4 (which I am working on at the moment) makes it possible to achieve the same effect by assigning directly to an expression, so the above can now be simplified to:

   DO #1 <- #3

It is not currently possible to STASH and RETRIEVE constants, so be careful assigning to them.

It is recommended to avoid overloading in programs intended to be readable.

And now, for another example, a bit of code which swaps two constants:

   PLEASE STASH .1 + .2 + .5
   DO .5 <- .1/#3
   DO .5 <- .2/#4
   DO .5 <- .2
   DO .2 <- .1
   DO .1 <- .5
   PLEASE RETRIEVE .1 + .2 + .5

Now the value of #4 is 3 and the value of #3 is 4; also if you use .4 you mean .3 and if you use :3 you mean :4 etc.

External resources