Talk:Clue (Keymaker)

From Esolang

Clue might well be TC, since it is rather similar to Self BCT (which I suspect is TC, but without proof). In the authorship section on the BCT page, there's a link to a newsgroup message where I had asked about the TCness of Self BCT, without reply. (Great work, btw, on those Clue graphics -- the execution traces are beautiful.)

-- r.e.s. 23:15, 27 September 2009 (UTC)

It would be great if it were TC, and I don't think it's an impossible thought. I mainly added the line about my disbelief to sneakily get someone to correct me on the matter. :D Anyway, I was not aware of Self BCT while making Clue -- and they are still quite different, I think. I wonder if the direction of adding and removing data has anything to do with the computational abilities of these languages. Also, in languages of this kind, is an instruction for removing data necessary? Could it be replaced with different functionality, such as growing the program in some way other than the other instruction does? (I haven't really read anything about or inspected this type of language, so I likely don't know the 'obvious' details.) And yeah, the visualizations kept me awake all last night, they are interesting... --Keymaker 00:48, 28 September 2009 (UTC)
Please take my speculations with the proverbial grain of salt -- a probabilistic argument might show that Clue isn't TC after all, but I'm not sure. I'm sure of almost nothing about these languages, because the self-modifying nature of the programs makes matters so complicated. Just out of curiosity ... Which Clue program, among those you've observed to terminate, required the greatest number of steps to delete itself, and how many steps was that?
-- r.e.s. 04:40, 28 September 2009 (UTC)
I had a damn hard time getting any of them to halt! I tried many Clues (=Clue programs), but the longest-running one I came across took 295 cycles: 1110001100011101011 (295.cue on my site). My search was not at all systematic (I made the aforementioned program by semi-randomly pressing keys); however, I'm going to work more on trying to find longer ones, or on trying to find some pattern that could (possibly) produce such programs. --Keymaker 11:27, 28 September 2009 (UTC)
I did something similar when looking for long-but-terminating execution traces for Self BCT "programs". I recall using all of my computer's resources for hours just to see if some traces finally terminated; e.g. the 10-bit program that did so in 43,000+ steps. What worked best for me in Python was to just watch and see if the trace eventually began to repeat some pattern, at which point I knew it would not terminate. I strongly suspect that many of the shorter-than-ten-bit programs would in fact terminate, but my resources were not sufficient to observe them long enough. That's the intriguing difficulty: distinguishing between a nonterminating sequence whose "pattern" is too complex to discern versus one that eventually terminates after possibly an inconceivably large number of steps.
-- r.e.s. 15:47, 28 September 2009 (UTC)
I ran program "10" for several hours the other day (and terminated it myself). It did not crash my Perl interpreter (which one program that expanded fast did, in a relatively short time -- 'out of memory'), so I think it formed a periodic pattern of some sort. Or it began to increase or decrease very slowly, but as there is such an immense number of cycles I don't know how I could figure anything out. Take one state from the output and modify the interpreter to check if such a state occurs again? --Keymaker 12:00, 30 September 2009 (UTC)
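Keymaker's idea of saving one state and checking whether it recurs can be automated without storing the whole history. The sketch below is a hypothetical C version (not Keymaker's Perl interpreter) that uses Brent's cycle-detection algorithm: it keeps a single snapshot of (string, pointer) and moves it forward at power-of-two distances, so memory stays at two copies of the state. The names `clue_classify`, `clue_state`, the result codes and the `CLUE_CAP` limit are assumptions for illustration. It only catches exact repeats; a trace that grows or shrinks slowly, or repeats a pattern while its length drifts, still comes back as unknown.

```c
#include <string.h>

#define CLUE_CAP 4096 /* hypothetical limit on the string length */

enum { CLUE_HALTED, CLUE_CYCLE, CLUE_UNKNOWN };

typedef struct {
    char s[CLUE_CAP]; /* program string of '0'/'1', not NUL-terminated */
    size_t n;         /* current length */
    size_t pc;        /* index of the current instruction */
} clue_state;

/* Execute one instruction; returns -1 if the string would overflow. */
static int clue_step(clue_state *st)
{
    size_t n = st->n, pc = st->pc;
    if (st->s[pc] == '0') {                    /* 0: delete the last character */
        st->n = n - 1;
        st->pc = (pc + 1 >= st->n) ? 0 : pc + 1;
        return 0;
    }
    if (n + 1 > CLUE_CAP)
        return -1;
    size_t a = (pc + 1) % n, b = (pc + 2) % n; /* A and B, wrapping around */
    char v = (st->s[a] == '1' && st->s[b] == '1') ? '0' : '1';
    memmove(st->s + 1, st->s, n);              /* 1: prepend A NAND B */
    st->s[0] = v;
    st->n = n + 1;
    /* resume after B (old indices are now one higher); wrap to the new head */
    st->pc = (b + 1 >= n) ? 0 : b + 2;
    return 0;
}

static int same_state(const clue_state *x, const clue_state *y)
{
    return x->n == y->n && x->pc == y->pc && memcmp(x->s, y->s, x->n) == 0;
}

/* Brent's algorithm: compare each new state with a snapshot that is
   refreshed whenever the distance to it reaches a power of two. */
int clue_classify(const char *prog, long max_steps, long *steps, long *period)
{
    static clue_state cur, snap;               /* static: keeps ~8 KB off the stack */
    size_t len = strlen(prog);
    long power = 1, lam = 0;

    *steps = 0;
    *period = 0;
    if (len > CLUE_CAP)
        return CLUE_UNKNOWN;
    memcpy(cur.s, prog, len);
    cur.n = len;
    cur.pc = 0;
    snap = cur;
    while (cur.n > 0) {
        if (*steps >= max_steps || clue_step(&cur) != 0)
            return CLUE_UNKNOWN;               /* a limit was hit: no verdict */
        ++*steps;
        ++lam;
        if (same_state(&cur, &snap)) {         /* exact repeat: it loops forever */
            *period = lam;
            return CLUE_CYCLE;
        }
        if (lam == power) {                    /* move the snapshot forward */
            snap = cur;
            power *= 2;
            lam = 0;
        }
    }
    return CLUE_HALTED;
}
```

For example, `clue_classify("10", 1000000, &steps, &period)` would report a halt, an exact cycle together with its period, or `CLUE_UNKNOWN` if the step limit or `CLUE_CAP` is reached first. Since Clue is deterministic, a repeated (string, pointer) pair is a proof of non-termination, which is stronger than eyeballing the output.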
I just remembered that some questions like yours were discussed at the XigXag Talk page. (I'd forgotten that Self BCT was also mentioned there.) XigXag is called a string-rewriting automaton, a term that could also describe Clue and Self BCT, I suppose.
-- r.e.s. 15:28, 28 September 2009 (UTC)

I would not call Clue (or Self BCT) a language, but rather a machine with complex behavior. A language usually implies semantics for its programs. The inability to write a program makes it difficult (if not impossible) to find out the computational class. I am pretty sure that if one can program in it, then Clue would be TC. I wonder how different the pictures would be if the position of the bits were fixed. In that case the evolving string would always move to the left. --Oleg 04:18, 28 September 2009 (UTC)

I would call this a language. Its behaviour is fully defined (if that's what you mean by semantics; I don't know these terms so well). One can program in Clue... It is not known, however, what its computational capabilities are. Many minimalistic systems have been proven to be Turing-complete (and are widely acknowledged as such), and that has not required people to write complex programs in them. It's a matter of finding certain mechanisms. (Don't ask me what they are, in this case!) Even if one could program in it using more traditional programming concepts, that would not instantly make it TC. --Keymaker 12:12, 28 September 2009 (UTC)
Not instantly. That was only my guess. --Oleg 00:35, 29 September 2009 (UTC)
Sorry, I composed my reply to Oleg (below) before noticing yours. The terminology seems especially messy because of what I called the "double-duty" that our programs are doing. As I see it, they're expressions that are supposed to represent program+data in some as-yet unspecified way that will define input and output, giving computational meaning to our "execution traces". From this viewpoint, the interest is in whether there exists a way to complete the semantics so that the resulting programming language is TC.
-- r.e.s. 15:07, 28 September 2009 (UTC)
(To Oleg) In my opinion, that's an excellent point about the semantics being (as-yet) incompletely specified for these languages. Bear with me, and I'll try to give a concise explanation of what's missing, and why this is an intentional situation allowing the TC question to be posed a certain way ...
To have a properly defined semantics for these languages, to make them indeed "programming" languages, it's necessary to give a computational meaning to our "execution sequences", say (E, t^1(E), t^2(E), t^3(E), ...), where E is a starting word ("Expression") in the language, and t is the transition function that we've already defined for each language. What we're calling a "program" E is doing double-duty as program+data, and might better be called an initial "configuration" or "system state" that evolves as the transition function is iterated. Such an expression is supposed to encode program+data, say E = enc(P,D), where P and D are program and data in some ordinary programming language, and enc() is an as-yet unspecified encoding function. Also as-yet unspecified is an output decoding function, say dec(), such that output = dec(E, t^1(E), t^2(E), t^3(E), ...). When these enc/dec functions are specified, then we can actually "program in" the language. So, from a theoretical viewpoint, the question is whether there exist enc/dec functions that make the resulting programming language TC. It seems extremely likely that, to give TCness to the resulting programming language, the enc/dec functions would be extremely complicated. (Although it would be another issue, it might be interesting to explore what sorts of nonTC programming languages result from not-so-complicated enc/dec functions.) We can say that each of these languages defines a class of programming languages, each programming language corresponding to a specific choice of enc/dec functions.
-- r.e.s. 13:20, 28 September 2009 (UTC)
Brainfuck is an interesting example. There are enc/dec functions which are used in the TC proof. However, they are not used for programming. Other enc/dec functions live in the heads of brainfuck programmers, allowing them to express different computation models in brainfuck code. The important fact is that these enc/decs exist. My comment about a language was informal, obviously, because the semantics is the behavior of the abstract machine, which is defined in Clue. --Oleg 00:35, 29 September 2009 (UTC)


I think your interpreter has a problem with trailing newlines in programs. It is awkward to create files which don't have them, and even more inconvenient if you're trying to use stdin. To fix this, I changed the beginning of the big if statement to

if($clue =~ /^([01]*)$/){
  $clue = $1;

Was your 10 example above affected by this bug? --Ørjan 17:36, 30 September 2009 (UTC)

Damn, you're correct... I'm new to Perl and apparently my regular expressions weren't perfect -- I was especially trying not to let the interpreter accept lines with a new-line in them. What do the ( ) change in the expression? And this still allows one new-line in the file (but the new-line doesn't appear in the program data anymore)! I fixed the interpreter with these lines. Luckily I've run all my programs without any new-lines in them -- all but "10", which I just noticed had a new-line... I'll have to run it again, sometime... --Keymaker 21:46, 30 September 2009 (UTC)
The () captures the part inside in a numbered variable, $1 in this case, and in the next line I set $clue to that. The tricky bit here is that the regexp assertion $ matches not just at end of string, but also before a newline just before the end of string. This is probably for convenience when analyzing a line which may or may not have a final newline, like what you get from <FILEHANDLE> expressions. For when you really need it, there's a special \z assertion which matches only at end of string.
Testing this, your original 10\n seems to be equivalent to 100 after a few steps, as one would expect since your program tests characters by comparing explicitly to "1". --Ørjan 00:17, 1 October 2009 (UTC)
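For comparison, the same input hygiene in C (the language of the brute-force program further down) might look like the hypothetical helper below; the name `clue_normalize` is an assumption. It strips any trailing "\n" or "\r\n" in place before validating, which is slightly more permissive than the Perl fix (that one tolerates exactly one final newline), and it rejects any character other than 0 and 1 instead of silently executing it as a 0-instruction, which is how "10\n" ended up behaving like "100".

```c
#include <string.h>

/* Strip trailing "\n" / "\r\n" in place, then return 1 if only '0' and '1'
   remain (an empty program counts as valid), 0 otherwise. */
int clue_normalize(char *prog)
{
    size_t n = strlen(prog);
    while (n > 0 && (prog[n - 1] == '\n' || prog[n - 1] == '\r'))
        prog[--n] = '\0';               /* drop line terminators at the end */
    return strspn(prog, "01") == n;     /* every remaining char must be 0/1 */
}
```

This is meant for a line read with `fgets`, which keeps the final newline; a newline in the middle of the input is still reported as invalid.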

Probabilistic musings

Since I cannot yet see any way to preserve real structure in Clue programs over the long term, I tried to think about the one structure that does exist, namely the proportion of 0s and 1s in the program. Let's assume that this proportion tells everything important about a program and that the order of 0s and 1s is otherwise essentially random.

Let p be the probability that a single character is 1. If p > 1/2 then there are more 1yz commands than 0 commands, and so one run through the program will tend to increase its length, while p < 1/2 will tend to decrease it.

More importantly, the probability that a 1yz command produces a new 1 is 1 - p^2, since y NAND z is 0 only when both y and z are 1. If we assume that running the program long enough stabilizes the proportions, then we should reach a solution to the equation

p = 1 - p^2

or, since 0 <= p <= 1,

p = -1/2 + sqrt(5)/2 ~= 0.618033988749895 (the golden ratio!)
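Spelled out, the fixpoint condition is a quadratic whose only root in [0, 1] is (sqrt(5) - 1)/2; strictly speaking this is the reciprocal of the golden ratio φ (equivalently φ - 1), about 0.618:

```latex
p = 1 - p^2 \iff p^2 + p - 1 = 0 \iff p = \frac{-1 \pm \sqrt{5}}{2},
\qquad
p \in [0,1] \;\Rightarrow\; p = \frac{\sqrt{5} - 1}{2} = \varphi - 1 = \frac{1}{\varphi} \approx 0.618034,
\qquad \varphi = \frac{1 + \sqrt{5}}{2}
```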

Trying the programs 10 and 100, running for 10000 steps gives final proportions of

1629/(1030+1629) ~= 0.612636329447161 and
1406/(890+1406) ~= 0.612369337979094

which seems pretty close for a short test. However, 100000 steps of 100 give something a bit further off again:

25413/(14596+25413) ~= 0.635182084031093

In any case as long as the proportion stays close or even larger, I would think the programs should grow pretty fast, unless some real structure appears. But then I would still expect any remaining fast-growing parts to swamp that out again. --Ørjan 13:25, 1 October 2009 (UTC)
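Measurements like the ones above can be repeated with a small stand-alone C sketch. It uses the same buffer layout as the brute-force program further down (the string sits at the right end of a buffer, so prepending is just `--left`), runs a program for a fixed number of steps, and counts the 0s and 1s that remain. The names `clue_run_stats` and `clue_stats` are hypothetical, and `max_steps` is assumed to be non-negative.

```c
#include <stdlib.h>
#include <string.h>

typedef struct {
    long steps; /* steps actually executed */
    long zeros; /* 0s left in the string */
    long ones;  /* 1s left in the string */
} clue_stats;

/* Run prog for at most max_steps steps and count the remaining 0s and 1s.
   The string sits at the right end of the buffer, so prepending is just
   --left; each step prepends at most one character. */
clue_stats clue_run_stats(const char *prog, long max_steps)
{
    clue_stats r = {0, 0, 0};
    size_t len = strlen(prog);
    size_t cap = len + (size_t)max_steps + 1;
    char *buf = malloc(cap);
    if (buf == NULL)
        return r;                        /* allocation failed: all zeros */
    size_t left = cap - len, right = cap, pc = left;
    memcpy(buf + left, prog, len);

    while (left != right && r.steps < max_steps) {
        r.steps++;
        if (buf[pc] == '0') {            /* 0: delete the last character */
            right--;
            pc++;
            if (pc >= right) pc = left;
        } else {                         /* 1: A NAND B goes in front */
            size_t a = pc + 1; if (a >= right) a = left;
            size_t b = a + 1;  if (b >= right) b = left;
            buf[--left] = (buf[a] == '1' && buf[b] == '1') ? '0' : '1';
            pc = b + 1;        if (pc >= right) pc = left;
        }
    }
    for (size_t i = left; i < right; i++) {
        if (buf[i] == '1') r.ones++;
        else r.zeros++;
    }
    free(buf);
    return r;
}
```

Calling `clue_run_stats("100", 10000)` and dividing `ones` by `zeros + ones` gives the kind of proportion quoted above. The counts themselves follow only from the Clue rules; how closely they match the golden-ratio figure is what tests the random-order assumption.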

I don't know why, maybe I messed something up the first time, but running 100 for 100000 steps again now gives

1-12468/35427 ~= 0.648065035142688

Also, with this theory, on average each command should give

1*p + (-1)*(1-p) = 2*p - 1 ~= 0.23606797749979

in character growth. This fits reasonably well (not as well as the proportions though) with the 10000 step calculations (2659 and 2296), but the 100000 one seems much more off. Something fishy? --Ørjan 16:26, 1 October 2009 (UTC)
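The growth estimate, worked through with the numbers quoted above (each step adds one character with probability p and removes one with probability 1 - p; the initial 2 or 3 characters are ignored):

```latex
\begin{aligned}
E[\Delta L] &= (+1)\,p + (-1)\,(1 - p) = 2p - 1 = \sqrt{5} - 2 \approx 0.236068 \\
10^4 \text{ steps:}\quad & 10^4 \times 0.236068 \approx 2361
  \quad (\text{observed } 1030 + 1629 = 2659,\ 890 + 1406 = 2296) \\
10^5 \text{ steps:}\quad & 10^5 \times 0.236068 \approx 23607
  \quad (\text{observed } 14596 + 25413 = 40009,\ \text{rerun } 35427)
\end{aligned}
```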

Your analysis can be refined a little bit by noting that at the end of each pass through the string the new string will consist of the old one minus a part deleted from the right, plus a part added to the left. That is, it will be a new left-hand part with new proportions of 0s/1s, while the remainder of the old string still has the old proportions. Then the new overall 0/1 proportions for the string will be a combination of these new and old ones for the two parts. Assuming a fixpoint for "old overall proportion of 0s" = "new overall proportion of 0s" (it seemed easier to use 0s), I get a cubic equation in q = 1 - p, i.e., 2q^3 - 16q^2 + 20q - 5 = 0, which has only one root in the interval [0,1], namely q* = 0.337... (p* ~= 0.663). That would seem to agree reasonably well with your result of ~0.65, since there's some "sampling error" involved. It would be interesting to look at the trace for the 100000 steps, and check the string before and after the last pass to see the lengths and compositions of the parts deleted and added.
-- r.e.s. 16:56, 2 October 2009 (UTC)

I don't understand how you can get an equation which does not also have the golden ratio as a solution for p, since that proportion by my previous thinking means the new and old parts will have the same proportions p, and so will their combination. Let me try to derive a similar equation. Assume the pass runs x commands and the initial proportion of 1s = p.

Run length = x(3p + (1-p)) = x(2p + 1)
Deleted length = x(1-p)
Added length = xp
Added part proportion = 1 - p^2
Total new length = x(2p + 1) + xp = x(3p + 1)
Total new 1s = px(2p + 1) + xp(1 - p^2) = xp(2p + 2 - p^2)
Total new proportion = xp(2p + 2 - p^2) / (x(3p + 1))   (assume x not 0)
 = p(2p + 2 - p^2) / (3p + 1)
Equation for old and new proportions being equal:
 p = p(2p + 2 - p^2) / (3p + 1) <=> (ignoring p = 0)
 2p + 2 - p^2 = 3p + 1 <=>
 1 - p^2 = p

or the same (golden ratio) as my previous calculation. --Ørjan 19:05, 2 October 2009 (UTC)
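A quick numerical check of this derivation, as a hypothetical C sketch: `next_proportion` is the pass-level map p -> p(2p + 2 - p^2)/(3p + 1) from the last lines above, and `iterate_proportion` applies it once per pass under the same random-order assumption (both names are illustrative). Iterating settles on (sqrt(5) - 1)/2, while r.e.s.'s 0.663 is moved to about 0.640, so it is not a fixpoint of this map.

```c
/* Mean-field model of one pass (random order assumed): the traversed part
   keeps proportion p of 1s, and x*p prepended characters are 1 with
   probability 1 - p^2; deleted characters are assumed to have proportion p. */
double next_proportion(double p)
{
    return p * (2.0 * p + 2.0 - p * p) / (3.0 * p + 1.0);
}

/* Apply the model for a number of passes, starting from p0. */
double iterate_proportion(double p0, int passes)
{
    double p = p0;
    for (int i = 0; i < passes; i++)
        p = next_proportion(p);
    return p;
}
```

Near the fixpoint the map's slope is about 0.52, so each pass roughly halves the distance to it; that is a property of the model, not a claim about real traces.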

That is indeed the refinement I had in mind, and your calculation looks to be error-free. (I'd made an error in my calculation, so I've lined out my comment above about the result being different from yours.)
-- r.e.s. 13:36, 3 October 2009 (UTC)

There is an alternative treatment based on a known number-theoretic function called Sum-of-Digits.

Some people consider this function to be noisy, but it is not at all so. After close inspection one discovers a clear recursive structure, which can be summarized in the form of an iteration like

{0} -> {{0}, 1} -> {{0,1},1,2} -> ... : S_{n+1} = {S_n, S_n + 1}

This is because all binary numbers have their highest non-zero digit under the envelope of a log_2 curve, which gains a one in every new exponential interval. Thus one can in principle know beforehand the "probabilities" for all possible initial conditions of the Clue recursive map in any interval, given an upper limit on the string length. But these are then no longer treated as probabilities, because they themselves can be derived as an exact recursive function over all the integers, unless one wants a random source providing the input strings. The point of this is that instead of randomly assessing the structure of the particular Clue map, it is better to compress its complexity into some simple number-theoretic functions.
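The recursion S_{n+1} = {S_n, S_n + 1} can be checked directly: the numbers 2^n .. 2^(n+1) - 1 are the numbers 0 .. 2^n - 1 with one extra leading 1, so their digit sums are the old ones plus one. A small C sketch (the function name `digit_sums` is hypothetical) builds the digit-sum table exactly that way:

```c
#include <stdlib.h>

/* Return a malloc'd table of the binary digit sums of 0 .. 2^n - 1,
   built with S_{k+1} = {S_k, S_k + 1}: the upper half of each range is
   the lower half with one more leading 1. Caller frees; NULL on failure. */
int *digit_sums(unsigned n)
{
    size_t len = (size_t)1 << n;
    int *s = malloc(len * sizeof *s);
    if (s == NULL)
        return NULL;
    s[0] = 0;                                  /* S_0 = {0} */
    for (size_t half = 1; half < len; half <<= 1)
        for (size_t i = 0; i < half; i++)
            s[half + i] = s[i] + 1;            /* append S_k + 1 */
    return s;
}
```

For n = 3 the table is 0, 1, 1, 2, 1, 2, 2, 3, matching the first iterations written out above.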

For instance, if one takes an arbitrary interval [0, ..., 2^(L+1) - 1], then one can immediately write the action of the Clue map (as a dynamical system) in the corresponding binary representations of fixed length L as follows:

"0" instruction: All even integers in the lower half interval [0, ..., 2^L - 1] are fixed points under a permutation which brings the pointer to the start.

"1" instruction: All odd integers in the upper half interval [2^L, ..., 2^(L+1) - 1] are, under the same permutation, mapped to the lower half interval.

Then if at any stage of the map we have a total integer X_n with a binary string representation a_0, a_1, a_2, ..., a_k, where k = floor(log_2(X_n)) + 1, the total map could be written as

X_{n+1} = Clue(X_n) := (1 - a(j))*[X_n - a(k)*2^k] + a(j)*[1 - a(j+1)*a(j+2) + 2*X_n]

In the above, the "pointer" index j just has to satisfy so-called "periodic boundary conditions", just like in ordinary 1-D cellular automata but with a variable length. To make this a completely autonomous dynamical system without any reference to the underlying bit pattern, one can use the "bit extractor" function a(i) = mod(floor(X_n / 2^i), 2).

I am not entirely sure whether removal of "0" from the end of strings is necessary for similar dynamics, and most probably such a class of dynamical self-rewriting systems could be generalized into perhaps unforeseen complexity. The most compact way of generalizing bifurcations like those produced by the combined action of the modulo and the cyclic index above would of course be the introduction of an equivalent map in the complex plane via the notion of Riemann sheets in nested power towers, but this goes way beyond this short note.

--Theriel 17.00 UTC 5-5-2015

Error in Example ?

Unless I'm really missing something, it seems that there is an error in the example on the main page; I think the program pointer is not advanced to the right spot on the second line:

1x 1 0 (1 and 0 is NANDed, the resulting 1 is placed in the beginning)
1 1 1x 0 (1 NAND 1, 0 added in the beginning)
     ^ -- correct pointer location
 ^ ------ pointer position from main page
0 1 1 1 0x (the last character is removed, thus the 0 removes itself)
0x 1 1 1 (the last character, 1, is removed)
0 1x 1 (1 NAND 0, 1 is added in the beginning -- and so forth...)

--Nthern 21:10, 2 September 2010 (UTC)

No, the example is correct. For instruction 1, the program pointer doesn't just skip the instruction, but also its input. --Ørjan 23:39, 2 September 2010 (UTC)
Ah, so it works like this:
1x 1 0 _
 ^- - - - current PL
   ^- - - 1st arg to this instruction
     ^- - 2nd arg to this instruction
       ^- New PL after this instruction
1 1x 1 0 _
^-------- - Execution of the instruction adds a 1 to the beginning
         ^- New PL after this instruction
1x 1 1 0
 ^-------- New PL is past the end of the program string, so it wraps around
1x 1 1 0 (1 NAND 1, 0 added in the beginning)
0 1 1 1 0x (the last character is removed, thus the 0 removes itself)
0x 1 1 1 (the last character, 1, is removed)
0 1x 1 _ _
   ^- - - - current PL
     ^- - - 1st arg to this instruction
       ^- - 2nd arg to this instruction
         ^- New PL after this instruction
^---------- 2nd arg is past the end of the program string, so it wraps around
0 0 1x 1 _ _
^---------- - Execution of the instruction adds a 0 to the beginning
           ^- New PL after this instruction
0 0x 1 1
   ^-------- New PL is past the end of the program string, so it wraps around
... and so on
So the program string acts like it is circular, but the beginning is remembered for inserting the results of a 1 instruction. After a 1, the PL is advanced 3 locations regardless of what executing the 1 did in front of (circularly speaking) it. --Nthern 13:07, 3 September 2010 (UTC)

A small mistake in your tracing:

0 1x 1 _ _

would add 1 at the beginning, since 1 NAND 0 is 1. Anyway, the easiest way to think of the 1-instruction execution is to first move the instruction pointer twice (so that it now resides on B of the input), then add the new NANDed (A NAND B) value at the beginning of the string while keeping the instruction pointer where it is (in implementations, this often means increasing it by one at this point), and then move the instruction pointer once more to get to the next instruction. Like this:

0 1x 1   1-instruction encountered...
0 1 1y   ...pointer moved here, store 1 to A...
0z 1 1   ...pointer moved here, store 0 to B...
1 0v 1 1 ...1 NAND 0 is 1, which was added at the beginning; pointer still where B was taken from...
1 0 1x 1 ...pointer moved forwards, another 1-instruction encountered...

--Keymaker 02:35, 22 September 2010 (UTC)
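Keymaker's recipe translates almost line for line into code. The sketch below (hypothetical names `clue_step` and `format_state`) works on a plain 0-indexed array, so prepending shifts every index by one, which is exactly the "increase it by one at this point" adjustment mentioned above; `format_state` writes the state in the notation of this thread, with an `x` after the current instruction. The caller is assumed to provide room for one extra character per step.

```c
#include <string.h>

/* One step on a 0-indexed array s[0..n-1] (capacity >= n + 1),
   following the recipe above. Returns the new length. */
size_t clue_step(char *s, size_t n, size_t *pc)
{
    if (s[*pc] == '0') {                     /* 0: remove the last character */
        n--;
        *pc = (*pc + 1 >= n) ? 0 : *pc + 1;  /* move on, wrapping */
        return n;
    }
    size_t p = (*pc + 1) % n;                /* move once: read A */
    char a = s[p];
    p = (p + 1) % n;                         /* move again: read B */
    char b = s[p];
    memmove(s + 1, s, n);                    /* add A NAND B at the beginning */
    s[0] = (a == '1' && b == '1') ? '0' : '1';
    n++;
    p++;                                     /* still on B, whose index grew by one */
    *pc = (p + 1 >= n) ? 0 : p + 1;          /* move once more, wrapping */
    return n;
}

/* Write the state as in the traces here ("0 1x 1" = pointer on the 2nd
   character). out needs room for 3 * n + 1 characters. */
void format_state(const char *s, size_t n, size_t pc, char *out)
{
    size_t k = 0;
    for (size_t i = 0; i < n; i++) {
        if (i > 0)
            out[k++] = ' ';
        out[k++] = s[i];
        if (i == pc)
            out[k++] = 'x';
    }
    out[k] = '\0';
}
```

Starting from `110` with the pointer at 0 (`1x 1 0`), five calls produce `1x 1 1 0`, `0 1 1 1 0x`, `0x 1 1 1`, `0 1x 1` and `1 0 1x 1`, which agrees with the main-page example and with the corrected step above.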

Brute forcing

I'm running some brute force scans right now ... here's a teaser 010101101011101001100 Rdebath (talk) 19:15, 10 November 2013 (UTC)

Results so far:

  • 1..14 bits: sequences searched to 2000000000 steps; the longest termination is at 618 steps
  • 1..28 bits: sequences searched to 100000 steps; the longest termination is at 8428 steps

Each search run is a proper subset; the ones with longer runs are not finding more terminations. But 111001001100 is an interesting one, completely stable with a short repeat. That one and its eight equivalents are the only ones that reach 2000000000 steps with a string of under 200MB, so far.

Rdebath (talk) 08:01, 12 November 2013 (UTC)

Thank you very much for this research. This is just great! This makes me think again that this language could have some kind of controllable/predictable computational power -- not that I have ideas how one'd go about programming it or reading its states, but perhaps something might be found by studying the finite formations... I need to think. And I'm happy to see a periodic pattern was found -- it's another thing that makes me feel there could be something to the computational capabilities. A stable loop. The Clue wiki page needs editing and fast. Will you add some of your results there, such as programs that create those long finite formations? How about your software, are you going to release it at some point? Thanks again and if you find more interesting things, be sure to write about them. --Keymaker (talk) 09:14, 12 November 2013 (UTC)
"more interesting things" as in "more of these interesting things". This is all interesting enough. :) --Keymaker (talk) 14:42, 12 November 2013 (UTC)

Here's a few more results then.

Bits   Count   Steps   Code
1      1       3       '1'
2      2       2       '00'
3      2       3       '000'
4      4       4       '0000'
5      4       5       '00000'
6      8       6       '000000'
7      4       9       '0011100'
8      2       16      '11001110'
9      1       557     '101111101'
10     8       572     '0100010000'
11     1       609     '11011011010'
12     8       574     '010001001000'
13     1       613     '1101101101001'
14     8       618     '01101100010000'
15     48      2629    '011011101000000'
16     48      2630    '0110111010000000'
17     32      2643    '10101000011110000'
18     1       1014    '111110111111100111'
19     192     2651    '0100001111000000000'
20     384     2652    '01000011110000000000'
21     12      3973    '010101101011101001100'
22     72      2670    '1101000011101111110000'
23     6       2813    '10011111111111110011110'
24     12      3954    '011011110110110011010010'
25     36      2815    '1001111111110100100111100'
26     324     4802    '10111010001001111111001000'
27     648     4803    '101110100010011111110010000'
28     1152    8428    '0100110001000100100111100000'

And yes, I do mean there are over a thousand (related) codes with a run of 8428, which I suspect is another good sign.


The reason for the large counts seems fairly obvious: most of the bits get deleted before they're run, and among those that are run, '01' and '10' are equivalent under NAND.

To get this I just converted your Perl to C; as I said, it's a brute-force search.

   #include <stdio.h>
   #include <stdlib.h>
   #include <string.h>

   #define MAXRUN  100000000
   #define MAXINIT 256
   #define SZ      (MAXRUN+MAXINIT)

   /* The string lives in buffer[left..right-1]. It starts at the right-hand
      end, so prepending a character is just buffer[--left]. */
   char buffer[SZ+2];
   int left, right, steps, pc;
   int trace = 0;
   int forceshowlog = 0;

   void process_string(void);
   void trace_string(void);
   void bincounter(void);
   void run_one(const char *str);

   /* No argument: brute-force every bit string.
      One argument: trace that program. Two or more: run it without tracing. */
   int main(int argc, char **argv)
   {
       forceshowlog = (argc>1);
       if(argc>1 && *argv[1]) {
           trace=(argc == 2);
           run_one(argv[1]);
       } else
           bincounter();
       return 0;
   }

   void run_one(const char *str)
   {
       int v, i;
       v = strlen(str);
       if (v > MAXINIT) {
           fprintf(stderr, "Program too long\n");
           return;
       }
       for(i=0; i<v; i++)
           buffer[SZ-v+i] = str[i];
       left = SZ-v;
       right = left+v;
       steps = 0;
       pc = left;
       if (trace)
           trace_string();
       else
           process_string();
       /* Report runs that halted (or everything, if asked to). */
       if (steps < MAXRUN || forceshowlog)
           printf("For '%s' after %d steps left=%d, right=%d, len=%d\n", str, steps, left, right, right-left);
   }

   /* Enumerate every bit string, shortest first, by counting in binary
      (least significant digit first) and hiding the leading '1' digit. */
   void bincounter(void)
   {
       int i, maxp=0;
       static char number[128] = "";
       for(i=0; i<1000000000; i++) {
           int p = 0;
           while(number[p] == '1')
               number[p++] = '0';
           number[p++] = '1';
           if(p>maxp) maxp = p;
           number[maxp-1] = 0;      /* temporarily drop the leading 1 */
           run_one(number);
           number[maxp-1] = '1';    /* and put it back */
       }
   }

   /* Execute the instruction at pc. */
   static void step(void)
   {
       int a, b;
       if(buffer[pc] == '0') {
           right--;                                  /* 0: delete the last character */
           pc = pc + 1; if (pc >= right) pc = left;
       } else if(buffer[pc] == '1') {
           a = pc + 1; if (a >= right) a = left;     /* 1: A and B follow, wrapping */
           b = a + 1; if (b >= right) b = left;
           buffer[--left] = (buffer[a] == '1' && buffer[b] == '1') ? '0' : '1';
           pc = b + 1; if (pc >= right) pc = left;   /* resume after B */
       } else {
           fprintf(stderr, "Bad character found\n");
           exit(1);
       }
   }

   void process_string(void)
   {
       while(left != right && steps < MAXRUN) {
           steps++;
           step();
       }
   }

   void trace_string(void)
   {
       while(left != right && steps < MAXRUN) {
           printf("%.*s\n", right-left, buffer+left);
           steps++;
           step();
       }
   }

If you run it with a MAXRUN of 10000 you should get all these in a couple of days; it seems to run at about 8ns/step.

Rdebath (talk) 21:35, 12 November 2013 (UTC)