Talk:Trivial brainfuck substitution

From Esolang

Isn't this the category?

We have a category Brainfuck_equivalents that seems to be exactly this; isn't it? Rdebath (talk) 20:54, 22 November 2014 (UTC)

Not quite; Unary, for example, cannot be obtained by a mere character-to-string substitution. --Ørjan (talk) 08:38, 25 November 2014 (UTC)
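For concreteness, the distinction can be sketched in a few lines of Python: a trivial substitution maps each of the eight brainfuck commands to its own replacement string and translates a program command by command, whereas Unary transforms the program as a whole. The mapping below is made up purely for illustration.

```python
# A minimal sketch of a trivial brainfuck substitution: each of the
# eight commands gets its own replacement string (this particular
# mapping is hypothetical, chosen only for illustration).
SUBSTITUTION = {
    '+': 'a', '-': 'b', '<': 'c', '>': 'd',
    '[': 'e', ']': 'f', '.': 'g', ',': 'h',
}

def substitute(bf_program: str) -> str:
    # Characters outside the eight commands are comments in brainfuck,
    # so they are simply dropped here.
    return ''.join(SUBSTITUTION[ch] for ch in bf_program
                   if ch in SUBSTITUTION)

print(substitute('+[>+.]'))  # -> aedagf
```

Unary, by contrast, encodes the whole program as a single number, so no per-character mapping of this shape can produce it.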

I deleted the count.

Why? ... Because ...

  • Single-character encodings are not special.
  • Encodings don't have to have a unique set of symbols; consider case
    insensitivity, for example. Do you count the case-sensitive and
    case-insensitive versions separately? Do you merge them?
  • It is impossible to be correct about how many characters are in Unicode.

I'd like to hammer the last one home a bit more.

Unicode started as a 16-bit encoding; it was guessed that CJK could be squeezed into this space because it had already been done. But they forgot that this was similar to squeezing ASCII into 5 bits: possible, but really, really ugly. So it got expanded to 32 bits; that would be enough. Unfortunately, Microsoft had invested a huge amount of time in 16-bit Unicode, so they needed a workaround and UTF-16 was invented ... about 20 bits should be enough. The thing is, though, they have a fallback if it isn't: if the number of allocated characters gets anywhere near the million mark, they will be after part of the private use area on the BMP for "super high surrogates", and if the Unicode Consortium doesn't agree, tough, it is the Microsoft "private use"! This means there is in fact nothing to prevent the number of characters going past the 1,000,000 mark; UTF-8 and UTF-32 just get their artificial limits removed.

On the other hand, right now about 120,000 characters are allocated. Of those, 66 are noncharacters: they ARE NOT and never will be "Unicode characters", by definition. Then there are the 2,048 code points used for the UTF-16 surrogates; they are effectively noncharacters too, because they won't ever be used in a plain encoding. Then there are some "presentation forms"; these aren't considered real characters either, but are included because old systems used them to display languages like Arabic, where a single "letter" is displayed very differently depending on where it appears in a word. There are other classes too, like "tags".
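The two counts above are easy to verify: the 66 permanent noncharacters are U+FDD0..U+FDEF plus the last two code points of each of the 17 planes, and the surrogates occupy U+D800..U+DFFF. A quick Python check:

```python
# Verify the counts of Unicode noncharacters and UTF-16 surrogates.
noncharacters = list(range(0xFDD0, 0xFDF0))  # U+FDD0..U+FDEF: 32 code points
for plane in range(17):                      # last two code points of each plane
    base = plane * 0x10000
    noncharacters += [base + 0xFFFE, base + 0xFFFF]
surrogates = range(0xD800, 0xE000)           # reserved for UTF-16 surrogate pairs

print(len(noncharacters), len(surrogates))   # -> 66 2048
```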

Then, if that isn't enough, we get to combining characters: characters that attach themselves to another character to create a third, like "a" and "¨" making "ä". But you can put any combining character on any base, including one that already has a combining character ... L̲ī̲k̲ē̲ ̲t̲h̲ī̲s̲ ... or this ... Z͑ͫ̓ͪ̂ͫ̽͏̴̙̤̞͉͚̯̞̠͍A̴̵̜̰͔ͫ͗͢L̠ͨͧͩ͘G̴̻͈͍͔̹̑͗̎̅͛́Ǫ̵̹̻̝̳͂̌̌͘!͖̬̰̙̗̿̋ͥͥ̂ͣ̐́́͜͞ ... So according to some people this already makes the possible number of Unicode characters infinite; at the very least, nobody agrees on exactly how many possible characters there are.
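The "a" + "¨" example can be demonstrated directly in Python: the combining mark follows the base character as a separate code point, and NFC normalization merges some base-plus-mark pairs into a single precomposed code point, while stacking several marks on one base is perfectly legal.

```python
import unicodedata

# 'a' followed by U+0308 COMBINING DIAERESIS: renders as a single
# glyph but is two code points.
s = 'a' + '\u0308'
print(len(s))                                # -> 2

# NFC normalization merges the pair into precomposed U+00E4.
nfc = unicodedata.normalize('NFC', s)
print(len(nfc), nfc == '\u00E4')             # -> 1 True

# Nothing stops stacking several combining marks on one base.
stacked = 'Z' + '\u0351\u036B\u0343'
print(len(stacked))                          # -> 4
```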

Rdebath (talk) 06:32, 11 July 2015 (UTC)

In addition, assuming Unicode is not a valid assumption. (Some might even use codings which cannot even be used with Unicode.) Even if you do use the various UTF formats: UTF-8 is limited to 36 bits (unless you allow "extended overlong encodings", which make it unlimited), UTF-16 is limited to 21 bits but many code points cannot be encoded (all valid Unicode code points can be encoded, though), UTF-32 is limited to 32 bits, VLQ-9 (also called UTF-9) is unlimited, UTF-18 doesn't even encode all valid Unicode code points, etc. And then there are other character sets that might be used: VT100, Commodore 64, Infocom character graphics, the PC character set, EUC-JP, EBCDIC, or something else entirely different. And then something which might be considered (I don't know?) is where more than one character means the same thing, where a character is replaced by a sequence of bits, where a string of characters is used for each command, etc.; you might not even use characters. But these are still just a simple kind of substitution, rather than stuff like Unary, which isn't. --Zzo38 (talk) 16:59, 11 July 2015 (UTC)
Oh, Ghod, I hadn't come across UTF-9 before; that is a truly horrific encoding. It's actually worse than any of the ISO 2022 encodings and their derivatives; at least ISO 2022 doesn't reuse control characters! Rdebath (talk) 20:43, 11 July 2015 (UTC)
I have two points to make here. First, no matter how many characters there are, you are allowed to use multicharacter commands, as in Ook! or Ternary, making the number of possible substitutions infinite no matter which character set is used. Secondly, I came across a Wikipedia article which stated that UTF-9 and UTF-18 were April Fools' jokes. Rdococ (talk) 20:09, 10 February 2017 (UTC)
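The multicharacter-command point is easy to make concrete: a decoder just scans the program and matches each command string greedily at every position, in the spirit of Ook!. The two-letter codes below are hypothetical, chosen only for illustration.

```python
# Sketch of decoding multicharacter commands back to brainfuck.
# The command strings here are made up; any set of distinct strings
# works, which is what makes the number of substitutions infinite.
COMMANDS = {'pl': '+', 'mi': '-', 'lt': '<', 'gt': '>',
            'lb': '[', 'rb': ']', 'pr': '.', 'rd': ','}

def decode(program: str) -> str:
    out, i = [], 0
    while i < len(program):
        for token, bf in COMMANDS.items():
            if program.startswith(token, i):
                out.append(bf)
                i += len(token)
                break
        else:
            i += 1  # anything unmatched is treated as a comment
    return ''.join(out)

print(decode('pl lb gt pl pr rb'))  # -> +[>+.]
```

With longer or overlapping command strings, a real decoder would want longest-match-first ordering; for distinct fixed-length tokens like these, a simple scan suffices.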