User:Gilbert189/Polymorphic quine
- This article may need to be made more rigorous. If you have any questions, feel free to ask it on the talk page (and hopefully I can answer it).
If you think about it, viruses are really just smart quines. Not only can they replicate themselves, but they can also embed themselves to any program nearby, whilst doing some (usually) malicious act to the victim's computer. Sure, usually they're a cheating quine, but they're a quine variant nonetheless. This led me to wonder, what if we use the techniques that virus programmers have used for a quine?
...Okay, probably a Word macro quine isn't that exciting. But how about this: a polymorphic quine that mutates its output without changing its behaviour?
At first, the idea of a polymorphic quine seems contradictory. Quines should return its own source code, and polymorphic code (as the name suggests) have multiple forms, which is never the original source code. How could a program be classified as both?
Definition
Quines shouldn't need introduction, but perhaps polymorphic code might need one. Essentially, they use a special tool called the polymorphic engine that modifies some code without changing its behavior. We could define this as such:
A polymorphic engine takes a program and outputs an equivalent program that is not equal to ,
where:
- two programs are called equivalent (notated with ) if both program produces equal output for all (valid) inputs, and
- two programs are called equal (notated with ) if both program have the same source code/instructions.[note 1]
Note that multiple compositions of will always create an equivalent program to what was given in. That is, .
Notice that has multiple, potentially infinite solution for one program . 3 + 2 can be converted into 6 - 1, or (x + 5) - x, etc. If one wants it, we can turn bijective by specifying an index , which specifies the program returned from all the valid equivalent programs.[note 2]
With this, we can define what a polymorphic quine is. Recall that quines return their own source code. We will notate this with the "source" subscript; in this case, . A polymorphic quine (further shortened to "polyquine") is simply a quine that returns itself, but mutated first by the polymorphic engine. That is,
Fuzzy cases
You might have noticed a slight weakness on our definition of . Mainly, it doesn't account for instructions that has no effect or cancel each other. So, one engine could just add noops, a bunch of comments, or expression like || 0
, + 0
, * (x == x)
. This is the method used by some simple polymorphic code like Newlove, and while it is primitive, I think this method should still be classified as a polymorphic engine.
Could we sharpen the definition of to invalidate these fuzzy cases? We could, but defining that is tricky. We have to define what "cancellation" is, or rather, what counts as a valid cancellation (of which there's a lot). At the same time, we are redefining how two programs can be equal, and this can be dangerous. It could be possible that the definition of equal programs to be equal to equivalent programs (i.e. saying two programs are "equal" is the same as saying they're "equivalent"), therefore creating a contradiction on our definition of (after all, how can two programs be equal and not equal to each other?)
Nevertheless, we can try to get something closer. Let be a compiler/translator program that turns any valid source code of a program to its program, optimized or not. That is, .[note 3] When fed with the output of a quine , it returns itself by definition () (thus, why quines are called "a fixed point on an execution environment"[1]). However, polyquines are not so. When fed a polyquine, there are two possibilities:
- The compiler returns the exact equal polyquine, . We will call this a false polyquine (or pseudopolyquine if you like Greek prefixes), since it ends up in a 1-loop much like a regular quine.
- The compiler returns some other polyquine, derived from the previous polyquine, . We will call this a true polyquine, since generally it never ends up looping into the same polyquine, much like a random walk.
This does not eliminate the case that two equivalent programs are implied to be equal. There could be some magic compiler that reduces the given source into its most efficient representation, such that two equivalent programs are made equal.[note 4] However, this allows us to directly communicate how "veracious" (for a lack of a better word) a polyquine is, since the responsibility of the valid cancellations is delegated to the compiler. One can say that their polyquine is true under this compiler, or that this polyquine is false under that compiler.
Is this still a quine?
Great question! The answer is simple: no, it's not (per this definition of quines). Quines shouldn't take I/O, and so never changes their output. It always outputs themselves, not themselves plus something. But with that definition, neither are iterating quines, multiquines, etc., yet they still earn the quine name.
Speaking of multiquines, let's discuss those. Multiquines are like quines, but other than outputting itself, they may output alternative multiquines (in different languages), which is chosen by some input. Essentially, they're a form of oligomorphic code (which is also a word somebody coined). Seeing that oligomorphic code is a subset of polymorphic code, does the same apply to multiquines? Are polyquines just infinite (or effectively infinite) multiquines? Not really.
The definition of multiquines explicitly states that multiquines may output other multiquines in different languages. This is to alleviate the problem of the alternatives being the same multiquine, instead of something else. However, this multiquine counterexample cannot be classified as polyquine either, since the engine's output cannot be equal to its input, as stated by the definition. In fact, most polymorphic code's mutations are in the same language as its host (you wouldn't want your x86 virus turn into an ARM virus all of a sudden). Polyquines certainly overlap with multiquines, but not necessarily equal to it.
Lack of a fixed point
There's a more serious argument regarding the quine status of polyquines. All variants of quines we have mentioned so far have at least one fixed point. But polyquines (at least the true one) doesn't seem to have them. Is it still acceptable to honor the quine name for true polyquines, if their behavior is unlike them?
As I see it, there's two possible counterpoints to this argument. It could be that there are some outputs of a polyquine that leads into a loop, making a fixed point (if extremely humongous in size). This argument only works if the polymorphic engine is deterministic. It could also be that one polyquine have infinite fixed points; if the output of a polyquine is in the infinite set , then all polyquines in also output a polyquine in only. This set is infinite in size, but it's still a subset to all possible programs; its complement being programs that are not equivalent to any polyquine in . This definition also works on multiquines, although the set of the outputs of multiquines are finite instead of infinite. In the end, we could say that polyquines have a fixed point after all, if extremely numerous, justifying the quine name.
Another way to justify the quine name revolves with the definition of quine variants. If we define a quine variant as a program that can replicate itself, instead of being a fixed point, then polyquine certainly qualifies as one. But this definition is very broad; under this definition, viruses and worms count as a quine variant, which is arguably true. Of course, we then have to define what "replicating itself" is to be rigorous, but to be honest I don't feel like going into that rabbit hole right now. I'll just encourage the reader to research this topic and decide to yourselves if polyquines qualifies as a quine variant or not.
Example
Here's a simple Python polyquine I made. The engine is quite primitive; it just appends comments to every line. Incidentally, this method has been used for a worm called Newlove once.
import gzip, base64 from pprint import pformat data = (b'+,^C)#\\-\\D!rlh09lJc?$p@+qK$ZDDhHV3@R\\D4W_ZoJ3b:WnJ9*%DBka8Cs$BgpUqtTWD@i' b'pG#(Q!O#\'Zil9O]-hk:Pp_D9ZEQ;d3nW!A"leM(-M;\\]IC^c@c/<aADs`tcRB@hgt[aGPk"q' b";arV-=O3\\Vu/7:G85Re5+bt'$SKhg8%'e!aN5h:nS1K^58a^?lCV2NU]ZE;]F)%6r?bn6X'I" b'.E$3X+sl:rq>l_rdm@(5#l,I"WTPDm\'-s5N\\Jo1m&-q#AR]^p.@K,Z[C0TN-B#D<4@_T\'7&J' b'm7bU(Lo<hd^%W1.87AF"caB(`(u9N\\/+l;Qm1T]BVXrE?W`)Ll]1c?CD-rlu3W0*,CqYO-Kf' b'f>#joe1Ef=V`G*]hXBCY^P/YXhX1#24b"N*KW55u5QPCCh_#N<rrZk^$g0u]G$%b67Fhr#qj' b"9'M[E,Jpm='4><3s/1UK:)`V8'U,NOWeV>/[!Eh0#;pf;WmCf!qUQ$c`Rs*)-J7!!") def morph(x): """A simple polymorphic engine. While primitive, it technically counts as one. It's also a homage to Email-Worm.VBS.Newlove that did something like this.""" from random import choices, expovariate from string import ascii_uppercase import re return re.sub("\n|$", lambda x: "".join(f" # {''.join(choices(ascii_uppercase, k=round(20 + expovariate(0.005))))}\n" for _ in "_"*round(5 + expovariate(0.05))), x) code = gzip.decompress(base64.a85decode(data)).decode("utf-8") print(morph(code.__mod__(pformat(data))))
(it needs the two newlines)
A simple test reveal that they act as they should:
$ python polyquine.py | md5sum 6008478102f2d178b10a114073cc9bee - $ python polyquine.py | md5sum b3f0f811e77eb2c428b9b8f58ed419b6 - $ python polyquine.py | python - | md5sum 3ae1c209beee6ce2641e8bb166469f02 -
Notes
References
- ↑ Wikipedia:Quine (computing) (I can't find a better source =( )