Post correspondence problem

From Esolang
Jump to navigation Jump to search

The Post correspondence problem is the task of finding a general solution to a class of decision problems that were proposed by Emil Post in 1946. Post proved that this task was impossible – any general solution would have to be uncomputable.

Each individual correspondence problem, however, is semidecidable, i.e. if it has no solution, there might not be any way to determine this fact using a computable algorithm; but if it does have a solution, it is always possible (given enough time) to find it. Therefore, the task of "given a correspondence problem, attempt to deduce its solution" is in effect a programming language; the problem is the program, and the program halts if there is a solution or enters an infinite loop if there is no solution. Using this view of the Post correspondence problem as a programming language, the problem is Turing-complete (which explains why any general solution would be uncomputable – it would be able to solve the halting problem for Turing machines).

Syntax

A correspondence problem (i.e. the equivalent of a program, if the Post correspondence problem is viewed as a programming language) is a set of pairs of strings. (The alphabet from which the characters within the strings are drawn doesn't matter, as long as at least two characters are available; sensible choices for an implementation include printable ASCII and printable Unicode. Note that in order to store a correspondence problem in a file, you will need to specify some separators between the strings within a pair and the pairs within the set, e.g. tab and newline.)

Semantics

The execution of a correspondence problem is based around candidate solutions. A candidate solution is a nonempty list of pairs of strings, for which each element of the list appears is an element of the correspondence problem. (It's acceptable for the candidate solution to contain the same element multiple times – even if a pair of strings only appears once in the problem, it can appear multiple times in a candidate solution.)

Each candidate solution has two projections; the left projection is formed by replacing each pair within the candidate solution with its left element, and the right projection is formed by replacing each pair within the candidate solution with its right element. It also has two projection strings, which are formed by concatenating all the strings within a projection, in the order that they appear. A candidate solution is valid if its two projection strings are identical.

One possible algorithm to execute a correspondence problem is to iterate over all possible candidate solutions, in a sequence that will, for any candidate solution, check that solution within finite time. For each candidate solution, the interpreter checks to see if it is valid; if it is, it halts the program, and otherwise it continues to the next solution. This algorithm, or any algorithm which produces equivalent results (i.e. halting if there is a valid candidate solution, and executing forever if there is no valid candidate solution) implements the Post correspondence problem.

Example

Here's an example correspondence problem:

a	ab
b	rac
c	d
dab	r
ra	a

An example of a candidate solution would be [["c", "d"], ["ra", "a"]]; however, this solution is invalid (its left projection is ["c", "ra"] and its right projection ["d", "a"], leading to projection strings of cra and da which are not equal). There are, however, valid candidate solutions, e.g. [["a", "ab"], ["b", "rac"], ["ra", "a"], ["c", "d"], ["a", "ab"], ["dab", "r"], ["ra", "a"]].

Computational class

The Post correspondence problem can directly implement Couplet 2C, meaning that it is Turing-complete (when viewed as a programming language) or undecidable (when viewed as a decision problem, because it would need to solve the halting problem for the programming language view).

The proof uses a two characters in the Post correspondence problem alphabet for each symbol in the 2C program that's being compiled (represented here using bold and italic), plus four more characters [/\]. Some of the pairs are always the same (here, * means that any character from the Couplet 2C alphabet can appear there, implemented via making multiple copies of the pair, one for each possible character):

[	[/11/
*/	00\
*\	00/
/$***	/$
\$***	\$
**$	$
**$	$
*$*	$
*$*	$
*$$	$
*$$	$
\$00\]	]
/$00/]	]

Then, each rule ab/x in the Couplet 2C program (for which neither a nor b is a $) corresponds to pairs ab, xx and ab, xx in the correspondence problem. Rules of the form 0b/x additionally correspond to pairs /b, \xx and \b, /xx.

To see why this implements Couplet 2C, first note that any possible valid candidate solution must start with [ and end with ] (these are the only characters which appear at the start of both strings of a pair, or the end of both strings of a pair, respectively – any other rule has a mismatch in bold/italic or in / versus \). It must then be followed by a complete history of execution of the Couplet 2C program (encoded with each character doubled and alternating between bold between / and italic between \), up until the point the first $ appears; the history must necessarily start with /11/ (the encoding of the starting state of a 2C program), and each subsequent element can inductively be seen to correspond to the element before after one step of 2C program evolution. It is clear that if neither $ nor $ ever appears in the history, it's therefore impossible to find a valid candidate solution (because every pair containing ] also contains $ or $); so the correspondence problem can't be solved unless the 2C program halts. Meanwhile, if the 2C program does halt, it's possible to use the rules involving combinations of dollars and letters to first move the $ to the start, then shorten the history elements until they shrink down to a dollar and two zeroes, making it possible to end the candidate solution.

As a side note, any correspondence problem can be converted to an equivalent problem that uses only two different characters simply by encoding each character as a binary string of a consistent length n. Because every string within the pairs of the program must have a length that's a multiple of n, each aligned length-n fragment of the strings (both in the strings of the problem's pairs and in the projection strings) will therefore act like a symbol in its own right.

Implementations

Brachylog

Here's an (inefficient) implementation of the Post correspondence program in Brachylog, using a different syntax from the one above:

~h=∋ᵐ\w₆cᵐ=

On success, it outputs the left and right projections of the valid solution that it found.

You can try this interpreter online.

External resources