Chain rule for Kolmogorov complexity

Source: Wikipedia, the free encyclopedia.

The chain rule[

information entropy
, which states:

That is, the combined randomness of two sequences X and Y is the sum of the randomness of X plus whatever randomness is left in Y once we know X. This follows immediately from the definitions of

:

The equivalent statement for Kolmogorov complexity does not hold exactly; it is true only up to a logarithmic term:

(An exact version, KP(xy) = KP(x) + KP(y|x*) + O(1), holds for the prefix complexity KP, where x* is a shortest program for x.)

It states that the shortest program printing X and Y is obtained by concatenating a shortest program printing X with a program printing Y given X, plus

at most a logarithmic factor. The results implies that algorithmic mutual information
, an analogue of mutual information for Kolmogorov complexity is symmetric: I(x:y) = I(y:x) + O(log K(x,y)) for all x,y.

Proof

The ≤ direction is obvious: we can write a program to produce x and y by concatenating a program to produce x, a program to produce y given access to x, and (whence the log term) the length of one of the programs, so that we know where to separate the two programs for x and y|x (log(K(xy)) upper-bounds this length).

For the ≥ direction, it suffices to show that for all k,l such that k+l = K(x,y) we have that either

K(x|k,l) ≤ k + O(1)

or

K(y|x,k,l) ≤ l + O(1).

Consider the list (a1,b1), (a2,b2), ..., (ae,be) of all pairs (a,b) produced by programs of length exactly K(x,y) [hence K(a,b) ≤ K(x,y)]. Note that this list

  • contains the pair (x,y),
  • can be
    enumerated
    given k and l (by running all programs of length K(x,y) in parallel),
  • has at most 2K(x,y) elements (because there are at most 2n programs of length n).

First, suppose that x appears less than 2l times as first element. We can specify y given x,k,l by enumerating (a1,b1), (a2,b2), ... and then selecting (x,y) in the sub-list of pairs (x,b). By assumption, the index of (x,y) in this sub-list is less than 2l and hence, there is a program for y given x,k,l of length l + O(1). Now, suppose that x appears at least 2l times as first element. This can happen for at most 2K(x,y)-l = 2k different strings. These strings can be enumerated given k,l and hence x can be specified by its index in this enumeration. The corresponding program for x has size k + O(1). Theorem proved.

References