Chain rule for Kolmogorov complexity

The chain rule^[

information entropy

, which states:

$H(X,Y)=H(X)+H(Y|X)$

That is, the combined randomness of two sequences X and Y is the sum of the randomness of X plus whatever randomness is left in Y once we know X. This follows immediately from the definitions of
marginal and conditional probability
:

$P(X,Y)=P(X)P(Y|X)$

$\Rightarrow \log P(X,Y)=\log P(X)+\log P(Y|X)$

The equivalent statement for Kolmogorov complexity does not hold exactly; it is true only up to a logarithmic term:

$K(x,y)=K(x)+K(y|x)+O(\log(K(x,y)))$

(An exact version, KP(x, y) = KP(x) + KP(y|x*) + O(1), holds for the prefix complexity KP, where x* is a shortest program for x.)
It states that the shortest program printing X and Y is obtained by concatenating a shortest program printing X with a program printing Y given X, plus
at most a logarithmic factor. The results implies that algorithmic mutual information
, an analogue of mutual information for Kolmogorov complexity is symmetric: I(x:y) = I(y:x) + O(log K(x,y)) for all x,y.

Proof

The ≤ direction is obvious: we can write a program to produce x and y by concatenating a program to produce x, a program to produce y given access to x, and (whence the log term) the length of one of the programs, so that we know where to separate the two programs for x and y|x (log(K(x, y)) upper-bounds this length).
For the ≥ direction, it suffices to show that for all k,l such that k+l = K(x,y) we have that either
K(x|k,l) ≤ k + O(1)
or
K(y|x,k,l) ≤ l + O(1).
Consider the list (a₁,b₁), (a₂,b₂), ..., (a_e,b_e) of all pairs (a,b) produced by programs of length exactly K(x,y) [hence K(a,b) ≤ K(x,y)]. Note that this list

contains the pair (x,y),

can be
enumerated
given k and l (by running all programs of length K(x,y) in parallel),

has at most 2^K(x,y) elements (because there are at most 2ⁿ programs of length n).

First, suppose that x appears less than 2^l times as first element. We can specify y given x,k,l by enumerating (a₁,b₁), (a₂,b₂), ... and then selecting (x,y) in the sub-list of pairs (x,b). By assumption, the index of (x,y) in this sub-list is less than 2^l and hence, there is a program for y given x,k,l of length l + O(1). Now, suppose that x appears at least 2^l times as first element. This can happen for at most 2^K(x,y)-l = 2^k different strings. These strings can be enumerated given k,l and hence x can be specified by its index in this enumeration. The corresponding program for x has size k + O(1). Theorem proved.

References

Li, Ming; Vitányi, Paul (February 1997). An introduction to Kolmogorov complexity and its applications. New York:
ISBN 0-387-94868-6
.

Kolmogorov, A. (1968). "Logical basis for information theory and probability theory". IEEE Transactions on Information Theory. 14 (5). Institute of Electrical and Electronics Engineers (IEEE): 662–664.
S2CID 11402549
.

Zvonkin, A K; Levin, L A (1970-12-31). "The complexity of finite objects and the development of the concepts of information and randomness by means of the theory of algorithms". Russian Mathematical Surveys. 25 (6). IOP Publishing: 83–124.
S2CID 250850390
.

Retrieved from "https://en.wikipedia.org/w/index.php?title=Chain_rule_for_Kolmogorov_complexity&oldid=1166863424"