A string’s depth might be defined as the execution time of its minimal program.

The difficulty with this definition arises in cases where the minimal program is only a few bits smaller than some much faster program, such as a print program, to compute the same output x. In this case, slight changes in x may induce arbitrarily large changes in the run time of the minimal program, by changing which of the two competing programs is minimal. Analogous instability manifests itself in translating programs from one universal machine to another. This instability emphasizes the essential role of the quantity of buried redundancy, not as a measure of depth, but as a certifier of depth. In terms of the philosophy-of-science metaphor, an object whose minimal program is only a few bits smaller than its print program is like an observation that points to a nontrivial hypothesis, but with only a low level of statistical confidence.

To adequately characterize a finite string’s depth one must therefore consider the amount of buried redundancy as well as the depth of its burial. A string’s depth at significance level s might thus be defined as that amount of time complexity which is attested by s bits worth of buried redundancy. This characterization of depth may be formalized in several ways.

A string’s depth at significance level s be defined as the time required to compute the string by a program no more than s bits larger than the minimal program.

This definition solves the stability problem, but is unsatisfactory in the way it treats multiple programs of the same length. Intuitively, 2^{k} distinct (n + k)-bit programs that compute same output ought to be accorded the same weight as one n-bit program; but, by the present definition, they would be given no more weight than one (n + k)-bit program.

A string’s depth at signicifcance level s depth might be defined as the time t required for the string’s time-bounded algorithmic probability P_{t}(x) to rise to within a factor 2^{−s} of its asymptotic time-unbounded value P(x).

This formalizes the notion that for the string to have originated by an effective process of t steps or fewer is less plausible than for the first s tosses of a fair coin all to come up heads.

It is not known whether there exist strings that are deep, in other words, strings having no small fast programs, even though they have enough large fast programs to contribute a significant fraction of their algorithmic probability. Such strings might be called deterministically deep but probabilistically shallow, because their chance of being produced quickly in a probabilistic computation (e.g. one where the input bits of U are supplied by coin tossing) is significant compared to their chance of being produced slowly. The question of whether such strings exist is probably hard to answer because it does not relativize uniformly. Deterministic and probabilistic depths are not very different relative to a random coin-toss oracle A of the equality of random-oracle-relativized deterministic and probabilistic polynomial time complexity classes; but they can be very different relative to an oracle B deliberately designed to hide information from deterministic computations (this parallels Hunt’s proof that deterministic and probabilistic polynomial time are unequal relative to such an oracle).

(Depth of Finite Strings): Let x and w be strings and s a significance parameter. A string’s depth at significance level s, denoted D_{s}(x), will be defined as min{T(p) : (|p|−|p^{∗}| < s)∧(U(p) = x)}, the least time required to compute it by a s-incompressible program. At any given significance level, a string will be called t-deep if its depth exceeds t, and t-shallow otherwise.

The difference between this definition and the previous one is rather subtle philosophically and not very great quantitatively. Philosophically, when each individual hypothesis for the rapid origin of x is implausible at the 2^{−s} confidence level, then it requires only that a weighted average of all such hypotheses be implausible.

There exist constants c_{1} and c_{2} such that for any string x, if programs running in time ≤ t contribute a fraction between 2^{−s} and 2^{−s+1} of the string’s total algorithmic probability, then x has depth at most t at significance level s + c_{1} and depth at least t at significance level s − min{H(s), H(t)} − c^{2}.

Proof : The first part follows easily from the fact that any k-compressible self-delimiting program p is associated with a unique, k − O(1) bits shorter, program of the form “execute the result of executing p∗”. Therefore there exists a constant c_{1} such that if all t-fast programs for x were s + c_{1}– compressible, the associated shorter programs would contribute more than the total algorithmic probability of x. The second part follows because, roughly, if fast programs contribute only a small fraction of the algorithmic probability of x, then the property of being a fast program for x is so unusual that no program having that property can be random. More precisely, the t-fast programs for x constitute a finite prefix set, a superset S of which can be computed by a program of size H(x) + min{H(t), H(s)} + O(1) bits. (Given x∗ and either t∗ or s∗, begin enumerating all self-delimiting programs that compute x, in order of increasing running time, and quit when either the running time exceeds t or the accumulated measure of programs so far enumerated exceeds 2^{−(H(x)−s)}). Therefore there exists a constant c_{2} such that, every member of S, and thus every t-fast program for x, is compressible by at least s − min{H(s), H(t)} − O(1) bits.

The ability of universal machines to simulate one another efficiently implies a corresponding degree of machine-independence for depth: for any two efficiently universal machines of the sort considered here, there exists a constant c and a linear polynomial L such that for any t, strings whose (s+c)-significant depth is at least L(t) on one machine will have s-significant depth at least t on the other.

Depth of one string relative to another may be defined analogously, and represents the plausible time required to produce one string, x, from another, w.

(Relative Depth of Finite Strings): For any two strings w and x, the depth of x relative to w at significance level s, denoted D_{s}(x/w), will be defined as min{T(p, w) : (|p|−|(p/w)∗| < s)∧(U(p, w) = x)}, the least time required to compute x from w by a program that is s-incompressible relative to w.

Depth of a string relative to its length is a particularly useful notion, allowing us, as it were, to consider the triviality or nontriviality of the “content” of a string (i.e. its bit sequence), independent of its “form” (length). For example, although the infinite sequence 000… is intuitively trivial, its initial segment 0^{n} is deep whenever n is deep. However, 0^{n} is always shallow relative to n, as is, with high probability, a random string of length n.

In order to adequately represent the intuitive notion of stored mathematical work, it is necessary that depth obey a “slow growth” law, i.e. that fast deterministic processes be unable to transform a shallow object into a deep one, and that fast probabilistic processes be able to do so only with low probability.

(Slow Growth Law): Given any data string x and two significance parameters s_{2} > s_{1}, a random program generated by coin tossing has probability less than 2^{−(s2−s1)+O(1)} of transforming x into an excessively deep output, i.e. one whose s_{2}-significant depth exceeds the s_{1}-significant depth of x plus the run time of the transforming program plus O(1). More precisely, there exist positive constants c_{1}, c_{2} such that for all strings x, and all pairs of significance parameters s_{2} > s_{1}, the prefix set {q : D_{s2}(U(q, x)) > D_{s1}(x) + T(q, x) + c_{1}} has measure less than 2^{−(s2−s1)+c2.}

Proof: Let p be a s_{1}-incompressible program which computes x in time D_{s1}(x), and let r be the restart prefix mentioned in the definition of the U machine. Let Q be the prefix set {q : D_{s2}(U(q, x)) > T(q, x) + D_{s1}(x) + c_{1}}, where the constant c_{1} is sufficient to cover the time overhead of concatenation. For all q ∈ Q, the program rpq by definition computes some deep result U(q, x) in less time than that result’s own s_{2}-significant depth, and so rpq must be compressible by s_{2} bits. The sum of the algorithmic probabilities of strings of the form rpq, where q ∈ Q, is therefore

Σ_{q∈Q} P(rpq)< Σ_{q∈Q} 2^{−|rpq| + s2} = 2^{−|r|−|p|+s2} μ(Q)

On the other hand, since the self-delimiting program p can be recovered from any string of the form rpq (by deleting r and executing the remainder pq until halting occurs, by which time exactly p will have been read), the algorithmic probability of p is at least as great (within a constant factor) as the sum of the algorithmic probabilities of the strings {rpq : q ∈ Q} considered above:

P(p) > μ(Q) · 2^{−|r|−|p|+s2−O(1)}

Recalling the fact that minimal program size is equal within a constant factor to the −log of algorithmic probability, and the s_{1}-incompressibility of p, we have P(p) < 2^{−(|p|−s1+O(1))}, and therefore finally

μ(Q) < 2^{−(s2−s1)+O(1)}, which was to be demonstrated.