φ^{A}_{u}(sp, w) = φ^{A}_{j} (p, w)

and

Φ^{A}_{u}(sp, w) ≤ L(Φ^{A}_{j} (p, w))

The functions U^{A}(p,w) and T^{A}(p,w) are defined respectively as φ^{A}_{u}(p, w) and Φ^{A}_{u}(p, w), where u is an efficiently universal index.

For any string x, the minimal program, denoted x∗, is min{p : U(p) = x}, the least self-delimiting program to compute x. For any two strings x and w, the minimal program of x relative to w, denoted (x/w)∗, is defined similarly as min{p : U(p,w) = x}.

By contrast to its minimal program, any string x also has a print program, of length |x| + O(log|x|), which simply transcribes the string x from a verbatim description of x contained within the program. The print program is logarithmically longer than x because, being self-delimiting, it must indicate the length as well as the contents of x. Because it makes no effort to exploit redundancies to achieve efficient coding, the print program can be made to run quickly (e.g. linear time in |x|, in the present formalism). Extra information w may help, but cannot significantly hinder, the computation of x, since a finite subprogram would suffice to tell U to simply erase w before proceeding. Therefore, a relative minimal program (x/w)∗ may be much shorter than the corresponding absolute minimal program x∗, but can only be longer by O(1), independent of x and w.

A string is compressible by s bits if its minimal program is shorter by at least s bits than the string itself, i.e. if |x∗| ≤ |x| − s. Similarly, a string x is said to be compressible by s bits relative to a string w if |(x/w)∗| ≤ |x| − s. Regardless of how compressible a string x may be, its minimal program x∗ is compressible by at most an additive constant depending on the universal computer but independent of x. [If (x∗)∗ were much smaller than x∗, then the role of x∗ as minimal program for x would be undercut by a program of the form “execute the result of executing (x∗)∗.”] Similarly, a relative minimal program (x/w)∗ is compressible relative to w by at most a constant number of bits independent of x or w.

The algorithmic probability of a string x, denoted P(x), is defined as {∑2^{−|p|} : U(p) = x}. This is the probability that the U machine, with a random program chosen by coin tossing and an initially blank data tape, will halt with output x. The time-bounded algorithmic probability, P_{t}(x), is defined similarly, except that the sum is taken only over programs which halt within time t. We use P(x/w) and P_{t}(x/w) to denote the analogous algorithmic probabilities of one string x relative to another w, i.e. for computations that begin with w on the data tape and halt with x on the data tape.

The algorithmic entropy H(x) is defined as the least integer greater than −log_{2}P(x), and the conditional entropy H(x/w) is defined similarly as the least integer greater than − log_{2}P(x/w). Among the most important properties of the algorithmic entropy is its equality, to within O(1), with the size of the minimal program:

∃c∀x∀wH(x/w) ≤ |(x/w)∗| ≤ H(x/w) + c

The first part of the relation, viz. that algorithmic entropy should be no greater than minimal program size, is obvious, because of the minimal program’s own contribution to the algorithmic probability. The second half of the relation is less obvious. The approximate equality of algorithmic entropy and minimal program size means that there are few near-minimal programs for any given input/output pair (x/w), and that every string gets an O(1) fraction of its algorithmic probability from its minimal program.

Finite strings, such as minimal programs, which are incompressible or nearly so are called algorithmically random. The definition of randomness for finite strings is necessarily a little vague because of the ±O(1) machine-dependence of H(x) and, in the case of strings other than self-delimiting programs, because of the question of how to count the information encoded in the string’s length, as opposed to its bit sequence. Roughly speaking, an n-bit self-delimiting program is considered random (and therefore not *ad-hoc* as a hypothesis) iff its information content is about n bits, i.e. iff it is incompressible; while an externally delimited n-bit string is considered random iff its information content is about n + H(n) bits, enough to specify both its length and its contents.

For infinite binary sequences (which may be viewed also as real numbers in the unit interval, or as characteristic sequences of sets of natural numbers) randomness can be defined sharply: a sequence X is incompressible, or algorithmically random, if there is an O(1) bound to the compressibility of its initial segments X_{n}. Intuitively, an infinite sequence is random if it is typical in every way of sequences that might be produced by tossing a fair coin; in other words, if it belongs to no informally definable set of measure zero. Algorithmically random sequences constitute a larger class, including sequences such as Ω which can be specified by ineffective definitions.

The busy beaver function B(n) is the greatest number computable by a self-delimiting program of n bits or fewer. The halting set K is {x : φ_{x}(x) converges}. This is the standard representation of the halting problem.

The self-delimiting halting set K_{0} is the (prefix) set of all self-delimiting programs for the U machine that halt: {p : U(p) converges}.

K and K_{0} are readily computed from one another (e.g. by regarding the self-delimiting programs as a subset of ordinary programs, the first 2^{n} bits of K_{0} can be recovered from the first 2^{n+O(1)} bits of K; by encoding each n-bit ordinary program as a self-delimiting program of length n + O(log n), the first 2^{n} bits of K can be recovered from the first 2^{n+O(log n)} bits of K_{0}.)

The halting probability Ω is defined as {2^{−|p|} : U(p) converges}, the probability that the U machine would halt on an infinite input supplied by coin tossing. Ω is thus a real number between 0 and 1.

The first 2^{n} bits of K_{0} can be computed from the first n bits of Ω, by enumerating halting programs until enough have halted to account for all but 2^{−n} of the total halting probability. The time required for this decoding (between B(n − O(1)) and B(n + H(n) + O(1)) grows faster than any computable function of n. Although K_{0} is only slowly computable from Ω, the first n bits of Ω can be rapidly computed from the first 2^{n+H(n)+O(1)} bits of K_{0}, by asking about the halting of programs of the form “enumerate halting programs until (if ever) their cumulative weight exceeds q, then halt”, where q is an n-bit rational number…