String’s Depth of Burial

string_conversion_03.2013

A string’s depth might be defined as the execution time of its minimal program.

The difficulty with this definition arises in cases where the minimal program is only a few bits smaller than some much faster program, such as a print program, to compute the same output x. In this case, slight changes in x may induce arbitrarily large changes in the run time of the minimal program, by changing which of the two competing programs is minimal. Analogous instability manifests itself in translating programs from one universal machine to another. This instability emphasizes the essential role of the quantity of buried redundancy, not as a measure of depth, but as a certifier of depth. In terms of the philosophy-of-science metaphor, an object whose minimal program is only a few bits smaller than its print program is like an observation that points to a nontrivial hypothesis, but with only a low level of statistical confidence.

To adequately characterize a finite string’s depth one must therefore consider the amount of buried redundancy as well as the depth of its burial. A string’s depth at significance level s might thus be defined as that amount of time complexity which is attested by s bits worth of buried redundancy. This characterization of depth may be formalized in several ways.

A string’s depth at significance level s be defined as the time required to compute the string by a program no more than s bits larger than the minimal program.

This definition solves the stability problem, but is unsatisfactory in the way it treats multiple programs of the same length. Intuitively, 2k distinct (n + k)-bit programs that compute same output ought to be accorded the same weight as one n-bit program; but, by the present definition, they would be given no more weight than one (n + k)-bit program.

A string’s depth at signicifcance level s depth might be defined as the time t required for the string’s time-bounded algorithmic probability Pt(x) to rise to within a factor 2−s of its asymptotic time-unbounded value P(x).

This formalizes the notion that for the string to have originated by an effective process of t steps or fewer is less plausible than for the first s tosses of a fair coin all to come up heads.

It is not known whether there exist strings that are deep, in other words, strings having no small fast programs, even though they have enough large fast programs to contribute a significant fraction of their algorithmic probability. Such strings might be called deterministically deep but probabilistically shallow, because their chance of being produced quickly in a probabilistic computation (e.g. one where the input bits of U are supplied by coin tossing) is significant compared to their chance of being produced slowly. The question of whether such strings exist is probably hard to answer because it does not relativize uniformly. Deterministic and probabilistic depths are not very different relative to a random coin-toss oracle A of the equality of random-oracle-relativized deterministic and probabilistic polynomial time complexity classes; but they can be very different relative to an oracle B deliberately designed to hide information from deterministic computations (this parallels Hunt’s proof that deterministic and probabilistic polynomial time are unequal relative to such an oracle).

(Depth of Finite Strings): Let x and w be strings and s a significance parameter. A string’s depth at significance level s, denoted Ds(x), will be defined as min{T(p) : (|p|−|p| < s)∧(U(p) = x)}, the least time required to compute it by a s-incompressible program. At any given significance level, a string will be called t-deep if its depth exceeds t, and t-shallow otherwise.

The difference between this definition and the previous one is rather subtle philosophically and not very great quantitatively. Philosophically, when each individual hypothesis for the rapid origin of x is implausible at the 2−s confidence level, then it requires only that a weighted average of all such hypotheses be implausible.

There exist constants c1 and c2 such that for any string x, if programs running in time ≤ t contribute a fraction between 2−s and 2−s+1 of the string’s total algorithmic probability, then x has depth at most t at significance level s + c1 and depth at least t at significance level s − min{H(s), H(t)} − c2.

Proof : The first part follows easily from the fact that any k-compressible self-delimiting program p is associated with a unique, k − O(1) bits shorter, program of the form “execute the result of executing p∗”. Therefore there exists a constant c1 such that if all t-fast programs for x were s + c1– compressible, the associated shorter programs would contribute more than the total algorithmic probability of x. The second part follows because, roughly, if fast programs contribute only a small fraction of the algorithmic probability of x, then the property of being a fast program for x is so unusual that no program having that property can be random. More precisely, the t-fast programs for x constitute a finite prefix set, a superset S of which can be computed by a program of size H(x) + min{H(t), H(s)} + O(1) bits. (Given x∗ and either t∗ or s∗, begin enumerating all self-delimiting programs that compute x, in order of increasing running time, and quit when either the running time exceeds t or the accumulated measure of programs so far enumerated exceeds 2−(H(x)−s)). Therefore there exists a constant c2 such that, every member of S, and thus every t-fast program for x, is compressible by at least s − min{H(s), H(t)} − O(1) bits.

The ability of universal machines to simulate one another efficiently implies a corresponding degree of machine-independence for depth: for any two efficiently universal machines of the sort considered here, there exists a constant c and a linear polynomial L such that for any t, strings whose (s+c)-significant depth is at least L(t) on one machine will have s-significant depth at least t on the other.

Depth of one string relative to another may be defined analogously, and represents the plausible time required to produce one string, x, from another, w.

(Relative Depth of Finite Strings): For any two strings w and x, the depth of x relative to w at significance level s, denoted Ds(x/w), will be defined as min{T(p, w) : (|p|−|(p/w)∗| < s)∧(U(p, w) = x)}, the least time required to compute x from w by a program that is s-incompressible relative to w.

Depth of a string relative to its length is a particularly useful notion, allowing us, as it were, to consider the triviality or nontriviality of the “content” of a string (i.e. its bit sequence), independent of its “form” (length). For example, although the infinite sequence 000… is intuitively trivial, its initial segment 0n is deep whenever n is deep. However, 0n is always shallow relative to n, as is, with high probability, a random string of length n.

In order to adequately represent the intuitive notion of stored mathematical work, it is necessary that depth obey a “slow growth” law, i.e. that fast deterministic processes be unable to transform a shallow object into a deep one, and that fast probabilistic processes be able to do so only with low probability.

(Slow Growth Law): Given any data string x and two significance parameters s2 > s1, a random program generated by coin tossing has probability less than 2−(s2−s1)+O(1) of transforming x into an excessively deep output, i.e. one whose s2-significant depth exceeds the s1-significant depth of x plus the run time of the transforming program plus O(1). More precisely, there exist positive constants c1, c2 such that for all strings x, and all pairs of significance parameters s2 > s1, the prefix set {q : Ds2(U(q, x)) > Ds1(x) + T(q, x) + c1} has measure less than 2−(s2−s1)+c2.

Proof: Let p be a s1-incompressible program which computes x in time Ds1(x), and let r be the restart prefix mentioned in the definition of the U machine. Let Q be the prefix set {q : Ds2(U(q, x)) > T(q, x) + Ds1(x) + c1}, where the constant c1 is sufficient to cover the time overhead of concatenation. For all q ∈ Q, the program rpq by definition computes some deep result U(q, x) in less time than that result’s own s2-significant depth, and so rpq must be compressible by s2 bits. The sum of the algorithmic probabilities of strings of the form rpq, where q ∈ Q, is therefore

Σq∈Q P(rpq)< Σq∈Q 2−|rpq| + s2 = 2−|r|−|p|+s2 μ(Q)

On the other hand, since the self-delimiting program p can be recovered from any string of the form rpq (by deleting r and executing the remainder pq until halting occurs, by which time exactly p will have been read), the algorithmic probability of p is at least as great (within a constant factor) as the sum of the algorithmic probabilities of the strings {rpq : q ∈ Q} considered above:

P(p) > μ(Q) · 2−|r|−|p|+s2−O(1)

Recalling the fact that minimal program size is equal within a constant factor to the −log of algorithmic probability, and the s1-incompressibility of p, we have P(p) < 2−(|p|−s1+O(1)), and therefore finally

μ(Q) < 2−(s2−s1)+O(1), which was to be demonstrated.

Universal Turing Machine: Algorithmic Halting

169d342be4ac9fdca10d1c8c9c04c3df

A natural number x will be identified with the x’th binary string in lexicographic order (Λ,0,1,00,01,10,11,000…), and a set X of natural numbers will be identified with its characteristic sequence, and with the real number between 0 and 1 having that sequence as its dyadic expansion. The length of a string x will be denoted |x|, the n’th bit of an infinite sequence X will be noted X(n), and the initial n bits of X will be denoted Xn. Concatenation of strings p and q will be denoted pq.

We now define the information content (and later the depth) of finite strings using a universal Turing machine U. A universal Turing machine may be viewed as a partial recursive function of two arguments. It is universal in the sense that by varying one argument (“program”) any partial recursive function of the other argument (“data”) can be obtained. In the usual machine formats, program, data and output are all finite strings, or, equivalently, natural numbers. However, it is not possible to take a uniformly weighted average over a countably infinite set. Chaitin’s universal machine has two tapes: a read-only one-way tape containing the infinite program; and an ordinary two-way read/write tape, which is used for data input, intermediate work, and output, all of which are finite strings. Our machine differs from Chaitin’s in having some additional auxiliary storage (e.g. another read/write tape) which is needed only to improve the time efficiency of simulations.

We consider only terminating computations, during which, of course, only a finite portion of the program tape can be read. Therefore, the machine’s behavior can still be described by a partial recursive function of two string arguments U(p, w), if we use the first argument to represent that portion of the program that is actually read in the course of a particular computation. The expression U (p, w) = x will be used to indicate that the U machine, started with any infinite sequence beginning with p on its program tape and the finite string w on its data tape, performs a halting computation which reads exactly the initial portion p of the program, and leaves output data x on the data tape at the end of the computation. In all other cases (reading less than p, more than p, or failing to halt), the function U(p, w) is undefined. Wherever U(p, w) is defined, we say that p is a self-delimiting program to compute x from w, and we use T(p, w) to represent the time (machine cycles) of the computation. Often we will consider computations without input data; in that case we abbreviate U(p, Λ) and T(p, Λ) as U(p) and T(p) respectively.

The self-delimiting convention for the program tape forces the domain of U and T, for each data input w, to be a prefix set, that is, a set of strings no member of which is the extension of any other member. Any prefix set S obeys the Kraft inequality

p∈S 2−|p| ≤ 1

Besides being self-delimiting with regard to its program tape, the U machine must be efficiently universal in the sense of being able to simulate any other machine of its kind (Turing machines with self-delimiting program tape) with at most an additive constant constant increase in program size and a linear increase in execution time.

Without loss of generality we assume that there exists for the U machine a constant prefix r which has the effect of stacking an instruction to restart the computation when it would otherwise end. This gives the machine the ability to concatenate programs to run consecutively: if U(p, w) = x and U(q, x) = y, then U(rpq, w) = y. Moreover, this concatenation should be efficient in the sense that T (rpq, w) should exceed T (p, w) + T (q, x) by at most O(1). This efficiency of running concatenated programs can be realized with the help of the auxiliary storage to stack the restart instructions.

Sometimes we will generalize U to have access to an “oracle” A, i.e. an infinite look-up table which the machine can consult in the course of its computation. The oracle may be thought of as an arbitrary 0/1-valued function A(x) which the machine can cause to be evaluated by writing the argument x on a special tape and entering a special state of the finite control unit. In the next machine cycle the oracle responds by sending back the value A(x). The time required to evaluate the function is thus linear in the length of its argument. In particular we consider the case in which the information in the oracle is random, each location of the look-up table having been filled by an independent coin toss. Such a random oracle is a function whose values are reproducible, but otherwise unpredictable and uncorrelated.

Let {φAi (p, w): i = 0,1,2…} be an acceptable Gödel numbering of A-partial recursive functions of two arguments and {φAi (p, w)} an associated Blum complexity measure, henceforth referred to as time. An index j is called self-delimiting iff, for all oracles A and all values w of the second argument, the set { x : φAj (x, w) is defined} is a prefix set. A self-delimiting index has efficient concatenation if there exists a string r such that for all oracles A and all strings w, x, y, p, and q,if φAj (p, w) = x and φAj (q, x) = y, then φAj(rpq, w) = y and φAj (rpq, w) = φAj (p, w) + φAj (q, x) + O(1). A self-delimiting index u with efficient concatenation is called efficiently universal iff, for every self-delimiting index j with efficient concatenation, there exists a simulation program s and a linear polynomial L such that for all oracles A and all strings p and w, and

φAu(sp, w) = φAj (p, w)

and

ΦAu(sp, w) ≤ L(ΦAj (p, w))

The functions UA(p,w) and TA(p,w) are defined respectively as φAu(p, w) and ΦAu(p, w), where u is an efficiently universal index.

For any string x, the minimal program, denoted x∗, is min{p : U(p) = x}, the least self-delimiting program to compute x. For any two strings x and w, the minimal program of x relative to w, denoted (x/w)∗, is defined similarly as min{p : U(p,w) = x}.

By contrast to its minimal program, any string x also has a print program, of length |x| + O(log|x|), which simply transcribes the string x from a verbatim description of x contained within the program. The print program is logarithmically longer than x because, being self-delimiting, it must indicate the length as well as the contents of x. Because it makes no effort to exploit redundancies to achieve efficient coding, the print program can be made to run quickly (e.g. linear time in |x|, in the present formalism). Extra information w may help, but cannot significantly hinder, the computation of x, since a finite subprogram would suffice to tell U to simply erase w before proceeding. Therefore, a relative minimal program (x/w)∗ may be much shorter than the corresponding absolute minimal program x∗, but can only be longer by O(1), independent of x and w.

A string is compressible by s bits if its minimal program is shorter by at least s bits than the string itself, i.e. if |x∗| ≤ |x| − s. Similarly, a string x is said to be compressible by s bits relative to a string w if |(x/w)∗| ≤ |x| − s. Regardless of how compressible a string x may be, its minimal program x∗ is compressible by at most an additive constant depending on the universal computer but independent of x. [If (x∗)∗ were much smaller than x∗, then the role of x∗ as minimal program for x would be undercut by a program of the form “execute the result of executing (x∗)∗.”] Similarly, a relative minimal program (x/w)∗ is compressible relative to w by at most a constant number of bits independent of x or w.

The algorithmic probability of a string x, denoted P(x), is defined as {∑2−|p| : U(p) = x}. This is the probability that the U machine, with a random program chosen by coin tossing and an initially blank data tape, will halt with output x. The time-bounded algorithmic probability, Pt(x), is defined similarly, except that the sum is taken only over programs which halt within time t. We use P(x/w) and Pt(x/w) to denote the analogous algorithmic probabilities of one string x relative to another w, i.e. for computations that begin with w on the data tape and halt with x on the data tape.

The algorithmic entropy H(x) is defined as the least integer greater than −log2P(x), and the conditional entropy H(x/w) is defined similarly as the least integer greater than − log2P(x/w). Among the most important properties of the algorithmic entropy is its equality, to within O(1), with the size of the minimal program:

∃c∀x∀wH(x/w) ≤ |(x/w)∗| ≤ H(x/w) + c

The first part of the relation, viz. that algorithmic entropy should be no greater than minimal program size, is obvious, because of the minimal program’s own contribution to the algorithmic probability. The second half of the relation is less obvious. The approximate equality of algorithmic entropy and minimal program size means that there are few near-minimal programs for any given input/output pair (x/w), and that every string gets an O(1) fraction of its algorithmic probability from its minimal program.

Finite strings, such as minimal programs, which are incompressible or nearly so are called algorithmically random. The definition of randomness for finite strings is necessarily a little vague because of the ±O(1) machine-dependence of H(x) and, in the case of strings other than self-delimiting programs, because of the question of how to count the information encoded in the string’s length, as opposed to its bit sequence. Roughly speaking, an n-bit self-delimiting program is considered random (and therefore not ad-hoc as a hypothesis) iff its information content is about n bits, i.e. iff it is incompressible; while an externally delimited n-bit string is considered random iff its information content is about n + H(n) bits, enough to specify both its length and its contents.

For infinite binary sequences (which may be viewed also as real numbers in the unit interval, or as characteristic sequences of sets of natural numbers) randomness can be defined sharply: a sequence X is incompressible, or algorithmically random, if there is an O(1) bound to the compressibility of its initial segments Xn. Intuitively, an infinite sequence is random if it is typical in every way of sequences that might be produced by tossing a fair coin; in other words, if it belongs to no informally definable set of measure zero. Algorithmically random sequences constitute a larger class, including sequences such as Ω which can be specified by ineffective definitions.

The busy beaver function B(n) is the greatest number computable by a self-delimiting program of n bits or fewer. The halting set K is {x : φx(x) converges}. This is the standard representation of the halting problem.

The self-delimiting halting set K0 is the (prefix) set of all self-delimiting programs for the U machine that halt: {p : U(p) converges}.

K and K0 are readily computed from one another (e.g. by regarding the self-delimiting programs as a subset of ordinary programs, the first 2n bits of K0 can be recovered from the first 2n+O(1) bits of K; by encoding each n-bit ordinary program as a self-delimiting program of length n + O(log n), the first 2n bits of K can be recovered from the first 2n+O(log n) bits of K0.)

The halting probability Ω is defined as {2−|p| : U(p) converges}, the probability that the U machine would halt on an infinite input supplied by coin tossing. Ω is thus a real number between 0 and 1.

The first 2n bits of K0 can be computed from the first n bits of Ω, by enumerating halting programs until enough have halted to account for all but 2−n of the total halting probability. The time required for this decoding (between B(n − O(1)) and B(n + H(n) + O(1)) grows faster than any computable function of n. Although K0 is only slowly computable from Ω, the first n bits of Ω can be rapidly computed from the first 2n+H(n)+O(1) bits of K0, by asking about the halting of programs of the form “enumerate halting programs until (if ever) their cumulative weight exceeds q, then halt”, where q is an n-bit rational number…

The Locus of Renormalization. Note Quote.

IMM-3-200-g005

Since symmetries and the related conservation properties have a major role in physics, it is interesting to consider the paradigmatic case where symmetry changes are at the core of the analysis: critical transitions. In these state transitions, “something is not preserved”. In general, this is expressed by the fact that some symmetries are broken or new ones are obtained after the transition (symmetry changes, corresponding to state changes). At the transition, typically, there is the passage to a new “coherence structure” (a non-trivial scale symmetry); mathematically, this is described by the non-analyticity of the pertinent formal development. Consider the classical para-ferromagnetic transition: the system goes from a disordered state to sudden common orientation of spins, up to the complete ordered state of a unique orientation. Or percolation, often based on the formation of fractal structures, that is the iteration of a statistically invariant motif. Similarly for the formation of a snow flake . . . . In all these circumstances, a “new physical object of observation” is formed. Most of the current analyses deal with transitions at equilibrium; the less studied and more challenging case of far form equilibrium critical transitions may require new mathematical tools, or variants of the powerful renormalization methods. These methods change the pertinent object, yet they are based on symmetries and conservation properties such as energy or other invariants. That is, one obtains a new object, yet not necessarily new observables for the theoretical analysis. Another key mathematical aspect of renormalization is that it analyzes point-wise transitions, that is, mathematically, the physical transition is seen as happening in an isolated mathematical point (isolated with respect to the interval topology, or the topology induced by the usual measurement and the associated metrics).

One can say in full generality that a mathematical frame completely handles the determination of the object it describes as long as no strong enough singularity (i.e. relevant infinity or divergences) shows up to break this very mathematical determination. In classical statistical fields (at criticality) and in quantum field theories this leads to the necessity of using renormalization methods. The point of these methods is that when it is impossible to handle mathematically all the interaction of the system in a direct manner (because they lead to infinite quantities and therefore to no relevant account of the situation), one can still analyze parts of the interactions in a systematic manner, typically within arbitrary scale intervals. This allows us to exhibit a symmetry between partial sets of “interactions”, when the arbitrary scales are taken as a parameter.

In this situation, the intelligibility still has an “upward” flavor since renormalization is based on the stability of the equational determination when one considers a part of the interactions occurring in the system. Now, the “locus of the objectivity” is not in the description of the parts but in the stability of the equational determination when taking more and more interactions into account. This is true for critical phenomena, where the parts, atoms for example, can be objectivized outside the system and have a characteristic scale. In general, though, only scale invariance matters and the contingent choice of a fundamental (atomic) scale is irrelevant. Even worse, in quantum fields theories, the parts are not really separable from the whole (this would mean to separate an electron from the field it generates) and there is no relevant elementary scale which would allow ONE to get rid of the infinities (and again this would be quite arbitrary, since the objectivity needs the inter-scale relationship).

In short, even in physics there are situations where the whole is not the sum of the parts because the parts cannot be summed on (this is not specific to quantum fields and is also relevant for classical fields, in principle). In these situations, the intelligibility is obtained by the scale symmetry which is why fundamental scale choices are arbitrary with respect to this phenomena. This choice of the object of quantitative and objective analysis is at the core of the scientific enterprise: looking only at molecules as the only pertinent observable of life is worse than reductionist, it is against the history of physics and its audacious unifications and invention of new observables, scale invariances and even conceptual frames.

As for criticality in biology, there exists substantial empirical evidence that living organisms undergo critical transitions. These are mostly analyzed as limit situations, either never really reached by an organism or as occasional point-wise transitions. Or also, as researchers nicely claim in specific analysis: a biological system, a cell genetic regulatory networks, brain and brain slices …are “poised at criticality”. In other words, critical state transitions happen continually.

Thus, as for the pertinent observables, the phenotypes, we propose to understand evolutionary trajectories as cascades of critical transitions, thus of symmetry changes. In this perspective, one cannot pre-give, nor formally pre-define, the phase space for the biological dynamics, in contrast to what has been done for the profound mathematical frame for physics. This does not forbid a scientific analysis of life. This may just be given in different terms.

As for evolution, there is no possible equational entailment nor a causal structure of determination derived from such entailment, as in physics. The point is that these are better understood and correlated, since the work of Noether and Weyl in the last century, as symmetries in the intended equations, where they express the underlying invariants and invariant preserving transformations. No theoretical symmetries, no equations, thus no laws and no entailed causes allow the mathematical deduction of biological trajectories in pre-given phase spaces – at least not in the deep and strong sense established by the physico-mathematical theories. Observe that the robust, clear, powerful physico-mathematical sense of entailing law has been permeating all sciences, including societal ones, economics among others. If we are correct, this permeating physico-mathematical sense of entailing law must be given up for unentailed diachronic evolution in biology, in economic evolution, and cultural evolution.

As a fundamental example of symmetry change, observe that mitosis yields different proteome distributions, differences in DNA or DNA expressions, in membranes or organelles: the symmetries are not preserved. In a multi-cellular organism, each mitosis asymmetrically reconstructs a new coherent “Kantian whole”, in the sense of the physics of critical transitions: a new tissue matrix, new collagen structure, new cell-to-cell connections . . . . And we undergo millions of mitosis each minute. More, this is not “noise”: this is variability, which yields diversity, which is at the core of evolution and even of stability of an organism or an ecosystem. Organisms and ecosystems are structurally stable, also because they are Kantian wholes that permanently and non-identically reconstruct themselves: they do it in an always different, thus adaptive, way. They change the coherence structure, thus its symmetries. This reconstruction is thus random, but also not random, as it heavily depends on constraints, such as the proteins types imposed by the DNA, the relative geometric distribution of cells in embryogenesis, interactions in an organism, in a niche, but also on the opposite of constraints, the autonomy of Kantian wholes.

In the interaction with the ecosystem, the evolutionary trajectory of an organism is characterized by the co-constitution of new interfaces, i.e. new functions and organs that are the proper observables for the Darwinian analysis. And the change of a (major) function induces a change in the global Kantian whole as a coherence structure, that is it changes the internal symmetries: the fish with the new bladder will swim differently, its heart-vascular system will relevantly change ….

Organisms transform the ecosystem while transforming themselves and they can stand/do it because they have an internal preserved universe. Its stability is maintained also by slightly, yet constantly changing internal symmetries. The notion of extended criticality in biology focuses on the dynamics of symmetry changes and provides an insight into the permanent, ontogenetic and evolutionary adaptability, as long as these changes are compatible with the co-constituted Kantian whole and the ecosystem. As we said, autonomy is integrated in and regulated by constraints, with an organism itself and of an organism within an ecosystem. Autonomy makes no sense without constraints and constraints apply to an autonomous Kantian whole. So constraints shape autonomy, which in turn modifies constraints, within the margin of viability, i.e. within the limits of the interval of extended criticality. The extended critical transition proper to the biological dynamics does not allow one to prestate the symmetries and the correlated phase space.

Consider, say, a microbial ecosystem in a human. It has some 150 different microbial species in the intestinal tract. Each person’s ecosystem is unique, and tends largely to be restored following antibiotic treatment. Each of these microbes is a Kantian whole, and in ways we do not understand yet, the “community” in the intestines co-creates their worlds together, co-creating the niches by which each and all achieve, with the surrounding human tissue, a task closure that is “always” sustained even if it may change by immigration of new microbial species into the community and extinction of old species in the community. With such community membership turnover, or community assembly, the phase space of the system is undergoing continual and open ended changes. Moreover, given the rate of mutation in microbial populations, it is very likely that these microbial communities are also co-evolving with one another on a rapid time scale. Again, the phase space is continually changing as are the symmetries.

Can one have a complete description of actual and potential biological niches? If so, the description seems to be incompressible, in the sense that any linguistic description may require new names and meanings for the new unprestable functions, where functions and their names make only sense in the newly co-constructed biological and historical (linguistic) environment. Even for existing niches, short descriptions are given from a specific perspective (they are very epistemic), looking at a purpose, say. One finds out a feature in a niche, because you observe that if it goes away the intended organisms dies. In other terms, niches are compared by differences: one may not be able to prove that two niches are identical or equivalent (in supporting life), but one may show that two niches are different. Once more, there are no symmetries organizing over time these spaces and their internal relations. Mathematically, no symmetry (groups) nor (partial-) order (semigroups) organize the phase spaces of phenotypes, in contrast to physical phase spaces.

Finally, here is one of the many logical challenges posed by evolution: the circularity of the definition of niches is more than the circularity in the definitions. The “in the definitions” circularity concerns the quantities (or quantitative distributions) of given observables. Typically, a numerical function defined by recursion or by impredicative tools yields a circularity in the definition and poses no mathematical nor logical problems, in contemporary logic (this is so also for recursive definitions of metabolic cycles in biology). Similarly, a river flow, which shapes its own border, presents technical difficulties for a careful equational description of its dynamics, but no mathematical nor logical impossibility: one has to optimize a highly non linear and large action/reaction system, yielding a dynamically constructed geodetic, the river path, in perfectly known phase spaces (momentum and space or energy and time, say, as pertinent observables and variables).

The circularity “of the definitions” applies, instead, when it is impossible to prestate the phase space, so the very novel interaction (including the “boundary conditions” in the niche and the biological dynamics) co-defines new observables. The circularity then radically differs from the one in the definition, since it is at the meta-theoretical (meta-linguistic) level: which are the observables and variables to put in the equations? It is not just within prestatable yet circular equations within the theory (ordinary recursion and extended non – linear dynamics), but in the ever changing observables, the phenotypes and the biological functions in a circularly co-specified niche. Mathematically and logically, no law entails the evolution of the biosphere.