Adjacency of the Possible: Teleology of Autocatalysis. Thought of the Day 140.0


Given a network of catalyzed chemical reactions, a (sub)set R of such reactions is called:

  1. Reflexively autocatalytic (RA) if every reaction in R is catalyzed by at least one molecule involved in any of the reactions in R;
  2. F-generated (F) if every reactant in R can be constructed from a small “food set” F by successive applications of reactions from R;
  3. Reflexively autocatalytic and F-generated (RAF) if it is both RA and F.

The food set F contains molecules that are assumed to be freely available in the environment. Thus, an RAF set formally captures the notion of “catalytic closure”, i.e., a self-sustaining set supported by a steady supply of (simple) molecules from some food set….
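The RAF condition is algorithmically checkable. Below is a minimal sketch in the style of the Hordijk–Steel maxRAF reduction: repeatedly compute the molecules producible from the food set and discard reactions that are not F-generated or that lack a producible catalyst. The tuple encoding (reactants, products, catalysts) and the function names are my own illustration, not a standard API.

```python
def closure(food, reactions):
    """Molecules producible from the food set by repeatedly applying
    reactions whose reactants are all already available (catalysis ignored)."""
    mols = set(food)
    changed = True
    while changed:
        changed = False
        for reactants, products, _ in reactions:
            if set(reactants) <= mols and not set(products) <= mols:
                mols |= set(products)
                changed = True
    return mols

def max_raf(food, reactions):
    """Largest RAF subset (possibly empty): iteratively discard reactions
    that are not F-generated or have no catalyst among producible molecules."""
    R = list(reactions)
    while True:
        mols = closure(food, R)
        kept = [r for r in R
                if set(r[0]) <= mols and any(c in mols for c in r[2])]
        if len(kept) == len(R):
            return kept
        R = kept

# Two mutually catalytic ligations over the food set {a, b}: each product
# catalyzes the other's formation, so the pair is collectively autocatalytic.
food = {"a", "b"}
reactions = [
    (("a", "b"), ("ab",), ("ba",)),   # a + b -> ab, catalyzed by ba
    (("b", "a"), ("ba",), ("ab",)),   # b + a -> ba, catalyzed by ab
]
print(len(max_raf(food, reactions)))  # 2: both reactions form an RAF set
```

Note that neither reaction is RA on its own (each catalyst is the other reaction's product), which is exactly the "catalytic closure" the definition is after.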

Stuart Kauffman begins with the Darwinian idea of the origin of life in a biological ‘primordial soup’ of organic chemicals and investigates the possibility that one chemical substance catalyzes the reaction of two others, forming new reagents in the soup. Such catalyses may, of course, form chains, so that one reagent catalyzes the formation of another, which catalyzes another, etc., and self-sustaining loops of reaction chains are an evident possibility in the appropriate chemical environment. A statistical analysis reveals that such catalytic reactions may form interdependent networks when the rate of catalyzed reactions per molecule approaches one, creating a self-organizing chemical cycle which he calls an ‘autocatalytic set’. When the rate of catalyses per reagent is low, only small local reaction chains form; but as the rate approaches one, the reaction chains in the soup suddenly ‘freeze’, so that what was a group of chains or islands in the soup now connects into one large interdependent network, constituting an ‘autocatalytic set’. Such an interdependent reaction network constitutes the core of the body definition unfolding in Kauffman, and its cyclic character forms the basic precondition for self-sustainment. An ‘autonomous agent’ is an autocatalytic set able to reproduce and to undertake at least one thermodynamic work cycle.

This definition implies two things: the possibility of reproduction, and the appearance of completely new, interdependent goals in work cycles. The latter requires the ability of the autocatalytic set to save energy in order to spend it on its own self-organization, in its search for the reagents necessary to uphold the network. These goals evidently introduce a – restricted, to be sure – teleology, defined simply by the survival of the autocatalytic set itself: actions supporting this survival have a local teleological character. Thus the autocatalytic set may, as it evolves, enlarge its cyclic network by recruiting new subcycles supporting and enhancing it, in a developing structure of subcycles and sub-sub-cycles.

Kauffman proposes that the concept of ‘autonomous agent’ implies a whole new cluster of interdependent concepts. Thus, the autonomy of the agent is defined by ‘catalytic closure’ (any reaction in the network demanding catalysis will get it) which is a genuine Gestalt property in the molecular system as a whole – and thus not in any way derivable from the chemistry of single chemical reactions alone.

Kauffman’s definitions on the basis of speculative chemistry thus entail not only the Kantian cyclic structure, but also the primitive perception and action phases of Uexküll’s functional circle. Thus, Kauffman’s definition of the organism in terms of an ‘autonomous agent’ basically builds on an Uexküllian intuition, namely the idea that the most basic property in a body is metabolism: the constrained, organizing processing of high-energy chemical material and the correlated perception and action performed to localize and utilize it – all of this constituting a metabolic cycle coordinating the organism’s in- and outside, defining teleological action. Perception and action phases are so to speak the extension of the cyclical structure of the closed catalytical set to encompass parts of its surroundings, so that the circle of metabolism may only be completed by means of successful perception and action parts.

The evolution of autonomous agents is taken as the empirical basis for the hypothesis of a general thermodynamic regularity based on non-ergodicity: the Big Bang universe (and, consequently, the biosphere) is not at equilibrium and will not reach equilibrium during the life-time of the universe. This gives rise to Kauffman’s idea of the ‘adjacent possible’. At a given point in evolution, one can define the set of chemical substances which do not exist in the universe but which are at a distance of only one chemical reaction from a substance already existing in the universe. Biological evolution has, evidently, led to an enormous growth of types of organic macromolecules, and new such substances come into being every day. Maybe there is a sort of chemical potential leading from the actually realized substances into the adjacent possible which is in some sense driving evolution? In any case, Kauffman advances the hypothesis that the biosphere as such is supercritical, in the sense that there is, in general, more than one reaction catalyzed by each reagent. Cells, in order not to be destroyed by this chemical storm, must be internally subcritical (even if close to the critical boundary). But if the biosphere as such is in fact supercritical, then this distinction seemingly necessitates, a priori, the existence of a boundary of the agent, protecting it against the environment.
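The ‘adjacent possible’ admits a direct combinatorial reading: given the realized substances and a reaction rule, it is the set of substances one reaction step away. A minimal sketch under a deliberately toy chemistry (the only reaction is ligation, modeled as string concatenation; the function name is my own):

```python
def adjacent_possible(existing, combine):
    """Substances not yet realized but exactly one reaction away from
    the realized ones, under the given combination rule."""
    existing = set(existing)
    return {m for a in existing for b in existing
            for m in combine(a, b) if m not in existing}

# Toy chemistry in which the only reaction is ligation (concatenation).
frontier = adjacent_possible({"a", "b", "ab"}, lambda x, y: [x + y])
print(sorted(frontier))
```

Each substance that becomes realized enlarges the frontier, so iterating this map gives a crude picture of how the realized set expands into the adjacent possible.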

Valencies of Predicates. Thought of the Day 125.0

Naturalizing semiotics – the triadic sign of Charles Sanders Peirce

Since icons are the means of representing qualities, they generally constitute the predicative side of more complicated signs:

The only way of directly communicating an idea is by means of an icon; and every indirect method of communicating an idea must depend for its establishment upon the use of an icon. Hence, every assertion must contain an icon or set of icons, or else must contain signs whose meaning is only explicable by icons. The idea which the set of icons (or the equivalent of a set of icons) contained in an assertion signifies may be termed the predicate of the assertion. (Collected Papers of Charles Sanders Peirce)

Thus, the predicate in logic, as in ordinary language, is essentially iconic. It is important to remember here Peirce’s generalization of the predicate beyond the traditional subject-copula-predicate structure. Predicates exist with more than one subject slot; this is the basis for Peirce’s logic of relatives, and it permits enlarging the scope of logic considerably while bringing it closer to ordinary language, where several-slot predicates prevail, for instance in all verbs with a valency larger than one. In defining these predicates by means of valency, that is, the number of empty slots into which subjects, or more generally indices, may be inserted, Peirce is actually the founder of valency grammar in the tradition of Tesnière. So, for instance, the structure ‘_ gives _ to _’, where the underlinings refer to slots, is a trivalent predicate. Thus, the word classes associated with predicates are not only adjectives but verbs and common nouns; in short, all descriptive features in language are predicative.
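The slot-and-valency picture is concrete enough to encode directly: a predicate is a template whose underscores are empty subject slots, and saturating it means inserting one index per slot. A minimal sketch (the `Predicate` class and its method names are my own illustration):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Predicate:
    """A predicate as a template whose underscores are empty subject slots."""
    template: str

    @property
    def valency(self):
        return self.template.count("_")

    def saturate(self, *subjects):
        """Fill every slot with a subject (an index, in Peirce's terms)."""
        if len(subjects) != self.valency:
            raise ValueError("number of subjects must match the valency")
        parts = self.template.split("_")
        out = parts[0]
        for subject, rest in zip(subjects, parts[1:]):
            out += subject + rest
        return out

gives = Predicate("_ gives _ to _")
print(gives.valency)                            # 3: a trivalent predicate
print(gives.saturate("Ann", "a book", "Ben"))   # Ann gives a book to Ben
```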

This entails the fact that the similarity charted in icons covers more complicated cases than does the ordinary use of the word. Thus,

where ordinary logic considers only a single, special kind of relation, that of similarity, – a relation, too, of a particularly featureless and insignificant kind, the logic of relatives imagines a relation in general to be placed. Consequently, in place of the class, which is composed of a number of individual objects or facts brought together by means of their relation of similarity, the logic of relatives considers the system, which is composed of objects brought together by any kind of relations whatsoever. (The New Elements of Mathematics)

This allows for abstract similarity because one phenomenon may be similar to another in so far as both of them partake in the same relation, or more generally, in the same system – relations and systems being complicated predicates.

But not only more abstract features may thus act as the qualities invoked in an icon; these qualities may be of widely varying generality:

But instead of a single icon, or sign by resemblance of a familiar image or ‘dream’, evocable at will, there may be a complexus of such icons, forming a composite image of which the whole is not familiar. But though the whole is not familiar, yet not only are the parts familiar images, but there will also be a familiar image in its mode of composition. ( ) The sort of idea which an icon embodies, if it be such that it can convey any positive information, being applicable to some things but not to others, is called a first intention. The idea embodied by an icon, which cannot of itself convey any information, being applicable to everything or nothing, but which may, nevertheless, be useful in modifying other icons, is called a second intention. 

What Peirce distinguishes in these scholastic standard notions borrowed from Aquinas via Scotus is, in fact, the difference between Husserlian formal and material ontology. Formal qualities like genus, species, dependencies, quantities, spatial and temporal extension, and so on are of course attributable to any phenomenon and do not as such, in themselves, convey any information, in so far as they are always instantiated in, and thus, like other Second Intentions, in the Husserlian manner dependent upon, First Intentions; but they are nevertheless indispensable in the composition of first-intentional descriptions. The fact that a certain phenomenon is composed of parts, has a form, belongs to a species, has an extension, has been mentioned in a sentence, etc. does not convey the slightest information about it until it is specified, by means of first-intentional icons, which parts in which composition, which species, which form, etc. Thus Peirce here makes a hierarchy of icons, which we could call material and formal, respectively, in which the latter are dependent on the former. One may note in passing that the distinctions of Peirce’s semiotics are themselves built upon such Second Intentions; thus it is no wonder that every sign must possess some iconic element. Furthermore, the very anatomy of the proposition becomes, just as in Husserlian rational grammar, a question of formal, synthetic a priori regularities.

Among Peirce’s forms of inference, similarity plays a certain role within abduction, his term for the ‘qualified guess’ in which a particular fact gives rise to the formation of a hypothesis which would have that fact as a consequence. Many different hypotheses are of course possible for a given fact, and this inference is not necessary but merely possible, suggestive. Precisely for this reason, similarity plays a seminal role here: an

originary Argument, or Abduction, is an argument which presents facts in its Premiss which presents a similarity to the fact stated in the conclusion but which could perfectly be true without the latter being so.

The hypothesis proposed is abducted by some sort of iconic relation to the fact to be explained. Thus, similarity is the very source of new ideas – which must subsequently be controlled deductively and inductively, to be sure. But iconicity does not only play this role in the contents of abductive inference, it plays an even more important role in the very form of logical inference in general:

Given a conventional or other general sign of an object, to deduce any other truth than that which it explicitly signifies, it is necessary, in all cases, to replace that sign by an icon. This capacity of revealing unexpected truth is precisely that wherein the utility of algebraic formulae consists, so that the iconic character is the prevailing one.

The very form of inferences depends on it being an icon; thus for Peirce the syllogistic schema inherent in reasoning has an iconic character:

‘Whenever one thing suggests another, both are together in the mind for an instant. [ ] every proposition like the premiss, that is having an icon like it, would involve [ ] a proposition related to it as the conclusion [ ]’. Thus, first and foremost deduction is an icon: ‘I suppose it would be the general opinion of logicians, as it certainly was long mine, that the Syllogism is a Symbol, because of its Generality.’ …. The truth, however, appears to be that all deductive reasoning, even simple syllogism, involves an element of observation; namely deduction consists in constructing an icon or diagram the relation of whose parts shall present a complete analogy with those of the parts of the objects of reasoning, of experimenting upon this image in the imagination, and of observing the result so as to discover unnoticed and hidden relations among the parts. 

It then is no wonder that synthetic a priori truths exist – even if Peirce prefers notions like ‘observable, universal truths’ – the result of a deduction may contain more than what is immediately present in the premises, due to the iconic quality of the inference.

Time and World-Lines

Let γ: [s1, s2] → M be a smooth, future-directed timelike curve in M with tangent field ξa. We associate with it an elapsed proper time (relative to gab) given by

∥γ∥= ∫s1s2 (gabξaξb)1/2 ds

This elapsed proper time is invariant under reparametrization of γ and is just what we would otherwise describe as the length of (the image of) γ . The following is another basic principle of relativity theory:

Clocks record the passage of elapsed proper time along their world-lines.

Again, a number of qualifications and comments are called for. We have taken for granted that we know what “clocks” are. We have assumed that they have worldlines (rather than worldtubes). And we have overlooked the fact that ordinary clocks (e.g., the alarm clock on the nightstand) do not do well at all when subjected to extreme acceleration, tidal forces, and so forth. (Try smashing the alarm clock against the wall.) Again, these concerns are important and raise interesting questions about the role of idealization in the formulation of physical theory. (One might construe an “ideal clock” as a point-size test object that perfectly records the passage of proper time along its worldline, and then take the above principle to assert that real clocks are, under appropriate conditions and to varying degrees of accuracy, approximately ideal.) But they do not have much to do with relativity theory as such. Similar concerns arise when one attempts to formulate corresponding principles about clock behavior within the framework of Newtonian theory.

Now suppose that one has determined the conformal structure of spacetime, say, by using light rays. Then one can use clocks, rather than free particles, to determine the conformal factor.

Let g′ab be a second smooth metric on M, with g′ab = Ω2gab. Further suppose that the two metrics assign the same lengths to timelike curves – i.e., ∥γ∥g′ab = ∥γ∥gab ∀ smooth, timelike curves γ: I → M. Then Ω = 1 everywhere. (Here ∥γ∥gab is the length of γ relative to gab.)

Let ξoa be an arbitrary timelike vector at an arbitrary point p in M. We can certainly find a smooth, timelike curve γ: [s1, s2] → M through p whose tangent at p is ξoa. By our hypothesis, ∥γ∥g′ab = ∥γ∥gab. So, if ξa is the tangent field to γ,

∫s1s (g′ab ξaξb)1/2 ds = ∫s1s (gab ξaξb)1/2 ds

∀ s in [s1, s2]. Differentiating both sides with respect to s, it follows that g′abξaξb = gabξaξb at every point on the image of γ. In particular, it follows that (g′ab − gab) ξoa ξob = 0 at p. But ξoa was an arbitrary timelike vector at p. So g′ab = gab at our arbitrary point p.

The principle gives the whole story of relativistic clock behavior. In particular, it implies the path dependence of clock readings. If two clocks start at an event p and travel along different trajectories to an event q, then, in general, they will record different elapsed times for the trip. This is true no matter how similar the clocks are. (We may stipulate that they came off the same assembly line.) This is the case because, as the principle asserts, the elapsed time recorded by each of the clocks is just the length of the timelike curve it traverses from p to q and, in general, those lengths will be different.
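The path dependence is easy to check numerically in two-dimensional Minkowski spacetime, where the elapsed proper time along a worldline x(t) with coordinate velocity v(t) reduces to ∫ (1 − v(t)²)1/2 dt with c = 1. A sketch (the out-and-back speed 0.8 is an arbitrary illustrative choice):

```python
import math

def proper_time(velocity, t0, t1, steps=100000):
    """Elapsed proper time along a worldline in 2D Minkowski spacetime,
    integrating sqrt(1 - v(t)^2) dt by the midpoint rule (c = 1)."""
    dt = (t1 - t0) / steps
    return sum(math.sqrt(1.0 - velocity(t0 + (i + 0.5) * dt) ** 2) * dt
               for i in range(steps))

# Clock A stays at rest; clock B goes out at v = 0.8 and comes back.
# Both worldlines run between the same pair of events p and q.
tau_A = proper_time(lambda t: 0.0, 0.0, 2.0)
tau_B = proper_time(lambda t: 0.8 if t < 1.0 else -0.8, 0.0, 2.0)
print(round(tau_A, 3), round(tau_B, 3))   # 2.0 1.2
```

Both clocks pass between the same two events, yet the one that swerves records only 1.2 units of proper time against the inertial clock’s 2.0: the lengths of the two timelike curves simply differ.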

Suppose we consider all future-directed timelike curves from p to q. It is natural to ask if there are any that minimize or maximize the recorded elapsed time between the events. The answer to the first question is “no.” Indeed, one then has the following proposition:

Let p and q be events in M such that p ≪ q. Then, for all ε > 0, there exists a smooth, future-directed timelike curve γ from p to q with ∥γ∥ < ε. (But there is no such curve with length 0, since all timelike curves have non-zero length.)


If there is a smooth, timelike curve connecting p and q, there is also a jointed, zig-zag null curve connecting them. It has length 0. But we can approximate the jointed null curve arbitrarily closely with smooth timelike curves that swing back and forth. So (by the continuity of the length function), we should expect that, for all ε > 0, there is an approximating timelike curve that has length less than ε.
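The approximation argument can be made quantitative. A timelike zig-zag from p to q whose legs all run at constant speed v (with c = 1) has proper time (1 − v²)1/2 times the coordinate-time separation, independently of how many times it switches direction; as the legs approach null lines, this goes to 0. A minimal check (the sample speeds are arbitrary):

```python
import math

def zigzag_proper_time(speed, coordinate_time=1.0):
    """Proper time along a timelike zig-zag of constant |v| = speed (c = 1);
    the result does not depend on the number of legs."""
    return coordinate_time * math.sqrt(1.0 - speed ** 2)

for v in (0.9, 0.99, 0.9999):
    print(zigzag_proper_time(v))   # shrinks toward 0 as the legs approach null lines
```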

The answer to the second question (“Can one maximize recorded elapsed time between p and q?”) is “yes” if one restricts attention to local regions of spacetime. In the case of positive definite metrics – i.e., ones with signature of the form (n, 0) – we know that geodesics are locally shortest curves. The corresponding result for Lorentzian metrics is that timelike geodesics are locally longest curves.

Let γ: I → M be a smooth, future-directed, timelike curve. Then γ can be reparametrized so as to be a geodesic iff, ∀ s ∈ I, there exists an open set O containing γ(s) such that, ∀ s1, s2 ∈ I with s1 ≤ s ≤ s2, if the image of γ′ = γ|[s1, s2] is contained in O, then γ′ (and its reparametrizations) are longer than all other timelike curves in O from γ(s1) to γ(s2). (Here γ|[s1, s2] is the restriction of γ to the interval [s1, s2].)

Of all clocks passing locally from p to q, the one that will record the greatest elapsed time is the one that “falls freely” from p to q. To get a clock to read a smaller elapsed time than the maximal value, one will have to accelerate it. Now, acceleration requires fuel, and fuel is not free. So the above proposition has the consequence that (locally) “saving time costs money.” And the proposition before it may be taken to imply that “with enough money one can save as much time as one wants.” The restriction here to local regions of spacetime is essential. The connection described between clock behavior and acceleration does not, in general, hold on a global scale. In some relativistic spacetimes, one can find future-directed timelike geodesics connecting two events that have different lengths, so clocks following the curves will record different elapsed times between the events even though both are in a state of free fall. Furthermore – and this follows from the preceding claim by continuity considerations alone – it can be the case that of two clocks passing between the events, the one that undergoes acceleration during the trip records a greater elapsed time than the one that remains in a state of free fall. (A rolled-up version of two-dimensional Minkowski spacetime provides a simple example.)


Two-dimensional Minkowski spacetime rolled up into a cylindrical spacetime. Three timelike curves are displayed: γ1 and γ3 are geodesics; γ2 is not; γ1 is longer than γ2; and γ2 is longer than γ3.

The connection we have been considering between clock behavior and acceleration was once thought to be paradoxical. Recall the so-called “clock paradox.” Suppose two clocks, A and B, pass from one event to another in a suitably small region of spacetime. Further suppose A does so in a state of free fall but B undergoes acceleration at some point along the way. Then, we know, A will record a greater elapsed time for the trip than B. This was thought paradoxical because it was believed that relativity theory denies the possibility of distinguishing “absolutely” between free-fall motion and accelerated motion. (If we are equally well entitled to think that it is clock B that is in a state of free fall and A that undergoes acceleration, then, by parity of reasoning, it should be B that records the greater elapsed time.) The resolution of the paradox, if one can call it that, is that relativity theory makes no such denial. The situations of A and B here are not symmetric. The distinction between accelerated motion and free fall makes every bit as much sense in relativity theory as it does in Newtonian physics.

A “timelike curve” should be understood to be a smooth, future-directed, timelike curve parametrized by elapsed proper time – i.e., by arc length. In that case, the tangent field ξa of the curve has unit length (ξaξa = 1). And if a particle happens to have the image of the curve as its worldline, then, at any point, ξa is called the particle’s four-velocity there.

Mathematical Reductionism: A Case via C. S. Peirce’s Hypothetical Realism.


During the 20th century, the following epistemology of mathematics was predominant: a sufficient condition for the possibility of the cognition of objects is that these objects can be reduced to set theory. The conditions for the possibility of the cognition of the objects of set theory (the sets), in turn, can be given in various manners; in any event, the objects reduced to sets do not need an additional epistemological discussion – they “are” sets. Hence, such an epistemology relies ultimately on ontology. Frege conceived the axioms as descriptions of how we actually manipulate extensions of concepts in our thinking (and in this sense as inevitable and intuitive “laws of thought”). Hilbert admitted the use of intuition exclusively in metamathematics where the consistency proof is to be done (by which the appropriateness of the axioms would be established); Bourbaki takes the axioms as mere hypotheses. Hence, Bourbaki’s concept of justification is the weakest of the three: “it works as long as we encounter no contradiction”; nevertheless, it is still epistemology, because from this hypothetical-deductive point of view, one insists that at least a proof of relative consistency (i.e., a proof that the hypotheses are consistent with the frequently tested and approved framework of set theory) should be available.

Doing mathematics, one tries to give proofs for propositions, i.e., to deduce the propositions logically from other propositions (premisses). Now, in the reductionist perspective, a proof of a mathematical proposition yields an insight into the truth of the proposition if the premisses are already established (if one already has an insight into their truth); this can be done by giving proofs for them in turn (in which new premisses will occur which again ask for an insight into their truth), or by agreeing to put them at the beginning (to consider them as axioms or postulates). The philosopher tries to understand how the decision about which propositions to take as axioms is arrived at, because he or she is dissatisfied with the reductionist claim that it is on these axioms that the insight into the truth of the deduced propositions rests. Actually, this epistemology might contain a shortcoming, since Poincaré (and Wittgenstein) stressed that to have a proof of a proposition is by no means the same as to have an insight into its truth.

Attempts to disclose the ontology of mathematical objects reveal the following tendency in the epistemology of mathematics: mathematics is seen as suffering from a lack of ontological “determinateness”, namely that this science (contrary to many others) does not concern material data, so that the concept of material truth is not available (especially in the case of the infinite). This tendency is embarrassing since, on the other hand, mathematical cognition is very often presented as cognition of the “greatest possible certainty” precisely because it seems not to be bound to material evidence, let alone experimental check.

The technical apparatus developed by the reductionist, set-theoretical approach nowadays serves other purposes, partly because tacit beliefs about sets were challenged; the explanations of the science which it provides are considered irrelevant by the practitioners of this science. There is doubt that the above-mentioned sufficient condition is also necessary; it is not even accepted throughout as a sufficient one. But what happens if some objects, as in the case of category theory, do not fulfill the condition? It seems that the reductionist approach has, so to say, been undocked from the historical development of the discipline in several respects; an alternative is required.

Before Peirce, epistemology was dominated by the idea of a grasp of objects; since Descartes, intuition was considered throughout as a particular, innate capacity of cognition (even if idealists thought that it concerns the general, and empiricists that it concerns the particular). The task assigned to this particular capacity was the foundation of epistemology; already with Aristotle’s first premisses of the syllogism, the aim was to go back to something first. In this traditional approach, it is by the ontology of the objects that one hopes to answer the fundamental question concerning the conditions for the possibility of their cognition. One hopes that there are simple “basic objects” to which the more complex objects can be reduced and whose cognition is possible by common sense – be this an innate or otherwise distinguished capacity of cognition common to all human beings. Here, epistemology is “wrapped up” in (or rests on) ontology; to do epistemology, one has to do ontology first.

Peirce shares Kant’s opinion that the object depends on the subject; however, he does not agree that reason is the crucial means of cognition to be criticised. In his paper “Questions concerning certain faculties claimed for man”, he points out the basic assumption of pragmatist philosophy: every cognition is semiotically mediated. He says that there is no immediate cognition (a cognition which “refers immediately to its object”), but that every cognition “has been determined by a previous cognition” of the same object. Correspondingly, Peirce replaces the critique of reason by a critique of signs. He thinks that Kant’s distinction between the world of things per se (Dinge an sich) and the world of apparition (Erscheinungswelt) is not fruitful; he rather distinguishes the world of the subject and the world of the object, connected by signs; his position is consequently a “hypothetical realism” in which all cognitions are valid only with reservations. This position neither negates nor asserts that the object per se (with the semiotical mediation stripped off) exists, since such assertions of “pure” existence are seen as necessarily hypothetical (that is, as not withstanding philosophical criticism).

By his basic assumption, Peirce was led to reveal a problem concerning the subject matter of epistemology, since this assumption means in particular that there is no intuitive cognition in the classical sense (which is synonymous to “immediate”). Hence, one could no longer consider cognitions as objects; there is no intuitive cognition of an intuitive cognition. Intuition can be no more than a relation. “All the cognitive faculties we know of are relative, and consequently their products are relations”. According to this new point of view, intuition cannot any longer serve to found epistemology, in departure from the former reductionist attitude. A central argument of Peirce against reductionism or, as he puts it,

the reply to the argument that there must be a first is as follows: In retracing our way from our conclusions to premisses, or from determined cognitions to those which determine them, we finally reach, in all cases, a point beyond which the consciousness in the determined cognition is more lively than in the cognition which determines it.

Peirce gives some examples derived from physiological observations about perception, like the fact that the third dimension of space is inferred, and the blind spot of the retina. In this situation, the process of reduction loses its legitimacy since it no longer fulfills the function of cognition justification. At such a place, something happens which I would like to call an “exchange of levels”: the process of reduction is interrupted in that the things exchange the roles performed in the determination of a cognition: what was originally considered as determining is now determined by what was originally considered as asking for determination.

The idea that contents of cognition are necessarily provisional has an effect on the very concept of conditions for the possibility of cognitions. It seems that one can infer from Peirce’s words that what vouches for a cognition is not necessarily the cognition which determines it, but the liveliness of our consciousness in the cognition. Here, “to vouch for a cognition” no longer means what it meant before (which was much the same as “to determine a cognition”), but it still means that the cognition is (provisionally) reliable. This conception of the liveliness of our consciousness might roughly be seen as a substitute for the capacity of intuition in Peirce’s epistemology – but only roughly, since it has a different coverage.

La Mettrie’s Man-Machine


Philosophers’ theories regarding the human soul? Basically there are just two of them: the first and older of the two is materialism; the second is spiritualism.

The metaphysicians who implied that matter might well have the power to think didn’t disgrace themselves as thinkers. Why not? Because they had the advantage (for in this case it is one) of expressing themselves badly. To ask whether unaided matter can think is like asking whether unaided matter can indicate the time. It’s clear already that we aren’t going to hit the rock on which Locke had the bad luck to come to grief in his speculations about whether there could be thinking matter.

The Leibnizians with their ‘monads’ have constructed an unintelligible hypothesis. Rather than materialising the soul, they spiritualised matter. How can we define a being like the so-called ‘monad’ whose nature is absolutely unknown to us?

Descartes and all the Cartesians – among whom Malebranche’s followers have long been included – went wrong in the same way, namely by dogmatising about something of which they knew nothing. They admitted two distinct substances in man, as if they had seen and counted them!


String’s Depth of Burial


A string’s depth might be defined as the execution time of its minimal program.

The difficulty with this definition arises in cases where the minimal program is only a few bits smaller than some much faster program, such as a print program, to compute the same output x. In this case, slight changes in x may induce arbitrarily large changes in the run time of the minimal program, by changing which of the two competing programs is minimal. Analogous instability manifests itself in translating programs from one universal machine to another. This instability emphasizes the essential role of the quantity of buried redundancy, not as a measure of depth, but as a certifier of depth. In terms of the philosophy-of-science metaphor, an object whose minimal program is only a few bits smaller than its print program is like an observation that points to a nontrivial hypothesis, but with only a low level of statistical confidence.

To adequately characterize a finite string’s depth one must therefore consider the amount of buried redundancy as well as the depth of its burial. A string’s depth at significance level s might thus be defined as that amount of time complexity which is attested by s bits worth of buried redundancy. This characterization of depth may be formalized in several ways.

A string’s depth at significance level s might be defined as the time required to compute the string by a program no more than s bits larger than the minimal program.
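This definition lends itself to a toy illustration. The sketch below is a hedged stand-in only: it replaces the universal machine U by a hypothetical finite "program table", each entry carrying a size in bits, a running time, and an output.

```python
# Toy sketch only: a hypothetical finite program table standing in for U.
programs = [
    # (size_in_bits, running_time, output)
    (100, 10**9, "x"),   # minimal program: smallest, but very slow
    (105, 50,    "x"),   # 5 bits larger: fast
    (500, 5,     "x"),   # "print"-style program: large, fastest
]

def depth(x, s, programs):
    """Least time to compute x by a program at most s bits above minimal size."""
    sizes = [size for size, t, out in programs if out == x]
    if not sizes:
        raise ValueError("no program computes x")
    minimal = min(sizes)
    return min(t for size, t, out in programs
               if out == x and size <= minimal + s)

print(depth("x", 0, programs))    # only the minimal program qualifies: 10**9
print(depth("x", 5, programs))    # the 105-bit program now qualifies: 50
print(depth("x", 400, programs))  # even the print-style program qualifies: 5
```

At s = 0 the string looks enormously deep; admitting 5 extra bits of buried redundancy certifies a much faster route. This is exactly the stability that the significance parameter buys against the instability described above.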

This definition solves the stability problem, but is unsatisfactory in the way it treats multiple programs of the same length. Intuitively, 2k distinct (n + k)-bit programs that compute the same output ought to be accorded the same weight as one n-bit program; but, by the present definition, they would be given no more weight than one (n + k)-bit program.

A string’s depth at significance level s might instead be defined as the time t required for the string’s time-bounded algorithmic probability Pt(x) to rise to within a factor 2−s of its asymptotic time-unbounded value P(x).

This formalizes the notion that for the string to have originated by an effective process of t steps or fewer is less plausible than for the first s tosses of a fair coin all to come up heads.
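The probabilistic definition can be sketched in the same hedged toy setting, again over a hypothetical finite program table rather than a true universal machine: Pt(x) sums 2−|p| over programs for x halting within t steps, and the depth at significance s is the first t at which this mass reaches a 2−s fraction of its unbounded value.

```python
# Toy sketch only: Pt(x) over a hypothetical finite program table.
programs = [
    # (size_in_bits, running_time, output)
    (10, 10**6, "x"),   # minimal program: slow
    (13, 50,    "x"),   # 3 bits larger: fast
]

def P(x, t=float("inf")):
    """Time-bounded algorithmic probability: mass 2^-|p| of programs for x halting within t."""
    return sum(2.0 ** -size for size, rt, out in programs if out == x and rt <= t)

def depth_prob(x, s):
    """Least t at which Pt(x) reaches a 2^-s fraction of the unbounded P(x)."""
    target = 2.0 ** -s * P(x)
    for t in sorted(rt for size, rt, out in programs if out == x):
        if P(x, t) >= target:
            return t

# Here P(x) = 2^-10 + 2^-13, and the fast program alone carries 1/9 of that mass.
print(depth_prob("x", 0))   # essentially all the mass is demanded: 10**6
print(depth_prob("x", 4))   # 2^-4 = 1/16 <= 1/9, so the fast route suffices: 50
```

Note how the 2k-programs objection above is answered: many fast programs pool their probability mass, whereas the previous definition would have credited only the shortest of them.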

It is not known whether there exist strings that are deep, in other words, strings having no small fast programs, even though they have enough large fast programs to contribute a significant fraction of their algorithmic probability. Such strings might be called deterministically deep but probabilistically shallow, because their chance of being produced quickly in a probabilistic computation (e.g. one where the input bits of U are supplied by coin tossing) is significant compared to their chance of being produced slowly. The question of whether such strings exist is probably hard to answer because it does not relativize uniformly. Deterministic and probabilistic depths are not very different relative to a random coin-toss oracle A, by analogy with the equality of random-oracle-relativized deterministic and probabilistic polynomial time complexity classes; but they can be very different relative to an oracle B deliberately designed to hide information from deterministic computations (this parallels Hunt’s proof that deterministic and probabilistic polynomial time are unequal relative to such an oracle).

(Depth of Finite Strings): Let x and w be strings and s a significance parameter. A string’s depth at significance level s, denoted Ds(x), will be defined as min{T(p) : (|p| − |p∗| < s) ∧ (U(p) = x)}, the least time required to compute it by an s-incompressible program. At any given significance level, a string will be called t-deep if its depth exceeds t, and t-shallow otherwise.

The difference between this definition and the previous one is rather subtle philosophically and not very great quantitatively. Philosophically, whereas the previous definition requires that each individual hypothesis for the rapid origin of x be implausible at the 2−s confidence level, the present definition requires only that a weighted average of all such hypotheses be implausible.

There exist constants c1 and c2 such that for any string x, if programs running in time ≤ t contribute a fraction between 2−s and 2−s+1 of the string’s total algorithmic probability, then x has depth at most t at significance level s + c1 and depth at least t at significance level s − min{H(s), H(t)} − c2.

Proof: The first part follows easily from the fact that any k-compressible self-delimiting program p is associated with a unique program, k − O(1) bits shorter, of the form “execute the result of executing p∗”. Therefore there exists a constant c1 such that if all t-fast programs for x were (s + c1)-compressible, the associated shorter programs would contribute more than the total algorithmic probability of x. The second part follows because, roughly, if fast programs contribute only a small fraction of the algorithmic probability of x, then the property of being a fast program for x is so unusual that no program having that property can be random. More precisely, the t-fast programs for x constitute a finite prefix set, a superset S of which can be computed by a program of size H(x) + min{H(t), H(s)} + O(1) bits. (Given x∗ and either t∗ or s∗, begin enumerating all self-delimiting programs that compute x, in order of increasing running time, and quit when either the running time exceeds t or the accumulated measure of programs so far enumerated exceeds 2−(H(x)+s).) Therefore there exists a constant c2 such that every member of S, and thus every t-fast program for x, is compressible by at least s − min{H(s), H(t)} − O(1) bits.

The ability of universal machines to simulate one another efficiently implies a corresponding degree of machine-independence for depth: for any two efficiently universal machines of the sort considered here, there exists a constant c and a linear polynomial L such that for any t, strings whose (s+c)-significant depth is at least L(t) on one machine will have s-significant depth at least t on the other.

Depth of one string relative to another may be defined analogously, and represents the plausible time required to produce one string, x, from another, w.

(Relative Depth of Finite Strings): For any two strings w and x, the depth of x relative to w at significance level s, denoted Ds(x/w), will be defined as min{T(p, w) : (|p|−|(p/w)∗| < s)∧(U(p, w) = x)}, the least time required to compute x from w by a program that is s-incompressible relative to w.

Depth of a string relative to its length is a particularly useful notion, allowing us, as it were, to consider the triviality or nontriviality of the “content” of a string (i.e. its bit sequence), independent of its “form” (length). For example, although the infinite sequence 000… is intuitively trivial, its initial segment 0n is deep whenever n is deep. However, 0n is always shallow relative to n, as is, with high probability, a random string of length n.

In order to adequately represent the intuitive notion of stored mathematical work, it is necessary that depth obey a “slow growth” law, i.e. that fast deterministic processes be unable to transform a shallow object into a deep one, and that fast probabilistic processes be able to do so only with low probability.

(Slow Growth Law): Given any data string x and two significance parameters s2 > s1, a random program generated by coin tossing has probability less than 2−(s2−s1)+O(1) of transforming x into an excessively deep output, i.e. one whose s2-significant depth exceeds the s1-significant depth of x plus the run time of the transforming program plus O(1). More precisely, there exist positive constants c1, c2 such that for all strings x, and all pairs of significance parameters s2 > s1, the prefix set {q : Ds2(U(q, x)) > Ds1(x) + T(q, x) + c1} has measure less than 2−(s2−s1)+c2.

Proof: Let p be an s1-incompressible program which computes x in time Ds1(x), and let r be the restart prefix mentioned in the definition of the U machine. Let Q be the prefix set {q : Ds2(U(q, x)) > T(q, x) + Ds1(x) + c1}, where the constant c1 is sufficient to cover the time overhead of concatenation. For all q ∈ Q, the program rpq by definition computes some deep result U(q, x) in less time than that result’s own s2-significant depth, and so rpq must be compressible by s2 bits. The sum of the algorithmic probabilities of strings of the form rpq, where q ∈ Q, is therefore

Σq∈Q P(rpq) > Σq∈Q 2−|rpq|+s2 = 2−|r|−|p|+s2 μ(Q)

On the other hand, since the self-delimiting program p can be recovered from any string of the form rpq (by deleting r and executing the remainder pq until halting occurs, by which time exactly p will have been read), the algorithmic probability of p is at least as great (within a constant factor) as the sum of the algorithmic probabilities of the strings {rpq : q ∈ Q} considered above:

P(p) > μ(Q) · 2−|r|−|p|+s2−O(1)

Recalling the fact that minimal program size is equal within a constant factor to the −log of algorithmic probability, and the s1-incompressibility of p, we have P(p) < 2−(|p|−s1+O(1)), and therefore finally

μ(Q) < 2−(s2−s1)+O(1), which was to be demonstrated.
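Read together, the three estimates in the proof form one chain of inequalities; in LaTeX, with all multiplicative constants absorbed into the O(1) terms:

```latex
2^{-|p|+s_1+O(1)} \;>\; P(p)
\;>\; 2^{-O(1)}\sum_{q\in Q} P(rpq)
\;>\; 2^{-|r|-|p|+s_2-O(1)}\,\mu(Q)
\quad\Longrightarrow\quad
\mu(Q) \;<\; 2^{|r|+s_1-s_2+O(1)} \;=\; 2^{-(s_2-s_1)+O(1)}
```

where the last step uses the fact that the restart prefix r has constant length.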

Accelerated Capital as an Anathema to the Principles of Communicative Action. A Note Quote on the Reciprocity of Capital and Ethicality of Financial Economics

continuum

Markowitz portfolio theory explicitly observes that portfolio managers are not (expected) utility maximisers, as they diversify, and offers the hypothesis that a desire for reward is tempered by a fear of uncertainty. The model concludes that all investors should hold the same portfolio; their individual risk-reward objectives are satisfied by the weighting of this ‘index portfolio’ relative to riskless cash in the bank, a point on the Capital Market Line. The slope of the Capital Market Line is the market price of risk, which is an important parameter in arbitrage arguments.
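The algebra behind the Capital Market Line can be sketched numerically. The figures below are illustrative assumptions, not market data; the tangency-portfolio formula w ∝ Σ−1(μ − rf·1) is the standard mean-variance construction.

```python
import numpy as np

# Illustrative assumptions, not market data.
mu = np.array([0.08, 0.12, 0.10])        # expected returns of three risky assets
Sigma = np.array([[0.04, 0.01, 0.00],    # their covariance matrix
                  [0.01, 0.09, 0.02],
                  [0.00, 0.02, 0.06]])
rf = 0.03                                # riskless rate (cash in the bank)

# Tangency ("index") portfolio: weights proportional to Sigma^{-1} (mu - rf * 1).
w = np.linalg.solve(Sigma, mu - rf)
w /= w.sum()

port_mu = w @ mu
port_sigma = np.sqrt(w @ Sigma @ w)
market_price_of_risk = (port_mu - rf) / port_sigma   # slope of the CML

# Every investor holds a point on the line: alpha in the index, 1 - alpha in cash.
alpha = 0.5
mix_mu = alpha * port_mu + (1 - alpha) * rf
mix_sigma = alpha * port_sigma
```

Any cash/index mix earns the same reward per unit of risk as the index itself, which is why a single slope, the market price of risk, summarises every investor's position.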

Merton had initially attempted to provide an alternative to Markowitz based on utility maximisation employing stochastic calculus. He was only able to resolve the problem by employing the hedging arguments of Black and Scholes, and in doing so built a model that was based on the absence of arbitrage, free of turpe-lucrum. The prescriptive statement “it should not be possible to make sure profits” is explicit in the Efficient Markets Hypothesis and in the use of an Arrow security in the context of the Law of One Price. Based on these observations, we conjecture that the whole paradigm for financial economics is built on the principle of balanced reciprocity. In order to explore this conjecture we shall examine the relationship between commerce and themes in Pragmatic philosophy. Specifically, we highlight Robert Brandom’s (Making It Explicit: Reasoning, Representing, and Discursive Commitment) position that there is a pragmatist conception of norms – a notion of primitive correctnesses of performance implicit in practice that precede and are presupposed by their explicit formulation in rules and principles.

The ‘primitive correctnesses’ of commercial practices were recognised by Aristotle when he investigated the nature of Justice in the context of commerce, and then by Olivi when he looked favourably on merchants. They are exhibited in the doux-commerce thesis; compare Fourcade and Healy’s contemporary description of the thesis, “Commerce teaches ethics mainly through its communicative dimension, that is, by promoting conversations among equals and exchange between strangers”, with Putnam’s description of Habermas’ communicative action as based on “the norm of sincerity, the norm of truth-telling, and the norm of asserting only what is rationally warranted … [and] contrasted with manipulation” (Hilary Putnam, The Collapse of the Fact/Value Dichotomy and Other Essays).

There are practices (that should be) implicit in commerce that make it an exemplar of communicative action. A further expression of markets as centres of communication is manifested in the Asian description of a market, which brings to mind Donald Davidson’s (Subjective, Intersubjective, Objective) argument that knowledge is not the product of bipartite conversations but of a tripartite relationship between two speakers and their shared environment. Replacing the negotiation between market agents with an algorithm that delivers a theoretical price replaces ‘knowledge’, generated through communication, with dogma. The problem with the performativity that Donald MacKenzie (An Engine, Not a Camera: How Financial Models Shape Markets) is concerned with is one of monism. In employing pricing algorithms, the markets cannot perform to something that comes close to ‘true belief’, which can only be identified through communication between sapient humans. This is an almost trivial observation to (successful) market participants, but difficult to appreciate by spectators who seek to attain ‘objective’ knowledge of markets from a distance. The relevance to financial crises lies in the position that ‘true belief’ is about establishing coherence through myriad triangulations centred on an asset, rather than relying on a theoretical model.

Shifting gears now: unless the martingale measure is a by-product of a hedging approach, the price given by such martingale measures is not related to the cost of a hedging strategy, and therefore the meaning of such ‘prices’ is not clear. If the hedging argument cannot be employed, as in the markets studied by Cont and Tankov (Financial Modelling with Jump Processes), there is no conceptual framework supporting the prices obtained from the Fundamental Theorem of Asset Pricing. This lack of meaning can be interpreted as a consequence of the strict fact/value dichotomy in contemporary mathematics that came with the eclipse of Poincaré’s Intuitionism by Hilbert’s Formalism and Bourbaki’s Rationalism. The practical problem of supporting the social norms of market exchange has been replaced by a theoretical problem of developing formal models of markets. These models then legitimate the actions of agents in the market without having to make reference to explicitly normative values.
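Conversely, when a hedging argument is available, the martingale-measure price and the cost of the hedge coincide. A minimal sketch in a one-period binomial market (hypothetical numbers, standard textbook construction):

```python
# One-period binomial market: the stock moves from S0 to S0*u or S0*d,
# cash grows by the riskless factor R, with d < R < u (no arbitrage).
S0, u, d, R, K = 100.0, 1.2, 0.8, 1.05, 100.0

payoff_up = max(S0 * u - K, 0.0)    # call payoff in the up state
payoff_down = max(S0 * d - K, 0.0)  # call payoff in the down state

# Replicating hedge: delta shares of stock plus b in the bank.
delta = (payoff_up - payoff_down) / (S0 * u - S0 * d)
b = (payoff_up - delta * S0 * u) / R
hedge_cost = delta * S0 + b

# Martingale-measure price: q makes the discounted stock a martingale.
q = (R - d) / (u - d)
rn_price = (q * payoff_up + (1 - q) * payoff_down) / R

print(hedge_cost, rn_price)   # the two numbers agree
```

Here the ‘price’ has meaning precisely because it is the cost of a strategy; strip away the hedge, as in the jump-process markets above, and only the formal expectation remains.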

The Efficient Market Hypothesis is based on the axiom that the market price is determined by the balance between supply and demand, and so an increase in trading facilitates the convergence to equilibrium. If this axiom is replaced by the axiom of reciprocity, the justification for speculative activity in support of efficient markets disappears. In fact, the axiom of reciprocity would de-legitimise ‘true’ arbitrage opportunities as being unfair. This would not necessarily make the activities of actual market arbitrageurs illicit, since there are rarely strategies that are without the risk of a loss; however, it would place more emphasis on the risks of speculation and inhibit the hubris that has been associated with the prelude to the recent Crisis. These points raise the question of the legitimacy of speculation in the markets. In an attempt to understand this issue Gabrielle and Reuven Brenner identify three types of market participant. ‘Investors’ are preoccupied with future scarcity and so defer income. Because uncertainty exposes the investor to the risk of loss, investors wish to minimise uncertainty at the cost of potential profits; this is the basis of classical investment theory. ‘Gamblers’ will bet on an outcome taking odds that have been agreed on by society, such as with a sporting bet or in a casino; this relates to de Moivre’s and Montmort’s ‘taming of chance’. ‘Speculators’ bet on a mis-calculation of the odds quoted by society, and the reason why speculators are regarded as socially questionable is that they have opinions that are explicitly at odds with the consensus: they are practitioners who rebel against a theoretical ‘Truth’. This is captured in Arjun Appadurai’s argument that the leading agents in modern finance “believe in their capacity to channel the workings of chance to win in the games dominated by cultures of control … [they] are not those who wish to ‘tame chance’ but those who wish to use chance to animate the otherwise deterministic play of risk [quantifiable uncertainty]”.

In the context of Pragmatism, financial speculators embody pluralism, a concept essential to Pragmatic thinking and an antidote to the problem of radical uncertainty. Appadurai was motivated to study finance by Marcel Mauss’ essay Le Don (The Gift), which explores the moral force behind reciprocity in primitive and archaic societies; he goes on to say that the contemporary financial speculator is “betting on the obligation of return”, and that this is the fundamental axiom of contemporary finance. David Graeber (Debt: The First 5,000 Years) also recognises the fundamental position reciprocity has in finance, but whereas Appadurai recognises the importance of reciprocity in the presence of uncertainty, Graeber essentially ignores uncertainty in an analysis that ends with the conclusion that “we don’t ‘all’ have to pay our debts”. In advocating that reciprocity need not be honoured, Graeber is not just challenging contemporary capitalism but also the foundations of the civitas, based on equality and reciprocity. The origins of Graeber’s argument are in the first half of the nineteenth century. In 1836 John Stuart Mill defined political economy as being concerned with “[man] solely as a being who desires to possess wealth, and who is capable of judging of the comparative efficacy of means for obtaining that end”.

In Principles of Political Economy With Some of Their Applications to Social Philosophy, Mill defended Thomas Malthus’ An Essay on the Principle of Population, which focused on scarcity. Mill was writing at a time when Europe was struck by the Cholera pandemic of 1829–1851 and the famines of 1845–1851 and while Lord Tennyson was describing nature as “red in tooth and claw”. At this time, society’s fear of uncertainty seems to have been replaced by a fear of scarcity, and these standards of objectivity dominated economic thought through the twentieth century. Almost a hundred years after Mill, Lionel Robbins defined economics as “the science which studies human behaviour as a relationship between ends and scarce means which have alternative uses”. Dichotomies emerge in the aftermath of the Cartesian revolution that aims to remove doubt from philosophy. Theory and practice, subject and object, facts and values, means and ends are all separated. In this environment ex cathedra norms, in particular utility (profit) maximisation, encroach on commercial practice.

In order to set boundaries on commercial behaviour motivated by profit maximisation, particularly when market uncertainty returned after the Nixon shock of 1971, society imposes regulations on practice. As a consequence, two competing ethics, a functional Consequentialist ethic guiding market practices and a regulatory Deontological ethic attempting to stabilise the system, vie for supremacy. It is in this debilitating competition between two essentially theoretical ethical frameworks that we offer an explanation for the Financial Crisis of 2007-2009: profit maximisation, not speculation, is destabilising in the presence of radical uncertainty, and regulation cannot keep up with motivated profit maximisers who can justify their actions through abstract mathematical models that bear little resemblance to actual markets. An implication of reorienting financial economics to focus on markets as centres of ‘communicative action’ is that markets could become self-regulating, in the same way that the legal or medical spheres are self-regulated through professions. This is not a ‘libertarian’ argument based on freeing the Consequentialist ethic from a Deontological brake. Rather, it argues that being a market participant entails norms on the agent, such as sincerity and truth-telling, that support the creation of knowledge, of asset prices, within a broader objective of social cohesion. This immediately calls into question the legitimacy of algorithmic/high-frequency trading, which seems anathema to the principles of communicative action.

Discontinuous Reality. Thought of the Day 61.0

discontinuousReality-2015

Convention is an invention that plays a distinctive role in Poincaré’s philosophy of science. In terms of how they contribute to the framework of science, conventions are not empirical. They are presupposed in certain empirical tests, so they are (relatively) isolated from doubt. Yet they are not pure stipulations, or analytic, since conventional choices are guided by, and modified in the light of, experience. Finally they have a different character from genuine mathematical intuitions, which provide a fixed, a priori synthetic foundation for mathematics. Conventions are thus distinct from the synthetic a posteriori (empirical), the synthetic a priori and the analytic a priori.

The importance of Poincaré’s invention lies in the recognition of a new category of proposition and its centrality in scientific judgment. This is more important than the special place Poincaré gives Euclidean geometry. Nevertheless, it’s possible to accommodate some of what he says about the priority of Euclidean geometry with the use of non-Euclidean geometry in science, including the inapplicability of any geometry of constant curvature in physical theories of global space. Poincaré’s insistence on Euclidean geometry is based on criteria of simplicity and convenience. But these criteria surely entail that if giving up Euclidean geometry somehow results in an overall gain in simplicity then that would be condoned by conventionalism.

The a priori conditions on geometry – in particular the group concept, and the hypothesis of rigid body motion it encourages – might seem a lingering obstacle to a more flexible attitude towards applied geometry, or an empirical approach to physical space. However, just as the apriority of the intuitive continuum does not restrict physical theories to the continuous; so the apriority of the group concept does not mean that all possible theories of space must allow free mobility. This, too, can be “corrected”, or overruled, by new theories and new data, just as, Poincaré comes to admit, the new quantum theory might overrule our intuitive assumption that nature is continuous. That is, he acknowledges that reality might actually be discontinuous – despite the apriority of the intuitive continuum.

Conjuncted: Occam’s Razor and Nomological Hypothesis. Thought of the Day 51.1.1

rockswater

Conjuncted here, here and here.

A temporally evolving system must possess a sufficiently rich set of symmetries to allow us to infer general laws from a finite set of empirical observations. But what justifies this hypothesis?

This question is central to the entire scientific enterprise. Why are we justified in assuming that scientific laws are the same in different spatial locations, or that they will be the same from one day to the next? Why should replicability of other scientists’ experimental results be considered the norm, rather than a miraculous exception? Why is it normally safe to assume that the outcomes of experiments will be insensitive to irrelevant details? Why, for that matter, are we justified in the inductive generalizations that are ubiquitous in everyday reasoning?

In effect, we are assuming that the scientific phenomena under investigation are invariant under certain symmetries – both temporal and spatial, including translations, rotations, and so on. But where do we get this assumption from? The answer lies in the principle of Occam’s Razor.

Roughly speaking, this principle says that, if two theories are equally consistent with the empirical data, we should prefer the simpler theory:

Occam’s Razor: Given any body of empirical evidence about a temporally evolving system, always assume that the system has the largest possible set of symmetries consistent with that evidence.

To make this more precise, we begin by explaining what it means for a particular symmetry to be “consistent” with a body of empirical evidence. Formally, our total body of evidence can be represented as a subset E of H, namely the set of all logically possible histories that are not ruled out by that evidence. Note that we cannot assume that our evidence is a subset of Ω; when we scientifically investigate a system, we do not normally know what Ω is. Hence we can only assume that E is a subset of the larger set H of logically possible histories.

Now let ψ be a transformation of H, and suppose that we are testing the hypothesis that ψ is a symmetry of the system. For any positive integer n, let ψn be the transformation obtained by applying ψ repeatedly, n times in a row. For example, if ψ is a rotation about some axis by angle θ, then ψn is the rotation by the angle nθ. For any such transformation ψn, we write ψ–n(E) to denote the inverse image in H of E under ψn. We say that the transformation ψ is consistent with the evidence E if the intersection

E ∩ ψ–1(E) ∩ ψ–2(E) ∩ ψ–3(E) ∩ …

is non-empty. This means that the available evidence (i.e., E) does not falsify the hypothesis that ψ is a symmetry of the system.

For example, suppose we are interested in whether cosmic microwave background radiation is isotropic, i.e., the same in every direction. Suppose we measure a background radiation level of x1 when we point the telescope in direction d1, and a radiation level of x2 when we point it in direction d2. Call these events E1 and E2. Thus, our experimental evidence is summarized by the event E = E1 ∩ E2. Let ψ be a spatial rotation that rotates d1 to d2. Then, focusing for simplicity just on the first two terms of the infinite intersection above,

E ∩ ψ–1(E) = E1 ∩ E2 ∩ ψ–1(E1) ∩ ψ–1(E2).

If x1 = x2, we have E1 = ψ–1(E2), and the expression for E ∩ ψ–1(E) simplifies to E1 ∩ E2 ∩ ψ–1(E1), which has at least a chance of being non-empty, meaning that the evidence has not (yet) falsified isotropy. But if x1 ≠ x2, then E1 and ψ–1(E2) are disjoint. In that case, the intersection E ∩ ψ–1(E) is empty, and the evidence is inconsistent with isotropy. As it happens, we know from recent astronomy that x1 ≠ x2 in some cases, so cosmic microwave background radiation is not isotropic, and ψ is not a symmetry.
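This consistency test is mechanical enough to sketch in code. The toy below is a hypothetical four-direction “sky”, with rotation playing the role of ψ and a finite truncation of the infinite intersection; it reproduces the isotropy example.

```python
from itertools import product

# Toy sky: four directions in a cycle; a "history" assigns a radiation
# level to each direction (states over space, for simplicity).
directions = [0, 1, 2, 3]
levels = ["lo", "hi"]
H = list(product(levels, repeat=4))      # all logically possible histories

def rotate(h):                           # the transformation ψ: rotate the sky
    return h[1:] + h[:1]

def preimage(E):                         # inverse image ψ^{-1}(E)
    return {h for h in H if rotate(h) in E}

def consistent(E, n=8):                  # E ∩ ψ^{-1}(E) ∩ ... ∩ ψ^{-n}(E) non-empty?
    inter = set(E)
    for _ in range(n):
        E = preimage(E)
        inter &= E
    return bool(inter)

# Evidence: level x1 measured in direction 0, level x2 in direction 1.
def evidence(x1, x2):
    return {h for h in H if h[0] == x1 and h[1] == x2}

print(consistent(evidence("hi", "hi")))  # equal readings: isotropy survives -> True
print(consistent(evidence("hi", "lo")))  # unequal readings: isotropy falsified -> False
```

With equal readings, the uniformly-“hi” sky survives every rotated copy of the evidence; with unequal readings, the very first intersection E ∩ ψ−1(E) is already empty, just as in the prose argument.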

Our version of Occam’s Razor now says that we should postulate as symmetries of our system a maximal monoid of transformations consistent with our evidence. Formally, a monoid Ψ of transformations (where each ψ in Ψ is a function from H into itself) is consistent with evidence E if the intersection

∩ψ∈Ψ ψ–1(E)

is non-empty. This is the generalization of the infinite intersection that appeared in our definition of an individual transformation’s consistency with the evidence. Further, a monoid Ψ that is consistent with E is maximal if no proper superset of Ψ forms a monoid that is also consistent with E.

Occam’s Razor (formal): Given any body E of empirical evidence about a temporally evolving system, always assume that the set of symmetries of the system is a maximal monoid Ψ consistent with E.
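A brute-force sketch of the formal principle, on a toy space where the transformations are rotations of a four-point cycle (the candidate monoids and the evidence set are illustrative assumptions):

```python
from itertools import product

# Toy: binary levels over a 4-point cyclic "space"; transformations are rotations.
H = list(product([0, 1], repeat=4))

def rot(k):
    return lambda h: h[k:] + h[:k]

def is_monoid(ks):
    """Rotation indices form a monoid iff they contain the identity 0
    and are closed under composition, i.e. under addition mod 4."""
    return 0 in ks and all((a + b) % 4 in ks for a in ks for b in ks)

def consistent(ks, E):
    """Non-emptiness of the intersection of rot(k)^{-1}(E) over k in ks."""
    inter = set(E)
    for k in ks:
        inter &= {h for h in H if rot(k)(h) in E}
    return bool(inter)

E = {h for h in H if h[0] == 1}          # evidence: "level 1 in direction 0"
candidates = [ks for ks in [{0}, {0, 2}, {0, 1, 2, 3}]
              if is_monoid(ks) and consistent(ks, E)]
maximal = max(candidates, key=len)
print(sorted(maximal))                   # the full rotation monoid survives
```

Here the uniformly-1 history keeps even the full rotation monoid consistent with E, so the Razor tells us to postulate all four rotations as symmetries.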

What is the significance of this principle? We define Γ to be the set of all symmetries of our temporally evolving system. In practice, we do not know Γ. A monoid Ψ that passes the test of Occam’s Razor, however, can be viewed as our best guess as to what Γ is.

Furthermore, if Ψ is this monoid, and E is our body of evidence, the intersection

∩ψ∈Ψ ψ–1(E)

can be viewed as our best guess as to what the set of nomologically possible histories is. It consists of all those histories among the logically possible ones that are not ruled out by the postulated symmetry monoid Ψ and the observed evidence E. We thus call this intersection our nomological hypothesis and label it Ω(Ψ,E).

To see that this construction is not completely far-fetched, note that, under certain conditions, our nomological hypothesis does indeed reflect the truth about nomological possibility. If the hypothesized symmetry monoid Ψ is a subset of the true symmetry monoid Γ of our temporally evolving system – i.e., we have postulated some of the right symmetries – then the true set Ω of all nomologically possible histories will be a subset of Ω(Ψ,E). So, our nomological hypothesis will be consistent with the truth and will, at most, be logically weaker than the truth.

Given the hypothesized symmetry monoid Ψ, we can then assume provisionally (i) that any empirical observation we make, corresponding to some event D, can be generalized to a Ψ-invariant law and (ii) that unconditional and conditional probabilities can be estimated from empirical frequency data using a suitable version of the Ergodic Theorem.

Conjuncted: Ergodicity. Thought of the Day 51.1

ergod_noise

When we scientifically investigate a system, we cannot normally observe all possible histories in Ω, or directly access the conditional probability structure {PrE}E⊆Ω. Instead, we can only observe specific events. Conducting many “runs” of the same experiment is an attempt to observe as many histories of a system as possible, but even the best experimental design rarely allows us to observe all histories or to read off the full conditional probability structure. Furthermore, this strategy works only for smaller systems that we can isolate in laboratory conditions. When the system is the economy, the global ecosystem, or the universe in its entirety, we are stuck in a single history. We cannot step outside that history and look at alternative histories. Nonetheless, we would like to infer something about the laws of the system in general, and especially about the true probability distribution over histories.

Can we discern the system’s laws and true probabilities from observations of specific events? And what kinds of regularities must the system display in order to make this possible? In other words, are there certain “metaphysical prerequisites” that must be in place for scientific inference to work?

To answer these questions, we first consider a very simple example. Here T = {1,2,3,…}, and the system’s state at any time is the outcome of an independent coin toss. So the state space is X = {Heads, Tails}, and each possible history in Ω is one possible Heads/Tails sequence.

Suppose the true conditional probability structure on Ω is induced by the single parameter p, the probability of Heads. In this example, the Law of Large Numbers guarantees that, with probability 1, the limiting frequency of Heads in a given history (as time goes to infinity) will match p. This means that the subset of Ω consisting of “well-behaved” histories has probability 1, where a history is well-behaved if (i) there exists a limiting frequency of Heads for it (i.e., the proportion of Heads converges to a well-defined limit as time goes to infinity) and (ii) that limiting frequency is p. For this reason, we will almost certainly (with probability 1) arrive at the true conditional probability structure on Ω on the basis of observing just a single history and counting the number of Heads and Tails in it.
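The coin-toss claim is easy to check by simulation: a single sufficiently long history almost surely reveals p (illustrative parameters):

```python
import random

random.seed(0)                  # reproducible toy run
p = 0.3                         # true probability of Heads
n = 200_000                     # length of the single observed history
heads = sum(random.random() < p for _ in range(n))
freq = heads / n                # frequency of Heads in this one history
# By the Law of Large Numbers, freq is almost surely close to p.
print(abs(freq - p) < 0.01)     # True: the single history reveals p
```

The sampling error here is of order √(p(1−p)/n) ≈ 0.001, so one history of this length pins down p to two decimal places.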

Does this result generalize? The short answer is “yes”, provided the system’s symmetries are of the right kind. Without suitable symmetries, generalizing from local observations to global laws is not possible. In a slogan, for scientific inference to work, there must be sufficient regularities in the system. In our toy system of the coin tosses, there are. Wigner (1967) recognized this point, taking symmetries to be “a prerequisite for the very possibility of discovering the laws of nature”.

Generally, symmetries allow us to infer general laws from specific observations. For example, let T = {1,2,3,…}, and let Y and Z be two subsets of the state space X. Suppose we have made the observation O: “whenever the state is in the set Y at time 5, there is a 50% probability that it will be in Z at time 6”. Suppose we know, or are justified in hypothesizing, that the system has the set of time symmetries {ψr : r = 1,2,3,….}, with ψr(t) = t + r, as defined in the previous section. Then, from observation O, we can deduce the following general law: “for any t in T, if the state of the system is in the set Y at time t, there is a 50% probability that it will be in Z at time t + 1”.

However, this example still has a problem. It only shows that if we could make observation O, then our generalization would be warranted, provided the system has the relevant symmetries. But the “if” is a big “if”. Recall what observation O says: “whenever the system’s state is in the set Y at time 5, there is a 50% probability that it will be in the set Z at time 6”. Clearly, this statement is only empirically well supported – and thus a real observation rather than a mere hypothesis – if we can make many observations of possible histories at times 5 and 6. We can do this if the system is an experimental apparatus in a lab or a virtual system in a computer, which we are manipulating and observing “from the outside”, and on which we can perform many “runs” of an experiment. But, as noted above, if we are participants in the system, as in the case of the economy, an ecosystem, or the universe at large, we only get to experience times 5 and 6 once, and we only get to experience one possible history. How, then, can we ever assemble a body of evidence that allows us to make statements such as O?

The solution to this problem lies in the property of ergodicity. This is a property that a system may or may not have and that, if present, serves as the desired metaphysical prerequisite for scientific inference. To explain this property, let us give an example. Suppose T = {1,2,3,…}, and the system has all the time symmetries in the set Ψ = {ψr : r = 1,2,3,….}. Heuristically, the symmetries in Ψ can be interpreted as describing the evolution of the system over time. Suppose each time-step corresponds to a day. Then the history h = (a,b,c,d,e,….) describes a situation where today’s state is a, tomorrow’s is b, the next day’s is c, and so on. The transformed history ψ1(h) = (b,c,d,e,f,….) describes a situation where today’s state is b, tomorrow’s is c, the following day’s is d, and so on. Thus, ψ1(h) describes the same “world” as h, but as seen from the perspective of tomorrow. Likewise, ψ2(h) = (c,d,e,f,g,….) describes the same “world” as h, but as seen from the perspective of the day after tomorrow, and so on.
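The action of the shift maps on a history can be made concrete in a few lines. In this sketch a history is (hypothetically) truncated to a finite tuple of states; shifting by r simply drops the first r days:

```python
def shift(h, r):
    """psi_r(h): the same 'world' as h, seen from r days later."""
    return h[r:]

h = ('a', 'b', 'c', 'd', 'e', 'f')
print(shift(h, 1))  # ('b', 'c', 'd', 'e', 'f'): tomorrow's perspective
print(shift(h, 2))  # ('c', 'd', 'e', 'f'): the day after tomorrow
```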

Given the set Ψ of symmetries, an event E (a subset of Ω) is Ψ-invariant if the inverse image of E under ψ is E itself, for all ψ in Ψ. This implies that if a history h is in E, then ψ(h) will also be in E, for all ψ. In effect, if the world is in the set E today, it will remain in E tomorrow, and the day after tomorrow, and so on. Thus, E is a “persistent” event: an event one cannot escape from by moving forward in time. In a coin-tossing system, where Ψ is still the set of time translations, examples of Ψ-invariant events are “all Heads”, where E contains only the history (Heads, Heads, Heads, …), and “all Tails”, where E contains only the history (Tails, Tails, Tails, …).
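Ψ-invariance can be sketched in code, with histories truncated to finite prefixes (an approximation the infinite-history definition does not need): the event “all Heads” is persistent under every shift, while a history containing a Tail lies outside it under every shift.

```python
def shift(h, r):
    """psi_r: drop the first r entries of a (truncated) history."""
    return h[r:]

def in_all_heads(h):
    """Membership in the event 'all Heads', checked on a finite prefix."""
    return all(x == 'H' for x in h)

const_heads = ('H',) * 50   # prefix of (Heads, Heads, Heads, ...)
mixed = ('H', 'T') * 25     # alternating Heads and Tails

# "all Heads" is persistent: shifting a history in E keeps it in E.
print(in_all_heads(shift(const_heads, 7)))   # True
# A history with a Tail in it stays outside the event under this shift.
print(in_all_heads(shift(mixed, 3)))         # False
```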

The system is ergodic (with respect to Ψ) if, for any Ψ-invariant event E, the unconditional probability of E, i.e., PrΩ(E), is either 0 or 1. In other words, the only persistent events are those which occur in almost no history (i.e., PrΩ(E) = 0) and those which occur in almost every history (i.e., PrΩ(E) = 1). Our coin-tossing system is ergodic, as exemplified by the fact that the Ψ-invariant events “all Heads” and “all Tails” occur with probability 0.

In an ergodic system, it is possible to estimate the probability of any event “empirically”, by simply counting the frequency with which that event occurs. Frequencies are thus evidence for probabilities. The formal statement of this is the following important result from the theory of dynamical systems and stochastic processes.

Ergodic Theorem: Suppose the system is ergodic. Let E be any event and let h be any history. For all times t in T, let Nt be the number of elements r in the set {1, 2, …, t} such that ψr(h) is in E. Then, with probability 1, the ratio Nt/t will converge to PrΩ(E) as t increases towards infinity.

Intuitively, Nt is the number of times the event E has “occurred” in history h from time 1 up to time t. The ratio Nt/t is therefore the frequency of occurrence of event E (up to time t) in history h. This frequency might be measured, for example, by performing a sequence of experiments or observations at times 1, 2, …, t. The Ergodic Theorem says that, almost certainly (i.e., with probability 1), the empirical frequency will converge to the true probability of E, PrΩ(E), as the number of observations becomes large. The estimation of the true conditional probability structure from the frequencies of Heads and Tails in our illustrative coin-tossing system is possible precisely because the system is ergodic.
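The convergence the Ergodic Theorem guarantees can be watched numerically. In this sketch (the details are illustrative), the system is the fair coin-tossing system and E is the event “the state at time 1 is Heads”; then ψr(h) lies in E exactly when h(1 + r) is Heads, and the ratio Nt/t approaches PrΩ(E) = 0.5 along a single simulated history.

```python
import random

random.seed(0)

# One long simulated history of fair coin tosses, with times 1, 2, 3, ...
T = 100_000
h = {t: random.choice(['H', 'T']) for t in range(1, T + 2)}

# E = "the state at time 1 is Heads".  The shifted history psi_r(h) is in E
# exactly when h(1 + r) == 'H', so N_t counts Heads at times 2, ..., t + 1.
N = sum(1 for r in range(1, T + 1) if h[1 + r] == 'H')

print(N / T)  # the frequency N_t/t, approximately 0.5 = Pr(E)
```

A single history suffices here precisely because the coin-tossing system is ergodic; no repeated “runs” from the outside are needed.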

To understand the significance of this result, let Y and Z be two subsets of X, and suppose E is the event “h(1) is in Y”, while D is the event “h(2) is in Z”. Then the intersection E ∩ D is the event “h(1) is in Y, and h(2) is in Z”. The Ergodic Theorem says that, by performing a sequence of observations over time, we can empirically estimate PrΩ(E) and PrΩ(E ∩ D) with arbitrarily high precision. Thus, we can compute the ratio PrΩ(E ∩ D)/PrΩ(E). But this ratio is simply the conditional probability PrE(D). And so, we are able to estimate the conditional probability that the state at time 2 will be in Z, given that at time 1 it was in Y. This illustrates that, by allowing us to estimate unconditional probabilities empirically, the Ergodic Theorem also allows us to estimate conditional probabilities, and in this way to learn the properties of the conditional probability structure {PrE}E⊆Ω.
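The same single-history estimation works for the conditional probability PrE(D). The sketch below takes Y = Z = {Heads}, an illustrative choice not fixed by the text: the ratio of the empirical frequency of E ∩ D to that of E recovers the conditional probability, which for independent fair tosses is 0.5.

```python
import random

random.seed(1)

# A single long history h(1), h(2), ... of fair coin tosses.
T = 100_000
h = [None] + [random.choice(['H', 'T']) for _ in range(T + 1)]  # h[t], t >= 1

# Frequencies, along one history, of E = "state in Y at time t" and of
# E intersect D = "state in Y at time t and in Z at time t + 1",
# with the illustrative choice Y = Z = {'H'}.
n_E  = sum(1 for t in range(1, T + 1) if h[t] == 'H')
n_ED = sum(1 for t in range(1, T + 1) if h[t] == 'H' and h[t + 1] == 'H')

print(n_ED / n_E)  # approx. Pr(E intersect D)/Pr(E) = Pr(D | E) = 0.5
```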

We may thus conclude that ergodicity is what allows us to generalize from local observations to global laws. In effect, when we engage in scientific inference about some system, or even about the world at large, we rely on the hypothesis that this system, or the world, is ergodic. If our system, or the world, were “dappled”, then presumably we would not be able to presuppose ergodicity, and hence our ability to make scientific generalizations would be compromised.