The Fallacy of Deviant Analyticity. Thought of the Day 128.0


Carnap’s thesis of pluralism in mathematics is quite radical. We are told that “any postulates and any rules of inference [may] be chosen arbitrarily”; for example, the question of whether the Principle of Selection (that is, the Axiom of Choice (AC)) should be admitted is “purely one of expedience” (Logical Syntax of Language); more generally,

The [logico-mathematical sentences] are, from the point of view of material interpretation, expedients for the purpose of operating with the [descriptive sentences]. Thus, in laying down [a logico-mathematical sentence] as a primitive sentence, only usefulness for this purpose is to be taken into consideration.

So the pluralism is quite broad – it extends to AC and even to Π01-sentences. There are problems in maintaining Π01-pluralism. One cannot, on pain of inconsistency, think that statements about consistency are not “mere matters of expedience” without thinking that Π01-statements generally are not mere “matters of expedience”. The question of whether a given Π01-sentence holds is not a mere matter of expedience; rather, such questions fall within the province of theoretical reason. One reason is that in adopting a Π01-sentence one could always be struck by a counter-example. Other reasons have to do with the clarity of our conception of the natural numbers and with our experience to date with that structure. On this basis, for no sentence of first-order arithmetic is the question of whether it holds a mere matter of expedience. Certainly this is the default view from which one must be moved.

What does Carnap have to say that will sway us from the default view, and lead us to embrace his radical form of pluralism? In approaching this question it is important to bear in mind that there are two general interpretations of Carnap. According to the first interpretation – the substantive – Carnap is really trying to argue for the pluralist conception. According to the second interpretation – the non-substantive – he is merely trying to persuade us of it, that is, to show that of all the options it is most “expedient”.

The most obvious approach to securing pluralism is to appeal to the work on analyticity and content. For if mathematical truths are without content and, moreover, this claim can be maintained with respect to an arbitrary mathematical system, then one could argue that even apparently incompatible systems have null content and hence are really compatible (since there is no contentual-conflict).

Now, in order for this to secure radical pluralism, Carnap would first have to secure his claim that mathematical truths are without content. But he has not done so. Instead, he has merely provided us with a piece of technical machinery that can be used to articulate any one of a number of views concerning mathematical content, and he has adjusted the parameters so as to articulate his particular view. So he has not secured the thesis of radical pluralism. Thus, on the substantive interpretation, Carnap has failed to achieve his end.

This leaves us with the non-substantive interpretation. There are a number of problems that arise for this version of Carnap. To begin with, Carnap’s technical machinery is not even suitable for articulating his thesis of radical pluralism, since (using either the definition of analyticity for Language I or that for Language II) there is no metalanguage in which one can say that two apparently incompatible systems S1 and S2 both have null content and hence are really contentually compatible. To fix ideas, consider a paradigm case of an apparent conflict that we should like to dissolve by saying that there is no contentual-conflict: let S1 = PA + φ and S2 = PA + ¬φ, where φ is any arithmetical sentence, and let the metatheory be MT = ZFC. The trouble is that on the approach of Language I, although in MT we can prove that each system is ω-complete (which is a start, since we wish to say that each system has null content), we can also prove that one has null content while the other has total content (that is, in ω-logic, every sentence of arithmetic is a consequence). So we cannot, within MT, articulate the idea that there is no contentual-conflict.

The approach of Language II involves a complementary problem. To see this, note that while a strong logic like ω-logic is something that one can apply to a formal system, a truth definition is something that applies to a language (in our modern sense). Thus, on this approach, the definition of analyticity given in MT for S1 and S2 is the same (since the two systems are couched in the same language). So, although in MT we can say that S1 and S2 do not have a contentual-conflict, this is only because we have given a deviant definition of analyticity, one that is blind to the fact that, in a very straightforward sense, φ is analytic in S1 while ¬φ is analytic in S2.

Now, although Carnap’s machinery is not adequate to articulate the thesis of radical pluralism in a given metatheory, under certain circumstances he can attempt to articulate the thesis by changing the metatheory. For example, let S1 = PA + Con(ZF + AD) and S2 = PA + ¬Con(ZF + AD), and suppose we wish to articulate both the idea that the two systems have null content and the idea that Con(ZF + AD) is analytic in S1 while ¬Con(ZF + AD) is analytic in S2. No single metatheory (on either of Carnap’s approaches) can do this. But it turns out that, because of this kind of assessment sensitivity, there are two metatheories MT1 and MT2 such that in MT1 we can say both that S1 has null content and that Con(ZF + AD) is analytic in S1, while in MT2 we can say both that S2 has null content and that ¬Con(ZF + AD) is analytic in S2. But, of course, this is simply because (any such metatheory) MT1 proves Con(ZF + AD) and (any such metatheory) MT2 proves ¬Con(ZF + AD). So we have done no more than reflect the difference between the systems in the metatheories. Thus, although Carnap does not have a way of articulating his radical pluralism (in a given metalanguage), he certainly has a way of manifesting it (by making corresponding changes in his metatheories).

As a final retreat Carnap might say that he is not trying to persuade us of a thesis that (concerning a collection of systems) can be articulated in a given framework, but rather is trying to persuade us to adopt a thoroughgoing radical pluralism as a “way of life”. He has certainly shown us how we can make the requisite adjustments in our metatheory so as to consistently manifest radical pluralism. But does this amount to more than an algorithm for begging the question? Has Carnap shown us that there is no question to beg? He has not said anything persuasive in favour of embracing a thoroughgoing radical pluralism as the “most expedient” of the options. The trouble with Carnap’s entire approach is that the question of pluralism has been detached from actual developments in mathematics.


Valencies of Predicates. Thought of the Day 125.0

Naturalizing semiotics – the triadic sign of Charles Sanders Peirce

Since icons are the means of representing qualities, they generally constitute the predicative side of more complicated signs:

The only way of directly communicating an idea is by means of an icon; and every indirect method of communicating an idea must depend for its establishment upon the use of an icon. Hence, every assertion must contain an icon or set of icons, or else must contain signs whose meaning is only explicable by icons. The idea which the set of icons (or the equivalent of a set of icons) contained in an assertion signifies may be termed the predicate of the assertion. (Collected Papers of Charles Sanders Peirce)

Thus, the predicate in logic, as well as in ordinary language, is essentially iconic. It is important to remember here Peirce’s generalization of the predicate beyond the traditional subject-copula-predicate structure. Predicates exist with more than one subject slot; this is the basis for Peirce’s logic of relatives, and it permits at the same time enlarging the scope of logic considerably and bringing it closer to ordinary language, where several-slot predicates prevail, for instance in all verbs with a valency larger than one. In his definition of these predicates by means of valency, that is, the number of empty slots into which subjects, or more generally indices, may be inserted, Peirce is actually the founder of valency grammar in the tradition of Tesnière. So, for instance, the structure ‘_ gives _ to _’, where the underlinings refer to slots, is a trivalent predicate. Thus, the word classes associated with predicates are not only adjectives, but also verbs and common nouns; in short, all descriptive features in language are predicative.

This entails that the similarity charted in icons covers more complicated cases than the ordinary use of the word ‘similarity’ would suggest. Thus,

where ordinary logic considers only a single, special kind of relation, that of similarity, – a relation, too, of a particularly featureless and insignificant kind, the logic of relatives imagines a relation in general to be placed. Consequently, in place of the class, which is composed of a number of individual objects or facts brought together by means of their relation of similarity, the logic of relatives considers the system, which is composed of objects brought together by any kind of relations whatsoever. (The New Elements of Mathematics)

This allows for abstract similarity because one phenomenon may be similar to another in so far as both of them partake in the same relation, or more generally, in the same system – relations and systems being complicated predicates.

But it is not only more abstract features that may thus act as the qualities invoked in an icon; these qualities may also be of widely varying generality:

But instead of a single icon, or sign by resemblance of a familiar image or ‘dream’, evocable at will, there may be a complexus of such icons, forming a composite image of which the whole is not familiar. But though the whole is not familiar, yet not only are the parts familiar images, but there will also be a familiar image in its mode of composition. […] The sort of idea which an icon embodies, if it be such that it can convey any positive information, being applicable to some things but not to others, is called a first intention. The idea embodied by an icon, which cannot of itself convey any information, being applicable to everything or nothing, but which may, nevertheless, be useful in modifying other icons, is called a second intention.

What Peirce distinguishes in these scholastic standard notions, borrowed from Aquinas via Scotus, is, in fact, the difference between Husserlian formal and material ontology. Formal qualities like genus, species, dependencies, quantities, spatial and temporal extension, and so on are of course attributable to any phenomenon and do not as such, in themselves, convey any information, in so far as they are always instantiated in, and thus, like other Second Intentions, in the Husserlian manner dependent upon, First Intentions; but they are nevertheless indispensable in the composition of first-intentional descriptions. The fact that a certain phenomenon is composed of parts, has a form, belongs to a species, has an extension, has been mentioned in a sentence, etc., does not convey the slightest information about it until it is specified, by means of first-intentional icons, which parts in which composition, which species, which form, and so on. Thus, here Peirce makes a hierarchy of icons, which we could call material and formal, respectively, in which the latter are dependent on the former. One may note in passing that the distinctions in Peirce’s semiotics are themselves built upon such Second Intentions; thus it is no wonder that every sign must possess some iconic element. Furthermore, the very anatomy of the proposition becomes, just as in Husserlian rational grammar, a question of formal, synthetic a priori regularities.

Among Peirce’s forms of inference, similarity plays a certain role within abduction, his term for a ‘qualified guess’ in which a particular fact gives rise to the formation of a hypothesis which would have the fact in question as a consequence. Many different such hypotheses are of course possible for a given fact, and this inference is not necessary, but merely possible, suggestive. Precisely for this reason, similarity plays a seminal role here: an

originary Argument, or Abduction, is an argument which presents facts in its Premiss which present a similarity to the fact stated in the conclusion, but which could perfectly well be true without the latter being so.

The hypothesis proposed is abducted by some sort of iconic relation to the fact to be explained. Thus, similarity is the very source of new ideas – which must subsequently be controlled deductively and inductively, to be sure. But iconicity does not only play this role in the contents of abductive inference; it plays an even more important role in the very form of logical inference in general:

Given a conventional or other general sign of an object, to deduce any other truth than that which it explicitly signifies, it is necessary, in all cases, to replace that sign by an icon. This capacity of revealing unexpected truth is precisely that wherein the utility of algebraic formulae consists, so that the iconic character is the prevailing one.

The very form of inferences depends on it being an icon; thus for Peirce the syllogistic schema inherent in reasoning has an iconic character:

‘Whenever one thing suggests another, both are together in the mind for an instant. […] every proposition like the premiss, that is having an icon like it, would involve […] a proposition related to it as the conclusion […]’. Thus, first and foremost deduction is an icon: ‘I suppose it would be the general opinion of logicians, as it certainly was long mine, that the Syllogism is a Symbol, because of its Generality.’ […] The truth, however, appears to be that all deductive reasoning, even simple syllogism, involves an element of observation; namely, deduction consists in constructing an icon or diagram the relation of whose parts shall present a complete analogy with those of the parts of the objects of reasoning, of experimenting upon this image in the imagination, and of observing the result so as to discover unnoticed and hidden relations among the parts.

It is then no wonder that synthetic a priori truths exist – even if Peirce prefers notions like ‘observable, universal truths’ – for the result of a deduction may contain more than what is immediately present in the premises, due to the iconic quality of the inference.

Belief Networks “Acyclicity”. Thought of the Day 69.0

Belief networks are used to model uncertainty in a domain. The term “belief networks” encompasses a whole range of different but related techniques which deal with reasoning under uncertainty. Both quantitative (mainly Bayesian probabilistic) and qualitative techniques are used. Belief networks are used to develop knowledge-based applications in domains characterised by inherent uncertainty, and increasingly they are being employed to deliver advanced knowledge-based systems that solve real-world problems. They are particularly useful for diagnostic applications and have been used in many deployed systems; the free-text help facility in Microsoft Office, for example, employs Bayesian belief network technology. Within a belief network, the belief of each node (the node’s conditional probability) is calculated on the basis of observed evidence, and various methods have been developed for evaluating node beliefs and for performing probabilistic inference.

Influence diagrams are an extension of belief networks, used for decision making. They provide facilities for structuring the goals of a diagnosis and for ascertaining the value (the influence) that given information will have in determining a diagnosis. In influence diagrams there are three types of node: chance nodes, which correspond to the nodes in Bayesian belief networks; utility nodes, which represent the utilities of decisions; and decision nodes, which represent decisions which can be taken to influence the state of the world. Influence diagrams are useful in real-world applications, where there is often a cost, in both time and money, in obtaining information.

The basic idea in belief networks is that the problem domain is modelled as a set of nodes interconnected with arcs to form a directed acyclic graph. Each node represents a random variable, or uncertain quantity, which can take two or more possible values. The arcs signify the existence of direct influences between the linked variables, and the strength of each influence is quantified by a forward conditional probability.

The Belief Network, also called the Bayesian Network, is a directed acyclic graph for probabilistic reasoning. It defines the conditional dependencies of the model by associating each node X with a conditional probability P(X|Pa(X)), where Pa(X) denotes the parents of X. Here are two of its conditional independence properties (a minimal sketch of the underlying factorization follows the list):

1. Each node is conditionally independent of its non-descendants given its parents.

2. Each node is conditionally independent of all other nodes given its Markov blanket, which consists of its parents, children, and children’s parents.
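To make the factorization concrete, here is a minimal sketch in Python: the joint probability of a full assignment is the product, over the nodes, of each node’s conditional probability given its parents. The three-node rain/sprinkler/wet-grass network and its numbers are illustrative assumptions of this sketch, not taken from the text above.

```python
# Minimal belief network sketch: joint probability as a product of
# conditional probabilities P(X | Pa(X)) over a DAG.
# The network and its numbers are illustrative assumptions.

# P(Rain), P(Sprinkler | Rain), P(WetGrass | Rain, Sprinkler)
p_rain = {True: 0.2, False: 0.8}
p_sprinkler = {True: {True: 0.01, False: 0.99},   # given Rain=True
               False: {True: 0.40, False: 0.60}}  # given Rain=False
p_wet = {(True, True): 0.99, (True, False): 0.80,
         (False, True): 0.90, (False, False): 0.00}  # keyed by (Rain, Sprinkler)

def joint(rain, sprinkler, wet):
    """P(rain, sprinkler, wet) = P(rain) * P(sprinkler|rain) * P(wet|rain,sprinkler)."""
    p_w = p_wet[(rain, sprinkler)]
    return (p_rain[rain]
            * p_sprinkler[rain][sprinkler]
            * (p_w if wet else 1.0 - p_w))

# Sanity check: the joint sums to 1 over all eight assignments.
total = sum(joint(r, s, w)
            for r in (True, False)
            for s in (True, False)
            for w in (True, False))
print(total)  # 1.0 (up to floating-point rounding)
```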

Inference in a Belief Network is the computation of the posterior probability distribution

P(H|V) = P(H,V) / ∑H P(H,V)

where H is the set of query variables and V is the set of evidence variables. Approximate inference involves sampling to compute posteriors. The Sigmoid Belief Network is a type of Belief Network such that

P(Xi = 1 | Pa(Xi)) = σ(∑Xj ∈ Pa(Xi) Wji Xj + bi)

where Wji is the weight assigned to the edge from Xj to Xi, and σ is the sigmoid function.
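As a sketch of the two formulas above, the following Python snippet samples a tiny Sigmoid Belief Network in topological order and then estimates a posterior P(H|V) by simple rejection sampling, one standard sampling-based route to approximate inference. The two-node chain, the weight and bias values, and the sample count are illustrative assumptions.

```python
import math, random

random.seed(0)

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# A two-node sigmoid belief network X1 -> X2 (weights/biases are assumptions).
b1, b2, w12 = 0.0, -1.0, 2.0  # X1 has no parents; X2 depends on X1

def sample_network():
    x1 = 1 if random.random() < sigmoid(b1) else 0
    x2 = 1 if random.random() < sigmoid(w12 * x1 + b2) else 0
    return x1, x2

# Estimate P(X1 = 1 | X2 = 1) by rejection sampling: keep only samples
# matching the evidence, then take the frequency of the query variable.
kept, hits = 0, 0
for _ in range(200_000):
    x1, x2 = sample_network()
    if x2 == 1:          # evidence V: X2 = 1
        kept += 1
        hits += x1       # query H: X1 = 1
print(hits / kept)       # approximates P(H,V) / sum_H P(H,V); about 0.73 here
```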


ε-calculus and Hilbert’s Contentual Number Theory: Proselytizing Intuitionism. Thought of the Day 67.0


Hilbert came to reject Russell’s logicist solution to the consistency problem for arithmetic, mainly for the reason that the axiom of reducibility cannot be accepted as a purely logical axiom. He concluded that the aim of reducing set theory, and with it the usual methods of analysis, to logic has not been achieved today and maybe cannot be achieved at all. At the same time, Brouwer’s intuitionist mathematics gained currency. In particular, Hilbert’s former student Hermann Weyl converted to intuitionism.

According to Hilbert, there is a privileged part of mathematics, contentual elementary number theory, which relies only on a “purely intuitive basis of concrete signs.” Whereas operating with abstract concepts was considered “inadequate and uncertain,” there is a realm of extra-logical discrete objects, which exist intuitively as immediate experience before all thought. If logical inference is to be certain, then these objects must be capable of being completely surveyed in all their parts, and their presentation, their difference, their succession (like the objects themselves) must exist for us immediately, intuitively, as something which cannot be reduced to something else.

The objects in question are signs, both numerals and the signs that make up formulas and formal proofs. The domain of contentual number theory consists in the finitary numerals, i.e., sequences of strokes. These have no meaning, i.e., they do not stand for abstract objects, but they can be operated on (e.g., concatenated) and compared. Knowledge of their properties and relations is intuitive and unmediated by logical inference. Contentual number theory developed this way is secure, according to Hilbert: no contradictions can arise simply because there is no logical structure in the propositions of contentual number theory. The intuitive-contentual operations with signs form the basis of Hilbert’s meta-mathematics. Just as contentual number theory operates with sequences of strokes, so meta-mathematics operates with sequences of symbols (formulas, proofs). Formulas and proofs can be syntactically manipulated, and the properties and relationships of formulas and proofs are similarly based in a logic-free intuitive capacity which guarantees certainty of knowledge about formulas and proofs arrived at by such syntactic operations. Mathematics itself, however, operates with abstract concepts, e.g., quantifiers, sets, functions, and uses logical inference based on principles such as mathematical induction or the principle of the excluded middle. These “concept-formations” and modes of reasoning had been criticized by Brouwer and others on grounds that they presuppose infinite totalities as given, or that they involve impredicative definitions. Hilbert’s aim was to justify their use. To this end, he pointed out that they can be formalized in axiomatic systems (such as that of Principia or those developed by Hilbert himself), and mathematical propositions and proofs thus turn into formulas and derivations from axioms according to strictly circumscribed rules of derivation. Mathematics, to Hilbert, “becomes an inventory of provable formulas.” In this way the proofs of mathematics are subject to metamathematical, contentual investigation. The goal of Hilbert is then to give a contentual, meta-mathematical proof that there can be no derivation of a contradiction, i.e., no formal derivation of a formula A and of its negation ¬A.

Hilbert and Bernays developed the ε-calculus as their definitive formalism for axiom systems for arithmetic and analysis, and the so-called ε-substitution method as the preferred approach to giving consistency proofs. Briefly, the ε-calculus is a formalism that includes ε as a term-forming operator. If A(x) is a formula, then εxA(x) is a term, which intuitively stands for a witness for A(x). In a logical formalism containing the ε-operator, the quantifiers can be defined by: ∃x A(x) ≡ A(εxA(x)) and ∀x A(x) ≡ A(εx¬A(x)). The only additional axiom necessary is the so-called “transfinite axiom,” A(t) → A(εxA(x)). Based on this idea, Hilbert and his collaborators developed axiomatizations of number theory and analysis. Consistency proofs for these systems were then given using the ε-substitution method. The idea of this method is, roughly, that the ε-terms εxA(x) occurring in a formal proof are replaced by actual numerals, resulting in a quantifier-free proof. Suppose we had a (suitably normalized) derivation of 0 = 1 that contains only one ε-term εxA(x). Replace all occurrences of εxA(x) by 0. The instances of the transfinite axiom then are all of the form A(t) → A(0). Since no other ε-terms occur in the proof, A(t) and A(0) are basic numerical formulas without quantifiers and, we may assume, also without free variables. So they can be evaluated by finitary calculation. If all such instances turn out to be true numerical formulas, we are done. If not, this must be because A(t) is true for some t, and A(0) is false. Then replace εxA(x) instead by n, where n is the numerical value of the term t. The resulting proof is then seen to be a derivation of 0 = 1 from true, purely numerical formulas using only modus ponens, and this is impossible. Indeed, the procedure works with only slight modifications even in the presence of the induction axiom, which in the ε-calculus takes the form of a least number principle: A(t) → εxA(x) ≤ t, which intuitively requires εxA(x) to be the least witness for A(x).
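The substitution procedure for the single-ε-term case can be mimicked in a few lines of Python. This is only a toy sketch under simplifying assumptions: a decidable predicate A on the naturals stands in for the basic numerical formula, and a list of numerical values stands in for the closed terms t occurring in instances of the transfinite axiom A(t) → A(εxA(x)).

```python
# Toy model of the epsilon-substitution method for one epsilon-term.
# Assumptions: A is a decidable predicate on the naturals, and `terms`
# lists the numerical values of the closed terms t appearing in
# instances of the transfinite axiom A(t) -> A(eps_x A(x)).

def epsilon_substitution(A, terms):
    candidate = 0  # first trial value for eps_x A(x)
    while True:
        # Look for a false instance: A(t) true but A(candidate) false.
        witness = next((t for t in terms if A(t) and not A(candidate)), None)
        if witness is None:
            return candidate  # every instance A(t) -> A(candidate) is true
        candidate = witness   # retry with a genuine witness for A

# Hypothetical example: A(x) says "x is even and positive".
A = lambda x: x > 0 and x % 2 == 0
print(epsilon_substitution(A, [3, 7, 10, 5]))  # 10: a true witness for A
```

With a single ε-term the loop terminates after at most one correction, exactly as in the argument sketched above: either all instances already come out true with the trial value 0, or some A(t) is true and substituting its value makes every instance true.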

Conjuncted: Ergodicity. Thought of the Day 51.1


When we scientifically investigate a system, we cannot normally observe all possible histories in Ω, or directly access the conditional probability structure {PrE}E⊆Ω. Instead, we can only observe specific events. Conducting many “runs” of the same experiment is an attempt to observe as many histories of a system as possible, but even the best experimental design rarely allows us to observe all histories or to read off the full conditional probability structure. Furthermore, this strategy works only for smaller systems that we can isolate in laboratory conditions. When the system is the economy, the global ecosystem, or the universe in its entirety, we are stuck in a single history. We cannot step outside that history and look at alternative histories. Nonetheless, we would like to infer something about the laws of the system in general, and especially about the true probability distribution over histories.

Can we discern the system’s laws and true probabilities from observations of specific events? And what kinds of regularities must the system display in order to make this possible? In other words, are there certain “metaphysical prerequisites” that must be in place for scientific inference to work?

To answer these questions, we first consider a very simple example. Here T = {1,2,3,…}, and the system’s state at any time is the outcome of an independent coin toss. So the state space is X = {Heads, Tails}, and each possible history in Ω is one possible Heads/Tails sequence.

Suppose the true conditional probability structure on Ω is induced by the single parameter p, the probability of Heads. In this example, the Law of Large Numbers guarantees that, with probability 1, the limiting frequency of Heads in a given history (as time goes to infinity) will match p. This means that the subset of Ω consisting of “well-behaved” histories has probability 1, where a history is well-behaved if (i) there exists a limiting frequency of Heads for it (i.e., the proportion of Heads converges to a well-defined limit as time goes to infinity) and (ii) that limiting frequency is p. For this reason, we will almost certainly (with probability 1) arrive at the true conditional probability structure on Ω on the basis of observing just a single history and counting the number of Heads and Tails in it.
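A minimal simulation illustrates the point: from one sufficiently long history we recover p simply by counting. The true parameter p = 0.6 and the history length are arbitrary assumptions of this sketch.

```python
import random

random.seed(1)

p_true = 0.6                      # assumed true probability of Heads
history = [random.random() < p_true for _ in range(100_000)]

# Estimate p from this single history by counting Heads.
p_hat = sum(history) / len(history)
print(p_hat)                      # close to 0.6, per the Law of Large Numbers
```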

Does this result generalize? The short answer is “yes”, provided the system’s symmetries are of the right kind. Without suitable symmetries, generalizing from local observations to global laws is not possible. In a slogan, for scientific inference to work, there must be sufficient regularities in the system. In our toy system of the coin tosses, there are. Wigner (1967) recognized this point, taking symmetries to be “a prerequisite for the very possibility of discovering the laws of nature”.

Generally, symmetries allow us to infer general laws from specific observations. For example, let T = {1,2,3,…}, and let Y and Z be two subsets of the state space X. Suppose we have made the observation O: “whenever the state is in the set Y at time 5, there is a 50% probability that it will be in Z at time 6”. Suppose we know, or are justified in hypothesizing, that the system has the set of time symmetries {ψr : r = 1,2,3,….}, with ψr(t) = t + r, as defined in the previous section. Then, from observation O, we can deduce the following general law: “for any t in T, if the state of the system is in the set Y at time t, there is a 50% probability that it will be in Z at time t + 1”.

However, this example still has a problem. It only shows that if we could make observation O, then our generalization would be warranted, provided the system has the relevant symmetries. But the “if” is a big “if”. Recall what observation O says: “whenever the system’s state is in the set Y at time 5, there is a 50% probability that it will be in the set Z at time 6”. Clearly, this statement is only empirically well supported – and thus a real observation rather than a mere hypothesis – if we can make many observations of possible histories at times 5 and 6. We can do this if the system is an experimental apparatus in a lab or a virtual system in a computer, which we are manipulating and observing “from the outside”, and on which we can perform many “runs” of an experiment. But, as noted above, if we are participants in the system, as in the case of the economy, an ecosystem, or the universe at large, we only get to experience times 5 and 6 once, and we only get to experience one possible history. How, then, can we ever assemble a body of evidence that allows us to make statements such as O?

The solution to this problem lies in the property of ergodicity. This is a property that a system may or may not have and that, if present, serves as the desired metaphysical prerequisite for scientific inference. To explain this property, let us give an example. Suppose T = {1,2,3,…}, and the system has all the time symmetries in the set Ψ = {ψr : r = 1,2,3,….}. Heuristically, the symmetries in Ψ can be interpreted as describing the evolution of the system over time. Suppose each time-step corresponds to a day. Then the history h = (a,b,c,d,e,….) describes a situation where today’s state is a, tomorrow’s is b, the next day’s is c, and so on. The transformed history ψ1(h) = (b,c,d,e,f,….) describes a situation where today’s state is b, tomorrow’s is c, the following day’s is d, and so on. Thus, ψ1(h) describes the same “world” as h, but as seen from the perspective of tomorrow. Likewise, ψ2(h) = (c,d,e,f,g,….) describes the same “world” as h, but as seen from the perspective of the day after tomorrow, and so on.

Given the set Ψ of symmetries, an event E (a subset of Ω) is Ψ-invariant if the inverse image of E under ψ is E itself, for all ψ in Ψ. This implies that if a history h is in E, then ψ(h) will also be in E, for all ψ. In effect, if the world is in the set E today, it will remain in E tomorrow, and the day after tomorrow, and so on. Thus, E is a “persistent” event: an event one cannot escape from by moving forward in time. In a coin-tossing system, where Ψ is still the set of time translations, examples of Ψ-invariant events are “all Heads”, where E contains only the history (Heads, Heads, Heads, …), and “all Tails”, where E contains only the history (Tails, Tails, Tails, …).

The system is ergodic (with respect to Ψ) if, for any Ψ-invariant event E, the unconditional probability of E, i.e., PrΩ(E), is either 0 or 1. In other words, the only persistent events are those which occur in almost no history (i.e., PrΩ(E) = 0) and those which occur in almost every history (i.e., PrΩ(E) = 1). Our coin-tossing system is ergodic, as exemplified by the fact that the Ψ-invariant events “all Heads” and “all Tails” occur with probability 0.

In an ergodic system, it is possible to estimate the probability of any event “empirically”, by simply counting the frequency with which that event occurs. Frequencies are thus evidence for probabilities. The formal statement of this is the following important result from the theory of dynamical systems and stochastic processes.

Ergodic Theorem: Suppose the system is ergodic. Let E be any event and let h be any history. For all times t in T, let Nt be the number of elements r in the set {1, 2, …, t} such that ψr(h) is in E. Then, with probability 1, the ratio Nt/t will converge to PrΩ(E) as t increases towards infinity.

Intuitively, Nt is the number of times the event E has “occurred” in history h from time 1 up to time t. The ratio Nt/t is therefore the frequency of occurrence of event E (up to time t) in history h. This frequency might be measured, for example, by performing a sequence of experiments or observations at times 1, 2, …, t. The Ergodic Theorem says that, almost certainly (i.e., with probability 1), the empirical frequency will converge to the true probability of E, PrΩ(E), as the number of observations becomes large. The estimation of the true conditional probability structure from the frequencies of Heads and Tails in our illustrative coin-tossing system is possible precisely because the system is ergodic.

To understand the significance of this result, let Y and Z be two subsets of X, and suppose E is the event “h(1) is in Y”, while D is the event “h(2) is in Z”. Then the intersection E ∩ D is the event “h(1) is in Y, and h(2) is in Z”. The Ergodic Theorem says that, by performing a sequence of observations over time, we can empirically estimate PrΩ(E) and PrΩ(E ∩ D) with arbitrarily high precision. Thus, we can compute the ratio PrΩ(E ∩ D)/PrΩ(E). But this ratio is simply the conditional probability PrΕ(D). And so, we are able to estimate the conditional probability that the state at time 2 will be in Z, given that at time 1 it was in Y. This illustrates that, by allowing us to estimate unconditional probabilities empirically, the Ergodic Theorem also allows us to estimate conditional probabilities, and in this way to learn the properties of the conditional probability structure {PrE}E⊆Ω.
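The following Python sketch plays out this estimation strategy on an ergodic two-state Markov chain standing in for the system (the transition probabilities are illustrative assumptions): from a single long history h, each time-shift ψr(h) supplies one observation of the events E and E ∩ D, and the ratio of the resulting frequencies estimates the conditional probability PrE(D).

```python
import random

random.seed(2)

# An ergodic two-state Markov chain on X = {0, 1} (transition probs assumed).
P = {0: 0.3, 1: 0.8}   # P[x] = probability that the next state is 1, given x

T = 500_000
h = [0]
for _ in range(T):
    h.append(1 if random.random() < P[h[-1]] else 0)

# Event E: "state at time 1 is in Y = {1}"; event D: "state at time 2 is in Z = {1}".
# Under the shift psi_r, "time 1 of psi_r(h)" is time 1 + r of h, so each r
# gives one observation of E (and of E-and-D) along the single history h.
N_E  = sum(1 for r in range(T - 1) if h[r] == 1)
N_ED = sum(1 for r in range(T - 1) if h[r] == 1 and h[r + 1] == 1)

print(N_E / (T - 1))    # time average, ~ Pr(E): stationary prob of state 1 (0.6)
print(N_ED / N_E)       # ~ Pr_E(D): should approach P[1] = 0.8
```

The two printed numbers are exactly the quantities the Ergodic Theorem licenses us to read off a single history: the unconditional probability PrΩ(E) as a limiting time average, and the conditional probability PrE(D) as the ratio PrΩ(E ∩ D)/PrΩ(E).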

We may thus conclude that ergodicity is what allows us to generalize from local observations to global laws. In effect, when we engage in scientific inference about some system, or even about the world at large, we rely on the hypothesis that this system, or the world, is ergodic. If our system, or the world, were “dappled”, then presumably we would not be able to presuppose ergodicity, and hence our ability to make scientific generalizations would be compromised.