# Conjuncted: Ergodicity. Thought of the Day 51.1

When we scientifically investigate a system, we cannot normally observe all possible histories in Ω, or directly access the conditional probability structure $\{\Pr_E\}_{E \subseteq \Omega}$. Instead, we can only observe specific events. Conducting many “runs” of the same experiment is an attempt to observe as many histories of a system as possible, but even the best experimental design rarely allows us to observe all histories or to read off the full conditional probability structure. Furthermore, this strategy works only for smaller systems that we can isolate in laboratory conditions. When the system is the economy, the global ecosystem, or the universe in its entirety, we are stuck in a single history. We cannot step outside that history and look at alternative histories. Nonetheless, we would like to infer something about the laws of the system in general, and especially about the true probability distribution over histories.

Can we discern the system’s laws and true probabilities from observations of specific events? And what kinds of regularities must the system display in order to make this possible? In other words, are there certain “metaphysical prerequisites” that must be in place for scientific inference to work?

To answer these questions, we first consider a very simple example. Here T = {1,2,3,…}, and the system’s state at any time is the outcome of an independent coin toss. So the state space is X = {Heads, Tails}, and each possible history in Ω is one possible Heads/Tails sequence.

Suppose the true conditional probability structure on Ω is induced by the single parameter p, the probability of Heads. In this example, the Law of Large Numbers guarantees that, with probability 1, the limiting frequency of Heads in a given history (as time goes to infinity) will match p. This means that the subset of Ω consisting of “well-behaved” histories has probability 1, where a history is well-behaved if (i) there exists a limiting frequency of Heads for it (i.e., the proportion of Heads converges to a well-defined limit as time goes to infinity) and (ii) that limiting frequency is p. For this reason, we will almost certainly (with probability 1) arrive at the true conditional probability structure on Ω on the basis of observing just a single history and counting the number of Heads and Tails in it.
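The coin-tossing claim can be checked numerically. The following is a minimal sketch, not part of the original argument: the function name `heads_frequency`, the value p = 0.3 and the seed are illustrative assumptions. It simulates a single history and reports the proportion of Heads in ever longer prefixes of it.

```python
import random

def heads_frequency(p, n_tosses, seed=0):
    """Proportion of Heads among the first n_tosses of one simulated history.

    p is the (assumed) true probability of Heads; the seed fixes the history,
    so calls with the same seed look at prefixes of the same history.
    """
    rng = random.Random(seed)
    heads = sum(1 for _ in range(n_tosses) if rng.random() < p)
    return heads / n_tosses

if __name__ == "__main__":
    p = 0.3  # illustrative value, unknown to the "observer"
    for n in (10, 1_000, 100_000):
        # Longer prefixes of the same history should give frequencies closer to p.
        print(n, heads_frequency(p, n))
```

Because the seed is held fixed, the three printed frequencies come from one and the same history, which mirrors the situation of an observer who is stuck in a single history.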

Does this result generalize? The short answer is “yes”, provided the system’s symmetries are of the right kind. Without suitable symmetries, generalizing from local observations to global laws is not possible. In a slogan, for scientific inference to work, there must be sufficient regularities in the system. In our toy system of the coin tosses, there are. Wigner (1967) recognized this point, taking symmetries to be “a prerequisite for the very possibility of discovering the laws of nature”.

Generally, symmetries allow us to infer general laws from specific observations. For example, let T = {1,2,3,…}, and let Y and Z be two subsets of the state space X. Suppose we have made the observation O: “whenever the state is in the set Y at time 5, there is a 50% probability that it will be in Z at time 6”. Suppose we know, or are justified in hypothesizing, that the system has the set of time symmetries $\{\psi_r : r = 1, 2, 3, \dots\}$, with $\psi_r(t) = t + r$, as defined in the previous section. Then, from observation O, we can deduce the following general law: “for any t in T, if the state of the system is in the set Y at time t, there is a 50% probability that it will be in Z at time t + 1”.

However, this example still has a problem. It only shows that if we could make observation O, then our generalization would be warranted, provided the system has the relevant symmetries. But the “if” is a big “if”. Recall what observation O says: “whenever the system’s state is in the set Y at time 5, there is a 50% probability that it will be in the set Z at time 6”. Clearly, this statement is only empirically well supported – and thus a real observation rather than a mere hypothesis – if we can make many observations of possible histories at times 5 and 6. We can do this if the system is an experimental apparatus in a lab or a virtual system in a computer, which we are manipulating and observing “from the outside”, and on which we can perform many “runs” of an experiment. But, as noted above, if we are participants in the system, as in the case of the economy, an ecosystem, or the universe at large, we only get to experience times 5 and 6 once, and we only get to experience one possible history. How, then, can we ever assemble a body of evidence that allows us to make statements such as O?

The solution to this problem lies in the property of ergodicity. This is a property that a system may or may not have and that, if present, serves as the desired metaphysical prerequisite for scientific inference. To explain this property, let us give an example. Suppose T = {1,2,3,…}, and the system has all the time symmetries in the set $\Psi = \{\psi_r : r = 1, 2, 3, \dots\}$. Heuristically, the symmetries in Ψ can be interpreted as describing the evolution of the system over time. Suppose each time-step corresponds to a day. Then the history h = (a, b, c, d, e, …) describes a situation where today’s state is a, tomorrow’s is b, the next day’s is c, and so on. The transformed history $\psi_1(h) = (b, c, d, e, f, \dots)$ describes a situation where today’s state is b, tomorrow’s is c, the following day’s is d, and so on. Thus, $\psi_1(h)$ describes the same “world” as h, but as seen from the perspective of tomorrow. Likewise, $\psi_2(h) = (c, d, e, f, g, \dots)$ describes the same “world” as h, but as seen from the perspective of the day after tomorrow, and so on.
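The action of the time shifts on a history can be made concrete with a small sketch. It works on a finite prefix of a history, which is an assumption made purely for illustration (the histories in the text are infinite), and the function name `shift` is hypothetical.

```python
def shift(history, r):
    """Apply the time shift by r steps to a finite prefix of a history.

    Dropping the first r states gives the same "world" as seen from r days later.
    """
    return history[r:]

h = ("a", "b", "c", "d", "e", "f", "g")

print(shift(h, 1))  # ('b', 'c', 'd', 'e', 'f', 'g')
print(shift(h, 2))  # ('c', 'd', 'e', 'f', 'g')

# Shifting by one day twice is the same as shifting by two days once.
assert shift(shift(h, 1), 1) == shift(h, 2)
```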

Given the set Ψ of symmetries, an event E (a subset of Ω) is Ψ-invariant if the inverse image of E under ψ is E itself, for all ψ in Ψ. This implies that if a history h is in E, then ψ(h) will also be in E, for all ψ. In effect, if the world is in the set E today, it will remain in E tomorrow, and the day after tomorrow, and so on. Thus, E is a “persistent” event: an event one cannot escape from by moving forward in time. In a coin-tossing system, where Ψ is still the set of time translations, examples of Ψ-invariant events are “all Heads”, where E contains only the history (Heads, Heads, Heads, …), and “all Tails”, where E contains only the history (Tails, Tails, Tails, …). Strictly speaking, these two events satisfy only the forward-persistence condition just described: the inverse image of “all Heads” under the one-step shift also contains the history (Tails, Heads, Heads, …). The two sets differ only by an event of probability 0, so the examples still serve their purpose here.

The system is ergodic (with respect to Ψ) if, for any Ψ-invariant event E, the unconditional probability of E, i.e., $\Pr_\Omega(E)$, is either 0 or 1. In other words, the only persistent events are those which occur in almost no history (i.e., $\Pr_\Omega(E) = 0$) and those which occur in almost every history (i.e., $\Pr_\Omega(E) = 1$). Our coin-tossing system is ergodic, as exemplified by the fact that, when the probability p of Heads lies strictly between 0 and 1, the Ψ-invariant events “all Heads” and “all Tails” occur with probability 0.
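For contrast, it may help to see what a failure of ergodicity looks like. The sketch below is a hypothetical example that is not taken from the text: a coin with bias 0.2 or 0.8 is selected at random once, at the start, and then tossed forever. The event “the limiting frequency of Heads is 0.2” is persistent, yet it has probability 1/2, so this system is not ergodic. The function name and the two bias values are assumptions chosen for illustration.

```python
import random

def mixture_history_frequency(n_tosses, biases=(0.2, 0.8), seed=0):
    """Pick one bias at random at the start, then toss that coin n_tosses times.

    Returns (chosen bias, proportion of Heads in the single resulting history).
    """
    rng = random.Random(seed)
    p = rng.choice(biases)  # chosen once; it never changes within the history
    heads = sum(1 for _ in range(n_tosses) if rng.random() < p)
    return p, heads / n_tosses

if __name__ == "__main__":
    for seed in range(4):
        # Each seed stands for a different possible history of the system.
        print(mixture_history_frequency(50_000, seed=seed))
```

In every simulated history the frequency of Heads settles near whichever bias was drawn, and so never near 0.5, which is the unconditional probability of Heads at any given time in this system. An observer confined to one such history would therefore be misled by frequencies.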

In an ergodic system, it is possible to estimate the probability of any event “empirically”, by simply counting the frequency with which that event occurs. Frequencies are thus evidence for probabilities. The formal statement of this is the following important result from the theory of dynamical systems and stochastic processes.

Ergodic Theorem: Suppose the system is ergodic. Let E be any event and let h be any history. For all times t in T, let $N_t$ be the number of elements r in the set {1, 2, …, t} such that $\psi_r(h)$ is in E. Then, with probability 1, the ratio $N_t/t$ will converge to $\Pr_\Omega(E)$ as t increases towards infinity.
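In symbols, the convergence asserted by the theorem can be written as follows. The indicator function $\mathbf{1}_E$ (equal to 1 on histories in E and 0 otherwise) is notation introduced here for convenience. We also read the theorem as presupposing, as in the standard formulation, that the time shifts preserve the probability measure on Ω.

$$
\frac{N_t(h)}{t} \;=\; \frac{1}{t}\sum_{r=1}^{t} \mathbf{1}_E\bigl(\psi_r(h)\bigr) \;\longrightarrow\; \Pr\nolimits_\Omega(E) \quad \text{as } t \to \infty, \ \text{for almost every history } h \in \Omega .
$$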

Intuitively, $N_t$ is the number of times the event E has “occurred” in history h from time 1 up to time t. The ratio $N_t/t$ is therefore the frequency of occurrence of event E (up to time t) in history h. This frequency might be measured, for example, by performing a sequence of experiments or observations at times 1, 2, …, t. The Ergodic Theorem says that, almost certainly (i.e., with probability 1), the empirical frequency will converge to the true probability of E, $\Pr_\Omega(E)$, as the number of observations becomes large. The estimation of the true conditional probability structure from the frequencies of Heads and Tails in our illustrative coin-tossing system is possible precisely because the system is ergodic.

To understand the significance of this result, let Y and Z be two subsets of X, and suppose E is the event “h(1) is in Y”, while D is the event “h(2) is in Z”. Then the intersection E ∩ D is the event “h(1) is in Y, and h(2) is in Z”. The Ergodic Theorem says that, by performing a sequence of observations over time, we can empirically estimate $\Pr_\Omega(E)$ and $\Pr_\Omega(E \cap D)$ with arbitrarily high precision. Thus, provided $\Pr_\Omega(E) > 0$, we can compute the ratio $\Pr_\Omega(E \cap D)/\Pr_\Omega(E)$. But this ratio is simply the conditional probability $\Pr_E(D)$. And so, we are able to estimate the conditional probability that the state at time 2 will be in Z, given that at time 1 it was in Y. This illustrates that, by allowing us to estimate unconditional probabilities empirically, the Ergodic Theorem also allows us to estimate conditional probabilities, and in this way to learn the properties of the conditional probability structure $\{\Pr_E\}_{E \subseteq \Omega}$.
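The estimation procedure just described can be sketched in code. The system used here is a hypothetical two-state chain on {0, 1}, chosen only because its true transition probabilities are known and can be compared with the estimate. The function names, the `stay_prob` parameter and the numerical values are all illustrative assumptions, not part of the original argument.

```python
import random

def simulate_chain(n_steps, stay_prob, seed=0):
    """One history of a hypothetical two-state chain on {0, 1}.

    stay_prob[s] is the probability of remaining in state s at the next step.
    """
    rng = random.Random(seed)
    state = 0
    history = []
    for _ in range(n_steps):
        history.append(state)
        if rng.random() >= stay_prob[state]:
            state = 1 - state  # switch to the other state
    return history

def estimate_conditional(history, Y, Z):
    """Estimate Pr(next state in Z | current state in Y) from a single history.

    Counts how often E ("current state in Y") and E-and-D ("current state in Y,
    next state in Z") occur along the history, then takes the ratio of counts.
    """
    n = len(history) - 1  # positions that have an observed successor
    count_E = sum(1 for t in range(n) if history[t] in Y)
    count_ED = sum(1 for t in range(n)
                   if history[t] in Y and history[t + 1] in Z)
    if count_E == 0:
        raise ValueError("E was never observed, so the ratio is undefined")
    return count_ED / count_E

if __name__ == "__main__":
    # True value of Pr(next = 1 | current = 0) is 1 - 0.5 = 0.5 by construction.
    h = simulate_chain(200_000, stay_prob={0: 0.5, 1: 0.8}, seed=1)
    print(estimate_conditional(h, Y={0}, Z={1}))
```

The ratio of the two counts equals the ratio of the two empirical frequencies, since both frequencies share the same denominator. The explicit check for `count_E == 0` corresponds to the requirement that the conditioning event have positive probability.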

We may thus conclude that ergodicity is what allows us to generalize from local observations to global laws. In effect, when we engage in scientific inference about some system, or even about the world at large, we rely on the hypothesis that this system, or the world, is ergodic. If our system, or the world, were “dappled”, then presumably we would not be able to presuppose ergodicity, and hence our ability to make scientific generalizations would be compromised.