Conjuncted: Ergodicity. Thought of the Day 51.1


When we scientifically investigate a system, we cannot normally observe all possible histories in Ω, or directly access the conditional probability structure {PrE}E⊆Ω. Instead, we can only observe specific events. Conducting many “runs” of the same experiment is an attempt to observe as many histories of a system as possible, but even the best experimental design rarely allows us to observe all histories or to read off the full conditional probability structure. Furthermore, this strategy works only for smaller systems that we can isolate in laboratory conditions. When the system is the economy, the global ecosystem, or the universe in its entirety, we are stuck in a single history. We cannot step outside that history and look at alternative histories. Nonetheless, we would like to infer something about the laws of the system in general, and especially about the true probability distribution over histories.

Can we discern the system’s laws and true probabilities from observations of specific events? And what kinds of regularities must the system display in order to make this possible? In other words, are there certain “metaphysical prerequisites” that must be in place for scientific inference to work?

To answer these questions, we first consider a very simple example. Here T = {1,2,3,…}, and the system’s state at any time is the outcome of an independent coin toss. So the state space is X = {Heads, Tails}, and each possible history in Ω is one possible Heads/Tails sequence.

Suppose the true conditional probability structure on Ω is induced by the single parameter p, the probability of Heads. In this example, the Law of Large Numbers guarantees that, with probability 1, the limiting frequency of Heads in a given history (as time goes to infinity) will match p. This means that the subset of Ω consisting of “well-behaved” histories has probability 1, where a history is well-behaved if (i) there exists a limiting frequency of Heads for it (i.e., the proportion of Heads converges to a well-defined limit as time goes to infinity) and (ii) that limiting frequency is p. For this reason, we will almost certainly (with probability 1) arrive at the true conditional probability structure on Ω on the basis of observing just a single history and counting the number of Heads and Tails in it.
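The convergence guaranteed by the Law of Large Numbers is easy to check by simulation. The following Python sketch (the helper name `limiting_frequency` is purely illustrative) generates a single long history of independent tosses and reports the empirical frequency of Heads, which should land close to the true parameter p:

```python
import random

def limiting_frequency(p, n_tosses, seed=0):
    """Simulate one history of independent coin tosses with Pr(Heads) = p
    and return the empirical frequency of Heads in that history."""
    rng = random.Random(seed)
    heads = sum(1 for _ in range(n_tosses) if rng.random() < p)
    return heads / n_tosses

# Along a single long history, the frequency of Heads approaches p.
print(limiting_frequency(p=0.3, n_tosses=100_000))  # close to 0.3
```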

Does this result generalize? The short answer is “yes”, provided the system’s symmetries are of the right kind. Without suitable symmetries, generalizing from local observations to global laws is not possible. In a slogan, for scientific inference to work, there must be sufficient regularities in the system. In our toy system of the coin tosses, there are. Wigner (1967) recognized this point, taking symmetries to be “a prerequisite for the very possibility of discovering the laws of nature”.

Generally, symmetries allow us to infer general laws from specific observations. For example, let T = {1,2,3,…}, and let Y and Z be two subsets of the state space X. Suppose we have made the observation O: “whenever the state is in the set Y at time 5, there is a 50% probability that it will be in Z at time 6”. Suppose we know, or are justified in hypothesizing, that the system has the set of time symmetries {ψr : r = 1,2,3,….}, with ψr(t) = t + r, as defined in the previous section. Then, from observation O, we can deduce the following general law: “for any t in T, if the state of the system is in the set Y at time t, there is a 50% probability that it will be in Z at time t + 1”.

However, this example still has a problem. It only shows that if we could make observation O, then our generalization would be warranted, provided the system has the relevant symmetries. But the “if” is a big “if”. Recall what observation O says: “whenever the system’s state is in the set Y at time 5, there is a 50% probability that it will be in the set Z at time 6”. Clearly, this statement is only empirically well supported – and thus a real observation rather than a mere hypothesis – if we can make many observations of possible histories at times 5 and 6. We can do this if the system is an experimental apparatus in a lab or a virtual system in a computer, which we are manipulating and observing “from the outside”, and on which we can perform many “runs” of an experiment. But, as noted above, if we are participants in the system, as in the case of the economy, an ecosystem, or the universe at large, we only get to experience times 5 and 6 once, and we only get to experience one possible history. How, then, can we ever assemble a body of evidence that allows us to make statements such as O?

The solution to this problem lies in the property of ergodicity. This is a property that a system may or may not have and that, if present, serves as the desired metaphysical prerequisite for scientific inference. To explain this property, let us give an example. Suppose T = {1,2,3,…}, and the system has all the time symmetries in the set Ψ = {ψr : r = 1,2,3,….}. Heuristically, the symmetries in Ψ can be interpreted as describing the evolution of the system over time. Suppose each time-step corresponds to a day. Then the history h = (a,b,c,d,e,….) describes a situation where today’s state is a, tomorrow’s is b, the next day’s is c, and so on. The transformed history ψ1(h) = (b,c,d,e,f,….) describes a situation where today’s state is b, tomorrow’s is c, the following day’s is d, and so on. Thus, ψ1(h) describes the same “world” as h, but as seen from the perspective of tomorrow. Likewise, ψ2(h) = (c,d,e,f,g,….) describes the same “world” as h, but as seen from the perspective of the day after tomorrow, and so on.

Given the set Ψ of symmetries, an event E (a subset of Ω) is Ψ-invariant if the inverse image of E under ψ is E itself, for all ψ in Ψ. This implies that if a history h is in E, then ψ(h) will also be in E, for all ψ. In effect, if the world is in the set E today, it will remain in E tomorrow, and the day after tomorrow, and so on. Thus, E is a “persistent” event: an event one cannot escape from by moving forward in time. In a coin-tossing system, where Ψ is still the set of time translations, examples of Ψ-invariant events are “all Heads”, where E contains only the history (Heads, Heads, Heads, …), and “all Tails”, where E contains only the history (Tails, Tails, Tails, …).

The system is ergodic (with respect to Ψ) if, for any Ψ-invariant event E, the unconditional probability of E, i.e., PrΩ(E), is either 0 or 1. In other words, the only persistent events are those which occur in almost no history (i.e., PrΩ(E) = 0) and those which occur in almost every history (i.e., PrΩ(E) = 1). Our coin-tossing system is ergodic, as exemplified by the fact that the Ψ-invariant events “all Heads” and “all Tails” occur with probability 0.

In an ergodic system, it is possible to estimate the probability of any event “empirically”, by simply counting the frequency with which that event occurs. Frequencies are thus evidence for probabilities. The formal statement of this is the following important result from the theory of dynamical systems and stochastic processes.

Ergodic Theorem: Suppose the system is ergodic. Let E be any event and let h be any history. For all times t in T, let Nt be the number of elements r in the set {1, 2, …, t} such that ψr(h) is in E. Then, with probability 1, the ratio Nt/t will converge to PrΩ(E) as t increases towards infinity.

Intuitively, Nt is the number of times the event E has “occurred” in history h from time 1 up to time t. The ratio Nt/t is therefore the frequency of occurrence of event E (up to time t) in history h. This frequency might be measured, for example, by performing a sequence of experiments or observations at times 1, 2, …, t. The Ergodic Theorem says that, almost certainly (i.e., with probability 1), the empirical frequency will converge to the true probability of E, PrΩ(E), as the number of observations becomes large. The estimation of the true conditional probability structure from the frequencies of Heads and Tails in our illustrative coin-tossing system is possible precisely because the system is ergodic.
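The theorem can be checked numerically for the coin-tossing system. In the sketch below (the helper `time_average` and the choice of event are purely illustrative), the event E is “the first two states of the history are Heads”, whose probability is p² = 0.25 for a fair coin; the ratio Nt/t computed along the shifted histories ψr(h) of a single simulated history approaches this value:

```python
import random

def time_average(history, event, lookahead=2):
    """N_t / t: the fraction of shifts r = 1, ..., t for which the shifted
    history psi_r(h) = history[r:] lies in the event."""
    t = len(history) - lookahead   # leave room for the event to inspect states
    hits = sum(1 for r in range(1, t + 1) if event(history[r:]))
    return hits / t

rng = random.Random(42)
p = 0.5
h = [rng.random() < p for _ in range(200_000)]   # True = Heads

# Event E: "the first two states of the history are Heads"; Pr(E) = p**2.
E = lambda tail: tail[0] and tail[1]
print(time_average(h, E))  # close to 0.25
```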

To understand the significance of this result, let Y and Z be two subsets of X, and suppose E is the event “h(1) is in Y”, while D is the event “h(2) is in Z”. Then the intersection E ∩ D is the event “h(1) is in Y, and h(2) is in Z”. The Ergodic Theorem says that, by performing a sequence of observations over time, we can empirically estimate PrΩ(E) and PrΩ(E ∩ D) with arbitrarily high precision. Thus, we can compute the ratio PrΩ(E ∩ D)/PrΩ(E). But this ratio is simply the conditional probability PrE(D). And so, we are able to estimate the conditional probability that the state at time 2 will be in Z, given that at time 1 it was in Y. This illustrates that, by allowing us to estimate unconditional probabilities empirically, the Ergodic Theorem also allows us to estimate conditional probabilities, and in this way to learn the properties of the conditional probability structure {PrE}E⊆Ω.
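The ratio construction can be tried numerically on a system whose conditional probabilities are known in advance. The sketch below (the function name and the transition probabilities are purely illustrative) simulates a two-state Markov chain and recovers one of its transition probabilities from frequency counts along a single history, exactly as the ratio PrΩ(E ∩ D)/PrΩ(E) prescribes:

```python
import random

def estimate_conditional(history, Y, Z):
    """Estimate Pr("next state in Z" | "current state in Y") from a single
    history, as the ratio of the frequencies of E ∩ D and E under time shifts."""
    in_E = in_E_and_D = 0
    for r in range(len(history) - 1):
        if history[r] in Y:
            in_E += 1
            if history[r + 1] in Z:
                in_E_and_D += 1
    return in_E_and_D / in_E

# A two-state Markov chain (illustrative transition probabilities):
# from state 0 move to 1 with prob 0.3; from state 1 stay in 1 with prob 0.6.
rng = random.Random(1)
h, x = [], 0
for _ in range(200_000):
    h.append(x)
    if x == 0:
        x = 1 if rng.random() < 0.3 else 0
    else:
        x = 1 if rng.random() < 0.6 else 0

print(estimate_conditional(h, Y={1}, Z={1}))  # close to 0.6
```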

We may thus conclude that ergodicity is what allows us to generalize from local observations to global laws. In effect, when we engage in scientific inference about some system, or even about the world at large, we rely on the hypothesis that this system, or the world, is ergodic. If our system, or the world, were “dappled”, then presumably we would not be able to presuppose ergodicity, and hence our ability to make scientific generalizations would be compromised.

Ergodic Theory. Thought of the Day 51.0


Classical dynamical systems have a particularly rich set of time symmetries. A classical dynamical system consists of a set X (the state space) and a function φ from X into itself that determines how the state changes over time (the dynamics). Let (X, φ) be such a system, and let T = {0,1,2,3,…}. Given any state x in X (the initial conditions), the orbit of x is the history h defined by h(0) = x, h(1) = φ(x), h(2) = φ(φ(x)), and so on.

Let Ω be the set of all orbits determined by (X, φ) in this way. Let {Pr’E}E⊆X be any conditional probability structure on X. For any events E and D in Ω, we define PrE(D) = Pr’E’(D’), where E’ is the set of all states x in X whose orbits lie in E, and D’ is the set of all states x in X whose orbits lie in D. Then {PrE}E⊆Ω is a conditional probability structure on Ω, and so Ω and {PrE}E⊆Ω together form a temporally evolving system. However, not every temporally evolving system arises in this way.

Suppose the function φ is surjective, i.e., for all x in X, there is some y in X such that φ(y) = x. Then the set Ω of orbits is invariant under all time-shifts. Let {Pr’E}E⊆X be a conditional probability structure on X, and let {PrE}E⊆Ω be the conditional probability structure it induces on Ω. Suppose that {Pr’E}E⊆X is φ-invariant, i.e., for any subsets E and D of X, if E’ = φ–1(E) and D’ = φ–1(D), then Pr’E’(D’) = Pr’E(D). Then every time-shift is a temporal symmetry of the resulting temporally evolving system. The study of dynamical systems equipped with invariant probability measures is the purview of ergodic theory.
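The orbit construction is straightforward to compute. The sketch below (the doubling map is a standard textbook example, chosen here purely for illustration) builds the first few states of an orbit h(0) = x, h(1) = φ(x), h(2) = φ(φ(x)), …:

```python
def orbit(x, phi, n):
    """The first n states of the orbit of x: h(0) = x, h(1) = phi(x),
    h(2) = phi(phi(x)), and so on."""
    history = [x]
    for _ in range(n - 1):
        history.append(phi(history[-1]))
    return history

# Illustrative dynamics on X = [0, 1): the doubling map, a standard
# example of a system that is ergodic for the uniform measure.
double = lambda x: (2 * x) % 1.0
print(orbit(0.1, double, 4))  # [0.1, 0.2, 0.4, 0.8]
```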

Nomological Possibility and Necessity


An event E is nomologically possible in history h at time t if the initial segment of that history up to t admits at least one continuation in Ω that lies in E; and E is nomologically necessary in h at t if every continuation of the history’s initial segment up to t lies in E.

More formally, we say that one history, h’, is accessible from another, h, at time t if the initial segments of h and h’ up to time t coincide, i.e., ht = h’t. We then write h’Rth. The binary relation Rt on possible histories is in fact an equivalence relation (reflexive, symmetric, and transitive). Now, an event E ⊆ Ω is nomologically possible in history h at time t if some history h’ in Ω that is accessible from h at t is contained in E. Similarly, an event E ⊆ Ω is nomologically necessary in history h at time t if every history h’ in Ω that is accessible from h at t is contained in E.

In this way, we can define two modal operators, ♦t and ¤t, to express possibility and necessity at time t. We define each of them as a mapping from events to events. For any event E ⊆ Ω,

♦t E = {h ∈ Ω : for some h’ ∈ Ω with h’Rth, we have h’ ∈ E},

¤t E = {h ∈ Ω : for all h’ ∈ Ω with h’Rth, we have h’ ∈ E}.

So, ♦t E is the set of all histories in which E is possible at time t, and ¤t E is the set of all histories in which E is necessary at time t. Accordingly, we say that “♦t E” holds in history h if h is an element of ♦t E, and “¤t E” holds in h if h is an element of ¤t E. As one would expect, the two modal operators are duals of each other: for any event E ⊆ Ω, we have ¤t E = ~♦t ~E and ♦t E = ~¤t ~E.
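For a finite (truncated) set of histories, the accessibility relation and both modal operators can be computed directly. The Python sketch below (all names are illustrative) represents histories as tuples, builds a toy Ω of length-3 Heads/Tails histories, and verifies the duality ¤t E = ~♦t ~E:

```python
from itertools import product

def accessible(h1, h2, t):
    """h1 R_t h2: the initial segments up to time t coincide."""
    return h1[:t] == h2[:t]

def diamond(E, omega, t):
    """The operator ♦_t: histories from which some R_t-accessible history lies in E."""
    return {h for h in omega if any(accessible(h2, h, t) for h2 in E)}

def box(E, omega, t):
    """The operator ¤_t: histories all of whose R_t-accessible histories lie in E."""
    return {h for h in omega
            if all(h2 in E for h2 in omega if accessible(h2, h, t))}

# Toy Ω: all Heads/Tails histories of length 3.
omega = set(product("HT", repeat=3))
E = {h for h in omega if h[2] == "H"}   # the event "Heads on day 3"

# Before day 3 the event is still open: possible everywhere, necessary nowhere.
print(diamond(E, omega, 2) == omega)   # True
print(box(E, omega, 2) == set())       # True
# Duality: ¤_t E = ~ ♦_t ~E.
print(box(E, omega, 2) == omega - diamond(omega - E, omega, 2))  # True
```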

Although we have here defined nomological possibility and necessity, we can analogously define logical possibility and necessity. To do this, we must simply replace every occurrence of the set Ω of nomologically possible histories in our definitions with the set H of logically possible histories. Second, by defining the operators ♦t and ¤t as functions from events to events, we have adopted a semantic definition of these modal notions. However, we could also define them syntactically, by introducing an explicit modal logic. For each point in time t, the logic corresponding to the operators ♦t and ¤t would then be an instance of a standard S5 modal logic.

The analysis shows how nomological possibility and necessity depend on the dynamics of the system. In particular, as time progresses, the notion of possibility becomes more demanding: fewer events remain possible at each time. And the notion of necessity becomes less demanding: more events become necessary at each time, for instance due to having been “settled” in the past. Formally, for any t and t’ in T with t < t’ and any event E ⊆ Ω,

if ♦t’ E then ♦t E,

if ¤t E then ¤t’ E.

Furthermore, in a deterministic system, for every event E and any time t, we have ♦t E = ¤t E. In other words, an event is possible in any history h at time t if and only if it is necessary in h at t. In an indeterministic system, by contrast, necessity and possibility come apart.
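This collapse of possibility into necessity can be made concrete on a toy example. In the sketch below (names and histories are illustrative), a “deterministic” set of histories is one in which distinct histories already differ at time 1, so every accessibility class is a singleton and the two operators coincide:

```python
def accessible(h1, h2, t):
    """h1 R_t h2: the initial segments up to time t coincide."""
    return h1[:t] == h2[:t]

def diamond(E, omega, t):
    return {h for h in omega if any(accessible(h2, h, t) for h2 in E)}

def box(E, omega, t):
    return {h for h in omega
            if all(h2 in E for h2 in omega if accessible(h2, h, t))}

# A deterministic toy system: the three orbits of a cyclic map on {a, b, c}.
# Distinct histories already differ at time 1, so every accessibility class
# is a singleton and possibility coincides with necessity at every time.
omega_det = {("a", "b", "c"), ("b", "c", "a"), ("c", "a", "b")}
E = {("a", "b", "c")}
print(all(diamond(E, omega_det, t) == box(E, omega_det, t) for t in (1, 2, 3)))  # True
```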

Let us say that one history, h’, is accessible from another, h, relative to a set T’ of time points, if the restrictions of h and h’ to T’ coincide, i.e., h’T’ = hT’. We then write h’RT’h. Accessibility at time t is the special case where T’ is the set of points in time up to time t. We can define nomological possibility and necessity relative to T’ as follows. For any event E ⊆ Ω,

♦T’ E = {h ∈ Ω : for some h’ ∈ Ω with h’RT’h, we have h’ ∈ E},

¤T’ E = {h ∈ Ω : for all h’ ∈ Ω with h’RT’h, we have h’ ∈ E}.

Although these modal notions are much less familiar than the standard ones (possibility and necessity at time t), they are useful for some purposes. In particular, they allow us to express the fact that the states of a system during a particular period of time, T’ ⊆ T, render some events E possible or necessary.

Finally, our definitions of possibility and necessity relative to some general subset T’ of T also allow us to define completely “atemporal” notions of possibility and necessity. If we take T’ to be the empty set, then the accessibility relation RT’ becomes the universal relation, under which every history is related to every other. An event E is possible in this atemporal sense (i.e., ♦E) iff E is a non-empty subset of Ω, and it is necessary in this atemporal sense (i.e., ¤E) iff E coincides with all of Ω. These notions might be viewed as possibility and necessity from the perspective of some observer who has no temporal or historical location within the system and looks at it from the outside.
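The generalized relation RT’ and the atemporal limiting case T’ = ∅ can also be computed on a finite toy Ω. In the sketch below (all names are illustrative), restricting to T’ = {2} settles exactly the day-2 state, while T’ = ∅ yields the universal relation, under which any non-empty event is possible and only Ω itself is necessary:

```python
from itertools import product

def accessible_T(h1, h2, Tprime):
    """h1 R_T' h2: the restrictions of both histories to T' coincide
    (times are 1-indexed; histories are stored as 0-indexed tuples)."""
    return all(h1[t - 1] == h2[t - 1] for t in Tprime)

def diamond_T(E, omega, Tprime):
    """♦_T': histories from which some R_T'-accessible history lies in E."""
    return {h for h in omega if any(accessible_T(h2, h, Tprime) for h2 in E)}

def box_T(E, omega, Tprime):
    """¤_T': histories all of whose R_T'-accessible histories lie in E."""
    return {h for h in omega
            if all(h2 in E for h2 in omega if accessible_T(h2, h, Tprime))}

omega = set(product("HT", repeat=3))
E = {h for h in omega if h[1] == "H"}   # the event "Heads on day 2"

# Relative to T' = {2}, the day-2 state is fixed, so E is exactly "settled".
print(diamond_T(E, omega, {2}) == E)        # True
# With T' = the empty set, the relation is universal: any non-empty event is possible...
print(diamond_T(E, omega, set()) == omega)  # True
# ...but only E = Ω would be necessary, so this E is necessary in no history.
print(box_T(E, omega, set()) == set())      # True
```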