The Affinity of Mirror Symmetry to Algebraic Geometry: Going Beyond Formalism



Although the formalism of homological mirror symmetry is well established, what of other explanations of mirror symmetry that lie closer to classical differential and algebraic geometry? One approach is the Strominger-Yau-Zaslow picture of mirror symmetry, or SYZ for short.

The central physical ingredient in this proposal is T-duality. To explain this, let us consider a superconformal sigma model with target space (M, g), and denote it (defined as a geometric functor, or as a set of correlation functions), as

CFT(M, g)

In physics, a duality is an equivalence

CFT(M, g) ≅ CFT(M′, g′)

which holds despite the fact that the underlying geometries (M,g) and (M′, g′) are not classically diffeomorphic.

T-duality is a duality which relates two CFTs with toroidal target space, $M \cong M' \cong T^d$, but different metrics. In rough terms, the duality relates a “small” target space, with noncontractible cycles of length $L < l_s$, with a “large” target space in which all such cycles have length $L > l_s$.

This sort of relation is generic to dualities and follows from the following logic. If all length scales (lengths of cycles, curvature lengths, etc.) are greater than $l_s$, string theory reduces to conventional geometry. Now, in conventional geometry, we know what it means for (M, g) and (M′, g′) to be non-isomorphic. Any modification to this notion must be associated with a breakdown of conventional geometry, which requires some length scale to be “sub-stringy,” with $L < l_s$.

To state T-duality precisely, let us first consider $M = M' = S^1$. We parameterise this with a coordinate $X \in \mathbb{R}$, making the identification $X \sim X + 2\pi$. Consider a Euclidean metric $g_R$ given by $ds^2 = R^2\,dX^2$. The real parameter R is usually called the “radius,” from the obvious embedding in $\mathbb{R}^2$. This manifold is Ricci-flat, and thus the sigma model with this target space is a conformal field theory, the “c = 1 boson.” Let us furthermore set the string scale $l_s = 1$. With this, we obtain a complete physical equivalence

$\mathrm{CFT}(S^1, g_R) \cong \mathrm{CFT}(S^1, g_{1/R})$

Thus these two target spaces are indistinguishable from the point of view of string theory.

Just to give a physical picture for what this means, suppose for sake of discussion that superstring theory describes our universe, and thus that in some sense there must be six extra spatial dimensions. Suppose further that we had evidence that the extra dimensions factorized topologically and metrically as $K_5 \times S^1$; then it would make sense to ask: What is the radius R of this $S^1$ in our universe? In principle this could be measured by producing sufficiently energetic particles (so-called “Kaluza-Klein modes”), or perhaps by measuring deviations from Newton’s inverse square law of gravity at distances L ∼ R. In string theory, T-duality implies that $R \ge l_s$, because any theory with $R < l_s$ is equivalent to another theory with $R > l_s$. Thus we have a nontrivial relation between two (in principle) observable quantities, R and $l_s$, which one might imagine testing experimentally.

Let us now consider the theory $\mathrm{CFT}(T^d, g)$, where $T^d$ is the d-dimensional torus, with coordinates $X^i$ parameterising $\mathbb{R}^d / 2\pi\mathbb{Z}^d$, and a constant metric tensor $g_{ij}$. Then there is a complete physical equivalence

$\mathrm{CFT}(T^d, g) \cong \mathrm{CFT}(T^d, g^{-1})$

In fact this is just one element of a discrete group of T-duality symmetries, generated by T-dualities along one-cycles, and large diffeomorphisms (those not continuously connected to the identity). The complete group is isomorphic to SO(d, d; Z).

While very different from conventional geometry, T-duality has a simple intuitive explanation. This starts with the observation that the possible embeddings of a string into X can be classified by the fundamental group $\pi_1(X)$. Strings representing non-trivial homotopy classes are usually referred to as “winding states.” Furthermore, since strings interact by interconnecting at points, the group structure on $\pi_1$ provided by concatenation of based loops is meaningful and is respected by interactions in the string theory. Now $\pi_1(T^d) \cong \mathbb{Z}^d$, as an abelian group, referred to as the group of “winding numbers.”

Of course, there is another $\mathbb{Z}^d$ we could bring into the discussion, the Pontryagin dual of the $U(1)^d$ of which $T^d$ is an affinization. An element of this group is referred to physically as a “momentum,” as it is the eigenvalue of a translation operator on $T^d$. Again, this group structure is respected by the interactions. These two group structures, momentum and winding, can be summarized in the statement that the full closed string algebra contains the group algebra $\mathbb{C}[\mathbb{Z}^d] \oplus \mathbb{C}[\mathbb{Z}^d]$.

In essence, the point of T-duality is that if we quantize the string on a sufficiently small target space, the roles of momentum and winding will be interchanged. But the main point can be seen by bringing in some elementary spectral geometry. Besides the algebra structure, another invariant of a conformal field theory is the spectrum of its Hamiltonian H (technically, the Virasoro operator $L_0 + \bar{L}_0$). This Hamiltonian can be thought of as an analog of the standard Laplacian $\Delta_g$ on functions on X, and its spectrum on $T^d$ with metric g is

$\mathrm{Spec}\,\Delta_g = \left\{ \sum_{i,j=1}^{d} g^{ij} p_i p_j \;;\; p \in \mathbb{Z}^d \right\}$

On the other hand, the energy of a winding string is (intuitively) a function of its length. On our torus, a geodesic with winding number $w \in \mathbb{Z}^d$ has length squared

$L^2 = \sum_{i,j=1}^{d} g_{ij} w^i w^j$

Now, the only string theory input we need to bring in is that the total Hamiltonian contains both terms,

$H = \Delta_g + L^2 + \cdots$

where the extra terms … express the energy of excited (or “oscillator”) modes of the string. Then, the inversion $g \to g^{-1}$, combined with the interchange p ↔ w, leaves the spectrum of H invariant. This is T-duality.
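The invariance of the zero-mode spectrum can be checked by direct enumeration. The following is a minimal sketch, assuming d = 2, an arbitrarily chosen (hypothetical) constant metric, and a finite cutoff on the momentum and winding numbers; the oscillator terms are ignored, and the function names are illustrative only.

```python
from fractions import Fraction
from itertools import product


def inverse_2x2(g):
    """Exact inverse of a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = g
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def quadratic_form(m, v):
    """Evaluate sum_{i,j} m[i][j] * v[i] * v[j]."""
    return sum(m[i][j] * v[i] * v[j] for i in range(2) for j in range(2))


def zero_mode_energy(g, p, w):
    """Momentum term g^{ij} p_i p_j plus winding term g_{ij} w^i w^j."""
    return quadratic_form(inverse_2x2(g), p) + quadratic_form(g, w)


def spectrum(g, cutoff=2):
    """Sorted zero-mode energies for all |p_i|, |w^i| <= cutoff."""
    charges = list(product(range(-cutoff, cutoff + 1), repeat=2))
    return sorted(zero_mode_energy(g, p, w) for p in charges for w in charges)


# Hypothetical constant metric on T^2 (positive definite, exact rationals).
g = [[Fraction(2), Fraction(1, 2)], [Fraction(1, 2), Fraction(3)]]
g_dual = inverse_2x2(g)

# The two truncated spectra agree because the cutoff box is symmetric
# under the interchange of p and w.
print(spectrum(g) == spectrum(g_dual))
```

Exact rational arithmetic is used so that the comparison of the two spectra is an equality of lists rather than a floating-point tolerance check.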

There is a simple generalization of the above to the case with a non-zero B-field on the torus satisfying dB = 0. In this case, since B is a constant antisymmetric tensor, we can label CFTs by the matrix g + B. Now, the basic T-duality relation becomes

$\mathrm{CFT}(T^d, g + B) \cong \mathrm{CFT}(T^d, (g + B)^{-1})$

Another generalization, which is considerably more subtle, is to do T-duality in families, or fiberwise T-duality. The same arguments can be made, and would become precise in the limit that the metric on the fibers varies on length scales far greater than $l_s$, and has curvature lengths far greater than $l_s$. This is sometimes called the “adiabatic limit” in physics. While this is a very restrictive assumption, there are more heuristic physical arguments that T-duality should hold more generally, with corrections to the relations proportional to curvatures $l_s^2 R$ and derivatives $l_s \partial$ of the fiber metric, both in perturbation theory and from world-sheet instantons.

Categories of Pointwise Convergence Topology: Theory(ies) of Bundles.

Let H be a fixed, separable Hilbert space of dimension ≥ 1. Let us denote the associated projective space of H by P = P(H). It is compact iff H is finite-dimensional. Let PU = PU(H) = U(H)/U(1) be the projective unitary group of H equipped with the compact-open topology. A projective bundle over X is a locally trivial bundle of projective spaces, i.e., a fibre bundle P → X with fibre P(H) and structure group PU(H). An application of the Banach-Steinhaus theorem shows that we may identify projective bundles with principal PU(H)-bundles and the pointwise convergence topology on PU(H).

If G is a topological group, let $G_X$ denote the sheaf of germs of continuous functions X → G, i.e., the sheaf associated to the constant presheaf given by U → F(U) = G. Given a projective bundle P → X and a sufficiently fine good open cover $\{U_i\}_{i \in I}$ of X, the transition functions between trivializations $P|_{U_i}$ can be lifted to bundle isomorphisms $g_{ij}$ on double intersections $U_{ij} = U_i \cap U_j$ which are projectively coherent, i.e., over each of the triple intersections $U_{ijk} = U_i \cap U_j \cap U_k$ the composition $g_{ki}\, g_{jk}\, g_{ij}$ is given as multiplication by a U(1)-valued function $f_{ijk} : U_{ijk} \to U(1)$. The collection $\{(U_{ij}, f_{ijk})\}$ defines a U(1)-valued two-cocycle called a B-field on X, which represents a class $B_P$ in the sheaf cohomology group $H^2(X, U(1)_X)$. On the other hand, the sheaf cohomology $H^1(X, PU(H)_X)$ consists of isomorphism classes of principal PU(H)-bundles, and we can consider the isomorphism class $[P] \in H^1(X, PU(H)_X)$.

There is an isomorphism

$H^1(X, PU(H)_X) \to H^2(X, U(1)_X)$

provided by the boundary map $[P] \mapsto B_P$. There is also an isomorphism

$H^2(X, U(1)_X) \to H^3(X, \mathbb{Z}_X) \cong H^3(X, \mathbb{Z})$

The image $\delta(P) \in H^3(X, \mathbb{Z})$ of $B_P$ is called the Dixmier-Douady invariant of P. When δ(P) = [H] is represented in $H^3(X, \mathbb{R})$ by a closed three-form H on X, called the H-flux of the given B-field $B_P$, we will write $P = P_H$. One has δ(P) = 0 iff the projective bundle P comes from a vector bundle E → X, i.e., P = P(E). By Serre’s theorem every torsion element of $H^3(X, \mathbb{Z})$ arises from a finite-dimensional bundle P. Explicitly, consider the commutative diagram of exact sequences of groups given by

$\begin{array}{ccccccccc} 1 & \to & \mathbb{Z}_n & \to & SU(n) & \to & PU(n) & \to & 1 \\ & & \downarrow & & \downarrow & & \| & & \\ 1 & \to & U(1) & \to & U(n) & \to & PU(n) & \to & 1 \end{array}$

where we identify the cyclic group $\mathbb{Z}_n$ with the group of n-th roots of unity. Let P be a projective bundle with structure group PU(n), i.e., with fibres $P(\mathbb{C}^n)$. Then the commutative diagram of long exact sequences of sheaf cohomology groups associated to the above commutative diagram of groups implies that the element $B_P \in H^2(X, U(1)_X)$ comes from $H^2(X, (\mathbb{Z}_n)_X)$, and therefore its order divides n.

One also has $\delta(P_1 \otimes P_2) = \delta(P_1) + \delta(P_2)$ and $\delta(P^*) = -\delta(P)$, where $P^*$ denotes the dual projective bundle. This follows from the commutative diagram


and the fact that $P^* \otimes P = P(E)$, where E is the vector bundle of Hilbert-Schmidt endomorphisms of P. Putting everything together, it follows that the cohomology group $H^3(X, \mathbb{Z})$ is isomorphic to the group of stable equivalence classes of principal PU(H)-bundles P → X with the operation of tensor product.

We are now ready to define the twisted K-theory of the manifold X equipped with a projective bundle P → X, such that $P_x = P(H)$ for all x ∈ X. We will first give a definition in terms of Fredholm operators, and then provide some equivalent, but more geometric definitions. Let H be a $\mathbb{Z}_2$-graded Hilbert space. We define $\mathrm{Fred}^0(H)$ to be the space of self-adjoint degree 1 Fredholm operators T on H such that $T^2 - 1 \in K(H)$, together with the subspace topology induced by the embedding $\mathrm{Fred}^0(H) \hookrightarrow B(H) \times K(H)$ given by $T \mapsto (T, T^2 - 1)$, where the algebra of bounded linear operators B(H) is given the compact-open topology and the Banach algebra of compact operators K = K(H) is given the norm topology.

Let $P = P_H \to X$ be a projective Hilbert bundle. Then we can construct an associated bundle $\mathrm{Fred}^0(P)$ whose fibres are $\mathrm{Fred}^0(H)$. We define the twisted K-theory group of the pair (X, P) to be the group of homotopy classes of maps

$K^0(X, H) = [X, \mathrm{Fred}^0(P_H)]$

The group $K^0(X, H)$ depends functorially on the pair $(X, P_H)$, and an isomorphism of projective bundles ρ : P → P′ induces a group isomorphism $\rho_* : K^0(X, H) \to K^0(X, H')$. Addition in $K^0(X, H)$ is defined by fibre-wise direct sum, so that the sum of two elements lies in $K^0(X, H_2)$ with $[H_2] = \delta(P \otimes P(\mathbb{C}^2)) = \delta(P) = [H]$. Under the isomorphism $H \otimes \mathbb{C}^2 \cong H$, there is a projective bundle isomorphism $P \to P \otimes P(\mathbb{C}^2)$ for any projective bundle P, and so $K^0(X, H_2)$ is canonically isomorphic to $K^0(X, H)$. When [H] is a non-torsion element of $H^3(X, \mathbb{Z})$, so that $P = P_H$ is an infinite-dimensional bundle of projective spaces, then the index map $K^0(X, H) \to \mathbb{Z}$ is zero, i.e., any section of $\mathrm{Fred}^0(P)$ takes values in the index zero component of $\mathrm{Fred}^0(H)$.

Let us now describe some other models for twisted K-theory which will be useful in our physical applications later on. A definition in algebraic K-theory may be given as follows. A bundle of projective spaces P yields a bundle End(P) of algebras. However, if H is an infinite-dimensional Hilbert space, then one has natural isomorphisms H ≅ H ⊕ H and

End(H) ≅ Hom(H ⊕ H, H) ≅ End(H) ⊕ End(H)

as left End(H)-modules, and so the algebraic K-theory of the algebra End(H) is trivial. Instead, we will work with the Banach algebra K(H) of compact operators on H with the norm topology. Given that the unitary group U(H) with the compact-open topology acts continuously on K(H) by conjugation, to a given projective bundle $P_H$ we can associate a bundle of compact operators $E_H \to X$ given by

$E_H = P_H \times_{PU} K$

with $\delta(E_H) = [H]$. The Banach algebra $A_H := C_0(X, E_H)$ of continuous sections of $E_H$ vanishing at infinity is the continuous trace C∗-algebra CT(X, H). Then the twisted K-theory group K(X, H) of X is canonically isomorphic to the algebraic K-theory group $K(A_H)$.

We will also need a smooth version of this definition. Let $A_H^\infty$ be the smooth subalgebra of $A_H$ given by the algebra $CT^\infty(X, H) = C^\infty(X, L^1_{P_H})$,

where $L^1_{P_H} = P_H \times_{PU} L^1$. Then the inclusion $CT^\infty(X, H) \to CT(X, H)$ induces an isomorphism $K(CT^\infty(X, H)) \to K(CT(X, H))$ of algebraic K-theory groups. Upon choosing a bundle gerbe connection, one has an isomorphism $K(CT^\infty(X, H)) \cong K(X, H)$ with the twisted K-theory defined in terms of projective Hilbert bundles $P = P_H$ over X.

Finally, we propose a general definition based on K-theory with coefficients in a sheaf of rings. It parallels the bundle gerbe approach to twisted K-theory. Let B be a Banach algebra over C. Let E(B, X) be the category of continuous B-bundles over X, and let C(X, B) be the sheaf of continuous maps X → B. The ring structure in B equips C(X, B) with the structure of a sheaf of rings over X. We can therefore consider left (or right) C(X, B)-modules, and in particular the category LF(C(X, B)) of locally free C(X, B)-modules. Using the functor of sections in the usual way, one obtains over X an equivalence of additive categories

E(B, X) ≅ LF (C(X, B))

Since these are both additive categories, we can apply the Grothendieck functor to each of them and obtain the abelian groups K(LF(C(X, B))) and K(E(B, X)). The equivalence of categories ensures that there is a natural isomorphism of groups

K(LF (C(X, B))) ≅ K(E(B, X))

This motivates the following general definition. If A is a sheaf of rings over X, then we define the K-theory of X with coefficients in A to be the abelian group

K(X, A) := K(LF(A))

For example, consider the case B = C. Then C(X, C) is just the sheaf of continuous functions X → C, while E(C, X) is the category of complex vector bundles over X. Using the isomorphism of K-theory groups we then have

$K(X, C(X, \mathbb{C})) := K(\mathrm{LF}(C(X, \mathbb{C}))) \cong K(E(\mathbb{C}, X)) = K^0(X)$

The definition of twisted K-theory uses another special instance of this general construction. For this, we define an Azumaya algebra over X of rank m to be a locally trivial algebra bundle over X with fibre isomorphic to the algebra $M_m(\mathbb{C})$ of m × m complex matrices. An example is the algebra End(E) of endomorphisms of a complex vector bundle E → X. We can define an equivalence relation on the set A(X) of Azumaya algebras over X in the following way. Two Azumaya algebras A, A′ are called equivalent if there are vector bundles E, E′ over X such that the algebras A ⊗ End(E), A′ ⊗ End(E′) are isomorphic. Then every Azumaya algebra of the form End(E) is equivalent to the algebra of functions C(X) on X. The set of all equivalence classes is a group under the tensor product of algebras, called the Brauer group of X and denoted Br(X). By Serre’s theorem there is an isomorphism

$\delta : \mathrm{Br}(X) \to \mathrm{tor}(H^3(X, \mathbb{Z}))$

where $\mathrm{tor}(H^3(X, \mathbb{Z}))$ is the torsion subgroup of $H^3(X, \mathbb{Z})$.

If A is an Azumaya algebra bundle, then the space of continuous sections C(X, A) of A is a ring and we can consider the algebraic K-theory group $K(A) := K_0(C(X, A))$ of equivalence classes of projective C(X, A)-modules, which depends only on the equivalence class of A in the Brauer group. Under the equivalence, we can represent the Brauer group Br(X) as the set of isomorphism classes of sheaves of Azumaya algebras. Let A be a sheaf of Azumaya algebras, and LF(A) the category of locally free A-modules. Then as above there is an isomorphism

$K(X, C(X, A)) \cong K(\mathrm{Proj}(C(X, A)))$

where Proj(C(X, A)) is the category of finitely-generated projective C(X, A)-modules. The group on the right-hand side is the group K(A). For given $[H] \in \mathrm{tor}(H^3(X, \mathbb{Z}))$ and A ∈ Br(X) such that δ(A) = [H], this group can be identified as the twisted K-theory group $K^0(X, H)$ of X with twisting A. This definition is equivalent to the description in terms of bundle gerbe modules, and from this construction it follows that $K^0(X, H)$ is a subgroup of the ordinary K-theory of X. If δ(A) = 0, then A is equivalent to C(X) and we have $K(A) := K_0(C(X)) = K^0(X)$. The projective C(X, A)-modules over a rank m Azumaya algebra A are vector bundles E → X with fibre $\mathbb{C}^{nm} \cong (\mathbb{C}^m)^{\oplus n}$, which is naturally an $M_m(\mathbb{C})$-module.


Fibrations of Elliptic Curves in F-Theory.


F-theory compactifications are by definition compactifications of the type IIB string with non-zero, and in general non-constant, string coupling; they are thus intrinsically non-perturbative. F-theory may also be seen as a construction to geometrize (and thereby make manifest) certain features pertaining to the S-duality of the type IIB string.

Let us first recapitulate the most important massless bosonic fields of the type IIB string. From the NS-NS sector, we have the graviton $g_{\mu\nu}$, the antisymmetric 2-form field B as well as the dilaton φ; the latter, when exponentiated, serves as the coupling constant of the theory. Moreover, from the R-R sector we have the p-form tensor fields $C^{(p)}$ with p = 0, 2, 4. It is also convenient to include the magnetic duals of these fields, $B^{(6)}$, $C^{(6)}$ and $C^{(8)}$ ($C^{(4)}$ has self-dual field strength). It is useful to combine the dilaton with the axion $C^{(0)}$ into one complex field:

$\tau_{IIB} \equiv C^{(0)} + i e^{-\phi}$   (1)

The S-duality then acts via projective SL(2, Z) transformations in the canonical manner:

$\tau_{IIB} \to \frac{a\tau_{IIB} + b}{c\tau_{IIB} + d}$ with a, b, c, d ∈ Z and ad − bc = 1

Furthermore, it acts via simple matrix multiplication on the other fields if these are grouped into doublets $(B^{(2)}, C^{(2)})$, $(B^{(6)}, C^{(6)})$, while $C^{(4)}$ stays invariant.
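The fractional linear action on the coupling can be made concrete with a few lines of code. This is a minimal sketch, assuming the standard generators S (τ → −1/τ) and T (τ → τ + 1) of SL(2, Z); the names `act`, `matmul`, `S`, `T` and the sample value of τ are illustrative choices, not taken from the text.

```python
def matmul(m, n):
    """Product of two 2x2 integer matrices given as nested tuples."""
    (a, b), (c, d) = m
    (e, f), (g, h) = n
    return ((a * e + b * g, a * f + b * h), (c * e + d * g, c * f + d * h))


def act(m, tau):
    """Apply tau -> (a*tau + b) / (c*tau + d) for a matrix with ad - bc = 1."""
    (a, b), (c, d) = m
    if a * d - b * c != 1:
        raise ValueError("matrix is not in SL(2, Z)")
    return (a * tau + b) / (c * tau + d)


S = ((0, -1), (1, 0))   # tau -> -1/tau
T = ((1, 1), (0, 1))    # tau -> tau + 1

tau = complex(0.3, 1.7)  # hypothetical sample point in the upper half plane

ST = matmul(S, T)
print(act(T, tau))                    # shifted by one
print(act(S, tau))                    # inverted
print(act(ST, tau), act(S, act(T, tau)))  # matrix product matches composition
```

Since a matrix and its negative act identically on τ, the action factors through PSL(2, Z), which is why the text speaks of projective transformations.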

The simplest F-theory compactifications are the highest dimensional ones, and simplest of all is the compactification of the type IIB string on the 2-sphere, $\mathbb{P}^1$. However, as the first Chern class does not vanish, $c_1(\mathbb{P}^1) = -2$, this by itself cannot be a good, supersymmetry preserving background. The remedy is to add extra 7-branes to the theory, which sit at arbitrary points $z_i$ on the $\mathbb{P}^1$, and otherwise fill the 7+1 non-compact space-time dimensions. If this is done in the right way, $c_1(\mathbb{P}^1)$ is cancelled, thereby providing a consistent background.


Encircling the location of a 7-brane in the z-plane leads to a jump of the perceived type IIB string coupling, $\tau_{IIB} \to \tau_{IIB} + 1$.

To explain how this works, consider first a single D7-brane located at an arbitrary given point $z_0$ on the $\mathbb{P}^1$. A D7-brane carries by definition one unit of D7-brane charge, since it is a unit source of $C^{(8)}$. This means that it is magnetically charged with respect to the dual field $C^{(0)}$, which enters in the complexified type IIB coupling in (1). As a consequence, encircling the brane location $z_0$ will induce a non-trivial monodromy, that is, a jump in the coupling. But this then implies that in the neighborhood of the D7-brane, we must have a non-constant string coupling of the form $\tau_{IIB}(z) = \frac{1}{2\pi i} \ln[z - z_0]$; we thus indeed have a truly non-perturbative situation.
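The monodromy of the logarithm can be followed numerically. The sketch below, under the assumption that τ(z) = ln(z − z0)/(2πi) exactly (no regular part) and with a hypothetical brane position, continues the logarithm step by step along a counterclockwise circle around z0 and compares the start and end values.

```python
import cmath
import math


def tau_around_brane(z0, radius=1.0, steps=1000):
    """Continue tau(z) = ln(z - z0) / (2*pi*i) once counterclockwise around z0.

    Returns the value of tau at the start and at the end of the loop.
    """
    z = z0 + radius
    log_value = cmath.log(z - z0)  # principal branch at the starting point
    tau_start = log_value / (2j * math.pi)
    for k in range(1, steps + 1):
        z_next = z0 + radius * cmath.exp(2j * math.pi * k / steps)
        # The ratio of consecutive points is close to 1, so the principal
        # logarithm of the ratio is the correct small increment.
        log_value += cmath.log((z_next - z0) / (z - z0))
        z = z_next
    tau_end = log_value / (2j * math.pi)
    return tau_start, tau_end


# Hypothetical brane position in the z-plane.
start, end = tau_around_brane(complex(0.5, 0.25))
print(end - start)  # numerically close to 1
```

The accumulated change is independent of the radius and of the number of steps (as long as the steps are small), which is the numerical counterpart of the statement that the jump is a topological property of the loop.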

In view of the SL(2, Z) action on the string coupling (1), it is natural to interpret it as a modular parameter of a two-torus, $T^2$, and this is what then gives a geometrical meaning to the S-duality group. Since this modular parameter $\tau_{IIB} = \tau_{IIB}(z)$ is not constant over the $\mathbb{P}^1$ compactification manifold, the shape of the $T^2$ will accordingly vary along $\mathbb{P}^1$. The relevant geometrical object will therefore not be the direct product manifold $T^2 \times \mathbb{P}^1$, but rather a fibration of $T^2$ over $\mathbb{P}^1$.


Fibration of an elliptic curve over P1, which in total makes a K3 surface.

The logarithmic behavior of $\tau_{IIB}(z)$ in the vicinity of a 7-brane means that the $T^2$ fiber is singular at the brane location. It is known from mathematics that each such singular fiber contributes 1/12 to the first Chern class. Therefore we need to put 24 of them in order to have a consistent type IIB background with $c_1 = 0$. The mathematical data “$T^2$ fibered over $\mathbb{P}^1$ with 24 singular fibers” is now exactly what characterizes the K3 surface; indeed it is the only complex two-dimensional manifold with vanishing first Chern class (apart from $T^4$).

The K3 manifold that arises in this context is so far just a formal construct, introduced to encode the behavior of the string coupling in the presence of 7-branes in an elegant and useful way. One may speculate about a possible more concrete physical significance, such as a compactification manifold of a yet unknown 12 dimensional “F-theory”. The existence of such a theory is still unclear, but all we need the K3 for is to use its intriguing geometric properties for computing physical quantities (the quartic gauge threshold couplings, ultimately).

In order to do explicit computations, we first of all need a concrete representation of the K3 surface. Since the families of K3’s in question are elliptically fibered, the natural starting point is the two-torus $T^2$. It can be represented in the well-known “Weierstraß” form:

$W_{T^2} = y^2 + x^3 + xf + g = 0$   (2)

which in turn is invariantly characterized by the J-function:

$J = \frac{4(24f)^3}{4f^3 + 27g^2}$   (3)
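One way to see in what sense J characterizes the curve “invariantly” is to check that it does not change under the rescaling f → λ⁴f, g → λ⁶g, which can be absorbed into a rescaling of x and y in the Weierstraß form. The sketch below uses the formula for J exactly as written in (3); the rescaling weights are a standard fact that is assumed here rather than stated in the text, and the sample values are arbitrary.

```python
from fractions import Fraction


def discriminant(f, g):
    """Denominator of the J-function, 4 f^3 + 27 g^2."""
    return 4 * f**3 + 27 * g**2


def j_function(f, g):
    """J = 4 (24 f)^3 / (4 f^3 + 27 g^2), in exact rational arithmetic."""
    delta = discriminant(f, g)
    if delta == 0:
        raise ZeroDivisionError("singular curve: discriminant vanishes")
    return Fraction(4 * (24 * f) ** 3) / Fraction(delta)


def rescale(f, g, lam):
    """Weighted rescaling f -> lam^4 f, g -> lam^6 g (assumed weights)."""
    return lam**4 * f, lam**6 * g


# Arbitrary sample coefficients and scaling parameter.
f, g = Fraction(-3), Fraction(5, 2)
lam = Fraction(7, 3)

print(j_function(f, g) == j_function(*rescale(f, g, lam)))
```

Both numerator and denominator of J scale with λ¹², so the ratio is unchanged; the exact `Fraction` arithmetic makes this an equality rather than an approximate match.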

An elliptically fibered K3 surface can be made out of (2) by letting $f \to f_8(z)$ and $g \to g_{12}(z)$ become polynomials in the $\mathbb{P}^1$ coordinate z, of the indicated degrees. The locations $z_i$ of the 7-branes, which correspond to the locations of the singular fibers where $J(\tau_{IIB}(z_i)) \to \infty$, are then precisely where the discriminant

$\Delta(z) \equiv 4f_8^3(z) + 27g_{12}^2(z) =: \prod_{i=1}^{24}(z - z_i)$

vanishes.
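The count of 24 zeros is simply the degree of the discriminant, which can be confirmed with elementary polynomial arithmetic. The sketch below represents polynomials as coefficient lists (lowest degree first) and uses randomly chosen positive integer coefficients as hypothetical stand-ins for f8 and g12; positivity guarantees that the leading terms cannot cancel.

```python
import random


def poly_mul(p, q):
    """Multiply two polynomials given as coefficient lists (low to high)."""
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out


def poly_add(p, q):
    """Add two polynomials of possibly different lengths."""
    n = max(len(p), len(q))
    return [(p[i] if i < len(p) else 0) + (q[i] if i < len(q) else 0)
            for i in range(n)]


def poly_scale(c, p):
    """Multiply every coefficient by the constant c."""
    return [c * a for a in p]


def degree(p):
    """Degree of a polynomial; -1 for the zero polynomial."""
    for k in range(len(p) - 1, -1, -1):
        if p[k] != 0:
            return k
    return -1


def discriminant(f, g):
    """Delta(z) = 4 f(z)^3 + 27 g(z)^2 as a coefficient list."""
    f_cubed = poly_mul(poly_mul(f, f), f)
    g_squared = poly_mul(g, g)
    return poly_add(poly_scale(4, f_cubed), poly_scale(27, g_squared))


rng = random.Random(0)
f8 = [rng.randint(1, 9) for _ in range(9)]     # degree 8, hypothetical
g12 = [rng.randint(1, 9) for _ in range(13)]   # degree 12, hypothetical

delta = discriminant(f8, g12)
print(degree(f8), degree(g12), degree(delta))  # prints: 8 12 24
```

A degree-24 polynomial has 24 roots counted with multiplicity, matching the 24 singular fibers; for generic coefficients these roots are distinct.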

Time and World-Lines

Let γ: [s1, s2] → M be a smooth, future-directed timelike curve in M with tangent field $\xi^a$. We associate with it an elapsed proper time (relative to $g_{ab}$) given by

$\|\gamma\| = \int_{s_1}^{s_2} (g_{ab}\,\xi^a \xi^b)^{1/2}\, ds$

This elapsed proper time is invariant under reparametrization of γ and is just what we would otherwise describe as the length of (the image of) γ . The following is another basic principle of relativity theory:

Clocks record the passage of elapsed proper time along their world-lines.

Again, a number of qualifications and comments are called for. We have taken for granted that we know what “clocks” are. We have assumed that they have worldlines (rather than worldtubes). And we have overlooked the fact that ordinary clocks (e.g., the alarm clock on the nightstand) do not do well at all when subjected to extreme acceleration, tidal forces, and so forth. (Try smashing the alarm clock against the wall.) Again, these concerns are important and raise interesting questions about the role of idealization in the formulation of physical theory. (One might construe an “ideal clock” as a point-size test object that perfectly records the passage of proper time along its worldline, and then take the above principle to assert that real clocks are, under appropriate conditions and to varying degrees of accuracy, approximately ideal.) But they do not have much to do with relativity theory as such. Similar concerns arise when one attempts to formulate corresponding principles about clock behavior within the framework of Newtonian theory.

Now suppose that one has determined the conformal structure of spacetime, say, by using light rays. Then one can use clocks, rather than free particles, to determine the conformal factor.

Let $g'_{ab}$ be a second smooth metric on M, with $g'_{ab} = \Omega^2 g_{ab}$. Further suppose that the two metrics assign the same lengths to timelike curves, i.e., $\|\gamma\|_{g'_{ab}} = \|\gamma\|_{g_{ab}}$ for all smooth, timelike curves γ: I → M. Then Ω = 1 everywhere. (Here $\|\gamma\|_{g_{ab}}$ is the length of γ relative to $g_{ab}$.)

Let $\xi_o^a$ be an arbitrary timelike vector at an arbitrary point p in M. We can certainly find a smooth, timelike curve γ: [s1, s2] → M through p whose tangent at p is $\xi_o^a$. By our hypothesis, applied to γ and to each of its initial segments, if $\xi^a$ is the tangent field to γ, then

$\int_{s_1}^{s} (g'_{ab}\,\xi^a \xi^b)^{1/2}\, ds = \int_{s_1}^{s} (g_{ab}\,\xi^a \xi^b)^{1/2}\, ds$

for all s in [s1, s2]. It follows that $g'_{ab}\xi^a\xi^b = g_{ab}\xi^a\xi^b$ at every point on the image of γ. In particular, it follows that $(g'_{ab} - g_{ab})\,\xi_o^a \xi_o^b = 0$ at p. But $\xi_o^a$ was an arbitrary timelike vector at p. So $g'_{ab} = g_{ab}$ at our arbitrary point p.

The principle gives the whole story of relativistic clock behavior. In particular, it implies the path dependence of clock readings. If two clocks start at an event p and travel along different trajectories to an event q, then, in general, they will record different elapsed times for the trip. This is true no matter how similar the clocks are. (We may stipulate that they came off the same assembly line.) This is the case because, as the principle asserts, the elapsed time recorded by each of the clocks is just the length of the timelike curve it traverses from p to q and, in general, those lengths will be different.

Suppose we consider all future-directed timelike curves from p to q. It is natural to ask if there are any that minimize or maximize the recorded elapsed time between the events. The answer to the first question is “no.” Indeed, one then has the following proposition:

Let p and q be events in M such that p ≪ q. Then, for all ε > 0, there exists a smooth, future directed timelike curve γ from p to q with ∥γ ∥ < ε. (But there is no such curve with length 0, since all timelike curves have non-zero length.)


If there is a smooth, timelike curve connecting p and q, there is also a jointed, zig-zag null curve connecting them. It has length 0. But we can approximate the jointed null curve arbitrarily closely with smooth timelike curves that swing back and forth. So (by the continuity of the length function), we should expect that, for all ε > 0, there is an approximating timelike curve that has length less than ε.
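The zig-zag argument can be illustrated numerically in two-dimensional Minkowski spacetime. The sketch below is a simplification under stated assumptions: units with c = 1, events p = (0, 0) and q = (T, 0), and a jointed (piecewise straight) timelike zig-zag rather than a smooth curve; smoothing the corners is assumed to change the length only slightly. The function names are illustrative.

```python
import math


def zigzag_vertices(total_time, speed, teeth):
    """Vertices (t, x) of a zig-zag from (0, 0) to (total_time, 0).

    The curve moves right at the given speed, then left, `teeth` times.
    """
    dt = total_time / (2 * teeth)
    vertices = [(0.0, 0.0)]
    for k in range(1, 2 * teeth + 1):
        x = speed * dt if k % 2 == 1 else 0.0
        vertices.append((k * dt, x))
    return vertices


def proper_time(vertices):
    """Sum of sqrt(dt^2 - dx^2) over the straight timelike segments."""
    total = 0.0
    for (t0, x0), (t1, x1) in zip(vertices, vertices[1:]):
        interval = (t1 - t0) ** 2 - (x1 - x0) ** 2
        if interval <= 0:
            raise ValueError("segment is not timelike")
        total += math.sqrt(interval)
    return total


T = 10.0
for speed in (0.0, 0.9, 0.999, 0.999999):
    length = proper_time(zigzag_vertices(T, speed, teeth=5))
    print(speed, length)
```

Each segment contributes dt·sqrt(1 − v²), so the total is T·sqrt(1 − v²) regardless of the number of teeth; it can be pushed below any ε > 0 by taking the speed close enough to 1, but it never reaches 0 for a timelike curve.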

The answer to the second question (“Can one maximize recorded elapsed time between p and q?”) is “yes” if one restricts attention to local regions of spacetime. In the case of positive definite metrics (i.e., ones with signature of form (n, 0)), we know that geodesics are locally shortest curves. The corresponding result for Lorentzian metrics is that timelike geodesics are locally longest curves.

Let γ: I → M be a smooth, future-directed, timelike curve. Then γ can be reparametrized so as to be a geodesic iff for all s ∈ I there exists an open set O containing γ(s) such that, for all s1, s2 ∈ I with s1 ≤ s ≤ s2, if the image of γ′ = γ|[s1, s2] is contained in O, then γ′ (and its reparametrizations) are longer than all other timelike curves in O from γ(s1) to γ(s2). (Here γ|[s1, s2] is the restriction of γ to the interval [s1, s2].)

Of all clocks passing locally from p to q, the one that will record the greatest elapsed time is the one that “falls freely” from p to q. To get a clock to read a smaller elapsed time than the maximal value, one will have to accelerate the clock. Now, acceleration requires fuel, and fuel is not free. So the above proposition has the consequence that (locally) “saving time costs money.” And the proposition before that may be taken to imply that “with enough money one can save as much time as one wants.” The restriction here to local regions of spacetime is essential. The connection described between clock behavior and acceleration does not, in general, hold on a global scale. In some relativistic spacetimes, one can find future-directed timelike geodesics connecting two events that have different lengths, and so clocks following the curves will record different elapsed times between the events even though both are in a state of free fall. Furthermore (this follows from the preceding claim by continuity considerations alone), it can be the case that of two clocks passing between the events, the one that undergoes acceleration during the trip records a greater elapsed time than the one that remains in a state of free fall. (A rolled-up version of two-dimensional Minkowski spacetime provides a simple example.)


Two-dimensional Minkowski spacetime rolled up into a cylindrical spacetime. Three timelike curves are displayed: γ1 and γ3 are geodesics; γ2 is not; γ1 is longer than γ2; and γ2 is longer than γ3.
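The ordering of the three lengths can be reproduced with concrete numbers. The sketch below assumes a flat cylinder obtained by identifying x with x + C, units with c = 1, and events p = (0, 0) and q = (T, 0); the values T = 5, C = 4 and the speed 0.6 are hypothetical choices, and γ2 is modelled as a jointed out-and-back curve rather than a smooth accelerated one.

```python
import math


def geodesic_length(total_time, circumference, winding):
    """Length of the geodesic from (0, 0) to (total_time, 0) that winds
    `winding` times around the cylinder (a straight line in the cover)."""
    dx = winding * circumference
    interval = total_time**2 - dx**2
    if interval <= 0:
        raise ValueError("no timelike geodesic with this winding number")
    return math.sqrt(interval)


def out_and_back_length(total_time, speed):
    """Length of a non-winding curve that moves away at constant speed for
    half the time and returns at the same speed (accelerated at the kink)."""
    return total_time * math.sqrt(1.0 - speed**2)


T, C = 5.0, 4.0  # hypothetical elapsed coordinate time and circumference

gamma1 = geodesic_length(T, C, winding=0)   # free fall, no winding
gamma2 = out_and_back_length(T, speed=0.6)  # accelerated, no winding
gamma3 = geodesic_length(T, C, winding=1)   # free fall, winds once

print(gamma1, gamma2, gamma3)
```

With these numbers the accelerated clock following γ2 records more time than the freely falling clock following γ3, even though γ3 is a geodesic; the free-fall clock on γ1 still records the most.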

The connection we have been considering between clock behavior and acceleration was once thought to be paradoxical. Recall the so-called “clock paradox.” Suppose two clocks, A and B, pass from one event to another in a suitably small region of spacetime. Further suppose A does so in a state of free fall but B undergoes acceleration at some point along the way. Then, we know, A will record a greater elapsed time for the trip than B. This was thought paradoxical because it was believed that relativity theory denies the possibility of distinguishing “absolutely” between free-fall motion and accelerated motion. (If we are equally well entitled to think that it is clock B that is in a state of free fall and A that undergoes acceleration, then, by parity of reasoning, it should be B that records the greater elapsed time.) The resolution of the paradox, if one can call it that, is that relativity theory makes no such denial. The situations of A and B here are not symmetric. The distinction between accelerated motion and free fall makes every bit as much sense in relativity theory as it does in Newtonian physics.

A “timelike curve” should be understood to be a smooth, future-directed, timelike curve parametrized by elapsed proper time, i.e., by arc length. In that case, the tangent field $\xi^a$ of the curve has unit length ($\xi^a \xi_a = 1$). And if a particle happens to have the image of the curve as its worldline, then, at any point, $\xi^a$ is called the particle’s four-velocity there.

Unique Derivative Operator: Reparametrization. Metric Part 2.


Moving on from first part.

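Suppose ∇ is a derivative operator, and $g_{ab}$ is a metric, on the manifold M. Then ∇ is compatible with $g_{ab}$ iff $\nabla_a g_{bc} = 0$.

Suppose γ is an arbitrary smooth curve with tangent field $\xi^a$ and $\lambda^a$ is an arbitrary smooth field on γ satisfying $\xi^n \nabla_n \lambda^a = 0$. Then

$\xi^n \nabla_n (g_{ab}\lambda^a\lambda^b) = g_{ab}\lambda^a\, \xi^n\nabla_n\lambda^b + g_{ab}\lambda^b\, \xi^n\nabla_n\lambda^a + \lambda^a\lambda^b\, \xi^n\nabla_n g_{ab}$

$= \lambda^a\lambda^b\, \xi^n\nabla_n g_{ab}$

Suppose first that $\nabla_n g_{ab} = 0$. Then it follows immediately that $\xi^n\nabla_n(g_{ab}\lambda^a\lambda^b) = 0$, i.e., $g_{ab}\lambda^a\lambda^b$ is constant along γ. So ∇ is compatible with $g_{ab}$. Suppose next that ∇ is compatible with $g_{ab}$. Then for all choices of γ and $\lambda^a$ (satisfying $\xi^n\nabla_n\lambda^a = 0$), we have $\lambda^a\lambda^b\, \xi^n\nabla_n g_{ab} = 0$. Since the choice of $\lambda^a$ (at any particular point) is arbitrary and $g_{ab}$ is symmetric, it follows that $\xi^n\nabla_n g_{ab} = 0$. But this must be true for arbitrary $\xi^a$ (at any particular point), and so we have $\nabla_n g_{ab} = 0$.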

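Note that the condition of compatibility is also equivalent to $\nabla_a g^{bc} = 0$. To see this, observe that

$0 = g_{bn}\nabla_a \delta_c{}^n = g_{bn}\nabla_a (g^{nr} g_{rc}) = g_{bn} g^{nr}\nabla_a g_{rc} + g_{bn} g_{rc}\nabla_a g^{nr}$

$= \delta_b{}^r \nabla_a g_{rc} + g_{bn} g_{rc}\nabla_a g^{nr} = \nabla_a g_{bc} + g_{bn} g_{rc}\nabla_a g^{nr}$.

So if $\nabla_a g^{bc} = 0$, it follows immediately that $\nabla_a g_{bc} = 0$. Conversely, if $\nabla_a g_{bc} = 0$, then $g_{bn} g_{rc}\nabla_a g^{nr} = 0$. And therefore,

$0 = g^{pb} g^{sc} g_{bn} g_{rc}\nabla_a g^{nr} = \delta^p{}_n \delta^s{}_r \nabla_a g^{nr} = \nabla_a g^{ps}$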

The basic fact about compatible derivative operators is the following.

Suppose gab is a metric on the manifold M. Then there is a unique derivative operator on M that is compatible with gab.

(It turns out that if a manifold admits a metric, then it necessarily satisfies the countable cover condition, and the latter guarantees the existence of a derivative operator.) What we prove here is that if M admits a derivative operator ∇, then it admits exactly one ∇′ that is compatible with $g_{ab}$.

Every derivative operator ∇′ on M can be realized as $\nabla' = (\nabla, C^a{}_{bc})$, where $C^a{}_{bc}$ is a smooth field on M, symmetric in its lower indices. Now

$\nabla'_a g_{bc} = \nabla_a g_{bc} + g_{nc} C^n{}_{ab} + g_{bn} C^n{}_{ac} = \nabla_a g_{bc} + C_{cab} + C_{bac}$.

So ∇′ will be compatible with $g_{ab}$ (i.e., $\nabla'_a g_{bc} = 0$) iff

$\nabla_a g_{bc} = -C_{cab} - C_{bac}$   (1)

Thus it suffices for us to prove that there exists a unique smooth, symmetric field $C^a{}_{bc}$ on M satisfying equation (1). To do so, we write equation (1) twice more after permuting the indices:

$\nabla_c g_{ab} = -C_{bca} - C_{acb}$,

$\nabla_b g_{ac} = -C_{cba} - C_{abc}$

If we subtract these two from the first equation, and use the fact that $C_{abc}$ is symmetric in (b, c), we get

$C_{abc} = \frac{1}{2}(\nabla_a g_{bc} - \nabla_b g_{ac} - \nabla_c g_{ab})$   (2)

and, therefore,

$C^a{}_{bc} = \frac{1}{2} g^{an}(\nabla_n g_{bc} - \nabla_b g_{nc} - \nabla_c g_{nb})$   (3)

This establishes uniqueness. But clearly the field $C^a{}_{bc}$ defined by equation (3) is smooth, symmetric, and satisfies equation (1). So we have existence as well.
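Equations (1) to (3) can be checked on a concrete example. The sketch below assumes the flat plane in polar coordinates (r, θ), with metric components g_rr = 1, g_θθ = r², and takes ∇ to be the coordinate derivative operator of those coordinates; both choices are illustrative and not part of the text. Index 0 stands for r and index 1 for θ, and `dg[a][b][c]` holds the component of the derivative of g_bc in direction a.

```python
def lowered_c(dg):
    """Equation (2): C_abc = (1/2)(dg[a][b][c] - dg[b][a][c] - dg[c][a][b])."""
    n = len(dg)
    return [[[0.5 * (dg[a][b][c] - dg[b][a][c] - dg[c][a][b])
              for c in range(n)] for b in range(n)] for a in range(n)]


def raise_first_index(g_inv, c_low):
    """Equation (3): C^a_bc = g^{am} C_mbc."""
    n = len(g_inv)
    return [[[sum(g_inv[a][m] * c_low[m][b][c] for m in range(n))
              for c in range(n)] for b in range(n)] for a in range(n)]


def satisfies_equation_1(dg, c_low, tol=1e-12):
    """Check dg[a][b][c] = -C_cab - C_bac for all index values."""
    n = len(dg)
    return all(abs(dg[a][b][c] + c_low[c][a][b] + c_low[b][a][c]) < tol
               for a in range(n) for b in range(n) for c in range(n))


def polar_metric_data(r):
    """Inverse metric and coordinate derivatives of g = diag(1, r^2)."""
    g_inv = [[1.0, 0.0], [0.0, 1.0 / r**2]]
    dg = [[[0.0, 0.0], [0.0, 0.0]], [[0.0, 0.0], [0.0, 0.0]]]
    dg[0][1][1] = 2.0 * r  # d/dr of g_{theta theta} = r^2
    return g_inv, dg


g_inv, dg = polar_metric_data(r=2.0)
c_low = lowered_c(dg)
c_up = raise_first_index(g_inv, c_low)

print(satisfies_equation_1(dg, c_low))
print(c_up[0][1][1], c_up[1][0][1], c_up[1][1][0])
```

In this example the only non-vanishing components are C^r_θθ = r and C^θ_rθ = C^θ_θr = −1/r, and the field is symmetric in its lower indices as the argument requires.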

In the case of positive definite metrics, there is another way to capture the significance of compatibility of derivative operators with metrics. Suppose the metric $g_{ab}$ on M is positive definite and $\gamma : [s_1, s_2] \to M$ is a smooth curve on M. We associate with γ a length

$|\gamma| = \int_{s_1}^{s_2} (g_{ab}\,\xi^a\xi^b)^{1/2}\, ds,$

where $\xi^a$ is the tangent field to γ. This assigned length is invariant under reparametrization. For suppose $\sigma : [t_1, t_2] \to [s_1, s_2]$ is a diffeomorphism (we shall write s = σ(t)) and $\xi'^a$ is the tangent field of $\gamma' = \gamma \circ \sigma : [t_1, t_2] \to M$. Then

$\xi'^a = \xi^a\,\frac{ds}{dt}.$

We may as well require that the reparametrization preserve the orientation of the original curve, i.e., require that $\sigma(t_1) = s_1$ and $\sigma(t_2) = s_2$. In this case, ds/dt > 0 everywhere. (Only small changes are needed if we allow the reparametrization to reverse the orientation of the curve. In that case, ds/dt < 0 everywhere.) It follows that

$|\gamma'| = \int_{t_1}^{t_2} (g_{ab}\,\xi'^a\xi'^b)^{1/2}\, dt = \int_{t_1}^{t_2} (g_{ab}\,\xi^a\xi^b)^{1/2}\,\frac{ds}{dt}\, dt$

$= \int_{s_1}^{s_2} (g_{ab}\,\xi^a\xi^b)^{1/2}\, ds = |\gamma|.$
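The invariance can also be checked numerically. The sketch below is an illustration, not part of the argument: it assumes the Euclidean plane (so the metric components form the identity matrix), a quarter circle of radius 2 as the curve, and a hypothetical orientation-preserving reparametrization `sigma` with ds/dt > 0 on [0, 1]. The names `curve`, `sigma` and `length` are made up for this example.

```python
import math

def curve(s):
    # Assumed example curve: quarter circle of radius 2, s in [0, pi/2]
    return (2.0 * math.cos(s), 2.0 * math.sin(s))

def sigma(t):
    # Assumed reparametrization s = sigma(t), t in [0, 1]; ds/dt = (pi/4)(1 + 2t) > 0
    return (math.pi / 2.0) * (t + t * t) / 2.0

def reparametrized(t):
    # gamma' = gamma o sigma
    return curve(sigma(t))

def length(path, a, b, n=20000):
    """Midpoint-rule approximation of the integral of (g_ab xi^a xi^b)^(1/2)."""
    h = (b - a) / n
    total = 0.0
    for k in range(n):
        mid = a + (k + 0.5) * h
        p0 = path(mid - 0.5 * h)
        p1 = path(mid + 0.5 * h)
        # Central-difference estimate of the tangent components xi^a
        xi = ((p1[0] - p0[0]) / h, (p1[1] - p0[1]) / h)
        # Euclidean metric: g_ab xi^a xi^b = xi_x^2 + xi_y^2
        total += math.sqrt(xi[0] ** 2 + xi[1] ** 2) * h
    return total

length_original = length(curve, 0.0, math.pi / 2.0)
length_reparam = length(reparametrized, 0.0, 1.0)
print(length_original, length_reparam)
```

Both numbers should agree with each other, and with the exact arc length π of the quarter circle, up to discretization error.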

Let us say that γ : I → M is a curve from p to q if I is of the form [s1, s2], p = γ(s1), and q = γ(s2). In this (positive definite) case, we take the distance from p to q to be

$d(p, q) = \mathrm{g.l.b.}\,\{\,|\gamma| : \gamma \text{ is a smooth curve from } p \text{ to } q\,\}.$

Further, we say that a curve γ : I → M is minimal if, for all s ∈ I, ∃ an ε > 0 such that, for all s1, s2 ∈ I with s1 ≤ s ≤ s2, if s2 − s1 < ε and if γ′ = γ|[s1, s2] (the restriction of γ to [s1, s2]), then |γ′| = d(γ(s1), γ(s2)) . Intuitively, minimal curves are “locally shortest curves.” Certainly they need not be “shortest curves” outright. (Consider, for example, two points on the “equator” of a two-sphere that are not antipodal to one another. An equatorial curve running from one to the other the “long way” qualifies as a minimal curve.)

One can characterize the unique derivative operator compatible with a positive definite metric gab in terms of the latter’s associated minimal curves. But in doing so, one has to pay attention to parametrization.

Let us say that a smooth curve γ : I → M with tangent field $\xi^a$ is parametrized by arc length if $g_{ab}\,\xi^a\xi^b = 1$ at every point of the curve. In this case, if $I = [s_1, s_2]$, then

$|\gamma| = \int_{s_1}^{s_2} (g_{ab}\,\xi^a\xi^b)^{1/2}\, ds = \int_{s_1}^{s_2} 1\, ds = s_2 - s_1.$

Any non-trivial smooth curve can always be reparametrized by arc length.

Disjointed Regularity in Open Classes of Elementary Topology


Let x, y, … denote first-order structures in $\mathrm{St}_\tau$; x ≈ y will denote isomorphism.

$x \sim_{n,\tau} y$ means that there is a sequence $\emptyset \neq I_0 \subseteq \dots \subseteq I_n$ of sets of τ-partial isomorphisms of finite domain so that, for i < j ≤ n, $f \in I_i$ and a ∈ x (respectively, b ∈ y), there is $g \in I_j$ such that g ⊇ f and a ∈ Dom(g) (respectively, b ∈ Im(g)). The latter is called the extension property.

$x \sim_\tau y$ means the above holds for an infinite chain $\emptyset \neq I_0 \subseteq \dots \subseteq I_n \subseteq \dots$

Fraïssé’s characterization of elementary equivalence says that, for finite relational vocabularies, $x \equiv_{n,\tau} y$ iff $x \sim_{n,\tau} y$, where $\equiv_{n,\tau}$ denotes agreement on all τ-sentences of quantifier rank at most n. To have it available for vocabularies containing function symbols, add the complexity of terms in atomic formulas to the quantifier rank. It is well known that for countable x, y, $x \sim_\tau y$ implies x ≈ y.
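For finite structures the back-and-forth condition can be tested by brute force. The sketch below uses the standard game reading of the extension property (one player picks an element on either side, the other must extend the current finite partial isomorphism), which is assumed here to match the chain formulation above. It is restricted to finite linear orders represented as lists of integers, and the function names are hypothetical.

```python
def is_partial_isomorphism(pairs):
    """Check that the pairs (a, b) define an order-preserving partial bijection."""
    for a1, b1 in pairs:
        for a2, b2 in pairs:
            if (a1 < a2) != (b1 < b2) or (a1 == a2) != (b1 == b2):
                return False
    return True

def duplicator_wins(x, y, rounds, pairs=()):
    """True if the current partial isomorphism can be extended for `rounds` more rounds."""
    if not is_partial_isomorphism(pairs):
        return False
    if rounds == 0:
        return True
    # "Forth": every a in x needs some matching b in y
    for a in x:
        if not any(duplicator_wins(x, y, rounds - 1, pairs + ((a, b),)) for b in y):
            return False
    # "Back": every b in y needs some matching a in x
    for b in y:
        if not any(duplicator_wins(x, y, rounds - 1, pairs + ((a, b),)) for a in x):
            return False
    return True

three = list(range(3))   # linear order with 3 elements
four = list(range(4))    # linear order with 4 elements
print(duplicator_wins(three, four, 2))   # True: indistinguishable at quantifier rank 2
print(duplicator_wins(three, four, 3))   # False: rank 3 can tell them apart
```

The search is exponential in the number of rounds, so this is only usable for very small structures; it is meant to make the definition concrete, not to be an efficient decision procedure.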

Given a vocabulary τ, let $\tau^*$ be a disjoint renaming of τ. If $x, y \in \mathrm{St}_\tau$ have the same power, let $y^*$ be an isomorphic copy of y sharing the universe with x and renamed to be of type $\tau^*$. In this context, $(x, y^*)$ will denote the $\tau \cup \tau^*$-structure that results from expanding x with the relations of $y^*$.

Lemma: There is a vocabulary $\tau^+ \supseteq \tau \cup \tau^*$ such that for each finite vocabulary $\tau_0 \subseteq \tau$ there is a sequence of elementary classes $\Delta_1 \supseteq \Delta_2 \supseteq \Delta_3 \supseteq \dots$ in $\mathrm{St}_{\tau^+}$ such that if $\pi = \pi_{\tau^+,\,\tau \cup \tau^*}$ then (1) $\pi(\Delta_n) = \{(x, y^*) : |x| = |y| \geq \omega,\ x \equiv_{n,\tau_0} y\}$, (2) $\pi(\bigcap_n \Delta_n) = \{(x, y^*) : |x| = |y| \geq \omega,\ x \sim_{\tau_0} y\}$. Moreover, $\bigcap_n \Delta_n$ is the reduct of an elementary class.

Proof. Let Δ be the class of structures $(x, y^*, <, a, I)$ where < is a discrete linear order with minimum but no maximum and I codes for each c ≤ a a family $I_c = \{I(c, i, -, -)\}_{i \in x}$ of partial $\tau_0$–$\tau_0^*$ isomorphisms from x into $y^*$, such that for $c < c' \leq a$ we have $I_c \subseteq I_{c'}$ and the extension property holds. Describe this by a first-order sentence $\theta_\Delta$ of type $\tau^+ \supseteq \tau_0 \cup \tau_0^*$ and set $\Delta_n = \mathrm{Mod}_L(\theta_\Delta \wedge \exists^{\geq n} x\,(x \leq a))$. Then condition (1) in the Lemma is granted by Fraïssé’s characterization and the fact that x being […]. Condition (2) is granted because $(x, y^*, <, a, I) \in \bigcap_n \Delta_n$ iff < contains an infinite increasing ω-chain below a, a $\Sigma^1_1$ condition.

A topology on $\mathrm{St}_\tau$ is invariant if its open (closed) classes are closed under isomorphic structures. Of course, this condition is superfluous if we identify isomorphic structures.

Theorem: Let Γ be a regular compact topology finer than the elementary topology on each class $\mathrm{St}_\tau$ such that the countable structures are dense in $\mathrm{St}_\tau$ and reducts and renamings are continuous for these topologies. Then $\Gamma_\tau$ is the elementary topology for all τ.

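Proof: We show that any pair of disjoint closed classes $C_1, C_2$ of $\Gamma_\tau$ may be separated by an elementary class. Assume this is not the case. Since the $C_i$ are compact in the topology $\Gamma_\tau$, they are compact for the elementary topology and, by regularity of the latter, there exist $x_i \in C_i$ such that $x_1 \equiv x_2$ in $L_{\omega\omega}(\tau)$. The $x_i$ must be infinite, otherwise they would be isomorphic, contradicting the disjointness of the $C_i$. By normality of $\Gamma_\tau$, there are towers $U_i \subseteq C'_i \subseteq U'_i \subseteq C''_i$, i = 1, 2, separating the $C_i$, with $U_i, U'_i$ open and $C'_i, C''_i$ closed in $\Gamma_\tau$ and disjoint. Let I be a first-order sentence of type $\tau' \supseteq \tau$ such that (z, ..) ⊨ I ⇔ z is infinite, and let π be the corresponding reduct operation. For fixed n ∈ ω and the finite $\tau_0 \subseteq \tau$, let t be a first-order sentence describing the common $\equiv_{n,\tau_0}$-equivalence class of $x_1, x_2$. As

$(x_i, ..) \in \mathrm{Mod}_{\tau'}(I) \cap \pi^{-1}\mathrm{Mod}(t) \cap \pi^{-1}U_i, \quad i = 1, 2,$

and this class is open in $\Gamma_{\tau'}$ by continuity of π, by the density hypothesis there are countable $x_i \in U_i$, i = 1, 2, such that $x_1 \equiv_{n,\tau_0} x_2$. Thus for some expansion of $(x_1, x_2^*)$,

$(x_1, x_2^*, ..) \in \Delta_{n,\tau_0} \cap \pi_1^{-1}(C'_1) \cap (\rho\pi_2)^{-1}(C'_2)$ —– (1)

where $\Delta_{n,\tau_0}$ is the class of the Lemma, $\pi_1, \pi_2$ are reducts, and ρ is a renaming:

$\pi_1(x_1, x_2^*, \dots) = x_1, \qquad \pi_1 : \mathrm{St}_{\tau^+} \to \mathrm{St}_{\tau \cup \tau^*} \to \mathrm{St}_\tau$

$\pi_2(x_1, x_2^*, \dots) = x_2^*, \qquad \pi_2 : \mathrm{St}_{\tau^+} \to \mathrm{St}_{\tau \cup \tau^*} \to \mathrm{St}_{\tau^*}$

$\rho(x_2^*) = x_2, \qquad \rho : \mathrm{St}_{\tau^*} \to \mathrm{St}_\tau$

Since the classes (1) are closed by continuity of the above functors, $\bigcap_n \Delta_{n,\tau_0} \cap \pi_1^{-1}(C'_1) \cap (\rho\pi_2)^{-1}(C'_2)$ is non-empty by compactness of $\Gamma_{\tau^+}$. But $\bigcap_n \Delta_{n,\tau_0} = \pi(V)$ with V elementary of type $\tau^{++} \supseteq \tau^+$. Then

$V \cap \pi^{-1}\pi_1^{-1}(U'_1) \cap \pi^{-1}(\rho\pi_2)^{-1}(U'_2) \neq \emptyset$

is open for $\Gamma_{\tau^{++}}$, and by the density condition it must contain a countable structure $(x_1, x_2^*, ..)$. Thus $(x_1, x_2^*, ..) \in \bigcap_n \Delta_{n,\tau_0}$, with $x_i \in U'_i \subseteq C''_i$. It follows that $x_1 \sim_{\tau_0} x_2$ and thus $x_1|_{\tau_0} \approx x_2|_{\tau_0}$. Let $\delta_{\tau_0}$ be a first-order sentence of type $\tau \cup \tau^* \cup \{h\}$ such that $(x, y^*, h) \models \delta_{\tau_0}$ ⇔ $h : x|_{\tau_0} \approx y|_{\tau_0}$. By compactness,

$\Big(\bigcap_{\tau_0 \subseteq_{\mathrm{fin}} \tau} \mathrm{Mod}_{\tau \cup \tau^* \cup \{h\}}(\delta_{\tau_0})\Big) \cap \pi_1^{-1}(C''_1) \cap (\rho\pi_2)^{-1}(C''_2) \neq \emptyset$

and we have $h : x_1 \approx x_2$ with $x_i \in C''_i$, contradicting the disjointness of the $C''_i$. Finally, if C is a closed class of $\Gamma_\tau$ and x ∉ C, then $\mathrm{cl}_{\Gamma_\tau}\{x\}$ is disjoint from C by regularity of $\Gamma_\tau$. Then $\mathrm{cl}_{\Gamma_\tau}\{x\}$ and C may be separated by open classes of the elementary topology, which implies C is closed in this topology.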

Infinite Sequences and Halting Problem. Thought of the Day 76.0


In attempting to extend the notion of depth from finite strings to infinite sequences, one encounters a familiar phenomenon: the definitions become sharper (e.g. recursively invariant), but their intuitive meaning is less clear, because of distinctions (e.g. between infinitely-often and almost-everywhere properties) that do not exist in the finite case.

An infinite sequence X is called strongly deep if at every significance level s, and for every recursive function f, all but finitely many initial segments Xn have depth exceeding f(n).

It is necessary to require the initial segments to be deep almost everywhere rather than infinitely often, because even the most trivial sequence has infinitely many deep initial segments Xn (viz. the segments whose lengths n are deep numbers).

It is not difficult to show that the property of strong depth is invariant under truth-table equivalence (this is the same as Turing equivalence in recursively bounded time, or via a total recursive operator), and that the same notion would result if the initial segments were required to be deep in the sense of receiving less than $2^{-s}$ of their algorithmic probability from f(n)-fast programs. The characteristic sequence of the halting set K is an example of a strongly deep sequence.

A weaker definition of depth, also invariant under truth-table equivalence, is perhaps more analogous to that adopted for finite strings:

An infinite sequence X is weakly deep if it is not computable in recursively bounded time from any algorithmically random infinite sequence.

Computability in recursively bounded time is equivalent to two other properties, viz. truth-table reducibility and reducibility via a total recursive operator.

In contrast to the situation with truth-table reducibility, Péter Gács has shown that every sequence is computable from (i.e. Turing reducible to) an algorithmically random sequence if no bound is imposed on the time. This is the infinite analog of the far more obvious fact that every finite string is computable from an algorithmically random string (e.g. its minimal program).

Every strongly deep sequence is weakly deep, but by intermittently padding K with large blocks of zeros, one can construct a weakly deep sequence with infinitely many shallow initial segments.

Truth-table reducibility to an algorithmically random sequence is equivalent to the property studied by Levin et al. of being random with respect to some recursive measure. Levin calls sequences with this property “proper” or “complete” sequences, and views them as more realistic and interesting than other sequences because they are the typical outcomes of probabilistic or deterministic effective processes operating in recursively bounded time.

Weakly deep sequences arise with finite probability when a universal Turing machine (with one-way input and output tapes, so that it can act as a transducer of infinite sequences) is given an infinite coin toss sequence for input. These sequences are necessarily produced very slowly: the time to output the n’th digit being bounded by no recursive function, and the output sequence contains evidence of this slowness. Because they are produced with finite probability, such sequences can contain only finite information about the halting problem.

Is General Theory of Relativity a Gauge Theory? Trajectories of Diffeomorphism.


Historically the problem of observables in classical and quantum gravity is closely related to the so-called Einstein hole problem, i.e. to some of the consequences of general covariance in general relativity (GTR).

The central question is the physical meaning of the points of the event manifold underlying GTR. In contrast to pure mathematics this is a non-trivial point in physics. While in pure differential geometry one simply decrees the existence of, for example, a (pseudo-) Riemannian manifold with a differentiable structure (i.e., an appropriate cover with coordinate patches) plus a (pseudo-) Riemannian metric, g, the relation to physics is not simply one-one. In popular textbooks about GTR, it is frequently stated that all diffeomorphic (space-time) manifolds, M are physically indistinguishable. Put differently:

space-time = Riem/Diff —– (1)

This becomes particularly acute in the Einstein hole problem: assuming that we have a region of space-time free of matter, we can apply a local diffeomorphism which acts only within this hole, leaving the exterior invariant. We thus get, in general, two different metric tensors

g(x) , g′(x) := Φ ◦ g(x) —– (2)

in the hole, while certain initial conditions lying outside of the hole are unchanged, thus yielding two different solutions of the Einstein field equations.

Many physicists consider this to be a violation of determinism (which it is not!) and hence argue that the class of observable quantities has to be drastically reduced in (quantum) gravity theory. They follow the line of reasoning developed by Dirac in the context of gauge theory, thus implying that GTR is essentially also a gauge theory. This leads to the conclusion:

Dirac observables in quantum gravity are quantities which are diffeomorphism invariant with the diffeomorphism group, Diff acting from M to M, i.e.

Φ : M → M —– (3)

One should note that with respect to physical observations there is no violation of determinism. An observer can never really observe two different metric fields on one and the same space-time manifold. This can only happen on the mathematical paper. He will use a fixed measurement protocol, using rods and clocks in e.g. a local inertial frame where special relativity locally applies and then extend the results to general coordinate frames.

We get a certain orbit under Diff if we start from a particular manifold M with a metric tensor g and take the orbit

{M, Φ ◦g} —– (4)

In general we have additional fields and matter distributions on M which are transformed accordingly.

Note that not even scalars are invariant in general in the above sense, i.e., not even the Ricci scalar is observable in the Dirac sense:

R(x) ≠ Φ ◦ R(x) —– (5)

in the generic case. Thus, this would imply that the class of admissible observables can be pretty small (even empty!). Furthermore, it follows that points of M are not a priori distinguishable. On the other hand, many consider the Ricci scalar at a point to be an observable quantity.

This winds up to the question whether GTR is a true gauge theory or perhaps only apparently so at a first glance, while on a more fundamental level it is something different. In the words of Kuchar (What is observable..),

Quantities non-invariant under the full diffeomorphism group are observable in gravity.

The reason for these apparently diverging opinions stems from the role reference systems are assumed to play in GTR with some arguing that the gauge property of general coordinate invariance is only of a formal nature.

In the hole argument it is for example argued that it is important to add some particle trajectories which cross each other, thus generating concrete events on M. As these point events transform accordingly under a diffeomorphism, the distance between the corresponding coordinates x, y equals the distance between the transformed points Φ(x), Φ(y), thus being a Dirac observable. On the other hand, the coordinates x or y are not observable.
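A toy computation can make the contrast between field values at coordinate points and field values at trajectory crossings concrete. Everything in the sketch below is an assumption chosen for illustration: a flat (t, x) plane, a made-up diffeomorphism `phi` with a closed-form inverse, a made-up scalar `field` standing in for something like the Ricci scalar, and two straight worldlines that cross at a single event.

```python
import math

def phi(p):
    # Hypothetical active diffeomorphism of the (t, x) plane
    t, x = p
    return (t, x + 0.3 * math.sin(t))

def phi_inv(p):
    # Exact inverse of phi
    t, x = p
    return (t, x - 0.3 * math.sin(t))

def field(p):
    # Hypothetical scalar field on the plane
    t, x = p
    return t * t + 3.0 * x

def dragged_field(p):
    # The scalar carried along by phi: (phi_* f)(p) = f(phi^{-1}(p))
    return field(phi_inv(p))

# The worldlines x = t and x = 1 - t cross at this event
crossing = (0.5, 0.5)
# phi is a bijection, so the dragged worldlines cross at phi(crossing)
dragged_crossing = phi(crossing)

# Value at a fixed coordinate point changes under the diffeomorphism ...
print(field(crossing), dragged_field(crossing))
# ... but the value at the crossing event, dragged along with the worldlines, does not
print(field(crossing), dragged_field(dragged_crossing))
```

The first pair of numbers differs, which mirrors the statement that a scalar at a coordinate point is not invariant. The second pair agrees, because the event is identified relationally (as the crossing of the two worldlines) rather than by its coordinates.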

One should note that this observation is somewhat tautological in the realm of Riemannian geometry as the metric is an absolute quantity, put differently (and somewhat sloppily), ds2 is invariant under passive and by the same token active coordinate transformation (diffeomorphisms) because, while conceptually different, the transformation properties under the latter operations are defined as in the passive case. In the case of GTR this absolute quantity enters via the equivalence principle i.e., distances are measured for example in a local inertial frame (LIF) where special relativity holds and are then generalized to arbitrary coordinate systems.

Conjuncted: Occam’s Razor and Nomological Hypothesis. Thought of the Day 51.1.1



A temporally evolving system must possess a sufficiently rich set of symmetries to allow us to infer general laws from a finite set of empirical observations. But what justifies this hypothesis?

This question is central to the entire scientific enterprise. Why are we justified in assuming that scientific laws are the same in different spatial locations, or that they will be the same from one day to the next? Why should replicability of other scientists’ experimental results be considered the norm, rather than a miraculous exception? Why is it normally safe to assume that the outcomes of experiments will be insensitive to irrelevant details? Why, for that matter, are we justified in the inductive generalizations that are ubiquitous in everyday reasoning?

In effect, we are assuming that the scientific phenomena under investigation are invariant under certain symmetries – both temporal and spatial, including translations, rotations, and so on. But where do we get this assumption from? The answer lies in the principle of Occam’s Razor.

Roughly speaking, this principle says that, if two theories are equally consistent with the empirical data, we should prefer the simpler theory:

Occam’s Razor: Given any body of empirical evidence about a temporally evolving system, always assume that the system has the largest possible set of symmetries consistent with that evidence.

To make this more precise, we begin by explaining what it means for a particular symmetry to be “consistent” with a body of empirical evidence. Formally, our total body of evidence can be represented as a subset E of H, namely the set of all logically possible histories that are not ruled out by that evidence. Note that we cannot assume that our evidence is a subset of Ω; when we scientifically investigate a system, we do not normally know what Ω is. Hence we can only assume that E is a subset of the larger set H of logically possible histories.

Now let ψ be a transformation of H, and suppose that we are testing the hypothesis that ψ is a symmetry of the system. For any positive integer n, let ψn be the transformation obtained by applying ψ repeatedly, n times in a row. For example, if ψ is a rotation about some axis by angle θ, then ψn is the rotation by the angle nθ. For any such transformation ψn, we write ψ–n(E) to denote the inverse image in H of E under ψn. We say that the transformation ψ is consistent with the evidence E if the intersection

E ∩ ψ–1(E) ∩ ψ–2(E) ∩ ψ–3(E) ∩ …

is non-empty. This means that the available evidence (i.e., E) does not falsify the hypothesis that ψ is a symmetry of the system.

For example, suppose we are interested in whether cosmic microwave background radiation is isotropic, i.e., the same in every direction. Suppose we measure a background radiation level of x1 when we point the telescope in direction d1, and a radiation level of x2 when we point it in direction d2. Call these events E1 and E2. Thus, our experimental evidence is summarized by the event E = E1 ∩ E2. Let ψ be a spatial rotation that rotates d1 to d2. Then, focusing for simplicity just on the first two terms of the infinite intersection above,

E ∩ ψ–1(E) = E1 ∩ E2 ∩ ψ–1(E1) ∩ ψ–1(E2).

If x1 = x2, we have E1 = ψ–1(E2), and the expression for E ∩ ψ–1(E) simplifies to E1 ∩ E2 ∩ ψ–1(E1), which has at least a chance of being non-empty, meaning that the evidence has not (yet) falsified isotropy. But if x1 ≠ x2, then E1 and ψ–1(E2) are disjoint. In that case, the intersection E ∩ ψ–1(E) is empty, and the evidence is inconsistent with isotropy. As it happens, we know from recent astronomy that x1 ≠ x2 in some cases, so cosmic microwave background radiation is not isotropic, and ψ is not a symmetry.
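The consistency test can be run mechanically on a small finite stand-in for H. The sketch below is a toy model under stated assumptions: only two directions, two possible radiation levels (0 and 1), a history being the pair of levels in directions d1 and d2, and ψ acting by swapping the two entries. Since applying this ψ twice gives the identity, truncating the infinite intersection after two terms loses nothing here. All names are hypothetical.

```python
from itertools import product

LEVELS = (0, 1)                       # assumed coarse radiation levels
H = set(product(LEVELS, repeat=2))    # history = (level in d1, level in d2)

def psi(h):
    """Toy rotation taking d1 to d2 (and d2 back to d1)."""
    return (h[1], h[0])

def preimage(E, n):
    """psi^{-n}(E): histories that land in E after applying psi n times."""
    result = set()
    for h in H:
        image = h
        for _ in range(n):
            image = psi(image)
        if image in E:
            result.add(h)
    return result

def surviving_histories(E, max_power=2):
    """E intersected with psi^{-1}(E), ..., psi^{-max_power}(E)."""
    survivors = set(E)
    for n in range(1, max_power + 1):
        survivors &= preimage(E, n)
    return survivors

def evidence(x1, x2):
    """E = E1 ∩ E2: level x1 observed in d1 and level x2 observed in d2."""
    E1 = {h for h in H if h[0] == x1}
    E2 = {h for h in H if h[1] == x2}
    return E1 & E2

print(surviving_histories(evidence(1, 1)))   # non-empty: isotropy not falsified
print(surviving_histories(evidence(0, 1)))   # empty: evidence inconsistent with psi
```

For a transformation with no finite order, `max_power` would only give a necessary condition for consistency, since the full intersection has infinitely many terms.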

Our version of Occam’s Razor now says that we should postulate as symmetries of our system a maximal monoid of transformations consistent with our evidence. Formally, a monoid Ψ of transformations (where each ψ in Ψ is a function from H into itself) is consistent with evidence E if the intersection

$\bigcap_{\psi \in \Psi} \psi^{-1}(E)$

is non-empty. This is the generalization of the infinite intersection that appeared in our definition of an individual transformation’s consistency with the evidence. Further, a monoid Ψ that is consistent with E is maximal if no proper superset of Ψ forms a monoid that is also consistent with E.

Occam’s Razor (formal): Given any body E of empirical evidence about a temporally evolving system, always assume that the set of symmetries of the system is a maximal monoid Ψ consistent with E.

What is the significance of this principle? We define Γ to be the set of all symmetries of our temporally evolving system. In practice, we do not know Γ. A monoid Ψ that passes the test of Occam’s Razor, however, can be viewed as our best guess as to what Γ is.

Furthermore, if Ψ is this monoid, and E is our body of evidence, the intersection

$\bigcap_{\psi \in \Psi} \psi^{-1}(E)$

can be viewed as our best guess as to what the set of nomologically possible histories is. It consists of all those histories among the logically possible ones that are not ruled out by the postulated symmetry monoid Ψ and the observed evidence E. We thus call this intersection our nomological hypothesis and label it Ω(Ψ,E).

To see that this construction is not completely far-fetched, note that, under certain conditions, our nomological hypothesis does indeed reflect the truth about nomological possibility. If the hypothesized symmetry monoid Ψ is a subset of the true symmetry monoid Γ of our temporally evolving system – i.e., we have postulated some of the right symmetries – then the true set Ω of all nomologically possible histories will be a subset of Ω(Ψ,E). So, our nomological hypothesis will be consistent with the truth and will, at most, be logically weaker than the truth.

Given the hypothesized symmetry monoid Ψ, we can then assume provisionally (i) that any empirical observation we make, corresponding to some event D, can be generalized to a Ψ-invariant law and (ii) that unconditional and conditional probabilities can be estimated from empirical frequency data using a suitable version of the Ergodic Theorem.

Conjuncted: Ergodicity. Thought of the Day 51.1


When we scientifically investigate a system, we cannot normally observe all possible histories in Ω, or directly access the conditional probability structure {PrE}E⊆Ω. Instead, we can only observe specific events. Conducting many “runs” of the same experiment is an attempt to observe as many histories of a system as possible, but even the best experimental design rarely allows us to observe all histories or to read off the full conditional probability structure. Furthermore, this strategy works only for smaller systems that we can isolate in laboratory conditions. When the system is the economy, the global ecosystem, or the universe in its entirety, we are stuck in a single history. We cannot step outside that history and look at alternative histories. Nonetheless, we would like to infer something about the laws of the system in general, and especially about the true probability distribution over histories.

Can we discern the system’s laws and true probabilities from observations of specific events? And what kinds of regularities must the system display in order to make this possible? In other words, are there certain “metaphysical prerequisites” that must be in place for scientific inference to work?

To answer these questions, we first consider a very simple example. Here T = {1,2,3,…}, and the system’s state at any time is the outcome of an independent coin toss. So the state space is X = {Heads, Tails}, and each possible history in Ω is one possible Heads/Tails sequence.

Suppose the true conditional probability structure on Ω is induced by the single parameter p, the probability of Heads. In this example, the Law of Large Numbers guarantees that, with probability 1, the limiting frequency of Heads in a given history (as time goes to infinity) will match p. This means that the subset of Ω consisting of “well-behaved” histories has probability 1, where a history is well-behaved if (i) there exists a limiting frequency of Heads for it (i.e., the proportion of Heads converges to a well-defined limit as time goes to infinity) and (ii) that limiting frequency is p. For this reason, we will almost certainly (with probability 1) arrive at the true conditional probability structure on Ω on the basis of observing just a single history and counting the number of Heads and Tails in it.
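A quick simulation shows the Law of Large Numbers at work on a single history. This is a minimal sketch, assuming a pseudo-random generator as a stand-in for the coin; the value p = 0.3, the seed and the function name are arbitrary choices for the example.

```python
import random

def heads_frequency(p, n_tosses, seed=0):
    """Proportion of Heads in one simulated history of independent tosses."""
    rng = random.Random(seed)          # fixed seed so the run is reproducible
    heads = sum(1 for _ in range(n_tosses) if rng.random() < p)
    return heads / n_tosses

for n in (100, 10_000, 100_000):
    print(n, heads_frequency(0.3, n))
```

The printed frequencies should settle near 0.3 as the number of tosses grows, which is the sense in which a single sufficiently long history reveals the parameter p.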

Does this result generalize? The short answer is “yes”, provided the system’s symmetries are of the right kind. Without suitable symmetries, generalizing from local observations to global laws is not possible. In a slogan, for scientific inference to work, there must be sufficient regularities in the system. In our toy system of the coin tosses, there are. Wigner (1967) recognized this point, taking symmetries to be “a prerequisite for the very possibility of discovering the laws of nature”.

Generally, symmetries allow us to infer general laws from specific observations. For example, let T = {1,2,3,…}, and let Y and Z be two subsets of the state space X. Suppose we have made the observation O: “whenever the state is in the set Y at time 5, there is a 50% probability that it will be in Z at time 6”. Suppose we know, or are justified in hypothesizing, that the system has the set of time symmetries {ψr : r = 1,2,3,….}, with ψr(t) = t + r, as defined in the previous section. Then, from observation O, we can deduce the following general law: “for any t in T, if the state of the system is in the set Y at time t, there is a 50% probability that it will be in Z at time t + 1”.

However, this example still has a problem. It only shows that if we could make observation O, then our generalization would be warranted, provided the system has the relevant symmetries. But the “if” is a big “if”. Recall what observation O says: “whenever the system’s state is in the set Y at time 5, there is a 50% probability that it will be in the set Z at time 6”. Clearly, this statement is only empirically well supported – and thus a real observation rather than a mere hypothesis – if we can make many observations of possible histories at times 5 and 6. We can do this if the system is an experimental apparatus in a lab or a virtual system in a computer, which we are manipulating and observing “from the outside”, and on which we can perform many “runs” of an experiment. But, as noted above, if we are participants in the system, as in the case of the economy, an ecosystem, or the universe at large, we only get to experience times 5 and 6 once, and we only get to experience one possible history. How, then, can we ever assemble a body of evidence that allows us to make statements such as O?

The solution to this problem lies in the property of ergodicity. This is a property that a system may or may not have and that, if present, serves as the desired metaphysical prerequisite for scientific inference. To explain this property, let us give an example. Suppose T = {1,2,3,…}, and the system has all the time symmetries in the set Ψ = {ψr : r = 1,2,3,….}. Heuristically, the symmetries in Ψ can be interpreted as describing the evolution of the system over time. Suppose each time-step corresponds to a day. Then the history h = (a,b,c,d,e,….) describes a situation where today’s state is a, tomorrow’s is b, the next day’s is c, and so on. The transformed history ψ1(h) = (b,c,d,e,f,….) describes a situation where today’s state is b, tomorrow’s is c, the following day’s is d, and so on. Thus, ψ1(h) describes the same “world” as h, but as seen from the perspective of tomorrow. Likewise, ψ2(h) = (c,d,e,f,g,….) describes the same “world” as h, but as seen from the perspective of the day after tomorrow, and so on.

Given the set Ψ of symmetries, an event E (a subset of Ω) is Ψ-invariant if the inverse image of E under ψ is E itself, for all ψ in Ψ. This implies that if a history h is in E, then ψ(h) will also be in E, for all ψ. In effect, if the world is in the set E today, it will remain in E tomorrow, and the day after tomorrow, and so on. Thus, E is a “persistent” event: an event one cannot escape from by moving forward in time. In a coin-tossing system, where Ψ is still the set of time translations, examples of Ψ-invariant events are “all Heads”, where E contains only the history (Heads, Heads, Heads, …), and “all Tails”, where E contains only the history (Tails, Tails, Tails, …).

The system is ergodic (with respect to Ψ) if, for any Ψ-invariant event E, the unconditional probability of E, i.e., PrΩ(E), is either 0 or 1. In other words, the only persistent events are those which occur in almost no history (i.e., PrΩ(E) = 0) and those which occur in almost every history (i.e., PrΩ(E) = 1). Our coin-tossing system is ergodic, as exemplified by the fact that the Ψ-invariant events “all Heads” and “all Tails” occur with probability 0.

In an ergodic system, it is possible to estimate the probability of any event “empirically”, by simply counting the frequency with which that event occurs. Frequencies are thus evidence for probabilities. The formal statement of this is the following important result from the theory of dynamical systems and stochastic processes.

Ergodic Theorem: Suppose the system is ergodic. Let E be any event and let h be any history. For all times t in T, let Nt be the number of elements r in the set {1, 2, …, t} such that ψr(h) is in E. Then, with probability 1, the ratio Nt/t will converge to PrΩ(E) as t increases towards infinity.

Intuitively, Nt is the number of times the event E has “occurred” in history h from time 1 up to time t. The ratio Nt/t is therefore the frequency of occurrence of event E (up to time t) in history h. This frequency might be measured, for example, by performing a sequence of experiments or observations at times 1, 2, …, t. The Ergodic Theorem says that, almost certainly (i.e., with probability 1), the empirical frequency will converge to the true probability of E, PrΩ(E), as the number of observations becomes large. The estimation of the true conditional probability structure from the frequencies of Heads and Tails in our illustrative coin-tossing system is possible precisely because the system is ergodic.

To understand the significance of this result, let Y and Z be two subsets of X, and suppose E is the event “h(1) is in Y”, while D is the event “h(2) is in Z”. Then the intersection E ∩ D is the event “h(1) is in Y, and h(2) is in Z”. The Ergodic Theorem says that, by performing a sequence of observations over time, we can empirically estimate PrΩ(E) and PrΩ(E ∩ D) with arbitrarily high precision. Thus, we can compute the ratio PrΩ(E ∩ D)/PrΩ(E). But this ratio is simply the conditional probability PrΕ(D). And so, we are able to estimate the conditional probability that the state at time 2 will be in Z, given that at time 1 it was in Y. This illustrates that, by allowing us to estimate unconditional probabilities empirically, the Ergodic Theorem also allows us to estimate conditional probabilities, and in this way to learn the properties of the conditional probability structure {PrE}E⊆Ω.
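The estimate of a conditional probability from a single history can be sketched in code. The example below assumes a hypothetical two-state Markov chain with state space X = {"a", "b"}, Y = {"a"} and Z = {"b"}, and made-up transition probabilities, so that the true value of Pr(next state in Z | current state in Y) is 0.5. The chain is started in a fixed state rather than in its stationary distribution, which does not affect the long-run frequencies for a chain of this kind.

```python
import random

def simulate_history(n_steps, seed=1):
    """One history of a hypothetical two-state chain on {"a", "b"}."""
    rng = random.Random(seed)
    stay_in_a = 0.5      # assumed Pr(next = "a" | current = "a")
    enter_a = 0.2        # assumed Pr(next = "a" | current = "b")
    state = "a"
    history = []
    for _ in range(n_steps):
        history.append(state)
        p_a = stay_in_a if state == "a" else enter_a
        state = "a" if rng.random() < p_a else "b"
    return history

def estimate_conditional(history, Y, Z):
    """Frequency of (state in Y, next state in Z) divided by frequency of (state in Y)."""
    n_E = 0       # shifts r at which the shifted history starts in Y
    n_ED = 0      # shifts r at which it starts in Y and then moves to Z
    for r in range(len(history) - 1):
        if history[r] in Y:
            n_E += 1
            if history[r + 1] in Z:
                n_ED += 1
    return n_ED / n_E

history = simulate_history(200_000)
estimate = estimate_conditional(history, Y={"a"}, Z={"b"})
print(estimate)   # should be close to the assumed true value 0.5
```

Dividing the two counts is the same as dividing the two empirical frequencies, since the common factor 1/t cancels; this is the ratio PrΩ(E ∩ D)/PrΩ(E) estimated from one run.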

We may thus conclude that ergodicity is what allows us to generalize from local observations to global laws. In effect, when we engage in scientific inference about some system, or even about the world at large, we rely on the hypothesis that this system, or the world, is ergodic. If our system, or the world, were “dappled”, then presumably we would not be able to presuppose ergodicity, and hence our ability to make scientific generalizations would be compromised.