The Affinity of Mirror Symmetry to Algebraic Geometry: Going Beyond Formalism



Even though formalism of homological mirror symmetry is an established case, what of other explanations of mirror symmetry which lie closer to classical differential and algebraic geometry? One way to tackle this is the so-called Strominger, Yau and Zaslow mirror symmetry or SYZ in short.

The central physical ingredient in this proposal is T-duality. To explain this, let us consider a superconformal sigma model with target space (M, g), and denote it (defined as a geometric functor, or as a set of correlation functions), as

CFT(M, g)

In physics, a duality is an equivalence

CFT(M, g) ≅ CFT(M′, g′)

which holds despite the fact that the underlying geometries (M,g) and (M′, g′) are not classically diffeomorphic.

T-duality is a duality which relates two CFT’s with toroidal target space, M ≅ M′ ≅ Td, but different metrics. In rough terms, the duality relates a “small” target space, with noncontractible cycles of length L < ls, with a “large” target space in which all such cycles have length L > ls.

This sort of relation is generic to dualities and follows from the following logic. If all length scales (lengths of cycles, curvature lengths, etc.) are greater than ls, string theory reduces to conventional geometry. Now, in conventional geometry, we know what it means for (M, g) and (M′, g′) to be non-isomorphic. Any modification to this notion must be associated with a breakdown of conventional geometry, which requires some length scale to be “sub-stringy,” with L < ls. To state T-duality precisely, let us first consider M = M′ = S1. We parameterise this with a coordinate X ∈ R making the identification X ∼ X + 2π. Consider a Euclidean metric gR given by ds2 = R2dX2. The real parameter R is usually called the “radius” from the obvious embedding in R2. This manifold is Ricci-flat and thus the sigma model with this target space is a conformal field theory, the “c = 1 boson.” Let us furthermore set the string scale ls = 1. With this, we attain a complete physical equivalence.

CFT(S1, gR) ≅ CFT(S1, g1/R)

Thus these two target spaces are indistinguishable from the point of view of string theory.

Just to give a physical picture for what this means, suppose for sake of discussion that superstring theory describes our universe, and thus that in some sense there must be six extra spatial dimensions. Suppose further that we had evidence that the extra dimensions factorized topologically and metrically as K5 × S1; then it would make sense to ask: What is the radius R of this S1 in our universe? In principle this could be measured by producing sufficiently energetic particles (so-called “Kaluza-Klein modes”), or perhaps measuring deviations from Newton’s inverse square law of gravity at distances L ∼ R. In string theory, T-duality implies that R ≥ ls, because any theory with R < ls is equivalent to another theory with R > ls. Thus we have a nontrivial relation between two (in principle) observable quantities, R and ls, which one might imagine testing experimentally. Let us now consider the theory CFT(Td, g), where Td is the d-dimensional torus, with coordinates Xi parameterising Rd/2πZd, and a constant metric tensor gij. Then there is a complete physical equivalence

CFT(Td, g) ≅ CFT(Td, g−1)

In fact this is just one element of a discrete group of T-duality symmetries, generated by T-dualities along one-cycles, and large diffeomorphisms (those not continuously connected to the identity). The complete group is isomorphic to SO(d, d; Z).

While very different from conventional geometry, T-duality has a simple intuitive explanation. This starts with the observation that the possible embeddings of a string into X can be classified by the fundamental group π1(X). Strings representing non-trivial homotopy classes are usually referred to as “winding states.” Furthermore, since strings interact by interconnecting at points, the group structure on π1 provided by concatenation of based loops is meaningful and is respected by interactions in the string theory. Now π1(Td) ≅ Zd, as an abelian group, referred to as the group of “winding numbers”.

Of course, there is another Zd we could bring into the discussion, the Pontryagin dual of the U(1)d of which Td is an affinization. An element of this group is referred to physically as a “momentum,” as it is the eigenvalue of a translation operator on Td. Again, this group structure is respected by the interactions. These two group structures, momentum and winding, can be summarized in the statement that the full closed string algebra contains the group algebra C[Zd] ⊕ C[Zd].

In essence, the point of T-duality is that if we quantize the string on a sufficiently small target space, the roles of momentum and winding will be interchanged. But the main point can be seen by bringing in some elementary spectral geometry. Besides the algebra structure, another invariant of a conformal field theory is the spectrum of its Hamiltonian H (technically, the Virasoro operator L0 + L ̄0). This Hamiltonian can be thought of as an analog of the standard Laplacian ∆g on functions on X, and its spectrum on Td with metric g is

Spec ∆= {∑i,j=1d gijpipj; pi ∈ Zd}

On the other hand, the energy of a winding string is (intuitively) a function of its length. On our torus, a geodesic with winding number w ∈ Zd has length squared

L2 = ∑i,j=1d gijwiwj

Now, the only string theory input we need to bring in is that the total Hamiltonian contains both terms,

H = ∆g + L2 + · · ·

where the extra terms … express the energy of excited (or “oscillator”) modes of the string. Then, the inversion g → g−1, combined with the interchange p ↔ w, leaves the spectrum of H invariant. This is T-duality.

There is a simple generalization of the above to the case with a non-zero B-field on the torus satisfying dB = 0. In this case, since B is a constant antisymmetric tensor, we can label CFT’s by the matrix g + B. Now, the basic T-duality relation becomes

CFT(Td, g + B) ≅ CFT(Td, (g + B)−1)

Another generalization, which is considerably more subtle, is to do T-duality in families, or fiberwise T-duality. The same arguments can be made, and would become precise in the limit that the metric on the fibers varies on length scales far greater than ls, and has curvature lengths far greater than ls. This is sometimes called the “adiabatic limit” in physics. While this is a very restrictive assumption, there are more heuristic physical arguments that T-duality should hold more generally, with corrections to the relations proportional to curvatures ls2R and derivatives ls∂ of the fiber metric, both in perturbation theory and from world-sheet instantons.

The Biological Kant. Note Quote.


The biological treatise takes as its object the realm of physics left out of Kant’s critical demarcations of scientific, that is, mathematical and mechanistic, physics. Here, the main idea was that scientifically understandable Nature was defined by lawfulness. In his Metaphysical Foundations of Natural Science, this idea was taken further in the following claim:

I claim, however, that there is only as much proper science to be found in any special doctrine on nature as there is mathematics therein, and further ‘a pure doctrine on nature about certain things in nature (doctrine on bodies and doctrine on minds) is only possible by means of mathematics’.

The basic idea is thus to identify Nature’s lawfulness with its ability to be studied by means of mathematical schemata uniting understanding and intuition. The central schema, to Kant, was numbers, so apt to be used in the understanding of mechanically caused movement. But already here, Kant is very well aware of a whole series of aspects of spontaneuosly experienced Nature is left out of sight by the concentration on matter in movement, and he calls for these further realms of Nature to be studied by a continuation of the Copernican turn, by the mind’s further study of the utmost limits of itself. Why do we spontaneously see natural purposes, in Nature? Purposiveness is wholly different from necessity, crucial to Kant’s definition of Nature. There is no reason in the general concept of Nature (as lawful) to assume that nature’s objects may serve each other as purposes. Nevertheless, we do not stop assuming just that. But what we do when we ascribe purposes to Nature is using the faculties of mind in another way than in science, much closer to the way we use them in the appreciation of beauty and art, the object of the first part of the book immediately before the treatment of teleological judgment. This judgment is characterized by a central distinction, already widely argued in this first part of the book: the difference between determinative and reflective judgments, respectively. While the judgment used scientifically to decide whether a specific case follows a certain rule in explanation by means of a derivation from a principle, and thus constitutes the objectivity of the object in question – the judgment which is reflective lacks all these features. It does not proceed by means of explanation, but by mere analogy; it is not constitutive, but merely regulative; it does not prove anything but merely judges, and it has no principle of reason to rest its head upon but the very act of judging itself. These ideas are now elaborated throughout the critic of teleological judgment.


In the section Analytik der teleologischen Urteilskraft, Kant gradually approaches the question: first is treated the merely formal expediency: We may ascribe purposes to geometry in so far as it is useful to us, just like rivers carrying fertile soils with them for trees to grow in may be ascribed purposes; these are, however, merely contingent purposes, dependent on an external telos. The crucial point is the existence of objects which are only possible as such in so far as defined by purposes:

That its form is not possible after mere natural laws, that is, such things which may not be known by us through understanding applied to objects of the senses; on the contrary that even the empirical knowledge about them, regarding their cause and effect, presupposes concepts of reason.

The idea here is that in order to conceive of objects which may not be explained with reference to understanding and its (in this case, mechanical) concepts only, these must be grasped by the non-empirical ideas of reason itself. If causes are perceived as being interlinked in chains, then such contingencies are to be thought of only as small causal circles on the chain, that is, as things being their own cause. Hence Kant’s definition of the Idea of a natural purpose:

an object exists as natural purpose, when it is cause and effect of itself.

This can be thought as an idea without contradiction, Kant maintains, but not conceived. This circularity (the small causal circles) is a very important feature in Kant’s tentative schematization of purposiveness. Another way of coining this Idea is – things as natural purposes are organized beings. This entails that naturally purposeful objects must possess a certain spatio-temporal construction: the parts of such a thing must be possible only through their relation to the whole – and, conversely, the parts must actively connect themselves to this whole. Thus, the corresponding idea can be summed up as the Idea of the Whole which is necessary to pass judgment on any empirical organism, and it is very interesting to note that Kant sums up the determination of any part of a Whole by all other parts in the phrase that a natural purpose is possible only as an organized and self-organizing being. This is probably the very birth certificate of the metaphysics of self-organization. It is important to keep in mind that Kant does not feel any vitalist temptation at supposing any organizing power or any autonomy on the part of the whole which may come into being only by this process of self-organization between its parts. When Kant talks about the forming power in the formation of the Whole, it is thus nothing outside of this self-organization of its parts.

This leads to Kant’s final definition: an organized being is that in which all that is alternating is ends and means. This idea is extremely important as a formalization of the idea of teleology: the natural purposes do not imply that there exists given, stable ends for nature to pursue, on the contrary, they are locally defined by causal cycles, in which every part interchangeably assumes the role of ends and means. Thus, there is no absolute end in this construal of nature’s teleology; it analyzes teleology formally at the same time as it relativizes it with respect to substance. Kant takes care to note that this maxim needs not be restricted to the beings – animals – which we spontaneously tend to judge as purposeful. The idea of natural purposes thus entails that there might exist a plan in nature rendering processes which we have all reasons to disgust purposeful for us. In this vision, teleology might embrace causality – and even aesthetics:

Also natural beauty, that is, its harmony with the free play of our epistemological faculties in the experience and judgment of its appearance can be seen in the way of objective purposivity of nature in its totality as system, in which man is a member.

An important consequence of Kant’s doctrine is that their teleology is so to speak secularized in two ways: (1) it is formal, and (2) it is local. It is formal because self-organization does not ascribe any special, substantial goal for organisms to pursue – other than the sustainment of self-organization. Thus teleology is merely a formal property in certain types of systems. This is why teleology is also local – it is to be found in certain systems when the causal chain form loops, as Kant metaphorically describes the cycles involved in self-organization – it is no overarching goal governing organisms from the outside. Teleology is a local, bottom-up, process only.

Kant does not in any way doubt the existence of organized beings, what is at stake is the possibility of dealing with them scientifically in terms of mechanics. Even if they exist as a given thing in experience, natural purposes can not receive any concept. This implies that biology is evident in so far as the existence of organisms cannot be doubted. Biology will never rise to the heights of science, its attempts at doing so are beforehand delimited, all scientific explanations of organisms being bound to be mechanical. Following this line of argument, it corresponds very well to present-day reductionism in biology, trying to take all problems of phenotypical characters, organization, morphogenesis, behavior, ecology, etc. back to the biochemistry of genetics. But the other side of the argument is that no matter how successful this reduction may prove, it will never be able to reduce or replace the teleological point of view necessary in order to understand the organism as such in the first place.

Evidently, there is something deeply unsatisfactory in this conclusion which is why most biologists have hesitated at adopting it and cling to either full-blown reductionism or to some brand of vitalism, subjecting themselves to the dangers of ‘transcendental illusion’ and allowing for some Goethe-like intuitive idea without any schematization. Kant tries to soften up the question by philosophical means by establishing an crossing over from metaphysics to physics, or, from the metaphysical constraints on mechanical physics and to physics in its empirical totality, including the organized beings of biology. Pure mechanics leaves physics as a whole unorganized, and this organization is sought to be established by means of mediating concepts’. Among them is the formative power, which is not conceived of in a vitalist substantialist manner, but rather a notion referring to the means by which matter manages to self-organize. It thus comprehends not only biological organization, but macrophysic solid matter physics as well. Here, he adds an important argument to the critic of judgment:

Because man is conscious of himself as a self-moving machine, without being able to further understand such a possibility, he can, and is entitled to, introduce a priori organic-moving forces of bodies into the classification of bodies in general and thus to distinguish mere mechanical bodies from self-propelled organic bodies.

Husserl’s Flip-Flop on Arithmetic Axiomatics. Thought of the Day 118.0


Husserl’s position in his Philosophy of Arithmetic (Psychological and Logical Investigations with Supplementary Texts) was resolutely anti-axiomatic. He attacked those who fell into remote, artificial constructions which, with the intent of building the elementary arithmetic concepts out of their ultimate definitional properties, interpret and change their meaning so much that totally strange, practically and scientifically useless conceptual formations finally result. Especially targeted was Frege’s ideal of the

founding of arithmetic on a sequence of formal definitions, out of which all the theorems of that science could be deduced purely syllogistically.

As soon as one comes to the ultimate, elemental concepts, Husserl reasoned, all defining has to come to an end. All one can then do is to point to the concrete phenomena from or through which the concepts are abstracted and show the nature of the abstractive process. A verbal explanation should place us in the proper state of mind for picking out, in inner or outer intuition, the abstract moments intended and for reproducing in ourselves the mental processes required for the formation of the concept. He said that his analyses had shown with incontestable clarity that the concepts of multiplicity and unity rest directly upon ultimate, elemental psychical data, and so belong among the indefinable concepts. Since the concept of number was so closely joined to them, one could scarcely speak of defining it either. All these points are made on the only pages of Philosophy of Arithmetic that Husserl ever explicitly retracted.

In On the Concept of Number, Husserl had set out to anchor arithmetical concepts in direct experience by analyzing the actual psychological processes to which he thought the concept of number owed its genesis. To obtain the concept of number of a concrete set of objects, say A, A, and A, he explained, one abstracts from the particular characteristics of the individual contents collected, only considering and retaining each one insofar as it is a something or a one. Regarding their collective combination, one thus obtains the general form of the set belonging to the set in question: one and one, etc. and. . . and one, to which a number name is assigned.

The enthusiastic espousal of psychologism of On the Concept of Number is not found in Philosophy of Arithmetic. Husserl later confessed that doubts about basic differences between the concept of number and the concept of collecting, which was all that could be obtained from reflection on acts, had troubled and tormented him from the very beginning and had eventually extended to all categorial concepts and to concepts of objectivities of any sort whatsoever, ultimately to include modern analysis and the theory of manifolds, and simultaneously to mathematical logic and the entire field of logic in general. He did not see how one could reconcile the objectivity of mathematics with psychological foundations for logic.

In sharp contrast to Brouwer who denounced logic as a source of truth, from the mid-1890s on, Husserl defended the view, which he attributed to Frege’s teacher Hermann Lotze, that pure arithmetic was basically no more than a branch of logic that had undergone independent development. He bid students not to be “scared” by that thought and to grow used to Lotze’s initially strange idea that arithmetic was only a particularly highly developed piece of logic.

Years later, Husserl would explain in Formal and Transcendental Logic that his

war against logical psychologism was meant to serve no other end than the supremely important one of making the specific province of analytic logic visible in its purity and ideal particularity, freeing it from the psychologizing confusions and misinterpretations in which it had remained enmeshed from the beginning.

He had come to see arithmetic truths as being analytic, as grounded in meanings independently of matters of fact. He had come to believe that the entire overthrowing of psychologism through phenomenology showed that his analyses in On the Concept of Number and Philosophy of Arithmetic had to be considered a pure a priori analysis of essence. For him, pure arithmetic, pure mathematics, and pure logic were a priori disciplines entirely grounded in conceptual essentialities, where truth was nothing other than the analysis of essences or concepts. Pure mathematics as pure arithmetic investigated what is grounded in the essence of number. Pure mathematical laws were laws of essence.

He is said to have told his students that it was to be stressed repeatedly and emphatically that the ideal entities so unpleasant for empiricistic logic, and so consistently disregarded by it, had not been artificially devised either by himself, or by Bolzano, but were given beforehand by the meaning of the universal talk of propositions and truths indispensable in all the sciences. This, he said, was an indubitable fact that had to be the starting point of all logic. All purely mathematical propositions, he taught, express something about the essence of what is mathematical. Their denial is consequently an absurdity. Denying a proposition of the natural sciences, a proposition about real matters of fact, never means an absurdity, a contradiction in terms. In denying the law of gravity, I cast experience to the wind. I violate the evident, extremely valuable probability that experience has established for the laws. But, I do not say anything “unthinkable,” absurd, something that nullifies the meaning of the word as I do when I say that 2 × 2 is not 4, but 5.

Husserl taught that every judgment either is a truth or cannot be a truth, that every presentation either accorded with a possible experience adequately redeeming it, or was in conflict with the experience, and that grounded in the essence of agreement was the fact that it was incompatible with the conflict, and grounded in the essence of conflict that it was incompatible with agreement. For him, that meant that truth ruled out falsehood and falsehood ruled out truth. And, likewise, existence and non-existence, correctness and incorrectness cancelled one another out in every sense. He believed that that became immediately apparent as soon as one had clarified the essence of existence and truth, of correctness and incorrectness, of Evidenz as consciousness of givenness, of being and not-being in fully redeeming intuition.

At the same time, Husserl contended, one grasps the “ultimate meaning” of the basic logical law of contradiction and of the excluded middle. When we state the law of validity that of any two contradictory propositions one holds and the other does not hold, when we say that for every proposition there is a contradictory one, Husserl explained, then we are continually speaking of the proposition in its ideal unity and not at all about mental experiences of individuals, not even in the most general way. With talk of truth it is always a matter of propositions in their ideal unity, of the meaning of statements, a matter of something identical and atemporal. What lies in the identically-ideal meaning of one’s words, what one cannot deny without invalidating the fixed meaning of one’s words has nothing at all to do with experience and induction. It has only to do with concepts. In sharp contrast to this, Brouwer saw intuitionistic mathematics as deviating from classical mathematics because the latter uses logic to generate theorems and in particular applies the principle of the excluded middle. He believed that Intuitionism had proven that no mathematical reality corresponds to the affirmation of the principle of the excluded middle and to conclusions derived by means of it. He reasoned that “since logic is based on mathematics – and not vice versa – the use of the Principle of the Excluded Middle is not permissible as part of a mathematical proof.”

Kant and Non-Euclidean Geometries. Thought of the Day 94.0


The argument that non-Euclidean geometries contradict Kant’s doctrine on the nature of space apparently goes back to Hermann Helmholtz and was retaken by several philosophers of science such as Hans Reichenbach (The Philosophy of Space and Time) who devoted much work to this subject. In a essay written in 1870, Helmholtz argued that the axioms of geometry are not a priori synthetic judgments (in the sense given by Kant), since they can be subjected to experiments. Given that Euclidian geometry is not the only possible geometry, as was believed in Kant’s time, it should be possible to determine by means of measurements whether, for instance, the sum of the three angles of a triangle is 180 degrees or whether two straight parallel lines always keep the same distance among them. If it were not the case, then it would have been demonstrated experimentally that space is not Euclidean. Thus the possibility of verifying the axioms of geometry would prove that they are empirical and not given a priori.

Helmholtz developed his own version of a non-Euclidean geometry on the basis of what he believed to be the fundamental condition for all geometries: “the possibility of figures moving without change of form or size”; without this possibility, it would be impossible to define what a measurement is. According to Helmholtz:

the axioms of geometry are not concerned with space-relations only but also at the same time with the mechanical deportment of solidest bodies in motion.

Nevertheless, he was aware that a strict Kantian might argue that the rigidity of bodies is an a priori property, but

then we should have to maintain that the axioms of geometry are not synthetic propositions… they would merely define what qualities and deportment a body must have to be recognized as rigid.

At this point, it is worth noticing that Helmholtz’s formulation of geometry is a rudimentary version of what was later developed as the theory of Lie groups. As for the transport of rigid bodies, it is well known that rigid motion cannot be defined in the framework of the theory of relativity: since there is no absolute simultaneity of events, it is impossible to move all parts of a material body in a coordinated and simultaneous way. What is defined as the length of a body depends on the reference frame from where it is observed. Thus, it is meaningless to invoke the rigidity of bodies as the basis of a geometry that pretend to describe the real world; it is only in the mathematical realm that the rigid displacement of a figure can be defined in terms of what mathematicians call a congruence.

Arguments similar to those of Helmholtz were given by Reichenbach in his intent to refute Kant’s doctrine on the nature of space and time. Essentially, the argument boils down to the following: Kant assumed that the axioms of geometry are given a priori and he only had classical geometry in mind, Einstein demonstrated that space is not Euclidean and that this could be verified empirically, ergo Kant was wrong. However, Kant did not state that space must be Euclidean; instead, he argued that it is a pure form of intuition. As such, space has no physical reality of its own, and therefore it is meaningless to ascribe physical properties to it. Actually, Kant never mentioned Euclid directly in his work, but he did refer many times to the physics of Newton, which is based on classical geometry. Kant had in mind the axioms of this geometry which is a most powerful tool of Newtonian mechanics. Actually, he did not even exclude the possibility of other geometries, as can be seen in his early speculations on the dimensionality of space.

The important point missed by Reichenbach is that Riemannian geometry is necessarily based on Euclidean geometry. More precisely, a Riemannian space must be considered as locally Euclidean in order to be able to define basic concepts such as distance and parallel transport; this is achieved by defining a flat tangent space at every point, and then extending all properties of this flat space to the globally curved space (Luther Pfahler Eisenhart Riemannian Geometry). To begin with, the structure of a Riemannian space is given by its metric tensor gμν from which the (differential) length is defined as ds2 = gμν dxμ dxν; but this is nothing less than a generalization of the usual Pythagoras theorem in Euclidean space. As for the fundamental concept of parallel transport, it is taken directly from its analogue in Euclidean space: it refers to the transport of abstract (not material, as Helmholtz believed) figures in such a space. Thus Riemann’s geometry cannot be free of synthetic a priori propositions because it is entirely based upon concepts such as length and congruence taken form Euclid. We may conclude that Euclids geometry is the condition of possibility for a more general geometry, such as Riemann’s, simply because it is the natural geometry adapted to our understanding; Kant would say that it is our form of grasping space intuitively. The possibility of constructing abstract spaces does not refute Kant’s thesis; on the contrary, it reinforces it.

Platonist Assertory Mathematics. Thought of the Day 88.0


Traditional Platonism, according to which our mathematical theories are bodies of truths about a realm of mathematical objects, assumes that only some amongst consistent theory candidates succeed in correctly describing the mathematical realm. For platonists, while mathematicians may contemplate alternative consistent extensions of the axioms for ZF (Zermelo–Fraenkel) set theory, for example, at most one such extension can correctly describe how things really are with the universe of sets. Thus, according to Platonists such as Kurt Gödel, intuition together with quasi-empirical methods (such as the justification of axioms by appeal to their intuitively acceptable consequences) can guide us in discovering which amongst alternative axiom candidates for set theory has things right about set theoretic reality. Alternatively, according to empiricists such as Quine, who hold that our belief in the truth of mathematical theories is justified by their role in empirical science, empirical evidence can choose between alternative consistent set theories. In Quine’s view, we are justified in believing the truth of the minimal amount of set theory required by our most attractive scientific account of the world.

Despite their differences at the level of detail, both of these versions of Platonism share the assumption that mere consistency is not enough for a mathematical theory: For such a theory to be true, it must correctly describe a realm of objects, where the existence of these objects is not guaranteed by consistency alone. Such a view of mathematical theories requires that we must have some grasp of the intended interpretation of an axiomatic theory that is independent of our axiomatization – otherwise inquiry into whether our axioms “get things right” about this intended interpretation would be futile. Hence, it is natural to see these Platonist views of mathematics as following Frege in holding that axioms

. . . must not contain a word or sign whose sense and meaning, or whose contribution to the expression of a thought, was not already completely laid down, so that there is no doubt about the sense of the proposition and the thought it expresses. The only question can be whether this thought is true and what its truth rests on. (Frege to Hilbert Gottlob Frege The Philosophical and Mathematical Correspondence)

On such an account, our mathematical axioms express genuine assertions (thoughts), which may or may not succeed in asserting truths about their subject matter. These Platonist views are “assertory” views of mathematics. Assertory views of mathematics make room for a gap between our mathematical theories and their intended subject matter, and the possibility of such a gap leads to at least two difficulties for traditional Platonism. These difficulties are articulated by Paul Benacerraf (here and here) in his aforementioned papers. The first difficulty comes from the realization that our mathematical theories, even when axioms are supplemented with less formal characterizations of their subject matter, may be insufficient to choose between alternative interpretations. For example, assertory views hold that the Peano axioms for arithmetic aim to assert truths about the natural numbers. But there are many candidate interpretations of these axioms, and nothing in the axioms, or in our wider mathematical practices, seems to suffice to pin down one interpretation over any other as the correct one. The view of mathematical theories as assertions about a specific realm of objects seems to force there to be facts about the correct interpretation of our theories even if, so far as our mathematical practice goes (for example, in the case of arithmetic), any ω-sequence would do.

Benacerraf’s second worry is perhaps even more pressing for assertory views. The possibility of a gap between our mathematical theories and their intended subject matter raises the question, “How do we know that our mathematical theories have things right about their subject matter?”. To answer this, we need to consider the nature of the purported objects about which our theories are supposed to assert truths. It seems that our best characterization of mathematical objects is negative: to account for the extent of our mathematical theories, and the timelessness of mathematical truths, it seems reasonable to suppose that mathematical objects are non-physical, non- spatiotemporal (and, it is sometimes added, mind- and language-independent) objects – in short, mathematical objects are abstract. But this negative characterization makes it difficult to say anything positive about how we could know anything about how things are with these objects. Assertory, Platonist views of mathematics are thus challenged to explain just how we are meant to evaluate our mathematical assertions – just how do the kinds of evidence these Platonists present in support of their theories succeed in ensuring that these theories track the truth?

Unique Derivative Operator: Reparametrization. Metric Part 2.


Moving on from first part.

Suppose ∇ is a derivative operator, and gab is a metric, on the manifold M. Then ∇ is compatible with gab iff ∇a gbc = 0.

Suppose γ is an arbitrary smooth curve with tangent field ξa and λa is an arbitrary smooth field on γ satisfying ξnnλa = 0. Then

ξnn(gabλaλb) = gabλaξnnλb + gabλbξnnλa + λaλbξnngab

= λaλbξnngab

Suppose first that ∇ngab = 0. Then it follows immediately that ξnngabλaλb = 0. So ∇ is compatible with gab. Suppose next that ∇ is compatible with gab. Then ∀ choices of γ and λa (satisfying ξnnλa =0), we have λaλbξnngab = 0. Since the choice of λa (at any particular point) is arbitrary and gab is symmetric, it follows that ξnngab = 0. But this must be true for arbitrary ξa (at any particular point), and so we have ∇ngab = 0.

Note that the condition of compatibility is also equivalent to ∇agbc = 0. Hence,

0 = gbnaδcn = gbna(gnrgrc) = gbngnragrc + gbngrcagnr

= δbragrc + gbngrcagnr = ∇agbc + gbngrcagnr.

So if ∇agbc = 0,it follows immediately that ∇agbc = 0. Conversely, if ∇agbc =0, then gbngrcagnr = 0. And therefore,

0 = gpbgscgbngrcagnr = δnpδrsagnr = ∇agps

The basic fact about compatible derivative operators is the following.

Suppose gab is a metric on the manifold M. Then there is a unique derivative operator on M that is compatible with gab.

It turns out that if a manifold admits a metric, then it necessarily satisfies the countable cover condition. And then it guarantees the existence of a derivative operator.) We do prove that if M admits a derivative operator ∇, then it admits exactly one ∇′ that is compatible with gab.

Every derivative operator ∇′ on M can be realized as ∇′ = (∇, Cabc), where Cabc is a smooth, symmetric field on M. Now

∇′agbc = ∇agbc + gnc Cnab + gbn Cnac = ∇agbc + Ccab + Cbac. So ∇′ will be compatible with gab (i.e., ∇′agbc = 0) iff

agbc = −Ccab − Cbac —– (1)

Thus it suffices for us to prove that there exists a unique smooth, symmetric field Cabc on M satisfying equation (1). To do so, we write equation (1) twice more after permuting the indices:

cgab = −Cbca − Cacb,

bgac = −Ccba − Cabc

If we subtract these two from the first equation, and use the fact that Cabc is symmetric in (b, c), we get

Cabc = 1/2 (∇agbc − ∇bgac − ∇cgab) —– (2)

and, therefore,

Cabc = 1/2 gan (∇ngbc − ∇bgnc − ∇cgnb) —– (3)

This establishes uniqueness. But clearly the field Cabc defined by equation (3) is smooth, symmetric, and satisfies equation (1). So we have existence as well.

In the case of positive definite metrics, there is another way to capture the significance of compatibility of derivative operators with metrics. Suppose the metric gab on M is positive definite and γ : [s1, s2] → M is a smooth curve on M. We associate with γ a length

|γ| = ∫s1s2 gabξaξb ds,

where ξa is the tangent field to γ. This assigned length is invariant under reparametrization. For suppose σ : [t1, t2] → [s1, s2] is a diffeomorphism we shall write s = σ(t) and ξ′a is the tangent field of γ′ = γ ◦ σ : [t1, t2] → M. Then

ξ′a = ξads/dt

We may as well require that the reparametrization preserve the orientation of the original curve – i.e., require that σ (t1) = s1 and σ (t2) = s2. In this case, ds/dt > 0 everywhere. (Only small changes are needed if we allow the reparametrization to reverse the orientation of the curve. In that case, ds/dt < 0 everywhere.) It

follows that

|γ’| = ∫t1t2 (gabξ′aξ′b)1/2 dt = ∫t1t2 (gabξaξb)1/2 ds/dt

= ∫s1s2 (gabξaξb)1/2 ds = |γ|

Let us say that γ : I → M is a curve from p to q if I is of the form [s1, s2], p = γ(s1), and q = γ(s2). In this (positive definite) case, we take the distance from p to q to be

d(p,q)=g.l.b. |γ|:γ is a smooth curve from p to q.

Further, we say that a curve γ : I → M is minimal if, for all s ∈ I, ∃ an ε > 0 such that, for all s1, s2 ∈ I with s1 ≤ s ≤ s2, if s2 − s1 < ε and if γ′ = γ|[s1, s2] (the restriction of γ to [s1, s2]), then |γ′| = d(γ(s1), γ(s2)) . Intuitively, minimal curves are “locally shortest curves.” Certainly they need not be “shortest curves” outright. (Consider, for example, two points on the “equator” of a two-sphere that are not antipodal to one another. An equatorial curve running from one to the other the “long way” qualifies as a minimal curve.)

One can characterize the unique derivative operator compatible with a positive definite metric gab in terms of the latter’s associated minimal curves. But in doing so, one has to pay attention to parametrization.

Let us say that a smooth curve γ : I → M with tangent field ξa is parametrized by arc length if ∀ ξa, gabξaξb = 1. In this case, if I = [s1, s2], then

|γ| = ∫s1s2 (gabξaξb)1/2 ds = ∫s1s2 1.ds = s2 – s1

Any non-trivial smooth curve can always be reparametrized by arc length.

Grothendieck’s Universes and Wiles Proof (Fermat’s Last Theorem). Thought of the Day 77.0


In formulating the general theory of cohomology Grothendieck developed the concept of a universe – a collection of sets large enough to be closed under any operation that arose. Grothendieck proved that the existence of a single universe is equivalent over ZFC to the existence of a strongly inaccessible cardinal. More precisely, 𝑈 is the set 𝑉𝛼 of all sets with rank below 𝛼 for some uncountable strongly inaccessible cardinal.

Colin McLarty summarised the general situation:

Large cardinals as such were neither interesting nor problematic to Grothendieck and this paper shares his view. For him they were merely legitimate means to something else. He wanted to organize explicit calculational arithmetic into a geometric conceptual order. He found ways to do this in cohomology and used them to produce calculations which had eluded a decade of top mathematicians pursuing the Weil conjectures. He thereby produced the basis of most current algebraic geometry and not only the parts bearing on arithmetic. His cohomology rests on universes but weaker foundations also suffice at the loss of some of the desired conceptual order.

The applications of cohomology theory implicitly rely on universes. Most number theorists regard the applications as requiring much less than their ‘on their face’ strength and in particular believe the large cardinal appeals are ‘easily eliminable’. There are in fact two issues. McLarty writes:

Wiles’s proof uses hard arithmetic some of which is on its face one or two orders above PA, and it uses functorial organizing tools some of which are on their face stronger than ZFC.

There are two current programs for verifying in detail the intuition that the formal requirements for Wiles proof of Fermat’s last theorem can be substantially reduced. On the one hand, McLarty’s current work aims to reduce the ‘on their face’ strength of the results in cohomology from large cardinal hypotheses to finite order Peano. On the other hand Macintyre aims to reduce the ‘on their face’ strength of results in hard arithmetic to Peano. These programs may be complementary or a full implementation of Macintyre’s might avoid the first.

McLarty reduces

  1. ‘ all of SGA (Revêtements Étales et Groupe Fondamental)’ to Bounded Zermelo plus a Universe.
  2. “‘the currently existing applications” to Bounded Zermelo itself, thus the con-sistency strength of simple type theory.’ The Grothendieck duality theorem and others like it become theorem schema.

The essential insight of the McLarty’s papers on cohomology is the role of replacement in giving strength to the universe hypothesis. A 𝑍𝐶-universe is defined to be a transitive set U modeling 𝑍𝐶 such that every subset of an element of 𝑈 is itself an element of 𝑈. He remarks that any 𝑉𝛼 for 𝛼 a limit ordinal is provable in 𝑍𝐹𝐶 to be a 𝑍𝐶-universe. McLarty then asserts the essential use of replacement in the original Grothendieck formulation is to prove: For an arbitrary ring 𝑅 every module over 𝑅 embeds in an injective 𝑅-module and thus injective resolutions exist for all 𝑅-modules. But he gives a proof in a system with the proof theoretic strength of finite order arithmetic that every sheaf of modules on any small site has an infinite resolution.

Angus Macintyre dismisses with little comment the worries about the use of ‘large-structure’ tools in Wiles proof. He begins his appendix,

At present, all roads to a proof of Fermat’s Last Theorem pass through some version of a Modularity Theorem (generically MT) about elliptic curves defined over Q . . . A casual look at the literature may suggest that in the formulation of MT (or in some of the arguments proving whatever version of MT is required) there is essential appeal to higher-order quantification, over one of the following.

He then lists such objects as C, modular forms, Galois representations …and summarises that a superficial formulation of MT would be 𝛱1m for some small 𝑚. But he continues,

I hope nevertheless that the present account will convince all except professional sceptics that MT is really 𝛱01.

There then follows a 13 page highly technical sketch of an argument for the proposition that MT can be expressed by a sentence in 𝛱01 along with a less-detailed strategy for proving MT in PA.

Macintyre’s complexity analysis is in traditional proof theoretic terms. But his remark that ‘genus’ is more a useful geometric classification of curves than the syntactic notion of degree suggests that other criteria may be relevant. McLarty’s approach is not really a meta-theorem, but a statement that there was only one essential use of replacement and it can be eliminated. In contrast, Macintyre argues that ‘apparent second order quantification’ can be replaced by first order quantification. But the argument requires deep understanding of the number theory for each replacement in a large number of situations. Again, there is no general theorem that this type of result is provable in PA.



During his attempt to axiomatize the category of all categories, Lawvere says

Our intuition tells us that whenever two categories exist in our world, then so does the corresponding category of all natural transformations between the functors from the first category to the second (The Category of Categories as a Foundation).

However, if one tries to reduce categorial constructions to set theory, one faces some serious problems in the case of a category of functors. Lawvere (who, according to his aim of axiomatization, is not concerned by such a reduction) relies here on “intuition” to stress that those working with categorial concepts despite these problems have the feeling that the envisaged construction is clear, meaningful and legitimate. Not the reducibility to set theory, but an “intuition” to be specified answers for clarity, meaningfulness and legitimacy of a construction emerging in a mathematical working situation. In particular, Lawvere relies on a collective intuition, a common sense – for he explicitly says “our intuition”. Further, one obviously has to deal here with common sense on a technical level, for the “we” can only extend to a community used to the work with the concepts concerned.

In the tradition of philosophy, “intuition” means immediate, i.e., not conceptually mediated cognition. The use of the term in the context of validity (immediate insight in the truth of a proposition) is to be thoroughly distinguished from its use in the sensual context (the German Anschauung). Now, language is a manner of representation, too, but contrary to language, in the context of images the concept of validity is meaningless.

Obviously, the aspect of cognition guiding is touched on here. Especially the sensual intuition can take the guiding (or heuristic) function. There have been many working situations in history of mathematics in which making the objects of investigation accessible to a sensual intuition (by providing a Veranschaulichung) yielded considerable progress in the development of the knowledge concerning these objects. As an example, take the following account by Emil Artin of Emmy Noether’s contribution to the theory of algebras:

Emmy Noether introduced the concept of representation space – a vector space upon which the elements of the algebra operate as linear transformations, the composition of the linear transformation reflecting the multiplication in the algebra. By doing so she enables us to use our geometric intuition.

Similarly, Fréchet thinks to have really “powered” research in the theory of functions and functionals by the introduction of a “geometrical” terminology:

One can [ …] consider the numbers of the sequence [of coefficients of a Taylor series] as coordinates of a point in a space [ …] of infinitely many dimensions. There are several advantages to proceeding thus, for instance the advantage which is always present when geometrical language is employed, since this language is so appropriate to intuition due to the analogies it gives birth to.

Mathematical terminology often stems from a current language usage whose (intuitive, sensual) connotation is welcomed and serves to give the user an “intuition” of what is intended. While Category Theory is often classified as a highly abstract matter quite remote from intuition, in reality it yields, together with its applications, a multitude of examples for the role of current language in mathematical conceptualization.

This notwithstanding, there is naturally also a tendency in contemporary mathematics to eliminate as much as possible commitments to (sensual) intuition in the erection of a theory. It seems that algebraic geometry fulfills only in the language of schemes that essential requirement of all contemporary mathematics: to state its definitions and theorems in their natural abstract and formal setting in which they can be considered independent of geometric intuition (Mumford D., Fogarty J. Geometric Invariant Theory).

In the pragmatist approach, intuition is seen as a relation. This means: one uses a piece of language in an intuitive manner (or not); intuitive use depends on the situation of utterance, and it can be learned and transformed. The reason for this relational point of view, consists in the pragmatist conviction that each cognition of an object depends on the means of cognition employed – this means that for pragmatism there is no intuitive (in the sense of “immediate”) cognition; the term “intuitive” has to be given a new meaning.

What does it mean to use something intuitively? Heinzmann makes the following proposal: one uses language intuitively if one does not even have the idea to question validity. Hence, the term intuition in the Heinzmannian reading of pragmatism takes a different meaning, no longer signifies an immediate grasp. However, it is yet to be explained what it means for objects in general (and not only for propositions) to “question the validity of a use”. One uses an object intuitively, if one is not concerned with how the rules of constitution of the object have been arrived at, if one does not focus the materialization of these rules but only the benefits of an application of the object in the present context. “In principle”, the cognition of an object is determined by another cognition, and this determination finds its expression in the “rules of constitution”; one uses it intuitively (one does not bother about the being determined of its cognition), if one does not question the rules of constitution (does not focus the cognition which determines it). This is precisely what one does when using an object as a tool – because in doing so, one does not (yet) ask which cognition determines the object. When something is used as a tool, this constitutes an intuitive use, whereas the use of something as an object does not (this defines tool and object). Here, each concept in principle can play both roles; among two concepts, one may happen to be used intuitively before and the other after the progress of insight. Note that with respect to a given cognition, Peirce when saying “the cognition which determines it” always thinks of a previous cognition because he thinks of a determination of a cognition in our thought by previous thoughts. In conceptual history of mathematics, however, one most often introduced an object first as a tool and only after having done so did it come to one’s mind to ask for “the cognition which determines the cognition of this object” (that means, to ask how the use of this object can be legitimized).

The idea that it could depend on the situation whether validity is questioned or not has formerly been overlooked, perhaps because one always looked for a reductionist epistemology where the capacity called intuition is used exclusively at the last level of regression; in a pragmatist epistemology, to the contrary, intuition is used at every level in form of the not thematized tools. In classical systems, intuition was not simply conceived as a capacity; it was actually conceived as a capacity common to all human beings. “But the power of intuitively distinguishing intuitions from other cognitions has not prevented men from disputing very warmly as to which cognitions are intuitive”. Moreover, Peirce criticises strongly cartesian individualism (which has it that the individual has the capacity to find the truth). We could sum up this philosophy thus: we cannot reach definite truth, only provisional; significant progress is not made individually but only collectively; one cannot pretend that the history of thought did not take place and start from scratch, but every cognition is determined by a previous cognition (maybe by other individuals); one cannot uncover the ultimate foundation of our cognitions; rather, the fact that we sometimes reach a new level of insight, “deeper” than those thought of as fundamental before, merely indicates that there is no “deepest” level. The feeling that something is “intuitive” indicates a prejudice which can be philosophically criticised (even if this does not occur to us at the beginning).

In our approach, intuitive use is collectively determined: it depends on the particular usage of the community of users whether validity criteria are or are not questioned in a given situation of language use. However, it is acknowledged that for example scientific communities develop usages making them communities of language users on their own. Hence, situations of language use are not only partitioned into those where it comes to the users’ mind to question validity criteria and those where it does not, but moreover this partition is specific to a particular community (actually, the community of language users is established partly through a peculiar partition; this is a definition of the term “community of language users”). The existence of different communities with different common senses can lead to the following situation: something is used intuitively by one group, not intuitively by another. In this case, discussions inside the discipline occur; one has to cope with competing common senses (which are therefore not really “common”). This constitutes a task for the historian.

String’s Depth of Burial


A string’s depth might be defined as the execution time of its minimal program.

The difficulty with this definition arises in cases where the minimal program is only a few bits smaller than some much faster program, such as a print program, to compute the same output x. In this case, slight changes in x may induce arbitrarily large changes in the run time of the minimal program, by changing which of the two competing programs is minimal. Analogous instability manifests itself in translating programs from one universal machine to another. This instability emphasizes the essential role of the quantity of buried redundancy, not as a measure of depth, but as a certifier of depth. In terms of the philosophy-of-science metaphor, an object whose minimal program is only a few bits smaller than its print program is like an observation that points to a nontrivial hypothesis, but with only a low level of statistical confidence.

To adequately characterize a finite string’s depth one must therefore consider the amount of buried redundancy as well as the depth of its burial. A string’s depth at significance level s might thus be defined as that amount of time complexity which is attested by s bits worth of buried redundancy. This characterization of depth may be formalized in several ways.

A string’s depth at significance level s be defined as the time required to compute the string by a program no more than s bits larger than the minimal program.

This definition solves the stability problem, but is unsatisfactory in the way it treats multiple programs of the same length. Intuitively, 2k distinct (n + k)-bit programs that compute same output ought to be accorded the same weight as one n-bit program; but, by the present definition, they would be given no more weight than one (n + k)-bit program.

A string’s depth at signicifcance level s depth might be defined as the time t required for the string’s time-bounded algorithmic probability Pt(x) to rise to within a factor 2−s of its asymptotic time-unbounded value P(x).

This formalizes the notion that for the string to have originated by an effective process of t steps or fewer is less plausible than for the first s tosses of a fair coin all to come up heads.

It is not known whether there exist strings that are deep, in other words, strings having no small fast programs, even though they have enough large fast programs to contribute a significant fraction of their algorithmic probability. Such strings might be called deterministically deep but probabilistically shallow, because their chance of being produced quickly in a probabilistic computation (e.g. one where the input bits of U are supplied by coin tossing) is significant compared to their chance of being produced slowly. The question of whether such strings exist is probably hard to answer because it does not relativize uniformly. Deterministic and probabilistic depths are not very different relative to a random coin-toss oracle A of the equality of random-oracle-relativized deterministic and probabilistic polynomial time complexity classes; but they can be very different relative to an oracle B deliberately designed to hide information from deterministic computations (this parallels Hunt’s proof that deterministic and probabilistic polynomial time are unequal relative to such an oracle).

(Depth of Finite Strings): Let x and w be strings and s a significance parameter. A string’s depth at significance level s, denoted Ds(x), will be defined as min{T(p) : (|p|−|p| < s)∧(U(p) = x)}, the least time required to compute it by a s-incompressible program. At any given significance level, a string will be called t-deep if its depth exceeds t, and t-shallow otherwise.

The difference between this definition and the previous one is rather subtle philosophically and not very great quantitatively. Philosophically, when each individual hypothesis for the rapid origin of x is implausible at the 2−s confidence level, then it requires only that a weighted average of all such hypotheses be implausible.

There exist constants c1 and c2 such that for any string x, if programs running in time ≤ t contribute a fraction between 2−s and 2−s+1 of the string’s total algorithmic probability, then x has depth at most t at significance level s + c1 and depth at least t at significance level s − min{H(s), H(t)} − c2.

Proof : The first part follows easily from the fact that any k-compressible self-delimiting program p is associated with a unique, k − O(1) bits shorter, program of the form “execute the result of executing p∗”. Therefore there exists a constant c1 such that if all t-fast programs for x were s + c1– compressible, the associated shorter programs would contribute more than the total algorithmic probability of x. The second part follows because, roughly, if fast programs contribute only a small fraction of the algorithmic probability of x, then the property of being a fast program for x is so unusual that no program having that property can be random. More precisely, the t-fast programs for x constitute a finite prefix set, a superset S of which can be computed by a program of size H(x) + min{H(t), H(s)} + O(1) bits. (Given x∗ and either t∗ or s∗, begin enumerating all self-delimiting programs that compute x, in order of increasing running time, and quit when either the running time exceeds t or the accumulated measure of programs so far enumerated exceeds 2−(H(x)−s)). Therefore there exists a constant c2 such that, every member of S, and thus every t-fast program for x, is compressible by at least s − min{H(s), H(t)} − O(1) bits.

The ability of universal machines to simulate one another efficiently implies a corresponding degree of machine-independence for depth: for any two efficiently universal machines of the sort considered here, there exists a constant c and a linear polynomial L such that for any t, strings whose (s+c)-significant depth is at least L(t) on one machine will have s-significant depth at least t on the other.

Depth of one string relative to another may be defined analogously, and represents the plausible time required to produce one string, x, from another, w.

(Relative Depth of Finite Strings): For any two strings w and x, the depth of x relative to w at significance level s, denoted Ds(x/w), will be defined as min{T(p, w) : (|p|−|(p/w)∗| < s)∧(U(p, w) = x)}, the least time required to compute x from w by a program that is s-incompressible relative to w.

Depth of a string relative to its length is a particularly useful notion, allowing us, as it were, to consider the triviality or nontriviality of the “content” of a string (i.e. its bit sequence), independent of its “form” (length). For example, although the infinite sequence 000… is intuitively trivial, its initial segment 0n is deep whenever n is deep. However, 0n is always shallow relative to n, as is, with high probability, a random string of length n.

In order to adequately represent the intuitive notion of stored mathematical work, it is necessary that depth obey a “slow growth” law, i.e. that fast deterministic processes be unable to transform a shallow object into a deep one, and that fast probabilistic processes be able to do so only with low probability.

(Slow Growth Law): Given any data string x and two significance parameters s2 > s1, a random program generated by coin tossing has probability less than 2−(s2−s1)+O(1) of transforming x into an excessively deep output, i.e. one whose s2-significant depth exceeds the s1-significant depth of x plus the run time of the transforming program plus O(1). More precisely, there exist positive constants c1, c2 such that for all strings x, and all pairs of significance parameters s2 > s1, the prefix set {q : Ds2(U(q, x)) > Ds1(x) + T(q, x) + c1} has measure less than 2−(s2−s1)+c2.

Proof: Let p be a s1-incompressible program which computes x in time Ds1(x), and let r be the restart prefix mentioned in the definition of the U machine. Let Q be the prefix set {q : Ds2(U(q, x)) > T(q, x) + Ds1(x) + c1}, where the constant c1 is sufficient to cover the time overhead of concatenation. For all q ∈ Q, the program rpq by definition computes some deep result U(q, x) in less time than that result’s own s2-significant depth, and so rpq must be compressible by s2 bits. The sum of the algorithmic probabilities of strings of the form rpq, where q ∈ Q, is therefore

Σq∈Q P(rpq)< Σq∈Q 2−|rpq| + s2 = 2−|r|−|p|+s2 μ(Q)

On the other hand, since the self-delimiting program p can be recovered from any string of the form rpq (by deleting r and executing the remainder pq until halting occurs, by which time exactly p will have been read), the algorithmic probability of p is at least as great (within a constant factor) as the sum of the algorithmic probabilities of the strings {rpq : q ∈ Q} considered above:

P(p) > μ(Q) · 2−|r|−|p|+s2−O(1)

Recalling the fact that minimal program size is equal within a constant factor to the −log of algorithmic probability, and the s1-incompressibility of p, we have P(p) < 2−(|p|−s1+O(1)), and therefore finally

μ(Q) < 2−(s2−s1)+O(1), which was to be demonstrated.

Universal Turing Machine: Algorithmic Halting


A natural number x will be identified with the x’th binary string in lexicographic order (Λ,0,1,00,01,10,11,000…), and a set X of natural numbers will be identified with its characteristic sequence, and with the real number between 0 and 1 having that sequence as its dyadic expansion. The length of a string x will be denoted |x|, the n’th bit of an infinite sequence X will be noted X(n), and the initial n bits of X will be denoted Xn. Concatenation of strings p and q will be denoted pq.

We now define the information content (and later the depth) of finite strings using a universal Turing machine U. A universal Turing machine may be viewed as a partial recursive function of two arguments. It is universal in the sense that by varying one argument (“program”) any partial recursive function of the other argument (“data”) can be obtained. In the usual machine formats, program, data and output are all finite strings, or, equivalently, natural numbers. However, it is not possible to take a uniformly weighted average over a countably infinite set. Chaitin’s universal machine has two tapes: a read-only one-way tape containing the infinite program; and an ordinary two-way read/write tape, which is used for data input, intermediate work, and output, all of which are finite strings. Our machine differs from Chaitin’s in having some additional auxiliary storage (e.g. another read/write tape) which is needed only to improve the time efficiency of simulations.

We consider only terminating computations, during which, of course, only a finite portion of the program tape can be read. Therefore, the machine’s behavior can still be described by a partial recursive function of two string arguments U(p, w), if we use the first argument to represent that portion of the program that is actually read in the course of a particular computation. The expression U (p, w) = x will be used to indicate that the U machine, started with any infinite sequence beginning with p on its program tape and the finite string w on its data tape, performs a halting computation which reads exactly the initial portion p of the program, and leaves output data x on the data tape at the end of the computation. In all other cases (reading less than p, more than p, or failing to halt), the function U(p, w) is undefined. Wherever U(p, w) is defined, we say that p is a self-delimiting program to compute x from w, and we use T(p, w) to represent the time (machine cycles) of the computation. Often we will consider computations without input data; in that case we abbreviate U(p, Λ) and T(p, Λ) as U(p) and T(p) respectively.

The self-delimiting convention for the program tape forces the domain of U and T, for each data input w, to be a prefix set, that is, a set of strings no member of which is the extension of any other member. Any prefix set S obeys the Kraft inequality

p∈S 2−|p| ≤ 1

Besides being self-delimiting with regard to its program tape, the U machine must be efficiently universal in the sense of being able to simulate any other machine of its kind (Turing machines with self-delimiting program tape) with at most an additive constant constant increase in program size and a linear increase in execution time.

Without loss of generality we assume that there exists for the U machine a constant prefix r which has the effect of stacking an instruction to restart the computation when it would otherwise end. This gives the machine the ability to concatenate programs to run consecutively: if U(p, w) = x and U(q, x) = y, then U(rpq, w) = y. Moreover, this concatenation should be efficient in the sense that T (rpq, w) should exceed T (p, w) + T (q, x) by at most O(1). This efficiency of running concatenated programs can be realized with the help of the auxiliary storage to stack the restart instructions.

Sometimes we will generalize U to have access to an “oracle” A, i.e. an infinite look-up table which the machine can consult in the course of its computation. The oracle may be thought of as an arbitrary 0/1-valued function A(x) which the machine can cause to be evaluated by writing the argument x on a special tape and entering a special state of the finite control unit. In the next machine cycle the oracle responds by sending back the value A(x). The time required to evaluate the function is thus linear in the length of its argument. In particular we consider the case in which the information in the oracle is random, each location of the look-up table having been filled by an independent coin toss. Such a random oracle is a function whose values are reproducible, but otherwise unpredictable and uncorrelated.

Let {φAi (p, w): i = 0,1,2…} be an acceptable Gödel numbering of A-partial recursive functions of two arguments and {φAi (p, w)} an associated Blum complexity measure, henceforth referred to as time. An index j is called self-delimiting iff, for all oracles A and all values w of the second argument, the set { x : φAj (x, w) is defined} is a prefix set. A self-delimiting index has efficient concatenation if there exists a string r such that for all oracles A and all strings w, x, y, p, and q,if φAj (p, w) = x and φAj (q, x) = y, then φAj(rpq, w) = y and φAj (rpq, w) = φAj (p, w) + φAj (q, x) + O(1). A self-delimiting index u with efficient concatenation is called efficiently universal iff, for every self-delimiting index j with efficient concatenation, there exists a simulation program s and a linear polynomial L such that for all oracles A and all strings p and w, and

φAu(sp, w) = φAj (p, w)


ΦAu(sp, w) ≤ L(ΦAj (p, w))

The functions UA(p,w) and TA(p,w) are defined respectively as φAu(p, w) and ΦAu(p, w), where u is an efficiently universal index.

For any string x, the minimal program, denoted x∗, is min{p : U(p) = x}, the least self-delimiting program to compute x. For any two strings x and w, the minimal program of x relative to w, denoted (x/w)∗, is defined similarly as min{p : U(p,w) = x}.

By contrast to its minimal program, any string x also has a print program, of length |x| + O(log|x|), which simply transcribes the string x from a verbatim description of x contained within the program. The print program is logarithmically longer than x because, being self-delimiting, it must indicate the length as well as the contents of x. Because it makes no effort to exploit redundancies to achieve efficient coding, the print program can be made to run quickly (e.g. linear time in |x|, in the present formalism). Extra information w may help, but cannot significantly hinder, the computation of x, since a finite subprogram would suffice to tell U to simply erase w before proceeding. Therefore, a relative minimal program (x/w)∗ may be much shorter than the corresponding absolute minimal program x∗, but can only be longer by O(1), independent of x and w.

A string is compressible by s bits if its minimal program is shorter by at least s bits than the string itself, i.e. if |x∗| ≤ |x| − s. Similarly, a string x is said to be compressible by s bits relative to a string w if |(x/w)∗| ≤ |x| − s. Regardless of how compressible a string x may be, its minimal program x∗ is compressible by at most an additive constant depending on the universal computer but independent of x. [If (x∗)∗ were much smaller than x∗, then the role of x∗ as minimal program for x would be undercut by a program of the form “execute the result of executing (x∗)∗.”] Similarly, a relative minimal program (x/w)∗ is compressible relative to w by at most a constant number of bits independent of x or w.

The algorithmic probability of a string x, denoted P(x), is defined as {∑2−|p| : U(p) = x}. This is the probability that the U machine, with a random program chosen by coin tossing and an initially blank data tape, will halt with output x. The time-bounded algorithmic probability, Pt(x), is defined similarly, except that the sum is taken only over programs which halt within time t. We use P(x/w) and Pt(x/w) to denote the analogous algorithmic probabilities of one string x relative to another w, i.e. for computations that begin with w on the data tape and halt with x on the data tape.

The algorithmic entropy H(x) is defined as the least integer greater than −log2P(x), and the conditional entropy H(x/w) is defined similarly as the least integer greater than − log2P(x/w). Among the most important properties of the algorithmic entropy is its equality, to within O(1), with the size of the minimal program:

∃c∀x∀wH(x/w) ≤ |(x/w)∗| ≤ H(x/w) + c

The first part of the relation, viz. that algorithmic entropy should be no greater than minimal program size, is obvious, because of the minimal program’s own contribution to the algorithmic probability. The second half of the relation is less obvious. The approximate equality of algorithmic entropy and minimal program size means that there are few near-minimal programs for any given input/output pair (x/w), and that every string gets an O(1) fraction of its algorithmic probability from its minimal program.

Finite strings, such as minimal programs, which are incompressible or nearly so are called algorithmically random. The definition of randomness for finite strings is necessarily a little vague because of the ±O(1) machine-dependence of H(x) and, in the case of strings other than self-delimiting programs, because of the question of how to count the information encoded in the string’s length, as opposed to its bit sequence. Roughly speaking, an n-bit self-delimiting program is considered random (and therefore not ad-hoc as a hypothesis) iff its information content is about n bits, i.e. iff it is incompressible; while an externally delimited n-bit string is considered random iff its information content is about n + H(n) bits, enough to specify both its length and its contents.

For infinite binary sequences (which may be viewed also as real numbers in the unit interval, or as characteristic sequences of sets of natural numbers) randomness can be defined sharply: a sequence X is incompressible, or algorithmically random, if there is an O(1) bound to the compressibility of its initial segments Xn. Intuitively, an infinite sequence is random if it is typical in every way of sequences that might be produced by tossing a fair coin; in other words, if it belongs to no informally definable set of measure zero. Algorithmically random sequences constitute a larger class, including sequences such as Ω which can be specified by ineffective definitions.

The busy beaver function B(n) is the greatest number computable by a self-delimiting program of n bits or fewer. The halting set K is {x : φx(x) converges}. This is the standard representation of the halting problem.

The self-delimiting halting set K0 is the (prefix) set of all self-delimiting programs for the U machine that halt: {p : U(p) converges}.

K and K0 are readily computed from one another (e.g. by regarding the self-delimiting programs as a subset of ordinary programs, the first 2n bits of K0 can be recovered from the first 2n+O(1) bits of K; by encoding each n-bit ordinary program as a self-delimiting program of length n + O(log n), the first 2n bits of K can be recovered from the first 2n+O(log n) bits of K0.)

The halting probability Ω is defined as {2−|p| : U(p) converges}, the probability that the U machine would halt on an infinite input supplied by coin tossing. Ω is thus a real number between 0 and 1.

The first 2n bits of K0 can be computed from the first n bits of Ω, by enumerating halting programs until enough have halted to account for all but 2−n of the total halting probability. The time required for this decoding (between B(n − O(1)) and B(n + H(n) + O(1)) grows faster than any computable function of n. Although K0 is only slowly computable from Ω, the first n bits of Ω can be rapidly computed from the first 2n+H(n)+O(1) bits of K0, by asking about the halting of programs of the form “enumerate halting programs until (if ever) their cumulative weight exceeds q, then halt”, where q is an n-bit rational number…