Probability Space Intertwines Random Walks – Thought of the Day 144.0

Many deliberations of stochasticity start with “let (Ω, F, P) be a probability space”. One can actually follow such discussions without having the slightest idea what Ω is and who lives inside. So, what is “Ω, F, P” and why do we need it? Indeed, for many users of probability and statistics, a random variable X is synonymous with its probability distribution μX and all computations such as sums, expectations, etc., done on random variables amount to analytical operations such as integrations, Fourier transforms, convolutions, etc., done on their distributions. For defining such operations, you do not need a probability space. Isn’t this all there is to it?

One can in fact compute quite a lot of things without using probability spaces in an essential way. However, the notions of probability space and random variable are central to modern probability theory, so it is important to understand why and when these concepts are relevant.

From a modelling perspective, the starting point is a set of observations taking values in some set E (think for instance of numerical measurement, E = R) for which we would like to build a stochastic model. We would like to represent such observations x1, . . . , xn as samples drawn from a random variable X defined on some probability space (Ω, F, P). It is important to see that the only natural ingredient here is the set E where the random variables will take their values: the set of events Ω is not given a priori and there are many different ways to construct a probability space (Ω, F, P) for modelling the same set of observations.

Sometimes it is natural to identify Ω with E, i.e., to identify the randomness ω with its observed effect. For example, if we consider the outcome of a die-rolling experiment as an integer-valued random variable X, we can define the set of events to be precisely the set of possible outcomes: Ω = {1, 2, 3, 4, 5, 6}. In this case, X(ω) = ω: the outcome of the randomness is identified with the randomness itself. This choice of Ω is called the canonical space for the random variable X. In this case the random variable X is simply the identity map X(ω) = ω, and the probability measure P is formally the same as the distribution of X. Note that here X is a one-to-one map: given the outcome of X one knows which scenario has happened, so any other random variable Y is completely determined by the observation of X. Therefore, using the canonical construction for the random variable X, we cannot define, on the same probability space, another random variable which is independent of X: X will be the sole source of randomness for all other variables in the model. This also shows that, although the canonical construction is the simplest way to construct a probability space for representing a given random variable, it forces us to identify this particular random variable with the “source of randomness” in the model. Therefore, when we want to deal with models with a sufficiently rich structure, we need to distinguish Ω – the set of scenarios of randomness – from E, the set of values of our random variables.
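The canonical construction can be sketched in a few lines of plain Python for the die example above (the names `Omega`, `P`, `X`, `Y` are illustrative, not standard notation):

```python
from fractions import Fraction

# Canonical space for a die roll: the set of scenarios Omega is identified
# with the set of outcomes E = {1, ..., 6}.
Omega = [1, 2, 3, 4, 5, 6]
P = {w: Fraction(1, 6) for w in Omega}      # the probability measure on Omega

X = lambda w: w        # the canonical random variable is the identity map
Y = lambda w: w % 2    # any further variable is a deterministic function of X

# The distribution of X is formally the same object as P itself.
dist_X = {v: sum(p for w, p in P.items() if X(w) == v) for v in map(X, Omega)}
assert dist_X == P

# Observing X pins down the scenario, hence pins down Y as well.
assert all(Y(w) == X(w) % 2 for w in Omega)
```

Note that there is no room here for a second variable independent of X: every function on this Ω factors through X.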

Let us give an example where it is natural to distinguish the source of randomness from the random variable itself. For instance, if one is modelling the market value of a stock at some date T in the future as a random variable S1, one may consider that the stock value is affected by many factors such as external news, market supply and demand, economic indicators, etc., summed up in some abstract variable ω, which may not even have a numerical representation: it corresponds to a scenario for the future evolution of the market. S1(ω) is then the stock value if the market scenario which occurs is given by ω. If the only interesting quantity in the model is the stock price then one can always label the scenario ω by the value of the stock price S1(ω), which amounts to identifying all scenarios where the stock S1 takes the same value and using the canonical construction. However if one considers a richer model where there are now other stocks S2, S3, . . . involved, it is more natural to distinguish the scenario ω from the random variables S1(ω), S2(ω),… whose values are observed in these scenarios but may not completely pin them down: knowing S1(ω), S2(ω),… one does not necessarily know which scenario has happened. In this way one reserves the possibility of adding more random variables later on without changing the probability space.

These considerations have the following important consequence: the probabilistic description of a random variable X can be reduced to the knowledge of its distribution μX only in the case where the random variable X is the only source of randomness. In this case, a stochastic model can be built using a canonical construction for X. In all other cases – as soon as we are concerned with a second random variable which is not a deterministic function of X – the underlying probability measure P contains more information on X than just its distribution. In particular, it contains all the information about the dependence of the random variable X with respect to all other random variables in the model: specifying P means specifying the joint distributions of all random variables constructed on Ω. For instance, knowing the distributions μX, μY of two variables X and Y does not allow one to compute their covariance or joint moments. Only in the case where all random variables involved are mutually independent can one reduce all computations to operations on their distributions. This is the case covered in most introductory texts on probability, which explains why one can go quite far, for example in the study of random walks, without formalizing the notion of probability space.
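The point that marginals do not determine covariance can be checked concretely. The sketch below (illustrative Python; the two joint laws are invented for the example) builds two different joint distributions for a pair (X, Y) with identical marginals but different covariances:

```python
from itertools import product

# Two joint laws for (X, Y) on {0,1}^2 with identical uniform marginals.
independent = {(x, y): 0.25 for x, y in product((0, 1), repeat=2)}
comonotone  = {(0, 0): 0.5, (1, 1): 0.5, (0, 1): 0.0, (1, 0): 0.0}

def marginal(joint, axis):
    m = {0: 0.0, 1: 0.0}
    for (x, y), p in joint.items():
        m[(x, y)[axis]] += p
    return m

def cov(joint):
    ex  = sum(x * p for (x, y), p in joint.items())
    ey  = sum(y * p for (x, y), p in joint.items())
    exy = sum(x * y * p for (x, y), p in joint.items())
    return exy - ex * ey

assert marginal(independent, 0) == marginal(comonotone, 0)   # same mu_X
assert marginal(independent, 1) == marginal(comonotone, 1)   # same mu_Y
assert cov(independent) == 0.0 and cov(comonotone) == 0.25   # different covariance
```

Only the measure P on a common Ω, i.e. the joint law, distinguishes the two models.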

Affine Schemes

Let us associate to any commutative ring A its spectrum, that is, the topological space Spec A. As a set, Spec A consists of all the prime ideals in A. For each subset S ⊆ A we define the closed sets in Spec A:

V(S) := {p ∈ Spec A | S ⊂ p} ⊂ Spec A

If X is an affine variety, defined over an algebraically closed field, and O(X) is its coordinate ring, we have that the points of the topological space underlying X are in one-to-one correspondence with the maximal ideals in O(X).
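As a toy illustration of Spec and of the closed sets V(S) defined above, one can work with the finite ring A = Z/nZ, whose prime ideals are generated by the prime divisors of n (a sketch in Python; the function names are ours):

```python
def spec_zn(n):
    """Prime ideals of Z/nZ correspond to the prime divisors of n."""
    return [p for p in range(2, n + 1)
            if n % p == 0 and all(p % q != 0 for q in range(2, p))]

def V(S, n):
    """Closed set V(S): primes p with S contained in the ideal (p)."""
    return [p for p in spec_zn(n) if all(s % p == 0 for s in S)]

assert spec_zn(12) == [2, 3]     # Spec(Z/12Z) = {(2), (3)}
assert V([6], 12) == [2, 3]      # 6 lies in both (2) and (3)
assert V([4], 12) == [2]         # 4 lies only in (2)
```

Here every prime ideal is already maximal, matching the variety picture in which points correspond to maximal ideals.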

We also define the basic open sets in Spec A as

Uƒ := Spec A \ V(ƒ) = Spec Aƒ with ƒ ∈ A,

where Aƒ = A[ƒ−1] is the localization of A obtained by inverting the element ƒ. The collection of the basic open sets Uƒ, for all ƒ ∈ A, forms a base for the Zariski topology. Next, we define the structure sheaf OA on the topological space Spec A. In order to do this, it is enough to give an assignment

U ↦ OA(U) for each basic open set U = Uƒ in Spec A.

The assignment

Uƒ ↦ Aƒ

defines a B-sheaf on the topological space Spec A, and it extends uniquely to a sheaf of commutative rings on Spec A, called the structure sheaf and denoted by OA. Moreover, the stalk at a point p ∈ Spec A, OA,p, is the localization Ap of the ring A at the prime p. While differentiable manifolds are locally modeled, as ringed spaces, by (Rn, C∞Rn), schemes are geometric objects modeled by the spectra of commutative rings.

An affine scheme is a locally ringed space isomorphic to Spec A for some commutative ring A. We say that X is a scheme if X = (|X|, OX) is a locally ringed space which is locally isomorphic to affine schemes; in other words, for each x ∈ |X| there exists an open set Ux ⊂ |X| such that (Ux, OX|Ux) is an affine scheme. A morphism of schemes is just a morphism of locally ringed spaces.

There is an equivalence of categories between the category of affine schemes (aschemes) and the category of commutative rings (rings). This equivalence is defined on the objects by

(rings)op → (aschemes), A ↦ Spec A

In particular, a morphism of commutative rings A → B corresponds contravariantly to a morphism Spec B → Spec A of the corresponding affine schemes.

Since any affine variety X is completely described by the knowledge of its coordinate ring O(X), we can associate uniquely to an affine variety X, the affine scheme Spec O(X). A morphism between algebraic varieties determines uniquely a morphism between the corresponding schemes. In the language of categories, we say we have a fully faithful functor from the category of algebraic varieties to the category of schemes.

Universal Turing Machine: Algorithmic Halting

A natural number x will be identified with the x’th binary string in lexicographic order (Λ, 0, 1, 00, 01, 10, 11, 000, …), and a set X of natural numbers will be identified with its characteristic sequence, and with the real number between 0 and 1 having that sequence as its dyadic expansion. The length of a string x will be denoted |x|, the n’th bit of an infinite sequence X will be denoted X(n), and the initial n bits of X will be denoted Xn. Concatenation of strings p and q will be denoted pq.
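The number/string identification can be made concrete; the following Python sketch (helper names ours) implements the bijection between natural numbers and binary strings in the lexicographic order above:

```python
def nth_string(x):
    """The x'th binary string in the order Λ, 0, 1, 00, 01, 10, 11, 000, ..."""
    return bin(x + 1)[3:]   # binary of x+1 with its leading '1' (and '0b') dropped

def string_index(s):
    """Inverse map: the position of the string s in the same ordering."""
    return int('1' + s, 2) - 1

assert [nth_string(x) for x in range(7)] == ['', '0', '1', '00', '01', '10', '11']
assert all(string_index(nth_string(x)) == x for x in range(100))
```

The empty string Λ appears as index 0, matching the enumeration in the text.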

We now define the information content (and later the depth) of finite strings using a universal Turing machine U. A universal Turing machine may be viewed as a partial recursive function of two arguments. It is universal in the sense that by varying one argument (“program”) any partial recursive function of the other argument (“data”) can be obtained. In the usual machine formats, program, data and output are all finite strings, or, equivalently, natural numbers. However, it is not possible to take a uniformly weighted average over the countably infinite set of finite programs, so the program is instead supplied as an infinite sequence of bits. Chaitin’s universal machine has two tapes: a read-only one-way tape containing the infinite program; and an ordinary two-way read/write tape, which is used for data input, intermediate work, and output, all of which are finite strings. Our machine differs from Chaitin’s in having some additional auxiliary storage (e.g. another read/write tape) which is needed only to improve the time efficiency of simulations.

We consider only terminating computations, during which, of course, only a finite portion of the program tape can be read. Therefore, the machine’s behavior can still be described by a partial recursive function of two string arguments U(p, w), if we use the first argument to represent that portion of the program that is actually read in the course of a particular computation. The expression U (p, w) = x will be used to indicate that the U machine, started with any infinite sequence beginning with p on its program tape and the finite string w on its data tape, performs a halting computation which reads exactly the initial portion p of the program, and leaves output data x on the data tape at the end of the computation. In all other cases (reading less than p, more than p, or failing to halt), the function U(p, w) is undefined. Wherever U(p, w) is defined, we say that p is a self-delimiting program to compute x from w, and we use T(p, w) to represent the time (machine cycles) of the computation. Often we will consider computations without input data; in that case we abbreviate U(p, Λ) and T(p, Λ) as U(p) and T(p) respectively.

The self-delimiting convention for the program tape forces the domain of U and T, for each data input w, to be a prefix set, that is, a set of strings no member of which is the extension of any other member. Any prefix set S obeys the Kraft inequality

∑p∈S 2−|p| ≤ 1
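A quick check of the Kraft inequality on a small prefix set (illustrative Python; `S` is an arbitrary example code, not anything specific to the U machine):

```python
def is_prefix_set(S):
    """No member of S is a proper prefix of another member."""
    return not any(a != b and b.startswith(a) for a in S for b in S)

def kraft_sum(S):
    return sum(2 ** -len(p) for p in S)

S = {'0', '10', '110', '111'}           # a complete prefix code
assert is_prefix_set(S)
assert kraft_sum(S) == 1.0              # the inequality holds, here with equality
assert not is_prefix_set({'0', '01'})   # '0' is a prefix of '01'
```

Equality in the Kraft sum corresponds to a code with no wasted program-tape sequences.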

Besides being self-delimiting with regard to its program tape, the U machine must be efficiently universal in the sense of being able to simulate any other machine of its kind (Turing machines with self-delimiting program tape) with at most an additive constant increase in program size and a linear increase in execution time.

Without loss of generality we assume that there exists for the U machine a constant prefix r which has the effect of stacking an instruction to restart the computation when it would otherwise end. This gives the machine the ability to concatenate programs to run consecutively: if U(p, w) = x and U(q, x) = y, then U(rpq, w) = y. Moreover, this concatenation should be efficient in the sense that T (rpq, w) should exceed T (p, w) + T (q, x) by at most O(1). This efficiency of running concatenated programs can be realized with the help of the auxiliary storage to stack the restart instructions.

Sometimes we will generalize U to have access to an “oracle” A, i.e. an infinite look-up table which the machine can consult in the course of its computation. The oracle may be thought of as an arbitrary 0/1-valued function A(x) which the machine can cause to be evaluated by writing the argument x on a special tape and entering a special state of the finite control unit. In the next machine cycle the oracle responds by sending back the value A(x). The time required to evaluate the function is thus linear in the length of its argument. In particular we consider the case in which the information in the oracle is random, each location of the look-up table having been filled by an independent coin toss. Such a random oracle is a function whose values are reproducible, but otherwise unpredictable and uncorrelated.

Let {φAi(p, w) : i = 0, 1, 2, …} be an acceptable Gödel numbering of A-partial recursive functions of two arguments and {ΦAi(p, w)} an associated Blum complexity measure, henceforth referred to as time. An index j is called self-delimiting iff, for all oracles A and all values w of the second argument, the set {x : φAj(x, w) is defined} is a prefix set. A self-delimiting index j has efficient concatenation if there exists a string r such that for all oracles A and all strings w, x, y, p, and q, if φAj(p, w) = x and φAj(q, x) = y, then φAj(rpq, w) = y and ΦAj(rpq, w) = ΦAj(p, w) + ΦAj(q, x) + O(1). A self-delimiting index u with efficient concatenation is called efficiently universal iff, for every self-delimiting index j with efficient concatenation, there exists a simulation program s and a linear polynomial L such that for all oracles A and all strings p and w,

φAu(sp, w) = φAj (p, w)

and

ΦAu(sp, w) ≤ L(ΦAj (p, w))

The functions UA(p,w) and TA(p,w) are defined respectively as φAu(p, w) and ΦAu(p, w), where u is an efficiently universal index.

For any string x, the minimal program, denoted x∗, is min{p : U(p) = x}, the least self-delimiting program to compute x. For any two strings x and w, the minimal program of x relative to w, denoted (x/w)∗, is defined similarly as min{p : U(p,w) = x}.

By contrast to its minimal program, any string x also has a print program, of length |x| + O(log|x|), which simply transcribes the string x from a verbatim description of x contained within the program. The print program is logarithmically longer than x because, being self-delimiting, it must indicate the length as well as the contents of x. Because it makes no effort to exploit redundancies to achieve efficient coding, the print program can be made to run quickly (e.g. linear time in |x|, in the present formalism). Extra information w may help, but cannot significantly hinder, the computation of x, since a finite subprogram would suffice to tell U to simply erase w before proceeding. Therefore, a relative minimal program (x/w)∗ may be much shorter than the corresponding absolute minimal program x∗, but can only be longer by O(1), independent of x and w.
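One concrete way to realize a “print program” is a length-prefixed, self-delimiting encoding. The sketch below (our own scheme, not the U machine's actual format) doubles each bit of the binary length field and uses '01' as a terminator, giving codewords of length |x| + O(log |x|):

```python
def encode(x):
    """A self-delimiting 'print program' for the bit string x: each bit of
    the binary length of x is doubled, '01' ends the length field, and the
    contents of x follow verbatim."""
    length = bin(len(x))[2:]
    return ''.join(b + b for b in length) + '01' + x

def decode(code):
    """Read one self-delimiting codeword; return (x, number of bits read)."""
    i, length_bits = 0, ''
    while code[i:i + 2] != '01':     # doubled bits are '00' or '11', never '01'
        length_bits += code[i]
        i += 2
    i += 2                           # skip the '01' terminator
    n = int(length_bits, 2)
    return code[i:i + n], i + n

assert decode(encode('10110'))[0] == '10110'
assert len(encode('1' * 64)) == 64 + 2 * 7 + 2    # |x| + O(log |x|)
```

Because the decoder knows exactly where each codeword ends, the set of all codewords is a prefix set, as the self-delimiting convention requires.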

A string is compressible by s bits if its minimal program is shorter by at least s bits than the string itself, i.e. if |x∗| ≤ |x| − s. Similarly, a string x is said to be compressible by s bits relative to a string w if |(x/w)∗| ≤ |x| − s. Regardless of how compressible a string x may be, its minimal program x∗ is compressible by at most an additive constant depending on the universal computer but independent of x. [If (x∗)∗ were much smaller than x∗, then the role of x∗ as minimal program for x would be undercut by a program of the form “execute the result of executing (x∗)∗.”] Similarly, a relative minimal program (x/w)∗ is compressible relative to w by at most a constant number of bits independent of x or w.

The algorithmic probability of a string x, denoted P(x), is defined as ∑{2−|p| : U(p) = x}. This is the probability that the U machine, with a random program chosen by coin tossing and an initially blank data tape, will halt with output x. The time-bounded algorithmic probability, Pt(x), is defined similarly, except that the sum is taken only over programs which halt within time t. We use P(x/w) and Pt(x/w) to denote the analogous algorithmic probabilities of one string x relative to another w, i.e. for computations that begin with w on the data tape and halt with x on the data tape.

The algorithmic entropy H(x) is defined as the least integer greater than −log2P(x), and the conditional entropy H(x/w) is defined similarly as the least integer greater than − log2P(x/w). Among the most important properties of the algorithmic entropy is its equality, to within O(1), with the size of the minimal program:

∃c ∀x ∀w : H(x/w) ≤ |(x/w)∗| ≤ H(x/w) + c

The first part of the relation, viz. that algorithmic entropy should be no greater than minimal program size, is obvious, because of the minimal program’s own contribution to the algorithmic probability. The second half of the relation is less obvious. The approximate equality of algorithmic entropy and minimal program size means that there are few near-minimal programs for any given input/output pair (x/w), and that every string gets an O(1) fraction of its algorithmic probability from its minimal program.
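On a finite toy “machine”, given as an explicit prefix-free table of programs and outputs, both quantities can be computed directly (illustrative Python; the table is invented for the example):

```python
from math import floor, log2

# A toy machine: a finite prefix-free table mapping programs to their outputs.
machine = {'0': 'x', '10': 'x', '110': 'y', '111': 'x'}

def P(x):
    """Algorithmic probability of x: total weight 2^-|p| of programs printing x."""
    return sum(2 ** -len(p) for p, out in machine.items() if out == x)

def H(x):
    """Algorithmic entropy: the least integer greater than -log2 P(x)."""
    return floor(-log2(P(x))) + 1

x_star = min((p for p, out in machine.items() if out == 'x'), key=len)
assert P('x') == 0.875                      # 1/2 + 1/4 + 1/8
assert H('x') == 1 and len(x_star) == 1     # H(x) <= |x*| <= H(x) + c
```

Here the minimal program '0' alone contributes more than half of P('x'), illustrating how a string gets an O(1) fraction of its algorithmic probability from its minimal program.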

Finite strings, such as minimal programs, which are incompressible or nearly so are called algorithmically random. The definition of randomness for finite strings is necessarily a little vague because of the ±O(1) machine-dependence of H(x) and, in the case of strings other than self-delimiting programs, because of the question of how to count the information encoded in the string’s length, as opposed to its bit sequence. Roughly speaking, an n-bit self-delimiting program is considered random (and therefore not ad-hoc as a hypothesis) iff its information content is about n bits, i.e. iff it is incompressible; while an externally delimited n-bit string is considered random iff its information content is about n + H(n) bits, enough to specify both its length and its contents.

For infinite binary sequences (which may be viewed also as real numbers in the unit interval, or as characteristic sequences of sets of natural numbers) randomness can be defined sharply: a sequence X is incompressible, or algorithmically random, if there is an O(1) bound to the compressibility of its initial segments Xn. Intuitively, an infinite sequence is random if it is typical in every way of sequences that might be produced by tossing a fair coin; in other words, if it belongs to no informally definable set of measure zero. Algorithmically random sequences constitute a larger class than the intuitively random ones, including sequences such as Ω which can be specified by ineffective definitions.

The busy beaver function B(n) is the greatest number computable by a self-delimiting program of n bits or fewer. The halting set K is {x : φx(x) converges}. This is the standard representation of the halting problem.

The self-delimiting halting set K0 is the (prefix) set of all self-delimiting programs for the U machine that halt: {p : U(p) converges}.

K and K0 are readily computed from one another (e.g. by regarding the self-delimiting programs as a subset of ordinary programs, the first 2n bits of K0 can be recovered from the first 2n+O(1) bits of K; by encoding each n-bit ordinary program as a self-delimiting program of length n + O(log n), the first 2n bits of K can be recovered from the first 2n+O(log n) bits of K0.)

The halting probability Ω is defined as ∑{2−|p| : U(p) converges}, the probability that the U machine would halt on an infinite input supplied by coin tossing. Ω is thus a real number between 0 and 1.

The first 2n bits of K0 can be computed from the first n bits of Ω, by enumerating halting programs until enough have halted to account for all but 2−n of the total halting probability. The time required for this decoding (between B(n − O(1)) and B(n + H(n) + O(1))) grows faster than any computable function of n. Although K0 is only slowly computable from Ω, the first n bits of Ω can be rapidly computed from the first 2n+H(n)+O(1) bits of K0, by asking about the halting of programs of the form “enumerate halting programs until (if ever) their cumulative weight exceeds q, then halt”, where q is an n-bit rational number…
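The enumeration argument can be mimicked on a finite toy machine, where a lookup table of halting behavior stands in for actual simulation (illustrative Python; in the real setting one dovetails all computations and never learns that a program fails to halt except via the weight budget):

```python
# A toy machine: a finite prefix-free set of programs with known halting
# behavior; the table stands in for running each program.
halts = {'0': True, '10': False, '110': True, '111': True}
omega = sum(2 ** -len(p) for p, h in halts.items() if h)   # 1/2 + 1/8 + 1/8

def halting_from_omega(omega_known):
    """Enumerate halting programs until their cumulative weight accounts
    for all of omega_known; any program not yet seen can never halt."""
    found, weight = set(), 0.0
    for p in sorted(halts, key=len):     # enumeration order: by program length
        if halts[p]:                     # stands in for 'simulate p, see it halt'
            found.add(p)
            weight += 2 ** -len(p)
        if weight >= omega_known:
            break
    return found

assert omega == 0.75
assert halting_from_omega(omega) == {'0', '110', '111'}
```

Once the enumerated weight reaches Ω, the program '10' is certified non-halting without ever being run to completion.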

Tarski, Wittgenstein and Undecidable Sentences in Affine Relation to Gödel’s. Thought of the Day 65.0

I imagine someone asking my advice; he says: “I have constructed a proposition (I will use ‘P’ to designate it) in Russell’s symbolism, and by means of certain definitions and transformations it can be so interpreted that it says: ‘P is not provable in Russell’s system.’ Must I not say that this proposition on the one hand is true, and on the other hand is unprovable? For suppose it were false; then it is true that it is provable. And that surely cannot be! And if it is proved, then it is proved that it is not provable. Thus it can only be true, but unprovable.” — Wittgenstein

Any such formal language, say that of Peano Arithmetic PA (or of Russell and Whitehead’s Principia Mathematica, or of ZFC), expresses – in a finite, unambiguous, and communicable manner – relations between concepts that are external to the language PA (or to Principia, or to ZFC). Each such language is, thus, essentially two-valued, since a relation either holds or does not hold externally (relative to the language).

Further, a selected, finite, number of primitive formal assertions about a finite set of selected primitive relations of, say, PA are defined as axiomatically PA-provable; all other assertions about relations that can be effectively defined in terms of the primitive relations are termed as PA-provable if, and only if, there is a finite sequence of assertions of PA, each of which is either a primitive assertion, or which can effectively be determined in a finite number of steps as an immediate consequence of any two assertions preceding it in the sequence by a finite set of rules of consequence.

The philosophical dimensions of this emerge if we take M as the standard, arithmetical, interpretation of PA, where:

(a)  the set of non-negative integers is the domain,

(b)  the integer 0 is the interpretation of the symbol “0” of PA,

(c)  the successor operation (addition of 1) is the interpretation of the “ ‘ ” function,

(d)  ordinary addition and multiplication are the interpretations of “+” and “.“,

(e) the interpretation of the predicate letter “=” is the equality relation.

Now, post-Gödel, the standard interpretation of classical theory seems to be that:

(f) PA can, indeed, be interpreted in M;

(g) assertions in M are decidable by Tarski’s definitions of satisfiability and truth;

(h) Tarskian truth and satisfiability are, however, not effectively verifiable in M.

Tarski made clear his indebtedness to Gödel’s methods:

We owe the method used here to Gödel, who employed it for other purposes in his recently published work. This exceedingly important and interesting article is not directly connected with the theme of our work; it deals with strictly methodological problems, the consistency and completeness of deductive systems; nevertheless we shall be able to use the methods and, in part, also the results of Gödel’s investigations for our purpose.

On the other hand, Tarski strongly emphasized the fact that his results were obtained independently, even though Tarski’s theorem on the undefinability of truth implies the existence of undecidable sentences, and hence Gödel’s first incompleteness theorem. Shifting gears here, how close was the Wittgensteinian quote really to Gödel’s reasoning? The question, implicit in Wittgenstein’s argument regarding the possibility of a semantic contradiction in Gödel’s reasoning, then arises: how can we assert that a PA-assertion (whether such an assertion is PA-provable or not) is true under interpretation in M, so long as such truth remains effectively unverifiable in M? Since the issue is not resolved unambiguously by Gödel in his paper (nor, apparently, by subsequent standard interpretations of his formal reasoning and conclusions), Wittgenstein’s quote can be taken to argue that, although we may validly draw various conclusions from Gödel’s formal reasoning and conclusions, the existence of a true or false assertion of M cannot be amongst them.

Algebraic constructs (A,U), such as Vec, Grp, Mon, and Lat, can be fully described by the following data, called the monad associated with (A,U):

1. the functor T : Set → Set, where T = U ◦ F and F : Set → A is the associated free functor,

2. the natural transformation η : idSet → T formed by universal arrows, and

3. the natural transformation μ : T ◦ T → T given by the unique homomorphism μX : T(TX) → TX that extends idTX.
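For the construct Mon over Set, this monad is the familiar list monad: T X is the set of finite lists over X, η forms one-element lists, and μ flattens. A minimal Python sketch (function names ours), checking the monad laws and the T-algebra equations for the monoid (int, +, 0):

```python
def fmap(f, xs):          # T on morphisms: apply f elementwise
    return [f(x) for x in xs]

def eta(x):               # the unit eta_X : X -> T X, given by universal arrows
    return [x]

def mu(xss):              # the multiplication mu_X : T(T X) -> T X, flattening
    return [x for xs in xss for x in xs]

# Monad laws.
xs = [1, 2, 3]
assert mu(fmap(eta, xs)) == xs and mu(eta(xs)) == xs
xsss = [[[1], [2]], [[3]]]
assert mu(fmap(mu, xsss)) == mu(mu(xsss))

# An Eilenberg-Moore algebra x : T X -> X for this monad is exactly a monoid;
# here the monoid (int, +, 0), with structure map sum : T(int) -> int.
x = sum
assert x(eta(5)) == 5                                        # x . eta_X = id_X
assert x(fmap(x, [[1, 2], [3]])) == x(mu([[1, 2], [3]]))     # x . Tx = x . mu_X
```

The two assertions at the end are precisely the algebra equations x ◦ ηX = idX and x ◦ Tx = x ◦ μX, instantiated at one input each.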

In these cases, there is a canonical concrete isomorphism K between (A,U) and the full concrete subcategory of Alg(T) consisting of those T-algebras x : TX → X that satisfy the equations x ◦ ηX = idX and x ◦ Tx = x ◦ μX. The latter subcategory is called the Eilenberg-Moore category of the monad (T, η, μ). The above observation makes it possible, in the following four steps, to express the “degree of algebraic character” of arbitrary concrete categories that have free objects:

Step 1: With every concrete category (A,U) over X that has free objects (or, more generally, with every adjoint functor U : A → X) one can associate, in an essentially unique way, an adjoint situation (η, ε) : F ⊣ U : A → X.

Step 2: With every adjoint situation (η, ε) : F ⊣ U : A → X one can associate a monad T = (T, η, μ) on X, where T = U ◦ F : X → X.

Step 3: With every monad T = (T, η, μ) on X one can associate a concrete subcategory of Alg(T) denoted by (XT, UT) and called the category of T-algebras.

Step 4:  With every concrete category (A,U) over X that has free objects one can associate a distinguished concrete functor (A,U) →K (XT , UT) into the associated category of T-algebras called the comparison functor for (A, U).

Concrete categories that are concretely isomorphic to a category of T-algebras for some monad T have a distinct “algebraic flavor”. Such categories (A,U) and their forgetful functors U are called monadic. It turns out that a concrete category (A, U ) is monadic iff it has free objects and its associated comparison functor (A,U) →K (XT , UT) is an isomorphism. Thus, for concrete categories (A,U) that have free objects, the associated comparison functor can be considered as a means of measuring the “algebraic character” of (A,U); and the associated category of T-algebras can be considered to be the “algebraic part” of (A,U). In particular,

(a) every finitary variety is monadic,

(b) the category TopGrp, considered as a concrete category

1. over Top, is monadic,
2. over Set, is not monadic; the associated comparison functor is the forgetful functor TopGrp → Grp, so that the construct Grp may be considered as the “algebraic part” of the construct TopGrp,

(c) the construct Top is not monadic; the associated comparison functor is the forgetful functor Top → Set itself, so that the construct Set may be considered as the “algebraic part” of the construct Top; hence the construct Top may be considered as having a trivial “algebraic part”.

Among constructs, monadicity captures the idea of “algebraicness” rather well. Unfortunately, however, the behavior of monadic categories in general is far from satisfactory. Monadic functors can fail badly to reflect properties of the base category (e.g., the existence of colimits or of suitable factorization structures), and they are not closed under composition.

Ergodic Theory. Thought of the Day 51.0

Classical dynamical systems have a particularly rich set of time symmetries. A classical dynamical system consists of a set X (the state space) and a function φ from X into itself that determines how the state changes over time (the dynamics); let (X, φ) be such a system, and let T = {0, 1, 2, 3, …}. Given any state x in X (the initial condition), the orbit of x is the history h defined by h(0) = x, h(1) = φ(x), h(2) = φ(φ(x)), and so on. Let Ω be the set of all orbits determined by (X, φ) in this way. Let {Pr’E}E⊆X be any conditional probability structure on X. For any events E and D in Ω, we define PrE(D) = Pr’E’(D’), where E’ is the set of all states x in X whose orbits lie in E, and D’ is the set of all states x in X whose orbits lie in D. Then {PrE}E⊆Ω is a conditional probability structure on Ω. Thus, Ω and {PrE}E⊆Ω together form a temporally evolving system. However, not every temporally evolving system arises in this way. Suppose the function φ (which maps from X into itself) is surjective, i.e., for all x in X there is some y in X such that φ(y) = x. Then the set Ω of orbits is invariant under all time-shifts. Let {Pr’E}E⊆X be a conditional probability structure on X, and let {PrE}E⊆Ω be the conditional probability structure it induces on Ω. Suppose that {Pr’E}E⊆X is φ-invariant, i.e., for any subsets E and D of X, if E’ = φ−1(E) and D’ = φ−1(D), then Pr’E’(D’) = Pr’E(D). Then every time-shift is a temporal symmetry of the resulting temporally evolving system. The study of dynamical systems equipped with invariant probability measures is the purview of ergodic theory.
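A numerical sanity check of φ-invariance for a concrete system, the doubling map on [0, 1) with Lebesgue measure (illustrative Python; the test event E and the sample size are arbitrary choices):

```python
import random

# The doubling map phi(x) = 2x mod 1 on X = [0, 1) preserves Lebesgue measure:
# Pr'(phi^{-1}(E)) = Pr'(E) for every event E, so every time-shift is a
# temporal symmetry of the induced system of orbits.
phi = lambda x: (2.0 * x) % 1.0

random.seed(0)
sample = [random.random() for _ in range(200_000)]

in_E = lambda x: 0.2 <= x < 0.5              # an arbitrary test event E
p_E        = sum(in_E(x) for x in sample) / len(sample)
p_preimage = sum(in_E(phi(x)) for x in sample) / len(sample)   # Pr'(phi^{-1}(E))

assert abs(p_E - 0.3) < 0.01
assert abs(p_preimage - 0.3) < 0.01          # invariance, up to sampling error
```

The preimage of [0.2, 0.5) under the doubling map is [0.1, 0.25) ∪ [0.6, 0.75), which again has measure 0.3, as the Monte Carlo estimate confirms.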

Frege-Russell and Mathematical Identity

Frege considered it a principal task of his logical reform of arithmetic to provide absolutely determinate identity conditions for the objects of that science, i.e. for numbers. Referring to the contemporary situation in this discipline he writes:

How I propose to improve upon it can be no more than indicated in the present work. With numbers … it is a matter of fixing the sense of an identity.

Frege makes the following critically important assumption: identity is a general logical concept, which is not specific to mathematics. Frege says:

It is not only among numbers that the relationship of identity is found. From which it seems to follow that we ought not to define it specially for the case of numbers. We should expect the concept of identity to have been fixed first, and that then from it together with the concept of number it must be possible to deduce when numbers are identical with one another, without there being need for this purpose of a special definition of numerical identity as well.

In a different place Frege says clearly that this concept of identity is absolutely stable across all possible domains and contexts:

Identity is a relation given to us in such a specific form that it is inconceivable that various forms of it should occur.

Frege’s definition of natural number, as modified by Russell (Bertrand Russell – Principles of Mathematics), later became standard. Intuitively the number 3 is what all collections consisting of three members (trios) share in common. Now instead of looking for a common form, essence or type of trios let us simply consider all such things together. According to Frege and Russell the collection (class, set) of all trios just is the number 3; similarly for the other numbers. Isn’t this construction circular? Frege and Russell provide the following argument which they claim allows us to avoid circularity here: given two different collections we may learn whether or not they have the same number of members without knowing this number and even without the notion of number itself. It is sufficient to find a one-one correspondence between the members of the two given collections. If there is such a correspondence, the two collections comprise the same number of members, or, to avoid any reference to numbers, we can say that the two collections are equivalent. This equivalence criterion is known as Hume’s Principle. Let us define natural numbers as equivalence classes under this relation. This definition reduces the question of identity of numbers to that of identity of classes. The latter question is settled through the axiomatization of set theory in a logical calculus with identity. Thus Frege’s project is realized: it has been seen how the logical concept of identity applies to numbers. In an axiomatic setting, “identities” in Quine’s sense (that is, identity conditions) of mathematical objects are provided by an axiom schema of the form

∀x ∀y (x=y ↔ ___ )

called the Identity Schema (IS). This does not resolve the identity problem, though, because any given system of axioms generally has multiple models. The case of isomorphic models is similar to that of equal numbers or coincident points (naively construed): there are good reasons to think of isomorphic models as one, and there are also good reasons to think of them as many. So the paradox of mathematical “doubles” reappears. It is a highly non-trivial fact that different models of Peano arithmetic, ZF, and other important axiomatic systems are not necessarily isomorphic. Thus logical analysis à la Frege-Russell certainly clarifies the mathematical concepts involved, but it does not settle the identity issue as Frege believed it did. In the recent philosophy of mathematics literature, the problem of the identity of mathematical objects is usually considered in the logical setting just mentioned: either as the problem of the non-uniqueness of the models of a given axiomatic system, or as the problem of how to fill in the Identity Schema. At first glance, the Frege-Russell proposal concerning the identity issue in mathematics seems judicious and innocent (and it certainly does not depend upon the rest of their logicist project): to stick to a certain logical discipline in speaking about identity, everywhere and in particular in mathematics.
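The Frege-Russell move of comparing collections by one-one correspondence, without ever invoking number, can be made concrete. The following is a minimal, hedged sketch (the function name and the pairing-off procedure are illustrative assumptions, not taken from the original): two collections are equivalent exactly when pairing off one member from each side exhausts both together.

```python
def equinumerous(a, b):
    """True iff a one-one correspondence exists between a and b.

    We never count: we only pair off one member of each collection at a
    time, so no notion of number is presupposed (illustrative sketch).
    """
    a, b = list(a), list(b)
    while a and b:          # remove one member from each side
        a.pop()
        b.pop()
    return not a and not b  # equivalent iff both run out together

# The Frege-Russell number 3 is then the class of all collections
# equivalent to a chosen trio:
trio = {"x", "y", "z"}
print(equinumerous(trio, {1, 2, 3}))   # True: another trio
print(equinumerous(trio, {1, 2}))      # False: a pair
```

The definition reduces “same number” to “same equivalence class”, exactly as the paragraph above describes.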

Symmetry: Mirror of a Manifold is the Opposite of its Fundamental Poincaré ∞-groupoid

Given a set X = {a, b, c, …}, such as the natural numbers N = {0, 1, …, p, …}, there is a standard procedure that regards X as a category with only identity morphisms. This is the discrete functor, which takes X to the category Disc(X) whose hom-sets are given by Hom(a, b) = ∅ if a ≠ b, and Hom(a, a) = {Id_a} = 1. Disc(X) is in fact a groupoid.

But category theory also provides a procedure called the opposite or dual, which takes a general category C to its opposite C^op. Let us call C^op the reflection of C by the mirror functor (−)^op.

Now the problem is that if we restrict this procedure to categories such as Disc(X), there is no way to distinguish Disc(X) from Disc(X)^op. This is what we mean by saying that sets don’t show symmetries. In Voevodsky’s program, we can interpret this by saying that:

The identity type is not good for sets; instead we should use the equivalence type. But to get this, we need to move from sets to Kan complexes, i.e., ∞-groupoids.
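The indistinguishability of Disc(X) from its opposite can be checked by direct computation. Here is a minimal sketch (the encoding of a category as a dictionary of hom-sets is our own assumption): reversing every arrow of a discrete category returns literally the same category.

```python
def disc(X):
    """Hom-sets of Disc(X): empty unless a == b, a singleton identity otherwise."""
    return {(a, b): ({f"id_{a}"} if a == b else set())
            for a in X for b in X}

def opposite(hom):
    """The mirror functor: Hom_op(a, b) = Hom(b, a)."""
    return {(a, b): hom[(b, a)] for (a, b) in hom}

C = disc({1, 2, 3})
print(opposite(C) == C)   # True: the mirror fixes Disc(X) pointwise
```

Since the only arrows are identities, which start and end at the same object, the reflection has nothing to move: sets, viewed this way, don’t show symmetries.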

The notion of a Kan complex is an abstraction of the combinatorial structure found in the singular simplicial complex of a topological space. There, the existence of retractions of any geometric simplex onto any of its horns – simplices missing one face and their interior – means that all horns in the singular complex can be filled with genuine simplices; this is the Kan filler condition.

At the same time, the notion of a Kan complex is an abstraction of the structure found in the nerve of a groupoid, the Duskin nerve of a 2-groupoid, and generally the nerves of n-groupoids for all n ≤ ∞. In other words, Kan complexes constitute a geometric model for ∞-groupoids/homotopy types based on the shape given by the simplex category. Thus Kan complexes serve to support homotopy theory.
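The filler condition in the lowest interesting dimension can be verified mechanically. The sketch below (an illustrative assumption, using Z/3 viewed as a one-object groupoid) enumerates the 2-simplices of its nerve – composable pairs (g, h), with faces d0 = h, d1 = g + h (mod 3), d2 = g – and checks that every horn, inner or outer, extends to a full simplex. Fillability reduces to solving a group equation, which is exactly why nerves of groupoids are Kan complexes.

```python
# Nerve of the group Z/3 (one-object groupoid), in dimension 2.
n = 3
G = range(n)

def faces(g, h):
    """Faces (d0, d1, d2) of the 2-simplex (g, h) in the nerve of Z/n."""
    return (h, (g + h) % n, g)

simplices = [faces(g, h) for g in G for h in G]

# Inner horn Lambda^2_1: d0 and d2 given; d1 is filled by composing.
inner = all(any(s[0] == h and s[2] == g for s in simplices)
            for g in G for h in G)
# Outer horn Lambda^2_0: d1 and d2 given; d0 is filled using inverses.
outer0 = all(any(s[1] == k and s[2] == g for s in simplices)
             for g in G for k in G)
# Outer horn Lambda^2_2: d0 and d1 given; d2 is filled using inverses.
outer2 = all(any(s[0] == h and s[1] == k for s in simplices)
             for h in G for k in G)

print(inner and outer0 and outer2)   # every horn has a filler
```

For a category that is not a groupoid only the inner horns would fill; the outer fillers above use the existence of inverses.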

So far we have used set theory, with this lack of symmetries, as a foundation for mathematics. Grothendieck saw this when he moved from sheaves of sets to sheaves of groupoids (stacks), because he wanted to allow objects to have symmetries (automorphisms). If we look at the Giraud-Grothendieck picture of nonabelian cohomology, what happens is an extension of coefficients U : Set ↪ Cat. We first consider the comma category Cat ↓ U, whose objects are functors C → Disc(X), and then the full subcategory consisting of those functors C → Disc(X) that are equivalences of categories. This forces C to be a groupoid that looks like a set. We call such a functor C → Disc(X) a Quillen-Segal U-object.

This category of Quillen-Segal objects should be called the category of sets with symmetries. Following Grothendieck’s point of view, we denote the comma category by Cat_U[Set] and think of it as the category of categories with coefficients, or coordinates, in sets. This terminology is justified by the fact that the functor U : Set ↪ Cat is a morphism of (higher) toposes that defines a geometric point in Cat. The category of sets with symmetries is like the homotopy neighborhood of this point, much as a single point thickens to a disc or to any contractible object. The advantage of the Quillen-Segal formalism is the presence of a Quillen model structure on Cat_U[Set] whose fibrant objects are the Quillen-Segal objects.

In standard terminology this means that if we embed a set X in Cat as Disc(X) and take a ‘projective resolution’ of it, then we get an equivalence of groupoids P → Disc(X), where P has symmetries. Concretely, what happens is just a factorization of the identity (type) Id : Disc(X) → Disc(X) as a cofibration followed by a trivial fibration:

Disc(X) ↪ P → Disc(X)
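A concrete instance of such a resolution P can be sketched as follows (the set X and the doubled fibers are illustrative assumptions): place two interchangeable objects of P over each element of X, with exactly one isomorphism between any two objects in the same fiber. The projection P → Disc(X) is then fully faithful and essentially surjective – an equivalence of groupoids – yet P, unlike Disc(X), carries non-identity isomorphisms: it is a set with symmetries.

```python
X = {"a", "b"}

# Objects of P: two redundant copies of each element of X.
P_objects = [(x, i) for x in X for i in (0, 1)]

def p(obj):
    """Projection P -> Disc(X): forget which copy we are looking at."""
    return obj[0]

def hom_P(s, t):
    """Exactly one morphism s -> t when s, t sit in the same fiber."""
    return {(s, t)} if p(s) == p(t) else set()

# Fully faithful onto Disc(X): hom-sets have size 1 within a fiber, 0 across.
fully_faithful = all(len(hom_P(s, t)) == (1 if p(s) == p(t) else 0)
                     for s in P_objects for t in P_objects)
# Essentially surjective: every element of X is hit.
essentially_surjective = {p(s) for s in P_objects} == X
# Unlike Disc(X), P has isomorphisms between distinct objects.
has_symmetries = any(s != t and hom_P(s, t)
                     for s in P_objects for t in P_objects)

print(fully_faithful and essentially_surjective and has_symmetries)
```

This is the sense in which the factorization is a “minimal homotopy enhancement”: P is equivalent to the set X, but remembers room for automorphisms.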

This process of embedding Set ↪ QS{Cat_U[Set]} is a minimal homotopy enhancement. The idea is that there is no good notion of homotopy (weak equivalence) in Set, but there are at least two such notions in Cat: equivalences of categories and equivalences of classifying spaces. This last class of weak equivalences is what occurs in mirror phenomena. The mirror of a manifold should be the opposite of its fundamental Poincaré ∞-groupoid.

Diffeomorphism Invariance: General Relativity Spacetime Points Cannot Possess Haecceity

Eliminative or radical ontic structural realism (ROSR) offers a radical cure – appropriate, given its name – to what it perceives to be the ailment of traditional, object-based realist interpretations of fundamental theories in physics: rid their ontologies entirely of objects. The world does not, on this view, consist of fundamental objects, which may or may not be individuals with a well-defined intrinsic identity, but instead of physical structures that are purely relational, in the sense of networks of ‘free-standing’ physical relations without relata.

Advocates of ROSR have taken at least three distinct issues in fundamental physics to support their case. First, the quantum statistical features of an ensemble of elementary quantum particles of the same kind challenge the standard understanding of the identity and individuality of fundamental physical objects. Second, so do the features of entangled elementary quantum (field) systems, as illustrated in the violation of Bell-type inequalities: considered on its own, an elementary quantum particle that is part of such an ensemble, or an entangled elementary quantum system (that is, an elementary quantum system standing in a quantum entanglement relation), cannot be said to satisfy genuine and empirically meaningful identity conditions. Third, it has been argued that one consequence of the diffeomorphism invariance and background independence found in general relativity (GTR) is that spacetime points should not be considered as traditional objects possessing some haecceity, i.e. some identity of their own.

The trouble with ROSR is that its main assertion appears squarely incoherent: insofar as relations can be exemplified, they can only be exemplified by some relata. Given this conceptual dependence of relations upon relata, any contention that relations can exist floating freely of the objects that stand in them seems incoherent. If we accept an ontological commitment, e.g. to universals, we may well be able to affirm that relations exist independently of relata – as abstracta in a Platonic heaven. The trouble is that ROSR is supposed to be a form of scientific realism, and as such is committed to asserting that at least certain elements of the relevant theories of fundamental physics faithfully capture elements of physical reality. Thus a defender of ROSR must claim that, fundamentally, relations-sans-relata are exemplified in the physical world, and that contravenes both the intuitive and the usual technical conceptualization of relations.

The usual extensional understanding of n-ary relations just equates them with subsets of the n-fold Cartesian product of the set of elementary objects assumed to figure in the relevant ontology over which the relation is defined. This extensional, ultimately set-theoretic, conceptualization of relations pervades philosophy and operates in the background of fundamental physical theories as they are usually formulated, as well as their philosophical appraisal in the structuralist literature. The charge then is that the fundamental physical structures that are represented in the fundamental physical theories are just not of the ‘object-free’ type suggested by ROSR.
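The extensional picture can be stated in two lines of code; this minimal sketch (objects and relation names are illustrative assumptions) makes vivid why the set-theoretic conceptualization is not ‘object-free’: one must specify the set of relata before the relation, as a subset of their Cartesian product, can so much as be written down.

```python
from itertools import product

objects = {"p1", "p2", "p3"}              # the relata must come first...
pairs = set(product(objects, repeat=2))   # ...then the n-fold Cartesian product...
related = {("p1", "p2"), ("p2", "p1")}    # ...and a relation is a subset of it.

# The relation is literally built out of its relata:
print(related <= pairs)                   # True
```

On this conceptualization there is simply no way to exhibit the relation while withholding commitment to the objects it relates, which is precisely the structuralists’ charge against ROSR.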

While ROSR should not be held to the conceptual standards dictated by the metaphysical prejudices it denies, giving up the set-theoretical framework, with its ineliminable reference to objects and relata in characterizing relations and structure, requires an alternative conceptualization of these notions so central to the position. Such an alternative conceptualization remains necessary even in the light of ‘metaphysics first’ complaints, which insist that ROSR’s problem must be confronted, first and foremost, at the metaphysical level, and that the question of how to represent structure in our language and in our theories only arises in the wake of a coherent metaphysical solution. But the radical may do as much metaphysics as she likes; articulate her theory and her realist commitments she must, and in order to do that, a coherent conceptualization of what it is for free-floating relations to be exemplified in the physical world is necessary.

ROSR thus confronts a dilemma: either soften to a more moderate structural realist position, or else develop the requisite alternative conceptualizations of relations and of structures and apply them to fundamental physical theories. A number of structural realists have seized the first horn and proposed less radical, non-eliminative versions of ontic structural realism (OSR). These moderate cousins of ROSR aim to take seriously the difficulties that the traditional metaphysics of objects poses for understanding fundamental physics, while avoiding the major objections against ROSR by keeping some thin notion of object. The picture typically offered is that of a balance between relations and their relata, coupled with an insistence that these relata do not possess their identity intrinsically, but only by virtue of occupying a relational position in a structural complex. Because it strikes this ontological balance, we term this moderate version of OSR ‘balanced ontic structural realism’ (BOSR).

But holding its ground may reward ROSR with certain advantages over its moderate competitors. First, were the complete elimination of relata to succeed, structural realism would not confront any of the known headaches concerning the identity of such objects or, relatedly, the status of the Principle of the Identity of Indiscernibles. To be sure, this embarrassment can arguably be avoided by other moves; but eliminating objects altogether simply obliterates any concern over whether two objects are one and the same. Secondly, and more speculatively, alternative formulations of our fundamental physical theories may shed light on a path toward a quantum theory of gravity.

For these presumed advantages to come to bear, however, a precise formulation of the notion of ‘free-standing’ (or ‘object-free’) structure, in the sense of a network of relations without relata (without objects), must be achieved. Jonathan Bain has argued that category theory provides the appropriate mathematical framework for ROSR, allowing for an ‘object-free’ notion of relation, and hence of structure. This argument can only succeed, however, if the category-theoretic formulation of (some of) the fundamental physical theories has some physical salience that the set-theoretic formulation lacks, or proves preferable qua formulation of a physical theory in some other way.

F. A. Muller has argued that neither set theory nor category theory provides the tools necessary to clarify the “Central Claim” of structural realism that the world, or parts of the world, have or are some structure. The main reason for this arises from the failure of reference in the contexts of both set theory and category theory, at least if some minimal realist constraints are imposed on how reference can function. Consequently, Muller argues that an appropriately realist structuralist is better served by fixing the concept of structure by axiomatization rather than by (set-theoretical or category-theoretical) definition.