Malicious Machine Learnings? Privacy Preservation and Computational Correctness Across Parties. Note Quote/Didactics.


Multi-Party Computation deals with the following problem: There are n ≥ 2 parties P1, . . ., Pn where party Pi holds input ti, 1 ≤ i ≤ n, and they wish to compute together a functions = f (t1, . . . , tn) on their inputs. The goal is that each party will learn the output of the function, s, yet with the restriction that Pi will not learn any additional information about the input of the other parties aside from what can be deduced from the pair (ti, s). Clearly it is the secrecy restriction that adds complexity to the problem, as without it each party could announce its input to all other parties, and each party would locally compute the value of the function. Thus, the goal of Multi-Party Computation is to achieve the following two properties at the same time: correctness of the computation and privacy preservation of the inputs.

The following two generalizations are often useful:

(i) Probabilistic functions. Here the value of the function depends on some random string r chosen according to some distribution: s = f (t1, . . . , tn; r). An example of this is the coin-flipping functionality, which takes no inputs, and outputs an unbiased random bit. It is crucial that the value r is not controlled by any of the parties, but is somehow jointly generated during the computation.

(ii) Multioutput functions. It is not mandatory that there be a single output of the function. More generally there could be a unique output for each party, i.e., (s1, . . . , sn) = f(t1,…, tn). In this case, only party Pi learns the output si, and no other party learns any information about the other parties’ input and outputs aside from what can be derived from its own input and output.

One of the most interesting aspects of Multi-Party Computation is to reach the objective of computing the function value, but under the assumption that some of the parties may deviate from the protocol. In cryptography, the parties are usually divided into two types: honest and faulty. An honest party follows the protocol without any deviation. Otherwise, the party is considered to be faulty. The faulty behavior can exemplify itself in a wide range of possibilities. The most benign faulty behavior is where the parties follow the protocol, yet try to learn as much as possible about the inputs of the other parties. These parties are called honest-but-curious (or semihonest). At the other end of the spectrum, the parties may deviate from the prescribed protocol in any way that they desire, with the goal of either influencing the computed output value in some way, or of learning as much as possible about the inputs of the other parties. These parties are called malicious.

We envision an adversary A, who controls all the faulty parties and can coordinate their actions. Thus, in a sense we assume that the faulty parties are working together and can exert the most knowledge and influence over the computation out of this collusion. The adversary can corrupt any number of parties out of the n participating parties. Yet, in order to be able to achieve a solution to the problem, in many cases we would need to limit the number of corrupted parties. This limit is called the threshold k, indicating that the protocol remains secure as long as the number of corrupted parties is at most k.

Assume that there exists a trusted party who privately receives the inputs of all the participating parties, calculates the output value s, and then transmits this value to each one of the parties. This process clearly computes the correct output of f, and also does not enable the participating parties to learn any additional information about the inputs of others. We call this model the ideal model. The security of Multi-Party Computation then states that a protocol is secure if its execution satisfies the following: (1) the honest parties compute the same (correct) outputs as they would in the ideal model; and (2) the protocol does not expose more information than a comparable execution with the trusted party, in the ideal model.

Intuitively, the adversary’s interaction with the parties (on a vector of inputs) in the protocol generates a transcript. This transcript is a random variable that includes the outputs of all the honest parties, which is needed to ensure correctness, and the output of the adversary A. The latter output, without loss of generality, includes all the information that the adversary learned, including its inputs, private state, all the messages sent by the honest parties to A, and, depending on the model, maybe even include more information, such as public messages that the honest parties exchanged. If we show that exactly the same transcript distribution can be generated when interacting with the trusted party in the ideal model, then we are guaranteed that no information is leaked from the computation via the execution of the protocol, as we know that the ideal process does not expose any information about the inputs. More formally,

Let f be a function on n inputs and let π be a protocol that computes the function f. Given an adversary A, which controls some set of parties, we define REALA,π(t) to be the sequence of outputs of honest parties resulting from the execution of π on input vector t under the attack of A, in addition to the output of A. Similarly, given an adversary A′ which controls a set of parties, we define IDEALA′,f(t) to be the sequence of outputs of honest parties computed by the trusted party in the ideal model on input vector t, in addition to the output of A′. We say that π securely computes f if, for every adversary A as above, ∃ an adversary A′, which controls the same parties in the ideal model, such that, on any input vector t, we have that the distribution of REALA,π(t) is “indistinguishable” from the distribution of IDEALA′,f(t).

Intuitively, the task of the ideal adversary A′ is to generate (almost) the same output as A generates in the real execution or the real model. Thus, the attacker A′ is often called the simulator of A. The transcript value generated in the ideal model, IDEALA′,f(t), also includes the outputs of the honest parties (even though we do not give these outputs to A′), which we know were correctly computed by the trusted party. Thus, the real transcript REALA,π(t) should also include correct outputs of the honest parties in the real model.

We assumed that every party Pi has an input ti, which it enters into the computation. However, if Pi is faulty, nothing stops Pi from changing ti into some ti′. Thus, the notion of a “correct” input is defined only for honest parties. However, the “effective” input of a faulty party Pi could be defined as the value ti′ that the simulator A′ gives to the trusted party in the ideal model. Indeed, since the outputs of honest parties look the same in both models, for all effective purposes Pi must have “contributed” the same input ti′ in the real model.

Another possible misbehavior of Pi, even in the ideal model, might be a refusal to give any input at all to the trusted party. This can be handled in a variety of ways, ranging from aborting the entire computation to simply assigning ti some “default value.” For concreteness, we assume that the domain of f includes a special symbol ⊥ indicating this refusal to give the input, so that it is well defined how f should be computed on such missing inputs. What this requires is that in any real protocol we detect when a party does not enter its input and deal with it exactly in the same manner as if the party would input ⊥ in the ideal model.

As regards security, it is implicitly assumed that all honest parties receive the output of the computation. This is achieved by stating that IDEALA′,f(t) includes the outputs of all honest parties. We therefore say that our currency guarantees output delivery. A more relaxed property than output delivery is fairness. If fairness is achieved, then this means that if at least one (even faulty) party learns its outputs, then all (honest) parties eventually do too. A bit more formally, we allow the ideal model adversary A′ to instruct the trusted party not to compute any of the outputs. In this case, in the ideal model either all the parties learn the output, or none do. Since A’s transcript is indistinguishable from A′’s this guarantees that the same fairness guarantee must hold in the real model as well.

A further relaxation of the definition of security is to provide only correctness and privacy. This means that faulty parties can learn their outputs, and prevent the honest parties from learning theirs. Yet, at the same time the protocol will still guarantee that (1) if an honest party receives an output, then this is the correct value, and (2) the privacy of the inputs and outputs of the honest parties is preserved.

The basic security notions are universal and model-independent. However, specific implementations crucially depend on spelling out precisely the model where the computation will be carried out. In particular, the following issues must be specified:

  1. The faulty parties could be honest-but-curious or malicious, and there is usually an upper bound k on the number of parties that the adversary can corrupt.
  2. Distinguishing between the computational setting and the information theoretic setting, in the latter, the adversary is unlimited in its computing powers. Thus, the term “indistinguishable” is formalized by requiring the two transcript distributions to be either identical (so-called perfect security) or, at least, statistically close in their variation distance (so-called statistical security). On the other hand, in the computational, the power of the adversary (as well as that of the honest parties) is restricted. A bit more precisely, Multi-Party Computation problem is parameterized by the security parameter λ, in which case (a) all the computation and communication shall be done in time polynomial in λ; and (b) the misbehavior strategies of the faulty parties are also restricted to be run in time polynomial in λ. Furthermore, the term “indistinguishability” is formalized by computational indistinguishability: two distribution ensembles {Xλ}λ and {Yλ}λ are said to be computationally indistinguishable, if for any polynomial-time distinguisher D, the quantity ε, defined as |Pr[D(Xλ) = 1] − Pr[D(Yλ) = 1]|, is a “negligible” function of λ. This means that for any j > 0 and all sufficiently large λ, ε eventually becomes smaller than λ − j. This modeling helps us to build secure Multi-Party Computational protocols depending on plausible computational assumptions, such as the hardness of factoring large integers.
  3. The two common communication assumptions are the existence of a secure channel and the existence of a broadcast channel. Secure channels assume that every pair of parties Pi and Pj are connected via an authenticated, private channel. A broadcast channel is a channel with the following properties: if a party Pi (honest or faulty) broadcasts a message m, then m is correctly received by all the parties (who are also sure the message came from Pi). In particular, if an honest party receives m, then it knows that every other honest party also received m. A different communication assumption is the existence of envelopes. An envelope guarantees the following properties: a value m can be stored inside the envelope, it will be held without exposure for a given period of time, and then the value m will be revealed without modification. A ballot box is an enhancement of the envelope setting that also provides a random shuffling mechanism of the envelopes. These are, of course, idealized assumptions that allow for a clean description of a protocol, as they separate the communication issues from the computational ones. These idealized assumptions may be realized by a physical mechanisms, but in some settings such mechanisms may not be available. Then it is important to address the question if and under what circumstances we can remove a given communication assumption. For example, we know that the assumption of a secure channel can be substituted with a protocol, but under the introduction of a computational assumption and a public key infrastructure.

Philosophical Identity of Derived Correspondences Between Smooth Varieties.


Let there be a morphism f : X → Y between varieties. Then all the information about f is encoded in the graph Γf ⊂ X × Y of f, which (as a set) is defined as

Γf = {(x, f(x)) : x ∈ X} ⊂ X × Y —– (1)

Now consider the natural projections pX, pY from X × Y to the factors X, Y. Restricted to the subvariety Γf, pX is an isomorphism (since f is a morphism). The fibres of pY restricted to Γf are just the fibres of f; so for example f is proper iff pY | Γf is.

If H(−) is any reasonable covariant homology theory (say singular homology in the complex topology for X, Y compact), then we have a natural push forward map

f : H(X) → H(Y)

This map can be expressed in terms of the graph Γf and the projection maps as

f(α) = pY∗ (pX(α) ∪ [Γf]) —– (2)

where [Γf] ∈ H (X × Y) is the fundamental class of the subvariety [Γf]. Generalizing this construction gives us the notion of a “multi-valued function” or correspondence from X to Y, simply defined to be a general subvariety Γ ⊂ X × Y, replacing the assumption that pX be an isomorphism with some weaker assumption, such as pXf, pY | Γf finite or proper. The right hand side of (2) defines a generalized pushforward map

Γ : H(X) → H(Y)

A subvariety Γ ⊂ X × Y can be represented by its structure sheaf OΓ on X × Y. Associated to the projection maps pX, pY, we also have pullback and pushforward operations on sheaves. The cup product on homology turns out to have an analogue too, namely tensor product. So, appropriately interpreted, (2) makes sense as an operation from the derived category of X to that of Y.

A derived correspondence between a pair of smooth varieties X, Y is an object F ∈ Db(X × Y) with support which is proper over both factors. A derived correspondence defines a functor ΦF by

ΦF : Db(X) → Db(Y)
(−) ↦ RpY∗(LpX(−) ⊗L F)

where (−) could refer to both objects and morphisms in Db(X). F is sometimes called the kernel of the functor ΦF.

The functor ΦF is exact, as it is defined as a composite of exact functors. Since the projection pX is flat, the derived pullback LpX is the same as ordinary pullback pX. Given derived correspondences E ∈ Db(X × Y), F ∈ Db(Y × Z), we obtain functors Φ: Db(X) → Db(Y), Φ: Db(Y) → Db(Z), which can then be composed to get a functor

ΦF ◦ Φ: Db(X) → Db(Z)

which is a two-sided identity with respect to composition of kernels.

Probability Space Intertwines Random Walks – Thought of the Day 144.0


agByQMany deliberations of stochasticity start with “let (Ω, F, P) be a probability space”. One can actually follow such discussions without having the slightest idea what Ω is and who lives inside. So, what is “Ω, F, P” and why do we need it? Indeed, for many users of probability and statistics, a random variable X is synonymous with its probability distribution μX and all computations such as sums, expectations, etc., done on random variables amount to analytical operations such as integrations, Fourier transforms, convolutions, etc., done on their distributions. For defining such operations, you do not need a probability space. Isn’t this all there is to it?

One can in fact compute quite a lot of things without using probability spaces in an essential way. However the notions of probability space and random variable are central in modern probability theory so it is important to understand why and when these concepts are relevant.

From a modelling perspective, the starting point is a set of observations taking values in some set E (think for instance of numerical measurement, E = R) for which we would like to build a stochastic model. We would like to represent such observations x1, . . . , xn as samples drawn from a random variable X defined on some probability space (Ω, F, P). It is important to see that the only natural ingredient here is the set E where the random variables will take their values: the set of events Ω is not given a priori and there are many different ways to construct a probability space (Ω, F, P) for modelling the same set of observations.

Sometimes it is natural to identify Ω with E, i.e., to identify the randomness ω with its observed effect. For example if we consider the outcome of a dice rolling experiment as an integer-valued random variable X, we can define the set of events to be precisely the set of possible outcomes: Ω = {1, 2, 3, 4, 5, 6}. In this case, X(ω) = ω: the outcome of the randomness is identified with the randomness itself. This choice of Ω is called the canonical space for the random variable X. In this case the random variable X is simply the identity map X(ω) = ω and the probability measure P is formally the same as the distribution of X. Note that here X is a one-to-one map: given the outcome of X one knows which scenario has happened so any other random variable Y is completely determined by the observation of X. Therefore using the canonical construction for the random variable X, we cannot define, on the same probability space, another random variable which is independent of X: X will be the sole source of randomness for all other variables in the model. This also show that, although the canonical construction is the simplest way to construct a probability space for representing a given random variable, it forces us to identify this particular random variable with the “source of randomness” in the model. Therefore when we want to deal with models with a sufficiently rich structure, we need to distinguish Ω – the set of scenarios of randomness – from E, the set of values of our random variables.

Let us give an example where it is natural to distinguish the source of randomness from the random variable itself. For instance, if one is modelling the market value of a stock at some date T in the future as a random variable S1, one may consider that the stock value is affected by many factors such as external news, market supply and demand, economic indicators, etc., summed up in some abstract variable ω, which may not even have a numerical representation: it corresponds to a scenario for the future evolution of the market. S1(ω) is then the stock value if the market scenario which occurs is given by ω. If the only interesting quantity in the model is the stock price then one can always label the scenario ω by the value of the stock price S1(ω), which amounts to identifying all scenarios where the stock S1 takes the same value and using the canonical construction. However if one considers a richer model where there are now other stocks S2, S3, . . . involved, it is more natural to distinguish the scenario ω from the random variables S1(ω), S2(ω),… whose values are observed in these scenarios but may not completely pin them down: knowing S1(ω), S2(ω),… one does not necessarily know which scenario has happened. In this way one reserves the possibility of adding more random variables later on without changing the probability space.

These have the following important consequence: the probabilistic description of a random variable X can be reduced to the knowledge of its distribution μX only in the case where the random variable X is the only source of randomness. In this case, a stochastic model can be built using a canonical construction for X. In all other cases – as soon as we are concerned with a second random variable which is not a deterministic function of X – the underlying probability measure P contains more information on X than just its distribution. In particular, it contains all the information about the dependence of the random variable X with respect to all other random variables in the model: specifying P means specifying the joint distributions of all random variables constructed on Ω. For instance, knowing the distributions μX, μY of two variables X, Y does not allow to compute their covariance or joint moments. Only in the case where all random variables involved are mutually independent can one reduce all computations to operations on their distributions. This is the case covered in most introductory texts on probability, which explains why one can go quite far, for example in the study of random walks, without formalizing the notion of probability space.

Conjuncted: Affine Schemes: How Would Functors Carry the Same Information?


If we go to the generality of schemes, the extra structure overshadows the topological points and leaves out crucial details so that we have little information, without the full knowledge of the sheaf. For example the evaluation of odd functions on topological points is always zero. This implies that the structure sheaf of a supermanifold cannot be reconstructed from its underlying topological space.

The functor of points is a categorical device to bring back our attention to the points of a scheme; however the notion of point needs to be suitably generalized to go beyond the points of the topological space underlying the scheme.

Grothendieck’s idea behind the definition of the functor of points associated to a scheme is the following. If X is a scheme, for each commutative ring A, we can define the set of the A-points of X in analogy to the way the classical geometers used to define the rational or integral points on a variety. The crucial difference is that we do not focus on just one commutative ring A, but we consider the A-points for all commutative rings A. In fact, the scheme we start from is completely recaptured only by the collection of the A-points for every commutative ring A, together with the admissible morphisms.

Let (rings) denote the category of commutative rings and (schemes) the category of schemes.

Let (|X|, OX) be a scheme and let T ∈ (schemes). We call the T-points of X, the set of all scheme morphisms {T → X}, that we denote by Hom(T, X). We then define the functor of points hX of the scheme X as the representable functor defined on the objects as

hX: (schemes)op → (sets), haX(A) = Hom(Spec A, X) = A-points of X

Notice that when X is affine, X ≅ Spec O(X) and we have

haX(A) = Hom(Spec A, O(X)) = Hom(O(X), A)

In this case the functor haX is again representable.

Consider the affine schemes X = Spec O(X) and Y = Spec O(Y). There is a one-to-one correspondence between the scheme morphisms X → Y and the ring morphisms O(X) → O(Y). Both hX and haare defined on morphisms in the natural way. If φ: T → S is a morphism and ƒ ∈ Hom(S, X), we define hX(φ)(ƒ) = ƒ ○ φ. Similarly, if ψ: A → Bis a ring morphism and g ∈ Hom(O(X), A), we define haX(ψ)(g) = ψ ○ g. The functors hX and haare for a given scheme X not really different but carry the same information. The functor of points hof a scheme X is completely determined by its restriction to the category of affine schemes, or equivalently by the functor

haX: (rings) → (sets), haX(A) = Hom(Spec A, X)

Let M = (|M|, OM) be a locally ringed space and let (rspaces) denote the category of locally ringed spaces. We define the functor of points of locally ringed spaces M as the representable functor

hM: (rspaces)op → (sets), hM(T) = Hom(T, M)

hM is defined on the manifold as

hM(φ)(g) = g ○ φ

If the locally ringed space M is a differentiable manifold, then

Hom(M, N) ≅ Hom(C(N), C(M))

This takes us to the theory of Yoneda’s Lemma.

Let C be a category, and let X, Y be objects in C and let hX: Cop → (sets) be the representable functors defined on the objects as hX(T) = Hom(T, X), and on the arrows as hX(φ)(ƒ) = ƒ . φ, for φ: T → S, ƒ ∈ Hom(T, X)

If F: Cop → (sets), then we have a one-to-one correspondence between sets:

{hX → F} ⇔ F(X)

The functor

h: C → Fun(Cop, (sets)), X ↦ hX,

is an equivalence of C with a full subcategory of functors. In particular, hX ≅ hY iff X ≅ Y and the natural transformations hX → hY are in one-to-one correspondence with the morphisms X → Y.

Two schemes (manifolds) are isomorphic iff their functors of points are isomorphic.

The advantages of using the functorial language are many. Morphisms of schemes are just maps between the sets of their A-points, respecting functorial properties. This often simplifies matters, allowing allowing for leaving the sheaves machinery in the background. The problem with such an approach, however, is that not all the functors from (schemes) to (sets) are the functors of points of a scheme, i.e., they are representable.

A functor F: (rings) → (sets) is of the form F(A) = Hom(Spec A, X) for a scheme X iff:

F is local or is a sheaf in Zariski Topology. This means that for each ring R and for every collection αi ∈ F(Rƒi), with (ƒi, i ∈ I) = R, so that αi and αj map to the same element in F(Rƒiƒj) ∀ i and j ∃ a unique element α ∈ F(R) mapping to each αi, and

F admits a cover by open affine subfunctors, which means that ∃ a family Ui of subfunctors of F, i.e. Ui(R) ⊂ F(R) ∀ R ∈ (rings), Ui = hSpec Ui, with the property that ∀ natural transformations ƒ: hSpec A  → F, the functors ƒ-1(Ui), defined as ƒ-1(Ui)(R) = ƒ-1(Ui(R)), are all representable, i.e. ƒ-1(Ui) = hVi, and the Vi form an open covering for Spec A.

This states the conditions we expect for F to be the functor of points of a scheme. Namely, locally, F must look like the functor of points of a scheme, moreover F must be a sheaf, i.e. F must have a gluing property that allows us to patch together the open affine cover.

The Third Trichotomy. Thought of the Day 121.0


The decisive logical role is played by continuity in the third trichotomy which is Peirce’s generalization of the old distinction between term, proposition and argument in logic. In him, the technical notions are rhema, dicent and argument, and all of them may be represented by symbols. A crucial step in Peirce’s logic of relations (parallel to Frege) is the extension of the predicate from having only one possible subject in a proposition – to the possibility for a predicate to take potentially infinitely many subjects. Predicates so complicated may be reduced, however, to combination of (at most) three-subject predicates, according to Peirce’s reduction hypothesis. Let us consider the definitions from ‘Syllabus (The Essential Peirce Selected Philosophical Writings, Volume 2)’ in continuation of the earlier trichotomies:

According to the third trichotomy, a Sign may be termed a Rheme, a Dicisign or Dicent Sign (that is, a proposition or quasi-proposition), or an Argument.

A Rheme is a Sign which, for its Interpretant, is a Sign of qualitative possibility, that is, is understood as representing such and such a kind of possible Object. Any Rheme, perhaps, will afford some information; but it is not interpreted as doing so.

A Dicent Sign is a Sign, which, for its Interpretant, is a Sign of actual existence. It cannot, therefore, be an Icon, which affords no ground for an interpretation of it as referring to actual existence. A Dicisign necessarily involves, as a part of it, a Rheme, to describe the fact which it is interpreted as indicating. But this is a peculiar kind of Rheme; and while it is essential to the Dicisign, it by no means constitutes it.

An Argument is a Sign which, for its Interpretant, is a Sign of a law. Or we may say that a Rheme is a sign which is understood to represent its object in its characters merely; that a Dicisign is a sign which is understood to represent its object in respect to actual existence; and that an Argument is a Sign which is understood to represent its Object in its character as Sign. ( ) The proposition need not be asserted or judged. It may be contemplated as a sign capable of being asserted or denied. This sign itself retains its full meaning whether it be actually asserted or not. ( ) The proposition professes to be really affected by the actual existent or real law to which it refers. The argument makes the same pretension, but that is not the principal pretension of the argument. The rheme makes no such pretension.

The interpretant of the Argument represents it as an instance of a general class of Arguments, which class on the whole will always tend to the truth. It is this law, in some shape, which the argument urges; and this ‘urging’ is the mode of representation proper to Arguments.

Predicates being general is of course a standard logical notion; in Peirce’s version this generality is further emphasized by the fact that the simple predicate is seen as relational and containing up to three subject slots to be filled in; each of them may be occupied by a continuum of possible subjects. The predicate itself refers to a possible property, a possible relation between subjects; the empty – or partly satiated – predicate does not in itself constitute any claim that this relation does in fact hold. The information it contains is potential, because no single or general indication has yet been chosen to indicate which subjects among the continuum of possible subjects it refers to. The proposition, on the contrary, the dicisign, is a predicate where some of the empty slots have been filled in with indices (proper names, demonstrative pronomina, deixis, gesture, etc.), and is, in fact, asserted. It thus consists of an indexical part and an iconical part, corresponding to the usual distinction between subject and predicate, with its indexical part connecting it to some level of reference reality. This reality needs not, of course, be actual reality; the subject slots may be filled in with general subjects thus importing pieces of continuity into it – but the reality status of such subjects may vary, so it may equally be filled in with fictitious references of all sorts. Even if the dicisign, the proposition, is not an icon, it contains, via its rhematic core, iconical properties. Elsewhere, Peirce simply defines the dicisign as a sign making explicit its reference. Thus a portrait equipped with a sign indicating the portraitee will be a dicisign, just like a charicature draft with a pointing gesture towards the person it depicts will be a dicisign. Even such dicisigns may be general; the pointing gesture could single out a group or a representative for a whole class of objects. While the dicisign specifies its object, the argument is a sign specifying its interpretant – which is what is normally called the conclusion. The argument thus consists of two dicisigns, a premiss (which may be, in turn, composed of several dicisigns and is traditionally seen as consisting of two dicisigns) and a conclusion – a dicisign represented as ensuing from the premiss due to the power of some law. The argument is thus – just like the other thirdness signs in the trichotomies – in itself general. It is a legisign and a symbol – but adds to them the explicit specification of a general, lawlike interpretant. In the full-blown sign, the argument, the more primitive degenerate sign types are orchestrated together in a threefold generality where no less than three continua are evoked: first, the argument itself is a legisign with a halo of possible instantions of itself as a sign; second, it is a symbol referring to a general object, in turn with a halo of possible instantiations around it; third, the argument implies a general law which is represented by one instantiation (the premiss and the rule of inference) but which has a halo of other, related inferences as possible instantiations. As Peirce says, the argument persuades us that this lawlike connection holds for all other cases being of the same type.

Price-Earnings Ratio. Note Quote.

The price-earnings ratio (P/E) is arguably the most popular price multiple. There are numerous definitions and variations of the price-earnings ratio. In its simplest form, the price-earnings ratio relates current share price to earnings per share.


The forward (or estimated) price-earnings ratio is based on the current stock price and the estimated earnings for future full scal years. Depending on how far out analysts are forecasting annual earnings (typically, for the current year and the next two fiscal years), a company can have multiple forward price-earnings ratios. The forward P/E will change as earnings estimates are revised when new information is released and quarterly earnings become available. Also, forward price-earnings ratios are calculated using estimated earnings based on the current fundamentals. A company’s fundamentals could change drastically over a short period of time and estimates may lag the changes as analysts digest the new facts and revise their outlooks.

The average price-earnings ratio attempts to smooth out the price-earnings ratio by reducing daily variation caused by stock price movements that may be the result of general volatility in the stock market. Different sources may calculate this figure differently. Average P/E is defined as the average of the high and low price-earnings ratios for a given year. The high P/E is calculated by dividing the high stock price for the year by the annual earnings per share fully diluted from continuing operations. The low P/E for the year is calculated using the low stock price for the year.

The relative price-earnings ratio helps to compare a company’s price-earnings ratio to the price-earnings ratio of the overall market, both currently and historically. Relative P/E is calculated by dividing the firm’s price-earnings ratio by the market’s price-earnings ratio.

The price-earnings ratio is used to gauge market expectation of future performance. Even when using historical earnings, the current price of a stock is a compilation of the market’s belief in future prospects. Broadly, a high price-earnings ratio means the market believes that that the company has strong future growth prospects. A low price-earnings ratio generally means the market has low earnings growth expectations for the firm or there is high risk or uncertainty of the firm actually achieving growth. However, looking at a price-earnings ratio alone may not be too illuminating. It will always be more useful to compare the price-earnings ratios of one company to those of other companies in the same industry and to the market in general. Furthermore, tracking a stock’s price-earnings ratio over time is useful in determining how the current valuation compares to historical trends.

Gordon growth model is a variant of the discounted cash flow model, is a method for valuing intrinsic value of a stock or business. Many researches on P/E ratios are based on this constant dividend growth model.

When investors purchase a stock, they expect two kinds of cash flows: dividend during holding shares and expected stock price at the end of shareholding. As the expected share price is decided by future dividend, then we can use the unlimited discount to value the current price of stocks.

A normal model for the intrinsic value of a stock:

V = D1/(1+R)1 + D2/(1+R)2 +…+ Dn/(1+R)n = ∑t=1 Dt/(1+R)t (n→∞) —– (1)

In (1)

V: intrinsic value of the stock;

Dt: dividend for the tth year

R: discount rate, namely required rate of return;

t: the year for dividend payment.

Assume the market is efficient, the share price should be equal to the intrinsic value of the stock, then equation (1) becomes:

P0 = D1/(1+R)1 + D2/(1+R)2 +…+ Dn/(1+R)n = ∑t=1 Dt/(1+R)t (n→∞) —– (2)

where P0: purchase price of the stock;

Dt: dividend for the tth year

R: discount rate, namely required rate of return;

t: the year for dividend payment.

Assume the dividend grows stably at the rate of g, we derive the constant dividend growth model.

That is Gordon constant dividend growth model:

P0 = D1/(1+R)1 + D2/(1+R)2 +…+ Dn/(1+R)n = D0(1+g)/(1+R)1 + D0(1+g)2/(1+R)2 +….+ D0(1+g)n/(1+R)n = ∑t=1 D0(1+g)t/(1+R)t —– (3)

When g is a constant, and R>g at the same time, then equation (3) can be modified as the following:

P0 = D0(1+g)/(R-g) = D1/(R-g) —– (4)

where, P0: purchase price of the stock;

D0: dividend at the purchase time;

D1: dividend for the 1st year;

R: discount rate, namely required rate of return;

g: the growth rate of dividend.

We suppose that the return on dividend b is fixed, then equation (4) divided by E1 is:

P0/E1 = (D1/E1)/(R-g) = b/(R-g) —– (5)

where, P0: purchase price of the stock;

D1: dividend for the 1st year;

E1: earnings per share (EPS) of the 1st year after purchase;

b: return on dividend;

R: discount rate, namely required rate of return;

g: the growth rate of dividend.

Therefrom we derive the P/E ratio theoretical computation model, from which appear factors deciding P/E directly, namely return on dividend, required rate of return and the growth rate of dividend. The P/E ratio is related positively to the return on dividend and required rate of return, and negatively to the growth rate of dividend.

Realistically speaking, most investors relate high P/E ratios to corporations with fast growth of future profits. However, the risk closely linked the speedy growth is also very important. They can counterbalance each other. For instance, when other elements are equal, the higher the risk of a stock, the lower is its P/E ratio, but high growth rate can counterbalance the high risk, thus lead to a high P/E ratio. P/E ratio reflects the rational investors’ expectation on the companies’ growth potential and risk in the future. The growth rate of dividend (g) and required rate of return (R) in the equation also response growth opportunity and risk factors.

Financial indices such as Dividend Payout Ratio, Liability-Assets (L/A) Ratio and indices that reflecting growth and profitability are employed in this paper as direct influence factors that have impact on companies’ P/E ratios.

Derived from (5), the dividend payout ratio has a direct positive effect on P/E ratio. When there is a high dividend payout ratio, the returns and stock value investors expected will also rise, which lead to a high P/E ratio. Conversely, the P/E ratio will be correspondingly lower.

Earnings per share (EPS) is another direct factor, while its impact on P/E ratio is negative. It reflects the relation between capital size and profit level of the company. When the profit level is the same, the larger the capital size, the lower the EPS will be, then the higher the P/E ratio will be. When the liability-assets ratio is high, which represents that the proportion of the equity capital is lower than debt capital, then the EPS will be high and finally the P/E ratio will led to be low. Therefore, the companies’ L/A ratio also negatively correlate to P/E ratio.

Some other financial indices including growth rate of EPS, ROE, growth rate of ROE, growth rate of net assets, growth rate of main business income and growth rate of main business profit should theoretically positively correlate to P/E ratios, because if the companies’ growth and profitability are both great, then investors’ expectation will be high, and then the stock prices and P/E ratios will be correspondingly high. Conversely, they will be low.

In the Gordon growth model, the growth of dividend is calculated based on the return on retained earnings reinvestment, r, therefore:

g = r (1-b) = retention ratio return on retained earnings.

As a result,

P0/E1 = b/(R-g) = b/(R-r(1-b)) —– (6)

Especially, when the expected return on retained earnings equal to the required rate of return (i.e. r = R) or when the retained earnings is zero (i.e. b=1),

There is:

P0/E1 = 1/R —– (7)

Obviously, in (7) the theoretical value of P/E ratio is the reciprocal of the required rate of return. According to the Capital Asset Pricing Model (CAPM), the average yields of the stock market should be equal to risk-free yield plus total risk premium. When there not exists any risk, then the required rate of return will equal to the market interest rate. Thus, the P/E ratio here turns into the reciprocal of the market interest rate.

As an important influence factor, the annual interest rate affect on both market average and companies’ individual P/E ratios. On the side of market average P/E ratio, when interest rate declines, funds will move to security markets, funds supply volume increasing will lead to the rise of share prices, and then rise in P/E ratios. In contrast, when interest rate rises, revulsion of capitals will reflow into banks, funds supply will be critical, share prices decline as well as P/E ratios. On the other side on the companies’ P/E ratio, the raise on interest rate will be albatross of companies, all other conditions remain, earnings will reduce, then equity will lessen, large deviation between operation performance and expected returns appears, can not support a high level of P/E ratio, so stock prices will decline. As a result, both market average and companies’ individual P/E ratios will be influenced by the annual interest rate.

It is also suitable to estimate the market average P/E ratio, and only when all the above assumptions are satisfied, that the practical P/E ratio amount to the theoretical value. However, different from the securities market, the interest rate is relatively rigid, especially to the strict control of interest rate countries; the interest rate adjustment is not so frequent, so that it is not synchronous with macroeconomic fundamentals. Reversely, the stock market reflects the macroeconomic fundamentals; high expectation of investors can raise up the stock prices, sequent the growth of the aggregate value of the whole market. Other market behaviors can also lead to changes of average P/E ratios. Therefore, it is impossible that the average P/E ratio is identical with the theoretical one. Variance exits inevitably, the key is to measure a rational range for this variance.

For the market average P/E ratio, P should be the aggregate value of listed stocks, and E is the total level of capital gains. To the maturity market, the reasonable average P/E ratio should be the reciprocal of the average yields of the market; usually the bank annual interest is used to represent the average yields of the market.

The return on retained earnings is an expected value in theory, while it is always hard to forecast, so the return on equity (ROE) is used to estimate the value.

(6) can then evolve as,

P0/E1 = b/(R-g) = b/(R-r(1-b)) = b/(R-ROE(1-b)) —– (8)

From (8) we can know, ROE is one of the influence factors to P/E ratio, which measures the value companies created for shareholders. It is positively correlated to the P/E ratio. The usefulness of any price-earnings ratio is limited to firms that have positive actual and expected earnings. Depending on the data source you use, companies with negative earnings will have a “null” value for a P/E while other sources will report a P/E of zero. In addition, earnings are subject to management assumptions and manipulation more than other income statement items such as sales, making it hard to get a true sense of value.

Consequentialism -X- (Pareto Efficiency) -X- Deontology


Let us check the Polity to begin with:

1. N is the set of all individuals in society.

And that which their politics concerns – the state of society.

2. S is the set of all possible information contained within society, so that a set s ∈ 2S (2S being the set of all possible subsets of S) contains all extant information about a particular iteration of society and will be called the state of society. S is an arbitrary topological space.

And the means by which individuals make judgements about that which their politics concerns. Their preferences over the information contained within the state of society.

3. Each individual i ∈ N has a complete and transitive preference relation ≽i defined over a set of preference-information Si ⊂ S such that si ≽ s′i can be read “individual i prefers preference information si at least as much as preference-information s′i”.

Any particular set of preference-information si ⊂ Si can be thought of as the state of society as viewed by individual i. The set of preference-information for individual i is a subset of the information contained within a particular iteration of society, so si ⊂ s ⊂ S.

A particular state of society s is a Pareto efficient if there is no other state of society s′ for which one individual strictly prefers their preference-information s′i ⊂ s′ to that particular state si ⊂ s, and the preference-information s′j ⊂ s′ in the other state s′ is at least as preferred by every other individual j ≠ i.

4. A state s ∈ S is said to be Pareto efficient iff ∄ s′ ∈ 2S & i ∈ N : s′i ≻ si & s′j ≽ sj ∀ j ≠ i ∈ N.

To put it crudely, a particular state of society is Pareto efficient if no individual can be made “better off” without making another individual “worse off”. A dynamic concept which mirrors this is the concept of a Pareto improvement – whereby a change in the state of society leaves everyone at least indifferent, and at least one individual in a preferable situation.

5. A movement between two states of society, s → s′ is called a Pareto improvement iff ∃ i ∈ N : s′i ≻ si & s′j ≽ sj ∀ j ≠ i ∈ N .

Note that this does not imply that s′ is a Pareto efficient state, because the same could potentially be said of a movement s′ → s′′. The state s′ is only a Pareto efficient state if we cannot find yet another state for which the movement to that state is a Pareto improvement. The following Theorem, demonstrates this distinction and gives an alternative definition of Pareto efficiency.

Theorem: A state s ∈ 2S is Pareto efficient iff there is no other state s′ for which the movement s → s′ is a Pareto improvement.

If one adheres to a consequentialist political doctrine (such as classical utilitarianism) rather than a deontological doctrine (such as liberalism) in which action is guided by some categorical imperative other than consequentialism, the guide offered by Pareto improvement is the least controversial, and least politically committal criterion to decision-making one can find. Indeed if we restrict political statements to those which concern the assignation of losses, it is a-political. It makes a value judgement only about who ought gain (whosoever stands to).

Unless one holds a strict deontological doctrine in the style, say, of Robert Nozick’s Anarchy state and Utopia (in which the maintenance of individual freedom is the categorical imperative), or John Rawls’ A Theory of Justice (in which again individual freedom is the primary categorical imperative and the betterment of the “poorest” the second categorical imperative), it is more difficult to argue against implementing some decision which will cause a change of society which all individuals in society will be at worst indifferent to. Than arguing for some decision rule which will induce a change of society which some individual will find less preferable. To the rationalisitic economist it seems almost petty, certainly irrational to argue against this criterion, like those individuals who demand “fairness” in the famous “dictator” experiment rather than accept someone else becoming “better off”, and themselves no “worse off”.

Infinite Sequences and Halting Problem. Thought of the Day 76.0


In attempting to extend the notion of depth from finite strings to infinite sequences, one encounters a familiar phenomenon: the definitions become sharper (e.g. recursively invariant), but their intuitive meaning is less clear, because of distinctions (e.g. between infintely-often and almost-everywhere properties) that do not exist in the finite case.

An infinite sequence X is called strongly deep if at every significance level s, and for every recursive function f, all but finitely many initial segments Xn have depth exceeding f(n).

It is necessary to require the initial segments to be deep almost everywhere rather than infinitely often, because even the most trivial sequence has infinitely many deep initial segments Xn (viz. the segments whose lengths n are deep numbers).

It is not difficult to show that the property of strong depth is invariant under truth-table equivalence (this is the same as Turing equivalence in recursively bounded time, or via a total recursive operator), and that the same notion would result if the initial segments were required to be deep in the sense of receiving less than 2−s of their algorithmic probability from f(n)-fast programs. The characteristic sequence of the halting set K is an example of a strongly deep sequence.

A weaker definition of depth, also invariant under truth-table equivalence, is perhaps more analogous to that adopted for finite strings:

An infinite sequence X is weakly deep if it is not computable in recursively bounded time from any algorithmically random infinite sequence.

Computability in recursively bounded time is equivalent to two other properties, viz. truth-table reducibility and reducibility via a total recursive operator.

By contrast to the situation with truth-table reducibility, Péter Gacs has shown that every sequence is computable from (i.e. Turing reducible to) an algorithmically random sequence if no bound is imposed on the time. This is the infinite analog of far more obvious fact that every finite string is computable from an algorithmically random string (e.g. its minimal program).

Every strongly deep sequence is weakly deep, but by intermittently padding K with large blocks of zeros, one can construct a weakly deep sequence with infinitely many shallow initial segments.

Truth table reducibility to an algorithmically random sequence is equivalent to the property studied by Levin et. al. of being random with respect to some recursive measure. Levin calls sequences with this property “proper” or “complete” sequences, and views them as more realistic and interesting than other sequences because they are the typical outcomes of probabilistic or deterministic effective processes operating in recursively bounded time.

Weakly deep sequences arise with finite probability when a universal Turing machine (with one-way input and output tapes, so that it can act as a transducer of infinite sequences) is given an infinite coin toss sequence for input. These sequences are necessarily produced very slowly: the time to output the n’th digit being bounded by no recursive function, and the output sequence contains evidence of this slowness. Because they are produced with finite probability, such sequences can contain only finite information about the halting problem.


In greedy exchange, when two individuals meet, the richer person takes one unit of capital from the poorer person, as represented by the reaction scheme (j, k) → (j + 1, k − 1) for j ≥ k. In the rate equation approximation, the densities ck(t) now evolve according to

dck/dt = ck-1j=1k-1cj + ck+1j=k+1cj – ckN – c2k —– (1)

The first two terms account for the gain in ck(t) due to the interaction between pairs of individuals of capitals (j, k−1), with j k, respectively. The last two terms correspondingly account for the loss of ck(t). One can check that the wealth density M1 ≡ ∑k=1 k ck(t) is conserved, and that the population density obeys

dN/dt = -c1N —– (2)

Equation (1) are conceptually similar to the Smoluchowski equations for aggregation with a constant reaction rate. Mathematically, however, they appear to be more complex and we have been unable to solve them analytically. Fortunately, equation (1) is amenable to a scaling solution. For this purpose, we first re-write equation (1) as

dck/dt = -ck(ck + ck+1) + N(ck-1 – ck) + (ck+1 – ck-1)∑j=kcj —– (3)

Taking the continuum limit and substituting the scaling ansatz,

ck(t) ≅ N2C(x), with x = kN —– (4)

transforms equations (2) and (3) to

dN/dt = -C(0)N3 —– (5)


C(0)[2C + xC’] = 2C2 + C'[1 – 2∫xdyC(y)] —– (6)

where C ′ = dC/dx. Note also that the scaling function must obey the integral relations

xdxC(x) = 1 and ∫xdxxC(x) = 1 —– (7)

The former follows from the definition of density, N = ∑ck(t) ≅ N∫dx C(x), while the latter follows if we set, without loss of generality, the conserved wealth density equal to unity, ∑kkck(t) = 1.

Introducing B(x) = ∫0x dyC(y) recasts equation (6) into C(0)[2B′ + xB′′] = 2B′2 + B′′[2B − 1]. Integrating twice gives [C(0)x − B][B − 1] = 0, with solution B(x) = C(0)x for x < xf and B(x) = 1 for x ≥ xf, from which we conclude that the scaled wealth distribution C(x) = B′(x) coincides with the zero-temperature Fermi distribution;

C(x) = C(0), for x < xf

= 0, for x ≥ xf —– (8)

Hence the scaled profile has a sharp front at x = xf, with xf = 1/C(0) found by matching the two branches of the solution for B(x). Making use of the second integral relation, equation (7), gives C(0) = 1/2 and thereby closes the solution. Thus, the unscaled wealth distribution ck(t) reads,

ck(t) = 1/(2t), for k < 2√t

= 0, for k ≥ 2√t —– (9)

and the total density is N(t) = t-1/2


Figure: Simulation results for the wealth distribution in greedy additive exchange based on 2500 configurations for 106 traders. Shown are the scaled distributions C(x) versus x = kN for t = 1.5n, with n = 18, 24, 30, and 36; these steepen with increasing time. Each data set has been av- eraged over a range of ≈ 3% of the data points to reduce fluctuations.

These predictions by numerical simulations are shown in the figure. In the simulation, two individuals are randomly chosen to undergo greedy exchange and this process is repeated. When an individual reaches zero capital he is eliminated from the system, and the number of active traders is reduced by one. After each reaction, the time is incremented by the inverse of the number of active traders. While the mean-field predictions are substantially corroborated, the scaled wealth distribution for finite time actually resembles a finite-temperature Fermi distribution. As time increases, the wealth distribution becomes sharper and approaches equation (9). In analogy with the Fermi distribution, the relative width of the front may be viewed as an effective temperature. Thus the wealth distribution is characterized by two scales; one of order √t characterizes the typical wealth of active traders and a second, smaller scale which characterizes the width of the front.

To quantify the spreading of the front, let us include the next corrections in the continuum limit of the rate equations, equation (3). This gives,

∂c/∂t = 2∂/∂k [c∫kdjc(j)] – c∂c/∂k – N∂c/∂k + N/2 ∂2c/∂k2 —– (10)

Here, the second and fourth terms on the RHS denote the second corrections. since, the convective third term determines the location of the front to be at kf = 2√t, it is natural to expect that the diffusive fourth term describes the spreading of the front. the term c∂c/∂k  turns out to be negligible in comparison to the diffusive spreading term and is henceforth neglected. The dominant convective term can be removed by transforming to a frame of reference which moves with the front namely, k → K = k − 2√t. among the remaining terms in the transformed rate equation, the width of the front region W can now be determined by demanding that the diffusion term has the same order of magnitude as the reactive terms, i.e. N ∂2c/∂k∼ c2. This implies W ∼ √(N/c). Combining this with N = t−1/2 and c ∼ t−1 gives W ∼ t1/4, or a relative width w = W/kf ∼ t−1/4. This suggests the appropriate scaling ansatz for the front region is

ck(t) = 1/t X(ξ), ξ = (k – 2√t)/ t1/4 —– (11)

Substituting this ansatz into equation (10) gives a non-linear single variable integro-differential equation for the scaling function X(ξ). Together with the appropriate boundary conditions, this represents, in principle, a more complete solution to the wealth distribution. However, the essential scaling behavior of the finite-time spreading of the front is already described by equation (11), so that solving for X(ξ) itself does not provide additional scaling information. Analysis gives w ∼ t−α with α ≅ 1/5. We attribute this discrepancy to the fact that w is obtained by differentiating C(x), an operation which generally leads to an increase in numerical errors.

Evolutionary Game Theory. Note Quote


In classical evolutionary biology the fitness landscape for possible strategies is considered static. Therefore optimization theory is the usual tool in order to analyze the evolution of strategies that consequently tend to climb the peaks of the static landscape. However in more realistic scenarios the evolution of populations modifies the environment so that the fitness landscape becomes dynamic. In other words, the maxima of the fitness landscape depend on the number of specimens that adopt every strategy (frequency-dependent landscape). In this case, when the evolution depends on agents’ actions, game theory is the adequate mathematical tool to describe the process. But this is precisely the scheme in that the evolving physical laws (i.e. algorithms or strategies) are generated from the agent-agent interactions (bottom-up process) submitted to natural selection.

The concept of evolutionarily stable strategy (ESS) is central to evolutionary game theory. An ESS is defined as that strategy that cannot be displaced by any alternative strategy when being followed by the great majority – almost all of systems in a population. In general,

an ESS is not necessarily optimal; however it might be assumed that in the last stages of evolution — before achieving the quantum equilibrium — the fitness landscape of possible strategies could be considered static or at least slow varying. In this simplified case an ESS would be one with the highest payoff therefore satisfying an optimizing criterion. Different ESSs could exist in other regions of the fitness landscape.

In the information-theoretic Darwinian approach it seems plausible to assume as optimization criterion the optimization of information flows for the system. A set of three regulating principles could be:

Structure: The complexity of the system is optimized (maximized).. The definition that is adopted for complexity is Bennett’s logical depth that for a binary string is the time needed to execute the minimal program that generates such string. There is no a general acceptance of the definition of complexity, neither is there a consensus on the relation between the increase of complexity – for a certain definition – and Darwinian evolution. However, it seems that there is some agreement on the fact that, in the long term, Darwinian evolution should drive to an increase in complexity in the biological realm for an adequate natural definition of this concept. Then the complexity of a system at time in this theory would be the Bennett’s logical depth of the program stored at time in its Turing machine. The increase of complexity is a characteristic of Lamarckian evolution, and it is also admitted that the trend of evolution in the Darwinian theory is in the direction in which complexity grows, although whether this tendency depends on the timescale – or some other factors – is still not very clear.

Dynamics: The information outflow of the system is optimized (minimized). The information is the Fisher information measure for the probability density function of the position of the system. According to S. A. Frank, natural selection acts maximizing the Fisher information within a Darwinian system. As a consequence, assuming that the flow of information between a system and its surroundings can be modeled as a zero-sum game, Darwinian systems would follow dynamics.

Interaction: The interaction between two subsystems optimizes (maximizes) the complexity of the total system. The complexity is again equated to the Bennett’s logical depth. The role of Interaction is central in the generation of composite systems, therefore in the structure for the information processor of composite systems resulting from the logical interconnections among the processors of the constituents. There is an enticing option of defining the complexity of a system in contextual terms as the capacity of a system for anticipating the behavior at t + ∆t of the surrounding systems included in the sphere of radius r centered in the position X(t) occupied by the system. This definition would directly drive to the maximization of the predictive power for the systems that maximized their complexity. However, this magnitude would definitely be very difficult to even estimate, in principle much more than the usual definitions for complexity.

Quantum behavior of microscopic systems should now emerge from the ESS. In other terms, the postulates of quantum mechanics should be deduced from the application of the three regulating principles on our physical systems endowed with an information processor.

Let us apply Structure. It is reasonable to consider that the maximization of the complexity of a system would in turn maximize the predictive power of such system. And this optimal statistical inference capacity would plausibly induce the complex Hilbert space structure for the system’s space of states. Let us now consider Dynamics. This is basically the application of the principle of minimum Fisher information or maximum Cramer-Rao bound on the probability distribution function for the position of the system. The concept of entanglement seems to be determinant to study the generation of composite systems, in particular in this theory through applying Interaction. The theory admits a simple model that characterizes the entanglement between two subsystems as the mutual exchange of randomizers (R1, R2), programs (P1, P2) – with their respective anticipation modules (A1, A2) – and wave functions (Ψ1, Ψ2). In this way, both subsystems can anticipate not only the behavior of their corresponding surrounding systems, but also that of the environment of its partner entangled subsystem. In addition, entanglement can be considered a natural phenomenon in this theory, a consequence of the tendency to increase the complexity, and therefore, in a certain sense, an experimental support to the theory.

In addition, the information-theoretic Darwinian approach is a minimalist realist theory – every system follows a continuous trajectory in time, as in Bohmian mechanics, a local theory in physical space – in this theory apparent nonlocality, as in Bell’s inequality violations, would be an artifact of the anticipation module in the information space, although randomness would necessarily be intrinsic to nature through the random number generator methodologically associated with every fundamental system at t = 0, and as essential ingredient to start and fuel – through variation – Darwinian evolution. As time increases, random events determined by the random number generators would progressively be replaced by causal events determined by the evolving programs that gradually take control of the elementary systems. Randomness would be displaced by causality as physical Darwinian evolution gave rise to the quantum equilibrium regime, but not completely, since randomness would play a crucial role in the optimization of strategies – thus, of information flows – as game theory states.