# Incomplete Markets and Calibrations for Coherence with Hedged Portfolios. Thought of the Day 154.0

In complete market models such as the Black-Scholes model, probability does not really matter: the “objective” evolution of the asset is only there to define the set of “impossible” events and serves to specify the class of equivalent measures. Thus, two statistical models P1 ∼ P2 with equivalent measures lead to the same option prices in a complete market setting.

This is not true anymore in incomplete markets: probabilities matter and model specification has to be taken seriously since it will affect hedging decisions. This situation is more realistic but also more challenging and calls for an integrated approach between option pricing methods and statistical modeling. In incomplete markets, not only does probability matter but attitudes to risk also matter: utility based methods explicitly incorporate these into the hedging problem via utility functions. While these methods are focused on hedging with the underlying asset, common practice is to use liquid call/put options to hedge exotic options. In incomplete markets, options are not redundant assets; therefore, if options are available as hedging instruments they can and should be used to improve hedging performance.

While the lack of liquidity in the options market in practice prevents the use of dynamic hedges involving options, options are commonly used for static hedging: call options are frequently used for dealing with volatility or convexity exposures and for hedging barrier options.

What are the implications of hedging with options for the choice of a pricing rule? Consider a contingent claim H and assume that we have as hedging instruments a set of benchmark options with prices Ci, i = 1 . . . n and terminal payoffs Hi, i = 1 . . . n. A static hedge of H is a portfolio composed from the options Hi, i = 1 . . . n and the numeraire, in order to match as closely as possible the terminal payoff of H:

$$H = V_0 + \sum_{i=1}^{n} x_i H_i + \int_0^T \varphi \, dS + \varepsilon \tag{1}$$

where ε is a hedging error representing the nonhedgeable risk. Typically the Hi are payoffs of call or put options that cannot be replicated using the underlying, so adding them to the hedge portfolio increases the span of hedgeable claims and reduces residual risk.

Consider a pricing rule Q. Assume that EQ[ε] = 0 (otherwise EQ[ε] can be added to V0). Then the claim H is valued under Q as:

$$e^{-rT} E^{Q}[H] = V_0 + \sum_{i=1}^{n} x_i \, e^{-rT} E^{Q}[H_i] \tag{2}$$

since the stochastic integral term, being a Q-martingale, has zero expectation. On the other hand, the cost of setting up the hedging portfolio is:

$$V_0 + \sum_{i=1}^{n} x_i C_i \tag{3}$$

So the value of the claim given by the pricing rule Q corresponds to the cost of the hedging portfolio if the model prices of the benchmark options Hi correspond to their market prices Ci:

$$e^{-rT} E^{Q}[H_i] = C_i, \qquad \forall \, i = 1, \dots, n \tag{4}$$

This condition is called calibration: a pricing rule that satisfies (4) is said to be calibrated to the option prices Ci, i = 1, . . . , n. The condition is necessary to guarantee coherence between model prices and the cost of hedging with portfolios. If the model is not calibrated, the model price of a claim H may bear no relation to the effective cost of hedging it using the available options Hi. If a pricing rule Q is specified in an ad hoc way, the calibration conditions will generally not be satisfied, so one way to ensure them is to incorporate them as constraints in the choice of the pricing measure Q.
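The link between (2), (3) and (4) can be checked numerically. The following is a minimal sketch, not a method taken from the text: it assumes r = 0, drops the dynamic term ∫φdS (a purely static hedge), uses hypothetical lognormal scenarios for S_T under Q, and picks the weights xi by least squares. The strikes, the target claim and the 1.0 price perturbation are all illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical terminal prices S_T under a pricing rule Q, with r = 0 and
# E^Q[S_T] = S_0 = 100 (lognormal, 20% volatility; all values are illustrative).
S_T = 100.0 * np.exp(0.2 * rng.standard_normal(50_000) - 0.5 * 0.2**2)

# Benchmark call payoffs H_i and a target claim H (a call struck at 95,
# which is not among the benchmarks).
strikes = np.array([90.0, 100.0, 110.0])
H_i = np.maximum(S_T[:, None] - strikes, 0.0)
H = np.maximum(S_T - 95.0, 0.0)

# Least-squares static hedge: H = V0 + sum_i x_i H_i + eps.
# The intercept column forces the sample mean of eps to zero, i.e. E^Q[eps] = 0.
A = np.column_stack([np.ones_like(S_T), H_i])
coef, *_ = np.linalg.lstsq(A, H, rcond=None)
V0, x = coef[0], coef[1:]

model_value = H.mean()            # left-hand side of (2) with r = 0
model_prices = H_i.mean(axis=0)   # E^Q[H_i] with r = 0

# Case 1: market prices equal model prices, so (4) holds.
cost_calibrated = V0 + x @ model_prices

# Case 2: the first market price is off by 1.0, so (4) fails.
market_prices = model_prices + np.array([1.0, 0.0, 0.0])
cost_miscalibrated = V0 + x @ market_prices

print(f"model value          : {model_value:.4f}")
print(f"hedge cost, (4) holds: {cost_calibrated:.4f}")
print(f"hedge cost, (4) fails: {cost_miscalibrated:.4f}")
```

When (4) holds, the model value and the hedge cost coincide up to floating-point error; when one market price deviates, the gap equals the corresponding weight times the price deviation.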

# Network Theoretic of the Fermionic Quantum State – Epistemological Rumination. Thought of the Day 150.0

In quantum physics, fundamental particles are believed to be of two types, fermions or bosons, depending on the value of their spin (an intrinsic ‘angular momentum’ of the particle). Fermions have half-integer spin and cannot occupy a quantum state (a configuration with specified microscopic degrees of freedom, or quantum numbers) that is already occupied. In other words, at most one fermion at a time can occupy one quantum state. The resulting probability that a quantum state is occupied is known as the Fermi-Dirac statistics.

Now, if we want to convert this into a model with maximum entropy, where the real movement is defined topologically, then we require a reproduction of heterogeneity that is observed. The starting recourse is network theory with an ensemble of networks where each vertex i has the same degree ki as in the real network. This choice is justified by the fact that, being an entirely local topological property, the degree is expected to be directly affected by some intrinsic (non-topological) property of vertices. The caveat is that the real shouldn’t be compared with the randomized, which could otherwise lead to interpreting the observed as ‘unavoidable’ topological constraints, in the sense that the violation of the observed values would lead to an ‘impossible’, or at least very unrealistic values.

The resulting model is known as the Configuration Model, and is defined as a maximum-entropy ensemble of graphs with given degree sequence. The degree sequence, which is the constraint defining the model, is nothing but the ordered vector k of degrees of all vertices (where the ith component ki is the degree of vertex i). The ordering preserves the ‘identity’ of vertices: in the resulting network ensemble, the expected degree ⟨ki⟩ of each vertex i is the same as the empirical value ki for that vertex. In the Configuration Model, the graph probability is given by

$$P(A) = \prod_{i<j} q_{ij}(a_{ij}) = \prod_{i<j} p_{ij}^{a_{ij}} (1 - p_{ij})^{1 - a_{ij}} \tag{1}$$

where $q_{ij}(a) = p_{ij}^{a} (1 - p_{ij})^{1-a}$ is the probability that a particular entry of the adjacency matrix A takes the value $a_{ij} = a$, which is a Bernoulli process with different pairs of vertices characterized by different connection probabilities $p_{ij}$. A Bernoulli trial (or Bernoulli process) is the simplest random event, i.e. one characterized by only two possible outcomes. One of the two outcomes is referred to as the ‘success’ and is assigned a probability p. The other outcome is referred to as the ‘failure’, and is assigned the complementary probability 1 − p. These probabilities read

$$\langle a_{ij} \rangle = p_{ij} = \frac{x_i x_j}{1 + x_i x_j} \tag{2}$$

where xi is the Lagrange multiplier obtained by ensuring that the expected degree of the corresponding vertex i equals its observed value: ⟨ki⟩ = ki ∀ i. As always happens in maximum-entropy ensembles, the probabilistic nature of configurations implies that the constraints are valid only on average (the angular brackets indicate an average over the ensemble of realizable networks). Also note that pij is a monotonically increasing function of xi and xj. This implies that ⟨ki⟩ is a monotonically increasing function of xi. An important consequence is that two vertices i and j with the same degree ki = kj must have the same value xi = xj. (2) provides an interesting connection with quantum physics, and in particular the statistical mechanics of fermions. The ‘selection rules’ of fermions dictate that only one particle at a time can occupy a single-particle state, exactly as each pair of vertices in binary networks can be either connected or disconnected. In this analogy, every pair i, j of vertices is a ‘quantum state’ identified by the ‘quantum numbers’ i and j. So each link of a binary network is like a fermion that can be in one of the available states, provided that no two objects are in the same state. (2) indicates the expected number of particles/links in the state specified by i and j. Unsurprisingly, it has the same form as the so-called Fermi-Dirac statistics describing the expected number of fermions in a given quantum state. The probabilistic nature of links also allows for the presence of empty states, whose occurrence is now regulated by the probability coefficients (1 − pij). The Configuration Model allows the whole degree sequence of the observed network to be preserved (on average), while randomizing other (unconstrained) network properties.

Now, when one compares the higher-order (unconstrained) observed topological properties with their expected values calculated over the maximum-entropy ensemble, the comparison indicates how informative the degree sequence is in explaining the rest of the topology, via the probabilities in (2). Collecting these into a scatter plot, the agreement between model and observations can be simply assessed as follows: the less scattered the cloud of points around the identity function, the better the agreement between model and reality. In principle, a broadly scattered cloud around the identity function would indicate that the chosen constraints have little effectiveness in reproducing the unconstrained properties, signaling the presence of genuine higher-order patterns of self-organization, not simply explainable in terms of the degree sequence alone. Thus, the ‘fermionic’ character of the binary model is the mere result of the restriction that at most one binary link can be placed between any two vertices, leading to a mathematical result which is formally equivalent to that of quantum statistics.
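To make the role of the multipliers xi concrete, here is a minimal sketch of how they might be obtained numerically. It is one possible approach, not the one used in the text: it assumes the parametrization θi = log xi and runs plain gradient ascent on the log-likelihood, whose gradient is ki − ⟨ki⟩. The function name, step size, iteration count and the five-vertex degree sequence are all hypothetical.

```python
import numpy as np

def fit_configuration_model(k, steps=5000):
    """Solve <k_i> = k_i for the multipliers x_i of equation (2).

    Gradient ascent on the log-likelihood in theta_i = log x_i; the gradient
    is k_i - <k_i>. The step size 1/n is an illustrative, conservative choice.
    """
    k = np.asarray(k, dtype=float)
    n = len(k)
    theta = np.zeros(n)
    for _ in range(steps):
        x = np.exp(theta)
        xx = np.outer(x, x)
        p = xx / (1.0 + xx)          # equation (2)
        np.fill_diagonal(p, 0.0)     # no self-loops
        theta += (k - p.sum(axis=1)) / n
    x = np.exp(theta)
    xx = np.outer(x, x)
    p = xx / (1.0 + xx)
    np.fill_diagonal(p, 0.0)
    return x, p

# Hypothetical degree sequence of a small five-vertex network.
k_obs = np.array([3, 2, 2, 2, 1])
x, p = fit_configuration_model(k_obs)

print("multipliers x_i :", np.round(x, 4))
print("expected degrees:", np.round(p.sum(axis=1), 6))
```

Vertices with equal degree end up with equal xi, and a vertex with larger degree gets a larger xi, in line with the monotonicity argument above.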

# CUSUM Deceleration. Drunken Risibility.

CUSUM, or cumulative sum, is used for detecting and monitoring changes. Let us introduce a measurable space $(\Omega, \mathcal{F})$, where $\Omega = \mathbb{R}^{\infty}$, $\mathcal{F} = \cup_n \mathcal{F}_n$ and $\mathcal{F}_n = \sigma\{Y_i, i \in \{0, 1, \dots, n\}\}$. The law of the sequence $Y_i$, i = 1, …, is defined by the family of probability measures $\{P_v\}$, v ∈ N*. In other words, the probability measure $P_v$ for a given v > 0, playing the role of the change point, is the measure generated on Ω by the sequence $Y_i$, i = 1, …, when the distribution of the $Y_i$’s changes at time v. The probability measures $P_0$ and $P_\infty$ are the measures generated on Ω by the random variables $Y_i$ when they have an identical distribution. In other words, the system defined by the sequence $Y_i$ undergoes a “regime change” from the distribution $P_\infty$ to the distribution $P_0$ at the change point time v.

The CUSUM (cumulative sum control chart) statistic is defined as the maximum of the log-likelihood ratio of the measure $P_v$ to the measure $P_\infty$ on the σ-algebra $\mathcal{F}_n$. That is,

$$C_n := \max_{0 \le v \le n} \log \left. \frac{dP_v}{dP_\infty} \right|_{\mathcal{F}_n} \tag{1}$$

is the CUSUM statistic on the σ-algebra $\mathcal{F}_n$. The CUSUM statistic process is then the collection of the CUSUM statistics $\{C_n\}$ of (1) for n = 1, ….

The CUSUM stopping rule is then

$$T(h) := \inf \left\{ n \ge 0 : \max_{0 \le v \le n} \log \left. \frac{dP_v}{dP_\infty} \right|_{\mathcal{F}_n} \ge h \right\} \tag{2}$$

for some threshold h > 0. In the CUSUM stopping rule (2), the CUSUM statistic process of (1) is initialized at

$$C_0 = 0 \tag{3}$$

The CUSUM statistic process was first introduced by E. S. Page in the form that it takes when the sequence of random variables Yi is independent and Gaussian; that is, Yi ∼ N(μ, 1), i = 1, 2, …, with μ = μ0 for i < v and μ = μ1 for i ≥ v. Since its introduction by Page, the CUSUM statistic process of (1) and its associated CUSUM stopping time of (2) have been used in a plethora of applications where it is of interest to detect abrupt changes in the statistical behavior of observations in real time. Examples of such applications are signal processing, monitoring the outbreak of an epidemic, financial surveillance, and computer vision. The popularity of the CUSUM stopping time (2) is mainly due to its low complexity and optimality properties in both discrete and continuous time models.

Let $Y_i \sim N(\mu_0, \sigma^2)$ change to $Y_i \sim N(\mu_1, \sigma^2)$ at the change point time v. We now proceed to derive the form of the CUSUM statistic process (1) and its associated CUSUM stopping time (2). Let us denote by $\varphi(x) = \frac{1}{\sqrt{2\pi}} e^{-x^2/2}$ the Gaussian kernel. For the sequence of random variables $Y_i$ given earlier,

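$$C_n := \max_{0 \le v \le n} \log \left. \frac{dP_v}{dP_\infty} \right|_{\mathcal{F}_n}$$

$$= \max_{0 \le v \le n} \log \frac{\prod_{i=1}^{v-1} \frac{1}{\sigma}\varphi\left(\frac{Y_i - \mu_0}{\sigma}\right) \prod_{i=v}^{n} \frac{1}{\sigma}\varphi\left(\frac{Y_i - \mu_1}{\sigma}\right)}{\prod_{i=1}^{n} \frac{1}{\sigma}\varphi\left(\frac{Y_i - \mu_0}{\sigma}\right)}$$

$$= \frac{1}{\sigma^2} \max_{0 \le v \le n} (\mu_1 - \mu_0) \sum_{i=v}^{n} \left[ Y_i - \frac{\mu_1 + \mu_0}{2} \right] \tag{4}$$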

In view of (3), let us initialize the sequence (4) at Y0 = (μ1 + μ0)/2 and distinguish two cases.

a) μ1 > μ0: divide out (μ1 – μ0), multiply by the constant σ² in (4) and use (2) to obtain the CUSUM stopping time T⁺:

$$T^{+}(h^{+}) = \inf \left\{ n \ge 0 : \max_{0 \le v \le n} \sum_{i=v}^{n} \left[ Y_i - \frac{\mu_1 + \mu_0}{2} \right] \ge h^{+} \right\} \tag{5}$$

for an appropriately scaled threshold h⁺ > 0.

b) μ1 < μ0: divide out (μ0 – μ1), multiply by the constant σ² in (4) and use (2) to obtain the CUSUM stopping time T⁻:

$$T^{-}(h^{-}) = \inf \left\{ n \ge 0 : \max_{0 \le v \le n} \sum_{i=v}^{n} \left[ \frac{\mu_1 + \mu_0}{2} - Y_i \right] \ge h^{-} \right\} \tag{6}$$

for an appropriately scaled threshold h⁻ > 0.

The sequences form a CUSUM according to the deviation of the monitored sequential observations from the average of their pre- and postchange means. Although the stopping times (5) and (6) can be derived by formal CUSUM regime change considerations, they may also be used as general nonparametric stopping rules directly applied to sequential observations.
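The stopping rule (5) can be sketched in a few lines. This is a minimal illustration rather than a reference implementation: it relies on the standard observation that the maximum over v of the tail sums can be updated recursively, and the function name, the data and the threshold are all hypothetical.

```python
def cusum_upper(ys, mu0, mu1, h):
    """First n (1-based) at which max over v of sum_{i=v}^{n} (y_i - mid) >= h.

    Returns None if the threshold is never reached. Uses the recursion
    W_n = (y_n - mid) + max(W_{n-1}, 0), which equals the maximum over
    1 <= v <= n of the tail sums in (5).
    """
    mid = 0.5 * (mu0 + mu1)
    w = 0.0
    for n, y in enumerate(ys, start=1):
        w = (y - mid) + max(w, 0.0)
        if w >= h:
            return n
    return None

# Hypothetical observations: mean near 0 for five steps, then near 1.
ys = [0.1, -0.2, 0.0, 0.3, -0.1, 1.2, 0.9, 1.1, 1.3, 0.8]
h = 2.0
stop = cusum_upper(ys, mu0=0.0, mu1=1.0, h=h)
print("alarm raised at observation", stop)
```

The downward rule (6) follows by feeding the negated deviations, that is, by swapping the sign of `y - mid`.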

# Metaphysical Would-Be(s). Drunken Risibility.

If one looks at Quine’s commitment to similarity, natural kinds, dispositions, causal statements, etc., it is evident that it takes him close to Peirce’s conception of Thirdness – even if Quine in a utopian vision imagines that all such concepts in a remote future will dissolve and vanish in favor of purely microstructural descriptions.

A crucial difference remains, however, which becomes evident when one looks at Quine’s brief formula for ontological commitment, the famous idea that ‘to be is to be the value of a bound variable’. For even if this motto is stated exactly to avoid commitment to several different types of being, it immediately prompts the question: the equation, in which the variable is presumably bound, which status does it have? Governing the behavior of existing variable values, is that not in some sense being real?

This will be Peirce’s realist idea – that regularities, tendencies, dispositions, patterns, may possess real existence, independent of any observer. In Peirce, this description of Thirdness is concentrated in the expression ‘real possibility’, and even if it may sound exceedingly metaphysical at first glance, it amounts, at a closer look, to regularities charted by science that are not mere shorthands for collections of single events but do possess reality status. In Peirce, the idea of real possibilities thus springs from his philosophy of science – he observes that science, contrary to philosophy, is spontaneously realist, and is right in being so. Real possibilities are thus counterposed to mere subjective possibilities due to lack of knowledge on the part of the subject speaking: the possibility of ‘not known not to be true’.

In a famous piece of self-critique from his late, realist period, Peirce attacks his earlier arguments (from ‘How to Make Our Ideas Clear’, in the late 1890s considered by himself the birth certificate of pragmatism after James’s reference to Peirce as pragmatism’s inventor). Then, he wrote

let us ask what we mean by calling a thing hard. Evidently that it will not be scratched by many other substances. The whole conception of this quality, as of every other, lies in its conceived effects. There is absolutely no difference between a hard thing and a soft thing so long as they are not brought to the test. Suppose, then, that a diamond could be crystallized in the midst of a cushion of soft cotton, and should remain there until it was finally burned up. Would it be false to say that that diamond was soft? […] Reflection will show that the reply is this: there would be no falsity in such modes of speech.

More than twenty-five years later, however, he attacks this argument as bearing witness to the nominalism of his youth. Now instead he supports the

scholastic doctrine of realism. This is usually defined as the opinion that there are real objects that are general, among the number being the modes of determination of existent singulars, if, indeed, these be not the only such objects. But the belief in this can hardly escape being accompanied by the acknowledgment that there are, besides, real vagues, and especially real possibilities. For possibility being the denial of a necessity, which is a kind of generality, is vague like any other contradiction of a general. Indeed, it is the reality of some possibilities that pragmaticism is most concerned to insist upon. The article of January 1878 endeavored to gloze over this point as unsuited to the exoteric public addressed; or perhaps the writer wavered in his own mind. He said that if a diamond were to be formed in a bed of cotton-wool, and were to be consumed there without ever having been pressed upon by any hard edge or point, it would be merely a question of nomenclature whether that diamond should be said to have been hard or not. No doubt this is true, except for the abominable falsehood in the word MERELY, implying that symbols are unreal. Nomenclature involves classification; and classification is true or false, and the generals to which it refers are either reals in the one case, or figments in the other. 
For if the reader will turn to the original maxim of pragmaticism at the beginning of this article, he will see that the question is, not what did happen, but whether it would have been well to engage in any line of conduct whose successful issue depended upon whether that diamond would resist an attempt to scratch it, or whether all other logical means of determining how it ought to be classed would lead to the conclusion which, to quote the very words of that article, would be ‘the belief which alone could be the result of investigation carried sufficiently far.’ Pragmaticism makes the ultimate intellectual purport of what you please to consist in conceived conditional resolutions, or their substance; and therefore, the conditional propositions, with their hypothetical antecedents, in which such resolutions consist, being of the ultimate nature of meaning, must be capable of being true, that is, of expressing whatever there be which is such as the proposition expresses, independently of being thought to be so in any judgment, or being represented to be so in any other symbol of any man or men. But that amounts to saying that possibility is sometimes of a real kind. (The Essential Peirce Selected Philosophical Writings, Volume 2)

In the same year, he states, in a letter to the Italian pragmatist Signor Calderoni:

I myself went too far in the direction of nominalism when I said that it was a mere question of the convenience of speech whether we say that a diamond is hard when it is not pressed upon, or whether we say that it is soft until it is pressed upon. I now say that experiment will prove that the diamond is hard, as a positive fact. That is, it is a real fact that it would resist pressure, which amounts to extreme scholastic realism. I deny that pragmaticism as originally defined by me made the intellectual purport of symbols to consist in our conduct. On the contrary, I was most careful to say that it consists in our concept of what our conduct would be upon conceivable occasions. For I had long before declared that absolute individuals were entia rationis, and not realities. A concept determinate in all respects is as fictitious as a concept definite in all respects. I do not think we can ever have a logical right to infer, even as probable, the existence of anything entirely contrary in its nature to all that we can experience or imagine.

Here lies the core of Peirce’s metaphysical insistence on the reality of ‘would-be’s. Real possibilities, or would-bes, are vague to the extent that they describe certain tendential, conditional behaviors only, while they do not prescribe any other aspect of the single objects they subsume. They are, furthermore, represented in rationally interrelated clusters of concepts: the fact that the diamond is in fact hard, no matter if it scratches anything or not, lies in the fact that the diamond’s carbon structure displays a certain spatial arrangement – so it is an aspect of the very concept of diamond. And this is why the old pragmatic maxim may not work without real possibilities: it is they that the very maxim rests upon, because it is they that provide us with the ‘conceived consequences’ of accepting a concept. The maxim remains a test to weed out empty concepts with no conceived consequences – that is, empty a priori reasoning and superfluous metaphysical assumptions. But what remains after the maxim has been put to use, is real possibilities. Real possibilities thus connect epistemology, expressed in the pragmatic maxim, to ontology: real possibilities are what science may grasp in conditional hypotheses.

The question is whether Peirce’s revision of his old ‘nominalist’ beliefs forms part of a more general development in Peirce from nominalism to realism. The locus classicus of this idea is Max Fisch (Peirce, Semeiotic and Pragmatism) where Fisch outlines a development from an initial nominalism (albeit of a strange kind, refusing, as always in Peirce, the existence of individuals determinate in all respects) via a series of steps towards realism, culminating after the turn of the century. Fisch’s first step is then Peirce’s theory of the real as that which reasoning would finally have as its result; the second step his Berkeley review with its anti-nominalism and the idea that the real is what is unaffected by what we may think of it; the third step is his pragmatist idea that beliefs are conceived habits of action, even if he here clings to the idea that the conditionals in which habits are expressed are material implications only – like the definition of ‘hard’; the fourth step his reading of Abbott’s realist Scientific Theism (which later influenced his conception of scientific universals) and his introduction of the index in his theory of signs; the fifth step his acceptance of the reality of continuity; the sixth the introduction of real possibilities, accompanied by the development of existential graphs, topology and Peirce’s changing view of Hegelianism; the seventh, the identification of pragmatism with realism; the eighth ‘his last stronghold, that of Philonian or material implication’. A further realist development exchanging Peirce’s early frequentist idea of probability for a dispositional theory of probability was, according to Fisch, never finished.

The issue of implication concerns the old discussion quoted by Cicero between the Hellenistic logicians Philo and Diodorus. The former formulated what we know today as material implication, while the latter objected on common-sense ground that material implication does not capture implication in everyday language and thought and another implication type should be sought. As is well known, material implication says that p ⇒ q is equivalent to the claim that either p is false or q is true – so that p ⇒ q is false only when p is true and q is false. The problems arise when p is false, for any false p makes the implication true, and this leads to strange possibilities of true inferences. The two parts of the implication have no connection with each other at all, such as would be the spontaneous idea in everyday thought. It is true that Peirce as a logician generally supports material (‘Philonian’) implication – but it is also true that he does express some second thoughts at around the same time as the afterthoughts on the diamond example.

Peirce is a forerunner of the attempts to construct alternatives such as strict implication, and the reason why is, of course, that real possibilities are not adequately depicted by material implication. Peirce is in need of an implication which may somehow picture the causal dependency of q on p. The basic reason for the mature Peirce’s problems with the representation of real possibilities is not primarily logical, however. It is scientific. Peirce realizes that the scientific charting of anything but singular, actual events necessitates the real existence of tendencies and relations connecting singular events. Now, what kinds are those tendencies and relations? The hard diamond example seems to emphasize causality, but this probably depends on the point of view chosen. The ‘conceived consequences’ of the pragmatic maxim may be causal indeed: if we accept gravity as a real concept, then masses will attract one another – but they may all the same be structural: if we accept horse riders as a real concept, then we should expect horses, persons, the taming of horses, etc. to exist, or they may be teleological. In any case, the interpretation of the pragmatic maxim in terms of real possibilities paves the way for a distinction between empty a priori suppositions and real a priori structures.

# Econophysics: Financial White Noise Switch. Thought of the Day 115.0

What is the cause of large market fluctuations? Some economists blame irrationality for the fat-tailed distribution. Others have observed that social psychology might create market fads and panics, which can be modeled by collective behavior in statistical mechanics. For example, a bi-modular distribution was discovered in empirical data on option prices. One possible mechanism of polarized behavior is collective action, as studied in physics and social psychology. A sudden regime switch or phase transition may occur between uni-modular and bi-modular distributions when a field parameter crosses some threshold. The Ising model of equilibrium statistical mechanics was borrowed to study social psychology. Its phase transition from a uni-modular to a bi-modular distribution describes the statistical features of a stable society turning into a divided society. The problem with the Ising model is that its key parameter, the social temperature, has no operational definition in a social system. A better alternative parameter is the intensity of social interaction in collective action.

A difficult issue in business cycle theory is how to explain the recurrent feature of business cycles that is widely observed in macro and financial indexes. The problem is that business cycles are neither strictly periodic nor truly random. Their correlations are not short like those of a random walk, and they have multiple frequencies that change over time. Therefore, all kinds of mathematical models are tried in business cycle theory, including deterministic, stochastic, linear and nonlinear models. We outline economic models in terms of their base function, including white noise with short correlations, persistent cycles with long correlations, and the color chaos model with erratic amplitude and a narrow frequency band, like a biological clock.

[Figure: The steady state of the probability distribution function in the Ising Model of Collective Behavior with h = 0 (without central propaganda field). (a) Uni-modular distribution with low social stress (k = 0): moderate stable behavior with weak interaction and high social temperature. (b) Marginal distribution at the phase transition with medium social stress (k = 2): behavioral phase transition occurs between stable and unstable society induced by collective behavior. (c) Bi-modular distribution with high social stress (k = 2.5): the society splits into two opposing groups under low social temperature and strong social interactions in an unstable society.]

Deterministic models are used by Keynesian economists for the endogenous mechanism of business cycles, as in the case of the accelerator-multiplier model. Stochastic models are used in the Frisch model of noise-driven cycles, which treats external shocks as the driving force of business fluctuations. Since the 1980s, the discovery of economic chaos and the application of statistical mechanics have provided more advanced models for describing business cycles.

[Figure: The steady state of the probability distribution function in the socio-psychological model of collective choice. Here, “a” is the independent parameter; “b” is the interaction parameter. (a) Centered distribution with b < a (short dashed curve): it happens when independent decision rooted in individualistic orientation overcomes social pressure through mutual communication. (b) Horizontal flat distribution with b = a (long dashed line): marginal case when individualistic orientation balances the social pressure. (c) Polarized distribution with b > a (solid line): it occurs when social pressure through mutual communication is stronger than independent judgment.]

[Figure: Numerical autocorrelations from time series generated by random noise and a harmonic wave. The solid line is white noise. The broken line is a sine wave with period P = 1.]

Linear harmonic cycles with a unique frequency are introduced in business cycle theory. The auto-correlations from a harmonic cycle and from white noise are shown in the above figure. The auto-correlation function of a harmonic cycle is a cosine wave. The amplitude of the cosine wave decays slightly because of the limited number of data points in the numerical experiment. The auto-correlations of a random series form an erratic series with rapid decay from one to residual fluctuations in the numerical calculation. The auto-regressive (AR) model in discrete time combines white noise terms in order to simulate the short-term auto-correlations found in empirical data.
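The contrast between the two auto-correlation functions is easy to reproduce. The sketch below is an illustration under stated assumptions, not the numerical experiment behind the figure: the series length (1000 points), the sine period (50 samples), the seed and the estimator normalization are all hypothetical choices.

```python
import numpy as np

def autocorr(x, max_lag):
    """Sample auto-correlation for lags 0..max_lag, normalized by the lag-0 sum."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    denom = np.dot(x, x)
    n = len(x)
    return np.array([np.dot(x[:n - k], x[k:]) / denom for k in range(max_lag + 1)])

rng = np.random.default_rng(1)
n = 1000
noise = rng.standard_normal(n)        # white noise
t = np.arange(n)
sine = np.sin(2 * np.pi * t / 50)     # harmonic cycle, period of 50 samples

acf_noise = autocorr(noise, 60)
acf_sine = autocorr(sine, 60)

print("noise, lags 1-5      :", np.round(acf_noise[1:6], 3))
print("sine, lags 25 and 50 :", np.round(acf_sine[[25, 50]], 3))
```

With a finite sample, the sine's auto-correlation at one full period is slightly below one because fewer products enter the sum at larger lags, which is the small decay in amplitude mentioned above.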

Deterministic models of chaos can be classified into white chaos and color chaos. White chaos is generated by nonlinear difference equations in discrete time, such as the one-dimensional logistic map and the two-dimensional Henon map. Its autocorrelations and power spectra look like those of white noise. Its correlation dimension can be less than one. The white noise model is simple in mathematical analysis but rarely used in empirical analysis, since it needs an intrinsic time unit.

Color chaos is generated by nonlinear differential equations in continuous time, such as the three-dimensional Lorenz model and one-dimensional delay-differential models in biology and economics. Its autocorrelations look like a decayed cosine wave, and its power spectra resemble a combination of harmonic cycles and white noise. The correlation dimension is between one and two for 3D differential equations, and varies for delay-differential equations.

History shows the remarkable resilience of a market that has experienced a series of wars and crises. The related issue is why the economy can recover from severe damage and from states far out of equilibrium. Mathematically speaking, we may examine regime stability under parameter change. One major weakness of the linear oscillator model is that the regime of periodic cycles is fragile, or only marginally stable, under changing parameters. Only a nonlinear oscillator model is capable of generating resilient cycles within a finite area under changing parameters. The typical example of a linear model is the Samuelson multiplier-accelerator model. Linear stochastic models have a similar problem to linear deterministic models. For example, the so-called unit root solution occurs only at the borderline of the unit circle. If a small parameter change leads to crossing the unit circle, the stochastic solution will fall into a damped (inside the unit circle) or explosive (outside the unit circle) solution.
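The fragility of the linear periodic regime can be illustrated with a short calculation. The sketch below assumes the textbook form of the Samuelson model, Y_t = c(1 + v) Y_{t-1} − c v Y_{t-2} + G, which is not written out in the text; c is the marginal propensity to consume, v the accelerator, and the numerical values are hypothetical. Cycles persist only when the roots of the characteristic polynomial lie exactly on the unit circle.

```python
import numpy as np

def samuelson_root_modulus(c, v):
    """Largest modulus of the roots of z^2 - c(1+v) z + c v = 0."""
    roots = np.roots([1.0, -c * (1.0 + v), c * v])
    return float(np.max(np.abs(roots)))

c = 0.5  # hypothetical marginal propensity to consume
moduli = {v: samuelson_root_modulus(c, v) for v in (1.9, 2.0, 2.1)}

for v, m in moduli.items():
    if m < 1 - 1e-9:
        regime = "damped"
    elif m > 1 + 1e-9:
        regime = "explosive"
    else:
        regime = "persistent cycle"
    print(f"v = {v}: |root| = {m:.4f} ({regime})")
```

In this parametrization the persistent cycle requires c·v = 1 exactly; a change of the accelerator by 0.1 in either direction already moves the solution into the damped or the explosive regime.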

# Stock Hedging Loss and Risk

A stock is supposed to be bought at time zero with price S0, and to be sold at time T with uncertain price ST. In order to hedge the market risk of the stock, the company decides to choose one of the available put options written on the same stock with maturity at time τ, where τ is prior to and close to T, and the n available put options are specified by their strike prices Ki (i = 1, 2, …, n). As the prices of different put options are also different, the company needs to determine an optimal hedge ratio h (0 ≤ h ≤ 1) with respect to the chosen strike price. The cost of hedging should be less than or equal to the predetermined hedging budget C. In other words, the company needs to determine the optimal strike price and hedging ratio under the constraint of the hedging budget. The chosen put option is supposed to finish in-the-money at maturity, and the constraint of hedging expenditure is supposed to be binding.

Suppose the market price of the stock is $S_0$ at time zero, the hedge ratio is $h$, the price of the put option is $P_0$, and the riskless interest rate is $r$. At time $T$, the time value of the hedging portfolio is

$$S_0 e^{rT} + hP_0 e^{rT} \tag{1}$$

and the market price of the portfolio is

$$S_T + h(K - S_\tau)^+ e^{r(T-\tau)} \tag{2}$$

therefore the loss of the portfolio is

$$L = S_0 e^{rT} + hP_0 e^{rT} - \left(S_T + h(K - S_\tau)^+ e^{r(T-\tau)}\right) \tag{3}$$

where $x^+ = \max(x, 0)$, so that $(K - S_\tau)^+$ is the payoff of the put option at maturity. For a given threshold $v$, the probability that the amount of loss exceeds $v$ is denoted as

$$\alpha = \mathrm{Prob}\{L \ge v\} \tag{4}$$

in other words, $v$ is the Value-at-Risk (VaR) at the $\alpha$ level. There are several alternative measures of risk, such as CVaR (Conditional Value-at-Risk), ESF (Expected Shortfall), CTE (Conditional Tail Expectation), and other coherent risk measures.

The mathematical model of the stock price is chosen to be a geometric Brownian motion

$$\frac{dS_t}{S_t} = \mu\,dt + \sigma\,dB_t \tag{5}$$

where $S_t$ is the stock price at time $t$ ($0 < t \le T$), $\mu$ and $\sigma$ are the drift and the volatility of the stock price, and $B_t$ is a standard Brownian motion. The solution of the stochastic differential equation is

$$S_t = S_0\, e^{\sigma B_t + (\mu - \frac{1}{2}\sigma^2)t} \tag{6}$$

where $B_0 = 0$, and $S_t$ is lognormally distributed.
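Because the solution (6) is explicit, the stock price at any fixed time can be sampled exactly from a single normal draw, with no time-stepping. The following is a minimal sketch of that idea; the parameter values and the helper name `sample_gbm_price` are hypothetical, and the comparison with the known lognormal mean $S_0 e^{\mu t}$ is only a sanity check.

```python
import math
import random


def sample_gbm_price(S0, mu, sigma, t, rng):
    """Draw S_t from equation (6), using B_t ~ Normal(mean 0, std sqrt(t))."""
    b_t = rng.gauss(0.0, math.sqrt(t))
    return S0 * math.exp(sigma * b_t + (mu - 0.5 * sigma ** 2) * t)


# Hypothetical parameters, chosen only for illustration.
S0, mu, sigma, t = 100.0, 0.08, 0.2, 1.0
rng = random.Random(42)  # fixed seed so the run is reproducible

prices = [sample_gbm_price(S0, mu, sigma, t, rng) for _ in range(20000)]
sample_mean = sum(prices) / len(prices)
theoretical_mean = S0 * math.exp(mu * t)  # mean of the lognormal S_t

print(f"sample mean      = {sample_mean:.2f}")
print(f"theoretical mean = {theoretical_mean:.2f}")
```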

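For a given threshold of loss $v$, the probability that the loss exceeds $v$ is

$$\mathrm{Prob}\{L \ge v\} = E\left[I_{\{X \le c_1\}} F_Y(g(X) - X)\right] + E\left[I_{\{X \ge c_1\}} F_Y(c_2 - X)\right] \tag{7}$$

where $E[\cdot]$ denotes expectation, $I_{\{X \le c\}}$ is the indicator function, equal to 1 when $X \le c$ holds and 0 otherwise, $F_Y(y)$ is the cumulative distribution function of the random variable $Y$, and

$$c_1 = \frac{1}{\sigma}\left[\ln\frac{K}{S_0} - \left(\mu - \tfrac{1}{2}\sigma^2\right)\tau\right]$$

$$g(X) = \frac{1}{\sigma}\left[\ln\frac{(S_0 + hP_0)e^{rT} - h(K - f(X))e^{r(T-\tau)} - v}{S_0} - \left(\mu - \tfrac{1}{2}\sigma^2\right)T\right]$$

$$f(X) = S_0\, e^{\sigma X + (\mu - \frac{1}{2}\sigma^2)\tau}$$

$$c_2 = \frac{1}{\sigma}\left[\ln\frac{(S_0 + hP_0)e^{rT} - v}{S_0} - \left(\mu - \tfrac{1}{2}\sigma^2\right)T\right]$$

$X$ and $Y$ are independent and normally distributed with mean zero, $X \sim N(0, \sqrt{\tau})$ and $Y \sim N(0, \sqrt{T-\tau})$, where the second parameter is the standard deviation. Comparing with (6), $X$ plays the role of $B_\tau$ and $Y$ that of the increment $B_T - B_\tau$, so that $f(X) = S_\tau$ and the condition $X \le c_1$ is the condition $S_\tau \le K$ under which the put finishes in-the-money.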

For a specified hedging strategy, $Q(v) = \mathrm{Prob}\{L \ge v\}$ is a decreasing function of $v$. The VaR at level $\alpha$ can be obtained from the equation

$$Q(v) = \alpha \tag{8}$$

The expectations can be calculated with Monte Carlo simulation methods, and the optimal hedging strategy which has the smallest VaR can be obtained from (8) by numerical searching methods.
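As a rough sketch of the procedure just described, the code below estimates Q(v) from equation (7) by sampling X and evaluating the normal distribution function of Y in closed form, then solves equation (8) by bisection. The two branches of (7) are merged into one expression, since the put payoff is zero exactly when X ≥ c1. All parameter values, the bisection bracket, and the function names `exceedance_prob` and `value_at_risk` are assumptions made for illustration; the search over strikes and hedge ratios under the budget constraint is not included.

```python
import math
import random
from statistics import NormalDist


def exceedance_prob(v, S0, K, h, P0, r, mu, sigma, tau, T, n_samples=5000, seed=0):
    """Monte Carlo estimate of Q(v) = Prob{L >= v} following equation (7)."""
    rng = random.Random(seed)  # same seed for every v keeps Q monotone in v
    cdf_Y = NormalDist(0.0, math.sqrt(T - tau)).cdf
    drift = mu - 0.5 * sigma ** 2
    funds = (S0 + h * P0) * math.exp(r * T)
    total = 0.0
    for _ in range(n_samples):
        x = rng.gauss(0.0, math.sqrt(tau))              # X ~ N(0, sqrt(tau))
        s_tau = S0 * math.exp(sigma * x + drift * tau)  # f(X), the price at tau
        payoff = h * max(K - s_tau, 0.0) * math.exp(r * (T - tau))
        bound = funds - payoff - v                      # L >= v  <=>  S_T <= bound
        if bound > 0.0:                                 # S_T is always positive
            total += cdf_Y((math.log(bound / S0) - drift * T) / sigma - x)
    return total / n_samples


def value_at_risk(alpha, lo, hi, tol=1e-3, **model):
    """Solve Q(v) = alpha by bisection, using that Q is decreasing in v."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if exceedance_prob(mid, **model) > alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Hypothetical inputs: one strike K with put price P0 and hedge ratio h.
model = dict(S0=100.0, K=95.0, h=0.5, P0=3.0, r=0.03,
             mu=0.08, sigma=0.2, tau=0.9, T=1.0)
var_95 = value_at_risk(0.05, lo=-100.0, hi=150.0, **model)
print(f"VaR at alpha = 0.05: {var_95:.2f}")
```

Repeating the last step over a grid of candidate (K, h) pairs that satisfy the budget constraint, and keeping the pair with the smallest VaR, would be one simple form of the numerical search mentioned above.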

# Infinite Sequences and Halting Problem. Thought of the Day 76.0

In attempting to extend the notion of depth from finite strings to infinite sequences, one encounters a familiar phenomenon: the definitions become sharper (e.g. recursively invariant), but their intuitive meaning is less clear, because of distinctions (e.g. between infinitely-often and almost-everywhere properties) that do not exist in the finite case.

An infinite sequence X is called strongly deep if at every significance level s, and for every recursive function f, all but finitely many initial segments Xn have depth exceeding f(n).

It is necessary to require the initial segments to be deep almost everywhere rather than infinitely often, because even the most trivial sequence has infinitely many deep initial segments Xn (viz. the segments whose lengths n are deep numbers).

It is not difficult to show that the property of strong depth is invariant under truth-table equivalence (this is the same as Turing equivalence in recursively bounded time, or via a total recursive operator), and that the same notion would result if the initial segments were required to be deep in the sense of receiving less than $2^{-s}$ of their algorithmic probability from f(n)-fast programs. The characteristic sequence of the halting set K is an example of a strongly deep sequence.

A weaker definition of depth, also invariant under truth-table equivalence, is perhaps more analogous to that adopted for finite strings:

An infinite sequence X is weakly deep if it is not computable in recursively bounded time from any algorithmically random infinite sequence.

Computability in recursively bounded time is equivalent to two other properties, viz. truth-table reducibility and reducibility via a total recursive operator.

By contrast to the situation with truth-table reducibility, Péter Gács has shown that every sequence is computable from (i.e. Turing reducible to) an algorithmically random sequence if no bound is imposed on the time. This is the infinite analog of the far more obvious fact that every finite string is computable from an algorithmically random string (e.g. its minimal program).

Every strongly deep sequence is weakly deep, but by intermittently padding K with large blocks of zeros, one can construct a weakly deep sequence with infinitely many shallow initial segments.

Truth-table reducibility to an algorithmically random sequence is equivalent to the property studied by Levin et al. of being random with respect to some recursive measure. Levin calls sequences with this property “proper” or “complete” sequences, and views them as more realistic and interesting than other sequences because they are the typical outcomes of probabilistic or deterministic effective processes operating in recursively bounded time.

Weakly deep sequences arise with finite probability when a universal Turing machine (with one-way input and output tapes, so that it can act as a transducer of infinite sequences) is given an infinite coin toss sequence for input. These sequences are necessarily produced very slowly: the time to output the n’th digit being bounded by no recursive function, and the output sequence contains evidence of this slowness. Because they are produced with finite probability, such sequences can contain only finite information about the halting problem.

# String’s Depth of Burial

A string’s depth might be defined as the execution time of its minimal program.

The difficulty with this definition arises in cases where the minimal program is only a few bits smaller than some much faster program, such as a print program, to compute the same output x. In this case, slight changes in x may induce arbitrarily large changes in the run time of the minimal program, by changing which of the two competing programs is minimal. Analogous instability manifests itself in translating programs from one universal machine to another. This instability emphasizes the essential role of the quantity of buried redundancy, not as a measure of depth, but as a certifier of depth. In terms of the philosophy-of-science metaphor, an object whose minimal program is only a few bits smaller than its print program is like an observation that points to a nontrivial hypothesis, but with only a low level of statistical confidence.

To adequately characterize a finite string’s depth one must therefore consider the amount of buried redundancy as well as the depth of its burial. A string’s depth at significance level s might thus be defined as that amount of time complexity which is attested by s bits worth of buried redundancy. This characterization of depth may be formalized in several ways.

A string’s depth at significance level s might be defined as the time required to compute the string by a program no more than s bits larger than the minimal program.

This definition solves the stability problem, but is unsatisfactory in the way it treats multiple programs of the same length. Intuitively, $2^k$ distinct (n + k)-bit programs that compute the same output ought to be accorded the same weight as one n-bit program; but, by the present definition, they would be given no more weight than one (n + k)-bit program.

Alternatively, a string’s depth at significance level s might be defined as the time t required for the string’s time-bounded algorithmic probability $P_t(x)$ to rise to within a factor $2^{-s}$ of its asymptotic time-unbounded value $P(x)$.

This formalizes the notion that for the string to have originated by an effective process of t steps or fewer is less plausible than for the first s tosses of a fair coin all to come up heads.
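The difference between the two tentative definitions can be made concrete on a toy table. The sketch below assumes a finite, hand-written list of (length, running time) pairs standing in for all programs that output a given string x; this table is purely hypothetical, since the real quantities range over all programs of a universal machine and cannot be tabulated this way. It contains one slow 10-bit program and $2^3$ fast 14-bit programs.

```python
# Hypothetical table of (length in bits, running time) for programs that output x:
# one slow 10-bit program and eight distinct fast 14-bit programs.
programs = [(10, 10**6)] + [(14, 100)] * 8


def depth_by_length(programs, s):
    """Least running time among programs at most s bits longer than the shortest."""
    shortest = min(length for length, _ in programs)
    return min(time for length, time in programs if length - shortest <= s)


def depth_by_probability(programs, s):
    """Least t at which programs running in time <= t carry at least a
    2**-s fraction of the total weight sum(2**-length)."""
    total = sum(2.0 ** -length for length, _ in programs)
    for t in sorted({time for _, time in programs}):
        fast = sum(2.0 ** -length for length, time in programs if time <= t)
        if fast >= 2.0 ** -s * total:
            return t


print(depth_by_length(programs, 3))       # only the 10-bit program qualifies
print(depth_by_probability(programs, 3))  # the eight fast programs together suffice
```

At significance level 3 the length-based rule ignores the eight fast programs, because each is 4 bits longer than the shortest, and reports the slow running time; the probability-based rule pools their weight, which here is a third of the total, and reports the fast one.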

It is not known whether there exist strings that are deep in the program-length sense but not in the algorithmic-probability sense, in other words, strings having no small fast programs, even though they have enough large fast programs to contribute a significant fraction of their algorithmic probability. Such strings might be called deterministically deep but probabilistically shallow, because their chance of being produced quickly in a probabilistic computation (e.g. one where the input bits of U are supplied by coin tossing) is significant compared to their chance of being produced slowly. The question of whether such strings exist is probably hard to answer because it does not relativize uniformly. Deterministic and probabilistic depths are not very different relative to a random coin-toss oracle A, as may be shown by a proof similar to that of the equality of random-oracle-relativized deterministic and probabilistic polynomial time complexity classes; but they can be very different relative to an oracle B deliberately designed to hide information from deterministic computations (this parallels Hunt’s proof that deterministic and probabilistic polynomial time are unequal relative to such an oracle).

(Depth of Finite Strings): Let x and w be strings and s a significance parameter. A string’s depth at significance level s, denoted $D_s(x)$, will be defined as $\min\{T(p) : (|p| - |p^*| < s) \wedge (U(p) = x)\}$, the least time required to compute it by an s-incompressible program. At any given significance level, a string will be called t-deep if its depth exceeds t, and t-shallow otherwise.

The difference between this definition and the previous one is rather subtle philosophically and not very great quantitatively. Philosophically, the present definition says that each individual hypothesis for the rapid origin of x is implausible at the $2^{-s}$ confidence level, whereas the previous one requires only that a weighted average of all such hypotheses be implausible.

There exist constants $c_1$ and $c_2$ such that for any string x, if programs running in time ≤ t contribute a fraction between $2^{-s}$ and $2^{-s+1}$ of the string’s total algorithmic probability, then x has depth at most t at significance level $s + c_1$ and depth at least t at significance level $s - \min\{H(s), H(t)\} - c_2$.

Proof: The first part follows easily from the fact that any k-compressible self-delimiting program p is associated with a unique, k − O(1) bits shorter, program of the form “execute the result of executing $p^*$”. Therefore there exists a constant $c_1$ such that if all t-fast programs for x were $(s + c_1)$-compressible, the associated shorter programs would contribute more than the total algorithmic probability of x. The second part follows because, roughly, if fast programs contribute only a small fraction of the algorithmic probability of x, then the property of being a fast program for x is so unusual that no program having that property can be random. More precisely, the t-fast programs for x constitute a finite prefix set, a superset S of which can be computed by a program of size $H(x) + \min\{H(t), H(s)\} + O(1)$ bits. (Given $x^*$ and either $t^*$ or $s^*$, begin enumerating all self-delimiting programs that compute x, in order of increasing running time, and quit when either the running time exceeds t or the accumulated measure of programs so far enumerated exceeds $2^{-(H(x)-s)}$.) Therefore there exists a constant $c_2$ such that every member of S, and thus every t-fast program for x, is compressible by at least $s - \min\{H(s), H(t)\} - O(1)$ bits.

The ability of universal machines to simulate one another efficiently implies a corresponding degree of machine-independence for depth: for any two efficiently universal machines of the sort considered here, there exists a constant c and a linear polynomial L such that for any t, strings whose (s+c)-significant depth is at least L(t) on one machine will have s-significant depth at least t on the other.

Depth of one string relative to another may be defined analogously, and represents the plausible time required to produce one string, x, from another, w.

(Relative Depth of Finite Strings): For any two strings w and x, the depth of x relative to w at significance level s, denoted $D_s(x/w)$, will be defined as $\min\{T(p, w) : (|p| - |(p/w)^*| < s) \wedge (U(p, w) = x)\}$, the least time required to compute x from w by a program that is s-incompressible relative to w.

Depth of a string relative to its length is a particularly useful notion, allowing us, as it were, to consider the triviality or nontriviality of the “content” of a string (i.e. its bit sequence), independent of its “form” (length). For example, although the infinite sequence 000… is intuitively trivial, its initial segment $0^n$ is deep whenever n is deep. However, $0^n$ is always shallow relative to n, as is, with high probability, a random string of length n.

In order to adequately represent the intuitive notion of stored mathematical work, it is necessary that depth obey a “slow growth” law, i.e. that fast deterministic processes be unable to transform a shallow object into a deep one, and that fast probabilistic processes be able to do so only with low probability.

(Slow Growth Law): Given any data string x and two significance parameters $s_2 > s_1$, a random program generated by coin tossing has probability less than $2^{-(s_2-s_1)+O(1)}$ of transforming x into an excessively deep output, i.e. one whose $s_2$-significant depth exceeds the $s_1$-significant depth of x plus the run time of the transforming program plus O(1). More precisely, there exist positive constants $c_1$, $c_2$ such that for all strings x, and all pairs of significance parameters $s_2 > s_1$, the prefix set $\{q : D_{s_2}(U(q, x)) > D_{s_1}(x) + T(q, x) + c_1\}$ has measure less than $2^{-(s_2-s_1)+c_2}$.

Proof: Let p be an $s_1$-incompressible program which computes x in time $D_{s_1}(x)$, and let r be the restart prefix mentioned in the definition of the U machine. Let Q be the prefix set $\{q : D_{s_2}(U(q, x)) > T(q, x) + D_{s_1}(x) + c_1\}$, where the constant $c_1$ is sufficient to cover the time overhead of concatenation. For all q ∈ Q, the program rpq by definition computes some deep result U(q, x) in less time than that result’s own $s_2$-significant depth, and so rpq must be compressible by $s_2$ bits. The sum of the algorithmic probabilities of strings of the form rpq, where q ∈ Q, is therefore

$$\sum_{q \in Q} P(rpq) \ge \sum_{q \in Q} 2^{-|rpq| + s_2} = 2^{-|r|-|p|+s_2}\,\mu(Q)$$

On the other hand, since the self-delimiting program p can be recovered from any string of the form rpq (by deleting r and executing the remainder pq until halting occurs, by which time exactly p will have been read), the algorithmic probability of p is at least as great (within a constant factor) as the sum of the algorithmic probabilities of the strings {rpq : q ∈ Q} considered above:

$$P(p) > \mu(Q) \cdot 2^{-|r|-|p|+s_2-O(1)}$$

Recalling the fact that minimal program size is equal within an additive constant to the −log of algorithmic probability, and the $s_1$-incompressibility of p, we have $P(p) < 2^{-(|p|-s_1+O(1))}$, and therefore finally

$\mu(Q) < 2^{-(s_2-s_1)+O(1)}$, which was to be demonstrated.

# Belief Networks “Acyclicity”. Thought of the Day 69.0

Belief networks are used to model uncertainty in a domain. The term “belief networks” encompasses a whole range of different but related techniques which deal with reasoning under uncertainty. Both quantitative (mainly using Bayesian probabilistic methods) and qualitative techniques are used. Influence diagrams are an extension to belief networks; they are used when working with decision making. Belief networks are used to develop knowledge based applications in domains which are characterised by inherent uncertainty. Increasingly, belief network techniques are being employed to deliver advanced knowledge based systems to solve real world problems. Belief networks are particularly useful for diagnostic applications and have been used in many deployed systems. The free-text help facility in the Microsoft Office product employs Bayesian belief network technology. Within a belief network the belief of each node (the node’s conditional probability) is calculated based on observed evidence. Various methods have been developed for evaluating node beliefs and for performing probabilistic inference. Influence diagrams, which are an extension of belief networks, provide facilities for structuring the goals of the diagnosis and for ascertaining the value (the influence) that given information will have when determining a diagnosis. In influence diagrams, there are three types of node: chance nodes, which correspond to the nodes in Bayesian belief networks; utility nodes, which represent the utilities of decisions; and decision nodes, which represent decisions which can be taken to influence the state of the world. Influence diagrams are useful in real world applications where there is often a cost, both in terms of time and money, in obtaining information.

The basic idea in belief networks is that the problem domain is modelled as a set of nodes interconnected with arcs to form a directed acyclic graph. Each node represents a random variable, or uncertain quantity, which can take two or more possible values. The arcs signify the existence of direct influences between the linked variables, and the strength of each influence is quantified by a forward conditional probability.

The Belief Network, which is also called the Bayesian Network, is a directed acyclic graph for probabilistic reasoning. It defines the conditional dependencies of the model by associating each node X with a conditional probability P(X|Pa(X)), where Pa(X) denotes the parents of X. Here are two of its conditional independence properties:

1. Each node is conditionally independent of its non-descendants given its parents.

2. Each node is conditionally independent of all other nodes given its Markov blanket, which consists of its parents, children, and children’s parents.
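To make the factorisation into node conditionals P(X|Pa(X)) concrete, the sketch below builds the joint distribution of a three-node network as a product of those conditionals and obtains a posterior by summing the joint over the unobserved variables and normalising. The network (Rain, Sprinkler, WetGrass), its probability tables, and the function names are hypothetical choices for illustration; exact enumeration like this is only feasible for very small networks.

```python
from itertools import product

# Hypothetical binary network: Rain -> Sprinkler, and both -> WetGrass.
parents = {"Rain": (), "Sprinkler": ("Rain",), "WetGrass": ("Sprinkler", "Rain")}

# cpt[node][parent values] = P(node = 1 | parents); the numbers are made up.
cpt = {
    "Rain": {(): 0.2},
    "Sprinkler": {(0,): 0.4, (1,): 0.01},
    "WetGrass": {(0, 0): 0.0, (0, 1): 0.8, (1, 0): 0.9, (1, 1): 0.99},
}


def joint(assignment):
    """Joint probability as the product of P(X | Pa(X)) over all nodes."""
    prob = 1.0
    for node, pa in parents.items():
        p_one = cpt[node][tuple(assignment[q] for q in pa)]
        prob *= p_one if assignment[node] == 1 else 1.0 - p_one
    return prob


def posterior(query, evidence):
    """P(query | evidence) by enumerating and summing out the hidden nodes."""
    hidden = [n for n in parents if n != query and n not in evidence]
    weights = {}
    for q_val in (0, 1):
        total = 0.0
        for values in product((0, 1), repeat=len(hidden)):
            assignment = dict(evidence)
            assignment.update(zip(hidden, values))
            assignment[query] = q_val
            total += joint(assignment)
        weights[q_val] = total
    norm = weights[0] + weights[1]
    return {value: weight / norm for value, weight in weights.items()}


rain_given_wet = posterior("Rain", {"WetGrass": 1})
print(f"P(Rain = 1 | WetGrass = 1) = {rain_given_wet[1]:.4f}")
```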

Inference in a belief network consists of computing the posterior probability distribution

$$P(H \mid V) = \frac{P(H, V)}{\sum_{H} P(H, V)}$$

where H is the set of the query variables, and V is the set of the evidence variables. Approximate inference involves sampling to compute posteriors. The Sigmoid Belief Network is a type of belief network such that

$$P(X_i = 1 \mid \mathrm{Pa}(X_i)) = \sigma\Big(\sum_{X_j \in \mathrm{Pa}(X_i)} W_{ji} X_j + b_i\Big)$$

where $W_{ji}$ is the weight assigned to the edge from $X_j$ to $X_i$, and $\sigma$ is the sigmoid function.

# Fundamental Theorem of Asset Pricing: Tautological Meeting of Mathematical Martingale and Financial Arbitrage by the Measure of Probability.

The Fundamental Theorem of Asset Pricing (FTAP hereafter) has two broad tenets, viz.

1. A market admits no arbitrage, if and only if, the market has a martingale measure.

2. Every contingent claim can be hedged, if and only if, the martingale measure is unique.

The FTAP is a theorem of mathematics, and the use of the term ‘measure’ in its statement places the FTAP within the theory of probability formulated by Andrei Kolmogorov (Foundations of the Theory of Probability) in 1933. Kolmogorov’s work took place in a context captured by Bertrand Russell, who observed that

It is important to realise the fundamental position of probability in science. . . . As to what is meant by probability, opinions differ.

In the 1920s the idea of randomness, as distinct from a lack of information, was becoming substantive in the physical sciences because of the emergence of the Copenhagen Interpretation of quantum mechanics. In the social sciences, Frank Knight argued that uncertainty was the only source of profit and the concept was pervading John Maynard Keynes’ economics (Robert Skidelsky, Keynes: The Return of the Master).

Two mathematical theories of probability had become ascendant by the late 1920s. Richard von Mises (brother of the Austrian economist Ludwig) attempted to lay down the axioms of classical probability within a framework of Empiricism, the ‘frequentist’ or ‘objective’ approach. To counterbalance von Mises, the Italian actuary Bruno de Finetti presented a more Pragmatic approach, characterised by his claim that “Probability does not exist” because it was only an expression of the observer’s view of the world. This ‘subjectivist’ approach was closely related to the less well-known position taken by the Pragmatist Frank Ramsey, who developed an argument against Keynes’ Realist interpretation of probability presented in the Treatise on Probability.

Kolmogorov addressed the trichotomy of mathematical probability by generalising so that Realist, Empiricist and Pragmatist probabilities were all examples of ‘measures’ satisfying certain axioms. In doing this, a random variable became a function while an expectation was an integral: probability became a branch of Analysis, not Statistics. Von Mises criticised Kolmogorov’s generalised framework as unnecessarily complex. About a decade and a half back, the physicist Edwin Jaynes (Probability Theory: The Logic of Science) championed Leonard Savage’s subjectivist Bayesianism as having a “deeper conceptual foundation which allows it to be extended to a wider class of applications, required by current problems of science”.

The objections to measure theoretic probability for empirical scientists can be accounted for as a lack of physicality. Frequentist probability is based on the act of counting; subjectivist probability is based on a flow of information, which, following Claude Shannon, is now an observable entity in Empirical science. Measure theoretic probability is based on abstract mathematical objects unrelated to sensible phenomena. However, the generality of Kolmogorov’s approach made it flexible enough to handle problems that emerged in physics and engineering during the Second World War and his approach became widely accepted after 1950 because it was practically more useful.

In the context of the first statement of the FTAP, a ‘martingale measure’ is a probability measure, usually labelled Q, such that the (real, rather than nominal) price of an asset today, $X_0$, is the expectation, using the martingale measure, of its (real) price in the future, $X_T$. Formally,

$$X_0 = E^Q[X_T]$$

The abstract probability distribution Q is defined so that this equality holds, not on the basis of any empirical information about historical prices or subjective judgement of future prices. The only condition placed on the relationship between the martingale measure and the ‘natural’, or ‘physical’, probability measure, usually labelled P, is that they agree on what is possible.

The term ‘martingale’ in this context derives from doubling strategies in gambling and it was introduced into mathematics by Jean Ville in a development of von Mises’ work. The idea that asset prices have the martingale property was first proposed by Benoit Mandelbrot in response to an early formulation of Eugene Fama’s Efficient Market Hypothesis (EMH), the two concepts being combined by Fama. For Mandelbrot and Fama the key consequence of prices being martingales was that the current price was independent of the future price and technical analysis would not prove profitable in the long run. In developing the EMH there was no discussion on the nature of the probability under which assets are martingales, and it is often assumed that the expectation is calculated under the natural measure. While the FTAP employs modern terminology in the context of value-neutrality, the idea of equating a current price with a future, uncertain, price has ethical ramifications.

The other technical term in the first statement of the FTAP, arbitrage, has long been used in financial mathematics. In Liber Abaci, Fibonacci (Laurence Sigler, Fibonacci’s Liber Abaci) discusses ‘Barter of Merchandise and Similar Things’: 20 arms of cloth are worth 3 Pisan pounds and 42 rolls of cotton are similarly worth 5 Pisan pounds; it is sought how many rolls of cotton will be had for 50 arms of cloth. In this case there are three commodities, arms of cloth, rolls of cotton and Pisan pounds, and Fibonacci solves the problem by having Pisan pounds ‘arbitrate’, or ‘mediate’ as Aristotle might say, between the other two commodities.

Within neo-classical economics, the Law of One Price was developed in a series of papers between 1954 and 1964 by Kenneth Arrow, Gérard Debreu and Lionel McKenzie in the context of general equilibrium, in particular through the introduction of the Arrow Security, which, employing the Law of One Price, could be used to price any asset. It was on this principle that Black and Scholes believed the value of the warrants could be deduced by employing a hedging portfolio; in introducing their work with the statement that “it should not be possible to make sure profits” they were invoking the arbitrage argument, which had an eight hundred year history. In the context of the FTAP, ‘an arbitrage’ has developed into the ability to formulate a trading strategy such that the probability, under a natural or martingale measure, of a loss is zero, but the probability of a positive profit is not.
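Fibonacci's barter problem reduces to a chain of two ratios with the Pisan pound as the mediating unit. Worked through with the figures quoted in the problem (the intermediate value of 7.5 pounds is simply the cloth converted to money):

$$50 \text{ arms} \times \frac{3 \text{ pounds}}{20 \text{ arms}} = 7.5 \text{ pounds}, \qquad 7.5 \text{ pounds} \times \frac{42 \text{ rolls}}{5 \text{ pounds}} = 63 \text{ rolls}.$$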

To understand the connection between the financial concept of arbitrage and the mathematical idea of a martingale measure, consider the most basic case of a single asset with current price $X_0$ whose price at a future time $T > 0$ can take one of two (present) values, $X_T^D < X_T^U$. In this case an arbitrage would exist if $X_0 \le X_T^D < X_T^U$: buying the asset now, at a price that is less than or equal to the future pay-offs, would lead to a possible profit at the end of the period, with the guarantee of no loss. Similarly, if $X_T^D < X_T^U \le X_0$, short selling the asset now and buying it back at time $T$ would also lead to an arbitrage. So, for there to be no arbitrage opportunities we require that

$$X_T^D < X_0 < X_T^U$$

This implies that there is a number, $0 < q < 1$, such that

$$X_0 = X_T^D + q(X_T^U - X_T^D) = qX_T^U + (1-q)X_T^D$$

The price now, $X_0$, lies between the future prices, $X_T^U$ and $X_T^D$, in the ratio $q : (1 - q)$ and represents some sort of ‘average’. The first statement of the FTAP can be interpreted simply as “the price of an asset must lie between its maximum and minimum possible (real) future price”.

If $X_0 < X_T^D \le X_T^U$ we have that $q < 0$, whereas if $X_T^D \le X_T^U < X_0$ then $q > 1$, and in both cases q does not represent a probability measure, which by Kolmogorov’s axioms must lie between 0 and 1. In either of these cases an arbitrage exists and a trader can make a riskless profit: the market involves ‘turpe lucrum’. This account gives an insight as to why James Bernoulli, in his moral approach to probability, considered situations where probabilities did not sum to 1: he was considering problems that were pathological not because they failed the rules of arithmetic but because they were unfair. It follows that if there are no arbitrage opportunities then the quantity q can be seen as representing the ‘probability’ that the $X_T^U$ price will materialise in the future. Formally

$$X_0 = qX_T^U + (1-q)X_T^D \equiv E^Q[X_T]$$
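The one-period, two-outcome argument is short enough to check numerically. The sketch below solves the pricing equation for q and flags an arbitrage whenever q falls outside the open interval (0, 1); the function names and the sample prices are illustrative assumptions, and prices are taken to be real (discounted) values as in the text.

```python
def martingale_weight(x0, x_down, x_up):
    """Solve x0 = q * x_up + (1 - q) * x_down for q (requires x_down < x_up)."""
    return (x0 - x_down) / (x_up - x_down)


def admits_arbitrage(x0, x_down, x_up):
    """An arbitrage exists unless x_down < x0 < x_up, i.e. unless 0 < q < 1."""
    q = martingale_weight(x0, x_down, x_up)
    return not (0.0 < q < 1.0)


# Hypothetical prices: today's price 100, future price either 90 or 120.
q = martingale_weight(100.0, 90.0, 120.0)
print(f"q = {q:.4f}, expectation under Q = {q * 120.0 + (1 - q) * 90.0:.2f}")
print(admits_arbitrage(100.0, 90.0, 120.0))  # price inside the range
print(admits_arbitrage(85.0, 90.0, 120.0))   # price below both outcomes
```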

The connection between the financial concept of arbitrage and the mathematical object of a martingale is essentially a tautology: both statements mean that the price today of an asset must lie between its future minimum and maximum possible value. This first statement of the FTAP was anticipated by Frank Ramsey when he defined ‘probability’ in the Pragmatic sense of ‘a degree of belief’ and argued that measuring ‘degrees of belief’ is done through betting odds. On this basis he formulates some axioms of probability, including that a probability must lie between 0 and 1. He then goes on to say that

These are the laws of probability, …If anyone’s mental condition violated these laws, his choice would depend on the precise form in which the options were offered him, which would be absurd. He could have a book made against him by a cunning better and would then stand to lose in any event.

This is a Pragmatic argument that identifies the absence of the martingale measure with the existence of arbitrage, and today this forms the basis of the standard argument as to why arbitrages do not exist: if they did, the other market participants would bankrupt the agent who was mis-pricing the asset. This has become known in philosophy as the ‘Dutch Book’ argument and, as a consequence of the fact/value dichotomy, this is often presented as a ‘matter of fact’. However, ignoring the fact/value dichotomy, the Dutch book argument is an alternative statement of the ‘Golden Rule’, “Do to others as you would have them do to you”: it is infused with the moral concepts of fairness and reciprocity (Jeffrey Wattles, The Golden Rule).

FTAP is the ethical concept of Justice, capturing the social norms of reciprocity and fairness. This is significant in the context of Granovetter’s discussion of embeddedness in economics. It is conventional to assume that mainstream economic theory is ‘undersocialised’: agents are rational calculators seeking to maximise an objective function. The argument presented here is that a central theorem in contemporary economics, the FTAP, is deeply embedded in social norms, despite being presented as an undersocialised mathematical object. This embeddedness is a consequence of the origins of mathematical probability being in the ethical analysis of commercial contracts: the feudal shackles are still binding this most modern of economic theories.

Ramsey goes on to make an important point

Having any definite degree of belief implies a certain measure of consistency, namely willingness to bet on a given proposition at the same odds for any stake, the stakes being measured in terms of ultimate values. Having degrees of belief obeying the laws of probability implies a further measure of consistency, namely such a consistency between the odds acceptable on different propositions as shall prevent a book being made against you.

Ramsey is arguing that an agent needs to employ the same measure in pricing all assets in a market, and this is the key result in contemporary derivative pricing. Having identified the martingale measure on the basis of a ‘primal’ asset, it is then applied across the market, in particular to derivatives on the primal asset; hence the well-known result that if two assets offer different ‘market prices of risk’, an arbitrage exists. This explains why the market price of risk appears in the Radon-Nikodym derivative and the Capital Market Line: it enforces Ramsey’s consistency in pricing.

The second statement of the FTAP is concerned with incomplete markets, which appear in relation to Arrow-Debreu prices. In the special case that there are as many, or more, assets in a market as there are possible future, uncertain, states, a unique pricing vector can be deduced for the market because of Cramer’s rule. If the elements of the pricing vector satisfy the axioms of probability, specifically each element is positive and they all sum to one, then the market precludes arbitrage opportunities. This is the case covered by the first statement of the FTAP. In the more realistic situation that there are more possible future states than assets, the market can still be arbitrage free but the pricing vector, the martingale measure, might not be unique. The agent can still be consistent in selecting which particular martingale measure they choose to use, but another agent might choose a different measure, such that the two do not agree on a price. In the context of the Law of One Price, this means that we cannot hedge, replicate or cover, a position in the market, such that the portfolio is riskless. The significance of the second statement of the FTAP is that it tells us that in the sensible world of imperfect knowledge and transaction costs, a model within the framework of the FTAP cannot give a precise price.

When faced with incompleteness in markets, agents need alternative ways to price assets, and behavioural techniques have come to dominate financial theory. This feature was realised in The Port Royal Logic when it recognised the role of transaction costs in lotteries.
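The counting argument behind the second statement can be illustrated with a small calculation. The sketch below assumes a market with a riskless bond priced at 1 (zero interest, so prices are real) and a stock priced at 100; all payoffs, the helper names, and the choice of a call option with strike 100 are hypothetical. With two states and two assets, Cramer's rule gives a unique pricing vector; adding a third state leaves one weight free, and the option price then depends on that choice.

```python
from fractions import Fraction as F


def solve_2x2(a, b):
    """Cramer's rule for a[0][0]*x + a[0][1]*y = b[0], a[1][0]*x + a[1][1]*y = b[1]."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    x = (b[0] * a[1][1] - a[0][1] * b[1]) / det
    y = (a[0][0] * b[1] - b[0] * a[1][0]) / det
    return x, y


# Complete case: states (down, up); the bond pays 1, the stock pays 90 or 120.
payoffs = [[F(1), F(1)], [F(90), F(120)]]
prices = [F(1), F(100)]
q_down, q_up = solve_2x2(payoffs, prices)
print(q_down, q_up)  # the unique pricing vector


def three_state_measure(q_mid):
    """Incomplete case: a third state where the stock pays 100.
    q_mid is a free choice; the other two weights follow from the two prices."""
    rhs = [1 - q_mid, 100 - 100 * q_mid]
    down, up = solve_2x2(payoffs, rhs)
    return down, q_mid, up


def call_price(q_mid, strike=100):
    """Price of a call on the stock under the measure selected by q_mid."""
    down, mid, up = three_state_measure(q_mid)
    return (down * max(90 - strike, 0) + mid * max(100 - strike, 0)
            + up * max(120 - strike, 0))


# Two agents choosing different admissible measures disagree on the option price.
print(call_price(F(1, 4)), call_price(F(1, 2)))
```

Every value of `q_mid` strictly between 0 and 1 gives positive weights that sum to one and reprice both the bond and the stock, so each is an admissible martingale measure, yet the implied call prices differ.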