# Convergence in Probability Implying Stochastic Continuity. Part 3.

A compound Poisson process with intensity λ > 0 and jump size distribution f is a stochastic process Xt defined as

X_t = ∑_{i=1}^{N_t} Y_i

where the jump sizes Y_i are independent and identically distributed with distribution f and (N_t) is a Poisson process with intensity λ, independent of (Y_i)_{i≥1}.

The following properties of a compound Poisson process can now be deduced:

1. The sample paths of X are cadlag piecewise constant functions.
2. The jump times (T_i)_{i≥1} have the same law as the jump times of the Poisson process N_t: they can be expressed as partial sums of independent exponential random variables with parameter λ.
3. The jump sizes (Y_i)_{i≥1} are independent and identically distributed with law f.

The Poisson process itself can be seen as a compound Poisson process on R with Y_i ≡ 1. This explains the origin of the term “compound Poisson” in the definition.

Let R(n), n ≥ 0 be a random walk with step size distribution f: R(n) = ∑_{i=1}^{n} Y_i. The compound Poisson process X_t can be obtained by time-changing R with an independent Poisson process N_t: X_t = R(N_t). X_t thus describes the position of a random walk after a random number of time steps, given by N_t. Compound Poisson processes are Lévy processes (see Part 2), and they are the only Lévy processes with piecewise constant sample paths.
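This time-changed random walk construction translates directly into a simulation. The sketch below is illustrative, not from the text: it assumes Gaussian jump sizes and arbitrary parameters (λ = 3, T = 10), and builds the jump times as partial sums of Exp(λ) inter-arrival times, as in property 2 above:

```python
import numpy as np

rng = np.random.default_rng(0)

def compound_poisson_path(lam, T, jump_sampler, rng):
    """Sample one path of X_t = sum_{i=1}^{N_t} Y_i on [0, T].

    Jump times are partial sums of Exp(lam) inter-arrival times;
    jump sizes Y_i are drawn i.i.d. from `jump_sampler`.
    Returns (jump_times, partial_sums_after_each_jump).
    """
    times = []
    t = 0.0
    while True:
        t += rng.exponential(1.0 / lam)    # Exp(lam) inter-arrival time
        if t > T:
            break
        times.append(t)
    jumps = jump_sampler(rng, len(times))  # i.i.d. jump sizes with law f
    return np.array(times), np.cumsum(jumps)

# Gaussian jump size law, intensity lam = 3, horizon T = 10 (all illustrative)
times, values = compound_poisson_path(
    3.0, 10.0, lambda rng, n: rng.normal(0.0, 1.0, n), rng)

def X(t, times, values):
    """X_t = R(N_t): the partial sum of the random walk after N_t jumps."""
    n = np.searchsorted(times, t, side="right")   # N_t = number of jumps <= t
    return values[n - 1] if n > 0 else 0.0
```

The resulting path is piecewise constant and cadlag: X_t holds the value of the last partial sum R(N_t) until the next jump arrives.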

(X_t)_{t≥0} is a compound Poisson process if and only if it is a Lévy process and its sample paths are piecewise constant functions.

Let (Xt)t≥0 be a Lévy process with piecewise constant paths. We can construct, path by path, a process (Nt, t ≥ 0) which counts the jumps of X:

N_t = #{0 < s ≤ t : X_{s−} ≠ X_s} —– (1)

Since the trajectories of X are piecewise constant, X has a finite number of jumps in any finite interval, so N_t is finite for every finite t. Hence, N is a counting process. Let h < t. Then

N_t − N_h = #{h < s ≤ t : X_{s−} ≠ X_s} = #{h < s ≤ t : X_{s−} − X_h ≠ X_s − X_h}

Hence, N_t − N_h depends only on (X_s − X_h), h ≤ s ≤ t. Therefore, from the independence and stationarity of increments of (X_t) it follows that (N_t) also has independent and stationary increments. Using the process N, we can recover the jump sizes of X: Y_n = X_{S_n} − X_{S_n−}, where S_n = inf{t : N_t ≥ n}. Let us first show that, conditionally on the trajectory of N, the increments of X are independent. Let t > s and consider the following four sets:

A_1 ∈ σ(X_s)

A_2 ∈ σ(X_t − X_s)

B_1 ∈ σ(N_r, r ≤ s)

B_2 ∈ σ(N_r − N_s, r > s)

such that P(B1) > 0 and P(B2) > 0. The independence of increments of X implies that processes (Xr − Xs, r > s) and (Xr, r ≤ s) are independent. Hence,

P[A_1 ∩ B_1 ∩ A_2 ∩ B_2] = P[A_1 ∩ B_1] P[A_2 ∩ B_2]

Moreover,

– A_1 and B_1 are independent of B_2.

– A_2 and B_2 are independent of B_1.

– B_1 and B_2 are independent of each other.

Therefore, the conditional probability of interest can be expressed as:

P[A_1 ∩ A_2 | B_1 ∩ B_2] = P[A_1 ∩ B_1] P[A_2 ∩ B_2] / (P[B_1] P[B_2])

= P[A_1 ∩ B_1 ∩ B_2] P[A_2 ∩ B_1 ∩ B_2] / (P[B_1]² P[B_2]²) = P[A_1 | B_1 ∩ B_2] P[A_2 | B_1 ∩ B_2]

This proves that X_t − X_s and X_s are independent conditionally on the trajectory of N. In particular, choosing B_1 = {N_s = 1} and B_2 = {N_t − N_s = 1},

we obtain that Y_1 and Y_2 are independent. Since we could have taken any number of increments of X and not just two of them, this proves that (Y_i)_{i≥1} are independent. To see that the jump sizes have the same law, note that the two-dimensional process (X_t, N_t) has stationary increments. Therefore, for every n ≥ 0 and for every s > h > 0,

E[f(X_h) | N_h = 1, N_s − N_h = n] = E[f(X_{s+h} − X_s) | N_{s+h} − N_s = 1, N_s − N_h = n],

where f is any bounded Borel function. This entails that for every n ≥ 0, Y_1 and Y_{n+2} have the same law.

Let (Xt)t≥0 be a compound Poisson process.

Independence of increments. Let 0 < r < s and let f and g be bounded Borel functions on R^d. To ease the notation, we prove only that X_r is independent from X_s − X_r, but the same reasoning applies to any finite number of increments. We must show that

E[f(X_r) g(X_s − X_r)] = E[f(X_r)] E[g(X_s − X_r)]

From the representation X_r = ∑_{i=1}^{N_r} Y_i and X_s − X_r = ∑_{i=N_r+1}^{N_s} Y_i, the following observations can be made:

– Conditionally on the trajectory of N_t for t ∈ [0, s], X_r and X_s − X_r are independent, because the first sum depends only on Y_i for i ≤ N_r and the second only on Y_i for i > N_r.
– The conditional expectation E[f(X_r) | N_t, t ≤ s] depends only on N_r, and E[g(X_s − X_r) | N_t, t ≤ s] depends only on N_s − N_r.

On using the independence of increments of the Poisson process, we can write:

E[f(X_r) g(X_s − X_r)] = E[E[f(X_r) g(X_s − X_r) | N_t, t ≤ s]]

= E[E[f(X_r) | N_t, t ≤ s] E[g(X_s − X_r) | N_t, t ≤ s]]

= E[E[f(X_r) | N_t, t ≤ s]] E[E[g(X_s − X_r) | N_t, t ≤ s]]

= E[f(X_r)] E[g(X_s − X_r)]

Stationarity of increments. Let 0 < r < s and let f be a bounded Borel function.

E[f(X_s − X_r)] = E[E[f(∑_{i=N_r+1}^{N_s} Y_i) | N_t, t ≤ s]]

= E[E[f(∑_{i=1}^{N_s−N_r} Y_i) | N_t, t ≤ s]] = E[f(∑_{i=1}^{N_{s−r}} Y_i)] = E[f(X_{s−r})]

Stochastic continuity. X_t only jumps when N_t does. For the Poisson process N, for every t > 0,

P(N_s → N_t as s → t, s < t) = 1

Hence, for every t > 0,

P(X_s → X_t as s → t, s < t) = 1

Since almost sure convergence entails convergence in probability, this implies stochastic continuity. Also, since any cadlag function may be approximated by a piecewise constant function, one may expect that general Lévy processes can be well approximated by compound Poisson ones and that by studying compound Poisson processes one can gain an insight into the properties of Lévy processes.
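The increment properties proved above lend themselves to a Monte Carlo check. The following sketch (Gaussian jumps and all parameter values are illustrative assumptions) verifies that X_s − X_r has the same variance as X_{s−r}, as stationarity requires, and is uncorrelated with X_r, a consequence of independent increments:

```python
import numpy as np

rng = np.random.default_rng(1)

def X_at(ts, lam, T, rng):
    """Evaluate one compound Poisson path (Gaussian jumps) at given times."""
    n = rng.poisson(lam * T)                    # N_T
    jump_times = np.sort(rng.uniform(0, T, n))  # given N_T, jump times are uniform
    jumps = rng.normal(0.0, 1.0, n)
    csum = np.concatenate([[0.0], np.cumsum(jumps)])
    return np.array([csum[np.searchsorted(jump_times, t, side="right")] for t in ts])

r, s, T, lam, n_paths = 1.0, 3.0, 3.0, 2.0, 20000
samples = np.array([X_at([r, s], lam, T, rng) for _ in range(n_paths)])
Xr, Xs = samples[:, 0], samples[:, 1]
inc = Xs - Xr                                   # increment over (r, s]
Xsr = np.array([X_at([s - r], lam, T, rng)[0] for _ in range(n_paths)])

print(np.var(inc), np.var(Xsr))    # both ≈ lam*(s-r)*E[Y^2] = 4
print(np.corrcoef(Xr, inc)[0, 1])  # ≈ 0: increments over disjoint intervals
```

The sampler uses the standard fact that, conditionally on N_T, the jump times of a Poisson process are distributed as order statistics of uniforms on [0, T].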

# Incomplete Markets and Calibrations for Coherence with Hedged Portfolios. Thought of the Day 154.0

In complete market models such as the Black-Scholes model, probability does not really matter: the “objective” evolution of the asset is only there to define the set of “impossible” events and serves to specify the class of equivalent measures. Thus, two statistical models P_1 ∼ P_2 with equivalent measures lead to the same option prices in a complete market setting.

This is not true anymore in incomplete markets: probabilities matter and model specification has to be taken seriously since it will affect hedging decisions. This situation is more realistic but also more challenging and calls for an integrated approach between option pricing methods and statistical modeling. In incomplete markets, not only does probability matter but attitudes to risk also matter: utility based methods explicitly incorporate these into the hedging problem via utility functions. While these methods are focused on hedging with the underlying asset, common practice is to use liquid call/put options to hedge exotic options. In incomplete markets, options are not redundant assets; therefore, if options are available as hedging instruments they can and should be used to improve hedging performance.

While the lack of liquidity in the options market prevents, in practice, the use of dynamic hedges involving options, options are commonly used for static hedging: call options are frequently used for dealing with volatility or convexity exposures and for hedging barrier options.

What are the implications of hedging with options for the choice of a pricing rule? Consider a contingent claim H and assume that we have as hedging instruments a set of benchmark options with prices C_i, i = 1, …, n and terminal payoffs H_i, i = 1, …, n. A static hedge of H is a portfolio composed of the options H_i, i = 1, …, n and the numeraire, chosen to match as closely as possible the terminal payoff of H:

H = V_0 + ∑_{i=1}^n x_i H_i + ∫_0^T φ dS + ε —– (1)

where ε is a hedging error representing the nonhedgeable risk. Typically the H_i are payoffs of call or put options, which cannot be replicated using the underlying, so adding them to the hedge portfolio increases the span of hedgeable claims and reduces residual risk.

Consider a pricing rule Q. Assume that E^Q[ε] = 0 (otherwise E^Q[ε] can be added to V_0). Then the claim H is valued under Q as:

e^{−rT} E^Q[H] = V_0 + ∑_{i=1}^n x_i e^{−rT} E^Q[H_i] —– (2)

since the stochastic integral term, being a Q-martingale, has zero expectation. On the other hand, the cost of setting up the hedging portfolio is:

V_0 + ∑_{i=1}^n x_i C_i —– (3)

So the value of the claim given by the pricing rule Q corresponds to the cost of the hedging portfolio if the model prices of the benchmark options Hi correspond to their market prices Ci:

e^{−rT} E^Q[H_i] = C_i, ∀ i = 1, …, n —– (4)

This condition is called calibration: a pricing rule verifying (4) is said to be calibrated to the option prices C_i, i = 1, …, n. This condition is necessary to guarantee coherence between model prices and the cost of hedging with portfolios: if the model is not calibrated, then the model price for a claim H may have no relation to the effective cost of hedging it using the available options H_i. If a pricing rule Q is specified in an ad hoc way, the calibration conditions will not be verified in general; one way to ensure them is to incorporate them as constraints in the choice of the pricing measure Q.
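Decomposition (1) can be illustrated with a toy static hedge. In the sketch below everything is invented for illustration (lognormal terminal scenarios, a call spread as the claim H, three vanilla calls as the H_i); the weights x_i and V_0 that minimize the variance of the residual ε are found by least squares:

```python
import numpy as np

rng = np.random.default_rng(2)

# Simulated terminal prices of the underlying (lognormal scenarios, illustrative)
S_T = 100.0 * np.exp(rng.normal(-0.02, 0.2, 50000))

# Target claim H: a 95/110 call spread; benchmark payoffs H_i: vanilla calls
H = np.clip(S_T - 95.0, 0.0, None) - np.clip(S_T - 110.0, 0.0, None)
strikes = [90.0, 100.0, 110.0]
H_i = np.stack([np.clip(S_T - K, 0.0, None) for K in strikes], axis=1)

# Regress H on (1, H_1, ..., H_n): the constant's coefficient is V0,
# the x_i are the option weights, eps is the nonhedgeable residual
A = np.column_stack([np.ones_like(S_T), H_i])
coef, *_ = np.linalg.lstsq(A, H, rcond=None)
V0, x = coef[0], coef[1:]
eps = H - A @ coef

print(x)                       # static hedge weights x_i
print(np.std(eps), np.std(H))  # residual risk vs. unhedged risk
```

The residual standard deviation is well below that of the unhedged claim, which is exactly the sense in which adding options to the hedge portfolio increases the span of hedgeable claims and reduces residual risk.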

# Self-Financing and Dynamically Hedged Portfolio – Robert Merton’s Option Pricing. Thought of the Day 153.0

As an alternative to the riskless hedging approach, Robert Merton derived the option pricing equation via the construction of a self-financing and dynamically hedged portfolio containing the risky asset, the option and the riskless asset (in the form of a money market account). Let Q_S(t) and Q_V(t) denote the number of units of asset and option in the portfolio, respectively, and M_S(t) and M_V(t) denote the currency value of Q_S(t) units of asset and Q_V(t) units of option, respectively. The self-financing portfolio is set up with zero initial net investment cost and no additional funds are added or withdrawn afterwards. Additional units acquired for one security in the portfolio are completely financed by the sale of another security in the same portfolio. The portfolio is said to be dynamic since its composition is allowed to change over time. For notational convenience, we drop the subscript t from the asset price process S_t, the option value process V_t and the standard Brownian process Z_t. The portfolio value at time t can be expressed as

Π(t) = M_S(t) + M_V(t) + M(t) = Q_S(t)S + Q_V(t)V + M(t) —– (1)

where M(t) is the currency value of the riskless asset invested in a riskless money market account. Suppose the asset price process is governed by the geometric Brownian motion dS/S = μ dt + σ dZ; applying the Itô lemma, we obtain the differential of the option value V as:

dV = ∂V/∂t dt + ∂V/∂S dS + (σ²/2)S² ∂²V/∂S² dt = (∂V/∂t + μS ∂V/∂S + (σ²/2)S² ∂²V/∂S²)dt + σS ∂V/∂S dZ —– (2)

If we formally write the stochastic dynamics of V as

dV/V = μV dt + σV dZ —– (3)

then μV and σV are given by

μ_V = (∂V/∂t + μS ∂V/∂S + (σ²/2)S² ∂²V/∂S²)/V —– (4)

and

σ_V = (σS ∂V/∂S)/V —– (5)

The instantaneous currency return dΠ(t) of the above portfolio is attributed to the differential price changes of asset and option and interest accrued, and the differential changes in the amount of asset, option and money market account held. The differential of Π(t) is computed as:

dΠ(t) = [Q_S(t) dS + Q_V(t) dV + rM(t) dt] + [S dQ_S(t) + V dQ_V(t) + dM(t)] —– (6)

where rM(t) dt gives the interest earned from the money market account over dt and dM(t) represents the change in the money market account held due to net currency gained/lost from the sale of the underlying asset and option in the portfolio. If the portfolio is self-financing, the sum of the last three terms in the above equation is zero. The instantaneous portfolio return dΠ(t) can then be expressed as:

dΠ(t) = Q_S(t) dS + Q_V(t) dV + rM(t) dt = M_S(t) dS/S + M_V(t) dV/V + rM(t) dt —– (7)

Eliminating M(t) between (1) and (7) and expressing dS/S and dV/V in terms of their stochastic dynamics, we obtain

dΠ(t) = [(μ − r)M_S(t) + (μ_V − r)M_V(t)]dt + [σM_S(t) + σ_V M_V(t)]dZ —– (8)

How can we make the above self-financing portfolio instantaneously riskless so that its return is non-stochastic? This can be achieved by choosing an appropriate proportion of asset and option according to

σM_S(t) + σ_V M_V(t) = σS Q_S(t) + σS (∂V/∂S) Q_V(t) = 0

that is, the number of units of asset and option in the self-financing portfolio must be in the ratio

Q_S(t)/Q_V(t) = −∂V/∂S —– (9)

at all times. The above ratio is time dependent, so continuous readjustment of the portfolio is necessary. We now have a dynamic replicating portfolio that is riskless and requires zero initial net investment, so the non-stochastic portfolio return dΠ(t) must be zero.

(8) becomes

0 = [(μ − r)M_S(t) + (μ_V − r)M_V(t)]dt

Substituting the ratio (9) into the above equation, we get

(μ − r)S ∂V/∂S = (μ_V − r)V —– (10)

Now substituting μ_V from (4) into the above equation, we get the Black–Scholes equation for V,

∂V/∂t + (σ²/2)S² ∂²V/∂S² + rS ∂V/∂S − rV = 0

Suppose we take Q_V(t) = −1 in the above dynamically hedged self-financing portfolio, that is, the portfolio always shorts one unit of the option. By ratio (9), the number of units of risky asset held is always kept at the level of ∂V/∂S units, which changes continuously over time. To maintain a self-financing hedged portfolio that constantly keeps shorting one unit of the option, we need to have both the underlying asset and the riskfree asset (money market account) in the portfolio. The net cash flow resulting from the buying/selling of the risky asset in the dynamic procedure of maintaining ∂V/∂S units of the risky asset is siphoned to the money market account.
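Relations (4), (5) and (10) can be checked on a concrete example. The sketch below uses the standard Black–Scholes call formula and its Greeks (the closed-form price itself is not derived in the text; all parameter values are illustrative):

```python
import math

def bs_call_and_greeks(S, tau, K, r, sigma):
    """Black–Scholes call price with the Greeks appearing in (4) and (5)."""
    N = lambda x: 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
    n = lambda x: math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)
    d1 = (math.log(S / K) + (r + 0.5 * sigma**2) * tau) / (sigma * math.sqrt(tau))
    d2 = d1 - sigma * math.sqrt(tau)
    V = S * N(d1) - K * math.exp(-r * tau) * N(d2)
    dVdS = N(d1)                                    # delta
    d2VdS2 = n(d1) / (S * sigma * math.sqrt(tau))   # gamma
    dVdt = (-S * n(d1) * sigma / (2.0 * math.sqrt(tau))
            - r * K * math.exp(-r * tau) * N(d2))   # theta (d/dt, not d/dtau)
    return V, dVdS, d2VdS2, dVdt

mu, r, sigma = 0.12, 0.05, 0.2                # illustrative market parameters
S, tau, K = 100.0, 0.5, 100.0                 # illustrative at-the-money call
V, dVdS, d2VdS2, dVdt = bs_call_and_greeks(S, tau, K, r, sigma)

mu_V = (dVdt + mu * S * dVdS + 0.5 * sigma**2 * S**2 * d2VdS2) / V   # (4)
sigma_V = sigma * S * dVdS / V                                       # (5)
print(mu_V, sigma_V)                          # sigma_V >> sigma: leverage
print((mu - r) * S * dVdS - (mu_V - r) * V)   # relation (10), ≈ 0
```

Note that σ_V is far larger than σ: the option is a levered position in the underlying, and (10) says precisely that the asset and the option earn the same excess return per unit of risk exposure.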

# Derivative Pricing Theory: Call, Put Options and “Black, Scholes'” Hedged Portfolio. Thought of the Day 152.0

Fischer Black and Myron Scholes revolutionized the pricing theory of options by showing how to hedge continuously the exposure on the short position of an option. Consider the writer of a call option on a risky asset. S/he is exposed to the risk of unlimited liability if the asset price rises above the strike price. To protect the writer’s short position in the call option, s/he should consider purchasing a certain amount of the underlying asset so that the loss in the short position in the call option is offset by the long position in the asset. In this way, the writer is adopting a hedging procedure. A hedged position combines an option with its underlying asset so that the asset compensates the option against loss, or vice versa. By adjusting the proportion of the underlying asset and option continuously in a portfolio, Black and Scholes demonstrated that investors can create a riskless hedging portfolio where the risk exposure associated with the stochastic asset price is eliminated. In an efficient market with no riskless arbitrage opportunity, a riskless portfolio must earn an expected rate of return equal to the riskless interest rate.

Black and Scholes made the following assumptions on the financial market.

1. Trading takes place continuously in time.
2. The riskless interest rate r is known and constant over time.
3. The asset pays no dividend.
4. There are no transaction costs in buying or selling the asset or the option, and no taxes.
5. The assets are perfectly divisible.
6. There are no penalties to short selling and the full use of proceeds is permitted.
7. There are no riskless arbitrage opportunities.

The stochastic process of the asset price St is assumed to follow the geometric Brownian motion

dSt/St = μ dt + σ dZt —– (1)

where μ is the expected rate of return, σ is the volatility and Zt is the standard Brownian process. Both μ and σ are assumed to be constant. Consider a portfolio that involves short selling of one unit of a call option and long holding of Δt units of the underlying asset. The portfolio value Π (St, t) at time t is given by

Π = −c + Δt St —– (2)

where c = c(St, t) denotes the call price. Note that Δt changes with time t, reflecting the dynamic nature of hedging. Since c is a stochastic function of St, we apply the Ito lemma to compute its differential as follows:

dc = ∂c/∂t dt + ∂c/∂S_t dS_t + (σ²/2)S_t² ∂²c/∂S_t² dt

such that

−dc + Δ_t dS_t = (−∂c/∂t − (σ²/2)S_t² ∂²c/∂S_t²)dt + (Δ_t − ∂c/∂S_t)dS_t

= [−∂c/∂t − (σ²/2)S_t² ∂²c/∂S_t² + (Δ_t − ∂c/∂S_t)μS_t]dt + (Δ_t − ∂c/∂S_t)σS_t dZ_t

The cumulative financial gain on the portfolio at time t is given by

G(Π(S_t, t)) = ∫_0^t −dc + ∫_0^t Δ_u dS_u

= ∫_0^t [−∂c/∂u − (σ²/2)S_u² ∂²c/∂S_u² + (Δ_u − ∂c/∂S_u)μS_u]du + ∫_0^t (Δ_u − ∂c/∂S_u)σS_u dZ_u —– (3)

The stochastic component of the portfolio gain stems from the last term, ∫_0^t (Δ_u − ∂c/∂S_u)σS_u dZ_u. Suppose we adopt the dynamic hedging strategy of choosing Δ_u = ∂c/∂S_u at all times u < t; then the financial gain becomes deterministic at all times. By virtue of no arbitrage, this gain must be the same as the gain from investing in the riskless asset with a dynamic position whose value equals −c + S_u ∂c/∂S_u. The deterministic gain from this dynamic position in the riskless asset is given by

M_t = ∫_0^t r(−c + S_u ∂c/∂S_u)du —– (4)

By equating these two deterministic gains, G(Π (St, t)) and Mt, we have

−∂c/∂u − (σ²/2)S_u² ∂²c/∂S_u² = r(−c + S_u ∂c/∂S_u), 0 < u < t

which is satisfied for any asset price S if c(S, t) satisfies the equation

∂c/∂t + (σ²/2)S² ∂²c/∂S² + rS ∂c/∂S − rc = 0 —– (5)

This parabolic partial differential equation is called the Black–Scholes equation. Strangely, the parameter μ, which is the expected rate of return of the asset, does not appear in the equation.
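Equation (5) can be verified numerically: plugging the closed-form Black–Scholes call price (stated here without derivation) into the left-hand side via finite differences should give a residual near zero at any interior point. Parameter values are illustrative:

```python
import math

def bs_call(S, t, T, K, r, sigma):
    """Black–Scholes call price c(S, t) for maturity T and strike K."""
    tau = T - t
    N = lambda x: 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
    d1 = (math.log(S / K) + (r + 0.5 * sigma**2) * tau) / (sigma * math.sqrt(tau))
    d2 = d1 - sigma * math.sqrt(tau)
    return S * N(d1) - K * math.exp(-r * tau) * N(d2)

# Check the PDE at an arbitrary interior point via central finite differences
S, t, T, K, r, sigma = 100.0, 0.3, 1.0, 95.0, 0.05, 0.2
h = 1e-3
c = bs_call(S, t, T, K, r, sigma)
c_t = (bs_call(S, t + h, T, K, r, sigma) - bs_call(S, t - h, T, K, r, sigma)) / (2 * h)
c_S = (bs_call(S + h, t, T, K, r, sigma) - bs_call(S - h, t, T, K, r, sigma)) / (2 * h)
c_SS = (bs_call(S + h, t, T, K, r, sigma) - 2 * c
        + bs_call(S - h, t, T, K, r, sigma)) / h**2
residual = c_t + 0.5 * sigma**2 * S**2 * c_SS + r * S * c_S - r * c
print(residual)  # ≈ 0: the closed-form price satisfies (5)
```

Consistent with the remark above, neither the formula nor the residual involves μ.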

To complete the formulation of the option pricing model, let’s prescribe the auxiliary condition. The terminal payoff at time T of the call with strike price X is translated into the following terminal condition:

c(S, T ) = max(S − X, 0) —– (6)

for the differential equation.

Since neither the equation nor the auxiliary condition contains μ, one concludes that the call price does not depend on the actual expected rate of return of the asset price. The option pricing model involves five parameters: S, T, X, r and σ. Except for the volatility σ, all others are directly observable parameters. The independence of the pricing model from μ is related to the concept of risk neutrality. In a risk neutral world, investors do not demand extra returns above the riskless interest rate for bearing risks. This is in contrast to usual risk averse investors who would demand extra returns above r for risks borne in their investment portfolios. Apparently, the option is priced as if the rates of return on the underlying asset and the option are both equal to the riskless interest rate. This risk neutral valuation approach is viable if the risks from holding the underlying asset and option are hedgeable.

The governing equation for a put option can be derived similarly and the same Black–Scholes equation is obtained. Let V(S, t) denote the price of a derivative security with dependence on S and t; it can be shown that V is governed by

∂V/∂t + (σ²/2)S² ∂²V/∂S² + rS ∂V/∂S − rV = 0 —– (7)

The price of a particular derivative security is obtained by solving the Black–Scholes equation subject to an appropriate set of auxiliary conditions that model the corresponding contractual specifications in the derivative security.

The original derivation of the governing partial differential equation by Black and Scholes focuses on the financial notion of riskless hedging but misses the precise analysis of the dynamic change in the value of the hedged portfolio. The inconsistencies in their derivation stem from the assumption of keeping the number of units of the underlying asset in the hedged portfolio to be instantaneously constant. They take the differential change of portfolio value Π to be

dΠ =−dc + Δt dSt,

which misses the effect arising from the differential change in Δ_t. The ability to construct a perfectly hedged portfolio relies on the assumption of continuous trading and a continuous asset price path. It has been commonly agreed that the assumed geometric Brownian process of the asset price may not truly reflect the actual behavior of the asset price process. The asset price may exhibit jumps upon the arrival of sudden news in the financial market. The interest rate is widely recognized to be fluctuating over time in an irregular manner rather than being constant. For an option on a risky asset, the interest rate appears only in the discount factor, so the assumption of a constant/deterministic interest rate is quite acceptable for a short-lived option. The Black–Scholes pricing approach assumes continuous hedging at all times. In the real world of trading with transaction costs, this would lead to infinite transaction costs in the hedging procedure.
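The gap between continuous and discrete hedging is easy to see in simulation. The sketch below (a standard textbook-style experiment, not from the text; all parameter values are illustrative) delta-hedges a short call along simulated GBM paths and shows that the standard deviation of the replication error shrinks as rebalancing becomes more frequent, roughly like 1/√n. The real-world drift μ is deliberately set different from r; in line with the μ-independence of the Black–Scholes equation, the hedge works regardless:

```python
import math
import numpy as np

rng = np.random.default_rng(3)

def Phi(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def bs_call(S, tau, K, r, sigma):
    d1 = (math.log(S / K) + (r + 0.5 * sigma**2) * tau) / (sigma * math.sqrt(tau))
    return S * Phi(d1) - K * math.exp(-r * tau) * Phi(d1 - sigma * math.sqrt(tau))

def hedge_error_std(n_steps, n_paths, S0=100.0, K=100.0, r=0.05,
                    sigma=0.2, T=1.0, mu=0.10):
    """Short one call at the model price, hold Delta = dc/dS shares,
    keep the rest in the money market; rebalance n_steps times.
    Returns the std of the terminal replication error."""
    dt = T / n_steps
    errors = []
    for _ in range(n_paths):
        S, tau = S0, T
        delta = Phi((math.log(S / K) + (r + 0.5 * sigma**2) * tau)
                    / (sigma * math.sqrt(tau)))
        cash = bs_call(S, tau, K, r, sigma) - delta * S  # premium minus stock cost
        for k in range(n_steps):
            S *= math.exp((mu - 0.5 * sigma**2) * dt
                          + sigma * math.sqrt(dt) * rng.standard_normal())
            cash *= math.exp(r * dt)                     # interest accrual
            tau = T - (k + 1) * dt
            if tau > 0:
                new_delta = Phi((math.log(S / K) + (r + 0.5 * sigma**2) * tau)
                                / (sigma * math.sqrt(tau)))
            else:
                new_delta = 1.0 if S > K else 0.0        # terminal delta
            cash -= (new_delta - delta) * S              # self-financing rebalance
            delta = new_delta
        errors.append(delta * S + cash - max(S - K, 0.0))
    return float(np.std(errors))

coarse = hedge_error_std(4, 2000)   # rebalance 4 times
fine = hedge_error_std(64, 2000)    # rebalance 64 times
print(coarse, fine)                 # error std shrinks roughly like 1/sqrt(n)
```

Each extra factor of rebalancing frequency buys a smaller hedging error but, with transaction costs, a proportionally larger trading bill, which is the tension noted above.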

# Symmetrical – Asymmetrical Dialectics Within Catastrophical Dynamics. Thought of the Day 148.0

Catastrophe theory has been developed as a deterministic theory for systems that may respond to continuous changes in control variables by a discontinuous change from one equilibrium state to another. A key idea is that the system under study is driven towards an equilibrium state. The behavior of the dynamical systems under study is completely determined by a so-called potential function, which depends on behavioral and control variables. The behavioral, or state, variable describes the state of the system, while control variables determine the behavior of the system. The dynamics under catastrophe models can become extremely complex, and according to the classification theory of Thom, there are seven different families based on the number of control and dependent variables.

Let us suppose that the process yt evolves over t = 1,…, T as

dy_t = −(dV(y_t; α, β)/dy_t) dt —– (1)

where V(y_t; α, β) is the potential function describing the dynamics of the state variable y_t controlled by the parameters α and β determining the system. When the right-hand side of (1) equals zero, −dV(y_t; α, β)/dy_t = 0, the system is in equilibrium. If the system is at a non-equilibrium point, it will move back to its equilibrium, where the potential function takes its minimum values with respect to y_t. While the concept of a potential function is very general (e.g. a quadratic potential yields the equilibrium of a simple flat response surface), one of the most applied potential functions in behavioral sciences, the cusp potential function, is defined as

−V(y_t; α, β) = −(1/4)y_t⁴ + (1/2)β y_t² + α y_t —– (2)

with equilibria at

−dV(y_t; α, β)/dy_t = −y_t³ + β y_t + α —– (3)

being equal to zero. The two dimensions of the control space, α and β, further depend on realizations of i = 1, …, n independent variables x_{i,t}. Thus it is convenient to think of them as functions

α_x = α_0 + α_1 x_{1,t} + … + α_n x_{n,t} —– (4)

β_x = β_0 + β_1 x_{1,t} + … + β_n x_{n,t} —– (5)

The control functions α_x and β_x are called normal and splitting factors, or asymmetry and bifurcation factors, respectively, and they determine the predicted values of y_t given x_{i,t}. This means that for each combination of values of the independent variables there may be up to three predicted values of the state variable, given by the roots of

−dV(y_t; α_x, β_x)/dy_t = −y_t³ + β_x y_t + α_x = 0 —– (6)

This equation has one solution if

δ_x = (1/4)α_x² − (1/27)β_x³ —– (7)

is greater than zero, δ_x > 0, and three solutions if δ_x < 0. This construction can serve as a statistic for bimodality, one of the catastrophe flags. The set of values for which the discriminant is equal to zero, δ_x = 0, is the bifurcation set, which determines the set of singularity points in the system. In the case of three roots, the central root is called an “anti-prediction” and is the least probable state of the system. Inside the bifurcation set, when δ_x < 0, the surface predicts two possible values of the state variable, which means that the state variable is bimodal in this case. Most of the systems in behavioral sciences are subject to noise stemming from measurement errors or the inherent stochastic nature of the system under study. Thus for real-world applications it is necessary to add non-deterministic behavior into the system. As catastrophe theory has primarily been developed to describe deterministic systems, it may not be obvious how to extend the theory to stochastic systems. An important bridge has been provided by Itô stochastic differential equations, which establish a link between the potential function of a deterministic catastrophe system and the stationary probability density function of the corresponding stochastic process. Adding a stochastic Gaussian white noise term to the system yields

dy_t = −(dV(y_t; α_x, β_x)/dy_t) dt + σ_{y_t} dW_t —– (8)

where −dV(y_t; α_x, β_x)/dy_t is the deterministic term, or drift function, representing the equilibrium state of the cusp catastrophe, σ_{y_t} is the diffusion function and W_t is a Wiener process. When the diffusion function is constant, σ_{y_t} = σ, and the current measurement scale is not nonlinearly transformed, the stochastic potential function is proportional to the deterministic potential function, and the probability distribution function corresponding to the solution of (8) converges to the probability distribution function of a limiting stationary stochastic process, as the dynamics of y_t are assumed to be much faster than changes in x_{i,t}. The probability density that describes the distribution of the system’s states at any t is then

f_s(y|x) = ψ exp[((−1/4)y⁴ + (β_x/2)y² + α_x y)/σ] —– (9)

The constant ψ normalizes the probability distribution function so that its integral over the entire range equals one. As the bifurcation factor β_x changes from negative to positive, f_s(y|x) changes its shape from unimodal to bimodal. On the other hand, α_x causes asymmetry in f_s(y|x).
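The role of the Cardan discriminant (7) can be checked directly by counting the real roots of the equilibrium equation (6) on either side of the bifurcation set. The parameter values below are illustrative:

```python
import numpy as np

def n_equilibria(alpha, beta):
    """Number of real roots of -y^3 + beta*y + alpha = 0 (cusp equilibria)."""
    roots = np.roots([-1.0, 0.0, beta, alpha])
    return int(np.sum(np.abs(roots.imag) < 1e-9))

def delta(alpha, beta):
    """Cardan discriminant (7): one equilibrium if > 0, three if < 0."""
    return 0.25 * alpha**2 - beta**3 / 27.0

# Outside the bifurcation set: delta > 0, a single equilibrium
print(delta(1.0, -1.0), n_equilibria(1.0, -1.0))   # positive, 1
# Inside the bifurcation set: delta < 0, three equilibria (bimodality flag)
print(delta(0.1, 2.0), n_equilibria(0.1, 2.0))     # negative, 3
```

Crossing the set δ_x = 0 by varying β_x from negative to positive is exactly the unimodal-to-bimodal transition of f_s(y|x) described above.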

# Probability Space Intertwines Random Walks – Thought of the Day 144.0

Many deliberations of stochasticity start with “let (Ω, F, P) be a probability space”. One can actually follow such discussions without having the slightest idea what Ω is and who lives inside. So, what is “Ω, F, P” and why do we need it? Indeed, for many users of probability and statistics, a random variable X is synonymous with its probability distribution μX and all computations such as sums, expectations, etc., done on random variables amount to analytical operations such as integrations, Fourier transforms, convolutions, etc., done on their distributions. For defining such operations, you do not need a probability space. Isn’t this all there is to it?

One can in fact compute quite a lot of things without using probability spaces in an essential way. However the notions of probability space and random variable are central in modern probability theory so it is important to understand why and when these concepts are relevant.

From a modelling perspective, the starting point is a set of observations taking values in some set E (think for instance of numerical measurement, E = R) for which we would like to build a stochastic model. We would like to represent such observations x1, . . . , xn as samples drawn from a random variable X defined on some probability space (Ω, F, P). It is important to see that the only natural ingredient here is the set E where the random variables will take their values: the set of events Ω is not given a priori and there are many different ways to construct a probability space (Ω, F, P) for modelling the same set of observations.

Sometimes it is natural to identify Ω with E, i.e., to identify the randomness ω with its observed effect. For example if we consider the outcome of a dice rolling experiment as an integer-valued random variable X, we can define the set of events to be precisely the set of possible outcomes: Ω = {1, 2, 3, 4, 5, 6}. In this case, X(ω) = ω: the outcome of the randomness is identified with the randomness itself. This choice of Ω is called the canonical space for the random variable X. In this case the random variable X is simply the identity map X(ω) = ω and the probability measure P is formally the same as the distribution of X. Note that here X is a one-to-one map: given the outcome of X one knows which scenario has happened, so any other random variable Y is completely determined by the observation of X. Therefore, using the canonical construction for the random variable X, we cannot define, on the same probability space, another random variable which is independent of X: X will be the sole source of randomness for all other variables in the model. This also shows that, although the canonical construction is the simplest way to construct a probability space for representing a given random variable, it forces us to identify this particular random variable with the “source of randomness” in the model. Therefore, when we want to deal with models with a sufficiently rich structure, we need to distinguish Ω – the set of scenarios of randomness – from E, the set of values of our random variables.

Let us give an example where it is natural to distinguish the source of randomness from the random variable itself. For instance, if one is modelling the market value of a stock at some date T in the future as a random variable S1, one may consider that the stock value is affected by many factors such as external news, market supply and demand, economic indicators, etc., summed up in some abstract variable ω, which may not even have a numerical representation: it corresponds to a scenario for the future evolution of the market. S1(ω) is then the stock value if the market scenario which occurs is given by ω. If the only interesting quantity in the model is the stock price then one can always label the scenario ω by the value of the stock price S1(ω), which amounts to identifying all scenarios where the stock S1 takes the same value and using the canonical construction. However if one considers a richer model where there are now other stocks S2, S3, . . . involved, it is more natural to distinguish the scenario ω from the random variables S1(ω), S2(ω),… whose values are observed in these scenarios but may not completely pin them down: knowing S1(ω), S2(ω),… one does not necessarily know which scenario has happened. In this way one reserves the possibility of adding more random variables later on without changing the probability space.

These considerations have the following important consequence: the probabilistic description of a random variable X can be reduced to the knowledge of its distribution μX only in the case where the random variable X is the only source of randomness. In this case, a stochastic model can be built using a canonical construction for X. In all other cases – as soon as we are concerned with a second random variable which is not a deterministic function of X – the underlying probability measure P contains more information on X than just its distribution. In particular, it contains all the information about the dependence of the random variable X with respect to all other random variables in the model: specifying P means specifying the joint distributions of all random variables constructed on Ω. For instance, knowing the distributions μX, μY of two variables X, Y does not allow one to compute their covariance or joint moments. Only in the case where all random variables involved are mutually independent can one reduce all computations to operations on their distributions. This is the case covered in most introductory texts on probability, which explains why one can go quite far, for example in the study of random walks, without formalizing the notion of probability space.
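The point that marginals alone do not determine joint quantities can be made concrete in a few lines. In this illustrative sketch, X and Y are standard normal in both models, yet their covariances differ because the underlying joint laws differ:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 200000

# Model 1: Omega rich enough for two independent sources of randomness
X1 = rng.normal(size=n)
Y1 = rng.normal(size=n)           # independent of X1

# Model 2: canonical-style construction, Y is a function of X
X2 = rng.normal(size=n)
Y2 = -X2                          # same N(0, 1) marginal, fully dependent

# Same distributions mu_X, mu_Y in both models...
print(X1.std(), Y1.std(), X2.std(), Y2.std())   # all ≈ 1
# ...but different joint laws, hence different covariances
print(np.cov(X1, Y1)[0, 1])       # ≈ 0
print(np.cov(X2, Y2)[0, 1])       # ≈ -1
```

Knowing only μX and μY, nothing distinguishes the two models; the covariance is a property of P, not of the marginals.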

# Cryptocurrency and Efficient Market Hypothesis. Drunken Risibility.

According to the traditional definition, a currency has three main properties: (i) it serves as a medium of exchange, (ii) it is used as a unit of account and (iii) it allows one to store value. Throughout economic history, monies were related to political power. In the beginning, coins were minted in precious metals, so the value of a coin was intrinsically determined by the value of the metal itself. Later, money was printed as paper bank notes, but its value was linked to a quantity of gold guarded in the vault of a central bank. Nation states have been using their political power to regulate the use of currencies and impose one currency (usually the one issued by the same nation state) as legal tender for obligations within their territory. In the twentieth century, a major change took place: the abandonment of the gold standard. The detachment of currencies (especially the US dollar) from the gold standard was a recognition that the value of a currency (especially in a world of fractional banking) is not related to its content or representation in gold, but to a broader concept: the confidence in the economy on which such a currency is based. Today, the value of a currency reflects the best judgment about the monetary policy and the “health” of its economy.

In recent years, a new type of currency, a synthetic one, emerged. We call this type “synthetic” because it is not created by the decision of a nation state, nor does it represent any underlying asset or tangible source of wealth. It appears as a new tradable asset resulting from a private agreement, facilitated by the anonymity of the internet. Among these synthetic currencies, Bitcoin (BTC) emerges as the most important one, with a market capitalization a few hundred million short of \$80 billion.

*Figure: Bitcoin price chart from Bitstamp.*

There are other cryptocurrencies based on blockchain technology, such as Litecoin (LTC), Ethereum (ETH) and Ripple (XRP). The website https://coinmarketcap.com/currencies/ counts up to 641 such currencies. However, as we can observe in the figure below, Bitcoin represents 89% of the market capitalization of all cryptocurrencies.

*Figure: Share of market capitalization of each cryptocurrency.*

One open question today is whether Bitcoin is in fact, or may be considered, a currency. Until now, we cannot observe that Bitcoin fulfills the main properties of a standard currency. It is barely (though increasingly so!) accepted as a medium of exchange (e.g. to buy some products online), it is not used as a unit of account (there are no financial statements valued in Bitcoins), and we can hardly believe that, given the great swings in its price, anyone can consider Bitcoin a suitable option for storing value. Given these characteristics, Bitcoin could fit as an ideal asset for speculative purposes: there is no underlying asset to relate its value to, and there is an open platform on which to operate round the clock.

*Figure: Bitcoin returns, sampled every 5 hours.*

Speculation has a long history and seems inherent to capitalism. One common feature of speculative assets throughout history has been the difficulty of valuation. Tulip mania, the South Sea bubble, and many other episodes reflect, on one side, human greed and, on the other, the difficulty of setting an objective value for an asset. All these speculative episodes were reflected in super-exponential growth of the price time series.

Cryptocurrencies can be seen as the libertarian response to the failure of central banks to manage financial crises, such as the one that occurred in 2008. Cryptocurrencies can also bypass national restrictions on international transfers, probably at a cheaper cost. Bitcoin was created by a person or group of persons under the pseudonym Satoshi Nakamoto. The discussion of Bitcoin has several perspectives. The computer science perspective deals with the strengths and weaknesses of blockchain technology. In fact, according to R. Ali et al., the introduction of a “distributed ledger” is the key innovation. Traditional means of payment (e.g. a credit card) rely on a central clearing house that validates operations, acting as a “middleman” between buyer and seller. On the contrary, the payment validation system of Bitcoin is decentralized. There is a growing army of miners, who put their computing power at the disposal of the network, validating transactions by gathering them into blocks, adding them to the ledger and forming a “block chain”. This work is remunerated by giving the miners Bitcoins, which (until now) makes validation cheaper than in a centralized system. Validation is carried out by solving a kind of algorithmic puzzle. Over time, solving the puzzle becomes harder, since the whole ledger must be validated; consequently it takes more time to solve. Contrary to traditional currencies, the total number of Bitcoins to be issued is fixed beforehand: 21 million. In fact, the issuance rate of Bitcoins is expected to diminish over time. According to Laursen and Kyed, validating the public ledger was initially rewarded with 50 Bitcoins, but the protocol foresaw halving this quantity every four years. At the current pace, the maximum number of Bitcoins will be reached in 2140. Taking into account its decentralized character, Bitcoin transactions seem secure. All transactions are recorded on several computer servers around the world.
In order to commit fraud, a person would have to change and validate (simultaneously) several ledgers, which is almost impossible. In addition, ledgers are public, with encrypted identities of the parties, making transactions “pseudonymous, not anonymous”. The legal perspective on Bitcoin is fuzzy. Bitcoin is not issued, nor endorsed, by a nation state. It is not an illegal substance. As such, its transaction is not regulated.
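The 21 million cap follows from simple arithmetic on the halving schedule. A minimal sketch, assuming the protocol constants of a 50 BTC initial reward, a halving every 210,000 blocks (roughly four years at one block per ten minutes), and integer rewards in satoshi (1 BTC = 100,000,000 satoshi):

```python
# Geometric series of block rewards: 210,000 * (50 + 25 + 12.5 + ...) BTC,
# truncated by integer division once the reward falls below one satoshi.
BLOCKS_PER_HALVING = 210_000
SATOSHI_PER_BTC = 100_000_000

reward = 50 * SATOSHI_PER_BTC  # initial block reward, in satoshi
total = 0
while reward > 0:
    total += BLOCKS_PER_HALVING * reward
    reward //= 2               # the halving, with integer division

print(total / SATOSHI_PER_BTC)  # just under 21,000,000 BTC
```

The integer division is why the total lands slightly below the nominal 21 million limit of the ideal geometric series 210,000 × 50 × (1 + 1/2 + 1/4 + …) = 21,000,000.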

In particular, the nonexistence of savings accounts in Bitcoin, and consequently the absence of a Bitcoin interest rate, precludes the idea of studying price behavior in relation to the cash flows generated by Bitcoins. As a consequence, the study of the underlying dynamics of the price signal finds its theoretical framework in the Efficient Market Hypothesis. The Efficient Market Hypothesis (EMH) is the cornerstone of financial economics. One of the seminal works on the stochastic dynamics of speculative prices is due to L. Bachelier, who in his doctoral thesis developed the first mathematical model of the behavior of stock prices. The systematic study of informational efficiency began in the 1960s, when financial economics was born as a new area within economics. The classical definition due to Eugene Fama (Foundations of Finance: Portfolio Decisions and Securities Prices, 1976) says that a market is informationally efficient if it “fully reflects all available information”. Therefore, the key element in assessing efficiency is to determine the appropriate set of information that impels prices. Following Efficient Capital Markets, informational efficiency can be divided into three categories: (i) weak efficiency, if prices reflect the information contained in the past series of prices, (ii) semi-strong efficiency, if prices reflect all public information, and (iii) strong efficiency, if prices reflect all public and private information. As a corollary of the EMH, one cannot accept the presence of long memory in financial time series, since its existence would allow a riskless profitable trading strategy. If markets are informationally efficient, arbitrage prevents the possibility of such strategies. If we consider the financial market as a dynamical structure, short-term memory can exist (to some extent) without contradicting the EMH.
In fact, the presence of some mispriced assets is the necessary stimulus for individuals to trade and reach an (almost) arbitrage-free situation. However, the presence of long-range memory is at odds with the EMH, because it would allow stable trading rules to beat the market.

The presence of long-range dependence in financial time series has generated a vivid debate. Whereas short-term memory can stimulate investors to exploit small extra returns, making them disappear, long-range correlations pose a challenge to the established financial model. As recognized by Ciaian et al., the Bitcoin price is not driven by macro-financial indicators. Consequently, a detailed analysis of the underlying dynamics (via the Hurst exponent) becomes important to understand its emerging behavior. There are several methods (both parametric and nonparametric) to calculate the Hurst exponent, which makes it a mandatory framework for tackling BTC trading.
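As an illustration of the nonparametric side, here is a minimal sketch of the classical rescaled-range (R/S) estimator of the Hurst exponent (the function name and window sizes are illustrative; production estimators apply small-sample bias corrections such as Anis-Lloyd, which this sketch omits):

```python
import numpy as np

def hurst_rs(series, window_sizes):
    """Classical rescaled-range (R/S) estimate of the Hurst exponent."""
    series = np.asarray(series, dtype=float)
    log_n, log_rs = [], []
    for n in window_sizes:
        rs_values = []
        # Split the series into non-overlapping windows of length n.
        for start in range(0, len(series) - n + 1, n):
            chunk = series[start:start + n]
            dev = np.cumsum(chunk - chunk.mean())  # cumulative deviations
            r = dev.max() - dev.min()              # range of the deviations
            s = chunk.std()                        # standard deviation
            if s > 0:
                rs_values.append(r / s)
        log_n.append(np.log(n))
        log_rs.append(np.log(np.mean(rs_values)))
    # H is the slope of log(R/S) against log(n).
    return np.polyfit(log_n, log_rs, 1)[0]

rng = np.random.default_rng(1)
# i.i.d. increments (no memory): theoretical H = 0.5, though the
# uncorrected R/S statistic is biased slightly upward in small samples.
white_noise = rng.standard_normal(10_000)
print(hurst_rs(white_noise, [16, 32, 64, 128, 256, 512]))
```

Estimates well above 0.5 on a return series would then point toward long-range persistence, subject to the caveats about estimator bias.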

# Extreme Value Theory

Standard estimators of the dependence between assets are, for instance, the correlation coefficient or Spearman’s rank correlation. However, as stressed by [Embrechts et al.], these kinds of dependence measures suffer from many deficiencies. Moreover, their values are mostly controlled by relatively small moves of the asset prices around their mean. To cure this problem, it has been proposed to use correlation coefficients conditioned on large movements of the assets. But [Boyer et al.] have emphasized that this approach also suffers from a severe systematic bias leading to spurious strategies: the conditional correlation in general evolves with time even when the true unconditional correlation remains constant. In fact, [Malevergne and Sornette] have shown that any approach based on conditional dependence measures implies a spurious change in the intrinsic value of the dependence, measured for instance by copulas. Recall that the copula of several random variables is the (unique) function which completely embodies the dependence between these variables, irrespective of their marginal behavior (see [Nelsen] for a mathematical description of the notion of copula).

In view of these limitations of the standard statistical tools, it is natural to turn to extreme value theory. In the univariate case, extreme value theory is very useful and provides many tools for investigating the extreme tails of the distributions of asset returns. These developments rest on a few fundamental results on extremes, such as the Gnedenko-Pickands-Balkema-de Haan theorem, which gives a general expression for the distribution of exceedances over a large threshold. In this framework, the study of large and extreme co-movements requires multivariate extreme value theory, which unfortunately does not provide strong results. Indeed, in contrast with the univariate case, the class of limiting extreme-value distributions is too broad and cannot be used to constrain accurately the distribution of large co-movements.
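The univariate result cited above can be checked numerically. A hedged sketch using SciPy (the threshold choice and sample are illustrative): exceedances of a heavy-tailed sample over a high threshold are fitted by a generalized Pareto distribution, whose shape parameter should approximate the reciprocal tail index, here 1/3 for Student-t returns with 3 degrees of freedom.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)

# Simulated heavy-tailed "returns": Student-t with 3 degrees of freedom.
returns = stats.t.rvs(df=3, size=200_000, random_state=rng)

# Work with losses; take exceedances over the 99th percentile of losses.
losses = -returns
threshold = np.quantile(losses, 0.99)
exceedances = losses[losses > threshold] - threshold

# Pickands-Balkema-de Haan: exceedances over a high threshold are
# approximately generalized Pareto. Fix loc=0 since we subtracted u.
shape, loc, scale = stats.genpareto.fit(exceedances, floc=0)
print(shape)  # should be close to 1/3 for Student-t(3) tails
```

The fitted shape parameter is the standard tail-risk summary that univariate extreme value theory delivers; the text's point is that no comparably sharp limit theorem pins down joint extremes.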

In the spirit of mean-variance portfolio theory or of utility theory, which establish an investment decision on a unique risk measure, we use the coefficient of tail dependence, which, to our knowledge, was first introduced in the financial context by [Embrechts et al.]. The coefficient of tail dependence between assets Xi and Xj is a very natural and easy-to-understand measure of extreme co-movements. It is defined as the probability that asset Xi incurs a large loss (or gain), assuming that asset Xj also undergoes a large loss (or gain) at the same probability level, in the limit where this probability level explores the extreme tails of the distributions of returns of the two assets. Mathematically speaking, the coefficient of lower tail dependence between the two assets Xi and Xj, denoted by λ−ij, is defined by

λ−ij = lim(u→0) Pr{Xi < Fi−1(u) | Xj < Fj−1(u)} —– (1)

where Fi−1(u) and Fj−1(u) represent the quantiles of assets Xi and Xj at level u. Similarly, the coefficient of upper tail dependence is

λ+ij = lim(u→1) Pr{Xi > Fi−1(u) | Xj > Fj−1(u)} —– (2)

λ−ij and λ+ij are of concern to investors with long (respectively short) positions. We refer to [Coles et al.] and references therein for a survey of the properties of the coefficient of tail dependence. Let us stress that the use of quantiles in the definitions of λ−ij and λ+ij makes them independent of the marginal distributions of the asset returns: as a consequence, the tail dependence parameters are intrinsic dependence measures. The obvious gain is an “orthogonal” decomposition of the risks into (1) the individual risks carried by the marginal distribution of each asset and (2) their collective risk described by their dependence structure or copula.

Being a probability, the coefficient of tail dependence varies between 0 and 1. A large value of λ−ij means that large losses almost surely occur together. In that case, large risks cannot be diversified away and the assets crash together. This nightmare for investors and portfolio managers is further amplified in real-life situations by the limited liquidity of markets. When λ−ij vanishes, the assets are said to be asymptotically independent, but this term hides the subtlety that the assets can still exhibit non-zero dependence in their tails. For instance, two normally distributed assets can be shown to have a vanishing coefficient of tail dependence; nevertheless, unless their correlation coefficient is identically zero, these assets are never independent. Thus, asymptotic independence must be understood as the weakest dependence which can be quantified by the coefficient of tail dependence.
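The Gaussian case can be illustrated with a naive empirical estimator of the conditional probability in definition (1) at a fixed level u (a sketch only, and precisely the kind of direct estimation whose undersampling problems are discussed next): for a correlated bivariate normal pair, the estimate shrinks as u moves into the tail, consistent with a vanishing coefficient of tail dependence despite strong overall correlation.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 1_000_000
rho = 0.7

# Bivariate normal with correlation rho; both marginals are N(0, 1).
z1 = rng.standard_normal(n)
z2 = rho * z1 + np.sqrt(1 - rho**2) * rng.standard_normal(n)

def empirical_lower_tail_dep(x, y, u):
    """Empirical Pr{X < F_X^{-1}(u) | Y < F_Y^{-1}(u)} at level u."""
    qx, qy = np.quantile(x, u), np.quantile(y, u)
    below_y = y < qy
    return np.mean(x[below_y] < qx)

# The conditional probability decays as u -> 0, even with rho = 0.7.
for u in (0.10, 0.01, 0.001):
    print(u, empirical_lower_tail_dep(z1, z2, u))
```

At ever smaller u the conditioning set shrinks, which is exactly the undersampling obstacle that motivates the factor-model approach below.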

For practical implementations, a direct application of definitions (1) and (2) fails to provide reasonable estimates due to the double curse of dimensionality and undersampling of extreme values, so that a fully non-parametric approach is not reliable. It turns out to be possible to circumvent this fundamental difficulty by considering the general class of factor models, which are among the most widespread and versatile models in finance. They come in two classes: multiplicative and additive factor models. Multiplicative factor models are generally used to model asset fluctuations due to an underlying stochastic volatility. Additive factor models relate asset fluctuations to market fluctuations, as in the Capital Asset Pricing Model (CAPM) and its generalizations, or to any set of common factors as in Arbitrage Pricing Theory. The coefficient of tail dependence is known in closed form for both classes of factor models, which allows for an efficient empirical estimation.