# Probability Space Intertwines Random Walks – Thought of the Day 144.0

Many deliberations of stochasticity start with “let (Ω, F, P) be a probability space”. One can actually follow such discussions without having the slightest idea what Ω is and who lives inside. So, what is “Ω, F, P” and why do we need it? Indeed, for many users of probability and statistics, a random variable X is synonymous with its probability distribution μX and all computations such as sums, expectations, etc., done on random variables amount to analytical operations such as integrations, Fourier transforms, convolutions, etc., done on their distributions. For defining such operations, you do not need a probability space. Isn’t this all there is to it?

One can in fact compute quite a lot of things without using probability spaces in an essential way. However the notions of probability space and random variable are central in modern probability theory so it is important to understand why and when these concepts are relevant.

From a modelling perspective, the starting point is a set of observations taking values in some set E (think for instance of numerical measurement, E = R) for which we would like to build a stochastic model. We would like to represent such observations x1, . . . , xn as samples drawn from a random variable X defined on some probability space (Ω, F, P). It is important to see that the only natural ingredient here is the set E where the random variables will take their values: the set of events Ω is not given a priori and there are many different ways to construct a probability space (Ω, F, P) for modelling the same set of observations.

Sometimes it is natural to identify Ω with E, i.e., to identify the randomness ω with its observed effect. For example if we consider the outcome of a dice rolling experiment as an integer-valued random variable X, we can define the set of events to be precisely the set of possible outcomes: Ω = {1, 2, 3, 4, 5, 6}. In this case, X(ω) = ω: the outcome of the randomness is identified with the randomness itself. This choice of Ω is called the canonical space for the random variable X. In this case the random variable X is simply the identity map X(ω) = ω and the probability measure P is formally the same as the distribution of X. Note that here X is a one-to-one map: given the outcome of X one knows which scenario has happened so any other random variable Y is completely determined by the observation of X. Therefore using the canonical construction for the random variable X, we cannot define, on the same probability space, another random variable which is independent of X: X will be the sole source of randomness for all other variables in the model. This also show that, although the canonical construction is the simplest way to construct a probability space for representing a given random variable, it forces us to identify this particular random variable with the “source of randomness” in the model. Therefore when we want to deal with models with a sufficiently rich structure, we need to distinguish Ω – the set of scenarios of randomness – from E, the set of values of our random variables.

Let us give an example where it is natural to distinguish the source of randomness from the random variable itself. For instance, if one is modelling the market value of a stock at some date T in the future as a random variable S1, one may consider that the stock value is affected by many factors such as external news, market supply and demand, economic indicators, etc., summed up in some abstract variable ω, which may not even have a numerical representation: it corresponds to a scenario for the future evolution of the market. S1(ω) is then the stock value if the market scenario which occurs is given by ω. If the only interesting quantity in the model is the stock price then one can always label the scenario ω by the value of the stock price S1(ω), which amounts to identifying all scenarios where the stock S1 takes the same value and using the canonical construction. However if one considers a richer model where there are now other stocks S2, S3, . . . involved, it is more natural to distinguish the scenario ω from the random variables S1(ω), S2(ω),… whose values are observed in these scenarios but may not completely pin them down: knowing S1(ω), S2(ω),… one does not necessarily know which scenario has happened. In this way one reserves the possibility of adding more random variables later on without changing the probability space.

These have the following important consequence: the probabilistic description of a random variable X can be reduced to the knowledge of its distribution μX only in the case where the random variable X is the only source of randomness. In this case, a stochastic model can be built using a canonical construction for X. In all other cases – as soon as we are concerned with a second random variable which is not a deterministic function of X – the underlying probability measure P contains more information on X than just its distribution. In particular, it contains all the information about the dependence of the random variable X with respect to all other random variables in the model: specifying P means specifying the joint distributions of all random variables constructed on Ω. For instance, knowing the distributions μX, μY of two variables X, Y does not allow to compute their covariance or joint moments. Only in the case where all random variables involved are mutually independent can one reduce all computations to operations on their distributions. This is the case covered in most introductory texts on probability, which explains why one can go quite far, for example in the study of random walks, without formalizing the notion of probability space.

# Credit Bubbles. Thought of the Day 90.0

At the macro-economic level of the gross statistics of money and loan supply to the economy, the reserve banking system creates a complex interplay between money, debt, supply and demand for goods, and the general price level. Rather than being constant, as implied by theoretical descriptions, money and loan supplies are constantly changing at a rate dependent on the average loan period, and a complex of details buried in the implementation and regulation of any given banking system.

Since the majority of loans are made for years at a time, the results of these interactions play out over a long enough time scale that gross monetary features of regulatory failure, such as continuous asset price inflation, have come to be regarded as normal, e.g. ”House prices always go up”. The price level however is not only dependent on purely monetary factors, but also on the supply and demand for goods and services, including financial assets such as shares, which requires that estimates of the real price level versus production be used. As a simplification, if constant demand for goods and services is assumed as shown in the table below, then there are two possible causes of price inflation, either the money supply available to purchase the good in question has increased, or the supply of the good has been reduced. Critically, the former is simply a mathematical effect, whilst the latter is a useful signal, providing economic information on relative supply and demand levels that can be used locally by consumers and producers to adapt their behaviour. Purely arbitrary changes in both the money and the loan supply that are induced by the mechanical operation of the banking system fail to provide any economic benefit, and by distorting the actual supply and demand signal can be actively harmful.

Credit bubbles are often explained as a phenomena of irrational demand, and crowd behaviour. However, this explanation ignores the question of why they aren’t throttled by limits on the loan supply? An alternate explanation which can be offered is that their root cause is periodic failures in the regulation of the loan and money supply within the commercial banking system. The introduction of widespread securitized lending allows a rapid increase in the total amount of lending available from the banking system and an accompanying if somewhat smaller growth in the money supply. Channeled predominantly into property lending, the increased availability of money from lending sources, acted to increase house prices creating rational speculation on their increase, and over time a sizeable disruption in the market pricing mechanisms for all goods and services purchased through loans. Monetary statistics of this effect such as the Consumer Price Index (CPI) for example, are however at least partially masked by production deflation from the sizeable productivity increases over decades. Absent any limit on the total amount of credit being supplied, the only practical limit on borrowing is the availability of borrowers and their ability to sustain the capital and interest repayments demanded for their loans.

Owing to the asymmetric nature of long term debt flows there is a tendency for money to become concentrated in the lending centres, which then causes liquidity problems for the rest of the economy. Eventually repayment problems surface, especially if the practice of further borrowing to repay existing loans is allowed, since the underlying mathematical process is exponential. As general insolvency as well as a consequent debt deflation occurs, the money and loan supply contracts as the banking system removes loan capacity from the economy either from loan repayment, or as a result of bank failure. This leads to a domino effect as businesses that have become dependent on continuously rolling over debt fail and trigger further defaults. Monetary expansion and further lending is also constrained by the absence of qualified borrowers, and by the general unwillingness to either lend or borrow that results from the ensuing economic collapse. Further complications, as described by Ben Bernanke and Harold James, can occur when interactions between currencies are considered, in particular in conjunction with gold-based capital regulation, because of the difficulties in establishing the correct ratio of gold for each individual currency and maintaining it, in a system where lending and the associated money supply are continually fluctuating and gold is also being used at a national level for international debt repayments.

The debt to money imbalance created by the widespread, and global, sale of Asset Backed securities may be unique to this particular crisis. Although asset backed security issuance dropped considerably in 2008, as the resale markets were temporarily frozen, current stated policy in several countries, including the USA and the United Kingdom, is to encourage further securitization to assist the recovery of the banking sector. Unfortunately this appears to be succeeding.

# Fundamental Theorem of Asset Pricing: Tautological Meeting of Mathematical Martingale and Financial Arbitrage by the Measure of Probability.

The Fundamental Theorem of Asset Pricing (FTAP hereafter) has two broad tenets, viz.

1. A market admits no arbitrage, if and only if, the market has a martingale measure.

2. Every contingent claim can be hedged, if and only if, the martingale measure is unique.

The FTAP is a theorem of mathematics, and the use of the term ‘measure’ in its statement places the FTAP within the theory of probability formulated by Andrei Kolmogorov (Foundations of the Theory of Probability) in 1933. Kolmogorov’s work took place in a context captured by Bertrand Russell, who observed that

It is important to realise the fundamental position of probability in science. . . . As to what is meant by probability, opinions differ.

In the 1920s the idea of randomness, as distinct from a lack of information, was becoming substantive in the physical sciences because of the emergence of the Copenhagen Interpretation of quantum mechanics. In the social sciences, Frank Knight argued that uncertainty was the only source of profit and the concept was pervading John Maynard Keynes’ economics (Robert Skidelsky Keynes the return of the master).

Two mathematical theories of probability had become ascendant by the late 1920s. Richard von Mises (brother of the Austrian economist Ludwig) attempted to lay down the axioms of classical probability within a framework of Empiricism, the ‘frequentist’ or ‘objective’ approach. To counter–balance von Mises, the Italian actuary Bruno de Finetti presented a more Pragmatic approach, characterised by his claim that “Probability does not exist” because it was only an expression of the observer’s view of the world. This ‘subjectivist’ approach was closely related to the less well-known position taken by the Pragmatist Frank Ramsey who developed an argument against Keynes’ Realist interpretation of probability presented in the Treatise on Probability.

Kolmogorov addressed the trichotomy of mathematical probability by generalising so that Realist, Empiricist and Pragmatist probabilities were all examples of ‘measures’ satisfying certain axioms. In doing this, a random variable became a function while an expectation was an integral: probability became a branch of Analysis, not Statistics. Von Mises criticised Kolmogorov’s generalised framework as un-necessarily complex. About a decade and a half back, the physicist Edwin Jaynes (Probability Theory The Logic Of Science) champions Leonard Savage’s subjectivist Bayesianism as having a “deeper conceptual foundation which allows it to be extended to a wider class of applications, required by current problems of science”.

The objections to measure theoretic probability for empirical scientists can be accounted for as a lack of physicality. Frequentist probability is based on the act of counting; subjectivist probability is based on a flow of information, which, following Claude Shannon, is now an observable entity in Empirical science. Measure theoretic probability is based on abstract mathematical objects unrelated to sensible phenomena. However, the generality of Kolmogorov’s approach made it flexible enough to handle problems that emerged in physics and engineering during the Second World War and his approach became widely accepted after 1950 because it was practically more useful.

In the context of the first statement of the FTAP, a ‘martingale measure’ is a probability measure, usually labelled Q, such that the (real, rather than nominal) price of an asset today, X0, is the expectation, using the martingale measure, of its (real) price in the future, XT. Formally,

X0 = EQ XT

The abstract probability distribution Q is defined so that this equality exists, not on any empirical information of historical prices or subjective judgement of future prices. The only condition placed on the relationship that the martingale measure has with the ‘natural’, or ‘physical’, probability measures usually assigned the label P, is that they agree on what is possible.

The term ‘martingale’ in this context derives from doubling strategies in gambling and it was introduced into mathematics by Jean Ville in a development of von Mises’ work. The idea that asset prices have the martingale property was first proposed by Benoit Mandelbrot in response to an early formulation of Eugene Fama’s Efficient Market Hypothesis (EMH), the two concepts being combined by Fama. For Mandelbrot and Fama the key consequence of prices being martingales was that the current price was independent of the future price and technical analysis would not prove profitable in the long run. In developing the EMH there was no discussion on the nature of the probability under which assets are martingales, and it is often assumed that the expectation is calculated under the natural measure. While the FTAP employs modern terminology in the context of value-neutrality, the idea of equating a current price with a future, uncertain, has ethical ramifications.

The other technical term in the first statement of the FTAP, arbitrage, has long been used in financial mathematics. Liber Abaci Fibonacci (Laurence Sigler Fibonaccis Liber Abaci) discusses ‘Barter of Merchandise and Similar Things’, 20 arms of cloth are worth 3 Pisan pounds and 42 rolls of cotton are similarly worth 5 Pisan pounds; it is sought how many rolls of cotton will be had for 50 arms of cloth. In this case there are three commodities, arms of cloth, rolls of cotton and Pisan pounds, and Fibonacci solves the problem by having Pisan pounds ‘arbitrate’, or ‘mediate’ as Aristotle might say, between the other two commodities.

Within neo-classical economics, the Law of One Price was developed in a series of papers between 1954 and 1964 by Kenneth Arrow, Gérard Debreu and Lionel MacKenzie in the context of general equilibrium, in particular the introduction of the Arrow Security, which, employing the Law of One Price, could be used to price any asset. It was on this principle that Black and Scholes believed the value of the warrants could be deduced by employing a hedging portfolio, in introducing their work with the statement that “it should not be possible to make sure profits” they were invoking the arbitrage argument, which had an eight hundred year history. In the context of the FTAP, ‘an arbitrage’ has developed into the ability to formulate a trading strategy such that the probability, under a natural or martingale measure, of a loss is zero, but the probability of a positive profit is not.

To understand the connection between the financial concept of arbitrage and the mathematical idea of a martingale measure, consider the most basic case of a single asset whose current price, X0, can take on one of two (present) values, XTD < XTU, at time T > 0, in the future. In this case an arbitrage would exist if X0 ≤ XTD < XTU: buying the asset now, at a price that is less than or equal to the future pay-offs, would lead to a possible profit at the end of the period, with the guarantee of no loss. Similarly, if XTD < XTU ≤ X0, short selling the asset now, and buying it back would also lead to an arbitrage. So, for there to be no arbitrage opportunities we require that

XTD < X0 < XTU

This implies that there is a number, 0 < q < 1, such that

X0 = XTD + q(XTU − XTD)

= qXTU + (1−q)XTD

The price now, X0, lies between the future prices, XTU and XTD, in the ratio q : (1 − q) and represents some sort of ‘average’. The first statement of the FTAP can be interpreted simply as “the price of an asset must lie between its maximum and minimum possible (real) future price”.

If X0 < XTD ≤ XTU we have that q < 0 whereas if XTD ≤ XTU < X0 then q > 1, and in both cases q does not represent a probability measure which by Kolmogorov’s axioms, must lie between 0 and 1. In either of these cases an arbitrage exists and a trader can make a riskless profit, the market involves ‘turpe lucrum’. This account gives an insight as to why James Bernoulli, in his moral approach to probability, considered situations where probabilities did not sum to 1, he was considering problems that were pathological not because they failed the rules of arithmetic but because they were unfair. It follows that if there are no arbitrage opportunities then quantity q can be seen as representing the ‘probability’ that the XTU price will materialise in the future. Formally

X0 = qXTU + (1−q) XTD ≡ EQ XT

The connection between the financial concept of arbitrage and the mathematical object of a martingale is essentially a tautology: both statements mean that the price today of an asset must lie between its future minimum and maximum possible value. This first statement of the FTAP was anticipated by Frank Ramsey when he defined ‘probability’ in the Pragmatic sense of ‘a degree of belief’ and argues that measuring ‘degrees of belief’ is through betting odds. On this basis he formulates some axioms of probability, including that a probability must lie between 0 and 1. He then goes on to say that

These are the laws of probability, …If anyone’s mental condition violated these laws, his choice would depend on the precise form in which the options were offered him, which would be absurd. He could have a book made against him by a cunning better and would then stand to lose in any event.

This is a Pragmatic argument that identifies the absence of the martingale measure with the existence of arbitrage and today this forms the basis of the standard argument as to why arbitrages do not exist: if they did the, other market participants would bankrupt the agent who was mis-pricing the asset. This has become known in philosophy as the ‘Dutch Book’ argument and as a consequence of the fact/value dichotomy this is often presented as a ‘matter of fact’. However, ignoring the fact/value dichotomy, the Dutch book argument is an alternative of the ‘Golden Rule’– “Do to others as you would have them do to you.”– it is infused with the moral concepts of fairness and reciprocity (Jeffrey Wattles The Golden Rule).

FTAP is the ethical concept of Justice, capturing the social norms of reciprocity and fairness. This is significant in the context of Granovetter’s discussion of embeddedness in economics. It is conventional to assume that mainstream economic theory is ‘undersocialised’: agents are rational calculators seeking to maximise an objective function. The argument presented here is that a central theorem in contemporary economics, the FTAP, is deeply embedded in social norms, despite being presented as an undersocialised mathematical object. This embeddedness is a consequence of the origins of mathematical probability being in the ethical analysis of commercial contracts: the feudal shackles are still binding this most modern of economic theories.

Ramsey goes on to make an important point

Having any definite degree of belief implies a certain measure of consistency, namely willingness to bet on a given proposition at the same odds for any stake, the stakes being measured in terms of ultimate values. Having degrees of belief obeying the laws of probability implies a further measure of consistency, namely such a consistency between the odds acceptable on different propositions as shall prevent a book being made against you.

Ramsey is arguing that an agent needs to employ the same measure in pricing all assets in a market, and this is the key result in contemporary derivative pricing. Having identified the martingale measure on the basis of a ‘primal’ asset, it is then applied across the market, in particular to derivatives on the primal asset but the well-known result that if two assets offer different ‘market prices of risk’, an arbitrage exists. This explains why the market-price of risk appears in the Radon-Nikodym derivative and the Capital Market Line, it enforces Ramsey’s consistency in pricing. The second statement of the FTAP is concerned with incomplete markets, which appear in relation to Arrow-Debreu prices. In mathematics, in the special case that there are as many, or more, assets in a market as there are possible future, uncertain, states, a unique pricing vector can be deduced for the market because of Cramer’s Rule. If the elements of the pricing vector satisfy the axioms of probability, specifically each element is positive and they all sum to one, then the market precludes arbitrage opportunities. This is the case covered by the first statement of the FTAP. In the more realistic situation that there are more possible future states than assets, the market can still be arbitrage free but the pricing vector, the martingale measure, might not be unique. The agent can still be consistent in selecting which particular martingale measure they choose to use, but another agent might choose a different measure, such that the two do not agree on a price. In the context of the Law of One Price, this means that we cannot hedge, replicate or cover, a position in the market, such that the portfolio is riskless. The significance of the second statement of the FTAP is that it tells us that in the sensible world of imperfect knowledge and transaction costs, a model within the framework of the FTAP cannot give a precise price. When faced with incompleteness in markets, agents need alternative ways to price assets and behavioural techniques have come to dominate financial theory. This feature was realised in The Port Royal Logic when it recognised the role of transaction costs in lotteries.

# Yield Curve Dynamics or Fluctuating Multi-Factor Rate Curves

The actual dynamics (as opposed to the risk-neutral dynamics) of the forward rate curve cannot be reduced to that of the short rate: the statistical evidence points out to the necessity of taking into account more degrees of freedom in order to represent in an adequate fashion the complicated deformations of the term structure. In particular, the imperfect correlation between maturities and the rich variety of term structure deformations shows that a one factor model is too rigid to describe yield curve dynamics.

Furthermore, in practice the value of the short rate is either fixed or at least strongly influenced by an authority exterior to the market (the central banks), through a mechanism different in nature from that which determines rates of higher maturities which are negotiated on the market. The short rate can therefore be viewed as an exogenous stochastic input which then gives rise to a deformation of the term structure as the market adjusts to its variations.

Traditional term structure models define – implicitly or explicitly – the random motion of an infinite number of forward rates as diffusions driven by a finite number of independent Brownian motions. This choice may appear surprising, since it introduces a lot of constraints on the type of evolution one can ascribe to each point of the forward rate curve and greatly reduces the dimensionality i.e. the number of degrees of freedom of the model, such that the resulting model is not able to reproduce any more the complex dynamics of the term structure. Multifactor models are usually justified by refering to the results of principal component analysis of term structure fluctuations. However, one should note that the quantities of interest when dealing with the term structure of interest rates are not the first two moments of the forward rates but typically involve expectations of non-linear functions of the forward rate curve: caps and floors are typical examples from this point of view. Hence, although a multifactor model might explain the variance of the forward rate itself, the same model may not be able to explain correctly the variability of portfolio positions involving non-linear combinations of the same forward rates. In other words, a principal component whose associated eigenvalue is small may have a non-negligible effect on the fluctuations of a non-linear function of forward rates. This question is especially relevant when calculating quantiles and Value-at-Risk measures.

In a multifactor model with k sources of randomness, one can use any k + 1 instruments to hedge a given risky payoff. However, this is not what traders do in real markets: a given interest-rate contingent payoff is hedged with bonds of the same maturity. These practices reflect the existence of a risk specific to instruments of a given maturity. The representation of a maturity-specific risk means that, in a continuous-maturity limit, one must also allow the number of sources of randomness to grow with the number of maturities; otherwise one loses the localization in maturity of the source of randomness in the model.

An important ingredient for the tractability of a model is its Markovian character. Non-Markov processes are difficult to simulate and even harder to manipulate analytically. Of course, any process can be transformed into a Markov process if it is imbedded into a space of sufficiently high dimension; this amounts to injecting a sufficient number of “state variables” into the model. These state variables may or may not be observable quantities; for example one such state variable may be the short rate itself but another one could be an economic variable whose value is not deducible from knowledge of the forward rate curve. If the state variables are not directly observed, they are obtainable in principle from the observed interest rates by a filtering process. Nevertheless the presence of unobserved state variables makes the model more difficult to handle both in terms of interpretation and statistical estimation. This drawback has motivated the development of so-called affine curve models models where one imposes that the state variables be affine functions of the observed yield curve. While the affine hypothesis is not necessarily realistic from an empirical point of view, it has the property of directly relating state variables to the observed term structure.

Another feature of term structure movements is that, as a curve, the forward rate curve displays a continuous deformation: configurations of the forward rate curve at dates not too far from each other tend to be similar. Most applications require the yield curve to have some degree of smoothness e.g. differentiability with respect to the maturity. This is not only a purely mathematical requirement but is reflected in market practices of hedging and arbitrage on fixed income instruments. Market practitioners tend to hedge an interest rate risk of a given maturity with instruments of the same maturity or close to it. This important observation means that the maturity is not simply a way of indexing the family of forward rates: market operators expect forward rates whose maturities are close to behave similarly. Moreover, the model should account for the observation that the volatility term structure displays a hump but that multiple humps are never observed.

# What’s a Market Password Anyway? Towards Defining a Financial Market Random Sequence. Note Quote.

From the point of view of cryptanalysis, the algorithmic view based on frequency analysis may be taken as a hacker approach to the financial market. While the goal is clearly to find a sort of password unveiling the rules governing the price changes, what we claim is that the password may not be immune to a frequency analysis attack, because it is not the result of a true random process but rather the consequence of the application of a set of (mostly simple) rules. Yet that doesn’t mean one can crack the market once and for all, since for our system to find the said password it would have to outperform the unfolding processes affecting the market – which, as Wolfram’s PCE suggests, would require at least the same computational sophistication as the market itself, with at least one variable modelling the information being assimilated into prices by the market at any given moment. In other words, the market password is partially safe not because of the complexity of the password itself but because it reacts to the cracking method.

Whichever kind of financial instrument one looks at, the sequences of prices at successive times show some overall trends and varying amounts of apparent randomness. However, despite the fact that there is no contingent necessity of true randomness behind the market, it can certainly look that way to anyone ignoring the generative processes, anyone unable to see what other, non-random signals may be driving market movements.

Von Mises’ approach to the definition of a random sequence, which seemed at the time of its formulation to be quite problematic, contained some of the basics of the modern approach adopted by Per Martin-Löf. It is during this time that the Keynesian kind of induction may have been resorted to as a starting point for Solomonoff’s seminal work (1 and 2) on algorithmic probability.

Per Martin-Löf gave the first suitable definition of a random sequence. Intuitively, an algorithmically random sequence (or random sequence) is an infinite sequence of binary digits that appears random to any algorithm. This contrasts with the idea of randomness in probability. In that theory, no particular element of a sample space can be said to be random. Martin-Löf randomness has since been shown to admit several equivalent characterisations in terms of compression, statistical tests, and gambling strategies.

The predictive aim of economics is actually profoundly related to the concept of predicting and betting. Imagine a random walk that goes up, down, left or right by one, with each step having the same probability. If the expected time at which the walk ends is finite, predicting that the expected stop position is equal to the initial position, it is called a martingale. This is because the chances of going up, down, left or right, are the same, so that one ends up close to one’s starting position,if not exactly at that position. In economics, this can be translated into a trader’s experience. The conditional expected assets of a trader are equal to his present assets if a sequence of events is truly random.

If market price differences accumulated in a normal distribution, a rounding would produce sequences of 0 differences only. The mean and the standard deviation of the market distribution are used to create a normal distribution, which is then subtracted from the market distribution.

Schnorr provided another equivalent definition in terms of martingales. The martingale characterisation of randomness says that no betting strategy implementable by any computer (even in the weak sense of constructive strategies, which are not necessarily computable) can make money betting on a random sequence. In a true random memoryless market, no betting strategy can improve the expected winnings, nor can any option cover the risks in the long term.

Over the last few decades, several systems have shifted towards ever greater levels of complexity and information density. The result has been a shift towards Paretian outcomes, particularly within any event that contains a high percentage of informational content, i.e. when one plots the frequency rank of words contained in a large corpus of text data versus the number of occurrences or actual frequencies, Zipf showed that one obtains a power-law distribution

Departures from normality could be accounted for by the algorithmic component acting in the market, as is consonant with some empirical observations and common assumptions in economics, such as rule-based markets and agents. The paper.

# Random Uniform Deviate, or Correlation Dimension

A widely used dimension algorithm in data analysis is the correlation dimension. Fix m, a positive integer, and r, a positive real number. Given a time-series of data u(1), u(2), …, u(N),from measurements equally spaced in time, form a sequence of vectors x(1), x(2),…, x(N- m + 1) in R’, defined by x(i) = [u(i), u(i+ 1),…,u(i+ m – 1)]. Next, define for each i, 1 ≤ i ≤ N – m + 1,

Cmi (r)= (number of j such that d[x(i), x(j)] ≤ r)/(N-m+1) ———- [1]

We must define d[x(i), x(j)] for vectors x(i) and x(j). We define

d[x(i), x(j)]= maxk = 1,2,…,m (|u(i+k-1) – u(j+k-1)j) ———- [2]

From the Cmi (r), define

Cm(r) = (N- m + i)-1 ∑(N – m + 1)i = 1 Cmi (r) ———- [3]

and define

βm = limr → 0 limn → ∞ log Cm(r)/log r ———- [4]

The assertion is that for m sufficiently large, βmis the correlation dimension. Such a limiting slope has been shown to exist for the commonly studied chaotic attractors. This procedure has frequently been applied to experimental data; investigators seek a “scaling range”of r values for which log Cm(r)/log r is nearly constant for large m, and they infer that this ratio is the correlation dimension. In some instances, investigators have concluded that this procedure establishes deterministic chaos.

The later conclusion is not necessarily correct: a converged, finite correlation dimension value does not guarantee that the defining process is deterministic. Consider the following stochastic process. Fix 0 ≤ p ≤1. Define Xj = α-l/2 sin(2πj/12) ∀ j,where α is specified below. Define Yj as a family of independent identicaly distributed (i.i.d.) real random variables, with uniform density on the interval [-√3, √3]. Define Zj = 1 with probability p, Zj = 0 with probability 1 – p.

α = (∑j = 112 sin2(2πj/12)/12 ———- [5]

and define MI Xj = (1- Zj) Xj + ZjYj. Intuitively, MI X(p) is generated by first ascertaining, for each j, whether the jth sample will be from the deterministic sine wave or from the random uniform deviate, with likelihood (1- p) of the former choice, then calculating either Xj or Yj. Increasing p marks a tendency towards greater system randomness. We now show that almost surely βmin [4] equals 0 ∀ m for the MI X(p) process, p ≠ 1. Fix m, define Kj = (12m)j- 12m, and define Nj = 1 if (MI Xk(j)+l,…, k(j)+m) = (X1,. . ., Xm), Nj = 0 otherwise. The Nj are i.i.d.random variables, with the expected value of Nj, E(Nj) ≥ (1- p)m. By the Strong Law of Large Numbers,

limN → ∞ ∑j = 1N Nj/N = E(N1) ≥ (1-p)m

Observe that (∑j = 1N Nj/12 mN)2 is a lower bound to Cm(r), since xk(i)+1,…., xk(j)+1 if Ni = Nj = 1. Thus for r ‹ 1

limN → ∞ sup log Cm(r)/log r ≤ (1/log r) limN → ∞ (∑j = 1N Nj/12 mN)2 ≤ log (1-p)2m/(12m)2/log r

Since, (1-p)2m/(12m)2 is independent of r, βm = limr → 0 limN → ∞ log Cm(r)/log r = 0. Since, βm ≠ 0 with probability 0 for each m, by countable additivity, ∀m, β= 0.

The MIX(p) process can be motivated by considering an autonomous unit that produces sinusoidal output, surrounded by a world of interacting processes that in ensemble produces output that resembles noise relative to the timing of the unit. The extent to which the surrounding world interacts with the unit could be controlled by a gateway between the two, with a larger gateway admitting greater apparent noise to compete with the sinusoidal signal. It is easy to show that, given a sequence Xj, a sequence of k = 1, 2,…, m i.i.d.Yj, defined by a density function and independent of the Xj, and Z= X+ Yj, then Zj has an infinite correlation dimension. It appears that correlation dimension distinguishes between correlated and uncorrelated successive iterates, with larger estimates of dimension corresponding to uncorrelated data. For a more complete interpretation of correlation dimension results, stochastic processes with correlated increments should be analyzed. Error estimates in dimension calculations are commonly seen. In statistics, one presumes a specified underlying stochastic distribution to estimate misclassification probabilities. Without knowing the form of a distribution, or if the system is deterministic or stochastic, one must be suspicious of error estimates. There often appears to be a desire to establish a noninteger dimension value, to give a fractal and chaotic interpretation to the result, but again, prior to a thorough study of the relationship between the geometric Hausdorff dimension and the time series formula labeled correlation dimension, it is speculation to draw conclusions from a noninteger correlation dimension value.

# Is Indian GDP data turning a little too Chinese? Why to be Askance @ India’s Growth Figures?

My take on the statistics:
Well, this is a simple tweaking of the equations that differentiate the growth curve. In short, we have all been a part of exams where 9/10 is different from 99/100, even if just one number distances the actual score from the maximum one could score. On similar lines, the crimes of growth are factored in on growth year/base year. This is mathematical jugglery narrowed in on political ends. Whichever way one looks at the data, some of the indicators are still found lagging the composite growth, thereby dumbing down the economists when the growth curve mandates a pattern recognition.
GDP, when calculated at Factor Cost is related with GDP at Market Price, and written as an equation of the form,
GDP (FC) = GDP (MP) – indirect takes + subsidies