Cadlag Stochasticities: Lévy Processes. Part 1.


A compound Poisson process with a Gaussian distribution of jump sizes, and a jump diffusion of a Lévy process with Gaussian component and finite jump intensity.

A cadlag stochastic process (X_t)_{t≥0} on (Ω, F, P) with values in R^d such that X_0 = 0 is called a Lévy process if it possesses the following properties:

1. Independent increments: for every increasing sequence of times t_0, . . . , t_n, the random variables X_{t_0}, X_{t_1} − X_{t_0}, . . . , X_{t_n} − X_{t_{n−1}} are independent.

2. Stationary increments: the law of X_{t+h} − X_t does not depend on t.

3. Stochastic continuity: ∀ε > 0, lim_{h→0} P(|X_{t+h} − X_t| ≥ ε) = 0.

A sample function x on a well-ordered set T is cadlag if it is continuous from the right and limited from the left at every point. That is, for every t_0 ∈ T, t ↓ t_0 implies x(t) → x(t_0), and for t ↑ t_0, lim_{t↑t_0} x(t) exists, but need not be x(t_0). A stochastic process X is cadlag if almost all its sample paths are cadlag.

The third condition does not imply in any way that the sample paths are continuous; it is satisfied, for instance, by the Poisson process. It serves to exclude processes with jumps at fixed (nonrandom) times, which can be regarded as “calendar effects”. It means that for any given time t, the probability of seeing a jump at t is zero: discontinuities occur at random times.
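As a quick check of the claim about the Poisson process, take a Poisson process N with intensity λ (the symbol λ is introduced here only for illustration) and h > 0. For 0 < ε ≤ 1, an increment of size at least ε means at least one jump in (t, t + h], so

```latex
\[
P\bigl(|N_{t+h} - N_t| \ge \varepsilon\bigr)
  = P\bigl(N_{t+h} - N_t \ge 1\bigr)
  = 1 - e^{-\lambda h}
  \xrightarrow[h \to 0]{} 0 .
\]
```

For ε > 1 the probability is smaller still, so stochastic continuity holds even though every sample path has jumps.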

If we sample a Lévy process at regular time intervals 0, ∆, 2∆, . . ., we obtain a random walk: defining S_n(∆) ≡ X_{n∆}, we can write S_n(∆) = ∑_{k=0}^{n−1} Y_k, where the Y_k = X_{(k+1)∆} − X_{k∆} are independent and identically distributed random variables whose distribution is the same as the distribution of X_∆. Since this can be done for any sampling interval ∆, we see that by specifying a Lévy process one specifies a whole family of random walks S_n(∆).
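The sampling construction can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the process is taken to be a jump diffusion with Gaussian jumps, and the parameter names (`drift`, `sigma`, `jump_rate`, `jump_std`) and their values are invented for the example, not taken from the text.

```python
import math
import random

def poisson_sample(rng, mean):
    """Draw a Poisson(mean) variate by counting unit-rate exponential arrivals in [0, mean]."""
    count, elapsed = 0, rng.expovariate(1.0)
    while elapsed <= mean:
        count += 1
        elapsed += rng.expovariate(1.0)
    return count

def increment(rng, dt, drift, sigma, jump_rate, jump_std):
    """One increment X_{t+dt} - X_t of a jump diffusion with Gaussian jump sizes."""
    gaussian_part = drift * dt + sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0)
    n_jumps = poisson_sample(rng, jump_rate * dt)
    jump_part = sum(rng.gauss(0.0, jump_std) for _ in range(n_jumps))
    return gaussian_part + jump_part

def sampled_walk(rng, n_steps, dt, **params):
    """Return [S_0, ..., S_n] with S_n = X_{n dt}, built from i.i.d. increments Y_k."""
    path = [0.0]
    for _ in range(n_steps):
        path.append(path[-1] + increment(rng, dt, **params))
    return path

rng = random.Random(0)
# Hypothetical parameter values chosen only for the illustration.
params = dict(drift=0.1, sigma=0.2, jump_rate=3.0, jump_std=0.5)
walk = sampled_walk(rng, n_steps=1000, dt=0.01, **params)
```

Changing `dt` while keeping the parameters fixed gives another member of the same family of random walks S_n(∆).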

Choosing n∆ = t, we see that for any t > 0 and any n ≥ 1, X_t = S_n(∆) can be represented as a sum of n independent and identically distributed random variables whose distribution is that of X_{t/n}: X_t can be “divided” into n independent and identically distributed parts. A distribution having this property is said to be infinitely divisible.

A probability distribution F on Rd is said to be infinitely divisible if for any integer n ≥ 2, ∃ n independent and identically distributed random variables Y1, …Yn such that Y1 + … + Yn has distribution F.

Since the distribution of a sum of independent and identically distributed variables is given by the convolution of the distributions of the summands, denoting by μ the distribution of the Y_k, F = μ ∗ μ ∗ ··· ∗ μ is the n-fold convolution of μ. So an infinitely divisible distribution can also be defined as a distribution F for which the nth convolution root is still a probability distribution, for any n ≥ 2.
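The convolution statement can be checked numerically for one familiar case. The sketch below assumes F is Poisson with mean 3 and n = 5 (both values are arbitrary choices for illustration) and verifies that the 5-fold convolution of Poisson(3/5) reproduces F.

```python
import math

def poisson_pmf(mean, k_max):
    """Probabilities P(N = k) for k = 0..k_max, N ~ Poisson(mean)."""
    return [math.exp(-mean) * mean**k / math.factorial(k) for k in range(k_max + 1)]

def convolve(a, b, k_max):
    """Law of the sum of two independent non-negative integer variables, for k = 0..k_max."""
    return [sum(a[j] * b[k - j] for j in range(k + 1)) for k in range(k_max + 1)]

def nth_convolution(mu, n, k_max):
    """n-fold convolution mu * mu * ... * mu."""
    result = mu
    for _ in range(n - 1):
        result = convolve(result, mu, k_max)
    return result

K_MAX = 30
F = poisson_pmf(3.0, K_MAX)            # candidate infinitely divisible law
mu = poisson_pmf(3.0 / 5, K_MAX)       # its 5th "convolution root"
F_from_mu = nth_convolution(mu, 5, K_MAX)
max_error = max(abs(x - y) for x, y in zip(F, F_from_mu))
print(max_error)                       # difference at floating-point noise level
```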


Thus, if X is a Lévy process, for any t > 0 the distribution of Xt is infinitely divisible. This puts a constraint on the possible choices of distributions for Xt: whereas the increments of a discrete-time random walk can have arbitrary distribution, the distribution of increments of a Lévy process has to be infinitely divisible.

The most common examples of infinitely divisible laws are the Gaussian distribution, the gamma distribution, α-stable distributions and the Poisson distribution: a random variable having any of these distributions can be decomposed into a sum of n independent and identically distributed parts having the same distribution but with modified parameters. Conversely, given an infinitely divisible distribution F, it is easy to see that for any n ≥ 1, by chopping it into n independent and identically distributed components, we can construct a random walk model on a time grid with step size 1/n such that the law of the position at t = 1 is given by F. In the limit, this procedure can be used to construct a continuous-time Lévy process (X_t)_{t≥0} such that the law of X_1 is given by F.

To summarize: let (X_t)_{t≥0} be a Lévy process. Then for every t, X_t has an infinitely divisible distribution. Conversely, if F is an infinitely divisible distribution then ∃ a Lévy process (X_t) such that the distribution of X_1 is given by F.
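A minimal sketch of the "chopping" construction, assuming F is a gamma law: a Gamma(shape, scale) variable is the sum of n independent Gamma(shape/n, scale) pieces, so those pieces can serve as steps of a random walk on the grid 0, 1/n, ..., 1. The function name `gamma_walk` and the parameter values are illustrative assumptions.

```python
import random

def gamma_walk(rng, shape, scale, n):
    """Random walk on the grid 0, 1/n, ..., 1 whose position at t = 1 is Gamma(shape, scale)."""
    position, path = 0.0, [0.0]
    for _ in range(n):
        # Each step is an independent Gamma(shape / n, scale) variable.
        position += rng.gammavariate(shape / n, scale)
        path.append(position)
    return path

rng = random.Random(1)
endpoints = [gamma_walk(rng, shape=2.0, scale=1.5, n=10)[-1] for _ in range(5000)]
sample_mean = sum(endpoints) / len(endpoints)
print(sample_mean)   # should be close to shape * scale = 3.0
```

Increasing `n` refines the grid without changing the law of the endpoint, which is the property the limiting Lévy process inherits.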

Probability Space Intertwines Random Walks – Thought of the Day 144.0


Many deliberations of stochasticity start with “let (Ω, F, P) be a probability space”. One can actually follow such discussions without having the slightest idea what Ω is and who lives inside. So, what is “Ω, F, P” and why do we need it? Indeed, for many users of probability and statistics, a random variable X is synonymous with its probability distribution μ_X, and all computations such as sums, expectations, etc., done on random variables amount to analytical operations such as integrations, Fourier transforms, convolutions, etc., done on their distributions. For defining such operations, you do not need a probability space. Isn’t this all there is to it?

One can in fact compute quite a lot of things without using probability spaces in an essential way. However the notions of probability space and random variable are central in modern probability theory so it is important to understand why and when these concepts are relevant.

From a modelling perspective, the starting point is a set of observations taking values in some set E (think for instance of numerical measurement, E = R) for which we would like to build a stochastic model. We would like to represent such observations x1, . . . , xn as samples drawn from a random variable X defined on some probability space (Ω, F, P). It is important to see that the only natural ingredient here is the set E where the random variables will take their values: the set of events Ω is not given a priori and there are many different ways to construct a probability space (Ω, F, P) for modelling the same set of observations.

Sometimes it is natural to identify Ω with E, i.e., to identify the randomness ω with its observed effect. For example if we consider the outcome of a dice rolling experiment as an integer-valued random variable X, we can define the set of events to be precisely the set of possible outcomes: Ω = {1, 2, 3, 4, 5, 6}. In this case, X(ω) = ω: the outcome of the randomness is identified with the randomness itself. This choice of Ω is called the canonical space for the random variable X. In this case the random variable X is simply the identity map X(ω) = ω and the probability measure P is formally the same as the distribution of X. Note that here X is a one-to-one map: given the outcome of X one knows which scenario has happened, so any other random variable Y is completely determined by the observation of X. Therefore, using the canonical construction for the random variable X, we cannot define, on the same probability space, another random variable which is independent of X: X will be the sole source of randomness for all other variables in the model. This also shows that, although the canonical construction is the simplest way to construct a probability space for representing a given random variable, it forces us to identify this particular random variable with the “source of randomness” in the model. Therefore, when we want to deal with models with a sufficiently rich structure, we need to distinguish Ω, the set of scenarios of randomness, from E, the set of values of our random variables.

Let us give an example where it is natural to distinguish the source of randomness from the random variable itself. For instance, if one is modelling the market value of a stock at some date T in the future as a random variable S1, one may consider that the stock value is affected by many factors such as external news, market supply and demand, economic indicators, etc., summed up in some abstract variable ω, which may not even have a numerical representation: it corresponds to a scenario for the future evolution of the market. S1(ω) is then the stock value if the market scenario which occurs is given by ω. If the only interesting quantity in the model is the stock price then one can always label the scenario ω by the value of the stock price S1(ω), which amounts to identifying all scenarios where the stock S1 takes the same value and using the canonical construction. However if one considers a richer model where there are now other stocks S2, S3, . . . involved, it is more natural to distinguish the scenario ω from the random variables S1(ω), S2(ω),… whose values are observed in these scenarios but may not completely pin them down: knowing S1(ω), S2(ω),… one does not necessarily know which scenario has happened. In this way one reserves the possibility of adding more random variables later on without changing the probability space.

These remarks have the following important consequence: the probabilistic description of a random variable X can be reduced to the knowledge of its distribution μ_X only in the case where the random variable X is the only source of randomness. In this case, a stochastic model can be built using a canonical construction for X. In all other cases (as soon as we are concerned with a second random variable which is not a deterministic function of X) the underlying probability measure P contains more information on X than just its distribution. In particular, it contains all the information about the dependence of the random variable X with respect to all other random variables in the model: specifying P means specifying the joint distributions of all random variables constructed on Ω. For instance, knowing the distributions μ_X, μ_Y of two variables X, Y does not allow one to compute their covariance or joint moments. Only in the case where all random variables involved are mutually independent can one reduce all computations to operations on their distributions. This is the case covered in most introductory texts on probability, which explains why one can go quite far, for example in the study of random walks, without formalizing the notion of probability space.
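The point about covariance can be made concrete with a toy example. The sketch below builds two joint laws for a pair (X, Y) of {0, 1}-valued variables; both have the same marginals μ_X and μ_Y but different covariances. The names `independent` and `comonotone` are labels chosen for this illustration.

```python
from itertools import product

def marginals(joint):
    """Marginal laws of X and Y from a joint law given as {(x, y): probability}."""
    px, py = {}, {}
    for (x, y), p in joint.items():
        px[x] = px.get(x, 0.0) + p
        py[y] = py.get(y, 0.0) + p
    return px, py

def covariance(joint):
    """Cov(X, Y) = E[XY] - E[X] E[Y] under the joint law."""
    ex = sum(x * p for (x, y), p in joint.items())
    ey = sum(y * p for (x, y), p in joint.items())
    exy = sum(x * y * p for (x, y), p in joint.items())
    return exy - ex * ey

# Two probability measures P on Omega = {0, 1} x {0, 1}.
independent = {(x, y): 0.25 for x, y in product((0, 1), repeat=2)}
comonotone = {(0, 0): 0.5, (1, 1): 0.5}          # Y = X with probability 1

print(marginals(independent) == marginals(comonotone))   # same marginals
print(covariance(independent), covariance(comonotone))   # different covariances
```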

Gauge Theory of Arbitrage, or Financial Markets Resembling Screening in Electrodynamics


When a mispricing appears in a market, market speculators and arbitrageurs rectify the mistake by obtaining a profit from it. In the case of profitable fluctuations they move into profitable assets, leaving comparably less profitable ones. This affects prices in such a way that all assets of similar risk become equally attractive, i.e. the speculators restore the equilibrium. If this process occurs infinitely rapidly, then the market corrects the mispricing instantly and current prices fully reflect all relevant information. In this case one says that the market is efficient. However, clearly it is an idealization and does not hold for small enough times.

The general picture, sketched above, of the restoration of equilibrium in financial markets resembles screening in electrodynamics. Indeed, in the case of electrodynamics, negative charges move into the region of the positive electric field, positive charges get out of the region and thus screen the field. Comparing this with the financial market we can say that a local virtual arbitrage opportunity with a positive excess return plays a role of the positive electric field, speculators in the long position behave as negative charges, whilst the speculators in the short position behave as positive ones. Movements of positive and negative charges screen out a profitable fluctuation and restore the equilibrium so that there is no arbitrage opportunity any more, i.e. the speculators have eliminated the arbitrage opportunity.

The analogy may appear superficial, but it is not. It emerges naturally in the framework of the Gauge Theory of Arbitrage (GTA). The theory treats the calculation of net present values and asset buying and selling as a parallel transport of money in some curved space, and interprets the interest rate, exchange rates and prices of assets as proper connection components. This structure is exactly equivalent to the geometrical structure underlying electrodynamics, where the components of the vector potential are connection components responsible for the parallel transport of the charges. The components of the corresponding curvature tensors are the electromagnetic field in the case of electrodynamics and the excess rate of return in the case of GTA. The presence of uncertainty is equivalent to the introduction of noise in electrodynamics, i.e. quantization of the theory. It allows one to map the theory of the capital market onto the theory of a quantized gauge field interacting with matter (money flow) fields. The gauge transformations of the matter field correspond to a change of the par value of the asset units, whose effect is eliminated by a gauge tuning of the prices and rates. Free quantum gauge field dynamics (in the absence of money flows) is described by a geometrical random walk for the asset prices with the log-normal probability distribution. In the general case the construction maps the capital market onto Quantum Electrodynamics, where the price walks are affected by money flows.

Electrodynamical model of quasi-efficient financial market

Single Asset Optimal Investment Fraction


We first consider a situation, when an investor can spend a fraction of his capital to buy shares of just one risky asset. The rest of his money he keeps in cash.

Generalizing Kelly, we consider the following simple strategy of the investor: he regularly checks the asset’s current price p(t), and sells or buys some asset shares in order to keep the current market value of his asset holdings a pre-selected fraction r of his total capital. These readjustments are made periodically at a fixed interval, which we refer to as readjustment interval, and select it as the discrete unit of time. In this work the readjustment time interval is selected once and for all, and we do not attempt optimization of its length.

We also assume that on the time-scale of this readjustment interval the asset price p(t) undergoes a geometric Brownian motion:

p(t + 1) = e^{η(t)} p(t)   (1)

i.e. at each time step the random number η(t) is drawn from some probability distribution π(η), and is independent of its value at previous time steps. This exponential notation is particularly convenient for working with multiplicative noise, keeping the necessary algebra to a minimum. Under these rules of dynamics the logarithm of the asset’s price, ln p(t), performs a random walk with an average drift v = ⟨η⟩ and a dispersion D = ⟨η^2⟩ − ⟨η⟩^2.

It is easy to derive the time evolution of the total capital W(t) of an investor, following the above strategy:

W(t + 1) = (1 − r)W(t) + rW(t)e^{η(t)}   (2)

Let us assume that the value of the investor’s capital at t = 0 is W(0) = 1. The evolution of the expectation value of the total capital ⟨W(t)⟩ after t time steps is given by the recursion ⟨W(t + 1)⟩ = (1 − r + r⟨e^η⟩)⟨W(t)⟩. When ⟨e^η⟩ > 1, at first thought the investor should invest all his money in the risky asset. Then the expectation value of his capital would enjoy an exponential growth with the fastest growth rate. However, it would be totally unreasonable to expect that in a typical realization of price fluctuations, the investor would be able to attain the average growth rate determined as v_avg = d ln⟨W(t)⟩/dt. This is because the main contribution to the expectation value ⟨W(t)⟩ comes from exponentially unlikely outcomes, when the price of the asset after a long series of favorable events with η > ⟨η⟩ becomes exponentially big. Such outcomes lie well beyond reasonable fluctuations of W(t), determined (for r = 1) by the standard deviation √(Dt) of ln W(t) around its average value ⟨ln W(t)⟩ = ⟨η⟩t. For the investor who deals with just one realization of the multiplicative process it is better not to rely on such unlikely events, and to maximize his gain in a typical outcome of the process.

To quantify the intuitively clear concept of a typical value of a random variable x, we define x_typ as the median of its distribution, i.e. x_typ has the property that Prob(x > x_typ) = Prob(x < x_typ) = 1/2. In a multiplicative process (2) with r = 1, W(t + 1) = e^{η(t)} W(t), one can show that W_typ(t), the typical value of W(t), grows exponentially in time: W_typ(t) = e^{⟨η⟩t} at a rate v_typ = ⟨η⟩, while the expectation value ⟨W(t)⟩ also grows exponentially as ⟨W(t)⟩ = ⟨e^η⟩^t, but at a faster rate given by v_avg = ln⟨e^η⟩. Notice that ⟨ln W(t)⟩ always grows with the typical growth rate, since those very rare outcomes when W(t) is exponentially big do not make a significant contribution to this average.
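The gap between the average and the typical value can be computed exactly for a simple two-outcome process. The sketch below assumes r = 1 and a price factor e^η that equals `UP` or `DOWN` with equal probability; the two factors are hypothetical values picked so that the mean grows while the median decays.

```python
import math

def capital_distribution(up, down, steps):
    """Exact law of W(T) = up**k * down**(T - k), with k ~ Binomial(T, 1/2) and r = 1."""
    return [(up**k * down**(steps - k), math.comb(steps, k) / 2**steps)
            for k in range(steps + 1)]

def mean(dist):
    """Expectation value <W(T)>."""
    return sum(w * p for w, p in dist)

def median(dist):
    """Smallest value w such that P(W <= w) >= 1/2."""
    total = 0.0
    for w, p in sorted(dist):
        total += p
        if total >= 0.5:
            return w

UP, DOWN, T = 1.5, 0.6, 100   # illustrative factors, not taken from the text
dist = capital_distribution(UP, DOWN, T)
print(mean(dist), ((UP + DOWN) / 2) ** T)        # <W(T)> = <e^eta>^T, grows
print(median(dist), math.sqrt(UP * DOWN) ** T)   # W_typ(T) = e^{<eta> T}, decays
```

Here ⟨e^η⟩ = 1.05 > 1 while ⟨η⟩ = ln(0.9)/2 < 0, so the expectation value is driven entirely by the rare paths with many more up-moves than down-moves.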

The question we are going to address is: which investment fraction r provides the investor with the best typical growth rate v_typ of his capital? Kelly has answered this question for a particular realization of the multiplicative stochastic process, where the capital is multiplied by 2 with probability q > 1/2, and by 0 with probability p = 1 − q. This case is realized in a gambling game, where betting on the right outcome pays 2:1, while you know the right outcome with probability q > 1/2. In our notation this case corresponds to η being equal to ln 2 with probability q and −∞ otherwise. The player’s capital in Kelly’s model with r = 1 enjoys the growth of the expectation value ⟨W(t)⟩ at a rate v_avg = ln(2q) > 0. In this case it is however particularly clear that one should not use maximization of the expectation value of the capital as the optimum criterion. If the player indeed bets all of his capital at every time step, sooner or later he will lose everything and will not be able to continue to play. In other words, r = 1 corresponds to the worst typical growth of the capital: asymptotically the player will be bankrupt with probability 1. In this example it is also very transparent where the positive average growth rate comes from: after T rounds of the game, in a very unlikely (Prob = q^T) event that the capital was multiplied by 2 at all times (the gambler guessed right all the time!), the capital is equal to 2^T. This exponentially large value of the capital outweighs the exponentially small probability of this event, and gives rise to an exponentially growing average. This would offer condolence to a gambler who lost everything.

We generalize Kelly’s arguments for an arbitrary distribution π(η). As we will see, this generalization reveals some hidden results, not realized in Kelly’s “betting” game. As we learned above, the growth of the typical value of W(t) is given by the drift of ⟨ln W(t)⟩ = v_typ t, which in our case can be written as

v_typ(r) = ∫ dη π(η) ln(1 + r(e^η − 1))   (3)

One can check that v_typ(0) = 0, since in this case the whole capital is in the form of cash and does not change in time. In the other limit one has v_typ(1) = ⟨η⟩, since in this case the whole capital is invested in the asset and enjoys its typical growth rate (⟨η⟩ = −∞ for Kelly’s case). Can one do better by selecting 0 < r < 1? To find the maximum of v_typ(r) one differentiates (3) with respect to r and looks for a solution of the resulting equation 0 = v′_typ(r) = ∫ dη π(η) (e^η − 1)/(1 + r(e^η − 1)) in the interval 0 ≤ r ≤ 1. If such a solution exists, it is unique since v′′_typ(r) = −∫ dη π(η) (e^η − 1)^2 / (1 + r(e^η − 1))^2 < 0 everywhere. The values of v′_typ(r) at 0 and 1 are given by v′_typ(0) = ⟨e^η⟩ − 1 and v′_typ(1) = 1 − ⟨e^{−η}⟩. One has to consider three possibilities:

(1) ⟨e^η⟩ < 1. In this case v′_typ(0) < 0, and since v′′_typ(r) < 0 the maximum of v_typ(r) is realized at r = 0 and is equal to 0. In other words, one should never invest in an asset with negative average return per capital ⟨e^η⟩ − 1 < 0.

(2) ⟨e^η⟩ > 1 and ⟨e^{−η}⟩ > 1. In this case v′_typ(0) > 0 but v′_typ(1) < 0, and the maximum of v_typ(r) is realized at some 0 < r < 1, which is the unique solution of v′_typ(r) = 0. The typical growth rate in this case is always positive (because you could have always selected r = 0 to make it zero), but not as big as the average rate ln⟨e^η⟩, which serves as an unattainable ideal limit. An intuitive understanding of why one should select r < 1 in this case comes from the following observation: the condition ⟨e^{−η}⟩ > 1 makes ⟨1/p(t)⟩ grow exponentially in time. Such an exponential growth indicates that outcomes with very small p(t) are feasible and give the dominant contribution to ⟨1/p(t)⟩. This is an indicator that the asset price is unstable and one should not trust his whole capital to such a risky investment.

(3) ⟨e^η⟩ > 1 and ⟨e^{−η}⟩ < 1. This is a safe asset and one can invest his whole capital in it. The maximum of v_typ(r) is achieved at r = 1 and is equal to v_typ(1) = ⟨η⟩. A simple example close to this type of asset is one in which the price p(t) is multiplied, with equal probabilities, by 2 or by a = 2/3. As one can see, this is the marginal case in which ⟨1/p(t)⟩ = const. For a < 2/3 one should invest only a fraction r < 1 of his capital in the asset, while for a ≥ 2/3 the whole sum could be trusted to it. The special nature of the case a = 2/3 cannot be guessed by just looking at the typical and average growth rates of the asset! One has to go and calculate ⟨e^{−η}⟩ to check if ⟨1/p(t)⟩ diverges. This “reliable” type of asset is a new feature of the model with a general π(η). It is never realized in Kelly’s original model, which always has ⟨η⟩ = −∞, so that it never makes sense to gamble the whole capital every time.
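The three cases can be turned into a small numerical routine. The following is a sketch, assuming a discrete distribution given as a list of (probability, price factor) pairs with price factor e^η; the function names and the bisection tolerance are choices made for this illustration, not part of the original argument.

```python
import math

def v_typ(r, outcomes):
    """Typical growth rate, eq. (3), for a discrete distribution.

    outcomes: list of (probability, price_factor) pairs, price_factor = e^eta.
    """
    return sum(p * math.log(1 + r * (f - 1)) for p, f in outcomes)

def v_typ_prime(r, outcomes):
    """Derivative of v_typ with respect to r."""
    return sum(p * (f - 1) / (1 + r * (f - 1)) for p, f in outcomes)

def optimal_fraction(outcomes, tol=1e-12):
    """Maximise v_typ on [0, 1] following the three cases in the text."""
    mean_factor = sum(p * f for p, f in outcomes)            # <e^eta>
    if mean_factor <= 1:
        return 0.0                                            # case (1)
    if all(f > 0 for _, f in outcomes):
        mean_inverse = sum(p / f for p, f in outcomes)        # <e^-eta>
        if mean_inverse <= 1:
            return 1.0                                        # case (3)
    lo, hi = 0.0, 1.0                                         # case (2): bisect v' = 0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if v_typ_prime(mid, outcomes) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# Kelly's game with q = 0.6: capital doubles with probability q, is lost otherwise.
kelly = [(0.6, 2.0), (0.4, 0.0)]
print(optimal_fraction(kelly))   # close to 2q - 1 = 0.2
```

Because v_typ is concave in r, the sign of its derivative is enough to locate the maximum, which is why plain bisection suffices in case (2).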

An interesting and somewhat counterintuitive consequence of the above results is that under certain conditions one can make his capital grow by investing in an asset with a negative typical growth rate ⟨η⟩ < 0. Such an asset certainly loses value, and its typical price experiences an exponential decay. Any investor bold enough to trust his whole capital to such an asset is losing money at the same rate. But as long as the fluctuations are strong enough to maintain a positive average return per capital (⟨e^η⟩ − 1 > 0), one can maintain a certain fraction of his total capital invested in this asset and almost certainly make money! A simple example of such a mind-boggling situation is given by a random multiplicative process in which the price of the asset, with equal probabilities, is doubled (goes up by 100%) or divided by 3 (goes down by 66.7%). The typical price of this asset drifts down by 18% each time step. Indeed, after T time steps one could reasonably expect the price of this asset to be p_typ(T) = 2^{T/2} 3^{−T/2} = (√(2/3))^T ≃ 0.82^T. On the other hand, the average ⟨p(t)⟩ enjoys a 17% growth: ⟨p(t + 1)⟩ = (7/6)⟨p(t)⟩ ≃ 1.17⟨p(t)⟩. As one can easily see, the optimum of the typical growth rate is achieved by maintaining a fraction r = 1/4 of the capital invested in this asset. The typical growth factor per time step in this case is a meager √(25/24) ≃ 1.02, meaning that in the long run one almost certainly gets a 2% return per time step, but it is certainly better than losing 18% by investing the whole capital in this asset.
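For readers who want to see where r = 1/4 comes from, here is the calculation spelled out, using eq. (3) with e^η equal to 2 or 1/3 with probability 1/2 each:

```latex
\begin{align*}
v_{\mathrm{typ}}(r) &= \tfrac12 \ln(1 + r) + \tfrac12 \ln\!\left(1 - \tfrac{2r}{3}\right), \\
v'_{\mathrm{typ}}(r) &= \frac{1}{2}\,\frac{1}{1 + r} - \frac{1}{2}\,\frac{2/3}{1 - 2r/3} = 0
  \;\Longrightarrow\; 1 - \tfrac{2r}{3} = \tfrac23 (1 + r)
  \;\Longrightarrow\; r = \tfrac14, \\
e^{v_{\mathrm{typ}}(1/4)} &= \sqrt{\tfrac54 \cdot \tfrac56} = \sqrt{\tfrac{25}{24}} \approx 1.02 .
\end{align*}
```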

Of course the properties of a typical realization of a random multiplicative process are not fully characterized by the drift v_typ(r)t in the position of the center of mass of P(h, t), where h(t) = ln W(t) is the logarithm of the wealth of the investor. Indeed, asymptotically P(h, t) has a Gaussian shape P(h, t) = 1/√(2πD(r)t) · exp(−(h − v_typ(r)t)^2 / (2D(r)t)), where v_typ(r) is given by eq. (3). One needs to know the dispersion D(r) to estimate √(D(r)t), which is the magnitude of characteristic deviations of h(t) away from its typical value h_typ(t) = v_typ t. At the infinite time horizon t → ∞, the process with the biggest v_typ(r) will certainly be preferable over any other process. This is because the separation between typical values of h(t) for two different investment fractions r grows linearly in time, while the span of typical fluctuations grows only as √t. However, at a finite time horizon the investor should take into account both v_typ(r) and D(r) and decide what he prefers: moderate growth with small fluctuations or faster growth with still bigger fluctuations. To quantify this decision one needs to introduce an investor’s “utility function”, which we will not attempt in this work. The most conservative players are advised to always keep their capital in cash, since with any other arrangement the fluctuations will certainly be bigger. As a rule one can show that the dispersion D(r) = ∫ π(η) ln^2[1 + r(e^η − 1)] dη − v_typ^2(r) monotonically increases with r. Therefore, among two solutions with equal v_typ(r) one should always select the one with the smaller r, since it would guarantee smaller fluctuations.

Here it is more convenient to switch to the standard notation. It is customary to use the random variable

Λ(t) = (p(t + 1) − p(t))/p(t) = e^{η(t)} − 1   (4)

which is referred to as the return per unit capital of the asset. The properties of a random multiplicative process are expressed in terms of the average return per capital α = ⟨Λ⟩ = ⟨e^η⟩ − 1, and the volatility (standard deviation) of the return per capital σ = √(⟨Λ^2⟩ − ⟨Λ⟩^2). In our notation, α = ⟨e^η⟩ − 1 is determined by the average and not the typical growth rate of the process. For η ≪ 1, α ≃ v + D/2 + v^2/2, while the volatility σ is related to D (the dispersion of η) through σ ≃ √D.
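The small-η relations follow from expanding the exponential to second order; a short derivation, using only the definitions v = ⟨η⟩ and D = ⟨η^2⟩ − ⟨η⟩^2 given earlier:

```latex
\begin{align*}
\alpha &= \langle e^{\eta} \rangle - 1
  \simeq \langle \eta \rangle + \tfrac12 \langle \eta^2 \rangle
  = v + \tfrac12 \left( D + v^2 \right), \\
\sigma^2 &= \langle e^{2\eta} \rangle - \langle e^{\eta} \rangle^2
  \simeq \left( 1 + 2v + 2\langle \eta^2 \rangle \right)
       - \left( 1 + 2v + v^2 + \langle \eta^2 \rangle \right)
  = \langle \eta^2 \rangle - v^2 = D .
\end{align*}
```

Terms of third order and higher in η are dropped in both lines.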

 

1 + 2 + 3 + … = -1/12. ✓✓✓

The Bernoulli numbers B_n are a sequence of signed rational numbers that can be defined by the exponential generating function

\frac{x}{e^x - 1} = \sum_{n=0}^\infty \frac{B_n x^n}{n!}

These numbers arise in the series expansions of trigonometric functions, and are extremely important in number theory and analysis.

The Bernoulli number B_n can also be defined by the contour integral

B_n = \frac{n!}{2\pi i} \oint \frac{z}{e^z - 1} \frac{dz}{z^{n+1}}

where the contour encloses the origin, has radius less than 2π (to avoid the poles at ±2πi), and is traversed in a counterclockwise direction.

[Figure: number of digits in the numerator and in the denominator of B_n]

The numbers of digits in the numerator of B_n for n = 2, 4, … are 1, 1, 1, 1, 1, 3, 1, 4, 5, 6, 6, 9, 7, 11, …, while the numbers of digits in the corresponding denominators are 1, 2, 2, 2, 2, 4, 1, 3, 3, 3, 3, 4, 1, 3, 5, 3, …. Both of these are plotted above.

The denominator of B_(2n) is given by

\mathrm{denom}(B_{2n}) = \prod_{(p-1) \mid 2n} p

where the product is taken over the primes p, a result which is related to the von Staudt-Clausen theorem.
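The denominator rule is easy to test with exact rational arithmetic. The sketch below computes Bernoulli numbers from the standard recurrence Σ_{k=0}^{m} C(m+1, k) B_k = 0 (with the convention B_1 = −1/2, which is an assumption about the convention in use) and compares each denominator with the product over primes p such that p − 1 divides 2n.

```python
from fractions import Fraction
from math import comb

def bernoulli_numbers(n_max):
    """B_0..B_n_max (B_1 = -1/2) from sum_{k=0}^{m} C(m+1, k) B_k = 0 for m >= 1."""
    B = [Fraction(1)]
    for m in range(1, n_max + 1):
        B.append(-sum(comb(m + 1, k) * B[k] for k in range(m)) / (m + 1))
    return B

def is_prime(p):
    return p >= 2 and all(p % d for d in range(2, int(p**0.5) + 1))

def staudt_clausen_denominator(two_n):
    """Product of the primes p such that (p - 1) divides 2n."""
    result = 1
    for p in range(2, two_n + 2):
        if is_prime(p) and two_n % (p - 1) == 0:
            result *= p
    return result

B = bernoulli_numbers(32)
matches = all(B[k].denominator == staudt_clausen_denominator(k)
              for k in range(2, 33, 2))
numerator_digits = [len(str(abs(B[k].numerator))) for k in range(2, 30, 2)]
print(matches, numerator_digits)
```

The `numerator_digits` list can be compared with the digit counts quoted above.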

In 1859 Riemann published a paper giving an explicit formula for the number of primes up to any preassigned limit, a decided improvement over the approximate value given by the prime number theorem. However, Riemann’s formula depended on knowing the values at which a generalized version of the zeta function equals zero. (The Riemann zeta function is defined for all complex numbers, i.e. numbers of the form x + iy where i = √(−1), except for the point x + iy = 1.) Riemann knew that the function equals zero for all negative even integers −2, −4, −6, … (so-called trivial zeros), and that it has an infinite number of zeros in the critical strip of complex numbers between the lines x = 0 and x = 1, and he also knew that all nontrivial zeros are symmetric with respect to the critical line x = 1/2. Riemann conjectured that all of the nontrivial zeros are on the critical line, a conjecture that subsequently became known as the Riemann hypothesis. In 1900 the German mathematician David Hilbert called the Riemann hypothesis one of the most important questions in all of mathematics, as indicated by its inclusion in his influential list of 23 unsolved problems with which he challenged 20th-century mathematicians. In 1915 the English mathematician Godfrey Hardy proved that an infinite number of zeros occur on the critical line, and by 1986 the first 1,500,000,001 nontrivial zeros were all shown to be on the critical line. Although the hypothesis may yet turn out to be false, investigations of this difficult problem have enriched the understanding of complex numbers.

Suppose you want to put a probability distribution on the natural numbers for the purpose of doing number theory. What properties might you want such a distribution to have? Well, if you’re doing number theory then you want to think of the prime numbers as acting “independently”: knowing that a number is divisible by p should give you no information about whether it’s divisible by q.

That quickly leads you to the following realization: you should choose the exponent of each prime in the prime factorization independently. So how should you choose these? It turns out that the probability distribution on the non-negative integers with maximum entropy and a given mean is a geometric distribution. So let’s take the probability that the exponent of p is k to be equal to (1 − r_p) r_p^k for some constant r_p.

This gives the probability that a positive integer n = p_1^{e_1} … p_k^{e_k} occurs as

C ∏_{i=1}^{k} r_{p_i}^{e_i},

where

C = ∏_p (1 − r_p).

So we need to choose the r_p such that this product converges. Now, we’d like the probability that n occurs to be monotonically decreasing as a function of n. It turns out that this is true iff r_p = p^{−s} for some s > 1 (since C has to converge), which gives the probability that n occurs as

n^{−s} / ζ(s),

where ζ(s) is the zeta function.
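The independence property and the value of the constant C can be checked numerically. This sketch assumes s = 2 and truncates all sums at `N_MAX` and the product at primes below 2000; both cut-offs are arbitrary choices, so the agreement is only approximate.

```python
S = 2.0
N_MAX = 200_000

# weights[n] = n^-s for n >= 1; index 0 is a placeholder so indices match integers.
weights = [0.0] + [n ** -S for n in range(1, N_MAX + 1)]
zeta_s = sum(weights)                          # truncated zeta(s)

def prob_divisible(d):
    """P(d divides n) when P(n) = n^-s / zeta(s), using the truncated sums."""
    return sum(weights[d::d]) / zeta_s

def primes_up_to(limit):
    """Sieve of Eratosthenes."""
    sieve = [True] * (limit + 1)
    sieve[0:2] = [False, False]
    for i in range(2, int(limit ** 0.5) + 1):
        if sieve[i]:
            sieve[i * i :: i] = [False] * len(sieve[i * i :: i])
    return [i for i, flag in enumerate(sieve) if flag]

# C = prod_p (1 - p^-s) should approach 1 / zeta(s).
euler_product = 1.0
for p in primes_up_to(2000):
    euler_product *= 1.0 - p ** -S

print(prob_divisible(2), prob_divisible(3), prob_divisible(6))
print(euler_product, 1.0 / zeta_s)
```

Divisibility by 2 and by 3 come out independent, since P(6 | n) matches P(2 | n) · P(3 | n) up to truncation error.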

The Riemann zeta function is a complex function that tells us many things about the theory of numbers. Its mystery is increased by the fact that it has no closed form, i.e. it can’t be expressed as a single formula that contains other standard (elementary) functions.

[Figure: the “ridges” of |ζ(x + iy)|]

The plot above shows the “ridges” of |ζ(x + iy)| for 0 < x < 1 and 0 < y < 100. The fact that the ridges appear to decrease monotonically for 0 ≤ x ≤ 1/2 is not a coincidence, since it turns out that monotonic decrease implies the Riemann hypothesis.

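On the real line with x > 1, the Riemann zeta function can be defined by the integral

\zeta(x) = \frac{1}{\Gamma(x)} \int_0^\infty \frac{u^{x-1}}{e^u - 1} \, du

where Γ(x) is the gamma function. If x is an integer n, then we have the identity

\frac{u^{n-1}}{e^u - 1} = \frac{e^{-u} u^{n-1}}{1 - e^{-u}} = e^{-u} u^{n-1} \sum_{k=0}^\infty e^{-ku} = \sum_{k=1}^\infty e^{-ku} u^{n-1}

so,

\int_0^\infty \frac{u^{n-1}}{e^u - 1} \, du = \sum_{k=1}^\infty \int_0^\infty e^{-ku} u^{n-1} \, du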

The Riemann zeta function can also be defined in the complex plane by the contour integral

\zeta(z) = \frac{\Gamma(1 - z)}{2\pi i} \oint_\gamma \frac{u^{z-1}}{e^{-u} - 1} \, du

for all z ≠ 1, where the contour is illustrated below

[Figure: the integration contour γ]

Zeros of ζ(s) come in (at least) two different types. So-called “trivial zeros” occur at all negative even integers s = −2, −4, −6, …, and “nontrivial zeros” at certain

s = \sigma + it

for s in the “critical strip” 0 < σ < 1. The Riemann hypothesis asserts that the nontrivial zeros of ζ(s) all have real part σ = Re[s] = 1/2, a line called the “critical line.” This is now known to be true for the first 250×10^9 roots.

[Figure: real and imaginary parts of ζ(1/2 + iy)]

The plot above shows the real and imaginary parts of ζ(1/2 + iy) (i.e., values of ζ(z) along the critical line) as y is varied from 0 to 35.

Now consider John Cook’s take on sums of the form

S_p(n) = \sum_{k=1}^n k^p

where p is a positive integer. Here we look at what happens when p becomes a negative integer and we let n go to infinity.

If p < -1, then the limit as n goes to infinity of S_p(n) is ζ(-p). That is, for s > 1, the Riemann zeta function ζ(s) is defined by

\zeta(s) = \sum_{n=1}^\infty \frac{1}{n^s}

We don’t have to limit ourselves to real numbers s > 1; the definition holds for complex numbers s with real part greater than 1. That’ll be important below.

When s is a positive even number, there’s a formula for ζ(s) in terms of the Bernoulli numbers:

\zeta(2n) = (-1)^{n-1} 2^{2n-1} \pi^{2n} \frac{B_{2n}}{(2n)!}

The best-known special case of this formula is that

1 + 1/4 + 1/9 + 1/16 + … = π^2 / 6.

It’s a famous open problem to find a closed-form expression for ζ(3) or any other odd argument.

The formula relating the zeta function and the Bernoulli numbers tells us a couple of things about the Bernoulli numbers. First, for n ≥ 1 the Bernoulli numbers with index 2n alternate in sign. Second, by looking at the sum defining ζ(2n) we can see that it is approximately 1 for large n. This tells us that for large n, |B_{2n}| is approximately (2n)! / (2^{2n−1} π^{2n}).
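Both observations can be checked with a short script. The sketch below recomputes the Bernoulli numbers exactly (convention B_1 = −1/2), evaluates ζ(2n) from the formula above, compares it with a truncated sum, and tests the large-n estimate; the helper names and the truncation at 100,000 terms are illustrative choices.

```python
from fractions import Fraction
from math import comb, factorial, pi

def bernoulli_numbers(n_max):
    """B_0..B_n_max (B_1 = -1/2) from sum_{k=0}^{m} C(m+1, k) B_k = 0 for m >= 1."""
    B = [Fraction(1)]
    for m in range(1, n_max + 1):
        B.append(-sum(comb(m + 1, k) * B[k] for k in range(m)) / (m + 1))
    return B

def zeta_even(n, B):
    """zeta(2n) from the Bernoulli-number formula quoted above."""
    return ((-1) ** (n - 1) * 2 ** (2 * n - 1) * pi ** (2 * n)
            * float(B[2 * n]) / factorial(2 * n))

def zeta_partial_sum(s, terms=100_000):
    """Truncated Dirichlet series for zeta(s), valid for s > 1."""
    return sum(k ** -s for k in range(1, terms + 1))

def bernoulli_estimate(n):
    """Large-n approximation |B_2n| ~ (2n)! / (2^(2n-1) pi^(2n))."""
    return factorial(2 * n) / (2 ** (2 * n - 1) * pi ** (2 * n))

B = bernoulli_numbers(30)
print(zeta_even(1, B), zeta_partial_sum(2))           # both near pi^2 / 6
print(abs(float(B[30])) / bernoulli_estimate(15))     # ratio equals zeta(30), near 1
```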

We said above that the sum defining the Riemann zeta function is valid for complex numbers s with real part greater than 1. There is a unique analytic extension of the zeta function to the rest of the complex plane, except at s = 1. The zeta function is defined, for example, at negative integers, but the sum defining zeta in the half-plane Re(s) > 1 is not valid.

You may have seen the equation

1 + 2 + 3 + … = -1/12.

This is an abuse of notation. The sum on the left clearly diverges to infinity. But if the sum defining ζ(s) for Re(s) > 1 were valid for s = -1 (which it is not) then the left side would equal ζ(-1). The analytic continuation of ζ is valid at -1, and in fact ζ(-1) = -1/12. So the equation above is true if you interpret the left side, not as an ordinary sum, but as a way of writing ζ(-1). The same approach could be used to make sense of similar equations such as

1^2 + 2^2 + 3^2 + … = 0

and

1^3 + 2^3 + 3^3 + … = 1/120.
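These three values can be reproduced from the Bernoulli numbers. The sketch below relies on the standard identity ζ(−n) = −B_{n+1}/(n + 1) for integers n ≥ 1 (with B_1 = −1/2), which is not stated in the text above and is brought in here as an assumption; the function name is likewise illustrative.

```python
from fractions import Fraction
from math import comb

def bernoulli_numbers(n_max):
    """B_0..B_n_max (B_1 = -1/2) from sum_{k=0}^{m} C(m+1, k) B_k = 0 for m >= 1."""
    B = [Fraction(1)]
    for m in range(1, n_max + 1):
        B.append(-sum(comb(m + 1, k) * B[k] for k in range(m)) / (m + 1))
    return B

def zeta_at_negative_integer(n):
    """zeta(-n) for n >= 1, via the identity zeta(-n) = -B_{n+1} / (n + 1)."""
    B = bernoulli_numbers(n + 1)
    return -B[n + 1] / (n + 1)

for n in (1, 2, 3):
    print(n, zeta_at_negative_integer(n))
```

The printed values are the analytically continued ζ(−1), ζ(−2) and ζ(−3), not limits of the divergent partial sums.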