Knowledge Limited for Dummies… Didactics.


Bertrand Russell, with Alfred North Whitehead, aimed in Principia Mathematica to demonstrate that “all pure mathematics follows from purely logical premises and uses only concepts defined in logical terms.” Its goal was to provide a formalized logic for all of mathematics, to develop the full structure of mathematics so that every proposition could be proved from a clear set of initial axioms.

Russell observed of the dense and demanding work, “I used to know of only six people who had read the later parts of the book. Three of those were Poles, subsequently (I believe) liquidated by Hitler. The other three were Texans, subsequently successfully assimilated.” The complex mathematical symbols of the manuscript required it to be written by hand, and its sheer size – when it was finally ready for the publisher, Russell had to hire a panel truck to send it off – made it impossible to copy. Russell recounted that “every time that I went out for a walk I used to be afraid that the house would catch fire and the manuscript get burnt up.”

Momentous though it was, the greatest achievement of Principia Mathematica was realized two decades after its completion when it provided the fodder for the metamathematical enterprises of an Austrian, Kurt Gödel. Although Gödel did face the risk of being liquidated by Hitler (and therefore fled to the Institute for Advanced Study in Princeton), he was neither a Pole nor a Texan. In 1931, he wrote a treatise entitled On Formally Undecidable Propositions of Principia Mathematica and Related Systems, which demonstrated that the goal Russell and Whitehead had so single-mindedly pursued was unattainable.

The flavor of Gödel’s basic argument can be captured in the contradictions contained in a schoolboy’s brainteaser. A sheet of paper has the words “The statement on the other side of this paper is true” written on one side and “The statement on the other side of this paper is false” on the reverse. The conflict isn’t resolvable. Or, even more trivially, a statement like: “This statement is unprovable.” You cannot prove the statement is true, because doing so would contradict it. If you prove the statement is false, then that means its converse is true – it is provable – which again is a contradiction.

The key point of contradiction for these two examples is that they are self-referential. This same sort of self-referentiality is the keystone of Gödel’s proof, where he uses statements that embed other statements within them. This problem did not totally escape Russell and Whitehead. By the end of 1901, Russell had completed the first round of writing Principia Mathematica and thought he was in the homestretch, but was increasingly beset by these sorts of apparently simple-minded contradictions falling in the path of his goal. He wrote that “it seemed unworthy of a grown man to spend his time on such trivialities, but . . . trivial or not, the matter was a challenge.” Attempts to address the challenge extended the development of Principia Mathematica by nearly a decade.

Yet Russell and Whitehead had, after all that effort, missed the central point. Like granite outcroppings piercing through a bed of moss, these apparently trivial contradictions were rooted in the core of mathematics and logic, and were only the most readily manifest examples of a limit to our ability to structure formal mathematical systems. Just four years before Gödel had defined the limits of our ability to conquer the intellectual world of mathematics and logic with the publication of his Undecidability Theorem, the German physicist Werner Heisenberg’s celebrated Uncertainty Principle had delineated the limits of inquiry into the physical world, thereby undoing the efforts of another celebrated intellect, the great mathematician Pierre-Simon Laplace. In the early 1800s Laplace had worked extensively to demonstrate the purely mechanical and predictable nature of planetary motion. He later extended this theory to the interaction of molecules. In the Laplacean view, molecules are just as subject to the laws of physical mechanics as the planets are. In theory, if we knew the position and velocity of each molecule, we could trace its path as it interacted with other molecules, and trace the course of the physical universe at the most fundamental level. Laplace envisioned a world of ever more precise prediction, where the laws of physical mechanics would be able to forecast nature in increasing detail and ever further into the future, a world where “the phenomena of nature can be reduced in the last analysis to actions at a distance between molecule and molecule.”

What Gödel did to the work of Russell and Whitehead, Heisenberg did to Laplace’s concept of causality. The Uncertainty Principle, though broadly applied and draped in metaphysical context, is a well-defined and elegantly simple statement of physical reality – namely, that the precision with which an electron’s position and momentum can be simultaneously measured is bounded: the product of the two uncertainties cannot fall below a fixed value. The reason for this, viewed from the standpoint of classical physics, is that accurately measuring the position of an electron requires illuminating the electron with light of a very short wavelength. But the shorter the wavelength the greater the amount of energy that hits the electron, and the greater the energy hitting the electron the greater the impact on its velocity.

What is true in the subatomic sphere ends up being true – though with rapidly diminishing significance – for the macroscopic. Nothing can be measured with complete precision as to both location and velocity because the act of measuring alters the physical properties. The idea that if we know the present we can calculate the future was proven invalid – not because of a shortcoming in our knowledge of mechanics, but because the premise that we can perfectly know the present was proven wrong. These limits to measurement imply limits to prediction. After all, if we cannot know even the present with complete certainty, we cannot unfailingly predict the future. It was with this in mind that Heisenberg, ecstatic about his yet-to-be-published paper, exclaimed, “I think I have refuted the law of causality.”

The epistemological extrapolation of Heisenberg’s work was that the root of the problem was man – or, more precisely, man’s examination of nature, which inevitably impacts the natural phenomena under examination so that the phenomena cannot be objectively understood. Heisenberg’s principle was not something that was inherent in nature; it came from man’s examination of nature, from man becoming part of the experiment. (So in a way the Uncertainty Principle, like Gödel’s Undecidability Proposition, rested on self-referentiality.) While it did not directly refute Einstein’s assertion against the statistical nature of the predictions of quantum mechanics that “God does not play dice with the universe,” it did show that if there were a law of causality in nature, no one but God would ever be able to apply it. The implications of Heisenberg’s Uncertainty Principle were recognized immediately, and it became a simple metaphor reaching beyond quantum mechanics to the broader world.

This metaphor extends neatly into the world of financial markets. In the purely mechanistic universe of classical physics, we could apply Newtonian laws to project the future course of nature, if only we knew the location and velocity of every particle. In the world of finance, the elementary particles are the financial assets. In a purely mechanistic financial world, if we knew the position each investor has in each asset and the ability and willingness of liquidity providers to take on those assets in the event of a forced liquidation, we would be able to understand the market’s vulnerability. We would have an early-warning system for crises. We would know which firms are subject to a liquidity cycle, and which events might trigger that cycle. We would know which markets are being overrun by speculative traders, and thereby anticipate tactical correlations and shifts in the financial habitat. The randomness of nature and economic cycles might remain beyond our grasp, but the primary cause of market crisis, and the part of market crisis that is of our own making, would be firmly in hand.

The first step toward the Laplacean goal of complete knowledge is the advocacy by certain financial market regulators to increase the transparency of positions. Politically, that would be a difficult sell – as would any kind of increase in regulatory control. Practically, it wouldn’t work. Just as the atomic world turned out to be more complex than Laplace conceived, the financial world may be similarly complex and not reducible to a simple causality. The problems with position disclosure are many. Some financial instruments are complex and difficult to price, so it is impossible to measure precisely the risk exposure. Similarly, in hedge positions a slight error in the transmission of one part, or asynchronous pricing of the various legs of the strategy, will grossly misstate the total exposure. Indeed, the problems and inaccuracies in using position information to assess risk are exemplified by the fact that major investment banking firms choose to use summary statistics rather than position-by-position analysis for their firmwide risk management despite having enormous resources and computational power at their disposal.

Perhaps more importantly, position transparency also has implications for the efficient functioning of the financial markets beyond the practical problems involved in its implementation. The problems in the examination of elementary particles in the financial world are the same as in the physical world: Beyond the inherent randomness and complexity of the systems, there are simply limits to what we can know. To say that we do not know something is as much a challenge as it is a statement of the state of our knowledge. If we do not know something, that presumes that either it is not worth knowing or it is something that will be studied and eventually revealed. It is the hubris of man that all things are discoverable. But for all the progress that has been made, perhaps even more exciting than the rolling back of the boundaries of our knowledge is the identification of realms that can never be explored. A sign in Einstein’s Princeton office read, “Not everything that counts can be counted, and not everything that can be counted counts.”

The behavioral analogue to the Uncertainty Principle is obvious. There are many psychological inhibitions that lead people to behave differently when they are observed than when they are not. For traders it is a simple matter of dollars and cents that will lead them to behave differently when their trades are open to scrutiny. Beneficial though it may be for the liquidity demander and the investor, for the liquidity supplier transparency is bad. The liquidity supplier does not intend to hold the position for a long time, like the typical liquidity demander might. Like a market maker, the liquidity supplier will come back to the market to sell off the position – ideally when there is another investor who needs liquidity on the other side of the market. If other traders know the liquidity supplier’s positions, they will logically infer that there is a good likelihood these positions shortly will be put into the market. The other traders will be loath to be the first ones on the other side of these trades, or will demand more of a price concession if they do trade, knowing the overhang that remains in the market.

This means that increased transparency will reduce the amount of liquidity provided for any given change in prices. This is by no means a hypothetical argument. Frequently, even in the most liquid markets, broker-dealer market makers (liquidity providers) use brokers to enter their market bids rather than entering the market directly in order to preserve their anonymity.

The more information we extract to divine the behavior of traders and the resulting implications for the markets, the more the traders will alter their behavior. The paradox is that to understand and anticipate market crises, we must know positions, but knowing and acting on positions will itself generate a feedback into the market. This feedback often will reduce liquidity, making our observations less valuable and possibly contributing to a market crisis. Or, in rare instances, the observer/feedback loop could be manipulated to amass fortunes.

One might argue that the physical limits of knowledge asserted by Heisenberg’s Uncertainty Principle are critical for subatomic physics, but perhaps they are really just a curiosity for those dwelling in the macroscopic realm of the financial markets. We cannot measure an electron precisely, but certainly we still can “kind of know” the present, and if so, then we should be able to “pretty much” predict the future. Causality might be approximate, but if we can get it right to within a few wavelengths of light, that still ought to do the trick. The mathematical system may be demonstrably incomplete, and the world might not be pinned down on the fringes, but for all practical purposes the world can be known.

Unfortunately, while “almost” might work for horseshoes and hand grenades, 30 years after Gödel and Heisenberg yet a third limitation of our knowledge was in the wings, a limitation that would close the door on any attempt to block out the implications of microscopic uncertainty on predictability in our macroscopic world. Based on observations made by Edward Lorenz in the early 1960s and popularized by the so-called butterfly effect – the fanciful notion that the beating wings of a butterfly could change the predictions of an otherwise perfect weather forecasting system – this limitation arises because in some important cases immeasurably small errors can compound over time to limit prediction in the larger scale. Half a century after the limits of measurement and thus of physical knowledge were demonstrated by Heisenberg in the world of quantum mechanics, Lorenz piled on a result that showed how microscopic errors could propagate to have a stultifying impact in nonlinear dynamic systems. This limitation could come into the forefront only with the dawning of the computer age, because it is manifested in the subtle errors of computational accuracy.

The essence of the butterfly effect is that small perturbations can have large repercussions in massive, random forces such as weather. Edward Lorenz was testing and tweaking a model of weather dynamics on a rudimentary vacuum-tube computer. The program was based on a small system of simultaneous equations, but seemed to provide an inkling into the variability of weather patterns. At one point in his work, Lorenz decided to examine in more detail one of the solutions he had generated. To save time, rather than starting the run over from the beginning, he picked some intermediate conditions that had been printed out by the computer and used those as the new starting point. The values he typed in were the same as the values held in the original simulation at that point, so the results the simulation generated from that point forward should have been the same as in the original; after all, the computer was doing exactly the same operations. What he found was that as the simulated weather pattern progressed, the results of the new run diverged, first very slightly and then more and more markedly, from those of the first run. After a point, the new path followed a course that appeared totally unrelated to the original one, even though they had started at the same place.

Lorenz at first thought there was a computer glitch, but as he investigated further, he discovered the basis of a limit to knowledge that rivaled that of Heisenberg and Gödel. The problem was that the numbers he had used to restart the simulation had been reentered based on his printout from the earlier run, and the printout rounded the values to three decimal places while the computer carried the values to six decimal places. This rounding, clearly insignificant at first, propagated a slight error in the next-round results, and this error grew with each new iteration of the program as it moved the simulation of the weather forward in time. The error doubled every four simulated days, so that after a few months the solutions were going their own separate ways. The slightest of changes in the initial conditions had traced out a wholly different pattern of weather.
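The effect is easy to reproduce. The sketch below is a rough re-creation, not Lorenz’s original program: it integrates the familiar three-variable Lorenz system with a crude fixed-step method, restarts a second run from an intermediate state rounded to three decimal places, and prints how far the two runs drift apart. Parameters, step size, and run lengths are illustrative choices.

```python
# A rough re-creation of the rounding experiment (not Lorenz's original code):
# integrate the three-variable Lorenz system, restart a second run from an
# intermediate state rounded to three decimals, and watch the runs part company.

def lorenz_step(state, dt=0.01, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    x, y, z = state
    return (x + dt * sigma * (y - x),
            y + dt * (x * (rho - z) - y),
            z + dt * (x * y - beta * z))

def run(state, steps):
    path = [state]
    for _ in range(steps):
        state = lorenz_step(state)
        path.append(state)
    return path

original = run((1.0, 1.0, 1.0), steps=5000)

# The "printout" restart: the intermediate state kept to only three decimals.
restart = run(tuple(round(v, 3) for v in original[2000]), steps=3000)

for i in (0, 500, 1000, 2000, 3000):
    gap = max(abs(u - v) for u, v in zip(original[2000 + i], restart[i]))
    print(f"step {i:5d}: max coordinate gap = {gap:.6f}")
```

The gap starts at the rounding error, a few ten-thousandths, and ends up on the order of the attractor itself.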

Intrigued by his chance observation, Lorenz wrote an article entitled “Deterministic Nonperiodic Flow,” which stated that “nonperiodic solutions are ordinarily unstable with respect to small modifications, so that slightly differing initial states can evolve into considerably different states.” Translation: Long-range weather forecasting is worthless. For his application in the narrow scientific discipline of weather prediction, this meant that no matter how precise the starting measurements of weather conditions, there was a limit after which the residual imprecision would lead to unpredictable results, so that “long-range forecasting of specific weather conditions would be impossible.” And since this occurred in a very simple laboratory model of weather dynamics, it could only be worse in the more complex equations that would be needed to properly reflect the weather. Lorenz discovered the principle that would emerge over time into the field of chaos theory, where a deterministic system generated with simple nonlinear dynamics unravels into an unrepeated and apparently random path.

The simplicity of the dynamic system Lorenz had used suggests a far-reaching result: Because we cannot measure without some error (harking back to Heisenberg), for many dynamic systems our forecast errors will grow to the point that even an approximation will be out of our hands. We can run a purely mechanistic system that is designed with well-defined and apparently well-behaved equations, and it will move over time in ways that cannot be predicted and, indeed, that appear to be random. This gets us to Santa Fe.

The principal conceptual thread running through the Santa Fe research asks how apparently simple systems, like that discovered by Lorenz, can produce rich and complex results. Its method of analysis in some respects runs in the opposite direction of the usual path of scientific inquiry. Rather than taking the complexity of the world and distilling simplifying truths from it, the Santa Fe Institute builds a virtual world governed by simple equations that when unleashed explode into results that generate unexpected levels of complexity.

In economics and finance, the institute’s agenda was to create artificial markets with traders and investors who followed simple and reasonable rules of behavior and to see what would happen. Some of the traders built into the model were trend followers, others bought or sold based on the difference between the market price and perceived value, and yet others traded at random times in response to liquidity needs. The simulations then printed out the paths of prices for the various market instruments. Qualitatively, these paths displayed all the richness and variation we observe in actual markets, replete with occasional bubbles and crashes. The exercises did not produce positive results for predicting or explaining market behavior, but they did illustrate that it is not hard to create a market that looks on the surface an awful lot like a real one, and to do so with actors who are following very simple rules. The mantra is that simple systems can give rise to complex, even unpredictable dynamics, an interesting converse to the point that much of the complexity of our world can – with suitable assumptions – be made to appear simple, summarized with concise physical laws and equations.
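The flavor of such an exercise can be conveyed in a few lines. The toy below is emphatically not the Santa Fe Institute’s artificial market, just a hedged sketch in its spirit: trend followers, value traders, and randomly arriving liquidity traders submit demands each period, and the price moves with the net demand. All coefficients are arbitrary illustrative choices.

```python
import random

# A toy artificial market in the spirit described above -- not the Santa Fe
# model. Trend followers, value traders, and liquidity traders submit demands
# each period, and the price moves in proportion to the net demand.

random.seed(0)
price, value = 100.0, 100.0
history = [price]

for t in range(1, 1001):
    trend = history[-1] - history[-2] if t > 1 else 0.0
    trend_demand = 0.5 * trend                  # trend followers chase momentum
    value_demand = 0.05 * (value - price)       # value traders buy below perceived value
    noise_demand = random.gauss(0.0, 1.0)       # liquidity traders arrive at random
    net_demand = trend_demand + value_demand + noise_demand
    price = max(1.0, price + 0.5 * net_demand)  # simple linear price impact
    history.append(price)

moves = [abs(b - a) for a, b in zip(history, history[1:])]
print(f"final price: {history[-1]:.2f}, largest one-period move: {max(moves):.2f}")
```

Even this caricature produces runs, reversals, and the occasional outsized move, with no actor doing anything more sophisticated than following its one rule.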

The systems explored by Lorenz were deterministic. They were governed definitively and exclusively by a set of equations where the value in every period could be unambiguously and precisely determined based on the values of the previous period. And the systems were not very complex. By contrast, whatever the set of equations are that might be divined to govern the financial world, they are not simple and, furthermore, they are not deterministic. There are random shocks from political and economic events and from the shifting preferences and attitudes of the actors. If we cannot hope to know the course of even deterministic systems like fluid mechanics, then no level of detail will allow us to forecast the long-term course of the financial world, buffeted as it is by the vagaries of the economy and the whims of psychology.


Econophysics: Financial White Noise Switch. Thought of the Day 115.0


What causes large market fluctuations? Some economists blame irrationality for the fat-tailed distribution of returns. Others have observed that social psychology can create market fads and panics, which can be modeled by collective behavior in statistical mechanics. For example, a bi-modular distribution was discovered in empirical data on option prices. One possible mechanism behind such polarized behavior is the collective action studied in physics and social psychology. A sudden regime switch, or phase transition, may occur between the uni-modular and bi-modular distribution when a field parameter crosses some threshold. The Ising model of equilibrium statistical mechanics was borrowed to study social psychology: its phase transition from a uni-modular to a bi-modular distribution describes the statistical signature of a stable society turning into a divided one. The problem with the Ising model is that its key parameter, the social temperature, has no operational definition in a social system. A better alternative parameter is the intensity of social interaction in collective action.
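As a rough illustration of the mechanism (not a calibrated model, and not the exact system behind the figure below), the following sketch runs mean-field, Glauber-style dynamics for N binary agents, with k standing in for the intensity of social interaction (an inverse social temperature). For small k the time-distribution of the mean opinion m is uni-modular around zero; beyond the mean-field threshold k = 1 it becomes bi-modular, and in a finite run the society typically sits in one of the two opposed groups.

```python
import math, random

# A toy of the collective-behavior mechanism described above -- illustrative
# only, not the calibrated model behind the figures. N binary agents update
# under mean-field Glauber-style dynamics; k plays the role of the social
# interaction intensity (an inverse "social temperature").

def simulate(k, n_agents=200, sweeps=1000, seed=1):
    rng = random.Random(seed)
    spins = [rng.choice((-1, 1)) for _ in range(n_agents)]
    total = sum(spins)
    samples = []
    for sweep in range(sweeps):
        for _ in range(n_agents):
            i = rng.randrange(n_agents)
            m = total / n_agents
            p_up = 1.0 / (1.0 + math.exp(-2.0 * k * m))   # pressure from the mean opinion
            new = 1 if rng.random() < p_up else -1
            total += new - spins[i]
            spins[i] = new
        if sweep > sweeps // 2:                            # discard burn-in
            samples.append(total / n_agents)
    return samples

for k in (0.5, 1.0, 2.5):
    s = simulate(k)
    polarized = sum(1 for m in s if abs(m) > 0.5) / len(s)
    print(f"k = {k:3.1f}: mean |m| = {sum(abs(m) for m in s) / len(s):.2f}, "
          f"time with |m| > 0.5 = {polarized:.0%}")
```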

A difficult issue in business cycle theory is how to explain the recurrent feature of business cycles that is widely observed in macro and financial indexes. The problem is that business cycles are neither strictly periodic nor truly random: their correlations are not short-ranged, as in a random walk, and they contain multiple frequencies that change over time. Therefore, all kinds of mathematical models have been tried in business cycle theory, including deterministic, stochastic, linear, and nonlinear models. We outline economic models in terms of their base function, including white noise with short correlations, persistent cycles with long correlations, and color chaos with erratic amplitude and a narrow frequency band, like a biological clock.

 


Figure: The steady state of the probability distribution function in the Ising model of collective behavior with h = 0 (no central propaganda field). (a) Uni-modular distribution with low social stress (k = 0): moderate, stable behavior with weak interaction and high social temperature. (b) Marginal distribution at the phase transition with medium social stress (k = 2): a behavioral phase transition occurs between a stable and an unstable society, induced by collective behavior. (c) Bi-modular distribution with high social stress (k = 2.5): the society splits into two opposing groups under low social temperature and strong social interaction in an unstable society.

Deterministic models are used by Keynesian economists to capture endogenous mechanisms of business cycles, as in the accelerator-multiplier model. Stochastic models follow the Frisch tradition of noise-driven cycles, which attributes business fluctuations to external shocks. Since the 1980s, the discovery of economic chaos and the application of statistical mechanics have provided more advanced models for describing business cycles (a minimal numerical sketch of the first two approaches follows after the figure). Graphically:


Figure: The steady state of the probability distribution function in the socio-psychological model of collective choice. Here “a” is the independent-decision parameter and “b” is the interaction parameter. (a) Centered distribution with b < a (short dashed curve): independent decisions rooted in individualistic orientation overcome social pressure exerted through mutual communication. (b) Horizontal flat distribution with b = a (long dashed line): the marginal case in which individualistic orientation balances social pressure. (c) Polarized distribution with b > a (solid line): social pressure exerted through mutual communication is stronger than independent judgment.
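Returning to the deterministic and stochastic business-cycle models mentioned above, the sketch below contrasts the two in their simplest form: the Samuelson multiplier-accelerator difference equation run without shocks (a deterministic, damped endogenous cycle) and with small Frisch-style random impulses that keep the cycle alive. The parameter values are illustrative only.

```python
import random

# A minimal sketch of the two modeling styles mentioned above: the Samuelson
# multiplier-accelerator difference equation, run deterministically and with
# small Frisch-style shocks. Coefficients are illustrative, not calibrated.

def samuelson(c=0.75, v=1.0, periods=60, shock_sd=0.0, seed=0):
    rng = random.Random(seed)
    G = 10.0                                # constant autonomous spending
    Y = [100.0, 100.0]                      # two initial output levels
    C_prev = c * Y[-2]
    for _ in range(periods):
        C = c * Y[-1]                       # consumption: the multiplier
        I = v * (C - C_prev)                # investment: the accelerator
        shock = rng.gauss(0.0, shock_sd)    # Frisch-style impulse (0 = deterministic)
        Y.append(C + I + G + shock)
        C_prev = C
    return Y

deterministic = samuelson(shock_sd=0.0)
noise_driven = samuelson(shock_sd=1.0)
print("deterministic tail:", [round(y, 1) for y in deterministic[-5:]])
print("noise-driven tail: ", [round(y, 1) for y in noise_driven[-5:]])
```

With these coefficients the deterministic run spirals into its steady state, while the shocked run keeps oscillating irregularly, which is the Frisch point.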


Figure: Numerical autocorrelations of time series generated by random noise and by a harmonic wave. The solid line is white noise; the broken line is a sine wave with period P = 1.

Linear harmonic cycles with a unique frequency are introduced in business cycle theory. The autocorrelations of a harmonic cycle and of white noise are shown in the figure above. The autocorrelation function of a harmonic cycle is a cosine wave; its amplitude decays slightly only because of the limited number of data points in the numerical experiment. The autocorrelations of a random series form an erratic sequence that decays rapidly from one to residual fluctuations in the numerical calculation. The auto-regressive (AR) model in discrete time combines white noise terms to simulate the short-term autocorrelations seen in empirical data.
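The comparison in the figure is easy to verify numerically. The sketch below computes sample autocorrelations of Gaussian white noise and of a sine wave; the series length, period, and lags are arbitrary illustrative choices.

```python
import math, random

# A hedged numerical check of the comparison in the figure: the sample
# autocorrelation of Gaussian white noise drops at once to small residual
# fluctuations, while that of a sine wave is itself a (slightly attenuated)
# cosine. Series length, period, and lags are arbitrary illustrative choices.

def autocorr(series, max_lag):
    n = len(series)
    mean = sum(series) / n
    var = sum((v - mean) ** 2 for v in series) / n
    return [sum((series[t] - mean) * (series[t + k] - mean)
                for t in range(n - k)) / (n * var)
            for k in range(max_lag + 1)]

random.seed(0)
n, period = 2000, 50
noise = [random.gauss(0.0, 1.0) for _ in range(n)]
sine = [math.sin(2.0 * math.pi * t / period) for t in range(n)]

ac_noise, ac_sine = autocorr(noise, 50), autocorr(sine, 50)
for k in (0, 1, 5, 25, 50):
    print(f"lag {k:3d}: white noise {ac_noise[k]:+.3f}, sine wave {ac_sine[k]:+.3f}")
```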

Deterministic chaos models can be classified into white chaos and color chaos. White chaos is generated by nonlinear difference equations in discrete time, such as the one-dimensional logistic map and the two-dimensional Hénon map. Its autocorrelations and power spectra look like those of white noise, and its correlation dimension can be less than one. The white chaos model is simple to analyze mathematically but rarely used in empirical work, since it requires an intrinsic time unit.

Color chaos is generated by nonlinear differential equations in continuous time, such as the three-dimensional Lorenz model, or by one-dimensional delay-differential models in biology and economics. Its autocorrelations look like a decaying cosine wave, and its power spectrum resembles a combination of harmonic cycles and white noise. The correlation dimension is between one and two for three-dimensional differential equations, and varies for delay-differential equations.
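A quick numerical illustration of the white-chaos/color-chaos distinction: the autocorrelation of the fully chaotic logistic map collapses almost immediately, white-noise fashion, while the autocorrelation of the x-coordinate of the Lorenz flow decays slowly and oscillates, loosely like a damped cosine. Step size, run lengths, and lags below are illustrative choices, not tuned values.

```python
import math

# An illustrative contrast between "white chaos" and "color chaos": the
# logistic map's autocorrelation collapses at once, while the Lorenz
# x-coordinate's decays slowly and oscillates. All settings are arbitrary.

def autocorr(series, lags):
    n, mean = len(series), sum(series) / len(series)
    var = sum((v - mean) ** 2 for v in series) / n
    return [sum((series[t] - mean) * (series[t + k] - mean)
                for t in range(n - k)) / (n * var) for k in lags]

# White chaos: the logistic map x -> 4x(1 - x) in discrete time.
x, logistic = 0.3, []
for _ in range(4000):
    x = 4.0 * x * (1.0 - x)
    logistic.append(x)

# Color chaos: the Lorenz flow, integrated crudely by forward Euler (dt = 0.01).
state, lorenz_x = (1.0, 1.0, 1.0), []
for _ in range(20000):
    lx, ly, lz = state
    state = (lx + 0.01 * 10.0 * (ly - lx),
             ly + 0.01 * (lx * (28.0 - lz) - ly),
             lz + 0.01 * (lx * ly - (8.0 / 3.0) * lz))
    lorenz_x.append(state[0])

lags = [0, 1, 2, 5, 10, 50, 100]
print("lag   logistic   lorenz-x")
for k, a, b in zip(lags, autocorr(logistic, lags), autocorr(lorenz_x, lags)):
    print(f"{k:4d}    {a:+.3f}     {b:+.3f}")
```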


History shows the remarkable resilience of a market that has experienced a series of wars and crises. The related issue is why the economy can recover from severe damage and from states far out of equilibrium. Mathematically speaking, we may examine regime stability under parameter change. One major weakness of the linear oscillator model is that its regime of periodic cycles is fragile, or only marginally stable, under changing parameters; only a nonlinear oscillator model is capable of generating resilient cycles within a finite region of parameter space. The typical example of a linear model is the Samuelson multiplier-accelerator model. Linear stochastic models have a similar problem to linear deterministic models: the so-called unit-root solution occurs only on the borderline of the unit circle, and if a small parameter change pushes the root across the unit circle, the stochastic solution becomes either damped (root inside the unit circle) or explosive (root outside the unit circle).
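The unit-root fragility is easy to see numerically. In the sketch below, an AR(1) process x_t = a·x_{t−1} + ε_t is simulated with the coefficient just below, at, and just above one: shocks damp out and the path stays bounded, wander as a random walk, or explode, respectively. The coefficients are illustrative.

```python
import random

# A small numerical illustration of the unit-root knife edge described above:
# an AR(1) process x_t = a * x_{t-1} + eps_t with the coefficient just below,
# at, and just above one. The values of a are illustrative.

def ar1_endpoint(a, periods=200, seed=42):
    rng = random.Random(seed)
    x = 0.0
    for _ in range(periods):
        x = a * x + rng.gauss(0.0, 1.0)   # one damped/neutral/amplified step plus a shock
    return x

for a in (0.95, 1.00, 1.05):
    print(f"a = {a:.2f}: |x| after 200 periods = {abs(ar1_endpoint(a)):,.1f}")
```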

Infinite Sequences and Halting Problem. Thought of the Day 76.0


In attempting to extend the notion of depth from finite strings to infinite sequences, one encounters a familiar phenomenon: the definitions become sharper (e.g. recursively invariant), but their intuitive meaning is less clear, because of distinctions (e.g. between infinitely-often and almost-everywhere properties) that do not exist in the finite case.

An infinite sequence X is called strongly deep if at every significance level s, and for every recursive function f, all but finitely many initial segments Xn have depth exceeding f(n).

It is necessary to require the initial segments to be deep almost everywhere rather than infinitely often, because even the most trivial sequence has infinitely many deep initial segments Xn (viz. the segments whose lengths n are deep numbers).

It is not difficult to show that the property of strong depth is invariant under truth-table equivalence (this is the same as Turing equivalence in recursively bounded time, or via a total recursive operator), and that the same notion would result if the initial segments were required to be deep in the sense of receiving less than 2^(−s) of their algorithmic probability from f(n)-fast programs. The characteristic sequence of the halting set K is an example of a strongly deep sequence.

A weaker definition of depth, also invariant under truth-table equivalence, is perhaps more analogous to that adopted for finite strings:

An infinite sequence X is weakly deep if it is not computable in recursively bounded time from any algorithmically random infinite sequence.

Computability in recursively bounded time is equivalent to two other properties, viz. truth-table reducibility and reducibility via a total recursive operator.

By contrast to the situation with truth-table reducibility, Péter Gacs has shown that every sequence is computable from (i.e. Turing reducible to) an algorithmically random sequence if no bound is imposed on the time. This is the infinite analog of the far more obvious fact that every finite string is computable from an algorithmically random string (e.g. its minimal program).

Every strongly deep sequence is weakly deep, but by intermittently padding K with large blocks of zeros, one can construct a weakly deep sequence with infinitely many shallow initial segments.

Truth-table reducibility to an algorithmically random sequence is equivalent to the property, studied by Levin et al., of being random with respect to some recursive measure. Levin calls sequences with this property “proper” or “complete” sequences, and views them as more realistic and interesting than other sequences because they are the typical outcomes of probabilistic or deterministic effective processes operating in recursively bounded time.

Weakly deep sequences arise with finite probability when a universal Turing machine (with one-way input and output tapes, so that it can act as a transducer of infinite sequences) is given an infinite coin toss sequence for input. These sequences are necessarily produced very slowly: the time to output the n’th digit being bounded by no recursive function, and the output sequence contains evidence of this slowness. Because they are produced with finite probability, such sequences can contain only finite information about the halting problem.

Appropriation of (Ir)reversibility of Noise Fluctuations to (Un)Facilitate Complexity

 


Logical depth is a suitable measure of subjective complexity for physical as well as mathematical objects. This becomes apparent upon considering the effect of irreversibility, noise, and the spatial symmetries of the equations of motion and initial conditions on the asymptotic depth-generating abilities of model systems.

“Self-organization” suggests a spontaneous increase of complexity occurring in a system with simple, generic (e.g. spatially homogeneous) initial conditions. The increase of complexity attending a computation, by contrast, is less remarkable because it occurs in response to special initial conditions. An important question, which would have interested Turing, is whether self-organization is an asymptotically qualitative phenomenon like phase transitions. In other words, are there physically reasonable models in which complexity, appropriately defined, not only increases, but increases without bound in the limit of infinite space and time? A positive answer to this question would not explain the natural history of our particular finite world, but would suggest that its quantitative complexity can legitimately be viewed as an approximation to a well-defined qualitative property of infinite systems. On the other hand, a negative answer would suggest that our world should be compared to chemical reaction-diffusion systems (e.g. Belousov-Zhabotinsky), which self-organize on a macroscopic, but still finite scale, or to hydrodynamic systems which self-organize on a scale determined by their boundary conditions.

The suitability of logical depth as a measure of physical complexity depends on the assumed ability (“physical Church’s thesis”) of Turing machines to simulate physical processes, and to do so with reasonable efficiency. Digital machines cannot of course integrate a continuous system’s equations of motion exactly, and even the notion of computability is not very robust in continuous systems, but for realistic physical systems, subject throughout their time development to finite perturbations (e.g. electromagnetic and gravitational) from an uncontrolled environment, it is plausible that a finite-precision digital calculation can approximate the motion to within the errors induced by these perturbations. Empirically, many systems have been found amenable to “master equation” treatments in which the dynamics is approximated as a sequence of stochastic transitions among coarse-grained microstates.

We concentrate arbitrarily on cellular automata, in the broad sense of discrete lattice models with finitely many states per site, which evolve according to a spatially homogeneous local transition rule that may be deterministic or stochastic, reversible or irreversible, and synchronous (discrete time) or asynchronous (continuous time, master equation). Such models cover the range from evidently computer-like (e.g. deterministic cellular automata) to evidently material-like (e.g. Ising models) with many gradations in between.
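For concreteness, a minimal member of this model class is sketched below: a one-dimensional lattice with two states per site and a homogeneous local rule (elementary rule 110 is an arbitrary illustrative choice), with an optional noise probability that turns the deterministic rule into a stochastic one.

```python
import random

# A minimal instance of the model class just described: a one-dimensional
# cellular automaton, two states per site, homogeneous local rule (elementary
# rule 110 -- an arbitrary illustrative choice), with an optional noise
# probability that turns the deterministic rule into a stochastic one.

def step(cells, rule=110, noise=0.0, rng=None):
    rng = rng or random.Random(0)
    n, out = len(cells), []
    for i in range(n):
        neighborhood = (cells[(i - 1) % n] << 2) | (cells[i] << 1) | cells[(i + 1) % n]
        bit = (rule >> neighborhood) & 1        # table lookup for the local transition
        if noise and rng.random() < noise:      # stochastic variant: occasional flip
            bit ^= 1
        out.append(bit)
    return out

cells = [0] * 60
cells[30] = 1                                   # a nearly homogeneous initial condition
for _ in range(20):
    print("".join(".#"[c] for c in cells))
    cells = step(cells, noise=0.0)              # set noise=0.01 for the noisy version
```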

More of the favorable properties need to be invoked to obtain “self-organization,” i.e. nontrivial computation from a spatially homogeneous initial condition. A rather artificial system (a cellular automaton which is stochastic but noiseless, in the sense that it has the power to make purely deterministic as well as random decisions) undergoes this sort of self-organization. It does so by allowing the nucleation and growth of domains, within each of which a depth-producing computation begins. When two domains collide, one conquers the other, and uses the conquered territory to continue its own depth-producing computation (a computation constrained to finite space, of course, cannot continue for more than exponential time without repeating itself). To achieve the same sort of self-organization in a truly noisy system appears more difficult, partly because of the conflict between the need to encourage fluctuations that break the system’s translational symmetry, while suppressing fluctuations that introduce errors in the computation.

Irreversibility seems to facilitate complex behavior by giving noisy systems the generic ability to correct errors. Only a limited sort of error-correction is possible in microscopically reversible systems such as the canonical kinetic Ising model. Minority fluctuations in a low-temperature ferromagnetic Ising phase in zero field may be viewed as errors, and they are corrected spontaneously because of their potential energy cost. This error correcting ability would be lost in nonzero field, which breaks the symmetry between the two ferromagnetic phases, and even in zero field it gives the Ising system the ability to remember only one bit of information. This limitation of reversible systems is recognized in the Gibbs phase rule, which implies that under generic conditions of the external fields, a thermodynamic system will have a unique stable phase, all others being metastable. Even in reversible systems, it is not clear why the Gibbs phase rule enforces as much simplicity as it does, since one can design discrete Ising-type systems whose stable phase (ground state) at zero temperature simulates an aperiodic tiling of the plane, and can even get the aperiodic ground state to incorporate (at low density) the space-time history of a Turing machine computation. Even more remarkably, one can get the structure of the ground state to diagonalize away from all recursive sequences.

String’s Depth of Burial


A string’s depth might be defined as the execution time of its minimal program.

The difficulty with this definition arises in cases where the minimal program is only a few bits smaller than some much faster program, such as a print program, to compute the same output x. In this case, slight changes in x may induce arbitrarily large changes in the run time of the minimal program, by changing which of the two competing programs is minimal. Analogous instability manifests itself in translating programs from one universal machine to another. This instability emphasizes the essential role of the quantity of buried redundancy, not as a measure of depth, but as a certifier of depth. In terms of the philosophy-of-science metaphor, an object whose minimal program is only a few bits smaller than its print program is like an observation that points to a nontrivial hypothesis, but with only a low level of statistical confidence.

To adequately characterize a finite string’s depth one must therefore consider the amount of buried redundancy as well as the depth of its burial. A string’s depth at significance level s might thus be defined as that amount of time complexity which is attested by s bits worth of buried redundancy. This characterization of depth may be formalized in several ways.

A string’s depth at significance level s might be defined as the time required to compute the string by a program no more than s bits larger than the minimal program.

This definition solves the stability problem, but is unsatisfactory in the way it treats multiple programs of the same length. Intuitively, 2^k distinct (n + k)-bit programs that compute the same output ought to be accorded the same weight as one n-bit program; but, by the present definition, they would be given no more weight than one (n + k)-bit program.

A string’s depth at significance level s might instead be defined as the time t required for the string’s time-bounded algorithmic probability Pt(x) to rise to within a factor 2^(−s) of its asymptotic time-unbounded value P(x).

This formalizes the notion that for the string to have originated by an effective process of t steps or fewer is less plausible than for the first s tosses of a fair coin all to come up heads.

It is not known whether there exist strings that are deep in the sense of having no small fast programs, even though they have enough large fast programs to contribute a significant fraction of their algorithmic probability. Such strings might be called deterministically deep but probabilistically shallow, because their chance of being produced quickly in a probabilistic computation (e.g. one where the input bits of U are supplied by coin tossing) is significant compared to their chance of being produced slowly. The question of whether such strings exist is probably hard to answer because it does not relativize uniformly. Deterministic and probabilistic depths are not very different relative to a random coin-toss oracle A, a consequence of the equality of random-oracle-relativized deterministic and probabilistic polynomial time complexity classes; but they can be very different relative to an oracle B deliberately designed to hide information from deterministic computations (this parallels Hunt’s proof that deterministic and probabilistic polynomial time are unequal relative to such an oracle).

(Depth of Finite Strings): Let x and w be strings and s a significance parameter. A string’s depth at significance level s, denoted Ds(x), will be defined as min{T(p) : (|p| − |p∗| < s) ∧ (U(p) = x)}, the least time required to compute it by an s-incompressible program. At any given significance level, a string will be called t-deep if its depth exceeds t, and t-shallow otherwise.
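To make the formula concrete, the toy below evaluates it over a hand-made table of hypothetical programs; the lengths, minimal-program lengths, run times, and outputs are invented for illustration and do not come from any real universal machine. It only shows how loosening the significance level s can admit a fast but compressible program and thereby lower the computed depth.

```python
# A toy, purely illustrative rendering of the definition above. The "machine"
# is a hand-made table of hypothetical programs: each entry lists |p|, the
# length |p*| of that program's own minimal program, its run time T(p), and
# its output U(p). None of these numbers come from a real universal machine;
# the table only exercises the formula
#     Ds(x) = min{ T(p) : (|p| - |p*| < s) and (U(p) = x) }.

PROGRAMS = [
    # (|p|, |p*|, T(p), U(p))
    (20, 20, 10_000, "x"),   # short, incompressible, but slow
    (25, 24, 9_000,  "x"),   # nearly incompressible, still slow
    (60, 30, 50,     "x"),   # fast but highly compressible (a padded "print"-style program)
    (62, 61, 40,     "y"),   # fast, incompressible, but computes a different string
]

def depth(target, s):
    times = [T for (length, minimal, T, out) in PROGRAMS
             if out == target and (length - minimal) < s]
    return min(times) if times else None

for s in (2, 10, 40):
    print(f"s = {s:2d}: Ds('x') = {depth('x', s)}")
```

At strict significance levels only the slow, incompressible programs count, so the toy string looks deep; once s exceeds the 30-bit redundancy of the fast padded program, the computed depth collapses.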

The difference between this definition and the previous one is rather subtle philosophically and not very great quantitatively. Philosophically, the present definition requires that each individual hypothesis for the rapid origin of x be implausible at the 2^(−s) confidence level, whereas the previous one requires only that a weighted average of all such hypotheses be implausible.

There exist constants c1 and c2 such that for any string x, if programs running in time ≤ t contribute a fraction between 2^(−s) and 2^(−s+1) of the string’s total algorithmic probability, then x has depth at most t at significance level s + c1 and depth at least t at significance level s − min{H(s), H(t)} − c2.

Proof: The first part follows easily from the fact that any k-compressible self-delimiting program p is associated with a unique program, shorter by k − O(1) bits, of the form “execute the result of executing p∗”. Therefore there exists a constant c1 such that if all t-fast programs for x were (s + c1)-compressible, the associated shorter programs would contribute more than the total algorithmic probability of x. The second part follows because, roughly, if fast programs contribute only a small fraction of the algorithmic probability of x, then the property of being a fast program for x is so unusual that no program having that property can be random. More precisely, the t-fast programs for x constitute a finite prefix set, a superset S of which can be computed by a program of size H(x) + min{H(t), H(s)} + O(1) bits. (Given x∗ and either t∗ or s∗, begin enumerating all self-delimiting programs that compute x, in order of increasing running time, and quit when either the running time exceeds t or the accumulated measure of programs so far enumerated exceeds 2^(−(H(x)−s)).) Therefore there exists a constant c2 such that every member of S, and thus every t-fast program for x, is compressible by at least s − min{H(s), H(t)} − O(1) bits.

The ability of universal machines to simulate one another efficiently implies a corresponding degree of machine-independence for depth: for any two efficiently universal machines of the sort considered here, there exists a constant c and a linear polynomial L such that for any t, strings whose (s+c)-significant depth is at least L(t) on one machine will have s-significant depth at least t on the other.

Depth of one string relative to another may be defined analogously, and represents the plausible time required to produce one string, x, from another, w.

(Relative Depth of Finite Strings): For any two strings w and x, the depth of x relative to w at significance level s, denoted Ds(x/w), will be defined as min{T(p, w) : (|p|−|(p/w)∗| < s)∧(U(p, w) = x)}, the least time required to compute x from w by a program that is s-incompressible relative to w.

Depth of a string relative to its length is a particularly useful notion, allowing us, as it were, to consider the triviality or nontriviality of the “content” of a string (i.e. its bit sequence), independent of its “form” (length). For example, although the infinite sequence 000… is intuitively trivial, its initial segment 0^n is deep whenever n is deep. However, 0^n is always shallow relative to n, as is, with high probability, a random string of length n.

In order to adequately represent the intuitive notion of stored mathematical work, it is necessary that depth obey a “slow growth” law, i.e. that fast deterministic processes be unable to transform a shallow object into a deep one, and that fast probabilistic processes be able to do so only with low probability.

(Slow Growth Law): Given any data string x and two significance parameters s2 > s1, a random program generated by coin tossing has probability less than 2^(−(s2−s1)+O(1)) of transforming x into an excessively deep output, i.e. one whose s2-significant depth exceeds the s1-significant depth of x plus the run time of the transforming program plus O(1). More precisely, there exist positive constants c1, c2 such that for all strings x, and all pairs of significance parameters s2 > s1, the prefix set {q : Ds2(U(q, x)) > Ds1(x) + T(q, x) + c1} has measure less than 2^(−(s2−s1)+c2).

Proof: Let p be an s1-incompressible program which computes x in time Ds1(x), and let r be the restart prefix mentioned in the definition of the U machine. Let Q be the prefix set {q : Ds2(U(q, x)) > T(q, x) + Ds1(x) + c1}, where the constant c1 is sufficient to cover the time overhead of concatenation. For all q ∈ Q, the program rpq by definition computes some deep result U(q, x) in less time than that result’s own s2-significant depth, and so rpq must be compressible by s2 bits. The sum of the algorithmic probabilities of strings of the form rpq, where q ∈ Q, is therefore

Σq∈Q P(rpq) ≥ Σq∈Q 2^(−|rpq|+s2) = 2^(−|r|−|p|+s2) · μ(Q)

On the other hand, since the self-delimiting program p can be recovered from any string of the form rpq (by deleting r and executing the remainder pq until halting occurs, by which time exactly p will have been read), the algorithmic probability of p is at least as great (within a constant factor) as the sum of the algorithmic probabilities of the strings {rpq : q ∈ Q} considered above:

P(p) > μ(Q) · 2^(−|r|−|p|+s2−O(1))

Recalling the fact that minimal program size is equal within a constant factor to the −log of algorithmic probability, and the s1-incompressibility of p, we have P(p) < 2^(−(|p|−s1)+O(1)), and therefore finally

μ(Q) < 2^(−(s2−s1)+O(1)), which was to be demonstrated.

Universal Turing Machine: Algorithmic Halting


A natural number x will be identified with the x’th binary string in lexicographic order (Λ, 0, 1, 00, 01, 10, 11, 000, …), and a set X of natural numbers will be identified with its characteristic sequence, and with the real number between 0 and 1 having that sequence as its dyadic expansion. The length of a string x will be denoted |x|, the n’th bit of an infinite sequence X will be denoted X(n), and the initial n bits of X will be denoted Xn. Concatenation of strings p and q will be denoted pq.
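For what it is worth, the identification of natural numbers with binary strings in this ordering has a simple closed form: the x’th string is x + 1 written in binary with the leading 1 dropped. A small sketch:

```python
# A quick sketch of the identification just described: the x'th binary string
# in the ordering (Λ, 0, 1, 00, 01, 10, 11, 000, ...) is x + 1 written in
# binary with its leading 1 dropped, and the map is easily inverted.

def nat_to_string(x: int) -> str:
    return bin(x + 1)[3:]              # bin(x + 1) looks like '0b1...'; drop '0b1'

def string_to_nat(s: str) -> int:
    return int("1" + s, 2) - 1

for x in range(8):
    s = nat_to_string(x)
    assert string_to_nat(s) == x
    print(x, "->", repr(s))
```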

We now define the information content (and later the depth) of finite strings using a universal Turing machine U. A universal Turing machine may be viewed as a partial recursive function of two arguments. It is universal in the sense that by varying one argument (“program”) any partial recursive function of the other argument (“data”) can be obtained. In the usual machine formats, program, data and output are all finite strings, or, equivalently, natural numbers. However, it is not possible to take a uniformly weighted average over a countably infinite set. Chaitin’s universal machine has two tapes: a read-only one-way tape containing the infinite program; and an ordinary two-way read/write tape, which is used for data input, intermediate work, and output, all of which are finite strings. Our machine differs from Chaitin’s in having some additional auxiliary storage (e.g. another read/write tape) which is needed only to improve the time efficiency of simulations.

We consider only terminating computations, during which, of course, only a finite portion of the program tape can be read. Therefore, the machine’s behavior can still be described by a partial recursive function of two string arguments U(p, w), if we use the first argument to represent that portion of the program that is actually read in the course of a particular computation. The expression U (p, w) = x will be used to indicate that the U machine, started with any infinite sequence beginning with p on its program tape and the finite string w on its data tape, performs a halting computation which reads exactly the initial portion p of the program, and leaves output data x on the data tape at the end of the computation. In all other cases (reading less than p, more than p, or failing to halt), the function U(p, w) is undefined. Wherever U(p, w) is defined, we say that p is a self-delimiting program to compute x from w, and we use T(p, w) to represent the time (machine cycles) of the computation. Often we will consider computations without input data; in that case we abbreviate U(p, Λ) and T(p, Λ) as U(p) and T(p) respectively.

The self-delimiting convention for the program tape forces the domain of U and T, for each data input w, to be a prefix set, that is, a set of strings no member of which is the extension of any other member. Any prefix set S obeys the Kraft inequality

Σp∈S 2^(−|p|) ≤ 1
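A minimal check of the Kraft inequality for an arbitrarily chosen prefix set:

```python
# A small check of the Kraft inequality for an arbitrarily chosen prefix set:
# no member extends any other, and the lengths satisfy sum of 2^(-|p|) <= 1.

def is_prefix_set(strings):
    return not any(a != b and b.startswith(a) for a in strings for b in strings)

def kraft_sum(strings):
    return sum(2.0 ** -len(p) for p in strings)

S = ["00", "01", "100", "101", "11"]
print("prefix set:", is_prefix_set(S))   # True
print("Kraft sum: ", kraft_sum(S))       # 0.25 + 0.25 + 0.125 + 0.125 + 0.25 = 1.0
```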

Besides being self-delimiting with regard to its program tape, the U machine must be efficiently universal in the sense of being able to simulate any other machine of its kind (Turing machines with self-delimiting program tape) with at most an additive constant increase in program size and a linear increase in execution time.

Without loss of generality we assume that there exists for the U machine a constant prefix r which has the effect of stacking an instruction to restart the computation when it would otherwise end. This gives the machine the ability to concatenate programs to run consecutively: if U(p, w) = x and U(q, x) = y, then U(rpq, w) = y. Moreover, this concatenation should be efficient in the sense that T (rpq, w) should exceed T (p, w) + T (q, x) by at most O(1). This efficiency of running concatenated programs can be realized with the help of the auxiliary storage to stack the restart instructions.

Sometimes we will generalize U to have access to an “oracle” A, i.e. an infinite look-up table which the machine can consult in the course of its computation. The oracle may be thought of as an arbitrary 0/1-valued function A(x) which the machine can cause to be evaluated by writing the argument x on a special tape and entering a special state of the finite control unit. In the next machine cycle the oracle responds by sending back the value A(x). The time required to evaluate the function is thus linear in the length of its argument. In particular we consider the case in which the information in the oracle is random, each location of the look-up table having been filled by an independent coin toss. Such a random oracle is a function whose values are reproducible, but otherwise unpredictable and uncorrelated.

Let {φ_i^A(p, w) : i = 0, 1, 2, …} be an acceptable Gödel numbering of the A-partial recursive functions of two arguments and {Φ_i^A(p, w)} an associated Blum complexity measure, henceforth referred to as time. An index j is called self-delimiting iff, for all oracles A and all values w of the second argument, the set {x : φ_j^A(x, w) is defined} is a prefix set. A self-delimiting index j has efficient concatenation if there exists a string r such that for all oracles A and all strings w, x, y, p, and q: if φ_j^A(p, w) = x and φ_j^A(q, x) = y, then φ_j^A(rpq, w) = y and Φ_j^A(rpq, w) ≤ Φ_j^A(p, w) + Φ_j^A(q, x) + O(1). A self-delimiting index u with efficient concatenation is called efficiently universal iff, for every self-delimiting index j with efficient concatenation, there exists a simulation program s and a linear polynomial L such that for all oracles A and all strings p and w,

φ_u^A(sp, w) = φ_j^A(p, w)

and

Φ_u^A(sp, w) ≤ L(Φ_j^A(p, w))

The functions U^A(p, w) and T^A(p, w) are defined respectively as φ_u^A(p, w) and Φ_u^A(p, w), where u is an efficiently universal index.

For any string x, the minimal program, denoted x∗, is min{p : U(p) = x}, the least self-delimiting program to compute x. For any two strings x and w, the minimal program of x relative to w, denoted (x/w)∗, is defined similarly as min{p : U(p,w) = x}.

By contrast to its minimal program, any string x also has a print program, of length |x| + O(log|x|), which simply transcribes the string x from a verbatim description of x contained within the program. The print program is logarithmically longer than x because, being self-delimiting, it must indicate the length as well as the contents of x. Because it makes no effort to exploit redundancies to achieve efficient coding, the print program can be made to run quickly (e.g. linear time in |x|, in the present formalism). Extra information w may help, but cannot significantly hinder, the computation of x, since a finite subprogram would suffice to tell U to simply erase w before proceeding. Therefore, a relative minimal program (x/w)∗ may be much shorter than the corresponding absolute minimal program x∗, but can only be longer by O(1), independent of x and w.
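One way to see where the |x| + O(log |x|) figure comes from is to write down a concrete self-delimiting “print” encoding. The scheme below, one simple choice among many and not necessarily the U machine’s own convention, prefixes the verbatim bits of x with |x| in binary, itself preceded by its length in unary.

```python
# A hedged sketch of why the print program costs |x| + O(log |x|) bits: to be
# self-delimiting it must announce the length of x before transcribing it.
# The header format here (unary length of the length field, then |x| in
# binary, then x verbatim) is one simple scheme among many, not necessarily
# the U machine's own convention.

def print_program(x: str) -> str:
    length_bits = bin(len(x))[2:]                # |x| in binary: ~log2 |x| bits
    header = "1" * len(length_bits) + "0"        # unary length of the length field
    return header + length_bits + x

def run_print_program(p: str) -> str:
    k = p.index("0")                             # read the unary part
    length = int(p[k + 1 : 2 * k + 1], 2)        # then the binary length field
    return p[2 * k + 1 : 2 * k + 1 + length]     # then transcribe exactly |x| bits

x = "1011001110001111"
p = print_program(x)
assert run_print_program(p) == x
print(f"|x| = {len(x)}, |print program| = {len(p)}   (= |x| + O(log |x|))")
```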

A string is compressible by s bits if its minimal program is shorter by at least s bits than the string itself, i.e. if |x∗| ≤ |x| − s. Similarly, a string x is said to be compressible by s bits relative to a string w if |(x/w)∗| ≤ |x| − s. Regardless of how compressible a string x may be, its minimal program x∗ is compressible by at most an additive constant depending on the universal computer but independent of x. [If (x∗)∗ were much smaller than x∗, then the role of x∗ as minimal program for x would be undercut by a program of the form “execute the result of executing (x∗)∗.”] Similarly, a relative minimal program (x/w)∗ is compressible relative to w by at most a constant number of bits independent of x or w.

The algorithmic probability of a string x, denoted P(x), is defined as Σ{2^(−|p|) : U(p) = x}. This is the probability that the U machine, with a random program chosen by coin tossing and an initially blank data tape, will halt with output x. The time-bounded algorithmic probability, Pt(x), is defined similarly, except that the sum is taken only over programs which halt within time t. We use P(x/w) and Pt(x/w) to denote the analogous algorithmic probabilities of one string x relative to another w, i.e. for computations that begin with w on the data tape and halt with x on the data tape.

The algorithmic entropy H(x) is defined as the least integer greater than −log2P(x), and the conditional entropy H(x/w) is defined similarly as the least integer greater than − log2P(x/w). Among the most important properties of the algorithmic entropy is its equality, to within O(1), with the size of the minimal program:

∃c ∀x ∀w : H(x/w) ≤ |(x/w)∗| ≤ H(x/w) + c

The first part of the relation, viz. that algorithmic entropy should be no greater than minimal program size, is obvious, because of the minimal program’s own contribution to the algorithmic probability. The second half of the relation is less obvious. The approximate equality of algorithmic entropy and minimal program size means that there are few near-minimal programs for any given input/output pair (x/w), and that every string gets an O(1) fraction of its algorithmic probability from its minimal program.

Finite strings, such as minimal programs, which are incompressible or nearly so are called algorithmically random. The definition of randomness for finite strings is necessarily a little vague because of the ±O(1) machine-dependence of H(x) and, in the case of strings other than self-delimiting programs, because of the question of how to count the information encoded in the string’s length, as opposed to its bit sequence. Roughly speaking, an n-bit self-delimiting program is considered random (and therefore not ad-hoc as a hypothesis) iff its information content is about n bits, i.e. iff it is incompressible; while an externally delimited n-bit string is considered random iff its information content is about n + H(n) bits, enough to specify both its length and its contents.

For infinite binary sequences (which may be viewed also as real numbers in the unit interval, or as characteristic sequences of sets of natural numbers) randomness can be defined sharply: a sequence X is incompressible, or algorithmically random, if there is an O(1) bound to the compressibility of its initial segments Xn. Intuitively, an infinite sequence is random if it is typical in every way of sequences that might be produced by tossing a fair coin; in other words, if it belongs to no informally definable set of measure zero. Algorithmically random sequences constitute a larger class, including sequences such as Ω which can be specified by ineffective definitions.

The busy beaver function B(n) is the greatest number computable by a self-delimiting program of n bits or fewer. The halting set K is {x : φ_x(x) converges}. This is the standard representation of the halting problem.

The self-delimiting halting set K0 is the (prefix) set of all self-delimiting programs for the U machine that halt: {p : U(p) converges}.

K and K0 are readily computed from one another (e.g. by regarding the self-delimiting programs as a subset of ordinary programs, the first 2^n bits of K0 can be recovered from the first 2^(n+O(1)) bits of K; by encoding each n-bit ordinary program as a self-delimiting program of length n + O(log n), the first 2^n bits of K can be recovered from the first 2^(n+O(log n)) bits of K0).

The halting probability Ω is defined as Σ{2^(−|p|) : U(p) converges}, the probability that the U machine would halt on an infinite input supplied by coin tossing. Ω is thus a real number between 0 and 1.

The first 2^n bits of K0 can be computed from the first n bits of Ω, by enumerating halting programs until enough have halted to account for all but 2^−n of the total halting probability. The time required for this decoding (between B(n − O(1)) and B(n + H(n) + O(1))) grows faster than any computable function of n. Although K0 is only slowly computable from Ω, the first n bits of Ω can be rapidly computed from the first 2^(n+H(n)+O(1)) bits of K0, by asking about the halting of programs of the form “enumerate halting programs until (if ever) their cumulative weight exceeds q, then halt”, where q is an n-bit rational number…
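The first half of that paragraph, recovering K0 from a prefix of Ω, can be sketched as a loop. The code below again uses the toy machine, where halting is trivially decidable and Ω happens to equal 1 exactly, so it only shows the shape of the argument, not its busy-beaver cost; for a genuine universal machine the enumeration would have to dovetail over running times.

```python
# Schematic sketch of decoding halting information from a prefix of Omega,
# over the toy prefix machine so that it actually runs.
from itertools import product

def toy_U(program: str):
    """Toy prefix machine: unary length header, then that many output bits."""
    try:
        k = program.index('0')
    except ValueError:
        return None
    body = program[k + 1:]
    return body if len(body) == k else None

def halting_programs(max_len):
    """Yield (program, 2**-|program|) for halting toy programs, in length order."""
    for length in range(1, max_len + 1):
        for bits in product('01', repeat=length):
            p = ''.join(bits)
            if toy_U(p) is not None:
                yield p, 2.0 ** (-length)

def halting_set_from_omega(omega_prefix: float, n: int, max_len: int):
    """Given the first n bits of Omega (as the dyadic lower bound omega_prefix),
    return the set of halting programs of length <= n."""
    seen, total = set(), 0.0
    for p, weight in halting_programs(max_len):
        seen.add(p)
        total += weight
        if total >= omega_prefix:
            break      # the unaccounted-for halting probability is now < 2**-n
    # Any program of length <= n not seen by now can never halt: its weight of
    # at least 2**-n would push Omega beyond the known n-bit prefix.
    return {p for p in seen if len(p) <= n}

n = 4
omega_prefix = 1.0 - 2.0 ** (-n)     # first n bits of Omega (= 1) for the toy machine
print(sorted(halting_set_from_omega(omega_prefix, n, max_len=2 * n)))
# -> ['0', '100', '101']   (exactly the halting toy programs of <= 4 bits)
```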

Malignant Acceleration in Tech-Finance. Some Further Rumination on Regulations. Thought of the Day 72.1

Despite the positive effects HFT offers, such as reduced spreads, higher liquidity, and faster price discovery, it is mostly its negative side that has caught people’s attention. Several notorious market failures and accidents in recent years all seem to be related to HFT practices; they showed how much risk HFT can involve and how huge the damage can be.

HFT heavily depends on the reliability of the trading algorithms that generate, route, and execute orders. High-frequency traders must therefore ensure that these algorithms have been tested completely and thoroughly before they are deployed into the live systems of the financial markets. Any improperly tested or prematurely released algorithm may cause losses to both investors and the exchanges. Several examples demonstrate the extent of these ever-present vulnerabilities.

In August 2012, the Knight Capital Group deployed a new liquidity-testing software routine into its trading system, which was running live on the NYSE. The system immediately started making bizarre trading decisions, quadrupling the price of one company, Wizzard Software, and bidding up the prices of much larger companies such as General Electric. Within 45 minutes, the firm lost USD 440 million. After this event and the resulting weakening of its capital base, Knight Capital agreed to merge with another algorithmic trading firm, Getco, which is the biggest HFT firm in the U.S. today. The example underscores how important it is for firms to put precautions in place so that untested or obsolete algorithms are never mistakenly run against the live market.

Another example is Everbright Securities in China. In 2013, the state-owned brokerage firm Everbright Securities Co. sent more than 26,000 mistaken buy orders worth RMB 23.4 billion (USD 3.82 billion) to the Shanghai Stock Exchange (SSE), pushing the exchange’s benchmark index up 6% in two minutes. The error resulted in a trading loss of approximately RMB 194 million (USD 31.7 million). In a follow-up evaluative study, the China Securities Regulatory Commission (CSRC) found significant flaws in Everbright’s information and risk-management systems.

The damage caused by HFT errors is not limited to the trading firms themselves; it can also involve the stock exchanges and the stability of the wider financial market. On Friday, May 18, 2012, the stock of the social-network giant Facebook was listed on the NASDAQ exchange in what was the most anticipated initial public offering (IPO) in the exchange’s history. However, technology problems with the opening made a mess of the IPO. The offering attracted high-frequency traders, and very large order flows were expected; before the IPO, NASDAQ was confident in its ability to handle the high volume of orders.

But when the deluge of orders to buy, sell, and cancel trades came, NASDAQ’s trading software began to fail under the strain. The result was a 30-minute delay on NASDAQ’s side and a 17-second blackout for all stock trading at the exchange, which caused further panic. Scrutiny of the problems quickly led to fines for the exchange and to accusations that HFT traders bore some of the responsibility. The problems persisted after the opening: many customer orders from institutional and retail buyers went unfilled for hours or were never filled at all, while other customers ended up buying more shares than they had intended. This gaffe, estimated by some to have cost traders USD 100 million, eclipsed NASDAQ’s achievement in winning Facebook’s listing, the third-largest IPO in U.S. history.

Another instance occurred on May 6, 2010, when U.S. financial markets were surprised by what has been referred to ever since as the “Flash Crash.” Within less than 30 minutes, the main U.S. stock markets experienced their largest intraday price declines, with drops of more than 5% for many U.S.-based equity products. The Dow Jones Industrial Average (DJIA), at its lowest point that day, was down by nearly 1,000 points, although it rebounded rapidly. This brief period of extreme intraday volatility exposed weaknesses in the structure and stability of U.S. financial markets, as well as the opportunities such episodes create for volatility-focused HFT traders. Although a subsequent SEC investigation cleared high-frequency traders of directly causing the Flash Crash, they were still blamed for exaggerating market volatility and withdrawing liquidity from many U.S.-based equities (Flash Boys).

Since the mid-2000s, the average trade size in the U.S. stock market had plummeted, the markets had fragmented, and the gap in time between the public view of the markets and the view of high-frequency traders had widened. The rise of high-frequency trading had also been accompanied by a rise in stock market volatility, over and above the turmoil caused by the 2008 financial crisis. The price volatility within each trading day in the U.S. stock market between 2010 and 2013 was nearly 40 percent higher than the volatility between 2004 and 2006, for instance, and there were days in 2011 on which volatility exceeded that of the most volatile days of the dot-com bubble.

Although these incidents had different causes, their effects were similar, and some common conclusions can be drawn. The presence of algorithmic trading and HFT in the financial markets exacerbates the adverse impact of trading-related mistakes: it can lead to sharply higher market volatility and to sudden, unexpected shortfalls of liquidity, which raises concerns among regulators about the stability and health of the financial markets. As HFT continued its rapid development, it came to account for an ever larger share of equity trades in the U.S. markets, and evidence mounted that HFT-related errors were disturbing market stability and causing significant financial losses. This led regulators to devote more attention and effort to providing exchanges and traders with guidance on HFT practices. They also expressed concern that high-frequency traders were extracting profit at the expense of traditional investors and even manipulating the market. For instance, high-frequency traders can generate an enormous number of orders within microseconds to exacerbate a trend. Other types of misconduct include ping orders, which use probing orders to detect other, hidden orders, and quote stuffing, which floods the market with orders to create uncertainty. HFT creates room for these kinds of market abuse, and its blazing speed and huge trade volumes make them difficult for regulators to detect.

Regulators have therefore taken steps to increase their authority over HFT activities. Some of the problems that arose in the mid-2000s led to hearings in the United States Senate on dark pools, flash orders, and HFT practices. Another push came after the Facebook IPO problem, which led the SEC to call for a limit up-limit down mechanism at the exchanges to prevent trades in individual securities from occurring outside a specified price range, so that market volatility can be kept under better control. These regulatory actions put stricter requirements on HFT practices, aiming to minimize the market disturbance caused when large numbers of fast trading orders occur within a single day.
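As a rough, hedged sketch of what such a price-band mechanism does mechanically (the 5% band and the trailing-average reference price below are made-up illustrative parameters, not the SEC’s or any exchange’s actual rule), a gateway can simply reject or pause executions that fall outside an allowed range around a reference price:

```python
# Hedged sketch of a limit up-limit down style price-band check. The band
# width and reference-price logic are illustrative placeholders only.
from collections import deque

class PriceBandGuard:
    def __init__(self, band_pct=0.05, window=300):
        self.band_pct = band_pct                 # e.g. +/-5% around the reference
        self.recent = deque(maxlen=window)       # trailing accepted trade prices

    def reference_price(self):
        return sum(self.recent) / len(self.recent)

    def allow_trade(self, price):
        """Return True if the price lies inside the band (or there is no history yet)."""
        if not self.recent:
            self.recent.append(price)
            return True
        ref = self.reference_price()
        lower, upper = ref * (1 - self.band_pct), ref * (1 + self.band_pct)
        ok = lower <= price <= upper
        if ok:
            self.recent.append(price)            # only accepted prints update the reference
        return ok

guard = PriceBandGuard()
print([guard.allow_trade(p) for p in (100.0, 101.2, 99.5, 112.0)])
# -> [True, True, True, False]  (the 112.0 print breaches the +/-5% band)
```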