Quantum Informational Biochemistry. Thought of the Day 71.0

A natural extension of the information-theoretic Darwinian approach to biological systems is obtained by taking into account that biological systems are, at their most fundamental level, constituted by physical systems. It is therefore through the interaction of elementary physical systems that the biological level is reached, once the size of the system has grown by several orders of magnitude and only for certain associations of molecules – biochemistry.

In particular, this viewpoint lies at the foundation of the “quantum brain” project established by Hameroff and Penrose (Shadows of the Mind). They tried to lift quantum physical processes associated with microsystems composing the brain to the level of consciousness. Microtubules were considered as the basic quantum information processors. This project, as well as the more general project of reducing biology to quantum physics, has its strong and weak sides. One of the main problems is that decoherence should quickly wash out quantum features such as superposition and entanglement. (Hameroff and Penrose would disagree with this statement. They try to develop models of the hot, macroscopic brain that preserve the quantum features of its elementary micro-components.)

However, even if we assume that microscopic quantum physical behavior disappears with increasing size and number of atoms due to decoherence, it seems that the basic quantum features of information processing can survive in macroscopic biological systems (operating on temporal and spatial scales which are essentially different from the scales of the quantum micro-world). The associated information processor for a mesoscopic or macroscopic biological system would be a network of increasing complexity formed by the elementary probabilistic classical Turing machines of its constituents. Such a composite network of processors can exhibit special behavioral signatures which are similar to quantum ones. We call such biological systems quantum-like. In a series of works, Asano and co-authors (Quantum Adaptivity in Biology: From Genetics to Cognition) developed an advanced formalism for modeling the behavior of quantum-like systems, based on the theory of open quantum systems and the more general theory of adaptive quantum systems. This formalism is known as quantum bioinformatics.

The present quantum-like model of biological behavior is of the operational type (as is the standard quantum mechanical model endowed with the Copenhagen interpretation). It cannot explain the physical and biological processes behind quantum-like information processing. Clarification of the origin of quantum-like biological behavior is related, in particular, to understanding the nature of entanglement and its role in the processes of interaction and cooperation in physical and biological systems. Qualitatively, the information-theoretic Darwinian approach supplies an interesting possibility of explaining the generation of quantum-like information processors in biological systems. Hence, it can serve as the bio-physical background for quantum bioinformatics. There is an intriguing point here: if the information-theoretic Darwinian approach is right, then it would be possible to produce quantum information from optimal flows of past, present and anticipated classical information in any classical information processor endowed with a complex enough program. Thus the unified evolutionary theory would supply a physical basis to Quantum Information Biology.

Suspicion on Consciousness as an Immanent Derivative

The category of the subject (like that of the object) has no place in an immanent world. There can be no transcendent, subjective essence. What, then, is the ontological status of a body and its attendant instance of consciousness? In what would it exist? Sanford Kwinter (conjuncted here) offers:

It would exist precisely in the ever-shifting pattern of mixtures or composites: both internal ones – the body as a site marked and traversed by forces that converge upon it in continuous variation; and external ones – the capacity of any individuated substance to combine and recombine with other bodies or elements (ensembles), both influencing their actions and undergoing influence by them. The ‘subject’ … is but a synthetic unit falling at the midpoint or interface of two more fundamental systems of articulation: the first composed of the fluctuating microscopic relations and mixtures of which the subject is made up, the second of the macro-blocs of relations or ensembles into which it enters. The image produced at the interface of these two systems – that which replaces, yet is too often mistaken for, subjective essence – may in turn have its own individuality characterized with a certain rigor. For each mixture at this level introduces into the bloc a certain number of defining capacities that determine both what the ‘subject’ is capable of bringing to pass outside of itself and what it is capable of receiving (undergoing) in terms of effects.

This description is sufficient to explain the immanent nature of the subjective bloc as something entirely embedded in and conditioned by its surroundings. What it does not offer – and what is not offered in any detail in the entirety of the work – is an in-depth account of what, exactly, these “defining capacities” are. To be sure, it would be unfair to demand a complete description of these capacities. Kwinter himself has elsewhere referred to the states of the nervous system as “magically complex”. Regardless of the specificity with which these capacities can presently be defined, we must nonetheless agree that it is at this interface, as he calls it, at this location where so many systems are densely overlaid, that consciousness is produced. We may be convinced that this consciousness, this apparent internal space of thought, is derived entirely from immanent conditions and can only be granted the ontological status of an effect, but this effect still manages to produce certain difficulties when attempting to define modes of behavior appropriate to an immanent world.

There is a palpable suspicion of the role of consciousness throughout Kwinter’s work, at least insofar as it is equated with some kind of internal, subjective space. (In one text he optimistically awaits the day when this space will “be left utterly in shreds.”) The basis of this suspicion is multiple and obvious. Among the capacities of consciousness is the ability to attribute to itself the (false) image of a stable and transcendent essence. The workings of consciousness are precisely what allow the subjective bloc to orient itself in a sequence of time, separating itself from an absolute experience of the moment. It is within consciousness that limiting and arbitrary moral categories seem to most stubbornly lodge themselves. (To be sure this is the location of all critical thought.) And, above all, consciousness may serve as the repository for conditioned behaviors which believe themselves to be free of external determination. Consciousness, in short, contains within itself an enormous number of limiting factors which would retard the production of novelty. Insofar as it appears to possess the capacity for self-determination, this capacity would seem most productively applied by turning on itself – that is, precisely by making the choice not to make conscious decisions and instead to permit oneself to be seized by extra-subjective forces.

Potential Synapses. Thought of the Day 52.0

For a neuron to recognize a pattern of activity it requires a set of co-located synapses (typically fifteen to twenty) that connect to a subset of the cells that are active in the pattern to be recognized. Learning to recognize a new pattern is accomplished by the formation of a set of new synapses co-located on a dendritic segment.

Figure: Learning by growing new synapses. Learning in an HTM neuron is modeled by the growth of new synapses from a set of potential synapses. A “permanence” value is assigned to each potential synapse and represents the growth of the synapse. Learning occurs by incrementing or decrementing permanence values. The synapse weight is a binary value set to 1 if the permanence is above a threshold.

The figure shows how we model the formation of new synapses in a simulated Hierarchical Temporal Memory (HTM) neuron. For each dendritic segment we maintain a set of “potential” synapses, i.e. connections between the dendritic segment and other cells in the network that could form a synapse with the segment. The number of potential synapses is larger than the number of actual synapses. We assign each potential synapse a scalar value called “permanence” which represents stages of growth of the synapse. A permanence value close to zero represents an axon and dendrite with the potential to form a synapse but that have not commenced growing one. A permanence value of 1.0 represents an axon and dendrite with a large, fully formed synapse.

The permanence value is incremented and decremented using a Hebbian-like rule. If the permanence value exceeds a threshold, such as 0.3, then the weight of the synapse is 1; if the permanence value is at or below the threshold, then the weight of the synapse is 0. The threshold represents the establishment of a synapse, albeit one that could easily disappear. A synapse with a permanence value of 1.0 has the same effect as a synapse with a permanence value at threshold but is not as easily forgotten. Using a scalar permanence value enables on-line learning in the presence of noise. A previously unseen input pattern could be noise or it could be the start of a new trend that will repeat in the future. By growing new synapses, the network can start to learn a new pattern when it is first encountered, but only act differently after several presentations of the new pattern. Increasing permanence beyond the threshold means that patterns experienced more often than others will take longer to forget.
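A minimal NumPy sketch may make this permanence mechanism concrete. All names and constants here (the 0.3 connection threshold, the increment and decrement sizes, the initial permanence range) are illustrative assumptions rather than values taken from any particular HTM implementation:

```python
import numpy as np

CONNECTED_THRESHOLD = 0.3   # permanence above this => binary synapse weight 1
PERM_INCREMENT = 0.05       # reinforcement step for active potential synapses (illustrative)
PERM_DECREMENT = 0.02       # decay step for inactive potential synapses (illustrative)

class DendriticSegment:
    """One dendritic segment with a fixed pool of potential synapses."""

    def __init__(self, presynaptic_cells, rng=None):
        rng = np.random.default_rng() if rng is None else rng
        self.presynaptic_cells = np.asarray(presynaptic_cells)
        # Start permanences at or below the threshold: potential but mostly unconnected.
        self.permanence = rng.uniform(0.1, 0.3, size=len(self.presynaptic_cells))

    def weights(self):
        # Binary weights: 1 if the permanence exceeds the threshold, else 0.
        return (self.permanence > CONNECTED_THRESHOLD).astype(np.int8)

    def overlap(self, active_cells):
        # Number of connected synapses whose presynaptic cell is currently active.
        active = np.isin(self.presynaptic_cells, active_cells)
        return int(np.sum(self.weights() * active))

    def learn(self, active_cells):
        # Hebbian-like rule: increment permanences of potential synapses to active
        # cells, decrement the rest; values stay clipped to [0, 1].
        active = np.isin(self.presynaptic_cells, active_cells)
        delta = np.where(active, PERM_INCREMENT, -PERM_DECREMENT)
        self.permanence = np.clip(self.permanence + delta, 0.0, 1.0)
```

Repeated calls to learn() with the same active pattern push the corresponding permanences above the threshold, at which point those synapses start contributing to the segment's overlap; patterns seen only once decay back toward zero.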

HTM neurons and HTM networks rely on distributed patterns of cell activity, thus the activation strength of any one neuron or synapse is not very important. Therefore, in HTM simulations we model neuron activations and synapse weights with binary states. Additionally, it is well known that biological synapses are stochastic, so a neocortical theory cannot require precision of synaptic efficacy. Although scalar states and weights might improve performance, they are not required from a theoretical point of view.

US Stock Market Interaction Network as Learned by the Boltzmann Machine

Price formation on a financial market is a complex problem: it reflects the opinions of investors about the true value of the asset in question, the policies of producers, external regulation and many other factors. Given the large number of factors influencing price, many of which are unknown to us, describing price formation essentially requires probabilistic approaches. In the last decades, the synergy of methods from various scientific areas has opened new horizons in understanding the mechanisms that underlie related problems. One popular approach is to consider a financial market as a complex system, where not only the great number of constituents plays a crucial role but also the non-trivial interactions between them. For example, interdisciplinary studies of complex financial systems have revealed their enhanced sensitivity to fluctuations and external factors near critical events, accompanied by an overall change of internal structure. This can be complemented by research devoted to equilibrium and non-equilibrium phase transitions.

In general, statistical modeling of the state space of a complex system requires writing down the probability distribution over this space using real data. In a simple version of modeling, the probability of an observable configuration (state of a system) described by a vector of variables s can be given in the exponential form

p(s) = Z^{-1} exp{−βH(s)} —– (1)

where H is the Hamiltonian of the system, β is the inverse temperature (further β ≡ 1 is assumed) and Z is the statistical sum. The physical meaning of the model's components depends on the context; for instance, in the case of financial systems, s can represent a vector of stock returns and H can be interpreted as the inverse utility function. Generally, H has parameters defined by its series expansion in s. Based on the maximum entropy principle, an expansion up to quadratic terms is usually used, leading to pairwise interaction models. In the equilibrium case, the Hamiltonian has the form

H(s) = −h^T s − s^T J s —– (2)

where h is a vector of size N of external fields and J is a symmetric N × N matrix of couplings (T denotes transpose). The energy-based models represented by (1) play an essential role not only in statistical physics but also in neuroscience (models of neural networks) and machine learning (generative models, also known as Boltzmann machines). Given the topological similarities between neural and financial networks, both can be considered as examples of complex adaptive systems, which are characterized by the ability to adapt to a changing environment while trying to stay in equilibrium with it. From this point of view, market structural properties, e.g. clustering and networks, play an important role in modeling the distribution of stock prices. Adaptation (or learning) in these systems implies a change of the parameters of H as financial and economic systems evolve. Using statistical inference for the model's parameters, the main goal is to have a model capable of reproducing the same statistical observables, given time series for a particular historical period. In the pairwise case, the objective is to have

⟨s_i⟩_data = ⟨s_i⟩_model —– (3a)

⟨s_i s_j⟩_data = ⟨s_i s_j⟩_model —– (3b)

where angular brackets denote statistical averaging over time. Having specified the general mathematical model, one can also discuss similarities between financial and infinite-range magnetic systems in terms of related phenomena, e.g. extensivity, order parameters, phase transitions, etc. These features can be captured even in the simplified case when s_i is a binary variable taking only two discrete values. Consider the effect of mapping to a binarized system, where the values s_i = +1 and s_i = −1 correspond to profit and loss respectively. In this case, the diagonal elements of the coupling matrix, J_ii, are zero because the s_i^2 = 1 terms do not contribute to the Hamiltonian….
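As a rough illustration of how (3a) and (3b) can be enforced in practice, the following sketch fits h and J by plain Boltzmann learning, i.e. gradient steps that move the model moments toward the data moments. To keep it self-contained it enumerates all 2^N states exactly, which is only feasible for a handful of stocks; the learning rate, step count and the use of sign(returns) as the binarized input are illustrative assumptions, not the procedure of the original study:

```python
import numpy as np
from itertools import product

def exact_model_moments(h, J):
    """Enumerate all 2^N spin states to get <s_i> and <s_i s_j> under (1)-(2).
    Only feasible for small N; real applications would use sampling."""
    N = len(h)
    states = np.array(list(product([-1.0, 1.0], repeat=N)))
    energies = -(states @ h) - np.einsum('ki,ij,kj->k', states, J, states)
    weights = np.exp(-(energies - energies.min()))   # beta = 1, shifted for stability
    p = weights / weights.sum()                      # p(s) = Z^{-1} exp(-H(s))
    mean_s = p @ states                              # <s_i>_model
    corr_s = states.T @ (states * p[:, None])        # <s_i s_j>_model
    return mean_s, corr_s

def fit_pairwise_model(samples, lr=0.1, steps=2000):
    """Boltzmann learning: match model moments to data moments, eqs. (3a)-(3b).
    `samples` is a (T, N) array of binarized returns (+1 profit, -1 loss)."""
    T, N = samples.shape
    mean_data = samples.mean(axis=0)
    corr_data = samples.T @ samples / T
    h = np.zeros(N)
    J = np.zeros((N, N))
    for _ in range(steps):
        mean_model, corr_model = exact_model_moments(h, J)
        h += lr * (mean_data - mean_model)
        J += lr * (corr_data - corr_model)
        np.fill_diagonal(J, 0.0)                     # s_i^2 = 1 terms drop out
    return h, J
```

Given a hypothetical (T, N) array returns of daily stock returns, one would call fit_pairwise_model(np.sign(returns)) and read off the learned coupling matrix J as the interaction network.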

US stock market interaction network as learned by the Boltzmann Machine

Algorithmic Randomness and Complexity

How random is a real? Given two reals, which is more random? How should we even try to quantify these questions, and how do the various choices of measurement relate? Once we have reasonable measuring devices, and, using these devices, divide the reals into equivalence classes of the same “degree of randomness”, what do the resulting structures look like? Once we measure the level of randomness, how does it relate to classical measures of complexity, such as Turing degrees of unsolvability? Should it be the case that high levels of randomness mean high levels of complexity in terms of computational power, or low levels of complexity? Conversely, should the structures of computability, such as the degrees and the computably enumerable sets, have anything to say about randomness for reals?

Algorithmic Randomness and Complexity

Music Composition Using LSTM Recurrent Neural Networks

The most straightforward way to compose music with a Recurrent Neural Network (RNN) is to use the network as a single-step predictor. The network learns to predict notes at time t + 1 using notes at time t as inputs. After learning has been stopped, the network can be seeded with initial input values – perhaps from training data – and can then generate novel compositions by using its own outputs to generate subsequent inputs. This note-by-note approach was first examined by Todd.
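A minimal PyTorch sketch of this note-by-note scheme is shown below. The one-hot pitch vocabulary, the network size and the sampling of the next note from the softmax output are all illustrative assumptions; training (cross-entropy against the note at t + 1) is omitted:

```python
import torch
import torch.nn as nn

class NotePredictor(nn.Module):
    """Single-step predictor: given the note at time t, predict the note at t + 1."""

    def __init__(self, n_notes=128, hidden_size=64):
        super().__init__()
        self.lstm = nn.LSTM(input_size=n_notes, hidden_size=hidden_size, batch_first=True)
        self.out = nn.Linear(hidden_size, n_notes)

    def forward(self, x, state=None):
        h, state = self.lstm(x, state)
        return self.out(h), state

def compose(model, seed_notes, length=64, n_notes=128):
    """Seed the network with a few notes, then feed its own outputs back
    as inputs to generate a novel continuation, one note per step."""
    model.eval()
    notes = list(seed_notes)
    x = torch.nn.functional.one_hot(torch.tensor(seed_notes), n_notes).float().unsqueeze(0)
    with torch.no_grad():
        logits, state = model(x)                      # run the seed through the net
        for _ in range(length):
            probs = logits[0, -1].softmax(dim=-1)     # distribution over the next note
            nxt = torch.multinomial(probs, 1).item()  # sample rather than take the argmax
            notes.append(nxt)
            x = torch.nn.functional.one_hot(torch.tensor([[nxt]]), n_notes).float()
            logits, state = model(x, state)           # carry the hidden state forward
    return notes
```

In use, one would first train NotePredictor on note sequences shifted by one step and then call, say, compose(model, seed_notes=[60, 62, 64]) to let the network continue the seed.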

A feed-forward network would have no chance of composing music in this fashion. Lacking the ability to store any information about the past, such a network would be unable to keep track of where it is in a song. In principle an RNN does not suffer from this limitation. With recurrent connections it can use hidden layer activations as memory and thus is capable of exhibiting (seemingly arbitrary) temporal dynamics. In practice, however, RNNs do not perform very well at this task. As Mozer aptly wrote about his attempts to compose music with RNNs,

While the local contours made sense, the pieces were not musically coherent, lacking thematic structure and having minimal phrase structure and rhythmic organization.

The reason for this failure is likely linked to the problem of vanishing gradients (Hochreiter et al.) in RNNs. In gradient methods such as Back-Propagation Through Time (BPTT) (Williams and Zipser) and Real-Time Recurrent Learning (RTRL), the error flow either vanishes quickly or explodes exponentially, making it impossible for the networks to deal correctly with long-term dependencies. In the case of music, long-term dependencies are at the heart of what defines a particular style, with events spanning several notes or even many bars contributing to the formation of metrical and phrasal structure. The clearest example of such dependencies is the chord change. In a musical form like early rock-and-roll, for example, the same chord can be held for four bars or more. Even if melodies are constrained to contain notes no shorter than an eighth note, a network must regularly and reliably bridge time spans of 32 events or more.

The most relevant previous research is that of Mozer, who did note-by-note composition of single-voice melodies accompanied by chords. In the “CONCERT” model, Mozer used sophisticated RNN procedures including BPTT, log-likelihood objective functions and probabilistic interpretation of the output values. In addition to these neural network methods, Mozer employed a psychologically-realistic distributed input encoding (Shepard) that gave the network an inductive bias towards chromatically and harmonically related notes. He used a second encoding method to generate distributed representations of chords.

A BPTT-trained RNN does a poor job of learning long-term dependencies. To offset this, Mozer used a distributed encoding of duration that allowed him to process a note of any duration in a single network timestep. By representing a note, rather than a slice of time, in a single timestep, the number of time steps to be bridged by the network in learning global structure is greatly reduced. For example, allowing sixteenth notes in a network which encodes slices of time directly requires that a whole note span at minimum 16 time steps. Though the networks regularly outperformed third-order transition-table approaches, they failed in all cases to find global structure. In analyzing this performance, Mozer suggests that, for the note-by-note method to work, it is necessary that the network be able to induce structure at multiple levels.

A First Look at Music Composition using LSTM Recurrent Neural Networks
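To make the timestep arithmetic of that argument concrete, here is a small illustrative sketch; the melody and the sixteenth-note grid are invented for the example and are not taken from Mozer's data:

```python
# Each note is a (pitch, duration) pair, duration measured in sixteenth notes:
# a half-note C, two quarter notes (E, G), then a whole-note C.
melody = [("C4", 8), ("E4", 4), ("G4", 4), ("C5", 16)]

# Time-slice encoding: one network step per sixteenth-note slice of time.
time_slices = [pitch for pitch, dur in melody for _ in range(dur)]

# Note-by-note encoding with a duration code (Mozer's approach): one step per note.
note_events = melody

print(len(time_slices))  # 32 steps for the network to bridge in this short phrase
print(len(note_events))  # 4 steps for the same phrase
```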