Abstract Expressions of Time’s Modalities. Thought of the Day 21.0


According to Gregory Bateson,

What we mean by information — the elementary unit of information — is a difference which makes a difference, and it is able to make a difference because the neural pathways along which it travels and is continually transformed are themselves provided with energy. The pathways are ready to be triggered. We may even say that the question is already implicit in them.

In other words, we always need to know some second-order logic, and presuppose a second order of “order” (cybernetics), usually shared within a distinct community, to realize what a certain claim, hypothesis or theory means. In Koichiro Matsuno’s opinion Bateson’s phrase

must be a prototypical example of second-order logic in that the difference appearing both in the subject and predicate can accept quantification. Most statements framed in second-order logic are not decidable. In order to make them decidable or meaningful, some qualifier needs to be used. A popular example of such a qualifier is a subjective observer. However, the point is that the subjective observer is not limited to Alice or Bob in the QBist parlance.

This is what is necessitated in order to understand the different viewpoints in the logic of mathematicians, physicists and philosophers in the dispute about the existence of time. An essential aspect of David Bohm’s “implicate order” can be seen in the grammatical formulation of theses such as the law of motion:

While it is legitimate in its own light, the physical law of motion alone framed in eternal time referable in the present tense, whether in classical or quantum mechanics, is not competent enough to address how the now could be experienced. … Measurement differs from the physical law of motion as much as the now in experience differs from the present tense in description. The watershed separating between measurement and the law of motion is in the distinction between the now and the present tense. Measurement is thus subjective and agential in making a punctuation at the moment of now. (Matsuno)

The distinction between experiencing and capturing experience of time in terms of language is made explicit in Heidegger’s Being and Time

… by passing away constantly, time remains as time. To remain means: not to disappear, thus, to presence. Thus time is determined by a kind of Being. How, then, is Being supposed to be determined by time?

Koichiro Matsuno’s comment on this is:

Time passing away is an abstraction from accepting the distinction of the grammatical tenses, while time remaining as time refers to the temporality of the durable now prior to the abstraction of the tenses.

Therefore, when trying to understand the “local logics/phenomenologies” of the individual disciplines (mathematics, physics, philosophy, etc., including their fields), one should be aware of the fact that the capabilities of our scientific language are not limitless:

…the now of the present moment is movable and dynamic in updating the present perfect tense in the present progressive tense. That is to say, the now is prior and all of the grammatical tenses including the ubiquitous present tense are the abstract derivatives from the durable now. (Matsuno)

This presupposes the adequacy of mathematical abstractions specifically invented or adopted and elaborated for the expression of more sophisticated modalities of time’s now than those currently used in such formalisms as temporal logic.

Osteo Myological Quantization. Note Quote.

The site of the parameters in a higher order space can also be quantized into segments, the limits of which can be no more decomposed. Such a limit may be nearly a rigid piece. In the animal body such quanta cannot but be bone pieces forming parts of the skeleton, whether lying internally as [endo]-skeleton or as almost rigid shell covering the body as external skeleton.

Note the partition of the body into three main segments: Head (cephalique), pectoral (breast), caudal (tail), materializing the KH order limit M ≥ 3 or the KHK dimensional limit N ≥ 3. Notice also the quantization into more macroscopic segments such as of the abdominal part into several smaller segments beyond the KHK lower bound N = 3. Lateral symmetry with a symmetry axis is remarkable. This is of course an indispensable consequence of the modified Zermelo conditions, which entails also locomotive appendages differentiating into legs for walking and wings for flying in the case of insects.


These two paragraphs of Kondo address the simple issues of what bones are, mammalian bilateral symmetry, the number of major body parts and their segmentation, and the notion of the mathematical origins of wings, legs and arms: the dimensionality of eggs being zero, hence their need of warmth for progression to locomotion, and the dimensionality of snakes being one, hence their mode of locomotion. A feature of the biological chapters is their attention to detail, their use of line art to depict the various forms of living being – from birds to starfish to dinosaurs – the use of the full Latin terminology, and at all times the relation of the various forms of living being to the underlying higher-order geometry and the mathematical notion of principal ideals. The human skeleton is treated as a hierarchical Kawaguchi tree with its characteristic three-pronged form. The Riemannian arc length of the curve k(t) is given by the integral of the square root of a quadratic form in x′ with coefficients dependent on x. This integrand is homogeneous of the first order in x′. If we drop the quadratic property and retain the homogeneity, we obtain Finsler geometry. Kawaguchi geometry supposes that the integrand depends upon the higher derivatives x′′ up to the k-th derivative x(k). The notation that Kondo uses is:



L: parameters; N: dimensions; M: derivatives.
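The distinction drawn above between the Riemannian and Finsler integrands can be checked numerically. The metric and the quartic-root integrand below are purely illustrative assumptions, not Kondo's; the Kawaguchi case would further let the integrand depend on x′′ and higher derivatives.

```python
import numpy as np

# Sketch of the two arc-length integrands discussed above:
#   Riemannian: F(x, v) = sqrt(v^T G(x) v) -- quadratic and 1-homogeneous in v
#   Finsler:    any F that is 1-homogeneous in v but not necessarily quadratic
def riemann_F(x, v):
    G = np.diag([1.0 + x[0]**2, 2.0])  # an assumed position-dependent metric
    return np.sqrt(v @ G @ v)

def finsler_F(x, v):
    # a quartic-root integrand: 1-homogeneous in v, but not quadratic
    return (v[0]**4 + (1.0 + x[1]**2) * v[1]**4) ** 0.25

x = np.array([0.3, -0.7])
v = np.array([1.2, 0.5])
for F in (riemann_F, finsler_F):
    # first-order homogeneity in v: F(x, lam*v) = lam * F(x, v)
    print(np.isclose(F(x, 3.0 * v), 3.0 * F(x, v)))
```

Dropping the square root's quadratic argument while keeping first-order homogeneity is exactly the step from the first function to the second.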

The lower part of the skeleton can be divided into three prongs, each starting from the centre as a single parametric Kawaguchi tree.

…the skeletal, muscular, gastrointestinal, circulation systems etc combine into a holo-parametric whole that can be more generally quantized, each quantum involving some osteological, neural, circulatory functions etc.

…thus globally the human body from head through trunk to limbs are quantized into a finite number of quanta.

The Semiotic Theory of Autopoiesis, OR, New Level Emergentism


The dynamics of all the life-cycle meaning processes can be described in terms of basic semiotic components, algebraic constructions of the following forms:

Pn(мn : fn(Ξn) → Ξn+1)

where Ξn is a sign system corresponding to a representation of a (design) problem at time t1, Ξn+1 is a sign system corresponding to a representation of the problem at time t2, t2 > t1, fn is a composition of semiotic morphisms that specifies the interaction of variation and selection under the condition of information closure, which requires no external elements be added to the current sign system; мn is a semiotic morphism, and Pn is the probability associated with мn, ΣPn = 1, n=1,…,M, where M is the number of the meaningful transformations of the resultant sign system after fn. There is a partial ranking – importance ordering – on the constraints of A in every Ξn, such that lower ranked constraints can be violated in order for higher ranked constraints to be satisfied. The morphisms of fn preserve the ranking.

The Semiotic Theory of Self-Organizing Systems postulates that in the scale hierarchy of dynamical organization, a new level emerges if and only if a new level in the hierarchy of semiotic interpretance emerges. As the development of a new product always and naturally causes the emergence of a new meaning, the above-cited Principle of Emergence directly leads us to the formulation of the first law of life-cycle semiosis as follows:

I. The semiosis of a product life cycle is represented by a sequence of basic semiotic components, such that at least one of the components is well defined in the sense that not all of its morphisms of м and f are isomorphisms, and at least one м in the sequence is not level-preserving in the sense that it does not preserve the original partial ordering on levels.

For the present (i.e. for an on-going process), there exists a probability distribution over the possible мn for every component in the sequence. For the past (i.e. retrospectively), each of the distributions collapses to a single mapping with Pn = 1, while the sequence of basic semiotic components degenerates to a sequence of functions. For the future, the life-cycle meaning-making

Comment on Purely Random Correlations of the Matrix, or Studying Noise in Neural Networks


In the presence of two-body interactions the many-body Hamiltonian matrix elements vJα,α′ of good total angular momentum J in the shell-model basis |α⟩ generated by the mean field can be expressed as follows:

vJα,α′ = ∑J’ii’ cJαα’J’ii’ gJ’ii’ —– (4)

The summation runs over all combinations of the two-particle states |i⟩ coupled to the angular momentum J′ and connected by the two-body interaction g. The analogy of this structure to the one schematically captured by eq. (2) is evident. gJ′ii′ denote here the radial parts of the corresponding two-body matrix elements while cJαα′J′ii′ globally represent elements of the angular momentum recoupling geometry. gJ′ii′ are drawn from a Gaussian distribution while the geometry expressed by cJαα′J′ii′ enters explicitly. This originates from the fact that a quasi-random coupling of individual spins results in the so-called geometric chaoticity, and thus the cJαα′ coefficients are also Gaussian distributed. In this case, these two (gJ′ii′ and c) essentially random ingredients lead however to an order of magnitude larger separation of the ground state from the remaining states as compared to a pure Random Matrix Theory (RMT) limit. Due to more severe selection rules the effect of geometric chaoticity does not apply for J = 0. Consistently, the ground-state energy gaps measured relative to the average level spacing characteristic for a given J are larger for J > 0 than for J = 0, and J > 0 ground states are also more orderly than those for J = 0, as can be quantified in terms of the information entropy.

Interestingly, such reductions of dimensionality of the Hamiltonian matrix can also be seen locally in explicit calculations with realistic (non-random) nuclear interactions. A collective state, one which turns out to be coherent with some operator representing a physical external field, is always surrounded by a reduced density of states, i.e., it repels the other states. In all those cases, the global fluctuation characteristics remain however largely consistent with the corresponding version of the random matrix ensemble.

Recently, a broad arena of applicability of random matrix theory has opened in connection with the most complex systems known to exist in the universe. Without doubt, the most complex is the human brain and the phenomena that result from its activity. From the physics point of view the financial world, reflecting such activity, is of particular interest because its characteristics are quantified directly in terms of numbers and a huge amount of electronically stored financial data is readily available. Access to the activity of a single brain is also possible by detecting the electric or magnetic fields generated by the neuronal currents. With the present-day techniques of electro- or magnetoencephalography, it is possible in this way to generate time series which resolve neuronal activity down to the scale of 1 ms.

One may debate over what is more complex, the human brain or the financial world, and there is no unique answer. It seems however to us that it is the financial world that is even more complex. After all, it involves the activity of many human brains and it seems even less predictable due to more frequent changes between different modes of action. Noise is of course overwhelming in either of these systems, as can be inferred from the structure of eigenspectra of the correlation matrices taken across different space areas at the same time, or across different time intervals. There however always exist several well-identifiable deviations, which, with the help of reference to the universal characteristics of random matrix theory, and with the methodology briefly reviewed above, can be classified as real correlations or collectivity. An easily identifiable gap between the corresponding eigenvalues of the correlation matrix and the bulk of its eigenspectrum plays the central role in this connection. The brain, when responding to sensory stimulation, develops larger gaps than the brain at rest. The correlation matrix formalism in its most general asymmetric form allows one to study also the time-delayed correlations, like the ones between the opposite hemispheres. The time delay reflecting the maximum of correlation (the time needed for information to be transmitted between the different sensory areas in the brain) is also associated with the appearance of one significantly larger eigenvalue. Similar effects appear to govern the formation of heteropolymeric biomolecules. The ones that nature makes use of are separated by an energy gap from the purely random sequences.


Purely Random Correlations of the Matrix, or Studying Noise in Neural Networks


Expressed in the most general form, in essentially all the cases of practical interest, the n × n matrices W used to describe the complex system are by construction designed as

W = XYT —– (1)

where X and Y denote rectangular n × m matrices. Such, for instance, are the correlation matrices whose standard form corresponds to Y = X. In this case one thinks of n observations or cases, each represented by an m-dimensional row vector xi (yi), (i = 1, …, n), and typically m is larger than n. In the limit of purely random correlations the matrix W is then said to be a Wishart matrix. The resulting density ρW(λ) of eigenvalues is here known analytically, with the limits (λmin ≤ λ ≤ λmax) prescribed by

λmax/min = 1 + 1/Q ± 2√(1/Q), where Q = m/n ≥ 1.

The variance of the elements of xi is here assumed unity.
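These Wishart limits are easy to verify empirically. The sketch below, with an assumed n = 200, m = 800 (so Q = 4), builds a purely random correlation-type matrix W = XXT/m and checks that its eigenvalues fall inside the predicted band:

```python
import numpy as np

# Empirical eigenvalues of a purely random correlation matrix versus the
# Wishart bounds lambda_{max/min} = 1 + 1/Q +/- 2*sqrt(1/Q), Q = m/n >= 1.
rng = np.random.default_rng(0)

n, m = 200, 800                   # n observations, m-dimensional rows; Q = 4
Q = m / n

X = rng.standard_normal((n, m))   # unit-variance entries, no true correlations
W = (X @ X.T) / m                 # n x n Wishart-type matrix
eigs = np.linalg.eigvalsh(W)

lam_max = 1 + 1/Q + 2*np.sqrt(1/Q)
lam_min = 1 + 1/Q - 2*np.sqrt(1/Q)

# For purely random data essentially all eigenvalues lie inside the band.
inside = np.mean((eigs >= lam_min - 0.05) & (eigs <= lam_max + 0.05))
print(f"fraction inside [{lam_min:.3f}, {lam_max:.3f}]: {inside:.3f}")
```

Any eigenvalue found well above lam_max would then signal genuine correlations rather than noise, which is the diagnostic used throughout this post.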

The more general case, of X and Y different, results in asymmetric correlation matrices with complex eigenvalues λ. In this more general case a limiting distribution corresponding to purely random correlations seems not yet to be known analytically as a function of m/n. It appears however that in the case of no correlations, quite generically, one may expect a largely uniform distribution of λ bounded by an ellipse on the complex plane.

Further examples of matrices of similar structure, of great interest from the point of view of complexity, include the Hamiltonian matrices of strongly interacting quantum many-body systems such as atomic nuclei. This holds true on the level of bound states, where the problem is described by Hermitian matrices, as well as for excitations embedded in the continuum. This latter case can be formulated in terms of an open quantum system, which is represented by a complex non-Hermitian Hamiltonian matrix. Several neural network models also belong to this category of matrix structure. In this domain the reference is provided by the Gaussian (orthogonal, unitary, symplectic) ensembles of random matrices with the semicircle law for the eigenvalue distribution. For irreversible processes there exists a complex version of these ensembles with a special case, the so-called scattering ensemble, which accounts for S-matrix unitarity.

As it has already been expressed above, several variants of ensembles of the random matrices provide an appropriate and natural reference for quantifying various characteristics of complexity. The bulk of such characteristics is expected to be consistent with Random Matrix Theory (RMT), and in fact there exists strong evidence that it is. Once this is established, even more interesting are however deviations, especially those signaling emergence of synchronous or coherent patterns, i.e., the effects connected with the reduction of dimensionality. In the matrix terminology such patterns can thus be associated with a significantly reduced rank k (thus k ≪ n) of a leading component of W. A satisfactory structure of the matrix that would allow some coexistence of chaos or noise and of collectivity thus reads:

W = Wr + Wc —– (2)

Of course, in the absence of Wr, the second term (Wc) of W generates k nonzero eigenvalues, and all the remaining ones (n − k) constitute the zero modes. When Wr enters as a noise (random-like matrix) correction, a trace of the above effect is expected to remain, i.e., k large eigenvalues and a bulk composed of n − k small eigenvalues whose distribution and fluctuations are consistent with an appropriate version of the random matrix ensemble. One likely mechanism that may lead to such a segregation of eigenspectra is that m in eq. (1) is significantly smaller than n, or that the number of large components makes it effectively small on the level of the large entries w of W. Such an effective reduction of m (M = meff) is then expressed by the following distribution P(w) of the large off-diagonal matrix elements in the case they are still generated by random-like processes

P(w) = |w|^((M−1)/2) K_((M−1)/2)(|w|) / (2^((M−1)/2) Γ(M/2) √π) —– (3)

where K stands for the modified Bessel function. Asymptotically, for large w, this leads to P(w) ∼ e^(−|w|) |w|^(M/2−1), and thus reflects an enhanced probability of appearance of a few large off-diagonal matrix elements as compared to a Gaussian distribution. Consistent with the central limit theorem, the distribution quickly converges to a Gaussian with increasing M.
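The convergence to a Gaussian can be illustrated numerically. The sketch below assumes, as behind eq. (3), that a matrix element is a sum of M products of independent unit Gaussians, w = Σk xk·yk, and tracks the excess kurtosis (zero for a Gaussian) as M grows:

```python
import numpy as np

# A matrix element built as w = sum_k x_k * y_k with Gaussian x, y is
# heavy-tailed for small M (effective meff) and converges to a Gaussian
# as M grows, as the central limit theorem requires.
rng = np.random.default_rng(2)

def excess_kurtosis(samples):
    s = (samples - samples.mean()) / samples.std()
    return np.mean(s**4) - 3.0

N = 200_000
for M in (1, 5, 50):
    w = np.sum(rng.standard_normal((N, M)) * rng.standard_normal((N, M)),
               axis=1)
    print(f"M = {M:2d}: excess kurtosis = {excess_kurtosis(w):.2f}")
# A single product (M = 1) has excess kurtosis near 6; by M = 50 the
# value is already close to 0, i.e. nearly Gaussian.
```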

Based on several examples of natural complex dynamical systems, like the strongly interacting Fermi systems, the human brain and the financial markets, one could systematize evidence that such effects are indeed common to all the phenomena that intuitively can be qualified as complex.
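The segregation of eigenspectra described by eq. (2) can itself be illustrated with a minimal sketch. The sizes and the rank k below are arbitrary assumptions; Wc is a rank-k "collective" component and Wr a symmetric Gaussian noise matrix:

```python
import numpy as np

# Sketch of eq. (2), W = Wr + Wc: a rank-k collective part buried in noise
# still shows up as k eigenvalues separated from the random bulk.
rng = np.random.default_rng(1)

n, k = 300, 3
V = rng.standard_normal((n, k))          # k random collective directions
Wc = (V @ V.T) / np.sqrt(n)              # rank-k symmetric component
A = rng.standard_normal((n, n))
Wr = (A + A.T) / (2 * np.sqrt(n))        # GOE-like noise, semicircle bulk

eigs = np.sort(np.linalg.eigvalsh(Wr + Wc))[::-1]
print("top 5 eigenvalues:", np.round(eigs[:5], 2))
# The k = 3 collective eigenvalues stand clearly above the bulk edge.
```

The gap between eigs[k-1] and eigs[k] is precisely the kind of signature that, in the brain and financial data discussed above, separates real collectivity from noise.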

Binary, Ternary Connect, Neural N/W Deep Learning & Eliminating Multiplications in Forward and Backward Pass

Consider a neural network layer with N input and M output units. The forward computation is y = h(W x + b) where W and b are the weights and biases, respectively, h is the activation function, and x and y are the layer’s inputs and outputs. If we choose ReLU, or Rectified Linear Unit/Ramp Function, as h, there will be no multiplications in computing the activation function, so all multiplications reside in the matrix product W x. For each input vector x, M × N floating-point multiplications are needed.


Binary connect eliminates these multiplications by stochastically sampling weights to be −1 or 1. Full-resolution weights w̄ are kept in memory as reference, and each time y is needed, we sample a stochastic weight matrix W according to w̄. For each element of the sampled matrix W, the probability of getting a 1 is proportional to how “close” its corresponding entry in w̄ is to 1, i.e.,

P(Wij = 1) = (w̄ij + 1)/2;

P(Wij = −1) = 1 − P(Wij = 1)

It is necessary to add some edge constraints to w̄. To ensure that P(Wij = 1) lies in a reasonable range, values in w̄ are constrained to lie in the interval [−1, 1]. If during the updates any of its values grows beyond that interval, we set it to the corresponding edge value −1 or 1. That way floating-point multiplications become sign changes.
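The binary connect sampling rule and the edge constraint can be sketched as follows (a minimal numpy illustration, not the paper's implementation):

```python
import numpy as np

# Binary connect: reference weights w_bar in [-1, 1]; each forward pass
# samples W_ij in {-1, +1} with P(W_ij = 1) = (w_bar_ij + 1) / 2.
rng = np.random.default_rng(0)

def sample_binary(w_bar):
    p_one = (w_bar + 1.0) / 2.0
    return np.where(rng.random(w_bar.shape) < p_one, 1.0, -1.0)

def clip_after_update(w_bar):
    # edge constraint: keep the reference weights inside [-1, 1]
    return np.clip(w_bar, -1.0, 1.0)

w_bar = np.array([[0.9, -0.9], [0.0, 0.5]])
w_bar = clip_after_update(w_bar)   # applied after every gradient update
# E[W] = w_bar: averaging many samples recovers the reference weights.
mean_W = np.mean([sample_binary(w_bar) for _ in range(5000)], axis=0)
print(np.round(mean_W, 1))
```

Since E[Wij] = 1·p + (−1)(1−p) = w̄ij, the sampled binary weights are an unbiased substitute for the full-resolution ones.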

A remaining question concerns the use of multiplications in the random number generator involved in the sampling process. Sampling an integer has to be faster than multiplication for the algorithm to be worth it.

Moving on from binary to ternary connect: whereas in the former weights are allowed to be −1 or 1, in a trained neural network it is common to observe that many learned weights are zero or close to zero. Although the stochastic sampling process would allow the mean value of sampled weights to be zero, this suggests that it may be beneficial to explicitly allow weights to be zero.

To allow weights to be zero, split the interval [−1, 1], within which the full-resolution weight value w̄ lies, into two sub-intervals: [−1, 0] and (0, 1]. If a weight value w̄ij falls into one of them, we sample Wij to be one of the two edge values of that interval, according to its distance from w̄ij, i.e., if w̄ij > 0:

P(Wij = 1) = w̄ij; P(Wij = 0) = 1 − w̄ij

and if w̄ij ≤ 0:

P(Wij = −1) = −w̄ij; P(Wij = 0) = 1 + w̄ij

Like binary connect, ternary connect also eliminates all multiplications in the forward pass.
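The ternary sampling rule above can be sketched in the same way as the binary case (again a hedged numpy illustration, not the original implementation):

```python
import numpy as np

# Ternary connect: W_ij in {-1, 0, +1}.
# If w_bar_ij > 0:  P(W_ij = 1) = w_bar_ij,  P(W_ij = 0) = 1 - w_bar_ij.
# If w_bar_ij <= 0: P(W_ij = -1) = -w_bar_ij, P(W_ij = 0) = 1 + w_bar_ij.
rng = np.random.default_rng(0)

def sample_ternary(w_bar):
    u = rng.random(w_bar.shape)
    return np.where(w_bar > 0,
                    np.where(u < w_bar, 1.0, 0.0),
                    np.where(u < -w_bar, -1.0, 0.0))

w_bar = np.array([0.8, 0.1, -0.3, 0.0])
samples = np.array([sample_ternary(w_bar) for _ in range(10_000)])
# As with binary connect E[W] = w_bar, but near-zero reference weights
# are now sampled as exact zeros most of the time.
print(np.round(samples.mean(axis=0), 1))
```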

We move from the forward to the backward pass. Suppose the i-th layer of the network has N input and M output units, and consider an error signal δ propagating downward from its output. The weight update is given by the outer product of the gated error signal and the layer’s input, and the bias update by the gated error signal itself:

∆W = η [δ◦h′(W x + b)] xT

∆b = η δ◦h′(W x + b)

where η is the learning rate, and x the input to the layer. While propagating through the layers, the error signal δ needs to be updated, too. Its update taking into account the next layer below takes the form:

δ = WT [δ◦h′(W x + b)]

Three terms appear repeatedly in the above three equations, viz. δ, h′(W x + b) and x. The latter two terms introduce matrix outer products. To eliminate multiplications, one can quantize one of them to be an integer power of 2, so that multiplications involving that term become binary shifts. The expression h′(W x + b) contains down-flowing gradients, which are largely determined by the cost function and the network parameters, so its values are hard to bound. However, bounding the values is essential for quantization because we need to supply a fixed number of bits for each sampled value, and if that value varies too much, we will need too many bits for the exponent. This, in turn, will result in the need for more bits to store the sampled value and unnecessarily increase the required amount of computation.

While h′ (W x + b) is not a good choice for quantization, x is a better choice, because it is the hidden representation at each layer, and we know roughly the distribution of each layer’s activation.

The approach is therefore to eliminate multiplications in

∆W = η [δ◦h′(W x + b)] xT

by quantizing each entry in x to an integer power of 2, so that the outer product becomes a series of bit shifts. Experimentally, it is found that allowing a maximum of 3 to 4 bits of shift is sufficient to make the network work well. This means that 3 bits are already enough to quantize x. As the float32 format has 24 bits of mantissa, shifting (to the left or right) by 3 to 4 bits is completely tolerable. This approach is referred to as “quantized back propagation”.
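The quantization step can be sketched as follows. The function name, the exponent range and the clamping of tiny magnitudes are illustrative assumptions; the point is only that each entry of x becomes ± an integer power of 2, so multiplying by it reduces to a sign flip plus a bit shift:

```python
import numpy as np

# Quantize each entry of x to the nearest integer power of 2 (keeping its
# sign); the assumed [min_exp, max_exp] window plays the role of the
# 3-4 bit shift budget mentioned above.
def quantize_pow2(x, min_exp=-4, max_exp=0):
    sign = np.sign(x)
    mag = np.abs(x)
    exp = np.clip(np.round(np.log2(np.maximum(mag, 2.0**min_exp))),
                  min_exp, max_exp)
    return sign * 2.0**exp

x = np.array([0.8, 0.3, -0.12, 0.05])
print(quantize_pow2(x))
```

Multiplying a gradient entry g by such a quantized value is then g shifted by |exp| bit positions, with the sign applied separately.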

If we choose ReLU as the activation function and use binary (ternary) connect to sample W, computing the term h′(W x + b) involves no multiplications at all. In addition, quantized back propagation eliminates the multiplications in the outer product in

∆W = η [δ◦h′(W x + b)] xT.

The only place where multiplications remain is the element-wise product. From

∆W = η [δ◦h′(W x + b)] xT, ∆b = η δ◦h′(W x + b), and δ = WT [δ◦h′(W x + b)], one can see that 6 × M multiplications are needed for all computations. Like in the forward pass, most of the multiplications are used in the weight updates. Compared with standard back propagation, which would need 2MN + 6M multiplications, the amount of multiplications left is negligible in quantized back propagation.

The Differentiated Hyperreality of Baudrillard


A sense of meaning for Baudrillard connotes a totality that is called knowledge, and it is here that he differs significantly from someone like Foucault. For the latter, knowledge is a product of relations employing power, whereas for the former, any attempt to reach a finality or totality, as he calls it, is always a flirtation with delusion: the human subject would always aim at understanding the human or non-human object, and in the process the object would always be elusive since, being based on signifiers, it would be vulnerable to a shift in significations. The two key ideas of Baudrillard are simulation and hyperreality. Simulation refers to the representation of things such that the representations become the things represented; in other words, representations gain priority over the “real” things. There are certain orders that define simulation: signs that represent objective reality, signs that veil reality, signs that mask the absence of reality, and signs that turn into simulacra, bearing no relation to reality and thus ending up simulating a simulation. In Hegarty’s reading of Baudrillard, there happen to be three types of simulacra, each with a distinct historical epoch. The first is the pre-modern period, where the image marks the place for an item and hence the uniqueness of objects and situations marks them as irreproducibly real. The second is the modern period, characterized by the industrial revolution, signifying the breaking down of distinctions between images and reality because of the mass reproduction of copies or proliferation of commodities, thus risking the essential existence of the original. The third is the post-modern period, where simulacra precede the original and the distinction between reality and representation vanishes, implying only the existence of simulacra and relegating reality to a vacuous concept. Hyperreality defines a condition wherein “reality” as known gets substituted by simulacra.
This notion of Baudrillard is influenced by the Canadian communication theorist and rhetorician Marshall McLuhan. Hyperreality, with its insistence on signs and simulations, fits perfectly in the post-modern era and therefore highlights the inability of consciousness to demarcate between reality and phantasmatic space. In a quite remarkable analysis of Disneyland, Baudrillard (166-184) clarifies the notion of hyperreality, when he says,

The Disneyland imaginary is neither true nor false: it is a deterrence machine set in order to rejuvenate in reverse the fiction of the real. Whence the debility, the infantile degeneration of this imaginary. It’s meant to be an infantile world, in order to make us believe that adults are everywhere, in the “real” world and to conceal the fact that real childishness is everywhere, particularly among those adults who go there to act the child in order to foster illusion of their real childishness.

Although his initial ideas were affiliated with those of Marxism, he differed from Marx in epitomizing consumption, as compared to the latter’s production, as the driving force of capitalism. Another issue that was worked out remarkably in Baudrillard was historicity. Agreeing largely with Fukuyama’s notion of the end of history after the collapse of the communist bloc, Baudrillard differed only in holding that it is the idea of historical progress that has ended, not necessarily history itself. He forcefully makes the point that the ending of history is also the ending of the dustbins of history. His post-modern stand differed significantly from Lyotard’s in one major respect, despite finding common ground elsewhere: despite showing a growing aversion to the theory of meta-narratives, Baudrillard, unlike Lyotard, reached a point of pragmatic reality within the confines of an excuse-laden notion of universality that happened to be in vogue.

Baudrillard has been at the receiving end of some very extreme, acerbic criticism. His writings are not just obscure, but also fail in many respects: they leave key concepts undefined, totalize insights that have no substantial claim beyond conjecture, and often hint strongly at apodicticity without paying due attention to rival positions. This extremity reaches a culmination point when he is cited as a purveyor of reality-denying irrationalism. But not everything in his case is to be looked at critically, and he does enjoy an established status as a transdisciplinary theorist who, with his provocations, has put traditional issues regarding modernity and philosophy in general at stake, providing insights toward a better comprehensibility of cultural studies, sociology and philosophy. Most importantly, Baudrillard provides for autonomous and differentiated spaces in the cultural, socio-economic and political domains by an implosive theory that cuts across the boundaries of various disciplines, paving the way for a new era in philosophical and social theory at large.