Music Composition Using Long Short-Term Memory (LSTM) Recurrent Neural Networks

The most straightforward way to compose music with a Recurrent Neural Network (RNN) is to use the network as a single-step predictor. The network learns to predict notes at time t + 1 using notes at time t as inputs. After learning has been stopped, the network can be seeded with initial input values – perhaps from training data – and can then generate novel compositions by using its own outputs to generate subsequent inputs. This note-by-note approach was first examined by Todd.
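
As a concrete illustration of the single-step predictor just described, here is a minimal sketch in PyTorch; the piano-roll-style binary note encoding, the layer sizes, and the Bernoulli sampling of outputs are illustrative assumptions rather than details from Todd's experiments, and the training loop (e.g. binary cross-entropy against next-step targets) is omitted.

```python
import torch
import torch.nn as nn

N_NOTES = 24  # assumed width of a binary piano-roll note vector

class NextStepRNN(nn.Module):
    """Predicts the notes at time t+1 from the notes at time t."""
    def __init__(self, n_notes=N_NOTES, hidden=64):
        super().__init__()
        self.rnn = nn.LSTM(n_notes, hidden, batch_first=True)
        self.out = nn.Linear(hidden, n_notes)

    def forward(self, x, state=None):
        h, state = self.rnn(x, state)
        return self.out(h), state  # logits for the next time step

def generate(model, seed, steps=64):
    """Seed with initial inputs, then feed outputs back as inputs."""
    model.eval()
    x, state, rolled = seed, None, []
    with torch.no_grad():
        for _ in range(steps):
            logits, state = model(x, state)
            # sample which notes sound at the next time step
            x = torch.bernoulli(torch.sigmoid(logits[:, -1:, :]))
            rolled.append(x[0, 0])
    return torch.stack(rolled)  # (steps, N_NOTES) piano roll

model = NextStepRNN()
seed = torch.zeros(1, 8, N_NOTES)  # 8 warm-up steps, e.g. from training data
print(generate(model, seed).shape)
```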

A feed-forward network would have no chance of composing music in this fashion. Lacking the ability to store any information about the past, such a network would be unable to keep track of where it is in a song. In principle an RNN does not suffer from this limitation. With recurrent connections it can use hidden layer activations as memory and thus is capable of exhibiting (seemingly arbitrary) temporal dynamics. In practice, however, RNNs do not perform very well at this task. As Mozer aptly wrote about his attempts to compose music with RNNs,

While the local contours made sense, the pieces were not musically coherent, lacking thematic structure and having minimal phrase structure and rhythmic organization.

The reason for this failure is likely linked to the problem of vanishing gradients (Hochreiter et al.) in RNNs. In gradient methods such as Back-Propagation Through Time (BPTT) (Williams and Zipser) and Real-Time Recurrent Learning (RTRL), error flow either vanishes quickly or explodes exponentially, making it impossible for the networks to deal correctly with long-term dependencies. In the case of music, long-term dependencies are at the heart of what defines a particular style, with events spanning several notes or even many bars contributing to the formation of metrical and phrasal structure. The clearest example of these dependencies is chord changes. In a musical form like early rock-and-roll, for example, the same chord can be held for four bars or more. Even if melodies are constrained to contain notes no shorter than an eighth note, a network must regularly and reliably bridge time spans of 32 events or more.

The most relevant previous research is that of Mozer, who did note-by-note composition of single-voice melodies accompanied by chords. In the “CONCERT” model, Mozer used sophisticated RNN procedures including BPTT, log-likelihood objective functions and probabilistic interpretation of the output values. In addition to these neural network methods, Mozer employed a psychologically-realistic distributed input encoding (Shepard) that gave the network an inductive bias towards chromatically and harmonically related notes. He used a second encoding method to generate distributed representations of chords.

A BPTT-trained RNN does a poor job of learning long-term dependencies. To offset this, Mozer used a distributed encoding of duration that allowed him to process a note of any duration in a single network timestep. By representing a note, rather than a slice of time, in a single timestep, the number of time steps to be bridged by the network in learning global structure is greatly reduced. For example, allowing sixteenth notes in a network that encodes slices of time directly requires that a whole note span at minimum 16 time steps. Though the networks regularly outperformed third-order transition-table approaches, they failed in all cases to find global structure. In analyzing this performance, Mozer suggests that, for the note-by-note method to work, the network must be able to induce structure at multiple levels.

A First Look at Music Composition using LSTM Recurrent Neural Networks

Dissipations – Bifurcations – Synchronicities. Thought of the Day 29.0

Deleuze’s thinking expounds on Bergson’s adaptation of multiplicities in step with the catastrophe theory, chaos theory, dissipative systems theory, and quantum theory of his era. For Bergson, hybrid scientific/philosophical methodologies were not viable. He advocated tandem explorations, the two “halves” of the Absolute “to which science and metaphysics correspond” as a way to conceive the relations of parallel domains. The distinctive creative processes of these disciplines remain irreconcilable differences-in-kind, commonly manifesting in lived experience. Bergson: Science is abstract, philosophy is concrete. Deleuze and Guattari: Science thinks the function, philosophy the concept. Bergson’s Intuition is a method of division. It differentiates tendencies, forces. Division bifurcates. Bifurcations are integral to contingency and difference in systems logic.

A bifurcation is the branching of a solution into multiple solutions as a system is varied. This bifurcating principle is also known as contingency. Bifurcations mark a point or an event at which a system divides into two alternative behaviours. Each trajectory is possible. The line of flight actually followed is often indeterminate. This is the site of a contingency, were it a positionable “thing.” It is at once a unity, a dualism and a multiplicity:

Bifurcations are the manifestation of an intrinsic differentiation between parts of the system itself and the system and its environment. […] The temporal description of such systems involves both deterministic processes (between bifurcations) and probabilistic processes (in the choice of branches). There is also a historical dimension involved […] Once we have dissipative structures we can speak of self-organisation.

Figure: In a dynamical system, a bifurcation is a period doubling, quadrupling, etc., that accompanies the onset of chaos. It represents the sudden appearance of a qualitatively different solution for a nonlinear system as some parameter is varied. The illustration above shows bifurcations (occurring at the location of the blue lines) of the logistic map as the parameter r is varied. Bifurcations come in four basic varieties: flip bifurcation, fold bifurcation, pitchfork bifurcation, and transcritical bifurcation. 
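
The figure's content can be reproduced with a few lines of Python (a sketch assuming numpy and matplotlib; the parameter range is chosen to show the first few period doublings and the onset of chaos):

```python
import numpy as np
import matplotlib.pyplot as plt

rs = np.linspace(2.5, 4.0, 2000)   # control parameter r
x = np.full_like(rs, 0.5)          # one trajectory per value of r
for _ in range(500):               # discard transients
    x = rs * x * (1 - x)
r_pts, x_pts = [], []
for _ in range(200):               # record the attractor
    x = rs * x * (1 - x)
    r_pts.append(rs.copy())
    x_pts.append(x.copy())
plt.plot(np.concatenate(r_pts), np.concatenate(x_pts), ',',
         color='black', alpha=0.25)
plt.xlabel('r')
plt.ylabel('x')
plt.title('Logistic map: period doubling en route to chaos')
plt.show()
```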

A bifurcation, according to Prigogine and Stengers, exhibits determinacy and choice. It pertains to critical points, to singular intensities and their division into multiplicities. The scientific term bifurcation can be substituted for differentiation when exploring processes of thought or, as Massumi explains, affect:

Affect and intensity […] is akin to what is called a critical point, or bifurcation point, or singular point, in chaos theory and the theory of dissipative structures. This is the turning point at which a physical system paradoxically embodies multiple and normally mutually exclusive potentials… 

The endless bifurcating division of progressive iterations, the making of multiplicities by continually differentiating binaries, by multiplying divisions of dualities – this is the ontological method of Bergson and Deleuze after him. Bifurcations diagram multiplicities, from monisms to dualisms, from differentiation to differenciation, creatively progressing. Manuel Delanda offers this account, which describes the additional technicality of control parameters, analogous to higher-level computer technologies that enable dynamic interaction. These protocols and variable control parameters are later discussed in detail in terms of media objects in the metaphorical state space of an in situ technology:

[…] for the purpose of defining an entity to replace essences, the aspect of state space that mattered was its singularities. One singularity (or set of singularities) may undergo a symmetry-breaking transition and be converted into another one. These transitions are called bifurcations and may be studied by adding to a particular state space one or more ‘control knobs’ (technically control parameters) which determine the strength of external shocks or perturbations to which the system being modeled may be subject.

Another useful example of bifurcation with respect to research in the neurological and cognitive sciences is Francesco Varela’s theory of the emergence of microidentities and microworlds. The ready-for-action neuronal clusters that produce microidentities, from moment to moment, are what he calls bifurcating “breakdowns”. These critical events in which a path or microidentity is chosen are, by implication, creative.

Abstract Expressions of Time’s Modalities. Thought of the Day 21.0

According to Gregory Bateson,

What we mean by information — the elementary unit of information — is a difference which makes a difference, and it is able to make a difference because the neural pathways along which it travels and is continually transformed are themselves provided with energy. The pathways are ready to be triggered. We may even say that the question is already implicit in them.

In other words, we always need to know some second order logic, and presuppose a second order of “order” (cybernetics) usually shared within a distinct community, to realize what a certain claim, hypothesis or theory means. In Koichiro Matsuno’s opinion Bateson’s phrase

must be a prototypical example of second-order logic in that the difference appearing both in the subject and predicate can accept quantification. Most statements framed in second-order logic are not decidable. In order to make them decidable or meaningful, some qualifier needs to be used. A popular example of such a qualifier is a subjective observer. However, the point is that the subjective observer is not limited to Alice or Bob in the QBist parlance.

This is what is necessitated in order to understand the different viewpoints in logic of mathematicians, physicists and philosophers in the dispute about the existence of time. An essential aspect of David Bohm‘s “implicate order” can be seen in the grammatical formulation of theses such as the law of motion:

While it is legitimate in its own light, the physical law of motion alone framed in eternal time referable in the present tense, whether in classical or quantum mechanics, is not competent enough to address how the now could be experienced. … Measurement differs from the physical law of motion as much as the now in experience differs from the present tense in description. The watershed separating between measurement and the law of motion is in the distinction between the now and the present tense. Measurement is thus subjective and agential in making a punctuation at the moment of now. (Matsuno)

The distinction between experiencing time and capturing the experience of time in terms of language is made explicit in Heidegger’s Being and Time:

… by passing away constantly, time remains as time. To remain means: not to disappear, thus, to presence. Thus time is determined by a kind of Being. How, then, is Being supposed to be determined by time?

Koichiro Matsuno’s comment on this is:

Time passing away is an abstraction from accepting the distinction of the grammatical tenses, while time remaining as time refers to the temporality of the durable now prior to the abstraction of the tenses.

Therefore, when trying to understand the “local logics/phenomenologies” of the individual disciplines (mathematics, physics, philosophy, etc., including their subfields), one should be aware of the fact that the capabilities of our scientific language are not limitless:

…the now of the present moment is movable and dynamic in updating the present perfect tense in the present progressive tense. That is to say, the now is prior and all of the grammatical tenses including the ubiquitous present tense are the abstract derivatives from the durable now. (Matsuno)

This presupposes the adequacy of mathematical abstractions specifically invented or adopted and elaborated for the expression of more sophisticated modalities of time’s now than those currently used in such formalisms as temporal logic.

Osteo Myological Quantization. Note Quote.

The site of the parameters in a higher-order space can also be quantized into segments, the limits of which cannot be decomposed further. Such a limit may be nearly a rigid piece. In the animal body such quanta cannot but be bone pieces forming parts of the skeleton, whether lying internally as an endoskeleton or covering the body as an almost rigid external skeleton.

Note the partition of the body into three main segments: head (cephalic), pectoral (breast), caudal (tail), materializing the KH order limit M ≥ 3 or the KHK dimensional limit N ≥ 3. Notice also the quantization into more macroscopic segments, such as that of the abdominal part into several smaller segments, beyond the KHK lower bound N = 3. Lateral symmetry with a symmetry axis is remarkable. This is of course an indispensable consequence of the modified Zermelo conditions, which also entail locomotive appendages differentiating into legs for walking and, in the case of insects, wings for flying.

Two paragraphs of Kondo address the simple issues of what bones are, mammalian bilateral symmetry, the numbers of major body parts and their segmentation, the notion of the mathematical origins of wings, legs and arms, the dimensionality of eggs being zero (hence their need of warmth for progression to locomotion), and the dimensionality of snakes being one (hence their mode of locomotion). A feature of the biological discussions is their attention to detail, their use of line art to depict the various forms of living being – from birds to starfish to dinosaurs – the use of the full Latin terminology, and at all times the relationship of the various forms of living being to the underlying higher-order geometry and the mathematical notion of principal ideals. The human skeleton is treated as a hierarchical Kawaguchi tree with its characteristic three-pronged form. The Riemannian arc length of a curve k(t) is given by the integral of the square root of a quadratic form in x′ with coefficients dependent on x. This integrand is homogeneous of the first order in x′. If we drop the quadratic property and retain the homogeneity, we obtain Finsler geometry. Kawaguchi geometry further supposes that the integrand depends upon the higher derivatives x′′ up to the k-th derivative x⁽ᵏ⁾. The notation that Kondo uses is:

K(M)L,N

where L is the number of parameters, N the number of dimensions, and M the order of derivatives.
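
Schematically, the progression from Riemann through Finsler to Kawaguchi geometry described above can be written out as follows (a paraphrase of the standard definitions, not Kondo's own formulas):

```latex
% Riemann: quadratic integrand, coefficients depending on position x
s_R = \int \sqrt{g_{ij}(x)\, \dot{x}^i \dot{x}^j}\; dt
% Finsler: homogeneity of degree one in x' retained, quadratic form dropped
s_F = \int F(x, \dot{x})\; dt, \qquad F(x, \lambda\dot{x}) = \lambda F(x, \dot{x}),\ \lambda > 0
% Kawaguchi: the integrand admits derivatives up to order k
s_K = \int F\big(x, x', x'', \dots, x^{(k)}\big)\; dt
```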

The lower part of the skeleton can be divided into three prongs, each starting from the centre as a single parametric Kawaguchi tree.

…the skeletal, muscular, gastrointestinal, circulation systems etc combine into a holo-parametric whole that can be more generally quantized, each quantum involving some osteological, neural, circulatory functions etc.

…thus globally the human body from head through trunk to limbs are quantized into a finite number of quanta.

The Semiotic Theory of Autopoiesis, OR, New Level Emergentism

The dynamics of all the life-cycle meaning processes can be described in terms of basic semiotic components, algebraic constructions of the following form:

Pn[мn : fn(Ξn) → Ξn+1]

where Ξn is a sign system corresponding to a representation of a (design) problem at time t1; Ξn+1 is a sign system corresponding to a representation of the problem at time t2, t2 > t1; fn is a composition of semiotic morphisms that specifies the interaction of variation and selection under the condition of information closure, which requires that no external elements be added to the current sign system; мn is a semiotic morphism; and Pn is the probability associated with мn, ΣPn = 1, n = 1, …, M, where M is the number of meaningful transformations of the resultant sign system after fn. There is a partial ranking – an importance ordering – on the constraints of A in every Ξn, such that lower-ranked constraints can be violated in order for higher-ranked constraints to be satisfied. The morphisms of fn preserve this ranking.

The Semiotic Theory of Self-Organizing Systems postulates that in the scale hierarchy of dynamical organization, a new level emerges if and only if a new level in the hierarchy of semiotic interpretance emerges. As the development of a new product always and naturally causes the emergence of a new meaning, the above-cited Principle of Emergence directly leads us to the formulation of the first law of life-cycle semiosis as follows:

I. The semiosis of a product life cycle is represented by a sequence of basic semiotic components, such that at least one of the components is well defined in the sense that not all of its morphisms м and f are isomorphisms, and at least one м in the sequence is not level-preserving in the sense that it does not preserve the original partial ordering on levels.

For the present (i.e. for an on-going process), there exists a probability distribution over the possible мn for every component in the sequence. For the past (i.e. retrospectively), each of the distributions collapses to a single mapping with Pn = 1, while the sequence of basic semiotic components degenerates to a sequence of functions. For the future, the life-cycle meaning-making

Comment on Purely Random Correlations of the Matrix, or Studying Noise in Neural Networks

In the presence of two-body interactions, the many-body Hamiltonian matrix elements vJα,α′ of good total angular momentum J in the shell-model basis |α⟩ generated by the mean field can be expressed as follows:

vJα,α′ = ∑J′,ii′ cJα,α′;J′,ii′ gJ′,ii′ —– (4)

The summation runs over all combinations of the two-particle states |i⟩ coupled to the angular momentum J′ and connected by the two-body interaction g. The analogy of this structure to the one schematically captured by eq. (2) is evident. The gJ′,ii′ here denote the radial parts of the corresponding two-body matrix elements, while the cJα,α′;J′,ii′ globally represent elements of the angular-momentum recoupling geometry. The gJ′,ii′ are drawn from a Gaussian distribution, while the geometry expressed by the cJα,α′;J′,ii′ enters explicitly. This originates from the fact that a quasi-random coupling of individual spins results in so-called geometric chaoticity, and thus the c coefficients are also Gaussian distributed. In this case, these two essentially random ingredients (g and c) lead, however, to an order of magnitude larger separation of the ground state from the remaining states as compared to the pure Random Matrix Theory (RMT) limit. Due to more severe selection rules, the effect of geometric chaoticity does not apply for J = 0. Consistently, the ground-state energy gap measured relative to the average level spacing characteristic for a given J is larger for J > 0 than for J = 0, and J > 0 ground states are also more orderly than those for J = 0, as can be quantified in terms of the information entropy.

Interestingly, such reductions of dimensionality of the Hamiltonian matrix can also be seen locally in explicit calculations with realistic (non-random) nuclear interactions. A collective state, one which turns out to be coherent with some operator representing a physical external field, is always surrounded by a reduced density of states, i.e., it repels the other states. In all those cases, however, the global fluctuation characteristics remain largely consistent with the corresponding version of the random matrix ensemble.

Recently, a broad arena of applicability of random matrix theory has opened in connection with the most complex systems known to exist in the universe. Without doubt, the most complex is the human brain and the phenomena that result from its activity. From the physics point of view, the financial world, which reflects such activity, is of particular interest because its characteristics are quantified directly in terms of numbers and a huge amount of electronically stored financial data is readily available. Access to the activity of a single brain is also possible by detecting the electric or magnetic fields generated by neuronal currents. With present-day techniques of electro- or magnetoencephalography it is possible in this way to generate time series which resolve neuronal activity down to the scale of 1 ms.

One may debate over what is more complex, the human brain or the financial world, and there is no unique answer. It seems to us, however, that it is the financial world that is even more complex. After all, it involves the activity of many human brains and it seems even less predictable due to more frequent changes between different modes of action. Noise is of course overwhelming in either of these systems, as can be inferred from the structure of the eigen-spectra of correlation matrices taken across different space areas at the same time, or across different time intervals. There always exist, however, several well-identifiable deviations which, with the help of the universal characteristics of random matrix theory and the methodology briefly reviewed above, can be classified as real correlations or collectivity. An easily identifiable gap between the corresponding eigenvalues of the correlation matrix and the bulk of its eigenspectrum plays the central role in this connection. The brain, when responding to sensory stimulation, develops larger gaps than the brain at rest. The correlation-matrix formalism in its most general, asymmetric form also allows one to study time-delayed correlations, like the ones between the opposite hemispheres. The time delay reflecting the maximum of correlation (the time needed for information to be transmitted between the different sensory areas in the brain) is also associated with the appearance of one significantly larger eigenvalue. Similar effects appear to govern the formation of heteropolymeric biomolecules: the ones that nature makes use of are separated by an energy gap from the purely random sequences.

 

Purely Random Correlations of the Matrix, or Studying Noise in Neural Networks

Expressed in the most general form, in essentially all the cases of practical interest, the n × n matrices W used to describe the complex system are by construction designed as

W = XYᵀ —– (1)

where X and Y denote rectangular n × m matrices. Such, for instance, are the correlation matrices, whose standard form corresponds to Y = X. In this case one thinks of n observations or cases, each represented by an m-dimensional row vector xi (yi), i = 1, …, n, and typically m is larger than n. In the limit of purely random correlations the matrix W is then said to be a Wishart matrix. The resulting density ρW(λ) of eigenvalues is here known analytically, with the limits (λmin ≤ λ ≤ λmax) prescribed by

λmax/min = 1 + 1/Q ± 2√(1/Q), where Q = m/n ≥ 1.

The variance of the elements of xi is here assumed unity.
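
These limits are easy to check numerically (a sketch assuming unit-variance Gaussian entries; the size n and ratio Q are arbitrary illustrative choices):

```python
import numpy as np

n, m = 400, 1600                      # Q = m/n = 4
Q = m / n
X = np.random.default_rng(0).standard_normal((n, m))
W = X @ X.T / m                       # purely random correlation matrix
lam = np.linalg.eigvalsh(W)
lam_min = 1 + 1/Q - 2*np.sqrt(1/Q)    # 0.25 for Q = 4
lam_max = 1 + 1/Q + 2*np.sqrt(1/Q)    # 2.25 for Q = 4
print(lam.min(), lam.max())           # should fall close to the bounds above
print(lam_min, lam_max)
```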

The more general case, of X and Y different, results in asymmetric correlation matrices with complex eigenvalues λ. In this more general case a limiting distribution corresponding to purely random correlations does not yet seem to be known analytically as a function of m/n. It appears, however, that in the case of no correlations, quite generically, one may expect a largely uniform distribution of λ bounded within an ellipse on the complex plane.

Further examples of matrices of similar structure, of great interest from the point of view of complexity, include the Hamiltonian matrices of strongly interacting quantum many-body systems such as atomic nuclei. This holds true on the level of bound states, where the problem is described by Hermitian matrices, as well as for excitations embedded in the continuum. This latter case can be formulated in terms of an open quantum system, which is represented by a complex non-Hermitian Hamiltonian matrix. Several neural-network models also belong to this category of matrix structure. In this domain the reference is provided by the Gaussian (orthogonal, unitary, symplectic) ensembles of random matrices, with the semi-circle law for the eigenvalue distribution. For irreversible processes there exists a complex version of these ensembles, with a special case, the so-called scattering ensemble, which accounts for S-matrix unitarity.

As it has already been expressed above, several variants of ensembles of the random matrices provide an appropriate and natural reference for quantifying various characteristics of complexity. The bulk of such characteristics is expected to be consistent with Random Matrix Theory (RMT), and in fact there exists strong evidence that it is. Once this is established, even more interesting are however deviations, especially those signaling emergence of synchronous or coherent patterns, i.e., the effects connected with the reduction of dimensionality. In the matrix terminology such patterns can thus be associated with a significantly reduced rank k (thus k ≪ n) of a leading component of W. A satisfactory structure of the matrix that would allow some coexistence of chaos or noise and of collectivity thus reads:

W = Wr + Wc —– (2)

Of course, in the absence of Wr, the second term (Wc) of W generates k nonzero eigenvalues, and all the remaining ones (n − k) constitute the zero modes. When Wr enters as a noise (random-like matrix) correction, a trace of the above effect is expected to remain, i.e., k large eigenvalues and a bulk composed of n − k small eigenvalues whose distribution and fluctuations are consistent with an appropriate version of the random matrix ensemble. One likely mechanism that may lead to such a segregation of eigenspectra is that m in eq. (1) is significantly smaller than n, or that the number of large components makes it effectively small on the level of the large entries w of W. Such an effective reduction of m (M = meff) is then expressed by the following distribution P(w) of the large off-diagonal matrix elements, in the case where they are still generated by random-like processes:

P(w) = |w|^((M−1)/2) K_((M−1)/2)(|w|) / (2^((M−1)/2) Γ(M/2) √π) —– (3)

where K stands for the modified Bessel function. Asymptotically, for large w, this leads to P(w) ∼ e^(−|w|) |w|^(M/2−1), and thus reflects an enhanced probability of the appearance of a few large off-diagonal matrix elements as compared to a Gaussian distribution. Consistent with the central limit theorem, the distribution quickly converges to a Gaussian with increasing M.
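
Eq. (3) can be evaluated directly with scipy's modified Bessel function of the second kind; the sketch below (grid and M values are illustrative choices) checks that the density integrates to one and lets one watch the tail fatten as M decreases:

```python
import numpy as np
from scipy.special import kv, gamma

def P(w, M):
    """Eq. (3): density of large off-diagonal elements, M = m_eff."""
    a = (M - 1) / 2
    return np.abs(w)**a * kv(a, np.abs(w)) / (2**a * gamma(M / 2) * np.sqrt(np.pi))

w = np.linspace(1e-4, 60.0, 200000)
dw = w[1] - w[0]
for M in (1, 4, 16):
    total = 2 * np.sum(P(w, M)) * dw   # symmetric density: twice the half-line
    print(M, round(total, 4))          # each value should be close to 1.0
```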

Based on several examples of natural complex dynamical systems, like the strongly interacting Fermi systems, the human brain and the financial markets, one could systematize evidence that such effects are indeed common to all the phenomena that intuitively can be qualified as complex.

Binary, Ternary Connect, Neural N/W Deep Learning & Eliminating Multiplications in Forward and Backward Pass

Consider a neural network layer with N input and M output units. The forward computation is y = h(Wx + b), where W and b are the weights and biases respectively, h is the activation function, and x and y are the layer’s inputs and outputs. If we choose the ReLU (Rectified Linear Unit, or ramp function) as h, there will be no multiplications in computing the activation function; thus all multiplications reside in the matrix product Wx. For each input vector x, N × M floating-point multiplications are needed.

Binary connect eliminates these multiplications by stochastically sampling weights to be −1 or 1. Full-resolution weights w̄ are kept in memory as reference, and each time y is needed, we sample a stochastic weight matrix W according to w̄. For each element of the sampled matrix W, the probability of getting a 1 is proportional to how “close” its corresponding entry in w̄ is to 1, i.e.,

P(Wij = 1) = (w̄ij + 1)/2;  P(Wij = −1) = 1 − P(Wij = 1)

It is necessary to add some edge constraints to w̄. To ensure that P(Wij = 1) lies in a reasonable range, values in w̄ are forced to be real values in the interval [−1, 1]. If during the updates any of its values grows beyond that interval, we set it to the corresponding edge value, −1 or 1. That way floating-point multiplications become sign changes.
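
A minimal numpy sketch of this sampling step (shapes, seeding, and the surrounding training loop are illustrative assumptions):

```python
import numpy as np

def clip_weights(w_bar):
    """Edge constraint: keep stored full-resolution weights inside [-1, 1]."""
    return np.clip(w_bar, -1.0, 1.0)

def sample_binary(w_bar, rng):
    """Sample W in {-1, +1} with P(W_ij = 1) = (w_bar_ij + 1) / 2."""
    p_one = (w_bar + 1.0) / 2.0
    return np.where(rng.random(w_bar.shape) < p_one, 1.0, -1.0)

rng = np.random.default_rng(0)
w_bar = clip_weights(rng.uniform(-1, 1, size=(4, 3)))
W = sample_binary(w_bar, rng)
x = rng.standard_normal(3)
y = W @ x   # with W in {-1, +1}, each "product" is just a sign change
print(W)
print(y)
```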

A remaining question concerns the use of multiplications in the random number generator involved in the sampling process. Sampling an integer has to be faster than multiplication for the algorithm to be worth it.

Moving on from binary to ternary connect, whereas in the former weights are allowed to be −1 or 1, in a trained neural network, it is common to observe that many learned weights are zero or close to zero. Although the stochastic sampling process would allow the mean value of sampled weights to be zero, this suggests that it may be beneficial to explicitly allow weights to be zero.

To allow weights to be zero, split the interval [−1, 1], within which the full-resolution weight value w̄ij lies, into two sub-intervals: [−1, 0] and (0, 1]. If a weight value w̄ij falls into one of them, we sample Wij to be one of the two edge values of that interval, according to their distance from w̄ij, i.e., if w̄ij > 0:

P(Wij = 1) = w̄ij;  P(Wij = 0) = 1 − w̄ij

and if w̄ij ≤ 0:

P(Wij = −1) = −w̄ij;  P(Wij = 0) = 1 + w̄ij

Like binary connect, ternary connect also eliminates all multiplications in the forward pass.
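
A corresponding sketch of the ternary sampling rule (again, sizes and seeding are illustrative):

```python
import numpy as np

def sample_ternary(w_bar, rng):
    """Sample W in {-1, 0, +1} from full-resolution weights in [-1, 1]."""
    u = rng.random(w_bar.shape)
    pos = w_bar > 0
    W = np.zeros_like(w_bar)
    # w_bar_ij > 0:  P(W = 1) = w_bar_ij,   P(W = 0) = 1 - w_bar_ij
    W[pos] = np.where(u[pos] < w_bar[pos], 1.0, 0.0)
    # w_bar_ij <= 0: P(W = -1) = -w_bar_ij, P(W = 0) = 1 + w_bar_ij
    W[~pos] = np.where(u[~pos] < -w_bar[~pos], -1.0, 0.0)
    return W

rng = np.random.default_rng(1)
w_bar = np.clip(rng.uniform(-1, 1, size=(4, 3)), -1, 1)
print(sample_ternary(w_bar, rng))   # entries in {-1, 0, 1}
```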

We move from the forward to the backward pass. Suppose the i-th layer of the network has N input and M output units, and consider an error signal δ propagating downward from its output. The updates for weights and biases would be the outer product of the layer’s input and the error signal:

∆W = η [δ ◦ h′(Wx + b)] xᵀ

∆b = η δ ◦ h′(Wx + b)

where η is the learning rate, and x the input to the layer. While propagating through the layers, the error signal δ needs to be updated, too. Its update taking into account the next layer below takes the form:

δ = Wᵀ [δ ◦ h′(Wx + b)]

Three terms appear repeatedly in the above three equations, viz. δ, h′(Wx + b) and x. The latter two terms introduce matrix outer products. To eliminate multiplications, one can quantize one of them to be an integer power of 2, so that multiplications involving that term become binary shifts. The expression h′(Wx + b) contains down-flowing gradients, which are largely determined by the cost function and network parameters, so it is hard to bound its values. However, bounding the values is essential for quantization, because we need to supply a fixed number of bits for each sampled value, and if that value varies too much, we will need too many bits for the exponent. This, in turn, will result in the need for more bits to store the sampled value and will unnecessarily increase the required amount of computation.

While h′(Wx + b) is not a good choice for quantization, x is a better choice, because it is the hidden representation at each layer, and we know roughly the distribution of each layer’s activations.

The approach is therefore to eliminate multiplications in

∆W = η [δ ◦ h′(Wx + b)] xᵀ

by quantizing each entry in x to an integer power of 2, so that the outer product becomes a series of bit shifts. Experimentally, it was found that allowing a maximum of 3 to 4 bits of shift is sufficient to make the network work well; 3 bits are already enough to quantize x. As the float32 format has 24 bits of mantissa, shifting (to the left or right) by 3 to 4 bits is completely tolerable. This approach is referred to as “quantized back propagation”.
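
A sketch of that quantization step follows; the rounding rule (nearest power of 2 in log space) and the exponent clipping are illustrative choices standing in for the exact scheme:

```python
import numpy as np

def quantize_pow2(x, max_shift=4):
    """Round each entry of x to a signed integer power of 2 (zero stays zero)."""
    sign = np.sign(x)
    mag = np.abs(x)
    exp = np.round(np.log2(np.where(mag > 0, mag, 1.0)))
    exp = np.clip(exp, -max_shift, max_shift)  # bounded exponent => few bits
    return np.where(mag > 0, sign * 2.0**exp, 0.0)

rng = np.random.default_rng(2)
x = rng.standard_normal(6)
xq = quantize_pow2(x)
delta = rng.standard_normal(4)      # stands in for eta * (delta o h'(Wx + b))
dW = np.outer(delta, xq)            # each column: delta shifted by a power of 2
print(x)
print(xq)
```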

If we choose ReLU as the activation function and use binary (or ternary) connect to sample W, computing the term h′(Wx + b) involves no multiplications at all. In addition, quantized back propagation eliminates the multiplications in the outer product in

∆W = η [δ ◦ h′(Wx + b)] xᵀ.

The only place where multiplications remain is the element-wise product. From

∆W = η [δ ◦ h′(Wx + b)] xᵀ, ∆b = η δ ◦ h′(Wx + b), and δ = Wᵀ [δ ◦ h′(Wx + b)], one can see that 6 × M multiplications are needed for all computations. As in the forward pass, most of the multiplications are used in the weight updates. Compared with standard back propagation, which would need 2MN + 6M multiplications, the number of multiplications left in quantized back propagation is negligible.
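
Plugging an illustrative layer size into these counts shows the scale of the saving:

```python
N = M = 1024
standard = 2 * M * N + 6 * M    # standard back propagation
quantized = 6 * M               # quantized back propagation
print(standard, quantized, standard / quantized)   # ~2.1e6 vs 6144, ~342x
```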

The Differentiated Hyperreality of Baudrillard

A sense of meaning for Baudrillard connotes a totality that is called knowledge, and it is here that he differs significantly from someone like Foucault. For the latter, knowledge is a product of relations employing power, whereas for the former, any attempt to reach a finality or totality, as he calls it, is always a flirtation with delusion. A delusion, since the human subject would always aim at understanding the human or non-human object, and, in the process, the object would always remain elusive, since, being based on signifiers, it would be vulnerable to a shift in significations. The two key ideas of Baudrillard are simulation and hyperreality. Simulation refers to the representation of things such that they become the things represented; in other words, representations gain priority over the “real” things. There are certain orders that define simulation: signs that represent objective reality, signs that veil reality, signs that mask the absence of reality, and signs that turn into simulacra, having no relation to any reality, thus ending up simulating a simulation. In Hegarty‘s reading of Baudrillard, there happen to be three types of simulacra, each with a distinct historical epoch. The first is the pre-modern period, where the image marks the place for an item, and hence the uniqueness of objects and situations marks them as irreproducibly real. The second is the modern period, characterized by the industrial revolution, signifying the breaking down of distinctions between images and reality because of the mass reproduction of copies or proliferation of commodities, thus risking the essential existence of the original. The third is the post-modern period, where simulacra precede the original and the distinction between reality and representation vanishes, implying only the existence of simulacra and relegating reality to a vacuous concept. Hyperreality defines a condition wherein “reality” as known gets substituted by simulacra. This notion of Baudrillard’s is influenced by the Canadian communication theorist and rhetorician Marshall McLuhan. Hyperreality, with its insistence on signs and simulations, fits perfectly in the post-modern era and therefore highlights the inability or shortcomings of consciousness to demarcate between reality and the phantasmatic space. In a quite remarkable analysis of Disneyland, Baudrillard (166-184) clarifies the notion of hyperreality when he says,

The Disneyland imaginary is neither true nor false: it is a deterrence machine set up in order to rejuvenate in reverse the fiction of the real. Whence the debility, the infantile degeneration of this imaginary. It is meant to be an infantile world, in order to make us believe that the adults are elsewhere, in the “real” world, and to conceal the fact that real childishness is everywhere, particularly among those adults who go there to act the child in order to foster illusions as to their real childishness.

Although his initial ideas were affiliated with those of Marxism, he differed from Marx in epitomizing consumption, as compared to the latter’s production, as the driving force of capitalism. Another issue that was worked out remarkably in Baudrillard was historicity. Agreeing largely with Fukuyama’s notion of the end of history after the collapse of the communist bloc, Baudrillard differed only in holding that it is historical progress that has ended, not necessarily history itself. He forcefully makes the point that the end of history is also the end of the dustbins of history. His post-modern stand differed significantly from Lyotard’s in one major respect, despite finding common ground elsewhere. Despite showing a growing aversion to the theory of meta-narratives, Baudrillard, unlike Lyotard, reached a point of pragmatic reality within the confines of an excuse-laden notion of universality that happened to be in vogue.

Baudrillard has been at the receiving end of some very extreme, acerbic criticism. His writings are not just obscure, but also fail in many respects, like defining certain concepts he employs, totalizing insights without substantial claims to support them, and often hinting strongly at apodicticity without paying due attention to rival positions. This extremity reaches a culmination point when he is cited as a purveyor of reality-denying irrationalism. But not everything is to be looked at critically in his case, and he does enjoy an established status as a transdisciplinary theorist who, with his provocations, has put traditional issues regarding modernity and philosophy in general at stake, providing insights toward a better comprehension of cultural studies, sociology and philosophy. Most importantly, Baudrillard provides for autonomous and differentiated spaces in the cultural, socio-economic and political domains through an implosive theory that cuts across the boundaries of various disciplines, paving the way for a new era in philosophical and social theory at large.