Nomological Progress on Theoretical Grounds – Underdetermined by Data. Thought of the Day 37.0


The most common version of empirical equivalence discussed by philosophers is the case of exact empirical equivalence for all models of two theories. The potential interest of such a scenario is evident: obviously there is no chance in any nomologically possible world that experimental progress will resolve the debate, while settling it on theoretical grounds might also be difficult. However, this scenario runs the risk that the two supposedly rival theories are in fact one and the same theory in different guises. Such identification was often made by those influenced by logical empiricism. A related weaker claim is made today by John Norton, namely, that for theories for which the

observational equivalence can be demonstrated by arguments brief enough to be included in a journal article . . . we cannot preclude the possibility that the theories are merely variant formulations of the same theory.

Norton evidently has in mind journal articles in the philosophy of science, not physics or some other science. While Norton aims to deny that the underdetermination of theories by data is generic and that philosophers’ algorithmic rivals carry much force, there are some serious candidates for underdetermination that arise from within real physics and that have been discussed little, if at all, by philosophers. These examples from real physics seem sufficiently widespread and interesting that it might well frequently be the case that scientific, or rather physical, theories are permanently underdetermined by data.

Apart from some exceptions of perhaps little physical importance (such as solutions of an Ashtekar formulation with a degenerate metric), the various sets of variables for GTR (broadly construed in the fashion of physicists) are empirically equivalent in the sense that all or most solutions of one set of equations are suitably related (not always one-to-one) to solutions in other sets of variables. Physicists are generally not tempted to regard the resulting theory formulations as distinct theories, partly because their criteria for physical reality are attuned to this mathematical interrelation. Each description comes with an adequate recipe for distinguishing the physically meaningful from the descriptive fluff, and no further ontological questions are typically asked or answered. Physicists are also quite comfortable with a certain amount of vagueness or merely implicit specificity. For example, is a given energy condition, such as the weak energy condition, part of General Relativity or not? The answer to that question depends, at least, on whether ‘realistic’ matter fields satisfy the condition; but whether a certain kind of matter is realistic is malleable in light of both empirical factors (such as the apparent observation of dark energy in the late 1990s) and theoretical factors (such as the recognition that seemingly tame matter fields or quantum fields violate an energy condition hitherto regarded as important). GTR for physicists is in effect a cluster of theories sharing a hard core including Einstein’s equations, while partially overlapping in including or failing to include various additional claims of varying importance, not unlike a Lakatosian research program.
Perhaps Arthur Fine (The Shaky Game: Einstein, Realism, and the Quantum Theory) would commend to philosophers the physicists’ approach, which sounds something like his Natural Ontological Attitude: there is no distinctively philosophical question about the real existence of entities employed in scientific theories, so neither realism nor anti-realism is an appropriate doctrine. Physicists typically assume some sort of mathematical equivalence as necessary and sufficient for two formulations to be the same theory (though strict equivalence is not always required).

Degeneracy: Mayhem While We’re Freezing and Starving (Jordan Peterson)


One of the reasons Marxism fails is that its cognitive model of the world – which even low-IQ knuckleheads can grasp – does not map adequately onto the reality of human nature and the means of production. But it would be a huge mistake to think that cognitive simplification is the exclusive domain of the liberal temperament; conservatives are quite capable of “intuitively simplifying” as well. Which brings us to the pseudo-Right. If I had to define the Pseudo-Right, I would define it as those of a conservative disposition who refuse to acknowledge reality – not rhetorical “reality” but objective Truth – because it makes them feel bad – The Social Pathologist

Many of the Alt-Right, for instance, are quite happy with moral degeneracy provided it’s ethnically pure. The problem is that even an elementary understanding of history will show that no stable or prosperous society has ever been built on moral degeneracy. It’s a belief that is miscalibrated to reality. But lounging poolside in a white brothel sure feelz good. Likewise, those of a traditionalist disposition wondering why it all went to Hell in a handbasket fail to understand that many of their “traditional beliefs” were miscalibrated to reality, and had the rug pulled out from under them when reality intervened. Change induces bad feelings, and these feelings must be avoided. Hence no change.

Music Composition Using LSTM Recurrent Neural Networks


The most straightforward way to compose music with a Recurrent Neural Network (RNN) is to use the network as a single-step predictor. The network learns to predict notes at time t + 1 using notes at time t as inputs. After learning has been stopped, the network can be seeded with initial input values – perhaps from training data – and can then generate novel compositions by using its own outputs to generate subsequent inputs. This note-by-note approach was first examined by Todd.
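The feedback loop can be sketched as follows. This is not any actual trained network: `predict_next` is a hypothetical stand-in for a model's single-step prediction, and the point is only the seeding-and-feedback mechanism, in which each output becomes part of the next input.

```python
def predict_next(history):
    """Stand-in for a trained RNN's single-step prediction.
    Here: a toy rule that cycles through a C-major arpeggio."""
    arpeggio = [60, 64, 67, 72]  # MIDI pitches C4, E4, G4, C5
    return arpeggio[len(history) % len(arpeggio)]

def generate(seed, length):
    """Seed the model with initial notes, then repeatedly feed
    its own outputs back in as subsequent inputs."""
    notes = list(seed)
    while len(notes) < length:
        notes.append(predict_next(notes))
    return notes

melody = generate(seed=[60], length=8)
print(melody)  # [60, 64, 67, 72, 60, 64, 67, 72]
```

With a real network, `predict_next` would consume the recent history (or the network's recurrent state) rather than merely its length.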

A feed-forward network would have no chance of composing music in this fashion. Lacking the ability to store any information about the past, such a network would be unable to keep track of where it is in a song. In principle an RNN does not suffer from this limitation. With recurrent connections it can use hidden layer activations as memory and thus is capable of exhibiting (seemingly arbitrary) temporal dynamics. In practice, however, RNNs do not perform very well at this task. As Mozer aptly wrote about his attempts to compose music with RNNs,
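A minimal illustration of that memory mechanism, with toy scalar weights rather than a trained network: the hidden state mixes the current input with the previous hidden state, so an input event keeps influencing the state several steps later, which a memoryless feed-forward map cannot do.

```python
import math

def rnn_step(x_t, h_prev, w_in=0.5, w_rec=0.8):
    """One recurrent update: the new hidden state combines the
    current input with the previous hidden state through a
    squashing nonlinearity. Weights are arbitrary toy values."""
    return math.tanh(w_in * x_t + w_rec * h_prev)

h = 0.0
for x in [1.0, 0.0, 0.0, 0.0]:  # an input at t=0, then silence
    h = rnn_step(x, h)
print(h)  # still non-zero: the t=0 input persists in the state
```

Note that the recurrent weight also shows why the influence fades: each step multiplies the old state by a factor, which is exactly the mechanism behind the training problem discussed next.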

While the local contours made sense, the pieces were not musically coherent, lacking thematic structure and having minimal phrase structure and rhythmic organization.

The reason for this failure is likely linked to the problem of vanishing gradients (Hochreiter et al.) in RNNs. In gradient methods such as Back-Propagation Through Time (BPTT) (Williams and Zipser) and Real-Time Recurrent Learning (RTRL), error flow either vanishes quickly or explodes exponentially, making it impossible for the networks to deal correctly with long-term dependencies. In the case of music, long-term dependencies are at the heart of what defines a particular style, with events spanning several notes or even many bars contributing to the formation of metrical and phrasal structure. The clearest example of these dependencies is chord changes. In a musical form like early rock-and-roll, for example, the same chord can be held for four bars or more. Even if melodies are constrained to contain notes no shorter than an eighth note, a network must regularly and reliably bridge time spans of 32 events or more.
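The decay itself is simple arithmetic. If the backpropagated error signal is scaled by a roughly constant per-step factor below 1 (the 0.9 used here is an arbitrary illustrative value, not a measured quantity), almost none of it survives a 32-event span:

```python
def error_signal_after(steps, per_step_factor=0.9):
    """Magnitude of a backpropagated error signal after `steps`
    timesteps, under a constant per-step scaling factor.
    Decay is exponential in the number of steps."""
    return per_step_factor ** steps

print(error_signal_after(1))   # 0.9
print(error_signal_after(32))  # ~0.034: most of the gradient is gone
```

A per-step factor above 1 gives the mirror-image failure, exponentially exploding gradients; either way, credit assignment across 32 or more events is unreliable.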

The most relevant previous research is that of Mozer, who did note-by-note composition of single-voice melodies accompanied by chords. In the “CONCERT” model, Mozer used sophisticated RNN procedures including BPTT, log-likelihood objective functions, and probabilistic interpretation of the output values. In addition to these neural network methods, Mozer employed a psychologically realistic distributed input encoding (Shepard) that gave the network an inductive bias towards chromatically and harmonically related notes. He used a second encoding method to generate distributed representations of chords.



A BPTT-trained RNN does a poor job of learning long-term dependencies. To offset this, Mozer used a distributed encoding of duration that allowed him to process a note of any duration in a single network timestep. By representing a note, rather than a slice of time, in a single timestep, the number of timesteps to be bridged by the network in learning global structure is greatly reduced. For example, allowing sixteenth notes in a network that encodes slices of time directly requires that a whole note span at minimum 16 timesteps. Though the networks regularly outperformed third-order transition-table approaches, they failed in all cases to find global structure. In analyzing this performance, Mozer suggests that for the note-by-note method to work, it is necessary that the network be able to induce structure at multiple levels. – A First Look at Music Composition using LSTM Recurrent Neural Networks
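The timestep arithmetic above can be sketched directly. The melody and resolution here are hypothetical; the contrast is between one timestep per sixteenth-note slice and one timestep per note.

```python
SLICES_PER_WHOLE_NOTE = 16  # sixteenth-note resolution

def slice_steps(durations_in_whole_notes):
    """Timesteps needed when each sixteenth-note slice is one step."""
    return int(sum(d * SLICES_PER_WHOLE_NOTE
                   for d in durations_in_whole_notes))

def note_steps(durations_in_whole_notes):
    """Timesteps needed when each note is one step
    (as in Mozer's duration encoding)."""
    return len(durations_in_whole_notes)

# Four bars of 4/4, holding one whole note per bar:
melody = [1.0, 1.0, 1.0, 1.0]
print(slice_steps(melody))  # 64 slices to bridge
print(note_steps(melody))   # 4 steps to bridge
```

The dependency the network must learn is the same either way; the encoding only changes how many steps the error signal must survive.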