Reductionism of Numerical Complexity: A Wittgensteinian Excursion

boyle10

Wittgenstein’s criticism of Russell’s logicist foundation of mathematics contained in (Remarks on the Foundation of Mathematics) consists in saying that it is not the formalized version of mathematical deduction which vouches for the validity of the intuitive version but conversely.

If someone tries to shew that mathematics is not logic, what is he trying to shew? He is surely trying to say something like: If tables, chairs, cupboards, etc. are swathed in enough paper, certainly they will look spherical in the end.

He is not trying to shew that it is impossible that, for every mathematical proof, a Russellian proof can be constructed which (somehow) ‘corresponds’ to it, but rather that the acceptance of such a correspondence does not lean on logic.

Taking up Wittgenstein’s criticism, Hao Wang (Computation, Logic, Philosophy) discusses the view that mathematics “is” axiomatic set theory as one of several possible answers to the question “What is mathematics?”. Wang points out that this view is epistemologically worthless, at least as far as the task of understanding the feature of cognition guiding is concerned:

Mathematics is axiomatic set theory. In a definite sense, all mathematics can be derived from axiomatic set theory. [ . . . ] There are several objections to this identification. [ . . . ] This view leaves unexplained why, of all the possible consequences of set theory, we select only those which happen to be our mathematics today, and why certain mathematical concepts are more interesting than others. It does not help to give us an intuitive grasp of mathematics such as that possessed by a powerful mathematician. By burying, e.g., the individuality of natural numbers, it seeks to explain the more basic and the clearer by the more obscure. It is a little analogous to asserting that all physical objects, such as tables, chairs, etc., are spherical if we swathe them with enough stuff.

Reductionism is an age-old project; a close forerunner of its incarnation in set theory was the arithmetization program of the 19th century. It is interesting that one of its prominent representatives, Richard Dedekind (Essays on the Theory of Numbers), exhibited a quite distanced attitude towards a consequent carrying out of the program:

It appears as something self-evident and not new that every theorem of algebra and higher analysis, no matter how remote, can be expressed as a theorem about natural numbers [ . . . ] But I see nothing meritorious [ . . . ] in actually performing this wearisome circumlocution and insisting on the use and recognition of no other than rational numbers.

Perec wrote a detective novel without using the letter ‘e’ (La disparition, English A void), thus proving not only that such an enormous enterprise is indeed possible but also that formal constraints sometimes have great aesthetic appeal. The translation of mathematical propositions into a poorer linguistic framework can easily be compared with such painful lipogrammatical exercises. In principle all logical connectives can be simulated in a framework exclusively using Sheffer’s stroke, and all cuts (in Gentzen’s sense) can be eliminated; one can do without common language at all in mathematics and formalize everything and so on: in principle, one could leave out a whole lot of things. However, in doing so one would depart from the true way of thinking employed by the mathematician (who really uses “and” and “not” and cuts and who does not reduce many things to formal systems). Obviously, it is the proof theorist as a working mathematician who is interested in things like the reduction to Sheffer’s stroke since they allow for more concise proofs by induction in the analysis of a logical calculus. Hence this proof theorist has much the same motives as a mathematician working on other problems who avoids a completely formalized treatment of these problems since he is not interested in the proof-theoretical aspect.

There might be quite similar reasons for the interest of some set theorists in expressing usual mathematical constructions exclusively with the expressive means of ZF (i.e., in terms of ∈). But beyond this, is there any philosophical interpretation of such a reduction? In the last analysis, mathematicians always transform (and that means: change) their objects of study in order to make them accessible to certain mathematical treatments. If one considers a mathematical concept as a tool, one does not only use it in a way different from the one in which it would be used if it were considered as an object; moreover, in semiotical representation of it, it is given a form which is different in both cases. In this sense, the proof theorist has to “change” the mathematical proof (which is his or her object of study to be treated with mathematical tools). When stating that something is used as object or as tool, we have always to ask: in which situation, or: by whom.

A second observation is that the translation of propositional formulæ in terms of Sheffer’s stroke in general yields quite complicated new formulæ. What is “simple” here is the particularly small number of symbols needed; but neither the semantics becomes clearer (p|q means “not both p and q”; cognitively, this looks more complex than “p and q” and so on), nor are the formulæ you get “short”. What is looked for in this case, hence, is a reduction of numerical complexity, while the primitive basis attained by the reduction cognitively looks less “natural” than the original situation (or, as Peirce expressed it, “the consciousness in the determined cognition is more lively than in the cognition which determines it”); similarly in the case of cut elimination. In contrast to this, many philosophers are convinced that the primitive basis of operating with sets constitutes really a “natural” basis of mathematical thinking, i.e., such operations are seen as the “standard bricks” of which this thinking is actually made – while no one will reasonably claim that expressions of the type p|q play a similar role for propositional logic. And yet: reduction to set theory does not really have the task of “explanation”. It is true, one thus reduces propositions about “complex” objects to propositions about “simple” objects; the propositions themselves, however, thus become in general more complex. Couched in Fregean terms, one can perhaps more easily grasp their denotation (since the denotation of a proposition is its truth value) but not their meaning. A more involved conceptual framework, however, might lead to simpler propositions (and in most cases has actually just been introduced in order to do so). A parallel argument concerns deductions: in its totality, a deduction becomes more complex (and less intelligible) by a decomposition into elementary steps.

Now, it will be subject to discussion whether in the case of some set operations it is admissible at all to claim that they are basic for thinking (which is certainly true in the case of the connectives of propositional logic). It is perfectly possible that the common sense which organizes the acceptance of certain operations as a natural basis relies on something different, not having the character of some eternal laws of thought: it relies on training.

Is it possible to observe that a surface is coloured red and blue; and not to observe that it is red? Imagine a kind of colour adjective were used for things that are half red and half blue: they are said to be ‘bu’. Now might not someone to be trained to observe whether something is bu; and not to observe whether it is also red? Such a man would then only know how to report: “bu” or “not bu”. And from the first report we could draw the conclusion that the thing was partly red.

Music Composition Using Long Short-Term Memory (LSTM) Recurrent Neural Networks

LSTM

The most straight-forward way to compose music with an Recurrent Neural Network (RNN) is to use the network as single-step predictor. The network learns to predict notes at time t + 1 using notes at time t as inputs. After learning has been stopped the network can be seeded with initial input values – perhaps from training data – and can then generate novel compositions by using its own outputs to generate subsequent inputs. This note-by-note approach was first examined by Todd.

A feed-forward network would have no chance of composing music in this fashion. Lacking the ability to store any information about the past, such a network would be unable to keep track of where it is in a song. In principle an RNN does not suffer from this limitation. With recurrent connections it can use hidden layer activations as memory and thus is capable of exhibiting (seemingly arbitrary) temporal dynamics. In practice, however, RNNs do not perform very well at this task. As Mozer aptly wrote about his attempts to compose music with RNNs,

While the local contours made sense, the pieces were not musically coherent, lacking thematic structure and having minimal phrase structure and rhythmic organization.

The reason for this failure is likely linked to the problem of vanishing gradients (Hochreiter et al.) in RNNs. In gradient methods such as Back-Propagation Through Time (BPTT) (Williams and Zipser) and Real-Time Recurrent Learning (RTRL) error flow either vanishes quickly or explodes exponentially, making it impossible for the networks to deal correctly with long-term dependencies. In the case of music, long-term dependencies are at the heart of what defines a particular style, with events spanning several notes or even many bars contributing to the formation of metrical and phrasal structure. The clearest example of these dependencies are chord changes. In a musical form like early rock-and-roll music for example, the same chord can be held for four bars or more. Even if melodies are constrained to contain notes no shorter than an eighth note, a network must regularly and reliably bridge time spans of 32 events or more.

The most relevant previous research is that of Mozer, who did note-by-note composition of single-voice melodies accompanied by chords. In the “CONCERT” model, Mozer used sophisticated RNN procedures including BPTT, log-likelihood objective functions and probabilistic interpretation of the output values. In addition to these neural network methods, Mozer employed a psychologically-realistic distributed input encoding (Shepard) that gave the network an inductive bias towards chromatically and harmonically related notes. He used a second encoding method to generate distributed representations of chords.

Untitled

 

A BPTT-trained RNN does a poor job of learning long-term dependencies. To offset this, Mozer used a distributed encoding of duration that allowed him to process a note of any duration in a single network timestep. By representing in a single timestep a note rather than a slice of time, the number of time steps to be bridged by the network in learning global structure is greatly reduced. For example, to allow sixteenth notes in a network which encodes slices of time directly requires that a whole note span at minimum 16 time steps. Though networks regularly outperformed third-order transition table approaches, they failed in all cases to find global structure. In analyzing this performance Mozer suggests that, for the note-by-note method to work it is necessary that the network can induce structure at multiple levels. A First Look at Music Composition using LSTM Recurrent Neural Networks