In a network that does not tie any node to a specific concept, the hidden layer is populated by neurons each of which, by architectural design, is connected to every node in the input layer. What happens to information passed into the network is interesting: it is distributed over all of the neurons that populate the hidden layer. This distribution strips any particular hidden-layer neuron of privileged status, which in turn means no privileged (ontological) status for any particular weights and nodes either. In the absence of such privileged status for nodes, weights and neurons in the hidden layer, representation comes to mean something entirely different from what it meant in semantic networks: representations no longer stand for any coherent concept. Such a scenario is representational with sub-symbolic features, and since all the weights participate each time the network faces a task such as pattern recognition, the result is what is called a distributed representation. The best example of such a distributed representation is the multilayer perceptron, a feedforward artificial neural network that maps sets of input data onto a set of appropriate outputs, and finds use in image recognition, pattern recognition and even speech recognition.

A multilayer perceptron is characterized by each neuron using a nonlinear activation function that models the firing of biological neurons in the brain. The activation functions for the present application are sigmoids, given by:

**Ф(v_{i}) = tanh(v_{i})** and **Ф(v_{i}) = (1 + e^{-v_{i}})^{-1}**

where y_{i} is the output of the ith neuron and v_{i} is the weighted sum of its input synapses. The former function is a hyperbolic tangent ranging from -1 to +1; the latter, the logistic function, is similar in shape but ranges from 0 to +1. Learning takes place through backpropagation: the connection weights are adjusted according to how far the network's output deviates from the expected result. To see the technical side, let us examine how backpropagation makes learning happen in the multilayer perceptron.
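The two sigmoidal activation functions above can be sketched directly; this is a minimal illustration in NumPy, with the function names being my own labels rather than anything fixed by the text.

```python
import numpy as np

def tanh_activation(v):
    # Hyperbolic tangent: squashes the induced field v into (-1, +1)
    return np.tanh(v)

def logistic_activation(v):
    # Logistic sigmoid: same S-shape, but squashes v into (0, +1)
    return 1.0 / (1.0 + np.exp(-v))

v = np.array([-2.0, 0.0, 2.0])
print(tanh_activation(v))      # values in (-1, +1)
print(logistic_activation(v))  # values in (0, +1)
```

Both functions pass through the midpoint of their range at v = 0, which is why inputs are often standardized around zero before being fed to the network.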

The error at output node j for the nth data point is represented by,

**e_{j}(n) = d_{j}(n) − y_{j}(n)**,

where d is the target value and y is the value produced by the perceptron. Corrections to the weights of the nodes are made so as to minimize the error in the entire output, given by,

**ξ(n) = 0.5 * ∑_{j} e_{j}^{2}(n)**
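The per-node errors and the total error can be computed in a couple of lines; the targets and outputs below are hypothetical values chosen purely for illustration.

```python
import numpy as np

# Hypothetical targets and outputs for three output nodes at data point n
d = np.array([1.0, 0.0, 1.0])   # target values d_j(n)
y = np.array([0.8, 0.2, 0.6])   # values produced by the perceptron, y_j(n)

e = d - y                        # per-node errors e_j(n)
xi = 0.5 * np.sum(e ** 2)        # total error xi(n) over all output nodes
print(xi)                        # 0.5 * (0.2^2 + 0.2^2 + 0.4^2) = 0.12
```

The factor 0.5 is a convenience: it cancels the 2 produced by differentiating the squared error, so the gradient expressions that follow come out clean.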

Using gradient descent, the change in each weight is given by,

**∆w_{ji}(n) = −η * (∂ξ(n)/∂v_{j}(n)) * y_{i}(n)**

where y_{i} is the output of the previous neuron, and η is the learning rate, carefully selected so that the weights converge to a response quickly enough without oscillating. Gradient descent is based on the observation that if a real-valued function F(x) is defined and differentiable in a neighborhood of a point ‘a’, then F(x) decreases fastest if one moves from ‘a’ in the direction of the negative gradient of F at ‘a’.
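That observation is easy to see on a one-dimensional example. The function, starting point and learning rate below are arbitrary choices for illustration, not anything prescribed by the text.

```python
# Gradient descent on F(x) = x^2, whose gradient is F'(x) = 2x;
# the minimum lies at x = 0.
eta = 0.1          # learning rate
x = 5.0            # the starting point 'a'
for _ in range(100):
    x -= eta * 2 * x   # step in the direction of the negative gradient
print(x)               # approaches the minimum at 0
```

With η = 0.1 each step multiplies x by 0.8, so the iterate shrinks geometrically toward the minimum; a learning rate that is too large would instead make it overshoot and oscillate, which is exactly the failure mode the careful selection of η guards against.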

The derivative to be calculated depends on the induced local field v_{j}, which itself varies. For an output node the derivative simplifies to,

**−(∂ξ(n)/∂v_{j}(n)) = e_{j}(n) Ф'(v_{j}(n))**

where Ф' is the first-order derivative of the activation function Ф, which does not vary. The analysis is more difficult for a change in the weights to a hidden node, but the relevant derivative can be shown to be,

**−(∂ξ(n)/∂v_{j}(n)) = Ф'(v_{j}(n)) ∑_{k} −(∂ξ(n)/∂v_{k}(n)) * w_{kj}(n)**

which depends on the change of weights of the kth nodes, representing the output layer. So to change the hidden-layer weights, we must first change the output-layer weights according to the derivative of the activation function, and in this sense the algorithm represents a backpropagation of the activation function. The perceptron as a distributed representation is gaining wider application in AI projects, but since biological knowledge is prone to change over time, its biological plausibility is doubtful. A major drawback, despite scoring over semantic networks and symbolic models, is the loose modeling of neurons and synapses. At the same time, backpropagation multilayer perceptrons do not closely resemble brain-like structures, and for near-complete efficiency they require the synapses to vary. A typical multilayer perceptron would look something like the following,

where (x_{1},….,x_{p}) are the predictor variable values presented to the input layer. Note that the standardized values of these variables lie in the range -1 to +1. w_{ji} is the weight that multiplies each value coming from input neuron i, and u_{j} is the combined value obtained by adding the resulting weighted values in the hidden layer. This weighted sum is fed into a transfer function of a sigmoidal/non-linear kind, σ, which outputs a value h_{j} before it is distributed to the output layer. Arriving at a neuron in the output layer, the value from each hidden-layer neuron is multiplied by a weight w_{kj}, and the resulting weighted values are added together to produce a combined value v_{k}. This weighted sum v_{k} is fed into a transfer function of a sigmoidal/non-linear kind, σ, which outputs the value y_{k}, the output of the network.
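The whole description above — the forward pass through u_{j}, h_{j}, v_{k}, y_{k} and the backpropagation of the error — can be sketched as a small NumPy program. This is a minimal illustration under assumed toy sizes (3 inputs, 4 hidden neurons, 2 outputs), a fixed random seed and a single made-up training example, not a general-purpose implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigma(v):
    # Logistic transfer function used in both layers
    return 1.0 / (1.0 + np.exp(-v))

p, m, q = 3, 4, 2                         # toy layer sizes (assumed)
W1 = rng.normal(scale=0.5, size=(m, p))   # w_ji: input -> hidden weights
W2 = rng.normal(scale=0.5, size=(q, m))   # w_kj: hidden -> output weights

x = np.array([0.5, -0.3, 0.8])            # predictor values (x_1, ..., x_p)
d = np.array([1.0, 0.0])                  # target outputs (made up)

eta = 0.5                                 # learning rate
for _ in range(2000):
    # Forward pass
    u = W1 @ x                            # u_j: weighted sums, hidden layer
    h = sigma(u)                          # h_j: hidden-layer outputs
    v = W2 @ h                            # v_k: weighted sums, output layer
    y = sigma(v)                          # y_k: network outputs

    # Backward pass: local gradients (delta = -d(xi)/dv)
    e = d - y                             # e_k(n) = d_k(n) - y_k(n)
    delta_out = e * y * (1 - y)           # e_k * sigma'(v_k)
    delta_hid = (W2.T @ delta_out) * h * (1 - h)  # sum_k delta_k w_kj * sigma'(u_j)

    # Gradient-descent weight updates: delta_w = eta * delta * input
    W2 += eta * np.outer(delta_out, h)
    W1 += eta * np.outer(delta_hid, x)

print(np.round(y, 3))  # outputs move toward the targets (1, 0)
```

Note how the hidden-layer delta is computed from the output-layer deltas weighted by w_{kj} — exactly the recursion in the hidden-node derivative above, which is why the output-layer quantities must be computed first.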

[…] If Paul Churchland delves in vectorialism and vector coding, Patricia Churchland and Terrence Sejnow… […]