metadatta.

Neural Networks II

January 19, 2007 · 2 Comments

Last week’s subject was McCulloch-Pitts neurons and perceptrons, particularly the Perceptron Convergence Theorem and linearly separable problems. This week builds on that, and this post’s subject is Pigeons, the Credit Assignment Problem, and Backpropagation.


Mikhail Bongard and Herrnstein’s Pigeons
The basic problem with the Perceptron is that unless you put some restrictions on the problem at hand, it just won’t be able to solve it. So the question is – how do these limitations affect what you want to do? In his book on pattern recognition in the 60’s, Soviet computer scientist Mikhail Bongard outlined a set of 100 problems that set the standard in terms of automated pattern recognition – that is to say, any machine that could solve these problems (to him) was pretty far along in its pattern recognition capabilities. An example of one is to the left (click for larger version).

p091.gifOn the left are examples of things that are in the category, whereas on the right are things that are not in the category. The question is – what’s the category? Essentially there are infinitely many such problems – how is a machine supposed to know how to solve every single one? Humans are intrinsically good at solving these kinds of pattern recognition problems but it turns out that we’re not the only creatures that are good at this.

The psychologist Richard Herrnstein performed a seminal experiment in the 60’s using pigeons where he showed them slides of different leaves, rewarding them when they pecked at oak leaves (but not anything else). There’s a lot of variability involved, but the pigeons essentially learn instantaneously (~3-4 slides) – a pigeon, with a brain the size of a pea! One might say alright, well, pigeons are birds, they live in trees, clearly they’re biased towards this sort of thing, et cetera. But this worked for many things – buildings, marine life, Charlie Brown versus other characters… the bottom line is that pigeons are able to abstract out the characteristic rule in such Bongard problems. The ability to do so seems innate in the biological system. Attempts to do this in artificial systems are frustrated, tortuous and ultimately not too successful. There seems to be something very very basic involved in that something so simple as a pigeon can do this almost instinctively – we just can’t figure out what it is, and how it can be implemented in machines.

The Bongard problems range in difficulty – for example, in one of them, things are categorized if they satisfy 2 out of 3 properties. (Some researchers came up with very complicated logical solutions of the form if __ then ___ if ___ or ___ … that were strictly true in the local sense, but were not as elegant as the ‘real’ rule and could not be generalized.) Clearly, this kind of thing represents a whole new class of problem (things that satisfy m out of k properties), and is more like what real categories are like. The philosopher Wittgenstein wrote about this in his Philosophical Investigations in the context of games in which there is no singly necessary rule (e.g. a Swedish girl playing jacks versus naval wargames). This is hard for us to get, and yet, pigeons get this right away! The implication, ultimately, is that it probably doesn’t take too many neurons to figure out patterns à la Bongard if you know what the underlying principle is.

Perceptrons and the Credit Assignment Problem
What I alluded to in my previous post was that it’s hard to tell which problems will be simple (i.e. linearly separable) for a perceptron, and which will be difficult. This was what Minsky and Papert showed, supposedly putting a definitive end to perceptrons and perceptron-based systems. An example of what I mean is convexity, versus connectedness: one can show that a 3rd order perceptron (i.e. one whose ‘dendrites’ sample three pixels of the image being shown) can easily tell whether a shape is convex or not. On the other hand, figuring out whether something is ‘connected‘ or not – a seemingly very similar problem – is impossible for diameter/order-limited perceptrons. The proof is quite elegant and is found as Theorem 0.8 in Minsky and Papert’s book. It only uses the following four images (click to enlarge):

minsky.jpg

The fact that this problem isn’t linearly separable (unlike the convexity problem) may be obvious to some – if one considers the four images shown, it is clear the problem is simply the XOR (exclusive-or) problem in disguise – if you have the line on the left, you’re ok; or, if you’re on the right, you’re ok; but not both! – and the XOR problem is, after all, one of the simplest examples of a non-linearly separable problem.

It’s clear that one of the main reasons perceptrons can’t deal with these kinds of problems is the structure of the network – it’s just summing things up, so everything is linear – of course it can’t solve nonlinear problems! If one made the architecture more complicated, the perceptron could probably handle at least some nonlinearities, but the problem is that once you start making the structure of the perceptron more complex, it’s hard to come up with a training rule that converges to a solution like the Perceptron Convergence Theorem so nicely does. This problem is a very classical problem that people still worry about today, and is known as the Credit Assignment Problem.

Backpropagation
Recall the perceptron – a system with ‘dendrites’ that sample pixels on a screen, weights each one, and sums up the contributions of the dendrites. By comparing this sum to a given threshold, the system makes a decision as to whether or not the image is something – say a cat, or a letter ‘A’ – or not, and rewarding the ‘good’ dendrites by raising their weight. But what happens if the perceptron gets something wrong – who deserves the credit and who deserves the blame? It took about twenty years for people to figure this out, mainly because after Minsky and Papert wrote their books detailing all the ways in which perceptrons failed, everyone in the field pretty much gave up and moved on to something else.

There’s no general solution to this (e.g. in ‘real’ biological neurons), but there is one for artificial multilayer networks. It’s not difficult, but it does involve taking the chain rule (from calculus) many times. I strongly encourage reading the original paper that solved this problem (D. E. Rumelhart, G. E. Hinton and R. J. Williams, “Learning representations by back-propagating errors”, Nature 323, 533 (1986)) because it’s very readable. The point is that the authors proposed a new, modified perceptron – one that sums up the dendrites with coefficients (just like the old one), but this time takes the error between the output and the desired output, and uses this to go back and change the weights of the various dendrites using a rule based on the chain-rule. When generalized to multilayered networks, it turns out that this technique ‘backpropagates’ the error in the output in such a way that greatly improves on the Perceptron Convergence Theorem. Show the machine a pattern, let the activity flow to the top, study the error of the top levels and the influence of the various nodes, then use this information to change the weights of the various layers. Repeat.

Backpropagation – that’s what the authors called it, and it caused a big stir when the paper was published. After all, it meant (among other things) that you could actually train these multilevel networks and solve a variety of problems in a useful way! And that’s where I’ll leave things, for now.

Next time, I will write a bit more about backpropagation, why it’s cool, and what kinds of new problems it can solve…

Categories: Artificial Intelligence · Biophysics · Classes · Computational Neuroscience · Mathematical Biology · Neural Networks · Science

2 responses so far ↓

Leave a Comment