LSE hosted quite an exciting public event (15/01/2015): Is the Brain a Predictive Machine? I had the pleasure of attending it and now find myself toying with a handful of ideas that are perhaps worth sharing. But first, the basics:
The event was a public debate between Prof. Paul Fletcher (Bernard Wolfe Professor of Health Neuroscience at the University of Cambridge), Prof. Karl Friston (Professor of Neuroscience at UCL), Dr. Demis Hassabis (Vice President of Engineering at Google DeepMind) and Prof. Richard Holton (Professor of Philosophy at the University of Cambridge); the chair was Dr. Benedetto De Martino (Sir Henry Dale Senior Research Fellow at the Department of Psychology at the University of Cambridge).
The topic of the debate was the radical idea that perception, understanding and action, and thus all that happens in the brain (conceived here as the information processing organ that mediates between input – sensory stimuli – and output – action), can be modelled (explained) by a single framework based on prediction. This idea comes with many related labels, including “Predictive Brain”, “Bayesian Brain”, “Predictive Coding” and “Free Energy Principle”. To the uninitiated the whole business may look quite confusing, esoteric and maybe even extravagant. In any case, I find that it is relatively difficult to locate a good starting point to get a basic grasp of the concept, so this post contains my own attempt to provide a short, non-technical explanation (with added links at the bottom). The second part will concern a couple of ideas that I got from the interesting discussion at the LSE event.
I will start from the same example made by Dr. De Martino at the beginning of the event: let us consider a room, fitted with a heater that is controlled by a thermostat. The typical understanding of the thermostat sees it as rather passive: it has a sensor that measures the temperature, encodes it in some internal signal, compares it with the desired temperature and switches on the heater if the measured temperature is lower than a given threshold. The predictive mind hypothesis would posit that biological perception works in a completely different way: there would be some internally generated “expected value” of the predicted temperature, which is then compared with the measured signal; the difference is computed and used to refine the system that generated the expected/predicted value. On a more detailed account, all the proposals about the predictive mind posit that this general circuitry is used in a series of layers, so that what is actively modelled are different features of the original input, starting from very basic characteristics, and getting progressively more “high level”, or conceptual, as the signal progresses through the layered system.
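To make the contrast concrete, here is a minimal sketch in Python. It is only an illustration of the two descriptions above, not a model taken from the debate or from the literature: the names `passive_thermostat` and `PredictiveUnit`, and the `learning_rate` parameter, are my own assumptions.

```python
def passive_thermostat(measured, threshold):
    """Classic view: switch the heater on if the reading is below a threshold."""
    return measured < threshold


class PredictiveUnit:
    """Hypothetical predictive unit: it holds an expected value, compares it
    with the measurement, and uses the difference to refine the expectation."""

    def __init__(self, initial_guess, learning_rate=0.5):
        self.expected = initial_guess
        # learning_rate is an assumed parameter: how strongly each error
        # corrects the internal expectation.
        self.learning_rate = learning_rate

    def observe(self, measured):
        error = measured - self.expected              # prediction error
        self.expected += self.learning_rate * error   # refine the prediction
        return error


unit = PredictiveUnit(initial_guess=15.0)
# The room stays at 20 degrees; the unit starts by expecting 15.
errors = [unit.observe(20.0) for _ in range(5)]
print(errors)         # the error halves at every observation
print(unit.expected)  # the expectation creeps towards 20
```

With a constant input the error shrinks at each step: the unit ends up "expecting" what it measures, which is the sense in which the prediction is refined.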
This may still look rather extravagant: why would natural selection promote the emergence of such a complicated system? The standard answer is that such a layered, gradual evaluation solves a wide range of remarkable and interconnected problems. The first is sometimes referred to as the “view from inside the black box”, but I much prefer the visualisation provided by Dennett in “Intuition pumps and other tools for thinking” (chapter 23). Imagine that you wake up in a strange and closed room; the room is full of dials, indicators and buttons, and the only explanation is provided in a note: “you are trapped in the control room of a robot, dials and indicators are controlled by a number of sensors. Buttons and levers control the robot. In order to survive, you’ll need to understand the world outside the robot, and make sure the robot’s integrity is preserved. Good luck.” In this situation, how would you ever hope to understand what the various dials measure, what the buttons command and thus how to control the robot effectively? Without a starting point, an initial seed of reliable information, you would probably have no chance. However, if an initial seed of information is provided, you may be able to formulate a hypothesis (for example “one of the dials measures external temperature”) and consequently try to test it. In other words, you’d use pre-existing information (a set of priors, for example “this lever tells the robot to crouch” and the knowledge that hot air tends to rise) to make a prediction: if the robot crouches, one dial, the one that measures temperature, may decrease the measured value by a tiny bit. This would allow you to identify the “external temperature” indicator, establish a useful fact and proceed to more hypothesis testing cycles.
Now, consider a brain: in many ways, it is in a very similar situation. A newly born brain has the need to understand what the different inputs mean and what happens when a given output is sent to the muscles. A predictive brain will be in the business of generating a prediction (the hypothesis above) based on some pre-existing information (in this case, provided by the optimisation system that is natural selection) and evaluating how and if the prediction is accurate. The predictive brain hypothesis thus provides an explanation of how it may be possible for a brain to understand the world around it (genetically encoded priors, plus hypothesis-generation and testing).
In practice, this general mechanism is thought to happen in a multitude of successive and conceptually similar steps. For example, at the first level, the activation of a single photo-receptor is used to make the prediction that adjacent ones will also be activated; at the second level, the fact that this prediction is true along the X but not the Y axis will be used to make the prediction of what more distal receptors will measure. Similarly, at a third level, the system will recognise and predict that the perceived horizontal line of light is moving in a given direction. In more detail, the idea is that the higher level (more conceptual) integration generates a hypothesis on what it expects to be received by the lower (closer to the signal receptors) layer. The prediction is sent down, and what is sent back is not the original perception, but only the difference: perceived signal minus prediction. If the prediction is perfect (perfect understanding), no signal is sent back; if the prediction is completely wrong, a lot of data will bounce back.
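The exchange between two adjacent layers can be sketched in a few lines of Python. This is a toy illustration under my own assumptions: the row of six receptors, the function names and the idea of counting non-zero entries as "how much bounces back" are all hypothetical simplifications.

```python
def bottom_up_error(sensed, top_down_prediction):
    """What the lower layer sends upward: perceived signal minus prediction."""
    return [s - p for s, p in zip(sensed, top_down_prediction)]


def message_size(error):
    """Crude proxy for how much data bounces back: the non-zero entries."""
    return sum(1 for e in error if e != 0)


# A row of six receptors seeing a short horizontal line of light.
sensed = [0, 1, 1, 1, 1, 0]

perfect_guess = [0, 1, 1, 1, 1, 0]   # the higher layer got it right
poor_guess = [1, 0, 0, 0, 0, 1]      # the higher layer got it all wrong

quiet = bottom_up_error(sensed, perfect_guess)
noisy = bottom_up_error(sensed, poor_guess)

print(quiet, message_size(quiet))
print(noisy, message_size(noisy))
```

A perfect prediction leaves nothing to report, while a completely wrong one produces an error on every receptor.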
Interestingly, this kind of processing is useful as a compression algorithm (it is in this context that the original idea was conceived). In the example above, at the end of the three steps we can describe the perceived scene in terms of extracted features (a horizontal line of light that moves in a given direction) instead of having to report the detail of what every single photoreceptor measured. So here is a second reason why such a seemingly overcomplicated system may turn out to be handy: it allows the brain to gradually extract useful information from the original stimuli, and to reduce the amount of data that needs to be transmitted and analysed as we progress from raw data to more conceptual representations. This should start resonating with your intuitions: if you function more or less like me, introspection should tell you that when we perceive reality, we grasp the significant information and are not overwhelmed by the sheer amount of data that our sensory organs produce. In the predictive mind model, this happens because the conscious level is at the top of many “predictive” layers and thus receives already “classified” information (in the example above, “a horizontal line, moving that way”).
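The compression angle is easy to try out. The sketch below uses the simplest predictor I can think of, "each sample equals the previous one", and transmits only the residuals; this is an assumption chosen for illustration, not a claim about what neurons actually compute. The function names `encode` and `decode` are hypothetical.

```python
def encode(samples):
    """Predict that each sample equals the previous one; keep only the errors."""
    residuals = []
    previous = 0  # assumed starting prediction
    for sample in samples:
        residuals.append(sample - previous)
        previous = sample
    return residuals


def decode(residuals):
    """Rebuild the original samples by adding each error to the prediction."""
    samples = []
    previous = 0
    for residual in residuals:
        previous += residual
        samples.append(previous)
    return samples


# One row of receptors crossed by a bright horizontal segment.
row = [0, 0, 9, 9, 9, 9, 9, 0, 0]
residuals = encode(row)
nonzero = sum(1 for r in residuals if r != 0)

print(residuals)
print(nonzero, "informative values out of", len(row))
print(decode(residuals) == row)
```

Only the two edges of the segment survive as non-zero residuals, yet the full row can be rebuilt from them: the predictable part carries no news and need not be transmitted.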
Third interesting observation: a layered predictive system would therefore be able to recognise more and more abstract features of reality. By simply adding more layers it should be possible to generate more and more understanding. Evolutionarily, this makes a lot of sense: once the genetic instructions needed to put together the first predictive circuit have evolved (and we presume these would be rather complicated), the genetic changes necessary to make this circuit even more useful would be relatively small: one would only need the instructions to “pile these circuits one on top of the other”. For obvious probabilistic reasons, evolution usually invents little and recycles a lot.
But we are far from the complete picture… What happens when a given signal wasn’t predicted? After all, unexpected things happen all the time, at any level of “abstraction”. Here the general idea is that the neural circuitry registers and focuses on what was not predicted, and uses the mismatched information to guide what to change in the predictive machinery. The result is a continuously refined model of the external world: at any given layer, the information received is used to improve the prediction, and since this happens at more and more abstract levels, the top layers will be in the business of keeping up to date a general, conceptual model of the world out there. Thus, the predictive brain hypothesis unifies perception and understanding in a single, evolutionarily plausible model. Furthermore, one can extend the general idea to also include attention and learning: making a big prediction error will automatically increase the amount of data that is sent to the next layer, and this data will automatically trigger the model-refining (learning) machinery. Surprising data are the only kind of input that can teach us something (if it isn’t surprising you knew it already), and in this framework, surprising data are also what is not removed while travelling across the layers. Thus, the conscious level will be receiving primarily what is worth attending to, providing a hint of what may be the basic mechanism of attention.
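Here is a rough sketch of how surprise could double as a learning signal and as an attention filter. Everything specific in it is an assumption of mine made for illustration: the `surprising_events` function, the `learning_rate` and the `threshold` above which an error counts as "worth attending to".

```python
def surprising_events(stream, learning_rate=0.5, threshold=1.0):
    """Return the positions in the stream where the prediction error is large.

    The same error that flags a sample as surprising is also used to
    update the prediction, so surprise fades as the model catches up.
    """
    expected = stream[0]  # assumed seed: start by expecting the first sample
    flagged = []
    for index, sample in enumerate(stream):
        error = sample - expected
        if abs(error) > threshold:
            flagged.append(index)          # passed upward: worth attending to
        expected += learning_rate * error  # learning driven by the error
    return flagged


# A steady signal that suddenly jumps to a new level.
stream = [10.0] * 5 + [20.0] * 5
flagged = surprising_events(stream)
print(flagged)
```

Nothing is flagged while the signal is steady; the jump is flagged immediately, and the flags stop a few samples later, once the prediction has adapted to the new level.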
A short recap. We started with what looked like an extravagant idea: our brains don’t passively receive and classify sensory information, they actually continuously try to guess what will be perceived. Exploring this idea we found that it provides a plausible way to generate viable understanding of the world, provided that minimal “seeding” information is available. We also found that the predictive brain idea explains how information is analysed and made more abstract along the way, how unnecessary data is discarded, how a model of the world is generated and refined (i.e. how we learn), and how we figure out what is worth our attention. As far as explanatory scope goes, the predictive brain hypothesis seems to be pretty powerful! But wait, there is more.
In the past years, an even more general proposal has emerged. The idea is that the same basic architecture may be used not only in perceiving, but also in acting. Really? Surely this is over-stretching an already bold hypothesis… Well, maybe, but it’s worth mentioning: action-wise, the same basic kind of system may be used to generate motor signals. Dissecting what I’ve summarily described above, we have an internally generated prediction and an incoming (passively collected) signal. The comparison of the two inputs generates a third signal which is normally referred to as the prediction error. This is what a single layer produces: it is its output. Hang on, did I mention an output? When the last layer is reached, where does this output go? If we conceive of a brain as what stands between input and output, and as nothing but a long series of layers that process information in the way I’ve described, the last layer will be generating genuine output, or, in other words, a signal that is sent outside the brain. Of course, brains have outputs: they are known as motor signals. So maybe there is a connection here… The hypothesis therefore is that the same sort of organisation may be used to produce motor signals based on expectations. This time, instead of predictions, we have expectations, and it should be quite intuitive that the two concepts are closely related. Where the perceiving layer received a sensory stimulus and a prediction, and produced the difference, an “acting” layer may receive a sensory stimulus (of what the body/muscles are doing) and an “expectation” of what should be perceived instead. The layer would compute the difference and this signal would be the one that proceeds towards the effector organs. Makes sense? No? To me neither, not yet!
I can try to explain this other idea by going back to the beginning and recalling our thermostat. If we consider the thermostat+heater as a discrete system that does one thing (heats the room), we can say the following: the temperature at which the thermostat is set is our “expectation”, the measured temperature is the stimulus, the “difference” is a binary variable, 1 or 0, on or off. From this angle, the thermostat is a system that acts as our predictive layer: it computes the difference between the measured state and the expectation. If the measured temperature is higher than expected, it outputs zero/off. If the temperature is lower, it outputs 1, or “on”, and switches the heater on. Here we have a system that computes some information and is effectively analogous to what I have described for perception, but instead, it integrates perception and action, by using an expectation instead of a prediction. The other difference is that here there is no “expectation/prediction-generating” system: the expectation is set a priori.
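Seen this way, the thermostat+heater can be simulated as a tiny closed loop. The sketch below is only my reading of the analogy: the heating and cooling rates, the number of steps and the function names are made-up values and labels, not anything stated at the event.

```python
def acting_layer(measured, expectation):
    """Binary 'difference': 1 (heater on) if colder than expected, else 0."""
    return 1 if measured < expectation else 0


def simulate_room(start_temp, expectation, steps=20,
                  heating_per_step=1.0, cooling_per_step=0.5):
    """Run the closed loop: the output acts on the world, which changes the input.

    heating_per_step and cooling_per_step are assumed, arbitrary rates.
    """
    temperature = start_temp
    history = []
    for _ in range(steps):
        heater_on = acting_layer(temperature, expectation)
        temperature += heater_on * heating_per_step - cooling_per_step
        history.append(temperature)
    return history


history = simulate_room(start_temp=15.0, expectation=20.0)
print(history[-4:])
```

The output of the layer is no longer a message to a higher layer but an action that changes what is sensed next, so the room is dragged towards the expectation and then hovers just around it. The expectation itself never changes, which is exactly the missing piece noted above.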
In other words, there is something missing: we haven’t quite closed the circle yet. How to fill the last gap is the subject of the intuition I had at the LSE event and will be covered in the next post.
To conclude, I would like to clarify that I owe the idea of considering the thermostat+heater as a predictive machinery to Prof. Friston himself: Dr. De Martino introduced the discussion by describing the thermostat as a typical example of how we understand a passive receptor; Prof. Friston immediately turned this description on its head and suggested (but, as far as I can recall, didn’t quite explain) that one could see the same system in terms of an active predictor instead.
Micah Allen produced a super-fast and accurate summary of what was discussed with the audience at the LSE event. It really helped me reconstruct my own train of thought and thus write this post.
Links, bibliography and further reading
If you are a proper geek, and thrive amongst maths and formulas, you may want to read more at the very source of all this. The academic works of Prof. Friston himself are summarised here.
If you are geeky, but would rather avoid maths, the best entry point that I know of is:
Clark A. (2013). Whatever next? Predictive brains, situated agents, and the future of cognitive science, Behavioral and Brain Sciences, 36 (03) 181-204. DOI: http://dx.doi.org/10.1017/s0140525x12000477
(Full text is here)
If you are interested, but not a self-punishment glutton, you may want to read Andy Clark’s essay “Do Thrifty Brains Make Better Minds?“.
Also cited: Dennett, D. C. (2014). Intuition pumps and other tools for thinking. Penguin.