Predictive Processing: one theory to rule them all…

After discussing some of the basic concepts behind the Predictive Processing (PP) framework, it’s time to explore why I think it was worth the effort. In short, the explanatory power that PP seems to have is, as far as I can tell, unprecedented in neuroscience. No theory that I’ve been exposed to has ever come close to the breadth and depth encompassed by the PP proposal. One way to see why is to concentrate on one key element and briefly mention some of the phenomena it might explain. My choice is precision weighting (PW), a mechanism that suggests many possible implications. In this post, I will explore the ones that I find most striking.

[Note: this post is part of a series inspired by Andy Clark’s “Surfing Uncertainty” book. For a general introduction to the Predictive Brain idea, see this post and the literature cited therein. The concept of precision weighting and how it fits in the PP framework is discussed in a previous post.]

Many illusions can be explained in terms of PP. Image adapted from Flickr, by Robson#, CC BY 2.0

A short recap: when a sensory stimulus is transduced (collected and transformed into a nervous signal), PP hypothesises that it will reach a sequence of neural layers, each busy producing predictions that try to match the signal arriving from the layer below. [In this convention, lower levels are those situated closer to sensory organs.] Each layer will issue a prediction to the layer below, and will concurrently match the prediction it receives from above with the incoming signal from below. The matching will result in a “difference” signal (or, better, an Error Signal – ES) which is presumed to be the main/only signal that a given layer will send upwards. The ES thus carries upwards only the information that could not be predicted, or, if you prefer, only the surprising and newsworthy elements of the original raw sensory stimuli. We have explored before two additional ingredients:

  1. For such a system to work, it is necessary that whenever a signal passes from one layer to the other, it must carry some information about its expected precision/confidence. [We have also seen why it is reasonable to conflate precision and confidence into one single “measure”.] PW allows a given layer to dynamically give more importance to the error/sensory signal arriving from below or to the prediction issued from above. It is generally assumed that precision/confidence information is encoded as the gain (strength) of a given signal.
  2. Such an architecture is proposed to continue uninterrupted from layers that deal with sensory information, all the way to layers that are concerned with action selection and control. In this latter case, the ES will (or might) also be used to control muscles/effectors. Reducing the confidence (gain) of motor-related prediction signals will thus make it possible to “plan” actions without triggering actual movements.
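
To make the recap a little more concrete, here is a minimal sketch of how a single layer might settle on an estimate when both the incoming signal and the prediction from above carry a gain. It is a toy illustration, not a model taken from the PP literature: it assumes scalar signals, fixed gains and a simple iterative update, and the names `settle`, `error_signal`, `signal_gain` and `prediction_gain` are hypothetical.

```python
def settle(signal, prediction, signal_gain, prediction_gain, steps=200, rate=0.05):
    """Settle one layer's estimate under two gain-weighted errors.

    signal          -- value arriving from the layer below
    prediction      -- value predicted by the layer above
    signal_gain     -- precision/confidence attached to the incoming signal
    prediction_gain -- precision/confidence attached to the prediction
    Assumption: rate * (signal_gain + prediction_gain) stays below 1,
    otherwise this simple update overshoots.
    """
    estimate = prediction  # start from what the layer above expects
    for _ in range(steps):
        error_below = signal - estimate      # unexplained part of the input (the ES)
        error_above = prediction - estimate  # disagreement with the layer above
        # Each error pulls the estimate with a strength set by its gain.
        estimate += rate * (signal_gain * error_below + prediction_gain * error_above)
    return estimate


def error_signal(signal, estimate, signal_gain):
    """Gain-weighted error that the layer would send upwards."""
    return signal_gain * (signal - estimate)


if __name__ == "__main__":
    # Illustrative numbers only: the input says 1.0, the layer above predicts 0.0.
    trusting_senses = settle(1.0, 0.0, signal_gain=4.0, prediction_gain=1.0)
    trusting_prediction = settle(1.0, 0.0, signal_gain=1.0, prediction_gain=4.0)
    print(trusting_senses, trusting_prediction)
```

With these made-up numbers the estimate settles at the gain-weighted average of signal and prediction: close to the input when the signal has the higher gain, close to the prediction when the prediction does. That is all that “giving more importance” to one side or the other amounts to in this sketch.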

We have also seen before that, at levels concerned with integrating information coming from different senses, PW becomes important to deal with possible conflicts. For example, when watching TV, sounds will not seem to come from the TV speakers, but from the images themselves, as visual stimuli come with much higher spatial precision than acoustic ones. Thus, PW proposes to explain how sensory stimuli can be integrated, as well as why and how a perfect matching isn’t required.
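
The TV example can be sketched with the standard textbook recipe for combining independent, Gaussian-noise cues: weight each cue by its precision (inverse variance). This is an assumption on my part about how the weighting could be formalised, not something PP commits to in this exact form, and the function name `fuse` and all the numbers (degrees of visual angle, standard deviations) are invented for illustration.

```python
def fuse(cues):
    """Combine (location, std_dev) cues by inverse-variance (precision) weighting.

    Returns the fused location and the standard deviation of the fused estimate.
    """
    precisions = [1.0 / (sd * sd) for _, sd in cues]
    total = sum(precisions)
    location = sum(p * loc for p, (loc, _) in zip(precisions, cues)) / total
    return location, (1.0 / total) ** 0.5


if __name__ == "__main__":
    # Hypothetical values: the mouth on screen is at 0 degrees and vision
    # localises it sharply; the speaker is at 10 degrees and hearing is vague.
    vision = (0.0, 1.0)
    hearing = (10.0, 5.0)
    location, spread = fuse([vision, hearing])
    print(location, spread)
```

Because the visual cue is 25 times more precise than the auditory one in this example, the fused location lands well within one degree of the image, even though the two cues disagree by ten degrees. The mismatch is tolerated rather than resolved, which is the point made above.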

When trying to understand how a complex system/mechanism works, it is often very useful to explore anomalies, especially when one is proposing a strictly mechanistic explanation of the inner workings of such systems. This makes perfect sense: any given mechanism must be constrained, and therefore it is reasonable to expect that it will not work particularly well under unusual circumstances. Moreover, particular idiosyncrasies will be specific to given mechanisms (different implementations will be characterised by different anomalies). This means that studying where things “go wrong” allows us to match failures with hypothetical mechanisms: some mechanisms will be expected to fail in one way, some in another. Thus, a theory of perception that happens to easily accommodate known (and hard to explain) perceptual anomalies (such as what happens when watching TV) and/or neurological conditions will look more promising than one that doesn’t. For us, this consideration means that it makes sense to look at how PP proposes to explain some of these failings with the aid of PW.

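One such anomaly is the rather spectacular rubber hand illusion, in which a participant whose real hand is hidden from view watches a rubber hand being stroked in synchrony with their own, and gradually comes to feel that the rubber hand is theirs.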

In the words of Seth (2013):

[S]tatistical correlations among highly precision-weighted sensory signals (vision, touch) could overcome prediction errors in a different modality (proprioception)

In other words, proprioception isn’t very precise, or, more specifically, it produces reliable signals only about movement and changes in forces; thus, in the unusual experimental conditions (people are expected not to move their hidden hand), and given enough time, the relatively high-precision signals coming from sight and touch can take precedence, forcing the overall system to explain them (that is: successfully predict them) by assuming the rubber hand is the real one.

Perhaps more interestingly, it’s also possible to relate PW to more natural anomalous conditions. One way to describe this line of thought is to ask: what would happen if the delicate balance between the confidence assigned to predictions and the precision assigned to sensory signals were systematically biased in one or the other direction?

On one extreme, we could imagine the situation where predictions tend to have too much weight. The result would be an overall system that relies too little on the supervision of sensory input and is therefore more likely to make systematic mistakes. If the imbalance is strong enough, the whole system will occasionally get flooded with abnormal errors (whenever the predictions happen to be very wrong, but issued with high confidence/gain), triggering an equally abnormal need to revise the predictions themselves, which could then set up a self-sustaining vicious cycle: more top-heavy, misinformed predictions will be issued, producing more floods of error signals, requiring even more revisions in the predictions themselves. The result would be the establishment of ungrounded expectations, which would then have a visible impact both on perception (how the subject experiences the outside world) and on the overall understanding of the outside world itself (beliefs). Recall that according to PP, prior expectations are intrinsically able to shape perceptions themselves. Wrong perceptions, when they are indeed very wrong, are normally called hallucinations, while wrong beliefs can be seen as delusions. Sound familiar? Indeed, the combination of both represents the “positive symptoms” of schizophrenia. In short, a systematic bias towards prediction confidence, if PP is broadly correct, would produce a system which is unable to self-correct.

On the opposite extreme, what would happen if issued predictions are not trusted enough? In such cases, prior knowledge would fail to help interpret incoming signals, making it harder and harder to ‘explain away’ a given stimulus, as even the right predictions might struggle to quash the incoming signals (which will then be interpreted incorrectly as a genuine ES). A subject afflicted with this condition will be able to react correctly to very familiar situations, where confidence in the prediction is highest and is therefore strong enough to reduce the ES. On the other hand, in new and ambiguous situations, predictions will systematically struggle to perform their function even when correct, and will therefore force the subject to attempt to re-evaluate the current situation over and over. This would gradually increase confidence in the issued predictions, and thus restore the ability to react appropriately to the outside world, at the cost of an abnormally high investment of time and attention. It’s easy to predict that such subjects will naturally tend to avoid unfamiliar circumstances and that they will also find it hard to correctly navigate the maze of ambiguities that we call natural language. In this case, an excess of error signal doesn’t lead to hallucinations and delusions because the “supervision” of sensory information happens to be too high (not too weak!), and thus only very precise predictions, i.e. those able to exactly match the stimuli, will have a good chance of reducing error signals to manageable levels. Once again, this kind of condition should also sound familiar: it is tantalisingly similar to autism. It’s worth noting that this approach is entirely compatible (indeed, I see it as a proposal of how the general principle might be implemented) with the well-established view that autism is connected to an impaired ability to use Bayesian inference; for the details, see Pellicano and Burr (2012).
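
The two extremes described above can be caricatured with a toy belief-tracking loop. To be clear about what is assumed: this is not a model of schizophrenia or autism, it does not capture the vicious cycle of prediction revisions, and the single parameter `sensory_weight` (standing in for sensory precision relative to prediction confidence), the function name `track` and the numbers are all hypothetical. It only illustrates how the same update rule behaves when the balance is tilted either way.

```python
def track(observations, sensory_weight, belief=0.0):
    """Update a belief from a stream of observations with a fixed sensory weight.

    sensory_weight close to 0 -- predictions dominate, evidence is nearly ignored
    sensory_weight close to 1 -- evidence dominates, predictions barely help
    Returns the final belief and the size of the revision made at each step.
    """
    revisions = []
    for observation in observations:
        error = observation - belief       # what the current belief failed to predict
        revision = sensory_weight * error  # how much of that error is acted upon
        belief += revision
        revisions.append(abs(revision))
    return belief, revisions


if __name__ == "__main__":
    # A stable world (true value 1.0) seen through slightly noisy senses.
    observations = [0.9, 1.1] * 25
    for label, weight in [("prediction-heavy", 0.01),
                          ("balanced", 0.3),
                          ("sensory-heavy", 0.95)]:
        belief, revisions = track(observations, weight)
        late_revision = sum(revisions[-10:]) / 10
        print(label, round(belief, 3), round(late_revision, 3))
```

Starting from a wrong belief of 0.0, the prediction-heavy setting is still far from the true value after fifty observations, which is the “unable to self-correct” case. The sensory-heavy setting reaches the right neighbourhood immediately but keeps revising its belief at every sample, because ordinary noise is never explained away. The balanced setting both converges and settles down.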

This leads me to the matter of attention. According to Friston and many of the PP proponents (see Feldman and Friston, 2010), attention is the perceivable result of highly weighted error signals. On the face of it, it makes perfect sense: what should we pay attention to? To whatever is news to us, and therefore, to what we struggle to predict. Moreover, we should be able to direct attention according to our current task: this can be readily done by reducing the confidence in the predictions we are making. By doing so, we’d amplify the residual error signals concerned with what we are paying attention to, making only very precise predictions (precise in the sense of being a perfect match of the incoming signal) able to reduce prediction error. This reinforces the view of autism sketched above: autistic individuals would thus be unable to command their attention and would instead be forced to attend to any stimulus that isn’t readily explained away.


Predictive Processing, once enriched with the concept of Precision Weighting, is able to propose a preliminary sketch that includes reasonable explanations of how we manage to make sense of the world, learn from sensory information, plan, execute and control actions, pay attention to something and/or get our attention diverted by sudden and unexpected stimuli. Moreover, our abilities of dreaming and daydreaming are easily accommodated (in ways I might explore in the future). If this weren’t enough, it also aspires to explain why and how certain well-known pathologies work, and is generally able to accommodate many perceptual illusions and anomalies. In other words, one single theory is proposing to explain much of what the brain does. This, in a nutshell, is why I’ve dedicated so much of my spare time to this subject: for the first time I get the impression that we might have some hope of understanding how brains work – we now have a candidate theory which is potentially able to offer a unifying interpretative lens. Otherwise, without a set of general and encompassing principles, all our (increasing) understanding would be (has been) condemned to remain local, applicable only within a given restricted frame of reference (how neurons communicate, how edges are detected in vision, and so forth).

Given my background in neuroscience, I expect that my excitement comes as no surprise. Fair enough: but is my enthusiasm justified? Perhaps. To answer this question, in the following posts I will look at what I find unconvincing or underdeveloped in the PP world. I might also use the occasion to err on the overconfidence side(!) and propose some of my ideas on how to tackle such difficulties.



Clark, A. (2016). Surfing Uncertainty: Prediction, Action, and the Embodied Mind. Oxford Scholarship. DOI: 10.1093/acprof:oso/9780190217013.003.0011

Feldman, H., & Friston, K. J. (2010). Attention, uncertainty, and free-energy. Frontiers in human neuroscience, 4.

Pellicano, E., & Burr, D. (2012). When the world becomes ‘too real’: a Bayesian explanation of autistic perception. Trends in cognitive sciences, 16(10), 504-510.

Seth, A. K. (2013). Interoceptive inference, emotion, and the embodied self. Trends in cognitive sciences, 17(11), 565-573.



All original content published on this blog is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
Please feel free to re-use and adapt. I would appreciate if you'll let me know about any reuse, you may do so via twitter or the comments section. Thanks!
