Predictive Processing: the long road ahead.

In the previous posts in this series I’ve proposed an extreme synthesis of the Predictive Processing (PP) idea, as proposed by Andy Clark in “Surfing Uncertainty” – I concluded with a post that summarised why I think PP is the most promising idea currently on offer in the entire neuroscience field. In this post I will do the opposite: exciting and extremely powerful ideas should never go unchallenged. Thus, I will produce a short list of what I see as the main problems that PP either fails to solve or even generates of its own accord.

Audience: who is this post for?

If PP is true, why so many different neural structures? Image by Thomas Schultz. CC BY-SA 3.0

This post is significantly different from the previous ones in the series. Previously, I tried to summarise my understanding of the PP framework. First of all, I wanted to check whether my understanding was good enough, at least by my own standards(!): by trying to put together a decent summary I forced myself to see whether the picture fit together and whether it appeared to cover enough ground. Secondly, I thought this exercise could be useful to newcomers. PP isn’t exactly the most approachable framework. Thus, I was (/am) hoping that my effort could double as a useful introduction to PP – at the very least, it could help readers decide whether and how PP is worth deeper scrutiny. Having done the above, however imperfectly, it’s time to change gear and move on to criticism. Once again, this helps me understand what I should look out for: a neat list might direct my future readings, based on their potential to address what I think are the most important shortcomings and/or gaps in the PP story.

In terms of audience, this means that my ideal reader has changed. I would like to receive scrutiny and counter-criticism from people who are already invested in the PP framework. In return, my list might help PP specialists to see their topic from a fresh perspective, which may be useful to spot weak points (if I’m doing it right) and/or areas that require more accessible explanations (if I’m not!).

Method: what do I think I’m doing?

Given my high ambitions, it’s worth adding some warnings, in the form of an explicit admission of why what follows is necessarily far from ideal. I write here because I enjoy it, but I have a quite demanding day job, which has nothing to do with neuroscience or PP itself. Thus, I cannot, nor wish to, systematically read most or all of the literature on the subject. What I do is approach the topic with a flâneurish attitude: I do actively look for things to read, but only invest my limited spare time in reading whatever happens to attract my attention, for whatever reason.

As a consequence, I expect that many of the points I’ll address below have been raised before, and that many possible solutions have been proposed already. What I will mention is selected (out of a longer list) either because I think that a given issue really needs to be addressed as soon as possible (and in great detail) or because I think that there is no available consensus on the possible “solutions”. In both cases, I might be wrong, in which case I would greatly appreciate some feedback (specifically: pointers about what I should read next!).

Predictive Processing: a list of issues and upcoming challenges.

I will list most of my criticism in the shortest manner that I’m capable of. I will try to explain at least why I think a given question should count as a genuine problem. I do plan to expand on several or all points in follow-up posts. The following list is roughly organised from the more technical/specific to the more general/conceptual.

1. Does PP require filtering and partitioning?

If PP layers specialise in predicting certain kinds of features, does this require filtering the incoming sensory streams and segregating the results of different filters along separate PP pathways? Starting from the periphery, PP states that there must exist a “Level-0” which receives “a prediction” from Level-1 and matches it against the raw sensory input. Let’s imagine that Level-1 specialises in predicting the direction of edges/lines/segments in the visual field (to make my point understandable – in fairness, any kind of feature might be the specific concern). Even if the prediction is 100% accurate, a lot of the original signal will not be predicted, for Level-1 only deals with a limited set of features; hence, most of the original input to Level-0 will always travel up to Level-1. In PP terms, this would/should count as an Error Signal (ES). However, if the job of Level-1 is to deal with edges/lines/segments alone, the signal it receives from Level-0 will never consist of errors alone. Level-1 will therefore need to be able to discern between residual sensory input that could not have been predicted (at this level) and error signal that results from wrong predictions.
This simple observation calls for an additional element: at Level 0 or Level 1, some filtering mechanism, on top of vanilla PP, is required. This filtering could be used to distinguish the ES from the residual sensory signal. Alternatively, the filtering may happen upstream, ensuring each level receives only the kind of signal that fits its particular role. Moreover, whatever is filtered out at one level needs to be directed to some different PP unit.

Thus, we end with:

  1. At least one additional mechanism, dedicated to filtering. More importantly, different filters will apply at different levels and PP units; thus, at each/most/many levels, different filters are likely to entail very different neural architectures.
  2. Re-routing of signals so as to apply different filters to the same raw signal, leading to parallel PP pathways as well as instances of level skipping, where a particular filtered signal jumps one or more levels to serve as input at the appropriate layer.
  3. If I’m getting this right, it is then possible that an additional system is required to reconstruct the bigger picture once different features have been segmented and predicted (or not).

In other words, I don’t think that the purely perceptual part of PP, as proposed/synthesised by Clark, is complete – some additional element is missing.
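The point about residual signal versus error signal can be made concrete with a toy sketch. Everything here is my own illustration, not anything proposed by Clark: I simply split a made-up input vector into features Level-1 predicts and features outside its remit, and show that the upward residual is non-zero even when the prediction is perfect.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical raw sensory input: the first 4 dimensions stand for the
# "edge" features that Level-1 predicts; the remaining 4 stand for
# features outside Level-1's remit. All values are made up.
raw_input = rng.normal(size=8)

# A perfect Level-1 prediction of the edge features only.
prediction = np.zeros(8)
prediction[:4] = raw_input[:4]

# The upward signal in vanilla PP: whatever was not predicted.
residual = raw_input - prediction

# Even with a 100% accurate prediction, the residual is non-zero.
error_part = residual[:4]        # genuine prediction error (here: zero)
unpredicted_part = residual[4:]  # residual sensory signal, not an "error"

print(np.allclose(error_part, 0))        # True: the prediction was perfect
print(np.allclose(unpredicted_part, 0))  # False: out-of-remit signal remains
```

The two components arrive mixed in a single residual vector, which is exactly why some filtering mechanism seems needed to tell "I predicted wrongly" apart from "this was never mine to predict".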

2. Attention.

The account of attention proposed by PP is remarkably elegant, extremely powerful, and strikingly comprehensive. Attention is very hard to pinpoint, due to its dual (or multifaceted) nature. To start with, attentional mechanisms can be triggered both bottom-up (a sudden noise) and top-down (trying to decipher the handwriting of a medical practitioner). At first sight, precision weighting (PW) seems to account for this dichotomous triggering, which is one reason why PP looks so promising. However, I do not think that PW alone can account for all the observable phenomena – from my own (perhaps idiosyncratic) point of view, both the bottom-up and top-down stories seem incomplete, at best.

Bottom-up: a sudden and unpredicted loud bang is certainly able to generate bottom-up attention. The problem is: attention to what? Not the noise itself: being sudden and brief, the noise is likely to have ended by the time the attention mechanism is fully activated. In fact, what happens is that our attention is diverted towards the rough spatial location where we have estimated that the noise originated. This makes adaptive sense, but, as far as I can tell, nothing in the proposed PP mechanisms explains how. Yes, a big ES was received (we failed to predict the sudden noise), but somehow attention then becomes focused on multiple channels, directed to specific features of those channels, and perhaps won’t even involve the channel where the original strong ES was generated.
Top-down: similarly, if I’m trying to decipher my doctor’s hasty scribbles, PP suggests that I would do so by enhancing some error signal (requiring more time and effort to be spent minimising it). Once again, the obvious question is: how does my brain decide which error signal should be amplified? In this particular case, it will involve a relatively high (conceptual) layer in the visual pathway, but most definitely not the whole visual processing hierarchy. For example, detecting the exact hue of the scribbles isn’t very relevant to the task (it doesn’t matter much whether they were made with a blue or black pen) and motion detectors (predictors) won’t be very useful in this particular case. It follows that attention needs to be able to focus not only on specific areas of the receptive field (in this case, specific parts of the visual “scene”) but also on particular (context-dependent) layers in the PP pathway. Simply enhancing the strength of the error signal along the visual pathway (which is what is possible according to the vanilla interpretation of PP and PW) is very obviously not enough. We need to be able to enhance a specific error signal, identified in a coordinate space which is at least three-dimensional. The architecture of the visual pathway may allow enhancement of only a particular area of the visual field; this would account for two dimensions, leaving the third (what kind of feature attention is supposed to focus on) unaccounted for.
Once again, it seems to me that some fundamental ingredient is missing. It is possible that this ingredient is identical or closely related to the one I’ve identified in the previous section.
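A toy numerical sketch (again, my own illustration with made-up numbers, not a model anyone has proposed) shows why a purely spatial precision gain falls short of the handwriting case:

```python
import numpy as np

# Hypothetical error signals indexed by (spatial location, feature type):
# rows = 3 locations in the visual field, columns = 3 feature kinds
# (say: shape, hue, motion). Uniform values, purely for illustration.
errors = np.ones((3, 3))

# Precision weighting as a per-location gain can highlight a region
# of the visual field (here, location 1)...
location_gain = np.array([[1.0], [5.0], [1.0]])
weighted = errors * location_gain

# ...but it boosts every feature at that location equally: deciphering
# handwriting needs the "shape" errors amplified, not hue or motion.
print(weighted[1])  # [5. 5. 5.] -- no feature selectivity

# Selecting a feature as well requires a second gain axis, i.e. a
# precision scheme over (location, feature) jointly.
feature_gain = np.array([5.0, 1.0, 1.0])
weighted2 = errors * location_gain * feature_gain
print(weighted2[1])  # [25. 5. 5.] -- shape at location 1 now stands out
```

The second gain axis is precisely the "missing ingredient" the text gestures at: the vanilla story gives us the row-wise gain, but the task demands the joint one.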

3. Heterogeneity.

PP describes a general processing style which is supposedly able to account for most of our mental abilities: from perception (of internal and external states) to attention, sense-making, planning and action control. Very little does not fit in. This is one reason why PP is so attractive, but it comes at a cost. If the same processing style is deployed across more or less all brain functions, the variability of neural structures within the brain becomes an anomaly that requires a PP-specific explanation. As far as I can tell, this explanation is currently only sketched. If my first worry (above) is justified, I suspect that whatever counts as a solution to filtering and partitioning might end up accounting for the various specialised structures that are particularly optimised for filtering and/or predicting specific features of incoming signals.

4. Development.

Once again, the versatility of the PP story generates its own new questions. If PP circuits (whatever they turn out to be) are able to handle most brain functions, it follows that to enhance the abilities of a given brain, all that is needed is simply more of the same. This is not what is observed during the development of brains (synaptic pruning, anyone?). There is a lot that needs to be explained in terms of how different structures develop in a PP-specific way. Once this is done, one also needs to explain related phenomena such as the repurposing of specialised areas (reading, for example) and proceed to figuring out how specialised areas change in size following training, exercise or disuse. Perhaps this is also where synaesthesia enters the picture.

5. Learning.

What specific systems allow the PP circuitry to adjust predictions in order to learn how to quash error signals? It seems to me that PP should be enriched with one or more hypotheses tackling how, given ErrorA (corresponding to PredictionA), the layer that received it will produce a new PredictionA1 which should better reduce the ES. This is an extremely complicated business. To start with, PredictionA and ErrorA might both contain clues on how PredictionA should be modified, but other clues could be present in virtually any other processing layer. Presumably, the brain has some way of fishing out the relevant information, but nothing in PP helped me get a glimpse of what such a mechanism might be. Timing issues also get in the way: by the time PredictionA1 is issued, new sensory input will have been generated, making it even harder to produce the right prediction for the new (still to-be-assessed) situation. If a prediction is badly wrong, how does the brain get back on track, instead of getting it progressively more wrong?
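For concreteness, the textbook answer from the machine-learning side would be something like a delta rule: nudge the prediction by a fraction of the error. This is my assumption about a plausible mechanism, not a claim about what PP or Clark proposes, and it also illustrates the timing worry – the update always lags the observation it was computed from.

```python
# A minimal delta-rule sketch: PredictionA plus a fraction of ErrorA
# yields PredictionA1. The learning rate is an arbitrary choice.
def update_prediction(prediction, observation, learning_rate=0.3):
    error = observation - prediction           # the error signal (ES)
    return prediction + learning_rate * error  # PredictionA -> PredictionA1

# A stationary world: the same observation repeats, and the prediction
# converges towards it, quashing the ES step by step.
prediction = 0.0
for observation in [1.0, 1.0, 1.0, 1.0, 1.0]:
    prediction = update_prediction(prediction, observation)

print(round(prediction, 3))  # 0.832 -- approaching 1.0 after five steps
```

Even this toy version shows the gap the text points at: the rule says nothing about where the learning rate comes from, how clues from other layers could be folded in, or what to do when the world changes faster than the updates converge.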

6. Evolution.

Some elements of the PP story are well placed within a bigger evolutionary outlook. Perhaps too well! The perspective offered by Friston via the deployment of Markov Blankets is in fact able to extend the PP lens all the way back to unicellular organisms… Thus, more new questions emerge! If PP-like mechanisms are possible (or even necessary!) within single cells, what different function were neurons selected for? When/where exactly in the phylogenetic tree do neurons start to organise around error minimisation? Does that coincide with the point where error signals get relegated to signals between neurons? Speaking of which: why should error signals be transmitted exclusively between neurons? Are we sure they don’t involve other types of cells?
If PP circuitry is so versatile, what accounts for the vast difference in abilities across different species (and even different individuals – see also points 1, 3, 4 and 5 above)? Looking at humans: what explains our special abilities (for example, the unprecedented specialisations which allow language and mental time travel)? If PP accounts for it, does it mean that PP circuitry is not present in organisms that show no trace of such abilities? If it does not, what additional ingredient enables some faculties especially in humans?

7. Plants, sleep and dreaming.

Sleep and perhaps dreaming seem to be conserved features, present (in variable forms) across most animals, probably even insects. PP is proposed as an architecture that solves the basic “persistence” problem of all living forms (via the Free Energy Principle – FEP, see conclusion): is PP present in some form in plants too? If not, why not? Assuming we can leave vegetables aside, can we expect PP to be present across most animals? If we can, should we conclude that sleep and dreaming correlate with PP? In such a case, what is the relation supposed to be? Does PP itself produce the requirement of sleeping and dreaming? How? If not, why not?
[In fairness, Clark does address some of the questions above in his book. As far as I’m concerned, I would guess that learning, sleep and dreaming will eventually be accounted for by one single “solution”, see my ancient thoughts, to learn why.]

8. Pain and pleasure.

If we are describing the signal exchanges to/from and within brains, I would expect any such account to somehow accommodate pain and pleasure signals. In PP, perhaps pain can be conceptualised as an error signal that refuses to be predicted away (thus being well placed to attract our conscious attention). This idea seems promising to me (it would imply some structural hyper-prior, forcing pain “errors” to never be fully predicted). If so, how does PP allow for such a “systematic” inability to predict something? Especially with chronic pain, such predictions should be quite easy to produce! Even if this idea is on track, how do we explain pleasure? It can’t simply be the opposite: a signal that always gets predicted away. That’s because we are all quite good at giving attention to pleasurable sensations… In other words, I can’t see how PP can directly account for the special qualities of painful and pleasant stimuli, or even begin to explain what distinguishes one from the other.

9. Consciousness.

This issue follows from the point above. It doesn’t seem that PP itself is able to account for the phenomenal element of consciousness (the “what is it like” aspect, or phenomenal consciousness – PC). Once a brain has produced a good enough global prediction of the latest train of inputs, what exactly determines which features we consciously perceive and which we don’t? How does attention influence what we are conscious of? What constitutes the undesirability of painful sensations? What makes pleasure desirable? Are all PP-based systems conscious? If not, what additional system produces PC? What accounts for the loss of PC during dreamless sleep?
In short, PP appears to remain solidly outside the scope of Chalmers’ Hard Problem of consciousness. This is perhaps the biggest problem that I see. If PP explains perception (but does it? If PP does not include an explanation of why we perceive some things and not others, does it account for perception at all?), attention, mental time travel, planning and action, but in no way accounts for PC, what function does PC fulfil? If accepting PP entails epiphenomenalism then, as far as I’m concerned, it follows that PP must be nonsensical, pretty much like epiphenomenalism itself.

Conclusion.

The list above is incomplete. It took me a very long time to write this post, partly because I had to find a way to organise my thoughts and establish some reasonable criteria to decide what could be left out. The biggest omission concerns the Free Energy Principle. This is because criticising the FEP requires a full book and cannot be done in a few lines. Secondly, such criticism might be aimed at too broad a target, and thus fail to be constructive. [For the gluttons: I’ve covered the brightest side of the FEP here, while some hints of criticism are in this discussion.]

Overall, it seems pretty obvious that PP, as a theoretical framework (and/or, depending on your preferences, a scientific paradigm or a scientific programme), is far from complete. This is expected and entirely justified. As anyone with some familiarity with the history of science should know, new ideas require time to reach maturity: they necessarily start off incomplete, sometimes directly contradicted by pre-existing evidence, and not necessarily self-consistent either. That’s normal. Thus, this post is not intended to curb our enthusiasm; it is intended to focus it in (hopefully) useful ways. My quasi-arbitrary critique above might help direct our attention in interesting directions. Or at least, it might help me: I will appreciate all feedback, and in particular reading suggestions in response to any of the points raised here. Thank you!


Clark, A. (2016). Surfing Uncertainty: Prediction, Action, and the Embodied Mind. Oxford University Press. DOI: 10.1093/acprof:oso/9780190217013.003.0011
