Predictive Processing: the long road ahead.

In the previous posts in this series I’ve offered an extreme synthesis of the Predictive Processing (PP) idea, as proposed by Andy Clark in “Surfing Uncertainty” – I concluded with a post that summarised why I think PP is the most promising idea currently on offer in the entire neuroscience field. In this post I will do the opposite: exciting and extremely powerful ideas should never go unchallenged. Thus, I will produce a short list of what I see as the main problems that PP either fails to solve or even generates of its own accord.

Audience: who is this post for?

If PP is true, why so many different neural structures? Image by Thomas Schultz. CC BY-SA 3.0

This post is significantly different from the previous ones in the series. Previously, I tried to summarise my understanding of the PP framework. First of all, I wanted to check whether my understanding was good enough, at least by my own standards(!): by trying to put together a decent summary I forced myself to see whether the picture fit together and covered enough ground. Secondarily, I thought this exercise could be useful to newcomers. PP isn’t exactly the most approachable framework. Thus, I was (/am) hoping that my effort could double up as a useful introduction to PP; at the very least, it could help readers decide whether and how PP is worth deeper scrutiny. Having done the above, however imperfectly, it’s time to change gear and move on to criticism. Once again, this helps me understand what I should look out for: a neat list might direct my future readings, based on their potential to address what I think are the most important shortcomings and/or gaps in the PP story.

In terms of audience, this means that my ideal reader has changed. I would like to receive scrutiny and counter-criticism from people who are already invested in the PP framework. In return, my list might help PP specialists to see their topic from a fresh perspective, which may be useful to spot weak points (if I’m doing it right) and/or areas that require more accessible explanations (if I’m not!).

Method: what do I think I’m doing?

Given my high ambitions, it’s also worth adding some warnings, in the form of an explicit admission of why what follows is necessarily far from ideal. I write here because I enjoy it, but I have a quite demanding day job, which has nothing to do with neuroscience and/or PP itself. Thus, I cannot, nor wish to, systematically read most or all of the literature on the subject. What I do is approach the topic with a flâneurish attitude: I do actively look for things to read, but only invest my limited spare time in reading what happens to attract my attention, for whatever reason.

As a consequence, I expect that many of the points I’ll address below have been raised before, and that many possible solutions have been proposed already. What I will mention is selected (out of a longer list) either because I think that a given issue really needs to be addressed as soon as possible (and in great detail) or because I think that there is no available consensus on the possible “solutions”. In both cases, I might be wrong, in which case I would greatly appreciate some feedback (specifically: pointers about what I should read next!).

Predictive Processing: a list of issues and upcoming challenges.

I will list most of my criticism in the shortest manner that I’m capable of. I will try to explain at least why I think a given question should count as a genuine problem. I do plan to expand on several or all points in follow-up posts. The following list is roughly organised from the more technical/specific to the more general/conceptual.

1. Does PP require filtering and partitioning?

If PP layers specialise in predicting certain kinds of features, does this require filtering the incoming sensory streams and segregating the results of different filters along separate PP pathways? Starting from the periphery, PP states that there must exist a “Level-0” which receives “a prediction” from Level-1 and matches it with the raw sensory input. Let’s imagine that Level-1 specialises in predicting the direction of edges/lines/segments in the visual field (chosen to make my point understandable – in fairness, any kind of feature might be the specific concern). Even if the prediction is 100% accurate, a lot of the original signal will not be predicted, for Level-1 only deals with a limited set of features; hence, most of the original input to Level-0 will always travel up to Level-1. In PP terms, this would/should count as an Error Signal (ES). However, if the job of Level-1 is to deal with edges/lines/segments alone, the signal it receives from Level-0 will never consist of prediction errors alone. Level-1 will therefore need to be able to distinguish residual sensory input that could not have been predicted (at this level) from the error signal that results from wrong predictions.
This simple observation calls for an additional element: at Level 0 or Level 1, some filtering mechanism, on top of vanilla PP, is required. This filtering could be used to distinguish the ES from the residual sensory signal. Alternatively, the filtering may happen upstream, ensuring each level receives only the kind of signal that fits its particular role. Moreover, what is filtered out at one level needs to be directed to some different PP unit.

Thus, we end up with:

  1. At least one additional mechanism, dedicated to filtering. More importantly, different filters will apply at different levels and PP units. Thus, at each/most/many levels, different filters are likely to entail very different neural architectures.
  2. Re-routing of signals so as to apply different filters to the same raw signal, leading to parallel PP as well as instances of level skipping, where a particular filtered signal jumps one or multiple levels to serve as input at the appropriate layer.
  3. If I’m getting this right, it is then possible that an additional system is required to reconstruct the bigger picture, once different features have been segmented and predicted (or not).

In other words, I don’t think that the purely perceptual part of PP, as proposed/synthesised by Clark, is complete – some additional element is missing.
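To make the worry concrete, here is a minimal numerical sketch (my own toy construction with invented numbers – nothing of the sort appears in Clark’s book): a layer that predicts only one class of features cannot, by itself, tell genuine prediction error apart from the residual signal it was never meant to predict.

```python
import numpy as np

# Toy sketch: Level-1 predicts only "edges"; the raw input also contains
# "texture", which Level-1 is not in the business of predicting at all.
rng = np.random.default_rng(0)
edges = rng.normal(size=100)                  # the feature class Level-1 specialises in
texture = rng.normal(size=100)                # everything else in the raw input
raw_input = edges + texture                   # what Level-0 receives from the senses

prediction = edges + rng.normal(scale=0.1, size=100)   # a near-perfect edge prediction

upward_signal = raw_input - prediction        # what travels up as the "error signal"
true_error = edges - prediction               # the part Level-1 could actually act on
residual = texture                            # the part it could never have predicted

print(np.var(true_error), np.var(residual), np.var(upward_signal))
# The upward signal is dominated by the residual, not by genuine prediction error:
# without an extra filtering/partitioning step, Level-1 cannot tell the two apart.
```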

2. Attention.

The account of attention proposed by PP is remarkably elegant, extremely powerful, and strikingly comprehensive. Attention is very hard to pinpoint, due to its dual (or multifaceted) nature. To start with, attentional mechanisms can be triggered in both bottom-up (a sudden noise) and top-down (trying to decipher the handwriting of a medical practitioner) fashions. At first sight, precision weighting (PW) can account for this dichotomous triggering, which is one reason why PP looks so promising. However, I do not think that PW alone can account for all the observable phenomena – from my own (perhaps idiosyncratic) point of view, both the bottom-up and top-down stories seem incomplete, at best.

Bottom-up: a sudden and unpredicted loud bang is certainly able to generate bottom-up attention. Problem is: attention to what? Not the noise itself: being sudden and brief, by the time the attention mechanism has become fully engaged, the noise has most likely already ceased. In fact, what happens is that our attention is diverted towards the rough spatial location from which we estimate the noise originated. This makes adaptive sense, but, as far as I can tell, nothing in the proposed PP mechanisms is able to explain how. Yes, a big ES was received (we failed to predict the sudden noise), but somehow, attention then becomes focused on multiple channels, directed to specific features of such channels, and perhaps won’t even involve the channel where the original strong ES was generated.
Top-down: similarly, if I’m trying to decipher my doctor’s hasty scribbles, PP suggests that I do so by enhancing some error signal (requiring more time and effort to be put into minimising it). Once again, the obvious question is: how does my brain decide which error signal should be amplified? In this particular case, it will involve a relatively high (conceptual) layer in the visual pathway, but most definitely not the whole visual processing hierarchy. For example, detecting the exact hue of the scribbles isn’t very relevant to the task (it doesn’t matter much whether they were made with a blue or a black pen) and motion detectors (predictors) won’t be very useful in this particular case. It follows that attention needs to be able to focus not only on specific areas of the receptive field (in this case, specific parts of the visual “scene”) but also on particular (context-dependent) layers in the PP pathway. Simply enhancing the strength of the error signal along the visual pathway (which is what is possible according to the vanilla interpretation of PP and PW) is very obviously not enough. We need to be able to enhance a specific error signal, identified in a coordinate space which is at least three-dimensional. The architecture of the visual pathway may allow enhancement of only a particular area of the visual field; this would account for two dimensions, leaving the third (what kind of feature attention is supposed to focus on) unaccounted for.
Once again, it seems to me that some fundamental ingredient is missing. It is possible that this ingredient is identical or closely related to the one I’ve identified in the previous section.

3. Heterogeneity.

PP describes a general processing style which is supposedly able to account for most of our mental abilities: from perception (of internal and external states), to attention, sense-making, planning and action control. Very little does not fit in. This is one reason why PP is so attractive, but it comes at a cost. If the same processing style is deployed across more or less all brain functions, the variability of neural structures within the brain becomes an anomaly that requires a PP-specific explanation. As far as I can tell, this explanation is currently only sketched. If my first worry (above) is justified, I suspect that whatever counts as a solution to the filtering and partitioning problem might end up accounting for the various specialised structures that are particularly optimised for filtering and/or predicting specific features of incoming signals.

4. Development.

Once again, the versatility of the PP story generates its own new questions. If PP circuits (whatever they turn out to be) are able to handle most brain functions, it follows that to enhance the abilities of a given brain, what is needed is simply more of the same. This is not what is observed during the development of brains (synaptic pruning, anyone?). There is a lot that needs to be explained in terms of how different structures develop in a PP-specific way. Once this is done, one also needs to explain related phenomena such as the repurposing of specialised areas (reading, for example) and proceed to figuring out how specialised areas change in size following training, exercise or disuse. Perhaps this is also where synaesthesia enters the picture.

5. Learning.

What specific systems allow the PP circuitry to adjust predictions in order to learn how to quash error signals? It seems to me that PP should be enriched with one or more hypotheses tackling how, given ErrorA (corresponding to PredictionA), the layer that received it will produce a new PredictionA1 which should better reduce the ES. This is an extremely complicated business. To start with, PredictionA and ErrorA might both contain clues about how PredictionA should be modified, but other clues could be present in virtually any other processing layer. Presumably, the brain has some system for fishing out the relevant information, but nothing in PP helped me get a glimpse of what such a mechanism might be. Timing issues also get in the way: by the time PredictionA1 is issued, new sensory input will have been generated, making it even harder to produce the right prediction for the new (still to-be-assessed) situation. If a prediction is badly wrong, how does the brain get back on track, instead of getting it progressively more wrong?
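For concreteness, the simplest conceivable sketch of error-driven updating is something like a delta rule. This is purely my own illustrative assumption of what “turning PredictionA into PredictionA1” could look like; PP itself does not commit to this (or any other) specific mechanism.

```python
# A minimal sketch of error-driven prediction updating (a plain delta rule).
# Hypothetical illustration only: names, numbers and learning rate are invented.
def update_prediction(prediction_a, error_a, learning_rate=0.1):
    """Produce PredictionA1 by nudging PredictionA in the direction of ErrorA."""
    return prediction_a + learning_rate * error_a

prediction = 0.0
target = 1.0   # the (unknown) quantity the layer is trying to predict

for step in range(50):
    error = target - prediction            # ErrorA: what was not predicted away
    prediction = update_prediction(prediction, error)

print(round(prediction, 3))  # converges towards 1.0 -- but only because the target never moved
```

Everything this section worries about is exactly what the toy leaves out: the target keeps moving, the clues about how to update are scattered across layers, and a badly wrong prediction can derail the loop.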

6. Evolution.

Some elements of the PP story are well placed within a bigger evolutionary outlook. Perhaps too well! The perspective offered by Friston via the deployment of Markov Blankets is in fact able to extend the PP lens all the way back to unicellular organisms… Thus, more new questions emerge! If PP-like mechanisms are possible (or even necessary!) within single cells, what different function were neurons selected for? When/where exactly in the phylogenetic tree do neurons start to organise around error minimisation? Does that coincide with the point where error signals get relegated to signals between neurons? Speaking of which: why should error signals be transmitted exclusively between neurons? Are we sure they don’t involve other types of cells?
If PP circuitry is so versatile, what accounts for the vast difference in abilities across different species (and even different individuals  – see also points 1, 3, 4 and 5 above)? Looking at humans: what explains our special abilities (for example, the unprecedented specialisations which allow language and mental time travel)? If PP accounts for it, does it mean that PP circuitry is not present in organisms that show no trace of such abilities? If it does not, what additional ingredient enables some faculties especially in humans?

7. Plants, sleep and dreaming.

Sleep and perhaps dreaming seem to be conserved features, present (in variable forms) across most animals, probably even insects. PP proposes to be an architecture that solves the basic “persistence” problem of all living forms (via the Free Energy Principle – FEP, see conclusion): is PP present in some form also in plants? If not, why not? Assuming we can leave vegetables aside, can we expect PP to be present across most animals? If we can, should we conclude that sleep and dreaming correlate with PP? In such a case, what is the relation supposed to be? Does PP itself produce the requirement of sleeping and dreaming? How? If not, why not?
[In fairness, Clark does address some of the questions above in his book. As far as I’m concerned, I would guess that learning, sleep and dreaming will eventually be accounted for by one single “solution”, see my ancient thoughts, to learn why.]

8. Pain and pleasure.

If we are describing the signals exchanged to/from and within brains, I would expect any such account to somehow accommodate pain and pleasure signals. In PP, perhaps pain can be conceptualised as an error signal that refuses to be predicted away (thus being well placed to attract our conscious attention). This idea seems promising to me (it would call for some structural hyper-prior, forcing pain “errors” to never be fully predicted). If so, how does PP allow for such a “systematic” inability to predict something? Especially with chronic pain, such predictions should be quite easy to produce! Even if this idea is on track, how do we explain pleasure? It can’t simply be the opposite: a signal that always gets predicted away. That’s because we are all quite good at giving attention to pleasurable sensations… In other words, I can’t see how PP can directly account for the special qualities of painful and pleasant stimuli, or even start explaining what distinguishes one from the other.

9. Consciousness.

This issue follows from the point above. It doesn’t seem that PP itself is able to account for the phenomenal element of consciousness (the “what is it like” aspect, or phenomenal consciousness – PC). Once a brain has produced a good enough global prediction of the last train of inputs, what exactly controls why we can consciously perceive some features and not others? How does attention influence what we are conscious of? What constitutes the undesirability of painful sensations? What makes pleasure desirable? Are all PP-based systems conscious? If not, what additional system produces PC? What accounts for the loss of PC during dreamless sleep?
In short, PP appears to remain solidly outside the scope of Chalmers’ Hard Problem of consciousness. This is perhaps the biggest problem that I see. If PP explains perception (but does it? If PP does not include an explanation of why we perceive some things and not others, does it account for perception at all?), attention, mental time travel, planning and action, but in no way accounts for PC, what function does PC fulfil? If accepting PP entails epiphenomenalism, then as far as I’m concerned it follows that PP must be nonsensical, pretty much like epiphenomenalism itself.

Conclusion.

The list above is incomplete. It took me a very long time to write this post, partly because I had to find a way to organise my thoughts and establish some reasonable criteria to decide what could be left out. The biggest omission concerns the Free Energy Principle. This is because criticising FEP requires a full book and cannot be done in a few lines. Secondarily, such criticism might be aimed at too broad a target, and thus fail to be constructive. [For the gluttons: I’ve covered the brightest side of FEP here, while some hints of criticism are in this discussion.]

Overall, it seems pretty obvious that PP, as a theoretical framework (and/or, depending on your preferences, a scientific paradigm or a scientific programme), is far from complete. This is expected and entirely justified. As anyone with some familiarity with the history of science should know, new ideas require time to reach maturity; they necessarily start off incomplete, sometimes directly contradicted by pre-existing evidence, and not necessarily self-consistent either. That’s normal. Thus, this post is not intended to curb our enthusiasm; it is intended to focus it in (hopefully) useful ways. My quasi-arbitrary critique above might help focus our attention in interesting directions. Or at least, it might help me: I will appreciate all feedback, and in particular reading suggestions in response to any of the points raised here. Thank you!


Clark, A. (2016). Surfing Uncertainty: Prediction, Action, and the Embodied Mind. Oxford Scholarship. DOI: 10.1093/acprof:oso/9780190217013.003.0011

Posted in Neuroscience, Philosophy

Machine Learning, the usual Bat and deflationary epistemology

What does it feel like to be a mechanical Batman?
Original image by Andrew Martin [CC0 1.0].

This is a quick, semi-serious follow-up to my first Twitter poll. In a rare moment of impulsivity, I recently posted a deliberately awkward question on Twitter. A few respondents did notice that something was amiss, and indeed, an explanation is due; hence this post. The subject does demand a lengthier treatment, which is in my plans; for today, I’m hoping that what follows will not sound entirely ungrounded.

I rarely act impulsively, but maybe I should do it more often? Predictably, my poll did not collect many votes; however, I could not have hoped for better results: adding my own vote, we get a perfect 50-50 split. There appears to be no agreement on the matter, so perhaps the question was worth asking…

The Question itself

Here is the original tweet:

Why did I pose the question?

In extreme synthesis: I guessed the reactions would be thought-provoking, for me at least.

I wasn’t wrong. I was also hoping not to find too much agreement, as a split opinion in this case would give me a chance to propose some additional lucubrations.

My interest can be summarised as follows:

  1. To my eyes, the question can only make proper sense if one is aware of two distinct debates. In philosophy of mind, most of the discussions revolve around foundational questions such as: how does phenomenal experience get generated? Is it reducible to physical mechanisms?
    On the other hand, as real-life applications of Artificial Intelligence become quasi-ubiquitous, other questions are becoming important and even urgent: there is a strong demand to make machine-learning algorithms auditable, accountable and/or generally “explainable”. Thus, I was curious to see what my Twitter bubble would make of my mix’n’match provocation. I think I didn’t include the “huh?” option in order to force people to try harder and see if they could figure out what the connection might be. In hindsight, it wasn’t a bad choice, perhaps.
  2. I was also being a bit mischievous, because by forcing people to double-check their reaction (by not allowing the answer “huh?”) I sort of forced some to make an incorrect choice. The only way I can see to make sense of the question is by recognising (at least at the level of intuition) that there is a connection. If someone saw no connection at all, then the “correct” answer would indeed have been “huh? the question is malformed, I can’t figure out why it’s worth asking”. Thus, knowing that within my Twitter reach there are plenty of very clever people, I was semi-consciously curious to see if anyone would call me out. At least two did, to my great satisfaction! (With my apologies.)
  3. Both debates (point 1 above) are, IMVHO, informed by mistakes. I wanted to explore the intuition that these mistakes share a common root. This then immediately becomes the reason why my answer is “No, it isn’t a coincidence”.

This leads me to the second part of this brief response: it’s time to spill my beans and write down what I think.

My answer: no, it isn’t a coincidence.

My position has to do with what it means to know/understand something and how my own deflationary epistemology makes sense of a good number of problems. I’m pointing at some sort of illusionism about knowledge (as in: “knowledge isn’t what we think it is“). I’m not planning to fully unpack the above here, but I will use my question to explain a little.
[Note. I will do so from one angle only: a full exploration requires showing how the same manoeuvre works along many different paths and leads to more or less the same conclusions.]

The route I’ll pick today is about the mistakes I mentioned above. In AI (or better: Machine Learning – ML), (informed) people are both rightly and mistakenly(!!!) asking for work towards ML systems that can be “explained”. Specifically, because of the enormous importance that ML-based decision-making is acquiring in our society, (informed) people want the ML algorithms to be auditable. When a given machine makes a non-trivial choice, we want to be able to know “why did this system pick A and not B?”. The reason to demand such “transparent” ML systems is obvious, important and entirely correct: after all, we *need* to be able to detect and correct mistakes.

However, I fear that it is impossible to fully satisfy this demand. This has to do with reduction and our epistemological limits. Starting with the latter: if the question is “why did this system pick A and not B?”, the set of what could count as acceptable answers does not, by definition, contain the correct answers. ML systems are built to deal with an otherwise unmanageably high number of variables, each having the potential to contribute to the output, and usually the final result is indeed determined by small contributions from a very high number of input variables. Thus, saying “the machine picked A because…” requires listing the contributions of many factors, and explaining how they influenced the training phase as well as their relative contributions to the current choice. Unfortunately, no human can make sense of such an answer! What we’d like instead are answers like “…because the training set was biased towards A” or “…because most training data points to A”. Trouble is, both kinds of answers are oversimplifications, to the point of being wrong and pointless.
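A toy sketch of what I mean (my own construction, with invented numbers): even for the simplest possible “transparent” model, a linear one over many features, the honest answer to “why A and not B?” is a list of thousands of tiny contributions, and no short subset of that list carries the decision.

```python
import numpy as np

# Hypothetical toy model: a linear decision over many features, each with a
# small learned weight. The complete "explanation" of one decision is the full
# list of per-feature contributions -- technically exhaustive, humanly useless.
rng = np.random.default_rng(1)
n_features = 10_000

weights = rng.normal(scale=0.01, size=n_features)   # many small learned weights
x = rng.normal(size=n_features)                      # one input case

contributions = weights * x                          # per-feature contribution to the score
decision = "A" if contributions.sum() > 0 else "B"

top3 = np.argsort(np.abs(contributions))[-3:]
share = np.abs(contributions[top3]).sum() / np.abs(contributions).sum()
print(decision, f"top 3 features carry {share:.2%} of the total evidence")
# Typically a fraction of a percent: no human-sized list of factors "explains" the pick.
```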

To put it another way: when we are applying ML to a domain that justifies the use of ML, the complexity of the domain in question guarantees that the easiest way for us to learn what the ML system will output is to let the system compute the response. If we had an alternative, “better” (simpler) way of doing it, we would use this simpler system directly and leave intractable ML systems alone, right?

Looking at the same scenario in terms of reduction, what we find is that ML is used precisely when reducing a problem to a handful of tractable variables simply doesn’t work (or we don’t know how to make it work). Thus, the interesting/useful results provided by ML are precisely those we are currently unable to reduce to simpler, more explainable, algorithms. QED: we can’t know why the machine picked “A” precisely because we asked the machine in the first place!

In terms of deflationary epistemology: we can only fully “understand” simple stuff; most of us (including me) can hold fewer than ten variables in working memory, and working out how they interact without the aid of external mind-extensions (pen and paper, calculator, spreadsheet, ML systems, etc.) is simply not possible. In other words, we can’t understand ML-driven choices because we ask ML to operate on domains that we can’t reduce to stuff we can consciously process.

This leads me to our bat – or better, a bit closer to our (mis)understanding of phenomenal consciousness. Image recognition is the typical domain where only ML systems can match our own abilities (we could not design any “simpler” way of doing it). [Coincidence? No!] Of course humans are, by their own standards, quite good at image recognition. However, not a single one of us has a clear (and demonstrable) idea of how we do it. We do it all the time, but we do it unconsciously. Yes, we “recognise” red among other colours, which leads us to say that there is a specific “how it is like” to perceive redness. But how we recognise redness (or anything at all) is entirely obscure to us. Introspection can tell us exactly nothing about the mechanism that allows us to discern colours. Neuroscience is starting to produce some incomplete answers, but it is merely scratching the surface.
[Reminder: colour perception is spectacularly complex; do I need to mention “the dress“?]

Thus, we must conclude that humans (and probably mammals, if not most animal forms), just like ML systems, are able to make discriminations that rely on contributions from a high number of variables. I hope we can now agree that humans are unable to consciously explain exactly how such tasks are performed, whether by machines or by biological organisms. This inability is a function of the complexity of the task, not a function of which system performs it.

[Note: I am not talking about what counts as “scientific explanations”; I am referring here to what we can grasp and feel without external aids.]

In the case of biological image recognition, we don’t know how the mechanisms in question work, but we do know that even if we did (in scientific terms), we would not be able to produce explanations simple enough to be understood by most humans (not without years of laborious study). In the case of ML, we know everything about the mechanisms, but we still can’t find the answers we’re seeking. This is because we want “simple” answers, simple enough to be understood, at least. The simplicity of the desired answers is the common factor between the two “unknowns” mentioned in my poll.

Thus, we reach my conclusion. We can’t (consciously) know how it feels to be a bat: even if we knew the mechanism (as we do for ML), we would not have the capacity to reason all the way up to forming the correct idea (such an idea, in order to be correct, would include too many variables, so we wouldn’t be able to hold it in our limited conscious minds).
Thus, the answer to my question is (from my own perspective!) a definitive: “No, not a coincidence”. The common factor is how limited our conscious understanding can be.

Conclusion

My own hunch may well be wrong; however, the fact that the poll results are split (based on a tiny sample size!) is hopefully an indication that the question is not as absurd as it may appear at first sight. Please do feel free to add your own thoughts in the comments (or via Twitter, if you prefer). Thanks for reading and to all the poll responders!

Posted in Consciousness, Philosophy, Stupidity

Predictive Processing: one theory to rule them all…

After discussing some of the basic concepts behind the Predictive Processing (PP) framework, it’s time to explore why I think it was worth the effort. In short, the explanatory power that PP seems to have is, as far as I can tell, unprecedented in neuroscience. No theory that I’ve been exposed to has ever managed to get close to the breadth and depth encompassed by the PP proposal. One way to see why is to concentrate on one key element and briefly mention some of the phenomena it might explain. My choice is precision weighting (PW), a mechanism that suggests many possible implications. In this post, I will explore the ones that I find most striking.

[Note: this post is part of a series inspired by Andy Clark’s “Surfing Uncertainty” book. For a general introduction to the Predictive Brain idea, see this post and the literature cited therein. The concept of precision weighting and how it fits in the PP framework is discussed in a previous post.]

Many illusions can be explained in terms of PP. Image adapted from Flickr by Robson# CC BY 2.0

A short recap: when a sensory stimulus is transduced (collected and transformed into a nervous signal), PP hypothesises that it will reach a sequence of neural layers, each busy producing predictions that try to match the signal arriving from the layer below. [In this convention, lower levels are those situated closer to the sensory organs.] Each layer will issue a prediction to the layer below, and will concurrently match the prediction it receives from above with the incoming signal from below. The matching will result in a “difference” signal (or, better, an Error Signal – ES) which is presumed to be the main/only signal that a given layer will send upwards. The ES thus carries upwards only the information that could not be predicted, or, if you prefer, only the surprising and newsworthy elements of the original raw sensory stimuli. We have explored two additional ingredients before (a toy sketch of the resulting message-passing scheme follows the list below):

  1. For such a system to work, it is necessary that whenever a signal passes from one layer to another, it carries some information about its expected precision/confidence. [We have also seen why it is reasonable to conflate precision and confidence into one single “measure”.] PW allows a given layer to dynamically give more importance to the error/sensory signal arriving from below or to the prediction issued from above. It is generally assumed that precision/confidence information is encoded as the gain (strength) of a given signal.
  2. Such an architecture is proposed to continue uninterrupted from layers that deal with sensory information all the way to layers that are concerned with action selection and control. In this latter case, the ES will (or might) also be used to control muscles/effectors. Reducing the confidence (gain) of motor-related prediction signals will thus allow actions to be “planned” without triggering actual movements.
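Here is a deliberately simplified rendering of the scheme recapped above, written by me as a toy (it is not code from Clark or Friston, and it compresses the top-down/bottom-up exchange into a single per-layer prediction variable): each layer subtracts its prediction from the signal arriving from below, weights the mismatch by a precision term, nudges its own prediction, and passes only the weighted error upwards.

```python
import numpy as np

def pp_step(sensory_input, predictions, precisions, lr=0.2):
    """One sweep through a toy PP hierarchy: every layer sends upwards only a
    precision-weighted error signal and revises its prediction to reduce it."""
    signal = sensory_input
    errors = []
    for level in range(len(predictions)):
        error = signal - predictions[level]       # what this layer failed to predict
        weighted = precisions[level] * error      # precision weighting = gain on the ES
        predictions[level] += lr * weighted       # revise the prediction
        errors.append(weighted)
        signal = weighted                         # only the (weighted) error travels up
    return predictions, errors

predictions = [np.zeros(4) for _ in range(3)]     # three layers, toy 4-dimensional signals
precisions = [1.0, 0.8, 0.5]                      # confidence assigned to each layer's ES
stimulus = np.array([1.0, 0.5, -0.3, 0.0])

for _ in range(100):
    predictions, errors = pp_step(stimulus, predictions, precisions)

print(np.round(errors[0], 3))  # near zero: the stimulus has been "explained away"
```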

We have also seen before that, at levels concerned with integrating information coming from different senses, PW becomes important for dealing with possible conflicts. For example, when watching TV, sounds will not seem to come from the TV speakers, but from the images themselves, as visual stimuli come with much higher spatial precision than acoustic ones. Thus, PW proposes to explain how sensory stimuli can be integrated, as well as why and how a perfect match isn’t required.
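The TV example can be made quantitative with the standard precision-weighted (inverse-variance) combination of two cues; the numbers below are invented purely for illustration.

```python
def combine(mu_a, prec_a, mu_b, prec_b):
    """Fuse two estimates of the same quantity, each weighted by its precision."""
    return (prec_a * mu_a + prec_b * mu_b) / (prec_a + prec_b)

visual_location = 0.0      # the talking face on the screen (high spatial precision)
auditory_location = 30.0   # the loudspeaker, 30 cm to the side (low spatial precision)

perceived = combine(visual_location, 100.0, auditory_location, 1.0)
print(perceived)  # ~0.3: the sound is "captured" by the far more precise visual estimate
```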

When trying to understand how a complex system/mechanism works, it is often very useful to explore anomalies, especially when one is proposing a strictly mechanistic explanation of the inner workings of such systems. This makes perfect sense: any given mechanism must be constrained, and therefore it is reasonable to expect that it will not work particularly well under unusual circumstances. Moreover, particular idiosyncrasies will be specific to given mechanisms (different implementations will be characterised by different anomalies). This means that studying where things “go wrong” allows us to match failures with hypothetical mechanisms: some mechanisms will be expected to fail in one way, some in another. Thus, a theory of perception that happens to easily accommodate known (and hard to explain) perceptual anomalies (such as what happens when watching TV) and/or neurological conditions will look more promising than one that doesn’t. For us, this consideration means that it makes sense to look at how PP proposes to explain some of these failings with the aid of PW.

One such anomaly is the rather spectacular rubber hand illusion:

To say it with Seth (2013):

[S]tatistical correlations among highly precision-weighted sensory signals (vision, touch) could overcome prediction errors in a different modality (proprioception)

In other words, proprioception isn’t very precise; more specifically, it produces reliable signals mainly about movement and changes in forces. Thus, in the unusual experimental conditions (people are expected to keep their hidden hand still), and given enough time, the relatively high-precision signals coming from sight and touch can take precedence, forcing the overall system to explain them (that is: successfully predict them) by assuming the rubber hand is the real one.

Perhaps more interestingly, it’s also possible to relate PW to more natural anomalous conditions. One way to describe this line of thought is to ask: what would happen if the delicate balance between the weight given to predictions and the weight given to sensory signals were systematically biased in one direction or the other?

At one extreme, we could imagine the situation where predictions tend to carry too much weight. The result would be an overall system that relies too little on the supervision of sensory input and is therefore more likely to make systematic mistakes. If the imbalance is strong enough, the whole system will occasionally get flooded with abnormal errors (whenever the predictions happen to be very wrong, but issued with high confidence/gain), triggering an equally abnormal need to revise the predictions themselves, which could then set up a self-sustaining vicious cycle: more top-heavy, misinformed predictions will be issued, producing more floods of error signals, requiring even more revisions of the predictions themselves. The result would be the establishment of ungrounded expectations, which would then have a visible impact both on perception (how the subject experiences the outside world) and on the overall understanding of the outside world itself (beliefs). Recall that according to PP, prior expectations are intrinsically able to shape perceptions themselves. Wrong perceptions, when they are indeed very wrong, are normally called hallucinations, while wrong beliefs can be seen as delusions. Sounds familiar? Indeed, the combination of both represents the “positive symptoms” of schizophrenia. In short, a systematic bias towards prediction confidence, if PP is broadly correct, would produce a system which is unable to self-correct.

At the opposite extreme, what would happen if issued predictions are not trusted enough? In such cases, prior knowledge would fail to help interpret incoming signals, making it harder and harder to ‘explain away’ a given stimulus, as even the right predictions might struggle to quash the incoming signals (which will then be interpreted incorrectly as a genuine ES). A subject afflicted with this condition will be able to react correctly to very familiar situations, where confidence in the prediction is highest and is therefore strong enough to reduce the ES. On the other hand, in new and ambiguous situations, predictions will systematically struggle to perform their function even when correct, and will therefore force the subject to attempt re-evaluating the current situation over and over. This would allow confidence in the issued predictions to increase gradually, and thus the ability to react appropriately to the outside world to be regained, at the cost of an abnormally high investment of time and attention. It’s easy to predict that such subjects will naturally tend to avoid unfamiliar circumstances and that they will also find it hard to correctly navigate the maze of ambiguities that we call natural language. In this case, an excess of error signal doesn’t lead to hallucinations and delusions because the “supervision” of sensory information happens to be too high (not too weak!), and thus only very precise predictions, i.e. those able to exactly match the stimuli, will have the best chance of reducing error signals to manageable levels. Once again, this kind of condition should also sound familiar: it is tantalisingly similar to autism. It’s worth noting that this approach is entirely compatible (indeed, I see it as a proposal of how the general principle might be implemented) with the well-established view that autism is connected to an impaired ability to use Bayesian inference; for the details, see Pellicano and Burr (2012).
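The two regimes can be caricatured with the same precision-weighted formula used for the TV example, now balancing a prediction (prior) against a sensory report. This is a crude illustration of the relative-weighting idea, with invented numbers; it is not a model of any clinical condition.

```python
def posterior(prior_mu, prior_prec, sense_mu, sense_prec):
    """Precision-weighted compromise between a prediction and a sensory signal."""
    return (prior_prec * prior_mu + sense_prec * sense_mu) / (prior_prec + sense_prec)

prediction, sensed = 0.0, 10.0   # the prediction says 0, the senses report 10

print(posterior(prediction, 50.0, sensed, 1.0))
# ~0.2: over-weighted predictions barely budge -- perception drifts away from the evidence
print(posterior(prediction, 1.0, sensed, 50.0))
# ~9.8: under-weighted predictions never "explain away" anything -- every input stays surprising
```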

This leads me to the matter of attention. According to Friston and many of the PP proponents (see Feldman and Friston, 2010), attention is the perceivable result of highly weighted error signals. On the face of it, this makes perfect sense: what should we pay attention to? To whatever is news to us, and therefore to what we struggle to predict. Moreover, we should be able to direct attention according to our current task: this can be readily done by reducing the confidence in the predictions we are making. By doing so, we’d amplify the residual error signals concerned with what we are paying attention to, making only very precise predictions (precise in the sense of being a perfect match for the incoming signal) able to reduce prediction error. This reinforces the view of autism sketched above: autistic individuals would thus be unable to command their attention and would instead be forced to attend to any stimulus that isn’t readily explained away.

Conclusion

Predictive Processing, once enriched with the concept of Precision Weighting, is able to offer a preliminary sketch that includes reasonable explanations of how we manage to make sense of the world, learn from sensory information, plan, execute and control actions, pay attention to something and/or get our attention diverted by sudden and unexpected stimuli. Moreover, our abilities of dreaming and daydreaming are easily accommodated (in ways I might explore in the future). If this wasn’t enough, it also aspires to explain why and how certain well-known pathologies work, and is generally able to accommodate many perceptual illusions and anomalies. In other words, one single theory is proposing to explain much of what the brain does. This, in a nutshell, is why I’ve dedicated so much of my spare time to this subject: for the first time I get the impression that we might have some hope of understanding how brains work – we now have a candidate theory which is potentially able to offer a unifying interpretative lens. Otherwise, without a set of general and encompassing principles, all our (increasing) understanding would be (has been) condemned to remain local, applicable only within a given restricted frame of reference (how neurons communicate, how edges are detected in vision, and so forth).
Given my background in neuroscience, I expect that my excitement comes as no surprise. Fair enough: but is my enthusiasm justified? Perhaps. To answer this question, in the following posts I will look at what I find unconvincing or underdeveloped in the PP world. I might also use the occasion to err on the side of overconfidence(!) and propose some of my own ideas on how to tackle such difficulties.

Bibliography


Clark, A. (2016). Surfing Uncertainty: Prediction, Action, and the Embodied Mind. Oxford Scholarship. DOI: 10.1093/acprof:oso/9780190217013.003.0011

Feldman, H., & Friston, K. J. (2010). Attention, uncertainty, and free-energy. Frontiers in Human Neuroscience, 4.

Pellicano, E., & Burr, D. (2012). When the world becomes ‘too real’: a Bayesian explanation of autistic perception. Trends in Cognitive Sciences, 16(10), 504-510.

Seth, A. K. (2013). Interoceptive inference, emotion, and the embodied self. Trends in Cognitive Sciences, 17(11), 565-573. DOI: 10.1016/j.tics.2013.09.007

 

Posted in Neuroscience, Psychology

Not quite wrong enough

In my last posts on politics I made a few predictions. Wrong predictions! In this post, I want to acknowledge my errors, reflect on what they mean, and perhaps make a few more in the process. In a nutshell, the root of my mistakes is clear: the initial directions taken by both May’s government and Trump’s administration were openly fascistic and seemed to encounter little resistance, especially in May’s case. This sent me down the path of the gloomiest predictions. Luckily, I was wrong (to my immense relief), but unfortunately, not quite wrong enough.

What I got wrong.

On the US side, keeping in mind that I only have second-hand knowledge of the situation, I had underestimated the strength of constitutional checks and balances (along with the volume of bottom-up dissent). I don’t think I overestimated Trump’s capacity to exploit the situation; if anything, I was expecting him to make more mistakes, driven by his massive ego. The specific prediction I got wrong was that Trump would exploit, if not facilitate, internal unrest, and use the consequent emergencies to suffocate the system of checks and balances that limits the executive powers of the presidency. I also predicted that this kind of scenario would unfold really quickly, and I can’t emphasise enough how happy I am to realise that I was wrong. Happier every day. I honestly have no good explanation of why I was wrong, but I do fear that the reasons why unrest might explode at any time are still valid. I am also still convinced that riots, or any form of civil unrest that is widespread enough to disrupt productivity, could still be exploited by Trump’s administration to undermine the democratic institutions of the country. Thus, I’m left in a state of fearful hope: what if I got only the timing wrong, while my worst fears are still valid? I can only hope I was entirely wrong!

On the UK side, my fear was that the authoritarian inclinations of Theresa May, and of a good proportion of her Tory supporters, were backed by a decent amount of competence and that her fascistic aspirations could go unrecognised both by the mainstream media and by a sizeable proportion of the electorate. Luckily, a crucial assumption was entirely wrong: despite her reputation for high competence, May called a snap election without having a convincing reason to do so. She then ran the worst campaign I’ve ever witnessed, and in doing so demonstrated to the country and the whole world how utterly incompetent she is (along with her whole team, one would think). I feel that my mistake was entirely justified: yes, you can never overestimate human stupidity, but assuming that your adversaries are a bunch of witless morons is a very obvious act of self-harm.

From the seminal mistake above (it appears that neither May herself nor her strategist has a clue), a second mistake followed. I had also predicted that “Corbyn and McDonnell are sleepwalking into their own obliteration“. Under the assumption that the Tories wouldn’t shoot themselves in the foot of their own accord (an assumption that one is forced to make when thinking about strategy), this could have been the case. However, I did underestimate two things: (1) Corbyn’s ability to appear genuine, along with the renewed appeal of his sensible domestic policies. (2) How well the deliberate ambiguity over Brexit would work.

There isn’t much to say about (1). Corbyn appears sincere, and he probably is, broadly speaking; I am 92% sure that he does mean well, although I can’t be persuaded that he genuinely believes in the open approach to decision-making he advocates (I can’t, because he never follows his own advice!). On point (2), there is much to be said, giving me the chance to make even more (hopefully wrong!) predictions.

Mistakes you need to make.

Along with problems that are good to have and problems that should not be solved, another mantra of mine is that some mistakes need to be made. The typical example is when there is a lesson to be learnt: sometimes making a mistake (preferably under controlled circumstances, where the consequences can be minimised) is the only effective way to permanently learn the lesson and reduce the likelihood of making the same mistake again when the stakes might be too high. [There is an interesting argument about the roles of parenting and education to be made here, perhaps something worth a separate discussion.] In the case of one of the wrong predictions I’ve made in the past 6-8 months, however, the mistake was one that should not be avoided, which is different, and interesting in itself (to me, at least). It’s useful to learn to detect and react appropriately to these kinds of counter-intuitive situations, so I’ll write down my reasoning here; doing so solidifies it (useful for me) and might be thought-provoking to my occasional readers. It is also very relevant to the current political situation, so please bear with me.

Mistakes that should not be avoided are a specific case of mistaken predictions, which may happen when the act of issuing a prediction can influence the outcome. In my case, I’m living in a society that is manifesting numerous warning signs: there is a very visible drive towards authoritarianism/fascism. Making the prediction that other parts of society will counterbalance this drive automatically and inevitably weakens the defences in question: if you are confident there is no danger, you will not spend your energies resisting it. If everyone involved feels the same, they will not push in the other direction, leaving the original drive free to steer society in the wrong direction. Thus, anyone who recognises such an unusual feedback is faced with a choice. One option is to issue the prediction one would hope is right (or, more weakly, to choose to remain silent because of it): people will recognise and reject fascism. This prediction automatically undermines itself, so in terms of predictable effects, it helps bring about the undesired outcome. The other option is to sound the alarm, hoping to be wrong. Doing so makes it more likely that things will turn out well.
It is paradoxical: the act of expressing a prediction is bound to reduce the likelihood that the prediction is correct. I know very well that in the case of my own prediction, its effect is far too tiny to be detectable. I don’t care. If everyone chose to play it safe, fascism would encounter zero resistance; I am not going to be complicit.

Overall, the choice above is not really a choice, not if you care about the outcome more than about your own track record. The only reasonable thing to do is pick the second option and shout the alarm as loud as possible.

In short: I could not be happier to acknowledge that my specific prediction (there is an authoritarian drive in the UK and it is not being met by an appropriate backlash) was wrong. For now. The situation might change: for as long as the worrying signs are present I will continue to call for countermeasures.

There are self-fulfilling prophecies, but also self-undermining ones; one ought to recognise them and act accordingly.

Consequences

I’ve learned one lesson: I do not know enough about what is happening in the US. The situation still looks very alarming, and I still think a shitstorm might explode at any time, but I know there are many forces at play, most of them unknown to me. This makes all of my predictions moot, so I may as well avoid making them.

In the case of the UK, I’m happy to keep getting it wrong: here is my assessment of the current situation.

  1. The macroscopic and unprecedented mistakes made by the Tories are certainly due, at least in part, to their own hubris. They thought Corbyn was a lame duck and underestimated their own weaknesses (see above: they relied on a self-undermining prediction, ha!). Assuming they will repeat the same mistake again would be utterly foolish.
  2. The strongest rhetorical weapon of the Tories has been somewhat weakened, but it is not neutralised. It is self-evident that some Tories have been betting on the failure of the Brexit negotiations. In such a case, there is little doubt that the plan was to put all the blame onto the evil (undemocratic, unaccountable, etc.) European bureaucracy. To make this move effective, the Tories need to re-establish their own credibility, which isn’t easy, but I am not ready to bet that it’s impossible.
  3. Corbyn and McDonnell might still be sleepwalking into their own obliteration. If the Tories find a way to neutralise their own hubris, they will automatically expose the blind self-righteousness of Corbyn and the Labour left (see below). In other words, the outcome of the 2017 General Election makes it more likely that Labour will fall into the same hubristic trap that has almost destroyed the current Tory leadership. We must try to compensate for this, which requires actively pushing in the opposite direction.
  4. As far as Brexit goes, it would be a mistake to assume that it is now likely that Brexit will not happen. Once again, making this prediction inherently undermines it. Thus, the only reasonable strategy is to keep fighting against Brexit. The best way to do so hasn’t changed one inch (for some of my ideas, see this post and the preceding ones).

One entirely positive effect of the last election is that it is now visibly wrong to assume that the neoliberal overreach (the link is to an excellent article by Simon Wren-Lewis; see also this equally good one by Simon Tilford) is the only kind of rhetoric that chimes with the public. The importance of this change cannot be overestimated (as argued by Dougald Hine) and is due to the relentless efforts of Corbyn and co. (as well as many contributing causes, obviously). Yes, while acknowledging my own mistakes I also want to highlight what they did do well! Specifically, this historic change of mood is happening also because Corbyn and his team have forcefully ignored all advice intended to move them towards the so-called centre ground. I applaud their resilience, with all my heart. I also worry that the same resilience will mean they keep favouring Brexit, and do so in a covert and oblique way (as they are doing now).

Taking an ambiguous stance while working towards a covert objective will inevitably backfire (the only question is when and how). Most of Corbyn’s capital is in the form of personal credibility. He appears genuine and trustworthy, probably for good reasons. However, this capital can be destroyed in the blink of an eye: it will disappear instantly if the electorate concludes that Brexit was a bad idea and that Corbyn backed it all along. Moreover, sooner or later, Corbyn will have to abandon the current ambiguity: he will need to choose between an act of national self-harm (implicitly affirming that he doesn’t care for the well-being of his electors, not if that means compromising on his ideals) and revising his world-view to accept that the EU is a problem worth having (see here and here). Depending on his previous actions, Corbyn might find himself already forced to pick the first option, which would be catastrophic.

Brexit is bad for the country and worse for the international scene. Backing it means backing the wrong forces of history. Anyone who cares for peace, international stability and development should be busy managing or fixing the many problems that afflict the EU. Choosing to help destroy the most effective peace-making project in the history of humanity is inexcusable and foolish.

For us single individuals, the course of action is therefore obvious.
We need to keep saying that Brexit is the worst decision the UK could take. We need to point out that it was taken on the basis of false information and that the public was systematically misled; we need to remind everyone that the choice of 37.47% of the electorate cannot be misrepresented as “the will of the people”. We also need to keep asking Labour to stop backing Brexit. Brexit is self-destructive and contrary to all the values shared across the party (admittedly, it is not entirely incompatible with the values that distinguish the Labour left); but above all, it is morally indefensible.

Posted in Ethics, Politics, Stupidity

Predictive processing: action and action control.

In our exploration of the Predictive Processing (PP) framework, it is time to complete the overall theoretical sketch by discussing how action and action control fit into the overall picture. This will allow us to finally appreciate the astonishing explanatory power that precision weighting is proposed to carry.

Tasks that appear to be simple, such as walking, are not simple at all.
Image by Vanillase  [CC BY-SA 3.0].

[Note: this series of posts concentrates on Clark’s “Surfing Uncertainty” book. For a general introduction to the idea, see this post and the literature cited therein. The concept of precision weighting and how it fits in the PP framework is discussed in the previous post of this series.]

So far, we’ve seen that sensory input is processed by trying to anticipate it across a series of hierarchical layers which compare mini top-down predictions with the bottom-up signal coming from sensory pathways. One concept that I find important to fully grasp is that, when a sensory organ transduces a stimulus into a nervous signal, only the first PP layer will actually receive what we can easily consider as the nervous representation of the original stimulus (probably including the expected precision of the signal itself); the next level up will receive only the prediction error, meaning that if the prediction was spot-on, no further signal will be sent to the higher levels at all. The absence of an error signal must then be considered a signal in itself, meaning: “prediction was correct”. In terms of action and action control, this special quality of the PP signalling pattern will play a crucial role, which we are about to explore.

Clark discusses the problem of action control and the solution proposed by PP in a biology-centric way: he does not ignore the engineering perspective (i.e. action control of manufactured robots and effectors), but doesn’t quite put it centre stage. Clark’s approach makes a lot of sense, of course. However, I found that in order to appreciate it in full one needs to be armed with a large amount of multidisciplinary knowledge, which I wouldn’t be able to summarise here. For this post, I will try to explore the same topic starting from an engineering point of view, which I hope will make the subject easier to follow, even for non-specialists.

As we saw in the case of measuring instruments, action control is also a problem that has been extensively studied by engineers. It turns out that allowing mechanical artefacts to act autonomously on the world is a hard problem to solve, especially if high precision is a requirement. Since the world is noisy, even in a highly controlled environment (such as an automated factory), noise, in the form of random deviations from the perfect “action” (as idealised in the engineers’ “plan”), will interfere with the movements enacted by a given robot/effector. This poses the problem of detecting such deviations and correcting them in real time. The intuitively sensible way to allow machines to interact with the world with high precision is to use feedback loops, where the robot finely readjusts its movements according to the aforementioned “plan”. This strategy is potentially very powerful, but it is extremely difficult to implement in practice, as it requires designing complex control systems: these define how the robot will detect each possible deviation and how it should dynamically readjust its actions while they are already occurring. The standard way to tackle this problem would be to have a long sequence of logical “if/then” steps. In the real world, this approach quickly becomes impractical as it entails an explosion of interacting possibilities; it is really hard to produce robots that are able to run the program quickly enough to intervene on their actions in a timely fashion. Moreover, the situation becomes unmanageable once one realises that changing the action plan while it is being executed inherently changes what should count as new anomalies. If the “plan” itself keeps changing, the systems used to detect deviations also need to readjust dynamically, while what would count as an appropriate reaction to further departures from an already changing plan would change at the same time! If you sensed the dangers posed by bottomless recursion, I’d say that you have grasped the computational difficulty that is inherent in action control.
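For readers who want a concrete anchor, the simplest possible feedback loop of the kind described above looks roughly like this (a bare-bones proportional controller of my own invention, with made-up numbers; real robotic control is vastly more involved, which is exactly the point).

```python
import random

def control_step(position, target, gain=0.3, noise=0.0):
    """One cycle of a toy feedback loop: measure the deviation from the plan,
    issue a proportional correction, and accept whatever noise the world adds."""
    error = target - position          # deviation from the idealised "plan"
    command = gain * error             # correction sent to the effector
    return position + command + noise

random.seed(0)
position, target = 0.0, 1.0

for _ in range(30):
    position = control_step(position, target, noise=random.gauss(0, 0.02))

print(round(position, 2))  # hovers near 1.0 despite the injected noise
```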

Realising how hard this problem is has a direct consequence in our context: it is self-evident that animals in general, and humans in particular, are very good at what we have just found to be computationally difficult (to put it mildly). The questions that should therefore puzzle more or less every neuroscientist interested in action control are:

How can nervous systems achieve what seems to be almost impossible?

Or:

How does it happen that extremely complex dynamic actions, such as walking along a hiking trail (where the surface is uneven and each step requires different, fine adjustments), are normally fluid and feel effortless?

As you’re probably guessing, PP promises to solve this particular conundrum. Let’s see how.

It is well known that proprioception (the ensemble of sensory signals that report the position of our movable parts, along with the forces applied to them) follows its own sensory pathways, which, somewhat surprisingly, are still hard to fully understand. In PP, the prediction-based sensory architecture is expected to apply to proprioception as well, with the added expectation that proprioceptive error signals are also used to control effectors (muscles). In this context, predictions represent, as before, the best guess the organism can produce for what a given sensory signal should be in the current context. Importantly, the last sentence implicitly contains a major twist in our story: in the proprioceptive arena, the context necessarily includes what the body is doing, or, if you prefer, it includes action. Better still, action (how the sensed body is moving) is inevitably a major ingredient of the signals produced by proprioceptive organs. This means that context-dependent predictions have to be heavily influenced by what the organism is doing; this is a strict requirement for the PP model to apply to proprioception at all. Thus, according to PP, at any given level in a proprioceptive pathway, a higher PP layer would produce a prediction of what the proprioceptive signals would be if the body were moving in the expected way.

As a consequence, if PP does apply to proprioception, the relevant prediction error signals become concise descriptions of what isn’t moving according to the “original plan” (the prediction). PP theorists therefore propose that the prediction error, besides participating in the usual PP pattern, can also be recycled to control muscles. The key element here is that error signals are “distilled” representations of the deviations from the expected action plan: they are inherently the exact kind of information required to readjust, and can therefore be used more or less directly to control muscles **[Update 01/07/2017: following Clark’s kind feedback, please see the note below for two important addenda]. Moreover, because the same error signals also participate in the multilayered PP pathway, large deviations will get a chance to travel upwards to higher-level layers, and will thus be able to influence the overall plan and/or trigger a radical re-evaluation of the current high-level hypotheses. In this way, the overall PP architecture directly explains how finely tuned control is even possible, as well as the role that proprioception is expected to play in our ability to understand what is going on in the real world. Depending on the strength and amount of prediction error, error signals may trigger fine movement readjustments, and/or a change of plan, and/or force the organism to realise that the current best hypothesis about the state of the world was wrong and needs to be re-evaluated.

Naturally, real-time control must be supported, and this comes built in: the lower layers will produce quick, small adjustments (with minimal impact on the overall plan), while big prediction errors will fail to be ironed out by the lower layers and will keep travelling upwards, where, if necessary, the original plan itself might change in more significant ways (which would, unsurprisingly, require more time). If even major changes to the action plan were to fail to minimise proprioceptive prediction errors, the overall increase in error signals would force a re-evaluation of the context itself, as this condition inevitably occurs if/when the current state of affairs is likely to be quite different from the currently active “best explanation” computed by the overall PP system.
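A toy sketch of this division of labour may help (again, entirely my own invention, with made-up numbers and thresholds): a proprioceptive layer absorbs small deviations as local motor corrections, while larger residuals are passed upward, where they can eventually force a change of plan.

```python
# Hypothetical sketch: small proprioceptive prediction errors are "ironed out"
# locally as motor corrections; large residuals travel to higher layers.

def proprioceptive_layer(predicted, sensed, local_threshold=0.2):
    error = sensed - predicted
    if abs(error) <= local_threshold:
        # fast, local adjustment: nudge the effector, nothing travels upward
        motor_correction = -error
        upward_residual = 0.0
    else:
        # too big to absorb here: correct what we can, pass the rest upward
        correctable = local_threshold if error > 0 else -local_threshold
        motor_correction = -correctable
        upward_residual = error - correctable
    return motor_correction, upward_residual

print(proprioceptive_layer(predicted=1.0, sensed=1.1))  # small slip: fixed locally
print(proprioceptive_layer(predicted=1.0, sensed=2.0))  # big deviation: residual goes up
```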

Going back to our engineering perspective, it is worth noting that, for control problems with more than one linear degree of freedom (which applies to virtually all action-control problems encountered by complex organisms), common artificial controllers end up being error-minimising feedback circuits (see, for example, Proportional-Integral-Derivative controllers / PID controllers), which are at the very least analogous to a single PP layer.
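For readers who like to see the analogy spelled out, here is a textbook discrete PID loop in Python (a generic engineering sketch, not something from the book): like a single PP layer, it operates entirely on the error between a setpoint (the “plan”) and the measured state, and outputs corrections that try to cancel that error.

```python
# Generic discrete PID controller: the whole computation is driven by the
# error between the planned state (setpoint) and the measured state.

class PID:
    def __init__(self, kp, ki, kd):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.integral = 0.0
        self.prev_error = 0.0

    def step(self, setpoint, measurement, dt=0.1):
        error = setpoint - measurement            # the "prediction error"
        self.integral += error * dt
        derivative = (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative

# Toy usage: drive a 1-D "joint position" towards a target of 1.0.
pid = PID(kp=2.0, ki=0.5, kd=0.1)
position = 0.0
for _ in range(50):
    position += pid.step(setpoint=1.0, measurement=position) * 0.1
print(round(position, 2))  # should end up close to the target of 1.0
```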

For single action-controlling (proprioceptive) PP layers, as for PID controllers, if the computed error signal is entirely cancelled, it means that “everything is proceeding according to plan”: the effectors receive no new control signal and can continue operating as planned. This chimes with the observation I reported above: the absence of an error signal becomes a signal that means “all is well, no adjustment is needed”.

To complete the “action-control” picture I’ve tried to summarise, one element is still missing: the role of precision weighting. As with “passive” sensory pathways, proprioceptive organs will have their own inherent precision; thus, the initial sensory stimulus will still carry an associated precision which can be weighted against top-down confidence. In terms of action planning and control (fine-tuning of an action plan can be conceptualised as action planning and control at a high spatio-temporal resolution), the confidence that we have in a given action plan would be a direct function of how confident we are in our assessment of the current situation, as well as of “how robust” the current plan seems to be. In PP, this confidence measure can be obtained by recycling the residual error produced by whichever PP layer is issuing the relevant “prediction for action”. Since all PP layers are expected to report a prediction error along with its precision/confidence weighting, this information is always available, making it theoretically possible for any PP layer to control action. This is important, but requires a long digression which I plan to pursue separately. For now, I will concentrate on the proposed function of the precision weighting signal in action control.

At one level, it is obvious: low confidence in a given action plan (justified either by low confidence in our current evaluation of the external state of affairs, or by low confidence in the effectiveness of the plan itself) means that deviations from the plan will have a higher relative importance. Thus, error signals will have a bigger chance of travelling towards higher-level PP layers and less chance of being “explained away” by adjustments to the action itself. This mechanism follows the general PP architecture without any ad-hoc change and seems entirely appropriate: the lower the confidence in our action plan, the higher our propensity to radically change it should be. Moreover, one interesting case is what happens when confidence is minimal (I don’t think it can be zero*). In this case, such “extremely low confidence action predictions” will have only a minimal chance of actually controlling action, perhaps to the point of having no chance of initiating and/or influencing any movement at all. Thus, such predictions will remain output-less: they should be understood as action plans that are not expected to be acted upon.

… !!! …

Yes, what you are thinking is what I meant! Adding precision weighting to the proposed PP action-control mechanism immediately explains how brains may become able to produce and evaluate alternative action plans and, by extension, allows us to start building an explanation of how imagination and day-dreaming can be implemented. Along the way, the basic mechanism underpinning actual dreams is also implied. QED: if you are reading this, I hope you are starting to understand the huge explanatory potential of PP in general and of precision weighting in particular.
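Here is how I picture the gain-gating idea in (toy) code – a sketch of my own with invented numbers, not Clark’s formalism: the error that would drive the muscles is simply scaled by the confidence attached to the action prediction, so a prediction issued with near-zero confidence never results in overt movement.

```python
# Hypothetical sketch: precision weighting acts as a gain on the
# action-driving error signal. Confident action predictions move the muscle;
# near-zero-confidence ones stay covert ("imagined" plans).

def motor_drive(predicted_proprioception, sensed_proprioception, confidence):
    error = predicted_proprioception - sensed_proprioception
    return confidence * error   # gain-modulated error is what reaches the effector

# Confident plan: at action onset nothing is moving yet, so the error is
# large and (being highly weighted) it drives the movement.
print(motor_drive(predicted_proprioception=1.0, sensed_proprioception=0.0, confidence=0.9))
# Minimal confidence: same prediction, but the drive is negligible -> no overt action.
print(motor_drive(predicted_proprioception=1.0, sensed_proprioception=0.0, confidence=0.01))
```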

Bibliography and notes

* In mainstream PP implementations, precision weighting is encoded as the gain of a given signal (irrespective of its direction). Thus, a prediction issued with zero confidence would be implemented as a signal with zero gain, which means “no signal at all”.

** [Update] Andy Clark has very kindly (thanks!) made me realise that it may be useful to make two additional points explicit.
1. In the special case when action is being initiated, the error signal will be maximal (the prediction would be entirely wrong, as the expected movement isn’t happening at all). In this situation, the error signal itself would contain precisely the information needed to get the planned action started. To be translated into actual movement, precision weighting must, in this case, markedly favour the prediction itself. In this way, PP becomes a unified framework which may be able to encompass perception, action selection (issuing the prediction in question), action control, and learning (see below).
2. Importantly, the proposed architecture is also able to learn. The whole idea is that error signals that can’t be cancelled by issuing more accurate predictions will ignite additional mechanisms dedicated to finding new and better predictions. I confess that I don’t have a clear idea of how such mechanisms are expected to operate (in terms of precise neurophysiology; I might tackle this point in a later post), but in this context, it is important to note that the multilayered architecture allows for a concurrent “search” for more apt predictions across the whole stack, from perception to action control, passing through action planning/initiation. This allows the system to dynamically accommodate deviations that are due to noise, as well as bigger changes (say, in the case of a damaged limb, extreme tiredness, or a change of situation – swimming, for example).
The proposed architecture actually (/theoretically) allows action control itself to be bootstrapped: in fact, this view directly affects how we might interpret the uncoordinated movements of newborns. The main purpose of such relatively random (or apparently aimless) movements might in fact be to allow the whole stack of layers to search for and select appropriate predictions, based on the feedback signals triggered by the movements themselves.


Clark, A. (2016). Surfing Uncertainty: Prediction, Action, and the Embodied Mind. Oxford Scholarship Online. DOI: 10.1093/acprof:oso/9780190217013.003.0011


Predictive Processing: the role of confidence and precision

This is the second post in a series inspired by Andy Clark’s book “Surfing Uncertainty“. In the previous post I mentioned that an important concept in the Predictive Processing (PP) framework is the role of confidence. Confidence (in a prediction) is inevitably linked to a similar, but distinct, idea: precision. In this post I will discuss both, trying to summarise/synthesise the role that precision and confidence play in the proposed brain architecture. I am doing this for a few reasons: first and foremost, much of the appeal of PP becomes evident only after integrating these concepts into the overall interpretative framework. Secondly, Clark does an excellent job of linking together the vast number of phenomena where precision and confidence are thought to play a crucial role, so an overview is needed before I can enumerate them (in a follow-up post). Finally, reading the book allowed me to pinpoint what doesn’t quite convince me as much as I’d like. This post will thus allow me to summarise what I plan to criticise later on.

Image adapted from Kanai et al. 2015 © CC BY 4.0

[Note: this series of posts concentrates on Clark’s book, as it proposes a comprehensive and compelling picture of (mostly human) brains as prediction engines, from perception to action. For a general introduction to the idea, see this post and the literature cited therein. As usual, I’ll try to avoid highly abstract maths, as I’d like my writing to be as accessible as possible.]

Precision and confidence: definitions.

Precision is a common concept in contexts such as measurement, signal detection and processing. Instruments that measure something (or receive/relay some signal) can never produce exact measures: on different occurrences of the same quantity (whatever it is that is being measured/transmitted), the resulting reaction of the device will change slightly. To be honest, it’s more complicated than that: in discussing precision, one should also mention accuracy, and how both values are needed to characterise a measurement system – as usual, Wikipedia does a good job of describing the two, allowing me to gloss over the details for now.
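If it helps, here is a quick, generic illustration of the distinction (mine, not from the book), simulating repeated readings from two imaginary instruments: one precise but biased, the other accurate but noisy.

```python
# Illustrative only: simulated repeated measurements of a known true value.
import random

random.seed(0)
true_value = 10.0

# Precise but inaccurate: readings cluster tightly around the wrong value.
precise_biased = [random.gauss(10.5, 0.05) for _ in range(1000)]
# Accurate but imprecise: readings centre on the right value but scatter widely.
accurate_noisy = [random.gauss(10.0, 1.0) for _ in range(1000)]

for name, xs in [("precise/biased", precise_biased), ("accurate/noisy", accurate_noisy)]:
    mean = sum(xs) / len(xs)
    spread = (sum((x - mean) ** 2 for x in xs) / len(xs)) ** 0.5
    print(name, "mean:", round(mean, 2), "spread:", round(spread, 2))
```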

The point where we first encounter precision is in perception: it goes without saying that perceptions rely on sensory stimuli, and these can be captured in ways that are more or less precise. For example, eyesight can be more or less precise in different people, but for everyone, precision drops drastically when looking underwater with the naked eye. Our vision underwater becomes heavily blurred, and I think we can all agree to describe this situation as a marked drop in the precision of the detected visual signals.

Confidence is a more slippery concept: the term itself is loaded, because it presupposes an interpreter. Someone must have a given degree of confidence in something else: “confidence” itself cannot exist without an agent. I’ll come to this thorny philosophical issue (and others) in later posts. For now, we can discuss how Clark uses the concept (which is typical of PP frameworks). The general idea is that perception is an active business. Brains don’t passively receive input and then try to interpret it. In PP, brains are constantly busy trying to predict the signals that are arriving; when a prediction is successful, it will also count as a valid interpretation of the collected stimulus (one attractive feature of this architecture is that it collapses certain powerful forms of learning and the active interpretation of sensory input: if PP is roughly correct, they happen within the same mechanism). In mainstream PP theories, prediction happens continuously at multiple layers within the brain architecture and is organised hierarchically: different layers will be busy predicting different aspects of the incoming signals.
Within this general view, the idea of multiple layers allows us to avoid positing a central interpreter that collects predictions: at any given time, each layer will be busy producing predictions for the layer below, while also receiving predictions from above. Thus, having dispensed with the dreaded homunculus (a central, human-like interpreter), the concept of confidence becomes more tractable: a given prediction is now a bundle of nervous signals, which can come encoded with some associated confidence (indicating the estimated likelihood that the prediction is correct), without having to sneak in a fully fledged interpreter. The encoded confidence can have systematic effects on the receiving layer, and exert such effects in a purely mechanistic way.
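As a toy illustration of the stacked arrangement (my own sketch, not the book’s formalism), here is a chain of layers where each level holds a prediction for what arrives from below and only the residual mismatch travels upward – with no central collector anywhere.

```python
# Hypothetical sketch of a prediction hierarchy: values and predictions are
# invented; only residual errors propagate from layer to layer.

predictions = [0.9, 0.1, 0.0]   # each layer's current guess for what arrives from below

def feed_forward(sensory_input, predictions):
    signal = sensory_input
    errors = []
    for prediction in predictions:   # from the sensory periphery upward
        error = signal - prediction
        errors.append(error)
        signal = error               # only the residual travels further up
    return errors

print(feed_forward(1.0, predictions))  # well-predicted input: residuals shrink to ~0
print(feed_forward(1.5, predictions))  # surprising input: a large residual reaches the top
```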

Thus, we can generally expect incoming (sensory) signals to arrive along with their evaluated precision (a mix of precision and accuracy, to be fair), while the downward predictions travel with a corresponding (but distinct!) property which looks at least analogous to what we normally call confidence.

What counts, and what is proposed to explain a fantastically diverse range of phenomena (from attention to psychosis, from imagination to action), is the interplay between precision (coming up, arriving in) and confidence (going down, from the centre towards the sensory periphery). Let’s see a general overview, which will allow us to refine the current sketch.

Interplay and conflation between precision and confidence.

In PP, any given layer receives two inputs: one arrives from the sensory periphery, the other is the prediction issued by higher-level layer(s). The general schema posits that the two inputs are compared. If the two signals match perfectly, the layer will remain silent (a sign of a successful prediction); otherwise the difference will be sent back to the higher-level layer, signalling a prediction error. What precision and confidence do, in the PP flavour generally espoused by Clark, is change the relative importance of the two inputs (within a layer) and the importance of each signal in general, across all layers. Thus, a very precise signal will, in a sense, overpower a not-so-confident prediction; a very confident prediction will in turn be able to override a not-so-precise signal. Simple, eh? Perhaps an example can help clarify. Our eyesight is quite precise in detecting where the source of a given signal is: we can use sight to locate objects in space with very good precision. Not being bats, the same does not apply to our auditory abilities. We can roughly localise where a noise comes from, but we can’t pinpoint exactly where. Thus, vision has high spatial precision, hearing does not.

When I’m slumped on the sofa watching TV, the sounds I perceive come out of the speakers; however, I’ll perceive voices as if they were coming from the images of talking people on the screen. Why? According to PP, there will be a layer in my brain that combines the auditory and visual “channels”. The visual one will produce a prediction that a given sound comes from (the image of) a given mouth, while the auditory channel will suggest otherwise (the sound comes from where the speakers are). Combining the two is a symmetric business: it could be that a given layer (driven by vision) produces the “source of sound” prediction and sends it to a layer which receives auditory data (from below). Otherwise the reverse could be the case, and the upcoming signal is visual, while the descending prediction is informed by the auditory channel. Either way, the visual channel (when discerning location) will have high precision (if upcoming) or high confidence (when issuing a prediction), while the auditory channel has low precision or confidence. When the two are combined to produce the prediction error (one that applies specifically to the combination of these two channels!), the visual signal will matter more, as it’s more precise/confident. Thus, if the prediction is visual, the error signal will be somewhat suppressed, signalling that the expectation (the sound should come from where the mouth is seen) is likely to be correct. Vice versa, if the prediction comes from the auditory channel, the error signal will be enhanced (signalling that the expectation is likely to be wrong). Either way, the end result doesn’t change: because vision is spatially more precise than hearing, the final hypothesis produced by the brain will be that the voice is coming from where the mouth is seen, and the discrepancy between the two channels will be overridden.
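A standard way to put numbers on this is textbook precision-weighted cue fusion, which I offer here as an analogy rather than as Clark’s own equations; the locations and precision values below are invented.

```python
# Generic precision-weighted combination of two location estimates:
# the more precise/confident channel dominates the combined percept.

def combine(estimate_a, precision_a, estimate_b, precision_b):
    """Precision-weighted average of two estimates."""
    w_a = precision_a / (precision_a + precision_b)
    w_b = precision_b / (precision_a + precision_b)
    return w_a * estimate_a + w_b * estimate_b

seen_mouth = 0.0      # visual estimate of where the voice comes from (degrees)
heard_speaker = 30.0  # auditory estimate: off to the side, where the speaker sits

combined = combine(estimate_a=seen_mouth, precision_a=20.0,
                   estimate_b=heard_speaker, precision_b=1.0)
print(round(combined, 1))  # ~1.4 degrees: the percept is "captured" by the precise visual cue
```

Note that in this little sketch only the relative weight of the two channels matters, which foreshadows the observation below that precision and confidence can reasonably be collapsed into a single number.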

This (oversimplified) example is interesting for a number of reasons. First of all, it allows me to introduce another fundamental concept, which I’ll state for completeness’ sake (I will not explain it in this post). In PP, what we end up perceiving at the conscious level is the most successful overall hypothesis: the combination of what all the layers produced, or the one hypothesis that is best able to suppress the error signals globally (within a single brain). There is a lot to unpack in this concept, so much so that even a full book can’t hope to explore all its implications (more will follow!); for now, I will need my readers to take the statement above at face value.

The second interesting point is that the description above shows a peculiar symmetry: it doesn’t matter whether auditory information is used to produce a prediction, which is then matched to what arrives via the visual pathway (in PP, this will itself be a residual prediction error), or vice versa. In either case, we’ll perceive the sound as if it were coming from the viewed mouth. In turn, this means that the confidence of predictions (flowing down) and the precision of sensory signals (which are, after the very first layer, always in the form of residual errors!) are always combined, and can be modelled in terms of relative weight (where a higher weight is given more importance). In other words, the two values matter only relative to one another; at a given layer, the effect of precision and confidence is determined by relative importance alone. That’s quantifiable as a single number or, if you prefer, a single, unidimensional variable.

The third observation is that, in view of the last point, the conflation of precision and confidence espoused by Clark and most PP theorists (for a paradigmatic example, see Kanai et al. 2015, where precision and confidence are described as a single variable, encoded by the strength of neural signals) is justified – at least, it is justified at this level of analysis. Because of how PP is supposed to work, it seems reasonable to conflate the two and sum them up in a single measure. In practical terms, the move is sensible: to describe the effects of precision and confidence on a single PP layer, all we need is a single measure of relative weight. Conceptually, it also makes sense: after the first layer, the upcoming signal (what I’ve described so far as incoming, sensory information) is in fact a prediction error, which is itself heavily influenced by the predictions that shaped it along the way. Thus, upcoming (incoming) signals cannot be said to encode their own precision (as they aren’t measurements any more); they de facto encode a precision-cum-confidence signal. Overall, to fully embrace the PP hypothesis we are asked to collapse the (usually) distinct concepts of precision and confidence (at least for the upcoming signal); failing to do so would count as an a priori rejection of the whole paradigm.

The above might look preposterous and over-complicated; however, I would like to remind my readers that brains are the most complex objects known to humanity (How complex? Beyond our ability to comprehend!). Thus, it would be unreasonable to expect that we could make sense of how they work via a single approach that also happens to be simple. Moreover, it’s relevant to note that both the concept of perception (intended as mere signal detection) and that of prediction include their respective evaluations of reliability: any system described via one of the two concepts needs to deal with either precision or confidence in order to be fully functional (as commonly understood). What use is a weather forecast if it doesn’t at least implicitly come with an assurance that what it predicts is more accurate than pure guesswork? Would you use a measuring instrument that returns random numbers? Thus, I’d argue that a discussion of precision and confidence is necessary for any serious PP model: it is not a secondary hypothesis (or ingredient), it is as fundamental as the idea of prediction itself.

Finally, in the next post we’ll see that the proposed interplay between precision and confidence is also the reason why PP is such an attractive proposition: the potential explanatory power of this orchestration is stunning, to the point of being, perhaps, too good to be true.

Bibliography

Clark, A. (2016). Surfing Uncertainty: Prediction, Action, and the Embodied Mind. Oxford Scholarship Online. DOI: 10.1093/acprof:oso/9780190217013.003.0011

Kanai, R., Komura, Y., Shipp, S., & Friston, K. (2015). Cerebral hierarchies: predictive processing, precision and the pulvinar. Philosophical Transactions of the Royal Society B: Biological Sciences, 370(1668). PMID: 25823866


