DNA, ideas, knowledge, books, computations, schedules, job descriptions, money(!), bank accounts, music, culture, beliefs and every last thing that has some importance in our lives has something to do with “information”. And yet, all my attempts to find a no-nonsense, unique definition of information that can be directly applied to all of the above have failed. The result: I will try to make up my own definition. It ain’t easy, though, so I’ll start my quest by doing two things:
1. Use this post to put some order in my thoughts.
2. Call for help: hopefully, this post may be thought-provoking enough to get some smarter and more knowledgeable people to challenge it. If this happens, I expect to learn valuable lessons.
Hence, the all-important disclaimer: what follows is work in progress. Unlike most of my posts, which usually aim to reach some definite (but never definitive) conclusion, this current effort is intended to be fluid: I expect my conclusions to change as I learn more.
Why now? Two reasons. First of all, this is long overdue. I’ve been talking about the nature of knowledge for quite some time, but clearly, knowledge is somehow linked to (or based on) information, so I can’t possibly consider my understanding sufficiently solid without providing (or adopting) a convincing definition of information, and without clarifying how it relates to knowledge. In other words, I’ve been building on top of shaky foundations, and I would like to rectify that.
Second reason: my vague ideas on the subject have been simmering for ages, and this thought-provoking post by John S. Wilkins provided the catalytic powers needed to start solidifying them. If you’re vaguely interested in my writings, Wilkins’ post is a must-read: please click the link and make sure to read the comments, because that’s where most of the good stuff is.
Information is a slippery concept. It is usually understood and/or defined on the basis of Shannon’s Information Theory, but this has never satisfied me because, to my eyes, Shannon’s theory is a highly abstract theory of Signal Transmission: it is about moving information around, and not really useful for defining what information is. What I am trying to find is a precise conceptual definition of information, one that can be applied to all the domains mentioned in the first sentence of this post, and that would therefore help to discern what they have in common; this is because I have a strong intuition that they do have something in common, and that understanding what it is will be immensely useful.
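To make the contrast concrete, here is a minimal sketch (in Python, with a toy example of my own choosing) of what Shannon’s theory actually quantifies: the average uncertainty produced by a source, measured in bits, which says nothing at all about what the symbols mean to anybody.

```python
import math

def shannon_entropy(probabilities):
    """Average information (in bits per symbol) produced by a source that
    emits symbols with the given probabilities."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

# A fair coin versus a heavily biased one: Shannon's H tells us the second
# source is more predictable (fewer bits per toss), but it is completely
# silent on what "heads" or "tails" might mean to any receiver.
print(shannon_entropy([0.5, 0.5]))  # 1.0 bit per toss
print(shannon_entropy([0.9, 0.1]))  # roughly 0.47 bits per toss
```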
What I’m after is a useful concept, not a universal truth. I am trying to find a definition that will help make sense of the world; as such, it will be symbolic, and I’ve already clarified that I don’t expect any symbol to have perfect equivalence with anything in the real world. Still, the definitions of information that I was able to find don’t satisfy me, because they can only be applied to some domains (again, Wilkins’ post and discussion are perfect to illustrate this point), and I’m actively trying to find a better definition, where “better” means more universal: a definition that applies to as many domains as possible.
Being an empiricist, I will start by looking at physics and try to build and generalise my definition from there, moving into more “abstract” domains in small incremental steps.
Step one: information in fundamental physics
Luckily, I don’t need to invent much on this level. When you look for useful intersections between the laws of physics and Information Theory, you’ll find a neat and beautiful starting point in Landauer’s principle. In layman’s terms, the laws of thermodynamics tell us that whenever you do something, some of the energy you use will always be irreversibly dispersed by increasing the overall entropy of the system. Entropy is closely related to (almost exactly coincides with) disorder, and in the kind of world we inhabit it usually takes the form of kinetic energy distributed across molecules, a quantity also known as heat. Information Theory boils down to bits: the minimal, atomic amount of “information” (with scare quotes, as it’s Shannon’s kind of information) that can be stored or transmitted. And storing or transmitting a single bit is “doing something”, right? Therefore, thermodynamics tells us that doing it will always have a minimum energetic cost, dispersed in the form of increased entropy. To calculate it, one needs to observe that storing or transmitting one single bit requires reducing the level of disorder (or uncertainty) of the medium that contains the signal (where our single bit is located) to virtually zero, so what we are doing is in fact running the tiniest possible entropy pump, as it removes disorder/unpredictability from whatever we are using to host our bit. The result is that one can use the laws of thermodynamics to calculate the minimal energetic cost (the bare minimum necessary when using a perfectly efficient system) of one single bit. I will spare us the maths, and observe instead that Landauer’s principle provides a first neat and direct link between physics and (Shannon’s sort of) information. Not bad at all. But not quite enough to open the champagne.
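For the curious, the maths I am sparing us is short. Landauer’s bound puts the minimum energy dissipated when a single bit is irreversibly reset at k·T·ln 2, with k the Boltzmann constant and T the temperature. A back-of-the-envelope calculation (just a sketch of the arithmetic, nothing more):

```python
import math

BOLTZMANN = 1.380649e-23  # Boltzmann constant, in joules per kelvin

def landauer_limit(temperature_kelvin):
    """Minimum energy (in joules) dissipated when one bit is irreversibly
    erased or reset, according to Landauer's principle: E = k * T * ln(2)."""
    return BOLTZMANN * temperature_kelvin * math.log(2)

# At room temperature (about 300 K) the bound is tiny, roughly 2.9e-21 joules
# per bit, but it is never zero: handling a bit always has a physical price.
print(landauer_limit(300))
```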
We now have a link between bits (whatever they are) and the physical world, but no idea of what to do with these bits. We know that, whatever the medium, we can store or transmit these bits at an energetic cost, and that’s all. One problem remains open: the interesting property of signal transmission (storage can be understood as signal transmission over time, so the “and/or storage” qualifier is redundant) is that it does nothing without a receiver, and even if we can define such a receiver in general-enough terms, we still have to acknowledge that the receiver has an additional need: it has to “know” how to interpret the bits that it receives. In other words, signal transmission needs a system to “decode” the incoming information, otherwise it will be a meaningless series of zeroes and ones. Of course, we may also need a sender (whatever generates the signal) and a system to encode the signal into bits, to start the whole process. Shannon’s theory does a very good job of defining these additional elements (the sender, the receiver and the one or two codes used to encode and decode the signal), but this doesn’t quite make me happy because, at least in my understanding, it does not provide a generic/universal way to describe such elements in terms of fundamental physical properties. Without that, I see no reason to believe that we’ll be able to generate one single model (a useful set of concepts) that can be applied across the wide range of domains where information is typically applied (see the non-comprehensive list on top).
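To see why the decoding side matters so much, here is a toy illustration (my own, not anything lifted from Shannon): the very same string of bits yields entirely different messages depending on the code the receiver applies, and no message at all without one.

```python
def decode(bits, code):
    """Interpret a string of bits using an agreed code: a mapping from
    fixed-length groups of bits to symbols. Without the code, the bits
    are just a physical pattern."""
    width = len(next(iter(code)))  # all keys in a code share one length
    groups = [bits[i:i + width] for i in range(0, len(bits), width)]
    return "".join(code.get(group, "?") for group in groups)

signal = "011001"

# Two receivers, two conventions: the physical signal is identical,
# but what each receiver extracts from it is not.
code_a = {"01": "x", "10": "y", "00": "z", "11": "w"}
code_b = {"011": "up", "001": "down", "000": "left", "111": "right"}

print(decode(signal, code_a))  # prints "xyx"
print(decode(signal, code_b))  # prints "updown"
```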
Step two: reducing “decoding and receiver” into bits
(Pun intended, with apologies!)
Before proceeding, I need to better explain what I think is still missing. If we take the complex domains I’ve listed above, contexts such as cultural transmission, financial transactions, musical performances, literature, and the lot, they can all be easily described in terms of (optional) sender and encoding (not all signals are deliberately created as such!), transmission medium, decoding, and receiver. Therefore, Shannon’s theory is good enough for most “complex” domains, but it still fails to be fully generalised, because it breaks down when one moves in the other direction: if one tries to decrease the level of abstraction and map Shannon’s concepts onto straightforward physics, the only reduction that is readily available applies to the signal (via Landauer’s principle), while the other elements become a puzzle instead. Not good: some more thinking is needed.
I will continue from what we’ve established: the signal itself is contained in a physical object of any kind that allows itself to be modified so that it can “store” one or more bits. A simple switch (even if connected to nothing) can store one bit: flicking it one way or the other is reversible, and I could agree a convention with anybody, establish a code, and thus use any switch to transmit a “yes/no” message. Waveforms, magnetic fields, transistor states and capacitor charges are all based on media that can be reversibly influenced to mean one thing or the other. But what do they have in common? They all have some structural properties that can (with an unavoidable energetic cost) be changed between two or more states. No mystery here; what remains unclear is what minimal common features are shared by all sorts of decoding and all sorts of receivers.
Note that, for the time being, sender and encoding can be left aside because of two considerations. First, not all signals have a deliberate sender that encodes them. When something makes a noise by falling, the sound I may hear is a signal, but sending and encoding are, at best, accidental. Second, one could probably work backwards: if we define decoding and receiving in clear-enough terms, it’s possible that we’ll also learn something about encoding and sending.
Once more, the open question is: how do I reduce decoding and receiver to something that can be modelled in terms of fundamental physics? Here is how: the signal is a (usually reversible) property of a given medium, that is, a structure: a structure that will have some effect (of any sort) on the receiver.
Not helpful? Maybe not yet.
So far we have shifted the mystery into the expressions “some effect, of any sort” and “structure”. But hey, we only need one more step: linking all of the above with the concept of a catalyst. The definition of catalyst, from the Oxford Dictionary, is:
Catalyst – A substance that increases the rate of a chemical reaction without itself undergoing any permanent chemical change: chlorine acts as a catalyst promoting the breakdown of ozone.
In practice, imagine that when you dissolve A and B in a glass of water (where G is the whole system: glass with water, A and B), nothing else happens: A and B remain dissolved in the water. You then add a catalyst C, and this enables a chemical reaction: A and B, in the presence of C, combine to form substance Q (and some thermal energy); in the process, your element C remains unmodified. What happened there is that the presence of C (equivalent to signal delivery), via a catalytic mechanism (equivalent to decoding), has produced an effect on G, creating Q.
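A toy model of that glass may help (all names and numbers here are mine, chosen purely for illustration): with C absent nothing happens; with C present, A and B get consumed, Q appears, and C comes out of the process untouched.

```python
def react(system, catalyst_present, steps=10):
    """Toy model of the system G: at each step, one unit of A and one unit
    of B combine into one unit of Q, but only if the catalyst C is present.
    C itself is never consumed by the reaction."""
    state = dict(system)  # leave the original glass untouched
    for _ in range(steps):
        if catalyst_present and state["A"] > 0 and state["B"] > 0:
            state["A"] -= 1
            state["B"] -= 1
            state["Q"] += 1
    return state

glass = {"A": 5, "B": 5, "Q": 0}

print(react(glass, catalyst_present=False))  # {'A': 5, 'B': 5, 'Q': 0}
print(react(glass, catalyst_present=True))   # {'A': 0, 'B': 0, 'Q': 5}
```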
Distilled and generalised, the above becomes: signals (or transmitted information) are (usually reversible) structures that have a catalytic effect. The code is the catalytic mechanism; the receiver is the system on which the catalytic effect occurs.
But one does not need to stop here; in fact, the “reversible” qualifier is unnecessary. What remains is that information is a structure that has some effects (even when the structure does not survive such effects, and is therefore not a perfect catalyst).
If you prefer, at the ultimate abstraction level, the definition becomes: information is a structure that makes a difference.
Phew, so what? And more importantly: does this generalisation really apply? I think I’d be able to apply it to all the domains I’ve mentioned above (and more), but for brevity’s sake, I won’t. Instead, I would like you (the reader) to challenge me by proposing an example where my definition doesn’t apply (if you can!). [Or, alternatively, tell me why this isn’t news at all]
Before concluding, I’ll address the “so what?” question instead.
The explicit aim of all this is to generate some useful conceptualisation or, in other words, to produce some way to understand reality that allows us to easily discern patterns that can then be recruited to generate predictions. In this context, what I’m proposing is immediately useful, as it makes an otherwise problematic generalisation straightforward. The seminal intuition is a famous one: the hope that it must be possible to extend Darwinism to other fields, or the seemingly reasonable expectation that natural selection operates on all forms of information, not “just” genetic material. In terms of what I’ve outlined so far, this intuition becomes self-explanatory.
We start with a structure that makes a difference, something that has measurable effects on its environment, and not just on the basis of its basic ingredients: the effects occur specifically because of how these ingredients (or components) are combined (assembled) to form a specific structure. The transformation that said structure generates can have three possible effects on the structure itself: it can have no effect whatsoever (unlikely but possible; if it happens, we talk of perfect catalysis), or it can make the structure either more or less likely to persist. In some rare cases, the effect of the structure will indeed be that of making it more likely that the structure (let’s call it sA) will remain intact for longer; therefore, the probability of finding said structure at time N will be higher when sA is present at time zero. The consequence is that, assuming new instances of sA can appear out of pure chance, the probability of finding an instance of sA increases with time. A structure that has the effect of making itself durable will become more and more frequent, until this effect is reversed. This is the basis of the accumulation of information that we can observe on planet Earth. Furthermore, a structure (sB) that increases the probability of instantiating another “copy” of sB will favour the creation of more and more instances of sB. This is classic natural selection, of the “selfish gene” (modern synthesis) sort. But if we accept the idea that all information is a structure that makes a difference, then both mechanisms will apply to ideas, knowledge, books, computations, websites, and much, much more.
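A crude simulation of the second mechanism (entirely my own toy, with made-up rates) shows the logic at work: plain structures and sB structures both appear by chance and decay at the same rate, but because each existing sB slightly raises the odds of a new sB copy appearing, sB ends up dominating the census.

```python
import random

def simulate(generations=200, appear=0.05, copy_bonus=0.06, decay=0.02):
    """Toy selection model with illustrative rates: 'plain' structures and
    'sB' structures appear by chance and decay per instance at the same
    rate, but each existing sB also adds a small chance of a new sB copy."""
    plain, sb = 0, 0
    for _ in range(generations):
        if random.random() < appear:   # chance appearance of a plain structure
            plain += 1
        if random.random() < appear:   # chance appearance of an sB structure
            sb += 1
        sb += sum(random.random() < copy_bonus for _ in range(sb))    # self-copying
        plain -= sum(random.random() < decay for _ in range(plain))   # decay
        sb -= sum(random.random() < decay for _ in range(sb))         # decay
    return plain, sb

# Exact counts vary from run to run, but sB reliably ends up far more
# frequent than the plain structures, even though both appear by pure chance.
print(simulate())
```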
In other words, the definition of information offered above, based on fundamental physical principles, makes it absolutely clear why and how natural selection operates universally: it applies to everything we intuitively recognise as “information-based”. It is therefore reasonable to expect that there will be some rules that actually do apply to all such fields. I am not saying that natural selection operates on the design of internal combustion engines in exactly the same way as it operates on genes; all I am saying is that it should be possible to discern and distil some common patterns, some general rules that, albeit instantiated in specific ways, apply to widely different domains. If true, that’s certainly useful enough for me. If false, I’ll be glad if you could show me why.