Reading E. T. Jaynes
Edwin Thompson Jaynes was a 20th-century physicist who wrote several texts on probability theory, espousing what some would describe as a very strongly opinionated view on the correct interpretation of probability. As Jaynes sees it, all randomness is just the uncertainty that exists in the eye of the beholder. No physical process, from coin flips to card draws to quantum measurement, is intrinsically or objectively random.
In other words, “the chance of the coin landing heads is 50-50” is a property of your state of knowledge about the coin flipping process, not a property of the coin itself. This view is known as Bayesianism, and while it is becoming more widely (if not universally) accepted these days, Jaynes is still willing to go further than most in following it to its conclusions.
Not just anybody can write about mathematics with the fire and fury of a religious zealot. And while I think Jaynes does tend to spend a little too much of his time and energy launching into passionate tirades against his ideological opponents, I have to admit that it does make for more entertaining reading than the average textbook.
Regardless, I find Jaynes’ arguments entirely logical and persuasive. Furthermore, I think that Jaynes’ hardline-Bayesian take not only makes probability and statistics far clearer and more intuitive than any other interpretation, it also unlocks deep connections to other fields. Any time I read Jaynes, I find my confusion dissolving into crystal clarity. As such, in my writing I take an unabashedly Jaynesian viewpoint.
Here are some of the conventions that I am adopting from Jaynes:
Probabilities range between 0 and 1, expressed as P(X | I), which can be read as the probability that proposition X is true, given that information I is true. The propositions are on the left of the vertical bar, and the conditionals (priors) are on the right. (X | I) is an equivalent shorthand notation.
“Explicit is better than implicit” — The Zen of Python. All probabilities must explicitly acknowledge in their notation the existence of some prior information. Without a prior, the assignment of probability is impossible. As such, if there is no vertical bar, it’s not a properly-formatted probability expression.
In this notation, Bayes’ rule is expressed as (A | B, I) = (A | I) (B | A, I)/(B | I).
I think this notation already clears up much of the confusion that students of probability encounter. Conventional textbooks write Bayes’ rule as P(A|B) = P(A) P(B|A)/P(B), which immediately invites the question: “What on earth is P(A)?”. The way it’s formatted, it looks like it refers to some ‘pure’ probability of A, when it really just means the prior. By explicitly including the prior information in every expression, it becomes clear that (A | B, I) is merely a more informed probability than (A | I).
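To make the bookkeeping concrete, here is a minimal sketch of Bayes’ rule with the prior information I kept explicit as a function argument. The diagnostic-test numbers are made up for illustration:

```python
def bayes(p_A, p_B_given_A, p_B):
    """(A | B, I) = (A | I) * (B | A, I) / (B | I).

    Every argument is conditional on the same background
    information I -- there is no 'pure' probability of A here,
    only the prior (A | I).
    """
    return p_A * p_B_given_A / p_B

# Illustrative numbers: (A | I) = 0.01, (B | A, I) = 0.9, (B | I) = 0.05.
posterior = bayes(0.01, 0.9, 0.05)
print(posterior)  # 0.18
```

The point of the signature is that you cannot call `bayes` without supplying the prior, just as the notation refuses to write a probability without its vertical bar.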
Jaynes’ book Probability Theory: The Logic of Science is his best-known work, but he has many others that are well worth reading. Almost all of his published scientific writings are available free online, courtesy of Dr. G. Larry Bretthorst.
From time to time, I’ll post and comment on excerpts from his papers that I think are particularly interesting or insightful.
To start, though, here’s a fun example. Jaynes makes a great case for using a decibel scale to reason about the probabilities of things that one is nearly certain of.
... it is very cogent to give evidence in decibels. When probabilities approach one or zero, our intuition doesn't work very well. Does the difference between the probability of 0.999 and 0.9999 mean a great deal to you? It certainly doesn't to the writer. But after living with this for only a short while, the difference between evidence of plus 30 db and plus 40 db does have a clear meaning to us. It's now in a scale which our minds comprehend naturally... In the original acoustical applications, it was introduced so that a 1 db change is perceptible to our ears. With a little familiarity and a little introspection, we think that the reader will agree that a 1 db change in evidence is about the smallest increment of plausibility that is perceptible to our intuition.
[...] What probability would you assign to the hypothesis that Mr Smith has perfect extrasensory perception? ... To say zero is too dogmatic. According to our theory, this means that we are never going to allow [our] mind to be changed by any amount of evidence, and we don't really want that. But where is our strength of belief in a proposition like this?
[...] We have an intuitive feeling for plausibility only when it's not too far from 0db. We get fairly definite feelings that something is more than likely to be so or less likely to be so. So the trick is to imagine an experiment. How much evidence would it take to bring your state of belief up to the place where you felt very perplexed and unsure about it? Not to the place where you believed it -- that would overshoot the mark, and again we'd lose our resolving power. How much evidence would it take to bring you just up to the point where you were beginning to consider the possibility seriously?
So, we consider Mr Smith, who says he has ESP, and we will write down some numbers from one to ten on a piece of paper and ask him to guess which numbers we've written down. We'll take the usual precautions to make sure against other ways of finding out. If he guesses the first number correctly, of course we will all say 'you're a very lucky person, but I don't believe you have ESP'. And if he guesses two numbers correctly, we'll still say 'you're a very lucky person, but I still don't believe you have ESP'. By the time he's guessed four numbers correctly -- well, I still wouldn't believe it. So my state of belief is certainly lower than -40 db.
How many numbers would he have to guess correctly before you would really seriously consider the hypothesis that he has extrasensory perception? In my own case, I think somewhere around ten. My personal state of belief is, therefore, about -100 db. You could talk me into a ±10 db change, and perhaps as much as ±30 db, but not much more than that.
— E. T. Jaynes (in Probability Theory: The Logic of Science)
At the time Jaynes wrote this example, the scientific community may still have been considering (if extremely skeptically) the plausibility of ESP as a scientific phenomenon. I think that today we’d probably assign an even lower prior, and each successful guess would do more to challenge our confidence in our anti-cheating precautions than our confidence in the non-existence of ESP.
But while there might be better examples to use, and I am generally against decibels, there is no arguing with the effectiveness of Jaynes’ method. We need to be prepared to update even our most confidently-held beliefs in the face of evidence, and yet, it’s really hard to reason about extreme confidence using the typical (0-1) probability scale. In common language, someone expressing total confidence might say they’re “100% certain”, but if they really felt that way they would be totally incapable of handling the situation where they’re wrong. The next step down from certainty is usually to say that one is “99% certain”, but for a lot of real-world situations, that’s not nearly sure enough.
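The arithmetic behind Jaynes’ numbers is easy to reproduce. Here is a minimal sketch, using his definition of evidence as 10 log10 of the odds; the likelihood ratio of 10 per correct guess is an assumption that a genuine psychic would guess each 1-in-10 number correctly:

```python
import math

def evidence_db(p):
    """Evidence in decibels: 10 * log10 of the odds p / (1 - p)."""
    return 10 * math.log10(p / (1 - p))

def prob_from_db(db):
    """Recover a probability from evidence in decibels."""
    odds = 10 ** (db / 10)
    return odds / (1 + odds)

# The probabilities Jaynes compares in the excerpt:
print(round(evidence_db(0.999)))   # 30
print(round(evidence_db(0.9999)))  # 40

# ESP sketch: a prior of -100 db, and each correct 1-in-10 guess
# contributing a likelihood ratio of 10, i.e. +10 db of evidence.
# Ten correct guesses bring the belief up to 0 db -- exactly the
# "very perplexed and unsure" point, a probability of one half.
posterior_db = -100 + 10 * 10
print(posterior_db, prob_from_db(posterior_db))  # 0 0.5
```

Notice how naturally the update works on this scale: evidence simply adds, and “ten correct guesses to overcome -100 db of prior skepticism” is immediate mental arithmetic, where the equivalent multiplication of odds is not.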
So maybe Jaynes is right, and just this once, decibels are the most intuitive scale to use.
At least, I can’t be 100% sure they aren’t.
This post isn’t intended as professional engineering advice. If you are looking for professional engineering advice, please contact me with your requirements.