Say a coin is tossed 10 times, and each time it comes up heads. What is the probability of heads on the next toss? It might be tempting to say that the probability is low, since surely 11 heads in a row is extremely unlikely. But the correct answer is 50%.

Or is it? It turns out, it depends on what kind of statistics you rely on.

Mark Twain talked about three kinds of falsehood: lies, damned lies, and statistics. What he didn’t point out is that there are actually different kinds of statistics, and they sometimes give different answers! The traditional school, known as “frequentist” statistics, treats each toss as an independent event. The chance of heads on any toss is completely unrelated to the prior results. While 11 heads in a row is indeed extremely unlikely (about 1 in 2000), the chance of any one of those tosses being heads – even the last one after a string of other heads – is still 1 in 2. Yet it still feels counterintuitive. After all, I just saw 10 heads come up – how could there possibly be another?
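The frequentist arithmetic here is a one-liner. A quick sketch of where "about 1 in 2000" comes from:

```python
# Probability of 11 heads in a row with a fair coin, versus the
# probability of heads on any single toss (independent of the first ten).
p_eleven_heads = 0.5 ** 11   # = 1/2048, the "about 1 in 2000" figure
p_next_toss = 0.5            # unchanged by the previous results

print(f"11 heads in a row: 1 in {round(1 / p_eleven_heads)}")  # 1 in 2048
print(f"heads on the next toss: {p_next_toss}")                # 0.5
```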

An increasingly popular approach to statistics attempts to answer this. Bayesian statistics, named for the Reverend Thomas Bayes^{1}, does not assume that the probability of an event is completely independent of prior events. Instead, the expected probability of an event incorporates other known information, including other results up to that point. In this case, if I am asked to estimate the chance of an 11^{th} head, I would look at the string of 10 heads in a row and reasonably wonder if perhaps this is not a fair coin. The answer to that question, in turn, would be based on other information. How well do I know the person tossing the coin? Did I have a chance to examine it beforehand? If there is good reason to believe that the coin may in fact be biased, then I would have to conclude that the probability of a head coming up on the next toss is indeed higher than 50% – which is what it feels like intuitively.
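To make this concrete, here is one simple Bayesian model of the coin – a sketch, not anything from the essay itself. If we assume a uniform prior on the coin's unknown bias (i.e., before any tosses, every bias from 0 to 1 is equally plausible – an assumption I'm adding for illustration), then after observing 10 heads in 10 tosses the probability of heads on the next toss follows Laplace's rule of succession:

```python
from fractions import Fraction

def prob_next_head(heads: int, tosses: int) -> Fraction:
    # With a uniform (Beta(1,1)) prior on the coin's bias, the posterior
    # predictive probability of heads is (heads + 1) / (tosses + 2):
    # Laplace's rule of succession.
    return Fraction(heads + 1, tosses + 2)

print(prob_next_head(10, 10))  # 11/12, about 92% – far from the frequentist 50%
```

A more skeptical prior (say, one concentrated near a fair coin) would pull the answer back toward 50%; the point is only that the string of heads is allowed to matter.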

While frequentist statistics is what is most commonly taught, most of us in reality behave like Bayesians. We don’t simply ignore the string of 10 heads as being irrelevant. While a frequentist would assume the coin is fair, the Bayesian at least asks the question when confronted with evidence that it might not be.

This shouldn’t be an excuse for complete subjectivity. A true Bayesian approach is just as analytic and quantitative as a frequentist one, and in practice can be more complicated. But it does more closely mirror the way our minds actually work. Most physicians are Bayesians when it comes to diagnostic decision-making. Here’s an example. I see a child with abdominal pain, and am concerned about appendicitis. At first, all I know is the age and gender – say, an 11 year old boy. I know that approximately 10% of all 11 year old boys who come to the ER for belly pain will have appendicitis. That seems high enough to worry about, but not high enough to go ahead and remove his appendix just yet. So I go ahead and examine him. He has a Pediatric Appendicitis Score of 3. According to the research, half of patients with appendicitis would have a score of at least 3, and only 17% of the patients without appendicitis have a score that high. Using Bayes’ theorem (look it up if you want, but trust me on the math), I can revise my estimate of the probability of this patient having appendicitis, knowing not only that he is an 11 year old boy, but an 11 year old boy with a PAS of 3. His chance of appendicitis is no longer 10%, it is 25%, high enough that I should probably not just ignore it but do further tests. Based on the results of those tests, I would again update my estimate of probability upward or downward. On the other hand, with a score of 1, this boy’s chance goes from 10% to less than 2%, and I can reassure the family that we do not need to worry about it unless something changes.
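For readers who do want to see the math, the 10%-to-25% step works out like this, using exactly the figures above (10% prior; a score of at least 3 in half of appendicitis cases and in 17% of non-cases):

```python
def posterior(prior: float, p_pos_given_disease: float, p_pos_given_healthy: float) -> float:
    # Bayes' theorem for a binary finding:
    # P(disease | finding) = P(finding | disease) * P(disease) / P(finding)
    numerator = prior * p_pos_given_disease
    return numerator / (numerator + (1 - prior) * p_pos_given_healthy)

# The essay's numbers for an 11 year old boy with a PAS of at least 3:
p = posterior(0.10, 0.50, 0.17)
print(f"{p:.0%}")  # 25%
```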

This example illustrates something that may be surprising to non-medical professionals, who may think that tests can tell you whether someone does or does not have a disease. This is almost never the case. Every test can have false positive or false negative results. They are simply one more piece of information that must be interpreted in light of what else we know. A positive test generally increases the likelihood that someone has the condition we are checking for. A good test will indicate a high enough probability to take action, while a not-so-good test leaves us sufficiently uncertain that we need more information. And there are a lot of not-so-good tests out there.
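The difference between a good test and a not-so-good one can be made concrete with the same Bayes' theorem machinery. The sensitivities and specificities below are illustrative numbers of my own choosing, not figures from the essay:

```python
def posterior(prior: float, sensitivity: float, specificity: float) -> float:
    # P(disease | positive test): a positive result is either a true
    # positive (diseased) or a false positive (healthy).
    true_pos = prior * sensitivity
    false_pos = (1 - prior) * (1 - specificity)
    return true_pos / (true_pos + false_pos)

prior = 0.10  # 10% pre-test probability (illustrative)
good = posterior(prior, 0.95, 0.95)      # ≈ 0.68: high enough to act on
mediocre = posterior(prior, 0.70, 0.60)  # ≈ 0.16: still need more information
print(f"good test: {good:.0%}, mediocre test: {mediocre:.0%}")
```

Both tests came back "positive," but only the good one moved the probability enough to change management.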

Perhaps the biggest challenge to Bayesian analyses is the need for prior information. Sometimes this might be based on good research or our own prior experience. However, we must often make an educated guess. In those cases, our judgment may be biased by many factors, including what I have previously referred to as availability bias (the tendency to be overly influenced by recent experience or information). A child comes to the emergency department with a fever. His mother recently returned from Africa. What is the chance the child has Ebola? Your first reaction might be a small but measurable number, say, a 1% or even a 5% chance. In reality, we know very little: whether the mother has been in a part of Africa affected by Ebola, when she was there, and whether she has had any symptoms and could therefore have transmitted the disease to the child. Based on what we do know, our highest possible estimate (assuming she had fever and that her travel was in the past 21 days) would be the number of known Ebola patients in Africa (about 10,000) divided by the total population of Africa (a little over a billion), or about 0.001%. If we found out, for example, that she had been in Guinea, we would revise our estimate upward, while if she had been in South Africa it would be far lower.
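The back-of-the-envelope ceiling in that paragraph is worth writing out, using the essay's own round figures:

```python
# Worst-case prior for the Ebola question: every known patient could be
# the mother, and she could be anyone in Africa.
known_cases = 10_000           # known Ebola patients (the essay's figure)
population = 1_100_000_000     # "a little over a billion" (rounded)

worst_case = known_cases / population  # ≈ 0.000009, on the order of 0.001%
print(f"worst-case prior: {worst_case:.4%}")
```

Even this ceiling is a thousand times smaller than the 1% gut reaction, which is the whole point about availability bias.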

And Mark Twain never heard of Ebola.

[1] An 18^{th} century Presbyterian minister in England. The apocryphal story is that he developed his theorem in an effort to prove the existence of God.
