# Confidence games – Justin Domke

Say you’re developing a new anti-cancer drug. You apply it to some cell line, draw 40 random cells, and manually inspect them. You find that the drug changed 16 of the 40, suggesting the drug is around 40% effective. But of course, this is just an estimate. So you plug the numbers into an online calculator which tells you something like this:
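The exact endpoints depend on which interval the calculator uses; one common choice is the Clopper–Pearson ("exact") binomial interval, which can be sketched with nothing but the standard library by inverting the binomial tail with bisection. The helper names below are mine, not from any particular calculator:

```python
from math import comb

def binom_tail_ge(n, k, p):
    """P(X >= k) for X ~ Binomial(n, p) -- increasing in p."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

def binom_tail_le(n, k, p):
    """P(X <= k) for X ~ Binomial(n, p) -- decreasing in p."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def bisect_increasing(f, target):
    """Find p in [0, 1] with f(p) = target, for f increasing in p."""
    lo, hi = 0.0, 1.0
    for _ in range(60):  # 60 halvings: far more precision than we need
        mid = (lo + hi) / 2
        if f(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def clopper_pearson(k, n, conf=0.90):
    """Exact binomial confidence interval for k successes in n trials."""
    alpha = 1 - conf
    # lower bound: p where seeing k or more successes has probability alpha/2
    lower = 0.0 if k == 0 else bisect_increasing(
        lambda p: binom_tail_ge(n, k, p), alpha / 2)
    # upper bound: p where seeing k or fewer successes has probability alpha/2
    # (negate the decreasing tail so the same bisection helper applies)
    upper = 1.0 if k == n else bisect_increasing(
        lambda p: -binom_tail_le(n, k, p), -alpha / 2)
    return lower, upper

lo, hi = clopper_pearson(16, 40)
print(f"[{lo:.1%}, {hi:.1%}]")
```

For 16 out of 40 at 90% confidence, this should land close to the quoted 26.9%–54.2%.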

With 90% confidence, the true fraction is between 26.9% and 54.2%.

OK, but what does this actually mean? Here “confidence” is a technical word, with a precise (and somewhat subtle) meaning. Almost every scientist interacts with confidence intervals, but surveys show the vast majority don’t fully understand them (Hoekstra et al., 2014; Lyu et al., 2020).

I have a theory: Confidence intervals can be explained—*really* explained—in a fully conceptual way, without using any math beyond arithmetic. The core idea is to first look at confidence *sets* in cases where everything is discrete. In these cases, everything can be laid out in a table, and the core difficulty is just making sure you don’t confuse the rows and the columns.

The year is 2052. The youth have grown tired of their phones and now only care about probability and game theory. You work at a carnival. One day your boss comes over and says, “*Listen, we don’t have enough capacity on our new ride. To pander to the kids and their damn probabilities, we’re going to make a game, where winners can go on the ride. You’re going to run this game.*”

Your boss outlines some (slightly strange) rules for how the game is supposed to work:

- The guest picks one of a few 4-sided dice, each of which has a different color and weight distribution.
- The guest will roll that die, and the outcome (⚀, ⚁, ⚂, or ⚃) is announced.
- Based on that outcome, you need to guess some set of colors.
- The true color of the die is revealed. If it’s *not* in the set of colors, the guest can go on the ride.

For example, say that after the die is rolled you guess {red, blue}, and the true color turns out to be red. Then the guest wouldn’t get to go on the ride. If you’d guessed {green} and the true color were blue, the guest would get to go on the ride.

Your boss stresses two things: First, the ride only has capacity for 30% of the guests. Second, you should keep your sets as small as possible (to keep things interesting).

Here are the five dice, which the carnival’s lab has helpfully CT scanned and run simulations on to calculate the true probability that each die will come up with each side:

Die | ⚀ | ⚁ | ⚂ | ⚃ |
---|---|---|---|---|
red | .7 | .1 | .1 | .1 |
green | .1 | .7 | .1 | .1 |
blue | .1 | .1 | .7 | .1 |
yellow | .4 | .3 | .2 | .1 |
white | .1 | .2 | .4 | .3 |

What should your strategy be? You reason as follows:

- For each outcome (⚀, ⚁, ⚂, or ⚃) you need to choose some set of colors. So choosing a strategy is equivalent to choosing some subset of entries in the above table.
- The guests will share information and use their cursed game theory to maximize their chances. If any color gives them a better chance to get on the ride, they’ll all pick that color. So, for each row of the table, the probabilities of the entries you include must add up to at least .7.

The obvious choice is the following strategy, where the included entries are bold.

Die | ⚀ | ⚁ | ⚂ | ⚃ |
---|---|---|---|---|
red | **.7** | .1 | .1 | .1 |
green | .1 | **.7** | .1 | .1 |
blue | .1 | .1 | **.7** | .1 |
yellow | **.4** | **.3** | .2 | .1 |
white | .1 | .2 | **.4** | **.3** |

Or, you can picture your strategy as a list of outcomes, and what you guess for each. These guesses are called **confidence sets**.

Outcome | What you guess |
---|---|
⚀ | {red, yellow} |
⚁ | {green, yellow} |
⚂ | {blue, white} |
⚃ | {white} |

What can we say about this? You have the guarantee your boss asked for: No matter what color die the guest chooses, the set you guess will include the true color 70% of the time.
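The guarantee can be checked directly from the table: for each die (each row), sum the probabilities of the outcomes whose guessed set contains that die's color. A quick sketch (the dictionary layout is mine):

```python
# The big probability table: for each color, P(⚀), P(⚁), P(⚂), P(⚃).
DICE = {
    "red":    [.7, .1, .1, .1],
    "green":  [.1, .7, .1, .1],
    "blue":   [.1, .1, .7, .1],
    "yellow": [.4, .3, .2, .1],
    "white":  [.1, .2, .4, .3],
}

# The confidence sets: what we guess for outcomes ⚀, ⚁, ⚂, ⚃.
STRATEGY = [{"red", "yellow"}, {"green", "yellow"}, {"blue", "white"}, {"white"}]

def coverage(strategy):
    """P(guess contains the true color), separately for each die (row)."""
    return {
        color: sum(p for outcome, p in enumerate(probs)
                   if color in strategy[outcome])
        for color, probs in DICE.items()
    }

for color, c in coverage(STRATEGY).items():
    print(f"{color}: {c:.1f}")  # every row comes out to 0.7
```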

THAT’S ALL WE CAN SAY. THAT AND NOTHING MORE. Say the guest chooses a die and rolls ⚁. You might be tempted to say “With 70% probability, the true color is either green or yellow.” Wrong. I know from experience that many people don’t want to accept this and that after being told this is wrong they look for a way out, a way to escape this harsh reality. Stop looking. You cannot talk about the probability of the die being a given color because *the guest already chose it*. It’s a fixed quantity; you just don’t happen to know what it is.

In the worldview of confidence sets, it’s nonsensical to talk about probabilities of fixed unknown things. That would be like talking about the “probability” that George Washington was born in 1741, or the “probability” that France is larger than Germany. These things don’t have probabilities because they aren’t repeatable random events. To be sure, many people are actually fine with using probabilities like that (they’re called subjectivists) but *by definition*, confidence sets don’t use probability like that. If you want to roll that way, then you’re a Bayesian (congratulations!), but you’ll still want to understand what other people mean when they talk about confidence.

Still don’t believe me? Say you’re OK with subjective probabilities, and say that guests choose dice uniformly at random, so that the prior probabilities of the colors are all equal. Now suppose the guest rolls ⚃. Would you be tempted to say that there is a 70% chance the true color is white? In this situation, the posterior probability of each color is proportional to the chance that that color rolls ⚃. Look at the right column of the big probability table. There are more ways to get ⚃ via the other colors combined than via white. Dividing by the sum of the entries in that column, the posterior probability of white is 3/7 = .3 / (.1 × 4 + .3) ≈ 0.43. It’s not even half! In the same way, we can calculate the posterior probability that the confidence set contains the true color for each outcome, by summing up the bold entries in each column and dividing by the sum of all the entries in that column:

Outcome | Prob guess includes true color |
---|---|
⚀ | 11/14 = 0.786 |
⚁ | 10/14 = 0.714 |
⚂ | 11/15 = 0.733 |
⚃ | 3/7 = 0.429 |

Sometimes it’s higher and sometimes it’s lower, but it’s never 70%. And remember, this is assuming that guests choose dice completely at random, which they *don’t*.
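These column-wise posteriors can be computed mechanically from the five-dice table, still under the (false!) uniform-prior assumption. A sketch, with the same layout conventions as before:

```python
# The big probability table: for each color, P(⚀), P(⚁), P(⚂), P(⚃).
DICE = {
    "red":    [.7, .1, .1, .1],
    "green":  [.1, .7, .1, .1],
    "blue":   [.1, .1, .7, .1],
    "yellow": [.4, .3, .2, .1],
    "white":  [.1, .2, .4, .3],
}

# The confidence sets for outcomes ⚀, ⚁, ⚂, ⚃.
STRATEGY = [{"red", "yellow"}, {"green", "yellow"}, {"blue", "white"}, {"white"}]
SYMBOLS = ["⚀", "⚁", "⚂", "⚃"]

# With a uniform prior over colors, the posterior weight of each color given
# an outcome is proportional to its entry in that outcome's column.
for i, sym in enumerate(SYMBOLS):
    column = {color: probs[i] for color, probs in DICE.items()}
    total = sum(column.values())
    in_set = sum(p for color, p in column.items() if color in STRATEGY[i])
    print(f"{sym}: {in_set / total:.3f}")
```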

So, while it’s tempting to talk about probabilities, we can’t. What do we do instead?

If we wanted to be as confusing as possible, we’d pick a word that in English *sounds* like it means probability, and then we’d use it in a sentence as if it *were* probability, but the whole time we’d be referring to a completely different concept that *absolutely is not a probability*. Well, umm, that’s what we do: Given a roll of ⚁, we say:

“With 70% confidence, the true color is either green or yellow.”

This sentence is designed to mislead you. It does not mean anything similar to what it would mean as a normal English sentence. It is just a *shorthand* for this:

“We have a procedure that maps dice rolls to sets of colors. We’ve designed the procedure so that, if we roll any of the dice millions of times and compute the corresponding sets of colors, at least 70% of those sets will contain the true color. For the die roll we observed in this particular instance, our procedure maps the outcome to {green, yellow}.”

Say that the carnival opens for the day, and we carefully record everything that happens.

Guest | True color | Outcome | Guess |
---|---|---|---|
Amy | red | ⚀ | {red, yellow} |
Bob | yellow | ⚁ | {green, yellow} |
Carlos | green | ⚁ | {green, yellow} |
… | … | … | … |
Zander | white | ⚃ | {white} |

All we can guarantee is that, in the long run, 70% of the guesses will contain the true colors. We can’t say anything about the particular probability in any particular row because:

- Working with confidence sets means acceptance of a worldview in which it’s meaningless to talk about subjective probabilities of things that have already happened, just because you happen not to know what those things are.
- Even if you were willing to talk about subjective probabilities, you couldn’t compute them here, because you’d need a prior distribution over the different colors.
- Even if you have a prior distribution over the colors and calculate the probabilities, they might be much larger or smaller than 70%.

When talking about confidence, we give no guarantee—none!—about what the actual true color is in any particular instance. All that we guarantee is that in the multiverse of different worlds that branched out when the die was rolled, the true color is in the set in 70% of them. But you’re in one *specific* universe that may or may not be in that 70%.

This leads us to what I think is the core confusion about confidence sets:

**The guarantees run over the rows of the big probability table, not over the columns.**

We only make guarantees about what happens if you run the experiment many times, i.e. for the *rows*. We guarantee nothing about the true color for any outcome, i.e. for the *columns*. That’s quite annoying. I think part of what makes it confusing is that it’s so different from what people actually want. You probably want to know what the color is in *this* world—why are we talking about other branches of the multiverse? Well, because that’s all we can do.
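One way to feel the rows-vs-columns distinction is simulation. In the sketch below, a hypothetical guest always picks the yellow die. Averaged over all rolls (a row-wise quantity), the guess contains yellow about 70% of the time, exactly as guaranteed. But conditioned on the outcome ⚃ (a column-wise quantity), the guess {white} *never* contains yellow:

```python
import random

random.seed(0)

YELLOW = [.4, .3, .2, .1]  # yellow's row of the big probability table
# The confidence sets for outcomes ⚀, ⚁, ⚂, ⚃.
STRATEGY = [{"red", "yellow"}, {"green", "yellow"}, {"blue", "white"}, {"white"}]

N = 200_000
hits = 0
hits_given_3 = rolls_of_3 = 0
for _ in range(N):
    outcome = random.choices(range(4), weights=YELLOW)[0]
    hit = "yellow" in STRATEGY[outcome]
    hits += hit
    if outcome == 3:  # the roll was ⚃
        rolls_of_3 += 1
        hits_given_3 += hit

print(hits / N)                           # ~0.7: the row-wise guarantee holds
print(hits_given_3 / max(rolls_of_3, 1))  # 0.0: no column-wise guarantee at all
```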

*Still* tempted to think about confidence as probabilities? Here’s one last illustration. We could have used a different strategy, where for white we include ⚀ and ⚁ instead of ⚃:

Die | ⚀ | ⚁ | ⚂ | ⚃ |
---|---|---|---|---|
red | **.7** | .1 | .1 | .1 |
green | .1 | **.7** | .1 | .1 |
blue | .1 | .1 | **.7** | .1 |
yellow | **.4** | **.3** | .2 | .1 |
white | **.1** | **.2** | **.4** | .3 |

This is a “worse” strategy in the sense that the sets tend to be larger, which won’t impress the guests. But we still include .7 probability from each row, so it’s still valid.

Here are the corresponding confidence sets:

Outcome | What you guess |
---|---|
⚀ | {red, yellow, white} |
⚁ | {green, yellow, white} |
⚂ | {blue, white} |
⚃ | {} |

Now suppose we play the game, and the outcome is ⚃. Then we can say:

“With 70% confidence, the true color is nothing.”

This is perfectly valid! Obviously, the true color is *never* nothing. But we are allowed to say this because we are using a procedure in which the above statement doesn’t occur very often. When you talk about “70% confidence” you promise that *most* of the statements you make are true, but you can be completely, arbitrarily wrong 30% of the time.
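Even with the empty set for ⚃, the row-by-row arithmetic still works out, since white now collects its .7 from ⚀, ⚁, and ⚂. A quick check (layout mine, as before):

```python
# The big probability table: for each color, P(⚀), P(⚁), P(⚂), P(⚃).
DICE = {
    "red":    [.7, .1, .1, .1],
    "green":  [.1, .7, .1, .1],
    "blue":   [.1, .1, .7, .1],
    "yellow": [.4, .3, .2, .1],
    "white":  [.1, .2, .4, .3],
}

# The "worse" strategy: white is covered via ⚀, ⚁, ⚂, and the ⚃ set is empty.
STRATEGY = [
    {"red", "yellow", "white"},
    {"green", "yellow", "white"},
    {"blue", "white"},
    set(),
]

for color, probs in DICE.items():
    c = sum(p for outcome, p in enumerate(probs) if color in STRATEGY[outcome])
    print(f"{color}: {c:.1f}")  # every row still sums to 0.7
```

So the procedure is a valid 70% confidence procedure even though one of its possible outputs is guaranteed to be wrong whenever it appears.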
