
Probability Puzzle: Monty Hall Problem

The Monty Hall Problem

The Monty Hall problem has puzzled people for about 40 years. It is a great example of how probabilistic thinking runs against human nature. The human brain is a remarkable product of evolution: it helped us survive in the natural environment of the hunting-and-gathering era, but it has not evolved quickly enough to build intuition for unnatural concepts such as probability.

In the Monty Hall problem, a guest stands in front of three closed doors, behind which are two goats and a car. The guest picks a door (say door A), hoping for the car, of course. Monty Hall, the host, examines the other doors and always opens one that hides a goat. Then he asks the guest: do you stick with door A (the original guess) or switch to the other unopened door? Does it matter?

The web contains plenty of material explaining the details. YouTube has many videos on the problem: some explain it in an intuitive, easy-to-follow way; others use total probability, conditional probability, and Bayes’ theorem. The Wikipedia page for the Monty Hall problem describes its history and various solutions.

This article intends to explain the Monty Hall problem using a variant of Bayes’ theorem. It involves a few equations. Once you understand the symbols and notations, you will have a deeper understanding of the problem and the Bayesian way of thinking.

Bayes’ Theorem

Bayes’ theorem typically has the following form:

Pr(H|D) = ( Pr(H) × Pr(D|H) ) / ( Pr(H) × Pr(D|H) + Pr(-H) × Pr(D|-H) ) — (1)

where Pr(H|D) represents the conditional probability of event H given event D. Typically, H represents a hypothesis, a fancy word for an educated guess. -H represents the complementary event of H, meaning the hypothesis is false. D represents the data or evidence, typically an event or observation that is known and helps to judge hypothesis H. Pr(H|D) is the probability that H is true if D is observed; Pr(D|H) is the probability that D will be observed if H is true; Pr(D|-H) is the probability that D will be observed if H is false. Pr(H) and Pr(-H) are the probabilities that H is true or false regardless of any observed data, typically referred to as the prior (a priori) probabilities. Putting these symbols together, Equation (1) judges whether H is true given that D is observed, using the prior probabilities and the effect of the hypothesis on the evidence D.

The complementary equation, which judges whether H is false given that D is observed, is as follows:

Pr(-H|D) = ( Pr(-H) × Pr(D|-H) ) / ( Pr(H) × Pr(D|H) + Pr(-H) × Pr(D|-H) ) — (2)

This equation can be obtained by simply replacing H by -H in Equation (1).

Note that the right-hand sides of Equations (1) and (2) have the same denominator, the total probability Pr(D). Dividing Equation (1) by Equation (2) gives the following:

Pr(H|D) / Pr(-H|D) = Pr(D|H) / Pr(D|-H) × Pr(H) / Pr(-H) — (3)

The left-hand side of Equation (3) represents the odds that H is true rather than false, given evidence D. The right-hand side contains two parts: Pr(D|H) / Pr(D|-H), the ratio between the probabilities of observing evidence D when the hypothesis is true versus false (the likelihood ratio), and Pr(H) / Pr(-H), the ratio of the prior probabilities (the prior odds).
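The odds form of Equation (3) is compact enough to express in a few lines of code. The sketch below is illustrative and not from the article; the function names are my own.

```python
def posterior_odds(prior_odds, likelihood_ratio):
    """Equation (3): posterior odds = likelihood ratio x prior odds."""
    return likelihood_ratio * prior_odds

def odds_to_probability(odds):
    """Convert odds in favor of H into Pr(H)."""
    return odds / (1.0 + odds)

# Example: even prior odds (1:1), evidence twice as likely if H is true.
odds = posterior_odds(1.0, 2.0)
print(odds, odds_to_probability(odds))  # 2.0 and 2/3
```

The second helper converts odds back to a probability, which is often the more familiar quantity.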

Bayesian Solution

So how do all these crazy symbols help us understand the Monty Hall problem? When the guest picks an initial door, say door A, a hypothesis H is made that the car is behind door A.

  • H: car is behind door A.

Then Monty Hall opens a door that does not have a car, say door B. This is evidence D.

  • D: car is not behind door B.

The question is: given this new evidence D, would you like to stick with door A or switch to the unopened door C?

Since there are only two doors left, switching to door C amounts to adopting the complementary hypothesis:

  • -H: car is behind door C.

So the question is equivalent to: given evidence D, which of the two hypotheses has the higher probability, H or -H? Using the symbols discussed above, these two probabilities are Pr(H|D) and Pr(-H|D), which appear on the left-hand side of Equation (3).

On the right-hand side of Equation (3), there are four terms. Pr(D|H) is the probability that Monty Hall opens door B if the car is behind door A. Since neither door B nor door C has the car, he would open one of the two at random. Therefore, Pr(D|H) = 0.5.

Pr(D|-H) is the probability that Monty Hall opens door B if the car is behind door C. Since Monty has no choice but to open door B, Pr(D|-H) = 1.

Pr(H) and Pr(-H) are the prior probabilities before any door is opened. Since the car is equally likely to be behind any of the three doors, Pr(H) = Pr(-H) = 1/3.

Now, putting all these terms together in Equation (3), we have:

Pr(H|D) / Pr(-H|D) = 0.5 / 1 × (1/3) / (1/3) = 1/2

Therefore, -H is twice as likely as H: you should switch to door C.
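The 2:1 odds in favor of switching can also be checked empirically. Below is a minimal Monte Carlo simulation of the game (a sketch of my own, not from the article):

```python
import random

def play(switch: bool) -> bool:
    """Play one round of the Monty Hall game; return True if the guest wins."""
    doors = [0, 1, 2]
    car = random.choice(doors)
    guess = random.choice(doors)
    # Monty opens a goat door that is not the guest's pick.
    opened = random.choice([d for d in doors if d != guess and d != car])
    if switch:
        # Switch to the remaining unopened door.
        guess = next(d for d in doors if d != guess and d != opened)
    return guess == car

trials = 100_000
stay = sum(play(switch=False) for _ in range(trials)) / trials
swap = sum(play(switch=True) for _ in range(trials)) / trials
print(f"stay: {stay:.3f}, switch: {swap:.3f}")  # roughly 0.333 vs 0.667
```

Running the simulation shows the staying strategy winning about a third of the time and the switching strategy about two thirds, matching the Bayesian result.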

The result appears counterintuitive. Nothing has changed about doors A and C: before door B is opened, doors A and C have the same 1/3 probability; after door B is opened, door C is twice as likely to have the car. Why does opening door B make switching to door C the better choice?

The key is that Monty Hall knows which door has the car. By opening door B, he provides new information to the guest. As shown above, Pr(D|H) = 0.5 but Pr(D|-H) = 1. The difference between these two probabilities is the information contained in the evidence that door B was opened. This additional information leads to the result that door C is twice as likely to have the car.

The Bayesian Way of Thinking

The Monty Hall problem is an example that shows the power of Bayes’ formula. Its essence is to use evidence to update the prior probabilities. This section describes some of the distinctive features of the Bayesian way of thinking.

Use Evidence to Update Knowledge

In the Monty Hall problem, one common trap is ignoring the information that Monty brings by opening door B. As the derivation above shows, the two hypotheses assign different probabilities to this evidence. Bayes’ formula uses the evidence, adjusts the prior probabilities, and produces the updated posterior probabilities of the hypotheses. This Bayesian way of thinking resembles the evidence-based scientific method: the test of all knowledge is experiment and observation, and evidence is the “sole” judge of scientific truth. When evidence arises, we should adjust our beliefs in the hypotheses accordingly.

Recursive Update Process

Bayes’ formula can be applied repeatedly when updating posterior probabilities. Every time new evidence is obtained, the knowledge of the hypotheses can be updated; the updated posterior becomes the prior in the next update. As more and more evidence arises, the posterior probability approaches the truth regardless of the initial prior. Nate Silver calls this updating and approximation process becoming “less and less and less wrong” [1]. Similar to patching a software package based on tests and bug reports, posterior probabilities are updated each time new evidence arises.
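In the odds form of Equation (3), this recursion is simply a running product: each piece of evidence multiplies the current odds by its likelihood ratio, and the posterior becomes the next prior. The sketch below is my own illustration, not taken from the referenced books.

```python
def update_odds(prior_odds, likelihood_ratios):
    """Apply Bayes' rule repeatedly: each posterior becomes the next prior."""
    odds = prior_odds
    for lr in likelihood_ratios:
        odds *= lr  # posterior odds = likelihood ratio x prior odds
    return odds

# Three independent pieces of evidence, each twice as likely if H is true:
odds = update_odds(1.0, [2.0, 2.0, 2.0])
print(odds, odds / (1 + odds))  # 8.0, i.e. Pr(H | evidence) = 8/9
```

Note how quickly consistent evidence overwhelms an even prior: three modest 2:1 updates already push the posterior close to 0.9.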

Base Rate Included

Bayes’ formula also reveals the role of the prior in making a judgement. When faced with a single piece of evidence, people may neglect the prior and rely solely on the current evidence. For example, suppose a person tests positive for a rare disease. The test correctly identifies 99% of all people with the disease and produces a false positive in 1% of people without it. Many people would think the person has a 99% chance of having the disease. However, Bayes’ formula needs the prior, the base rate of the disease, to make the correct judgement. If the disease occurs in only 0.1% of the population, Equation (1) gives the following result:

Pr(H|D) = (0.001 × 0.99) / (0.001 × 0.99 + 0.999 × 0.01) = 9%

So the chance that the person has the disease is less than 10%. In psychology, this fallacy of neglecting prior information is called base rate neglect [2]. Bayesian thinking explicitly includes the base rate and helps avoid this psychological fallacy.
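The same arithmetic can be written directly from Equation (1). This is a sketch with my own variable names; the numbers are the base rate, sensitivity, and false-positive rate from the example above.

```python
def posterior(prior, p_d_given_h, p_d_given_not_h):
    """Equation (1): Pr(H|D) from the prior and the two likelihoods."""
    numerator = prior * p_d_given_h
    denominator = numerator + (1 - prior) * p_d_given_not_h
    return numerator / denominator

# Rare disease: base rate 0.1%, sensitivity 99%, false-positive rate 1%.
p = posterior(0.001, 0.99, 0.01)
print(f"{p:.3f}")  # 0.090, i.e. about 9%
```

Changing the base rate in the call above shows how strongly the posterior depends on the prior: with a 1% base rate, the same positive test result would be far more alarming.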

Proven Method for Forecasting

If you want to forecast the future, Bayesian thinking is your best secret weapon. Nate Silver, editor of the FiveThirtyEight website, is a forecaster in action; Philip Tetlock, a professor of psychology and political science, has studied forecasting for over 30 years. In their books [1, 3], both authors advocate Bayesian thinking as the key to forecasting.

Summary

This article uses the Monty Hall problem as an example to explain Bayes’ formula and the Bayesian way of thinking. It shows how counterintuitive probability can be and how Bayesian thinking can help us make better judgements.

References

[1] The Signal and the Noise: Why So Many Predictions Fail – but Some Don’t. Nate Silver, New York: The Penguin Press, 2012, Ch. 8.

[2] Thinking, Fast and Slow. Daniel Kahneman, New York: Farrar, Straus and Giroux, 2013, Ch. 14.

[3] Superforecasting: The Art and Science of Prediction. Philip E. Tetlock and Dan Gardner, New York: Crown Publishers, 2015.
