Basically, there's a very good chance that the best team *does not* win in the Major League Baseball playoffs.

On a related note, before a playoff series sportswriters will often make predictions of the form "[team] in [number of games]". In a best-of-(2n-1) series, if we know the probability *p* that a team (let's call them the Phillies) wins a game against some other team (as I've done before, let's call them the Qankees), and we treat the games as independent, then we can compute the probability that they'll win the series. Let *q* = 1 - *p* be the probability that the Qankees win a single game. Then the probability that the Phillies win in *k* games, where *k* is between *n* and 2*n*-1, can be obtained as follows. There are {k-1 \choose n-1} arrangements of wins and losses in a *k*-game series that allow the Phillies to win in *k* games -- they must win game k itself and n-1 out of the first k-1 games. Each of these occurs with probability p^{n}q^{k-n}. So the probability that the Phillies win in *k* games is

P_{n,k}(p) = {k-1 \choose n-1} p^n (1-p)^{k-n}

and likewise the probability that the Qankees win in *k* games is

Q_{n,k}(p) = {k-1 \choose n-1} (1-p)^n p^{k-n}

(The number *n* is the number of games needed to win the series; in a best-of-seven series, *n* = 4.) For example, P_{4,6}(.6) is the probability that the Phillies win a best-of-seven series in six games, given that they have a .6 probability of winning each game; it's {5 \choose 3} (.6)^4 (.4)^2 = .20736.
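Here is a minimal Python sketch of these two formulas, assuming the games are independent and *p* is the same in every game. The function names (`win_in`, `lose_in`, `series_win_probability`) are my own, chosen for illustration.

```python
from math import comb


def win_in(n: int, k: int, p: float) -> float:
    """P_{n,k}(p): chance of clinching a first-to-n series in exactly k games."""
    if not n <= k <= 2 * n - 1:
        return 0.0  # a first-to-n series lasts between n and 2n-1 games
    # Win n-1 of the first k-1 games in any order, then win game k.
    return comb(k - 1, n - 1) * p**n * (1 - p) ** (k - n)


def lose_in(n: int, k: int, p: float) -> float:
    """Q_{n,k}(p): the same formula from the opponent's point of view."""
    return win_in(n, k, 1 - p)


def series_win_probability(n: int, p: float) -> float:
    """Chance of winning the series at all: sum P_{n,k}(p) over k = n..2n-1."""
    return sum(win_in(n, k, p) for k in range(n, 2 * n))


print(round(win_in(4, 6, 0.6), 5))                 # 0.20736
print(round(series_win_probability(4, 0.6), 4))    # 0.7102
```

So even a team that wins 60% of its games against this opponent loses a best-of-seven series about 29% of the time in this model.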

A prediction of the winner of a series and the number of games they win in amounts to a prediction of *p*. If we assume that the predictor simply predicts the most likely outcome of the series given what they believe *p* to be, then we want to find the largest of P_{n,n}(p), ..., P_{n,2n-1}(p), Q_{n,n}(p), ..., Q_{n,2n-1}(p).

To do this, we start by finding the ratio P_{n,k+1}(p)/P_{n,k}(p); this is k(1-p)/(k-n+1). If this is greater than 1, it means a win in k+1 games is more likely than a win in k games; it becomes greater than 1 once p drops below (n-1)/k. So, for example, as we decrease p, a five-game win in a best-of-seven series becomes more likely than a sweep when p = 3/4 (we have n=4 and k=4); a six-game win in a best-of-seven series becomes more likely than a five-game win when p = 3/5; a seven-game win becomes more likely than a six-game win when p = 3/6 = 1/2.
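For anyone who wants to check the ratio, here is the computation written out. It only uses P_{n,k}(p) = {k-1 \choose n-1} p^n (1-p)^{k-n}, the mirror image of the formula for Q_{n,k}(p).

```latex
\begin{align*}
\frac{P_{n,k+1}(p)}{P_{n,k}(p)}
  &= \frac{\binom{k}{n-1}\, p^{n} (1-p)^{k+1-n}}
          {\binom{k-1}{n-1}\, p^{n} (1-p)^{k-n}} \\
  &= \frac{k!}{(n-1)!\,(k-n+1)!} \cdot \frac{(n-1)!\,(k-n)!}{(k-1)!} \cdot (1-p) \\
  &= \frac{k\,(1-p)}{k-n+1}.
\end{align*}
% The ratio exceeds 1 exactly when k(1-p) > k-n+1, i.e. when p < (n-1)/k.
```

With n = 4 the thresholds (n-1)/k for k = 4, 5, 6 are 3/4, 3/5 and 1/2, matching the best-of-seven examples.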

And in general, as we decrease p, a win in (2n-1) games becomes more likely than a win in (2n-2) games when p = (n-1)/(2n-2) = 1/2.

But when p dips below 1/2, that's also when losses should become more likely than wins!

In particular, the ratio between the Phillies' probability of winning in 2n-1 games and that of losing in 2n-1 games is P_{n,2n-1}(p)/Q_{n,2n-1}(p) = p/(1-p); if p < 1-p then winning in 2n-1 games can't be the most common outcome. At best, winning in 2n-1 games is *as probable* as winning in 2n-2 games... when p = 1/2, and at that moment losing in 2n-1 and losing in 2n-2 have the same probability.

Concretely, in a best-of-seven series you should predict that the Phillies:

- win in four, if p > 3/4;
- win in five, if 3/5 < p < 3/4;
- win in six, if 1/2 < p < 3/5;
- lose in six, five, or four in cases symmetric to the three above.

If p = 1/2, then the probabilities of a win in six, a win in seven, a loss in six, and a loss in seven are all the same, 5/32 each.
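The thresholds above are easy to check numerically. This is a rough sketch under the same independence assumption; `outcome_probabilities` and `best_prediction` are hypothetical helper names, not anything standard.

```python
from math import comb


def outcome_probabilities(p: float, n: int = 4) -> dict:
    """Map (result, games) -> probability for a first-to-n series."""
    probs = {}
    for k in range(n, 2 * n):
        ways = comb(k - 1, n - 1)  # orderings of the first k-1 games
        probs[("win", k)] = ways * p**n * (1 - p) ** (k - n)
        probs[("lose", k)] = ways * (1 - p) ** n * p ** (k - n)
    return probs


def best_prediction(p: float, n: int = 4) -> tuple:
    """Return the single most likely (result, games) outcome."""
    probs = outcome_probabilities(p, n)
    return max(probs, key=probs.get)


for p in (0.8, 0.7, 0.55, 0.45, 0.3, 0.2):
    print(p, best_prediction(p))
```

For these six values of p the loop reports a win in four, five, and six, then a loss in six, five, and four. At exactly p = 1/2 four outcomes tie at 5/32, so `best_prediction` just returns whichever of them it happens to see first.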

The point here is that either type of seven-game series is *never* the sole most likely outcome in this model (although it may be in reality, because games aren't independent -- home-field advantage, who's starting that day, and so on enter into the picture), and that it almost never makes sense to predict a sweep (playoff teams will be evenly enough matched that the worse one should be able to beat the better one more than one-quarter of the time).

Yet four- and seven-game series happen. I'm not saying that these are ridiculously rare events, just that it doesn't make sense to predict them *a priori*. It's a bit surprising, though -- if you actually played all seven games, 4-3 would be the most common outcome for series that are nearly evenly matched -- but enough of those come from the team already down 4-2 winning the last game that you don't see that in the best-of-seven format.
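To make that comparison concrete, here is a small sketch that puts the two formats side by side for evenly matched teams (p = 1/2), again assuming independent games. The function names are mine.

```python
from math import comb


def all_games_played(p: float, games: int = 7) -> dict:
    """Distribution of one team's win total if every game is played."""
    return {
        w: comb(games, w) * p**w * (1 - p) ** (games - w)
        for w in range(games + 1)
    }


def series_length(p: float, n: int = 4) -> dict:
    """Chance that a first-to-n series ends after exactly k games (either winner)."""
    return {
        k: comb(k - 1, n - 1)
        * (p**n * (1 - p) ** (k - n) + (1 - p) ** n * p ** (k - n))
        for k in range(n, 2 * n)
    }


p = 0.5
full = all_games_played(p)
print(full[4] + full[3])   # 0.546875
print(series_length(p))    # {4: 0.125, 5: 0.25, 6: 0.3125, 7: 0.3125}
```

If all seven games are played, a 4-3 split in one direction or the other happens about 55% of the time, but a best-of-seven series between the same teams reaches a seventh game only 31% of the time, exactly as often as it ends in six.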

Realistically, though, a prediction of "[team] in 7" is just a sportswriter's way of signaling "I think this team is slightly better than its opponent", which is all it should be taken as.

## 2 comments:

Do you have any idea if the actual distribution of 4, 5, 6 and 7 game series conforms to the idea that all the games are independent, or are 7 game series actually over-represented?

A very nice explanation.

helped me a lot!
