18 October 2007

will the best team win?

Will the Best Team Win? Maybe -- by Alan Schwarz, in the October 14 New York Times.

Basically, there's a very good chance that the best team does not win in the Major League Baseball playoffs.

On a related note, before a playoff series sportswriters will often make predictions of the form "[team] in [number of games]". In a best-of-(2n-1) series, if we know the probability p that a team (let's call them the Phillies) wins a game against some other team (as I've done before, let's call them the Qankees), then we can compute the probability that they'll win the series. Let q = 1 - p be the probability that the Qankees win a single game. Then the probability that the Phillies win in k games, where k is between n and 2n-1, can be obtained as follows. There are ${k-1 \choose n-1}$ arrangements of wins and losses in a k-game series that allow the Phillies to win in k games -- they must win n-1 out of the first k-1 games, and then win game k. Each of these occurs with probability $p^n q^{k-n}$. So the probability that the Phillies win in k games is
P_{n,k}(p) = {k-1 \choose n-1} p^n (1-p)^{k-n}

and likewise the probability that the Qankees win in k games is
Q_{n,k}(p) = {k-1 \choose n-1} (1-p)^n p^{k-n}
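The counting step can be checked by brute force: list every win/loss sequence of length k and keep the ones where the Phillies' n-th win comes in game k. This is a minimal Python sketch; the function name `count_wins_in_exactly_k` is hypothetical, and it only makes sense for n ≤ k ≤ 2n-1, as in the text.

```python
from itertools import product
from math import comb

def count_wins_in_exactly_k(n: int, k: int) -> int:
    """Count W/L sequences of length k (n <= k <= 2n-1) in which the Phillies
    get their n-th win in game k, i.e. the series ends with a Phillies win at game k."""
    count = 0
    for games in product("WL", repeat=k):
        # Game k must be a win, and it must be exactly the n-th win.
        # (For k <= 2n-1 the Qankees cannot have reached n wins earlier.)
        if games[-1] == "W" and games.count("W") == n:
            count += 1
    return count

# Best-of-seven (n = 4): brute-force count next to C(k-1, n-1).
for k in range(4, 8):
    print(k, count_wins_in_exactly_k(4, k), comb(k - 1, 3))
```

The two columns agree (1, 4, 10, 20), which is just the ${k-1 \choose n-1}$ count above.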

(The number n is the number of games needed to win the series; in a best-of-seven series, n = 4.) For example, $P_{4,6}(.6)$ is the probability that the Phillies win a best-of-seven series in six games, given that they have a .6 probability of winning each game; it's ${5 \choose 3} (.6)^4 (.4)^2 = 0.20736$.
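The two formulas translate directly into code. The sketch below uses Python's `fractions.Fraction` so the arithmetic is exact; the function names `p_win_in_k` and `q_win_in_k` are hypothetical stand-ins for $P_{n,k}(p)$ and $Q_{n,k}(p)$, and games are assumed independent with a fixed p, as in the text.

```python
from fractions import Fraction
from math import comb

def p_win_in_k(n, k, p):
    """P_{n,k}(p): probability the Phillies win a best-of-(2n-1) series in exactly k games."""
    return comb(k - 1, n - 1) * p**n * (1 - p) ** (k - n)

def q_win_in_k(n, k, p):
    """Q_{n,k}(p): probability the Qankees win in exactly k games."""
    return comb(k - 1, n - 1) * (1 - p) ** n * p ** (k - n)

p = Fraction(3, 5)  # exact .6, avoids floating-point noise
print(p_win_in_k(4, 6, p), float(p_win_in_k(4, 6, p)))  # 648/3125 0.20736

# The eight outcomes of a best-of-seven series are exhaustive, so they sum to 1.
total = sum(p_win_in_k(4, k, p) + q_win_in_k(4, k, p) for k in range(4, 8))
print(total)  # 1
```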

A prediction of the winner of a series and the number of games it takes amounts to a prediction of a range for p. If we assume that the predictor simply predicts the most likely outcome of the series given what they believe p to be, then we want to find the largest of
P_{n,n}(p), \ldots, P_{n,2n-1}(p), Q_{n,n}(p), \ldots, Q_{n,2n-1}(p).
To do this, we start by finding the ratio $P_{n,k+1}(p)/P_{n,k}(p)$; since ${k \choose n-1}/{k-1 \choose n-1} = k/(k-n+1)$, this is $k(1-p)/(k-n+1)$. If this is greater than 1, a win in k+1 games is more likely than a win in k games, and that happens exactly when p < (n-1)/k. So, for example, as we decrease p, a five-game win in a best-of-seven series becomes more likely than a sweep when p = 3/4 (we have n=4 and k=4); a six-game win becomes more likely than a five-game win when p = 3/5; and a seven-game win becomes more likely than a six-game win when p = 3/6 = 1/2.
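A quick exact check of the ratio and the crossover points for a best-of-seven series. This is a Python sketch under the same independence assumption; `p_win_in_k` and `crossover` are hypothetical names, with `crossover(n, k)` returning the threshold (n-1)/k derived above.

```python
from fractions import Fraction
from math import comb

def p_win_in_k(n, k, p):
    """P_{n,k}(p), computed exactly when p is a Fraction."""
    return comb(k - 1, n - 1) * p**n * (1 - p) ** (k - n)

def crossover(n, k):
    """Value of p at which a win in k+1 games is exactly as likely as a win in k games:
    the ratio k(1-p)/(k-n+1) equals 1 precisely when p = (n-1)/k."""
    return Fraction(n - 1, k)

n = 4
for k in range(n, 2 * n - 1):
    p = crossover(n, k)
    ratio = p_win_in_k(n, k + 1, p) / p_win_in_k(n, k, p)
    print(k, "->", k + 1, "at p =", p, "ratio =", ratio)
```

This prints the thresholds 3/4, 3/5 and 1/2, each with a ratio of exactly 1.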

And in general, as we decrease p, a win in (2n-1) games becomes more likely than a win in (2n-2) games when p = (n-1)/(2n-2) = 1/2.

But when p dips below 1/2, that's also when losses should become more likely than wins!

In particular, the ratio between the Phillies' probability of winning in 2n-1 games and that of losing in 2n-1 games is $P_{n,2n-1}(p)/Q_{n,2n-1}(p) = p/(1-p)$; if p < 1-p, then winning in 2n-1 games can't be the most likely outcome. At best, winning in 2n-1 games is as probable as winning in 2n-2 games, which happens only when p = 1/2, and at that same point losing in 2n-1 games and losing in 2n-2 games have that same probability too.
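The binomial factors cancel in that ratio, which is easy to confirm exactly for a few series lengths and values of p. A Python sketch; the helper names and the sample values of n and p are arbitrary choices for illustration.

```python
from fractions import Fraction
from math import comb

def p_win_in_k(n, k, p):
    return comb(k - 1, n - 1) * p**n * (1 - p) ** (k - n)

def q_win_in_k(n, k, p):
    return comb(k - 1, n - 1) * (1 - p) ** n * p ** (k - n)

# Going the full 2n-1 games: C(2n-2, n-1) cancels, leaving p/(1-p).
for n in (2, 3, 4):
    for p in (Fraction(1, 3), Fraction(1, 2), Fraction(3, 5)):
        k = 2 * n - 1
        assert p_win_in_k(n, k, p) / q_win_in_k(n, k, p) == p / (1 - p)
print("ratio is p/(1-p) in every case checked")
```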

Concretely, in a best-of-seven series you should predict that the Phillies:

  • win in four, if p > 3/4;

  • win in five, if 3/5 < p < 3/4;

  • win in six, if 1/2 < p < 3/5;

  • lose in six, five, or four in the symmetric cases, that is, 2/5 < p < 1/2, 1/4 < p < 2/5, and p < 1/4 respectively.
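The rule in the list above is just "take the largest of the 2n outcome probabilities". Here is a Python sketch of that argmax, with `most_likely_outcome` a hypothetical name and p values picked to fall strictly inside each interval (at an exact threshold two outcomes tie, and `max` simply returns whichever it sees first).

```python
from fractions import Fraction
from math import comb

def most_likely_outcome(n, p):
    """Return the (result, games) pair with the highest probability among the
    2n outcomes of a best-of-(2n-1) series; p is the single-game win probability."""
    outcomes = {}
    for k in range(n, 2 * n):
        c = comb(k - 1, n - 1)
        outcomes[("win", k)] = c * p**n * (1 - p) ** (k - n)
        outcomes[("lose", k)] = c * (1 - p) ** n * p ** (k - n)
    return max(outcomes, key=outcomes.get)

for p in (Fraction(4, 5), Fraction(7, 10), Fraction(11, 20), Fraction(9, 20), Fraction(1, 5)):
    print(p, most_likely_outcome(4, p))
```

The results run win in four, five, six, then lose in six and four, matching the list.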

If p = 1/2, then the probabilities of a win in six, a win in seven, a loss in six, and a loss in seven are all the same, 5/32 each.
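At p = 1/2 every specific sequence of k games has probability $1/2^k$, so $P_{n,k}(1/2) = {k-1 \choose n-1}/2^k$, and by symmetry $Q_{n,k}(1/2)$ is the same. A small Python sketch (the function name is hypothetical) tabulates this for n = 4.

```python
from fractions import Fraction
from math import comb

def prob_win_in_k_even(n, k):
    """P_{n,k}(1/2) = C(k-1, n-1) / 2^k; by symmetry this also equals Q_{n,k}(1/2)."""
    return Fraction(comb(k - 1, n - 1), 2**k)

for k in range(4, 8):
    print(k, prob_win_in_k_even(4, k))
```

The six- and seven-game entries are 10/64 and 20/128, both equal to 5/32.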

The point here is that either type of seven-game series is never the sole most likely outcome in this model (although it may be in reality, because games aren't independent -- home-field advantage, who's starting that day, and so on enter into the picture), and that it almost never makes sense to predict a sweep (playoff teams will be evenly enough matched that the worse one should be able to beat the better one more than one-quarter of the time).

Yet four- and seven-game series happen. I'm not saying that these are ridiculously rare events, just that it doesn't make sense to predict them a priori. It's a bit surprising, though: if you actually played all seven games, 4-3 would be the most common final score for series that are nearly evenly matched, but enough of those 4-3 results come from series that were already decided (most often 4-2) in which the trailing team wins the remaining games that the pattern doesn't carry over to the best-of-seven format.
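Here is a Python sketch comparing the two scenarios at p = 1/2, an assumed value standing in for a perfectly even matchup. The helper names are hypothetical, and both functions assume independent games with a fixed p, as elsewhere in this post.

```python
from fractions import Fraction
from math import comb

def series_length_dist(n, p):
    """Best-of-(2n-1) series stopped as soon as one team has n wins:
    probability that it lasts exactly k games, for k = n..2n-1."""
    return {
        k: comb(k - 1, n - 1) * (p**n * (1 - p) ** (k - n) + (1 - p) ** n * p ** (k - n))
        for k in range(n, 2 * n)
    }

def full_schedule_score_dist(games, p):
    """All `games` games played no matter what: probability of each final score,
    written as (more wins, fewer wins) and pooled over which team is ahead."""
    dist = {}
    for w in range(games + 1):
        score = (max(w, games - w), min(w, games - w))
        dist[score] = dist.get(score, 0) + comb(games, w) * p**w * (1 - p) ** (games - w)
    return dist

even = Fraction(1, 2)  # assumed: a perfectly even matchup
for k, prob in series_length_dist(4, even).items():
    print("series lasts", k, "games:", prob)
for score, prob in sorted(full_schedule_score_dist(7, even).items()):
    print("all seven played, final score", score, ":", prob)
```

With these numbers the series goes seven games with probability 5/16, while a 4-3 final score with all seven games played has probability 35/64. The extra 15/64 comes from series already decided in four, five, or six games whose loser then wins every remaining game; the 4-2 case accounts for 10/64 of it.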

Realistically, though, a prediction of "[team] in 7" is just a sportswriter's way of signaling "I think this team is slightly better than its opponent", which is all it should be taken as.

2 comments:

  1. Do you have any idea if the actual distribution of 4, 5, 6 and 7 game series conforms to the idea that all the games are independent, or are 7 game series actually over-represented?

  2. A very nice explanation.
    helped me a lot!
