18 October 2007

will the best team win?

Will the Best Team Win? Maybe -- by Alan Schwarz, in the October 14 New York Times.

Basically, there's a very good chance that the best team does not win in the Major League Baseball playoffs.

On a related note, before a playoff series sportswriters will often make predictions of the form "[team] in [number of games]". In a best-of-(2n-1) series, if we know the probability p that a team (let's call them the Phillies) wins a game against some other team (as I've done before, let's call them the Qankees), then we can compute the probability that they'll win the series. Let q = 1 - p be the probability that the Qankees win a single game. Then the probability that the Phillies win in k games, where k is between n and 2n-1, can be obtained as follows. There are ${k-1 \choose n-1}$ arrangements of wins and losses in a k-game series that allow the Phillies to win in k games -- they must win n-1 out of the first k-1 games and then win game k. Each of these occurs with probability $p^n q^{k-n}$. So the probability that the Phillies win in k games is
P_{n,k}(p) = {k-1 \choose n-1} p^n (1-p)^{k-n}

and likewise the probability that the Qankees win in k games is
Q_{n,k}(p) = {k-1 \choose n-1} (1-p)^n p^{k-n}

(The number n is the number of games needed to win the series; in a best-of-seven series, n = 4.) For example, $P_{4,6}(.6)$ is the probability that the Phillies win a best-of-seven series in six games, given that they have a .6 probability of winning each game; it's ${5 \choose 3} (.6)^4 (.4)^2 = 0.20736$.
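Here's a minimal Python sketch of these two formulas, assuming (as the model does) that games are independent and p stays fixed. The names `p_win_in`, `q_win_in` and `p_win_in_brute_force` are hypothetical, made up for this illustration. The brute-force version lists every win/loss sequence, as a check on the counting argument, and `Fraction` keeps the arithmetic exact.

```python
from fractions import Fraction
from itertools import product
from math import comb


def p_win_in(n, k, p):
    """P_{n,k}(p): chance the Phillies win a best-of-(2n-1) series in exactly k games."""
    return comb(k - 1, n - 1) * p**n * (1 - p) ** (k - n)


def q_win_in(n, k, p):
    """Q_{n,k}(p): chance the Qankees win in exactly k games."""
    return comb(k - 1, n - 1) * (1 - p) ** n * p ** (k - n)


def p_win_in_brute_force(n, k, p):
    """Same quantity as p_win_in (for n <= k <= 2n-1), found by listing every W/L
    sequence of length k in which the Phillies' n-th win comes in game k.
    The Qankees then have only k-n <= n-1 wins, so they never clinched first."""
    total = Fraction(0)
    for seq in product("WL", repeat=k):
        if seq[-1] == "W" and seq.count("W") == n:
            total += p**n * (1 - p) ** (k - n)
    return total


if __name__ == "__main__":
    p = Fraction(3, 5)  # exact arithmetic; a float such as 0.6 also works
    print(p_win_in(4, 6, p))  # 648/3125, i.e. 0.20736
    # every best-of-seven outcome, both teams, four to seven games: total is 1
    print(sum(p_win_in(4, k, p) + q_win_in(4, k, p) for k in range(4, 8)))
```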

A prediction of the winner of a series and the number of games they win in amounts to a rough prediction of p. If we assume that the predictor simply predicts the most likely outcome of the series given what they believe p to be, then we want to find the largest of
P_{n,n}(p), \ldots, P_{n,2n-1}(p), Q_{n,n}(p), \ldots, Q_{n,2n-1}(p)
To do this, we start by finding the ratio $P_{n,k+1}(p)/P_{n,k}(p)$; this is $k(1-p)/(k-n+1)$. If this is greater than 1, a win in k+1 games is more likely than a win in k games; the ratio equals 1 at $p = (n-1)/k$ and exceeds 1 for smaller p. So, for example, as we decrease p, a five-game win in a best-of-seven series becomes more likely than a sweep when p = 3/4 (we have n=4 and k=4); a six-game win becomes more likely than a five-game win when p = 3/5; and a seven-game win becomes more likely than a six-game win when p = 3/6 = 1/2.

And in general, as we decrease p, a win in (2n-1) games becomes more likely than a win in (2n-2) games when p = (n-1)/(2n-2) = 1/2.
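To double-check the ratio and the crossover points, here's a small Python sketch (the helper names `step_ratio`, `closed_form_ratio` and `crossover` are hypothetical). It computes $P_{n,k+1}(p)/P_{n,k}(p)$ directly from the formula, so it can be compared against the simplified form $k(1-p)/(k-n+1)$, and it evaluates the ratio at $p = (n-1)/k$, where it should be exactly 1.

```python
from fractions import Fraction
from math import comb


def p_win_in(n, k, p):
    """P_{n,k}(p) = C(k-1, n-1) p^n (1-p)^(k-n)."""
    return comb(k - 1, n - 1) * p**n * (1 - p) ** (k - n)


def step_ratio(n, k, p):
    """P_{n,k+1}(p) / P_{n,k}(p), computed straight from the formula (needs 0 < p < 1)."""
    return p_win_in(n, k + 1, p) / p_win_in(n, k, p)


def closed_form_ratio(n, k, p):
    """The simplified ratio k(1-p)/(k-n+1)."""
    return k * (1 - p) / (k - n + 1)


def crossover(n, k):
    """The p at which winning in k+1 games is exactly as likely as winning in k."""
    return Fraction(n - 1, k)


if __name__ == "__main__":
    n = 4  # best of seven
    for k in range(n, 2 * n - 1):
        p0 = crossover(n, k)
        print(f"{k} vs {k + 1} games: tie at p = {p0}, ratio there = {step_ratio(n, k, p0)}")
```

For n = 4 this should print ties at 3/4, 3/5 and 1/2, and taking k = 2n-2 in `crossover` gives 1/2 for every n.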

But when p dips below 1/2, that's also when losses should become more likely than wins!

In particular, the ratio between the Phillies' probability of winning in 2n-1 games and that of losing in 2n-1 games is $P_{n,2n-1}(p)/Q_{n,2n-1}(p) = p/(1-p)$; if p < 1-p, then losing in 2n-1 games is more likely than winning in 2n-1 games, so the latter can't be the most likely outcome. And if p > 1/2, winning in 2n-2 games is more likely than winning in 2n-1 games. At best, winning in 2n-1 games is as probable as winning in 2n-2 games... when p = 1/2, and at that moment losing in 2n-1 and losing in 2n-2 games have the same probability too.
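Here's a rough Python check of this argument, again assuming independent games with fixed p (`longest_win_is_beaten_or_tied` is a hypothetical name). It confirms the ratio $p/(1-p)$ and then scans a grid of p values to see that a Phillies win in 2n-1 games is always matched or beaten, either by a Qankees win in 2n-1 (when p ≤ 1/2) or by a Phillies win in 2n-2 (when p ≥ 1/2).

```python
from fractions import Fraction
from math import comb


def p_win_in(n, k, p):
    """P_{n,k}(p): Phillies win in exactly k games."""
    return comb(k - 1, n - 1) * p**n * (1 - p) ** (k - n)


def q_win_in(n, k, p):
    """Q_{n,k}(p): Qankees win in exactly k games."""
    return comb(k - 1, n - 1) * (1 - p) ** n * p ** (k - n)


def longest_win_is_beaten_or_tied(n, p):
    """For n >= 2: is some other outcome at least as likely as a Phillies win in
    2n-1 games? For p <= 1/2 the Qankees winning in 2n-1 qualifies; for
    p >= 1/2 the Phillies winning in 2n-2 does."""
    last = 2 * n - 1
    win_last = p_win_in(n, last, p)
    return q_win_in(n, last, p) >= win_last or p_win_in(n, last - 1, p) >= win_last


if __name__ == "__main__":
    p = Fraction(3, 5)
    print(p_win_in(4, 7, p) / q_win_in(4, 7, p))  # 3/2, which is p/(1-p)
    print(all(longest_win_is_beaten_or_tied(4, Fraction(i, 100)) for i in range(1, 100)))
```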

Concretely, in a best-of-seven series you should predict that the Phillies:

  • win in four, if p > 3/4;

  • win in five, if 3/5 < p < 3/4;

  • win in six, if 1/2 < p < 3/5;

  • lose in six, five, or four in cases symmetric to the three above.
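The rule above can be reproduced with a short Python sketch that lists all 2n outcomes and picks the most likely one(s), under the same independence assumption. The functions `outcome_probs` and `prediction` are hypothetical helpers; exact fractions make ties at the boundary values (3/4, 3/5, 1/2) come out as genuine ties rather than rounding noise.

```python
from fractions import Fraction
from math import comb


def outcome_probs(n, p):
    """Probability of each outcome of a best-of-(2n-1) series, keyed by label."""
    q = 1 - p
    probs = {}
    for k in range(n, 2 * n):
        probs[f"win in {k}"] = comb(k - 1, n - 1) * p**n * q ** (k - n)   # P_{n,k}(p)
        probs[f"lose in {k}"] = comb(k - 1, n - 1) * q**n * p ** (k - n)  # Q_{n,k}(p)
    return probs


def prediction(n, p):
    """The most likely outcome(s), sorted; more than one label means a tie."""
    probs = outcome_probs(n, p)
    best = max(probs.values())
    return sorted(label for label, prob in probs.items() if prob == best)


if __name__ == "__main__":
    for p in ("4/5", "7/10", "11/20", "9/20", "3/10", "1/5"):
        print(p, prediction(4, Fraction(p)))
```

Each of the six p values lands in a different range, so the output walks through win in four, five and six, then lose in six, five and four.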

If p = 1/2, then the probabilities of a win in six, a win in seven, a loss in six, and a loss in seven are all the same, 5/32 each.
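At p = 1/2 the formulas for $P_{n,k}$ and $Q_{n,k}$ coincide and reduce to ${k-1 \choose n-1}/2^k$, so the 5/32 figure is easy to check exactly. A tiny Python sketch (`prob_at_even_odds` is a hypothetical name):

```python
from fractions import Fraction
from math import comb


def prob_at_even_odds(n, k):
    """P_{n,k}(1/2) = Q_{n,k}(1/2) = C(k-1, n-1) / 2^k."""
    return comb(k - 1, n - 1) * Fraction(1, 2) ** k


if __name__ == "__main__":
    for k in range(4, 8):
        print(f"win (or lose) in {k}: {prob_at_even_odds(4, k)}")
```

The six- and seven-game cases are 10/64 and 20/128, both 5/32; for comparison the sweep is 1/16 and the five-game series is 1/8.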

The point here is that either type of seven-game series is never the sole most likely outcome in this model (although it may be in reality, because games aren't independent -- home-field advantage, who's starting that day, and so on enter into the picture), and that it almost never makes sense to predict a sweep (playoff teams will be evenly enough matched that the worse one should be able to beat the better one more than one-quarter of the time).

Yet four- and seven-game series happen. I'm not saying that these are ridiculously rare events, just that it doesn't make sense to predict them a priori. It's a bit surprising, though -- if you actually played all seven games, 4-3 would be the most common final record for series that are nearly evenly matched -- but enough of those come from a team already down 4-2 winning the last game that you don't see that in the best-of-seven format.
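The "play all seven games anyway" thought experiment can be sketched in Python by enumerating all 128 win/loss sequences, again assuming independent games with fixed p. The helpers `split_probs`, `most_common_record` and `four_three_breakdown` are hypothetical. The last one splits the 35 sequences in which the Phillies finish 4-3 into those where game seven actually decides the series (3-3 after six) and those where the Phillies had already clinched and then lost a meaningless seventh game.

```python
from fractions import Fraction
from itertools import product


def split_probs(p, n=4):
    """Play all 2n-1 games even after someone clinches; return the probability
    of each final record as {(phillies_wins, qankees_wins): probability}."""
    games = 2 * n - 1
    dist = {}
    for seq in product((True, False), repeat=games):  # True = Phillies win that game
        w = sum(seq)
        key = (w, games - w)
        dist[key] = dist.get(key, 0) + p**w * (1 - p) ** (games - w)
    return dist


def most_common_record(p, n=4):
    """The single most likely played-out final record."""
    dist = split_probs(p, n)
    return max(dist, key=dist.get)


def four_three_breakdown(n=4):
    """Among played-out sequences where the Phillies finish n to n-1, count those in
    which the last game really decides the series (n-1 to n-1 after 2n-2 games)
    and those in which the Phillies had already clinched and lost a dead last game."""
    games = 2 * n - 1
    deciding = already_clinched = 0
    for seq in product((True, False), repeat=games):
        if sum(seq) != n:
            continue
        if sum(seq[:-1]) == n - 1:
            deciding += 1
        else:
            already_clinched += 1
    return deciding, already_clinched


if __name__ == "__main__":
    print(most_common_record(Fraction(11, 20)))  # (4, 3)
    print(four_three_breakdown())  # (20, 15)
```

So with p = .55 the played-out record 4-3 is the single most likely one, but only 20 of the 35 sequences behind it would be seven-game series in the real format; the other 15 end in six games or fewer.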

Realistically, though, a prediction of "[team] in 7" is just a sportswriter's way of signaling "I think this team is slightly better than its opponent", which is all it should be taken as.


Mike said...

Do you have any idea if the actual distribution of 4, 5, 6 and 7 game series conforms to the idea that all the games are independent, or are 7 game series actually over-represented?

Javed said...

A very nice explanation. Helped me a lot!