27 September 2007

Probabilities in the wild-card race

A quick question: what's the probability that the four teams from a given league of Major League Baseball which make it to the playoffs are actually the best four teams in that league?

For those who aren't familiar with MLB's playoff structure -- there are two leagues, the National and the American. The National League has sixteen teams, divided into divisions of five, six, and five; the American League has fourteen teams, divided into divisions of five, five, and four. (If you're wondering why they don't have fifteen teams in each league, it's for scheduling reasons. Baseball teams play nearly every day, and don't play teams in the other league except during certain prescribed times of the season; if there were an odd number of teams in a league then one team would always be idle.) To avoid dealing with a zillion special cases, what follows will actually be a computation for a fifteen-team league, divided into three five-team divisions.

In each league the team with the best win-loss record in each division makes the playoffs; furthermore the best team among those that didn't win their division also makes the playoffs, as the "wild card", for a total of four teams in each league.

I'll make the simplifying assumption that a team's record accurately reflects its skill. This is quite an assumption. First, a team that is exactly average should win half its games, or 81 games out of a 162-game season; but if the games are independent, the standard deviation of the number of games it wins is ((162)(0.5)(0.5))^(1/2), or about 6.4. Baseball teams tend to be tightly bunched enough that that's a pretty big measurement error. Second, teams face different schedules, mostly because they play about 18 games against each team in their division and about 6 games against teams not in their division but in the same league. Third, we'll assume that good teams have no propensity to be in a particular division.
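The 6.4 figure is just the standard deviation of a binomial count, and it's easy to reproduce. A minimal sketch, assuming (as above) independent games and an exactly average team; the variable names are mine:

```python
import math

games = 162   # length of a regular season
p_win = 0.5   # an exactly average team wins half its games

# Standard deviation of a binomial(games, p_win) win total
sd_wins = math.sqrt(games * p_win * (1 - p_win))
print(round(sd_wins, 1))  # 6.4
```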

Thus, we can model the problem as follows: partition the set {1, 2, ..., 15} into three blocks of five, uniformly at random. (1 represents the best team, 15 the worst.) What is the probability that two of the numbers {1, 2, 3, 4} are in one block and each of the remaining two is in a different block?

For example, consider the 2001 American League. (Why? There were no ties in the 2001 AL. I don't want to have to figure out how to resolve ties.) The best team was the Seattle Mariners, with 116 wins, in the West Division; we put a 1 in a box corresponding to that division. The second-best team was the Oakland A's, also in the West Division; we put a 2 in the same box as the 1. The third-best team was the New York Yankees, in the East; we put a 3 in a different box. The fourth-best team was the Cleveland Indians, in the Central; we put a 4 in the box which is still empty. And so on; in the end we get the partition

East = {3, 7, 8, 13, 14}
Central = {4, 5, 6, 11, 12}
West = {1, 2, 9, 10}

and this is the sort of event we're looking for. (Note that "the best four teams make the playoffs" is different from "the wild card comes in fourth"; in the 2001 AL the wild card was the A's, who came in second.)
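The model is simple enough to simulate directly, which gives a sanity check on any exact answer. Here is a rough sketch, assuming the idealized fifteen-team league with three five-team divisions; the function names, the seed, and the number of trials are arbitrary choices of mine. It uses the fact that the best four teams all qualify exactly when every division contains at least one of them.

```python
import random

def best_four_qualify(rng, teams_per_division=5):
    """Randomly split the ranks into three divisions and test the event."""
    ranks = list(range(1, 3 * teams_per_division + 1))
    rng.shuffle(ranks)
    divisions = [
        ranks[i * teams_per_division:(i + 1) * teams_per_division]
        for i in range(3)
    ]
    # The top four all make the playoffs exactly when each division
    # holds at least one of them (a 2-1-1 split among the divisions).
    return all(any(r <= 4 for r in d) for d in divisions)

def estimate(trials=100_000, seed=2007):
    """Monte Carlo estimate of the probability that the best four qualify."""
    rng = random.Random(seed)
    hits = sum(best_four_qualify(rng) for _ in range(trials))
    return hits / trials

print(estimate())
```

With this many trials the estimate should be good to about two decimal places, which is enough to catch an arithmetic slip in a hand computation.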

Anyway, the probability we seek is not hard to compute. First, place the first team in some division; they have a 1/3 chance of being in any division. Call the division they land in A.

Then the second team also must be in some division. There are 14 "slots" left, four of them in division A. So with probability 4/14 the best two teams are in the same division, A; with probability 10/14 one is in A and the other is in B. (The 2001 AL is of the first type.)

Now, we place the third team.

Say teams 1, 2 were both in A; then with probability 3/13 team 3 is also in A (and we lose) and with probability 10/13 team 3 is in some other division, called B. So with probability (4/14)(10/13), teams 1 and 2 are in A and team 3 is in B.

Say team 1 is in A and team 2 is in B, which occurs with probability 10/14; then with probability 8/13 team 3 is in a division that already has a team in it (A or B) and with probability 5/13 it is not.

So after placing three teams, the probability is (4/14)(10/13) + (10/14)(8/13) = 120/182, or nearly two-thirds, that two are in the same division and one is in a different one. The probability is (10/14)(5/13) = 50/182 that all three are in different divisions.

Finally, if we're in the first of those cases, then we need to place the fourth team in the division which contains none of the first three; there are five slots for that team out of twelve, so that contributes (120/182)(5/12) = 600/2184 to the probability. If we're in the second case, we can place the fourth team freely, contributing (50/182)(12/12) = 600/2184. So the probability that the four best teams are the three division winners and the wild card is 1200/2184, or about 55 percent.
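The case analysis above can be replayed with exact rational arithmetic, which guards against slips when adding fractions by hand. This is only a transcription of the steps just described; the variable names are my own.

```python
from fractions import Fraction as F

# Teams 1 and 2: same division (A), or different divisions (A and B)
same_12 = F(4, 14)
diff_12 = F(10, 14)

# After team 3: two in one division and one in another ...
two_one = same_12 * F(10, 13) + diff_12 * F(8, 13)
# ... or all three in different divisions
all_apart = diff_12 * F(5, 13)

# Team 4 must land in the empty division in the first case,
# and can go anywhere in the second.
p_best_four = two_one * F(5, 12) + all_apart * F(12, 12)

print(two_one, all_apart, p_best_four)  # 60/91 25/91 50/91
```

Fraction prints in lowest terms, so 120/182 shows up as 60/91, 50/182 as 25/91, and 1200/2184 as 50/91.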

The probability that the three division winners are the best teams and the wild card is fourth-best is exactly half the probability that the three division winners and the wild card are collectively the four best teams, which is not necessarily something you would have expected.
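For anyone who doesn't trust the factor of one-half, it can be confirmed by brute force. The sketch below, under the same fifteen-team assumption, enumerates every ordered placement of teams 1 through 4 into distinct slots; the slot numbering (0-4, 5-9, 10-14 for the three divisions) is just a convenience of mine.

```python
from fractions import Fraction
from itertools import permutations

SLOTS = range(15)  # slots 0-4, 5-9, 10-14 are the three divisions

def division(slot):
    return slot // 5

total = best_four = winners_are_top3 = 0
# placement[i] is the slot assigned to team i + 1
for placement in permutations(SLOTS, 4):
    total += 1
    divs = [division(s) for s in placement]
    if len(set(divs)) == 3:              # every division holds a top-four team
        best_four += 1
        if len(set(divs[:3])) == 3:      # teams 1, 2, 3 each win a division
            winners_are_top3 += 1

print(Fraction(best_four, total), Fraction(winners_are_top3, total))  # 50/91 25/91
```

When teams 1, 2 and 3 each win a division, team 4 is automatically the best of the rest, so the inner count is exactly the event described above.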

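In a league with three n-team divisions, as n goes to infinity, the corresponding probability is lower; it approaches 4/9, or about 44 percent. (In the limit each team lands in a division independently and uniformly, and 36 of the 81 ways to assign four teams to three divisions use all three divisions.)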
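The same case analysis goes through with n in place of 5, so the n-team version can be computed exactly. A sketch, with the caveat that the function below is my own generalization of the steps above rather than anything worked out in the text; it appears to simplify to 4n^2/((3n-1)(3n-2)), which the loop checks for small n.

```python
from fractions import Fraction

def p_best_four(n):
    """Exact probability for three n-team divisions (n >= 2)."""
    slots = 3 * n
    same_12 = Fraction(n - 1, slots - 1)   # team 2 joins team 1's division
    diff_12 = Fraction(2 * n, slots - 1)   # team 2 lands elsewhere
    two_one = (same_12 * Fraction(2 * n, slots - 2)
               + diff_12 * Fraction(2 * (n - 1), slots - 2))
    all_apart = diff_12 * Fraction(n, slots - 2)
    # Team 4 must fill the empty division in the two_one case
    return two_one * Fraction(n, slots - 3) + all_apart

# Check the conjectured closed form for small n
for n in range(2, 30):
    assert p_best_four(n) == Fraction(4 * n * n, (3 * n - 1) * (3 * n - 2))

for n in (5, 50, 5000):
    print(n, float(p_best_four(n)))
```

The printed values drift down from the five-team answer toward 4/9 as n grows.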

Since the introduction of the wild card, there have been twelve full seasons played (1995-2006) in each league, for a total of twenty-four league-years. In fifteen of those league-years, the four teams with the best records made the playoffs. In two (1996 NL and 1998 AL), there was a tie for fourth-best in the league, and the three best teams and one of the two fourth-best teams made the playoffs. In the other seven, something else happened. So roughly two-thirds of the time in reality (15 of 24, or 17 of 24 if the two tie seasons are counted), the four teams with the best records have made the playoffs; that differs from my prediction of 55 percent, but not by much.

What happens "in general"? I don't know. This analysis was all about counting up different cases, which is something I wouldn't want to do in general (I only did it because this particular case was one I was interested in). Finding probabilities like this in general will take a bit more cleverness.
