20 June 2007

The Phillies defeat the Qankees in nine games?

Scott Boras is pushing for a nine-game World Series, with the first two games to be played at a neutral site. This gives me a reason to post the following, which I wrote a couple of weeks ago, about how good the World Series is at identifying the "best" team. In this regard, a best-of-nine World Series wouldn't be much of an improvement over a best-of-seven one. Let's say one of the two teams in the World Series has a 55% chance of beating the other in any given game; that team would have about a 61% chance of winning a best-of-seven series, and about a 62% chance of winning a best-of-nine. A longer series would bring in more money. It would also lengthen the season by, say, another three days (two playing days and a travel day), and as many people pointed out in April, the season is already too long for an outdoor sport. Game 7 of this year's World Series, if it happens, will happen on Thursday, November 1. Do we really want November baseball to be a routine event?

(Also, Boras likens these first two, neutral-site games to the Super Bowl. Which makes me wonder -- has the Super Bowl ever been played at a site that wasn't actually neutral, because it just happened that the host team did well that year?)

Anyway, on to the math.

In the World Series of baseball, two teams P and Q play against each other until one has won four games; this team is declared the champion. If team P has probability p of winning each individual game, and the games are independent, what is the probability that team P wins the series?

(From here on out, I will call the two teams the Phillies and the Qankees. "Phillies" because, well, I actually want them to win; Qankees because Qankees is fun to say. It's pronounced quan-keys, like how a little kid would say "cranky".)

This problem is often posed in introductory probability texts (in fact, a couple days ago I showed a student I'm tutoring how to do it), and the solution those texts have in mind runs something like this. If the Phillies win, they do it in four, five, six, or seven games. The probability of them winning in four games is clearly p⁴. The probability of them winning in five games is 4p⁴q, where q = 1-p. This is because there are four ways for the Phillies to win in five games -- we just pick which of the first four games they lose. Similarly, the probability of the Phillies winning in six is 10p⁴q² -- the number "10" is the number of ways we can pick two games out of the first five for them to lose. And they win in seven with probability 20p⁴q³.

Thus, the total probability of the Phillies winning is p⁴(1 + 4q + 10q² + 20q³). If we remember that q = 1-p, we get that this is p⁴(35 - 84p + 70p² - 20p³). Plugging in numbers, we see, for example, that if the Phillies have a 55% chance of winning each individual game, they have about a 60.8% chance of winning the entire series, and that the Qankees would then be throwing temper tantrums because they didn't get their twenty-seventh championship. (How you would know they have a 55% chance of winning each game is a different story.)
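The counting argument above can be sketched in a few lines of Python. This is only an illustration under the same assumptions as the text (independent games, a fixed per-game probability p); the function name `series_win_probability` and its `wins_needed` parameter are made up for this example.

```python
from math import comb

def series_win_probability(p: float, wins_needed: int = 4) -> float:
    """Chance that a team with per-game win probability p is the first
    to reach wins_needed wins, assuming independent games."""
    q = 1.0 - p
    # The team clinches after losing k games, k = 0 .. wins_needed - 1.
    # The last game is a win, so the k losses are chosen from the
    # first wins_needed - 1 + k games.
    return sum(
        comb(wins_needed - 1 + k, k) * p**wins_needed * q**k
        for k in range(wins_needed)
    )

# Best-of-seven and best-of-nine at a 55% per-game chance.
print(round(series_win_probability(0.55, 4), 3))
print(round(series_win_probability(0.55, 5), 3))
```

With `wins_needed=4` the sum is exactly the four terms for winning in four, five, six, or seven games.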

But if you're designing a system of playoffs, this polynomial isn't that interesting. It seems to me that you can basically sum up the whole problem in a single number. You can view the playoffs as a probabilistic algorithm for determining which team is better. (They're a pretty weak probabilistic algorithm, at least when the teams are fairly evenly matched.) The natural question to ask seems to be: if the Phillies have a probability 1/2 + ε of winning a single game, what's the probability of them winning the series? (This turns out to be a sneaky way to compute the derivative of the above polynomial at p = 1/2.) So we let p = 1/2 + ε in the above computation, and -- here's the trick -- we act as if ε² = 0. Then the Phillies' probability of winning is

(1/2 + ε)⁴ (1 + 4(1/2 - ε) + 10(1/2 - ε)² + 20(1/2 - ε)³)

which simplifies to 1/2 + (35/16)ε. I'll call this coefficient 35/16 the amplification of this playoff system.
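For anyone who wants to check that simplification mechanically, here is one possible sketch of the "act as if ε² = 0" trick, using exact fractions. The `Dual` class and the `series_expansion` function are hypothetical helpers invented for this illustration, not anything from a library.

```python
from fractions import Fraction
from math import comb

class Dual:
    """A number of the form a + b*eps, where eps**2 is treated as 0.
    (Hypothetical helper for this sketch.)"""
    def __init__(self, a, b=0):
        self.a = Fraction(a)
        self.b = Fraction(b)

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.a + other.a, self.b + other.b)
    __radd__ = __add__

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        # (a + b eps)(c + d eps) = ac + (ad + bc) eps; the eps**2 term is dropped.
        return Dual(self.a * other.a, self.a * other.b + self.b * other.a)
    __rmul__ = __mul__

    def __pow__(self, n):
        result = Dual(1)
        for _ in range(n):
            result = result * self
        return result

def series_expansion(wins_needed):
    """Series win probability at p = 1/2 + eps, kept to first order in eps."""
    p = Dual(Fraction(1, 2), 1)    # 1/2 + eps
    q = Dual(Fraction(1, 2), -1)   # 1/2 - eps
    total = Dual(0)
    for k in range(wins_needed):
        total = total + comb(wins_needed - 1 + k, k) * (p ** wins_needed) * (q ** k)
    return total

result = series_expansion(4)
print(result.a, result.b)  # constant term and coefficient of eps
```

The constant term `result.a` is the series win probability at p = 1/2, and `result.b` is the coefficient of ε, which is the amplification.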

Doing the same computation for different series lengths gives:




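| Number of wins needed | 2 | 3 | 4 | 5 | 6 | 8 | 12 | 16 |
|---|---|---|---|---|---|---|---|---|
| Amplification | 3/2 | 15/8 | 35/16 | 315/128 | 693/256 | 6435/2048 | | |
| (as decimal) | 1.50 | 1.88 | 2.19 | 2.46 | 2.71 | 3.14 | 3.87 | 4.48 |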


The amplification grows like the square root of the number of wins needed. I'd bet that for long series it approaches 2/√π times the square root of the number of wins -- the constant seems to work out (for 16 wins, 2/√π · √16 ≈ 4.51, against 4.48 in the table), and π seems to occur a lot in these types of problems, because in the end we have to compute factorials, and Stirling's approximation, which is a very good approximation to factorials, includes π. (A confession: the numerical work actually comes from computing the probabilities in the first way given above, because I have a fast computer at my disposal and didn't want to figure out how to program the second solution.)
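One way to test the guess that the amplification behaves like 2/√π times the square root of the number of wins needed is to compute it exactly and look at the ratio. The sketch below differentiates the win-probability sum term by term at p = 1/2; the function name `amplification` is made up for this example, and this is not necessarily how the numbers in the table were produced.

```python
from fractions import Fraction
from math import comb, pi, sqrt

def amplification(n):
    """Exact derivative at p = 1/2 of the probability of being first to n wins."""
    half = Fraction(1, 2)
    # d/dp of C(n-1+k, k) * p**n * q**k with q = 1 - p, evaluated at p = q = 1/2,
    # is C(n-1+k, k) * (n - k) * (1/2)**(n + k - 1).
    return sum(
        comb(n - 1 + k, k) * (n - k) * half ** (n + k - 1)
        for k in range(n)
    )

for n in (2, 3, 4, 5, 6, 8, 12, 16):
    exact = amplification(n)
    guess = 2 / sqrt(pi) * sqrt(n)
    # Columns: wins needed, exact fraction, decimal, guess, ratio exact/guess.
    print(n, exact, round(float(exact), 2), round(guess, 2), round(float(exact) / guess, 3))
```

If the guess is right, the ratio in the last column should creep toward 1 as the number of wins grows.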

What does this tell us, then? It tells us that to get the amplification to be twice as good, we have to play four times as many games! This is something that occurs pretty often -- political opinion polling, for example, follows the same principle. To get the "margin of error" down from the standard 3% to 1.5% requires polling four thousand people instead of one thousand.
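The polling comparison can be checked with the usual textbook formula for a 95% margin of error on a proportion near 50%, namely 1.96·√(0.25/n). That formula, and the function name `margin_of_error`, are assumptions brought in for this sketch rather than anything stated above.

```python
from math import sqrt

def margin_of_error(sample_size, z=1.96):
    """Approximate 95% margin of error for a proportion near 50%
    (worst case p = 1/2, simple random sample assumed)."""
    return z * sqrt(0.25 / sample_size)

for n in (1000, 4000):
    print(n, round(100 * margin_of_error(n), 2), "percent")
```

Quadrupling the sample halves the margin, the same square-root behavior as the amplification.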

A principle that occurs fairly often in randomized algorithm design is that if you have an algorithm that gives you a correct answer with probability greater than 1/2, then you can run the algorithm repeatedly and be more confident in its results. This is actually the principle behind playing a series of games instead of a single game, but what if we played a series of series? Or a tennis match, with its multi-tiered structure (point, game, set, match)? I'll look at this in a future post.
