03 October 2007

Home-field advantage and postseason baseball scheduling

Major League Baseball keeps playing around with the way it schedules the various playoff series.

Thing #1: during the season teams play every day. But during the first round of the playoffs, which starts in two hours with the Phillies and the Rockies, there are four five-game series. Each of these series includes two off-days, except for the Red Sox and Angels, who have three. Take a look at the schedule.

You'd think that one benefit of this would be that the TV stations televising the series never have to broadcast four games in one day. You'd be right. You'd think another benefit of that would be that teams could have their games scheduled at times that are not ridiculously inconvenient for their fans. You'd be wrong, at least if you're a Phillies fan; the five games in the first-round series are scheduled for Wednesday at 3, Thursday at 3, Saturday at 9:30, Sunday at 10, and Tuesday at 6:30. (The first two are weekdays.) The Cubs-Diamondbacks schedule isn't much better. The Yankees-Indians and Red Sox-Angels series get all the good time slots.

Anyway, a big deal is made of having home-field advantage in these series. (At least, people in places like Boston make a big deal out of it, because the Red Sox knew they had a playoff spot a week before the end of the season. The National League was a bit crazier this year; the Diamondbacks and Cubs weren't assured of playoff spots until late Friday night, the Phillies when the final pitch was thrown on Sunday, and the Rockies late Monday night.) Currently, in a five-game series the team with home-field advantage (which is the team with the better regular-season record, except it can't be the wild card) plays games 1, 2, and 5 at home, and games 3 and 4 on the road; this is referred to as a "2-2-1" format. In the past, a "2-3" format was used -- for example in the 1984 NLCS -- although I can't tell which team was considered to have home-field advantage. (edit: this article indicates that it was the team which started on the road and had three games at home. At that time, the divisions alternated home-field advantage from year to year, instead of basing the seeding on regular-season record, which makes life a lot easier on the people selling the tickets!)

So I ask the question: say that a team -- let's call them the Phillies -- has a probability p of winning a game played at a neutral site against their opponents, the Qockies. (The Qockies are related to the Qankees, who I wrote about here and here.) The probability of the Qockies winning a game is, of course, 1-p; we'll call this q.

Next, we'll say that the probability of the Phillies winning a game at home is p + ε, and on the road p - ε, for some constant ε. (Incidentally, ε is about 0.04 for most teams -- that is, home teams win about 54% of their games.) Then the probability of the Phillies winning a series in a 2-2-1 format, where they start at home, can be found by summing the probabilities of all the different ways in which they can win. For example, the probability that the Phillies win in three games is $(p+\varepsilon)^2(p-\varepsilon)$. The probability that they win game 1, lose game 2, win game 3, lose game 4, and win game 5 is $(p+\varepsilon)(q-\varepsilon)(p-\varepsilon)(q+\varepsilon)(p+\varepsilon)$. We get ten terms in p, q, and ε which correspond to the various ways in which a team can win and lose games in a five-game series and still win the series. These add up to

$$6p^5 + (-15+6\varepsilon)p^4 + (10 - 12\varepsilon)p^3 + 6\varepsilon p^2 + O(\varepsilon^2)$$

and so we conclude that a home-field advantage of ε in each game confers, to first order in ε, an advantage in the series of

$$6\varepsilon p^4 - 12\varepsilon p^3 + 6\varepsilon p^2$$

or $6\varepsilon(pq)^2$. If p = q = 1/2 this means that home-field advantage in a series is worth something like 3/8 of what it's worth in a single game; with ε = 0.04, an evenly matched team that wins 54% of its home games and 46% of its road games will win the series about 51.5% of the time. (The actual number there is 0.5150646144.)
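The sum over the ten winning sequences can be checked numerically. What follows is a minimal sketch, assuming independent games and the additive model above; the function name `series_win_probability` and the venue string `"HHAAH"` are illustrative choices, not anything from the original calculation.

```python
from itertools import combinations

VENUES_2_2_1 = "HHAAH"  # H = home game for the team with home-field advantage, A = away


def series_win_probability(p, eps, venues=VENUES_2_2_1):
    """Sum the probabilities of the ten ways to win a best-of-five series."""
    win = {"H": p + eps, "A": p - eps}  # single-game win probability by venue
    total = 0.0
    for length in (3, 4, 5):  # the series ends on the third win
        # The clinching win is the last game played; choose which two of the
        # earlier games were the other wins.
        for earlier_wins in combinations(range(length - 1), 2):
            prob = 1.0
            for game in range(length):
                w = win[venues[game]]
                won = game == length - 1 or game in earlier_wins
                prob *= w if won else 1.0 - w
            total += prob
    return total


p, eps = 0.5, 0.04
exact = series_win_probability(p, eps)
approx = 0.5 + 6 * eps * (p * (1 - p)) ** 2  # neutral-site value plus 6*eps*(pq)^2
print(f"exact  = {exact:.10f}")
print(f"approx = {approx:.10f}")
```

With p = 0.5 and ε = 0.04 this prints 0.5150646144 for the exact sum and 0.5150000000 for the first-order approximation.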

What about in the 2-3 format? A similar polynomial can be computed; it turns out to be exactly the same polynomial, if you assume that the team with home-field advantage starts on the road (and therefore gets three home games). So fooling around with the order of the games doesn't change the impact of the home-field advantage. Yet one might say that the team that starts on the road actually doesn't have the advantage if the series lasts only three games -- they have to play two on the road!
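A quick numerical comparison of the two formats, again only a sketch under the same assumptions (independent games, additive ε). The recursive helper and the venue strings `"HHAAH"` and `"AAHHH"` are hypothetical names introduced here for illustration.

```python
def series_win_probability(p, eps, venues):
    """Win probability for a series that stops once one side has a majority."""
    win = {"H": p + eps, "A": p - eps}  # H/A are from the favoured team's view
    needed = len(venues) // 2 + 1

    def play(game, wins, losses):
        if wins == needed:
            return 1.0
        if losses == needed:
            return 0.0
        w = win[venues[game]]
        return (w * play(game + 1, wins + 1, losses)
                + (1.0 - w) * play(game + 1, wins, losses + 1))

    return play(0, 0, 0)


eps = 0.04
for p in (0.4, 0.5, 0.6):
    a = series_win_probability(p, eps, "HHAAH")  # 2-2-1, opening at home
    b = series_win_probability(p, eps, "AAHHH")  # 2-3, opening on the road
    print(f"p={p}: 2-2-1 -> {a:.10f}, 2-3 -> {b:.10f}")

# The chance of a three-game sweep does differ between the formats,
# even though the chance of winning the series does not.
p = 0.5
print("sweep, 2-2-1:", (p + eps) ** 2 * (p - eps))
print("sweep, 2-3:  ", (p - eps) ** 2 * (p + eps))
```

The two series probabilities agree up to floating-point rounding, while the sweep probabilities do not, which is the tension the last sentence above points at.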

This seems surprising, but it's not. Think of a playoff series as an experiment to figure out which team is the better team. In order to do the experiment, you play five games, and whichever team wins more games is declared the better team. If the games are independent, then the order in which the games are played shouldn't affect the probability of a given team winning the series. In reality, this is exactly what is done -- except that we stop playing when one team wins three games, because at that point that team is guaranteed to end up with more wins no matter how the rest of the series would have turned out.
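To spell the argument out under the stated independence assumption: imagine all five games are played regardless, and let X be the number of Phillies wins in their three home games and Y the number in their two road games. (X and Y are names introduced here, not part of the original derivation.) Playing out the remaining games never changes the series winner, so

```latex
\[
P(\text{series win}) = P(X + Y \ge 3)
= \sum_{i + j \ge 3}
  \binom{3}{i}(p+\varepsilon)^{i}(q-\varepsilon)^{3-i}\,
  \binom{2}{j}(p-\varepsilon)^{j}(q+\varepsilon)^{2-j},
\qquad 0 \le i \le 3,\ 0 \le j \le 2.
\]
```

The right-hand side depends only on how many games are at home and how many are on the road, not on their order, which is why 2-2-1 and 2-3 give the same polynomial.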

I'm not going to weigh in on which format is preferable, because that's not a mathematical question. But since the two formats, 2-2-1 and 2-3, are mathematically equivalent, one probably wants to go with the format that appears most fair, which is probably 2-2-1 -- if the series ends after an odd number of games, the team with home-field advantage will have had one more home game than its opponent, and if it ends after four games each team will have had two. Furthermore, the team with home-field advantage opens the series at home, which seems fair.

But then how does one handle a seven-game series? The current format is 2-3-2, and you could argue that the team that opens at home -- and has four scheduled home games out of seven -- is hurt if the series ends in five games, since three of those five are on the road. (The same argument as to why they're not really hurt applies.) It wouldn't surprise me to see MLB go to a 2-2-1-1-1 format and introduce even more off days into the postseason -- did you know that Game 7 of the World Series this year is scheduled for November 1? Before you know it they'll be playing baseball on Thanksgiving.
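The same check extends to seven games. This is a sketch only, assuming independent games, the additive ε model from earlier, and the "play every scheduled game" equivalence argued above; `majority_probability` and the venue strings are hypothetical names introduced here. The first-order value 20ε(pq)^3 is obtained by the same reasoning as 6ε(pq)^2 (one extra home game, times the chance that the other six games split 3-3) and is not a figure from the original calculation.

```python
def majority_probability(p, eps, venues):
    """Chance of winning a majority of the games if every scheduled game is played."""
    dist = [1.0]  # dist[k] = probability of exactly k wins so far
    for venue in venues:
        w = p + eps if venue == "H" else p - eps
        new = [0.0] * (len(dist) + 1)
        for k, prob in enumerate(dist):
            new[k] += prob * (1.0 - w)  # lose this game
            new[k + 1] += prob * w      # win this game
        dist = new
    needed = len(venues) // 2 + 1
    return sum(dist[needed:])


p, eps = 0.5, 0.04
current = majority_probability(p, eps, "HHAAAHH")   # 2-3-2
proposed = majority_probability(p, eps, "HHAAHAH")  # 2-2-1-1-1
first_order = 0.5 + 20 * eps * (p * (1 - p)) ** 3

print(f"2-3-2:       {current:.6f}")
print(f"2-2-1-1-1:   {proposed:.6f}")
print(f"first order: {first_order:.6f}")
```

Both formats give the same series probability up to rounding, since each has four home games and three road games for the team that opens at home.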

(For the record, I credit the Phillies being in the playoffs to the colors of my blog. My blog's original layout was blue and orange, which are the Mets' colors although I didn't realize it at the time. Now it's red and white. As a result, Mets fans look like this guy.)

5 comments:

Anonymous said...

My thought is that it might make sense to model the probability of winning with the home field advantage as p (1 + epsilon), and as p (1 - epsilon) away, where epsilon perhaps averages .08.

You could still calculate with your model: let's call your epsilon e, where e := epsilon * p. So when you get a series homefield advantage result like 6e(pq)^2, this is 6 epsilon (p^2 - p^3).

In your original model, the series value of the homefield advantage is maximized for evenly matched teams. In this model, I believe the series value of the homefield advantage is maximized for a team with a 2/3 probability of winning.

I wonder if these two models can be distinguished by an empirical test?

Michael Lugo said...

Anonymous,

it probably wouldn't be too difficult to distinguish the two cases; there's enough data out there that I suspect one could tell whether home field advantage is bigger for unevenly matched teams than for evenly matched teams.

But as you pointed out, the two models are just related by a change of variables.

Anonymous said...

As any gambler knows, odds are nice; luck is better.

From Mike Lopresti's column about Philly fans:

"They may now return to their suffering, not to mention their booing.

They have always been considered a merciless bunch. Some of them would throw beer at a nun. But it has not been easy, their lot in life. And now that the Phillies have reached 10,000 defeats, something is needed to put such an unprecedented feat into proper context.

This will do: The Phillies would need 32 straight 100-win seasons to reach .500."

Anonymous said...

$6p^5 + (-15+6\varepsilon)p^4 + (10 - 12\varepsilon)p^3 + O(\varepsilon^2)$

You give this formula in your account, but I think there is a mistake. Where did q go in the multiplication?

Michael Lugo said...

quantumduck,

q = 1-p.