24 July 2007

predicting baseball attendance

Back in July I wondered where baseball attendance numbers come from, and pointed to J. C. Bradbury's post about the Harry Potter effect on baseball attendance; he came up with a quick-and-dirty estimate of the number of people who didn't go to baseball games because they were reading the new Harry Potter book. This got me thinking -- how could we know how many people we expect to go to any given baseball game?

This paper by Brian Stoner gives the results of a regression analysis which forecasts a team's average attendance over the course of a season as a function of average ticket price, the number of seats in the team's stadium, the previous year's attendance, the team's payroll in the current year, its number of wins in the previous year, its television market size, and whether or not it has a new stadium; these turn out to explain most (98%) of the variation over the sample period. For the rest of this post, let's assume that we can predict a team's average attendance; we know it'll be, say, 32,000 per game. (Equivalently, we can predict the attendance for the entire season, since the average is just the season total divided by the 81 home games.)
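To make the average-versus-total equivalence concrete, here is a minimal sketch of the conversion. The function names are made up for illustration, and it assumes all 81 home games are actually played (no rainouts left unplayed).

```python
HOME_GAMES = 81  # half of a 162-game regular season is played at home


def season_total(avg_per_game, home_games=HOME_GAMES):
    """Season attendance implied by a per-game average."""
    return avg_per_game * home_games


def average_per_game(total, home_games=HOME_GAMES):
    """Per-game average implied by a season total."""
    return total / home_games


def deviation(game_attendance, avg_per_game):
    """How far a single game sits above (+) or below (-) the average."""
    return game_attendance - avg_per_game


print(season_total(32_000))  # 2592000
```

So a forecast of 32,000 per game is the same statement as a forecast of about 2.59 million for the season; the interesting question is how the individual games scatter around that average.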

But what I was wondering about is not average attendance, but attendance at a specific game, or equivalently how much the attendance is expected to be above or below the average. If you look at the attendance figures for a given team, their deviation from the average attendance depends on other figures which vary from day to day. First, there are factors that probably affect any outdoor activity, not just baseball:

  • day of week -- weekend baseball is more popular than weekday baseball

  • time of day -- day games and night games are different, although which is more popular probably depends on day of week and the season

  • weather. Pre-purchased tickets probably depend on the average weather; for example, my family has a long-standing policy of not going to day games in June through August, because there's too much risk of the weather being unbearable. I suspect fans in colder-weather markets than Philadelphia avoid games in early April. (Philly can be cold in early April; I wore a winter coat to the Phillies' second game of the season this year.) After the many games canceled due to poor weather in April (including that Indians-Mariners series in Cleveland that got snowed out!), Gregory Goodrich, a meteorologist, examined the frequency of such "miserable baseball days". Walkup tickets, by contrast, probably depend quite heavily on the actual weather. Also, weather matters a lot less in a domed stadium; it should only hurt attendance there when it's so miserable that people don't even want to leave their homes.

I don't think these three variables could be treated independently; I'd probably prefer a day game in April (the nights can be very cold!) but a night game in July. And people probably care less about weekday versus weekend when school isn't in session.
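One simple way to avoid treating these variables independently is to skip the separate "April effect" and "night effect" entirely and instead average the deviations within each combination of month, weekend/weekday, and day/night. Here is a minimal sketch of that idea; the record layout (dicts with `month`, `weekend`, `night`, and `attendance` keys) is an assumption made up for illustration, and weather is left out because it would need outside data.

```python
from collections import defaultdict
from statistics import mean


def cell_effects(games):
    """Average deviation from the season mean for each
    (month, weekend, night) combination.

    `games` is a list of dicts with the hypothetical keys
    'month' (int), 'weekend' (bool), 'night' (bool), 'attendance' (int).
    """
    games = list(games)
    season_avg = mean(g["attendance"] for g in games)

    # Collect each game's deviation under its own combination of factors,
    # so an April night game and a July night game are never pooled.
    cells = defaultdict(list)
    for g in games:
        key = (g["month"], g["weekend"], g["night"])
        cells[key].append(g["attendance"] - season_avg)

    return season_avg, {key: mean(devs) for key, devs in cells.items()}
```

The price of this approach is that each combination needs enough games to give a stable average; with only 81 home games a season, one would probably have to pool several seasons or merge neighboring months.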

There are then the factors that are unique to sports:

  • quality of the visiting team. In general I suspect people want to see the home team play a good opponent more than a bad opponent. But at the same time people like to see the home team win. Perhaps people most want to see a visiting team with the same record as the home team, which maximizes the chances of a competitive game. (Note that I'm assuming that the quality of the home team is factored into the average attendance.)

  • identity of the visiting team. When two teams in the same division play each other, I expect the attendance is higher than usual. Similarly, when teams have a long-standing rivalry more people come to games. This is distinct from the "quality of the visiting team" -- Phillies fans are more likely to come down to the ballpark to see the Mets than the Brewers or Dodgers, who have similar records as of this writing. Giants-Dodgers, Red Sox-Yankees, Cubs-Cardinals, etc. games probably sell more tickets than other intradivisional games (Giants-Diamondbacks, Red Sox-Orioles, Cubs-Astros, etc.).

  • time of season. This is distinct from "weather" because while one might expect the same weather in May and September, the two are very different in the context of a baseball season; a game near the end of the season is usually seen as "more exciting", provided at least one of the two teams involved still has a chance of making the playoffs.

  • playoff importance. Games that are important in determining who ends up in the playoffs should have more attendance; for example, games where the two teams are first and second in their division or in a wild-card race.
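Most of the sports-specific factors above could be turned into numbers that a regression might use. The sketch below is only one possible encoding, with made-up field names (`name`, `division`, `wins`, `losses`) and a caller-supplied set of rivalries. It leaves out playoff importance, which would need the full standings.

```python
def win_pct(wins, losses):
    """Winning percentage; treat a team with no games played as .500."""
    played = wins + losses
    return wins / played if played else 0.5


def game_features(home, visitor, games_played, rivals=frozenset(),
                  season_length=162):
    """Encode the sports-specific factors for one game.

    `home` and `visitor` are dicts with the hypothetical keys
    'name', 'division', 'wins', 'losses'. `rivals` is a set of
    frozenset({team_a, team_b}) pairs the caller considers rivalries.
    """
    home_pct = win_pct(home["wins"], home["losses"])
    visitor_pct = win_pct(visitor["wins"], visitor["losses"])
    return {
        # quality of the visiting team
        "visitor_win_pct": visitor_pct,
        # how evenly matched the teams are (0 means identical records)
        "record_gap": abs(home_pct - visitor_pct),
        # identity of the visiting team
        "same_division": home["division"] == visitor["division"],
        "rivalry": frozenset((home["name"], visitor["name"])) in rivals,
        # time of season, from 0 (opening day) to 1 (final game)
        "season_fraction": games_played / season_length,
    }
```

Including both `visitor_win_pct` and `record_gap` lets the data decide between the two guesses above: that fans want a good opponent, or that they want an evenly matched one.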

I don't know if anyone's done this analysis (and if they have, it wouldn't surprise me if it's a baseball-team front office that isn't talking!) but I'd like to see the results.

