10 July 2007

most common line score?

Major League Baseball is doing what they're calling the scoreboard challenge for the All-Star Game tonight. They challenge people to guess the number of runs that will be scored in each half inning, as well as the final number of hits and errors for each team.
Ignoring the hits and errors for a moment: what do you think the most common set of scores in each half inning is?
My guess is that it's no score in each half inning except for a single run in the bottom of the first. Why? Well, in any randomly chosen half-inning, 0 is the most common number of runs scored. So I'd go for the all-zeroes line score -- but then the game wouldn't end! So I begrudgingly allow a single run to score.
Why does that run score in the bottom of the first, you ask? The home team is more likely to score than the visiting team, so it should be in the bottom of some inning. And in general, teams score more in the first inning than any other inning, because batting orders are set up so that the best hitters bat first, so I'll put the run in the first inning. (The second inning, by the way, is historically the lowest-scoring inning.)
This seems a bit counterintuitive, because if I wake up tomorrow (the game starts at eight and will probably last forever because All-Star games are full of substitutions, so I'm not sure if I'll stay awake for the whole thing) and see that the score was

American: 000 000 000
National: 100 000 00x

I'd surely say "oh my god, what happened?" On average a baseball team scores five runs or so, so something like the 2005 All-Star game line score

National: 000 000 212
American: 012 202 00x

seems a lot more likely. And in fact, the National League scoring two runs in each of two innings and one in a third inning, and the American League scoring two runs in each of three innings and one in a fourth inning, probably is more likely than the 1-0 game I invented -- but only because we didn't specify which innings.
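To put a rough number on "we didn't specify which innings", here is a minimal Python sketch that counts how many distinct line scores share the 2005 pattern. It assumes the visiting National League bats in nine innings and the home American League in eight (they lead, so the bottom of the ninth isn't played); the function name `arrangements` is just for illustration.

```python
from math import comb

def arrangements(innings_batted, two_run_innings, one_run_innings):
    """Count the ways to place the 2-run and 1-run innings; the rest are scoreless."""
    ways_for_twos = comb(innings_batted, two_run_innings)
    ways_for_ones = comb(innings_batted - two_run_innings, one_run_innings)
    return ways_for_twos * ways_for_ones

# National League: two 2-run innings and one 1-run inning out of 9 batted.
nl_ways = arrangements(9, 2, 1)
# American League: three 2-run innings and one 1-run inning out of 8 batted.
al_ways = arrangements(8, 3, 1)

total_ways = nl_ways * al_ways
print(nl_ways, al_ways, total_ways)  # 252 280 70560
```

So the unordered pattern lumps together tens of thousands of specific line scores, while the 1-0 game above is exactly one.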
The people who designed this contest seem to be aware of this, because the rules specify that they give people 5 points for each correct 0 or 1; 10 points for each correct 2 or 3; and 20 for each correct >= 4.
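As a sketch of how that scoring rule might work, here is a small Python example. It assumes a guess only earns points when it matches the actual number of runs exactly, that a wrong guess earns nothing, and that an unplayed half-inning (the "x") earns nothing; the contest rules may differ on all three. The names `points_for` and `score_line` are made up for illustration, and the "actual" result used is the 2005 line score from above.

```python
def points_for(guess, actual):
    """Points for one half-inning, assuming exact match or nothing."""
    if guess is None or actual is None:
        return 0  # half-inning not played (the 'x' in a line score)
    if guess != actual:
        return 0
    if actual <= 1:
        return 5   # correct 0 or 1
    if actual <= 3:
        return 10  # correct 2 or 3
    return 20      # correct 4 or more

def score_line(guess, actual):
    """Total points for one team's row of the line score."""
    return sum(points_for(g, a) for g, a in zip(guess, actual))

# My guessed line score: all zeroes except one run in the bottom of the first.
guess_top = [0] * 9
guess_bottom = [1] + [0] * 7 + [None]

# The 2005 All-Star line score, used here as a stand-in for the real result.
actual_top = [0, 0, 0, 0, 0, 0, 2, 1, 2]
actual_bottom = [0, 1, 2, 2, 0, 2, 0, 0, None]

total = score_line(guess_top, actual_top) + score_line(guess_bottom, actual_bottom)
print(total)  # 45
```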
Now, how often does a team not score at all in a half-inning? Looking at Sunday's games, 264 half-innings of Major League Baseball were played; 69 had at least one run. On Saturday, it was 67 out of 301. On Friday, 97 out of 290. (Friday was a high-scoring day.) That's 233 out of 855 for the weekend, or about 27%, so about 73% of half-innings have no runs. Similarly, there were 49 one-run half-innings on Friday, 32 on Saturday, and 33 on Sunday, so 114 out of 855, or 13%.
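The weekend tallies can be checked with a few lines of Python. This is only a sketch of the arithmetic; the variable names are mine, and the counts are the ones quoted above.

```python
# (half-innings with at least one run, half-innings played) for each day
scoring = {"Friday": (97, 290), "Saturday": (67, 301), "Sunday": (69, 264)}
# half-innings with exactly one run for each day
one_run = {"Friday": 49, "Saturday": 32, "Sunday": 33}

total_scoring = sum(runs for runs, _ in scoring.values())
total_half_innings = sum(played for _, played in scoring.values())
total_one_run = sum(one_run.values())

p_zero = 1 - total_scoring / total_half_innings  # share of scoreless half-innings
p_one = total_one_run / total_half_innings       # share of one-run half-innings

print(total_scoring, total_half_innings, total_one_run)  # 233 855 114
print(round(p_zero, 2), round(p_one, 2))                 # 0.73 0.13
```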
So if we assume that all innings are independent, then the probability of a given game having the line score I gave is (.73)^17 * (.13), or about 0.00062; there should be, on average, one game like this out of every 1,600 or so. There are 2,430 games played each season (162 games per team, times thirty teams, divided by two), so we expect one and a half games with that line score per season.
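Here is the same back-of-the-envelope calculation as a short Python sketch. It takes the rounded frequencies .73 and .13 as given and assumes every half-inning is independent, which is exactly the simplification made above; the constant names are mine.

```python
P_ZERO = 0.73  # chance a half-inning is scoreless (rounded weekend estimate)
P_ONE = 0.13   # chance a half-inning has exactly one run

# Seventeen scoreless half-innings and one one-run half-inning, in a fixed order.
p_line = P_ZERO**17 * P_ONE

games_per_season = 162 * 30 // 2  # each game involves two teams
expected_per_season = p_line * games_per_season

print(f"{p_line:.5f}")               # 0.00062
print(round(1 / p_line))             # roughly one game in this many
print(games_per_season)              # 2430
print(f"{expected_per_season:.1f}")  # 1.5
```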
Except the actual number should be a bit more, because:

  • the innings of a game aren't independent. In particular, most of them are pitched by the same pitcher

  • like I said before, if you're ever going to score, you're going to do it in the bottom of the first.

But if you actually go through the 38 1-0 games that occurred last season, the number of times the single run scored in each inning (extra innings included) was as follows:

visitors : 011 343 300 000 0
home team: 012 424 222 210 1

and there were no 1-0 games where the only run scored in the top or bottom of the first inning -- this sort of thing can happen with rare events. I'm kind of curious if this has ever occurred -- for the reasons I outlined above, I suspect it has -- but not curious enough to go digging back any further.
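A quick Python sketch can check the tally and give a feel for how unsurprising a season with no such game is. It assumes each digit in the table is a single-digit count for one inning, and it reuses the independence model and the rounded .73 and .13 from earlier, so the final number is only a rough guide. It also covers only the bottom-of-the-first version of the game.

```python
visitors = "011 343 300 000 0"
home = "012 424 222 210 1"

def per_inning(line):
    """Turn a spaced row of digits into a list of per-inning counts."""
    return [int(ch) for ch in line if ch.isdigit()]

visitor_counts = per_inning(visitors)
home_counts = per_inning(home)

total_games = sum(visitor_counts) + sum(home_counts)
first_inning_games = visitor_counts[0] + home_counts[0]
print(sum(visitor_counts), sum(home_counts), total_games)  # 15 23 38
print(first_inning_games)                                  # 0

# Chance that a whole season has no game with the exact line score above,
# treating every game as an independent trial with the same small probability.
p_line = 0.73**17 * 0.13
games_per_season = 2430
p_none = (1 - p_line) ** games_per_season
print(round(p_none, 2))  # 0.22
```

Under those assumptions, roughly one season in five would have no game with that line score at all.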
If you'd like to use what I've said to enter in the contest, it's here -- but hurry, the deadline is 7:59 PM Eastern.

edit, 10:04 pm: Four innings are done. The National League is up 1-0... and they scored that run in the bottom of the first.


frank said...

Two points about this post, which is interesting by the way.

First, in your probability, the home team doesn't bat in the bottom of the ninth, so it should be (.73)^16 * .13. This is not a huge difference, but bumps up the expected per season to about 2.

Another point is that you stated that a team is most likely to score in the first due to the lineups. I have not looked at the data, but this ignores the fact that pitchers get tired and often give up more baserunners as their pitch counts climb.
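For what it's worth, the effect of the first point (sixteen scoreless half-innings instead of seventeen) can be checked with a short Python sketch. It uses the post's rounded .73 and .13 and the same independence assumption; the helper name `expected_games` is hypothetical.

```python
P_ZERO = 0.73
P_ONE = 0.13
GAMES_PER_SEASON = 2430

def expected_games(scoreless_half_innings):
    """Expected games per season with one 1-run half-inning and the rest scoreless."""
    p_line = P_ZERO**scoreless_half_innings * P_ONE
    return p_line * GAMES_PER_SEASON

with_bottom_ninth = expected_games(17)     # the post's original exponent
without_bottom_ninth = expected_games(16)  # home team skips the bottom of the ninth

print(f"{with_bottom_ninth:.2f}")     # 1.50
print(f"{without_bottom_ninth:.2f}")  # 2.05
```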

Michael Lugo said...


You raise a good point about the home team not batting in the bottom of the ninth in this situation. I was working quickly and didn't notice that.

As for the inning with the most scoring: I'm looking at pages like this one, which give all sorts of data about how teams bat in certain situations. It seems that there are two peaks: one in the first inning, and then one around the sixth inning. (But I don't have aggregate data, just data for individual teams, and I'm not quite interested enough to start combining it.) It's true that there's often an inning sometime between, say, the fifth and the seventh where a lot of baserunners get on as the starting pitcher tires -- but we can't predict in advance what inning that will be.

frank said...

Upon further searching, I came across this. You can get all run distributions for a given year, but I can't figure out how to combine years. In most years, the distribution across innings seems to be bimodal, both for total runs and for 1-run innings. The first inning is often (but not always) the highest.