22 August 2007

Texas 30, Baltimore 3.

While at the Phillies game tonight (the Phillies lost, 15-3, to the Dodgers), I saw on one of the out-of-town scoreboards a score of Texas 26, Baltimore 3.

I figured it had to be an error, as did the other people in the stands who noticed it. Twenty-six runs? (At that moment, the other out-of-town scoreboard reported the score in that game as 16 to 3, because it can't show scores above 19. Those don't happen too often in baseball.)

In the end, Texas won that game -- the first game of a doubleheader, no less -- 30 to 3. MLB.com has an account of the game. It's the most runs scored by one team in a game since the Chicago Colts (now Cubs) beat the Louisville Colonels (who haven't existed since 1899) by a score of 36 to 7, on June 29, 1897. Before this game, no team in "modern" baseball (by convention, since 1901) had scored more than 29. Thirty runs is also an American League record. There have been eighteen 25-run games in modern baseball.

You'd think that a team scoring thirty runs would be likely to score in every inning, or at least in more innings than not. But Texas only scored in four innings -- 5 runs in the fourth, 9 in the sixth, 10 in the eighth and 6 in the ninth.

This reminded me of a record I once came across: teams which have scored in every inning of a nine-inning game. This is rarer than you'd think -- only seven times in major league history, all in the National League. (Note that it's almost always the visiting team that sets this particular record; a home team which has scored in the first eight innings will almost certainly not come up in the ninth.) Those seven teams won by scores of 19-8, 26-12, 20-10, 36-7 (the same 36-7 game as above), 22-8, 15-2, and 13-6. The losing teams in these efforts put up an average of 7.6 runs a game -- quite a bit higher than the average of four or five runs -- which lends a bit of credence to the idea that the two teams' scores in a game aren't independent.
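The 7.6 figure is simply the mean of the seven losing scores. A quick Python check (the list is just the seven final scores above, winner first):

```python
# Final scores (winner, loser) of the seven games in which a team
# scored in all nine innings, as listed above.
games = [(19, 8), (26, 12), (20, 10), (36, 7), (22, 8), (15, 2), (13, 6)]

loser_runs = [loser for _, loser in games]
average = sum(loser_runs) / len(loser_runs)  # 53 / 7
print(f"losing teams averaged {average:.2f} runs")  # prints: losing teams averaged 7.57 runs
```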

And here's a mathematical question: if a team scores 30 runs, what is the probability that it does so while scoring in four or fewer innings? We'll assume the team is the visiting team, and thus has all nine innings to work in, and that the numbers of runs it scores in different innings are independent. The most recent data I could find on run distributions is from John Jarvis' collection, for the 2005 American League, so I'll use that. The distribution of the number of runs scored in a single inning, for the 2005 AL, is as follows:
Runs scored in inning    Number of times    Probability
 0                       14458              .714
 1                        3071              .152
 2                        1442              .0713
 3                         676              .0334
 4                         325              .0161
 5                         150              .00741
 6                          78              .00385
 7                          22              .00109
 8                           4              .000198
 9                           4              .000198
10                           6              .000297


Incidentally, under the assumption of independence, the probability of a team scoring in all nine innings of a game in the 2005 AL is $(1 - .714)^9$, or about one in 79,000. There are currently 2,430 games in a season (162 per team, times thirty teams, divided by two), and in practice only the visiting team in each game gets a real shot at this, so we expect one such game every thirty years or so (79,000 / 2,430 is about 33).
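A short sketch of the arithmetic in the previous paragraph: it computes the per-inning probabilities from the counts in the table, then the chance of scoring in all nine innings. Like the paragraph, it assumes independent innings and counts one opportunity per game (the visiting team); the variable names are just illustrative.

```python
# Innings with 0, 1, ..., 10 runs scored, 2005 American League (from the table above).
counts = [14458, 3071, 1442, 676, 325, 150, 78, 22, 4, 4, 6]
total = sum(counts)  # 20236 innings

probs = [c / total for c in counts]
p_scoreless = probs[0]  # about .714

# Chance of scoring in each of nine independent innings.
p_all_nine = (1 - p_scoreless) ** 9
odds = 1 / p_all_nine  # the "one in ..." form

# Assumption: one real opportunity per game (the visiting team).
games_per_season = 162 * 30 // 2  # 2,430
years_between = odds / games_per_season

print(f"p(scoreless inning) = {p_scoreless:.3f}")
print(f"one in {odds:,.0f}; roughly every {years_between:.0f} seasons")
```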

We can thus take the probability generating function corresponding to this distribution:

$$f(z) = \frac{14458 + 3071z + 1442z^2 + 676z^3 + 325z^4 + 150z^5 + 78z^6 + 22z^7 + 4z^8 + 4z^9 + 6z^{10}}{20236}.$$
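One sanity check on this generating function, sketched with exact fractions: its coefficients must sum to one, so f(1) = 1, and its derivative at z = 1 is the mean number of runs per inning. Here that works out to 10783/20236, about .533 runs per inning or about 4.8 runs over nine innings, in line with the "four or five runs" average mentioned earlier. The helper names are hypothetical.

```python
from fractions import Fraction

# Coefficients of f(z): innings with 0..10 runs, 2005 AL, over 20236 innings.
COUNTS = [14458, 3071, 1442, 676, 325, 150, 78, 22, 4, 4, 6]
TOTAL = sum(COUNTS)
f_coeffs = [Fraction(c, TOTAL) for c in COUNTS]

def f(z):
    """Evaluate the generating function f(z) = sum over n of P(n runs) * z^n."""
    return sum(a * z**n for n, a in enumerate(f_coeffs))

def f_prime_at_1():
    """f'(1) = sum over n of n * P(n runs): the expected runs in one inning."""
    return sum(n * a for n, a in enumerate(f_coeffs))

print(f(1))  # prints: 1
mean_inning = f_prime_at_1()
print(float(mean_inning), float(9 * mean_inning))
```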

The probability generating function for the total number of runs scored in two independent innings is $f(z)^2$. If we first remove the constant term, setting $g(z) = f(z) - p$ where $p$ is the probability of a scoreless inning, then the coefficient of $z^n$ in $g(z)^2$ is the probability that a team scores in both of two independent innings and totals $n$ runs between them. A similar interpretation holds for higher powers. From this, we can easily find the probability that a team scores in each of the first $k$ innings of a game and in none of the remaining $9-k$, for a total of thirty runs; it's

$$[z^{30}]\,(f(z) - p)^k \, p^{9-k}$$

where p = 14458/20236 is the probability of not scoring in a given inning. The probability that a team scores thirty runs while scoring in exactly k of its nine innings is this times the binomial coefficient C(9,k), the number of ways to pick the innings in which the scoring happens. Call this P(k). Then we get

k    10^10 P(k)    normalized P(k)
3         2.91         .00055
4        74.85         .01409
5       614.58         .11572
6      1670.55         .31455
7      1888.56         .35560
8       908.64         .17108
9       150.82         .02840


The normalized probabilities are just P(k) divided by the sum of the P(k) values; thus the normalized entry for k is the probability that a team which scores thirty runs does so while scoring in exactly k innings. Rather surprisingly, to me, a team which scores thirty runs is most likely to score in seven of the nine innings, which is also the median of the distribution; innings with no score are so common that at least one happens in all but three percent of thirty-run games. To answer the original question, the probability that a team scoring thirty runs does so while scoring in four or fewer innings is .00055 + .01409, or about 1.5%. (Assuming, of course, that innings are independent and run distributions remain like the 2005 American League indefinitely.) And you can't even begin to suspect I'm wrong until, oh, a couple hundred thirty-run games have happened without this occurring again.
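The whole calculation fits in a few lines. The sketch below uses Python with exact integer and Fraction arithmetic. It builds g(z) = f(z) - p as a list of inning counts with the zero-run term removed, raises it to the k-th power by repeated polynomial multiplication, and reads off the coefficient of z^30. It then applies the C(9, k) and p^(9-k) factors and normalizes. It assumes, as above, independent innings and the 2005 AL distribution; the function names are hypothetical.

```python
from fractions import Fraction
from math import comb

# Innings with 0..10 runs, 2005 American League.
COUNTS = [14458, 3071, 1442, 676, 325, 150, 78, 22, 4, 4, 6]
TOTAL = sum(COUNTS)  # 20236
INNINGS = 9

def poly_mul(a, b):
    """Multiply two polynomials stored as coefficient lists (index = power of z)."""
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out

def coeff(poly, k, n):
    """Return [z^n] poly(z)^k, using exact integer coefficients."""
    power = [1]
    for _ in range(k):
        power = poly_mul(power, poly)
    return power[n] if n < len(power) else 0

def scoring_innings_probs(runs):
    """P(k): probability of scoring `runs` in total while scoring in exactly k of 9 innings."""
    g = [0] + COUNTS[1:]            # TOTAL * (f(z) - p): scoring innings only
    p = Fraction(COUNTS[0], TOTAL)  # probability of a scoreless inning
    return {
        k: comb(INNINGS, k) * Fraction(coeff(g, k, runs), TOTAL**k) * p**(INNINGS - k)
        for k in range(INNINGS + 1)
    }

P = scoring_innings_probs(30)
norm = sum(P.values())
for k in range(3, INNINGS + 1):
    print(k, f"{float(P[k]) * 1e10:8.2f}", f"{float(P[k] / norm):.5f}")
print("four or fewer innings:", float((P[3] + P[4]) / norm))
```

Since no inning in the data has more than 10 runs, P(0), P(1) and P(2) come out as exactly zero, which is why the table starts at k = 3.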

(Complete data of this form: a team scoring one run is of course most likely to score in one inning. A team scoring two or three runs is most likely to have scored in two innings; 4-7 runs, three innings; 8-12 runs, four innings; 13-19 runs, five innings; 20-28 runs, six innings; 29-40 runs, seven innings; 41-57 runs, eight innings; 58 or more runs, nine innings. I suspect at least the first few of these numbers could be checked against actual data.)
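The list above comes from repeating the same calculation for each run total and keeping the k with the largest P(k). Here is a self-contained sketch under the same assumptions (independent innings, 2005 AL distribution, all nine innings available). Since the common factor 20236^9 doesn't affect which k wins, it compares exact integers. Under this model no inning exceeds 10 runs, so totals above 90 are impossible. The names are illustrative.

```python
from math import comb

# Innings with 0..10 runs, 2005 American League.
COUNTS = [14458, 3071, 1442, 676, 325, 150, 78, 22, 4, 4, 6]
INNINGS = 9
MAX_RUNS = INNINGS * (len(COUNTS) - 1)  # 90: at most 10 runs per inning in this data

def poly_mul(a, b):
    """Multiply two polynomials stored as coefficient lists."""
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out

g = [0] + COUNTS[1:]  # scoring innings only
powers = [[1]]        # powers[k] = coefficients of (20236 * g(z))^k
for _ in range(INNINGS):
    powers.append(poly_mul(powers[-1], g))

def weight(k, n):
    """20236^9 * P(n runs, scoring in exactly k innings); same scale for every k."""
    coeffs = powers[k]
    c = coeffs[n] if n < len(coeffs) else 0
    return comb(INNINGS, k) * c * COUNTS[0] ** (INNINGS - k)

def most_likely_innings(n):
    """Number of scoring innings that maximizes P(k) for a total of n runs."""
    return max(range(1, INNINGS + 1), key=lambda k: weight(k, n))

# Group consecutive run totals that share the same most likely k.
ranges = []
for n in range(1, MAX_RUNS + 1):
    k = most_likely_innings(n)
    if ranges and ranges[-1][2] == k:
        ranges[-1][1] = n
    else:
        ranges.append([n, n, k])
for lo, hi, k in ranges:
    print(f"{lo}-{hi} runs: {k} innings")
```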

2 comments:

frank said...

This was truly a remarkable game. I could not stop laughing when I realized they still had a second game to play.

I was also amazed to see how quickly the game was played: 3 hours and 21 minutes.

You can take the runs scored numbers and reverse the question: given that a team scores in exactly 4 innings, the probability of scoring 30 runs is .000000048. The most likely result is 6 runs, with a probability of 0.176.
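Under the post's assumptions (independent innings, 2005 AL distribution), the reverse question only needs the scoring-inning counts. Given exactly four scoring innings, the total is the sum of four independent draws from the run distribution of a scoring inning. A minimal sketch, with names chosen for illustration:

```python
from fractions import Fraction

# Innings with 0..10 runs, 2005 American League.
COUNTS = [14458, 3071, 1442, 676, 325, 150, 78, 22, 4, 4, 6]
K = 4  # condition on scoring in exactly four innings

def poly_mul(a, b):
    """Multiply two polynomials stored as coefficient lists."""
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out

scoring = sum(COUNTS[1:])  # 5778 innings with at least one run
q = [0] + COUNTS[1:]       # run counts for scoring innings only

dist = [1]
for _ in range(K):
    dist = poly_mul(dist, q)  # counts for the total over K scoring innings

denom = scoring ** K
p30 = Fraction(dist[30], denom)
mode = max(range(len(dist)), key=lambda n: dist[n])
p_mode = Fraction(dist[mode], denom)
print(float(p30), mode, float(p_mode))
```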

michael cassidy said...

'The losing teams in these efforts put up an average of 7.6 runs a game -- quite a bit higher than the average of four or five runs -- which lends a bit of credence to the idea that the two teams' scores in a game aren't independent.'

What would you attribute the co-dependence to?