20 September 2007

Baseball entropy

I was thinking of putting together a prediction of the Phillies' odds of making the postseason -- it's late enough in the season that I can do the calculations exactly, if I'm willing to ignore the teams that plainly have no chance -- but the results would be depressing. (Although I did do a prediction of the Phillies' ten-thousandth loss, which came a couple days earlier than I predicted. That was depressing, too.) The good folks at Baseball Prospectus do a simulation where they run the rest of the season a million times; at this time of the season, with ten games left, the odds fluctuate wildly with each game.

I've also wondered if it would be possible to compute some sort of information entropy (the link goes to a nice intuitive explanation of what that means) from these postseason odds, and use that as a single number measuring how "close" the playoff races are at a given moment. For example, at basically any moment this season, the National League has been "closer" than the American League. Okay, by "wondered" I mean I thought "I don't really want to do the computations, because I'm lazy". The information entropy measures "how surprising" a random variable is. The entropy of a random variable which takes each of n values 1, ..., n with probabilities p1, ..., pn is

-(p1 log p1 + ... + pn log pn)

where we adopt the convention that 0 log 0 = 0 (or, equivalently, we say that all the probabilities involved are nonzero). For example, consider the winner of this year's National League pennant. If there are, say, three known playoff teams and a fourth team which is equally likely to be one of two teams, then, supposing each playoff team is equally likely to win the pennant, the probability that the pennant winner is any given one of the first three teams is 1/4, and either of the other two teams 1/8; then the entropy of that random variable is

3 (1/4 log 4) + 2 (1/8 log 8) = 9/4

(logs are base 2 here; this means that entropy is measured in bits). The entropy of a random variable which is equally likely to take any of n values is log n bits; thus there are in some sense 2^(9/4) = 4.756... contenders, in that if we could have a random variable which took 2^(9/4) values with equal probability, it would have the same entropy. This interpolates between four and five; there are five contenders but three of them are clearly stronger than the other two. As of right now the probabilities of each National League team making the playoffs, according to Baseball Prospectus, are

.9449017 (Mets), .8990023 (Diamondbacks), .8109878 (Padres), .7179337 (Cubs), .2938922 (Phillies), .2825803 (Brewers), .0310565 (Rockies), .0106489 (Dodgers), .0089891 (Braves), .0000075 (Cardinals), all other teams zero

I'll assume that each of these teams, if it should make the playoffs, has a one-in-four chance of winning the pennant; thus the entropy of the pennant winner is given by summing a bunch of terms which look like

-.9449017/4 log (.9449017/4)

and in the end we get 2.5312 bits, corresponding to 2^2.5312 = 5.780 contenders. This seems reasonable; there are basically six contending teams at this point.
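For anyone who would rather let a computer do the summing, here is a minimal Python sketch of that calculation. The numbers are the Baseball Prospectus figures quoted above; the names `playoff_odds` and `pennant_probs` are just labels I've picked for illustration, and the one-in-four step is the same simplifying assumption as in the text.

```python
import math

# Playoff probabilities quoted above (Baseball Prospectus, 20 September 2007).
playoff_odds = {
    "Mets": 0.9449017,
    "Diamondbacks": 0.8990023,
    "Padres": 0.8109878,
    "Cubs": 0.7179337,
    "Phillies": 0.2938922,
    "Brewers": 0.2825803,
    "Rockies": 0.0310565,
    "Dodgers": 0.0106489,
    "Braves": 0.0089891,
    "Cardinals": 0.0000075,
}

# Assumption from the post: a team that makes the playoffs has a
# one-in-four chance of winning the pennant.
pennant_probs = [p / 4 for p in playoff_odds.values()]

# Shannon entropy in bits; zero-probability teams are skipped (0 log 0 = 0).
entropy = -sum(q * math.log2(q) for q in pennant_probs if q > 0)

print(round(entropy, 4))       # about 2.53 bits
print(round(2 ** entropy, 3))  # a bit under six "effective" contenders
```

The ten playoff probabilities add up to 4 (there are four playoff spots), so after dividing by four the pennant probabilities add up to 1, as they should.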

The American League has four teams above 99 percent right now (Red Sox, Yankees, Indians, Angels), and the entropy of their pennant winner is 2.019 bits or 4.054 "effective" contenders.
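As a sanity check on both ends of the scale, here is a small Python sketch; the helper names `entropy_bits` and `effective_contenders` are my own, purely for illustration. It reproduces the toy five-team example from earlier (three teams at 1/4, two at 1/8), and shows that four teams each certain to make the playoffs would give exactly 2 bits, i.e. exactly four contenders. The American League's extra 0.019 bits comes from the long shots that haven't quite been eliminated.

```python
import math

def entropy_bits(probs):
    """Shannon entropy in bits, using the convention 0 log 0 = 0."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def effective_contenders(probs):
    """Number of equally likely outcomes with the same entropy: 2**H."""
    return 2 ** entropy_bits(probs)

# Toy example: three teams at 1/4 each, two teams at 1/8 each.
toy = [1/4, 1/4, 1/4, 1/8, 1/8]
print(entropy_bits(toy))          # 2.25, i.e. 9/4 bits
print(effective_contenders(toy))  # between four and five

# Four locked-in playoff teams, each with a 1/4 shot at the pennant.
locked = [1/4] * 4
print(entropy_bits(locked))          # 2.0
print(effective_contenders(locked))  # 4.0
```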

And this post was originally supposed to be about math humor in a baseball radio broadcast: Ryan Howard, with the bases empty, has fourteen home runs and fourteen RBIs so far this season.

I was just (well, a couple innings ago now) informed of this by the Phillies radio announcers; it appeared on the monitor that tells them the statistics.

They chuckled at this. I assume they were chuckling because it's trivial, as one of them said "well, how many RBIs was he supposed to have?" The only way one can bat in a run if the bases are empty is to hit a home run; furthermore that will bat in exactly one run.

(I assume that the monitor in question breaks down a player's statistics by the eight possible situations for who is on the bases; the other lines probably seem less silly. They won't show this same sort of thing, because when runners are on base it's possible to get them in without hitting a home run.)

1 comment:

Anonymous said...

You sound like a Brooklyn Dodger fan, always worried always fretting.

My problem with the Mets/Yankees is that neither are the Dodgers and I only half root for them.

Anyway I'll trade you Moto for a pack of gum if you promise to use him.