16 August 2007

Fibonacci win points

The weekend after next, I'm going to the National Baseball Hall of Fame. While poking around on the internet, I found some references to Whatever Happened to the Hall of Fame by Bill James. James is well known as one of the first people to apply statistical methods to baseball, though I must confess I've never read any of his work. I might get this book; if the Hall of Fame is at all competent, they'll have it at the gift shop. To be frank, from what I've heard, a lot of his work doesn't really have a sound mathematical basis, but it has inspired people to look at the numbers in order to judge players' performance, and a lot of people (like, say, the folks at Baseball Prospectus) have followed his lead.

Anyway, the Wikipedia article mentions various methods that James came up with to judge a player over the course of a career. These include the intriguingly named "Fibonacci win score", but the article doesn't explain how it is calculated. Naturally, I was curious. Google turned up this thread at Baseball Fever, which says that the Fibonacci win score for a pitcher is the number of career wins times the winning percentage, plus the number of "marginal wins" (i.e. wins minus losses). This is typical of James in that it doesn't make sense at first: why would you multiply wins by winning percentage?
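As a quick sketch of the definition quoted above, here it is in Python. The function name, the sample records, and the choice to return 0 for a pitcher with no decisions are my own assumptions, not anything from James.

```python
def fibonacci_win_score(wins: int, losses: int) -> float:
    """Career wins times winning percentage, plus wins minus losses."""
    decisions = wins + losses
    if decisions == 0:
        # Assumption: a pitcher with no decisions gets a score of 0.
        return 0.0
    winning_pct = wins / decisions
    return wins * winning_pct + (wins - losses)


# Hypothetical records, chosen only for illustration.
print(fibonacci_win_score(20, 10))  # strong record: score exceeds the 20 wins
print(fibonacci_win_score(10, 10))  # .500 record: score is half the wins
```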

It's called "Fibonacci" because of the answer to a natural question: how does a pitcher's Fibonacci win score compare to his actual number of wins? Say a pitcher's winning percentage is k, and he won W games in his career. Then he lost [(1-k)/k]W games, and his win score is kW + W - [(1-k)/k]W. For this to equal W, we divide through by W and get

k + 1 - (1-k)/k = 1

and this has one root with k between 0 and 1 (multiplying through by k gives k² + k - 1 = 0), namely k = (√5 - 1)/2, or about .618; this is the limit of the ratio between consecutive Fibonacci numbers, hence the name. A pitcher with a better winning percentage than this will have a higher win score than his actual number of wins; a pitcher with a worse record than this will have a lower win score than his actual number of wins. (The highest win score in history is 511, by Cy Young, who won 511 games and lost 316 in his career; indeed, Young's win percentage was .618.) The purpose of this statistic is to reward pitchers who pitched well and penalize pitchers who were just mediocre over very long careers.
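The break-even claim is easy to check numerically. This is just a sketch: the helper name is mine, and the 200-win records are made up to sit on either side of .618.

```python
import math


def fibonacci_win_score(wins, losses):
    """Wins times winning percentage, plus wins minus losses."""
    return wins * wins / (wins + losses) + (wins - losses)


# Break-even winning percentage: the root of k^2 + k - 1 = 0 in (0, 1).
k = (math.sqrt(5) - 1) / 2
residual = k * k + k - 1  # zero up to rounding error

# Ratio of consecutive Fibonacci numbers (smaller over larger) tends to k.
a, b = 1, 1
for _ in range(40):
    a, b = b, a + b
fib_ratio = a / b

# Cy Young's career record, as given in the post.
cy_young = fibonacci_win_score(511, 316)

# Hypothetical 200-game winners above and below the break-even point.
above = fibonacci_win_score(200, 100)  # winning percentage about .667
below = fibonacci_win_score(200, 200)  # winning percentage .500

print(k, fib_ratio, cy_young, above, below)
```

Young's score comes out slightly below 511 and rounds to it, which fits a winning percentage sitting almost exactly on the break-even value.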

The other question that comes to mind is -- if a pitcher wins a game, or loses a game, what does this do to his number of win points? Let f(W,L) denote the win score of a pitcher with W wins and L losses; then we have

f(W,L) = W (W/(W+L)) + W - L = (2W² - L²)/(W+L)

Incidentally, in this formula the numerator is negative exactly when L > √2·W, that is, for a pitcher whose winning percentage is less than 1/(1+√2), or about .414. If we differentiate this with respect to W and simplify, we see that

f_W(W,L) = 2 - [L/(W+L)]²

and thus an additional win gets a pitcher two win points, minus the square of his losing percentage. Similarly, we have

f_L(W,L) = -1 - [W/(W+L)]²

and so a loss costs a pitcher one win point, plus the square of his winning percentage. It almost seems meaningless to say that, because there's no a priori reason why this particular arrangement of variables should mean anything -- though at least it's dimensionally consistent, and has units of wins; there are a lot of random-looking combinations of statistics that don't even do that!
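To sanity-check the two derivatives, one can compare them with the actual change in win score from one more win or one more loss. The sketch below uses my own function names and Cy Young's record as the test case. The derivative is only a continuous approximation to a one-game change, so the two should agree closely, not exactly.

```python
def win_score(w, l):
    """Closed form of the win score: (2W^2 - L^2) / (W + L)."""
    return (2 * w * w - l * l) / (w + l)


def d_win(w, l):
    """Partial derivative in W: two, minus the squared losing percentage."""
    return 2 - (l / (w + l)) ** 2


def d_loss(w, l):
    """Partial derivative in L: minus one, minus the squared winning percentage."""
    return -1 - (w / (w + l)) ** 2


w, l = 511, 316  # Cy Young's record, as given in the post

# Actual change in score from one extra win or one extra loss.
gain_from_win = win_score(w + 1, l) - win_score(w, l)
cost_of_loss = win_score(w, l + 1) - win_score(w, l)

print(gain_from_win, d_win(w, l))
print(cost_of_loss, d_loss(w, l))

# The sign change near a .414 winning percentage, on made-up 100-decision records.
print(win_score(41, 59) < 0, win_score(42, 58) > 0)
```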

1 comment:

dfan said...

You are pretty much correct about James's mathematical sophistication. What he did have was an immense amount of common sense, and even though he didn't have the mathematical tools that sabermetricians 20 years later take for granted, it was always a pleasure to watch him dissect nonsensical arguments and cut to the heart of the issues he analyzed, largely because his arguments were so easy to follow and never required much statistical knowledge at all. Every once in a while you could see him struggle because he didn't quite have the tools to do what he wanted, but it was amazing what he did with effectively 8th grade math.