21 August 2007

information-theoretic entropy in the weather

Right now, in Philadelphia, it's sixty-one degrees, with a light rain.

I have long maintained that this is "generic Philadelphia weather". By this I do not mean that it's always like this here. What I mean is that if I for some reason do not know what season it is, and I head outside and find it is sixty-one degrees with a light rain, this gives me very little information, because this sort of weather can happen any time of year. Another characteristic of such a day is that the high and low temperatures are very close together, say within ten degrees of each other; it's been between 60 and 64 since midnight and probably won't get out of the sixties (in either direction) all day.

Looking at the data from the last year, June 14 was pretty close to this, although it was just overcast, not rainy; January 6 might look that way on paper, except I remember it clearly and it was actually a freakishly warm and sunny day. I wore shorts. It was January. We had the New Year's Day parade that day. November 8, 13, and 14 fit as well; also October 11 and a few other October days; September 1. (I remember the weather on September 1 quite well, because I moved that day. The rain was light for most of the day and got heavier about an hour after my movers were done getting everything in.) I'm deliberately being vague about what constitutes a day like this.

Not surprisingly, this sort of weather is most common in the spring and fall (mostly because I care about temperature) but it is possible in the winter or summer as well. And this gets me wondering -- in general, what is the information content of the weather? If it's 100 degrees and sunny, there might be a 2% chance that it's July 24; a 0.5% chance it's September 1; a 0.01% chance that it's November 1; and a one-in-a-million chance it's January 15. This sort of weather is very localized towards a certain time of year. One could imagine calculating the Shannon entropy corresponding to this distribution; it would be a lot smaller than the entropy you'd get from a similar distribution if you conditioned on sixty degrees and light rain.
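
To make the comparison concrete, here is a minimal sketch of that entropy calculation in Python. The two distributions are invented for illustration and are not fitted to any real weather data: `flat` stands in for weather that could fall on any day of the year, and `peaked` for weather confined to an assumed 50-day stretch of summer. The function name `shannon_entropy` is likewise just a label chosen here.

```python
import math

def shannon_entropy(weights):
    """Entropy in bits of a discrete distribution given as non-negative weights.

    The weights are normalized first; zero-weight entries contribute nothing.
    """
    total = sum(weights)
    return -sum((w / total) * math.log2(w / total) for w in weights if w > 0)

# Hypothetical distribution over the 365 days of the year, given "generic"
# weather: every date is equally plausible.
flat = [1.0] * 365

# Hypothetical distribution given "100 degrees and sunny": all of the
# probability sits on 50 summer days (an invented number).
peaked = [1.0] * 50 + [0.0] * 315

print(f"generic weather:   {shannon_entropy(flat):.2f} bits")
print(f"localized weather: {shannon_entropy(peaked):.2f} bits")
```

The uniform case gives log2(365), about 8.5 bits, which is the most uncertainty about the date that any weather could leave you with; the peaked case gives log2(50), about 5.6 bits. A realistic distribution for hot, sunny weather would be lumpier than this toy one, but the ordering is the point.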

Of course, in this formulation, the whole idea is kind of silly -- when am I not going to know what the date is? But looking at the information-theoretic entropy of weather seems like a potentially useful way to quantify how extreme the seasons in some place might be; it's possible but not normal to get the same weather in winter and summer in Philadelphia, say; routine in San Francisco; unheard of in Minneapolis. (I am picking these places without looking at actual statistics, so I might be wrong.) Why one would want to quantify that, though, I'm not sure.
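
One way to turn this into a single number per place, sketched here under my own assumptions, is the conditional entropy of the season given the weather: average the uncertainty about the season over all the kinds of weather you might see. The counts below are made up for three hypothetical places with just two seasons; they are not statistics for any real city, and `conditional_entropy` is an illustrative helper, not a library function.

```python
import math
from collections import defaultdict

def conditional_entropy(joint_counts):
    """H(season | weather) in bits, from a dict {(season, weather): day_count}."""
    total = sum(joint_counts.values())
    # Marginal count of days for each kind of weather.
    by_weather = defaultdict(float)
    for (_season, weather), n in joint_counts.items():
        by_weather[weather] += n
    h = 0.0
    for (_season, weather), n in joint_counts.items():
        if n > 0:
            # p(season, weather) * log2 p(season | weather)
            h -= (n / total) * math.log2(n / by_weather[weather])
    return h

# Invented toy counts of days, two seasons of 50 days each.
same_all_year = {("winter", "mild"): 50, ("summer", "mild"): 50}
some_overlap = {("winter", "cold"): 40, ("winter", "mild"): 10,
                ("summer", "mild"): 10, ("summer", "hot"): 40}
no_overlap = {("winter", "cold"): 50, ("summer", "hot"): 50}

for name, counts in [("same all year", same_all_year),
                     ("some overlap", some_overlap),
                     ("no overlap", no_overlap)]:
    print(f"{name}: {conditional_entropy(counts):.2f} bits")
```

With these toy numbers the place with identical weather year-round scores 1 bit (the weather tells you nothing about which of two seasons it is), the place with no overlap scores 0 bits, and the in-between place scores 0.2 bits. Lower values mean more sharply distinguished seasons.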

2 comments:

John Armstrong said...

I'd read "generic Philadelphia weather" as "It's not always like this, but it's always like this in any open neighborhood of now."

Ian Varley said...

I think the potential value in this might be to reduce weather reporting to the minimal number of bits required - if the information content is low (it's 60 degrees and partly cloudy, again!), do I really give a crap? No, I just want to be notified insofar as the weather deviates from what I'd expect (i.e. an unusually hot day, a rainy day in a dry season or vice versa - all based on local norms).

Also reminds me of a funny line from Waiting For Guffman: "when you step into that [crop circle] it is always 67 degrees with a 40 percent chance of rain - always"