26 May 2011

The most well-read cities in the United States

From Berkeleyside: Berkeley is the third most well-read city in the US, according to amazon.com data.

This is among cities with population 100,000 or greater. Number 1 is Cambridge, Massachusetts (105K people); number 2 is Alexandria, Virginia (140K); number 3 is Berkeley, California (112K); number 4 is Ann Arbor, Michigan (114K); number 5 is Boulder, Colorado (100K). There are 275 cities of population greater than 100,000 in the US; Alexandria, the most populous of these five, is ranked 177.

My first thought upon seeing this is that these are all small cities, and of course you expect to see more extreme results in small cities than in large cities. Small cities are perhaps more likely to be homogeneous. (This seems especially likely to be true for small cities that are part of larger metropolitan areas.) Actually, my quick analysis of the top five doesn't hold up for the top twenty; the average population rank of the top twenty cities listed at amazon is 127.1, which is LOWER than (although not significantly different from) the 138.5 you'd expect if being on this top-twenty list were independent of size. In other words, the top twenty are, if anything, slightly larger than a typical city in this group. But it's certainly possible that, say, some 100,000-person section of the city of San Francisco actually has higher amazon.com sales than Berkeley. (There are a surprisingly large number of bookstores in the Mission.)
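Here is a rough sketch of the "not significantly different" check. It assumes the null model is a simple random sample of 20 ranks drawn without replacement from 1 to 275; the function name and the z-score framing are mine, not anything from the Amazon data. Note that with exactly 275 cities the expected mean rank comes out to 138 rather than 138.5 (which is what you would get with 276 cities), but the conclusion does not depend on that half-rank.

```python
import math

def mean_rank_null(n_cities, sample_size):
    """Mean and standard deviation of the average rank of a random sample
    of `sample_size` ranks drawn without replacement from 1..n_cities."""
    expected = (n_cities + 1) / 2
    # Population variance of 1..N is (N^2 - 1)/12; dividing by n and applying
    # the finite-population correction (N - n)/(N - 1) simplifies to this:
    variance = (n_cities + 1) * (n_cities - sample_size) / (12 * sample_size)
    return expected, math.sqrt(variance)

N_CITIES = 275        # cities with population over 100,000 (from the post)
TOP_LIST = 20         # size of the Amazon list
OBSERVED_MEAN = 127.1  # average population rank quoted in the post

expected, sd = mean_rank_null(N_CITIES, TOP_LIST)
z = (OBSERVED_MEAN - expected) / sd
print(f"expected mean rank {expected:.1f}, sd {sd:.1f}, z = {z:.2f}")
```

The observed average sits well within one standard deviation of the expected value under this assumed model, which is consistent with the claim that list membership looks independent of city size.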

Also, people in college towns tend to read a lot -- that's no surprise (although one does hear that students don't read a lot any more these days). Four of the top five (all but Alexandria) are college towns; also in the top 20 are Gainesville (Florida), Knoxville (Tennessee), and Columbia (South Carolina). And in case you're wondering, Alexandria is not named after the city in Egypt with the Great Library.

25 May 2011

xkcd, philosophy, and Wikipedia

If you hover the cursor over today's xkcd, you'll see the following:

Wikipedia trivia: if you take any article, click on the first link in the article text not in parentheses or italics, and then repeat, you will eventually end up at "Philosophy".

I first heard this a few days ago, but with "Philosophy" replaced by "Mathematics". Here's an example:

I clicked on "Random article" which took me to Billy Mercer (footballer born 1896). Following the instructions goes to England (Mercer was English), Country, Geography, Earth, Orbit, Physics, Natural science, Science, Knowledge, Fact, Verification, Formal verification, Mathematical proof, Mathematics.

(A few days ago "fact" went to "information"; the article starts "The word fact can refer to verified information" and someone made "verified" into a link recently. In that case the sequence is fact, information, sequence, mathematics.)

If you keep going you get "quantity", "property (philosophy)", "modern philosophy", "philosophy", "reason", "rationality", "mental exercise", "Alzheimer's disease", "dementia", "cognition", "thought", "consciousness", "mind", "panpsychism", and back to "philosophy".

("rationality" used to go to "philosophy", until someone edited it, leaving the note "Raised the period of the Philosophy article... it was ridiculously low." Of course once someone points out some property of Wikipedia, people will tamper with it.)

This doesn't seem to happen if you click on random links, or even second links. The basic reason seems to be a quirk of Wikipedia style -- the article for X often starts out "X is a Y" or "In the field of Y, X is..." or something like that, so there's a tendency for the first link in an article to point to something "more general". Does this mean that "mathematics" necessarily has to be the attractor? Of course not. But it does mean that the attractor, if it exists, will probably be some very broad article.
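To make the "attractor" idea concrete, here is a small sketch that follows first links until an article repeats. It does not talk to Wikipedia at all: the `first_link` map is hard-coded from the chain I followed above, so it is a snapshot of one afternoon's edits, and the function name `follow_first_links` is just my own label.

```python
def follow_first_links(start, first_link):
    """Follow first links from `start` until an article repeats or the map
    runs out. Returns (path, cycle); cycle is empty if no loop was found."""
    path = []
    position = {}  # article -> index at which it first appeared in path
    current = start
    while current in first_link and current not in position:
        position[current] = len(path)
        path.append(current)
        current = first_link[current]
    if current in position:
        # We came back to an article already visited: that is the loop.
        return path, path[position[current]:]
    path.append(current)  # dead end: no recorded first link for this article
    return path, []

# The chain from this post, ending where it returns to "Philosophy".
chain = [
    "Billy Mercer (footballer born 1896)", "England", "Country", "Geography",
    "Earth", "Orbit", "Physics", "Natural science", "Science", "Knowledge",
    "Fact", "Verification", "Formal verification", "Mathematical proof",
    "Mathematics", "Quantity", "Property (philosophy)", "Modern philosophy",
    "Philosophy", "Reason", "Rationality", "Mental exercise",
    "Alzheimer's disease", "Dementia", "Cognition", "Thought",
    "Consciousness", "Mind", "Panpsychism", "Philosophy",
]
first_link = dict(zip(chain, chain[1:]))

path, cycle = follow_first_links(chain[0], first_link)
print("steps before repeating:", len(path))
print("loop:", " -> ".join(cycle))
```

In this snapshot "Mathematics" is on the way into the loop but not part of it, which is one way of seeing why "Philosophy" rather than "Mathematics" ends up as the thing you keep returning to.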

Edited to add, Thursday, 10:26 am: Try the same thing at the French Wikipedia; it doesn't work. This seems to depend on certain conventions that English-language Wikipedians have adopted. However, it seems to work at the Spanish Wikipedia, with Filosofía as the target.

07 May 2011

Two no-hitters four days apart is not that rare

Justin Verlander just threw a no-hitter for the Detroit Tigers. On May 3rd, Francisco Liriano threw one for the Minnesota Twins.

There have only been 271 no-hitters in roughly 130 years of Major League Baseball, so two separated by four days seems unusual.

But two no-hitters within four days of each other has happened several times before. From Wikipedia, there have been two no-hitters within four days of each other on the following dates:

August 19 and 20, 1880
September 19 and 20, 1882
two on April 22, 1898
September 18 and 20, 1908
August 26 and 30, 1916
May 2, 5, and 6, 1917
September 4 and 7, 1923
June 11 and 15, 1938
June 26 and 30, 1962
September 17 and 18, 1968
September 26 and 29, 1983
June 1 and 2, 1990
two on June 29, 1990
September 4 and 8, 1993
May 11 and 14, 1996

Is this list surprisingly long? If you assume that baseball has been played 180 days a year for 130 years, then that's 23,400 days on which baseball has been played. There have been 271 no-hitters, so on an average baseball-playing day there are 0.01158 no-hitters. After any given no-hitter there's a four-day window in which the list I gave above could be added to. So you'd expect (271)(0.01158)(4), or about 12.6, pairs in that list. There are 17 pairs on the list. (I'm counting the 1917 triplet as three pairs. I'm not counting today's no-hitter.) So there doesn't seem to be particularly strong evidence for no-hitters somehow causing more no-hitters in their wake. (Although my model of the baseball schedule is, I admit, ridiculously crude. In particular I have ignored the fact that the number of teams isn't constant.)
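The arithmetic above can be sketched in a few lines. The dates are the ones in the list, the 180-day season and 130 seasons are the same crude assumptions as in the text, and the last step goes one further by treating the number of close pairs as roughly Poisson, which is my own simplifying assumption rather than anything established. The pair count also uses calendar days, not playing days.

```python
import math
from datetime import date
from itertools import combinations

NO_HITTERS = 271   # total no-hitters, from the post
SEASON_DAYS = 180  # assumed playing days per year
SEASONS = 130      # assumed number of seasons
WINDOW = 4         # days after a no-hitter in which a second one counts

playing_days = SEASON_DAYS * SEASONS
rate = NO_HITTERS / playing_days             # no-hitters per playing day
expected_pairs = NO_HITTERS * rate * WINDOW  # the post's back-of-envelope figure

# Every no-hitter date that appears in the list above (same-day pairs twice).
close_dates = [
    date(1880, 8, 19), date(1880, 8, 20), date(1882, 9, 19), date(1882, 9, 20),
    date(1898, 4, 22), date(1898, 4, 22), date(1908, 9, 18), date(1908, 9, 20),
    date(1916, 8, 26), date(1916, 8, 30),
    date(1917, 5, 2), date(1917, 5, 5), date(1917, 5, 6),
    date(1923, 9, 4), date(1923, 9, 7), date(1938, 6, 11), date(1938, 6, 15),
    date(1962, 6, 26), date(1962, 6, 30), date(1968, 9, 17), date(1968, 9, 18),
    date(1983, 9, 26), date(1983, 9, 29), date(1990, 6, 1), date(1990, 6, 2),
    date(1990, 6, 29), date(1990, 6, 29), date(1993, 9, 4), date(1993, 9, 8),
    date(1996, 5, 11), date(1996, 5, 14),
]

# Count unordered pairs of dates at most WINDOW calendar days apart.
observed_pairs = sum(
    1 for a, b in combinations(sorted(close_dates), 2) if (b - a).days <= WINDOW
)

def poisson_tail(k, lam):
    """P(X >= k) for X ~ Poisson(lam)."""
    return 1.0 - sum(math.exp(-lam) * lam**i / math.factorial(i) for i in range(k))

p_value = poisson_tail(observed_pairs, expected_pairs)
print(f"expected {expected_pairs:.2f} pairs, observed {observed_pairs}")
print(f"P(at least {observed_pairs} pairs) under the Poisson assumption: {p_value:.3f}")
```

Under these assumptions the tail probability comes out above the conventional 0.05 threshold, which matches the conclusion that 17 pairs is not strong evidence of no-hitters clustering.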