The following paper provides some interesting insights: <A HREF="sweet.ua.pt/~a29583/research/PhysicaA_final.pdf" REL="nofollow">Frequency of occurrence of numbers in the World Wide Web</A>

scaliger, your SSN has 10 digits? Dammit, I <I>knew</I> the gummint was cheating me!

The distribution is skewed in non-mathematical ways. 5-digit numbers are zipcodes, 7-digit numbers are phone numbers, 8-digit with leading zero are also phone numbers, 10-digit are phone numbers with area codes (or Social Security numbers or ISBNs), 13-digit are EANs, 16-digit are credit card numbers. If you allow leading zeroes, 9-digit "numbers" not in Google's database are common. That makes for a lumpy distribution.

So, I took the number in the original post that was listed as having 4 hits and entered it into Google search (with double quotes around the number). I expected it to have 5 hits (since the original post would account for the 5th hit).<BR/><BR/>And sure enough there were 5 hits.<BR/><BR/>:o)

yes, after I googled it I realized that the "DEF TUV TUV OPER OPER" form was more memorable. It's fortunate that they have a number such that all those strings are pronounceable; if their phone number had, say, 7s in it, people would be wondering "um, how do I pronounce 'PQRS'?"

michael, you might recognize that number in the encoding 'DEF TUV TUV OPER OPER', as seen on posters everywhere at MIT.<BR/><BR/>mark, that might be a script best run from within Google... niklas... nice. We used to have an upper bound....

Thanks for clarifying that. I think I follow your argument now.<BR/><BR/>I'm half tempted to write a script that performs a search for the smallest number but considering the sheer number of queries it would have to make I'm sure it would be against Google's terms of service.

Mark, I wasn't entirely clear. What I meant is that there should be a power law for the "expected" number of google hits for a certain number (where the expectation is taken over some ensemble of Internets) and then the fluctuations from that expectation are given by the Poisson distribution.
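
One way to make that argument concrete is sketched below. It assumes a power law for the expected hit count, E[hits for n] = scale * n^(-exponent), and Poisson fluctuations around that mean, so the chance that a given number has no hits at all is exp(-E). The names `expected_hits`, `poisson_pmf`, and `prob_no_hits`, and the values of `scale` and `exponent`, are made up for illustration and are not fitted to any Google data.

```python
import math


def expected_hits(n, scale=1e9, exponent=1.0):
    """Hypothetical power-law mean: scale * n**(-exponent).

    scale and exponent are illustrative assumptions, not fitted values.
    """
    return scale * n ** (-exponent)


def poisson_pmf(k, mean):
    """P(exactly k hits) under a Poisson distribution with the given mean > 0.

    Computed in log space so large means or counts do not overflow.
    """
    log_p = -mean + k * math.log(mean) - math.lgamma(k + 1)
    return math.exp(log_p)


def prob_no_hits(n, scale=1e9, exponent=1.0):
    """Probability that n has zero hits: the Poisson mass at k = 0."""
    return poisson_pmf(0, expected_hits(n, scale, exponent))


if __name__ == "__main__":
    for n in (10**7, 10**8, 10**9, 10**10):
        mean = expected_hits(n)
        print(f"n = {n}: expected hits {mean:g}, P(no hits) {prob_no_hits(n):.4f}")
```

Under these assumptions, numbers whose expected hit count is well above 1 almost never come up empty, while numbers past the point where the expectation drops below 1 are mostly unlisted. The smallest number with no hits would then sit somewhere around that crossover.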

Wouldn't it make more sense to fit some kind of power law distribution over the numbers rather than a Poisson distribution? <BR/><BR/>To quickly test this, I took the number 13033319 and repeatedly incremented the first digit to see if the resulting distribution fit with Benford's law. Here's what I got:<BR/><BR/>13033319: 157 <BR/><BR/>23033319: 81<BR/><BR/>33033319: 33<BR/><BR/>43033319: 39<BR/><BR/>53033319: 10<BR/><BR/>63033319: 46<BR/><BR/>73033319: 65<BR/><BR/>83033319: 23<BR/><BR/>93033319: 23<BR/><BR/>So not a great fit for Benford's law but it's suggestive.
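
For anyone who wants to check the comparison, here is a minimal sketch that puts the nine counts above next to the shares Benford's law predicts, P(d) = log10(1 + 1/d). The function names are mine, and treating nine variants of a single number as a Benford sample is itself a rough assumption, so read the output as suggestive at best.

```python
import math

# Hit counts quoted in the comment above, keyed by the leading digit used.
observed = {1: 157, 2: 81, 3: 33, 4: 39, 5: 10, 6: 46, 7: 65, 8: 23, 9: 23}


def benford_probability(digit):
    """Benford's law: P(d) = log10(1 + 1/d) for a leading digit d in 1..9."""
    return math.log10(1 + 1 / digit)


def compare_to_benford(counts):
    """Return (digit, observed share, Benford share) tuples in digit order."""
    total = sum(counts.values())
    return [(d, counts[d] / total, benford_probability(d)) for d in sorted(counts)]


def chi_square(counts):
    """Pearson chi-square statistic of the counts against Benford's law."""
    total = sum(counts.values())
    stat = 0.0
    for d, count in counts.items():
        expected = total * benford_probability(d)
        stat += (count - expected) ** 2 / expected
    return stat


if __name__ == "__main__":
    for digit, obs_share, ben_share in compare_to_benford(observed):
        print(f"{digit}: observed {obs_share:.3f}, Benford {ben_share:.3f}")
    print(f"chi-square statistic: {chi_square(observed):.1f}")
```

The statistic only summarizes how far the observed shares sit from the Benford shares; with a single base number it says little about the web as a whole.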

348494233 doesn't have any hits on Google; let's see how long that lasts.

plam, I assume your choice of number was not accidental, although I didn't recognize the last four digits at first. (And on a related note, I'd expect 617253xxxx to be seriously overrepresented. Although maybe not -- MIT people would perhaps be unlikely to put their phone numbers on their web pages. Nobody uses phones there.)<BR/><BR/>On a similar note, while playing around with this I noticed that numbers equal to or slightly less than 2008 tend to be overrepresented, because they often refer to years. (2008 is more common than 6, and twenty times as common as 2009.) And five-digit numbers have interpretations as US zip codes, so for example 90210 is much more common than 90209 or 90211.

Well, there may be a bunch of 8-digit numbers not in Google, but when you get to the 9-digit range, you'll pick up a lot of numbers of the form 617 253 8800. Some prefixes won't be represented, of course, but there are a lot more 9-digit numbers than there would otherwise be.

And if you knew exactly, you couldn't really say. (If you did, Google might pick it up...)<BR/><BR/>Jonathan