08 August 2008

A story of Google, Wikipedia, and languages I only sort of read

The Viquipèdia article "Matemàtiques" has a bunch of amusing pictures that are meant to be icons of different types of mathematics: a Rubik's cube for abstract algebra, a Koch snowflake for fractal geometry, the Lorenz attractor for chaos theory, dice for probability, an elliptic curve for number theory, and so on.

Some areas don't translate into pictures as well: for category theory they have a commutative diagram, for combinatorics the six permutations of [3], etc.

The picture is captioned "Representacions matemàtiques de diversos camps" ("Mathematical representations of various fields"). (The English version of the article does not currently have this picture; it has a picture of Euclid instead.)

Much of the article seems to be a straight translation of the English version, but I find myself focusing more on the pictures when reading the Catalan version, because I don't actually read Catalan. But I read French and, to a lesser extent, Spanish, so I can figure things out.

(As to why I'm looking at the Catalan Wikipedia -- well, I ended up there because Kowalski said he googled 1.70521 when it appeared in some of his work, and I wanted to see the results.)

Such a search finds Wikipedia articles, usually tables of mathematical constants, in Serbian, English, Esperanto, Catalan, Japanese, Thai, Czech, Turkish, Serbo-Croatian, and Bosnian. This illustrates both the universality of mathematics and the extent to which Wikipedia is an international enterprise. (My apologies if I misidentified any of these languages!)

But the first hit for 1.70521 on Google (as of this writing) was Kowalski's post, even though it's only twenty minutes old. That shows you how fast Google is at indexing. (By the time you read this, who knows? This post might be the first hit.)

11 comments:

Emmanuel Kowalski said...

Correct, your post is now first (at least for google.ch...)

.mau. said...

When I post to my blog, Google is automatically pinged. This may be why they index you at once.

unapologetic said...

Bah! Clearly this article should just read:

1, 1, 2, 5, 14, 42, 132, 429, 1430, 4862, 16796, 58786, 208012, 742900, 2674440, 9694845, 35357670, 129644790, 477638700, 1767263190, 6564120420, 24466267020, 91482563640, 343059613650, 1289904147324, 4861946401452, …

Isabel Lugo said...

John,

Richard Stanley makes a similar joke in the exercises of the second volume of Enumerative Combinatorics. After a large number of exercises illustrating the many contexts in which that sequence arises, there's an exercise that asks the reader to identify the following sequence: un, dos, tres, quatre, cinc, sis, set, vuit, nou, deu, ...

Unfortunately, Catalan the person was Belgian.
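
For anyone who wants to check the sequence quoted above, here is a minimal sketch that regenerates it from the closed form for the Catalan numbers, C(n) = binomial(2n, n) / (n + 1). The function name `catalan` is just an illustrative choice, not anything from the thread.

```python
from math import comb

def catalan(n: int) -> int:
    """Return the n-th Catalan number via the closed form C(2n, n) / (n + 1)."""
    # The division is always exact, so integer division is safe here.
    return comb(2 * n, n) // (n + 1)

# The comment above lists the terms for n = 0 through n = 25.
first_terms = [catalan(n) for n in range(26)]
print(", ".join(str(c) for c in first_terms))
```

The last term printed should match the final number in the list above, 4861946401452.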

Anonymous said...

It's not just the speed of indexing that matters. There's also the crawling, a necessary precursor to the indexing, and the ranking, a necessary precursor to showing the indexed page in the search results. Also, as I understand it, Google tries to distinguish between different kinds of content, crawling, indexing, and ranking things that are likely to be topical and timely in a separate pass from those that are less volatile. Thus things that create more volatile interest -- things like blog posts -- can momentarily shoot to the top of the search results, while things that are likely to have lower volatility to their importance or relevance will likely take a little longer to show up but then persist longer.

This requires the three different processes to work on different schedules for different kinds of content: spidering (or crawling) the web, indexing the results, and then ranking the pages. Of course, since Google owns so many sites now, the spidering has become easier. And some people ping Google automatically. Next there's indexing, which you referred to, but without the ranking the job's not done. That's done by approximating the principal eigenvector of the adjusted adjacency matrix of web pages.
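
To make the eigenvector remark concrete, here is a rough sketch of power iteration on a toy link graph. It is only an illustration of the general idea, not Google's actual implementation: the four-page graph, the function name `pagerank`, the iteration count, and the damping factor of 0.85 (the "adjustment" to the adjacency matrix) are all assumptions chosen for the example.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Approximate the principal eigenvector of a damped link matrix.

    `links` maps each page to the list of pages it links to; every page
    that appears as a target is assumed to also appear as a key.
    """
    pages = sorted(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}  # start from a uniform guess

    for _ in range(iterations):
        # Every page gets a small baseline share (the damping adjustment).
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page, outgoing in links.items():
            if outgoing:
                # A page splits its current rank evenly among its links.
                share = damping * rank[page] / len(outgoing)
                for target in outgoing:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its rank everywhere.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank

# Hypothetical four-page web: "c" is linked to by every other page.
toy_web = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(toy_web)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 4))
```

In this toy graph the heavily linked page "c" comes out on top, and "d", which nothing links to, ends up with only the baseline share.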

The indexing by search terms of interest (which need not appear on the page itself -- links to the page can cause the page to match a search term, leading to the famous phenomenon of "Google bombing") requires determining which words or phrases are most appropriate to the page, producing something like a table that can be used to determine the pertinence of each page to a particular search term. This pertinence score can be multiplied by a page's rank to determine a particular page's overall estimated significance to the search in question.
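
As a sketch of that last step, under the assumption that the combination really is a plain product, the following multiplies a per-term pertinence score by a page's rank and sorts the results. The table layout, the page names, and all the numbers are made up for illustration.

```python
def rank_results(term, pertinence_table, page_ranks):
    """Order pages for `term` by pertinence score times page rank."""
    scores = {
        page: pertinence * page_ranks.get(page, 0.0)
        for page, pertinence in pertinence_table.get(term, {}).items()
    }
    # Highest combined score first.
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical index: search term -> {page: pertinence to that term}.
pertinence_table = {
    "1.70521": {"blog_post": 0.9, "constants_table": 0.6, "unrelated": 0.05},
}
# Hypothetical, term-independent page ranks.
page_ranks = {"blog_post": 0.2, "constants_table": 0.5, "unrelated": 0.3}

results = rank_results("1.70521", pertinence_table, page_ranks)
print(results)
```

With these invented numbers the constants table outranks the blog post even though the post is more pertinent, because its page rank is higher.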

Google used to recalculate the page rank for most pages monthly, which was famously referred to as the monthly "Google Dance". The results during this period used to be somewhat chaotic, so the different pages returned appeared to be "dancing" up and down in significance. I believe, based only on my own experience, they have made this a smoother and more frequent process, even for regular, non-volatile pages.

matemàtiques said...

Italian would be a very good help in understanding Catalan. Much better than French or Spanish.

Anonymous said...

Greetings from one Catalan reader! Very nice blog.
