17 June 2011

X-Y correspondence

Google hits for "Pascal-Fermat correspondence": 3,230. For "Fermat-Pascal correspondence": 206.

Does anyone have any idea why? In particular it seems like one would sort either on importance (but which of these two is more important?) or alphabetically (but that gives the wrong result).

And given two individuals X and Y, how can we predict whether "X-Y correspondence" or "Y-X correspondence" is more common? I'm sure there are no hard and fast rules here, but there must at least be some trends. I would expect a situation similar to how gender is assigned to words in languages that have grammatical gender (all of which are foreign to me).

1 comment:

AA said...

I am not sure how "importance" would be quantified (there are metrics based on indegree, for example, but context has to be taken into account as well), but I doubt that Google's results are sorted alphabetically.

I would say that Google's results are sorted according to "popularity". So "Pascal-Fermat Correspondence" as a string is much more frequent than "Fermat-Pascal Correspondence" in the body of web pages that Google is indexing.
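The comparison of string frequencies can be tried on any collection of texts, not only on a search index. The following is a minimal sketch, assuming the pages are already available as plain strings; the function name `ordering_counts` and the sample documents are made up for illustration and are not real data.

```python
import re


def ordering_counts(documents, x, y):
    """Count how often "x-y correspondence" and "y-x correspondence"
    occur (case-insensitively) across a list of document strings."""

    def count(first, second):
        # Allow optional spaces around the hyphen, e.g. "Pascal - Fermat".
        pattern = re.compile(
            re.escape(first) + r"\s*-\s*" + re.escape(second) + r"\s+correspondence",
            re.IGNORECASE,
        )
        return sum(len(pattern.findall(doc)) for doc in documents)

    return {f"{x}-{y}": count(x, y), f"{y}-{x}": count(y, x)}


# Hypothetical sample pages, invented purely for illustration.
sample_pages = [
    "The Pascal-Fermat correspondence is often cited.",
    "See the pascal-fermat correspondence for details.",
    "The Fermat-Pascal correspondence is famous.",
]
counts = ordering_counts(sample_pages, "Pascal", "Fermat")
print(counts)
```

Counts like these only describe the corpus they were taken from, so they say which ordering is more frequent, not why.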

As for the string itself, its frequency probably reflects effects of influence. For example, an influential source may have published an article on the subject, and the information eventually diffused into related pages that reproduced the adopted terminology.

To predict these kinds of associations we would need some data (or metadata): either the raw data, or a model for the data (an ontology), so that we can experiment at the reasoner level (and resolve the likelihoods for any two given words, not just names, taking into account their context)...

For the particular case of scientists, though, "Fermat" and "Pascal" are scientists who have produced Fx and Px articles, respectively. We could establish a metric of similarity between the text of the "Communications" and the texts of any scientific work by each of these scientists from BEFORE those communications. A higher similarity would mean that, from the given evidence, scientist X was likely to be working on subject P BEFORE scientist Y. So it is more likely that this scientist was the first to initiate the communication, which would provide a reason for labeling it the "X-Y Communication".
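As a rough illustration of such a similarity metric, here is a minimal sketch using cosine similarity over word counts. This is only one possible choice of metric, picked for simplicity. The function names and the toy texts are assumptions made up for the example; a real test would need the actual letters and earlier writings.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lowercase the text and count its alphabetic words."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine similarity between two word-count vectors (0.0 if either is empty)."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def likely_initiator(correspondence, prior_work):
    """Return the name whose prior work is most similar to the correspondence,
    together with all scores. `prior_work` maps a name to the text of that
    person's work written BEFORE the correspondence."""
    target = bag_of_words(correspondence)
    scores = {
        name: cosine_similarity(target, bag_of_words(text))
        for name, text in prior_work.items()
    }
    return max(scores, key=scores.get), scores


# Toy texts, invented for illustration only (not real historical writings).
letters = "probability of dice games and the division of stakes"
earlier = {
    "X": "dice games and probability of stakes",
    "Y": "tangents to curves and number theory",
}
name, scores = likely_initiator(letters, earlier)
print(name, scores)
```

A higher score only suggests topical overlap with the earlier work. Whether that really indicates who initiated the exchange is the hypothesis being tested, not something the metric establishes.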

Perhaps, though, there are additional rules to be taken into account, like the "rhythm" of the language. X-Y could sound more "pleasant" in a particular language than Y-X. It would flow more easily in the middle of a sentence (although maybe this is a weaker rule).