05 February 2011

Computing distance in the Facebook graph

Is there some nice algorithm for computing the distance between two nodes in a graph, that gracefully fails when they're far apart? I'm asking this prompted by this metafilter question on how to compute the distance between two individuals in the Facebook graph; it seems to me that if someone is at distance 5 versus 6 from me, for example, I don't really care which it is.

Dijkstra's algorithm, for example, runs in time proportional to the number of edges of the graph. (Well, O(E+V log V), but I'm guessing that log V is smaller than E/V.) This is the search algorithm I learned in an algorithms class once. And most algorithms that I've seen -- of course this isn't an area I know a huge amount about -- tend to run in time at least as large as the number of edges of the graph. (Of course this makes sense because one could have to actually look at all the edges of the graph -- such are the pitfalls of worst-case analysis.)

It seems like bidirectional search would work -- if my friends of friends overlap with your friends of friends, then we're at a distance of most 4, for example. (I didn't realize this had a name until just now; it just seems like an obvious idea to me.) But is there some reason I'm overlooking that this wouldn't be practical?


Suresh Venkatasubramanian said...

Actually a generalization of bidirectional search is quite common when dealing with large graphs. Fix a collection of "landmark" nodes, and determine for each node which its closest landmark is (by BFS say). Then compute the exact distance between pairs of landmarks (there are presumably only a few of them). To now compute the distance between two nodes takes three lookups: from each node to its closest landmark, and the landmark-landmark distance.

this tends to work fairly well in practice, especially with graphs that contain many tree-like tendrils and a highly connected core.

Michael Lugo said...

Thanks, Suresh! If I remember correctly I've heard that a similar method is used in generating road directions. Here the landmarks are intersections of major roads, so it's usually semi-obvious what the closest landmark is.