God Plays Dice: networks

Showing posts with label networks. Show all posts

18 November 2008

Navigability of complex networks

Here's an interesting idea, which falls in the very large set of "things I thought about but have no idea how to implement": M. Boguna, D. Krioukov, kc claffy. Navigability of Complex Networks, arXiv:0709.0303v2. (Published in the November 16, 2008 issue of Nature Physics; I learned about it from physorg.com.) The basic idea is that it is possible to navigate in complex networks by navigating in some underlying hidden metric space. Nodes are close together in the hidden metric space if they are similar to each other in some abstract sense, and they are more likely to be connected to each other in the network if they are closer together.

To take an example where the metric space isn't hidden, one navigates between airports by thinking about distances on the surface of the Earth; if you route greedily, by at each airport taking the flight which gets you closest to where you want to go, you usually will end up at your destination. Of course, this isn't a priori true -- you could get stuck in an infinite loop -- but the point is that real networks seem to obey this.

This is something that I think a lot of people have an intuitive sense of. For example, there are people I've met that when I meet them, I want to ask "why haven't we met before?" -- some combination of shared geography and shared interests makes it seem like we should know each other. (In my own life, there is at least one case of somebody who I never met when I was an undergrad, though there were plenty of amusing stories about mutual friends of ours that we'd both heard; we met shortly after I moved back to Philadelphia for grad school.)

27 December 2007

A couple articles from the Princeton Companion to Mathematics

Differential forms and integration (10 pp.), from the forthcoming Princeton Companion to Mathematics. (From Terence Tao, who has also written other articles for the PCM. I think it would have been rather nice to receive such an article at the outset of the class in which I learned about this stuff, as it would have helped to give some context; although one can often get context from a textbook, it's difficult to do so because all the little explanatory bits are often mingled with the details. The PCM articles I've seen do not suffer from this flaw; indeed, one might describe a lot of them as what the preface of a textbook covering the same material should be.

In a less theoretical, and less general, vein, here's Frank Kelly's The Mathematics of Traffic in Networks, also from the PCM, which I learned about from a post from Tao last week. One thing this mentions is the paradox that under certain conditions, adding roads to a road network can make trips slower, even though each driver tries to minimize their own time spent on the road. (Fortunately, my life is arranged so that I walk almost everywhere; thus at the moment I only appreciate the weirdness of traffic on a theoretical level. But I know it won't always be like that.) I'd be interested to see if this paradox occurs in actual road networks or only in the contrived example that Kelly gives.

The PCM will sell for about $100, so I probably won't buy it when it comes out. But I suspect I will spend some time browsing it when our library gets it.

30 August 2007

profiles of interconnection networks

Flajolet and Sedgewick's Analytic Combinatorics (link goes to 12 MB PDF of the book, which is still in progress) is a most interesting book. I took a course based on the first three chapters of the book last fall; the purpose of the class was basically to teach us how to translate combinatorial structures into generating functions, in a fairly routine way. Unfortunately, we didn't get to the part of the text where we could then tell things about the asymptotics of those combinatorial structures, which is a tremendously useful thing to do.

So I've been working through parts of the book lately. In example V.8 Flajolet and sedgewick talk about a problem considered by Lagarias, Odlyzko and Zagier in their paper On The Capacity of Disjointly Shared Networks (although not quite in the same language). Flajolet and sedgewick ask: "There are 2n points on a line, with n point-to-point connections between pairs of points. What is the probable behavior of the width of such an interconnection network?" The width is defined in the following way: imagine the connections as circular arcs between the points on a horizontal line; let a vertical line sweep from left to right; then the width is the maximum number of arcs encountered by such a line.

For example, if we have n = 6 (and thus 12 points), we might make the connections 9-12, 5-10, 1-2, 6-8, 3-11, 4-7; as the line sweeps between 6 and 7 there are four open connections (and never are there more), so the width is 4. The "profile" of this network is the sequence given by the number of open connections just after the line sweeps past 0, past 1, past 2, and so on (call these a₀, a₁, a₂, ...); this is the sequence

0, 1, 0, 1, 2, 3, 4, 3, 2, 3, 2, 1, 0

and in general this sequence increases to somewhere around n/2 and then decreases. Flajolet and sedgewick says that Louchard proved this, but I haven't looked at that paper. What surprised me, when I saw this problem, is that the profile actually looks something like a parabola, as you can see from the simulations at the top of p. 311 in Flajolet and sedgewick. Louchard proved that the profile "conforms asymptotically to a deterministic parabola 2nx(1-x)... to which are superimposed random fluctuations of O(√n)". So, for example, in such a network with one thousand nodes (n = 500), the profile goes as high as 250 (when x = 1/2) before going back down to zero.

Why should this be?

Imagine that you're putting together a random interconnection network. The profile sequence clearly has the property that each member a_k is one more or one less than the last one. It'll be one more if k is the first member of one of the pairs, and one less if it's the second member. What's the probability that k is the first (i. e. smaller) member of its pair? If k is small, it'll be large; if k is near 2n then it'll be small. In fact, if we know k is a member of some pair, the other member of the pair is equally likely to be any integer from 1 to 2n except k; thus the probability that k is the smaller member of its pair is approximately 1 - k/2n, and the probability that it's the larger member is approximately k/2n.

Thus a_k = a_k-1 + 1 with probability 1 - k/2n, and a_k = a_k-1 - 1 the rest of the time. The expected value of a_k - a_k-1 is approximately (1-k/2n) - (k/2n), or 1-k/n. So, for example, one-fourth of the way through the process, when k = n/2, we expect a_k to increase by about one-half each time we increase k by one.

Integrating with respect to n, we get

a_t ≈ ∫₀^t (1-k/n) dk = t - t²/2n

which is just a rescaling of Louchard's parabola. Flajolet and sedgewick's book is full of examples of things like this, where some apparently random structure turns out to have a lot of order when you "zoom out".

27 August 2007

some links, primarily about the "social graph"

Brad Fitzgerald, formerly of livejournal, gives his Thoughts on the Social Graph. The social graph is, roughly, the graph whose vertices are individuals and whose edges connect pairs of people who know each other (for some definition of "know"); Fitzgerald advocates making the social graph a "community asset" that would be available to anyone who wanted it.

I see the term "social graph" is a bit misleading because it suggests that there is only one type of edge, i. e. one kind of way in which one can know other people. (Indeed, this is a perennial problem on a lot of social networking sites; there's often no way to indicate the difference between, say, a person that one met once at a party and one's spouse.) Fitzgerald acknowledges this:

It's recognized that users don't always want to auto-sync their social networks. People use different sites in different ways, and a "friend" on one site has a very different meaning of a "friend" on another. The goal is to just provide sites and users the raw data, and they can use it to implement whatever policies they want.

The next step in the analysis of social networks might be to take into account the "strength" of relationships, as currently most social networking sites I know of don't acknowledge that I am more interested in some of my friends than others. Also, to what extent can the strength of relationships be guessed just from looking at the social graph without this strength information? A person who I know quite well is likely to be someone with whom I have many friends in common.

Also, the social graph is in some important ways directed (the link goes to Good Math, Bad Math); for example, if edges connect X to Y if X reads Y's LiveJournal. (It wouldn't surprise me to learn that Fitzgerald has this example in mind.) The somewhat addictive tool LJ Connect, which finds the shortest path between two individuals with LiveJournals via their friends, explicitly acknowledges this.

Finally, some shadowy figures seem to be aggregating the mathematics blogging community, and various other blogging communities as well. Their in-house mathematician gives yet another example of using the formula 1+2+...+n = n(n+1)/2.

And some links:

Paul Graham writes about Holding a program in one's head; a lot of his ideas seem to have obvious analogues to mathematical research, which isn't surprising, as programming and mathematics are quite similar. I'll probably have more to say about this later.

There is a graphical formalism for quantum mechanics, which its authors call "kindergarten quantum mechanics"; this is said to be a very considerable extension of Dirac's notation, and gives short derivations of deep results concerning teleportation, quantum mechanics, and so on. If this is true (I'm not familiar enough with the results concerned to say), it's a good example of what Graham has to say about choosing appropriate notation.

Vlorbik asks us to please lie more carefully. He writes:

We were supposed to test the claim that a certain population proportion was 10% against a sample proportion of 13%, based on n= 57 data points (at some stated confidence level that I’ve forgotten). But wait a minute. You can’t get 13% from a sample of 57 subjects: 7/57 ~ .122807 (i.e., 12%) and 8/57 ~ .14035 (i.e., 14%).

This particular one isn't a big problem, but it's symptomatic of something more general -- that the "applications" problems in a lot of mathematics textbooks bear very little resemblance to reality, which only seems to frustrate the students. (Another complaint I have about such textbooks is that the calculus texts seem to assume the student is familiar with physics, which is often not a reasonable assumption to make, and so the instructor ends up teaching physics instead of calculus.)

In Nature's Casino, by Michael Lewis, from the New York Times, August 26, about the insurance market for catastrophic events.

13 August 2007

links for 14 August

The "is the new" diagram attempts to document all instances of "X is the new Y", which is a fairly common phrase, with an arrow from X to Y. Not surprisingly, the node of highest indegree appears to be "black". You can try to make a partial order here, assuming that if X is the new Y and Y is the new Z, then X is the new Z; for example, local is the new kosher (local is the new organic, organic is the new kosher) -- but there aren't any really large components in this graph. It seems like most of the time, when people say "X is the new Y", nobody has ever made a statement "X is the new Z" or "W is the new Y" before. It's interesting to see if you can get why these things are true -- why is Ohio the new Florida? gadget bag the new man purse? individuality the new conformity? And isn't this last one impossible by definition? Is it really individuality if everyone's doing it? (Via information aesthetics.) It would be interesting to see how this network evolves with time; eventually people will start talking about things that have already been talked about and larger components will start to emerge.

[edit, 9:18 am: a particularly perplexing entry here is "Canada is the new Estonia". If you google this phrase, you get a lot of confused people, but eventually you find the beginning of a New York Times article from 2005, which says "America's next top model? she's [sic] likely from canada.Canada is the new Estonia - at least when it comes to modeling." I didn't realize Estonia was the old Estonia.]
Bad countries come with long names, from Social Science++, via Language Log. The theory is that a country with a good-sounding name like "Great Socialist People's Libyan Arab Jamahiriya" or "Democratic People's Republic of Korea" probably has something to hide; it seems to be borne out, although the United Kingdom of Great Britain and Northern Ireland would probably disagree. This is related to a pet theory of mine that restaurants advertising "authentic X cuisine" probably have something very far from authentic X cuisine. If you really were a democratic country, or you really served authentic food, you wouldn't go to the trouble of advertising it; building up a reputation would be enough. (But how is reputation built? This is a question I wouldn't mind studying from a mathematical perspective.)

Atle Selberg died, from Terence Tao; a mathematical obituary. Selberg and Erdos produced the first elementary proof of the prime number theorem; the story goes that Erdos got the lion's share of the credit, to the point where someone walked up to Selberg (not knowing that it was him) and said "did you hear that Erdos and some Scandinavian came up with an elementary proof of the Prime Number Theorem?" Then again, that seems like the sort of thing Erdos himself might have done; supposedly late in life he forgot who people were, even the people he collaborated with often.

The Quant Bloodbath, at Secret Blogging Seminar: how will the current crisis on Wall Street affect the job market for quants? Will Wall Street decide it doesn't need mathematicians after all (because the quantitative models don't seem to be working), or will it decide that it needs more mathematicians to make better quantitative models?

Ken Jennings (yes, that Ken Jennings) writes about Pic-Tac-Toe puzzles, in which nine images are arranged in a three-by-three grid, and the images in each of the three rows, three columns, and two diagonals have a common theme. An alternative is if you're given eight of them and you have to fill in the ninth; see this example from the 2003 MIT Mystery Hunt.

Science after Sunclipse has launched MathSciJournalWiki, which is what it sounds like -- a freely editable source of information about scientific journals. Also of interest is Scientific Publishing: A Mathematician's Viewpoint, by Joan Birman, from the July 2000 issue of the Notices of the AMS.

Yesterday I wrote about men having more sex than women?; over at Statistical Modeling, Causal Inference, and Social Science they're going on about the inaccuracies of the NYT article I mentioned there, which conflates the mean and the median. Not surprising, since they're statisticians over there; I still think that people lying about their sexual activities is the big thing, though.

God Plays Dice