God Plays Dice: random walks

Showing posts with label random walks. Show all posts

24 July 2008

The 2000 election, eight years later

Outcomes of presidential elections and the house size, by Michael Neubauer and Joel Zeitlin. (It's in a journal, at PS: Political Science and Politics, Vol. 36, No. 4 (Oct., 2003), pp. 721-725 -- but that's not where I found it, and that's not where the link goes.) The link comes from thirty-thousand.org, a site which claims that congressional districts were never intended to be as large as they are; they advocate one per fifty thousand people, which is six thousand representatives. (Thirty thousand is approximately the original number of people per representative.)

The authors look at the 2000 U. S. presidential election and concluded that given the way in which seats in the House of Representatives are awarded, if the House had 490 members or less the election would have gone to Bush; with 656 or more, it would have gone to Gore; in between it goes back and forth with no obvious pattern, and some ties. The ties come at odd numbers of House members, which surprised me. But the size of the Electoral College is the number of House members, plus the number of Senators (always even, since there are two per state), plus three electoral votes for DC. So an odd number of House members means an even number of electoral votes, as in the current situation where there are 435 in the House and 538 electoral votes.

In case you're wondering why a small House favors Bush and a large house favors Gore, it's because the states that Gore won made up a larger portion of the population, but Bush won more states. In the large-House limit, the number of electoral votes that each state gets is proportional to the population, since the two votes "corresponding to" Senators are essentially negligible. In the small-House limit, each state has 3 electoral votes (I'm assuming that each state has to be represented) and so counting electoral votes amounts to counting states.

The states that Bush won had a total population in the 1990 Census (the relevant one for the 2000 election) of 120,614,084; the states that Gore won, 129,015,599. So 51.68% of the population was in states won by Gore, 48.32% in states won by Bush. Bush won 30 states, Gore 21. (I'm counting DC as a state, which seems reasonable, although the 23rd Amendment says that DC can't have more electors than the least populous state, even though it does have more people than the least populous state.)

So if there are N House members, we expect Bush to win 60 + .4832N electoral votes; the 60 votes are two for each state, the .4832N his proportion of the House. Similarly, Gore expects to win 42 + .5168N electoral votes. (The three are for DC; I'm assuming that DC would always get three electoral votes in this analysis, which isn't quite true. So Bush wins by 18 - .0336N electoral votes, which is positive if N is less than 535. The deviations between this and the truth basically amount to some unpredictable "rounding error".

If you look at the difference between the number of Bush votes and the number of Gore votes, you do see roughly a linear trend. To me it looks like a random walk superimposed on linear motion. This isn't surprising. As we move from N seats to N+1 seats in the House, 51.68% of the time the next seat should go to a Gore state; 48.32% of the time, to a Bush state. (The method that's used allots the seats "in order", i. e. raising N by 1 always adds a seat to a single state. Not all apportionment methods have this property; this is the Alabama paradox.) So the difference between the number of seats in Bush states and in Gore states will fluctuate, but the overall trend is clear. Of course the noise isn't actually random, coming as it does directly from the populations of the states, but the dependence on the state populations is so complicated that we might as well think of it as random.

I believe that something similar would happen with any set of election results in which more states voted for candidate A, but the states that voted for candidate B collectively had greater population. (Note that the latter criterion is not the same as candidate B winning the popular vote.)

Incidentally, I remember hearing in 2000 that if the House had had only a few more seats than it did, or even a few less seats, Gore would have won -- the implication being that N = 435 was a particularly fortuitous choice for the Republicans. This isn't true. But it's also possible that my memory is false.

12 April 2008

Red Bull gives you wings

So one of my favorite mathematical quotes is one that Rick Durrett (in his text Probability: Theory and Examples) credits to Shizuo Kakutani: "A drunk man will eventually find his way home but a drunk bird may get lost forever." More formally, random walks on the square lattice in two dimensions return to the origin infinitely often; random walks in two dimensions with more general steps allowed, but in which the expected position at any time is still zero, return arbitrarily close to the origin infinitely often. In three dimensions this is not true. (Random walks that return arbitrarily close to the origin infinitely often are called "recurrent", the others "transient".)

I just came across this paper from MIT's Undergraduate Seminar in Discrete Mathematics (18.304, Spring 2006), for which an anonymous student wrote some notes on simple random walks. Here we learn that a drunk man will eventually get home, but a drunk man who has had drinks containing Red Bull will not; as you know from the commercials, "Red Bull gives you wings!"

Another random walk which is transient would be that performed by a (non-flying) drunk with a time machine -- although not as viewed by the drunk, but as viewed by an observer who is fixed at one point in space-time. (The model I have of space-time in my head for this problem is something like that which one sees in certain movies, where if you're not careful you can run into a copy of yourself.)

(Incidentally, I almost wrote "Michiko Kauktani" there; she's a literary critic for the New York Times. It turns out that Michiko is Shizuo's daughter, which I didn't know.)

04 September 2007

Variations on the drunken-bird theorem, and real-world sightings

God Plays Dice exists in the real world!

My department traditionally has dinner for all the graduate students on the day before classes start. (This is preceded by the chair of the graduate program giving the annual ritual reminder of how the program is structured; I was basically told I should take some classes, teach my class, and do some research.) Anyway, I was waiting for dinner, and one of the new grad students got to talking, and I introduced myself, and he said "oh, are you the one who writes God Plays Dice?" Since you're reading this, you know I am. Someone said that he knew of two math blogs -- mine and Terence Tao's. Obviously this isn't saying that I'm as good a mathematician as him, but it's good company.

Anyway, we got to talking about how areas like combinatorics and probability are often perceived as "easier" than, say, algebra or analysis because the problems can be stated in terms that someone without much training can understand, even if the proofs aren't simple.

This led to me telling the joke about how "a drunk man always finds his way home, but a drunk bird can get lost forever". This is the colloquial statement of the following theorem about random walks. Consider a person who starts at the point (0, 0), and at each step of time takes a step to the north, east, south, or west with equal probability -- those are steps of (0, 1), (1, 0), (0, -1), or (-1, 0), respectively.¹ In order to be back at the origin after some number of steps, 2n, there must have been an equal number of up and down steps (say m of each) and an equal number of left and right steps (say n of each). Thus the number of ways that this can occur is

$\sum_{m=0}^n {(2n)! \over m! m! (n-m)! (n-m)!}$

which can be rewritten as

${2n \choose n} \sum_{m=0}^n {n \choose m} {n \choose n-m}$

by rearranging the factorials. Now, to do this sum, think about choosing n things out of 2n. To do this, we can choose m things from the first n, and then another n-m things from the remaining n; this gives the well-known identity

${2n \choose n} = \sum_{m=0} {n \choose m} {n \choose n-m}$

and thus the number of ways that we can be back at the origin after 2n steps is in fact ${2n \choose n}^2$ . This is approximately 4²ⁿ/(π n) as can be seen by Stirling's approximation. Each path occurs with probability 4^-2n, and so the probability of being back at the origin after time 2n is proportional to 1/n. Since Σ_n 1/n diverges, the expected number of returns to the origin is infinite; this means (in this particular case; there's a lemma I'm not writing out) that the probability of returning to the origin is 1.
(This is basically following the exposition in Section 3.2 of Probability: Theory and Examples, third edition, by Rick Durrett.)

In three dimensions, though, we get something proportional to n^-3/2 there, and Σ_n 1/n^3/2 converges, so the expected number of returns to the origin is finite; this implies (and I'm sweeping some details under the rug) that there's a chance of the random walk getting away forever. The two-dimensional case corresponds to the drunk man; the three-dimensional case corresponds to the drunk bird.

So where's the line between "always finds his way home" and "gets lost forever"? It seems to me that this should be at 2+ε dimensions, since that's what we need to make the sum convergent, but I don't know if anyone's studied random walks on fractal spaces. But this ε has to be fairly large; we came up with two examples, one on what one might call "2+ε" dimensions and one on "3-ε" dimensions. Both of these are transient, which is basically the technical term corresponding to "gets lost forever". A "recurrent value" of a random walk on R^d is one for which the random walk almost surely gets arbitrarily close to that point infinitely often; a random walk is called "recurrent" if it has at least one recurrent value (and then, in fact, all possible values are recurrent), otherwise transient.

The (2+ε)-dimensional example is a random walk on Z² × [-n, n] for some small integer n; essentially, this is a random walk on a slab. (I'm not particularly worried about how we handle the boundaries; any reasonable way will do.) This random walk ought to be recurrent; the random walker (let's call it a "fish", swimming around in an ocean which is of finite depth), starting at (0, 0, 0), will return to points of the form (0, 0, k) infinitely often. At each of those times, the fish has third coordinate zero with some probability on the order of 1/2n; 1/2n of infinity is still infinity.

The (3-ε)-dimensional example is a random walk on Z² × [0, &infty;), which actually corresponds better to the "drunk bird" because we can take the xy-plane to be the surface of the earth. One can invoke a sort of "reflection principle" here; this can be viewed as a random walk on Z³ where we've identified points which are mirror images across the xy-plane with each other. This lets us conclude that the random walk is transient. However, the reflection principle handles the boundary condition "wrong"; the natural thing to do when at a point (x, y, 0) is to allow steps to (x ± 1, y, 0), (x, y ± 1, 0), and (x, y, 1) with equal probability. So we have a probability 1/5 of leaving the xy-plane, while in the reflection-principle version that probability is 1/3. This gives the plane a certain "stickiness", and I assume this means that the random walk has a larger expected number of returns to the origin, but that number should still be finite.

1. In combinatorics, there is the convention of referring to, say, the "northeast" or "southwest" part of a drawing, which corresponds to how those directions look on a map. Mathematicians who aren't familiar with this convention find it strange, but for some reason "northeast" seems to flow a lot better than "upper right".

04 August 2007

geographical random walks

Let's say you have a map of the streets in an area. Could you guess, from looking at that map, which intersections the locals consider "most important"?

You could probably make a decent first guess by assuming that the importance of an intersection goes up with the number of roads at the intersection.

Why? Imagine you are randomly walking around a town. Each time you get to an intersection you choose uniformly at random from each of the possible ways you could go. (For the sake of simplicity, I'll assume that this includes going back the way you came.) It's a well-known result that this sort of random walk on a graph leads to a certain stationary distribution; that is, if a bunch of people walk around randomly you'll get to a point where the number of people at any given intersection is roughly constant. And that result is that number of people at any intersection is proportional to the number of roads coming into it. (I'll count the two directions of a road coming into the same intersection as two distinct roads.)

This has implications if, say, you want to start a business. If you're located in the middle of a block, there are two roads coming in. If you're at a "T"-shaped intersection, there are three. A normal grid intersection, four. An intersection like 48th and Baltimore or 23rd and South, five -- both of these have two streets that go through and a third that starts there. (Most of the intersections on Philadelphia's Baltimore Avenue are five-pointed, because the street grid has a sort of discontinuity there; this may explain why it's sort of the main street of its part of the city.) An intersection like Broad and McKean, which has three streets going through it, six. You want to be where there are more roads, so that more people will walk by. In Washington they have intersections like Dupont Circle where five roads go through, for a total of ten "arms". (The counting gets a bit tricky, because some of the roads don't actually go "through".) In Paris there's an intersection with eleven arms, fittingly called L'Etoile. (Much more than that seems impractical.)

Of course, this neglects a huge number of facts. Not all roads are equal. For example, I'm ascribing Baltimore Avenue's primacy in its part of Philadelphia to the fact that it's got five-pointed intersections; but probably more important is that, historically, it was the road that went to Baltimore. (Hence the name.) Also, a trolley runs down Baltimore Avenue and has for over one hundred years; but why is it there, and not somewhere else? Because people already were living or working on or near that street. And they were doing that because other people were. Once a slight imbalance is created -- say by the random-walk model I alluded to above -- people are naturally going to gravitate, even if it's very slightly, towards the place where people already are. The roads that lead towards the important nodes will become important in their own right. And street networks aren't static -- they can change. (River networks can't, though -- and a lot of cities are located where two rivers come together to form a third, the most famous U. S. example probably being Pittsburgh.) But the initial imbalance has to come from somewhere, and this is as good a place as any, especially in places where the roads mostly form a grid and hence their arrangement is determined well in advance of their actually being built.

Incidentally, this is my theory on why Boston and Philadelphia -- two cities which are of similar size and population density, at least if you consider them as the center of their respective metropolitan areas, and the two cities I am most familiar with -- differ in their transit system. Boston is able to have subways because for the most part it's obvious where the subway stops should be. Radical Cartography has a map of Boston as a series of squares; the original transit system was basically built around people getting from one square to the next. Philadelphia would have trouble supporting a subway system more extensive than its current two lines, because it's hard to point to a place that a huge number of people go which isn't served by the existing system; the population is more diffuse, because for the most part Philadelphia's streets form a grid and no node is any different from any other. If I wanted to I could draw, say, six or eight subway lines that I'd like to exist in Philadelphia -- but none of them seem essential to me, at least at the price that it costs to build a subway. (Some Philadelphians will point out that there's a long-standing plan to put a subway down Roosevelt Boulevard; to them, I will say that I have very rarely had any reason to be in Northeast Philly, so I forget about the Boulevard. I'm sorry.) Manhattan has subways -- even though it's for the most part a grid -- because it's so dense. (But I think I heard somewhere that Times Square is the busiest subway stop -- and it's under Broadway, which cuts through the grid diagonally.)

God Plays Dice