Showing posts with label elections. Show all posts

06 August 2008

Who's doing election prediction by simulation

Here's a list of all the web sites I know of where one can find simulations of the upcoming 2008 US presidential election. (I'm going to stay out of this game, because who wants to track down polling data every day, re-run the simulations, and so on?) Generally these start by obtaining a probability that one candidate or the other will win each state, from polls and sometimes from demographic data as well. These probabilities are then aggregated in some way into "simulated" elections. Often the results are accompanied by a probability that Obama will win the election, or would win it if it were held today, and in many cases by a probability distribution of the number of electoral votes won by Obama.

Yes, Obama. Not McCain. People who create these sites generally have a choice to make -- since nobody other than Obama and McCain has any reasonable chance of getting electoral votes, stating everything from the Obama point of view or from the McCain point of view has the same content -- but it seems that there's a leaning towards choosing Obama for this purpose in this corner of the blogosphere. (This is an entirely unscientific sample, though.)

Note that although Sam Wang says what he does isn't simulation, that's only because his method allows him to account for all the possible outcomes at once, using the magic of generating functions. This trick only works if you can make the simplifying assumption that the outcome in each state is independent of the outcome in every other state. This is reasonable if you're trying to predict what would happen if the election were held today -- there's not any big reason for sampling error in different states to be correlated. But if you're trying to predict what will happen in the actual election, this assumption is very risky. It seems that the actual movement of voter opinions in different states should be correlated.
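To make the generating-function idea concrete, here is a minimal sketch in Python. It is not Sam Wang's code; the function name and the three-state example are made up for illustration. Each state contributes a factor (1 - p) + p·x^ev, and under the independence assumption the coefficient of x^k in the product is the probability of winning exactly k electoral votes.

```python
# Exact distribution of electoral votes, assuming states are independent.
# Each state contributes a factor (1 - p) + p * x**ev; multiplying the
# factors together gives the full distribution in one pass.

def electoral_vote_distribution(states):
    """states: list of (electoral_votes, win_probability) pairs."""
    dist = [1.0]  # the polynomial "1": zero votes with certainty
    for ev, p in states:
        new = [0.0] * (len(dist) + ev)
        for k, prob in enumerate(dist):
            new[k] += prob * (1 - p)   # candidate loses this state
            new[k + ev] += prob * p    # candidate wins this state
        dist = new
    return dist

# Hypothetical three-state "country" (all numbers invented).
toy_states = [(3, 0.9), (10, 0.5), (7, 0.2)]
dist = electoral_vote_distribution(toy_states)
majority = sum(ev for ev, _ in toy_states) // 2 + 1
p_win = sum(dist[majority:])
print(f"P(at least {majority} of 20 votes) = {p_win:.2f}")
```

With 51 real "states" the same loop handles all 2^51 outcomes without enumerating any of them, which is the whole point of the trick.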

Here's the list:

This list is by no means complete.

24 July 2008

The 2000 election, eight years later

Outcomes of presidential elections and the house size, by Michael Neubauer and Joel Zeitlin. (It's in a journal, at PS: Political Science and Politics, Vol. 36, No. 4 (Oct., 2003), pp. 721-725 -- but that's not where I found it, and that's not where the link goes.) The link comes from thirty-thousand.org, a site which claims that congressional districts were never intended to be as large as they are; they advocate one per fifty thousand people, which is six thousand representatives. (Thirty thousand is approximately the original number of people per representative.)

The authors look at the 2000 U.S. presidential election and conclude that, given the way in which seats in the House of Representatives are awarded, if the House had 490 members or fewer the election would have gone to Bush; with 656 or more, it would have gone to Gore; in between it goes back and forth with no obvious pattern, and some ties. The ties come at odd numbers of House members, which surprised me. But the size of the Electoral College is the number of House members, plus the number of Senators (always even, since there are two per state), plus three electoral votes for DC. So an odd number of House members means an even number of electoral votes, as in the current situation where there are 435 in the House and 538 electoral votes.

In case you're wondering why a small House favors Bush and a large House favors Gore, it's because the states that Gore won made up a larger portion of the population, but Bush won more states. In the large-House limit, the number of electoral votes that each state gets is proportional to the population, since the two votes "corresponding to" Senators are essentially negligible. In the small-House limit, each state has 3 electoral votes (I'm assuming that each state has to be represented) and so counting electoral votes amounts to counting states.

The states that Bush won had a total population in the 1990 Census (the relevant one for the 2000 election) of 120,614,084; the states that Gore won, 129,015,599. So 51.68% of the population was in states won by Gore, 48.32% in states won by Bush. Bush won 30 states, Gore 21. (I'm counting DC as a state, which seems reasonable, although the 23rd Amendment says that DC can't have more electors than the least populous state, even though it does have more people than the least populous state.)

So if there are N House members, we expect Bush to win 60 + .4832N electoral votes; the 60 votes are two for each state, the .4832N his proportion of the House. Similarly, Gore expects to win 42 + .5168N electoral votes. (The three are for DC; I'm assuming that DC would always get three electoral votes in this analysis, which isn't quite true.) So Bush wins by 18 - .0336N electoral votes, which is positive if N is less than 535. The deviations between this and the truth basically amount to some unpredictable "rounding error".
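The linear estimate is easy to tabulate. The sketch below just evaluates the two expressions from the paragraph above; the function name is mine, and the model ignores the rounding that real apportionment introduces.

```python
# Expected Bush-minus-Gore electoral vote margin under the linear model:
# Bush gets 60 + 0.4832*N, Gore gets 42 + 0.5168*N.

def expected_margin(n_house):
    bush = 60 + 0.4832 * n_house
    gore = 42 + 0.5168 * n_house
    return bush - gore

# House size at which the model's margin crosses zero.
break_even = 18 / 0.0336

for n in (435, 490, 535, 656):
    print(f"N = {n}: expected Bush margin = {expected_margin(n):+.2f}")
print(f"break-even House size: about {break_even:.1f}")
```

At the actual House size of 435 the model gives Bush a margin of a little over three electoral votes, and the break-even point falls between 535 and 536.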

If you look at the difference between the number of Bush votes and the number of Gore votes, you do see roughly a linear trend. To me it looks like a random walk superimposed on linear motion. This isn't surprising. As we move from N seats to N+1 seats in the House, 51.68% of the time the next seat should go to a Gore state; 48.32% of the time, to a Bush state. (The method that's used allots the seats "in order", i.e. raising N by 1 always adds a seat to a single state. Not all apportionment methods have this property; when it fails, you get the Alabama paradox.) So the difference between the number of seats in Bush states and in Gore states will fluctuate, but the overall trend is clear. Of course the noise isn't actually random, coming as it does directly from the populations of the states, but the dependence on the state populations is so complicated that we might as well think of it as random.

I believe that something similar would happen with any set of election results in which more states voted for candidate A, but the states that voted for candidate B collectively had greater population. (Note that the latter criterion is not the same as candidate B winning the popular vote.)

Incidentally, I remember hearing in 2000 that if the House had had only a few more seats than it did, or even a few fewer seats, Gore would have won -- the implication being that N = 435 was a particularly fortuitous choice for the Republicans. This isn't true. But it's also possible that my memory is false.

06 January 2008

In which I do not endorse a presidential candidate

So I usually don't talk about politics here. And for the moment, this blog will refrain from endorsing a presidential candidate. This is mainly because I haven't thought too hard about the presidential elections, because the Pennsylvania primary isn't until April and the nominations will probably be decided by then; the election that really matters for me is the general election, and I don't want to get too attached to a particular candidate right now since they may not be in the general election. (Here's an interesting interview with William Poundstone on different methods of voting, via Slashdot.)

But it occurs to me that one thing that we should be against in a presidential candidate is pigheadedness of the sort that George W. "Stay The Course" Bush has shown. New information becomes available, and this is something that any presidential candidate -- well, really any president -- should take into account. They should not become wedded to their positions if new information becomes available. (By the way, I'm not saying that I want a president who decides what to do based on opinion polling that tells them whether they'll be able to keep their jobs. I want someone with principles -- but these principles should include a willingness to change their mind.) I don't want to go so far as to say "path-dependence is the scourge of history", but I'll say it inside quotes.

02 December 2007

Redistricting maps

I've previously mentioned the shortest splitline algorithm for determining congressional districts in the United States. (This could work in other districting situations, as well.) The algorithm breaks states into districts by breaking them up along the shortest possible lines; for a more thorough description see this.

Well, today I got an e-mail from the good people at rangevoting.org saying that they now had computer-generated maps of their algorithm's redistrictings for all fifty states. These, I assume, supersede their approximate sketches.

I'm not entirely sure how good an idea this particular redistricting algorithm is. Basically, assuming that straight lines are the right way to break things up seems to imply that all directions should be treated equally, when actual settlement patterns aren't isotropic. But the beauty of any algorithm which doesn't include any "tunable" parameters -- of which this is an example -- is that there is zero possibility of gerrymandering. If we imagine an algorithm that takes into account "travel time" between points, for example, instead of as-the-crow-flies distance, then how do you define travel time? And next thing you know, you'll see new roads getting built because of how you'd expect them to change congressional districting. As-the-crow-flies distance along the surface of the earth doesn't have these issues.

Not surprisingly, the people at rangevoting.org also support something called range voting, which would basically allow people to give scores to candidates in an election, and the candidate with the highest scores would win. I haven't thought much about it, but it seems like a good idea. And here's their page for mathematicians!

24 September 2007

redistricting redux

At Statistical Modeling, etc. I learned about Roland Fryer and Richard Holden's Measuring the Compactness of Political Districting Plans, which is a nice mixture of political science and mathematics. (I've talked about redistricting before, here and here.) A nice thing about this paper is that they show their results in an arbitrary metric space -- working in Euclidean space seems kind of limiting because distances in "real life" aren't Euclidean. Sometimes you can't get there from here. Or sometimes you can but no one ever does.

The authors define a "relative proximity index" for a districting plan of a state. First they define an absolute proximity index (they don't use this name, as far as I can see) as the sum of the squared distances between voters, taken over all pairs of voters who are in the same district. Then the relative proximity index is this index measured on a scale where its minimum over all possible districtings of a state is 1. (A districting is a partition of the people in a state into n sets which differ in size by at most 1.)
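Here is a small sketch of the absolute index as I've described it, assuming voters are points in the plane and distance is Euclidean (the paper works in an arbitrary metric space, and I've only skimmed it, so treat this as my reading rather than the authors' definition). The function name and the four-voter example are invented.

```python
from itertools import combinations

def absolute_proximity_index(voters, assignment):
    """Sum of squared distances over pairs of voters in the same district.

    voters: list of (x, y) coordinates; assignment: district label per voter.
    """
    total = 0.0
    for i, j in combinations(range(len(voters)), 2):
        if assignment[i] == assignment[j]:
            dx = voters[i][0] - voters[j][0]
            dy = voters[i][1] - voters[j][1]
            total += dx * dx + dy * dy
    return total

# Four hypothetical voters on a line, split into two districts of two.
voters = [(0, 0), (1, 0), (10, 0), (11, 0)]
compact = [0, 0, 1, 1]      # neighbors grouped together
stretched = [0, 1, 0, 1]    # each district spans the whole line

print(absolute_proximity_index(voters, compact))
print(absolute_proximity_index(voters, stretched))
```

In this toy case the compact plan is also the best possible one, so dividing each plan's index by the compact plan's index gives the relative index: 1 for the compact plan, 100 for the stretched one.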

The main result is that the RPI is an example of a "compactness index", which should satisfy three axioms: anonymity (if you interchange people, the index doesn't change), something called "clustering", and "independence" (which means that a compactness index doesn't vary if the size, population density, or number of districts in a state changes, holding all else constant). It turns out that if one districting plan is better than another under RPI, that relation will also hold under any other compactness index.

This paper also wins the prize for "biggest number I've seen that actually means something". The number of ways to partition the 6,800 census tracts of California into 53 districts is 78.4 × 10^59,351. (I was about to write "about 10^60,000", despite the fact that these numbers differ by a factor of more than 10^647... the fact that I'm willing to throw out such a large factor tells you just how large a number this is!) This is the size of the search spaces they have to consider.
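Numbers this size overflow ordinary floating point, so the factor I threw away is easiest to check with logarithms. A quick sketch (variable names are mine):

```python
from math import log10

# log10 of the paper's count, 78.4 x 10^59,351
log_count = log10(78.4) + 59351

# log10 of the ratio between my lazy "10^60,000" and the real figure
log_ratio = 60000 - log_count

print(f"the count is about 10^{log_count:.2f}")
print(f"rounding up to 10^60,000 inflates it by about 10^{log_ratio:.2f}")
```

The ratio comes out a bit above 10 to the 647th power, which is where "a factor of more than" comes from.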

(I don't pretend to understand the details... I've only skimmed the paper.)

11 August 2007

how should the electoral college be set up?

States Try to Alter How Presidents Are Elected, by Jennifer Steinhauer, today's New York Times.

Apparently factions in California and North Carolina are both considering apportioning their electoral votes by Congressional district instead of by state; this CNN article explains it a bit better. This is the same plan that's currently used in Nebraska and Maine, except that people just accept that as a strange quirk of the system because Nebraska and Maine are small, fairly homogeneous states, while California and North Carolina are not. The people in favor of these plans are pitching them as a matter of giving the states in question more representation. (More specifically, it's the minority party in both states that's pitching the plan that way, and so partisan politics clouds the issue; but I'll ignore that.)

But what is "representation", really? It seems to me like it comes down to this question: what is the probability that California can change the outcome of the election? In the current system, essentially zero; I suspect this election might be close, and a Democratic margin of victory will be slim enough that if California were to go Republican that would hand the presidency to the Republicans -- but California as a whole is not going to vote Republican. But in this new system, if it goes through? Now there's a little more of a chance for California to have some influence, because a few congressional districts might actually be in play. But the potential number of electoral votes that might actually be in play still is small; most districts are safe for one party or the other, thanks to gerrymandering. I will not attempt to compute the probability that I mentioned at the beginning of this paragraph -- any calculation I can do assumes that districts vote independently of each other, which clearly isn't the case.

I live in Pennsylvania. The presidential candidates in 2008 will pay attention to me, since Pennsylvania is seen as a "swing state". But I live in a district (the 2nd) in which the current Congressional representative, Chaka Fattah, routinely receives 85% or more of the vote. The last time this district elected a Republican was sixty years ago. If a plan like this were to go through in Pennsylvania, you can bet presidential candidates won't care about my vote any more. On the other hand, there are some districts in the Philadelphia suburbs which will probably be quite close; the people in those districts become even more influential.

(A superficially similar plan was attempted in Colorado in 2004, which would have allotted the state's electoral votes in proportion to the popular vote. This would clearly have been a loss for Colorado, since it meant that basically, instead of the people's votes deciding whether Colorado's electoral votes would be split 0-9 or 9-0, they were deciding whether they'd be split 4-5 or 5-4.)

It's not immediately obvious that this plan, should it pass in California, disenfranchises the state of California as a whole. Let's say (and I'm making this number up) that ten of California's 53 districts are actually in play. The people in those ten districts suddenly get some attention from presidential candidates. And perhaps a plan like this could actually work if all states had it. I can actually imagine that it might bring some measure of relevancy back to the politically heterogeneous parts of the state. Perhaps it even makes sense, on a state-by-state basis, for "safe" states to pass this sort of plan (since it gives some part of them influence) while "battleground" states don't (since it just shifts the influence around).

But I don't think it makes sense for the various states to be choosing how they'll allot their electors based on some probabilistic calculation. That would create the perception that the elections were unfair in one way or another, and the perception of fairness is as important as actual fairness. People need to believe that the political system has the interests of the people at heart. A complex pastiche in which some states apportion their votes by congressional district, some winner-take-all, some in proportion with the popular vote in that state, etc. -- even if each state chooses its system in such a way as to maximize the probability that its choices can flip the election -- isn't the way to go.

I don't support the idea, though. If it were to happen, I would want to see redistricting to eliminate gerrymandering, but even so you'd still end up with homogeneous districts in some places. And I suspect that a lot fewer people would live in "swing districts" than currently live in "swing states". The obvious way to make sure that everybody lives in a "swing district" is to have the "district" in question be the entire country; that is, to just count the national popular vote. And indeed, there's an ingenious hack of the Electoral College which would do just this without a Constitutional amendment. The goal of the National Popular Vote Project is to get bills passed in states representing 270 electoral votes that say that those states will give their electoral votes to the winner of the national popular vote -- but only if states representing 270 electoral votes have passed such a law. This seems like the fairest way to do things, to me; I understand that the purpose of the current system is to make sure that the concerns of all states are paid attention to and that a candidate can't just pile up votes in the big states and ignore the small ones, but that's not how it works right now.

A nationwide recount sounds scary, though...

24 June 2007

Bloomberg as a kingmaker?

President? Or Kingmaker? by Patrick Healy, in today's New York Times.

Michael Bloomberg, mayor of New York City, recently officially changed from a member of the Republican party to an independent. This has been interpreted as a harbinger of a presidential run as an independent. However, that's a long shot, even though there's speculation that Bloomberg would be willing to spend a billion dollars of his own money on his campaign.

Healy suggests that Bloomberg ought to run as a "kingmaker". He should attempt to win one or two large states (New York is the obvious choice, since he's mayor of the city that makes up nearly half that state's population) and basically forget about the others. After that, he would need to hope that neither the Republican nor the Democratic candidate has 270 electoral votes. The election then by default goes to the House of Representatives. However, electoral votes aren't cast until December 15, six weeks after the general election. So Bloomberg could make deals with one of the two major-party candidates.

This has been tried before; George Wallace attempted it in 1968, Strom Thurmond in 1948. But neither of those elections was close enough for the strategy to work.

This raises a question, though. Let's say Bloomberg can win New York (31 electoral votes). What are the chances that the other states are evenly split enough?

Let's assume that each state's winner is decided by flipping a coin. (This, of course, does not reflect the reality of American politics -- some states are much more likely to break one way or the other -- but bear with me.) Then each candidate expects to win half of the remaining electoral votes -- that's 253.5. The variance of the number of electoral votes won by, say, the Democratic candidate is the sum of the variances of the number of electoral votes won in each state. In a state with n electoral votes, that's n^2/4. Adding the results up for each state, we see that the variance of the number of electoral votes won by the Democrat is 2326.25; the standard deviation is the square root of this, 48.23. I'll assume that the distribution is normal -- if all the states were the same size, this would be the Central Limit Theorem, and hopefully the fact that the states aren't all the same size doesn't kill us. So the probability that the Democrat gets 270 electoral votes in this scheme is the probability that a normally distributed random variable with mean 253.5 and standard deviation 48.23 is at least 269.5; that's 37%. Similarly for the Republican. That leaves Bloomberg a 26% chance -- barely one in four -- that this scheme would work. He might be willing to take those odds.

But, of course, there are some states that are sure to go one way or the other. Say only one-third of states (representing one-third of electoral votes) are sure to go to the Democrats, one-third to the Republicans, and one-third in play. Then the variance gets divided by 3; the standard deviation is now 27.85; Bloomberg's chances of the election being close enough for this strategy to come into play are 44%.
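For anyone who wants to check the arithmetic in the last two paragraphs, here is a short Python sketch. The electoral vote counts are the 2004/2008 allocation; the helper names are mine, and the normal approximation is exactly the one described above, with no attempt to model correlation between states.

```python
from math import erf, sqrt

# Electoral votes per state (and DC) for 2004 and 2008; these sum to 538.
EV = [55, 34, 31, 27, 21, 21, 20, 17, 15, 15, 15, 13, 12,
      11, 11, 11, 11, 10, 10, 10, 10, 9, 9, 9, 8, 8,
      7, 7, 7, 7, 6, 6, 6, 5, 5, 5, 5, 5,
      4, 4, 4, 4, 4, 3, 3, 3, 3, 3, 3, 3, 3]

def normal_tail(x, mean, sd):
    """P(X >= x) for a normal random variable with the given mean and sd."""
    return 0.5 * (1 - erf((x - mean) / (sd * sqrt(2))))

others = list(EV)
others.remove(31)                           # set New York aside for Bloomberg
mean = sum(others) / 2                      # each side expects half the rest
variance = sum(n * n for n in others) / 4   # n^2/4 per coin-flip state

def deadlock_probability(var):
    """Chance that neither major-party candidate reaches 270."""
    p_major = normal_tail(269.5, mean, sqrt(var))
    return 1 - 2 * p_major

all_coin_flips = deadlock_probability(variance)
third_in_play = deadlock_probability(variance / 3)
print(f"every state a coin flip:  {all_coin_flips:.2f}")
print(f"one-third of votes in play: {third_in_play:.2f}")
```

The first figure comes out at about 0.26. The second lands in the low-to-mid 0.40s; the exact percentage depends on whether you put the cutoff at 269.5 or 270.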

And this whole analysis neglects the finer points of electoral college strategy. States aren't independent of each other -- we wouldn't see an election in which Utah went Democratic while Massachusetts went Republican, or even one where Virginia went Democratic but New Jersey went Republican, to be a little more reasonable. (Both of those states are probably in play, but New Jersey is far enough left of Virginia that they shouldn't break that way.) And in the end it could come down to just a few states -- the 2004 election basically came down to Florida, Ohio, and Pennsylvania -- in which case this whole normal approximation breaks down. But we won't know which states those are for a long time yet.

edit, 12:09pm: Can Bloomberg Win? suggests the reverse of Healy's plan -- Bloomberg wins a few states, the Democrat and Republican split the rest of the states, and he cuts a deal with electors of the party that gets fewer electoral votes that makes him President. Rasmussen Reports talks about possible "electoral chaos" which could fundamentally change the way we elect our Presidents.

edit, 7:47pm: As reader Elizabeth has pointed out in a comment, New York is reliably Democratic; this changes things a bit, so the chances that Bloomberg plays the spoiler by allowing neither other candidate to get 270 electoral votes (under the second set of assumptions) are more like 38%.