30 June 2010

How not to visualize the electoral college

I went to the National Constitution Center in Philadelphia today.

As you may know, the Constitution provides that, in elections for the President, each state receives a number of electors equal to its total number of senators and representatives. Each state has two senators, and the number of representatives is proportional to the population. The number of representatives is adjusted after the census, which happens in years divisible by ten.

Why am I telling you this? Because at one point on the wall there was an animated map, which displayed how apportionment had changed between censuses. Each state was represented as a "cylinder", with base the state itself and height proportional to its number of electors. (Or representatives; it honestly would be impossible to tell the difference by eye, as in this scheme that would just push everything up by two units.) There was one such display in the animation for each census, with smooth transitions between them.

Since the eye wants to interpret the "volume" of a state as its number of electors, this has the effect of making geographically large states look like they have better representation than they do. I noticed this by looking at New Jersey and Pennsylvania, which have areas of 7417 and 44817 square miles, and 15 and 21 electors respectively. The solid corresponding to Pennsylvania has about eight times the volume of the one corresponding to New Jersey. New Jersey's an easy one to look at because it happens to be the most densely populated state at the present time, and in this visualization it is not the tallest.

The volume of the solid corresponding to each state is proportional to the product of its number of electors and its area. The states for which this product is largest are, in order, Texas, California, Alaska, New York, Florida, Illinois, Arizona, Michigan, Pennsylvania, and Colorado. The first two of these, between them, have 41% of the total volume in this visualization.
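The New Jersey and Pennsylvania comparison can be checked with a few lines of Python. This is only a sketch using the areas and elector counts quoted above; the names `states` and `cylinder_volume` are made up for illustration.

```python
# Areas (square miles) and elector counts as quoted in the post.
states = {
    "New Jersey": {"area_sq_mi": 7417, "electors": 15},
    "Pennsylvania": {"area_sq_mi": 44817, "electors": 21},
}


def cylinder_volume(state):
    """Volume of the map solid: base area times height (number of electors)."""
    return state["area_sq_mi"] * state["electors"]


volume_ratio = cylinder_volume(states["Pennsylvania"]) / cylinder_volume(states["New Jersey"])
elector_ratio = states["Pennsylvania"]["electors"] / states["New Jersey"]["electors"]

print(f"volume ratio (PA / NJ):  {volume_ratio:.2f}")
print(f"elector ratio (PA / NJ): {elector_ratio:.2f}")
```

The gap between the two ratios is the distortion: the eye reads the first, while the data is the second.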

I'd suggest replacing this with a model where volume is proportional to the number of electoral votes. Or, since that might have its own problems, a cartogram which evolves in time. The West would just grow out of nowhere.

23 June 2010

The ridiculously long match at Wimbledon

As you may have heard, there's a match at Wimbledon, between John Isner and Nicolas Mahut, in which the last set is tied at 59 games. (The previous longest set at Wimbledon was 24-22.)

A set goes until one player has won at least six games, and has also won two more than the opponent. This means that ever since the set was tied at five games apiece, games 11 and 12 were split by the two players; so were 13 and 14; and so on up to 117 and 118.

Terence Tao points out that this is very unlikely using a reasonable naive model of tennis, which assumes that the player serving has a fixed probability of winning the game. (Service alternates between games.) His guess is that some other factor is at play; for example, "both players may perform markedly better when they are behind".
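To get a feel for how unlikely, here is a minimal sketch of that naive model. It assumes games are independent and that each player holds serve with a fixed probability; the hold probabilities tried below are hypothetical, not measured from the match. A pair of consecutive games is split exactly when both players hold serve or both are broken.

```python
def prob_pair_split(p_a, p_b):
    """Probability that two consecutive games (one served by each player) are split.

    p_a, p_b: probability that player A / player B wins a game on their own serve.
    The pair is split if both players hold serve or both are broken.
    """
    return p_a * p_b + (1 - p_a) * (1 - p_b)


def prob_run_of_splits(p_a, p_b, pairs):
    """Probability that `pairs` consecutive pairs of games are all split,
    assuming the pairs are independent."""
    return prob_pair_split(p_a, p_b) ** pairs


# Games 11-12 through 117-118 make (118 - 10) / 2 = 54 pairs.
PAIRS = (118 - 10) // 2

for p in (0.80, 0.90, 0.95):  # hypothetical hold probabilities
    chance = prob_run_of_splits(p, p, PAIRS)
    print(f"hold probability {p:.2f}: P({PAIRS} split pairs in a row) = {chance:.2e}")
```

Even with very strong servers the run is improbable under these assumptions, which is what makes the "some other factor" guess attractive.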

This seems statistically checkable, at least if records of that sort of thing are kept. I'm not sure if they are; it seems like tennis scores are often reported as just the number of games won by each player in each set, not their order. Another hypothesis, of course, is that the match has taken on a life of its own and, subconsciously, the players are playing to keep the pattern going.

Edit (Thurs. 7:49 am): More on Isner-Mahut: Tim Gowers' comments, and some odds being offered by William Hill, the betting shop.

Reconstructing World Cup results

The final standings in Group C of the 2010 World Cup were as follows: USA 5, England 5, Slovenia 4, Algeria 1.

Question: given this information, can we reconstruct the results of the individual games? Each team plays each other team once; they get three points for a win, one for a draw, zero for a loss.

First we can tell that USA and England must have had a win and two draws, each; Slovenia, a win, a draw, and a loss; Algeria, a draw and two losses. (In fact you can always reconstruct the number of wins, draws, and losses from the number of points, except in the case of three points, which can be a win and two losses, or three draws.)
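The parenthetical claim is easy to check by enumeration. A small sketch (the function name `records_for_points` is just for illustration), assuming three games per team and the usual 3/1/0 scoring:

```python
def records_for_points(points, games=3):
    """All (wins, draws, losses) triples over `games` matches worth `points` points."""
    return [
        (wins, draws, games - wins - draws)
        for wins in range(games + 1)
        for draws in range(games + 1 - wins)
        if 3 * wins + draws == points
    ]


for pts in range(10):
    print(pts, records_for_points(pts))
```

Only three points comes out with two possible records; eight points is unreachable.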

Since neither USA nor England has a loss, they must have drawn. Similarly, Slovenia's win must have been against Algeria, the only other team with a loss.

But now there are two possibilities; we have to break the symmetry between USA and England. Let's say, arbitrarily, that USA drew against Slovenia and defeated Algeria, instead of the other way around. (This is, in fact, what happened.) Then Algeria's draw must have been against England, and England's win against Slovenia.

In an alternate universe where USA and England switch roles (does this mean that England was a USA colony in this universe?) USA defeated Slovenia and drew against Algeria, and England drew against Slovenia and defeated Algeria.
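The claim that there are exactly two universes can be confirmed by brute force, since six matches with three outcomes each give only 3^6 = 729 cases. A sketch, with names like `consistent_outcomes` chosen just for this example:

```python
from itertools import combinations, product

TEAMS = ["USA", "England", "Slovenia", "Algeria"]
FINAL_POINTS = {"USA": 5, "England": 5, "Slovenia": 4, "Algeria": 1}
MATCHES = list(combinations(TEAMS, 2))  # six matches in a four-team round robin


def consistent_outcomes(final_points):
    """Return every assignment of results to MATCHES that yields final_points.

    Each result is 'H' (first-named team wins), 'D' (draw)
    or 'A' (second-named team wins).
    """
    solutions = []
    for outcome in product("HDA", repeat=len(MATCHES)):
        points = dict.fromkeys(TEAMS, 0)
        for (first, second), result in zip(MATCHES, outcome):
            if result == "H":
                points[first] += 3
            elif result == "A":
                points[second] += 3
            else:
                points[first] += 1
                points[second] += 1
        if points == final_points:
            solutions.append(dict(zip(MATCHES, outcome)))
    return solutions


solutions = consistent_outcomes(FINAL_POINTS)
for solution in solutions:
    print(solution)
```

Both surviving assignments agree on USA-England and Slovenia-Algeria and differ only by swapping the roles of USA and England.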

Of course, the next question is: given the goal differentials (+1 for USA and England, 0 for Slovenia, -2 for Algeria), can we figure out the margins in the various games? (Assume we know which of the two universes above we're in; for the sake of avoiding cognitive dissonance, say we're in the first one.) Since Algeria was only defeated by a total of two goals, the margin in each of their losses must have been 1. And the margin in the Slovenian win (over Algeria) and loss (to England) must have been the same, namely 1.

If you in addition are given the total number of goals scored (USA 4, Slovenia 3, England 2, Algeria 0) you can reconstruct the scores of each match. I leave this as an exercise for the reader. Hint: start with Algeria.

Another question: is it the "usual case" that individual match results can be recovered from the final standings, or is this unusual? The table of standings in a group in the World Cup has something like thirteen degrees of freedom. Given the number of wins and draws, goals scored, and goals against for three of the teams, we can find the number of losses and goal differential for each team, the number of wins, draws and losses for the fourth team, and the goal differential of the fourth team. We need one more piece of information - say, the number of goals scored by that fourth team - to reconstruct the whole table. We're trying to derive twelve numbers from this (the number of goals scored by each team in each match). It will be close.

In an n-team round robin, the number of degrees of freedom in the table of standings grows linearly with n, but the number of games grows quadratically with n. For large n it would be impossible to do this reconstruction; for n=1 it would be trivial.
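As a rough sketch of that count: suppose the bookkeeping above generalizes, so that the table is pinned down by wins, draws, goals scored, and goals against for n-1 of the teams, plus one extra number. That generalization is an assumption on my part, in the same "something like" spirit as the thirteen above. The quantities to recover are the two goal counts in each of the n(n-1)/2 matches.

```latex
\underbrace{4(n-1) + 1 = 4n - 3}_{\text{numbers in the table}}
\qquad\text{versus}\qquad
\underbrace{2\binom{n}{2} = n(n-1)}_{\text{goal counts to recover}}
```

For n = 4 this is 13 against 12, which is the "close" case above; for n = 5 it is already 17 against 20.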

21 June 2010

Bad word problems

An example of a bad word problem, from the article The Nature of Contemporary Core Mathematics by Frank Quinn, who is at Virginia Tech:

Bubba has a still that produces 700 gallons of alcohol per week. If the tax on alcohol is $1.50 per gallon, how much tax will Bubba pay in a month? [Set up and analyze a model, then discuss applicability of the model.]

I have given an example with obvious cultural bias because I am not sure I could successfully avoid it. At any rate students in my area in rural Virginia would think this problem is hilarious. We have a long tradition of illegal distilleries and they would know that Bubba has no intention of ever paying any tax.

New MathOverflow-type sites?

As some of you may have noticed, I spend lots of time at MathOverflow these days. This explains my lack of posting.

Actually, my lack of posting is also in part because of taking a break after finishing my PhD. But I am now trying to prepare for the Next Step. The Next Step is an academic job, in fact, so if you've been holding your breath and wondering if I got one, you can breathe again. Details will follow, but the job is technically not official yet, so I don't want to say where it is here. My productive efforts are going towards preparing for courses I'll be teaching, factoring my dissertation (warning: large PDF) into papers, and moving my worldly possessions.

I'm posting in order to mention two potential sites hosted on the StackExchange platform that might be of interest to my readers (and also to MO users); these are one for statistical analysis and one for mathematics. The mathematics site will differ from MathOverflow in being somewhat lower-level, which I think is valuable; one of the things that's plagued MathOverflow from the beginning is that there are frequently questions which are clearly below the level of the site but we have nowhere good to send these questions! Similarly, MathOverflow gets a lot of statistics questions that the MO readership isn't equipped to handle. (People try on the questions that are really about probability, but some are about the more "practical" side of statistics and we don't have too many experts in that.)

If you click on those links, you can "commit" to participating in one or both of these sites. The idea is that once enough commitments are made, the site will be launched. This is StackExchange's new model; their old model was that people paid to have a site hosted with the software. The old model is the one on which MathOverflow works -- using money from Ravi Vakil's research funds -- but StackExchange doesn't offer it any more, because there were ghost sites. See a fuller explanation.

14 June 2010

Euler's identity geometrized

How to explain Euler's identity using triangles and spirals, by Brian Slesinsky. (Euler's identity refers, here, to $e^{i\pi} = -1$.) Uses the geometric interpretation of complex multiplication to explain this fact.
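One common version of the geometric argument, which may or may not match Slesinsky's presentation in detail, is that multiplying by 1 + iπ/n rotates a point by roughly π/n while barely stretching it, so doing this n times walks along a spiral that lands near -1. A small numerical sketch (the name `spiral_product` is just for illustration):

```python
import cmath
import math


def spiral_product(n):
    """Multiply 1 by (1 + i*pi/n) a total of n times and return the result."""
    step = complex(1, math.pi / n)
    z = complex(1, 0)
    for _ in range(n):
        z *= step  # each step is a small rotation with a slight stretch
    return z


for n in (10, 100, 10_000):
    z = spiral_product(n)
    print(f"n = {n:>6}: product = {z:.6f}, distance from -1 = {abs(z + 1):.6f}")

print("cmath.exp(i*pi) =", cmath.exp(1j * math.pi))
```

The distance from -1 shrinks as n grows, since the limit of (1 + iπ/n)^n is e^{iπ}.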