This morning, Nate Silver of fivethirtyeight.com posted State Similarity Scores. For each pair of states, Silver reports a score that gives the political "distance" between the two states. (He actually reports only the three states closest to each state.)
These are based on an analysis of certain variables that appear to be important in US politics, weighted by their importance in determining state-by-state polling in the 2004 and 2008 presidential elections. As it turns out, the pairs of states closest to each other in this metric are the Carolinas, followed by the Dakotas, Kentucky-Tennessee, Michigan-Ohio, and Oregon-Washington.
It occurred to me that the minimum-weight spanning tree for this data might look interesting. And indeed it does. I'm having some trouble articulating why it's interesting, but I just wanted to post the tree. There may be a slight issue because I don't have the full set of similarity scores, but the tree generated from the subset of the data that I do have is probably pretty close to the "true" tree and is quite interesting to look at. (The weight for the edge between any two states is 100 minus Silver's similarity score for that pair of states; Silver's similarity scores have a theoretical maximum of 100.)
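For anyone who would rather not do this by hand, here is a minimal sketch of the construction in Python, using Kruskal's algorithm with a small union-find. The function name `minimum_spanning_tree` and the scores in `toy_scores` are made up for illustration; they are not Silver's numbers. Because only a subset of the pairs is available, the real input may not be connected, in which case this returns a spanning forest rather than a single tree.

```python
def minimum_spanning_tree(similarity):
    """Kruskal's algorithm; each edge weighs 100 minus the similarity score.

    `similarity` maps a pair of state names to a score out of 100.
    Returns a list of (state_a, state_b, weight) edges.
    """
    # Lightest edge first, i.e. most similar pair first.
    edges = sorted((100 - s, a, b) for (a, b), s in similarity.items())

    parent = {}

    def find(x):
        # Union-find lookup with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    tree = []
    for weight, a, b in edges:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:  # keep the edge only if it joins two components
            parent[root_a] = root_b
            tree.append((a, b, weight))
    return tree


# Placeholder scores, invented for this example only.
toy_scores = {
    ("NC", "SC"): 90,
    ("ND", "SD"): 88,
    ("SC", "SD"): 45,
    ("NC", "SD"): 42,
    ("NC", "ND"): 40,
}
toy_tree = minimum_spanning_tree(toy_scores)
```

On the toy input, the two most similar pairs are joined first, and the lightest remaining edge that links the two groups completes the tree.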
Note that the positioning of the states in the drawing of the tree below is entirely irrelevant; I just attempted to draw the tree in such a way that people wouldn't be inclined to see edges that weren't actually there. In particular, Ohio is not somehow "unusual" even though the edges connecting it to adjacent states are long. (As a start, though, it does seem to be useful to think of Ohio as the center of the graph, in line with the conventional political wisdom that Ohio is at the political center of the US.) I thought about trying to make the distances in the drawing reflect the weights, but that was more trouble than I wanted to go to.
Also, some states that are close to each other in Silver's metric aren't close in the tree. There may be errors, since I did this by hand.
Here's the tree.
07 July 2008
It might be interesting to see the same tree as a single-linkage hierarchical clustering (form two clusters by deleting the heaviest MST edge, then continue forming subclusters in the same way within each cluster).
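A rough sketch of that idea in Python, assuming the tree is given as a list of `(state_a, state_b, weight)` edges (that format and the name `single_linkage_clusters` are my own choices, and the weights below are invented). Rather than recursing, it cuts the hierarchy at `k` clusters by dropping the `k - 1` heaviest edges at once, which should give the same clusters as repeatedly splitting at the heaviest remaining edge.

```python
def single_linkage_clusters(tree_edges, k):
    """Split a spanning tree into k clusters by removing its k - 1 heaviest edges."""
    edges = sorted(tree_edges, key=lambda e: e[2])
    keep = max(0, len(edges) - (k - 1))
    kept = edges[:keep]

    # Start with every state in its own cluster.
    nodes = {n for a, b, _ in tree_edges for n in (a, b)}
    parent = {n: n for n in nodes}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Merge the endpoints of every edge that survives the cut.
    for a, b, _ in kept:
        parent[find(a)] = find(b)

    clusters = {}
    for n in nodes:
        clusters.setdefault(find(n), set()).add(n)
    return sorted(sorted(c) for c in clusters.values())


# Invented weights, for illustration only.
example_tree = [("NC", "SC", 10), ("ND", "SD", 12), ("SC", "SD", 55)]
two_clusters = single_linkage_clusters(example_tree, 2)
```

Here `two_clusters` separates the Carolinas from the Dakotas, because the only edge removed is the heavy one joining the two pairs.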
The DOT language is pretty neat for drawing pretty trees. I haven't checked whether Asymptote has a tree package.
But automating it sure beats doing that shit by hand.
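As a sketch of what automating it could look like, the snippet below writes a tree out as an undirected DOT graph. The edge-list format and the function name `to_dot` are assumptions for this example, and the weights are invented. Each edge gets a `label` showing its weight and a `len` hint, which Graphviz's neato and fdp layouts treat as a preferred edge length; how faithfully the drawing reflects the weights is up to the layout engine.

```python
def to_dot(tree_edges, name="mst"):
    """Render (state_a, state_b, weight) edges as an undirected DOT graph."""
    lines = [f"graph {name} {{"]
    for a, b, weight in tree_edges:
        # Scale the weight down so the preferred length is a sensible size.
        lines.append(f'  "{a}" -- "{b}" [label="{weight}", len={weight / 10}];')
    lines.append("}")
    return "\n".join(lines)


# Invented weights, for illustration only.
dot_text = to_dot([("NC", "SC", 10), ("SC", "SD", 55)])
```

Saving `dot_text` to a file and running it through a Graphviz layout program should produce the drawing.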
Is the metric actually a metric? Specifically, does the triangle inequality hold?
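One way to check this empirically, sketched in Python: for every triple of states whose three pairwise distances are all known, test whether the longest side exceeds the sum of the other two. The function name `triangle_violations` and the distances below are invented for illustration. With only a partial list of scores, many triples can't be tested at all, so an empty result would not prove that the inequality holds in general.

```python
from itertools import combinations


def triangle_violations(dist):
    """Return the triples (a, b, c) that break the triangle inequality.

    `dist` maps a pair of names to a distance; pairs are treated as unordered.
    Triples with any missing distance are skipped.
    """
    d = {frozenset(pair): w for pair, w in dist.items()}
    nodes = sorted({n for pair in dist for n in pair})

    violations = []
    for a, b, c in combinations(nodes, 3):
        sides = [frozenset((a, b)), frozenset((b, c)), frozenset((a, c))]
        if not all(s in d for s in sides):
            continue  # cannot test a triple with a missing distance
        lengths = [d[s] for s in sides]
        longest = max(lengths)
        if longest > sum(lengths) - longest:
            violations.append((a, b, c))
    return violations


# Invented distances: A-C is longer than A-B plus B-C.
bad = triangle_violations({("A", "B"): 1, ("B", "C"): 1, ("A", "C"): 3})
good = triangle_violations({("A", "B"): 2, ("B", "C"): 2, ("A", "C"): 2})
```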
I'm wary of analyses that are organized around administrative units (such as states). Just because data happens to be collected at the level of administrative units doesn't mean that those units reflect underlying reality. It's a bit like searching for your lost car keys by the streetlamp because the light is better there.
The article "Obama's Is an Appalachia Problem, Not a Whites Problem" by Jonathan Tilove provides a good example of how paying too much attention to state boundaries (or other semi-arbitrary lines) can mislead.
That's a good point. I think that states are more than just administrative units, though, because of the electoral college. It makes sense to try to predict things on the state level because that's the level that matters in determining the winner.
Of course, it probably makes sense to look at data on a level finer than the state level, even if one only wants to make state-level predictions. A state where most people's political views are in the center and a state where people are evenly split between the far left and the far right might look the same in a state-level analysis but actually behave quite differently.
Interestingly written... but a lot remains unclear.
I truly enjoy your posting style; it's very interesting.
Don't give up, and keep writing; it's truly worth following.
Excited to see more of your articles. Kind regards ;)