30 September 2007

A random weapon in the war on terror

A random weapon in the war on terror, from Newsweek.

Airport security personnel at Los Angeles International Airport are now randomly being deployed; the thinking behind it is that if there's no pattern in where the security people are, it's harder for potential attackers to find a hole in the system that can be exploited.

This is the idea of a "mixed strategy" in game theory. There are various things that each of the two parties can do, and the payoffs to each player in each situation are known. It often turns out that the optimal strategy for one player might be to do one thing sixty percent of the time, say, and another thing forty percent of the time -- but that doesn't mean they should do some sort of strict alternation. Rather, they should choose which thing to do at each time at random.

The canonical example of this, I think, is pitch choice in baseball, where the pitcher chooses his pitches at random and the batter chooses what he will expect (and therefore swing at) also at random. But I'm not sure to what extent I believe it actually applies there because the strategy is different on each count. (And it would be incredibly difficult to study. You can maybe get information on what kind of pitch the pitcher pitched. But how could you know what the batter expected?)

The theory of mixed strategies, though, assumes that both parties have perfect information. So it's not exactly the same thing; the terrorists and the airport security don't have such information. Even if they think they know where there's a hole, how do they know they just haven't been watching long enough? But the idea seems reasonable; security people develop patterns, and if you can avoid creating those patterns you can avoid having them exploited.

P.S. I had to mention baseball because the Phillies won today, and are therefore in the playoffs. And the Mets lost; they have to go home now. I was at the Phillies game. See a picture of a sad Mets fan from the New York Times. At one point the odds of the Mets not making the playoffs were 500 to 1, according to Baseball Prospectus. But that's actually not the worst collapse ever; the '95 Angels beat 8800-to-1 odds. (The '64 Phillies are only tenth, according to that list.)

29 September 2007

In which baseball people continue to think weird things

Apparently conventional wisdom in baseball is that a leadoff walk is more likely to lead to a multi-run inning than a leadoff home run.

(This was said by the Smart People on Fox with one out in the bottom of the seventh, Greg Dobbs at bat for the Phillies. Aaron Rowand led off the inning with a home run, then Jayson Werth flied out.)

But they asked the people at Stats, Inc. about this, and it turns out the conventional wisdom is wrong -- a lead-off home run is more likely to lead to a multi-run inning.

Of course! It's a lot easier to get two runs in an inning if you already have one. I know conditional probabilities are counterintuitive, but they're not that counterintuitive.

(As of right now, in this inning there are runners on first and second, with one out, and one run scored. The Phillies are down 4 to 1.)

Baseball commentators say other mathematically unsound things: see here. There are plenty of others but I don't write them all down.

Potlucks and fractals

Last night, I was at a party thrown by a friend of mine.

This friend of mine lives at a house that has a potluck dinner most Wednesday evenings; I live three blocks away, so I go fairly often. I wasn't there last Wednesday, though, because it had been a long day and I was tired.

(Often there inadvertently seems to be a "theme" to the potluck despite nobody actually trying to do this, because pretty much any interesting-seeming coincidence counts. Last week it was the "night of the seven grains", as the seven people there brought dishes with seven different grains -- normal rice, arborio rice, buckwheat, wheat, two that I don't remember, and corn. Yeah, yeah, I know, you're thinking that arborio rice is rice. The point here is that if one interprets things widely enough there's always some sort of coincidence -- lots of today's foods are rectangular, or are yellow, or whatever. I think we once even said that the theme was "things with spices in them".)

While I wasn't there on Wednesday, people apparently came to discussing the existence of "some sort of crazy triangle fractal thing", and they had lamented that I wasn't there. I was eventually able to figure out that they were talking about the Sierpinski gasket, which is obtained by taking an equilateral triangle, removing a smaller triangle (half the size) from the middle to obtain three triangles with half the side length of the original triangle, and repeating ad infinitum. Click here for an illustration.

While this is nice, my favorite way to think of the Sierpinski gasket is via the so-called chaos game (more formally, an "iterated function system"), which is animated here. Start by picking three points (call them the red, yellow, and blue points) in the plane, and a random point x0 in the triangle which they bound. Plot that random point. Now, pick one of the three corner points (red, yellow, or blue) at random, each with probability 1/3; then pick the point halfway between x0 and that corner point, and call it x1. Pick a corner point again at random; the point halfway between x1 and that corner point is x2. Do this a thousand times or so and you get a nice picture. An applet showing more examples of such iterated function systems is available at cut-the-knot.org.

Why does this work? Basically, if you take the whole Sierpinski gasket, and contract it by a factor of two towards one of its edges, you get one of its three self-similar parts; call these the "red", "yellow", and "blue" parts, as in the picture at left. Now, x0 is some distance d from the gasket; the distance from x1 to the gasket is d/2, the distance from x2 to the gasket is d/4, and so on -- thus as the sequence is generated we get points closer and closer to the gasket. The gasket has measure zero, so with probability one we never actually end up with a point in it -- but we end up as close as we like after a very short time.

Although I'm not an expert in this, from some crude experimentation it looks like if you replace "each with probability 1/3" above with some other distribution, you end up with a gasket that is much more concentrated near the corners that you go to with high probabilities.

Edited, 5:06 pm: other surprising places where fractals appear include in the study of fast algorithms for multiplication (which also includes an explanation of the multiplication bug in Excel 2007), found via Good Math, Bad Math.

28 September 2007

Nested radicals, smoothness, and simplification

I saw an expression involving a nested radical, namely

\phi = \sqrt{ 1 + \sqrt{ 1 + \sqrt{ 1 + \sqrt{ 1 + \cdots } } } }
.

(Write φ = (1 + φ)1/2 and solve for φ.) The Wikipedia article on nested radicals led me to Simplifying Square Roots of Square Roots by Denesting. The authors tell us that:

The term surd is used by TeX as the name for the symbol √ Maple has a function called surd that is similar to the nth root defined here; like all good mathematical terms, the precise definition depends upon the context. In general, a mathematical term that does not have several conflicting definitions is not important enough to be worth learning.

This reminds me of a couple things that happened in my class yesterday. First, I was defining what it means for a curve to be smooth; our definition was that the curve given by the vector function r(t) is smooth if r'(t) is continuous and never zero, except perhaps at the endpoints of the interval over which it's defined. (This makes smoothness a property of a parametrization, which is a bit counterintuitive. I suppose that one could define a curve -- as an abstract set of points -- to be smooth if it has a smooth parametrization. Although I haven't worked it out, I assume that if a curve has a smooth parametrization, the arc-length parametrization is smooth.) One of the students said "but the professor said 'smooth' means something else!" I'm not sure if the professor actually said "smooth means X" or if he said "some people think smooth means X", but it's a good point. (In particular, "smooth" often seems to mean that a function has infinitely many continuous derivatives.)

Second, the article is about using computer algebra systems to simplify expressions like
\sqrt{5 + 2 \sqrt{6}} = \sqrt{2} + \sqrt{3}

where the left-hand side is "simpler"; sometimes my students worry that they are not presenting their answer in the simplest form. While I'll accept any reasonably simple answer (unless the problem statement specifies a particular form), it is remarkably difficult to define what "simple" means.

One rule I have figured out, though, is that 4x - 4z - 8 = 0 should be simplified to x - z - 2 = 0 by dividing out the common factor. In general, given a polynomial with rational coefficients, one probably wants to multiply to clear out the denominators and then divide by any common integer factor of the new coefficients, so the resulting coefficients are relatively prime integers. The article addresses this sort of "canonicalization" in the context of nested radicals. I keep telling my students that they should keep that sort of thing in mind, especially since our tests will be mostly multiple-choice.

(Sometimes I'm tempted to define "simplest" as "requires the fewest symbols"... but how does one prove that some 100-character expression one has written can't be written in 99 characters? And how do you count something like "f(x, y)= (x+y)1/2 - (x-y)1/2, where x = foo and y = bar?" ("foo" and "bar" are supposed to be very complicated expressions.) Do you plug foo and bar into the original equation and then count the characters, or do you count the actual characters that are between the quotation marks?)

They're building a science thing!

From The Onion: Scientists Ask Congress To Fund $50 Billion Science Thing.

Another diagram presented to lawmakers contained several important squiggly lines, numbers, and letters. Despite not being numbers, the letters were reportedly meant to represent mathematics too. The scientists seemed to believe that correct math was what would help make the science thing go.


That's right, folks. Mathematics is all about numbers, at least in the eyes of Normal People. But I can't count how many times I've looked at a blackboard during a lecture and realized there wasn't a single number there.

27 September 2007

What is the simplest problem you cannot solve?

I'm currently reading The Cauchy-Schwarz Master Class (link goes to MAA review, which pretty well explains the book's raison d'etre and contents) by J. Michael Steele. (How did I find this book? I'm taking Steele's course "Probability Inequalities and Machine Learning"; he alludes to this book every so often.)
It is full of interesting inequalities, and I may have something to say about them later, but I just wanted to share the following, where Steele is talking about George Polya and his book How to Solve It:

Some of the richest of Polya's suggestions may be repackaged as the modestly paradoxical question: "What is the simplest problem you cannot solve?".... Perhaps no other discipline can contribute more to one's effectiveness as a solver of mathematical problems.

I think this is worth remembering.

Fun with fugues

At Science after Sunclipse, Blake Stacey links to some videos from Stephen Malinowski's Music Animation Machine. The Music Animation Machine animates music; the basic idea is that notes are represented by rectangles, the length of which corresponds to the length of the note. What I found particularly interesting was that the videos use color to represent either:
  • the "voices" in the score, as in the first video below (Bach), and
  • harmonic information, as in the second video below (Chopin)
Here are the videos I'm referring to: (Many others are available from Malinowski's site.)
Johann Sebastian Bach, Toccata and Fugue in D Minor:


Frederic Chopin, Etude, opus 10 #7:


The harmonic coloration is based on the circle of fifths, which is an interesting solution to the problem that notes which are close together in pitch are not close together in some sort of "harmonic space".

I find myself wondering if the coloring based on voice could be automated. This would be trivial for things written entirely in, say, standard four-part voice leading as taught in an introductory music theory class, because there are always exactly four notes at any given time and the voices don't cross; it would be very nontrivial for actual music. (This isn't just a musical problem, believe it or not. A related problem is as follows: a baseball team has five starting pitchers, which it uses in a pitching rotation: ideally pitcher n pitches on days n, n+5, n+10, ... But there are off days, people get hurt, and so on. How do you decide when one pitcher has "replaced" another in the rotation? The people at Baseball Prospectus have thought about this -- sorry I can't find the link -- but their solutions basically involve just staring at a list of who pitched what day and writing numbers next to them kind of arbitrarily. It's not quite the same thing, though, because there's only one starting pitcher per game (relief pitchers are used in a much more ad hoc manner) but notes are played simultaneously. I suspect there are other problems of this sort where there are logical ways to sort some sequence of things into bins -- musical voices, rotation slots, and so on -- but none come to mind.)

The sort of notation they're using here seems like a logical historical antecedent of present-day musical notation (though I don't know enough about the history to know if it is); the main differences are that in modern notation we decide that seven of the notes in each octave are more important than the other five (and enshrine this in the notation) and we don't write a whole note as being, say, sixteen times as long as a sixteenth note. This is probably a good thing from the point of view of readability. It also resembles a player piano roll (modulo coloring) which I doubt is a coincidence.

Oh, and you have to watch the "Oops, I Did It Again" fugue, which I found from Good Math, Bad Math:

Probabilities in the wild-card race

A quick question: what's the probability that the four teams from a given league of Major League Baseball which make it to the playoffs are actually the best four teams in that league?

For those who aren't familiar with MLB's playoff structure -- there are two leagues, the National and the American. The National League has sixteen teams, divided into divisions of five, six, and five; the American League had fourteen teams, divided into divisions of five, five, and four. (If you're wondering why they don't have fifteen teams in each league, it's for scheduling reasons. Baseball teams play nearly every day, and don't play teams in the other league except during certain prescribed times of the season; if there were an odd number of teams in a league then one team would always be idle.) For the purposes of not wanting to deal with a zillion special cases, what follows will actually be a computation for a fifteen-team league, divided into three five-team divisions.

In each league the team with the best win-loss record in each division makes the playoffs; furthermore the best team among teams that didn't win its division also makes the playoffs, as the "wild card", for a total of four teams in each league.

I'll make the simplifying assumption that a team's record accurately reflects its skill. This is quite an assumption. First, a team that is exactly average should win half its games, or 81 games out of a 162-game season; but the games are independent and so the standard deviation of the number of games they win is ((162)(0.5)(0.5))1/2, or about 6.4. Baseball teams tend to be tightly bunched enough that that's a pretty big measurement error. Second, teams face different schedules, mostly because they play about 18 games against each team in their division and about 6 games against teams not in their division but in the same league. Third, we'll assume that good teams have no propensity to be in a particular division.

Thus, we can model the problem as follows: partition the set {1, 2, ..., 15} into three blocks of five, uniformly at random. (1 represents the best team, 15 the worst.) What is the probability that two of the numbers {1, 2, 3, 4} are in one block and each of the remaining two is in a different block?

For example, consider the 2001 American League. (Why? There were no ties in the 2001 AL. I don't want to have to figure out to resolve ties.) The best team were the Seattle Mariners, with 116 wins, in the West Division; we put a 1 in a box corresponding to that division. The second best team was the Oakland A's, also in the West Division; we put a 2 in the same box as the 1. The third-best team was the New York Yankees, in the East; we put a 3 in a different box. The fourth-best team was the Cleveland Indians, in the Central; we put a 4 in the box which is still empty. And so on; in the end we get the partition

East = {3, 7, 8, 13, 14}
Central = {4, 5, 6, 11, 12}
West = {1, 2, 9, 10}

and this is the sort of event we're looking for. (Note that "the best four teams make the playoffs" is different from "the wild card comes in fourth"; in the 2001 AL the wild card was the A's, who came in second.)

Anyway, the probability we seek is not hard to compute. First, place the first team in some division; they have a 1/3 chance of being in any division. Call the division they land in A.

Then the second team also must be in some division. There are 14 "slots" left, four of them in division A. So with probability 4/14 the best two teams are in the same division, A; with probability 10/14 one is in A and the other is in B. (The 2001 AL is of the first type.)

Now, we place the third team.

Say teams 1, 2 were both in A; then with probability 3/13 team 3 is also in A (and we lose) and with probability 10/13 team 3 is in some other division, called B. So with probability (4/14)(10/13), teams 1 and 2 are in A and team 3 is in B.

Say team 1 is in A and team 2 is in B, which occurs with probability 10/14; then with probability 8/13 team 3 is in a division that already has a team in it (A or B) and with probability 5/13 it is not.

So after placing three teams, the probability is (4/14)(10/13) + (10/14)(8/13) = 120/182, or nearly two-thirds, that two are in the same division and one is in a different one. The probability is (10/14)(5/13) = 50/182 that all three are in different divisions.

Finally, if we're in the first of those cases, then we need to place the fourth team in the division which contains none of the first three; there are five slots for that team out of twelve, so that contributes (120/182)(5/12) = 600/2184 to the probability. If we're in the second case, we can place the fourth team freely, contributing (50/182)(12/12) = 600/2184. So the probability that the four best teams are the three division winners and the wild card is 1200/2184, or about 55 percent.

The probability that the three division winners are the best teams and the wild card is fourth-best is exactly half the probability that the three division winners and the wild card are collectively the four best teams, which is not necessarily something you would have expected.

In a league with three n-team divisions, as n goes to infinity, the corresponding probability is slightly lower, I believe 4/9.

Since the introduction of the wild card, there have been twelve full seasons played (1995-2006) in each league, for a total of twenty-four league-years. In fifteen of those leagues, the four teams with the best records made the playoffs. In two (1996 NL and 1998 AL), there was a tie for fourth-best in the league, and the three best teams and one of the two fourth-best teams made the playoffs. In the other seven, something else happened. So two-thirds of the time in reality, the four teams with the best records have made the playoffs; that differs from my prediction of 55 percent, but not by much.

What happens "in general"? I don't know. This analysis was all about counting up different cases, which is something I wouldn't want to do in general (I only did it because this particular case was one I was interested in). Finding probabilities like this in general will take a bit more cleverness.

26 September 2007

Parallel universes?

Recently it's been reported at Slashdot that a mathematical answer to the question of parallel universes exists.

(They used the word "mathematical", which should have been my first clue that I shouldn't be reading it. "Mathematical" means "I'm trying, and failing, to use math".)

Anyway, a bit of combing through the Slashdot comments turned up this article and this one from New Scientist. It seems reasonably clear that somewhere there is some work, and what this work purportedly shows is that the many-worlds interpretation of quantum mechanics is consistent. One of the articles I found said that the work was done by David Deutsch, David Wallace, and Simon Saunders, and was presented at a conference at the Perimeter Institute for Theoretical Physics; I think that the articles are referring to this talk by David Wallace, but they might be referring to a talk by Saunders at the same conference.

Basically, what the many-worlds theory says is that parallel universes are constantly branching off of ours whenever any sort of observation occurs, therefore causing the collapse of a wavefunction.

Of course, events like those are happening all the time. Where are these parallel universes? And I'm not even going to try to think about how many of them there are, because the number is obscenely ridiculous. How many wavefunctions are collapsing around you right now? And the whole structure is branching. Let's say that every second, somewhere in the universe, exactly one wavefunction collapses, and there are two possible pure states that it could collapse to. The universe is about 5 × 1017 seconds old, so the number of universes you'd need to make this work is two to that power. Occam is rolling over in his grave. Except maybe in the universe these people are in, Occam never existed.

Besides, we can't see these other universes. Isn't that convenient? Perhaps the two major interpretations of quantum mechanics -- the probabilistic interpretation and the many-worlds interpretation -- are just two different formalisms for understanding the same thing. It's easy to picture this giant combinatorial tree of universes; it's harder to picture superpositions.

You're offended by the idea that "God plays dice with the universe". So you call zillions of other universes into existence because you can't handle it? Einstein would be ashamed.

(Edited, 12:37 pm: today's Questionable Content refers to the many-worlds interpretation.

Fun with images! (Mostly about math tattoos.)

Stokes' theorem graffiti. In a bathroom. Only in Camberville. (I would have said "only in Cambridge", but this particular graffito is actually in Somerville. Diesel Cafe, to be precise; there was a summer where I spent a lot of time there doing math, so I'm not surprised.)

and a tattoo of the Y combinator, which is part of this collection of images of science tattoos. Other mathematically inspired tattoos in that collection are:


I can think of others. An ex-girlfriend had an infinity symbol tattooed on her (although I believe that was more of a literary thing than a scientific one). A friend of mine has

¬ (p ∧ ¬ p))

tattooed on his leg. A friend of a friend in college had the Taylor series for sine on her arm. It's written out, as

x - {x^3 \over 3!} + {x^5 \over 5!} - {x^7 \over 7!} + \cdots

except with exactly enough terms to go once around her arm; it's not in the more compact form

\sum_{n=0}^\infty {(-1)^n x^{2n+1} \over (2n+1)!}.

I don't have any tattoos; I can't think of anything I'd want permanently inked on me. But if I were to get a tattoo, the equation

\prod_{i=0}^\infty \left( 1 + z^{2^i} \right) = {1 \over 1-z}

would be up there. (I had it written on my wall in college.) I've heard it called the "computer scientist's identity"; it's an analytic version of the fact that every positive integer has a unique binary expansion. The left-hand side expands to

(1+z1)(1+z2)(1+z4)(1+z8)...

and you can imagine expanding that out; then you get something like

1 + z1 + z2 + z2+1 + z4 + z4+1 + z4+2 + z4+2+1 + ...

where each sum of distinct powers of two appears as an exponent exactly once. The right-hand side is the geometric series

1/(1-z) = 1 + z + z2 + z3 + ...

and in order for these two expressions to be equal, the coefficient of zn in each of them has to be equal. All the coefficients on the right-hand side are 1; thus the coefficients on the left-hand side must be too. That means each nonnegative integer appears exactly once as an exponent, i. e. as a sum of distinct powers of two. The fact that one can disguise this combinatorial fact as a fact about functions from C to C -- and then apply the methods of complex analysis, although they're not incredibly useful in this particular case -- is a fascinating example of the interplay between the discrete and the continuous.

25 September 2007

A problem of Feynman on parallel computation

Here's a problem from Feynman Lectures on Computation (which, incidentally, I didn't know existed until I was absent-mindedly wandering in the library a few days ago):

Problem 1.1(c). Most present-day [mid-1980s] computers only have one central processor -- to use our analogy, one clerk. This single file clerk sits there all day long working away like a fiend, taking cards in and out of the store like mad. Ultimately, the speed of the whole machine is determined by the speed at which the clerk -- that is, the central processor -- can do these operations. Let's see how we can maybe improve the machine's performance. Suppose we want to compare two n-bit numbers, where n is a large number like 1024; we want to see if they're the same... [suppose] we can hire n file clerks, or 2n or perhaps 3n; it's up to use to decide how many, but the number must be proportional to n. [How can you get the comparison time to be proportional to log n?]


The proof is by induction. The base case, when n = 1, is clear -- we can check that two 1-bit numbers are the same in one step. We want to show that if n clerks can do this in time t(n), then 2n clerks can do it in time t(n) + c, where c is a constant. But that's easy -- split each of the two n-bit numbers a and b into half: a = a1a2, b = b1b2. If a1 = b1 AND a2 = b2, return TRUE; otherwise return FALSE. That's just one more time step once you've figured out if a1 = b1 AND a2 = b2.

The picture I get in my head of this situation makes me smile. You need n clerks -- n parallel processors. The kth clerk compares the kth bit of the first number with kth bit of the second number; ey returns a 0 if the two bits are different and a 1 if they're the same. (This is, of course, bitwise exclusive OR exclusive NOR (thanks Anonymous!).) Then everything pours into a single node which is at the root of a binary tree of depth log2 n.

The next subproblem in Feynman is to add two n-bit numbers in logarithmic time by using such parallel processing. (This is harder because you have to worry about carrying!) Again, it reduces to a problem of combining information on words of one length to words of twice that length -- if you have a = a1a2 (where a1 is the "high", or more significant half, and a2 is the low half), and b = b1b2, what is a+b? Breaking it down into its high half and its low half,

a + b = (a1+b1+c)(a2 + b2)

where c denotes a carry from the addition of a2 and b2, and a2 + b2 denotes addition modulo 2k where k is the length of the summands. We assume a1+b1, c, and a2 + b2 are already known. The problem is... what if a1+b1 ends in a bunch of 1's, and c is 1? We have something like

...0111111 + 1

and we want to add the 1, so we think "okay, I'll just change the last bit in the first summand from 0 to 1"... but it's 1... so you change the last two bits from 01 to 10... but no, that won't work either... so you change the last three bits from 011 to 100...

On average this isn't a problem. On average the number of 1s at the end of a1+b1 is unity.

But not always. Somebody could be deliberately feeding you numbers that are hard to add. Some sort of sadistic grade school teacher.

edited, 10:48 am: Welcome to those of you who came here from reddit. Be aware that I am not a computer scientist -- I'm a mathematician -- so I might not really know what I'm talking about here.

24 September 2007

redistricting redux

At Statistical Modeling, etc. I learned about Roland Fryer and Richard Holden's Measuring the Compactness of Political Districting Plans, which is a nice mixture of political science and mathematics. (I've talked about redistricting before, here and here.) A nice thing about this paper is that they show their results in an arbitrary metric space -- working in Euclidean space seems kind of limiting because distances in "real life" aren't Euclidean. Sometimes you can't get there from here. Or sometimes you can but no one ever does.

The authors define a "relative proximity index" for a districting plan of a state. First they define an absolute proximity index (they don't use this name, as far as I can see) as the square of the distances between voters, summed over all pairs of voters who are in the same district. Then the relative proximity index is this index measured on a scale where its minimum over all possible districtings of a state is 1. (A districting is a partition of the people in a state into n sets which differ in size by at most 1.)

The main result is that the RPI is an example of a "compactness index", which should satisfy three axioms: anonymity (if you interchange people, the index doesn't change), something called "clustering", and "independence" (which means that a compactness index doesn't vary if the size, population density, or number of districts in a state changes, holding all else constant). It turns out that if one districting plan is better than another under RPI, that relation will also hold under any other compactness index.

This paper also wins the prize for "biggest number I've seen that actually means something". The number of ways to partition the 6,800 census tracts of California into 53 districts is 78.4 × 1059,351. (I was about to write "about 1060,000", despite the fact that these numbers differ by a factor of more than 10647... the fact that I'm willing to throw out such a large factor tells you just how large a number this is!) This is the size of the search spaces they have to consider.

(I don't pretend to understand the details... I've only skimmed the paper.)

23 September 2007

A notion of combinatorial continuity?

My topology professor, a few days ago, pointed out the following fact:

Maps(X, Maps(Y, Z)) = Maps(X × Y, Z)

where "Maps(A, B)" represents the space of continuous functions from A to B with the compact-open topology. He mumbled some words about "adjoint functors" and such to justify this. (I don't remember what the words were; it wasn't particularly important, because I believed it.)

But the "obvious" justification, to me, is just that if Maps represents all functions, then this is obviously true. Consider some black box where you can put in a member x of the set X, and it spits out another black box, which will take a member y of the set Y and spit out a member of the set Z. Why not just stick both x and y in in the first place?

The other notation I've seen used for Maps(X, Y) is YX, which makes sense in a combinatorial context -- YX is the set of sets of elements of Y which are indexed by elements of X. It makes perfect sense when X is finite -- we have

Maps({1,2}, Y) = Y × Y

because in order to specify a function which takes 1 to some element of Y, and 2 to some other element of Y, we just have to name the two elements of Y. And in this notation the original result becomes

(ZY)X = Z(Y × X)

which looks right, because it would be true if those symbols represented integers instead. Or even sets, where we don't care about the continuity. (I'm kind of bothered by the fact that the notation implies continuity, but I can deal with it.)

But that got me thinking -- what would a "combinatorial" version of continuity look like? For concreteness, what is a continuous function from Sn to Sn, where Sn is the symmetric group on n letters? The problem is that there's no natural topology on Sn -- sure, you could use the discrete topology, but then everything's continuous. There's the usual informal characterization of a continuous function that takes things which are close together to things which are close together... but what does "close together" mean here? We could define the distance between two permutations, d(σ, τ) as, say, the number of transpositions needed to get from one to another, and then say f: Sn &rtarrow; Sn is continuous if

d(σ, τ) = 1 implies d(f(σ), f(τ)) ≤ L

for some integer constant L. But this is really more of an analogue of Lipschitz continuity. And what metric is natural?

It seems natural to me to take sets of points in [0,1]n -- the unit n-cube -- and map them to elements of Sn in the "obvious" way, that is, the way which preserves the order relations among the elements of the sequences. So, for example, (0.43, 0.17, 0.94, 0.71, 0.01) is mapepd to (3, 2, 5, 4, 1). Then elements in Sn which differ by a transposition are near each other in the unit n-cube, and we can view a map from Sn to Sn as a map from the unit cube to the unit cube -- although not all maps from the unit cube to the unit cube would project down to maps from Sn to Sn, since what if two points that represent the same permutation map to two points which represent different permutations? But this definition might be too strict for some purposes -- I think it would imply that a continuous function meets the above definition with L = 1.

But I don't have a particular problem in mind; perhaps functions from Sn to Sn, where f(σ) and f(τ) are identical or differ by a transposition whenever σ and τ differ by a transposition, are interesting entities. Perhaps not.

22 September 2007

The fundamental theorem of enumeration, and the Princeton Companion to Mathematics

From Doron Zeilberger's chapter on "Enumerative and Algebraic Combinatorics, to be included in the currently-in-preparation Princeton Companion to Mathematics


"The fundamental theorem of enumeration, independently discovered by several anonymous cave dwellers, states that
|A| = Σa∈A 1.
In words: the number of elements of A is the sum over all elements of A of the constant function 1."


Sounds kind of silly, but it's true. The whole chapter is a nice fourteen-page answer to "what is enumerative combinatorics?", mentioning most of the classic problems and most common methods of solution, which appears to be its raison d'être; I know most of this stuff but I can imagine how useful similar blurbs on subjects I'm not so familiar with would be, and indeed most of the book is intended to be at about the first-year undergraduate level; that's low enough that I should be able to read it without stopping for breath. (The guidelines for contributors say that the articles about various subjects should be something like the beginning of a very good colloquium talk, the sort where you really get the feeling that you know something about how some other area of mathematics works.) The PCM has a semi-official blog, which is Tim Gowers' blog. Several dozen of the component articles are available online, on a password-protected site; the password is in the linked-to post by Gowers. I suspect I'll have more to say about the PCM in the future.

Some thoughts on coordinate systems

I'm teaching multivariate calculus this term. The course, as it's taught here, begins with a unit on geometry in three dimensions; here we introduce cylindrical and spherical coordinates. (I'm not entirely convinced that they should be introduced at the point they are, because basically all we can ask the students to do is to convert between the various coordinate systems, but my hands are tied. It almost seems to make more sense to wait to teach the coordinates until we get to the point where we're doing integrals over regions that are best expressed in cylindrical or spherical coordinates.)

Anyway, a student e-mailed me a question today. One problem on the homework was to find equations in cylindrical and spherical coordinates for x2 + y2 + z2 + 2z = 0. In cylindrical coordinates, (where x2 + y2 = r2 this becomes r2 + z2 + 2z = 0. (My aesthetic sense is that this is perhaps better written as r2 + (z+1)2 = 1, because then it's immediately obvious that there are in fact solutions to the equation.) She asked if it was acceptable to leave it in one of these forms, or if it needed to be converted to, say, something with z in terms of r and θ (which would involve square roots).

I responded as follows:
In rectangular coordinates it might be preferable to solve for z in terms of x and y if it's possible to do so without making things too ugly, since we have a tendency to think of surfaces given in Cartesian coordinates as a "graph" of a function of x and y. The same thing is sort of true in spherical coordinates, in that ρ is often seen as a function of θ and φ (This is like polar coordinates, where r is usually a function of θ.) In cylindrical coordinates, though, none of the three coordinates really seem to be "dependent" or "independent".

Of course, all these rules can be violated; for example, one would never write the equation of a unit sphere centered at the origin as

z = ± (1-x2-y2)1/2

unless it were to graph it on a system that can't handle implicitly defined surfaces. The guiding principle should be that you want the simplest equation possible, i. e. the one which takes the least writing.


"Takes the least writing" is admittedly a bit sloppy here; how does one define it? The amount of "writing" any mathematical expression takes depends on one's notation. For example, subscripts or superscripts take up a lot of keystrokes in HTML, not so many in TeX (which is of course optimized for mathematical writing), and if one is writing by hand writing a subscript is just one character. (Indeed, one could even argue that subscripts and superscripts require less writing than full-size characters in handwriting -- they take less ink! But I shudder at the thought of taking this to its logical conclusion.)

Another principle I might invoke is "symmetry" -- another reason the equation about for a sphere looks wrong is because it fails to capture the fact that x, y, and z all behave "the same way". The form x2 + y2 + z2, on the other hand, doesn't have this problem. But I think "symmetry" might not be so useful a concept when trying to teach students; the ones who will understand this rather vague idea are the same ones who would probably just give the equation in a "more symmetric" form even if I didn't tell them to.

And why do cylindrical coordinates not seem to have a "dependent" and an "independent" variable, whereas three-dimensional Cartesian and spherical coordinates do? (I admit that this might just be my intuition, and I don't know if this agrees with anyone else's intuition.)

21 September 2007

The NYT archives on Fermat's Last Theorem

As you may know, the New York Times has opened up its online archives. Thus, I bring you:

June 24, 1993: At Last, Shout of 'Eureka!' In Age-Old Math Mystery by Gina Kolata.


(Note that the NYT doesn't seem to render superscripts correctly; I trust this won't be a problem for you!) They're interesting articles, but they don't say anything that hasn't been rehashed a zillion times in the last fourteen years; I bring them up mostly as historical curiosities.

From the June 24 article:
Dr. Ribet said that 20th-century work on the problem had begun to grow ever more divorced from Fermat's equations. "Over the last 60 years, people in number theory have forged an incredible number of tools to deal with simple problems like this," he said. Eventually, "people lost day-to-day contact with the old problems and were preoccupied with the objects they created," he said.
I've definitely had this feeling about number theory before; a lot of what goes under the name "number theory" doesn't seem to have all that much to do with, well, numbers: 1, 2, 3, 4, and so on. Somehow understanding the structure of those numbers seems more important to me, in some aesthetic sense, than understanding the structure of some crazy set of "numbers" one invents; should these auxiliary structures be anything more than scaffolding? Perhaps this is why I'm not a number theorist.

A column by James Gleick from October of 1993, which includes the following:
What a peculiar bit of knowledge! Yet if anyone is out there judging the galaxy's civilizations according to some handbook of "What Every Bright Species Ought to Know," it seems reasonable to suppose that somewhere on the list will be the fact of arithmetic that we call Fermat's Last Theorem. It must be a universal truth -- true here, true everywhere, true now, true always.

I don't know about that. Fermat's Last Theorem somehow seems accidental, even though it's so simple, and it's not used to prove other results. It wouldn't have nearly the same renown if Fermat hadn't scribbled that note in the margin. It's a "cute" problem, nothing more, one that's become important by historical accident. Gleick points out that whole branches of mathematics have arisen from failed attempts to solve FLT -- but wouldn't they have arisen in some other way? It's important to have big unsolved problems that will inspire people. But I don't know if it matters what problems, exactly, they are. If there are alien mathematicians, I wouldn't be surprised if to them FLT is just a footnote, if they've actually proved it at all. (I think they'll have at least thought of the problem, though, because the Pythagorean theorem seems inevitable, and any mathematical community that never thinks to attempt the generalization to higher exponents isn't a mathematical community worthy of the name.)

But if the Riemann Hypothesis turns out to be true (and I don't doubt that it is) I'd put it above Fermat's Last Theorem on such a list. (Not for the form in terms of the zeta function, but in one of the equivalent forms in terms of strengthenngs of the Prime Number Theorem.)

Also from the Gleick column:
It's an impressively economical way of speaking, really -- a picture's-worth-a-thousand-words approach wherein one word is worth a thousand words. Never mind that the published version of Wiles's proof will run 200 pages. To expand the shorthand, flesh it out so that, say, Fermat could understand it, would require many times more.
That's an interesting point. I've heard it claimed that the true "difficulty" of a piece of mathematics is not its length, but the length of all the pieces of mathematics that had to come before it. And perhaps if we couldn't explain it to Fermat, that means we don't really understand it. Feynman said, when he was teaching the courses that became the Feynman Lectures on Physics, that if he couldn't express something in a way that could be understood by freshmen, it meant he didn't really understand it. Fermat, one might say, could be considered one of these freshmen.

Here are some other things dragged out of the archives:

June 27, 1993: But How Did Fermat Do It?

June 27, 1993: After 350 Years, A "Q. E. D." (This is a very short item; it appears to be from some sort of "week in review".

June 29, 1993: SCIENTIST AT WORK: Andrew Wiles; Math Whiz Who Battled 350-Year-Old Problem by Gina Kolata.

A review of Amir Aczel's book "Fermat's Last Theorem"

And a challenge: find the error in this "parody proof" of FLT, which I found from the Wikipedia article. (I don't think this is particularly difficult. But I'm lazy, so I want you to tell me what it is.)

I think I'll stop here.

20 September 2007

Baseball entropy

I was thinking of putting together a prediction of the Phillies' odds of making the postseason -- it's late enough in the season that I can do the calculations exactly, if I'm willing to ignore the teams that plainly have no chance -- but the results would be depressing. (Although I did do a prediction of the Phillies' ten-thousandth loss, which came a couple days earlier than I predicted. That was depressing, too.) The good folks at Baseball Prospectus do a simulation where they run the rest of the season a million times; at this time of the season, with ten games left, the odds fluctuate wildly with each game.

I've also wondered if it would be possible to determine some sort of information entropy (the link goes to a nice intuitive explanation of what that means) from these postseason odds, and use that as a single quantity to determine how "close" the playoff races are at a given moment. For example, at basically any moment this season, the National League has been "closer" than the American League. Okay, by "wondered" I mean I thought "I don't really want to do the computations, because I'm lazy". The information entropy explains "how surprising" a random variable is. The entropy of a random variable which takes each of n values 1, ..., n with probabilities p1, ..., pn is

-(p1 log p1 + ... + pn log pn)

where we adopt the convention that 0 log 0 = 0 (or, equivalently, we say that all the probabilites involved are nonzero.) For example, consider the winner of this year's National League pennant. If there are, say, three known playoff teams and a fourth team which is equally likely to be one of two teams, then the probability that the pennant winner is any of the first three teams is 1/4, and any of the other two teams 1/2; then the entropy of that random variable is

3 (1/4 log 4) + 2 (1/8 log 8) = 9/4

(logs are base 2 here; this means that entropy is measured in bits). The entropy of a random variable which is equally likely to take any of n values is log n bits; thus there are in some sense 29/4 = 4.756... contenders, in that if we could have a random variable which took 29/4 values with equal probability, it would have the same entropy. This interpolates between four and five; there are five contenders but three of them are clearly stronger than the other two. As of right now the probabilities of each National League team making the playoffs, according to Baseball Prospectus, are

.9449017 (Mets), .8990023 (Diamondbacks), .8109878 (Padres), .7179337 (Cubs), .2938922 (Phillies), .2825803 (Brewers), .0310565 (Rockies), .0106489 (Dodgers), .0089891 (Braves), .0000075 (Cardinals), all other teams zero

I'll assume that each of these teams, if they should make the playoffs, have a one-in-four chance of winning the pennant; thus the entropy of the pennant winner is given by summing a bunch of terms which look like

-.9449017/4 log (.9449017/4)

and in the end we get 2.5312 bits, corresponding to 22.5312 = 5.780 contenders. This seems reasonable; there are basically six contending teams at this point.

The American League has four teams above 99 percent right now (Red Sox, Yankees, Indians, Angels), and the entropy of their pennant winner is 2.019 bits or 4.054 "effective" contenders.

And this post was originally supposed to be about math humor in a baseball radio broadcast: Ryan Howard, with the bases empty, has fourteen home runs and fourteen RBIs so far this season.

I was just (well, a couple innings ago now) informed of this by the Phillies radio announcers; it appeared on the monitor that tells them the statistics.

They chuckled at this. I assume they were chuckling because it's trivial, as one of them said "well, how many RBIs was he supposed to have?" The only way one can bat in a run if the bases are empty is to hit a home run; furthermore that will bat in exactly one run.

(I assume that the monitor in question breaks down a player's statistics by the eight possible situations for who is on the bases; the other lines probably seem less silly. They won't show this same sort of thing, because when runners are on base it's possible to get them in without scoring a home run.)

Broccoli causes lung cancer, or why most science is wrong

Robert Lee Hotz wrote in the Wall Street Journal that most science studies appear to be tainted by sloppy analysis. (My impression here is that "science" means "biomedical science", which is the sort of science that is most reported on by the media, for the simple reason that it is the kind of science which has the most direct effect on people's lives.) I learned about this from Marginal Revolution; NeuroLogica and denialism have picked it up as well; this is all based on some grand meta-analysis by John Ioannidis.

I'd like to share with you an example I made up a few days ago, while I was randomly pontificating. Imagine that for some reason, scientists suspect that there is a link between broccoli and lung cancer. One hundred scientists do one hundred separate studies trying to find this link. Three of them say "there appears to be a positive correlation between broccoli and lung cancer", meaning that they found such a correlation and that it is outside a 95% confidence interval. The media will hear this and will say "look! broccoli causes lung cancer!" and the broccoli industry will be very unhappy. (The elder George Bush will be happy, though.) But if there were no correlation, you'd expect five of the hundred scientists to have found this correlation just by sheer luck! The fact that only three of them found it is evidence towards broccoli not causing lung cancer.

But you'll never hear that on the evening news, because the news people want you to think you're sick, or you're going to get sick. Their audience is aging and their main advertisers are drug companies.

A slightly less toy example can be found in an old Marginal Revolution post.

A bit more seriously, it seems like a lot of people are tinkering with their data too much:
Statistically speaking, science suffers from an excess of significance. Overeager researchers often tinker too much with the statistical variables of their analysis to coax any meaningful insight from their data sets. "People are messing around with the data to find anything that seems significant, to show they have found something that is new and unusual," Dr. Ioannidis said.

But one in twenty things that are "neither true nor false" will appear true. Perhaps it makes sense to require a wider confidence interval for results which are obtained by this sort of "data mining" than for the results which one originally intended to find? It's turning out that a lot of results are not reproducible.

Although I don't presume to be qualified to speak for how medical researchers should do their work, it seems to me that perhaps they need more forums to report negative results. In the broccoli-and-lung-cancer example, I suspect that the researchers publishing the three papers with positive results wouldn't know about enough of the negative results to make them doubt their claim. As Steven Novella points out, the fact that "most published research is wrong" is probably a combination of this lack of such forums and something like my example.

There are growing suggestions that this would even be useful in mathematics, where you'd think we wouldn't need it because we can prove our results beyond a shadow of a doubt. But we don't publicize our negative results -- we don't publish papers saying "I thought proposition X might be true, and here's why, but then I came up with this counterexample", although we might say these things in informal conversation. So there's still probably a tremendous duplication of work. Some duplication of work is probably desirable, even in mathematics; different people will have different perspectives on the same problem. But a lot of people probably have the sense that they are going down an already-explored dead end and it would be nice if they had at least some ability to confirm or refute that. This can only be more important when we're talking about the sort of research where lives are at stake.

19 September 2007

A quick thought on Newton's method

John Armstrong talks about fractals in Newton's method. (If you're not familiar with Newton's method, read that post first; what follows is in the nature of a comment to it.) This is something you wouldn't expect; when you're taught Newton's method the impression is that it'll always find the closest root to your initial estimate, or at least that the region which eventually leads to any given root is "nice-looking". That is very far from being true. Basically, if you're trying to find a zero of f(z) using Newton's method, and f'(z) is small, then you can suddenly find yourself making arbitrarily large steps. The self-similar nature of the picture you get if you color the initial values which lead to one zero red, another zero green, another zero blue, and so on isn't too surprising if you think about it this way; the picture has to be its own image under the transformation which takes c to c-(f'(c)/f(c)), which is analytic when f(c) isn't zero. An example of such a picture accompanies the post.

A heuristic look at the crossing number inequality

Terence Tao gives a (well-known) proof of the crossing number inequality. This inequality states that for a graph of average degree at least eight,

cr(G) ≥ E3/(64V2)

where V is the number of vertices of G, E is the number of edges of G, and cr(G) is the so-called "crossing number" -- the minimum number of crossings of edges in a drawing of the graph. (As is usual in the "probabilistic method", a fairly simple proof gets the same form as the best known proof but with weaker constants; a paper of Pach, Radiocic, Rados, Tardos, and Gabor1 improves the constant to 31.08. (From a quick skim through the paper, it doesn't look too interesting, being mostly a bunch of messy computations.)

Why should the crossing number inequality have these exponents? Say that we expect a lower bound for cr(G) of the form cEαVβ. (We have this, with c = 1/64, α = 3, β = -2.)

Say we replace G with two copies of G. Then we double its number of edges, its number of vertices, and its number of crossings. Our proposed lower bound now becomes c(2E)α(2V)β, and we'd like this to equal twice the old bound. So we have

cEαVβ = c(2E)α(2V)β

and canceling what will cancel, we get α + β = 1. (Of course, this is all technically wrong, because I've been adding inequalities like they're equalities.) So if we can guess what either α or β will be, we win! (What do we win? A cup of bad coffee. Please go to any mathematics department to claim your prize.)

So now let's say we double the number of edges in a graph; we want to know why this should multiply the number of crossings by 23 = 8. This seems not so promising. Imagine two graphs on the same vertex set, G and G', which we've smushed together to make a new graph. There is some number of crossings among edges of G, and some number among edges of G'; we want the total nubmer of crossings on this new graph to be eight times the number on either of the original graphs. So the number of crossings involving an edge of G and an edge of G' must be six times the crossing number of G or G'; this might be true but seems hopeless. (Throwing three graphs on top of each other only makes it worse; now we need twenty-seven times as many crossings on G + G' + G'' as we would on any of the three summands, and so we'll have to get eight times cr(G) crossings from each pair of summands. It only gets worse with more summands.

Okay... here's another idea. Hold V constant. Then the inequality says that the crossing number is at least proportional to the cube of the number of edges. This is strange, as it seems to imply somehow that a crossing is going to involve three edges. But now think about how many potential crossings there are in a graph with E edges. These are just pairs of edges, so there are about E2/2 of them. So the inequality is just saying that the probability of two edges crossing in a graph is proportional to the total number of edges!

But what makes things cross? Well, edges that are long are more likely to cross than edges that are short. (It's like how you find yourself maneuvering around fat people on the sidewalk, but not skinny people.) In particular, if we have two curves of length l1 and l2, their probability of crossing -- assuming they move around "randomly" -- will be proportional to the product of their lengths. But how long are the edges in a drawing of a graph? In order to minimize crossings, we probably want edges which are connected to be near each other. And the vertices are probably "evenly spaced" in the plane; say each one has an area of order 1 that is closer to it than to any other vertex. Let d = 2E/V be the average degree of our graph; then the optimal drawing of a graph should have the edges connected to any given vertex v within a circle of area ~d around that vertex... or a circle of radius ~d1/2. Thus edge length is proportional to the square root of the average degree... and the probability of two edges crossing is proportional to the square of average edge length. So the probability of two edges crossing is proportional to average degree, which if we're holding V constant is proportional to the number of edges.

Just like we wanted.

Of course, there are a lot of places where this argument doesn't make sense; in particular I feel a bit wrong about my argument for the probability of random curves crossing being proportional to the product of their lengths. But it's still how I will understand why this result is true.

(There is a way to reinterpret the crossing number inequality, if for some reason that V in the denominator bothers you, as it did me at first We can interpret it in terms of the average degree d of a graph G, which is just 2E/V. Then we have

cr(G) ≥ Ed2/256

and this form seems clearer to me. Crossing number increases linearly with number of edges, which makes perfect sense -- if you take two disjoint copies of a graph, they each have the same number of crossings, so the total number of crossings should be twice what you started with. And it increases with the square of the average degree. None of this silly stuff with the number of vertices in the denominator. The problem with this, as I discovered, is that the average degree depends on the number of edges...)

1. Pach, János; Radoi\v ci\'c, Rado\v s; Tardos, Gábor; Tóth, Géza Improving the crossing lemma by finding more crossings in sparse graphs. Discrete Comput. Geom. 36 (2006), no. 4, 527--552. Available from Tardos' web site at http://www.renyi.hu/~tardos/crossinglemma.ps.

18 September 2007

how to categorize mathematicians, part N

Andrew at Statistical Modeling, etc. talks about the hedgehog-fox distinction (also here). The idea is that scientists of whatever type can be divided into "hedgehogs" and "foxes", with the "hedgehogs" being those who work on one problem and the "foxes", those who work on many. Andrew claims that statisicians, for the most part, are "foxes". (I can't intelligibly comment on this because in order to do so one has to have a sense of which results come from which people, which I don't in that field.)

There's a similar distinction that some people make in mathematics between people who prove lots of facts and people who produce an overarching theory that unites those facts. Any discipline needs both, I think. In physics one hears the metaphor of experiment and theory "leapfrogging" each other; the theory suggests new experimental approaches, which eventually push the boundaries of the existing theory, which lead theorists to come up with new theory which explains the new experimental results, and so on. The distinction exists in mathematics, too, although it's a little harder to see because what the "theoretical" mathematicians do and what the "experimental" mathematicians do look superficially similar. They both involve lots of staring off into space and occasionally scribbling illegibly on a piece of paper.

But this isn't exactly the same distinction. A mathematical "hedgehog" could, for example, work on solving a few "problems" which happen to be very hard; a "fox" could theorize and systematize in a bunch of different areas. This is perhaps made difficult by the fact that although it's easy to do a problem in an area not one's own, it's harder to pull together many problems in an area not one's own to make a theory without taking a long time to understand the large set of disconnected problems that have been done well enough to see the underlying structure.

Timothy Gowers has talked about the distinction between problem solvers and theorizers in his essay The Two Cultures of Mathematics (via the n-Category Cafe). Gowers also gives some examples of "organizing principles" of combinatorics; a lot of his examples would be difficult to write as "theorems", but are more in the nature of "tricks that work an awful lot of the time":
I have been trying to counter the suggestion that the subject of combinatorics has very little structure and consists of nothing more than a very large number of problems. While the structure is less obvious than it is in many other subjects, it is there in the form of somewhat vague general statements that allow proofs to be condensed in the mind, and therefore more easily memorized and more easily transmitted to others.

Were other areas that now appear to have some sort of big overarching theory like this in the past? Is combinatorics only like this because it's relatively young, or is this a fundamental trait of this part of the mathematical universe?

A convention for quadrant/octant/orthant numbering

In two-dimensional Cartesian geometry, it's conventional to refer to the area where x and y are both positive as the "first quadrant", to that where x is negative and y is positive as the "second quadrant", to that where y is negative and x is negative as the "third quadrant", and to that where y is negative and x is positive as the "fourth quadrant". In tabular form, we have




x < 0x > 0
y > 02nd1st
y < 03rd4th


There seems to be no conventional way to extend this to the eight octants in three-dimensional Cartesian geometry. The only thing that people seem to do consistently is to call the octant where x, y, and z are all positive the "first octant", or sometimes the "positive octant". And I've noticed that a few of my students want to call the octant where x, y, and z are all negative the "eighth octant"; I actually want to do the same. It's far from the first octant; it should have a number that is far from 1. But we don't call the quadrant where x and y are negative the "fourth quadrant".

Actually, the "next step" in this recursion would lead to a scheme in which the (-, -, -) octant is called the fifth octant, and in general the octant obtained by reflecting the nth octant over the xy-plane "should be" (9-n)th octant. This preserves the property that the kth octant and the (k+1)st octant are adjacent; there's no a priori reason why this is a good condition to have, but it's the same recurstion that says that the 3rd and 4th quadrants are obtained by reflecting the 2nd and 1st quadrants, respectively, over the x-axis.

More generally, in d+1 dimensions, the point (ε1, ε2, ..., εd, 1), where each ε is ±1, is in the (d+1)-dimensional orthant with the same number as the d-dimensional orthant containing (ε1, ε2, ..., εd); call this number k. The point (ε1, ε2, ..., εd, -1) is in the orthant with number (2d+1+1-k).

This means that if we traverse the octants (or, more generally, the "orthants" in d-dimensional space) in numerical order, we get a Hamiltonian tour of the vertices of the d-dimensional cube, which is kind of cute.

This page from Math Forum seems to advocate calling the all-negative octant the "seventh octant", which comes from a hybrid of my convention and the convention that octants which border each other over the plane z=0 have numbers differeing by 4. I suppose it's okay in three dimensions, but my sense of aesthetics demands that we have a definition which recurses to higher dimensions.

Still, there's really no reason to be able to number the orthants other than the all-positive one (often called the "first orthant"), and anyone who needs to refer to a particular orthant really ought to just specify it in the form x1 * 0, x2 * 0, ... where each * is < or > in order to avoid confusion.

A cultural map of the world

Take a look at the Inglehart-Welzel Cultural Map of the World:

The World Values Surveys were designed to provide a comprehensive measurement of all major areas of human concern, from religion to politics to economic and social life and two dimensions dominate the picture: (1) Traditional/ Secular-rational and (2) Survival/Self-expression values. These two dimensions explain more than 70 percent of the cross-national variance in a factor analysis of ten indicators-and each of these dimensions is strongly correlated with scores of other important orientations.


The resulting "map" doesn't resemble any "traditional" map of the world, but it's interesting. For example, all the English-speaking countries end up near each other, all the countries of Protestant Europe end up near each other, and so on.

I'm not sure why "English-speaking" is its own group, while no other language is given its own group. In particular, all the Spanish-and-Portuguese-speaking countries seem to end up together. (This is obscured by the map, which for some strange reason includes Uruguay in "Catholic Europe" and Portugal in "Latin America". I'm conflating Spanish and Portuguese not because I don't know there's a difference, but because they are fairly similar as languages go.

My instinct is that the "Traditional Values"/"Secular-Rational Values" divide is similar to the ideological conservative/liberal divide in American politics (although I'm not sure how this would be made precise); I want to say that the "Survival Values"/"Self-Expression Values" dimension corresponds to the economic conservative/liberal divide in American politics although that seems like a lot more of a stretch. Apparently countries seem to move from "Survival" to "Self-Expression" (i. e. rightward on the graph) with time.

One sees a lot of these "factor analysis" plots where a large-dimensional space is reduced to just two dimensions in this way; I don't think there's some fundamental reason why two dimensions is the natural way to think about this, but rather that we're just good at drawing two-dimensional pictures. Dave Rusin's Mathematical Atlas includes such a plot (although naming the dimensions is tricky -- I thought they might be discrete vs. continuous and pure vs. applied.) I've also seen a political map like this, based on the voting records of U. S. Senators and Representatives; it's kind of fascinating to watch how the two parties have moved around with time, and how you can explain almost as much variation between politicians by just looking at a single variable as you used to need two variables for. You can almost predict who will vote for a given bill just by lining up the Senators from "most liberal" to "most conservative" and drawing a line somewhere to separate the two sides. Life is more complicated than that.

I wonder if one could use a plot like this (or the data which underlies it) to predict which international borders are likely to create a lot of tension. For example, the U. S. is (according to this plot) much more similar to Canada than it is to Mexico, and there seems to be a lot more tension at the U. S.'s southern border than at its northern one. Perhaps one could predict strife within a country as well, if a survey like this was done for subnational entities. Lumping the entire United States together seems almost ludicrous to me.

Also, how is this sort of thing correlated with language? And if the language spoken in a country changes, for whatever reason, is this correlated with that country becoming more like countries that speak the new language? It seems reasonable that there should be some connection between language and culture, if only because most of culture is expressed through language. But causation is a problem; do countries become more like each other because they speak the same language, or do countries that speak the same language become more like each other?

17 September 2007

142857

A web site devoted to the number 142857.

1/7 = 0.142857142857142857142857142857142857...

and, for example, 142857 * 2 = 285714 -- the same digits, but in a different order.

There are some facts I hadn't seen, too, like 8572 - 1422 = 714285, and the pair 1428572 = 20408122449, 20408+12249 = 142857. The first one has a nice "approximate" explanation, namely that (6/7)2 - (1/7)2 = 5/7 (multiply both sides by 106), but I don't even see why the second one has any chance of being true.

Hand-waving asymptotics of the primorial function

Let n# be the "primorial" function, which is the product of all the primes less than or equal to n. So, for example, 7# = 8# = 9# = 10# = (2)(3)(5)(7) = 210.

A friend of mine asked for a proof that n# > n, for n ≥ 3.

The easiest proof I can think of is as follows: you begin with Bertrand's postulate, which states that there is always a prime in the interval [k+1, 2k] for any k. (The link goes to David Galvin's exposition of Erdos' elementary proof of this fact, which inspired the following rhyme from Erdos: "Chebyshev said it, and I say it again/There is always a prime between n and 2n." This proof depends on some properties of the prime factorizations of binomial coefficients.)

In particular, there is a prime p, which depends on n, in the interval (n/2, n]; this prime is strictly greater than n/2. Also, 2 is a prime. If n ≥ 4, then p ≠ 2, and we have n# > 2p > 2(n/2) = n. We only need to check n = 3, which is easy: 3# = (2)(3) = 6.

Of course, this proof is throwing out a ridiculously large amount of information. From repeatedly applying Bertrand's postulate we can do better; in particular we get n# > (n/2) (n/2)#. (I'm ignoring the fact that n/2 might not be an integer.) Repeating this we get n# > (n/2) (n/4) (n/4)#, and so on. If we repeat this k times, where k is the greatest integer less than log2 n times, we get

n# > (n/2) (n/4) ... (n/2k) = nk / 21+2+...+k = nk / 2k(k+1)/2.

Recalling that 2k ≈ n, this is

nlg n / n((lg n) + 1)/2

or on the order of n(lg n)/2. (lg denotes the base-2 logarithm.) For example, this tells us that 100# is at least 106 or so.

It's suspected that for all n > 1, there is a prime with n2 < p < (n+1)2. This gives a stronger lower bound. Namely, let k be the largest integer less than or equal to n1/2. Thenn if this conjecture is true, there's a prime in each of (1,4), (4,9), (9,16), (16,25), ..., ((k-1)2, k2). The product of these primes is at least

(1)(4)(9)...((k-1)2)

or ((k-1)!)2; this is on the order of (k/2e)k. (I've replaced k-1 with k here, and used Stirling's approximation.) Recalling that k ~ n1/2, this is (√(n)/2e)√(n). For n = 100, this means 100# is at least 1011 or so. (I'm not actually willing to write n# > (√(n)/2e)√(n) because I haven't carefully considered what's going on at the endpoints of the products.)

And here's a quick heuristic justification for why n# is about en. Consider log n#; this is

Σk=1n (log k) [k is prime]

where the expression "[n is prime]" is 1 if n is prime and 0 isn't. Now, from the Prime Number Theorem we know that the the "probability" that n is prime is 1/(log n). (This is how I remember the Prime Number Theorem; it's a bit silly, because a number is either prime or it's not, but I find it a useful heuristic.) So the "expectation" of log n# is

Σk=1n (log k) (1/log k)

which is just the sum of n terms, each of which is equal to 1. So, in fact, 100# is about 1043. (This argument, of course, isn't rigorous.)

correction to multiple zeta values post: Hoffman's relation

On Friday I made a post about multiple zeta values and was frustrated by my inability to find, for example, ζ(2,1).

Sarah Carr has informed me of the missing relation, "Hoffman's relation", which is used for calculating non-convergent ζ values:
For any convergent sequence of positive integers, k = (k1, ... ,kd) and its
corresponding sequence of 0's and 1's, ε = (0k1-11,...,0kd-11), then Σσ ζ(σ) = 0 where σ runs over all the terms in (1)*k - (1) Ш ε

I'm not particularly interested in finding it anymore (I have bigger fish to fry at the moment), but in the interests of completeness I wanted to make sure I had this right.

16 September 2007

hard science, and calculating the area of a circle

The Really Hard Science, by Michael Shermer, from the October 2007 issue of Scientific American. The author makes the claim that the traditional distinction between "hard" and "soft" science is backwards; the social sciences might more reasonably be considered harder. I'm not sure how much I buy this linguistic claim, because "hard" has two antonyms in English, "soft" and "easy", and I suspect there's a reason we call the social sciences "soft sciences", not "easy sciences". The point that all these disciplines are valuable, I support. Shermer points out that modeling a biological or social system is often more difficult than modeling a physical system. We tend to think of the mathematics used by physicists as more complicated than that used by social scientists, but how much of that is a reflection of the subject matter and how much of that is due to the historical alliance between mathematicians and physicists, which means that physicists have tended to know lots of mathematics and, say, sociologists don't know as much? (I struggle with this in teaching calculus; our textbook, like many calculus textbooks, draws a lot of its examples from physics -- centers of mass, moments of inertia, and such -- and the students aren't as likely to be conversant with physics as the authors of the text seem to assume.)

He writes the following:
Between technical and popular science writing is what I call “integrative science,” a process that blends data, theory and narrative. Without all three of these metaphorical legs, the seat on which the enterprise of science rests would collapse. Attempts to determine which of the three legs has the greatest value is on par with debating whether Ï€ or r2 is the most important factor in computing the area of a circle.

Which is more important in calculating the area of a circle? I say r2. It's interesting that there is a constant, but the way in which the area of the circle grows when you increase the radius (namely, twice as fast) is more important to me. I may only be saying this, though, because I don't have to actually calculate the area of a circle. The people who made the circular table I'm sitting at certainly care about π to tell them how much raw material they will need.

The "twice as fast" statement there is a bit vague, but I mean it in the sense that if one increases the radius of a circle by some small fraction ε (as usual, ε2 = 0), one increases its area by the fraction 2ε. That is,

π(r(1+ε))2 = π(r2 + 2εr2+ ε2r2) = (πr2)(1+2ε)

where the last equality assumes ε2 = 0. For the most part this is how I think of derivatives -- at least simple ones like this -- in my head. The ε2 = 0 idea is the sort of thing one might see in the proposed "wiki of mathematical tricks" of Tao and Gowers (the real meat of the discussion is actually in the comments on those posts), although perhaps at a slightly lower level than they're aiming at.

15 September 2007

just noticeable difference of temperature?

Down with the metre and litre, claims that the sizes of metric units are unintuitive: "On the other hand, a degree celsius is too large and imprecise. By contrast, the Fahrenheit scale was custom made to measure weather phenomena: it has human scale."

A celsius degree is 1.8 times the size of a Fahrenheit degree.

Do people who claim this seriously think that they can tell the difference between, say, 61 degrees Fahrenheit and 62.8 degrees Fahrenheit? I don't think they could, and so I would claim that even the Celsius degree is small enough for "everyday" purposes. (In weather forecasts given in Fahrenheit one often refers to, say, the "low seventies" or the "mid-sixties" which seems to imply a resolution of about three or four degrees.) I claim that in both scales, one degree is less than the just noticeable difference for atmospheric temperatures, although I can't quickly find out if the people who study this back me up.

(One thing I do like about using the Fahrenheit scale for weather is that almost all weather I experience is between about 0 degrees and 100 degrees. In some tellings of the story, that's said to be deliberate. There is something weird about negative temperatures, but I'm not quite crazy enough to just start stating all temperatures in Kelvins.)

Why g ~ π2

The acceleration due to gravity, at the surface of the Earth, is about 9.81 m/s2. (If you are some of my students, you think it's 10, which is confusing for a moment. Fortunately none of my students thought it was 32.)

π2 = 9.87.

The approximate numerical equality of these numbers is not a coincidence.

I was reminded of this by Mark Dominus' post John Wilkins invents the meter, which I found via his post The Wilkins pendulum mystery resolved. In 1668, John Wilkins defined the meter to be essentially the length of the "seconds pendulum", i. e. the pendulum whose period is two seconds (the "mystery" there is that since one wants a pendulum with a light cord and a heavy bob, one has to take into account the moment of inertia of the heavy spherical bob to define this correctly). This turns out to be about 39.1 inches.

A meter is 39.37 inches.

Indeed, originally the length of the seconds pendulum was going to be the length of the meter, but the slightly different suggestion was adopted instead that the meter would be one part in 107 of the distance from the North Pole to the Equator via Paris. (Now, in fact, the meter is defined by assuming the second known and stating the speed of light.) Wikipedia tells me that these definitions were adopted in 1790, 1791, and 1983 respectively; there are others intermediate between the last two.)

The period of a pendulum is approximately T = 2π(L/g)1/2, where L is its length and g is the acceleration due to gravity. If we set T = 2 and L = 1, and solve for g, we get g = π2.

I suspect that the reason the seconds pendulum wasn't adopted is that the formula for the period involves the small-angle approximation; in reality the period of a pendulum depends not only on its length and on the acceleration due to gravity but also on the angle through which it swings. The period is given by

T = 4\sqrt{L \over g} \int_0^{\pi\2} {1 \over \sqrt{1 - {\frac{\theta_0}{2}} \sin^2 \theta}} d\theta = 2\pi \sqrt{L \over g} \left( 1 + {1 \over 4} \sin^2 \frac{\theta_0}{2} + {9 \over 64} \sin^4 \frac{\theta_0}{2} + \cdots \right)

and so the fractional magnitude of the error is about 1/4 sin2 (θ/2); if θ is, say, one degree this is one part in about fifty thousand, or nearly two seconds a day. That error would become an error in the definition of the meter, and perhaps the Powers that Were thought that was a bit too much. (I'm not sure how accurate surveying was at the time, though, so I can't comment on the accuracy of their eventual definition based on the size of the Earth.)

14 September 2007

multiple zeta values

Today I learned about Multiple Zeta Values in my department's graduate student "pizza" seminar. The speaker was Sarah Carr; the link is to her notes. Some of what follows basically rehashes definitions and conjectures from her notes; some of this is my own thoughts. In the future, you might expect to find math that I don't really understand but am pretending to on Fridays, because that's when this seminar meets.

One can define the "multiple zeta value" of a sequence of positive integers (k1, ..., kd), with k1 ≥ 2 (a condition which is needed for convergence), in the following way:


The ordinary Riemann zeta function is just the special case where d=1.

It turns out that these obey certain nice relations which can be found basically by just looking at the sums, for example

ζ(a) ζ(b) = ζ(a, b) + ζ(b, a) + ζ(a+b).

This allows one to compute some of these values; for example, if a = b = 2, we get

ζ(2)2 = 2 ζ(2, 2) + ζ(4)

and using a certain well-known results of Euler, namely that ζ(2) = π2/6 and ζ(4) = π4/90, we get ζ(2,2) = π4/120. Of course, one doesn't want to write ζ over and over again when studying these things, so we'd write something like

(a) * (b) = (a, b) + (b, a) + (a+b)

and this "*" is an example what's called the "stuffle product". (I swear I'm not making this name up!) You can read Carr's notes for the definition in general.

There's a natural way in which we can view sequences of integers as sequences of 0's and 1's; namely, replace each occurence of a by a-1 0's followed by a 1, so, for example, the sequence (2, 3) becomes (0, 1, 0, 0, 1). On these sequences one can define a relation called the "shuffle product", on which one has, for example,

(0, 1) Ш (0, 1) = 2(0, 1, 0, 1) + 4(0, 0, 1, 1)

or, in the original notation,

(2) Ш (2) = 2(2, 2) + 4(3, 1).

Sticking the ζs back in and turning Ш into multiplication is allowed; this is a result of Kontsevich. The proof hinges on a representation of ζ values as integrals, which is pretty natural; the integrals in question have nice power series expansions that coincide with the definition of the ζ values. Doing this, you get

ζ(2)2 = 2ζ(2,2) + 4ζ(3,1)

from which we conclude that ζ(3,1) = ζ(4)/4 = π4/360.

It turns out, though, that the relations one gets from considering the * and Ш operations are graded, in the sense that given a relation among the ζ values, the sum of the arguments in each term of that relation (the "weight" of each term) will be the same. For example, in the relation

ζ(2)2 = 2ζ(2,2) + 4ζ(3,1)

the three terms have weight 2+2, 2+2, and 3+1 respectively. It's conjectured that all the relations among ζ values come from * and Ш, from which it would follow are no relations among ζ values of different weight; this would mean that all ζ values are transcendental. Since putting a sequence which sums to m and one which sums to n into either * or Ш gives a sequence which sums to m+n, this would mean that the ζ values form a graded algebra.

I also don't know how many relations there are among ζ values of the same weight; one might hope that there are enough that we can find all the ζ values of even weight exactly by purely algebraic means, given that we know ζ(2n)? (In particular, this would imply that ζ of any sequence summing to 2n is π2n times some rational number; above, we see that ζ(4), ζ(3,1) and ζ(2,2) are all rational multiples of π4. But I don't have too much hope for that conjecture, because I can't even find ζ(2,1,1) that way! (I think that my inability to do this would follow from a conjecture of Zagier mentioned in Carr's notes, on the dimension of the grade-n part of the algebra of ζ values, but I don't trust myself.)

CORRECTION, Monday, September 17: There's a missing relation that I didn't know about. See this post.

Yet another Congressional redistricting proposal

Yet another Congressional redistricting proposal, from Brian Olson.

This proposal is based on the theory that the "best" districting proposal is the one which has the shortest sum of distances from people to the center of their districts.

It sounds like a good idea, but I'm not sure how one would actually find this. Olson claims to. But his program runs by taking the current districting and perturbing it (he has video showing this), which means that the maps look like the current maps. Gerrymandering becomes a bit more difficult because you can't have districts that are oddly shaped. But, for example, the city of Austin, Texas (with a population which is very nearly that of one congressional district, and is politically quite different from the surrounding areas) is shared between three districts in both the old and new maps.

Of course, Olson's method didn't say it wouldn't do that, and in fact Olson addresses the question of his method breaking up communities in his FAQ. But I also wonder if he's only finding a local minimum for the sum of the distance to the centers, while the global minimum might lie with a map that looks radically different. The space of possible districtings of a state is very-high-dimensional so this is the sort of thing one has to worry about. Olson in fact proposes an Impartial Redistricting Amendment which would enshrine this criterion -- but how do we know that his program, or any other program, actually implements it?

What might be interesting is to just assign each census block in a state to a district at random and then run the algorithm. The problem with that is that there's no way the district lines would end up falling in the same place every time.

(Found at O'Reilly Radar, via Statistical Modeling, etc., the full name of which I'm tired of typing.)

13 September 2007

the Bananach-Tarski paradox

From Courtney Gibbons' Brown Sharpie: the Bananach-Tarski theorem. This tells us that a banana can, through some sort of magic, be made into two bananas.

This is of course a restatement of the Banach-Tarski theorem, which tells us that a solid ball can be partitioned into finitely many non-overlapping subsets, which can then be rearranged to form two balls each of the same size as the original ball.

Of course, you couldn't do this with actual bananas (even though a topologist can't tell a banana from an orange, or a doughnut from a coffee cup, or eir pants from a hole in the ground). Damn atoms.

What other fruit-related math jokes are there? The only one I can think of is "What's purple and commutes? An abelian grape," which barely qualifies as a joke.

I found the Bananach-Tarski paradox via the Facebook group I support the right to choose one element from each set in a collection, which asks:
Do you believe that an infinite product of nonempty sets should be nonempty? Do you feel that non-measurable subsets of the reals should exist? Or games of perfect information with no winning strategy for either player? Do you believe that every set should have a well-ordering, or that any poset in which every chain has an upper bound is entitled to a maximal element? Do nonzero rings have the right to a maximal ideal? Is every vector space entitled to a basis? Should fields have algebraic closures? Should products of compact topological spaces be compact, and countable unions of countable sets be countable? Do you want to be able to cut a sphere up into a finite number of pieces and reassemble them, with only rigid motions, into a sphere twice as large?


Finally, because this blog is at least nominally about probability: imagine a (randomized) algorithm that outputs a string of letters as follows: start with any three letters a1, a2, a3. Then for each n > 3, consider some corpus of English text, and consider all the occurrences of the string an-3 an-2 an-1 in that corpus. Tabulate the distribution of the letter that follows that string in the corpus. Choose an from that distribution.

I don't have the programming skills or the corpus to work from, but I bet there's a significant chance that if you start with BAN you get BANANANANANA...

Rosh Hashanah, and somehow the circle of fifths.

Today is Rosh Hashanah, the Jewish New Year.

The Hebrew calendar is a lunisolar calendar, which means that the months remain roughly in step with the phase of the moon (months start approximately at the new moon) but the beginning of the year is always at about the same time with respect to the seasons. This is arranged by having "leap months"; some Hebrew years have 12 months (of, on average, 29.5 or so days each) and some have 13.

It turns out that the leap years in the Hebrew calendar follow the Metonic cycle, which is a name for the fact that 235 lunar months is very nearly 19 solar years. In fact, 235 months are .076 days longer than 19 years, which seems to indicate that the calendar drifts by .076 days per 19 years, or one day per 219 years, with respect to the seasons. The Rosh Hashanah article backs this up; currently Rosh Hashanah can occur no earlier than September 5, but after 2089 it will occur no later than September 6. (It's legitimate to use the Gregorian calendar here, as the corresponding error in that calendar is something like one day in three thousand years.) The Hebrew calendar was codified in its present form several centuries before the Gregorian calendar, which probably explains why the Gregorian is more accurate; I have no doubt that the creators of the Hebrew calendar would have found a way to make it more accurate if they'd had the data to do so.

I learned from the Wikipedia article on the Hebrew calendar that
Another mnemonic [for remembering the pattern of common and leap years] is that the intervals of the major scale follow the same pattern as do Hebrew leap years: a whole step in the scale corresponds to two common years between consecutive leap years, and a half step to one common between two leap years.
For a moment this seemed surprising, but then I stopped to think about it. One has to have seven leap years (i. e. 13-month years) out of every nineteen. This is slightly more than one out of every three, so one could imagine taking a 21-year cycle of the form

LCCLCCLCCLCCLCCLCCLCC

where L represents a leap year and C a common year, and then deleting two of the C's. It is reasonable to delete two common years that are as far apart as possible, so that the calendar doesn't get too far ahead or behind. Similarly, in the case of the major scale we have two half steps and five whole steps to be arranged; one "wants" the half steps to be as far apart as possible. (This is a little more dubious, but seems reasonable.)

In music we have the "circle of fifths", so one note is a fifth, or four diatonic tones higher than the one before but seven half-steps higher; this only breaks down when seven fifths has to be twenty-eight diatonic tones (four octaves) but forty-nine half-steps (four octaves plus a half-step). A similar construction could exist in the Hebrew calendar, where seven is replaced by eleven; "usually" a period of eleven years has four leap years and seven common years, but if one had nineteen such periods that would give 76 leap years and 133 common years, while eleven nineteen-year cycles ought to have 77 leap years and 132 common years.

In fact, both 4/11 and 7/19 are convergents of a certain continued fraction, the one which is the expansion of the actual number of lunar months per solar year, minus twelve. There is apparently a "tabulated Muslim calendar" that uses a similar-looking 11/30 ratio, although for a different reason; Islam forbids leap months, but it turns out that to keep the calendar in sync with the moon one has to add a day every so often. The average lunar month is slightly longer than 29.5 days, and this "tabular Islamic calendar begins by giving alternate months 30 and 29 days and then adding a day to the last month of eleven years out of thirty. I was actually about to state that 11/30 was the next convergent of the continued fraction in question, but it's not; it's actually 123/334, so one could have a cycle of 123 leap years out of every 334. This would be build up from seventeen of the 19-year cycles and one 11-year cycle. Of course, the problem with such a complicated leap-year rule would be that nobody can remember which years are leap years!

(Incidentally, there are also rules about which day of the week Rosh Hashanah can fall on; it has to be Monday, Tuesday, Thursday, or Saturday. The reason for this is as follows:

  • Yom Kippur, which is the tenth day of the year, should not fall on a Friday or Sunday, i. e. a day adjacent to a Saturday, for this would make two consecutive days when no work can be done. This means that the year should not begin on a Wednesday or Friday.

  • The twenty-first day of the year, which is the last day of Sukkot, called Hoshanah Rabbah, for some reason should not fall on a Saturday. (I once read why, but I forgot.) So the year should not begin on a Sunday.


Therefore Rosh Hashanah doesn't always actually start on the day of the new moon; sometimes it's moved around by a day or two to avoid it falling on such a day of the week.)

12 September 2007

geometric interpretation of vector products

Yesterday, a student asked me if there was a geometric interpretation of the dot product of vectors. I had been talking about the geometric interpretation of the cross product, which is usually given as follows: given two vectors a and b,

  • the magnitude of a × b is the area of the parallelogram determined by a and b, and

  • the direction of a × b is given by the right-hand rule
  • .

That's all well and good, although I was at one point asked what good the cross product was. This is a tricky question to answer, because it is not immediately obvious to students of calculus why one would want to find a vector that is orthogonal to two given vectors; furthermore, the definition of the cross product is a lot more complicated and mystifying than the definition of the dot product, and I suspect they want something more obviously useful in return for that.

Anyway, I couldn't think of a geometric interpretation of the dot product, in the sense that this student seemed to want. The Wikipedia article says the following: Since |a|cos(θ) is the scalar projection of a onto b, the dot product can be understood geometrically as the product of this projection with the length of b. That's true, and I did say that, but what good is it? The two vectors that we're saying ab is the product of the lengths of are in the same direction, so they don't naturally define an area. Is there a geometric interpretation of the dot product that doesn't feel so contrived? (This is sort of a vague question, as some people might not find what I just said contrived.)

We then moved on to the Scalar triple product a ⋅ (b × c), which created other frustration; it's weird to write this thing on the board and not mention that it's symmetric under even permutations of the arguments, because it's really the determinant of the 3-by-3 matrix containing those entries. I mumbled some words about how that sort of determinant comes up in changes of variables in multiple integrals, though; I was thinking of the Jacobian.

Of course, there's the classical claim that "you can only define a cross product in one, three, or seven dimensions", which I didn't mention, because nobody asked "can you define a vector product in two dimensions?" -- I would have mentioned it if they've asked.

But the cross product has that nice geometric interpretation. How do you continue with that? My officemate reminded me that the determinant of the "matrix"

\det \begin{pmatrix} \hat i & \hat \j & \hat k & \hat l \\a_1 & a_2 & a_3 & a_4 \\b_1 & b_2 & b_3 & b_4 \\c_1 & c_2 & c_3 & c_4 \end{matrix}

(where i, j, k, l are the unit vectors in four-space) is the 3-volume of the parallelepiped spanned by the three vectors (a1, a2, a3, a4), (b1, b2, b3, b4), (c1, c2, c3, c4). So it seems that we can define some sort of "cross product" in this sense in any number of dimensions; in Rn it'll take n-1 arguments. This is actually the wedge product in disguise.

But then what's the "cross product" (in this sense) of a single vector in 2-space? It's not the vector itself; if you write the "determinant" it's

\det \begin{pmatrix} \hat i & \hat \j \\a_1 & a_2 \end{pmatrix} = a_2 \hat i - a_1 \hat j

and so applying this operation to the vector (a1, a2) returns (a2, -a1). The operation takes in a vector and spits out its rotation by 90 degrees. This makes sense in retrospect; in the n-dimensional it spits out a vector which is orthogonal to the n-1 input vectors.

It just seems strange to be calling rotation by 90 degrees a "product"...

11 September 2007

Fencepost errors, intervals, redeeming coupons, and drinking

A friend of mine asked today: if a coupon says it expires on a certain date, is that the last date that it's usable, or the first date that it's not usable? (He was referring in particular to the customized coupons that print out on receipts at CVS if you participate in their "loyalty program".)

My answer: the last day it's usable. This is basically by analogy with more "traditional" coupons, which are likely to expire on, say, "September 30". It seems reasonable that one could use the coupon on every day in September, and on no day in October; it would not seem reasonable to be able to use the coupon on every day in September but one. Coupons printed out by this program expire on seemingly random dates, but it seems that the same principle should apply there.

Similarly, if I said "the homework is due on Thursday" to my students, and then one of them handed it in on Thursday at noon, I would not say that what I really meant is "Thursday is the first day on which I will not accept the homework", and that therefore the homework is late. (In practice I usually say what time the homework is due in order to avoid this question.)

I'm not sure what the law actually has to say about this, though. I do know that laws in various places endorse different interpretations with regard to the "sell by" date on milk: in some jurisdictions the "sell by" date is the last date on which the milk can be sold, and in others it is the first day on which it cannot.

I'm inclined to say that one is always allowed to do the thing on the date which is stated, so it should be permitted to redeem a coupon, sell a carton of milk, etc. on the date listed on it. Similarly, if there's something that I'm allowed to do if I am "at least N years old" I should be able to do it on my Nth birthday, and the law agrees with me. (There's an exception of sorts for drinking, actually; in some states one cannot drink at midnight on one's 21st birthday, but must wait until the bars have closed and re-opened again. A friend of mine turned 21 in Massachusetts and was disappointed to learn this. Apparently the reason for these laws is that some people were trying to take 21 shots between midnight and last call, often 1am or 2am, and they were dying. Dying was seen by the lawmakers to be a Bad Thing.) There's a symmetry here, in that both of these are "permissive" interpretations.

But the scenario in which one is always forbidden to do a thing on the date stated on the appropriate piece of paper (so no redeeming a coupon on its expiration date, no drinking on your 21st birthday) also is symmetrical, in a way -- one might call it a "strict" interpretation. One could also have "early" and "late" interpretations -- the "early" interpretation would be that you can't redeem the coupon but you can drink, and the "late" interpretation the reverse.

These four cases are roughly analogous to the four sets of inequalites a ≤ x ≤ b, a < x < b, a ≤ x < b, and a < x ≤ b, although making the analogy precise takes more time than I want to spend on this.

10 September 2007

Asimov on large numbers

Asimov on large numbers, including Skewes' number. (By the way, Skewes is pronounced in two syllables, which I didn't know, and Littlewood's middle name was apparently Edensor.)

Asimov also writes xy as x\y, which is a nice trick when the exponents get complicated, as they do here. Why is there not a non-subscript notation for powers, other than exp(x) for ex when the base happens to be e?

You can take any real number greater than 1 and write it as 10\10\...\10\k for some number of 10's and some constant 1 < k < 10; Skewes' number is 10\10\10\34, or 10\10\10\10\1.53, so Asimov refers to this as a "quadruple-ten number". If I remember correctly Douglas Hofstadter uses this same idea of quantifying numbers by the number of exponentiations needed in one of his Metamagical Themas columns, although more correctly Asimov had it first.

Essentially all "real" numbers are either "single-ten" numbers (between 10 and 1010) or "double-ten" numbers (between 1010 = 10\10\1 and 101010 = 10\10\10)); note that the number of particles in the observable universe is much less than 10\10\2. (It's something like 10\10\1.90, which is actually much smaller than 10\10\2 even though it doesn't sound like it.)

There's a sense in which, say, 10\x and 10\y are close to each other if x is close to y. This is reflected in some asymptotic notation I've seen, where one writes f(x) ~ g(x) if (log f(x)/log g(x)) approaches 1 as x approaches infinity, so, for example, exp(x2) and exp(x2 + x) are "close together" in this sense. I don't know of cases where one can only show that log log f(x) and log log g(x) are asymptotically equal; this seems like too coarse a definition of "equality" to be of much use.

How many well-formed formulas are there?

The following is exercise 1.1.2 of "A Mathematical Introduction to Logic", by Herbert Enderton:
Show that there are no [well-formed formulas] of length 2, 3, or 6, but that any other positive length is possible.


Enderton defines a wff as follows: every sentence symbol An is a wff. If α and β are wffs, then so are (¬ α), (α ∧ β), (α ∨ β), (α → β), and (α ↔ &beta); furthermore, that's all of the wffs. (The parentheses count as symbols. The sentence symbol An counts as a single symbol.)

The problem is pretty straightforward. A, (A ∧ B), and (A ∧ (B ∧ C)) are wffs of length 1, 5, and 9 respectively. If α is a wff of length n, then (¬ α) is a wff of length n+3; thus we can generate wffs with lengths in the following sequences: {1, 4, 7, ...}, {5, 8, 11, ...}, {9, 12, 15, ...}. This is all the integers except 2, 3, and 6.

Now, each of the ways of forming new wffs from old takes one or two wffs, adds them together, and adds three new symbols. So we can't get a wff of length 2 or 3 because it would have to come from a wff of length -1 or 0. Furthermore, there are no wffs of length 6, because they'd either have to be of the form (¬ α) where α has length 3, or (α * β) where * is ∧, ∨, →, or ↔ and α and β have total length 3, which would mean that one of α and β has length 2.

The natural follow-up question, to me at least, is "how many wffs are there"? This is silly to ask because if we have infinitely many sentence symbols, there are of course infinitely many wffs of any length for which there is at least one wff. So let's restrict ourselves to one sentence symbol, A. Let fn be the number of wffs of length n. Let F(z) = Σn fn zn be the generating function of this sequence. Then we have

f(z) = z + z3f(z) + 4z3f(z)2

where the first term on the right-hand side corresponds to the formula which consists of just the sentence symbol, the second term to all those formulas of type (¬ α), and the third term to all those forms of type (α * β). (The coefficient 4 comes up because there are four things that "*" could be.)

Solving this for f, we get

f(z) = {1-z^3 - \sqrt{1-2z^3+z^6-16z^4} \over 8z^3}

and so we get

f(z) = z + z4 + 4z5 + z7 + 12z8 + 32z9 + z10 + 24z11 + 160z12 + 321z13 + 40z14 + 480z15 + ...

which is a bit surprising; the coefficients don't grow "smoothly". This is because, for example, wffs of length 9 are of the form (A * (A * A)) or ((A * A) * A) where * is as above (and yes, these two forms are different), so there are 32 of these; the only wff of length 10 is (¬ (¬ (¬ A))), since any other such formula would come from two formulas with total length 7, and one of those would have to be of length 2, 3, or 6 which is impossible. But I'm more interested in the asymptotics.

It's a general principle that the coefficients of a generating function grow roughly like fn ~ (1/ρ)n, where ρ is the absolute value of the singularity of f(z) = Sigma;n fn zn; this is true up to a "subexponential factor" which has to do with the nature of that singularity (polar, algebraic, etc.) The singularities of f(z) in this case arise from taking the square root of numbers near 0; thus they are located at the roots of the polynomial 1 - 2z3> +z6 -16z4. The smallest root is about ρ = 0.4728339090, so the number of wffs of length n grows like (1/ρ)n, or about 2.11n, times some subexponential factor. If you take larger coefficients they do have a ratio of about 2.11.

The subexponential factor is probably something like n-3/2 -- that is, the number of wffs of length n is probably about 2.11n n-3/2 -- because that particular subexponential factor often arises when one analyzes the asymptotics of tree-like structures. Don't ask me to explain that.

The interpretation this brings to mind is that if you write a wff by just writing symbols at random, at any given step there are in some sense "on average" 2.11 symbols that you could write down which would keep the wff grammatical. There are eight possible symbols, so this suggests that the best special-purpose compression algorithm for this formal language would shorten formulas to log(2.11)/log(8), or about 36%, of their original length.

09 September 2007

division in Foxtrot and some thoughts on arithmetic

From today's Foxtrot, you can learn what division really is. It also illustrates quite vividly the inefficiency of unary notation.

Division, of course, is basically just taking things and sorting them into piles of the same size, and then counting the piles; the result usually known as the division algorithm, which is not actually an algorithm but rather an existence theorem, makes this clear. I wonder if children learning arithmetic know this or if they think that "division" is just some magical algorithm that takes a bunch of numbers and pushes them around and spits out another number; it's been a very long time since I learned about division, and I've also learned not to trust my own introspection to tell me how "most people" see a given mathematical procedure. If I were "most people" I wouldn't be a mathematician.

For the most part I don't do arithmetic by the methods one learns in grade school. From just doing a lot of arithmetic (often in the course of experimenting with some combinatorial problem) I've quasi-memorized a lot of the particular arithmetic problems that come up most often. (In my case these seem to usually be products of numbers with small prime factors.) The set of arithmetical facts that I know off the top of my head is idiosyncratic; for example, I wouldn't know 392 or 432 off the top of my head, but I can instantly say that 372 = 1369 because it is in the name of 1369 Coffee House, which I frequented quite a bit during my undergrad years at MIT. But I also wouldn't multiply 37 by 37 by taking a bunch of things and arranging them in a 37 by 37 square and counting them. For one thing, I don't have 1,369 similarly-sized things. Also, I don't have a flat surface large enough for that in my apartment. This is why we invent calculation algorithms that don't directly mirror the definitions at the lowest level.

The grade-school arithmetic algorithms come pretty close to doing that, though, if you accept that you can arrange your objects in piles of 10, 100, 1000, and so on. But these aren't the most efficient methods; for multiplication, Strassen's method for integer multiplication has time-complexity Θ(N log N log log N) for N-bit integers, compared to Θ(N2) for the grade-school method. This paper of Martin Furer supposedly has better asymptotic behavior, although it wouldn't surprise me to learn that it's better only for very large numbers. (Strassen's algorithm, in turn, is only the best known method once the numbers get above 10,000 digits or so.)

the xkcd number

Check out this xkcd comic, which includes among the "things that xkcd means":

"[xkcd] means calling the Ackermann function with Graham's number as the arguments just to horrify mathematicians."

The Ackermann function is defined as follows: A(m,n) = n+1 if m=0, A(m-1, 1) if m>0 and n=0, A(m-1, A(m,n-1)) if m>0 and n>0.

A(0,n) = n+1, which is obvious.

A(1,n) = A(0, A(1, n-1)) = A(1, n-1) + 1. Thus we have A(1,n) = n+k for some constant k. Now, A(1,0) = A(0,1) = 2, we have A(1,n) = n+2.

The same sort of reasoning leads to A(2,n) = 2n+3, A(3,n) = 2n+3 - 3.

A(4,n) can be written down it's 2^(2^(2^(2^...2)))...) - 3, where there are n+3 twos. (Incidentally, power towers should be evaluated from the right to the left; p^q^r is p^(q^r), not (p^q)^r. The reason I say this is because it's clear we should be consistent and either evaluate from left to right or from right to left, and there's already a perfectly good way of writing (p^q)^r, namely p^(qr).)

But A(5,n) and A(6,n) can't be written nicely.

And Graham's number, denoted g64, is already obscenely large; I'm not going to bother writing out how to define it, you can read Wikipedia. It's an upper bound for the answer to the following problem:

Consider an n-dimensional hypercube, and connect each pair of vertices to obtain a complete graph on 2n vertices. Then colour each of the edges of this graph using only the colours red and black. What is the smallest value of n for which every possible such colouring must necessarily contain a single-coloured complete sub-graph with 4 vertices that lies in a plane?

and the correct answer to this problem was suspected, for a long time, to be 6.

The author of xkcd claims that A(g64, g64) is probably the largest number ever concisely defined. Of course, one can always define a larger number. I nominate gk, where k = A(g64, g64); this number has the unfortunate property that it requires double-subscripts to write. In my opinion, good mathematical notation avoids double subscripts when at all possible.

Of course, this is all just a more mathematically sophisticated version of the "infinity plus one" that makes an appearance every so often on playgrounds. There's always a bigger number. (Although, if you want to get technical, all the numbers I mentioned are finite!)

And the xkcd number A(g64, g64) is only "concisely defined" if you've defined Graham's number and the Ackermann function, which themselves take some work. So what's the largest number you can write with, say, four characters, chosen from some "reasonable" set of characters in our standard mathematical notation? The two obvious candidates are A = 9^(9^(9^9))) or B = 9!!!. (Note that the first of these can be written in four characters if properly formatted.) In fact, A>B. We can see this by taking the logarithm of each of them twice

log A = 9^(9^9) log 9
log log A = log 9^(9^9) + log log 9
log log A = 9^9 log 9 + log log 9 = 851249821

Now, to take the log of a factorial, we recall Stirling's approximation, which I've used before. We have the approximate equality

log n! ~ 1/2 log (2Ï€) - n + (n+1/2) log n

for our purposes we can use the much cruder approximation log n! < 2n log n. Then we have

log B < 2(9!!) log 9!!
log log B < log 2 + log(9!!) + log log (9!!)
log log B < log 2 + 2(9! log 9!) + log ( 2(9! log 9!) ) = 9291071.

Finally, all this business reminds me of a talk I went to about large numbers. Every time the speaker was about to prove that one ridiculously large number was larger than another, the fire alarm went off. It seemed as if the building didn't want us to talk about such large numbers that day, as if we were going to give away the secret of the universe.

08 September 2007

mapping functions and genes "crossing over"

In genetics they have a unit called the centimorgan. This unit is a unit of what is called recombinant frequency, and it doesn't seem to be well-defined. For those of you who don't remember your biology (and I'll admit I'm one of them), recall that almost all cells contain chromosomes in pairs (23 pairs in humans). in the process of meiosis, cells are produced which contain a copy of one member of each pair. When fertilization occurs, these come together to form a new pair of chromosomes. However, this new pair mixes up or "recombines" parts of the old pair, as can be seen in the image.

The result is that two genes which are physically close together on the same chromosome will be inherited together, but two genes which are physically far apart might not be inherited together. When one learns about this in an introductory biology class, I think that the fact that two cross-overs is, in a sense, the same as no cross-over at all is ignored. That is, if the chromosomes cross over twice, or four times, or six times, or any even number of times between two genes, then those genes will end up on the same copy of the chromosome even after crossing over. (A more quotidian analogy: you walk down a street, arbitrarily crossing it "when the mood strikes"; the probability that at some given moment in the future you are on the opposite side from where you started is not the same as the probability that you have ever crossed the street, because you might have crossed back.)

Certainly, I don't remember hearing it in high school biology, and it's not mentioned in Time, Love, Memory: A Great Biologist and His Quest for the Origins of Behavior, which is the book I'm reading right now. It's nominally a biography of Seymour Benzer (who is still an active researcher) but is also something of a history of molecular biology.

Anyway, two genes are said to be one centimorgan apart if the probability of a crossover occurring between them is 0.01 -- or if the probability of an odd number of crossovers occurring between them is 0.01 -- or if the average number of crossovers between them is 0.01 -- I can't determine which. From what I can gather, molecular biologists seem to think of centimorgans as additive, which seems to require the third definition. (It looks like sometimes they use the other definitions and use something called a mapping function to correct for this, but I'm not entirely sure I'm reading this correctly.)

Now, a first guess would be that crossovers occur basically at random over the entire chromosome, and are a Poisson process. For the sake of simplicity assume that crossovers form a Poisson process with rate 1 -- that is, in a piece of the chromosome of length λ, the number of crossovers is a Poisson distribution with mean λ, and non-overlapping pieces have independent numbers of crossovers. What is the probability of an odd number of crossovers occuring in a segment of length λ? Let X be a Poisson(λ) random variable; then it's f(λ) P(X = 1) + P(X = 3) + P(X = 5) + ... The logical question to ask is: is this an increasing function of λ? That is, as we consider points further and further apart on the chromosome, does the linkage between them actually become less strong? You could imagine that the function might not be increasing. For example, say that after one crossover, the next crossover always occurred between 9 and 11 space-units down the line. Then two genes between 11 and 18 units apart would always end up on opposite chromosomes, and two genes between 22 and 27 units apart would always end up on the same chromosome, and in general you'd have some sort of oscillatory behavior.

Under the Poisson assumption, though, the answer is yes. In fact, we have
P(X = 1) + P(X = 3) + P(X = 5) + ...
= λ e + (λ3 e)/3! + (λ5 e)/5! + ...
= e (λ + λ3/3! + λ5/5! + ...)
= e sinh λ
= (1 - e-2λ)/2
which is known as Haldane's mapping function. It's hard to find a clear derivation of this online, because most of what's available online is course notes that are intended for people who will be using this in their work and don't particularly need to know the derivation.

What this tells us is that two genes which are separated by λ "units of space" will recombine with frequency (1 - e-2λ)/2. Note that if λ is small, this is only very slightly smaller than λ, since cases when there is more than one crossover in the space between the genes are vanishingly rare. But it also tells us that if two genes A and B recombine with frequency p, then they are not p of these "natural units" apart, but rather they are a distance λ apart with (1-e-2λ)/2 = p, so λ = -log(1-2p)/2. So, for example, if two genes A and B recombine with frequency .20, the average number of recombinations between them is not .20, but -log(.6)/2 = .255. And if another two genes B and C recombine with that frequency, and they are arranged on the chromosome in the order A, B, C, then the distance between them is -log(.6), and the recombination frequency is (1-elog .6)/2 = .32, not .40. In general, if A and B recombine with frequency p, and B and C recombine with frequency q, then A and C recombine with frequency p+q-2pq. This can be derived from the Haldane mapping function, but the following argument is nicer. In order for A and C to recombine, exactly one of the pairs (A, B) and (B, C) must recombine. With probability p(1-q), A and B recombine while B and C don't; with probability q(1-p) the reverse happens. Again, this formula seems to recur without justification in notes that I can find online.

If you know anything about special relativity, this sort of reminds me of how rapidities add (while velocities don't). The rapidity of a particle with velocity v is is tanh-1 v/c, which is approximately v/c (or v, if you like natural units); relative rapidities are additive, while relative velocities are only approximately additive, and then only for small velocities. Something similar is going on in the genetic situation, where the usual measure of "distance" is only additive for things that are close together and a correction has to be used when they get far apart.

And how would this be different if chromosomes came in, say, triplets instead of pairs? Maybe I should be a mad scientist in my next life. Then I could find out. (Or I could just do the calculation now, but I've got better things to do.)

(I suspect there are places here where I'm not using correct biological terminology; here I follow in the footsteps of Feynman, who when he learned about zoology once went to a library and asked them for a "map of the cat".)

"exponential" decay of iPhone prices?

In yesterday's New York Times: IPhone Owners Crying Foul Over Price Cut, by Katie Hafner and Brad Stone.

As you may know, the iPhone originally cost $599 for the 8-gigabyte model and $499 for the 4-gigabyte model. The 4GB model has been discontinued, and the 8GB model has been cut to $399.

People who bought the iPhone when it came out are complaining. Why? Because they are silly. Prices on consumer electronics almost always go down. Anyone who's bought two computers in their lifetime knows this, because the second one was almost certainly cheaper and had more computing power. (I hesitate to say "faster" because newer software uses computing resources less efficiently than older software.)

The article comments on this:
The price of consumer electronics is always going down thanks to intense competition and the steady decrease of the cost of electronic parts. The pricing is largely determined by Moore’s Law, the observation made by Intel’s co-founder Gordon Moore that the number of transistors on a silicon chip doubles roughly every 18 months. Because this rate of change is described by an exponential curve, it dictates not only that prices fall, but also that they fall at an increasing rate.


No.

Okay, I buy Moore's Law, although perhaps not with the "18 months" coefficient there; depending on who you ask you'll hear anything from 12 to 24 months. And does Moore's law apply to flash memory, which is clearly one of the larger costs involved in the iPhone? Still, let's give them Moore's law. That doesn't mean prices fall at an increasing rate. Let's say that everything involved in the iPhone is subject to Moore's law, so the price decays exponentially; then the price falls at a decreasing rate. For example, let's say that iPhone prices fall by one-third per quarter (as they have so far, although I don't see that holding up) -- then they fall from $600 to $400 in the first three months, then $400 to $267 in the next three, $267 to $178 in the next three, and so on. Each of these decreases is smaller than the one before. If you take the logarithm of the prices and then differentiate -- which has the effect of measuring the rate of price change in units of the current price -- you get a constant. It's not accelerating. More generally, the price of the object is given by P(t) = P0e-kt for some constants P and k; P'(t) is a decreasing function, and d/dt log P(t) is the constant -k.

But some parts of the iPhone are not subject to Moore's Law. The advertising budget, for instance -- there are a lot of ads. I doubt the case that it comes in is made up of transistors, either. The pay of the people in [third-world country] who put it together probably isn't decreasing.

So a better model for the price of the iPhone -- or any such object -- is probably a constant plus a decaying exponential. (The "constant" is measured in real, i. e. inflation-adjusted dollars; if you wanted to do this in nominal dollars it would be a rising exponential.) So we have

P(t) = P1 + P0e-kt

and P'(t) is still decreasing. In this case, even d/dt log P(t) is decreasing:

{d \over dt} \log P(t) = {{-P_0 k e^{-kt}} \over P_1 + P_0 e^{-kt}}

{d \over dt} \log P(t) = {{-P_0 k e^{-kt}} \over P_1 + P_0 e^{-kt}}
which isn't obviously decreasing; we could differentiate again, but it's easier to rewrite it as

{d \over dt} \log P(t) = {{-P_0 k} \over P_1 e^{kt} + P_0}

which has a constant numerator and a clearly increasing denominator; thus the rate of price decrease is actually slowing.

At least they used the word "exponential" correctly; usually the media doesn't even do that.

07 September 2007

More men at the top, and at the bottom.

As has been documented by a lot of people, it seems that a lot of psychological traits have the following properties:

  • men and women have approximately the same average for this trait, and

  • both genders have an approximately normal distribution for this trait, but

  • the distribution of men's values for this trait has a larger standard deviation than the women's values


This has the effect that men are overrepresented at both extremes. The canonical example is skill in science or mathematics; it's been claimed that women and men are on average equally good at mathematics, but most of the best mathematicians are male. This isn't a contradiction, because most of the worst mathematicians are male but we don't notice it. (It actually doesn't matter that the averages are the same; even if men were on average worse than women at math, if they had a larger standard deviation then they'd predominate at the higher levels.)

The article Is There Anything Good About Men?, which was an invited address by Roy Baumeister at the American Psychological Association, addresses this. This is thought to arise from the fact that men can have more offspring than women.

Let's say that the X-ability of women is normally distributed with mean 0 and standard deviation 1, and the X-ability of men is normally distributed with mean 0 and standard deviation σ. Then the probability density function for the mathematical skill of women is

f(z) = {1 \over \sqrt{2\pi}} \exp \left( {-z^2 \over 2} \right)

and that for men is

g(z) = {1 \over \sigma\sqrt{2\pi}} \exp \left( {-z^2 \over 2\sigma^2} \right)

If we look at the ratio f(z)/g(z), this is the ratio of women to men at skill level z. It's

{f(z) \over g(z)} = \sigma \exp \left( {z^2 \left( \sigma^{-2} - 1 \right) \over 2} \right)

and this equals 1 when

z = \pm {\sigma \sqrt{2 \log \sigma \over \sigma^2-1}}.

When z is closer to zero than this, women will predominate; when z is larger, men will predominate. It turns out that

{\sigma \sqrt{2 \log \sigma \over \sigma^2-1}} = 1 + {\sigma-1 \over 2} + O((\sigma-1)^2)

and since σ probably isn't much larger than 1, men will predominate at more than about one standard deviation from the mean and women at less than one standard deviation from the mean. Furthermore, we have f(0)/g(0) = σ; again, since σ isn't that much greater than 1, the predominance of women over men at the center of the overall distribution is difficult to see.

Yet if σ = 1.1 -- meaning that men's skill have a standard deviation 1.1 times that of women -- then g(3)/f(3) = 1.99, so men will be twice as common as women at z=3 (which corresponds to 3 standard deviations above the mean for women, and 2.73 for men). (The same is true at three standard deviations below the mean.) At z=4, men are overrepresented by a factor of 3.6, and at z=5, by a factor of eight.

Another thing that occurred to me is the economic ramifications of this difference. It's well-known that there are more obscenely rich men than obscenely rich women. It seems to me that economic ability -- i. e. the ability to earn money -- could be proportional to, say, the exponential of (some constant times general intelligence); so for every ten IQ points you gain, your income goes up by 30%. (I made up these numbers.) This would mean that economic ability is lognormally distributed (the name is a bit counterintuitive, if you don't know it, but it means that the logarithm of economic ability is normally distributed). But the mean of a lognormally distributed variable is eμ+σ2/2, where μ and σ are the mean and standard deviation of the variable's logarithm. So if intelligence is normally distributed in both canonical genders, but men are more spread out than women in intelligence, then the mean of men's earning potential will be greater than that of women. I'm not saying that earning potential is directly tied to general intelligence (I know plenty of people who are smart but not rich) but it wouldn't surprise me to learn that earning potential is lognormally distributed and that something like what I've outlined here is at work.

How to build mountains

Mark Chu-Carroll of Good Math, Bad Math writes about how images of simulated mountains are constructed using a fractal process.

I'm kind of surprised to see how simple it is; basically you take a triangle and "pull up" a random point in the interior to get an irregular pyramid, and repeat this procedure on each of the faces.

One of the commenters there, mj, writes: "In a way it's not surprising that complex structure comes out of very simple fractal rules. The structures in reality, the real mountains, are also formed by relatively simple processes. A bit of wind and rain and erosion..." That's a good point. What's interesting is that there's no obvious connection between the rules used to generate the fractal and the real rules, but they generate the same sort of structure on a global scale. On a large enough scale, the low-level structure is simply hidden. Something similar happens with, say, random walks; a cloud of random walkers allowed to dissipate will eventually approach a Gaussian distribution regardless of the underlying lattice. An observer who could only observe on a large scale couldn't tell what the underlying lattice is. There are much deeper ideas of this sort; for example, we don't know whether the universe is actually continuous or discrete.

06 September 2007

The Frogger crossword

Check out the New York Sun's crossword for today, September 6, 2007.

The theme of this puzzle is "FROGGER", as we're told by the entry in the center; the crossword was constructed so that you can get from a square at the bottom to a square at the top going via only squares that contain letters in the word FROGGER. A picture of the solved crossword with just the FROGGER letters is at the left.

As it turned out, I'd been absent-mindedly flipping through Percolation by Geoffrey Grimmett right before taking a break to do this crossword. One of the standard results in the theory of percolationis that if we start with an infinite grid and fill in some proportion p of the squares at random, then with probability 1 there will be an infinitely large filled cluster containing the origin when p > 1/2, and if p < 1/2 there's an infinite filled cluster containing the origin with probability 0. (Usually one talks about open and closed bonds -- the lines connecting the sites -- instead of open and closed sites, but since the square lattice is self-dual that doesn't matter.)

So it's noteworthy that this is possible here; it wouldn't be possible in a random crossword. We wouldn't be able to get "far away" from a given site using only letters that are taken from some small set. The transition isn't going to be quite so sharp in the non-infinite cases, and the standard 15-by-15 crossword isn't all that close to infinity. Furthermore, finding a long path is even less likely in a crossword than in an ordinary square lattice, because in a crossword there are some black squares.

Furthermore, percolation theory usually assumes that sites are independent; this isn't true in a crossword, because the letters in sites adjacent to each other are correlated. For example, if a given square contains a T, the letter to the right of it or below it is probably more likely to be an H than otherwise, because the pair of letters "TH" is quite common. Any serious attempt to think about percolation in crosswords -- although I can't imagine anyone would study that seriously -- would have to take this into account.

A brief explanation, What is... Percolation by Harry Kesten, was published in May 2006 in the notices of the AMS.

05 September 2007

How much cash should you carry?

An interesting question from Marginal Revolution: how much cash should you carry?

Greg Mankiw has also commented on this; apparently there's a standard model called the "Baumol-Tobin model" that answers this question.

There are two factors in play here that we need to think about. Any cash you're carrying can't earn interest, so you forgo interest by carrying more cash. But it takes time to go to the ATM, and your time has some value; you use more time in getting money if you habitually carry less cash. (If you also have to pay ATM fees, they can be folded into this.)

I suspect that most people, in most situations, manage their cash in the following simple way -- when the amount of cash they have is less than c dollars, they go to the ATM and withdraw C dollars. (For most people, c is probably on the order of the amount of cash they spend in the average day, because they pass by a convenient ATM once a day or so.) In my case, c = 20 and C = 60; that is, when I have less than $20 I stop at an ATM and withdraw $60. The times I don't do this are when I know that I'll have large cash expenditures; for example, last time I moved I withdrew a few hundred dollars in cash because I knew my movers only took cash. The average amount of money I have in my wallet is something like C/2 + c, since it is about equally likely to be anywhere between c and c+C; I'm going to make the simplifying assumption that c is much smaller than C, so we'll call this C/2.

So how much interest do I forgo, annually, by carrying this amount of cash? That's easy; it's Cr/2, where r is the interest rate. (My ATM card is linked to a checking account which gets essentially zero interest.)

Now, let's say each trip to the ATM costs me an amount a, measured in dollars. This is a combination of the time it takes me to get to the ATM, valued at whatever amount I value my time at, and any ATM fees I might pay. If the amount I spend annually in cash is s, then I'll have to make s/C trips to the ATM during the year. The quantity we want to minimize is the sum of the amount of interest I forgo by carrying cash and the time value of my trips to the ATM, which is

f(C) = as/C + Cr/2

To minimize this, we take its derivative; f'(C) = -as/C2 + r/2. Setting f'(C) = 0 and solving for C, we see that each time I go to the ATM I should withdraw (2as/r)1/2. Note that the dimensions work -- a has units of dollars, as does s, and r is a pure number, so 2as/r has units of square dollars.

Off the top of my head, I have s equal to about $4,000 (I withdraw $60 slightly more than once a week), a about $1 (I'm just guessing here, but the "good" ATM is a bit out of my way), and r about 0.12% (it's a checking account -- those don't have interest any more). Plugging in, I get C = $2,581; I should be withdrawing that amount of money every eight months or so. Even if I take r = 4%, which I could get from a high-yield savings account, I get C = $447, which would correspond to a withdrawal in that amount about every six weeks.

Why don't I withdraw that much money? I think it's because I wouldn't feel comfortable carrying it in my wallet; there are psychological factors that this model doesn't take into account, as people point out. Also, I suspect that if I were carrying a wad of cash like that I'd feel richer, which would lead me to spend more money, which would make me actually poorer. Plus, carrying large amounts of cash probably increases my probability of getting robbed...

Yet more fun with phase lag

It's still warm here in Philadelphia. Highs are in the eighties this week, which is slightly above average for early September.

Walking home from campus yesterday, a little before 7 PM, the sun was low in the sky. I found myself digging for my sunglasses. And I remembered the circumstances under which I bought the sunglasses -- in late March and early April I was having the same problem. (I live west of campus, so I could actually have this problem twice a day, as I walked east in the mornings and west in the evenings.)

But wasn't it cold then, I thought?

But the position of the sun doesn't depend on the cold. The position on the horizon at which the sun rises and sets depends only on the distance of the current day from the summer solstice. Yesterday was the 75th day after the summer solstice (I'm using June 21 here); the 75th day before the solstice was April 7. The high temperature in Philadelphia that day was 41 degrees. (The average high then is 59.) (The declination, which is the celestial analogue of latitude, varies essentially sinuisoidally, but that doesn't mean the position of sunrise varies sinusoidally because there's another circle to deal with.)

In short: in terms of insolation right now is like early April, but in terms of temperature it's like early June. Maybe this sort of asymmetry explains why spring and fall feel different, since we're sensitive to both temperature and amount of light.

Of course, it's hard to know for sure, because you also have to take into account the derivatives of insolation and temperature; in the spring they're both increasing, and in the fall they're both decreasing. (A Jewish friend of mine once said that he doesn't think it's coincidence that Yom Kippur, when Jews are obliged to fast from sundown to sundown, falls near the autumn equinox; that makes the fast a few minutes less than it would be if it fell near the spring equinox.) And there's a big psychological factor, as well. Right now, an hour and ten minutes before my first class of the year, I feel exhilirated intellectually; in, say, May I'll feel exhausted. There's not much of a way to control for the cycles we humans have imposed on the year.

But what would it be like to live in a world where the season with the most sunlight was the cold season? I suspect we'll never know, because the only way to create that sort of world is to isolate people from the natural world for a very long time, and if you're willing to pay people to live in your cave for a while you can probably think of more worthwhile things to find out by using them as test subjects.

04 September 2007

A graphical illustration of different types of attraction

Go look at http://thismight.be/offensive/uploads/2007/09/04/image/attraction.gif -- a chart of different types of attraction, from thismight.be/offensive. (The site doesn't allow external referrers; when I posted the link I thought it just didn't allow inline links to images.)

The chart has an x-axis "physical attraction" and a y-axis "mental attraction", and then the square is divided into regions corresponding to different types of relationships: Zone of Pain, Just Friends, One Night Stands, F Buddy, Awkwardness, Dating Potential, Dating Zone, Marriage Potential, and Null. ("Null" is in the extreme upper right, corresponding to a very high level of both physical and mental attraction; I take it that the creator of this image finds this impossible.

What I immediately noticed is that the chart is basically symmetric across the diagonal, which implies some sort of symmetry between mental attraction and physical attraction. I don't think I believe this, but what sort of hidden symmetry is there in human interaction in general?

Variations on the drunken-bird theorem, and real-world sightings

God Plays Dice exists in the real world!

My department traditionally has dinner for all the graduate students on the day before classes start. (This is preceded by the chair of the graduate program giving the annual ritual reminder of how the program is structured; I was basically told I should take some classes, teach my class, and do some research.) Anyway, I was waiting for dinner, and one of the new grad students got to talking, and I introduced myself, and he said "oh, are you the one who writes God Plays Dice?" Since you're reading this, you know I am. Someone said that he knew of two math blogs -- mine and Terence Tao's. Obviously this isn't saying that I'm as good a mathematician as him, but it's good company.

Anyway, we got to talking about how areas like combinatorics and probability are often perceived as "easier" than, say, algebra or analysis because the problems can be stated in terms that someone without much training can understand, even if the proofs aren't simple.

This led to me telling the joke about how "a drunk man always finds his way home, but a drunk bird can get lost forever". This is the colloquial statement of the following theorem about random walks. Consider a person who starts at the point (0, 0), and at each step of time takes a step to the north, east, south, or west with equal probability -- those are steps of (0, 1), (1, 0), (0, -1), or (-1, 0), respectively.1 In order to be back at the origin after some number of steps, 2n, there must have been an equal number of up and down steps (say m of each) and an equal number of left and right steps (say n of each). Thus the number of ways that this can occur is

\sum_{m=0}^n {(2n)! \over m! m! (n-m)! (n-m)!}

which can be rewritten as

{2n \choose n} \sum_{m=0}^n {n \choose m} {n \choose n-m}

by rearranging the factorials. Now, to do this sum, think about choosing n things out of 2n. To do this, we can choose m things from the first n, and then another n-m things from the remaining n; this gives the well-known identity

{2n \choose n} = \sum_{m=0} {n \choose m} {n \choose n-m}

and thus the number of ways that we can be back at the origin after 2n steps is in fact {2n \choose n}^2. This is approximately 42n/(π n) as can be seen by Stirling's approximation. Each path occurs with probability 4-2n, and so the probability of being back at the origin after time 2n is proportional to 1/n. Since Σn 1/n diverges, the expected number of returns to the origin is infinite; this means (in this particular case; there's a lemma I'm not writing out) that the probability of returning to the origin is 1.
(This is basically following the exposition in Section 3.2 of Probability: Theory and Examples, third edition, by Rick Durrett.)

In three dimensions, though, we get something proportional to n-3/2 there, and Σn 1/n3/2 converges, so the expected number of returns to the origin is finite; this implies (and I'm sweeping some details under the rug) that there's a chance of the random walk getting away forever. The two-dimensional case corresponds to the drunk man; the three-dimensional case corresponds to the drunk bird.

So where's the line between "always finds his way home" and "gets lost forever"? It seems to me that this should be at 2+ε dimensions, since that's what we need to make the sum convergent, but I don't know if anyone's studied random walks on fractal spaces. But this ε has to be fairly large; we came up with two examples, one on what one might call "2+ε" dimensions and one on "3-ε" dimensions. Both of these are transient, which is basically the technical term corresponding to "gets lost forever". A "recurrent value" of a random walk on Rd is one for which the random walk almost surely gets arbitrarily close to that point infinitely often; a random walk is called "recurrent" if it has at least one recurrent value (and then, in fact, all possible values are recurrent), otherwise transient.

The (2+ε)-dimensional example is a random walk on Z2 × [-n, n] for some small integer n; essentially, this is a random walk on a slab. (I'm not particularly worried about how we handle the boundaries; any reasonable way will do.) This random walk ought to be recurrent; the random walker (let's call it a "fish", swimming around in an ocean which is of finite depth), starting at (0, 0, 0), will return to points of the form (0, 0, k) infinitely often. At each of those times, the fish has third coordinate zero with some probability on the order of 1/2n; 1/2n of infinity is still infinity.

The (3-ε)-dimensional example is a random walk on Z2 × [0, &infty;), which actually corresponds better to the "drunk bird" because we can take the xy-plane to be the surface of the earth. One can invoke a sort of "reflection principle" here; this can be viewed as a random walk on Z3 where we've identified points which are mirror images across the xy-plane with each other. This lets us conclude that the random walk is transient. However, the reflection principle handles the boundary condition "wrong"; the natural thing to do when at a point (x, y, 0) is to allow steps to (x ± 1, y, 0), (x, y ± 1, 0), and (x, y, 1) with equal probability. So we have a probability 1/5 of leaving the xy-plane, while in the reflection-principle version that probability is 1/3. This gives the plane a certain "stickiness", and I assume this means that the random walk has a larger expected number of returns to the origin, but that number should still be finite.

1. In combinatorics, there is the convention of referring to, say, the "northeast" or "southwest" part of a drawing, which corresponds to how those directions look on a map. Mathematicians who aren't familiar with this convention find it strange, but for some reason "northeast" seems to flow a lot better than "upper right".

03 September 2007

Which million-dollar problem will fall first?

At Inkling Markets, you can bet on which of the seven Millennium Problems will be solved first.

The choices include the option "none", which is defined as "All problems will resist a solution indefinitely."; I'm not sure who would bet on that, because "indefinitely" means no one will ever be able to claim the prize.

I wonder about the efficiency of prediction markets when most of the people involved don't know what they're talking about. People talk about "the wisdom of crowds". But do the crowds need to be at least somewhat informed? And are the people who will take part in this particular prediction market informed enough as to not be totally worthless? (I suspect, for example, that people will vote for the problem they've heard of; the story of Perelman's proof of the Poincare conjecture was pretty widely distributed in the media about a year ago. It was overshadowed by the Pluto-isn't-a-planet story which broke around the same time, though.)

Incidentally, I'd bet on Poincare, for the obvious reason that Perelman came up with a proof! The CMI hasn't announced that they're giving him the prize yet, because there's a certain waiting period involved in order to make sure that the proof becomes widely accepted, but clearly that's the one furthest along the pipeline.

So in this particular case, the fact that people have heard of the Poincare conjecture is probably because of its connection to these prizes and the proposed proof.

A more interesting prediction market would be for the second Millennium Prize to be claimed.

fostering mathematical conversation

Expect the frequency of posting here to drop off somewhat after tomorrow.

This is because Wednesday is the first day of classes here. I originally began this blog in part because I didn't have much else to do with my time during the summer; now I will have classes and teaching and (at least in theory) research that I'm working on.

I will be taking courses in algebraic topology (homotopy theory, etc.; I'm not particularly looking forward to this one, but it's required for my program); logic and computability (using Enderton's text, A Mathematical Introduction to Logic); and probability inequalities and machine learning. (This last one is probably the one that will inspire the most posts over the course of the semester, if I had to guess.) I'm also TAing a multivariate calculus course.

The reason I mention the text for the logic course is that Enderton has produced an Author's commentary for his text, about which he says:

The purpose of these comments is to explain, section by section, what I am trying to do in the book. My hope is that the commentary will add a helpful perspective for the reader.

In addition being a place for helpful material, this website gives me the chance to post additional material. Users of the book might have varying views on this role of the website.

I welcome suggestions for how the commentary can be made more useful.

This is the sort of thing that I was talking about, to some extent, in this post about "companions to books". I gave there as an example of a companion to a textbook Bergman's A Companion To Lang's Algebra, wich were written by a professor to supplement the textbook for the course he was teaching; this is different in that it's written by the author.

From what I can gather, Enderton's book is one of the standard textbooks for an introductory logic course; presumably the many people who have taught courses based on this book have had similar sets of thoughts (although perhaps less extensive, as teaching a course takes less time than writing a book) and it would be nice to see all of those in some central place.

I said in my previous post that with this sort of companion I did not mean something like Wikipedia, because the original text would be inviolable. But in one sense I do mean something like Wikipedia -- no particular contributor's contribution would be all that valuable but they'd add up to something. The question, though, is who would use that something. The student wouldn't read it, because students are by nature lazy. Would professors teaching the courses care enough to read such a work? And more importantly, would they contribute to it?

I'd also like to point out something that cwitty mentioned in a comment here, namely the apparently dead project called "Fermat's Last Margin". Shae Erisson wrote in 2004:

The idea behind Fermat's Last Margin is...
I write a lot margin notes in books that I own, and research papers that I print out. To me, a research paper is an unfinished discussion. I like to argue, exclaim, deride, doodle, and generally get completely comfortable with academic publications that I read.
There is a downside. Andrew Bromage once made some questioning comments about a Comonads paper on the #haskell irc channel. Six months later, someone else made the same questioning comments. I realized that we can't share our margin notes! Who knows what brilliant ideas we've missed because we had one half, and someone else had the other half? Fermat's Last Margin is my answer.
The names comes from the story of Fermat's Last Theorem. The story is that Fermat wrote in the margin of a book that he had thought of a novel proof for this theorem, but the margin was too small, the proof would not fit. If Fermat had this margin I am making, it would be the last margin he would ever need.

Unfortunately that doesn't seem to have quite gotten off the ground. The point, though, is that there don't seem to be channels for communicating results on a level smaller than the research paper or the conference presentation, which causes unnecessary duplication of effort. Imagine if you couldn't share a body of mathematical ideas until you had enough of them to fill up a semester-long course or a book; chances are math never would have gotten off the ground if that were the case. Sure, there are ways for people to share ideas that aren't enough for a paper or a talk (for example, just talking to each other), but I don't think more of them would hurt, and I think the mathematical community ought to explore how to foster those sorts of interaction using the Web 2.0 paradigm. (I was trying to avoid using the words "Web 2.0" but they're a convenient shorthand for what I want to say.)

data mining, or, Shakespeare was a combinatorialist

A rant on the virtues of data mining (from Statistical Modeling, Causal Inference, and Social Science) and "Data Mining" = voodoo science (from The Geomblog).

Aleks at Statistical Modeling, etc. says that he does data mining, and that social scientists are not happy to see this and say things like "that's not science". (Incidentally, I tend to be skeptical of any discipline that has "science" in it's name. If you have to tell everyone you're a science, perhaps you're trying to hide something. It's like restaurants that serve "authentic Chinese cuisine".)

Now, I don't know a huge amount about data mining techniques. And it seems to me that it would be irresponsible to do the following:

  • Take a data set from which one could generate a large number different hypotheses.

  • Start picking hypotheses at random.

  • When you find a hypothesis that is true at the 95% confidence level, say "Eureka!" and publish.


This is silly because at the 95% confidence level, one expects that you'd get a hit one time out of twenty purely by chance. I sincerely hope that if the people doing this stuff do things analogous to what I just said, they use some much higher confidence level.

Data miners also seem to draw a distinction between exploratory and confirmatory data mining. They might use a technique like the one I just said (although more sophisticated) to find hypotheses that are worth looking at and studying in more detail. This, of course, is something we all do every day -- we look around and when things are interesting we look at them more closely. We cannot look at everything very closely because then our brains would explode.

The trick here, I suppose, is knowing how to distinguish signal from noise. My favorite example of the moment is Example I.11 of Flajolet and Sedgewick's Analytic Combinatorics (link goes to 12 MB PDF of the book). They tell us that the pattern "combinatorics" is hidden in the text of Hamlet, which begins as follows:

Who's there?

Nay, answer me: stand, and unfold yourself.

Long live the king!

Bernardo?

He.

You come most carefully upon your hour.

'Tis now struck twelve; get thee to bed, Francisco.

For this relief much thanks: 'tis bitter cold,
And I am sick at heart.

Have you had quiet guard?

Not a mouse stirring.

Well, good night.
If you do meet Horatio and Marcellus,

At this point we look at the bolded letters, jump up and down and say that Shakespeare was trying to tell us something, despite the fact that Shakespeare had never heard the word "combinatorics". But Shakespeare is also a Yankees fan (see the italicized letters) and went to Harvard (see the underlined letters), so we ought to be suspicious. It turns out that Hamlet contains 1.63 x 1039 instances of the word "combinatorics", whereas a random sequence of letters chosen uniformly at random from the English alphabet and of the same length as Hamlet contains on average 6.96 x 1037 such sequences, and a random sequence of letters chosen at random with the same distribution as normal English text contains on average 1.71 x 1039 such sequences. So all we can conclude, it seems, is that maybe Shakespeare was writing in the same language that the word "combinatorics" is taken from. (One could try to compute the standard deviation associated with that 1.71 x 1039 figure, but it's silly because English text is not created by reaching into a Scrabble bag over and over again.) Of course there are less frivolous examples -- one of them is looking at whether certain patterns that appear in the human genome occur more or less often than you'd expect by chance.

02 September 2007

Another shot at the Doomsday argument

Robin Hanson critiques the Doomsday Argument. This is an argument on the lifespan of the human species, which begins from the following principle: there is nothing special about present-day humans. "Therefore" we can consider the number of humans who have lived so far as a fraction of the number of humans who will ever live; the probability that this is between p and q is q-p. I put "therefore" in quotes because the implication is tempting, but one could equally well conclude that the amount of time there have been humans, as a fraction of the amount of time there will ever be humans, has this same distribution. (Indeed, I've heard both versions of the argument.) The first version of this argument says, for example, that the probability that there will be sixty billion more humans is at least one-half; the second says that the probability that we as a species will survive for another two hundred thousand years or so is at least one-half. (I'm assuming there have been sixty billion people who've ever lived and that our species is 200,000 years old.)

And indeed there are other classes of beings that you can use as the reference class here. Living things, for example. Or vertebrates, or living cells, or humans, or even such classes as "humans who haved lived after the year X", which get kind of ridiculous. That last one is particularly prone to abuse, as we can simultaneously say that humanity has a fifty percent chance of surviving past 2114 (if we take X = 1900) and past 3014 (if we take X = 1000).

The name "Doomsday argument" is rather misleading, too. "Doomsday" is usually seen as a bad thing. But what comes after humanity might be the "posthumans" that the people who believe in a technological singularity talk about; is that really doom? Hanson gives a quantitative version of this where there are several "toy universes".

I've talked before about how I'm not entirely comfortable with the "Copernican principle" from which this is derived. For some reason I am much more uncomfortable with this than I would be with the equivalent line of reasoning applied to non-human objects. If I had an urn containing balls labeled from 1 to N, and I didn't know N, and I reached in and grabbed a ball marked 100, I'd say in a heartbeat that the urn probably contained around 200 balls. But the difference is that in the Doomsday Argument we don't even know what the urn is.

The Doomsday argument supposedly only is provisional, until such time as we have better knowledge on how long societies tend to last. This is in my mind one of the most useful reasons for trying to find extraterrestrial intelligence; the knowledge that they do exist (or even a thorough search which doesn't turn up anything) would give us substantial information about how long we might expect to last.

When I studied biochemistry I thought something similar. Essentially all known life forms on Earth have similar biochemistry, because we all evolved from the same ancestors. So an introductory biochemistry class essentially consists of the memorization of those mechanisms. What I would have wanted to see is, say, a dozen or so independently evolved biochemistries, and then see which features of our own biochemistry are just accidents of evolution and which are essential to having complex, self-replicating systems.

Are you as hot as you think you are?

Megan McArdle of The Atlantic says that You're not as hot as you think you are. (From Marginal Revolution.)

What they don't say is that you're probably hotter than you look in photos.

McArdle writes:
A cognitive scientist at the University of Chicago explained why to me last winter. When we look at ourselves in the mirror, in any given session we tend to anchor on the time slice image that makes us look our best. That, we decide, is the "real" us.

Photographs, however, are a random sample of the various arrangements of light, angle, and facial expression that we can be found in. The median photograph of you is probably the best approximation of your physical attractiveness. But that wars with your self image, which is anchored on other, better combinations.


I don't know if I'd call photographs a "random sample". I don't like being photographed, so I probably look less happy in photographs than I do on average, and happiness makes you look good.

But what's more, evolutionarily we're not used to looking at people at a single moment in time. At the time when our cognitive machinery evolved, there were no still photographs. (It's the reverse of the semi-early Internet, where there were still photographs but no video.) Even a person who you meet for a second will get more information about how you look than they would from a photograph. And perhaps photos look repulsive because they're not moving, and for a long time anybody who wasn't moving was dead.

McArdle also writes:
Which, by the way, is probably the best gauge of how attractive you are; how attractive are the hottest people who want to go out with you? They're probably only slightly more attractive than you are.

This is tempting, but basically impossible. I agree that it's true, among heterosexuals, and in percentile terms -- a man who's more attractive than 60% of men is probably most likely to be with a woman who's more attractive than 60% of women. (I'm assuming that there are an equal number of heterosexual men and heterosexual women.) But we don't think in these terms. Do you have any idea what these hypothetical 60th-percentile people look like? Furthermore, in looking at who people actually are in relationships with a lot more things than just their appearance come into play.

The only thing to do is the following: say you're a man. Then think "what if I were gay? Then who is the most attractive man who would be willing to go out with me?" That might work... but it tells you how hot you are as a gay man, and gay men have different standards for attractiveness than straight women. (Similarly for women; straight men and lesbians have different standards.) For the sake of simplicity I have assumed in this paragraph that there are exactly two genders and everyone belongs to exactly one of them.

Perhaps you could also go to hotornot.com (wow, I haven't looked at that site in years!) -- where people post photographs of themselves and other people rate them on a scale of 1 to 10 -- but we don't know how the scores are distributed there, and more importantly these are photographs.

If you buy the "best time-slice" theory, this also explains why you might find your friends more attractive than the average person. (I know I do.) I spend more time with my friends than with a random person I see on the street (if I didn't, they wouldn't be my friends) and I probably anchor on to their best time-slices. The times when they looked good are stuck in my brain even when they don't actually look good. If you figure that how good one looks at any given moment is normally distributed around their "true" attractiveness, then the "perceived attractiveness" -- the maximum attractiveness I've ever seen them at -- should grow without bound. After I've seen the person n times, the most attractive they've ever been is around the 100n/(n+1) percentile. This sounds ridiculous; the quantitative truth of how we determine someone's attractiveness is probably more like averaging the top √k or so of the k timeslices we have of a person. You still get the "attractiveness growing without bound" problem, but it takes a lot longer to come. (It's probably a normal distribution, because attractiveness is a sum of many different things, so I wave my hands and say the magic words "Central Limit Theorem", but that doesn't matter.) Also, your friends probably share the same cultural affinities that you do, meaning that what they do to make themselves seem attractive is probably similar to what you do to make yourself seem attractive. To avoid cognitive dissonance, then, you have to assume they're hot in order to be hot yourself.

01 September 2007

The "Burger King Hot Zone" is silly

On the Fox Saturday baseball telecasts (about which I have ranted before), every so often a display of the "Burger King Hot Zone" comes up. This is a grid which divides the strike zone into a three-by-three grid, and gives a "batting average" for each of these nine parts of the strike zone.

I assume -- although I'm not sure -- that these are calculated by considering the number of balls which the player put in play from that position, and dividing the number of those which were hits by that number. For what it's worth, nobody seems to know.

The thing is, this is one of those statistics where the sample size is ridiculously small. If you figure the average player puts, say, 450 balls into play a season, then on average fifty of these will be in each of the nine grid pieces. A player who hits .300 on average will get fifteen hits from each of these nine sectors, with a standard deviation of 3.24; that's a .300 mean batting average (of course) with a standard deviation of .065. So my conclusion is that although players probably do have tendencies to hit certain pitches better than certain others, this almost certainly gets lost in the noise.

Ted Williams once did something similar, in his book Science of Hitting where he divided the the strike zone into a seven-by-eleven grid (thus each grid site is roughly the size of a baseball) and assigned a number to each grid site; from what I can gather, the number corresponds to what he thought his batting average would be if he always got a pitch there. At the Hall of Fame (where I was last weekend) they have a rendition of this with actual baseballs painted various colors, which you can see at the link. (I would have taken a picture when I was there, but I didn't bring my camera! I never take my camera anywhere, because I'd rather experience life than take pictures of it. Fortunately the other people who had their cameras with them have remedied this problem.) I wonder how accurately his perception corresponds to reality; I suspect Williams has the order right (the right-down-the-middle area where he said he could hit .400 probably was his best) but the numbers might be wrong. The point still stands, though; don't swing at pitches you can't hit, and know which pitches those are.

While I'm on the subject, I googled for my post and found Mark Dominus' post on baseball team names and followup thereto, which are irrelevant but interesting anyway.

Beauty is in the eye of the beholder

Ben Goldacre, writing in Bad Science, tells us the story of how he was contacted by a PR firm which wanted him to concoct some equations that would "prove" that certain celebrities had sexier walks than certain other celebrities, as part of a promotion for Veet hair removal cream. I am not making this up. It's an extended version of his column in the Guardian.

The press release seems to have beeen reproduced by about a zillion British newspapers, usually in a shortened version; the longest version I could find is here, and I'll be quoting from it.
JESSICA Alba, the film actress, has the ultimate sexy strut, according to a team of UK mathematicians. Beauty is no longer in the eye of the beholder - it can now be worked out using a simple mathematical formula.

Bullshit! And this isn't just me being bitter that I might not fit some social standard of beauty. It's me being bitter that mathematics is being "used" in this way. In Goldacre's column, we learn that there was no team. Richard Weber at Cambridge is the mathematician who's mentioned there, but only after Goldacre was contacted -- and Goldacre as far as I can tell is a medical doctor. (By "there" I mean Goldacre's blog post; I can't find a version of the press release that mentions Weber.)
The academics found that it is the ratio between hips and waist that puts the sway into a woman's walk - and the nearer that ratio is to 0.7, the better.

No they didn't! This seems like the stuff you see out there every so often about the Golden Ratio making things more beautiful -- and in fact that ratio's been used to sell pants! -- where there's some "magic number". There's at least some justification for the 0.7 number, though, in that Real Scientists have done studies, although the preferred ratio varies by culture. And it never seems to be given to more than one decimal place, which suggests that a few inches don't matter. Apparently Weber made a more nuanced comment that got cut down to this.
This ratio provides the body with the right torso strength to produce a more angular swing and bounce to the hips during the walking motion.

Furthermore, the waist-to-hip ratio might actually be important for physical attractiveness, but nobody said that had anything to do with the walk. I don't know much about biomechanics, though. But it looks like the causality just isn't there.

Oh, and they screwed the survey up so badly that it doesn't even mention anything that hair removal cream could actually do. You'd expect a study by a hair removal cream company to say that having smooth, shiny, hairless [insert body part here] was an important part of beauty.

Fortunately, they're just using this to sell hair removal cream. Recent studies have also shown that studies which are funded by pharmaceutical companies are far more likely to say that drugs do something goodthan studies which are not funded by pharmaceutical companies; that sort of hijacking of the scientific apparatus is a lot more insidious, as people could actually die.

I'm not saying that beauty can't be encapsulated in some sort of formula. But it'll be a lot more complicated than this one. (In fact, it might not be a "formula" at all, in the sense that you put in numbers which describe the person and get out a numerical rating of their beauty; much more feasible would be a recommendation system like that on Netflix or Amazon. As I understand it, those systems work by recommending books or movies to you that people who have bought the same books or rented the same movies as you also liked.)

And I'm skeptical of any formula that says that the same people will seem beautiful to everyone, because that's simply not true. I think someone could come up for a formula that will tell them who I am likely to find beautiful -- there are definitely patterns. But my tastes are not the same as yours. And don't try to tell me they should be.

P. S. I moved into my apartment a year ago today. As I sit here, I look out my kitchen window and see a truck from the movers I used. This is probably not a coincidence, though, as September 1 is kind of a big moving day in my neighborhood.