God Plays Dice: 2009

31 December 2009

A hack I'm disturbingly proud of, and its connection to some real math

I'm applying for jobs. Many jobs, because that's how academic job searches work these days. So I have a spreadsheet (in OpenOffice) to keep track of them.

Among the things that I track for each job, there is a column with 0, 1, or 2 in it. 0 means that I haven't submitted anything; 1 means I've submitted something, but not everything that was asked for; 2 means the application is complete. Averaging these numbers and dividing by 2 tells me what proportion of the search is complete.

But I also wanted to know how many 0s, 1s, and 2s there were. And as far as I know the built-in functions in OpenOffice won't do that.

What they will do, however, is this. I have a column consisting of 0s, 1s, 2s, and empty cells. By doing

COUNT(J8:J1000)
SUM(J8:J1000)
SUMPRODUCT(J8:J1000;J8:J1000)

I get the number of cells in that column which are nonempty; their sum; and the sum of their squares. (The SUMPRODUCT function takes 2 arrays of the same shape and returns the sum of the products of corresponding cells.) "8" is the row that contains the first job on the list, and "1000" is just a number that is comfortably more than the number of jobs I am applying for. Call these a, b, and c respectively. Let n₀, n₁, and n₂ be the number of entries which are 0, 1, and 2 respectively. Then I have

a = n₀ + n₁ + n₂
b = n₁ + 2n₂
c = n₁ + 4n₂

which is a three-by-three linear system, and can be solved for n₀, n₁, n₂, giving

n₀ = a - 3b/2 + c/2, n₁ = 2b-c, n₂ = (c-b)/2

and so I can recover the number of applications with status 0, 1, or 2 from this. From the sums of the 0th, 1st, and 2nd powers I can recover the distribution of the values themselves. (The actual code is slightly different, but of course equivalent, because I solved the system "by inspection" and never actually explicitly wrote it out until just now.)

Believe it or not, I actually use this trick in a preprint, "The number of cycles of specified normalized length in permutations", to do some actual mathematics! There I find the expectation of X⁰, X¹, X², ..., X^k where X is a certain random variable known to take on the values 0, 1, ..., k, namely the number of cycles of length in the interval [γ n, δ n] in a permutation of [n] chosen uniformly at random where γ and δ are constants. k is the greatest integer less than or equal to 1/γ ; for example, if we're looking at cycles of length at least 0.15n in permutations of n, there can't be more than six of them. This gives a linear system like the one above which gives the probability that X takes on each value 0, 1, ..., k.

18 December 2009

Uniquely identifying people by birth date, gender, and zip code

Netflix Spilled Your Brokeback Mountain Secret, Lawsuit Claims. A woman is suing Netflix because she was in the closet, and her movie-rental data was part of the Netflix prize dataset. She claims this means that people could figure out her secret.

Now Netflix is starting a second contest, and rumor has it that the data will include the zip code, birthdate, and gender of each individual. According to this paper (abstract only, unfortunately, so I can't comment on methods) by Latanya Sweeney, this is enough to uniquely identify 87% of the US population. A paper by Phillippe Golle gives the figure as 63%, based on actual Census Bureau data. (The Census gives the number of people with each birth year and gender in each zip code.)

Is it surprising that people can be identified this easily?

From the Golle paper, there are 33,233 "Zip Code Tabulation Areas" in the United States.

US life expectancy is 77.7 years. Since this is a back-of-the-envelope calculation, let's assume that everybody drops dead after 77.7 years (28,379 days), and therefore that the age of a random individual is uniformly distributed over the last 28,000 days. (It pains me to say this, because my grandmother is 85 and still living.)

There are, to a first approximation, two genders.

Therefore there are 28,379 * 33,233 * 2, or about 1.9 billion, possible combinations of birthdate, zip code, and gender. There are about 300 million Americans. If we assume all of these are equally likely (which they're not; some ages are more likely than others, and some zip codes have more people than others), and that they're independent (which they're not, as anybody who's lived in a college town can tell you; Golle notes the college-town effect, and also a military-base effect), then on average the number of people having a given (birthdate, zip code, gender) triplet is about 0.16.

So we'll model the population of the US as 1.9 billion Poisson random variables, each of mean 0.16, and each corresponding to a birthdate-zip code-gender triplet. How many of these do we expect to have value 1 (meaning that that triplet picks out exactly one person)? The probability that a Poisson(0.16) random variable takes the value 1 is exp(-0.16)*(0.16). Thus we find that there are (1.9 billion)*(0.16)*exp(-0.16) people uniquely identified by this triplet, out of (2.5 billion)*(0.16) people.

According to this crude model, the probability that a random individual is uniquely identified by these three pieces of information, then, is exp(-0.16), or about 85%. Why is everybody so surprised?

08 December 2009

Distribution of Putnam scores

The distributions of Putnam exam scores are interesting. See, for example, the 2001 distribution. It takes a bit of number-crunching to get an actual distribution of scores from the data; they report the "rank" of the people getting each score. The rank corresponding to a given score is, I assume, A+(B+1)/2 where A is the number of people scoring higher than that score and B is the number of people scoring that particular score. For example, in 2001 -- which happens to be one of the years in which I took the Putnam -- the table begins

Score	101	100	86	80	79	77	73	72	71	70	69	68
Rank	1	2	3	4.5	6	7.5	9	11	14	16.5	19	23.5
Number	1	1	1	2	1	2	1	3	3	2	3	6

where the first two rows are provided by the organizers, and the third row can be worked out by working left to right. For example, once we know 17 people got 70 or better, the fact that the score 69 corresponds to rank 19 means that the people scoring 69 must have been the 18th, 19th, and 20th-best; so there were three of them. (Incidentally, most increasing sequences of half-integers, when interpreted as sequences of ranks, don't appear to correspond to legitimate score distributions; the number of people getting certain scores ends up negative if you're note careful.)

Anyway, if you crunch the numbers on a typical Putnam score distribution you observe two things:
- the scores follow, roughly, a power law; the number of people scoring 10n decays like some power of n, for integer n.
- once you remove this decay (which I haven't actually done; I've just eyeballed it), there are "spikes" at multiples of 10. For example, the number of people scoring 18, 19, 20, 21, 22, 23 in 2001 were 8, 23, 99, 60, 39, 11. Twenty-four people scored 50; seven scored each of 49 and 51.

I can't explain the first one (and it may just be an artifact of the way I'm doing the plotting; lots of things look close to linear when plotted on a logarithmic scale). But the second one is actually easy to explain; Putnam problems are worth ten points each, and most scores are 0 or 10 with a smattering of 1, 2, 8, or 9. Scores between 3 and 7 on a problem are exceedingly rare. So to get a score of, say, 55, one has to get five problems right and have made a bit of progress on three to five more, which is less likely than straight-out solving five or six problems (for 50 or 60, respectively).

Incidentally, I haven't looked at the problems from the 2009 Putnam, because I have work to do.

22 October 2009

Math Overflow

Because I'd have to forfeit my math blogger card otherwise: you should know about Math Overflow This is a site where people can ask mathematics questions -- the level is basically that of questions which would be of interest to professional mathematicians. I'm rather enjoying it; it's procrastination and education at the same time!

It is also useful for asking questions and being told that I already answered them on this blog.

08 October 2009

Evidence that mathematicians have a big Internet presence

If you google genealogy, the first hit is The mathematics genealogy project. In some sense, according to "the Internet", mathematical genealogy is more important than real genealogy! (I have a feeling that the biological parents of mathematicians would be offended by this, so I won't tell my parents.)

If you google AMS, the first hit is the American Mathematical Society. (Societies of meteorologists, musicologists, Montessori schools, etc. show up further down the list.) I have a musicologist friend that I joke with this about, claiming that the mathematical society is the real AMS.

I suspect this is because mathematicians found the Internet early, and seem to be more likely to have personal web pages than even most other academics.

Edit, 4:18 pm: In a comment by Boris, I'm reminded that Google personalizes search results, so what I've said is not true.

30 September 2009

Is zero even?

Did you know that there are actually things to say about whether zero is even or odd (from Wikipedia)? Obviously it is, but the math-ed folks have seriously looked at this.

I found this via a comment by John Thacker at The Volokh Conspiracy. There's a poll there; right now 2% of people have said 0 is odd, 51% even, 43% both, 4% neither. I can kind of understand what's going on with people saying "neither" (perhaps they're getting this from some elementary-school notions), but how is 0 odd?

My answer: yes, zero is even, because it's twice an integer.

(Or because the identity permutation on n letters is an element of the alternating group A_n -- I've been thinking about permutations a lot lately. But if you understand that, you probably are like me, think zero is even, and didn't even think there was anything to discuss.)

Incidentally, sometime recently -- I forget the context -- I saw something that referred to the Gaussian integer a+bi as "uneven" if and only if a and b had different parity.

27 September 2009

Economic impact of mathematics?

Tim Gowers wrote in The Importance of Mathematics:

If you were to work out what mathematical research has cost the world in the last 100 years, and then work out what the world has gained, in crude economic terms, then you would discover that the world has received an extraordinary return on a very small investment.

I don't doubt this. But has anyone actually tried to do this? (And would the numbers even be meaningful?)

23 September 2009

Eponyms in mathematics

Let S be the standard Smith class of normalized univalent Matcuzinski functions on the unit disc, and let B be the subclass of normalized Walquist functions. We establish a simple criterion for the non-Walquistness of a Matcuzinski function. With this technique it is easy to exhibit, using standard Hughes-Williams methods, a class of non-Walquist polynomials. This answers the Kopfschmerzhaus-type problem, posed by R. J. W. (“Wally”) Jones, concerning the smallest degree of a non-Walquist polynomial.

This fake abstract of a paper is from Merv Henwood and Ivan Rival, Eponymy in Mathematical Nomenclature: What's in a Name, and What Should Be? (PDF), from the Mathematical Intelligencer in 1980. It sounds to me like slight caricature -- but only slight. Henwood and Rival point out that such names are lazy. Names have at least two important functions -- to describe and to label -- and eponyms only label.

Perhaps such abstracts would be more common in areas which are small enough that all the major players talk to each other. I imagine that Smith, Matcuzinski, Walquist, etc. know each other.

Also of interest is David Rusin's list of eponyms occurring in the MSC classification. These names in general seem a bit less obscure than the names one would find in the abstract of a random paper, which isn't surprising as they're names of concepts big enough to get areas named after them.

(And can someone confirm or refute the story that Banach, in the paper in which he introduced Banach spaces, called them "spaces of type B" in an effort to get them named after himself? I've heard this one a few times but always unsourced.)

22 September 2009

Steen on mathematics and biology

Here's a fascinating article on what math is good for in biology: The "Gift" Of Mathematics in the Era of Biology, by Lynn Arthur Steen. Steen gives lots of examples about what math is good for in biology. Somewhat surprisingly to me, he doesn't really mention one of the first things that came to mind, namely the use of combinatorial techniques to study the genome, which is nothing but a word on a four-letter alphabet. It's possible that he subsumes this in "statistics", though; to take a simple example, one might want to know how many times a certain sequence of bases would appear in a "random" genome in order to determine whether the fact that such a pattern appears often is signal or noise. Still, he makes the point that while the traditional mathematics curriculum (with lots of calculus and differential equations) takes its scientific inspiration from physics, biology is ascending.

A shorter version of this article is available at The Chronicle of Higher Education.

(How did I find this? Steen was one of the authors of Counterexamples in Topology, which I mentioned yesterday, so I went over to his web site.)

21 September 2009

Perfection "squared" on standardized tests

I came across an article about a student who got a perfect score on both the ACT and the SAT. (These are the two standardized tests used for university admissions in the US; generally schools on the coasts use the SAT and schools in the interior of the country use the ACT, although this is a vast generalization. The geographical separation seems to be a function of where the tests originated, in Iowa and New Jersey respectively.

This article (which I'm not linking to because I found it by googling a student, and the student is probably already not happy that this is all over the Internet) points out that less than 1 percent of students get a perfect score on each of these tests. (As you'll see below, this is quite an understatement.) I think we're supposed to come to the conclusion that less than 1 in 10000 students would get a perfect score on both.

But of course scores on these tests are positively correlated! So the probability of getting a perfect score on both tests is much higher than the product of the probability of getting a perfect score on each. (I don't think knowing that would help you on the SAT. But it's been a while. In my day they were out of 1600.)

This article indicates that 294 of the high school seniors graduating in 2008 got a perfect score on the SAT, and 514 out of 1.4 million got a perfect score on the ACT. Wikipedia puts the number of SAT takers at 1.5 million per year; let's knock this down to 1 million since some people take the test more than once and we're talking about the total number of students. So the probability that a random student who takes both tests gets a perfect score on both is something like (294/1000000) (514/1400000), which is about one in 1.3 million. The number of students taking both tests is less than this (many people only take one of the two), so assuming independence there should be less than one student per year who gets a perfect score on both tests.

But a quick glance at the Google results will convince you that there are a few students per year who pull this off.

Counterexamples in X

Counterexamples in Probability And Statistics (Joseph P. Romano and A. F. Siegel) and Counterexamples in Probability and Real Analysis (Gary L. Wise and Eric B. Hall) both seem to be books in the tradition of Counterexamples in Analysis (Bernard Gelbaum and John Olmsted) and Counterexamples in Topology (Lynn Arthur Steen and J. Arthur Seebach. These are books that collect the examples just "outside" the boundaries of the various standard theorems, the point being to explain why one needs the seemingly strange collections of hypotheses that seem to begin every analytic theorem. (Hence the tags "education" and "teaching"; I've often seen these counterexample books described as "anti-textbooks", and as being complementary to standard textbooks which often spend most of their time telling you what's true.)

It seems that these books are concentrated on the analytic end of mathematics; I couldn't find, for example, books of counterexamples in algebra, combinatorics, or number theory. There is, however, Theorems and Counterexamples in Mathematics. My sense is that the nonexistence of these books is connected to the fact that those fields don't seem quite as rife with theorems where all the work is hidden in the definitions.

02 September 2009

The hidden mathematics of bathrooms

From the xkcd blog: urinal protocol vulnerability.

The basic premise here is the following: there's a long row of urinals (n of them), and a line of men who want to use them. The first man picks a urinal at the end. Each man after that picks one of the urinals which is the furthest from any of the occupied urinals. Nobody ever leaves. How many men have to show up before one of them will be forced into using a urinal adjacent to one that's already occupied? Call this number f(n).

You might think that f(n)/n approaches some limit, but it doesn't; it oscillates between 1/3 and 1/2 based on the fractional part of log₂ n. If n = 2^k + 1 then this "greedy" algorithm for filling the urinals works and every other urinal gets filled: f(2^k + 1) = 2^k-1 + 1. If n = 3 x 2^k-1 + 1 then the worst possible thing happens and only every third urinal gets filled, and f(3 x 2^k-1 + 1) = 2^k-1 + 1. (Yes, that's the same number, and the function's constant in between.) f(5) = f(6) = f(7) = 3, f(9) = ... = f(13) = 5, and so on.) Oscillations like this -- periodic in the logarithm of the problem size -- happen a lot in problems involving binary trees counted by the number of nodes. Still, it was a bit surprising to see this, because I'd never thought about the problem in the case of "unphysically" large n.

Exercise for the reader: invent a mathematically equivalent version of this problem that doesn't involve urinals.

22 August 2009

Two baseball games the same?

The Yankees and Red Sox have played each other two thousand and something times, as the good folks on Fox told us, so I started to wonder: which baseball teams have played each other the most times? If I had to guess it's Giants-Dodgers; Wikipedia agrees with me but gives no source. Apparently Cardinals-Cubs, Cardinals-Pirates, and Cubs-Pirates are all a close second. These are the ones you'd expect if you take into account when teams changed divisions, etc.

Anyway, trying to find this out I found a blog post entitled Have Two Baseball Games Ever Played Out Identically?. The answer is no, but "identically" is defined a bit too strictly; (say) a groundout to second and a groundout to shortstop are counted as different. And the metric that the author uses for similarity of two games A and B is, I think, the number of times where the nth plate appearance in games A and B had the same outcome. Intuitively I think you'd want to line up innings with each other. Two "most similar" games should at least have similar-looking line scores. I think what one wants is some notion of "edit distance" between games, and defining that is hardly trivial.

I've sort of poked at this before: in 2007 I asked what's the most common line score in connection with a promotion that MLB did for that year's all-star game.

There's a nice combinatorial/probabilistic question hiding here; I've seen results on, say, the probability that two randomly chosen permutations of [n] have the same cycle type, or the probability that two binary trees with n labelled nodes have the same shape. Baseball games are combinatorial structures, and I'm not just saying that to justify the fact that I'm probably going to spend six hours today watching baseball. (The Yankees and Red Sox are on TV now, the Phillies and Mets later.)

15 August 2009

A number-theoretic tangent for your Saturday afternoon

You may know that 1001 has a simple prime factorization, namely 1001 = (7)(11)(13).

And of course 1000 has a simple prime factorization, namely 1000 = (2³)(5³).

The natural question to ask at this point, to me, is "how often does this happen?" Of course one has to define "this". One question is "how often are there two consecutive numbers that have all their prime factors less than or equal to 13?" There seem to be finitely many, and after I compiled a list I googled a few numbers from it and discovered that David Rusin had also done so. In case you're wondering why I conjecture there are finitely many: the number of integers less than n with all prime factors at most 13 is proportional to (log n)⁶. (The exponent 6 comes from the fact that there are six primes which are 13 or less.) So the "probability" (in the usual heuristic number-theoretic sense) that a given number n is "13-smooth" is the derivative of this, and thus of the order of (log n)⁵/n. The "probability" that two consecutive numbers are "13-smooth" is on the order of the square of this "probability", i. e. (log n)¹⁰n^-2, and the integral of that from, say, 2 to infinity converges. Nothing is special about 13 here; there should be finitely many pairs of consecutive "p-smooth numbers" where p is any prime.

(Rusin's list, by the way, was something he compiled to illustrate how one might calculate logarithms; since 1000 ~ 1001, we can take common logs and get 3 ~ log(7) + log(11) + log(13). Given enough relations like this one can solve for the logarithms themselves.)

Alternatively, we can define the "roughness" of n as (log p)/(log n) where p is the largest prime factor of n. I call it "roughness", not "smoothness", because if it's smaller than the corresponding number is smoother, i. e. has comparatively small prime factors. This naturally compensates for the size of the number; if I recall correctly the proportion of numbers less than n with roughness less than r approaches some limit (a function of r) as n goes to infinity. (I only dabble in analytic number theory so I can't provide a source off the top of my head.)

It seems reasonable then that there should be a similar limit for pairs of consecutive numbers. Then empirically it seems that the probability that n and n+1 both have roughness at most (log 13)/(log 1001) is about 1 in 250. The pairs of numbers that are "at least as round" as (1000, 1001) in this sense are

(80, 81), (224, 225), (1000, 1001), (1715, 1716), (2079, 2080), (2400, 2401), ...

and it seems like about one in 230 pairs of consecutive numbers have this property (4291 in the first million). As is often the case, the interesting thing is that the limiting probability appears to exist and be neither zero nor one.

Postscript: I know some readers will be offended by my use of "probability" in this number-theoretic sense, because the prime factorization of a number is hardly a random object! But I'm a probabilist; this is how I think.

13 August 2009

Mathematicians in today's New York Times crossword

A clue from today's New York Times crossword puzzle: "Mathematician Post or Artin". (Four letters. If you don't know the answer, click on the links.)

The crossword blogs (here, here, here) think this was an unfair clue; this one says that "Neither [...] will be familiar to most solvers, or even to all mathematicians."

I got this with no problem. But it took me a moment, because the son of the Artin the clue was about was one of my professors as an undergrad. A few commenters here and there say they needed some of the crossing letters to decide which Artin the clue referred to.

04 August 2009

Student "entitlement" to grades

An interesting statistic from this article from the Madison (Wisconsin) Times: in fall 2008, the average grade in social work courses was 3.70 on a 4.0 scale, and the average grade in mathematics courses was 2.79. (The article doesn't indicate why these two departments were chosen, but I suspect they're at the extremes of the distribution.) This is a fact offered in an article about how students feel more "entitled" to high grades than in the past.

I don't want to comment on how students may or may not feel entitled to high grades. Most of the information I've seen indicates that this sort of entitlement is more common now than in the past; I haven't been teaching long enough to feel like I can comment intelligently on historical trends. (And I wouldn't want to include data from when I was taking classes, because my friends and I may or may not have been a good sample.)

From 3σ → left.

01 August 2009

When 2+2 = 5

From "you suck at craigslist:" 2+2 = 5. People selling things on craigslist who want $x for one of some item and more than $2x for two.

I'm pretty sure I once saw somebody selling T-shirts on the boardwalk in Atlantic City for "$3, 3 for $10." Or it might have been "$4, 2 for $10" or "$2, 4 for $10". In any case, it didn't make sense.

I'm sure there are cases where this sort of pricing structure actually makes sense, but I can't think of anybody. (I also wouldn't be surprised to learn that economists have a name for it.) Anyone want to try?

29 July 2009

Fields of onions

The Onion, commenting on Sarah Palin's resignation as governor of Alaska, writes that "Nothing can distract her laser focus from the ultimate prize: the Fields Medal.”

But Palin is 45, and the age limit for the Fields is 40. Perhaps she should aim for the Abel Prize instead?

(Note to the humor-impaired: this is not meant to disparage the Abel Prize.)

27 July 2009

Plats diviseurs, or how the French cut cakes

Apparently in France they have an interesting solution to cake-cutting problems -- a plate with markings on the rim for the proper place to cut into 3, 5, 6, 7, or 9 slices, called the plat diviseur. See also here. You can buy them here; the site is in French. Some especially ornate examples are due to Paul Urfer, who appears to be the original inventor.

I found out about these from The Number Warrior, Jason Dyer. Unfortunately I have no use for one of these, because I live alone, in a small apartment where I couldn't reasonably have enough people over to need a whole cake, and so I do not buy a whole cake at once.

I suspect I have some French readers. Have you seen these before?

24 July 2009

A meta-proof

A meta-proof of P=/!=NP, from the Geomblog in 2004. (That's "equals or does not equal".)

Note that you don't need to know anything about the P vs. NP problem to find it funny.

(via Michael Trick.)

21 July 2009

A number-theoretic coincidence

An interesting little tidbit from Tanya Khovanova: consider the Pythagorean triple 3² + 4² = 5², in which all three numbers are consecutive, and another similar coincidental-seeming equation 10² + 11² + 12² = 13² + 14². These are the first two elements of a sequence of similar equations, with k+1 squares on the left-hand side and k squares on the right-hand side, which she presents there.

Still perhaps coincidental is the fact that 3² + 4² = 5² and 3³ + 4³ + 5³ = 6³. Of course once you see these it's tempting to check if 3⁴ + 4⁴ + 5⁴ + 6⁴ = 7⁴. (It doesn't.) In fact, for n ≥ 4,

3ⁿ + 4ⁿ + ... + (n+2)ⁿ < (n+3)ⁿ

and here's a proof. I'll prove it for n ≥ 5. It suffices to show that

(3/(n+3))ⁿ + (4/(n+3))ⁿ + ... + ((n+2)/(n+3))ⁿ < 1.

Call the left-hand side f(n). But the last term on the right-hand side is decreasing in n, and for n ≥ 5, is at most 16807/32768 (its value at 5). Now, we can write the left-hand side of the above inequality as

((n+2)/(n+3))ⁿ (1 + ((n+1)/(n+2))ⁿ + (n/(n+2))ⁿ + ... + (3/(n+2))ⁿ)

and each term in the second factor is at most ((n+1)/(n+2))ⁿ times the previous one. So we have

(1 + ((n+1)/(n+2))ⁿ + (n/(n+2))ⁿ + ... + (3/(n+2))ⁿ) > (1 + r + r² + r³ + ...)

where r = ((n+1)/(n+2))ⁿ. Of course the right-hand side of the previous inequality is 1/(1-r). r is positive and decreasing in n, so 1/(1-r) is as well. Since n ≥ 5, 1/(1-r) is bounded above by its value at 5, which is 16807/9031. Thus for n ≥ 5,

f(n) < (16807/32768)(16807/9031)

and the right-hand side is easily checked to be less than 1.

For n = 4 this proof doesn't work, because I threw away a bit too much in bounding the sum of a finite series by the sum of an infinite one, but it's easy to check.

In fact, f(n) approaches 1/(e-1) as n goes to infinity.

Edit (8:41 pm):: See also The Everything Seminar.

20 July 2009

News from Course XVIII-P

Glimpi's conjecture, the Holy Grail of numeric set lassitude mathematics, has been proven.

16 July 2009

"Roommates" is a euphemism?

I'm at the Cornell Probability Summer School. (This announcement is too late for anybody who wants to find me here, as it's almost over! But I have been tracked down by at least one fan.)

In a lecture here this morning, Ander Holroyd spoke about the stable marriage problem and variations of it involving point processes (see this paper of Holroyd, Pemantle, Peres, and Schramm, which I've mentioned before, for details). The goal of the problem is to pair up n men and n women in such a way that no two people who aren't married to each other prefer each other to their current partners; this is called a "stable matching" and one always exists.

The original paper on the stable marriage problem is that of Gale and Shapley, in 1962. This paper also talks about the "stable roommates" problem, which is the analogous problem where everybody is of the same gender. Rather surprisingly, I never realized that "roommates" might be a euphemism here, which is something that Holroyd pointed out this morning to quite a bit of laughter.

15 July 2009

Batting under .200

Stat of the day (from baseball-reference.com) has a list of players who went an entire season, had enough at bats to qualify for the batting title (I forget the statistics for this, but this basically means they have to play regularly), and are batting under .200.

Most of them are from a long time ago. Why? Because .200 is well below average and always has been (which is why the list was worth compiling) and the variance in batting averages has gone down as the standard of play has improved. Stephen Jay Gould wrote about this in Full House: The Spread of Excellence from Plato to Darwin; the argument is roughly that as baseball scouting and training has gotten better, there are not as many bad pitchers in the major leagues as there were in the past, so players can't inflate their batting average that way. (I'm in Ithaca and my copy of the book is in Philadelphia, so I can't check if I'm stating this correctly.)

11 July 2009

A puzzle

2²⁹, expressed in base 10, is a nine-digit number. All nine of its digits are different. Find the digit that is missing without explicitly calculating 2²⁹. (Thanks to Kate for this one; a solution is there, so don't look until you've thought about it.)

05 July 2009

Problems that are hard for intermediate values of some parameter

In Clifford Henry Taubes' review of Monopoles and three-manifolds, by Peter Kronheimer and Tomasz Mrowka (citation information; article), near the end of the first paragraph the authors mention the problem of classifying compact manifolds with the homotopy type of the n-sphere. In any dimension there is exactly one. The history of this problem is roughly as follows:

n ≥ 5: Smale, 1960 (and Stallings at around the same time)

n = 4: Freedman, 1980

n = 3: Perelman, early 2000s (this is the Poincare conjecture)

n = 2: "follows from the Riemann mapping theorem"

n = 1: "a nice exercise for an undergraduate"

So in low dimensions the problem is easy, or at least doesn't require "modern" apparatus; in high dimensions it's harder (Smale and Freedman both got Fields Medals); in "middle" dimensions (like 3) it's the hardest, or at least took the longest. It's my understanding that it's pretty typical in geometry/topology for the three- or four-dimensional cases of a problem to be the most difficult?

Can you think of other problems (from any area of mathematics) that have a similar property -- that they're hardest for some medium-sized value of whatever a natural parameter for the problem is? (Yes, it's a vague question.)

25 June 2009

Brownies and space-filling curves

The Baker's Edge brownie pans, which are pans constructed in such a way that everybody gets an edge piece and nobody gets a piece from the middle, remind me of space-filling curves.

The isoperimetric inequality suggests that the only way to do the reverse -- to have pans where nearly everybody gets the middle and nearly nobody gets the edge -- is to have really big pans.

23 June 2009

The Iranian election

The Devil Is in the Digits, an op-ed by Bernd Beber and Alexandra Scacco in Saturday's Washington Post.

This piece claims that the distribution of insignificant digits in vote totals in the recent Iranian election look funny, and that there's a good chance this is because the numbers were made up.

I haven't looked at the numbers myself, but this seems like an avenue worth pursuing.

18 June 2009

Money with mathematicians on it

Banknotes featuring scientists and mathematicians. Including the two in-print US bills that we're all least likely to see: the $100 (Franklin) and the $2 (Jefferson). For the non-US readers: the $100 is the largest bill in general circulation. For some reason the $2 bill has fallen out of favor, and although it's legal it's very rare, to the point that some people don't know about them and urban legends circulate about the $2 being suspected as counterfeit)

There seem to be more "scientists" than "mathematicians" on the list, but this may just reflect the fact that there are more scientists than mathematicians in general. In fact, "scientist" is a broad enough category that I don't think too many people would describe themselves as "scientists" when asked "what do you do?", rather responding with something like "physicist" or "biologist"; but I think a lot of mathematicians would answer "I'm a mathematician" to this question. (This seems to correspond roughly with the way departments are organized in most universities; there's usually a "department of mathematics" but very rarely a "department of science".)

(via a comment at Gil Kalai's blog)

Edit, 6:20 pm: the linguists seem to be compiling their own list of linguists-on-money, over at Language Log.

16 June 2009

The Math Factory?

On Sunday, June 14, in New York City, there was a Math Midway as part of the World Science Festival's street fair. The web page refers to it as a "traveling exhibit" so maybe it's coming to somewhere near you?

This is the first exhibit mounted by the Math Factory, which will be a full-scale museum of mathematics, incorporating the collection of the now closed Goudreau Museum, which was housed in a couple classrooms at a former school. This is the brainchild of Glen Whitney, who got a PhD in math, worked for a hedge fund for some time, and now is devoting himself to this museum, according to this article from the New York Daily News. Here is a list of exhibits they're planning and an interview about the museum that Whitney did in April in the oddly named online magazine gelf.

I found out about the midway from Quomodocumque and the Daily News article from My Biased Coin.

Incidentally, I'm not sure I like the name "Math Factory" for this museum. Factories are generally not pleasant places, and in addition they won't actually be manufacturing mathematics there.

09 June 2009

When do you learn that the rationals are countable, and the reals aren't?

I'm currently teaching a course "Ideas in Mathematics" in our summer session. This is a course generally taken by students not in technical fields; quickly speaking, my syllabus is some basic number theory, different notions of infinity, some bits of geometry (polyhedra, letting them know that there is such a thing as non-Euclidean geometry, etc.), fractals and chaos, and a smattering of probability. This is a course that's not a prerequisite for anything and the students aren't going into fields where they'll need math, so I, like a lot of other people teaching this class, take the approach of showing them that "math is beautiful" rather than that "math is useful".

So today I'm showing my students that the rationals are countable, first by the standard proof and then by the superior Calkin-Wilf proof. I find the Calkin-Wilf proof aesthetically superior because the "standard" proof, in my opinion, is "really" a proof that the set of pairs of natural numbers is countable; we then just cross off the pairs which aren't in lowest terms as a sort of afterthought. As a result, it's difficult to answer questions like "what's the 1000th rational number in the `standard' enumeration?". Then I will show them that the reals are uncountable, using Cantor's diagonalization argument.

While preparing today's class, I realized that I don't know when I learned that the rationals are countable and the reals are uncountable. Is this even part of the "standard" curriculum for math majors? These feel like facts that I have always known; presumably I picked them up from some popular mathematics book at an early age. Do any of you remember when you learned this?

04 June 2009

Odd periods in continued fractions

Here's a question. Why is the period of the quotients in the continued fraction of N^1/2 "usually" even? For example, if N runs over the ninety non-squares less than 100, then only 20 times does the continued fraction expansion of N^1/2 have an odd period. Of the 992 non-squares less than 1024, 157 have an odd period. Of the 9900 squares less than 10⁴, 1322 have an odd period. This is a sign that something is going on under the hood -- naively you'd expect half the periods to be odd.

Arnold has observed this, but only empirically; I first observed it from this problem from Project Euler.

The period of the continued fraction of N^1/2 is odd if and only if x² - Ny² = -1 has solutions in integers. All such integers, it turns out, have no prime factors congruent to 3 mod 4, which is pretty rare for large numbers. (The number of positive integers less than N with no prime factors congruent to 3 mod 4 is about N(log N)^-1/2.) For integers having no prime factors congruent to 3 mod 4, though, a paper of Etienne Fouvry and Jurgen Kluners shows that asymptotically at least 52% of such numbers have odd period, and at most two-thirds do.

03 June 2009

Random Walk: The visualization of randomness

Random Walk: The visualization of randomness, Daniel Becker's diploma thesis, shows fascinating pictures that illustrate various stochastic phenomena: dart-throwing and the Poisson distribution, Benford's law, Monte Carlo methods, some hidden high-order correlations in pseudo-random number generators, and so on.

14 May 2009

Square roots and sunscreen

Also, here's an interesting tidbit, from this New York Times piece on SPF. SPF, or "sun protection factor", is the number on the sunscreen bottle; if a properly applied sunscreen lets through a fraction p of the UV rays it's meant to protect against, then that sunscreen has SPF 1/p. (The numbers in the article talk about the proportion of the UV rays which are blocked; in this case, if a fraction q of the UV rays are blocked, the sunscreen has SPF 1/(1-q).)

Anyway, you're supposed to apply some ridiculous amount of sunscreen to your body, about an ounce. This seems like a lot to most people, because that stuff is expensive! So a lot of people underapply sunscreen. (I'll include myself here.) The article quotes Darrell Rigel, NYU dermatologist, as saying that if you apply half the sunscreen you're "supposed" to, you have to take the square root of the SPF.

That sounds obvious once you think about it -- but I'll admit I'd never thought about it. Say I have a sunscreen that allows through one-sixteenth of the light which hits it when applied properly. Now imagine splitting it up into two coats, each of which allows through the same proportion of the light that hits it. One-fourth of the light makes it through the outer coat; one-fourth of that light makes it through to the skin.

Of course there are issues with this analysis, but according to this paper in the British Journal of Dermatology it appears to hold up. And applying twice the usual amount of sunscreen apparently squares the SPF. (The effect is actually a bit less than this, because sunscreens don't block all wavelengths equally, nor does the sun's spectrum contain all wavelengths equally.)

This all implies that if you want to compare prices of sunscreens, you should divide the cost of the sunscreen by the product of the bottle's volume and the logarithm of the SPF. Do sunscreen prices actually work this way?

Twitter Ratio - why?

Twitter Ratio calculates the ratio of the number of followers you have on Twitter to the number of people who follow you.

Yes, there's a web site to do division. (And Twitter reports the two numbers involved in the quotient, so it's not even like this web site is doing the counting.)

Apparently it also saves historical numbers, so it's not entirely worthless, but it still seems like an odd thing to base a site around.

08 May 2009

The third derivative of the employment rate is positive

The third derivative of the number of people employed in the United States is positive. (From 538.)

Nate Silver puts it as "the second derivative has improved", but let's face it, this is really a statement about the third derivative. Compare Nixon's 1972 statement that the rate of increase of inflation was decreasing, which Hugo Rossi pointed out in the Notices was a statement about the third derivative. (I seem to recall John Allen Paulos pointing this out in one of his books, but I don't recall which book and therefore can't date it relative to Rossi's letter in the Notices.)

The physics of singing in the shower

I was singing in the shower, as I do.

I noticed that certain notes seemed to resonate with the shower more than others.

These were, in ascending order, Eb2, G2, C3, and G3, where C4 is middle C. (These may not be exactly right; I don't have perfect pitch. The intervals are right, though.)

Exercise for the reader: how large is my shower?

07 May 2009

The Calkin-Wilf tree on Wikipedia

The Calkin-Wilf tree now has a Wikipedia page. This is an infinite binary tree with rational numbers at the nodes, such that it contains each rational number exactly once. In the sequence of rational numbers that one gets from breadth-first traversal of the tree,

1/1, 1/2,2/1, 1/3, 3/2, 2/3, 3/1, 1/4, 4/3, 3/5, 5/2, 2/5, 5/3, 3/4, 4/1, ...

the denominator of each number is the numerator of the next; furthermore the sequence of denominators (or of numerators) actually counts something. Plus, there are some interesting pictures that come from plotting these sequences, and some interesting probabilistic properties (see arXiv:0801.0054 for some of the probabilistic stuff, although I actually just found it and haven't read it thoroughly) I've given a talk about this; one day I'll write down some version of it. This is one of my favorite mathematical objects.

It looks like we've got David Eppstein to thank for this. It was introduced in this article by Calkin and Wilf.

04 May 2009

Bears, pigs, and the like

The blog's been slow. I've been off writing real mathematics, thinking for and preparing for the class I'm teaching this summer, and so on. But I'm still here!

And while I'm here, you should read Chad Orzel on the faulty thermodynamics of children's stories. In the story of Goldilocks and the three bears, one would expect that the papa bear is the largest, then the mama bear, and then the baby bear. Furthermore, you'd think that the larger the bear, the larger the bowl of porridge, and the slower it should cool off. But it doesn't seem to work that way! Read the comments come up with some interesting explanations.

Exercise for the scientifically-inclined reader: comment on the physical implications of the Three Little Pigs.

Exercise for the not-so-scientifically-inclined reader: what's with all the animals coming in threes?

23 April 2009

Calculus made awesome

Calculus Made Awesome, by Silvanus P. Thompson, is available online. (Okay, so it's actually called Calculus Made Easy, but I like my alternate title better.)

Furthermore, unlike the modern calculus texts, it is nowhere near large enough to use as a weapon, even if you buy the print version, a 1998 edition fixed up by Martin Gardner Next time I teach calculus I must make sure to tell my students this book exists. And it's in the public-domain and free online, so it's not like I'd be recommending another expensive book. How much has calculus really changed in a century, anyway?

Thanks to Sam Shahfor reminding me of it. See also Ivars Peterson's review of the 1998 reissue. John Baez likes it but doesn't like that the new edition is longer than the old one.

22 April 2009

Constrained tourism

Does there exist a Hamiltonian tour of the graph whose vertices are the 48 contiguous United States and whose edges connect states which border each other? More geographically, can you drive through all of the lower 48 states passing through each exactly once? (From 360.)

Here's one possible solution, although I ignored the constraint implicit in the story that provoked the question, which required a start in Michigan. Note that you have to start or end the tour in Maine since it only borders one other state.

17 April 2009

The Art of the Probable: Literature and Probability

From MIT's Open Course Ware: The Art of the Probable: Literature and Probability. The course readings include both some of the classical mathematical writings about probability (Pascal, Fermat, Leibnitz, Bernoulli, Bayes, Quetelet, etc.) as well as various more "literary" pieces.

Only at MIT...

(Seriously, though, I would have liked to take this class. And one of the readings from the last week is "the Bohr-Einstein dialogue", which you may know refers to whether God does or does not play dice.)

06 April 2009

Global octahedron

The Onion, in its fictional world, is owned by Global Tetrahedron. Their logo is a dodecahedron.

(The title of this post splits the difference.)

James Stewart's house

James Stewart, author of calculus texts, has a $24 million house. It has lots of curved walls. Problem: find their areas or volumes, by integrating.

Simmons Hall, an MIT dorm opened in 2002, has a lot of oddly shaped rooms. (I found this silly, because the curved walls meant wasted space -- but I didn't live there, I just had friends who did, so it didn't bother me too much.) The story goes that the Cambridge fire department had trouble giving them a certificate of occupancy because they couldn't determine the volume of certain rooms and therefore couldn't determine whether they were adequately ventilated.

(Article from the Wall Street Journal; link from Casting Out Nines.)

God and some humans play dice

From Tierney Lab at the New York Times:A puzzle in which God, Einstein, and Oppenheimer play dice, and its solution.

04 April 2009

Which two states are closest together?

Fix two sets X and Y in the plane, each the interior of a curve, such that their closures don't intersect. How would one go about finding points x in X and y in Y such that the distance between X and Y is minimal?

Furthermore, say we have a bunch of sets X₁, ..., X_n, each the interior of a curve, with nonintersecting closures. Let d(i,j) be the minimal distance between X_i and X_j. How can we find the minimal nonzero d(i,j), that is, the minimal distance between any two sets with nonintersecting closures? (In particular, there should be a faster algorithm than computing all the d(i,j). I suspect O(n log n) of the d(i,j) need to be computed although I have no idea why I'm saying this.)

The problem that inspired this is the following geographic one. What two states in the US that don't border each other are closest together? I think I know the answer; I'll post about that later.

31 March 2009

17 Gauss Way

MSRI (the Mathematical Sciences Research Institute) is located at 17 Gauss Way, Berkeley, California. Here's a picture.

Of course, Gauss constructed the 17-gon with ruler and compass and was very proud of this. This article says it's not a coincidence, and so does this official MSRI document.

And rather surprisingly, that's not the only thing on Gauss Way. The Space Sciences Laboratory is at 7 Gauss Way. I'm not sure what significance 7 has, if any.

What time is it?

I looked at my watch at 12:05. I wasn't sure, for a moment, whether it was 12:05 or 1:00; I had to carefully look to determine which of the two hands was the longer one.

A question for you: how many times in a given twelve-hour period could I have this problem? More rigorously, suppose I have an ordinary twelve-hour analog clock, with an hour hand and a minute hand but no second hand. Furthermore suppose I can measure the position of the hands absolutely precisely, and they're "sweep" hands (i. e. they move at a constant angular rate, without "ticks"). At how many times between (say) noon and midnight could I interchange the hands of the clock and still have the hands in a position that corresponds to some time -- but not the time that it actually is? Noon, for example, is not such a time; if I interchange the minute and hour hands at noon I get a valid position of the hands, but that's the position the corresponds to noon. (I won't give an example of a valid time because giving one would be a big hint.)

Bonus: what are these times?

Another bonus: Add a second hand; are there still times which give rise to ambiguous hand configurations? (I don't know the answer to this one.)

(No fair looking up a solution; this is actually a pretty well-known brainteaser. It's well-known enough that I probably knew it existed, somewhere in the back of my mind, before I reinvented it today.)

edit (1:14 pm): Boris points out that he wrote a very similar question as question 23 of this test (PDF).

Fermi problems

Here's a quiz full of Fermi problems (like "how many people are airborne over the US at any moment?") and an article by Natalie Angier in today's New York Times, in which she suggests that some basic quantitative reasoning skills wouldn't kill people. The article was inspired by a recent book of such problems, Guesstimation: Solving the World's Problems on the Back of a Cocktail Napkin, by John Adam and Lawrence Weinstein. Adam also has a forthcoming book entitled A Mathematical Nature Walk which may be interesting.

27 March 2009

What is "classical"?

John Cook quotes a definition of "classical", due to Ward Cheney and Will Light in the introduction to their book on approximation theory. Basically, something is "classical" if it was known when you were a student.

The problem with this definition is that it depends on the speaker, which is really not a good property for a definition!

26 March 2009

Some facts about time to the PhD

I just wondered -- what is the typical age of a PhD recipient? A bit of Googling turned up this table from Inside Higher Ed, which conveniently sorts by discipline; it reports on an NSF brief. Mathematics and physics are tied for second lowest median age at 30.3; chemistry is the only discipline that's lower, at 29.6.

The table I linked to also gives the median time from getting the bachelor's degree to getting the PhD; by subtraction one can get some number that is a "typical" age of bachelor's degree receipt for students who eventually get a PhD. The median time from bachelor's degree to PhD in mathematics is 7.9 years. Subtraction, 30.3 - 7.9, gives 22.4 as a "typical" age (the difference of medians, which isn't really meaningful) for students getting a bachelor's degree who eventually go on to get a PhD in math. (The highest typical age at bachelor's degree is 25.3, for people getting PhD's in education.) This is the minimum among all eighteen disciplines covered here. It's hard to imagine a median much lower than that given the age at which students typically enter formal education and the number of years it takes.

I interpret this as saying that students who get PhD's in mathematics are less likely to take time away from formal education between high school and college or to take longer than the traditional four years to graduate from college. I'd be interested to see if this is because students who spend time away from formal education "lose" whatever mathematics they knew and have trouble picking it back up again; it's a popular conception that mathematics is more "hierarchical" and so this is more of a problem there than in other fields. (Not having much experience with other fields, I can't say.)

Also, chemistry has a median registered time to degree (time from entering a doctoral program to receiving the PhD) of 6.0 years; the next lowest is mathematics at 6.8. Why is chemistry such an outlier?

25 March 2009

List of free mathematics books

Possibly of interest: free mathematics books online. Many seem to be either quite recent (since, say, around 2000) or more than a few decades old, although I haven't systematically checked this statement; this isn't surprising, as the very old books tend to be in the public domain and the very new books tend to have been produced in a period when computers were more universal than they were in the past.

There are a couple hundred books listed here, which is not anywhere near the number of free mathematics books available (legally) online. Various other lists exist, with varying degrees of overlap. Sometimes I flirt with the idea of attempting a more complete list but I realize it would become out of date quite quickly.

20 March 2009

Billions and millions

Yes, I'm still alive. I got out of the blogging groove somehow.

Today's xkcd makes an interesting point about the difference between "billion" and "million".

And although this isn't about math, Carl Sagan's Cosmos can be watched online at hulu.com. (Thanks to Blake Stacey for the pointer.)

13 March 2009

Three songs about circles

WXPN is a radio station of the University of Pennsylvania. This statement is a bit ambiguous; they're not a "college radio station", in that they're not student-run, but rather a professionally-run, public (i. e. non-commercial, and every so often they come on the air and beg for money) radio station. WQHS is the student radio station. I've been working from home this week, since it's our Spring Break, so I've been listening a lot.

Every weekday morning at nine they have a "select-a-set": listeners call or e-mail and suggest three songs, which are perhaps somehow related. Somebody suggested the following three songs today, which got played:

Sarah McLachlan, Circle
Edie Brickell & New Bohemians, Circle
Joni Mitchell, Circle Games

Why? (Hint: the select-a-set feature does not exist on Saturdays.)

10 March 2009

Oral exams

I am finding the database of (oral) general exams taken by Princeton math grad students quite amusing, especially since there seems to be a tradition of pointing out the silly things done by one's committee. (I found it by a Google search while looking for information on one of the people listed there.)

Compare the UPenn archive, which I find terrifying, not because it's any different, but because I know a lot of the people involved. (The astute among you will note that my oral exams are not on the UPenn archive. This is because by the time I had recovered sufficiently to write them up, I forgot what I had been asked.)

09 March 2009

Knuth on solitaire

I'm browsing through Knuth's The Art of Computer Programming (Volume 1, Volume 2, Volume 3), because it's Spring Break, so I have time. I'm reading the mathematical bits, which are perhaps half the work; I'm less interested in the algorithms.

Anyway, we find on page 158 of Volume 2: "Some people spend a lot of valuable time playing card games of solitaire, and perhaps automation will make an important inroad in this area." This is part of Chapter 3, on the generation and testing of random numbers. Of course, this book was published in 1969; Windows Solitaire didn't exist then. (It's also amusing to see Knuth describing things that will be in, say, Chapter 10; he's currently working on Chapter 7, which will be the first half of Volume 4.)

08 March 2009

The Simpsons and continuous compounding

It appears the Simpsons have a mortgage that has 37% interest compounded every minute.

For the record, if the interest is compounded n times per year, then their interest wud be (1+0.37/n)ⁿ-1 compounded annually. If n = 12 (monthly), this is 43.97% per year; if n = 525,600 (every minute), this is 44.7734426% annually; compare 44.7734615% = exp(0.37)-1 for continuous compounding. In other words, compounding every minute might as well be continuous; the difference is one cent per $53,000 or so, per year.

The difference between every-minute and continuous compounding, at an interest rate of r, is the difference

exp(r) - 1 - (1+r/n)ⁿ

where n = 525,600; this is asymptotically r²/(2n). (This actually isn't a great approximation here; the next few terms of the series are reasonably large.)

A commercial for math

Look, it's a commercial for math!

Okay, so it's really a commercial for IBM. But you don't know that until the very end.

Does algeblah confuze u?

Algeblah confuzez mi, from I Can Has Cheezburger.

06 March 2009

Best bad math joke ever

One of my favorite bad math jokes ever is now in Wikipedia, and no, I didn't add it.

Namely, exercise 6.24 of Richard Stanley's Enumerative Combinatorics, Volume 2 asks the reader to

"Explain the significance of the following sequence: un, dos, tres, quatre, cinc, sis, set, vuit, nou, deu..."

The answer is that these are the "Catalan numbers", i. e. the numbers in the Catalan language. If this seems random, note that exercise 6.21 is the famous exercise in 66 parts (169 in the extended online version, labelled (a) through (m⁷)), which asks the reader to prove that 66 (or 169) different sets are counted by the Catalan numbers.

I'm telling you about this joke because the Wikipedia article on Catalan numbers begins with a link to the list of numbers in various languages.

An alternative version of this joke (American Mathematical Monthly, vol. 103 (1996), pages 538 and 577) asks you to identify the sequence "una, dues, cinc, catorze, quaranta-dues, cent trenta-dues, quatre-cent vint-i-nou,...", which are the Catalan numbers 1, 2, 5, 14, 42, 132, 429... in the Catalan language. (I'm reporting the spellings as I found them in my sources; the first series is in the masculine and the second is in the feminine, as Juan Miguel pointed out in the comments.)

Conditioned to rationality

At Uncertain Principles and Unqualified Offerings there has been talk about how students in disciplines where the numbers come with units seem to be conditioned to expect numbers of order unity. (And so are professional scientists, who deal with this by defining appropriate units.)

Mathematicians, of course, don't have this luxury. But we are conditioned to think, perhaps, that rational numbers are better than irrational ones, that algebraic numbers are better than transcendental ones (except maybe π and e), and so on. In my corner of mathematics (discrete probability/analysis of algorithms/combinatorics), often sequences of integers a(1), a(2), ... arise and you want to know approximately how large the nth term is. For example, the nth Fibonacci number is approximately φⁿ/5^1/2, where φ = (1+5^1/2)/2 is the "golden ratio". The nth Catalan number (another sequence that arises often) is approximately 4ⁿ/(π n³)^1/2. In general, "many" sequences turn out to satisfy something like

a(n) ~ p qⁿ (log n)^r n^s

where p, q, r, and s are constants. There are deep reasons for this that can't fully be explained in a blog post, but have to do with the fact that a(n) often has a generating function of a certain type. What's surprising is that while p and q are often irrational, r and s are almost never irrational, at least for sequences that arise in the "real world". Furthermore, they usually tend to be "simple" rational numbers -- 3/2, not 26/17. If you told me some sequence of numbers grows like πⁿ I'd be interested. If you told me some sequence of numbers grows like n^π. I'd assume I misheard you. Of course, there's the possibility of sampling bias -- I think that the exponents tend to be rational because if they weren't rational I wouldn't know what to do! They do occur -- for example, consider the Hardy-Ramanujan asymptotic formula for the number of partitions p(n) of an integer n:

p(n) ~ exp(π (2n/3)^1/2)/(4n √3)).

I know this exists, but it still just looks weird.

(This is an extended version of a comment I left at Uncertain principles.)

04 March 2009

A fool and his money are soon parted

Did you know that there are people who think that by reducing their income from over $250,000 to under $250,000, they can take home more money? For those of you who aren't aware of this, President Obama is planning to increase taxes on families earning more than $250,000 per year.

Of course, the way the US tax code is set up, the amount of tax you pay as a function of your taxable income is continuous, monotone increasing, and Lipschitz with parameter 1. That is, say that T(x) is the tax due if your taxable income is x. Let y > x. Then T(y) > T(x), and T(y) - T(x) < y - x. As you may note, you can derive from the second of these that

y - T(y) > x - T(x)

which tells us that if you make more money, you get to keep more of your money.

Note that T is actually not differentiable, because it's piecewise linear. Your "tax bracket" is in fact the amount of tax you pay on the last dollar of your income; that is, it's T'(x) where x is your income.

I'm not saying that there are no situations where this sort of thing might make sense. (The tax code is complicated.) But it's certainly not as common as these people would have you believe.

Original article from ABC News; I followed a link from The New York Times via The New Republic.

Why isn't it expnormal?

We say that a random variable X has a lognormal distribution if its logarithm, Y = log X, is normally distributed. The normal distribution often occurs when a random variable comes about by combining a bunch of small independent contributions, but those contributions combine additively; when the combination is multiplicative instead, lognormals occur. For example, lognormal distributions often occur in models of financial markets.

But of course X = exp Y, so the variable we care about is the exponential of a normal. Why isn't it called expnormal?

03 March 2009

Square root day

Today, it appears, is "square root day", 3/3/09. 3 is, of course, the square root of 9.

From 360; it was also pointed out there that square roots, i. e. root vegetables cut into squares, do not taste as good as pi(e). So I will wait until next Saturday for my mathematical holiday needs.

26 February 2009

LaTeX equation labels

When writing a paper in LaTeX, you often want to put a numerical label on a displayed equation, say the number (1). So you write some code like
\begin{equation}\label{eq:basel-problem} \sum_{n=1}^\infty {1 \over n^2} = {\pi^2 \over 6} \end{equation}
which compiles to give something that looks like

$\sum_{n=1}^\infty {1 \over n^2} = {\pi^2 \over 6} \quad \quad (1)$

Then later I can insert code like (\ref{eq:basel-problem}) and (1) appears in my docuemnt.

Now, as you may have noticed, I picked an equation that had a nice name, and I labeled it with that name. (The "eq:" in the label, of course, stands for "equation", a convention that I use to tell what sort of entity I'm referencing -- other things I use in that position are def:, thm:, prop:, cor:, lem:, and the like.)

But what do you do when the displayed equation doesn't have a nice "name" -- it's just an equation that occurs somewhere in the course of a calculation? For a while I tried to come up with a name, but I ended up with way too many generic names like "integral" and "sum" and "thing-with-binomial-coefficients". (Okay, so I'm exaggerating on the last one.) These names took time to think of but didn't make things easier on me later. So now I find myself using labels like \label{eq:feb-24-kappa} for the 10th labelled equation that I inserted on February 24. (Why do I use Greek letters? I tried using numbers, but it's too easy to get those confused with the actual numbers that are used to label equations.) But I'm wondering what sort of conventions people use for this; since it's the sort of thing that you can only see when you're looking at other people's LaTeX source, it's hard to know.

Somewhere, somebody is saying that I'm using LaTeX incorrectly. It might be you!

(Yes, I'm taking a break from rewriting a paper. How did you guess?)

24 February 2009

Richard Stanley tells a joke

In the graduate course Richard Stanley is currently teaching, on symmetric functions (also known as "the chapter of Stanley that I haven't really read that carefully", which is indeed the text he's using), students have two options for an end-of-term paper. They can either hand in "a treatise of at least 200 pages on some area of symmetric functions, consisting primarily of original work" which "must contain (correct) proofs of at least two important, longstanding open problems" or an eight-page expository paper.

Somehow I think nobody will choose the first option. (Although if they do, that would be an instant PhD thesis.)

Possibly also of interest: Stanley is working on a second edition of Enumerative Combinatorics, Volume 1, and a draft version of Chapter 1 is available (198-page PDF) This appears to be a substantial extension of Chapter 1 of the original.

23 February 2009

Math and pancakes

In honor of Shrove Tuesday (tomorrow for me, although it's past midnight in the UK), have a meaningless formula that will tell you how to make the perfect pancake.

(It seems to me that these silly formulas usually come from the UK. Why?)

21 February 2009

The large sieve and its applications

You should vote at thebookseller.com for Emmanuel Kowalski's The Large Sieve and its Applications, which has been shortlisted for the "Diagram Prize for Oddest Book Title of the Year". (Apparently "large sieve" sounds like it has something to do with cooking, if you're not a mathematician.)

20 February 2009

Motion of zeroes of complex polynomials

Consider the polynomial f(z) = (z-1)(z-2)...(z-20). Clearly it has 20 roots; these are 1, 2, ..., 20.

Now consider the polynomial g(z) = -z²⁰. It also has 20 roots, namely the origin with multiplicity 20.

And consider h(z) = t f(z) + (1-t) g(z), as t varies from 0 to 1. (Most of the "action" happens when t is very near 0 or 1, so this probably isn't the best parametrization.) Now, as t varies, you can find the roots numerically. Imagine the roots as twenty particles moving around in the plane. What happens, basically, is that as t increases roots start by sliding along the real axis, towards each other in pairs -- this is what you expect if you just plot f(z) as a real polynomial. (Interestingly, the collision appears to be perfectly elastic.) They then bang into each other and head off in the positive and negative imaginary directions. And eventually they curve around and approach the origin, on paths spaced 18 degrees apart. (I can try to produce graphics.)

There's nothing special about these polynomials -- that is, I suspect that something like this happens more generally. This is actually just an extension of an example in Peter Henrici's Applied and computational complex analysis (volume 1, p. 282) -- Henrici says that J. H. Wilkinson looked at the polynomial f(z) - 2^-23z¹⁹ and saw that it had five pairs of complex conjugate zeroes.

But it seems like there should be some sort of general theory of the way that roots of families of polynomials "move around" in the plane. (And if there isn't, why not?) Does the situation I've described ring a bell for anybody?

Why do the Catalan numbers grow like they do?

So "everybody knows" that the nth Catalan number is given by $C_n = {1 \over n+1} {2n \choose n}$ , and furthermore that they have the asymptotic form

$C_n = {4^n \over \sqrt{\pi n^3}} \left( 1 - {9 \over 8n} + {145 \over 128n^2} + \cdots \right)$

(Okay, I'll confess: I knew the first term, and I got Maple to calculate the others just now.)

So I found myself wondering -- why this n^-3/2? Let D_n = 4^-n C_n. Then

${D_n \over D_{n-1}} = {2n-1 \over 2n+2} \approx 1 - {3 \over 2n}$

and so we get

$D_n = \prod_{k=1}^n {D_k \over D_{k-1}} \approx \exp \left( \sum_{k=1}^n -{3 \over 2k} \right)$

; furthermore that sum is about -(3/2) log n, for large n, and so D_n is about n_-3/2. The replacement of 1-x with exp(-x) obviously would need to be justified (and such justification would explain the presence of the mysterious π) but I'm still amused that this simple computation got the exponent.

19 February 2009

The mathematician and the lumberjack

You remember the whole "mathematicians have the best job in America" survey from jobsrated.com (which was based on some fairly questionable criteria)? Well, apparently Sean Hurley at NPR did a story a couple weeks ago, where he talked to a mathematician (Peter Winkler, of Dartmouth) and a lumberjack. Peter Winkler likes his job. So does the lumberjack, although the pay sucks and trees occasionally fall on you, which sucks too.

18 February 2009

Check your units!

From today's New York Times, real estate section, referring to rural Normandy:

But, on average, real estate prices here generally range from 1,500 to 2,000 euros ($1,940 to $2,585) a square foot and a typical three-bedroom house sells for 250,000 euros ($323,165), according to Manuela Marques, a broker with Objectif Pierre, a local real estate agency.

Of course, this doesn't check out, unless a typical three-bedroom house is around 150 square feet (and Normandy has suddenly turned into Manhattan). The resolution is that that's a price per square meter, which is how a French real estate broker would quote things.

Pizza seminars

In Penn's math department, there is a "Pizza Seminar". It is on Fridays at noon, only graduate students in the math department are allowed to come, and there is free pizza. Each week a graduate student gives a talk that is intended to be accessible to most graduate students; sometimes they focus on some accessible piece of their research, but more often these talks are expository. (Occasionally, maybe three times a term, a professor speaks -- but still only graduate students are allowed to attend the talk.)

If you Google "pizza seminar", most of the results are similar series in math or closely allied fields (physics, computer science). Is there some reason that such a format wouldn't work well in more distant fields, or is this just historical accident?

17 February 2009

Sigurdur Helgason is not dead

Sigurdur Helgason dies, says the New York Times.

That's the Icelandair executive, not the MIT mathematician. The mathematician is, as far as I know, alive.

(Also, the New York Times actually refers to "Colombia University" in their obituary.)

Changing subfields

A statistic I overheard today, from a source that shall remain nameless (in case the source is wrong): three-fourths of mathematics graduate students leave graduate school in a different subfield of mathematics from when they came in.

Can anybody point me to a source?

Pavages aléatoires par touillage de dominos

Pavages aléatoires par touillage de dominos, by Thierry de la Rue et Elise Janvresse. ("Random tilings by domino shuffling", roughly.) This expository article describes work, mostly by Jim Propp and coauthors, on the generation of random tilings. As you can guess from the title, it's in French. If you don't read French you can look at the pretty pictures.

Also, this article is on a web page; the people who make the web page have used Javascript in such a way that in various places, if you want more detail, there's a link that you can click on and more detailed explanations will appear in place in the article. This may be a useful presentation technique, although it's hard to know because this is the first time I've seen it.

On publishing your trash can

I'm rereading de Bruijn's book Asymptotic Methods in Analysis (which, sadly, appears to be out of print again!) -- one of the great mathematical expositions, of asymptotic methods in analysis as they stood at midcentury. It's one of the most readable math texts I know.

de Bruijn writes in the preface:

Many things in this book are not presented in the shortest possible form, as an attempt has been made to reveal, to a certain extent, the motives that lead to certain methods. Naturally one cannot go too far in this respect; a mathematician cannot possibly publish his waste-paper basket.

This seems worth remembering; terseness is not always a virtue.

On the combinatorics of tourism

Tourists Not Leaving Landmark Until All Permutations Of Groups And Cameras Exhausted, from The Onion.

14 February 2009

College kids are better than monkeys

A misleading headline from MSN: College Kids and Monkeys About Equal on Math.

This is in reference to the work of Elizabeth Brannon, who actually claims that certain monkeys have an intuitive "number sense" which is as good as humans; see for example this article. Despite what those of us who teach college students may occasionally think, they are better at mathematics than monkeys.

13 February 2009

Two questions on document preparation

1. Why are the default margins in LaTeX so wide? It's kind of useful, because it means that there's a lot of space to scribble in when editing, but it seems that by default they're wider than in just about any other program.

2. Why are dissertations usually double-spaced? I associate double-spaced with draft documents, because you can write things between the lines of text. But the dissertation isn't supposed to be a draft. It's supposed to be a final document!

Complices are made up of simplexes, or something like that

How come the plural of "simplex", in standard mathematical usage, is "simplices", but the plural of "complex" isn't "complices"?

My first thought is that it's because "complex" is also a noun in standard English, so it pluralizes like the English noun, while "simplex" isn't.

(As you may have guessed, I'm reading something that mentioned simplicial complexes.)

12 February 2009

The Arbesman limit

Samuel Arbesman talks how to get something named after yourself. Of course, he names something after himself -- the "Arbesman limit", which is the number of things that one person can have named after themselves. (Gauss, Euler, etc. provide a lower bound for this limit.)

Supposedly Banach originally named his spaces "spaces of type B" or something like that, figuring that people would see the B, assume it standed for Banach, and start calling them Banach spaces. If that's true, it worked.

10 February 2009

Two quasi-mathematical Jeopardy! clues

Two clues from yesterday's episode of Jeopardy!. (I meant to post this yesterday, and in fact wrote the post yesterday, but didn't want to give out spoilers, so you're getting it now.)

1. "In math, it's the degree of correctness of a quantity or expression." Answer: "What is accuracy?" (This was in a category which carried the stipulation that every answer had to begin with A and end with Y.)

2. "While writing Principia Mathematica, this twentieth-century British thinker was a lecturer at Cambridge." Answer: "Who is Bertrand Russell?"

My objection to the first one is more that this isn't a mathematical concept; it seems to me that that use of "accuracy" is more common in, say, the experimental sciences.

And to the second one, I believe Russell was the answer given by a contestant and accepted, but Whitehead also might fit this description. (Whitehead left Cambridge just as the first volume of the Principia was published, in 1910.)

09 February 2009

Permutatinos

One of my most common typos is "permutatinos" for "permutations". A friend points out that this is quite apt.

Brian Hayes on the continental divide

Brian Hayes recently made a fascinating post about determining the location of the Continental Divide. Apparently that's what those pictures on the cover of this month's Notices were. This is to go with David Austin's review of Hayes' Group Theory in the Bedroom, and Other Mathematical Diversions. The book consists, apparently, of reworked versions of Hayes' essays for The Sciences and American Scientist, essays which in general I highly recommend; the post by Hayes that I linked to is his reflections on redoing this map with the data that's available now, which is better than the data he had in 2000 when he wrote the original column. (Not surprisingly, the major discrepancy between the divides obtained then and now is in Wyoming, where there are some pathological areas that don't drain to either ocean.)

08 February 2009

Read today's Foxtrot

You should read today's Foxtrot comic strip. (I won't tell you why, because if I told you why you won't follow the link.)

07 February 2009

Beijing celebrates the mean value theorem

Beijing celebrates the mean value theorem.

Philadelphia, I suppose, celebrates counting, with its numbered streets. (And perhaps Descartes and his coordinate plane, since the streets form a grid?)

Feed change

I just transferred the RSS feed on this blog to a Google account (Feedburner, the service I'd been using, has recently been integrated into Google). There should be no problems, but if there are let me know.

03 February 2009

Electoral hex redux and African colonialism

At Gil Kalai's blog I came across a map in which counties which voted Democratic in some election are colored blue, and counties which voted Republican are colored red. (It is not the 2008 presidential election map, or at least it doesn't match Mark Newman's map.)

Anyway, Kalai suggests "hex voting" -- like the game of Hex, the Republicans win if there's a continuous path in red counties from north to south, and the Democrats win if there's a continuous path of blue counties from east to west. (In the map he gives, the Republicans win, but only barely -- there are some places where the red region is only one county wide.)

It turns out that the British and French played a similar game in Africa early in the last century; here's a map of European claims to Africa in 1913, in which the British were attempting to create a continuous path of red (British) colonies from north to south, and the French were attempting to create a continuous path of blue (French) colonies from east to west. (At least, that's how the Wikipedia article on Cecil Rhodes puts it.)

Divisibility by 7

There's a pretty well-known test for divisibility by seven that came up over coffee today. Namely, to check if a number written in base 10 is divisible by 7, remove the last digit; then subtract twice that digit from the "rest" of the number. The input is divisible by 7 if and only if the output is; since this test makes numbers smaller, that's fine.

For example, 4728402 is divisible by 7. Why? Remove the last digit (2) and subtract its double (4) from the "rest" of the number, giving 472840 - 4 = 472836. Repeating, we get:
47283 - 12 = 47271
4727 - 2 = 4725
472 - 10 = 462
46 - 4 = 42
4 - 4 = 0
which is divisible by 7. (The number isn't special in any way; I just pressed a bunch of number keys and multiplied by 7 to create it.)

But I'd never seen a proof that this works. Here's a quick one. We want to show that 10k + m is divisible by 7 if and only if k - 2m is.

Say 10k + m is divisible by 7. Then 3k+m is, so 10k+m - 3(3k+m) = k-2m is.
Conversely, say k-2m is divisibile by 7. Then 3(k-2m) + 7(k+m) = 10k+m is.

02 February 2009

A strange hole in the Inverse Symbolic Calculator

I'd been fooling around with a problem, and I had identified the constant C = 0.0676676416183063 (yes, actually to that kind of accuracy!) as being important in it.

The Inverse Symbolic Calculator doesn't recognize C as 1/(2e²). But it does recognize it as 1/(2 exp(√2)^√2), and it told me that 10C was (√5)²/(e²).

I am a bit perplexed by why their tables would include these but not the more "obvious" form 1/(2e²).