God Plays Dice: October 2007

31 October 2007

links for 2007-10-31

Hello, India? I Need Help With My Math, by Steve Lohr, today's New York Times. The article's really about how consumer services, like business services before them, are being offshored; tutoring is just an example.

Pollock or Not? Can Fractals Spot a Fake Masterpiece?, from Scientific American. The verdict seems to be mixed. Pollock's paintings often contain certain fractal patterns, and certain simple images look "the same" as a Pollock painting in a certain sense. The researchers argue that their work is still valid, though:
"There's an image out there of fractal analysis where you send the image through a computer and if a red light comes on it means it isn't a Pollock and if a green light comes on it is. We have never supported or encouraged such a mindless view."

I'd agree with them, so long as it's more likely for the metaphorical "green light" to turn on when it sees a Pollock than when it sees a non-Pollock; there's no single way to test whether a creative work is by a particular person, other than going back in time and watching them create it.

On the cruelty of really teaching computing science, by the late Edsger Dijkstra. (I've had this one in the queue of "things I want to talk about" for a while, but I don't remember what I wanted to say, so here it is. There are a bunch of similar things which should dribble out in the near future.) But I can't resist commenting on this:
My next linguistical suggestion is more rigorous. It is to fight the "if-this-guy-wants-to-talk-to-that-guy" syndrome: never refer to parts of programs or pieces of equipment in an anthropomorphic terminology, nor allow your students to do so. This linguistical improvement is much harder to implement than you might think, and your department might consider the introduction of fines for violations, say a quarter for undergraduates, two quarters for graduate students, and five dollars for faculty members: by the end of the first semester of the new regime, you will have collected enough money for two scholarships.

I've long felt the same way about mathematical objects. There are exceptions, but for me these are mostly exceptions in which the mathematics describes some algorithm that has input which is actually coming from somewhere. Here it's not so much the program that is getting anthropomorphized as the user.

And why are they always "guys"? How is it that scribbles of chalk on a blackboard, or pixels on a screen, can have gender? Note that I am not suggesting that mathematical objects should be female, or that some of them should be male and some of them should be female, with the choice being made, say, by the flipping of a coin. (Incidentally, the description of mathematical objects as "guys" seems to be much more common at my current institution than at my previous one.)

By the way, Dijkstra is saying here that he thinks computer science should be taught in a formal manner -- proving the correctness of programs alongside actually writing them -- and that to de-emphasize the pragmatic aspect, students shouldn't execute their programs on a computer, since doing so enables them to not think about what the program is doing. I'm not sure if I agree with this.

A lot of people who write about algorithms say that you should avoid prematurely optimizing an algorithm until you've analyzed it carefully enough that you know which parts of the algorithm are the parts that slow it down.

There are two ways to do this. The first is, of course, to go through some sort of formal analysis of the algorithm, until you notice something like "hey, I'm evaluating lots of polynomials, shouldn't I think about how I could evaluate them quickly?" And then you'd remember that polynomials can be written in a form like

where the left-hand side requires three multiplications (generally, n multiplications for a polynomial of degree n) and three (generally, n) additions to be evaluated; the right-hand side requirese six (generally, n(n+1)/2 = n + (n-1) + ... + 1) multiplications and three (generally, n) additions. (This is, of course, assuming that you've forgotten what x² is when you get to evaluating x³. You wouldn't... but a computer probably would!) Generally,

But sometimes it's easier to just step through the algorithm step by step and realize what you find yourself doing over and over again. For example, in evaluating a polynomial a_nxⁿ + a_n-1x^n-1 + ... + a₁x + a₀, you'd realize that when you compute xⁿ you start by computing x^n-1 and then you multiply it by x; if you did that by hand over and over again you'd eventually want to find a way to speed things up. Wouldn't it be easier if you had already stored the value of x^n-1? So you'd first compute the list [1, x, x^2, ... , x^n] by just noticing that each term is x times the previous one, and then you'd multiply those by their respective coefficients a_i and add those sums up. This takes 2n multiplications and n additions; it's not quite as good as the form I've given above, but the point is that just storing your intermediate results instead of recalculating them over and over again saves a lot of time. And if you don't realize this from formally analyzing the algorithm, you can realize it by just stepping through the algorithm until you get to the point where you start cursing and thinking "didn't I already do this? Why the *^%)!*!# am I doing it again?"

(By the way, this was inspired by my trying to compute the numbers of factorizations of various integers; it turns out that you want a lot of numbers of the form "the number of factorizations of n into factors at most k" and if you're not careful you can end up computing the same numbers over and over again. So I got careful.)

30 October 2007

How to teach physics

In Science Classrooms, a Blast of Fresh O₂, by Natalie Angier, from today's New York Times. The article describes physics classes taught by Faye Cascio to ninth-graders in a magnet school, who are answring questions like the following:

You fall into a swiftly moving river and are in need of a flotational device. You see a life preserver bobbing three meters downstream of you and another one the same distance behind. Which preserver should you swim toward?

If you assume that the river is moving at a constant speed, you can look at this in a frame of reference that moves with the river; in that frame each flotation device is the same distance away, so it doesn't matter which one you swim towards. (Incidentally, I wonder if it is easier or harder to teach this without invoking the word "relativity"; I suspect it's harder, because if you mention to the students that they're doing relativity, they may be aware that this is more modern physics and therefore expect it to be more difficult. But Galilean relativity really is straightforward. This particular problem is of course doable in a frame of reference anchored to the shore, but the algebra's a bit uglier.

3) A plane flying into a headwind will have a lower speed, relative to the ground, than it would if it were flying through still air, while a plane traveling with the benefit of a brisk tailwind will have a comparatively greater ground speed. But what about a plane flying through a 90-degree crosswind, a breeze that is buffeting its body side-on? Will its ground speed be higher, lower or no different than it would be in unruffled skies?

The ground speed is slightly larger; it's the length of the hypotenuse of a right triangle with legs equal to the ground speed in still air and the speed of the wind. (But notice that the derivative of the plane's ground speed with respect to the wind speed is zero at zero wind speed. High school freshmen don't know this, though.)

The students can articulate their reasoning because, for one thing, they have no choice.

This is something I've been trying to get through my own students' head on their homework, by taking into account their ability to explain the solutions when grading; for the most part it seems to work, although since it's on homework (and not in class) there are always some students who don't bother explaining their work and figure that the points they'll lose are made up for by the time they don't have to spend on the work.

Nearly all scientists and educators agree that somehow, at some point during their pedagogical odyssey, most Americans get the wrong idea about what science is, and what it is not.

Carl Sagan once said that scientific education in this country was structured so that it would actually attract the people he thought would be worst at being scientists -- people who like routine and who want to just plug and chug their way through the calculations. I agree with this, and I'd add that mathematical education is much the same way. What one does in math classes through at least midway through college bears so little resemblance to mathematical research that the mathematical community probably loses a lot of potentially very good mathematicians, simply because their classes turn them off. (However, the process might produce people who can competently use mathematics, which is something worth keeping in mind. We must remember that part of what we do is to teach the rest of the world how to use our tools.)

29 October 2007

Enumerating multiset partitions

Brent Yorgey on enumerating multiset partitions. He came to the problem the same way I did -- I was trying to count the number of factorizations of an integer. An integer of the form pⁿ, where p is a prime, has p(n) factorizations, where p(n) is the partition function; for example

giving the p(5) = 7 factorizations of p⁵. An integer of the form p₁p₂...p_n has B_n factorizations (the nth Bell number); for example, 2310 = (2)(3)(5)(7)(11) has fifty-two factorizations. But what about, say, an integer of the form p³qr? How many factorizations does it have? Naturally, this associates an integer with each partition; from the number p⁵ we get the partition 5 and the number of factorizations 15; from the number p₁p₂p₃p₄p₅ we get the partition 1 + 1 + 1 + 1 + 1 we associate 52.

Unfortunately, I don't really follow Yorgey's work because I don't know Haskell. And I don't know a general way to compute these yet, except in the trivial cases. But, for example, p² q has four factorizations:

(p²q) = (p²)(q) = (p)(pq) = (p)(p)(q)

and so we associate to 2 + 1 the number 4; this is intermediate between p(3) (which is 3) and B₃ (which is 5). This makes sense; in general p(n) counts the number of partitions of {1, ..., n} when no elements are distinguishable, and B_n when all elements are distinguishable. Multiset partitions interpolate between integer partitions and set partitions.

(Yorgey says that the problem is also solved in Volume 4, Fascicle 3 of Knuth, which I may track down. But I want the pleasure of working this out for myself, to some extent.)

Yorgey also says that he found the problem via Project Euler, which gives a couple hundred examples of problems that can, in general, be solved by ridiculously inefficient brute force methods or by elegant methods that require not so much computation.

Bayesian gender spam

A Bayesian explanation of how to determine the gender of a person on the street (from observable cues), by Meep.

It's rather similar to Bayesian spam filtering (Paul Graham, see also here). The major difference is that one can generally assume that most e-mail is spam, whereas one cannot assume that most people are of one or the other of the two canonical genders.

In the spam filtering case, though, it doesn't seem that the prior probability that a message is spam matters; Graham claims that most e-mails are either very likely or very unlikely to be spam. But there are probably more words in an e-mail than there are easily observable cues to a person's gender; it seems much more likely to get, say, that a person has a 60% probability of being male than that an e-mail has a 60% probability of being spam.

Also, it's a lot easier to collect the necessary for spam filtering than for gender determination.

e and π go on a blind date

e and π go on a blind date.

(Found at Gooseania.)

28 October 2007

Sorting with error

Here are three problems that have occurred to me, somewhat randomly, over the last few weeks.

1. I mentioned here that it would probably be relatively easy to rank, say, all the open problems in analysis by their perceived difficulty to expert analysts, and all the open problems in algebra by their perceived difficulty to expert algebraists. But how could we merge these two rankings together, given that comparing an algebra problem and an analysis problem is probably much more difficult and error-prone than comparing two algebra problems or two analysis problems? For example, comparing an arbitrary algebra problem and an arbitrary analysis requires finding someone who is fluent enough in the relevant areas to be able to really understand the problems! (If you object to the fact that "algebra" and "analysis" are too broad for this to be true, just subdivide the areas; the idea is still the same. We can rank problems within suitably small subdisciplines of mathematics; how do we merge those rankings into a ranking of all mathematical problems?)

2. Most people are attracted either to men or to women, not to both. (I'm assuming this for the sake of this problem.) It seems reasonably easy to rank-order the set of all men in order of attractiveness, and the set of all women in the same way; but again, how do you merge these two? How do you compare the attractiveness of a man to the attractiveness of a woman? Oddly enough, I think this one's easy to solve because of "assortative mating", which basically means "people will end up with people who are about as attractive as themselves". (Incidentally, although this seems obvious to anyone who's ever been around people, how can it be derived from first principles?) But the information we get from assortative mating is likely to be imperfect, again; there are plenty of relationships where one member of the couple is more attractive than the other.

3. Major League Baseball has two leagues. We have a mechanism for determining the "best" team in each league. The common wisdom now is that the American League is the better of the two leagues; look at this year's World Series, which the Red Sox are currently leading three games to none. (But don't look at last year's.) But that doesn't necessarily mean that, say, the eighth-best team in the AL is better than the eighth-best team in the NL. There's interleague play, but most of the time that's only a single three-game series between any pair of teams, which really doesn't give much information. (And most pairs of teams in different leagues won't play each other at all in any given season.) Let's assume, for the sake of argument, that we have to use just which teams have won and lost against each other, and maybe the scores of the games; we can't use the fact that teams are made up of players, and compare the AL and NL statistics of a player who has played in both leagues.

The common thread is that in all three cases we have a pair of lists which we believe are each correctly rank-ordered, but comparison between the lists is difficult, or we're doing our rankings based on already collected data and we can't go out and get more, and sometimes our comparisons might be wrong (for example, in the baseball scenario, the team which wins the season series between two teams is not necessarily better). All the analysis of sorting algorithms I've seen (which is, I'll admit, not that much) seems to assume that all comparisons are equally easy and there are no errors. That's a nice place to start, but it hardly seems a place to stop.

27 October 2007

The Second Law of Organization

Doyne Farmer, "The Second Law of Organization", a chapter in The Third Culture: Beyond the Scientific Revolution. (It looks to me like you can read the whole book online, but if you like having a paper book in your hands, you can buy it from Amazon.

Farmer was trained as a physicist; he's been referred to as one of the "pioneers of chaos theory" and has also done work in finance.

He writes, about his disappointment at what he learned in physics classes in college:

"And most of physics was still focused on pushing and pulling, on the material properties of the universe rather than on its informational properties. By informational properties, I mean those that relate to order and disorder."

I have the sense that this is still true, although I'm not a physicist; certainly it is true of the physics classes that my physics-major friends took. And there seems to be an analogy in mathematics -- it's becoming more and more possible to predict things about large random structures. (For example, one can say quite a bit about what a random partition of a large integer, or a random set, or any number of other large random structures looks like; in those cases where there aren't proofs, one can still do simulations and see that "yes, they always look the same", at least if you squint.) It seems to be mostly the people with some CS background that are interested in this, because these are the sort of questions that arise naturally in the analysis of algorithms. But there really do seem to be two different ways to look at a lot of these objects -- you can look at individual trees, but what's the point, since you're probably never going to see any particular tree with 1000 leaves. Or you can look at the class of trees... a beautiful object on its own.

"Many of us find this view [that that everything depends on a set of disconnected "cosmic accidents."] implausible. Why would so many different types of order occur? Why would our situation be so special? It seems more plausible to assume that "accidents tend to happen." An individual automobile wreck may seem like a fluke- -in fact, most automobile wrecks may seem like flukes — but on average they add up. "

This is a more dignified way of saying that "shit happens", which is something that people who are planning things forget. It's why large projects always seem to be late and over budget. The planners think they've factored in everything with, say, more than a 5% chance of happening. (In the car analogy, when setting out for their trip they've thought that there might be heavy traffic, or perhaps if it's raining they've considered the possibility of flooding on some flood-prone road.) But there are a zillion things that could happen, and you can't factor all of those in individually. But if there are enough of those rare things, the probability of one occuring will be close to one, or the expected number of them will be quite large. (Nassim Taleb has written about this in the context of financial markets.) Unfortunately, when you're writing the budget you can't put a line for "shit that will happen". This is actually the same idea that I was previously talking about; large systems (like all the cars in a given city) will have fairly stable properties, even when individual cars (that one that just drove by outside my window) don't.

(By the way, the "Second Law" that Farmer is referring to is a reference to the Second Law of Thermodynamics; he's basically saying that order appears out of chaos often enough that this apparent contradiction of the Second Law of Thermodynamics is itself a law.)

26 October 2007

Trickery with divergent sums

When |q| < 1, 1 - q + q² - q³ + ... = 1/(1+q).

And (1 - q + q² - q³ + ...)² = 1 - 2q + 3q² - 4q³ + ..., viewing both sides as formal power series.

Thus we have
1 - 2q + 3q² - 4q³ + ... = 1/(1+q)²
as formal power series. Let q = 1; then
1 - 2 + 3 - 4 + ... = 1/4.
Isn't that weird?

I learned this one today; it's an example of "Abel summation". Basically, in order to sum a sequence, we take its generating function and evaluate it at 1. Nothing too controversial there; at least to me it seems like the natural thing to do. But it looks a little more mysterious if you see it as I first did, which is without the q's. It seems "obvious" that 1 - 1 + 1 - 1 + 1 - 1 + ... = 1/2, if it's going to equal anything; we define such an infinite sum as the limit of its partial sums, which are alternately zero and one, so we should take something halfway between them. But then multiply
(1 - 1 + 1 - 1 + 1 - 1 + ...) (1 - 1 + 1 - 1 + 1 - 1 + 1 ...)
and you get 1 - 2 + 3 - 4 + 5 - 6 ..., in the following sense: let
(a₀ + a₁ + a₂ + ...) and (b₀ + b₁ + b₂ + ...)
be formal sums. Then we define
(a₀ + a₁ + a₂ + ...) (b₀ + b₁ + b₂ + ...) = c₀ + c₁ + c₂ + ...
where
c₀ = a₀b₀, c₁ = a₁b₀ + a₀b₁, c₂ = a₂b₀ + a₁b₁ + a₀b₂
and so on. (The definition is of course inspired by the definition on formal power series.) Then here we have a_i = b_i = (-1)ⁱ, and so each term in the sum defining c_n is (-1)ⁿ; there are n+1 such terms. So c_n = (-1)ⁿ(n+1), which gives that sum. Substituting the value 1 - 1 + 1 - 1 + ... = 1/2 into the above equation gives

1 - 2 + 3 - 4 + 5 - 6 + ... = 1/4.

Suppressing the q's makes this seem a whole lot more mysterious.

There's a Wikipedia article that shows how you can write out that sum four times and rearrange the terms to get 1; that doesn't seem all that satisfying to me, though, because with similar methods one can show that 1 - 1 + 1 - 1 + 1 - ... equals either zero or one.

John Baez has reproduced Euler's "proof" that 1 + 2 + 3 + 4 + ... = ζ(-1) = -1/12, which to me seems even crazier. But here's another reason why ζ(-1) = 1/12 "makes sense", if you buy 1 - 2 + 3 - 4 + ... = 1/4.

Let S = 1 + 2 + 3 + 4 + ...; add this "term by term" to 1/4 = 1 - 2 + 3 - 4 + ... . Then you get

S + 1/4 = 2 + 0 + 6 + 0 + 10 + 0 + ...

Then divide through by 2 to get

S/2 + 1/8 = 1 + 0 + 3 + 0 + 5 + 0 + 7 + 0 + 9 + ...

but it's clear that

2S = 2 + 4 + 6 + 8 + ...

and we might as well just stick some zeroes in there to get

2S = 0 + 2 + 0 + 4 + 0 + 6 + 0 + 8 + ...

(If we were still doing this with the q-analogue, this would amount to replacing q with q^1/2.) Adding these two sums together, term-by-term, gives

5S/2 + 1/8 = 1 + 2 + 3 + 4 + 5 + ...

but we know the sum on the right-hand side is just S. So we have 5S/2 + 1/8 = S; solving for S gives S = -1/12. (I make no guarantee that this is the "most efficient" way to get ζ(-1).)
A basically identical calculation, starting with

${q(q^2-4q+1) \over (q+1)^4} = q - 8q^2 + 27q^3 - 64q^4 + \cdots$

and thus (letting q = 1) -1/8 = 1 - 8 + 27 - 64 + ... yields ζ(-3) = 1/120.

Similarly, we can "compute" ζ(-2); begin with the generating function

${q - q^2 \over (1+q)^3} = q - 4q^2 + 9q^3 - 16q^4 + \cdots$

and letting q = 1, we get 0 = 1 - 4 + 9 - 16 + ...; let S = 1 + 4 + 9 + 16 + ... and add the two sums. Then S = 2 + 0 + 18 + 0 + ...; divide by two to get S/2 = 1 + 0 + 9 + 0 + ...

But remember that S = 1 + 4 + 9 + 16 + ...; thus 4S = 0 + 4 + 0 + 16 + 0 + 36 + ... if we let ourselves stick in zeros arbitrarily. Adding these two expressions, we get

9S/2 = 1 + 4 + 9 + 16 + 25 + ...

or 9S/2 = S; thus S = 0, which is in line with the standard result that ζ(-2n) = 0 for all integers n. This is a consequence of the functional equation for the zeta function,

$\zeta(s) = 2^s \pi^{s-1} \sin \left({\pi s \over 2}\right) \Gamma(1-s) \zeta(1-s)$

since when s = -2n for some positive integer n, the right-hand side is zero.

25 October 2007

Something I didn't know about generating functions (and probably should have)

I learned today, while reading Flajolet and Sedgewick's Introduction to the Analysis of Algorithms, something that I probably should have known but didn't. I should have known this because I've learned what ordinary and exponential generating functions are about four times now (since combinatorics classes often don't assume that much background, they seem to get defined over and over again).

Namely, given a sequence a₀, a₁, a₂, ..., we can form its exponential generating function $A(z) = \sum_{n=0}^\infty a_n {z^n \over n!}$ . Then the ordinary generating function $\tilde{A}(z) = \sum_{n=0}^\infty a_n z^n$ is given by the integral

$\tilde{A}(z) = \int_0^\infty A(zt) e^{-t} \, dt$

when this integral exists. (This is something like the Laplace transform.)

The proof is straightforward, and I'll give it here. (I'll ignore issues of convergence.) Consider that integral; by recalling what A(z) is, it can be rewritten as

$\int_0^\infty \left( \sum_{k=0}^\infty {a_k \over k!} (zt)^k \right) e^{-t} \, dt$

and we can interchange the integral and the sum to get

$\sum_{k=0}^\infty \left( \int_0^\infty {a_k \over k!} (zt)^k e^{-t} \, dt \right)$

But we can pull out the part of the integrand which doesn't depend on t to get

$\sum_{k=0}^\infty \left( {a_k \over k!} z^k \int_0^\infty t^k e^{-t} \, dt \right)$

and that integral is well-known as the integral representation for the gamma function. The integral is k!, which cancels with the k! in the denominator, and we recover $\tilde{A}(z) = \sum_{k=0}^\infty a_k z^k$ .

It's easy to come up with examples to verify this; for example, the exponential generating function of the binomial coefficients ${N \choose 2}$ is z²e^z/2; plugging this into our formula gives z²(1-z)^-3, which is the ordinary generating function of the same sequence.

I'm not sure how useful it is to know this, though... I can't think of too many times where I had an exponential generating function for some sequence and thought "oh, if only I had the ordinary generating function."

Math for America plays with numbers

From the November 2007 Notices, an ad for Math for America (p. 1305):

"Do you know someone who loves π as much as pie? Would they also love a full-tuition scholarship for a master's degree in mathematics education, a New York State Teaching Certificate, and a $90,000 stipend in addition to a competitive salary as a New York City secondary school math teacher? Math for America... [etc.]"

I don't know about you, but when I see a five-figure number followed by the word "stipend" I automatically assume it's an annual stipend. It seems somehow disingenuous to put that number there; it turns out it's a five-year stipend. This is in addition to the usual salary one gets for teaching, though; the idea appears to be that this program is attracting teachers who actually know math by making up at least some of the difference between what they would make teaching and what they could make elsewhere. Also, the people in this program receive a full-tuition scholarship for a master's in math ed.

They don't report it as "$18,000 per year for five years" is because it's not; it's paid as $28,000 in the first year (which is mostly spent being trained as a teacher, and which doesn't carry a salary) and $11,000, $14,000, $17,000, and $20,000 in the second through fifth years (these are in addition to the usual salary a New York City public school teacher would receive). Still, why not say "$90,000 over five years"? I feel like they're trying to fool the people reading the ad -- but the people reading the ad are the people in the world who are least likely to be fooled by tricks played with numbers.

I'm not saying that the financial package isn't valuable. I'm just saying that it feels like the ad is hiding something because they don't give the time period.

It reminds me of a letter I got from a graduate school which claimed that my support package was something like $50,000 per year, which was actually $20,000 in annual stipend and $30,000 in tuition. (This was a school with a comparatively high tuition, obviously; I suspect they did this because the $50K number was larger than schools with lower tuitions but the same stipend would have reported.) But anybody who's actually comparing financial offers would think of them as "full tuition plus [dollar amount]" and not even care what the dollar amount of the full tuition was.

Abstract machines

From The Geomblog: "Made in the USA" can be Revitalized, in yesterday's San Jose Mercury-News, by Ken Goldberg and Vijay Kumar.

As Suresh puts it, they want to Put the Turing in Manufac-turing" -- to bring manufacturing to a higher level of abstraction, so that manufacturing processes can be understood independent of the particular substrate they lie on.

Eventually figuring out how to manufacture things could become another branch of the analysis of algorithms. And even if it doesn't, thinking at a higher level of abstraction might enable people to conceptualize more complex manufacturing processes -- and manufacturing processes will become more complex in the future.

Finally, it doesn't seem surprising that this article was published in San Jose -- the city that perhaps more than any other was transformed by the abstraction of computing.

p. s. The simplest possible Turing machine is actually universal.

24 October 2007

The meta-Minkowski conjecture

In a talk by Eva Bayer-Fluckiger today at UPenn, I learned about Minkowski's conjecture, which is as follows:

Let K be a number field of degree n, O_K its ring of integers, and D_K the absolute value of the discriminant of K. Then the Euclidean minimum of K is the smallest μ such that for any x in K, there exists y in O_K such that the absolute value of the norm of x-y is less than μ; that is, there's always an algebraic integer "within" M(K) of every element of K. If M(K) < 1 then O_K is Euclidean (this is essentially a rephrasing of the definitin); if M(K) > 1 then O_K isn't Euclidean; if M(K) = 1 we don't know. Then we have

Minkowski's conjecture: M(K) ≤ 2^-n (D_k)^1/2

for totally real number fields K of degree n. (I'm taking this from Eva Bayer-Fluckiger and Gabriele Nebe, On the Euclidean minimum of some real number fields, Journal de Théorie des Nombres de Bordeaux 17 (2005), 437-454, with some rephrasing.)

Curtis McMullen recently proved this for n=6 (link is to a preprint "Minkowski's conjecture, well-rounded lattices and topological dimension", on his web page). This follows proofs:

for n=2, Minkowski, circa 1900;

for n=3, Remak, 1928;

for n=4, Dyson, 1948;

for n=5, Skubenko, 1976;

and so it seems like we get one higher degree every thirty years! This leads me to formulate the following:

Meta-Minkowski conjecture: Minkowski's conjecture is true, but will never be proven.

(Sketch of heuristic argument: in approximately the year 1840 + 30n, the degree-n case will be proven. So if we wait long enough, Minkowski's conjecture will be proven for any finite value of n.)

Per a question that was asked at the talk, it seems like the approach in each of these papers was substantially different. McMullen gives sufficient conditions for proving Minkowski's conjecture for any given value of n, and it turns out that those conditions were already known for n = 6.

I wonder to what extent it's possible to predict the mathematical future from the mathematical past, in the case of problems such as this where there's a numerical measure of ``incremental" progress. Often I've seen tables where an improving sequence of constants is given for some problem (for example, a sequence of improving constants in some inequality). It's sort of a Moore's Law for mathematical problems.

When would I make a conjecture that a proof for all n would be found? If, say, we had proofs of some statement for n = 1 in 1960, n = 2 in 1990, n = 3 in 2000, n = 4 in 2005, and in general n = k in 2020 - 60/k; then I'd guess that the full problem gets solved in 2020. Of course, in a case like this the proofs would pile on top of each other, and in reality one would be likely to see large numbers of values of n disappearing in one fell swoop sometime in the 2010s.

Some thoughts on the Rockies' streak

Are the Rockies Really That Good, or Just Lucky?, from The Numbers Guy at the Wall Street Journal.

There's no real conclusion, though. The Rockies probably think it's because they're good. But if you wait long enough, this sort of streak will happen to somebody. What one really wants to know is the number of such streaks that should have happened, historically; this should then be compared to the number of such streaks that actually did happen.

The problem with this is that all the quick and dirty analyses will assume that all teams win exactly half their games, so the probability of winning at least 21 out of 22 games is (22+1)/2²², or about five in a million. But this isn't true. I suspect that the actual distribution of how good teams are is roughly symmetrical around .500, perhaps skewed a little bit to the left (there are more ways to be a bad team than to be a good team), so for every .300 team there should be a .700 team. The .300 team has probability (.3)²¹ (22(.7) + .3), or about one in six billion, of winning 21 out of 22; the .700 team has probability (.7)²¹ (22(.3) + .7), or about four in a thousand. (This is the figure the WSJ reports as "1 in 245".) So to really know how many long streaks are expected, we need to know the historic distribution of team abilities. If there are teams that are "really" .700 teams fairly often, we should expect that these streaks happen fairly regularly; if the best teams are usually only around, say, .600, probably not. And to know how likely the Rockies were to have this streak, we have to know how good they "really" are; are they the 76-72 team they were before the streak started, the 21-1 team they've been since, or somewhere in between? And would anybody have noticed if they had won 21 out of 22, say, starting in the middle of June?

And let's not even go into the fact that games aren't really independent; even if one doesn't believe that teams "get hot", they play in different parks, with different starting pitchers each day, and so on. As at least one commenter at the WSJ pointed out, streaks like this are more likely to happen in September than any other time, because in September you have teams (like the Rockies) that need to win games and teams that have just given up. The Rockies needed to win 14 out of 15 to end the regular season. (There was a point in late September when I thought I had the National League playoff situation all figured out... then I looked at the standings and thought "when the ^)%$!*)#$)^ did Colorado win ten in a row?")

My point is that I don't know the answer, either. I just am not happy that their streak happened to knock the Phillies out of the playoffs for the first time in fourteen years, and I hope that the Red Sox (who I started following in college) don't also become victims of it.

(By the way, my argument for not having more interleague play than there is is that right now, it's possible to be a fan of one National League team and one American League team and not have too many conflicts of interest. With more interleague play that would become less likely. Also, since I like the Phillies and the Red Sox, I essentially automatically dislike the Mets and the Yankees, which means I can never move to New York.)

23 October 2007

An exercise for the reader

"Decode" the following equation, seen on a T-shirt:

${E \over c^2} \sqrt{-1} {PV \over nR}$

Gowers on examples

Tim Gowers wrote My favorite pedagogical principle: Examples first a few days ago. (The comments are worth reading too.) As an example, he gives an axiomatic definition of a field, and then compares this with defining a field by saying "look, you know about the rational numbers, the real numbers, and the complex numbers; fields are `like these', and now here's what we formally mean by that.) People have said most of what I'd want to say about this topic, but there's an interesting question here. It's been pointed out that it's often a good idea to read mathematics "out of order". But then why doesn't the writer write things out of order, since most people's natural impulse is to read things in the order they're written? Presumably the writer understands the material better than the reader, and therefore has a better idea of what order things should be presented in.

The spinning woman, part two

Remember the spinning woman that I wrote aboutI wrote about last week?

Supposedly, people who saw her spinning clockwise are more "right-brained" (in the pop-science sense of creative, able to see the big picture, and so on) and people who saw her spinning counterclockwise are more "left-brained".

According to the good folks at Freakonomics, if you use college major as a proxy for pop-culture brain-sidedness, it actually works the other way -- "left-brained" people are more likely to see her spinning clockwise. (Of course, this isn't scientific -- it's just Freakonomics readers, and the sample sizes are small.)

Steven Levitt writes:

I often joke about how the information provided by someone who is incredibly terrible at predicting the future (i.e., they always get things wrong) is just as valuable as what you get from someone who is good at predicting the future. I used this strategy with some success by betting the opposite of my father whenever he’d bet a large sum of money on a football team that was sure to cover the spread.

That's definitely true. I've heard that another such "anti-predictor" is Punxsutawney Phil, the "official" groundhog of Groundhog Day. He gets yanked out of hibernation on February 2, and if he sees his shadow we're supposed to have "six more weeks of winter". (This is part of the utterly bizarre American tradition of Groundhog Day.) I read once that he's wrong something like 80% of the time, although I don't have a source for this. The Wikipedia article says that the actual error rate is something like 60% to 70%. Still, it seems like the groundhog is more often wrong than right.

22 October 2007

Three posts in one!

1. Literary style by the numbers. Apparently it's reasonably possible to distinguish among authors by just plotting the "average words per sentence" and the "number of complex words"; authors appear to be remarkably consistent on both these measures when looking at book-length samples of text. I wonder if this holds true for mathematical authors; it would be hard to measure for work that includes a lot of mathematical notation.

2. Graph Theory with Applications, J. A. Bondy and U. S. R. Murty (1976) -- out of print for a while, full text online. (Scanned, so it takes a while to load; you can also download it a chapter at a time.) From Anarchaia. (The same authors just finished a graduate text, to be published by Springer; read a blurb about it. I like the sound of the blurb. The book apparently also has a blog but the blog is empty.) Incidentally, I find it interesting that Springer's taxonomy of subdisciplines of mathematics includes "Number Theory and Combinatorics"; usually number theory seems to be classed with algebra.

Despite being an undergraduate book, the book includes a list of unsolved problems, which I think is a good practice; even if the intended reader is not expected to be able to solve these problems, it gives the interested student an idea of the sort of things that real mathematicians are interested in. (Of course, in some fields it wouldn't be practical to put such a list in a textbook; it needs to be possible for students to understand the statements of the problems.)

3. A combinatorial proof of the Rogers-Ramanujan and Schur identities, by Cilanne Boulet and Igor Pak. I haven't read it. (I came across it while looking for some other stuff on partitions.) The Rogers-Ramanujan identity is that the number of partitions of an integer n into parts that are congruent to 1 or 4 modulo 5 is equal to the number of partitions of n such that adjacent parts differ by at least 2.

For example, partitions of 12 counted by the first description are
11 + 1, 9 + 1 + 1 + 1, 6 + 6, 6 + 4 + 1 + 1, 6 + 1 + 1 + 1 + 1 + 1 + 1, 4 + 4 + 4, 4 + 4 + 1 + 1 + 1 + 1, 4 + 1 + 1 + 1 + 1 + 1 + 1 + 1 + 1, 1 + 1 + 1 + 1 + 1 + 1 + 1 + 1 + 1 + 1 + 1 + 1
and by the second description,
11 + 1, 10 + 2, 9 + 3, 8 + 4, 8 + 3 + 1, 7 + 5, 7 + 4 + 1, 6 + 4 + 2
and there are eight in each list.

However, don't ask me which ones match up with which ones in the bijection! It's by no means obvious. This paper grabbed my interest because I knew there was no "nice" bijective proof, and the title led me to expect such a nice bijection. The authors lament:

Also, while our proof is mostly combinatorial it is by no means a direct bijection. The quest for a direct bijective proof is still under way, and as recently as this year Zeilberger lamented on the lack of such proof. The results in [23] seem to discourage any further work in this direction.

([23] is an in-preparation work of Pak, "Rogers-Ramanujan bijections are not geometric."; I'm kind of curious to see what it says.)

Diplomats play dice

State Department Struggles To Oversee Private Army, from the Washington Post (Oct. 20)

Marc Grossman, the U.S. ambassador to Turkey in the mid-1990s, recalled telling his staff to take their own security precautions. After losing embassy employees to attacks, he advised staffers to keep a six-sided die in their glove compartments; to thwart ambushes, they should assign a different route to work to each number, he said, and toss the die as they left home each morning.

That's a good idea. I wonder if anybody just told their employees "take different routes to work" and found that their people decided which route to take based on, say, the day of the week -- having more than one route offers some protection from ambush, but there's still a pattern involved, and eventually the would-be ambusher would figure out that if they come by a certain spot on a Tuesday morning they'll succeed.

(Of course, there are still only six possible routes. If one wanted to go really crazy, one could toss a die at each juncture to decide where to go... but then who would ever get to work?)

This reminds me of airport security people who are randomizing where they are in the airport in order to thwart terrorists.

(from Marginal Revolution.)

Mathematics ≠ notation Mathematics is not notation

A rule of mathematical style: it is acceptable to write "Therefore, foo is at most bar.", but it is not acceptable to write "Therefore, F ≤ B." (This is due to J. Michael Steele, who has a lot of good rants.)

Steele's logic is that mathematical symbols cannot function as the verb in a natural-language sentence, which makes sense; although mathematical expressions have their own internal logic, an entire expression seems to function as a single noun when embedded in an English-language expression. Thus we write something like "Therefore, we have F ≤ B." or "Therefore, the inequality F ≤ B holds." ifwe want to write things out in symbolic form.

One thing that bothers me about a lot of mathematicians (myself included) is the propensity for their spoken mathematics to be a mere symbol-by-symbol reading of the written mathematics; thus one hears things like, say, "pi one of X" instead of "the fundamental group of X". This isn't too much of a problem in that case, but then one gets to things like "H two of X" -- is that H₂(X) (homology) or H²(X) (cohomology)? (People don't seem to go to the trouble of distinguishing superscripts from subscripts when they're talking, which is probably a good thing, lest speech be filled with "uppers" and "lowers".)

Sure, you can usually tell from context, if you're sufficiently experienced in a certain area. But people have a tendency to think that mathematics is the formulas, at least to the point where they speak the notations instead of the notions. (This is one of those fortunate puns that translates well from the original Latin: "non notationes, sed notiones".) Spoken mathematics is not written mathematics. Mathematics is not its notation.

This is probably a corollary of the often-stated fact that a page of mathematical text contains about as many ideas as, say, ten pages of "ordinary" text; thus on a per-page basis, mathematical reading should be ten times slower than literary reading. Unfortunately, a minute of mathematical speech contains many more ideas than a minute of "ordinary" text, but one cannot slow the speaker down.

Incidentally, I used to pronounce ≤ and ≥ as "less than or equal to" and "greater than or equal to"; now I pronounce them "at most" or "at least", because these are shorter. I also pronounce g o f as "g after f", which seems both shorter and less ambiguous than "g composed with f" -- in "g composed with f" the order of composition is conventional. "f before g" or "f, then g" is too confusing, because you read the g before you read the f.

21 October 2007

Restoration of Fermat's Lost Letter to "Last Theorem"

No, I didn't prove FLT. Sorry.

But has anyone read this proposed proof of Fermat's Last Theorem? I'm kind of curious... but not curious enough to shell out the $5. And it's not just the money; this doesn't seem like the sort of person I want to be giving $5 to. It's kind of strange to see crackpottery that you have to pay to see these days... usually now people just post it on poorly formatted web pages.

Similarly, I don't have cable TV, not because I want to keep the money, but because I don't like Comcast. But here I am using my Comcast Internet connection to say that I don't like Comcast. Hypocritical? Of course. But I like Verizon even less.

(From John Armstrong.)

The game of "thermonuclear pennies"

The game of thermonuclear pennies. This is, not surprisingly, related to the game of nuclear pennies; the game of nuclear pennies is in turn related to the Seven trees in one theorem of Blass, which I wrote about back in July. (Very roughly speaking, the "number of trees" can be shown to be "the primitive sixth root of unity", but the seventh power of this number is itself! Objects of Categories as Complex Numbers, by Marcelo Fiore and Tom Leinster, legitimates this.¹ The game of nuclear pennies works as follows; we start with a semi-infinite strip of squares, which have pennies on them. We can "split" a penny on site n into two pennies, one on site n-1 and one on site n+1; we can "fuse" pennies on sites n-1 and n+1 into one on site n. We can analyze positions in this game using arithmetic in the ring of integers with a sixth root of unity, ω, adjoined. Given a position with a_n coins in the nth position, associate the number

Σ_n a_n ωⁿ

with this position; it turns out that two positions can be reached from each other if and only if their associated numbers are the same. (We have the relations ω³ = -1, ω - ω² = 1.)

In thermonuclear pennies the splitting rule is a bit different; it turns out that "sixth root of unity" can be replaced with "fourth root of unity", which is the imaginary unit. Positions in this game represent Gaussian integers. And there's even a way to multiply the positions, which corresponds to multiplication of Gaussian integers.

Challenge (which may or may not be hard, I haven't thought about it all): develop such a penny-fusing game where the positions can be understood as elements of the ring of integers with a fifth root of unity adjoined. (I don't expect the result to be as nice; for example, Z[e^2πi/5] isn't a lattice in the complex plane. I'm not sure why that's relevant, but it might be.)

1. Is "legitimate" a verb, meaning "to make legitimate"? (The stress in the verb would be on the last syllable.) I think it should be, if it's not already. I've heard "precise" used as a verb (I don't remember how it's pronounced), and "legitimate" strikes me as more phonetically akin to verbs than "precise" is, probably because of the -ate suffix.

20 October 2007

Homophony part two

As at least two readers have pointed out, my claim that the triviality of the homophony group was part of the unwritten folklore is false. Take a look at the paper Quotients Homophones des Groupes Libres/Homophonic Quotients of Free Groups, by Mestre, Schoof, Washington, and Zagier. They use a lot of the same relations that I did in my proof; in particular damn = damn, damned = dammed, barred = bard, bass = base, jeans = genes, ruff = rough, phase = faze. (The last three of these seem to me to be the ones that most people will find.) Like my proof, v is the last generator to fall; they use chivvy = chivy or leitmotif = leitmotiv. I'm not sure how I feel about these; in both cases these seem like alternate spellings, not different words. veldt = felt, as suggested by a commenter on the earlier entry, feels "right" to me; these are quite clearly two different words with different meanings, veldt being a certain kind of open space in Africa, felt being the stuff out of which the tips of markers are made. (I think this is the one I came up with the first time I saw this problem.) They don't seem satisfied with their proofs for v, either; they introduce the space as a 27th generator, and use avowers = of ours to show the triviality of v.

The paper in question is bilingual; the proof that the English homophony group is trivial is given in French, and the proof that the French homophony group is trivial is given in English. I particularly like the acknowledgements:

The third and fourth authors were partially supported by NSF and (by the results of this paper) numerous other government agencies.

A related problem is as follows: consider the group on twenty-six letters A, B, ..., Z with the relations that two words are equivalent if they are permutations of each other and each appears as a word in some dictionary of choice. Identify the center of the group. According to Steven E. Landsburg, "The Jimmy's Book", The American Mathematical Monthly, Vol. 93, No. 8 (Oct., 1986), pp. 636-638, much work was done on this problem at Chicago in the seventies.

To get started: post = pots = stop = opts = spot = tops. Since post = pots, s and t commute; since pots = opts, o and p commute. This is clearly much harder than the homophony group problem. By the way, elation = toenail. (I've been playing lots of Scrabble recently; that set of seven letters seems to come up a lot.)

The homophony group

From Justin Roberts at UCSD, an old homework problem:

"B: (Fun for all the family, ages 6-66...)The homophony group (of English) is the group with 26 generators a,b, ..., z and one relation for every pair of English words which sound the same. Example: "knight = night" shows that k=1, "earn=urn" shows that ea=u, and so on. (Scrabble rules apply; no proper names or slang!) Prove that the group is trivial! (I would guess it's also trivial in French but probably non-trivial in Spanish and very non-trivial in Japanese!)."

I was looking for something else, but I came across this problem, which I saw before in my first algebra class when I'd just learned what a group was; it's part of the folklore, apparently, and I feel like it's good to have the folklore written down. (It might be written down somewhere on the Internet under another name, but it's hard to find because "group" is such a common word in non-mathematical circumstances.)

I can't do it. I did it before, but I remember it took a lot of looking at lists of homophones; I can't find the particular lists of homophones I used years ago. Most of the relations are simpler than they look, because generally if two words in English are spelled the same they have a lot of letters in common. Here's a partial solution -- I prove that the English homophony group is a subgroup of the free group on two generators, namely m and v. (Or maybe three.)

First, we deal with the vowels. The first homophones that come to anyone's mind are probably to = too = two. From to = too we get o = 1. Thus to = two becomes t = tw, so w = 1. (Yes, I know, w isn't a vowel...) Next, consider the relations

rain = rein = reign, sign = sine, earn = urn.

From rain = rein we get a = e. From rain = reign we get ai = eig; but since a = e we can reduce this to i = ig, or g = 1. Thus we can eliminate the "g" in sign = sine to get sin = sine, so e = 1; recalling a = e, we have a = 1. Finally, from earn = urn we have ea = u; since e = a = 1 we have u = 1.

Another candidate for "most canonical homophone" is probably there = their (I'll ignore "they're" because I don't want to deal with apostrophes); we can clip the "the" in each of these to get re = ir. But e = 1, so this is r = ir, or i = 1. Next, i = eye, but i = e = 1, so this reduces to y = 1. (Does "i" count as a word? Of course I mean "I", the first-person singular pronoun. Roberts said "Scrabble rules apply", but Scrabble rules have never had to weigh in on whether "I" is playable. Scrabble rules out words which are "always capitalized" but the way in which "I" is always capitalized seems different from the way that a proper name is.)

At this point we've shown that the letters a, e, g, i, o, u, w, and y are trivial; this set of letters shouldn't be too surprising to anyone reasonable familiar with English spelling.

There are a lot of silent k's; we take no = know. Since w = o = 1, this reduces to n = kn, so k = 1. Similarly, we can leverage the silent h with wine = whine. lam = lamb gives us b = 1, and dam = damn gives n = 1.

Somewhere around here the problem seems to get harder. One key insight is that groups don't have nontrivial idempotents; if we can show that a generator is equal to its square, then it must be equal to 1. This gets us a few more letters: from bard = barred we have r = rr (recall e = 1), so r is trivial. Similarly, from medal = meddle we get dal = ddle; but a = e = 1, so dl = ddl; thus d = 1. This actually isn't how I take care of t, although there are enough double t's in the language that it's robably possible to handle t this way; I take packed = pact, from which ked = t; but k, e, d have already been shown trivial, so t = 1. A bit more trickily, thinking in terms of double letters, bawl = ball gives us w = l; but w = 1, so l = 1.

That's sixteen letters shown trivial so far: a, b, d, e, g, h, i, k, l, n, o, r, t, u, w, y. The remaining letters are c, f, j, m, p, q, s, v, x, and z. These seem trickier somehow... to my ear they have pretty well-defined sound. The easy letters to get rid of, you'll remember, are the vowels, which can basically sound however you want them to in English.

Still, a few more fall quickly. base = bass (the musical kind, not the fish), so s = 1; razor = raiser can now take advantage of the times when s sounds like z, since all the other letters involved are trivial, to give us s = z, so z = 1. The letters that are phonetically kind of k-like fall next; key = quay gives k = q, so q = 1, and choir = quire (the pair I'm proudest of finding, by the way) gives c = q, so c = 1.

Six letters to go: f, j, m, p, v, x.

lox = locks, which takes care of x. genes = jeans, killing j. phase = faze; only the p and f are nontrivial here, so p = f. There are two cheap tricks to handle this. The first is hiccup = hiccough, where all the letters involved are trivial except p, so p = 1; but are those separate words, or alternative spellings of the same word? The second is if = iff, so f = 1. (The problem here, of course, is that iff is a bit iffy as a word... is it really a word, or just an abbreviation for "if and only if"?) Two half-solutions add up to one whole solution, though, so I say that p and f are trivial.

That leaves two generators, m and v. If there are relations between them, they're trivializing ones; we're not going to get a relation like "mv = vm" (which would bring us down from the free group on two generators to the free abelian group on two generators), since that would require two homophonous words in English, both containing m and v, where in one the m comes first and in one the v comes first. I've tried to use the double-letter trick; the closest I can get is "clamor" and "clammer" to get rid of m, but these aren't really pronounced the same.

Any ideas on how to finish this up?

My vocabulary in other languages isn't rich enough to do this. But I agree with Roberts' judgement about French, Spanish, and Japanese; French is full of silent letters, even more so than English.

Edited, 12:24 pm: In the comments, dan suggests dammed = damned to show that m is trivial. And ruff = rough shows f is trivial, which gets around the objections I had before. I want to take advantage of the fact that the f in of is pronounced like v usually is, but I can't.

19 October 2007

The chain of cities

From radical cartography -- a map of Europe with cities that are near each other (within 150 km) and with populations of more than 100,000 connected by lines. In some places (southern England, the Netherlands, Germany, northern Italy) the lines are very dense; in others they are not. The graph has one large connected component, but there are smaller components -- in Scotland, southern Italy, and the coasts of Spain -- which are separate from the main one.

I suspect you'd get similar-looking maps at smaller scales (that is, if you reduced both the population threshold and the distance threshold). I've heard that the number of cities of population at least N scales like 1/N. To get a similar-looking map with smaller cities you'd want the number of cities "near" each one to be the same. So, for example, say you looked at cities of size 25,000 or greater; there should be four times as many of those "near" a given point as cities of size 100,000 or greater. If we then shorten the distances over which we're willing to draw lines by a factor of two, then we should get the same average degree in the resulting graph. What I'm saying is that I suspect the graph of cities of size 25,000 or greater within 75 km of each other has similar qualitative features. (Of course, at some point you run into the problem of how to define distinct "cities", by which I mean centers of population; should distinct neighborhoods of the same legal municipality be counted differently?)

I'd be interested to see a similar map for the United States (since I'm more familiar with the settlement patterns in the U.S. than in Europe), but not interested enough to make it.

In particular, I see a lot of things that look like the fragments of triangular lattices, with the triangles involved being roughly equilateral. (This seems most visible in Poland.) Now, if you imagine cities as circles -- which isn't a horrible approximation, because you don't expect to see two cities too close to each other -- and you pack them in as tightly as possible on the plane, you should see exactly a hexagonal lattice. Of course, cities aren't all equivalent, and you can't just slide them around arbitrarily...

This makes me wonder -- are there any cities which have street plans that are triangular lattices?

Fibonacci tattoo!

From Pink Haired Girl, who I apparently share several mutual friends with: a tattoo to generate the Fibonacci sequence, in Scheme.

There are faster algorithms to generated the Fibonacci sequence, but they obscure the recursive definition; her tattoo follows the simple recursive definition F_n = F_n-1 + F_n-2, which takes O(n) arithmetic operations to compute F_n. Wikipedia's article on Fibonacci numbers gives an identity
F_2n+k = F_k F_n+1² + 2F_k-1F_n+1F_n + F_k-2F_n²

Now, note that we get F_-2 = -1 , F_-1 = 1 , F₀ = 0 by running the recursion backwards. Letting k=1 gives

F_2n+1 = (2F_n+1 - F_n) F_n

and letting k=2 gives
F_2n+2 = F_n+1² + 2F_n+1F_n.
Letting m+1 = n, we get
F_2m = F_m² + 2F_mF_m-1
= F_m(F_m + 2F_m-1)
= F_m(F_m+1 + F_m-1).

Thus we can express F_n in terms of Fibonacci numbers of indices around n/2, and thus find Fibonacci numbers in logarithmic time... but that obscures the central fact about these numbers, which is the recursion that produces them. (There are probably other ways to arrange those identities that give more efficient algorithms than the one I'm alluding to, but I didn't try that hard.) There are also identities for F_n in terms of Fibonacci numbers of index around n/3, for example -- see Wikipedia -- which could be even faster.

The axiom of choice and the wisdom of crowds

(The two things in the title aren't related, but this post is about both of them.)

I just came across this essay on the Axiom of Choice by Eric Schecter. I'm not a logician, so I don't have any deep commentary. It contains two amusing quotes, though, one by Bertrand Russell:

"To choose one sock from each of infinitely many pairs of socks requires the Axiom of Choice, but for shoes the Axiom is not needed."

(the idea here is that you can distinguish between left shoes and right shoes, but not left and right socks), and

The Axiom of Choice is obviously true; the Well Ordering Principle is obviously false; and who can tell about Zorn's Lemma?

where all three of those statements are equivalent.

A classmate of mine, though, just expressed the opinion that logicians "should" be showing that certain problems are "too hard" for current mathematics, by, say, showing that they're equivalent to problems we already know we have no hope of solving with the current state of affairs.

I wonder to what extent it would be possible to use the "wisdom of crowds" to quantify how difficult a given problem is. If one could poll people in a given area and have them say "I think Problem X is harder than Problem Y", you could probably use that information to rank open problems within a given subfield; people's intuitions wouldn't be perfect but if you had enough of them you'd come up with some interesting results. The hard part would be collating the results from various parts of mathematics; if I know Algebra Problem X is harder than Algebra Problem Y, and Analysis Problem Z is harder than Analysis Problem W, then how do you compare X and Z? You'd need links between the various areas to know how to put all that information together, and to get all areas on the same scale. Perhaps what we need is more people offering financial bounties for problems, because hard currency is a scale that everyone can agree on the value of. (Too bad Erdos is dead, and the Clay Foundation only has million-dollar prizes. If some entity were giving out a prize of $1000 for problem A and $2000 for problem B, then that would let me know that the community as a whole considers problem B to be harder than problem A. i don't know about "twice as hard", though; what does that mean?)

The usual "prediction market" methods wouldn't immediately seem to work; people could bet on which problems they expect will be solved first, but just because a problem is solved later doesn't mean it's harder.

Some other interesting things on Schechter's web page include Common Errors in College Math (which I probably should point my students to, come to think of it) and an explicit statement of the "cubic formula", i. e. a solution by radicals to the equation ax³ + bx² + cx + d. (This formula is a couple lines long significantly longer than the quadratic formula, of course, which is why you don't have it memorized. The "quartic formula" is, if I remember correctly, a couple pages, although it contains a large number of the same subexpressions; in both cases there is a shorter way to write down the formula than the actual formula itself.)

18 October 2007

will the best team win?

Will the Best Team Win? Maybe -- by Alan Schwarz, in the October 14 New York Times.

Basically, there's a very good chance that the best team does not win in the Major League Baseball playoffs.

On a related note, often before a playoff series sportswriters will make predictions of the form "[team] in [number of games]". In a best-of-(2n-1) series, if we know the probability p that a team (let's call them the Phillies) wins a game against some other team (as I've done before, let's call them the Qankees), then we can compute the probability that they'll win the series. Let q = 1 - p be the probability that the Qankees win a single game. Then the probability that the Phillies win in k games, where k is between n and 2n-1, can be obtained as follows. There are ${k-1 \choose n-1}$ arrangements of wins and losses in a k-game series that allow the Phillies to win in k games -- they must win n-1 our of the first k-1 games. Each of these occurs with probability pⁿq^k-n. So the probability that the Phillies win in k games is

$P_{n,k}(p) = {k-1 \choose n-1} p^n (1-p)^{k-n}$

and likewise the probability that the Qankees win in k games is
Q_{n,k}(p) = {k-1 \choose n-1} (1-p)^n p^{k-n}

$Q_{n,k}(p) = {k-1 \choose n-1} (1-p)^n p^{k-n}$

(The number n is the number of games needed to win the series; in a best-of-seven series, n = 4.) For example, P_4,6(.6) is the probability that the Phillies win a best-of-seven series in six games, given that they have a .6 probabilty of winning each games; it's ${5 \choose 2} (.6)^3 (.4)^2 = 0.3456$ .

A prediction of the winner of a series and the number of games they win in amounts to a prediction of p. If we assume that the predictor simply predicts the most likely outcome of the series given what they believe p to be, then we want to find the largest of

$P_{n,n}(p), \ldots, P_{n,2n-1}(p), Q_{n,n}(p), \ldots, Q_{n,2n-1}(p)$

.
To do this, we start by finding the ratio $P_{n,k+1}(p)/P_{n,k}(p)$ ; this is k(1-p)/(k-n). If this is greater than 1, it means a win in k+1 games is more likely than a win in n games; it becomes greater than 1 at p=(n-1)/k. So, for example, as we decrease p, a five-game win in a best-of-seven series becomes more likely than a sweep when p = 3/4 (we have n=3 and k=4); a six-game win in a best-of-seven series becomes more likely than a five-game win when p = 3/5; a seven-game win becomes more likely than a six-game win when p = 3/6 = 1/2.

And in general, as we decrease p, a win in (2n-1) games becomes more likely than a win in (2n-2) games when p = (n-1)/(2n-2) = 1/2.

But when p dips below 1/2, that's also when losses should become more likely than wins!

In particular, the ratio between the Phillies' probability of winning in 2n-1 games and that of losing in 2n-1 games is P_n,2n-1(p)/Q_n,2n-1(p) = p/(1-p); if p < 1-p then winning in 2n-1 games can't be the most common outcome. At best, winning in 2n-1 games is as probable as winning in 2n-2 games... when p = 1/2, and at that moment losing in 2n-1 and losing in 2n-2 have the same probability.

Concretely, in a best-of-seven series you should predict that the Phillies:

win in four, if p > 3/4;

win in five, if 3/5 < p < 3/4;

win in six, if 1/2 < p < 3/5;

lose in six, five, or four in cases symmetric to the three above.

If p = 1/2, then the probability of a win in six, win in seven, loss in six, or loss in seven are all the same, 5/32 each.

The point here is that either type of seven-game series is never the sole most likely outcome in this model (although it may be in reality, because games aren't independent -- home-field advantage, who's starting that day, and so on enter into the picture), and that it almost never makes sense to predict a sweep (playoff teams will be evenly enough matched that the worse one should be able to beat the better one more than one-quarter of the time).

Yet four- and seven-game series happen. I'm not saying that these are ridiculously rare events, just that it doesn't make sense to predict them a priori. It's a bit surprising, though -- if you actually played all seven games, 4-3 would be the most common outcome for series than are nearly evenly matched -- but enough of those come from the team already down 4-2 winning the last game that you don't see that in the best of seven format.

Realistically, though, a prediction of "[team] in 7" is just a sportswriter's way of signaling "I think this team is slightly better than its opponent", which is all it should be taken as.

Un-Austria, and some thoughts on population density

Strange Maps brings us a map of Un-Austria, originally due to Babak Fakhamzadeh. The purpose of this map is to illustrate certain statistics about the Austrian people by associating those groups of people with parts of a map; seven percent of Austrians are "not proud of their nationality" and so an area roughly one-fourteenth of the size of Austria is marked as such.

It's an interesting concept, but I see problems with it. First, it gives the impression that people who are in one of the marked groups aren't in any of the others; this isn't true. (In the most extreme case, shouldn't "members of voluntary environmental organizations" be contained entirely within "members of voluntary organizations"?) Second, although I have no concept of the geography of Austria, I'm sure there are parts of it that are more populated than other parts, and I wonder if the map creates false impressions for those people who actually do live in Austria. (For an American example, New Jersey makes up one part in 434 of the land of the United States. Yet one in thirty-four Americans live in New Jersey. I am aware of this, so if someone associated the land making up New Jersey with some one-in-434 subset of Americans, I might get confused. On the other extreme, Carbon County, Wyoming has about the same area, but only about one in twenty thousand Americans lives there. My point is that most people living in a given country probably have some rough idea of how population density in the country is spread out; I suspect that the way they would "intuitively" interpret a map such as this one takes this into account, but not perfectly. (If I had to guess, the population we intuitively perceive to be in a region of a map such as this one is proportional to the integral of the square root of population density, although I just made that up; it somehow seems appealing that we would perceive an area that's actually four times as dense as another to be two times as dense, because that seems to point to some sort of idea of "linear density". Please ignore the fact that the units here don't work.)

17 October 2007

You are in a maze of twisty little equations, all alike.

A long calculation is like playing a computer game.

This is actually somewhat true, for the sort of computer games where you realize that you've gone down the wrong path and have to backtrack. In both cases you're trying to find your way through a maze of twisty little passages, all alike, from the beginning of the maze (the beginning of the game, or the state of having no knowledge) to the end of the maze (the end of the game, or the result of the computation). There are in both cases a small number of ways to succeed (to successfully reach the end of the game or the result of the computation), but usually more than one. (In the case of computation, "how many ways" there are to do a particular computation depends very strongly on what sorts of computations you consider to be "the same".) But there are a much larger number of ways to fail; the problem is to find a path to "success" in some reasonable amount of time.

Neither breadth-first search (where one does all the possible one-step computations, then all the possible two-step computations, and so on) nor depth-first search (where one follows a computation to its logical end, and only then tries something else) are really practical in most cases; it feels like a lot of learning "how to do mathematics" is learning when to give up on one particular approach to a problem and try something completely different. And a large part of that is to get rid of the qualifier "all alike" in my title. If equations look "all alike", one has no way of knowing what to do; mathematical education is as much about developing a facility for knowing what sort of mathematical expressions are amenable to what sort of tools as it is about learning how to use those tools.

Twin primes

Are There Infinitely Many Primes, by D. A. Goldston, via Casting Out Nines. This was based on a talk given to motivated high school students, so you don't need much background to understand it.

Since this is at least nominally a probability blog, I draw your attention to the following: Call p and p+2 a twin prime pair if they're both prime. Then it's conjectured that the number of twin prime pairs with smaller member less than or equal to x, denoted π₂(x), is asymptotically

$2 \left[ \prod_{p > 2} \left( 1 - {1 \over (p-1)^2} \right) \right] \int_2^x {1 \over (\log t)^2} \; dt$

and there is probabilistic reasoning that leads to this, which is a version of the Twin Prime Conjecture; basically, the "probability" that a number near t is prime is 1/(log t), and so the probability that two numbers near t are both prime should be the square of this. So the number of twin prime pairs under x should be just that integral, except that we have to take some divisiblity information into account, which is what the product out front does. If a number n is odd, then n+2 is twice as likely to be prime as it would ``otherwise" be; if n is not divisible by some odd prime p, then n+2 has ``probability" (p-2)/(p-1) of being not divisible by p (we must have that n isn't two less than a multiple of p), which is 1 - 1/(p-1)² times the "naive" probability that n+2 isn't divisible by p, namely (p-1)/p.

I rather like results of this form, where we can guess properties of the primes by assuming that the probability that a given integer is divisible by a prime p is 1/p and these events are independent for different primes p; I've talked about this before, when I sketched a heuristic argument for the Goldbach conjecture. Sure,.the primes have no business behaving randomly. But what reason is there that they should behave nonrandomly?

Extraordinary claims require extraordinary evidence

Here's a book that's available online that I learned about recently: Information Theory, Inference, and Learning Algorithms by David MacKay. I learned about it from Steve at Information Processing; he learned about it from Nerd Wisdom.

I rather like the trend of books being available online. For one thing, it means that I do not have to carry books back and forth between my office and my apartment. (This is unfortunately not true for the calculus book I'm teaching from, which is about 1200 pages; fortunately my officemate has a copy of the book he doesn't use, so I use his copy when I'm on campus and my copy when I'm at home.)

This book seems to be a good introduction to information theory, machine learning, Bayesian inference, etc.; I have not had the chance to read any part of it thoroughly but I have randomly sampled from it and it seems quite interesting.

A few days ago I wrote about the question of finding the next element in the sequence "1, 2, 5, ..." and some of my readers complained that this problem is not well-posed. MacKay, in his chapter on Occam's razor, gives a similar example: what's the next number in the sequence "-1, 3, 7, 11"? You probably say 15. Consider two possible theories -- the sequence is an arithmetic sequence, or it is given by a cubic function of the form cx³ + dx² + e, where c, d, and e are rational numbers. (The omission of the linear term is deliberate; otherwise the first theory would be a case of the second one.) The first one turns out to be much more likely, for the simple reason that there are less parameters to be tweaked! Let's say it's equally likely that the underlying function is linear or cubic; there are just a lot more possible cubic functions, so each particular cubic function is less likely. (For the details, see p. 345 of MacKay.)

By logic such as this, the most likely next element in that sequence is difficult to say, though... should we prefer 10, because then the nth term is given by the simple explicit formula (n-1)²+1? Or should we prefer 12 or 13, which are both given by simple linear recurrences? My instinct is that it depends on where the problem comes from, since the various possible next terms arise from different sorts of combinatorial structures, and in this sense the problem was ill-posed. In reality we wouldn't start out by assuming that all possible theories have equal probability; for one thing, there are infinitely many of them! The "simple" theories (the sequence has an explicit formula which is a polynomial, or a linear recursion with constant coefficients, or something like that) have higher prior probabilities... but given enough evidence, any complicated theory that starts out with nonzero probability of being true could turn out to be true. Extraordinary claims, though -- those with low prior probability -- require extraordinary evidence. There's a nice toy example a little further on in MacKay (p. 351) showing that if you see a piece of a box sticking out to the left of a tree, and a piece of a box of the same height and color sticking out to the right, it is nearly certain that those are actually pieces of the same box, and not two different boxes.

(What I've sketched in the previous paragraph is a bit of a lie, though; mathematical reasoning is rarely anywhere near this fuzzy. One could almost argue that it never is, because if we are so unsure about whether a result is true or not then we just don't call it mathematics.)

PS: I realized that I had a bunch of things I wanted to write about that were kind of piling up, so I might be digging into stuff that made its way through the blogosphere a month or so ago. I would feel no need to apologize for this except that blogs move so quickly.

PPS: The title of this post is due to Carl Sagan. I did not realize that when I titled the post, though, but I knew I'd heard the phrase somewhere before and Google tells me it's his.

16 October 2007

Polynomials with interesting sets of zeros

Polynomials with interesting sets of zeros, from Richard Stanley.

There are certain polynomials which are naturally defined in terms of combinatorial objects, which have interesting sets of zeros. For example, consider the polynomials

F₁(x) = x
F₂(x) = x + x²
F₃(x) = x + x² + x³
F₄(x) = x + 2x² + x³ + x⁴
F₅(x) = x + 2x² + 2x³ + x⁴ + x⁵

where the x^k coefficient of F_n(x) is the number of partitions of n into k parts, that is, the number of ways to write n as a sum of k integers where order doesn't matter. For example, the partitions of 5 are

5, 4+1, 3+2, 3+1+1, 2+2+1, 2+1+1+1, 1+1+1+1+1

which have 1, 2, 2, 3, 3, 4, and 5 parts respectively; that's where F₅ comes from. It turns out that if you take F_n for large n and plot its zeros in the complex plane, you get a nice-looking figure; see this PDF file. This is true of other combinatorially defined families of polynomials as well. The zeroes of this particular set of polynomials are given in some recent work of Robert Boyer and William Goh. (Boyer gave a talk today at Temple which I went to, hence why this is on my mind lately; I'd actually heard about this phenomenon when I took a class from Stanley as an undergrad.)

At least in the case of the partitions, it seems noteworthy that if we sum yⁿ F_n(x) over all n, getting the generating function of all partitions by size and number of parts, we have something nice, namely

$\sum_n F_n(x) y^n = {1 \over (1-xy)(1-xy^2)(1-xy^3) \cdots}$

and a lot of the other sets of polynomials described by Stanley can probably be described similarly. (Boyer and Goh allude to this at the end of their expository paper.) I don't know enough analytic number theory to say something intelligent about this, though; the proof that the pictures keep looking "like that" (in the case of partition) apparently uses some very heavy machinery.

Look Around You (BBC spoof)

At YouTube: Look Around You - 1 - Maths, an episode of the BBC spoof science program Look Around You. (Why don't we have spoof science programs in this country? As it is, I can't tell which parts I found funny because they're mathematically silly and which parts I found silly because they're British.)

Who scribbles meaningless mathematical formulas on the wall, like the kids at the beginning? It often bothers me when meaningless mathematical notation is used for "effect", because it doesn't even look like something a clueless student might scribble; it just takes a bunch of symbols and throws them out there, and I try to understand them but can't. I am incapable of looking at mathematical symbols and not trying to understand them. I suspect this is something like the Stroop effect, which is the experimentally observed effect that it's easier for people to read words which name colors when the ink color matches the colors named by the words than when they disagree. (Incidentally, in this article from the New Republic, excerpted from his new book The Stuff of Thought, Steven Pinker, reporting on an experiment by Don MacKay, says we have even more trouble doing this task when the words are not color words but obscene words; essentially, people can't ignore obscenity. I can't ignore mathematics.)

and I wonder if "Chapter 3.1415926" of the textbook actually exists, since I don't know the context of this. It reminds me of the version number of TeX, which gets one digit longer with each new version; Knuth has requested that after he dies the version number be fixed at π.

There are two people for whom the largest number they could think of was "a hundred thousand" and "nine hundred ninety-nine thousand", which is kind of surprising; why can't you just add one to both of those? At least someone who answered "999,999" is admitting that they don't know the word for the next number. (The program makes light of this by suggesting a very large number as the very largest number... and then adding one to it.)

I also rather liked the following description of a circle: "A uniformly curved line that somehow joins up with itself that science has yet to find a name for. Can you think of a name for it? If you can, the Royal Mathematics Society would like to hear from you, because they hold a competition each year to find a name for this figure." Ah, if only it were so easy. Naming things isn't hard. (Giving them good names, on the other hand, is quite tricky, especially if you restrict yourself to words that are already words existing in natural languages; I want to be sure that the words I pick to name something don't have the wrong connotations. Sure, I can tell people to ignore them, but it's not so easy to actually do so!)

Also, this video is about "maths", not "math". Are other Americans bothered by this, too? I've never been entirely sure what to make of the fact that across the pond, my field is plural.

15 October 2007

The spinning woman

A spinning dancer, from news.com, via Freakonomics and Marginal Revolution (I'm giving those sources because you might want to look at the comments there.)

There is a woman spinning; the question is, is she spinning clockwise or counterclockwise? Supposedly this has something to do with which side of the brain you use, although I'm not sure I buy it.

But my first thought was "clockwise from whose perspective? From the perspective of someone looking from above, or from below?" (To clarify what I mean by "clockwise", I mean that the dancer is turning to her own right.) I think this does say something about my brain, namely that I demand precise definitions for terms, at least for the sorts of terms that can have them.

Some of the commenters at various places say that you have to look at the shadow, because the shadow doesn't agree with the dancer; as far as I can tell this isn't true, and this is not some sort of optical illusion.

The actual resolution is that you can imagine reflecting the dancer through the plane of your monitor (I almost said "the plane of the board", because usually when I'm thinking about three-dimensional pictures it's when I'm teaching calculus), which will change the direction of rotation but not the picture. So on an intellectual level I understand why both perceptions must be possible. But I can't see her going counterclockwise. Not at all.