- Hello, India? I Need Help With My Math, by Steve Lohr, today's New York Times. The article's really about how consumer services, like business services before them, are being offshored; tutoring is just an example.
- Pollock or Not? Can Fractals Spot a Fake Masterpiece?, from Scientific American. The verdict seems to be mixed. Pollock's paintings often contain certain fractal patterns, and certain simple images look "the same" as a Pollock painting in a certain sense. The researchers argue that their work is still valid, though:
"There's an image out there of fractal analysis where you send the image through a computer and if a red light comes on it means it isn't a Pollock and if a green light comes on it is. We have never supported or encouraged such a mindless view."
I'd agree with them, so long as it's more likely for the metaphorical "green light" to turn on when it sees a Pollock than when it sees a non-Pollock; there's no single way to test whether a creative work is by a particular person, other than going back in time and watching them create it. - On the cruelty of really teaching computing science, by the late Edsger Dijkstra. (I've had this one in the queue of "things I want to talk about" for a while, but I don't remember what I wanted to say, so here it is. There are a bunch of similar things which should dribble out in the near future.) But I can't resist commenting on this:
My next linguistical suggestion is more rigorous. It is to fight the "if-this-guy-wants-to-talk-to-that-guy" syndrome: never refer to parts of programs or pieces of equipment in an anthropomorphic terminology, nor allow your students to do so. This linguistical improvement is much harder to implement than you might think, and your department might consider the introduction of fines for violations, say a quarter for undergraduates, two quarters for graduate students, and five dollars for faculty members: by the end of the first semester of the new regime, you will have collected enough money for two scholarships.
I've long felt the same way about mathematical objects. There are exceptions, but for me these are mostly exceptions in which the mathematics describes some algorithm that has input which is actually coming from somewhere. Here it's not so much the program that is getting anthropomorphized as the user.
And why are they always "guys"? How is it that scribbles of chalk on a blackboard, or pixels on a screen, can have gender? Note that I am not suggesting that mathematical objects should be female, or that some of them should be male and some of them should be female, with the choice being made, say, by the flipping of a coin. (Incidentally, the description of mathematical objects as "guys" seems to be much more common at my current institution than at my previous one.)
By the way, Dijkstra is saying here that he thinks computer science should be taught in a formal manner -- proving the correctness of programs alongside actually writing them -- and that to de-emphasize the pragmatic aspect, students shouldn't execute their programs on a computer, since doing so enables them to not think about what the program is doing. I'm not sure if I agree with this.
31 October 2007
links for 2007-10-31
Labels:
art,
computer science,
education,
fractals,
links
Avoid premature optimization
A lot of people who write about algorithms say that you should avoid prematurely optimizing an algorithm until you've analyzed it carefully enough that you know which parts of the algorithm are the parts that slow it down.
There are two ways to do this. The first is, of course, to go through some sort of formal analysis of the algorithm, until you notice something like "hey, I'm evaluating lots of polynomials, shouldn't I think about how I could evaluate them quickly?" And then you'd remember that polynomials can be written in a form like

where the left-hand side requires three multiplications (generally, n multiplications for a polynomial of degree n) and three (generally, n) additions to be evaluated; the right-hand side requirese six (generally, n(n+1)/2 = n + (n-1) + ... + 1) multiplications and three (generally, n) additions. (This is, of course, assuming that you've forgotten what x2 is when you get to evaluating x3. You wouldn't... but a computer probably would!) Generally,
But sometimes it's easier to just step through the algorithm step by step and realize what you find yourself doing over and over again. For example, in evaluating a polynomial anxn + an-1xn-1 + ... + a1x + a0, you'd realize that when you compute xn you start by computing xn-1 and then you multiply it by x; if you did that by hand over and over again you'd eventually want to find a way to speed things up. Wouldn't it be easier if you had already stored the value of xn-1? So you'd first compute the list [1, x, x^2, ... , x^n] by just noticing that each term is x times the previous one, and then you'd multiply those by their respective coefficients ai and add those sums up. This takes 2n multiplications and n additions; it's not quite as good as the form I've given above, but the point is that just storing your intermediate results instead of recalculating them over and over again saves a lot of time. And if you don't realize this from formally analyzing the algorithm, you can realize it by just stepping through the algorithm until you get to the point where you start cursing and thinking "didn't I already do this? Why the *^%)!*!# am I doing it again?"
(By the way, this was inspired by my trying to compute the numbers of factorizations of various integers; it turns out that you want a lot of numbers of the form "the number of factorizations of n into factors at most k" and if you're not careful you can end up computing the same numbers over and over again. So I got careful.)
There are two ways to do this. The first is, of course, to go through some sort of formal analysis of the algorithm, until you notice something like "hey, I'm evaluating lots of polynomials, shouldn't I think about how I could evaluate them quickly?" And then you'd remember that polynomials can be written in a form like
where the left-hand side requires three multiplications (generally, n multiplications for a polynomial of degree n) and three (generally, n) additions to be evaluated; the right-hand side requirese six (generally, n(n+1)/2 = n + (n-1) + ... + 1) multiplications and three (generally, n) additions. (This is, of course, assuming that you've forgotten what x2 is when you get to evaluating x3. You wouldn't... but a computer probably would!) Generally,
But sometimes it's easier to just step through the algorithm step by step and realize what you find yourself doing over and over again. For example, in evaluating a polynomial anxn + an-1xn-1 + ... + a1x + a0, you'd realize that when you compute xn you start by computing xn-1 and then you multiply it by x; if you did that by hand over and over again you'd eventually want to find a way to speed things up. Wouldn't it be easier if you had already stored the value of xn-1? So you'd first compute the list [1, x, x^2, ... , x^n] by just noticing that each term is x times the previous one, and then you'd multiply those by their respective coefficients ai and add those sums up. This takes 2n multiplications and n additions; it's not quite as good as the form I've given above, but the point is that just storing your intermediate results instead of recalculating them over and over again saves a lot of time. And if you don't realize this from formally analyzing the algorithm, you can realize it by just stepping through the algorithm until you get to the point where you start cursing and thinking "didn't I already do this? Why the *^%)!*!# am I doing it again?"
(By the way, this was inspired by my trying to compute the numbers of factorizations of various integers; it turns out that you want a lot of numbers of the form "the number of factorizations of n into factors at most k" and if you're not careful you can end up computing the same numbers over and over again. So I got careful.)
30 October 2007
How to teach physics
In Science Classrooms, a Blast of Fresh O2, by Natalie Angier, from today's New York Times. The article describes physics classes taught by Faye Cascio to ninth-graders in a magnet school, who are answring questions like the following:
If you assume that the river is moving at a constant speed, you can look at this in a frame of reference that moves with the river; in that frame each flotation device is the same distance away, so it doesn't matter which one you swim towards. (Incidentally, I wonder if it is easier or harder to teach this without invoking the word "relativity"; I suspect it's harder, because if you mention to the students that they're doing relativity, they may be aware that this is more modern physics and therefore expect it to be more difficult. But Galilean relativity really is straightforward. This particular problem is of course doable in a frame of reference anchored to the shore, but the algebra's a bit uglier.
The ground speed is slightly larger; it's the length of the hypotenuse of a right triangle with legs equal to the ground speed in still air and the speed of the wind. (But notice that the derivative of the plane's ground speed with respect to the wind speed is zero at zero wind speed. High school freshmen don't know this, though.)
This is something I've been trying to get through my own students' head on their homework, by taking into account their ability to explain the solutions when grading; for the most part it seems to work, although since it's on homework (and not in class) there are always some students who don't bother explaining their work and figure that the points they'll lose are made up for by the time they don't have to spend on the work.
Carl Sagan once said that scientific education in this country was structured so that it would actually attract the people he thought would be worst at being scientists -- people who like routine and who want to just plug and chug their way through the calculations. I agree with this, and I'd add that mathematical education is much the same way. What one does in math classes through at least midway through college bears so little resemblance to mathematical research that the mathematical community probably loses a lot of potentially very good mathematicians, simply because their classes turn them off. (However, the process might produce people who can competently use mathematics, which is something worth keeping in mind. We must remember that part of what we do is to teach the rest of the world how to use our tools.)
You fall into a swiftly moving river and are in need of a flotational device. You see a life preserver bobbing three meters downstream of you and another one the same distance behind. Which preserver should you swim toward?
If you assume that the river is moving at a constant speed, you can look at this in a frame of reference that moves with the river; in that frame each flotation device is the same distance away, so it doesn't matter which one you swim towards. (Incidentally, I wonder if it is easier or harder to teach this without invoking the word "relativity"; I suspect it's harder, because if you mention to the students that they're doing relativity, they may be aware that this is more modern physics and therefore expect it to be more difficult. But Galilean relativity really is straightforward. This particular problem is of course doable in a frame of reference anchored to the shore, but the algebra's a bit uglier.
3) A plane flying into a headwind will have a lower speed, relative to the ground, than it would if it were flying through still air, while a plane traveling with the benefit of a brisk tailwind will have a comparatively greater ground speed. But what about a plane flying through a 90-degree crosswind, a breeze that is buffeting its body side-on? Will its ground speed be higher, lower or no different than it would be in unruffled skies?
The ground speed is slightly larger; it's the length of the hypotenuse of a right triangle with legs equal to the ground speed in still air and the speed of the wind. (But notice that the derivative of the plane's ground speed with respect to the wind speed is zero at zero wind speed. High school freshmen don't know this, though.)
The students can articulate their reasoning because, for one thing, they have no choice.
This is something I've been trying to get through my own students' head on their homework, by taking into account their ability to explain the solutions when grading; for the most part it seems to work, although since it's on homework (and not in class) there are always some students who don't bother explaining their work and figure that the points they'll lose are made up for by the time they don't have to spend on the work.
Nearly all scientists and educators agree that somehow, at some point during their pedagogical odyssey, most Americans get the wrong idea about what science is, and what it is not.
Carl Sagan once said that scientific education in this country was structured so that it would actually attract the people he thought would be worst at being scientists -- people who like routine and who want to just plug and chug their way through the calculations. I agree with this, and I'd add that mathematical education is much the same way. What one does in math classes through at least midway through college bears so little resemblance to mathematical research that the mathematical community probably loses a lot of potentially very good mathematicians, simply because their classes turn them off. (However, the process might produce people who can competently use mathematics, which is something worth keeping in mind. We must remember that part of what we do is to teach the rest of the world how to use our tools.)
Labels:
education,
New York Times,
physics
29 October 2007
Enumerating multiset partitions
Brent Yorgey on enumerating multiset partitions. He came to the problem the same way I did -- I was trying to count the number of factorizations of an integer. An integer of the form pn, where p is a prime, has p(n) factorizations, where p(n) is the partition function; for example

giving the p(5) = 7 factorizations of p5. An integer of the form p1p2...pn has Bn factorizations (the nth Bell number); for example, 2310 = (2)(3)(5)(7)(11) has fifty-two factorizations. But what about, say, an integer of the form p3qr? How many factorizations does it have? Naturally, this associates an integer with each partition; from the number p5 we get the partition 5 and the number of factorizations 15; from the number p1p2p3p4p5 we get the partition 1 + 1 + 1 + 1 + 1 we associate 52.
Unfortunately, I don't really follow Yorgey's work because I don't know Haskell. And I don't know a general way to compute these yet, except in the trivial cases. But, for example, p2 q has four factorizations:
(p2q) = (p2)(q) = (p)(pq) = (p)(p)(q)
and so we associate to 2 + 1 the number 4; this is intermediate between p(3) (which is 3) and B3 (which is 5). This makes sense; in general p(n) counts the number of partitions of {1, ..., n} when no elements are distinguishable, and Bn when all elements are distinguishable. Multiset partitions interpolate between integer partitions and set partitions.
(Yorgey says that the problem is also solved in Volume 4, Fascicle 3 of Knuth, which I may track down. But I want the pleasure of working this out for myself, to some extent.)
Yorgey also says that he found the problem via Project Euler, which gives a couple hundred examples of problems that can, in general, be solved by ridiculously inefficient brute force methods or by elegant methods that require not so much computation.
giving the p(5) = 7 factorizations of p5. An integer of the form p1p2...pn has Bn factorizations (the nth Bell number); for example, 2310 = (2)(3)(5)(7)(11) has fifty-two factorizations. But what about, say, an integer of the form p3qr? How many factorizations does it have? Naturally, this associates an integer with each partition; from the number p5 we get the partition 5 and the number of factorizations 15; from the number p1p2p3p4p5 we get the partition 1 + 1 + 1 + 1 + 1 we associate 52.
Unfortunately, I don't really follow Yorgey's work because I don't know Haskell. And I don't know a general way to compute these yet, except in the trivial cases. But, for example, p2 q has four factorizations:
(p2q) = (p2)(q) = (p)(pq) = (p)(p)(q)
and so we associate to 2 + 1 the number 4; this is intermediate between p(3) (which is 3) and B3 (which is 5). This makes sense; in general p(n) counts the number of partitions of {1, ..., n} when no elements are distinguishable, and Bn when all elements are distinguishable. Multiset partitions interpolate between integer partitions and set partitions.
(Yorgey says that the problem is also solved in Volume 4, Fascicle 3 of Knuth, which I may track down. But I want the pleasure of working this out for myself, to some extent.)
Yorgey also says that he found the problem via Project Euler, which gives a couple hundred examples of problems that can, in general, be solved by ridiculously inefficient brute force methods or by elegant methods that require not so much computation.
Labels:
combinatorics,
Haskell,
Knuth,
partitions
Bayesian gender spam
A Bayesian explanation of how to determine the gender of a person on the street (from observable cues), by Meep.
It's rather similar to Bayesian spam filtering (Paul Graham, see also here). The major difference is that one can generally assume that most e-mail is spam, whereas one cannot assume that most people are of one or the other of the two canonical genders.
In the spam filtering case, though, it doesn't seem that the prior probability that a message is spam matters; Graham claims that most e-mails are either very likely or very unlikely to be spam. But there are probably more words in an e-mail than there are easily observable cues to a person's gender; it seems much more likely to get, say, that a person has a 60% probability of being male than that an e-mail has a 60% probability of being spam.
Also, it's a lot easier to collect the necessary for spam filtering than for gender determination.
It's rather similar to Bayesian spam filtering (Paul Graham, see also here). The major difference is that one can generally assume that most e-mail is spam, whereas one cannot assume that most people are of one or the other of the two canonical genders.
In the spam filtering case, though, it doesn't seem that the prior probability that a message is spam matters; Graham claims that most e-mails are either very likely or very unlikely to be spam. But there are probably more words in an e-mail than there are easily observable cues to a person's gender; it seems much more likely to get, say, that a person has a 60% probability of being male than that an e-mail has a 60% probability of being spam.
Also, it's a lot easier to collect the necessary for spam filtering than for gender determination.
Labels:
Bayesian inference,
gender,
spam
28 October 2007
Sorting with error
Here are three problems that have occurred to me, somewhat randomly, over the last few weeks.
1. I mentioned here that it would probably be relatively easy to rank, say, all the open problems in analysis by their perceived difficulty to expert analysts, and all the open problems in algebra by their perceived difficulty to expert algebraists. But how could we merge these two rankings together, given that comparing an algebra problem and an analysis problem is probably much more difficult and error-prone than comparing two algebra problems or two analysis problems? For example, comparing an arbitrary algebra problem and an arbitrary analysis requires finding someone who is fluent enough in the relevant areas to be able to really understand the problems! (If you object to the fact that "algebra" and "analysis" are too broad for this to be true, just subdivide the areas; the idea is still the same. We can rank problems within suitably small subdisciplines of mathematics; how do we merge those rankings into a ranking of all mathematical problems?)
2. Most people are attracted either to men or to women, not to both. (I'm assuming this for the sake of this problem.) It seems reasonably easy to rank-order the set of all men in order of attractiveness, and the set of all women in the same way; but again, how do you merge these two? How do you compare the attractiveness of a man to the attractiveness of a woman? Oddly enough, I think this one's easy to solve because of "assortative mating", which basically means "people will end up with people who are about as attractive as themselves". (Incidentally, although this seems obvious to anyone who's ever been around people, how can it be derived from first principles?) But the information we get from assortative mating is likely to be imperfect, again; there are plenty of relationships where one member of the couple is more attractive than the other.
3. Major League Baseball has two leagues. We have a mechanism for determining the "best" team in each league. The common wisdom now is that the American League is the better of the two leagues; look at this year's World Series, which the Red Sox are currently leading three games to none. (But don't look at last year's.) But that doesn't necessarily mean that, say, the eighth-best team in the AL is better than the eighth-best team in the NL. There's interleague play, but most of the time that's only a single three-game series between any pair of teams, which really doesn't give much information. (And most pairs of teams in different leagues won't play each other at all in any given season.) Let's assume, for the sake of argument, that we have to use just which teams have won and lost against each other, and maybe the scores of the games; we can't use the fact that teams are made up of players, and compare the AL and NL statistics of a player who has played in both leagues.
The common thread is that in all three cases we have a pair of lists which we believe are each correctly rank-ordered, but comparison between the lists is difficult, or we're doing our rankings based on already collected data and we can't go out and get more, and sometimes our comparisons might be wrong (for example, in the baseball scenario, the team which wins the season series between two teams is not necessarily better). All the analysis of sorting algorithms I've seen (which is, I'll admit, not that much) seems to assume that all comparisons are equally easy and there are no errors. That's a nice place to start, but it hardly seems a place to stop.
1. I mentioned here that it would probably be relatively easy to rank, say, all the open problems in analysis by their perceived difficulty to expert analysts, and all the open problems in algebra by their perceived difficulty to expert algebraists. But how could we merge these two rankings together, given that comparing an algebra problem and an analysis problem is probably much more difficult and error-prone than comparing two algebra problems or two analysis problems? For example, comparing an arbitrary algebra problem and an arbitrary analysis requires finding someone who is fluent enough in the relevant areas to be able to really understand the problems! (If you object to the fact that "algebra" and "analysis" are too broad for this to be true, just subdivide the areas; the idea is still the same. We can rank problems within suitably small subdisciplines of mathematics; how do we merge those rankings into a ranking of all mathematical problems?)
2. Most people are attracted either to men or to women, not to both. (I'm assuming this for the sake of this problem.) It seems reasonably easy to rank-order the set of all men in order of attractiveness, and the set of all women in the same way; but again, how do you merge these two? How do you compare the attractiveness of a man to the attractiveness of a woman? Oddly enough, I think this one's easy to solve because of "assortative mating", which basically means "people will end up with people who are about as attractive as themselves". (Incidentally, although this seems obvious to anyone who's ever been around people, how can it be derived from first principles?) But the information we get from assortative mating is likely to be imperfect, again; there are plenty of relationships where one member of the couple is more attractive than the other.
3. Major League Baseball has two leagues. We have a mechanism for determining the "best" team in each league. The common wisdom now is that the American League is the better of the two leagues; look at this year's World Series, which the Red Sox are currently leading three games to none. (But don't look at last year's.) But that doesn't necessarily mean that, say, the eighth-best team in the AL is better than the eighth-best team in the NL. There's interleague play, but most of the time that's only a single three-game series between any pair of teams, which really doesn't give much information. (And most pairs of teams in different leagues won't play each other at all in any given season.) Let's assume, for the sake of argument, that we have to use just which teams have won and lost against each other, and maybe the scores of the games; we can't use the fact that teams are made up of players, and compare the AL and NL statistics of a player who has played in both leagues.
The common thread is that in all three cases we have a pair of lists which we believe are each correctly rank-ordered, but comparison between the lists is difficult, or we're doing our rankings based on already collected data and we can't go out and get more, and sometimes our comparisons might be wrong (for example, in the baseball scenario, the team which wins the season series between two teams is not necessarily better). All the analysis of sorting algorithms I've seen (which is, I'll admit, not that much) seems to assume that all comparisons are equally easy and there are no errors. That's a nice place to start, but it hardly seems a place to stop.
27 October 2007
The Second Law of Organization
Doyne Farmer, "The Second Law of Organization", a chapter in The Third Culture: Beyond the Scientific Revolution. (It looks to me like you can read the whole book online, but if you like having a paper book in your hands, you can buy it from Amazon.
Farmer was trained as a physicist; he's been referred to as one of the "pioneers of chaos theory" and has also done work in finance.
He writes, about his disappointment at what he learned in physics classes in college:
This is a more dignified way of saying that "shit happens", which is something that people who are planning things forget. It's why large projects always seem to be late and over budget. The planners think they've factored in everything with, say, more than a 5% chance of happening. (In the car analogy, when setting out for their trip they've thought that there might be heavy traffic, or perhaps if it's raining they've considered the possibility of flooding on some flood-prone road.) But there are a zillion things that could happen, and you can't factor all of those in individually. But if there are enough of those rare things, the probability of one occuring will be close to one, or the expected number of them will be quite large. (Nassim Taleb has written about this in the context of financial markets.) Unfortunately, when you're writing the budget you can't put a line for "shit that will happen". This is actually the same idea that I was previously talking about; large systems (like all the cars in a given city) will have fairly stable properties, even when individual cars (that one that just drove by outside my window) don't.
(By the way, the "Second Law" that Farmer is referring to is a reference to the Second Law of Thermodynamics; he's basically saying that order appears out of chaos often enough that this apparent contradiction of the Second Law of Thermodynamics is itself a law.)
Farmer was trained as a physicist; he's been referred to as one of the "pioneers of chaos theory" and has also done work in finance.
He writes, about his disappointment at what he learned in physics classes in college:
"And most of physics was still focused on pushing and pulling, on the material properties of the universe rather than on its informational properties. By informational properties, I mean those that relate to order and disorder."I have the sense that this is still true, although I'm not a physicist; certainly it is true of the physics classes that my physics-major friends took. And there seems to be an analogy in mathematics -- it's becoming more and more possible to predict things about large random structures. (For example, one can say quite a bit about what a random partition of a large integer, or a random set, or any number of other large random structures looks like; in those cases where there aren't proofs, one can still do simulations and see that "yes, they always look the same", at least if you squint.) It seems to be mostly the people with some CS background that are interested in this, because these are the sort of questions that arise naturally in the analysis of algorithms. But there really do seem to be two different ways to look at a lot of these objects -- you can look at individual trees, but what's the point, since you're probably never going to see any particular tree with 1000 leaves. Or you can look at the class of trees... a beautiful object on its own.
"Many of us find this view [that that everything depends on a set of disconnected "cosmic accidents."] implausible. Why would so many different types of order occur? Why would our situation be so special? It seems more plausible to assume that "accidents tend to happen." An individual automobile wreck may seem like a fluke- -in fact, most automobile wrecks may seem like flukes — but on average they add up. "
This is a more dignified way of saying that "shit happens", which is something that people who are planning things forget. It's why large projects always seem to be late and over budget. The planners think they've factored in everything with, say, more than a 5% chance of happening. (In the car analogy, when setting out for their trip they've thought that there might be heavy traffic, or perhaps if it's raining they've considered the possibility of flooding on some flood-prone road.) But there are a zillion things that could happen, and you can't factor all of those in individually. But if there are enough of those rare things, the probability of one occuring will be close to one, or the expected number of them will be quite large. (Nassim Taleb has written about this in the context of financial markets.) Unfortunately, when you're writing the budget you can't put a line for "shit that will happen". This is actually the same idea that I was previously talking about; large systems (like all the cars in a given city) will have fairly stable properties, even when individual cars (that one that just drove by outside my window) don't.
(By the way, the "Second Law" that Farmer is referring to is a reference to the Second Law of Thermodynamics; he's basically saying that order appears out of chaos often enough that this apparent contradiction of the Second Law of Thermodynamics is itself a law.)
26 October 2007
Trickery with divergent sums
When |q| < 1, 1 - q + q2 - q3 + ... = 1/(1+q).
And (1 - q + q2 - q3 + ...)2 = 1 - 2q + 3q2 - 4q3 + ..., viewing both sides as formal power series.
Thus we have
1 - 2q + 3q2 - 4q3 + ... = 1/(1+q)2
as formal power series. Let q = 1; then
1 - 2 + 3 - 4 + ... = 1/4.
Isn't that weird?
I learned this one today; it's an example of "Abel summation". Basically, in order to sum a sequence, we take its generating function and evaluate it at 1. Nothing too controversial there; at least to me it seems like the natural thing to do. But it looks a little more mysterious if you see it as I first did, which is without the q's. It seems "obvious" that 1 - 1 + 1 - 1 + 1 - 1 + ... = 1/2, if it's going to equal anything; we define such an infinite sum as the limit of its partial sums, which are alternately zero and one, so we should take something halfway between them. But then multiply
(1 - 1 + 1 - 1 + 1 - 1 + ...) (1 - 1 + 1 - 1 + 1 - 1 + 1 ...)
and you get 1 - 2 + 3 - 4 + 5 - 6 ..., in the following sense: let
(a0 + a1 + a2 + ...) and (b0 + b1 + b2 + ...)
be formal sums. Then we define
(a0 + a1 + a2 + ...) (b0 + b1 + b2 + ...) = c0 + c1 + c2 + ...
where
c0 = a0b0, c1 = a1b0 + a0b1, c2 = a2b0 + a1b1 + a0b2
and so on. (The definition is of course inspired by the definition on formal power series.) Then here we have ai = bi = (-1)i, and so each term in the sum defining cn is (-1)n; there are n+1 such terms. So cn = (-1)n(n+1), which gives that sum. Substituting the value 1 - 1 + 1 - 1 + ... = 1/2 into the above equation gives
1 - 2 + 3 - 4 + 5 - 6 + ... = 1/4.
Suppressing the q's makes this seem a whole lot more mysterious.
There's a Wikipedia article that shows how you can write out that sum four times and rearrange the terms to get 1; that doesn't seem all that satisfying to me, though, because with similar methods one can show that 1 - 1 + 1 - 1 + 1 - ... equals either zero or one.
John Baez has reproduced Euler's "proof" that 1 + 2 + 3 + 4 + ... = ζ(-1) = -1/12, which to me seems even crazier. But here's another reason why ζ(-1) = 1/12 "makes sense", if you buy 1 - 2 + 3 - 4 + ... = 1/4.
Let S = 1 + 2 + 3 + 4 + ...; add this "term by term" to 1/4 = 1 - 2 + 3 - 4 + ... . Then you get
S + 1/4 = 2 + 0 + 6 + 0 + 10 + 0 + ...
Then divide through by 2 to get
S/2 + 1/8 = 1 + 0 + 3 + 0 + 5 + 0 + 7 + 0 + 9 + ...
but it's clear that
2S = 2 + 4 + 6 + 8 + ...
and we might as well just stick some zeroes in there to get
2S = 0 + 2 + 0 + 4 + 0 + 6 + 0 + 8 + ...
(If we were still doing this with the q-analogue, this would amount to replacing q with q1/2.) Adding these two sums together, term-by-term, gives
5S/2 + 1/8 = 1 + 2 + 3 + 4 + 5 + ...
but we know the sum on the right-hand side is just S. So we have 5S/2 + 1/8 = S; solving for S gives S = -1/12. (I make no guarantee that this is the "most efficient" way to get ζ(-1).)
A basically identical calculation, starting with
and thus (letting q = 1) -1/8 = 1 - 8 + 27 - 64 + ... yields ζ(-3) = 1/120.
Similarly, we can "compute" ζ(-2); begin with the generating function
and letting q = 1, we get 0 = 1 - 4 + 9 - 16 + ...; let S = 1 + 4 + 9 + 16 + ... and add the two sums. Then S = 2 + 0 + 18 + 0 + ...; divide by two to get S/2 = 1 + 0 + 9 + 0 + ...
But remember that S = 1 + 4 + 9 + 16 + ...; thus 4S = 0 + 4 + 0 + 16 + 0 + 36 + ... if we let ourselves stick in zeros arbitrarily. Adding these two expressions, we get
9S/2 = 1 + 4 + 9 + 16 + 25 + ...
or 9S/2 = S; thus S = 0, which is in line with the standard result that ζ(-2n) = 0 for all integers n. This is a consequence of the functional equation for the zeta function,
since when s = -2n for some positive integer n, the right-hand side is zero.
And (1 - q + q2 - q3 + ...)2 = 1 - 2q + 3q2 - 4q3 + ..., viewing both sides as formal power series.
Thus we have
1 - 2q + 3q2 - 4q3 + ... = 1/(1+q)2
as formal power series. Let q = 1; then
1 - 2 + 3 - 4 + ... = 1/4.
Isn't that weird?
I learned this one today; it's an example of "Abel summation". Basically, in order to sum a sequence, we take its generating function and evaluate it at 1. Nothing too controversial there; at least to me it seems like the natural thing to do. But it looks a little more mysterious if you see it as I first did, which is without the q's. It seems "obvious" that 1 - 1 + 1 - 1 + 1 - 1 + ... = 1/2, if it's going to equal anything; we define such an infinite sum as the limit of its partial sums, which are alternately zero and one, so we should take something halfway between them. But then multiply
(1 - 1 + 1 - 1 + 1 - 1 + ...) (1 - 1 + 1 - 1 + 1 - 1 + 1 ...)
and you get 1 - 2 + 3 - 4 + 5 - 6 ..., in the following sense: let
(a0 + a1 + a2 + ...) and (b0 + b1 + b2 + ...)
be formal sums. Then we define
(a0 + a1 + a2 + ...) (b0 + b1 + b2 + ...) = c0 + c1 + c2 + ...
where
c0 = a0b0, c1 = a1b0 + a0b1, c2 = a2b0 + a1b1 + a0b2
and so on. (The definition is of course inspired by the definition on formal power series.) Then here we have ai = bi = (-1)i, and so each term in the sum defining cn is (-1)n; there are n+1 such terms. So cn = (-1)n(n+1), which gives that sum. Substituting the value 1 - 1 + 1 - 1 + ... = 1/2 into the above equation gives
1 - 2 + 3 - 4 + 5 - 6 + ... = 1/4.
Suppressing the q's makes this seem a whole lot more mysterious.
There's a Wikipedia article that shows how you can write out that sum four times and rearrange the terms to get 1; that doesn't seem all that satisfying to me, though, because with similar methods one can show that 1 - 1 + 1 - 1 + 1 - ... equals either zero or one.
John Baez has reproduced Euler's "proof" that 1 + 2 + 3 + 4 + ... = ζ(-1) = -1/12, which to me seems even crazier. But here's another reason why ζ(-1) = 1/12 "makes sense", if you buy 1 - 2 + 3 - 4 + ... = 1/4.
Let S = 1 + 2 + 3 + 4 + ...; add this "term by term" to 1/4 = 1 - 2 + 3 - 4 + ... . Then you get
S + 1/4 = 2 + 0 + 6 + 0 + 10 + 0 + ...
Then divide through by 2 to get
S/2 + 1/8 = 1 + 0 + 3 + 0 + 5 + 0 + 7 + 0 + 9 + ...
but it's clear that
2S = 2 + 4 + 6 + 8 + ...
and we might as well just stick some zeroes in there to get
2S = 0 + 2 + 0 + 4 + 0 + 6 + 0 + 8 + ...
(If we were still doing this with the q-analogue, this would amount to replacing q with q1/2.) Adding these two sums together, term-by-term, gives
5S/2 + 1/8 = 1 + 2 + 3 + 4 + 5 + ...
but we know the sum on the right-hand side is just S. So we have 5S/2 + 1/8 = S; solving for S gives S = -1/12. (I make no guarantee that this is the "most efficient" way to get ζ(-1).)
A basically identical calculation, starting with
Similarly, we can "compute" ζ(-2); begin with the generating function
But remember that S = 1 + 4 + 9 + 16 + ...; thus 4S = 0 + 4 + 0 + 16 + 0 + 36 + ... if we let ourselves stick in zeros arbitrarily. Adding these two expressions, we get
9S/2 = 1 + 4 + 9 + 16 + 25 + ...
or 9S/2 = S; thus S = 0, which is in line with the standard result that ζ(-2n) = 0 for all integers n. This is a consequence of the functional equation for the zeta function,
Labels:
generating functions,
number theory
25 October 2007
Something I didn't know about generating functions (and probably should have)
I learned today, while reading Flajolet and Sedgewick's Introduction to the Analysis of Algorithms, something that I probably should have known but didn't. I should have known this because I've learned what ordinary and exponential generating functions are about four times now (since combinatorics classes often don't assume that much background, they seem to get defined over and over again).
Namely, given a sequence a0, a1, a2, ..., we can form its exponential generating function
. Then the ordinary generating function
is given by the integral

when this integral exists. (This is something like the Laplace transform.)
The proof is straightforward, and I'll give it here. (I'll ignore issues of convergence.) Consider that integral; by recalling what A(z) is, it can be rewritten as

and we can interchange the integral and the sum to get

But we can pull out the part of the integrand which doesn't depend on t to get

and that integral is well-known as the integral representation for the gamma function. The integral is k!, which cancels with the k! in the denominator, and we recover
.
It's easy to come up with examples to verify this; for example, the exponential generating function of the binomial coefficients
is z2ez/2; plugging this into our formula gives z2(1-z)-3, which is the ordinary generating function of the same sequence.
I'm not sure how useful it is to know this, though... I can't think of too many times where I had an exponential generating function for some sequence and thought "oh, if only I had the ordinary generating function."
Namely, given a sequence a0, a1, a2, ..., we can form its exponential generating function
when this integral exists. (This is something like the Laplace transform.)
The proof is straightforward, and I'll give it here. (I'll ignore issues of convergence.) Consider that integral; by recalling what A(z) is, it can be rewritten as
and we can interchange the integral and the sum to get
But we can pull out the part of the integrand which doesn't depend on t to get
and that integral is well-known as the integral representation for the gamma function. The integral is k!, which cancels with the k! in the denominator, and we recover
It's easy to come up with examples to verify this; for example, the exponential generating function of the binomial coefficients
I'm not sure how useful it is to know this, though... I can't think of too many times where I had an exponential generating function for some sequence and thought "oh, if only I had the ordinary generating function."
Labels:
calculus,
generating functions
Math for America plays with numbers
From the November 2007 Notices, an ad for Math for America (p. 1305):
"Do you know someone who loves π as much as pie? Would they also love a full-tuition scholarship for a master's degree in mathematics education, a New York State Teaching Certificate, and a $90,000 stipend in addition to a competitive salary as a New York City secondary school math teacher? Math for America... [etc.]"
I don't know about you, but when I see a five-figure number followed by the word "stipend" I automatically assume it's an annual stipend. It seems somehow disingenuous to put that number there; it turns out it's a five-year stipend. This is in addition to the usual salary one gets for teaching, though; the idea appears to be that this program is attracting teachers who actually know math by making up at least some of the difference between what they would make teaching and what they could make elsewhere. Also, the people in this program receive a full-tuition scholarship for a master's in math ed.
They don't report it as "$18,000 per year for five years" is because it's not; it's paid as $28,000 in the first year (which is mostly spent being trained as a teacher, and which doesn't carry a salary) and $11,000, $14,000, $17,000, and $20,000 in the second through fifth years (these are in addition to the usual salary a New York City public school teacher would receive). Still, why not say "$90,000 over five years"? I feel like they're trying to fool the people reading the ad -- but the people reading the ad are the people in the world who are least likely to be fooled by tricks played with numbers.
I'm not saying that the financial package isn't valuable. I'm just saying that it feels like the ad is hiding something because they don't give the time period.
It reminds me of a letter I got from a graduate school which claimed that my support package was something like $50,000 per year, which was actually $20,000 in annual stipend and $30,000 in tuition. (This was a school with a comparatively high tuition, obviously; I suspect they did this because the $50K number was larger than schools with lower tuitions but the same stipend would have reported.) But anybody who's actually comparing financial offers would think of them as "full tuition plus [dollar amount]" and not even care what the dollar amount of the full tuition was.
"Do you know someone who loves π as much as pie? Would they also love a full-tuition scholarship for a master's degree in mathematics education, a New York State Teaching Certificate, and a $90,000 stipend in addition to a competitive salary as a New York City secondary school math teacher? Math for America... [etc.]"
I don't know about you, but when I see a five-figure number followed by the word "stipend" I automatically assume it's an annual stipend. It seems somehow disingenuous to put that number there; it turns out it's a five-year stipend. This is in addition to the usual salary one gets for teaching, though; the idea appears to be that this program is attracting teachers who actually know math by making up at least some of the difference between what they would make teaching and what they could make elsewhere. Also, the people in this program receive a full-tuition scholarship for a master's in math ed.
They don't report it as "$18,000 per year for five years" is because it's not; it's paid as $28,000 in the first year (which is mostly spent being trained as a teacher, and which doesn't carry a salary) and $11,000, $14,000, $17,000, and $20,000 in the second through fifth years (these are in addition to the usual salary a New York City public school teacher would receive). Still, why not say "$90,000 over five years"? I feel like they're trying to fool the people reading the ad -- but the people reading the ad are the people in the world who are least likely to be fooled by tricks played with numbers.
I'm not saying that the financial package isn't valuable. I'm just saying that it feels like the ad is hiding something because they don't give the time period.
It reminds me of a letter I got from a graduate school which claimed that my support package was something like $50,000 per year, which was actually $20,000 in annual stipend and $30,000 in tuition. (This was a school with a comparatively high tuition, obviously; I suspect they did this because the $50K number was larger than schools with lower tuitions but the same stipend would have reported.) But anybody who's actually comparing financial offers would think of them as "full tuition plus [dollar amount]" and not even care what the dollar amount of the full tuition was.
Abstract machines
From The Geomblog: "Made in the USA" can be Revitalized, in yesterday's San Jose Mercury-News, by Ken Goldberg and Vijay Kumar.
As Suresh puts it, they want to Put the Turing in Manufac-turing" -- to bring manufacturing to a higher level of abstraction, so that manufacturing processes can be understood independent of the particular substrate they lie on.
Eventually figuring out how to manufacture things could become another branch of the analysis of algorithms. And even if it doesn't, thinking at a higher level of abstraction might enable people to conceptualize more complex manufacturing processes -- and manufacturing processes will become more complex in the future.
Finally, it doesn't seem surprising that this article was published in San Jose -- the city that perhaps more than any other was transformed by the abstraction of computing.
p. s. The simplest possible Turing machine is actually universal.
As Suresh puts it, they want to Put the Turing in Manufac-turing" -- to bring manufacturing to a higher level of abstraction, so that manufacturing processes can be understood independent of the particular substrate they lie on.
Eventually figuring out how to manufacture things could become another branch of the analysis of algorithms. And even if it doesn't, thinking at a higher level of abstraction might enable people to conceptualize more complex manufacturing processes -- and manufacturing processes will become more complex in the future.
Finally, it doesn't seem surprising that this article was published in San Jose -- the city that perhaps more than any other was transformed by the abstraction of computing.
p. s. The simplest possible Turing machine is actually universal.
24 October 2007
The meta-Minkowski conjecture
In a talk by Eva Bayer-Fluckiger today at UPenn, I learned about Minkowski's conjecture, which is as follows:
Let K be a number field of degree n, OK its ring of integers, and DK the absolute value of the discriminant of K. Then the Euclidean minimum of K is the smallest μ such that for any x in K, there exists y in OK such that the absolute value of the norm of x-y is less than μ; that is, there's always an algebraic integer "within" M(K) of every element of K. If M(K) < 1 then OK is Euclidean (this is essentially a rephrasing of the definitin); if M(K) > 1 then OK isn't Euclidean; if M(K) = 1 we don't know. Then we have
for totally real number fields K of degree n. (I'm taking this from Eva Bayer-Fluckiger and Gabriele Nebe, On the Euclidean minimum of some real number fields, Journal de Théorie des Nombres de Bordeaux 17 (2005), 437-454, with some rephrasing.)
Curtis McMullen recently proved this for n=6 (link is to a preprint "Minkowski's conjecture, well-rounded lattices and topological dimension", on his web page). This follows proofs:
and so it seems like we get one higher degree every thirty years! This leads me to formulate the following:
(Sketch of heuristic argument: in approximately the year 1840 + 30n, the degree-n case will be proven. So if we wait long enough, Minkowski's conjecture will be proven for any finite value of n.)
Per a question that was asked at the talk, it seems like the approach in each of these papers was substantially different. McMullen gives sufficient conditions for proving Minkowski's conjecture for any given value of n, and it turns out that those conditions were already known for n = 6.
I wonder to what extent it's possible to predict the mathematical future from the mathematical past, in the case of problems such as this where there's a numerical measure of ``incremental" progress. Often I've seen tables where an improving sequence of constants is given for some problem (for example, a sequence of improving constants in some inequality). It's sort of a Moore's Law for mathematical problems.
When would I make a conjecture that a proof for all n would be found? If, say, we had proofs of some statement for n = 1 in 1960, n = 2 in 1990, n = 3 in 2000, n = 4 in 2005, and in general n = k in 2020 - 60/k; then I'd guess that the full problem gets solved in 2020. Of course, in a case like this the proofs would pile on top of each other, and in reality one would be likely to see large numbers of values of n disappearing in one fell swoop sometime in the 2010s.
Let K be a number field of degree n, OK its ring of integers, and DK the absolute value of the discriminant of K. Then the Euclidean minimum of K is the smallest μ such that for any x in K, there exists y in OK such that the absolute value of the norm of x-y is less than μ; that is, there's always an algebraic integer "within" M(K) of every element of K. If M(K) < 1 then OK is Euclidean (this is essentially a rephrasing of the definitin); if M(K) > 1 then OK isn't Euclidean; if M(K) = 1 we don't know. Then we have
Minkowski's conjecture: M(K) ≤ 2-n (Dk)1/2
for totally real number fields K of degree n. (I'm taking this from Eva Bayer-Fluckiger and Gabriele Nebe, On the Euclidean minimum of some real number fields, Journal de Théorie des Nombres de Bordeaux 17 (2005), 437-454, with some rephrasing.)
Curtis McMullen recently proved this for n=6 (link is to a preprint "Minkowski's conjecture, well-rounded lattices and topological dimension", on his web page). This follows proofs:
- for n=2, Minkowski, circa 1900;
- for n=3, Remak, 1928;
- for n=4, Dyson, 1948;
- for n=5, Skubenko, 1976;
and so it seems like we get one higher degree every thirty years! This leads me to formulate the following:
Meta-Minkowski conjecture: Minkowski's conjecture is true, but will never be proven.
(Sketch of heuristic argument: in approximately the year 1840 + 30n, the degree-n case will be proven. So if we wait long enough, Minkowski's conjecture will be proven for any finite value of n.)
Per a question that was asked at the talk, it seems like the approach in each of these papers was substantially different. McMullen gives sufficient conditions for proving Minkowski's conjecture for any given value of n, and it turns out that those conditions were already known for n = 6.
I wonder to what extent it's possible to predict the mathematical future from the mathematical past, in the case of problems such as this where there's a numerical measure of ``incremental" progress. Often I've seen tables where an improving sequence of constants is given for some problem (for example, a sequence of improving constants in some inequality). It's sort of a Moore's Law for mathematical problems.
When would I make a conjecture that a proof for all n would be found? If, say, we had proofs of some statement for n = 1 in 1960, n = 2 in 1990, n = 3 in 2000, n = 4 in 2005, and in general n = k in 2020 - 60/k; then I'd guess that the full problem gets solved in 2020. Of course, in a case like this the proofs would pile on top of each other, and in reality one would be likely to see large numbers of values of n disappearing in one fell swoop sometime in the 2010s.
Some thoughts on the Rockies' streak
Are the Rockies Really That Good, or Just Lucky?, from The Numbers Guy at the Wall Street Journal.
There's no real conclusion, though. The Rockies probably think it's because they're good. But if you wait long enough, this sort of streak will happen to somebody. What one really wants to know is the number of such streaks that should have happened, historically; this should then be compared to the number of such streaks that actually did happen.
The problem with this is that all the quick and dirty analyses will assume that all teams win exactly half their games, so the probability of winning at least 21 out of 22 games is (22+1)/222, or about five in a million. But this isn't true. I suspect that the actual distribution of how good teams are is roughly symmetrical around .500, perhaps skewed a little bit to the left (there are more ways to be a bad team than to be a good team), so for every .300 team there should be a .700 team. The .300 team has probability (.3)21 (22(.7) + .3), or about one in six billion, of winning 21 out of 22; the .700 team has probability (.7)21 (22(.3) + .7), or about four in a thousand. (This is the figure the WSJ reports as "1 in 245".) So to really know how many long streaks are expected, we need to know the historic distribution of team abilities. If there are teams that are "really" .700 teams fairly often, we should expect that these streaks happen fairly regularly; if the best teams are usually only around, say, .600, probably not. And to know how likely the Rockies were to have this streak, we have to know how good they "really" are; are they the 76-72 team they were before the streak started, the 21-1 team they've been since, or somewhere in between? And would anybody have noticed if they had won 21 out of 22, say, starting in the middle of June?
And let's not even go into the fact that games aren't really independent; even if one doesn't believe that teams "get hot", they play in different parks, with different starting pitchers each day, and so on. As at least one commenter at the WSJ pointed out, streaks like this are more likely to happen in September than any other time, because in September you have teams (like the Rockies) that need to win games and teams that have just given up. The Rockies needed to win 14 out of 15 to end the regular season. (There was a point in late September when I thought I had the National League playoff situation all figured out... then I looked at the standings and thought "when the ^)%$!*)#$)^ did Colorado win ten in a row?")
My point is that I don't know the answer, either. I just am not happy that their streak happened to knock the Phillies out of the playoffs for the first time in fourteen years, and I hope that the Red Sox (who I started following in college) don't also become victims of it.
(By the way, my argument for not having more interleague play than there is is that right now, it's possible to be a fan of one National League team and one American League team and not have too many conflicts of interest. With more interleague play that would become less likely. Also, since I like the Phillies and the Red Sox, I essentially automatically dislike the Mets and the Yankees, which means I can never move to New York.)
There's no real conclusion, though. The Rockies probably think it's because they're good. But if you wait long enough, this sort of streak will happen to somebody. What one really wants to know is the number of such streaks that should have happened, historically; this should then be compared to the number of such streaks that actually did happen.
The problem with this is that all the quick and dirty analyses will assume that all teams win exactly half their games, so the probability of winning at least 21 out of 22 games is (22+1)/222, or about five in a million. But this isn't true. I suspect that the actual distribution of how good teams are is roughly symmetrical around .500, perhaps skewed a little bit to the left (there are more ways to be a bad team than to be a good team), so for every .300 team there should be a .700 team. The .300 team has probability (.3)21 (22(.7) + .3), or about one in six billion, of winning 21 out of 22; the .700 team has probability (.7)21 (22(.3) + .7), or about four in a thousand. (This is the figure the WSJ reports as "1 in 245".) So to really know how many long streaks are expected, we need to know the historic distribution of team abilities. If there are teams that are "really" .700 teams fairly often, we should expect that these streaks happen fairly regularly; if the best teams are usually only around, say, .600, probably not. And to know how likely the Rockies were to have this streak, we have to know how good they "really" are; are they the 76-72 team they were before the streak started, the 21-1 team they've been since, or somewhere in between? And would anybody have noticed if they had won 21 out of 22, say, starting in the middle of June?
And let's not even go into the fact that games aren't really independent; even if one doesn't believe that teams "get hot", they play in different parks, with different starting pitchers each day, and so on. As at least one commenter at the WSJ pointed out, streaks like this are more likely to happen in September than any other time, because in September you have teams (like the Rockies) that need to win games and teams that have just given up. The Rockies needed to win 14 out of 15 to end the regular season. (There was a point in late September when I thought I had the National League playoff situation all figured out... then I looked at the standings and thought "when the ^)%$!*)#$)^ did Colorado win ten in a row?")
My point is that I don't know the answer, either. I just am not happy that their streak happened to knock the Phillies out of the playoffs for the first time in fourteen years, and I hope that the Red Sox (who I started following in college) don't also become victims of it.
(By the way, my argument for not having more interleague play than there is is that right now, it's possible to be a fan of one National League team and one American League team and not have too many conflicts of interest. With more interleague play that would become less likely. Also, since I like the Phillies and the Red Sox, I essentially automatically dislike the Mets and the Yankees, which means I can never move to New York.)
23 October 2007
Gowers on examples
Tim Gowers wrote My favorite pedagogical principle: Examples first a few days ago. (The comments are worth reading too.) As an example, he gives an axiomatic definition of a field, and then compares this with defining a field by saying "look, you know about the rational numbers, the real numbers, and the complex numbers; fields are `like these', and now here's what we formally mean by that.) People have said most of what I'd want to say about this topic, but there's an interesting question here. It's been pointed out that it's often a good idea to read mathematics "out of order". But then why doesn't the writer write things out of order, since most people's natural impulse is to read things in the order they're written? Presumably the writer understands the material better than the reader, and therefore has a better idea of what order things should be presented in.
The spinning woman, part two
Remember the spinning woman that I wrote aboutI wrote about last week?
Supposedly, people who saw her spinning clockwise are more "right-brained" (in the pop-science sense of creative, able to see the big picture, and so on) and people who saw her spinning counterclockwise are more "left-brained".
According to the good folks at Freakonomics, if you use college major as a proxy for pop-culture brain-sidedness, it actually works the other way -- "left-brained" people are more likely to see her spinning clockwise. (Of course, this isn't scientific -- it's just Freakonomics readers, and the sample sizes are small.)
Steven Levitt writes:
That's definitely true. I've heard that another such "anti-predictor" is Punxsutawney Phil, the "official" groundhog of Groundhog Day. He gets yanked out of hibernation on February 2, and if he sees his shadow we're supposed to have "six more weeks of winter". (This is part of the utterly bizarre American tradition of Groundhog Day.) I read once that he's wrong something like 80% of the time, although I don't have a source for this. The Wikipedia article says that the actual error rate is something like 60% to 70%. Still, it seems like the groundhog is more often wrong than right.
Supposedly, people who saw her spinning clockwise are more "right-brained" (in the pop-science sense of creative, able to see the big picture, and so on) and people who saw her spinning counterclockwise are more "left-brained".
According to the good folks at Freakonomics, if you use college major as a proxy for pop-culture brain-sidedness, it actually works the other way -- "left-brained" people are more likely to see her spinning clockwise. (Of course, this isn't scientific -- it's just Freakonomics readers, and the sample sizes are small.)
Steven Levitt writes:
I often joke about how the information provided by someone who is incredibly terrible at predicting the future (i.e., they always get things wrong) is just as valuable as what you get from someone who is good at predicting the future. I used this strategy with some success by betting the opposite of my father whenever he’d bet a large sum of money on a football team that was sure to cover the spread.
That's definitely true. I've heard that another such "anti-predictor" is Punxsutawney Phil, the "official" groundhog of Groundhog Day. He gets yanked out of hibernation on February 2, and if he sees his shadow we're supposed to have "six more weeks of winter". (This is part of the utterly bizarre American tradition of Groundhog Day.) I read once that he's wrong something like 80% of the time, although I don't have a source for this. The Wikipedia article says that the actual error rate is something like 60% to 70%. Still, it seems like the groundhog is more often wrong than right.
Labels:
geometry,
graphics,
prediction
22 October 2007
Three posts in one!
1. Literary style by the numbers. Apparently it's reasonably possible to distinguish among authors by just plotting the "average words per sentence" and the "number of complex words"; authors appear to be remarkably consistent on both these measures when looking at book-length samples of text. I wonder if this holds true for mathematical authors; it would be hard to measure for work that includes a lot of mathematical notation.
2. Graph Theory with Applications, J. A. Bondy and U. S. R. Murty (1976) -- out of print for a while, full text online. (Scanned, so it takes a while to load; you can also download it a chapter at a time.) From Anarchaia. (The same authors just finished a graduate text, to be published by Springer; read a blurb about it. I like the sound of the blurb. The book apparently also has a blog but the blog is empty.) Incidentally, I find it interesting that Springer's taxonomy of subdisciplines of mathematics includes "Number Theory and Combinatorics"; usually number theory seems to be classed with algebra.
Despite being an undergraduate book, the book includes a list of unsolved problems, which I think is a good practice; even if the intended reader is not expected to be able to solve these problems, it gives the interested student an idea of the sort of things that real mathematicians are interested in. (Of course, in some fields it wouldn't be practical to put such a list in a textbook; it needs to be possible for students to understand the statements of the problems.)
3. A combinatorial proof of the Rogers-Ramanujan and Schur identities, by Cilanne Boulet and Igor Pak. I haven't read it. (I came across it while looking for some other stuff on partitions.) The Rogers-Ramanujan identity is that the number of partitions of an integer n into parts that are congruent to 1 or 4 modulo 5 is equal to the number of partitions of n such that adjacent parts differ by at least 2.
For example, partitions of 12 counted by the first description are
11 + 1, 9 + 1 + 1 + 1, 6 + 6, 6 + 4 + 1 + 1, 6 + 1 + 1 + 1 + 1 + 1 + 1, 4 + 4 + 4, 4 + 4 + 1 + 1 + 1 + 1, 4 + 1 + 1 + 1 + 1 + 1 + 1 + 1 + 1, 1 + 1 + 1 + 1 + 1 + 1 + 1 + 1 + 1 + 1 + 1 + 1
and by the second description,
11 + 1, 10 + 2, 9 + 3, 8 + 4, 8 + 3 + 1, 7 + 5, 7 + 4 + 1, 6 + 4 + 2
and there are eight in each list.
However, don't ask me which ones match up with which ones in the bijection! It's by no means obvious. This paper grabbed my interest because I knew there was no "nice" bijective proof, and the title led me to expect such a nice bijection. The authors lament:
([23] is an in-preparation work of Pak, "Rogers-Ramanujan bijections are not geometric."; I'm kind of curious to see what it says.)
2. Graph Theory with Applications, J. A. Bondy and U. S. R. Murty (1976) -- out of print for a while, full text online. (Scanned, so it takes a while to load; you can also download it a chapter at a time.) From Anarchaia. (The same authors just finished a graduate text, to be published by Springer; read a blurb about it. I like the sound of the blurb. The book apparently also has a blog but the blog is empty.) Incidentally, I find it interesting that Springer's taxonomy of subdisciplines of mathematics includes "Number Theory and Combinatorics"; usually number theory seems to be classed with algebra.
Despite being an undergraduate book, the book includes a list of unsolved problems, which I think is a good practice; even if the intended reader is not expected to be able to solve these problems, it gives the interested student an idea of the sort of things that real mathematicians are interested in. (Of course, in some fields it wouldn't be practical to put such a list in a textbook; it needs to be possible for students to understand the statements of the problems.)
3. A combinatorial proof of the Rogers-Ramanujan and Schur identities, by Cilanne Boulet and Igor Pak. I haven't read it. (I came across it while looking for some other stuff on partitions.) The Rogers-Ramanujan identity is that the number of partitions of an integer n into parts that are congruent to 1 or 4 modulo 5 is equal to the number of partitions of n such that adjacent parts differ by at least 2.
For example, partitions of 12 counted by the first description are
11 + 1, 9 + 1 + 1 + 1, 6 + 6, 6 + 4 + 1 + 1, 6 + 1 + 1 + 1 + 1 + 1 + 1, 4 + 4 + 4, 4 + 4 + 1 + 1 + 1 + 1, 4 + 1 + 1 + 1 + 1 + 1 + 1 + 1 + 1, 1 + 1 + 1 + 1 + 1 + 1 + 1 + 1 + 1 + 1 + 1 + 1
and by the second description,
11 + 1, 10 + 2, 9 + 3, 8 + 4, 8 + 3 + 1, 7 + 5, 7 + 4 + 1, 6 + 4 + 2
and there are eight in each list.
However, don't ask me which ones match up with which ones in the bijection! It's by no means obvious. This paper grabbed my interest because I knew there was no "nice" bijective proof, and the title led me to expect such a nice bijection. The authors lament:
Also, while our proof is mostly combinatorial it is by no means a direct bijection. The quest for a direct bijective proof is still under way, and as recently as this year Zeilberger lamented on the lack of such proof. The results in [23] seem to discourage any further work in this direction.
([23] is an in-preparation work of Pak, "Rogers-Ramanujan bijections are not geometric."; I'm kind of curious to see what it says.)
Labels:
Boulet,
combinatorics,
graph theory,
language,
Pak
Diplomats play dice
State Department Struggles To Oversee Private Army, from the Washington Post (Oct. 20)
That's a good idea. I wonder if anybody just told their employees "take different routes to work" and found that their people decided which route to take based on, say, the day of the week -- having more than one route offers some protection from ambush, but there's still a pattern involved, and eventually the would-be ambusher would figure out that if they come by a certain spot on a Tuesday morning they'll succeed.
(Of course, there are still only six possible routes. If one wanted to go really crazy, one could toss a die at each juncture to decide where to go... but then who would ever get to work?)
This reminds me of airport security people who are randomizing where they are in the airport in order to thwart terrorists.
(from Marginal Revolution.)
Marc Grossman, the U.S. ambassador to Turkey in the mid-1990s, recalled telling his staff to take their own security precautions. After losing embassy employees to attacks, he advised staffers to keep a six-sided die in their glove compartments; to thwart ambushes, they should assign a different route to work to each number, he said, and toss the die as they left home each morning.
That's a good idea. I wonder if anybody just told their employees "take different routes to work" and found that their people decided which route to take based on, say, the day of the week -- having more than one route offers some protection from ambush, but there's still a pattern involved, and eventually the would-be ambusher would figure out that if they come by a certain spot on a Tuesday morning they'll succeed.
(Of course, there are still only six possible routes. If one wanted to go really crazy, one could toss a die at each juncture to decide where to go... but then who would ever get to work?)
This reminds me of airport security people who are randomizing where they are in the airport in order to thwart terrorists.
(from Marginal Revolution.)
Mathematics ≠ notation Mathematics is not notation
A rule of mathematical style: it is acceptable to write "Therefore, foo is at most bar.", but it is not acceptable to write "Therefore, F ≤ B." (This is due to J. Michael Steele, who has a lot of good rants.)
Steele's logic is that mathematical symbols cannot function as the verb in a natural-language sentence, which makes sense; although mathematical expressions have their own internal logic, an entire expression seems to function as a single noun when embedded in an English-language expression. Thus we write something like "Therefore, we have F ≤ B." or "Therefore, the inequality F ≤ B holds." ifwe want to write things out in symbolic form.
One thing that bothers me about a lot of mathematicians (myself included) is the propensity for their spoken mathematics to be a mere symbol-by-symbol reading of the written mathematics; thus one hears things like, say, "pi one of X" instead of "the fundamental group of X". This isn't too much of a problem in that case, but then one gets to things like "H two of X" -- is that H2(X) (homology) or H2(X) (cohomology)? (People don't seem to go to the trouble of distinguishing superscripts from subscripts when they're talking, which is probably a good thing, lest speech be filled with "uppers" and "lowers".)
Sure, you can usually tell from context, if you're sufficiently experienced in a certain area. But people have a tendency to think that mathematics is the formulas, at least to the point where they speak the notations instead of the notions. (This is one of those fortunate puns that translates well from the original Latin: "non notationes, sed notiones".) Spoken mathematics is not written mathematics. Mathematics is not its notation.
This is probably a corollary of the often-stated fact that a page of mathematical text contains about as many ideas as, say, ten pages of "ordinary" text; thus on a per-page basis, mathematical reading should be ten times slower than literary reading. Unfortunately, a minute of mathematical speech contains many more ideas than a minute of "ordinary" text, but one cannot slow the speaker down.
Incidentally, I used to pronounce ≤ and ≥ as "less than or equal to" and "greater than or equal to"; now I pronounce them "at most" or "at least", because these are shorter. I also pronounce g o f as "g after f", which seems both shorter and less ambiguous than "g composed with f" -- in "g composed with f" the order of composition is conventional. "f before g" or "f, then g" is too confusing, because you read the g before you read the f.
Steele's logic is that mathematical symbols cannot function as the verb in a natural-language sentence, which makes sense; although mathematical expressions have their own internal logic, an entire expression seems to function as a single noun when embedded in an English-language expression. Thus we write something like "Therefore, we have F ≤ B." or "Therefore, the inequality F ≤ B holds." ifwe want to write things out in symbolic form.
One thing that bothers me about a lot of mathematicians (myself included) is the propensity for their spoken mathematics to be a mere symbol-by-symbol reading of the written mathematics; thus one hears things like, say, "pi one of X" instead of "the fundamental group of X". This isn't too much of a problem in that case, but then one gets to things like "H two of X" -- is that H2(X) (homology) or H2(X) (cohomology)? (People don't seem to go to the trouble of distinguishing superscripts from subscripts when they're talking, which is probably a good thing, lest speech be filled with "uppers" and "lowers".)
Sure, you can usually tell from context, if you're sufficiently experienced in a certain area. But people have a tendency to think that mathematics is the formulas, at least to the point where they speak the notations instead of the notions. (This is one of those fortunate puns that translates well from the original Latin: "non notationes, sed notiones".) Spoken mathematics is not written mathematics. Mathematics is not its notation.
This is probably a corollary of the often-stated fact that a page of mathematical text contains about as many ideas as, say, ten pages of "ordinary" text; thus on a per-page basis, mathematical reading should be ten times slower than literary reading. Unfortunately, a minute of mathematical speech contains many more ideas than a minute of "ordinary" text, but one cannot slow the speaker down.
Incidentally, I used to pronounce ≤ and ≥ as "less than or equal to" and "greater than or equal to"; now I pronounce them "at most" or "at least", because these are shorter. I also pronounce g o f as "g after f", which seems both shorter and less ambiguous than "g composed with f" -- in "g composed with f" the order of composition is conventional. "f before g" or "f, then g" is too confusing, because you read the g before you read the f.
21 October 2007
Restoration of Fermat's Lost Letter to "Last Theorem"
No, I didn't prove FLT. Sorry.
But has anyone read this proposed proof of Fermat's Last Theorem? I'm kind of curious... but not curious enough to shell out the $5. And it's not just the money; this doesn't seem like the sort of person I want to be giving $5 to. It's kind of strange to see crackpottery that you have to pay to see these days... usually now people just post it on poorly formatted web pages.
Similarly, I don't have cable TV, not because I want to keep the money, but because I don't like Comcast. But here I am using my Comcast Internet connection to say that I don't like Comcast. Hypocritical? Of course. But I like Verizon even less.
(From John Armstrong.)
But has anyone read this proposed proof of Fermat's Last Theorem? I'm kind of curious... but not curious enough to shell out the $5. And it's not just the money; this doesn't seem like the sort of person I want to be giving $5 to. It's kind of strange to see crackpottery that you have to pay to see these days... usually now people just post it on poorly formatted web pages.
Similarly, I don't have cable TV, not because I want to keep the money, but because I don't like Comcast. But here I am using my Comcast Internet connection to say that I don't like Comcast. Hypocritical? Of course. But I like Verizon even less.
(From John Armstrong.)
The game of "thermonuclear pennies"
The game of thermonuclear pennies. This is, not surprisingly, related to the game of nuclear pennies; the game of nuclear pennies is in turn related to the Seven trees in one theorem of Blass, which I wrote about back in July. (Very roughly speaking, the "number of trees" can be shown to be "the primitive sixth root of unity", but the seventh power of this number is itself! Objects of Categories as Complex Numbers, by Marcelo Fiore and Tom Leinster, legitimates this.1 The game of nuclear pennies works as follows; we start with a semi-infinite strip of squares, which have pennies on them. We can "split" a penny on site n into two pennies, one on site n-1 and one on site n+1; we can "fuse" pennies on sites n-1 and n+1 into one on site n. We can analyze positions in this game using arithmetic in the ring of integers with a sixth root of unity, ω, adjoined. Given a position with an coins in the nth position, associate the number
Σn an ωn
with this position; it turns out that two positions can be reached from each other if and only if their associated numbers are the same. (We have the relations ω3 = -1, ω - ω2 = 1.)
In thermonuclear pennies the splitting rule is a bit different; it turns out that "sixth root of unity" can be replaced with "fourth root of unity", which is the imaginary unit. Positions in this game represent Gaussian integers. And there's even a way to multiply the positions, which corresponds to multiplication of Gaussian integers.
Challenge (which may or may not be hard, I haven't thought about it all): develop such a penny-fusing game where the positions can be understood as elements of the ring of integers with a fifth root of unity adjoined. (I don't expect the result to be as nice; for example, Z[e2πi/5] isn't a lattice in the complex plane. I'm not sure why that's relevant, but it might be.)
1. Is "legitimate" a verb, meaning "to make legitimate"? (The stress in the verb would be on the last syllable.) I think it should be, if it's not already. I've heard "precise" used as a verb (I don't remember how it's pronounced), and "legitimate" strikes me as more phonetically akin to verbs than "precise" is, probably because of the -ate suffix.
Σn an ωn
with this position; it turns out that two positions can be reached from each other if and only if their associated numbers are the same. (We have the relations ω3 = -1, ω - ω2 = 1.)
In thermonuclear pennies the splitting rule is a bit different; it turns out that "sixth root of unity" can be replaced with "fourth root of unity", which is the imaginary unit. Positions in this game represent Gaussian integers. And there's even a way to multiply the positions, which corresponds to multiplication of Gaussian integers.
Challenge (which may or may not be hard, I haven't thought about it all): develop such a penny-fusing game where the positions can be understood as elements of the ring of integers with a fifth root of unity adjoined. (I don't expect the result to be as nice; for example, Z[e2πi/5] isn't a lattice in the complex plane. I'm not sure why that's relevant, but it might be.)
1. Is "legitimate" a verb, meaning "to make legitimate"? (The stress in the verb would be on the last syllable.) I think it should be, if it's not already. I've heard "precise" used as a verb (I don't remember how it's pronounced), and "legitimate" strikes me as more phonetically akin to verbs than "precise" is, probably because of the -ate suffix.
20 October 2007
Homophony part two
As at least two readers have pointed out, my claim that the triviality of the homophony group was part of the unwritten folklore is false. Take a look at the paper Quotients Homophones des Groupes Libres/Homophonic Quotients of Free Groups, by Mestre, Schoof, Washington, and Zagier. They use a lot of the same relations that I did in my proof; in particular damn = damn, damned = dammed, barred = bard, bass = base, jeans = genes, ruff = rough, phase = faze. (The last three of these seem to me to be the ones that most people will find.) Like my proof, v is the last generator to fall; they use chivvy = chivy or leitmotif = leitmotiv. I'm not sure how I feel about these; in both cases these seem like alternate spellings, not different words. veldt = felt, as suggested by a commenter on the earlier entry, feels "right" to me; these are quite clearly two different words with different meanings, veldt being a certain kind of open space in Africa, felt being the stuff out of which the tips of markers are made. (I think this is the one I came up with the first time I saw this problem.) They don't seem satisfied with their proofs for v, either; they introduce the space as a 27th generator, and use avowers = of ours to show the triviality of v.
The paper in question is bilingual; the proof that the English homophony group is trivial is given in French, and the proof that the French homophony group is trivial is given in English. I particularly like the acknowledgements:
A related problem is as follows: consider the group on twenty-six letters A, B, ..., Z with the relations that two words are equivalent if they are permutations of each other and each appears as a word in some dictionary of choice. Identify the center of the group. According to Steven E. Landsburg, "The Jimmy's Book", The American Mathematical Monthly, Vol. 93, No. 8 (Oct., 1986), pp. 636-638, much work was done on this problem at Chicago in the seventies.
To get started: post = pots = stop = opts = spot = tops. Since post = pots, s and t commute; since pots = opts, o and p commute. This is clearly much harder than the homophony group problem. By the way, elation = toenail. (I've been playing lots of Scrabble recently; that set of seven letters seems to come up a lot.)
The paper in question is bilingual; the proof that the English homophony group is trivial is given in French, and the proof that the French homophony group is trivial is given in English. I particularly like the acknowledgements:
The third and fourth authors were partially supported by NSF and (by the results of this paper) numerous other government agencies.
A related problem is as follows: consider the group on twenty-six letters A, B, ..., Z with the relations that two words are equivalent if they are permutations of each other and each appears as a word in some dictionary of choice. Identify the center of the group. According to Steven E. Landsburg, "The Jimmy's Book", The American Mathematical Monthly, Vol. 93, No. 8 (Oct., 1986), pp. 636-638, much work was done on this problem at Chicago in the seventies.
To get started: post = pots = stop = opts = spot = tops. Since post = pots, s and t commute; since pots = opts, o and p commute. This is clearly much harder than the homophony group problem. By the way, elation = toenail. (I've been playing lots of Scrabble recently; that set of seven letters seems to come up a lot.)
The homophony group
From Justin Roberts at UCSD, an old homework problem:
I was looking for something else, but I came across this problem, which I saw before in my first algebra class when I'd just learned what a group was; it's part of the folklore, apparently, and I feel like it's good to have the folklore written down. (It might be written down somewhere on the Internet under another name, but it's hard to find because "group" is such a common word in non-mathematical circumstances.)
I can't do it. I did it before, but I remember it took a lot of looking at lists of homophones; I can't find the particular lists of homophones I used years ago. Most of the relations are simpler than they look, because generally if two words in English are spelled the same they have a lot of letters in common. Here's a partial solution -- I prove that the English homophony group is a subgroup of the free group on two generators, namely m and v. (Or maybe three.)
First, we deal with the vowels. The first homophones that come to anyone's mind are probably to = too = two. From to = too we get o = 1. Thus to = two becomes t = tw, so w = 1. (Yes, I know, w isn't a vowel...) Next, consider the relations
rain = rein = reign, sign = sine, earn = urn.
From rain = rein we get a = e. From rain = reign we get ai = eig; but since a = e we can reduce this to i = ig, or g = 1. Thus we can eliminate the "g" in sign = sine to get sin = sine, so e = 1; recalling a = e, we have a = 1. Finally, from earn = urn we have ea = u; since e = a = 1 we have u = 1.
Another candidate for "most canonical homophone" is probably there = their (I'll ignore "they're" because I don't want to deal with apostrophes); we can clip the "the" in each of these to get re = ir. But e = 1, so this is r = ir, or i = 1. Next, i = eye, but i = e = 1, so this reduces to y = 1. (Does "i" count as a word? Of course I mean "I", the first-person singular pronoun. Roberts said "Scrabble rules apply", but Scrabble rules have never had to weigh in on whether "I" is playable. Scrabble rules out words which are "always capitalized" but the way in which "I" is always capitalized seems different from the way that a proper name is.)
At this point we've shown that the letters a, e, g, i, o, u, w, and y are trivial; this set of letters shouldn't be too surprising to anyone reasonable familiar with English spelling.
There are a lot of silent k's; we take no = know. Since w = o = 1, this reduces to n = kn, so k = 1. Similarly, we can leverage the silent h with wine = whine. lam = lamb gives us b = 1, and dam = damn gives n = 1.
Somewhere around here the problem seems to get harder. One key insight is that groups don't have nontrivial idempotents; if we can show that a generator is equal to its square, then it must be equal to 1. This gets us a few more letters: from bard = barred we have r = rr (recall e = 1), so r is trivial. Similarly, from medal = meddle we get dal = ddle; but a = e = 1, so dl = ddl; thus d = 1. This actually isn't how I take care of t, although there are enough double t's in the language that it's robably possible to handle t this way; I take packed = pact, from which ked = t; but k, e, d have already been shown trivial, so t = 1. A bit more trickily, thinking in terms of double letters, bawl = ball gives us w = l; but w = 1, so l = 1.
That's sixteen letters shown trivial so far: a, b, d, e, g, h, i, k, l, n, o, r, t, u, w, y. The remaining letters are c, f, j, m, p, q, s, v, x, and z. These seem trickier somehow... to my ear they have pretty well-defined sound. The easy letters to get rid of, you'll remember, are the vowels, which can basically sound however you want them to in English.
Still, a few more fall quickly. base = bass (the musical kind, not the fish), so s = 1; razor = raiser can now take advantage of the times when s sounds like z, since all the other letters involved are trivial, to give us s = z, so z = 1. The letters that are phonetically kind of k-like fall next; key = quay gives k = q, so q = 1, and choir = quire (the pair I'm proudest of finding, by the way) gives c = q, so c = 1.
Six letters to go: f, j, m, p, v, x.
lox = locks, which takes care of x. genes = jeans, killing j. phase = faze; only the p and f are nontrivial here, so p = f. There are two cheap tricks to handle this. The first is hiccup = hiccough, where all the letters involved are trivial except p, so p = 1; but are those separate words, or alternative spellings of the same word? The second is if = iff, so f = 1. (The problem here, of course, is that iff is a bit iffy as a word... is it really a word, or just an abbreviation for "if and only if"?) Two half-solutions add up to one whole solution, though, so I say that p and f are trivial.
That leaves two generators, m and v. If there are relations between them, they're trivializing ones; we're not going to get a relation like "mv = vm" (which would bring us down from the free group on two generators to the free abelian group on two generators), since that would require two homophonous words in English, both containing m and v, where in one the m comes first and in one the v comes first. I've tried to use the double-letter trick; the closest I can get is "clamor" and "clammer" to get rid of m, but these aren't really pronounced the same.
Any ideas on how to finish this up?
My vocabulary in other languages isn't rich enough to do this. But I agree with Roberts' judgement about French, Spanish, and Japanese; French is full of silent letters, even more so than English.
Edited, 12:24 pm: In the comments, dan suggests dammed = damned to show that m is trivial. And ruff = rough shows f is trivial, which gets around the objections I had before. I want to take advantage of the fact that the f in of is pronounced like v usually is, but I can't.
"B: (Fun for all the family, ages 6-66...)The homophony group (of English) is the group with 26 generators a,b, ..., z and one relation for every pair of English words which sound the same. Example: "knight = night" shows that k=1, "earn=urn" shows that ea=u, and so on. (Scrabble rules apply; no proper names or slang!) Prove that the group is trivial! (I would guess it's also trivial in French but probably non-trivial in Spanish and very non-trivial in Japanese!)."
I was looking for something else, but I came across this problem, which I saw before in my first algebra class when I'd just learned what a group was; it's part of the folklore, apparently, and I feel like it's good to have the folklore written down. (It might be written down somewhere on the Internet under another name, but it's hard to find because "group" is such a common word in non-mathematical circumstances.)
I can't do it. I did it before, but I remember it took a lot of looking at lists of homophones; I can't find the particular lists of homophones I used years ago. Most of the relations are simpler than they look, because generally if two words in English are spelled the same they have a lot of letters in common. Here's a partial solution -- I prove that the English homophony group is a subgroup of the free group on two generators, namely m and v. (Or maybe three.)
First, we deal with the vowels. The first homophones that come to anyone's mind are probably to = too = two. From to = too we get o = 1. Thus to = two becomes t = tw, so w = 1. (Yes, I know, w isn't a vowel...) Next, consider the relations
rain = rein = reign, sign = sine, earn = urn.
From rain = rein we get a = e. From rain = reign we get ai = eig; but since a = e we can reduce this to i = ig, or g = 1. Thus we can eliminate the "g" in sign = sine to get sin = sine, so e = 1; recalling a = e, we have a = 1. Finally, from earn = urn we have ea = u; since e = a = 1 we have u = 1.
Another candidate for "most canonical homophone" is probably there = their (I'll ignore "they're" because I don't want to deal with apostrophes); we can clip the "the" in each of these to get re = ir. But e = 1, so this is r = ir, or i = 1. Next, i = eye, but i = e = 1, so this reduces to y = 1. (Does "i" count as a word? Of course I mean "I", the first-person singular pronoun. Roberts said "Scrabble rules apply", but Scrabble rules have never had to weigh in on whether "I" is playable. Scrabble rules out words which are "always capitalized" but the way in which "I" is always capitalized seems different from the way that a proper name is.)
At this point we've shown that the letters a, e, g, i, o, u, w, and y are trivial; this set of letters shouldn't be too surprising to anyone reasonable familiar with English spelling.
There are a lot of silent k's; we take no = know. Since w = o = 1, this reduces to n = kn, so k = 1. Similarly, we can leverage the silent h with wine = whine. lam = lamb gives us b = 1, and dam = damn gives n = 1.
Somewhere around here the problem seems to get harder. One key insight is that groups don't have nontrivial idempotents; if we can show that a generator is equal to its square, then it must be equal to 1. This gets us a few more letters: from bard = barred we have r = rr (recall e = 1), so r is trivial. Similarly, from medal = meddle we get dal = ddle; but a = e = 1, so dl = ddl; thus d = 1. This actually isn't how I take care of t, although there are enough double t's in the language that it's robably possible to handle t this way; I take packed = pact, from which ked = t; but k, e, d have already been shown trivial, so t = 1. A bit more trickily, thinking in terms of double letters, bawl = ball gives us w = l; but w = 1, so l = 1.
That's sixteen letters shown trivial so far: a, b, d, e, g, h, i, k, l, n, o, r, t, u, w, y. The remaining letters are c, f, j, m, p, q, s, v, x, and z. These seem trickier somehow... to my ear they have pretty well-defined sound. The easy letters to get rid of, you'll remember, are the vowels, which can basically sound however you want them to in English.
Still, a few more fall quickly. base = bass (the musical kind, not the fish), so s = 1; razor = raiser can now take advantage of the times when s sounds like z, since all the other letters involved are trivial, to give us s = z, so z = 1. The letters that are phonetically kind of k-like fall next; key = quay gives k = q, so q = 1, and choir = quire (the pair I'm proudest of finding, by the way) gives c = q, so c = 1.
Six letters to go: f, j, m, p, v, x.
lox = locks, which takes care of x. genes = jeans, killing j. phase = faze; only the p and f are nontrivial here, so p = f. There are two cheap tricks to handle this. The first is hiccup = hiccough, where all the letters involved are trivial except p, so p = 1; but are those separate words, or alternative spellings of the same word? The second is if = iff, so f = 1. (The problem here, of course, is that iff is a bit iffy as a word... is it really a word, or just an abbreviation for "if and only if"?) Two half-solutions add up to one whole solution, though, so I say that p and f are trivial.
That leaves two generators, m and v. If there are relations between them, they're trivializing ones; we're not going to get a relation like "mv = vm" (which would bring us down from the free group on two generators to the free abelian group on two generators), since that would require two homophonous words in English, both containing m and v, where in one the m comes first and in one the v comes first. I've tried to use the double-letter trick; the closest I can get is "clamor" and "clammer" to get rid of m, but these aren't really pronounced the same.
Any ideas on how to finish this up?
My vocabulary in other languages isn't rich enough to do this. But I agree with Roberts' judgement about French, Spanish, and Japanese; French is full of silent letters, even more so than English.
Edited, 12:24 pm: In the comments, dan suggests dammed = damned to show that m is trivial. And ruff = rough shows f is trivial, which gets around the objections I had before. I want to take advantage of the fact that the f in of is pronounced like v usually is, but I can't.
19 October 2007
The chain of cities
From radical cartography -- a map of Europe with cities that are near each other (within 150 km) and with populations of more than 100,000 connected by lines. In some places (southern England, the Netherlands, Germany, northern Italy) the lines are very dense; in others they are not. The graph has one large connected component, but there are smaller components -- in Scotland, southern Italy, and the coasts of Spain -- which are separate from the main one.
I suspect you'd get similar-looking maps at smaller scales (that is, if you reduced both the population threshold and the distance threshold). I've heard that the number of cities of population at least N scales like 1/N. To get a similar-looking map with smaller cities you'd want the number of cities "near" each one to be the same. So, for example, say you looked at cities of size 25,000 or greater; there should be four times as many of those "near" a given point as cities of size 100,000 or greater. If we then shorten the distances over which we're willing to draw lines by a factor of two, then we should get the same average degree in the resulting graph. What I'm saying is that I suspect the graph of cities of size 25,000 or greater within 75 km of each other has similar qualitative features. (Of course, at some point you run into the problem of how to define distinct "cities", by which I mean centers of population; should distinct neighborhoods of the same legal municipality be counted differently?)
I'd be interested to see a similar map for the United States (since I'm more familiar with the settlement patterns in the U.S. than in Europe), but not interested enough to make it.
In particular, I see a lot of things that look like the fragments of triangular lattices, with the triangles involved being roughly equilateral. (This seems most visible in Poland.) Now, if you imagine cities as circles -- which isn't a horrible approximation, because you don't expect to see two cities too close to each other -- and you pack them in as tightly as possible on the plane, you should see exactly a hexagonal lattice. Of course, cities aren't all equivalent, and you can't just slide them around arbitrarily...
This makes me wonder -- are there any cities which have street plans that are triangular lattices?
I suspect you'd get similar-looking maps at smaller scales (that is, if you reduced both the population threshold and the distance threshold). I've heard that the number of cities of population at least N scales like 1/N. To get a similar-looking map with smaller cities you'd want the number of cities "near" each one to be the same. So, for example, say you looked at cities of size 25,000 or greater; there should be four times as many of those "near" a given point as cities of size 100,000 or greater. If we then shorten the distances over which we're willing to draw lines by a factor of two, then we should get the same average degree in the resulting graph. What I'm saying is that I suspect the graph of cities of size 25,000 or greater within 75 km of each other has similar qualitative features. (Of course, at some point you run into the problem of how to define distinct "cities", by which I mean centers of population; should distinct neighborhoods of the same legal municipality be counted differently?)
I'd be interested to see a similar map for the United States (since I'm more familiar with the settlement patterns in the U.S. than in Europe), but not interested enough to make it.
In particular, I see a lot of things that look like the fragments of triangular lattices, with the triangles involved being roughly equilateral. (This seems most visible in Poland.) Now, if you imagine cities as circles -- which isn't a horrible approximation, because you don't expect to see two cities too close to each other -- and you pack them in as tightly as possible on the plane, you should see exactly a hexagonal lattice. Of course, cities aren't all equivalent, and you can't just slide them around arbitrarily...
This makes me wonder -- are there any cities which have street plans that are triangular lattices?
Fibonacci tattoo!
From Pink Haired Girl, who I apparently share several mutual friends with: a tattoo to generate the Fibonacci sequence, in Scheme.
There are faster algorithms to generated the Fibonacci sequence, but they obscure the recursive definition; her tattoo follows the simple recursive definition Fn = Fn-1 + Fn-2, which takes O(n) arithmetic operations to compute Fn. Wikipedia's article on Fibonacci numbers gives an identity
F2n+k = Fk Fn+12 + 2Fk-1Fn+1Fn + Fk-2Fn2
Now, note that we get F-2 = -1 , F-1 = 1 , F0 = 0 by running the recursion backwards. Letting k=1 gives
F2n+1 = (2Fn+1 - Fn) Fn
and letting k=2 gives
F2n+2 = Fn+12 + 2Fn+1Fn.
Letting m+1 = n, we get
F2m = Fm2 + 2FmFm-1
= Fm(Fm + 2Fm-1)
= Fm(Fm+1 + Fm-1).
Thus we can express Fn in terms of Fibonacci numbers of indices around n/2, and thus find Fibonacci numbers in logarithmic time... but that obscures the central fact about these numbers, which is the recursion that produces them. (There are probably other ways to arrange those identities that give more efficient algorithms than the one I'm alluding to, but I didn't try that hard.) There are also identities for Fn in terms of Fibonacci numbers of index around n/3, for example -- see Wikipedia -- which could be even faster.
There are faster algorithms to generated the Fibonacci sequence, but they obscure the recursive definition; her tattoo follows the simple recursive definition Fn = Fn-1 + Fn-2, which takes O(n) arithmetic operations to compute Fn. Wikipedia's article on Fibonacci numbers gives an identity
F2n+k = Fk Fn+12 + 2Fk-1Fn+1Fn + Fk-2Fn2
Now, note that we get F-2 = -1 , F-1 = 1 , F0 = 0 by running the recursion backwards. Letting k=1 gives
F2n+1 = (2Fn+1 - Fn) Fn
and letting k=2 gives
F2n+2 = Fn+12 + 2Fn+1Fn.
Letting m+1 = n, we get
F2m = Fm2 + 2FmFm-1
= Fm(Fm + 2Fm-1)
= Fm(Fm+1 + Fm-1).
Thus we can express Fn in terms of Fibonacci numbers of indices around n/2, and thus find Fibonacci numbers in logarithmic time... but that obscures the central fact about these numbers, which is the recursion that produces them. (There are probably other ways to arrange those identities that give more efficient algorithms than the one I'm alluding to, but I didn't try that hard.) There are also identities for Fn in terms of Fibonacci numbers of index around n/3, for example -- see Wikipedia -- which could be even faster.
The axiom of choice and the wisdom of crowds
(The two things in the title aren't related, but this post is about both of them.)
I just came across this essay on the Axiom of Choice by Eric Schecter. I'm not a logician, so I don't have any deep commentary. It contains two amusing quotes, though, one by Bertrand Russell:
(the idea here is that you can distinguish between left shoes and right shoes, but not left and right socks), and
where all three of those statements are equivalent.
A classmate of mine, though, just expressed the opinion that logicians "should" be showing that certain problems are "too hard" for current mathematics, by, say, showing that they're equivalent to problems we already know we have no hope of solving with the current state of affairs.
I wonder to what extent it would be possible to use the "wisdom of crowds" to quantify how difficult a given problem is. If one could poll people in a given area and have them say "I think Problem X is harder than Problem Y", you could probably use that information to rank open problems within a given subfield; people's intuitions wouldn't be perfect but if you had enough of them you'd come up with some interesting results. The hard part would be collating the results from various parts of mathematics; if I know Algebra Problem X is harder than Algebra Problem Y, and Analysis Problem Z is harder than Analysis Problem W, then how do you compare X and Z? You'd need links between the various areas to know how to put all that information together, and to get all areas on the same scale. Perhaps what we need is more people offering financial bounties for problems, because hard currency is a scale that everyone can agree on the value of. (Too bad Erdos is dead, and the Clay Foundation only has million-dollar prizes. If some entity were giving out a prize of $1000 for problem A and $2000 for problem B, then that would let me know that the community as a whole considers problem B to be harder than problem A. i don't know about "twice as hard", though; what does that mean?)
The usual "prediction market" methods wouldn't immediately seem to work; people could bet on which problems they expect will be solved first, but just because a problem is solved later doesn't mean it's harder.
Some other interesting things on Schechter's web page include Common Errors in College Math (which I probably should point my students to, come to think of it) and an explicit statement of the "cubic formula", i. e. a solution by radicals to the equation ax3 + bx2 + cx + d. (This formula is a couple lines long significantly longer than the quadratic formula, of course, which is why you don't have it memorized. The "quartic formula" is, if I remember correctly, a couple pages, although it contains a large number of the same subexpressions; in both cases there is a shorter way to write down the formula than the actual formula itself.)
I just came across this essay on the Axiom of Choice by Eric Schecter. I'm not a logician, so I don't have any deep commentary. It contains two amusing quotes, though, one by Bertrand Russell:
"To choose one sock from each of infinitely many pairs of socks requires the Axiom of Choice, but for shoes the Axiom is not needed."
(the idea here is that you can distinguish between left shoes and right shoes, but not left and right socks), and
The Axiom of Choice is obviously true; the Well Ordering Principle is obviously false; and who can tell about Zorn's Lemma?
where all three of those statements are equivalent.
A classmate of mine, though, just expressed the opinion that logicians "should" be showing that certain problems are "too hard" for current mathematics, by, say, showing that they're equivalent to problems we already know we have no hope of solving with the current state of affairs.
I wonder to what extent it would be possible to use the "wisdom of crowds" to quantify how difficult a given problem is. If one could poll people in a given area and have them say "I think Problem X is harder than Problem Y", you could probably use that information to rank open problems within a given subfield; people's intuitions wouldn't be perfect but if you had enough of them you'd come up with some interesting results. The hard part would be collating the results from various parts of mathematics; if I know Algebra Problem X is harder than Algebra Problem Y, and Analysis Problem Z is harder than Analysis Problem W, then how do you compare X and Z? You'd need links between the various areas to know how to put all that information together, and to get all areas on the same scale. Perhaps what we need is more people offering financial bounties for problems, because hard currency is a scale that everyone can agree on the value of. (Too bad Erdos is dead, and the Clay Foundation only has million-dollar prizes. If some entity were giving out a prize of $1000 for problem A and $2000 for problem B, then that would let me know that the community as a whole considers problem B to be harder than problem A. i don't know about "twice as hard", though; what does that mean?)
The usual "prediction market" methods wouldn't immediately seem to work; people could bet on which problems they expect will be solved first, but just because a problem is solved later doesn't mean it's harder.
Some other interesting things on Schechter's web page include Common Errors in College Math (which I probably should point my students to, come to think of it) and an explicit statement of the "cubic formula", i. e. a solution by radicals to the equation ax3 + bx2 + cx + d. (This formula is a couple lines long significantly longer than the quadratic formula, of course, which is why you don't have it memorized. The "quartic formula" is, if I remember correctly, a couple pages, although it contains a large number of the same subexpressions; in both cases there is a shorter way to write down the formula than the actual formula itself.)
18 October 2007
will the best team win?
Will the Best Team Win? Maybe -- by Alan Schwarz, in the October 14 New York Times.
Basically, there's a very good chance that the best team does not win in the Major League Baseball playoffs.
On a related note, often before a playoff series sportswriters will make predictions of the form "[team] in [number of games]". In a best-of-(2n-1) series, if we know the probability p that a team (let's call them the Phillies) wins a game against some other team (as I've done before, let's call them the Qankees), then we can compute the probability that they'll win the series. Let q = 1 - p be the probability that the Qankees win a single game. Then the probability that the Phillies win in k games, where k is between n and 2n-1, can be obtained as follows. There are
arrangements of wins and losses in a k-game series that allow the Phillies to win in k games -- they must win n-1 our of the first k-1 games. Each of these occurs with probability pnqk-n. So the probability that the Phillies win in k games is

and likewise the probability that the Qankees win in k games is
Q_{n,k}(p) = {k-1 \choose n-1} (1-p)^n p^{k-n}

(The number n is the number of games needed to win the series; in a best-of-seven series, n = 4.) For example, P4,6(.6) is the probability that the Phillies win a best-of-seven series in six games, given that they have a .6 probabilty of winning each games; it's
.
A prediction of the winner of a series and the number of games they win in amounts to a prediction of p. If we assume that the predictor simply predicts the most likely outcome of the series given what they believe p to be, then we want to find the largest of
.
To do this, we start by finding the ratio
; this is k(1-p)/(k-n). If this is greater than 1, it means a win in k+1 games is more likely than a win in n games; it becomes greater than 1 at p=(n-1)/k. So, for example, as we decrease p, a five-game win in a best-of-seven series becomes more likely than a sweep when p = 3/4 (we have n=3 and k=4); a six-game win in a best-of-seven series becomes more likely than a five-game win when p = 3/5; a seven-game win becomes more likely than a six-game win when p = 3/6 = 1/2.
And in general, as we decrease p, a win in (2n-1) games becomes more likely than a win in (2n-2) games when p = (n-1)/(2n-2) = 1/2.
But when p dips below 1/2, that's also when losses should become more likely than wins!
In particular, the ratio between the Phillies' probability of winning in 2n-1 games and that of losing in 2n-1 games is Pn,2n-1(p)/Qn,2n-1(p) = p/(1-p); if p < 1-p then winning in 2n-1 games can't be the most common outcome. At best, winning in 2n-1 games is as probable as winning in 2n-2 games... when p = 1/2, and at that moment losing in 2n-1 and losing in 2n-2 have the same probability.
Concretely, in a best-of-seven series you should predict that the Phillies:
If p = 1/2, then the probability of a win in six, win in seven, loss in six, or loss in seven are all the same, 5/32 each.
The point here is that either type of seven-game series is never the sole most likely outcome in this model (although it may be in reality, because games aren't independent -- home-field advantage, who's starting that day, and so on enter into the picture), and that it almost never makes sense to predict a sweep (playoff teams will be evenly enough matched that the worse one should be able to beat the better one more than one-quarter of the time).
Yet four- and seven-game series happen. I'm not saying that these are ridiculously rare events, just that it doesn't make sense to predict them a priori. It's a bit surprising, though -- if you actually played all seven games, 4-3 would be the most common outcome for series than are nearly evenly matched -- but enough of those come from the team already down 4-2 winning the last game that you don't see that in the best of seven format.
Realistically, though, a prediction of "[team] in 7" is just a sportswriter's way of signaling "I think this team is slightly better than its opponent", which is all it should be taken as.
Basically, there's a very good chance that the best team does not win in the Major League Baseball playoffs.
On a related note, often before a playoff series sportswriters will make predictions of the form "[team] in [number of games]". In a best-of-(2n-1) series, if we know the probability p that a team (let's call them the Phillies) wins a game against some other team (as I've done before, let's call them the Qankees), then we can compute the probability that they'll win the series. Let q = 1 - p be the probability that the Qankees win a single game. Then the probability that the Phillies win in k games, where k is between n and 2n-1, can be obtained as follows. There are
and likewise the probability that the Qankees win in k games is
Q_{n,k}(p) = {k-1 \choose n-1} (1-p)^n p^{k-n}
(The number n is the number of games needed to win the series; in a best-of-seven series, n = 4.) For example, P4,6(.6) is the probability that the Phillies win a best-of-seven series in six games, given that they have a .6 probabilty of winning each games; it's
A prediction of the winner of a series and the number of games they win in amounts to a prediction of p. If we assume that the predictor simply predicts the most likely outcome of the series given what they believe p to be, then we want to find the largest of
To do this, we start by finding the ratio
And in general, as we decrease p, a win in (2n-1) games becomes more likely than a win in (2n-2) games when p = (n-1)/(2n-2) = 1/2.
But when p dips below 1/2, that's also when losses should become more likely than wins!
In particular, the ratio between the Phillies' probability of winning in 2n-1 games and that of losing in 2n-1 games is Pn,2n-1(p)/Qn,2n-1(p) = p/(1-p); if p < 1-p then winning in 2n-1 games can't be the most common outcome. At best, winning in 2n-1 games is as probable as winning in 2n-2 games... when p = 1/2, and at that moment losing in 2n-1 and losing in 2n-2 have the same probability.
Concretely, in a best-of-seven series you should predict that the Phillies:
- win in four, if p > 3/4;
- win in five, if 3/5 < p < 3/4;
- win in six, if 1/2 < p < 3/5;
- lose in six, five, or four in cases symmetric to the three above.
If p = 1/2, then the probability of a win in six, win in seven, loss in six, or loss in seven are all the same, 5/32 each.
The point here is that either type of seven-game series is never the sole most likely outcome in this model (although it may be in reality, because games aren't independent -- home-field advantage, who's starting that day, and so on enter into the picture), and that it almost never makes sense to predict a sweep (playoff teams will be evenly enough matched that the worse one should be able to beat the better one more than one-quarter of the time).
Yet four- and seven-game series happen. I'm not saying that these are ridiculously rare events, just that it doesn't make sense to predict them a priori. It's a bit surprising, though -- if you actually played all seven games, 4-3 would be the most common outcome for series than are nearly evenly matched -- but enough of those come from the team already down 4-2 winning the last game that you don't see that in the best of seven format.
Realistically, though, a prediction of "[team] in 7" is just a sportswriter's way of signaling "I think this team is slightly better than its opponent", which is all it should be taken as.
Labels:
baseball,
journalism,
probability
Un-Austria, and some thoughts on population density
Strange Maps brings us a map of Un-Austria, originally due to Babak Fakhamzadeh. The purpose of this map is to illustrate certain statistics about the Austrian people by associating those groups of people with parts of a map; seven percent of Austrians are "not proud of their nationality" and so an area roughly one-fourteenth of the size of Austria is marked as such.
It's an interesting concept, but I see problems with it. First, it gives the impression that people who are in one of the marked groups aren't in any of the others; this isn't true. (In the most extreme case, shouldn't "members of voluntary environmental organizations" be contained entirely within "members of voluntary organizations"?) Second, although I have no concept of the geography of Austria, I'm sure there are parts of it that are more populated than other parts, and I wonder if the map creates false impressions for those people who actually do live in Austria. (For an American example, New Jersey makes up one part in 434 of the land of the United States. Yet one in thirty-four Americans live in New Jersey. I am aware of this, so if someone associated the land making up New Jersey with some one-in-434 subset of Americans, I might get confused. On the other extreme, Carbon County, Wyoming has about the same area, but only about one in twenty thousand Americans lives there. My point is that most people living in a given country probably have some rough idea of how population density in the country is spread out; I suspect that the way they would "intuitively" interpret a map such as this one takes this into account, but not perfectly. (If I had to guess, the population we intuitively perceive to be in a region of a map such as this one is proportional to the integral of the square root of population density, although I just made that up; it somehow seems appealing that we would perceive an area that's actually four times as dense as another to be two times as dense, because that seems to point to some sort of idea of "linear density". Please ignore the fact that the units here don't work.)
It's an interesting concept, but I see problems with it. First, it gives the impression that people who are in one of the marked groups aren't in any of the others; this isn't true. (In the most extreme case, shouldn't "members of voluntary environmental organizations" be contained entirely within "members of voluntary organizations"?) Second, although I have no concept of the geography of Austria, I'm sure there are parts of it that are more populated than other parts, and I wonder if the map creates false impressions for those people who actually do live in Austria. (For an American example, New Jersey makes up one part in 434 of the land of the United States. Yet one in thirty-four Americans live in New Jersey. I am aware of this, so if someone associated the land making up New Jersey with some one-in-434 subset of Americans, I might get confused. On the other extreme, Carbon County, Wyoming has about the same area, but only about one in twenty thousand Americans lives there. My point is that most people living in a given country probably have some rough idea of how population density in the country is spread out; I suspect that the way they would "intuitively" interpret a map such as this one takes this into account, but not perfectly. (If I had to guess, the population we intuitively perceive to be in a region of a map such as this one is proportional to the integral of the square root of population density, although I just made that up; it somehow seems appealing that we would perceive an area that's actually four times as dense as another to be two times as dense, because that seems to point to some sort of idea of "linear density". Please ignore the fact that the units here don't work.)
17 October 2007
You are in a maze of twisty little equations, all alike.
A long calculation is like playing a computer game.
This is actually somewhat true, for the sort of computer games where you realize that you've gone down the wrong path and have to backtrack. In both cases you're trying to find your way through a maze of twisty little passages, all alike, from the beginning of the maze (the beginning of the game, or the state of having no knowledge) to the end of the maze (the end of the game, or the result of the computation). There are in both cases a small number of ways to succeed (to successfully reach the end of the game or the result of the computation), but usually more than one. (In the case of computation, "how many ways" there are to do a particular computation depends very strongly on what sorts of computations you consider to be "the same".) But there are a much larger number of ways to fail; the problem is to find a path to "success" in some reasonable amount of time.
Neither breadth-first search (where one does all the possible one-step computations, then all the possible two-step computations, and so on) nor depth-first search (where one follows a computation to its logical end, and only then tries something else) are really practical in most cases; it feels like a lot of learning "how to do mathematics" is learning when to give up on one particular approach to a problem and try something completely different. And a large part of that is to get rid of the qualifier "all alike" in my title. If equations look "all alike", one has no way of knowing what to do; mathematical education is as much about developing a facility for knowing what sort of mathematical expressions are amenable to what sort of tools as it is about learning how to use those tools.
This is actually somewhat true, for the sort of computer games where you realize that you've gone down the wrong path and have to backtrack. In both cases you're trying to find your way through a maze of twisty little passages, all alike, from the beginning of the maze (the beginning of the game, or the state of having no knowledge) to the end of the maze (the end of the game, or the result of the computation). There are in both cases a small number of ways to succeed (to successfully reach the end of the game or the result of the computation), but usually more than one. (In the case of computation, "how many ways" there are to do a particular computation depends very strongly on what sorts of computations you consider to be "the same".) But there are a much larger number of ways to fail; the problem is to find a path to "success" in some reasonable amount of time.
Neither breadth-first search (where one does all the possible one-step computations, then all the possible two-step computations, and so on) nor depth-first search (where one follows a computation to its logical end, and only then tries something else) are really practical in most cases; it feels like a lot of learning "how to do mathematics" is learning when to give up on one particular approach to a problem and try something completely different. And a large part of that is to get rid of the qualifier "all alike" in my title. If equations look "all alike", one has no way of knowing what to do; mathematical education is as much about developing a facility for knowing what sort of mathematical expressions are amenable to what sort of tools as it is about learning how to use those tools.
Twin primes
Are There Infinitely Many Primes, by D. A. Goldston, via Casting Out Nines. This was based on a talk given to motivated high school students, so you don't need much background to understand it.
Since this is at least nominally a probability blog, I draw your attention to the following: Call p and p+2 a twin prime pair if they're both prime. Then it's conjectured that the number of twin prime pairs with smaller member less than or equal to x, denoted π2(x), is asymptotically
![2 \left[ \prod_{p > 2} \left( 1 - {1 \over (p-1)^2} \right) \right] \int_2^x {1 \over (\log t)^2} \; dt](http://snappy.at.org/~cola/tex2img/image.php?id=83b2e50663edc23998da40ad930f93f4)
and there is probabilistic reasoning that leads to this, which is a version of the Twin Prime Conjecture; basically, the "probability" that a number near t is prime is 1/(log t), and so the probability that two numbers near t are both prime should be the square of this. So the number of twin prime pairs under x should be just that integral, except that we have to take some divisiblity information into account, which is what the product out front does. If a number n is odd, then n+2 is twice as likely to be prime as it would ``otherwise" be; if n is not divisible by some odd prime p, then n+2 has ``probability" (p-2)/(p-1) of being not divisible by p (we must have that n isn't two less than a multiple of p), which is 1 - 1/(p-1)2 times the "naive" probability that n+2 isn't divisible by p, namely (p-1)/p.
I rather like results of this form, where we can guess properties of the primes by assuming that the probability that a given integer is divisible by a prime p is 1/p and these events are independent for different primes p; I've talked about this before, when I sketched a heuristic argument for the Goldbach conjecture. Sure,.the primes have no business behaving randomly. But what reason is there that they should behave nonrandomly?
Since this is at least nominally a probability blog, I draw your attention to the following: Call p and p+2 a twin prime pair if they're both prime. Then it's conjectured that the number of twin prime pairs with smaller member less than or equal to x, denoted π2(x), is asymptotically
and there is probabilistic reasoning that leads to this, which is a version of the Twin Prime Conjecture; basically, the "probability" that a number near t is prime is 1/(log t), and so the probability that two numbers near t are both prime should be the square of this. So the number of twin prime pairs under x should be just that integral, except that we have to take some divisiblity information into account, which is what the product out front does. If a number n is odd, then n+2 is twice as likely to be prime as it would ``otherwise" be; if n is not divisible by some odd prime p, then n+2 has ``probability" (p-2)/(p-1) of being not divisible by p (we must have that n isn't two less than a multiple of p), which is 1 - 1/(p-1)2 times the "naive" probability that n+2 isn't divisible by p, namely (p-1)/p.
I rather like results of this form, where we can guess properties of the primes by assuming that the probability that a given integer is divisible by a prime p is 1/p and these events are independent for different primes p; I've talked about this before, when I sketched a heuristic argument for the Goldbach conjecture. Sure,.the primes have no business behaving randomly. But what reason is there that they should behave nonrandomly?
Labels:
number theory,
primes,
probabilistic number theory
Extraordinary claims require extraordinary evidence
Here's a book that's available online that I learned about recently: Information Theory, Inference, and Learning Algorithms by David MacKay. I learned about it from Steve at Information Processing; he learned about it from Nerd Wisdom.
I rather like the trend of books being available online. For one thing, it means that I do not have to carry books back and forth between my office and my apartment. (This is unfortunately not true for the calculus book I'm teaching from, which is about 1200 pages; fortunately my officemate has a copy of the book he doesn't use, so I use his copy when I'm on campus and my copy when I'm at home.)
This book seems to be a good introduction to information theory, machine learning, Bayesian inference, etc.; I have not had the chance to read any part of it thoroughly but I have randomly sampled from it and it seems quite interesting.
A few days ago I wrote about the question of finding the next element in the sequence "1, 2, 5, ..." and some of my readers complained that this problem is not well-posed. MacKay, in his chapter on Occam's razor, gives a similar example: what's the next number in the sequence "-1, 3, 7, 11"? You probably say 15. Consider two possible theories -- the sequence is an arithmetic sequence, or it is given by a cubic function of the form cx3 + dx2 + e, where c, d, and e are rational numbers. (The omission of the linear term is deliberate; otherwise the first theory would be a case of the second one.) The first one turns out to be much more likely, for the simple reason that there are less parameters to be tweaked! Let's say it's equally likely that the underlying function is linear or cubic; there are just a lot more possible cubic functions, so each particular cubic function is less likely. (For the details, see p. 345 of MacKay.)
By logic such as this, the most likely next element in that sequence is difficult to say, though... should we prefer 10, because then the nth term is given by the simple explicit formula (n-1)2+1? Or should we prefer 12 or 13, which are both given by simple linear recurrences? My instinct is that it depends on where the problem comes from, since the various possible next terms arise from different sorts of combinatorial structures, and in this sense the problem was ill-posed. In reality we wouldn't start out by assuming that all possible theories have equal probability; for one thing, there are infinitely many of them! The "simple" theories (the sequence has an explicit formula which is a polynomial, or a linear recursion with constant coefficients, or something like that) have higher prior probabilities... but given enough evidence, any complicated theory that starts out with nonzero probability of being true could turn out to be true. Extraordinary claims, though -- those with low prior probability -- require extraordinary evidence. There's a nice toy example a little further on in MacKay (p. 351) showing that if you see a piece of a box sticking out to the left of a tree, and a piece of a box of the same height and color sticking out to the right, it is nearly certain that those are actually pieces of the same box, and not two different boxes.
(What I've sketched in the previous paragraph is a bit of a lie, though; mathematical reasoning is rarely anywhere near this fuzzy. One could almost argue that it never is, because if we are so unsure about whether a result is true or not then we just don't call it mathematics.)
PS: I realized that I had a bunch of things I wanted to write about that were kind of piling up, so I might be digging into stuff that made its way through the blogosphere a month or so ago. I would feel no need to apologize for this except that blogs move so quickly.
PPS: The title of this post is due to Carl Sagan. I did not realize that when I titled the post, though, but I knew I'd heard the phrase somewhere before and Google tells me it's his.
I rather like the trend of books being available online. For one thing, it means that I do not have to carry books back and forth between my office and my apartment. (This is unfortunately not true for the calculus book I'm teaching from, which is about 1200 pages; fortunately my officemate has a copy of the book he doesn't use, so I use his copy when I'm on campus and my copy when I'm at home.)
This book seems to be a good introduction to information theory, machine learning, Bayesian inference, etc.; I have not had the chance to read any part of it thoroughly but I have randomly sampled from it and it seems quite interesting.
A few days ago I wrote about the question of finding the next element in the sequence "1, 2, 5, ..." and some of my readers complained that this problem is not well-posed. MacKay, in his chapter on Occam's razor, gives a similar example: what's the next number in the sequence "-1, 3, 7, 11"? You probably say 15. Consider two possible theories -- the sequence is an arithmetic sequence, or it is given by a cubic function of the form cx3 + dx2 + e, where c, d, and e are rational numbers. (The omission of the linear term is deliberate; otherwise the first theory would be a case of the second one.) The first one turns out to be much more likely, for the simple reason that there are less parameters to be tweaked! Let's say it's equally likely that the underlying function is linear or cubic; there are just a lot more possible cubic functions, so each particular cubic function is less likely. (For the details, see p. 345 of MacKay.)
By logic such as this, the most likely next element in that sequence is difficult to say, though... should we prefer 10, because then the nth term is given by the simple explicit formula (n-1)2+1? Or should we prefer 12 or 13, which are both given by simple linear recurrences? My instinct is that it depends on where the problem comes from, since the various possible next terms arise from different sorts of combinatorial structures, and in this sense the problem was ill-posed. In reality we wouldn't start out by assuming that all possible theories have equal probability; for one thing, there are infinitely many of them! The "simple" theories (the sequence has an explicit formula which is a polynomial, or a linear recursion with constant coefficients, or something like that) have higher prior probabilities... but given enough evidence, any complicated theory that starts out with nonzero probability of being true could turn out to be true. Extraordinary claims, though -- those with low prior probability -- require extraordinary evidence. There's a nice toy example a little further on in MacKay (p. 351) showing that if you see a piece of a box sticking out to the left of a tree, and a piece of a box of the same height and color sticking out to the right, it is nearly certain that those are actually pieces of the same box, and not two different boxes.
(What I've sketched in the previous paragraph is a bit of a lie, though; mathematical reasoning is rarely anywhere near this fuzzy. One could almost argue that it never is, because if we are so unsure about whether a result is true or not then we just don't call it mathematics.)
PS: I realized that I had a bunch of things I wanted to write about that were kind of piling up, so I might be digging into stuff that made its way through the blogosphere a month or so ago. I would feel no need to apologize for this except that blogs move so quickly.
PPS: The title of this post is due to Carl Sagan. I did not realize that when I titled the post, though, but I knew I'd heard the phrase somewhere before and Google tells me it's his.
Labels:
information theory,
MacKay,
probability
16 October 2007
Polynomials with interesting sets of zeros
Polynomials with interesting sets of zeros, from Richard Stanley.
There are certain polynomials which are naturally defined in terms of combinatorial objects, which have interesting sets of zeros. For example, consider the polynomials
F1(x) = x
F2(x) = x + x2
F3(x) = x + x2 + x3
F4(x) = x + 2x2 + x3 + x4
F5(x) = x + 2x2 + 2x3 + x4 + x5
where the xk coefficient of Fn(x) is the number of partitions of n into k parts, that is, the number of ways to write n as a sum of k integers where order doesn't matter. For example, the partitions of 5 are
5, 4+1, 3+2, 3+1+1, 2+2+1, 2+1+1+1, 1+1+1+1+1
which have 1, 2, 2, 3, 3, 4, and 5 parts respectively; that's where F5 comes from. It turns out that if you take Fn for large n and plot its zeros in the complex plane, you get a nice-looking figure; see this PDF file. This is true of other combinatorially defined families of polynomials as well. The zeroes of this particular set of polynomials are given in some recent work of Robert Boyer and William Goh. (Boyer gave a talk today at Temple which I went to, hence why this is on my mind lately; I'd actually heard about this phenomenon when I took a class from Stanley as an undergrad.)
At least in the case of the partitions, it seems noteworthy that if we sum yn Fn(x) over all n, getting the generating function of all partitions by size and number of parts, we have something nice, namely

and a lot of the other sets of polynomials described by Stanley can probably be described similarly. (Boyer and Goh allude to this at the end of their expository paper.) I don't know enough analytic number theory to say something intelligent about this, though; the proof that the pictures keep looking "like that" (in the case of partition) apparently uses some very heavy machinery.
There are certain polynomials which are naturally defined in terms of combinatorial objects, which have interesting sets of zeros. For example, consider the polynomials
F1(x) = x
F2(x) = x + x2
F3(x) = x + x2 + x3
F4(x) = x + 2x2 + x3 + x4
F5(x) = x + 2x2 + 2x3 + x4 + x5
where the xk coefficient of Fn(x) is the number of partitions of n into k parts, that is, the number of ways to write n as a sum of k integers where order doesn't matter. For example, the partitions of 5 are
5, 4+1, 3+2, 3+1+1, 2+2+1, 2+1+1+1, 1+1+1+1+1
which have 1, 2, 2, 3, 3, 4, and 5 parts respectively; that's where F5 comes from. It turns out that if you take Fn for large n and plot its zeros in the complex plane, you get a nice-looking figure; see this PDF file. This is true of other combinatorially defined families of polynomials as well. The zeroes of this particular set of polynomials are given in some recent work of Robert Boyer and William Goh. (Boyer gave a talk today at Temple which I went to, hence why this is on my mind lately; I'd actually heard about this phenomenon when I took a class from Stanley as an undergrad.)
At least in the case of the partitions, it seems noteworthy that if we sum yn Fn(x) over all n, getting the generating function of all partitions by size and number of parts, we have something nice, namely
and a lot of the other sets of polynomials described by Stanley can probably be described similarly. (Boyer and Goh allude to this at the end of their expository paper.) I don't know enough analytic number theory to say something intelligent about this, though; the proof that the pictures keep looking "like that" (in the case of partition) apparently uses some very heavy machinery.
Labels:
Boyer,
combinatorics,
generating functions,
partitions,
Stanley
Look Around You (BBC spoof)
At YouTube: Look Around You - 1 - Maths, an episode of the BBC spoof science program Look Around You. (Why don't we have spoof science programs in this country? As it is, I can't tell which parts I found funny because they're mathematically silly and which parts I found silly because they're British.)
Who scribbles meaningless mathematical formulas on the wall, like the kids at the beginning? It often bothers me when meaningless mathematical notation is used for "effect", because it doesn't even look like something a clueless student might scribble; it just takes a bunch of symbols and throws them out there, and I try to understand them but can't. I am incapable of looking at mathematical symbols and not trying to understand them. I suspect this is something like the Stroop effect, which is the experimentally observed effect that it's easier for people to read words which name colors when the ink color matches the colors named by the words than when they disagree. (Incidentally, in this article from the New Republic, excerpted from his new book The Stuff of Thought, Steven Pinker, reporting on an experiment by Don MacKay, says we have even more trouble doing this task when the words are not color words but obscene words; essentially, people can't ignore obscenity. I can't ignore mathematics.)
and I wonder if "Chapter 3.1415926" of the textbook actually exists, since I don't know the context of this. It reminds me of the version number of TeX, which gets one digit longer with each new version; Knuth has requested that after he dies the version number be fixed at π.
There are two people for whom the largest number they could think of was "a hundred thousand" and "nine hundred ninety-nine thousand", which is kind of surprising; why can't you just add one to both of those? At least someone who answered "999,999" is admitting that they don't know the word for the next number. (The program makes light of this by suggesting a very large number as the very largest number... and then adding one to it.)
I also rather liked the following description of a circle: "A uniformly curved line that somehow joins up with itself that science has yet to find a name for. Can you think of a name for it? If you can, the Royal Mathematics Society would like to hear from you, because they hold a competition each year to find a name for this figure." Ah, if only it were so easy. Naming things isn't hard. (Giving them good names, on the other hand, is quite tricky, especially if you restrict yourself to words that are already words existing in natural languages; I want to be sure that the words I pick to name something don't have the wrong connotations. Sure, I can tell people to ignore them, but it's not so easy to actually do so!)
Also, this video is about "maths", not "math". Are other Americans bothered by this, too? I've never been entirely sure what to make of the fact that across the pond, my field is plural.
Who scribbles meaningless mathematical formulas on the wall, like the kids at the beginning? It often bothers me when meaningless mathematical notation is used for "effect", because it doesn't even look like something a clueless student might scribble; it just takes a bunch of symbols and throws them out there, and I try to understand them but can't. I am incapable of looking at mathematical symbols and not trying to understand them. I suspect this is something like the Stroop effect, which is the experimentally observed effect that it's easier for people to read words which name colors when the ink color matches the colors named by the words than when they disagree. (Incidentally, in this article from the New Republic, excerpted from his new book The Stuff of Thought, Steven Pinker, reporting on an experiment by Don MacKay, says we have even more trouble doing this task when the words are not color words but obscene words; essentially, people can't ignore obscenity. I can't ignore mathematics.)
and I wonder if "Chapter 3.1415926" of the textbook actually exists, since I don't know the context of this. It reminds me of the version number of TeX, which gets one digit longer with each new version; Knuth has requested that after he dies the version number be fixed at π.
There are two people for whom the largest number they could think of was "a hundred thousand" and "nine hundred ninety-nine thousand", which is kind of surprising; why can't you just add one to both of those? At least someone who answered "999,999" is admitting that they don't know the word for the next number. (The program makes light of this by suggesting a very large number as the very largest number... and then adding one to it.)
I also rather liked the following description of a circle: "A uniformly curved line that somehow joins up with itself that science has yet to find a name for. Can you think of a name for it? If you can, the Royal Mathematics Society would like to hear from you, because they hold a competition each year to find a name for this figure." Ah, if only it were so easy. Naming things isn't hard. (Giving them good names, on the other hand, is quite tricky, especially if you restrict yourself to words that are already words existing in natural languages; I want to be sure that the words I pick to name something don't have the wrong connotations. Sure, I can tell people to ignore them, but it's not so easy to actually do so!)
Also, this video is about "maths", not "math". Are other Americans bothered by this, too? I've never been entirely sure what to make of the fact that across the pond, my field is plural.
15 October 2007
The spinning woman
A spinning dancer, from news.com, via Freakonomics and Marginal Revolution (I'm giving those sources because you might want to look at the comments there.)
There is a woman spinning; the question is, is she spinning clockwise or counterclockwise? Supposedly this has something to do with which side of the brain you use, although I'm not sure I buy it.
But my first thought was "clockwise from whose perspective? From the perspective of someone looking from above, or from below?" (To clarify what I mean by "clockwise", I mean that the dancer is turning to her own right.) I think this does say something about my brain, namely that I demand precise definitions for terms, at least for the sorts of terms that can have them.
Some of the commenters at various places say that you have to look at the shadow, because the shadow doesn't agree with the dancer; as far as I can tell this isn't true, and this is not some sort of optical illusion.
The actual resolution is that you can imagine reflecting the dancer through the plane of your monitor (I almost said "the plane of the board", because usually when I'm thinking about three-dimensional pictures it's when I'm teaching calculus), which will change the direction of rotation but not the picture. So on an intellectual level I understand why both perceptions must be possible. But I can't see her going counterclockwise. Not at all.
There is a woman spinning; the question is, is she spinning clockwise or counterclockwise? Supposedly this has something to do with which side of the brain you use, although I'm not sure I buy it.
But my first thought was "clockwise from whose perspective? From the perspective of someone looking from above, or from below?" (To clarify what I mean by "clockwise", I mean that the dancer is turning to her own right.) I think this does say something about my brain, namely that I demand precise definitions for terms, at least for the sorts of terms that can have them.
Some of the commenters at various places say that you have to look at the shadow, because the shadow doesn't agree with the dancer; as far as I can tell this isn't true, and this is not some sort of optical illusion.
The actual resolution is that you can imagine reflecting the dancer through the plane of your monitor (I almost said "the plane of the board", because usually when I'm thinking about three-dimensional pictures it's when I'm teaching calculus), which will change the direction of rotation but not the picture. So on an intellectual level I understand why both perceptions must be possible. But I can't see her going counterclockwise. Not at all.
Should everyone know number theory?
From Anarchaia: Why everyone should know number theory, by Minhyong Kim.
I'm not convinced, even if "everyone" means "all mathematicians". Kim's argument seems to be that everything mathematicians ever write down is a finite collection of data and that that data can be written down as some very large number (say you enter it into a computer and consider the corresponding binary string of 0s and 1s as an integer). This reminds me of the idea of Godel numbers (which assign a number to each logical statement in a certain language), and writing down the rules that allow you to get from one Godel number which represents a true statement to another is impossibly ugly.
However, I'm really just getting this from the title, which Kim eventually admits is an exaggeration. Kim later says:
If that's true, why does everyone need to know number theory? Still, there are times when it's useful. To take a simple example, consider the central binomial coefficients,
,
or even more generally any binomial coefficient at all. The binomial coefficients have nice number-theoretic properties; for example, if you look at them modulo 2, and color the odd ones black and the even ones white, you get something that looks suspiciously like the Sierpinski triangle. But parity is fine structure; this particular result comes out of looking at how many factors of 2 are in the numerator and the denominator and subtracting them, and usually there are about the same number so one has to be careful about cancellations. A lot of the time what one really cares about is approximately how large this coefficient is. For this sort of question it's useful to use Stirling's approximation,

which allows us to determine lots of things about the sizes of things involving factorials -- for example, binomial coefficients -- without getting anywhere near close enough to them to determine their parity.
For example, it turns out that

which is the sort of information that might be useful in, say, probability; if I have some number of objects which are counted by the central binomial coefficients, and I want to know that, say, the proportion of them having some particular property is very small, this is much more valuable than knowing, say, the complete prime factorization of a binomial coefficient.
The essay does give some interesting examples, though -- the classical ones are things like doubling the cube and squaring the circle, and there are a bunch of modern examples of theorems which are in complex geometry but whose proofs require number theory. The paper builds up to a famous theorem of Matiyasevich, which says that "listable" sets of natural numbers (those for which we can write a computer program that will output all the numbers in the set, and no others) and Diophantine sets (those which are sets of solutions to polynomial equations with integer coefficients) are the same; so if you buy into the idea that all mathematics can be encoded in "things computers can do" then you essentially have to believe this number-theoretic hypothesis. And it's hard not to believe that about computers. I have sometimes told my students that if they are attempting to solve a problem, they only know a small number of things, so they could just try all of them. The mathematical community as a whole knows a large number of things, but stil a finite number, so if we wanted a computer to solve a mathematical problem we could conceivably enter it in some suitable form into a computer, along with all the facts we know, and tell the computer to "try everything". This isn't practical, though, which is why we still have mathematicians.
However, the polynomials corresponding to "simple" listable sets are not so nice; there's a polynomial of degree 25 in 26 variables all of whose positive values are primes. You can see it here. (Supposedly the degree can be reduced to 10; 26 is kind of cute, though, because it's the number of letters in our alphabet.)
Kim also gives brief outlines of how the four-color theorem could have been proven this way, and a possible proof of the Poincare conjecture via the same method (basically, it's enough to prove it for simplicial complexes, which we can enumerate, although there are some annoying problems, namely that we can't determine if a group is trivial by looking at its presentation. Maybe these reductions aren't so trivial after all.
I'm not convinced, even if "everyone" means "all mathematicians". Kim's argument seems to be that everything mathematicians ever write down is a finite collection of data and that that data can be written down as some very large number (say you enter it into a computer and consider the corresponding binary string of 0s and 1s as an integer). This reminds me of the idea of Godel numbers (which assign a number to each logical statement in a certain language), and writing down the rules that allow you to get from one Godel number which represents a true statement to another is impossibly ugly.
However, I'm really just getting this from the title, which Kim eventually admits is an exaggeration. Kim later says:
However, there is an even more essential reason why most practicing mathematicians can get by with a rather naive understanding of numbers and might be better off doing so. This is because of the extremely useful geometric picture of the real and complex numbers. Much of the time, it is perfectly reasonable to view the complex numbers as a geometric plane, and base all other constructions upon that picture, oblivious to the fine structure of our objects, pretty much as one can do plenty of classical physics without worrying about the fact that the macroscopic objects we are considering arise from the complicated interaction of elementary particles. Now, my claim is that the role of a number theorist in mathematics is exactly analogous to te role of a particle theorist in physics.
If that's true, why does everyone need to know number theory? Still, there are times when it's useful. To take a simple example, consider the central binomial coefficients,
or even more generally any binomial coefficient at all. The binomial coefficients have nice number-theoretic properties; for example, if you look at them modulo 2, and color the odd ones black and the even ones white, you get something that looks suspiciously like the Sierpinski triangle. But parity is fine structure; this particular result comes out of looking at how many factors of 2 are in the numerator and the denominator and subtracting them, and usually there are about the same number so one has to be careful about cancellations. A lot of the time what one really cares about is approximately how large this coefficient is. For this sort of question it's useful to use Stirling's approximation,
which allows us to determine lots of things about the sizes of things involving factorials -- for example, binomial coefficients -- without getting anywhere near close enough to them to determine their parity.
For example, it turns out that
which is the sort of information that might be useful in, say, probability; if I have some number of objects which are counted by the central binomial coefficients, and I want to know that, say, the proportion of them having some particular property is very small, this is much more valuable than knowing, say, the complete prime factorization of a binomial coefficient.
The essay does give some interesting examples, though -- the classical ones are things like doubling the cube and squaring the circle, and there are a bunch of modern examples of theorems which are in complex geometry but whose proofs require number theory. The paper builds up to a famous theorem of Matiyasevich, which says that "listable" sets of natural numbers (those for which we can write a computer program that will output all the numbers in the set, and no others) and Diophantine sets (those which are sets of solutions to polynomial equations with integer coefficients) are the same; so if you buy into the idea that all mathematics can be encoded in "things computers can do" then you essentially have to believe this number-theoretic hypothesis. And it's hard not to believe that about computers. I have sometimes told my students that if they are attempting to solve a problem, they only know a small number of things, so they could just try all of them. The mathematical community as a whole knows a large number of things, but stil a finite number, so if we wanted a computer to solve a mathematical problem we could conceivably enter it in some suitable form into a computer, along with all the facts we know, and tell the computer to "try everything". This isn't practical, though, which is why we still have mathematicians.
However, the polynomials corresponding to "simple" listable sets are not so nice; there's a polynomial of degree 25 in 26 variables all of whose positive values are primes. You can see it here. (Supposedly the degree can be reduced to 10; 26 is kind of cute, though, because it's the number of letters in our alphabet.)
Kim also gives brief outlines of how the four-color theorem could have been proven this way, and a possible proof of the Poincare conjecture via the same method (basically, it's enough to prove it for simplicial complexes, which we can enumerate, although there are some annoying problems, namely that we can't determine if a group is trivial by looking at its presentation. Maybe these reductions aren't so trivial after all.
14 October 2007
Two things about conics
A real life application of conics, from xkcd.
An ellipse has the property that sounds, lights, etc. emitted at one focus, and reflected off the dish, will be reflected back to the other focus.
A few weeks ago I tried to convince my students that a parabola is just an ellipse with one of the foci at infinity (so, in particular, plotting part of a curve and saying "ooh, it looks like a parabola" isn't valid because it might be an ellipse with one of the foci really far away) but I don't think they got it. Then again, it wasn't particularly important, and we're deliberately de-emphasizing things that have to do with conic sections.
And parabolas have this same property, where "emitted at one focus" is replaced with "traveling in the same direction"; that's just an example of the point-at-infinity thing.
On a related note, someone brought this problem to me -- what is the largest n such that a regular n-sided polygon can be inscribed in a non-circular ellipse?
The answer's four. A regular polygon can always be inscribed in a circle, so the vertices of a regular polygon inscribed in an ellipse all lie on the same circle.
But the maximal number of intersections of a curve of degree m and a curve of degree n is mn, by Bezout's theorem; a curve and an ellipse are both of degree 2, so the maximal number of intersections is four.
To complete the solution, I should name an ellipse in which a square can be inscribed; take x2 + 2y2 = 3, which passes through (±1, ±1).
An ellipse has the property that sounds, lights, etc. emitted at one focus, and reflected off the dish, will be reflected back to the other focus.
A few weeks ago I tried to convince my students that a parabola is just an ellipse with one of the foci at infinity (so, in particular, plotting part of a curve and saying "ooh, it looks like a parabola" isn't valid because it might be an ellipse with one of the foci really far away) but I don't think they got it. Then again, it wasn't particularly important, and we're deliberately de-emphasizing things that have to do with conic sections.
And parabolas have this same property, where "emitted at one focus" is replaced with "traveling in the same direction"; that's just an example of the point-at-infinity thing.
On a related note, someone brought this problem to me -- what is the largest n such that a regular n-sided polygon can be inscribed in a non-circular ellipse?
The answer's four. A regular polygon can always be inscribed in a circle, so the vertices of a regular polygon inscribed in an ellipse all lie on the same circle.
But the maximal number of intersections of a curve of degree m and a curve of degree n is mn, by Bezout's theorem; a curve and an ellipse are both of degree 2, so the maximal number of intersections is four.
To complete the solution, I should name an ellipse in which a square can be inscribed; take x2 + 2y2 = 3, which passes through (±1, ±1).
Correlation is not causation
From the Associated Press, via msnbc.com: Which jobs have highest rate of depression?. The article ends:
From the people at Strayer University, who sent me mail: in big letters on the front of the envelope, they inform me that
They cite the "Bureau of Labor and Statistics" [sic]; their name is actually the Bureau of Labor Statistics. Again, correlation is not causation. There have been studies showing that people who were able to get into elite universities -- but chose not to attend them -- do just as well in the work force as people who got into those universities and did attend them. Could the same sort of thing be at work between people who have been to college and people who haven't been to college but could have gotten in? I don't know. Of course, it's much easier for a potential employer to see "this person went to college" than to see "this person didn't go to college but is still just as employable as those who did"; a large part of the value of an education, especially in less technical fields where there isn't a huge amount of subject-specific material that will actually be useful on the job, is precisely this signaling effect. I feel like Paul Graham has written about this but I can't find exactly where.
(In the interests of full disclosure: they didn't send me mail. They sent the "Working Professional" who lives at my address mail.)
Just working full-time would appear to be beneficial in preventing depression. The overall rate of depression for full-time workers, 7 percent, compares with the 12.7 percent rate registered by those who are unemployed.Sure, having something to do with yourself forty hours a week probably helps ward off depression; although I don't believe I've ever suffered from anything that could be diagnosed as depression, I do feel noticeably sadder when I don't have some sort of full-time occupation. But I have a sense that the causation goes the other way here; being depressed causes people to be unemployed. Those people I have known who have suffered from clinical depression have days where they just can't get themselves to go to work; employers don't look kindly upon that sort of thing.
From the people at Strayer University, who sent me mail: in big letters on the front of the envelope, they inform me that
60.72%: The difference in salary betweeen someone with a bachelor's degree and someone with a high school diploma only.
They cite the "Bureau of Labor and Statistics" [sic]; their name is actually the Bureau of Labor Statistics. Again, correlation is not causation. There have been studies showing that people who were able to get into elite universities -- but chose not to attend them -- do just as well in the work force as people who got into those universities and did attend them. Could the same sort of thing be at work between people who have been to college and people who haven't been to college but could have gotten in? I don't know. Of course, it's much easier for a potential employer to see "this person went to college" than to see "this person didn't go to college but is still just as employable as those who did"; a large part of the value of an education, especially in less technical fields where there isn't a huge amount of subject-specific material that will actually be useful on the job, is precisely this signaling effect. I feel like Paul Graham has written about this but I can't find exactly where.
(In the interests of full disclosure: they didn't send me mail. They sent the "Working Professional" who lives at my address mail.)
13 October 2007
What comes next: 1, 2, 5, ...
What comes next in this sequence: 1, 2, 5, ...?
I know of at least seven reasonable answers: 10, 11, 12, 13, 14, or 15.
Here are explanations for each of them. Throughout, I'll denote the nth term of each sequence by an, where indexing starts at 1. (So a3 = 5, and we're looking for a4. This indexing convention is awkward for the first couple cases but agrees with the conventional indexing for Catalan and Bell numbers.)
1, 2, 5, 10, ... -- an = (n-1)2 + 1.
1, 2, 5, 11, ... --
1, 2, 5, 12, ... -- an = 2an-1 + an-2, where a1 = 1, a2 = 2. This sequence occurs in the convergents of the continued fraction for √2. The first few convergents are
1/1, 3/2, 7/5, 17/12, 41/29
and in general the nth convergent, bn/an, has an = an-1 + bn-1, bn = an + bn-1, with a1 = b1 = 1. An explicit formula for these is given by
![a_n = {1 \over 2\sqrt{2}} \left[ \left( 1+\sqrt{2} \right)^n + \left(1- \sqrt{2} \right)^n \right]](http://snappy.at.org/~cola/tex2img/image.php?id=da77c050c3c1a117587347f79caad451)
1, 2, 5, 13, ... -- alternate members of the Fibonacci sequence, which begins 1, 1, 2, 3, 5, 8, 13. If you don't like referring to another sequence, this one can be defined by the recurrence an = 3an-1 - an-2, where a1 = 1, a2 = 2. An exact formula for the nth Fibonacci number is fn = (φn - (-&phi)-n)/√5, where φ = (-1+√5)/2; then an = f2n-1.
1, 2, 5, 14, ... -- the Catalan numbers. These satisfy
. The Catalan numbers count basically everything. Okay, not everything. But in Richard Stanley's book Enumerative Combinatorics, he gives 66 sets counted by Catalan numbers; he has an ever-expanding list of them on his web site, which is currently up to 162, numbered (a), (b), (c), ..., (z), (aa), ..., (zz), (aaa), ... (zzz), (a4), ..., (z4), ..., (f7). This list may hold the prize for "longest list of things indicated by letters". The canonical example of "things counted by Catalan numbers" is binary trees, counted by number of leaves, and most of these other things can eventually be reduced to that case. The Catalan numbers satisfy the nice recurrence

which explains why they're so prevalent; they correspond to combinatorial objects that can be generated recursively, where each object of size n+1 consists of a pair of similar objects whose sizes together add up to n.
1, 2, 5, 15, ... -- the Bell numbers. These are the number of set partitions of the set {1, 2, ..., n}; that is, the number of ways to arrange n balls in some boxes. For four balls, we have the partitions
1234, 123/4, 124/3, 134/2, 234/1, 12/34, 13/24, 14/23, 12/3/4, 13/2/4, 14/2/3, 23/1/4, 24/1/3, 34/1/2, 1/2/3/4
where slashes separate the different boxes. There's no explicit formula for these, but they satisfy the recursion

which can be justified as follows. If we want to count set partitions of {1, 2, ..., n+1}, we can count them by the number of elements in the block containing n+1. The number of set partitions of {1, 2, ..., n+1} where n+1 is in a block of size j+1 is
; we can pick the j elements which are in the same block as n+1 and then make a set partition out of the remaining n-j elements. Thus

and letting k = n-j and recalling ${n \choose n-k} = {n \choose k}$ gives the desired result.
These sequences diverge from each other quickly; the next term of each sequence is 17, 21, 29, 34, 42, and 52 respectively. And the growth rates of the six sequences in question are quite different; roughly, the sequences given here grow like n2, n3/6, (1+&sqrt;(2))n, ((3+&sqrt;(5))/2)n, 4n/(π n3)1/2, and (really fast). Their generating functions also look quite different. For the first five sequences, we have ordinary generating functions which converge in some neighborhood of the origin, which are

but the last one doesn't even have a convergent ordinary generating function; its exponential generating function is exp(exp(z)-1). There's a subtle difference among the ordinary generating functions as well -- the first two have singularities at z=1, so the sequences they represent grow only as fast as powers of n, while the last three have singularities closer to the origin than z=1, so the sequences they represent grow roughly exponentially.
This is all an example of what I've heard called "the law of small numbers" -- the fact that there just aren't enough small numbers to avoid numerical coincidences like these.
I know of at least seven reasonable answers: 10, 11, 12, 13, 14, or 15.
Here are explanations for each of them. Throughout, I'll denote the nth term of each sequence by an, where indexing starts at 1. (So a3 = 5, and we're looking for a4. This indexing convention is awkward for the first couple cases but agrees with the conventional indexing for Catalan and Bell numbers.)
1, 2, 5, 10, ... -- an = (n-1)2 + 1.
1, 2, 5, 11, ... --
1, 2, 5, 12, ... -- an = 2an-1 + an-2, where a1 = 1, a2 = 2. This sequence occurs in the convergents of the continued fraction for √2. The first few convergents are
1/1, 3/2, 7/5, 17/12, 41/29
and in general the nth convergent, bn/an, has an = an-1 + bn-1, bn = an + bn-1, with a1 = b1 = 1. An explicit formula for these is given by
1, 2, 5, 13, ... -- alternate members of the Fibonacci sequence, which begins 1, 1, 2, 3, 5, 8, 13. If you don't like referring to another sequence, this one can be defined by the recurrence an = 3an-1 - an-2, where a1 = 1, a2 = 2. An exact formula for the nth Fibonacci number is fn = (φn - (-&phi)-n)/√5, where φ = (-1+√5)/2; then an = f2n-1.
1, 2, 5, 14, ... -- the Catalan numbers. These satisfy
which explains why they're so prevalent; they correspond to combinatorial objects that can be generated recursively, where each object of size n+1 consists of a pair of similar objects whose sizes together add up to n.
1, 2, 5, 15, ... -- the Bell numbers. These are the number of set partitions of the set {1, 2, ..., n}; that is, the number of ways to arrange n balls in some boxes. For four balls, we have the partitions
1234, 123/4, 124/3, 134/2, 234/1, 12/34, 13/24, 14/23, 12/3/4, 13/2/4, 14/2/3, 23/1/4, 24/1/3, 34/1/2, 1/2/3/4
where slashes separate the different boxes. There's no explicit formula for these, but they satisfy the recursion
which can be justified as follows. If we want to count set partitions of {1, 2, ..., n+1}, we can count them by the number of elements in the block containing n+1. The number of set partitions of {1, 2, ..., n+1} where n+1 is in a block of size j+1 is
and letting k = n-j and recalling ${n \choose n-k} = {n \choose k}$ gives the desired result.
These sequences diverge from each other quickly; the next term of each sequence is 17, 21, 29, 34, 42, and 52 respectively. And the growth rates of the six sequences in question are quite different; roughly, the sequences given here grow like n2, n3/6, (1+&sqrt;(2))n, ((3+&sqrt;(5))/2)n, 4n/(π n3)1/2, and (really fast). Their generating functions also look quite different. For the first five sequences, we have ordinary generating functions which converge in some neighborhood of the origin, which are
but the last one doesn't even have a convergent ordinary generating function; its exponential generating function is exp(exp(z)-1). There's a subtle difference among the ordinary generating functions as well -- the first two have singularities at z=1, so the sequences they represent grow only as fast as powers of n, while the last three have singularities closer to the origin than z=1, so the sequences they represent grow roughly exponentially.
This is all an example of what I've heard called "the law of small numbers" -- the fact that there just aren't enough small numbers to avoid numerical coincidences like these.
Labels:
combinatorics,
generating functions
12 October 2007
Huygens' principle
A weird fact I came across yesterday: Huygens' principle for the wave equation.
When you throw a pebble on the surface of some water, circular waves propagate outward from that point. However, inside the circular wavefront there will still be some sort of disturbance.
Light, in three dimensions, doesn't work the same way. If I have a pulse of light that lasts one second, then one second after that pulse stops, then the wave resulting from that pulse will be supported on the annulus from one light-second to two light-seconds from me. There will not be any sort of wave going on closer than one light-second from me.
I doubt this was known to Huygens, simply because he lived in the seventeenth century when they didn't have partial differential equations. They had waves, though, and the page I linked to above implies that Huygens knew that the leading edge of a wave travels at a constant speed, usually denoted by c; the most important example of waves is light waves, so c is now reserved for the speed of light. (I suspect he was aware of ripples in water; it's less obvious that there aren't analogous ripples in three dimensions, though, simply because there don't seem to be three-dimensional phenomena where the wave travels at a reasonable speed that one could observe. So observation is not particularly useful here, and one simply has to trust the symbols.)
Okay, this seems reasonable... things in different numbers of dimensions behave differently. But in four, six, eight, ... dimensions, solutions to the wave equation behave like the two-dimensional case (the surface of water), in five, seven, nine, ... dimensions they behave like the three-dimensional case (like light). I wouldn't have been surprised to learn that the behavior is different in low-dimensional space than in high-dimensional space. Or even that the behavior was different in "medium"-dimensional space -- it's similar to how things work in the theory of manifolds, where one- and two-dimensional manifolds are basically trivial to study, five-dimensional and higher manifolds aren't that hard, but three and four dimensions are difficult. (I'm vastly oversummarizing here; this isn't my area, and I'm just going on things I hear in the halls.) But who would think it would depend on the parity?
It seems that the dependence on the parity falls out of the series solutions to the spherically symmetrized wave equation, and this equation includes the dimension as a coefficient. I'm having a bit of trouble parsing the page I linked to (although I'm not working too hard on it). The author comments:
This isn't something I would have thought of.
When you throw a pebble on the surface of some water, circular waves propagate outward from that point. However, inside the circular wavefront there will still be some sort of disturbance.
Light, in three dimensions, doesn't work the same way. If I have a pulse of light that lasts one second, then one second after that pulse stops, then the wave resulting from that pulse will be supported on the annulus from one light-second to two light-seconds from me. There will not be any sort of wave going on closer than one light-second from me.
I doubt this was known to Huygens, simply because he lived in the seventeenth century when they didn't have partial differential equations. They had waves, though, and the page I linked to above implies that Huygens knew that the leading edge of a wave travels at a constant speed, usually denoted by c; the most important example of waves is light waves, so c is now reserved for the speed of light. (I suspect he was aware of ripples in water; it's less obvious that there aren't analogous ripples in three dimensions, though, simply because there don't seem to be three-dimensional phenomena where the wave travels at a reasonable speed that one could observe. So observation is not particularly useful here, and one simply has to trust the symbols.)
Okay, this seems reasonable... things in different numbers of dimensions behave differently. But in four, six, eight, ... dimensions, solutions to the wave equation behave like the two-dimensional case (the surface of water), in five, seven, nine, ... dimensions they behave like the three-dimensional case (like light). I wouldn't have been surprised to learn that the behavior is different in low-dimensional space than in high-dimensional space. Or even that the behavior was different in "medium"-dimensional space -- it's similar to how things work in the theory of manifolds, where one- and two-dimensional manifolds are basically trivial to study, five-dimensional and higher manifolds aren't that hard, but three and four dimensions are difficult. (I'm vastly oversummarizing here; this isn't my area, and I'm just going on things I hear in the halls.) But who would think it would depend on the parity?
It seems that the dependence on the parity falls out of the series solutions to the spherically symmetrized wave equation, and this equation includes the dimension as a coefficient. I'm having a bit of trouble parsing the page I linked to (although I'm not working too hard on it). The author comments:
It would be interesting to work out the connections between Huygens' Principle and the zeta function (whose value can only be given in simple closed form for even arguments) and the Bernoulli numbers (which are non-zero only for even indices).
This isn't something I would have thought of.
11 October 2007
Cops work fifty hours a year?
From the Philadelphia City Paper, today's issue: The First Stoned, by Chris Goldstein, is an opinion piece arguing against marijuana prohibition.
He writes:
21,000 hours is a lot of time, yes. But 21,000 divided by 400 is 52.5. These "full time" cops are only working fifty-two hours a year! (In fact, it wouldn't surprise me if Goldstein got 400 by dividing 21,000 by 52, although I can't see how that would make sense as weeks have nothing to do with the problem.)
The actual "400" there should be about 10, assuming cops work two thousand hours a year.
I support the legalization of marijuana, but I don't support people that justify it with bad math.
He writes:
In Philadelphia alone, more than 7,000 are arrested each year, mostly for simple possession. It takes about three hours to process each arrest, thereby taking police officers off the street for about 21,000 hours annually. That's like having 400 cops working full time every year just busting pot smokers.
21,000 hours is a lot of time, yes. But 21,000 divided by 400 is 52.5. These "full time" cops are only working fifty-two hours a year! (In fact, it wouldn't surprise me if Goldstein got 400 by dividing 21,000 by 52, although I can't see how that would make sense as weeks have nothing to do with the problem.)
The actual "400" there should be about 10, assuming cops work two thousand hours a year.
I support the legalization of marijuana, but I don't support people that justify it with bad math.
The "fleshball" -- a crude upper bound on human population
What bounds are there on the growth of human population?
Well, there are a lot of obvious ones. We might kill ourselves with global warming or nuclear war or a host of other things. We're still stuck on this planet. It might turn out that there's some race of killer aliens that will vaporize us when they discover we exist, perhaps because they want to demolish our planet to build a hyperspace bypass.
But let's say we survive all that. We'll still run into trouble when we can't find space for people to live.
And at some point, if we reproduce like we do now into the indefinite future, the sphere which holds all human flesh will be expanding faster than the speed of light.
Let's say you could pack all the humans that currently exist into a giant sphere. The volume of that sphere would be just Vn, where V is the volume of a human being and n is the number of humans; its radius can be obtained by solving the equation
, and we get
. With current numbers of n = 6.6 billion, V = 0.070 m3 (I'm assuming the average human being weighs 70 kilograms and has the density of water), we have x = 480 meters. That's right -- all of humanity could fit into a ball not quite a kilometer across. Kind of puts things in perspective, doesn't it?
Now let's let n be a function of time. Let's say the human population grows exponentially, so we have
. Here, n0 is the population at time zero, and r is the rate of population growth; this is currently about 0.012/year.
Then we have

and so the rate of change of the radius of the giant flesh ball, with respect to time, is

where k = 3V n_0 \over 4\pi.
The time at which the fleshball will be expanding at the speed of light is just the time when this expression for the derivative is equal to c. Setting it equal to c and solving for t gives
.
Now, we have r = 0.012/yr; that's about 4 × 10-10/sec. c is of course the speed of light, 3 × 108 m/s. And k is about 1.1 × 108 m3. Plugging in numbers gives t = 2.7 × 1011 s, or about nine thousand years from the present.
The radius of the ball at that time will be 3c/r, or 2.25 × 1018 meters; that's 237 light years. (Not surprisingly, the radius of the ball at the time that it hits the speed c doesn't depend on the volume of an individual person; as far as the ball is concerned we're just one giant blob that grows at slightly over one percent per year.)
And the number of people in the ball is

which is 6.8 × 1056.
Somehow I don't think we need to worry about this. Why? A proton weighs 1.7 × 10-27 kg; about half of a person's mass is protons, so each person contains about 2 × 1028 protons. Thus this many people would contain about 1085 protons. But the number of protons in the universe is believed to be somewhere under 1080. So we'll run out of mass in the universe well before we hit this threshhold.
Besides, the whole idea of humanity being a solid ball of flesh wasn't so appealing anyway.
Well, there are a lot of obvious ones. We might kill ourselves with global warming or nuclear war or a host of other things. We're still stuck on this planet. It might turn out that there's some race of killer aliens that will vaporize us when they discover we exist, perhaps because they want to demolish our planet to build a hyperspace bypass.
But let's say we survive all that. We'll still run into trouble when we can't find space for people to live.
And at some point, if we reproduce like we do now into the indefinite future, the sphere which holds all human flesh will be expanding faster than the speed of light.
Let's say you could pack all the humans that currently exist into a giant sphere. The volume of that sphere would be just Vn, where V is the volume of a human being and n is the number of humans; its radius can be obtained by solving the equation
Now let's let n be a function of time. Let's say the human population grows exponentially, so we have
Then we have
and so the rate of change of the radius of the giant flesh ball, with respect to time, is
where k = 3V n_0 \over 4\pi.
The time at which the fleshball will be expanding at the speed of light is just the time when this expression for the derivative is equal to c. Setting it equal to c and solving for t gives
Now, we have r = 0.012/yr; that's about 4 × 10-10/sec. c is of course the speed of light, 3 × 108 m/s. And k is about 1.1 × 108 m3. Plugging in numbers gives t = 2.7 × 1011 s, or about nine thousand years from the present.
The radius of the ball at that time will be 3c/r, or 2.25 × 1018 meters; that's 237 light years. (Not surprisingly, the radius of the ball at the time that it hits the speed c doesn't depend on the volume of an individual person; as far as the ball is concerned we're just one giant blob that grows at slightly over one percent per year.)
And the number of people in the ball is
which is 6.8 × 1056.
Somehow I don't think we need to worry about this. Why? A proton weighs 1.7 × 10-27 kg; about half of a person's mass is protons, so each person contains about 2 × 1028 protons. Thus this many people would contain about 1085 protons. But the number of protons in the universe is believed to be somewhere under 1080. So we'll run out of mass in the universe well before we hit this threshhold.
Besides, the whole idea of humanity being a solid ball of flesh wasn't so appealing anyway.
10 October 2007
The mathematician's "you"
Ertl Wins: Down With Witchcraft, by Derek Lowe at In The Pipeline, on this year's Nobel winner in Chemistry. Most chemical reactions take place in some sort of bulk liquid or gas; Ertl's work considers chemistry that occurs on the surface of a solid. The most important example is probably the Haber-Bosch process for creating ammonia from nitrogen and hydrogen.
`
Lowe writes:
Megan McArdle's response to this: "Derek Lowe has a highly exaggerated notion of my abilities."
For a moment this struck me as the way a mathematician might use "you" (although the preferred second-person pronoun in mathematical texts is "the reader", as in "This is left as an exercise for the reader." or "The reader can show that...") But it's not quite the same thing. If I'm sitting there reading some mathematics, and I come across something that "the reader" should do, I probably can do it, sitting there in my chair, if I have a large enough supply of paper and coffee. (Whether I will is a different matter; how badly do I want to understand what I'm reading?) But the reader of Lowe's post -- or the reader of a chemistry paper -- can't actually do that. The chemist has to take the author's word for it, in most cases, because it would be prohibitively expensive to check everything they read.
Vladimir Arnold, in an essay "On Teaching Mathematics", said that "Mathematics is the part of physics where experiments are cheap." This is an example of that phenomenon, although I would argue that "physics" should be replaced with some broader term, as more and more areas of knowledge are becoming mathematizable and computing power becomes cheaper and cheaper.
`
Lowe writes:
You can Haber-Bosch yourself some ammonia simply enough – just take iron powder, mix it with some drain cleaner (potassium hydroxide) and stir that up with some alumina and finely ground sand (silica). Heat it up to several hundred degrees and blow nitrogen and hydrogen across it; ammonia gas comes whiffing out the other end. Now, bacteria do this at room temperature in water, down around the roots of bean plants, but bacteria can do a lot of things we can’t do. For human civilization, this is a major achievement, because nitrogen does not want to do this reaction at all.
Megan McArdle's response to this: "Derek Lowe has a highly exaggerated notion of my abilities."
For a moment this struck me as the way a mathematician might use "you" (although the preferred second-person pronoun in mathematical texts is "the reader", as in "This is left as an exercise for the reader." or "The reader can show that...") But it's not quite the same thing. If I'm sitting there reading some mathematics, and I come across something that "the reader" should do, I probably can do it, sitting there in my chair, if I have a large enough supply of paper and coffee. (Whether I will is a different matter; how badly do I want to understand what I'm reading?) But the reader of Lowe's post -- or the reader of a chemistry paper -- can't actually do that. The chemist has to take the author's word for it, in most cases, because it would be prohibitively expensive to check everything they read.
Vladimir Arnold, in an essay "On Teaching Mathematics", said that "Mathematics is the part of physics where experiments are cheap." This is an example of that phenomenon, although I would argue that "physics" should be replaced with some broader term, as more and more areas of knowledge are becoming mathematizable and computing power becomes cheaper and cheaper.
Probabilities on the circle
At my department's tea yesterday, the following problem came up: Given three random points on a circle, what is the probability that they lie in the same semicircle?
(Incidentally, why do we insist on calling it "tea" when most of the people there drink coffee? I suspect British influence at some point.)
First, by "random" I mean "uniformly at random, with respect to angle or arc length"; a classic problem in this sort of thing is to determine the probability that the length of a random chord of a circle is larger than the length of a side of an equilateral triangle inscribed in the circle. This is Bertrand's paradox. A chord is uniquely determined by its midpoint, and the result is different if you choose a chord by picking its midpoint uniformly at random from the interior of the circle than by choosing its endpoints uniformly at random from the boundary. I feel this is "more ambiguous" in the two-point case than in the three-point case, because there's a nice way to associate a single point in the interior with two points on the boundary.
The answer to my problem is 3/4.
Consider the circle as the interval [0, 1] with the endpoints identified. Then the problem is one about choosing triplets of real numbers (x1, x2, x3) in that interval, where these numbers are chosen uniformly at random and independently. Replacing these with (0, y2, y3) = (0, x2-x1, x3 - x1) corresponds to rotating the circle by x1 and therefore doesn't change the probability of our event. Finally, if either of y2 and y3 are between 1/2 and 1, subtract 1 from it; now we're just viewing the circle as the interval [-1/2, 1/2] (centered on the first point) instead of [0, 1].
So now our event is just that |y3 - y2| ≤ 1/2. If y3 and y2 have the same sign, this is obviously true, and we can take the semicircle [0, 1/2] or [-1/2, 0]. If y3 and y2 have different signs, but |y3 - y2| ≤ 1/2, then some interval of size 1/2 containing y2 and y3. But if |y2 - y3| ≤ 1/2, -- for example, if y2 = 0.3, y3 = -0.3 -- then we lose. Any interval of width one-half containing both of those -- say the union of [0.25, 0.5] and [-0.5, -0.25], recalling that 0.5 and -0.5 are identified -- won't contain zero.
And the probability that two random points in a unit interval are within distance 1/2 of each other is just 3/4.
Alternatively, if we're trying to find this interval, one of the points is at the "counterclockwise" end of the interval. So fix which point is counterclockwisemost in the interval, there are three ways to do that. Then the other two points must be within 180 degrees clockwise from that point; the probability of that is 1/4. The "most counterclockwise" point is unique, so we can just add up the three probabilities to get 3/4. In general, if "semicircle" is replaced with some section of the circle making up a proportion c of the whole, where c ≤ 1/2, the probability that three randomly chosen points lie in an interval of that size is 3c2.
(Incidentally, why do we insist on calling it "tea" when most of the people there drink coffee? I suspect British influence at some point.)
First, by "random" I mean "uniformly at random, with respect to angle or arc length"; a classic problem in this sort of thing is to determine the probability that the length of a random chord of a circle is larger than the length of a side of an equilateral triangle inscribed in the circle. This is Bertrand's paradox. A chord is uniquely determined by its midpoint, and the result is different if you choose a chord by picking its midpoint uniformly at random from the interior of the circle than by choosing its endpoints uniformly at random from the boundary. I feel this is "more ambiguous" in the two-point case than in the three-point case, because there's a nice way to associate a single point in the interior with two points on the boundary.
The answer to my problem is 3/4.
Consider the circle as the interval [0, 1] with the endpoints identified. Then the problem is one about choosing triplets of real numbers (x1, x2, x3) in that interval, where these numbers are chosen uniformly at random and independently. Replacing these with (0, y2, y3) = (0, x2-x1, x3 - x1) corresponds to rotating the circle by x1 and therefore doesn't change the probability of our event. Finally, if either of y2 and y3 are between 1/2 and 1, subtract 1 from it; now we're just viewing the circle as the interval [-1/2, 1/2] (centered on the first point) instead of [0, 1].
So now our event is just that |y3 - y2| ≤ 1/2. If y3 and y2 have the same sign, this is obviously true, and we can take the semicircle [0, 1/2] or [-1/2, 0]. If y3 and y2 have different signs, but |y3 - y2| ≤ 1/2, then some interval of size 1/2 containing y2 and y3. But if |y2 - y3| ≤ 1/2, -- for example, if y2 = 0.3, y3 = -0.3 -- then we lose. Any interval of width one-half containing both of those -- say the union of [0.25, 0.5] and [-0.5, -0.25], recalling that 0.5 and -0.5 are identified -- won't contain zero.
And the probability that two random points in a unit interval are within distance 1/2 of each other is just 3/4.
Alternatively, if we're trying to find this interval, one of the points is at the "counterclockwise" end of the interval. So fix which point is counterclockwisemost in the interval, there are three ways to do that. Then the other two points must be within 180 degrees clockwise from that point; the probability of that is 1/4. The "most counterclockwise" point is unique, so we can just add up the three probabilities to get 3/4. In general, if "semicircle" is replaced with some section of the circle making up a proportion c of the whole, where c ≤ 1/2, the probability that three randomly chosen points lie in an interval of that size is 3c2.
09 October 2007
Probabilities of relative primality
From The Universe of Discourse -- the probability that two randomly-selected polynomials over Z2 are relatively prime is 1/2. (More formally, the probability that two randomly-selected polynomials over Z2 of degree less than n are relatively prime is 1/2 + 1/4n; the original result holds as n goes to infinity.) There's actually an almost one-to-one correspondence between pairs which aren't relatively prime and pairs which are. You can find this, and some more general results, in "The Probability of Relatively Prime Polynomials", from the June 2007 issue of Mathematics Magazine, (Arthur T. Benjamin and Curtis D Bennett, page 196).
The most general result is as follows: if a1(x), ..., am(x) are randomly chosen polynomials of degree less than n in F[x], where F is the finite field with q elements, then the probability that they are relatively prime is 1 - 1/qm-1 + (q-1)/qmn; this is all shown with a counting argument that invokes the Euclidean algorithm. The case from The Universe of Discourse is the case m = 2, where n goes to infinity.
This reminds me of one of my favorite statements in mathematics: "the probability that two randomly chosen integers are relatively prime is 6/π2." This seems almost meaningless -- how can we choose integers "at random"? More formally, let A(n) be the number of relatively prime pairs of integers in {1, 2, ..., n}; then

The convergence is pretty quick; A(100)/1002 = 0.6087, and A(1000)/10002 = 0.608383, while 6/π2 = 0.6079...
Why is this true? Well, the probability that two integers are relatively prime is the probability that they're not both divisible by 2, times the probability that they're not both divisible by 3, times the probabiltiy that they're not both divisible by 5, and so on. The probability that two integers aren't both divisible by a given prime p is 1 - 1/p2. So the probability that two "randomly chosen integers" aren't both divisible by the same prime p for all primes p is

and Euler showed that this was equal to 1/ζ(2), or 6/π2.
Similarly, the probability that n randomly chosen integers are relatively prime (not pairwise relatively prime; I count a triple like (6, 10, 15), since there isn't a prime number dividing all three of those) is 1/ζ(n). We recall that
ζ(n) = 1 + (1/2)n + (1/3)n + ...
and if n is large we can throw away all but the first two terms; thus the probability that n randomly chosen integers are relatively prime is about 1 - (1/2)n. This makes sense; if there are n integers that aren't relatively prime, the reason they're not relatively prime is almost always because they're all even.
The logical next question is: what is the average of the greatest common divisor of randomly chosen integers? From some quick numerical work, it looks like

grows like log n, that is, the average of the GCDs of pairs of integers randomly chosen from {1, 2, ..., n} is about some constant times log n. (The constant, to one decimal place, looks like 0.6.) But I don't know the answer. If I get bored at some point in the next couple days I'll see if I can figure it out.
The most general result is as follows: if a1(x), ..., am(x) are randomly chosen polynomials of degree less than n in F[x], where F is the finite field with q elements, then the probability that they are relatively prime is 1 - 1/qm-1 + (q-1)/qmn; this is all shown with a counting argument that invokes the Euclidean algorithm. The case from The Universe of Discourse is the case m = 2, where n goes to infinity.
This reminds me of one of my favorite statements in mathematics: "the probability that two randomly chosen integers are relatively prime is 6/π2." This seems almost meaningless -- how can we choose integers "at random"? More formally, let A(n) be the number of relatively prime pairs of integers in {1, 2, ..., n}; then
The convergence is pretty quick; A(100)/1002 = 0.6087, and A(1000)/10002 = 0.608383, while 6/π2 = 0.6079...
Why is this true? Well, the probability that two integers are relatively prime is the probability that they're not both divisible by 2, times the probability that they're not both divisible by 3, times the probabiltiy that they're not both divisible by 5, and so on. The probability that two integers aren't both divisible by a given prime p is 1 - 1/p2. So the probability that two "randomly chosen integers" aren't both divisible by the same prime p for all primes p is
and Euler showed that this was equal to 1/ζ(2), or 6/π2.
Similarly, the probability that n randomly chosen integers are relatively prime (not pairwise relatively prime; I count a triple like (6, 10, 15), since there isn't a prime number dividing all three of those) is 1/ζ(n). We recall that
ζ(n) = 1 + (1/2)n + (1/3)n + ...
and if n is large we can throw away all but the first two terms; thus the probability that n randomly chosen integers are relatively prime is about 1 - (1/2)n. This makes sense; if there are n integers that aren't relatively prime, the reason they're not relatively prime is almost always because they're all even.
The logical next question is: what is the average of the greatest common divisor of randomly chosen integers? From some quick numerical work, it looks like
grows like log n, that is, the average of the GCDs of pairs of integers randomly chosen from {1, 2, ..., n} is about some constant times log n. (The constant, to one decimal place, looks like 0.6.) But I don't know the answer. If I get bored at some point in the next couple days I'll see if I can figure it out.
08 October 2007
In case you need more things to procrastinate with...
The list of blogs that I read, over on the right sidebar, has been updated.
There's no real rhyme or reason to this list; I just took my Google Reader list, filtered out the blogs that aren't mathematical in any way, and stuck them in there. If you want more things to read, take a look there. Basically, these are the blogs I find interesting enough that I want the magic of RSS to let me know when they've been updated.
I promise I won't be offended if you go read them instead of me.
There's no real rhyme or reason to this list; I just took my Google Reader list, filtered out the blogs that aren't mathematical in any way, and stuck them in there. If you want more things to read, take a look there. Basically, these are the blogs I find interesting enough that I want the magic of RSS to let me know when they've been updated.
I promise I won't be offended if you go read them instead of me.
Choosing partitions at random
There is such a thing as a partition of a positive integer; it's a way to write that positive integer as a sum of positive integers, where order is irrelevant. So, for example, the partitions of 7 are
7, 6+1, 5+2, 5+1+1, 4+3, 4+2+1, 4+1+1+1, 3+3+1, 3+2+2, 3+2+1+1, 3+1+1+1+1, 2+2+2+1, 2+2+1+1+1, 2+1+1+1+1+1, 1+1+1+1+1+1+1
and there are fifteen of them. We can define a function p, where p(n) is the number of partitions of n; thus p(7) = 15.
So how might one pick a partition of n "at random"? There are at least three ways I know of. The first is to pick one uniformly, so each one is chosen with probability 1/15.
The second is as follows: begin with the partition 1. Then generate a partition of 2 by augmenting some part of the partition by 1; we can either augment the first part (giving 2) or the second part (there is no second part, so this corresponds to adding a new part 1, giving 1+1). Then do the same thing to obtain a partition of 3. If you had 2 at the second step, then you get 3 or 2+1 at the third step with equal probability; if you had 1+1 at the second step, then you get 2+1 or 1+1+1 with equal probability. But you can't have 1+2 -- we make the rule that partitions have to be written in decreasing order. So at the third step, you have 3 with probability 1/4, 2+1 with probability 1/2, and 1+1+1 with probability 1/4. At the fourth step, 3 becomes 4 or 3+1 with equal probability; 2+1 becomes 3+1, 2+2, or 2+1+1 with equal probability, and 1+1+1 becomes 2+1+1 or 1+1+1+1 with equal probability. Thus you have the partitions 4 or 1+1+1+1 with probability 1/8 each; the partitions 3+1 or 2+1+1 with probability 7/24 each; the partition 2+2 with probability 1/4.
As you can see, this process doesn't generate partitions uniformly at random. But in a way it's natural, because at each step of the process one is doing something uniform. There's a natural way to draw a picture of a partition as a Ferrers diagram, where a partition of n will have a diagram with n dots; this corresponds to putting down n dots "at random" subject to the constraint that at each step the diagram you obtain is a valid Ferrers diagram. It turns out that the Ferrers diagram obtained in this way tend, up to scaling, to a certain shape as they grow large. As do partitions chosen uniformly at random -- but it's a different shape.
And there's a third way of choosing partitions at random. This is called the "Plancherel measure". We pick a partition λ of n with probability (dim λ2)/n!, where dim λ is the "dimension" of a partition. The dimension of a partition is just the number of ways in which it can be obtained as the nth term of a sequence of partitions, starting with the empty partition and at each step adding one to some part. For example, the partition 5 = 3+2 is the last element of
0, 1, 2, 3, 3+1, 3+2
0, 1, 2, 2+1, (3+1 or 2+2), 3+2
0, 1, 1+1, 2+1, (3+1 or 2+2), 3+2
so there are five such sequences, and dim (3+2) = 5. Thus in the Plancherel measure we choose the partition 3+2 with probability 52 / 5! = 25/120. The dimension of a partition is also the number of standard Young tableaux corresponding to it.
This just seems so artificial, though -- if I wanted to actually pick a partition according to this distribution, how would I do it?
There are n! permutations of n objects. Picking one of these uniformly at random seems like a natural thing to do. (For example, it has a physical instantiation in the shuffling of a deck of cards.) It turns out that there is a correspondence called the Robinson-Schensted correspondence. (Sometimes Knuth's name is in here as well.) This correspondence associates with a permutation of [n] a pair of standard Young tableaux, each containing n squares, and of the same shape. The shape of those Young tableaux gives a partition, then; if we started by picking a permutation uniformly then we've now obtained a partition chosen with the distribution induced by the Plancherel measure.
Suddenly this distribution seems natural. I'm much more comfortable with the idea of picking something from a random distribution if I can imagine how I would actually go and choose it if I had to.
7, 6+1, 5+2, 5+1+1, 4+3, 4+2+1, 4+1+1+1, 3+3+1, 3+2+2, 3+2+1+1, 3+1+1+1+1, 2+2+2+1, 2+2+1+1+1, 2+1+1+1+1+1, 1+1+1+1+1+1+1
and there are fifteen of them. We can define a function p, where p(n) is the number of partitions of n; thus p(7) = 15.
So how might one pick a partition of n "at random"? There are at least three ways I know of. The first is to pick one uniformly, so each one is chosen with probability 1/15.
The second is as follows: begin with the partition 1. Then generate a partition of 2 by augmenting some part of the partition by 1; we can either augment the first part (giving 2) or the second part (there is no second part, so this corresponds to adding a new part 1, giving 1+1). Then do the same thing to obtain a partition of 3. If you had 2 at the second step, then you get 3 or 2+1 at the third step with equal probability; if you had 1+1 at the second step, then you get 2+1 or 1+1+1 with equal probability. But you can't have 1+2 -- we make the rule that partitions have to be written in decreasing order. So at the third step, you have 3 with probability 1/4, 2+1 with probability 1/2, and 1+1+1 with probability 1/4. At the fourth step, 3 becomes 4 or 3+1 with equal probability; 2+1 becomes 3+1, 2+2, or 2+1+1 with equal probability, and 1+1+1 becomes 2+1+1 or 1+1+1+1 with equal probability. Thus you have the partitions 4 or 1+1+1+1 with probability 1/8 each; the partitions 3+1 or 2+1+1 with probability 7/24 each; the partition 2+2 with probability 1/4.
As you can see, this process doesn't generate partitions uniformly at random. But in a way it's natural, because at each step of the process one is doing something uniform. There's a natural way to draw a picture of a partition as a Ferrers diagram, where a partition of n will have a diagram with n dots; this corresponds to putting down n dots "at random" subject to the constraint that at each step the diagram you obtain is a valid Ferrers diagram. It turns out that the Ferrers diagram obtained in this way tend, up to scaling, to a certain shape as they grow large. As do partitions chosen uniformly at random -- but it's a different shape.
And there's a third way of choosing partitions at random. This is called the "Plancherel measure". We pick a partition λ of n with probability (dim λ2)/n!, where dim λ is the "dimension" of a partition. The dimension of a partition is just the number of ways in which it can be obtained as the nth term of a sequence of partitions, starting with the empty partition and at each step adding one to some part. For example, the partition 5 = 3+2 is the last element of
0, 1, 2, 3, 3+1, 3+2
0, 1, 2, 2+1, (3+1 or 2+2), 3+2
0, 1, 1+1, 2+1, (3+1 or 2+2), 3+2
so there are five such sequences, and dim (3+2) = 5. Thus in the Plancherel measure we choose the partition 3+2 with probability 52 / 5! = 25/120. The dimension of a partition is also the number of standard Young tableaux corresponding to it.
This just seems so artificial, though -- if I wanted to actually pick a partition according to this distribution, how would I do it?
There are n! permutations of n objects. Picking one of these uniformly at random seems like a natural thing to do. (For example, it has a physical instantiation in the shuffling of a deck of cards.) It turns out that there is a correspondence called the Robinson-Schensted correspondence. (Sometimes Knuth's name is in here as well.) This correspondence associates with a permutation of [n] a pair of standard Young tableaux, each containing n squares, and of the same shape. The shape of those Young tableaux gives a partition, then; if we started by picking a permutation uniformly then we've now obtained a partition chosen with the distribution induced by the Plancherel measure.
Suddenly this distribution seems natural. I'm much more comfortable with the idea of picking something from a random distribution if I can imagine how I would actually go and choose it if I had to.
Labels:
combinatorics,
number theory,
probability
07 October 2007
links for 2007-10-07
links for 2007-10-07
From The Economist, via Statistical Modeling, etc.: the durations of people's periods of physical activity follows a power-law distribution, and the distributions are much different in depressed people than in healthy people. The article claiming this, oddly enough, was in Physical Review Letters -- this seems a bit strange to me, if only because the referees that journal would use are likely physicists!
(The original article by Toru Nakamura, Ken Kiyono, Kazuhiro Yoshiuchi, Rika Nakahara, Zbigniew R. Struzik, and Yoshiharu Yamamoto is Universal Scaling Law in Human Behavioral Organization; it looks like you've got to pay for it.)
From Slashdot: Science and the Islamic world—The quest for rapprochement:
As you may remember, these are the people who invented algebra. And algorithms. (It wasn't Al Gore.)
Mark Dominus (who now lives in my neighborhood) analyzes a program he wrote many years ago about van der Waerden's problem. We can define a function V(n,C) such that if we color the first V(n,C) integers with C different colors, there's guaranteed to be a sequence of n integers, in arithmetic progression, which are all the same color. The problem is that the bounds the proof of the theorem gives are ridiculously huge; according to the theorem V(3,3) ≤ 5.79 · 1014613. In fact V(3,3) = 27 -- there are ways of coloring the integers from 1 to 26 red, yellow, and blue so that there's no monochromatic arithmetic progression of length three, but not of coloring the integers from 1 to 27 under that same condition. A while ago I looked at this from a probabilistic point of view -- one can imagine doing an exhaustive search of which sequences of length 1 have no monochromatic sequences of length n, then the average number of ways of expanding each of those to a sequence of length 2, then length 3, and so on; multiplying those averages together gives an estimate of how many ``good" sequences exist of any given length. As the sequences get longer, those numbers should grow and then shrink; the length at which they shrink below 1 is probably a good estimate of V(n,C). Unfortunately I can't find my notes on this.
A pair of posts on the geometry of parallel parking: from The Everything Seminar and Rigorous Trivialities. The second of these proves that it's always possible to parallel park a car, assuming that the spot to be parked in is strictly larger than the car and the driver can make arbitrarily small movements. The next logical question is: how many such movements, and how much total movement of the car, does it take to do so?
From The Economist, via Statistical Modeling, etc.: the durations of people's periods of physical activity follows a power-law distribution, and the distributions are much different in depressed people than in healthy people. The article claiming this, oddly enough, was in Physical Review Letters -- this seems a bit strange to me, if only because the referees that journal would use are likely physicists!
(The original article by Toru Nakamura, Ken Kiyono, Kazuhiro Yoshiuchi, Rika Nakahara, Zbigniew R. Struzik, and Yoshiharu Yamamoto is Universal Scaling Law in Human Behavioral Organization; it looks like you've got to pay for it.)
From Slashdot: Science and the Islamic world—The quest for rapprochement:
It was not always this way. Islam's magnificent Golden Age in the 9th–13th centuries brought about major advances in mathematics, science, and medicine. The Arabic language held sway in an age that created algebra, elucidated principles of optics, established the body's circulation of blood, named stars, and created universities. But with the end of that period, science in the Islamic world essentially collapsed. No major invention or discovery has emerged from the Muslim world for well over seven centuries now. That arrested scientific development is one important element—although by no means the only one—that contributes to the present marginalization of Muslims and a growing sense of injustice and victimhood.
As you may remember, these are the people who invented algebra. And algorithms. (It wasn't Al Gore.)
Mark Dominus (who now lives in my neighborhood) analyzes a program he wrote many years ago about van der Waerden's problem. We can define a function V(n,C) such that if we color the first V(n,C) integers with C different colors, there's guaranteed to be a sequence of n integers, in arithmetic progression, which are all the same color. The problem is that the bounds the proof of the theorem gives are ridiculously huge; according to the theorem V(3,3) ≤ 5.79 · 1014613. In fact V(3,3) = 27 -- there are ways of coloring the integers from 1 to 26 red, yellow, and blue so that there's no monochromatic arithmetic progression of length three, but not of coloring the integers from 1 to 27 under that same condition. A while ago I looked at this from a probabilistic point of view -- one can imagine doing an exhaustive search of which sequences of length 1 have no monochromatic sequences of length n, then the average number of ways of expanding each of those to a sequence of length 2, then length 3, and so on; multiplying those averages together gives an estimate of how many ``good" sequences exist of any given length. As the sequences get longer, those numbers should grow and then shrink; the length at which they shrink below 1 is probably a good estimate of V(n,C). Unfortunately I can't find my notes on this.
A pair of posts on the geometry of parallel parking: from The Everything Seminar and Rigorous Trivialities. The second of these proves that it's always possible to parallel park a car, assuming that the spot to be parked in is strictly larger than the car and the driver can make arbitrarily small movements. The next logical question is: how many such movements, and how much total movement of the car, does it take to do so?
06 October 2007
Dual numbers
You know that trick where you invent some number ε such that ε2 = 0 and use it to, basically, take derivatives?
For example, (x+ε)2 = x2 + 2xε, so if we change x by some small amount ε then we change x2 by 2x times that amount. Thus the derivative of x2 must be 2x.
It turns out that trick has a name; it's calculation in the algebra of dual numbers, which I discovered by randomly poking around Wikipedia, and it's apparently used in at least some computer algebra systems to do differentiation. I didn't know that.
Edit (Monday, 4:36 pm): Charles of Rigorous Trivialities has pointed out that dual numbers are used extensively in deformation theory.
For example, (x+ε)2 = x2 + 2xε, so if we change x by some small amount ε then we change x2 by 2x times that amount. Thus the derivative of x2 must be 2x.
It turns out that trick has a name; it's calculation in the algebra of dual numbers, which I discovered by randomly poking around Wikipedia, and it's apparently used in at least some computer algebra systems to do differentiation. I didn't know that.
Edit (Monday, 4:36 pm): Charles of Rigorous Trivialities has pointed out that dual numbers are used extensively in deformation theory.
Some partition identities due to Euler
I wrote a couple weeks ago about my love of the identity

which is an analytic version of the fact that every integer is uniquely a sum of distinct powers of 2. What I didn't realize at the time, but learned from George Andrews article Euler's "De Partitio Numerorum", in the current issue (Vol. 44, No. 4) of the Bulletin of the American Mathematical Society, is that this is due to Euler. This is hardly surprising, though, as Euler was as far as I know the first person to apply generating functions to partitions. (The result was known before that; it's the generating-function statement that's due to Euler.)
But here's one I didn't know:

This is an analytic version of the fact that every integer is uniquely a sum of distinct powers of 3 and their negatives, and is Theorem 4 of the Andrews paper. For example, 1 = 1, 2 = 3-1, 3 = 3, 4 = 3+1, 5 = 9-3+1, 6 = 9-3, 7 = 9-3+1, and so on.
To see this fact combinatorially, note that we can make 3k sums out of the numbers 30, 31, ..., 3k-1; for each of these we include the number, its negative, or neither. If we can show that these are the distinct integers from -l to l, where l = (3k-1)/2, then that's enough. (For example, when k = 2, we have l = 4, and the nine possible sums of 1, 3, and their negatives are
-3-1, -3, -3+1, -1, 0, 1, 3-1, 3, 3+1
where "0" denotes the empty sum. These are just the integers from -4 to 4.)
But this can be shown by induction on k. Clearly it's true for k = 0. Then, for the induction, consider the 3k sums of the powers of three up to 3k-1 and their negatives; by the inductive hypothesis these are just the integers in the interval
. Now consider the 3k+1 sums of the powers of three up to 3k and their negatives. There are 3k of these which are just the ones which don't include 3k. Those that include +3k add up to
, and those which include -3k add up to
. These three intervals put together are
, as desired. (For example, when k=2, these three intervals are [-4, 4], [5, 13], and [-13, -5], which together make up [-13, 13].) By induction, we can make each integer in
uniquely as a sum of distinct powers of 3 which are less than 3k and their negatives, which is essentially the desired result.
The last section of the article talks about partitions with some negative parts, or "signed partitions". For example, 9+8-6-5 is a signed partition of 6. These are a bit more awkward to work with in terms of generating functions -- the generating function above about the powers of 3 doesn't converge, ever! So it's basically a curiosity, but one can actually prove things about these signed partitions using generating functions; see the article for examples.
which is an analytic version of the fact that every integer is uniquely a sum of distinct powers of 2. What I didn't realize at the time, but learned from George Andrews article Euler's "De Partitio Numerorum", in the current issue (Vol. 44, No. 4) of the Bulletin of the American Mathematical Society, is that this is due to Euler. This is hardly surprising, though, as Euler was as far as I know the first person to apply generating functions to partitions. (The result was known before that; it's the generating-function statement that's due to Euler.)
But here's one I didn't know:
This is an analytic version of the fact that every integer is uniquely a sum of distinct powers of 3 and their negatives, and is Theorem 4 of the Andrews paper. For example, 1 = 1, 2 = 3-1, 3 = 3, 4 = 3+1, 5 = 9-3+1, 6 = 9-3, 7 = 9-3+1, and so on.
To see this fact combinatorially, note that we can make 3k sums out of the numbers 30, 31, ..., 3k-1; for each of these we include the number, its negative, or neither. If we can show that these are the distinct integers from -l to l, where l = (3k-1)/2, then that's enough. (For example, when k = 2, we have l = 4, and the nine possible sums of 1, 3, and their negatives are
-3-1, -3, -3+1, -1, 0, 1, 3-1, 3, 3+1
where "0" denotes the empty sum. These are just the integers from -4 to 4.)
But this can be shown by induction on k. Clearly it's true for k = 0. Then, for the induction, consider the 3k sums of the powers of three up to 3k-1 and their negatives; by the inductive hypothesis these are just the integers in the interval
The last section of the article talks about partitions with some negative parts, or "signed partitions". For example, 9+8-6-5 is a signed partition of 6. These are a bit more awkward to work with in terms of generating functions -- the generating function above about the powers of 3 doesn't converge, ever! So it's basically a curiosity, but one can actually prove things about these signed partitions using generating functions; see the article for examples.
Are we so different after all?
Mark Liberman writes about The Pirahã and us, from Language Log:
And we're like this too, he claims; to a very good approximation, we don't have words for information about the distribution of representative samples.
We have these words (the examples he submits are "percentile", "histogram", "standard deviation", "frequency distribution", "variance", and "confidence intervals") but perhaps a hundred thousand or so people in the USA actually understand what these words mean. There are three hundred Piraha; if you pick three hundred Americans, chances are you won't get one who understands this stuff. (I am not quite bored enough to go out on the street and do this study, and even if I did, I live near a university so the data would be skewed.) I suppose that this is true of any specialized field, though, not just statistics.
Notably not among that set of people are the journalists whose job it is to inform other people of these things; as readers of this blog know, this causes much comedy for those of us who do know a little bit about these things.
Although I haven't seriously thought this through, it seems plausible to me that instead of teaching college students who will take one math course calculus, we should teach them statistics; statistics might actually be useful.
Other, larger, languages have the same issue as the Pirahã do, though. It's my impression that there are very few things you can't talk about in English due to a lack of vocabulary. However, I admit that I might not know about these things if they exist -- all the languages I know are either English or have a large number of people who speak the language and English. I've heard that in smaller European languages this isn't the case -- there are things that, say, Norwegian or Catalan just doesn't have the words for, that speakers of those languages might want to talk about. (For example, what do they call the Catalan numbers in Catalan?) This is because in a large language community like English-speakers, someone will want to talk about thing X, even if thing X is rare, but this is less likely to be true in a smaller language community. The usual solution seems to be to borrow words from other languages -- but perhaps small numbers are the sort of things you just can't borrow. (Remember that I'm not a linguist.)
The Pirahã language and culture seem to lack not only the words but also the concepts for numbers, using instead less precise terms like "small size", "large size" and "collection". And the Pirahã people themselves seem to be suprisingly uninterested in learning about numbers, and even actively resistant to doing so, despite the fact that in their frequent dealings with traders they have a practical need to evaluate and compare numerical expressions.
And we're like this too, he claims; to a very good approximation, we don't have words for information about the distribution of representative samples.
We have these words (the examples he submits are "percentile", "histogram", "standard deviation", "frequency distribution", "variance", and "confidence intervals") but perhaps a hundred thousand or so people in the USA actually understand what these words mean. There are three hundred Piraha; if you pick three hundred Americans, chances are you won't get one who understands this stuff. (I am not quite bored enough to go out on the street and do this study, and even if I did, I live near a university so the data would be skewed.) I suppose that this is true of any specialized field, though, not just statistics.
Notably not among that set of people are the journalists whose job it is to inform other people of these things; as readers of this blog know, this causes much comedy for those of us who do know a little bit about these things.
Although I haven't seriously thought this through, it seems plausible to me that instead of teaching college students who will take one math course calculus, we should teach them statistics; statistics might actually be useful.
Other, larger, languages have the same issue as the Pirahã do, though. It's my impression that there are very few things you can't talk about in English due to a lack of vocabulary. However, I admit that I might not know about these things if they exist -- all the languages I know are either English or have a large number of people who speak the language and English. I've heard that in smaller European languages this isn't the case -- there are things that, say, Norwegian or Catalan just doesn't have the words for, that speakers of those languages might want to talk about. (For example, what do they call the Catalan numbers in Catalan?) This is because in a large language community like English-speakers, someone will want to talk about thing X, even if thing X is rare, but this is less likely to be true in a smaller language community. The usual solution seems to be to borrow words from other languages -- but perhaps small numbers are the sort of things you just can't borrow. (Remember that I'm not a linguist.)
Labels:
language,
Language Log,
statistics
05 October 2007
faking it
Zeno at Halfway There implicitly asks: would it be easier for a mathematician to pose as a teacher of some other discipline than vice versa?
This reminds me of a fairly common answer to "what good is a math major?" or "what good is a PhD in math if I don't want to be a professor?", namely that for most values of X, it's easier to teach a mathematician to do X than it is to teach an X-ist mathematics. I'm not sure what good this answer is, though, because it presupposes that having someone who can do both X and math is a Good Thing. That sounds plausible but I want a proof. Or at least some convincing numerical evidence.
According to Zeno, it was somewhat controversial that a two-year college degree in California was made to require more math than a high school diploma in the same state; this reminds me of what I noticed when I was studying for the GRE. The verbal section of the GRE seemed to me to be harder than the verbal section of the SAT -- it had more advanced vocabulary, more complicated reading passages, and so on. But the mathematics section seemed easier! It's as if the test-writers assume that people forgot math in college. (Since you're going to ask: I got a perfect score on the math section and would have been ashamed of anything less.) The average score in my department, depending on the sample, is somewhere between 789 and 794 out of a possible 800. But you'd expect a bunch of entering PhD students in math to be nearly perfect on math they should have learned by the ninth grade. I concluded, upon taking the test, that there was no score I could get on the math section that could possibly make my application look better -- that the test could only hurt me. (Fortunately, it didn't.)
Zeno also writes:
It's an interesting experiment. Sokal affair, anyone?
This reminds me of a fairly common answer to "what good is a math major?" or "what good is a PhD in math if I don't want to be a professor?", namely that for most values of X, it's easier to teach a mathematician to do X than it is to teach an X-ist mathematics. I'm not sure what good this answer is, though, because it presupposes that having someone who can do both X and math is a Good Thing. That sounds plausible but I want a proof. Or at least some convincing numerical evidence.
According to Zeno, it was somewhat controversial that a two-year college degree in California was made to require more math than a high school diploma in the same state; this reminds me of what I noticed when I was studying for the GRE. The verbal section of the GRE seemed to me to be harder than the verbal section of the SAT -- it had more advanced vocabulary, more complicated reading passages, and so on. But the mathematics section seemed easier! It's as if the test-writers assume that people forgot math in college. (Since you're going to ask: I got a perfect score on the math section and would have been ashamed of anything less.) The average score in my department, depending on the sample, is somewhere between 789 and 794 out of a possible 800. But you'd expect a bunch of entering PhD students in math to be nearly perfect on math they should have learned by the ninth grade. I concluded, upon taking the test, that there was no score I could get on the math section that could possibly make my application look better -- that the test could only hurt me. (Fortunately, it didn't.)
Zeno also writes:
With enough sang froid, a math teacher could probably pose as an English teacher for a much longer time than an English teacher could do the same in a math class. Math doesn't have the wiggle room or the space for discourse that other subjects allow.
It's an interesting experiment. Sokal affair, anyone?
Probability on Jeopardy!
No, this isn't about the probability of winning at Jeopardy or any such thing.
On Wednesday night's Jeopardy!, there was a category "Fun With Probability". It's always funny to see math categories appear on the show, because the contestants seem to avoid them. (They shy away from science categories as well, but not to the same extent; you can fake your way through a science category by having memorized a bunch of things, but you can't do that with a math category. By the way, any puns on the word "category" I might make in this post are unintentional.)
You can see the entire game at the Jeopardy! archive. The questions were as follows:
Contestants answered "36" and "32". I think "32" was a wild guess. But "36" is actually a fairly reasonable answer here. The odds of hitting your lucky number on a single-zero wheel are, in fact, 36 to 1. A single-zero wheel has thirty-seven spots -- the zero and the numbers one through 36 -- and a bet of $1 pays $36 if your number comes up. The house edge here is just 1/37, as opposed to 2/38 on the wheels with both zero and double-zero. (Is there a form of roulette where there are no zeroes? This seems like it could exist if not played in a casino. Poker is a fair game when played socially but in casinos the house takes a percentage of each pot. Then again, I don't see people getting together socially to spin a wheel and hand each other money; poker involves infinitely more skill than roulette. I mean the word "infinitely" here literally, because roulette takes zero skill.)
One in thirty-two, of course; thirty-two is 25. (Trick question that I might give if I ever find myself teaching basic probability: the odds of getting heads on a given coin flip of a certain unfair coin are 2 to 1 against. What are the odds of getting heads five times in a row? The answer is 242 to 1 against, but I suspect a lot of people would hear "coin" and just start multiplying out the twos.)
Not really a probability question; you really just had to kind of guess your way through this one. There are three reasonable objects -- comets, meteors, and asteroids. The three contestants, in turn, guessed all three possible pairs of them; binomial coefficients in action! (There are plenty of Jeopardy! clues where there are three possible answers, and the best strategy seems to be to wait for the other two people to get the wrong answer. There are also a fairly large number where there are two possible answers; Sweden/Norway and Oxford/Cambridge seem like common pairs of this sort. At least for me.) If you know a bit of astronomy, though, you know that meteors are not that big, and hit the earth all the time in meteor showers.
A lot of Jeopardy! clues have a bunch of extraneous verbiage; the question here is really "what's the kind of allele that's not dominant?", which might be surprisingly easy for a $1600 clue. But at this point I feel obliged to mention that genetics is one of the other disciplines where probability and statistics were applied early on, or so a friend of mine tells me; I suppose this is more reputable than gambling and less boring than insurance.
(There was a video here; it was pretty clear that this was the probability of the second card being an ace, given that the first card also was an ace.) There are three aces left among the fifty-one remaining cards, so it's three in 51, which is one in 17. I was screaming at the TV -- none of the contestants got it -- but then again I do this stuff for a living, and as far as I know none of them do, so I really shouldn't be too hard on them. Incidentally, this is a "baby version" (as one of my the principle at work in card counting in blackjack; if you know the deck is rich in high cards then the dealer is more likely to bust, so you bet more.
On Wednesday night's Jeopardy!, there was a category "Fun With Probability". It's always funny to see math categories appear on the show, because the contestants seem to avoid them. (They shy away from science categories as well, but not to the same extent; you can fake your way through a science category by having memorized a bunch of things, but you can't do that with a math category. By the way, any puns on the word "category" I might make in this post are unintentional.)
You can see the entire game at the Jeopardy! archive. The questions were as follows:
$400: High rollers get to play roulette on single-zero wheels, where the chance of hitting your lucky number is 1 in this
Contestants answered "36" and "32". I think "32" was a wild guess. But "36" is actually a fairly reasonable answer here. The odds of hitting your lucky number on a single-zero wheel are, in fact, 36 to 1. A single-zero wheel has thirty-seven spots -- the zero and the numbers one through 36 -- and a bet of $1 pays $36 if your number comes up. The house edge here is just 1/37, as opposed to 2/38 on the wheels with both zero and double-zero. (Is there a form of roulette where there are no zeroes? This seems like it could exist if not played in a casino. Poker is a fair game when played socially but in casinos the house takes a percentage of each pot. Then again, I don't see people getting together socially to spin a wheel and hand each other money; poker involves infinitely more skill than roulette. I mean the word "infinitely" here literally, because roulette takes zero skill.)
$800: The chance of getting heads on any given coin flip is 1 in 2, so the chance of getting heads 5 times in a row is 1 in this
One in thirty-two, of course; thirty-two is 25. (Trick question that I might give if I ever find myself teaching basic probability: the odds of getting heads on a given coin flip of a certain unfair coin are 2 to 1 against. What are the odds of getting heads five times in a row? The answer is 242 to 1 against, but I suspect a lot of people would hear "coin" and just start multiplying out the twos.)
$1200: NASA's Spaceguard Survey watches for the "extremely small" probability of these 2 objects coming to smash Earth
Not really a probability question; you really just had to kind of guess your way through this one. There are three reasonable objects -- comets, meteors, and asteroids. The three contestants, in turn, guessed all three possible pairs of them; binomial coefficients in action! (There are plenty of Jeopardy! clues where there are three possible answers, and the best strategy seems to be to wait for the other two people to get the wrong answer. There are also a fairly large number where there are two possible answers; Sweden/Norway and Oxford/Cambridge seem like common pairs of this sort. At least for me.) If you know a bit of astronomy, though, you know that meteors are not that big, and hit the earth all the time in meteor showers.
$1600: Offspring of heterozygous parents have a 50-50 chance of getting a dominant vs. this type of allele
A lot of Jeopardy! clues have a bunch of extraneous verbiage; the question here is really "what's the kind of allele that's not dominant?", which might be surprisingly easy for a $1600 clue. But at this point I feel obliged to mention that genetics is one of the other disciplines where probability and statistics were applied early on, or so a friend of mine tells me; I suppose this is more reputable than gambling and less boring than insurance.
$2000: The probability of the first card dealt being an ace is 4 in 52, so the probability of the second card being an ace is 1 in this number
(There was a video here; it was pretty clear that this was the probability of the second card being an ace, given that the first card also was an ace.) There are three aces left among the fifty-one remaining cards, so it's three in 51, which is one in 17. I was screaming at the TV -- none of the contestants got it -- but then again I do this stuff for a living, and as far as I know none of them do, so I really shouldn't be too hard on them. Incidentally, this is a "baby version" (as one of my the principle at work in card counting in blackjack; if you know the deck is rich in high cards then the dealer is more likely to bust, so you bet more.
Labels:
Jeopardy,
probability,
television
04 October 2007
How we cite results
There seem to be some odd conventions in naming and citing mathematical results.
First, a result isn't necessarily named after the person who first discovered it. The canonical example is that there are about a zillion things which Euler and Gauss came up with which aren't named after them. (And yet the October 2007 issue of the Bulletin of the American Mathematical Society is entirely about Euler; the six full-length articles concern his work on infinite series, algebraic geometry (!), partitions, defining the derivative, compressible fluids, and incompressible fluids. This issue also includes Terence Tao's article What is good mathematics?, which I would recommend more highly if I hadn't read it already (it's been circulating since at least February). Part of this is probably the motivation to give more unique names to results: an example is Goldbach's conjecture, which I talked about in asking who gets credit for quadratic reciprocity, and which Euler actually conjectured.
Second, when more than one person proves a result, their names get smushed together into some new, hyphenated name which is grammatically singular. If Smith and Jones prove a theorem, it becomes known as the Smith-Jones theorem; then when people cite it, they'll say "as Smith-Jones proved" or "Smith-Jones shows, in [1], ..." This is despite the fact that there's no person named Smith-Jones! This of course gets even more confusing when one of the people involved has a hyphenated name; the canonical example is, I think, the Birch and Swinnerton-Dyer conjecture, which is a single conjecture, by two (not three) people, Bryan Birch and Peter Swinnerton-Dyer. My theory was that it's because the people who come up with these results don't seem like real people... until I caught myself thinking about the Stanley-Wilf conjecture yesterday. I have taken classes from both Stanley and Wilf, and I am very much aware that they are distinct people. Yet I still caught myself saying "So what does Stanley-Wilf say about this?" I suppose this is an abbreviated version of "So what does the Stanley-Wilf conjecture say about this?" but it still seems weird to me. Can any linguists weigh in on this?
Third, let's say that the aforementiond Smith gives a talk about eir theorem, which ey proved in joint work with Jones. Ey will write on the board at eir talk something like
abbreviating eir own last name to a single initial. This seems unnecessarily modest to me; shouldn't Smith be proud enough of eir result to cite emself on an equal footing with eir collaborator? Fortunately people don't give joint talks, because then you'd have a talk by, say, Smith and Simpson, and you might see a theorem attributed to "S." (since neither one of them would want to write out their name) and you wouldn't know which one it was.
First, a result isn't necessarily named after the person who first discovered it. The canonical example is that there are about a zillion things which Euler and Gauss came up with which aren't named after them. (And yet the October 2007 issue of the Bulletin of the American Mathematical Society is entirely about Euler; the six full-length articles concern his work on infinite series, algebraic geometry (!), partitions, defining the derivative, compressible fluids, and incompressible fluids. This issue also includes Terence Tao's article What is good mathematics?, which I would recommend more highly if I hadn't read it already (it's been circulating since at least February). Part of this is probably the motivation to give more unique names to results: an example is Goldbach's conjecture, which I talked about in asking who gets credit for quadratic reciprocity, and which Euler actually conjectured.
Second, when more than one person proves a result, their names get smushed together into some new, hyphenated name which is grammatically singular. If Smith and Jones prove a theorem, it becomes known as the Smith-Jones theorem; then when people cite it, they'll say "as Smith-Jones proved" or "Smith-Jones shows, in [1], ..." This is despite the fact that there's no person named Smith-Jones! This of course gets even more confusing when one of the people involved has a hyphenated name; the canonical example is, I think, the Birch and Swinnerton-Dyer conjecture, which is a single conjecture, by two (not three) people, Bryan Birch and Peter Swinnerton-Dyer. My theory was that it's because the people who come up with these results don't seem like real people... until I caught myself thinking about the Stanley-Wilf conjecture yesterday. I have taken classes from both Stanley and Wilf, and I am very much aware that they are distinct people. Yet I still caught myself saying "So what does Stanley-Wilf say about this?" I suppose this is an abbreviated version of "So what does the Stanley-Wilf conjecture say about this?" but it still seems weird to me. Can any linguists weigh in on this?
Third, let's say that the aforementiond Smith gives a talk about eir theorem, which ey proved in joint work with Jones. Ey will write on the board at eir talk something like
Theorem. (Jones-S., 2007.) Let F be a foo. Then...
abbreviating eir own last name to a single initial. This seems unnecessarily modest to me; shouldn't Smith be proud enough of eir result to cite emself on an equal footing with eir collaborator? Fortunately people don't give joint talks, because then you'd have a talk by, say, Smith and Simpson, and you might see a theorem attributed to "S." (since neither one of them would want to write out their name) and you wouldn't know which one it was.
Labels:
language,
mathematical culture
03 October 2007
Home-field advantage and postseason baseball scheduling
Major League Baseball keeps playing around with the way they schedule the various playoff series.
Thing #1: during the season teams play every day. But during the first round of the playoffs, which start in two hours with the Phillies and the Rockies, there are four five-game series. Each of these series includes two off-days, except for the Red Sox and Angels, who have three. Take a look at the schedule.
You'd think that one benefit of this would be that the TV stations televising the series never have to broadcast four games in one day. You'd be right. You'd think another benefit of that would be that teams could have their games scheduled at times that are not ridiculously inconvenient for their fans. You'd be wrong, at least if you're a Phillies fan; the five games in the first-round series are scheduled for Wednesday at 3, Thursday at 3, Saturday at 9:30, Sunday at 10, and Tuesday at 6:30. (The first two are weekdays.) The Cubs-Diamondbacks schedule isn't much better. The Yankees-Indians and Red Sox-Angels series get all the good time slots.
Anyway, a big deal is made of having home-field advantage in these series. (At least, people in places like Boston make a big deal out of it, because the Red Sox knew they had a playoff spot a week before the end of the season. The National League was a bit crazier this year; the Diamondbacks and Cubs weren't assured of playoff spots until late Friday night, the Phillies when the final pitch was thrown on Sunday, and the Rockies late Monday night.) Currently, in a five-game series the team with home-field advantage (which is the team with the better regular-season record, except it can't be the wild card) plays games 1, 2, and 5 at home, and games 3 and 4 on the road; this is referred to as a "2-2-1" format. In the past, a "2-3" format was used -- for example in the 1984 NLCS -- although I can't tell which team was considered to have home-field advantage. (edit: this article indicates that it was the team which started on the road and had three games at home. At that time, the divisions alternated home-field advantage from year to year, instead of basing the seeding on regular-season record, which makes life a lot easier on the people selling the tickets!)
So I ask the question: say that a team -- let's call them the Phillies -- has a probability p of winning a game played at a neutral site against their opponents, the Qockies. (The Qockies are related to the Qankees, who I wrote about here and here.) The probability of the Qockies winning a game is, of course, 1-p; we'll call this q.
Next, we'll say that the probability of the Phillies winning a game at home is p + ε, and on the road p - ε, for some constant ε. (Incidentally, ε is about 0.04 for most teams -- that is, home teams win about 54% of their games.) Then the probability of the Phillies winning a series in a 2-2-1 format, where they start at home, can be found by summing the probabilities of all the different ways in which they can win. For example, the probability that the Phillies win in three games is (p+ε)2(p-ε). The probability that they win game 1, lose game 2, win game 3, lose game 4, and win game 5 is (p+ε)(q-ε)(p-ε)(q+ε)(p+ε). We get ten terms in p, q, and ε which correspond to the various ways in which a team can win and lose games in a five-game series and still win the series. These add up to
6p5 + (-15+6ε)p4 + (10 - 12ε) p3 + O(ε2)
and so we conclude that a home-field of ε in each game confers an advantage in the series of
6εp4 - 12εp3 + 6εp2
or 6ε(pq)2. If p = q = 1/2 this means that home-field advantage in a series is worth something like 3/8 of what it's worth in a single game; a team that has a 0.540 winning percentage will win a series 0.515 of the time. (The actual number there is 0.5150646144.)
What about in the 2-3 format? A similar polynomial can be computed; it turns out to be exactly the same polynomial, if you assume that the team with home-field advantage starts on the road (and therefore gets three home games). So fooling around with the order of the games doesn't change the impact of the home-field advantage. Yet one might say that the team that starts on the road actually doesn't have the advantage if the series only plays three games -- they have to play two on the road!
This seems surprising, but it's not. Think of a playoff series as an experiment to figure out which team is the better team. In order to do the experiment, you play five games, and whichever team wins more games is declared the better team. If the games are independent, then the order in which the games are played shouldn't affect the probability of a given team winning the series. In reality, this is exactly what is done -- except that we stop playing when one team wins three games, meaning that no matter how the rest of the series turns out, they will end up winning more games.
I'm not going to weigh in on which format is preferable, because that's not a mathematical question. But since either format, 2-2-1 or 2-3, is mathematically equivalent, one probably wants to go with the format that appears most fair, which is probably 2-2-1 -- in a series with an odd number of games, the team with home-field advantage will have one more home game, and in a four-game series each team has two. Furthermore, the team with home-field advantage opens the series at home, which seems fair.
But then how does one handle a seven-game series? The current format is 2-3-2, and you could argue that the team that opens at home -- and has four scheduled home games out of seven -- is hurt in a five-game series. (The same argument as to why they're not really hurt applies.) It wouldn't surprise me to see MLB go to a 2-2-1-1-1 format and introduce even more off days into the postseason -- did you know that Game 7 of the World Series this year is scheduled for November 1? Before you know it they'll be playing baseball on Thanksgiving.
(For the record, I credit the Phillies being in the playoffs to the colors of my blog. My blog's original layout was blue and orange, which are the Mets' colors although I didn't realize it at the time. Now it's red and white. As a result, Mets fans look like this guy.)
Thing #1: during the season teams play every day. But during the first round of the playoffs, which start in two hours with the Phillies and the Rockies, there are four five-game series. Each of these series includes two off-days, except for the Red Sox and Angels, who have three. Take a look at the schedule.
You'd think that one benefit of this would be that the TV stations televising the series never have to broadcast four games in one day. You'd be right. You'd think another benefit of that would be that teams could have their games scheduled at times that are not ridiculously inconvenient for their fans. You'd be wrong, at least if you're a Phillies fan; the five games in the first-round series are scheduled for Wednesday at 3, Thursday at 3, Saturday at 9:30, Sunday at 10, and Tuesday at 6:30. (The first two are weekdays.) The Cubs-Diamondbacks schedule isn't much better. The Yankees-Indians and Red Sox-Angels series get all the good time slots.
Anyway, a big deal is made of having home-field advantage in these series. (At least, people in places like Boston make a big deal out of it, because the Red Sox knew they had a playoff spot a week before the end of the season. The National League was a bit crazier this year; the Diamondbacks and Cubs weren't assured of playoff spots until late Friday night, the Phillies when the final pitch was thrown on Sunday, and the Rockies late Monday night.) Currently, in a five-game series the team with home-field advantage (which is the team with the better regular-season record, except it can't be the wild card) plays games 1, 2, and 5 at home, and games 3 and 4 on the road; this is referred to as a "2-2-1" format. In the past, a "2-3" format was used -- for example in the 1984 NLCS -- although I can't tell which team was considered to have home-field advantage. (edit: this article indicates that it was the team which started on the road and had three games at home. At that time, the divisions alternated home-field advantage from year to year, instead of basing the seeding on regular-season record, which makes life a lot easier on the people selling the tickets!)
So I ask the question: say that a team -- let's call them the Phillies -- has a probability p of winning a game played at a neutral site against their opponents, the Qockies. (The Qockies are related to the Qankees, who I wrote about here and here.) The probability of the Qockies winning a game is, of course, 1-p; we'll call this q.
Next, we'll say that the probability of the Phillies winning a game at home is p + ε, and on the road p - ε, for some constant ε. (Incidentally, ε is about 0.04 for most teams -- that is, home teams win about 54% of their games.) Then the probability of the Phillies winning a series in a 2-2-1 format, where they start at home, can be found by summing the probabilities of all the different ways in which they can win. For example, the probability that the Phillies win in three games is (p+ε)2(p-ε). The probability that they win game 1, lose game 2, win game 3, lose game 4, and win game 5 is (p+ε)(q-ε)(p-ε)(q+ε)(p+ε). We get ten terms in p, q, and ε which correspond to the various ways in which a team can win and lose games in a five-game series and still win the series. These add up to
6p5 + (-15+6ε)p4 + (10 - 12ε) p3 + O(ε2)
and so we conclude that a home-field of ε in each game confers an advantage in the series of
6εp4 - 12εp3 + 6εp2
or 6ε(pq)2. If p = q = 1/2 this means that home-field advantage in a series is worth something like 3/8 of what it's worth in a single game; a team that has a 0.540 winning percentage will win a series 0.515 of the time. (The actual number there is 0.5150646144.)
What about in the 2-3 format? A similar polynomial can be computed; it turns out to be exactly the same polynomial, if you assume that the team with home-field advantage starts on the road (and therefore gets three home games). So fooling around with the order of the games doesn't change the impact of the home-field advantage. Yet one might say that the team that starts on the road actually doesn't have the advantage if the series only plays three games -- they have to play two on the road!
This seems surprising, but it's not. Think of a playoff series as an experiment to figure out which team is the better team. In order to do the experiment, you play five games, and whichever team wins more games is declared the better team. If the games are independent, then the order in which the games are played shouldn't affect the probability of a given team winning the series. In reality, this is exactly what is done -- except that we stop playing when one team wins three games, meaning that no matter how the rest of the series turns out, they will end up winning more games.
I'm not going to weigh in on which format is preferable, because that's not a mathematical question. But since either format, 2-2-1 or 2-3, is mathematically equivalent, one probably wants to go with the format that appears most fair, which is probably 2-2-1 -- in a series with an odd number of games, the team with home-field advantage will have one more home game, and in a four-game series each team has two. Furthermore, the team with home-field advantage opens the series at home, which seems fair.
But then how does one handle a seven-game series? The current format is 2-3-2, and you could argue that the team that opens at home -- and has four scheduled home games out of seven -- is hurt in a five-game series. (The same argument as to why they're not really hurt applies.) It wouldn't surprise me to see MLB go to a 2-2-1-1-1 format and introduce even more off days into the postseason -- did you know that Game 7 of the World Series this year is scheduled for November 1? Before you know it they'll be playing baseball on Thanksgiving.
(For the record, I credit the Phillies being in the playoffs to the colors of my blog. My blog's original layout was blue and orange, which are the Mets' colors although I didn't realize it at the time. Now it's red and white. As a result, Mets fans look like this guy.)
02 October 2007
Links for 2007-10-02
An explanation of the Excel 65,535 = 100,000 bug, from Good Math, Bad Math.
Timothy Gowers on the importance of mathematics, found from The Trans-Siberian Handbook. (There's a video version, too, which I haven't watched, because I can read faster than I can watch video; a link is in the TSH post.)
At The Dilbert Blog, Scott Adams writes:
Somewhat jokingly, I replied:
This is true to some extent. I also find that it has the reverse effect -- it makes people I want to talk to at parties come to me, because for some reason the people I find most interesting are the ones who will respond to "I'm a mathematician" with "I always found math really interesting but I'm not any good at it." These people are usually interesting to talk to, perhaps because they haven't had the fun beat out of them by a technical education.
Here's Thomas Maarup's masters' thesis on the game of Hex -- the history of the game, the classic proof of the existence of a winning strategy, the complexity, and some tips on how to actually play the game. (The proof that the winning strategy exists is nonconstructive, and exhaustive search would take too long for practical purposes.)
Some notes on the calculation of the gamma function.
Graph Visualization is Difficult, but is it Useful? from Data Mining: Text Mining, Visualization and Social Media
And the game of "nuclear pennies". It turns out that the strategy to this game is related to the isomorphism of 7-tuples of trees with individual trees, which I've written about here before (but I can't find that post). The existence of such an isomorphism is hinted at by the fact that if you take a generating function for trees, f(z) = (1-(1-4z)1/2)/(2z), you have f(1) = ω where ω is a primitive sixth root of unity; thus ω7 = ω, and it seems just inside the realm of possibility that 7-tuples of trees and trees might count the same thing. From this I learned about the paper of Marcelo Fiore and Tom Leinster, Objects of Categories as Complex Numbers, which says basically that this idea can be made rigorous.
Timothy Gowers on the importance of mathematics, found from The Trans-Siberian Handbook. (There's a video version, too, which I haven't watched, because I can read faster than I can watch video; a link is in the TSH post.)
At The Dilbert Blog, Scott Adams writes:
My reason for majoring in economics in college was to understand how the world works, so I would be more equipped to navigate in it. I think it was a good choice. Has your college major given you any mild super powers?
Somewhat jokingly, I replied:
I majored in mathematics and am a PhD student in mathematics now. This gives me the super power of making people I don't want to talk to at parties go away, because when they hear I'm a mathematician they generally assume we have nothing in common.
This is true to some extent. I also find that it has the reverse effect -- it makes people I want to talk to at parties come to me, because for some reason the people I find most interesting are the ones who will respond to "I'm a mathematician" with "I always found math really interesting but I'm not any good at it." These people are usually interesting to talk to, perhaps because they haven't had the fun beat out of them by a technical education.
Here's Thomas Maarup's masters' thesis on the game of Hex -- the history of the game, the classic proof of the existence of a winning strategy, the complexity, and some tips on how to actually play the game. (The proof that the winning strategy exists is nonconstructive, and exhaustive search would take too long for practical purposes.)
Some notes on the calculation of the gamma function.
Graph Visualization is Difficult, but is it Useful? from Data Mining: Text Mining, Visualization and Social Media
And the game of "nuclear pennies". It turns out that the strategy to this game is related to the isomorphism of 7-tuples of trees with individual trees, which I've written about here before (but I can't find that post). The existence of such an isomorphism is hinted at by the fact that if you take a generating function for trees, f(z) = (1-(1-4z)1/2)/(2z), you have f(1) = ω where ω is a primitive sixth root of unity; thus ω7 = ω, and it seems just inside the realm of possibility that 7-tuples of trees and trees might count the same thing. From this I learned about the paper of Marcelo Fiore and Tom Leinster, Objects of Categories as Complex Numbers, which says basically that this idea can be made rigorous.
01 October 2007
Martingale road
I recieved in the mail at school today a brochure about being a "Chartered Enterprise Risk Analyst" from the Society of Actuaries.
The Society of Actuaries has a return address on Martingale Road, in Schaumburg, Illinois.
I suspect the name of this road is not a coincidence.
The martingale was originally an excessively silly betting system. Imagine you're betting on the flipping of a fair coin, and you will receive whatever you bet if the coin turns up heads but must pay the same amount if the coin comes up tails. Then you can guarantee that you win money by betting $1 on the first flip, $2 on the second flip, $4 on the third flip, and so on; once you win you quit the game. You are guaranteed to win exactly $1. Unfortunately this guarantee only holds if you begin with an infinite amount of capital. The probability-theoretic version of the martingale (see Wikipedia is inspired by this; of course, a lot of probability-theoretic notions are inspired by gambling.
The Society of Actuaries has a return address on Martingale Road, in Schaumburg, Illinois.
I suspect the name of this road is not a coincidence.
The martingale was originally an excessively silly betting system. Imagine you're betting on the flipping of a fair coin, and you will receive whatever you bet if the coin turns up heads but must pay the same amount if the coin comes up tails. Then you can guarantee that you win money by betting $1 on the first flip, $2 on the second flip, $4 on the third flip, and so on; once you win you quit the game. You are guaranteed to win exactly $1. Unfortunately this guarantee only holds if you begin with an infinite amount of capital. The probability-theoretic version of the martingale (see Wikipedia is inspired by this; of course, a lot of probability-theoretic notions are inspired by gambling.
Subscribe to:
Posts (Atom)