30 May 2008

Shoup: A computational introduction to number theory and algebra

A Computational Introduction to Number Theory and Algebra, online book, by Victor Shoup. (There's also a print edition, but the online PDF-book will remain freely available.) It is what it sounds like; "computational" doesn't mean "non-mathematical" but rather means that a lot of the applications are chosen with regard to their usability in computer science, specifically cryptography.

From reddit, where various commenters have pointed out things like "the author says the book is elementary but it really isn't." Of course, this sort of complaint is fairly common. The author actually says:
The mathematical prerequisites are minimal: no particular mathematical concepts beyond what is taught in a typical undergraduate calculus sequence are assumed.
The computer science prerequisites are also quite minimal: it is assumed that the reader is proficient in programming, and has had some exposure to the analysis of algorithms, essentially at the level of an undergraduate course on algorithms and data structures.

As usual, "elementary" means something like "a smart, well-trained undergrad could read and understand it"; it's a term of art like anything else. But in some provinces of the Internet there seems to be an idea that everything should be immediately understandable to all readers at first glance. (I use Reddit for the links it gives me to interesting news, not for the comments.)

"Square roots" of probability distributions

Think about probability distributions supported on the nonnegative integers. Not all of them have a "square root" -- that is, given a random variable X supported on the nonnegative integers, there need not exist independent, identically distributed variables Y_1, Y_2 such that X has the same distribution as Y_1 + Y_2.

You might be wondering why it's natural to refer to this as a "square root". Well, let p_k be the probability of the event X = k. Then the probability generating function for X is
f(z) = \sum_{k \ge 0} p_k z^k.
Similarly, let q_j be the probability of the event Y = j, and define the probability generating function
g(z) = \sum_{j \ge 0} q_j z^j.
Let Y, Y_1, Y_2 be equidistributed. Then X is equidistributed with Y_1 + Y_2 if and only if f(z) = g(z)^2, since addition of independent random variables corresponds to convolution of distributions and multiplication of their generating functions.

In other words, the random variable X has a square root in this sense if and only if its generating function f(z) has a square root which has a Taylor series at z = 0 with all coefficients nonnegative.

The simplest case is when X has finite support. Then X has a square root if and only if its generating function f(z) is the square of a polynomial with nonnegative coefficients. For example, the random variable which takes values 0, 1, 2 with probabilities 1/4, 1/2, 1/4 has a square root; its generating function is
f(z) = {1 \over 4} + {1 \over 2} z + {1 \over 4} z^2 = \left( {1 \over 2} + {1 \over 2} z \right)^2.

But the random variable taking values 0, 1, 2 with probability 1/3 each does not, since 1/3 + z/3 + z^2/3 is not the square of a polynomial.

But what about distributions with infinite support? Some of these are easy -- the square root of the Poisson distribution with mean λ is Poisson with mean λ/2. This can easily be seen since the probability generating function of a Poisson(λ) random variable is
\sum_{k=0}^\infty {e^{-\lambda} \lambda^k \over k!} z^k = \exp(\lambda(z-1)).
(In fact, the Poisson distribution is infinitely divisible; in the nomenclature used in this post one might say it has roots of all orders.)

Now consider a random variable X, with geometric distribution with parameter 1/2 supported on {0, 1, 2, 3, ...}; this has P(X = k) = 2^{-(k+1)}. This is a special case of the negative binomial distribution, which is infinitely divisible. We have f(z) = 1/2 + z/2^2 + z^2/2^3 + ... = 1/(2-z). So the square root distribution has generating function

{1 \over \sqrt{2-z}} = \sqrt{2} \left( {1 \over 2} + {1 \over 8} z + {3 \over 64} z^2 + {5 \over 256} z^3 + {35 \over 4096} z^4 + \cdots \right)


and in general the coefficient of z^n is, by the binomial theorem,

\sqrt{2} {(-1)^n \over 2^{1+n}} {-1/2 \choose n} = \sqrt{2} {(-1)^n \over 2^{1+n}} (-1)^n {2n-1 \choose n-1} {1 \over 2^{2n-1}} = {1 \over 2^{3n-1/2}} {2n-1 \choose n-1}.


That binomial coefficient is ${1 \over 2} {2n \choose n}$; we have ${2n \choose n} \sim 4^n/\sqrt{\pi n}$, so the coefficient of z^n in our probability generating function, which we'll call q_n, is asymptotic to

{2^{2n-1} \over {\sqrt{\pi n}} } {1 \over 2^{3n-1/2}} = {1 \over 2^n \sqrt{2 \pi n}}


In particular, the "square root" distribution decays just a bit faster than the distribution that it's the square root of, the main difference being the additional factor of n^{-1/2}. This is reasonable, if you think about the process of convolution. We have

p_n = q_0 q_n + q_1 q_{n-1} + q_2 q_{n-2} + ... + q_{n-1} q_1 + q_n q_0

and each of the n+1 terms is roughly 1/(n 2^n). This is just another negative binomial distribution. (I actually didn't realize that until I started writing this post; the post was originally titled "what's the name of this distribution?" and then I did some research.)
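
(Here's a quick numerical sanity check of all this, as a Python sketch; the cutoff N and the variable names are mine, nothing canonical. It builds the q_n from the closed form above, convolves the sequence with itself, and compares the result with the geometric probabilities 2^{-(n+1)} and with the asymptotic formula.)

from math import comb, sqrt, pi

N = 30
# q_n = C(2n-1, n-1) / 2^(3n - 1/2) for n >= 1, and q_0 = 1/sqrt(2)
q = [1 / sqrt(2)] + [comb(2*n - 1, n - 1) / 2 ** (3*n - 0.5) for n in range(1, N + 1)]

for n in range(11):
    convolution = sum(q[k] * q[n - k] for k in range(n + 1))   # should equal 2^{-(n+1)}
    geometric = 2 ** -(n + 1)
    asymptotic = 1 / (2 ** n * sqrt(2 * pi * n)) if n > 0 else float('nan')
    print(n, q[n], convolution, geometric, asymptotic)

The convolution column matches the geometric column to floating-point accuracy, and the asymptotic formula is already within a couple of percent of q_n by n = 10.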

29 May 2008

Video abstracts?

From John Baez at the n-Category Cafe: the Journal of Number Theory is now inviting authors to post their abstracts on YouTube.

Something about putting them on YouTube -- as opposed to on the journal's website -- strikes me as saying that they're not "really" part of the paper.

More interestingly, though, the few "video abstracts" there are just people reading their abstracts. I think it's a good idea, but mathematical speech is just not mathematical writing read out loud. (This is, of course, the converse of the fact that mathematical writing is not just mathematical speech transcribed.) Perhaps there's potential here, though. The first few minutes of a good talk would work as a good "video abstract", I think.

But you've got to start somewhere.

28 May 2008

Stochastic generation of ideas

I'm reading about continuous time Markov processes, from Adventures in Stochastic Processes: The Random World of Happy Harry by Sidney Resnick. This book has some of the most amusing problems I've come across in a while; many of the problems involve a character known as "Happy Harry", who owns a greasy spoon near some university that occasionally reveals itself to be Cornell. This post does not involve one of those problems.

A discrete-time Markov chain is what people usually mean when they say "Markov chain". This is a sequence of elements X_0, X_1, X_2, ... selected from some state space S = {s_1, s_2, ...}, where

P(X_n = s_j | X_{n-1} = s_i) = p_{ij}

(the p_{ij} are called transition probabilities) and, given X_{n-1}, the distribution of X_n does not depend on the history before X_{n-1}. Now, from any discrete-time Markov chain it's possible to construct a continuous-time Markov chain. We simply assign a parameter λ(j) to each state s_j. When the discrete-time Markov chain finds itself in state s_j, we wait for a time which is exponentially distributed with parameter λ(j) (and therefore mean 1/λ(j)); after this time passes we proceed into another state according to the transition probabilities.

There's a problem, though, as Resnick explains in Section 5.2 -- it can happen that the chain "blows up", i. e. it makes infinitely many transitions in finite time.

The "pure birth" process has state space {1, 2, 3, ...}; the transition probabilities are pn,n+1 = 1 with all others zero; the parameters are λ(n) = λn. That is, the process waits in state n for a time which is exponentially distributed with mean 1/(λn) and then advances to state n+1. Not surprisingly, the population grows exponentially fast -- if the population is at n currently, it grows at rate λn.

But if you let λ(n) = λn^2, then the process "blows up" almost surely. This isn't surprising from the point of view of a differential equation -- if the population is n, it grows at rate λn^2. If we ignore the stochastic elements and model this as an ordinary differential equation, the population P satisfies dP/dt = λP^2, and solutions to this have vertical asymptotes. Something similar (I don't pretend to know all the details) happens in the stochastic case.
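
(Here's a rough simulation sketch of that quadratic pure birth process, in Python; the choice λ = 1, the starting state, and the truncation at 10,000 states are all arbitrary. Starting from state 1, the time to blow up is a sum of independent exponentials with means 1/(λn^2), so its expectation is (1/λ) Σ 1/n^2 = π^2/(6λ).)

import random

def explosion_time(lam=1.0, start=1, max_states=10**4):
    # add up the exponential holding times in states start, start+1, ...,
    # truncating at max_states; the neglected tail has mean about 1/(lam*max_states)
    t = 0.0
    for n in range(start, max_states):
        t += random.expovariate(lam * n * n)
    return t

samples = [explosion_time() for _ in range(200)]
print(sum(samples) / len(samples))   # should be near pi^2/6, about 1.645, for lam = 1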

I then thought, when would you get a population that grows like this? Consider not a population of living beings, but a population of ideas. And let us assume that new ideas come into being when two old ideas combine. Then in some idealized world where all pairs of ideas are constantly interacting, one might expect that the rate at which new ideas come into being is proportional to the number of pairs of ideas. Of course, this is a silly model, because ideas don't interact all by themselves, but rather in people's brains -- and the population of people grows just exponentially. Still, this feels like it could explain why certain ideas seem to "come out of nowhere" -- at first in some area of intellectual inquiry there isn't much there because new ideas require new insights, but at some point combining old insights in new ways becomes a viable way to come up with new ideas.

(Of course, checking this against reality would be quite difficult. For one thing, how do you count ideas?)

27 May 2008

Abuse of averages

At Freakonomics, they're talking about an advertisement that says that the average termite eats 24 hours a day. (This is an ad for a pest control company.)

Of course, this isn't possible!

Nearly every termite is actually below average; I don't know much about termites, but presumably they slack off at some point.

However, the average termite colony might be eating close to 24 hours a day, in that at least one of the termites in your house may be eating at any given moment. And I think this is the point the ad was trying to make -- termites are constantly destroying your house. (Even this might not be true. Do termites sleep at night?)

24 May 2008

I Will Derive

"I Will Derive", a parody of "I Will Survive".



(via reddit.)

Now, the original song is clearly about female empowerment, and there is somebody who could be described as a "jerk" in the song. (Here are the lyrics, in case you've been living in a cave, or you're not from the US -- I'm not sure how well-known this song is in other countries.) I think this parody would be improved if it mentioned jerk, the third derivative of position.

22 May 2008

Phillies radio silliness

There's a commercial which airs on Phillies radio broadcasts. Since I dislike Comcast, but am addicted to the Internet, I'm willing to pay them for high-speed Internet access but not for cable TV. This means I listen to Phillies games on the radio.

Anyway, there's a commercial for Citizens Bank (who own the naming rights to Citizens Bank Park, where the Phillies play) that regularly airs during these broadcasts. It goes something like this (I'm paraphrasing, but the "math" is correct):
Did you ever notice how important the number seven is to the Phillies? They've won seven National League East division titles. There are seven letters in Ashburn, Schmidt, and Carlton. Thirty-two Hall of Famers played for the Phillies. Divide by the number of runs scored in a grand slam and, yes, you get eight. Subtract the number on Whitey's back and what do you get? Seven!

The voice then proceeds to tell us how Citizens Bank is open seven days a week. Which I suppose would be nice to know if I didn't do basically all my banking at ATMs anyway.

But I think the commercial is intended to make fun of people who jump through hoops like this to make the numbers work out. And seven is a small number and therefore easy to manufacture by the manipulation of other small numbers, or so I noted on July 7, 2007.

20 May 2008

Large Rubik's cubes and asymptotic thoughts

A video of the solution of a Rubik's cube of side 100, from YouTube.

I don't know the details of the algorithm, but it has the interesting property that at first it looks like nothing's happening, and then larger and larger blocks of each color seem to form -- I'm not sure if this is because of the low resolution or if this really is some sort of "phase transition" in the algorithm. This is the sort of thing one just doesn't see in the normal-sized cube.

I came to this from a solution of the size-20 cube, which works in a much different way; each of the faces of the cube gets "filled in" in turn. So in the second case the algorithm appears to be making steady progress; in the first case it doesn't. These seem to be essentially two different families of algorithms.

It would be interesting to know how the solution time (measured in number of moves) of these families of algorithms -- and I'm calling them "families" because I suspect that the actual side length of the cube isn't all that important -- scales with the side length of the cube. Also, how do these algorithms compare with the asymptotically fastest one? One could compute a lower bound on the number of moves needed to solve a Rubik's cube of arbitrary size by just computing the number of possible positions (analogously to the calculation of that number for the normal cube I outlined here) and comparing that to the number of possible moves one can make from a given position. An absolutely useless question -- is that lower bound (I worked it out once, but I forget what it is, and it was the sort of back-of-the-envelope work that might have been wrong anyway) asymptotically tight? That is, does there exist a solution procedure for the Rubik's cube for which the number of moves needed to solve the size-n cube is within a constant factor of this obvious lower bound?

As for what that lower bound is, I may write that up as another post; I'd either need to find the notes I made on it a few months ago or (more likely) rederive them.

19 May 2008

It's a tax on stupidity -- but nobody's that stupid.

You win $1,500 by feeding slot only $3.75 million -- Paul Carpenter, from the (Allentown, Pennsylvania) Morning Call, via fark.com.

From that headline, it sounds like slot machines essentially never pay out, right?

Carpenter becomes suspicious that the machines are rigged because Flo Fabrizio, a "key backer" of legalized gambling in Pennsylvania, won $1500 on his first play of the machine. Now, I can understand this suspicion -- sure, it could happen randomly. But then Carpenter goes on to do some "math" which proves he really doesn't understand what he's talking about. The article basically debunks itself.

It states that slot machines in Pennsylvania have to pay back at least 85 percent of what they take in. So then Carpenter says:
If it takes a regular player 50 million three-second plays to get a big jackpot, it would take more than 11 years, playing 10 hours a day, every day. More important, if each play averages 50 cents, the player would have to put $25 million into the slot. The 85 percent payback would be $21,250,000. So, odds are, the cost of playing long enough to get a $1,500 jackpot would be $3,750,000.
That's true. (The "50 million" comes from the regulations that say that casinos can set up their slots so that the jackpot only pays one time out of every 50 million.) But somehow I doubt a $1500 jackpot would be that rare. And more importantly, while the player is making those fifty million plays, they'll win a lot of smaller payouts along the way. This is like saying that poker's a losing game because you only get a royal flush once every several hundred thousand hands. Or saying that you must not be broke because you didn't buy a million-dollar house you couldn't afford, while ignoring all the little expenses that add up. (Try telling your credit card company that!) You can't cherry-pick the tail of the distribution like that!

Could there be corruption in the gambling industry? Sure. But this doesn't prove anything. Not to mention that these calculations seem to imply that slot machines basically never pay out. Casinos aren't stupid -- they know that if slots essentially never paid out then nobody would play! The whole trick is that if you have enough small gains, you won't realize that on average you're losing.

On the other hand, they say that gambling is a tax on stupidity. At least this man's stupidity will keep him out of the casinos.

18 May 2008

Optimal choice of partners

Optimizing your wife. From MathPages, via reddit. (This is apparently also known as the secretary problem or the sultan's dowry problem.)

The following is a semi-standard bit of mathematical folklore. Say you expect to meet N possible suitors in your life. You want to maximize your chances of picking the best one, given that you expect to meet these people in a random order, and once you say no to one of them you can never see them again, and that you can say for any two of them either "X is better than Y" or "Y is better than X". (There are no ties, and everybody's comparable.)

My readers who are not mathematicians are probably thinking that these are ridiculous assumptions. This is true. In particular, the first one seems silly to me -- we don't meet people at random, but through other people we know, so I suspect that the longer one spends living the more one tends to find people one likes. This certainly is true in my experience when seeking out friends -- people who I like spending time with tend to know other people I'd like spending time with. The inability to go back to previous suitors seems to be not a horrible approximation. The fact that everyone is rankable seems at first like a bad assumption -- but we manage to do it for economic goods all the time. For example, houses have prices which are scalars, despite the fact that there are multiple dimensions along which houses can be compared. The fourth assumption, that you know how many potential suitors there are, seems the most wrong to me. But combined with the assumption that you meet suitors at a constant rate, this converts strategies which talk about numbers of suitors to strategies which talk about time.

My readers who are mathematicians probably didn't read the previous paragraph.

Anyway, the optimal strategy is to reject the first N/e of the people you meet, and then pick the first person you meet who's better than everyone you'd met previously. The probability of this strategy succeeding -- in that you pick the very best person -- is close to 1/e when N is large.
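
(Here's a quick Monte Carlo check of that claim, as a Python sketch; N = 100 and the number of trials are arbitrary choices of mine. Candidates arrive in random order, we let the first N/e go by, then take the first one who beats everyone seen so far, and count how often that's the overall best.)

import math, random

def trial(N):
    ranks = list(range(N))             # 0 = worst, N-1 = best, in order of arrival
    random.shuffle(ranks)
    cutoff = int(N / math.e)
    best_seen = max(ranks[:cutoff], default=-1)
    for r in ranks[cutoff:]:
        if r > best_seen:
            return r == N - 1          # did we pick the overall best?
    return False                       # said no to everyone

N, trials = 100, 20000
wins = sum(trial(N) for _ in range(trials))
print(wins / trials, 1 / math.e)       # both should be roughly 0.37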

I'm pretty sure I first heard this around fifteen years ago, and have probably heard it at least once a year since then -- but I didn't know a proof until now. (Strictly speaking, there's some hand-waving in the argument given there, as there always is whenever someone says "X is approximately equal to Y" without giving any sort of bound on X-Y or X/Y -- but I don't plan to use this to prove anything else, so I'm convinced.)

Malthus is overrated

Malthus, the false prophet, from The Economist, and Costs of Living, a review of Common Wealth: Economics for a Crowded Planet by Jeffrey Sachs.


Sachs' book is about how to avert global economic catastrophe, which is obviously something worth worrying about. Basically, he seems to be saying that global cooperation is necessary, because "we're all in this together". He also argues, it seems, that population growth is bad; I found the article from Greg Mankiw's blog. Mankiw quotes the end of the review:
In an age when we don’t need to have lots of children to work the fields, or to compensate for high infant mortality, Sachs argues that it’s both economically rational — and crucial for a future of sustainable growth — for people to reproduce at a rate close to 2.1 children per family. In his acknowledgments, Sachs thanks his three children.


But Mankiw pointed out in 1998 that it's not necessarily true that population growth is bad. It seems like a lot of people like to quote Thomas Malthus on this, who said that population grows exponentially with time (true) but that food production ability only grows linearly. As far as I know Malthus had no reasonable argument for this; people talk about the Malthusian catastrophe. But if you actually look at the relevant section in An Essay on the Principle of Population, in chapter 1 Malthus just postulates these growth rates. In Chapter 2 he offers as justification for this basically that there's not enough room in England for the farms to feed an exponentially growing population. But there's not room for an exponentially growing population itself either!

It would be one thing if, say, farms fell from the sky at a constant rate. But farms are made by people; thus if people grow exponentially so will farms.

Now, I'm not saying that we aren't running out of room. It's obvious that some land is better for farming than other land, and people will tend to farm the better land first; as time goes on we will be obliged to farm more and more of the inferior land, therefore decreasing the amount of food grown per person.

But Malthus was writing in 1798. He assumes that the population is producing just enough food for itself in his time, and then goes on to say:
In two centuries and a quarter, the population would be to the means of subsistence as 512 to 10.
Two centuries and a quarter after 1798 is 2023; roughly speaking, now. Since 10/512 is about two percent, Malthus is predicting that by now we would be producing roughly two percent of the food we need -- and we clearly have much more than that.

Of course, it's possible -- as is pointed out in every introductory computer science class -- that polynomial growth can outstrip exponential growth over some short time, and we're not in the limiting regime yet. But I don't think anybody is seriously saying this. And anyway, Malthus made the stronger claim that food production is growing linearly.

(Does Malthus get more sophisticated than this? I'm just cherry-picking, but skimming his work it seems to continue in roughly the same vein.)

Now, this is an obvious criticism -- but somehow it rarely gets pointed out. Mankiw pointed it out, though -- sure, people use resources. But they also create resources. People will eat food. But they will also figure out how to grow more food. That is, if we don't just disintegrate into a society of virtual reality addicts first.

edited, 11:08 am: Also from the NYT, Deaths are outpacing births in the Pittsburgh metropolitan area. I'm not sure how meaningful this is on a national level, because people move around.

17 May 2008

A little number theory problem

Here's an interesting little problem, taken from Chapter Zero which I recently discovered.

It was once asked on the Math GRE -- what's the greatest common divisor of p^4 - 1 over all primes p > 5? It's an odd way to formulate the question, but it's simple enough.

First, remember the possibly familiar fact that for primes p > 3, p^2 - 1 is a multiple of 24. This is simple enough -- we can write p^2 - 1 = (p-1)(p+1). Since p is odd, both of these factors are divisible by 2, and one of them is divisible by 4. So p^2 - 1 is a multiple of 8. (This is true for all odd numbers, not just primes.) Furthermore, since p isn't a multiple of 3, either p-1 or p+1 is. So p^2 - 1 is divisible by 8 and by 3, therefore by 24.

This suggests an approach to the analogous problem for fourth powers, which is basically taken from the original post. A bit of numerical experimentation gives 7^4 - 1 = 2400, 11^4 - 1 = 14640, 13^4 - 1 = 28560, ... -- all of these are divisible by 240. So you start to suspect that 240 might be the answer. And it turns out that it is; we can write

p^4 - 1 = (p-1)(p+1)(p^2+1).

Now, what power of 2 appears in the prime factorization of p^4 - 1? Each of the three factors on the right-hand side is even; furthermore one of p-1 or p+1 must be divisible by 4, so there are at least four occurrences of 2.

Similarly, either p-1 or p+1 is divisible by 3.

And p^4 - 1 is divisible by 5 by Fermat's little theorem. (A brute-force analysis is also possible -- if p is congruent to 1 mod 5 then p-1 is divisible by 5, if p is congruent to 4 mod 5 then p+1 is divisible by 5, and if p is congruent to ± 2 mod 5 then p^2 + 1 is divisible by 5.)

So p^4 - 1 is divisible by 2^4 · 3 · 5, which is 240, for all primes p > 5. We can easily see that, say, gcd(11^4 - 1, 13^4 - 1) = 240. So 240 is the answer.

The original post which I found this from asked two questions:

  • can this be done in a less ad hoc manner?

  • are there generalizations?


The method generalizes pretty easily to other powers; by a similar analysis, the gcd of p^6 - 1 over primes p > 7 is 504. For eighth powers, 480. For tenth powers, 264. (In all cases, when taking nth powers we require p > n+1.)

There's no pattern obvious in these numbers, though. But they turn out to be related to the Bernoulli numbers. If you know more about this sort of thing than I do you might try to prove (or disprove) that:
Conjecture. gcd(p^a - 1), taken over primes p > a+1, is the denominator of B_a/(2a), where B_a are the Bernoulli numbers.
(I have not made any attempt at all to determine if this is known. Feel free to point out if it is!)
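
(Here, though, is a quick numerical check of the conjecture for small even a, as a Python sketch; even a only, since the odd Bernoulli numbers past B_1 vanish, and the cutoff of 1000 on the primes is an arbitrary choice. The Bernoulli numbers are computed from the usual recurrence \sum_{k=0}^{n} {n+1 \choose k} B_k = 0.)

from fractions import Fraction
from math import comb, gcd

def bernoulli(m):
    # B_0, ..., B_m via the recurrence sum_{k=0}^{n} C(n+1, k) B_k = 0
    B = [Fraction(1)]
    for n in range(1, m + 1):
        B.append(-sum(comb(n + 1, k) * B[k] for k in range(n)) / (n + 1))
    return B[m]

def is_prime(n):
    return n > 1 and all(n % d for d in range(2, int(n ** 0.5) + 1))

for a in range(2, 14, 2):
    g = 0
    for p in (p for p in range(a + 2, 1000) if is_prime(p)):
        g = gcd(g, p ** a - 1)
    denominator = (bernoulli(a) / (2 * a)).denominator
    print(a, g, denominator)   # if the conjecture is right, these two columns agree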

16 May 2008

Correlation coefficients and the popularity gap

The Popularity Gap (Sarah Kliff, Newsweek, May 15 issue).

Apparently, the people who end up being successful later in life are the ones who think people like them in middle school, not necessarily the ones who are actually well-liked in middle school. This reports on a study by Kathleen Boykin McElhaney, who is not particularly important to what I'm going to say, because I'm going to comment on something that I assume was introduced by the folks at Newsweek.

The Newsweek article continues:
One of McElhaney's most interesting findings is that self-perceived and peer-perceived popularity don't line up too well; most of the well-liked kids do not perceive themselves as well liked and visa versa. The correlation between self-perceived and peer-ranked popularity was .25, meaning about a quarter of the kids who were popular according to their classmates also thought they were popular. For the other three quarters, there was a disconnect between how the teen saw themselves and what their peers thought.
I can't read the original journal article (the electronic version doesn't become available for a year after publication, and I'm not going to trek to campus in the rain and look around an unfamiliar library just to track this down!) but the Newsweek article says enough to make it clear that the study wasn't using a two-point "popular/unpopular" scale. I'm inclined to think that the "correlation" here is what's usually referred to as the "correlation coefficient" -- and this is usually explained in popular media by saying that "one-fourth of the variation in how popular students believed they were was due to how popular they actually were" or some such similar phrase. I'm not a statistician, so I won't try to explain why that phrase might be wrong; if you are, please feel free to weigh in!

But let's assume that half of students are actually popular, and half of students think they're popular. (This might be a big assumption; recall the apocryphal claim that 75 percent of students at [insert elite college here] come in thinking they'll be in the top 25 percent of their class.) Then if only 25 percent of the students who are actually popular think they're popular, there's actually a negative correlation between actual popularity and perceived popularity! More formally, let X be a random variable which is 0 if someone's not (objectively) popular and 1 if they are; let Y play the same role for their self-assessed popularity. Then E(XY) is the probability that a randomly chosen student both is popular and thinks they are, which is 1/8 in this case; E(X) E(Y) = 1/4, which is larger.
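
(To spell out that arithmetic, here's a Python sketch using the made-up fifty-fifty assumptions above, not the study's actual data.)

p_x = 0.5                # P(actually popular)
p_y = 0.5                # P(thinks they're popular)
p_xy = 0.25 * p_x        # P(popular and thinks so) = 1/8

cov = p_xy - p_x * p_y                 # covariance of the two indicator variables
sd_x = (p_x * (1 - p_x)) ** 0.5        # standard deviation of an indicator
sd_y = (p_y * (1 - p_y)) ** 0.5
print(cov / (sd_x * sd_y))             # -0.5: a decidedly negative correlation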

Then again, if there actually were a negative correlation -- if people were so bad at self-assessment as to be worse than useless at it -- then that would be quite interesting. As it is, there seems to be in general a weak positive correlation between how P someone is (where P is some desirable trait, say popularity in this case) and how P they think they are.

And the fact that I bothered to write this post probably will lead you to guess -- correctly -- that I wasn't all that popular in high school.

15 May 2008

Well-intentioned money advice

Suze Orman, "internationally acclaimed personal finance expert" (actually the title of her web page!), said yesterday on myphl17 News At Ten something like: "You are not to spend your economic stimulus check. You must save it." (Don't ask me why I ever watch this newscast. It consists of recycled press releases, news that someone got shot and somebody's house burned down, and sports scores. The only score I care about is the Phillies' and I usually know how that turned out anyway.)

Anyway, Orman's advice seemed to be based on the idea that because the economy as an aggregate is doing poorly, we all must be suffering. There are surely some people who have had a very good year and don't need the six hundred bucks. And there are surely some people who have had a very bad year and for whom six hundred bucks is just a drop in the bucket.

I'll call this the "distributional fallacy" (does it have another name?) -- assuming that any individual must be representative of the population from which they're drawn. Not a horrible assumption in the absence of other information -- but I know more about my financial situation than someone appearing on my television!

But "if times have been bad and you don't have money saved up, you should save the money -- and maybe you should save it even if things have been good for you, because they might turn bad" doesn't have the same ring to it.

I'm not arguing that people shouldn't save their money, because life has a way of causing people trouble. But to assume that everybody is going through hard times is kind of short-sighted. Then again, if you tell people "some people should save their money", that American instinct to consume will kick in and people will assume that "some people" doesn't include them.

(For the record, I will be saving my economic stimulus check. I think. It's hard to say, because money's fungible, and I stand to have negative cash flow this summer because I won't be teaching like I have the last two summers. So it'll go into savings, but then I'll spend "it" later. Money is money, it all mixes together. It's a scalar, not some sort of crazy vector in a non-Euclidean space as some people would probably like you to think.)

Simpson's paradox and climate change

Solving the climate change attitude mystery, from Statistical Modeling, Causal Inference, and Social Science, originally from Wired.

The facts appear to be as follows: 19 percent of college-educated Republicans believe that human activities cause global warming; 75 percent of college-educated Democrats believe the same. Among the non-college-educated, 31 percent of Republicans have that belief and 52 percent of Democrats.

One conclusion is that college-educated people are more likely to toe the party line.

But here's another idea. The assumption underlying this seems to be that "Republican" and "Democrat" are fixed labels -- so clearly college makes you think that global warming is more likely if you're a Democrat, but less likely if you're a Republican. But of course those labels are not fixed; people can switch parties! So maybe what happens is that education doesn't change your beliefs -- but if you are the sort of person who would believe that humanity causes global warming, but otherwise tend to agree with Republicans, a college education would flip you to the Democratic side. (And vice-versa for Democratic-leaning global-warming-skeptics.) I'm not saying that there are people doing indoctrination at our colleges, but that such an education changes the way one looks at things.
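
(Here's a toy version of that story, in Python, with completely made-up numbers: start both cohorts at the non-college percentages from the article, let college change nobody's mind, and let 40 percent of the "mismatched" people -- believing Republican leaners and doubting Democratic leaners -- switch parties.)

r_believe, r_doubt = 31.0, 69.0   # hypothetical cohort of 100 Republican leaners
d_believe, d_doubt = 52.0, 48.0   # hypothetical cohort of 100 Democratic leaners

switch = 0.4                      # fraction of "mismatched" people who switch parties
moved_to_d = switch * r_believe   # believing Republican leaners become Democrats
moved_to_r = switch * d_doubt     # doubting Democratic leaners become Republicans

r_believe, d_believe = r_believe - moved_to_d, d_believe + moved_to_d
r_doubt, d_doubt = r_doubt + moved_to_r, d_doubt - moved_to_r

print(100 * r_believe / (r_believe + r_doubt))   # about 17 percent of college Republicans believe
print(100 * d_believe / (d_believe + d_doubt))   # about 69 percent of college Democrats believe

Nobody's belief changed, but the college-educated Republicans come out around 17 percent believers and the college-educated Democrats around 69 percent -- not far from the 19 and 75 in the article.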

It basically seems to be Simpson's paradox in different guise. You've got to be careful when the groups you're analyzing change size!

Prediction markets aren't perfect

As of right now, intrade.com reports a 7.4 percent probability that Hillary Clinton will get the Democratic nomination for President -- and an 8.0 percent probability that she will be the next President.

That seems a bit unlikely to me.

Intrade isn't an efficient market; there are tons of arbitrage opportunities like this. (Some are more subtle.)

Although if you want to get technical, buying Clinton getting the Democratic nomination (at 74 cents for a contract that pays $10) and selling Clinton winning the '08 election (at 80 cents) isn't quite risk-free -- there's a nonzero chance that she might want so badly to be President that she'd run as an independent in order to do so. And it's my understanding that short selling (which would be necessary to pull this off) isn't possible on Intrade anyway.

12 May 2008

When Obama wins...

When Obama wins is Jason Kottke's collection of people's postings to Twitter that begin "when Obama wins".

It appears to have started with When Obama wins... unicorns will crap ice cream and pastries.

Of course this is vacuously true! Or at least it would be given the nonexistence of unicorns, which I think we can take as an axiom. To disprove it one would need to produce a counterexample, namely a unicorn that does not crap ice cream and pastries. But we can't do that, since there are no unicorns!

Now, if it said "When Obama wins... everybody will have their own unicorn that craps ice cream and pastries", that would be falsifiable. (And almost certainly false. Let's face it, it's hard to make sure everybody gets a unicorn.)

11 May 2008

Jenga mathematics

John Cook writes about Jenga mathematics, his name for the sort of mathematics which is done by weakening the hypotheses of a theorem as much as possible while the theorem still remains true.

As he points out, "Taken to extremes, Jenga mathematics turns theorems inside-out and proofs become hypotheses". This is an interesting way to look at it, and perhaps explains why it is difficult to read such theorems.

10 May 2008

Some very crude graphing

The number of permutations of [n] of odd order (or equivalently, the number with all cycle lengths odd) is given by (n-1)^2 (n-3)^2 ... 1^2 if n is even, and n (n-2)^2 (n-4)^2 ... 1^2 if n is odd. The sequence begins 1, 1, 1, 3, 9, 45, 225, 1575, ..., which is A000246 in Sloane's encyclopedia.

How fast does this grow? Well, it's not hard to show (from Stirling's formula) that if a(n) is the number of such permutations, then a(n)/n! is asymptotic to (2/(πn))^{1/2}. Since there are n! permutations of [n], that's the asymptotic probability that a randomly chosen permutation is of odd order. Alternatively, a(n) is asymptotic to 2(n/e)^n. So log a(n) ~ log 2 + n(log n - 1).

And you can quickly "graph" log a(n) by just looking at the table of the first 100 values. The number of digits of a(n), when written in base 10, is essentially proportional to log a(n) (precisely, it's about log a(n) / log 10). So the length of a(n) when written in decimal should grow a bit faster than linearly.

Indeed, if you zoom out from that table, that's exactly what you see -- the table's not quite a triangle, but it's close. Of course this trick isn't really so useful -- how often is one going to have a sequence that we can compute exactly but don't have any idea how large the logarithm is? -- but I think it's kind of cute.
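
(In Python, the "zoom out" trick looks something like this: compute a(n) exactly from the product formula above and print the number of decimal digits, which should grow a bit faster than linearly, roughly like n log n.)

def odd_order_permutations(n):
    # (n-1)^2 (n-3)^2 ... 1^2 for even n;  n (n-2)^2 (n-4)^2 ... 1^2 for odd n
    a = 1
    start = n - 1 if n % 2 == 0 else n - 2
    for k in range(start, 0, -2):
        a *= k * k
    if n % 2 == 1:
        a *= n
    return a

for n in range(10, 101, 10):
    print(n, len(str(odd_order_permutations(n))))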

09 May 2008

The Stolz-Cesàro theorem

The Stolz-Cesàro theorem is a discrete analogue of L'Hôpital's rule, with which I was not familiar.
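
One common statement of it (from memory, so check a reference before relying on the details): if b_n is strictly increasing and unbounded, then

\lim_{n \to \infty} {a_{n+1} - a_n \over b_{n+1} - b_n} = L \qquad \Longrightarrow \qquad \lim_{n \to \infty} {a_n \over b_n} = L,

so difference quotients play the role that derivatives play in L'Hôpital's rule.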

(From Topological Musings.)

08 May 2008

Discreet mathematics

What is discreet mathematics? Urbandictionary.com tells us: "Discreet mathematics refers to the subtle study of mathematics. Discreet mathematics is characterised by furtive looks in maths textbooks disguised as pr0n and by secret maths lectures held in abandoned warehouses at midnight..."

Of course, discrete mathematics is an entirely different beast.

edit, 6:31 pm: A poster on a cryptography mailing list pointed out that cryptography is "discreet" mathematics.

Sampling is tricky

From Statistical Modeling, etc.: The candy weighing demonstration, or, the unwisdom of crowds.

Basically, you have a bunch of candies in a bag, of various sizes. You want to know how much the bag weighs, and for some reason you know the number of candies. To do this, you pick five candies at random, weigh them, and divide by five to get the average weight of a candy. Then you multiply by the number of candies.

Your estimate will almost surely be too high, because you are more likely to pick the large candies than the small ones.

This reminds me of a classic statistical paradox: academic institutions like to measure the "average number of students per course", which they do by adding up the number of students in each course and dividing by the number of courses. But what really matters from the students' point of view is the average size of the courses they're in. Big classes have more students in them (that's what makes them big!), so this latter average will be larger. Assume, for example, we have a university with two courses; one has 30 students and one has 60. Every student takes one course. The average number of students per course is (30+60)/2 = 45. But if you ask the students "how many people are in your course?", one-third will say 30 and two-thirds will say 60, so from the students' point of view the average is 50. One could conclude that universities would use their teaching resources more efficiently if they didn't have such a wide range of course sizes, but this wouldn't be practical.
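
(The course-size example, as a two-line calculation in Python; the 30 and 60 are just the numbers from the example above.)

sizes = [30, 60]                                              # enrollments of the two courses

university_average = sum(sizes) / len(sizes)                  # 45: each course weighted equally
student_average = sum(s * s for s in sizes) / sum(sizes)      # 50: each course weighted by its enrollment
print(university_average, student_average)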

An "algorithmic curve"?

Steampunk Moves Between Two Worlds, from today's Monkey Writes.
“There seems to be this sort of perfect storm of interest in steampunk right now,” Mr. von Slatt said. “If you go to Google Trends and track the number of times it is mentioned, the curve is almost algorithmic from a year and a half ago.” (At this writing, Google cites 1.9 million references.)
Here's the curve at Google Trends. It looks linear to me, although with a lot of noise. I suspect, though, that the speaker meant to say it was exponential (because exponentials grow really fast, so this would be in keeping with the rest of the article), then confused that with "logarithmic" (which is a pretty common mistake), and then rearranged a few letters to get "algorithmic".

And a random Google Trends fact: the number of people searching for mathematics has declined steadily over the past four years. Math has been basically flat over that time period, but shows a very large dip during the summers. Presumably the people searching for "math" are students. Maths displays what appears to be a more complicated seasonal pattern; can this be explained by the UK academic calendar? As for "mathematics" declining, I suspect people find the word too long.

07 May 2008

A linear equation come to life

The New York Times has a delegate calculator. This consists of a slider which allows the user to set a hypothetical percentage of delegates that Obama (or Clinton) will win in the remaining primaries, and returns the percentage of the remaining uncommitted superdelegates that candidate would have to get in order to win the Democratic presidential nomination.

Of course, it varies linearly -- for each pledged delegate a candidate gets, that's one less superdelegate they need. Since things are phrased in terms of percentages, it's not quite so simple -- there are 217 pledged delegates left and 225.5 superdelegates with unknown preference. (Half a superdelegate, you ask? Democrats Abroad get delegates that each get half a vote.) So for each one percent more of the pledged delegates a candidate gets (that's roughly proportional to the popular vote, although rounding and the fact that delegates are assigned based on historic turnout as opposed to current turnout alters that), they need slightly less than one percent less of the superdelegates. (Don't try to do the arithmetic; it doesn't seem to quite work out, because I suspect the superdelegate counters and the slider-makers aren't constantly talking to each other.)

This is an interesting way to display the linear dependence; a graph or a table would seem more obvious, but with computers one isn't bound to static displays as one is on paper.
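
(Here's the linear relationship as a Python sketch, using only the figures quoted above -- 217 pledged delegates and 225.5 superdelegates still outstanding; the function name is mine.)

pledged_remaining = 217
superdelegates_remaining = 225.5

def superdelegate_pct_saved(pledged_pct):
    # each extra percent of the remaining pledged delegates is worth about
    # 217/225.5, or roughly 0.96, percent of the remaining superdelegates
    return pledged_pct * pledged_remaining / superdelegates_remaining

for pct in (1, 5, 10):
    print(pct, round(superdelegate_pct_saved(pct), 2))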

06 May 2008

Something I learned from Jeopardy!

On tonight's Jeopardy, the last clue in a category about awards referred to the Fermat Prize. The clue was something like "The Fermat prize is given in this field for work involving 'variational principles'" (I don't have the exact wording). The question, of course, is "What is mathematics?"

Now, I didn't know there was such a prize. It turns out that it's awarded for work in any of the following three fields in which Fermat's work was important: "Statements of Variational Principles", "Foundations of Probability and Analytical Geometry", or "Number theory". From the past prize winners it looks like it's been awarded in all of those categories, although when I heard the clue it sounded like the prize was only awarded for work in the calculus of variations. (Sometime in the next couple days, the exact wording should be available at the Jeopardy! archive.)

Also, Alex Trebek pronounced the "t" in Fermat. No.

And the second-to-last clue in the same category asked for the subject in which the Turing Award is given. (I'd never heard of that award, either, but the answer is obvious if -- and probably only if -- you've heard of Turing.)

05 May 2008

Breaking down inflation

An interesting informational graphic: All of Inflation's Little Parts, from Saturday's New York Times. This is a graphic that breaks up the many parts of the standard "basket" that is used by the US government to compute the inflation rate.

Of course, nobody is exactly average. To take an example dear to my heart, 0.1% of this basket is "books". But how many people do you think spend exactly that portion of their income on books? My instinct is that the distribution of "percentage of income spent on books" (and a lot of the other small items) has a long right tail. Plenty of people I know (who are of course not a uniform sample) spend, say, two or three percent of their income or more on books -- and it's my understanding that the book industry relies quite a bit on people like us.

Somewhat more seriously, 23.9% of the basket is "owner's equivalent rent" (what homeowners would pay if they were renting their homes) and 5.8% is actual rent. That means that a typical household making, say, $50,000 a year spends about $1,000 monthly on the house they own (or would, if we weren't having a housing bubble), and $250 monthly on rent. But it's very hard to imagine a household that actually does that! The mean and the mode, in distributions like this, are very different. It would be quite surprising, I think, to find someone whose spending breaks down exactly as in the graphic.

04 May 2008

Loose ends

You may have noticed that posts have been less frequent than usual recently.

This is because first I was finishing up the semester. Then I spent this weekend recovering from the semester. The good news is that I am done taking classes for credit, forever! This is also somewhat terrifying, because the prospect of doing research full-time and writing a thesis is daunting -- but also exciting...

Anyway, not too long ago I wrote about the napkin ring problem in calculus, inspired by Keith Devlin's article; Devlin has given a calculus-free solution, basically that corresponding cross-sections are equal. I alluded to this in the post a couple weeks ago but didn't write out the details. In his May column, Devlin gives some reader remarks to Paul Lockhart's A Mathematician's Lament, which I wrote about back in March.

Also, here's an interview with Leonard Mlodinow, at Carl Bialik's The Numbers Guy -- my mother fairly reliably forwards me some of The Numbers Guy columns, because she gets some sort of e-mail from the Wall Street Journal, and I don't have the heart to tell her that I'm a step ahead of her! I previously wrote about Mlodinow's probability quiz. Unfortunately, the cover of Mlodinow's book The Drunkard's Walk no longer seems to feature dice.

02 May 2008

The strange mathematics of tipping

Do you tip less in a tough economy?, from Queercents.com, which is apparently a queer personal finance site.

It seems that at some restaurants, the average tip has gone down from the old 15 to 20 percent neighborhood to more like 12 percent. The author of the post comments that this is a "3 to 7 percent" cut in pay.

First, 20-12 is 8, not 7.

Second, that's all wrong! Let's say that before the US economy went all funny, servers made $2.50 an hour in pay, and $7.50 in tips. ($2.50 is somewhere near the federal minimum wage for servers; $7.50 was chosen so that the figures would add up to $10.) Let's say that the $7.50 was back when tips were an average of 18%. If tips average 12% now, then only two-thirds as much tip income will come in -- so $5 an hour.

So our hypothetical server now averages $7.50 an hour instead of $10, a twenty-five percent cut. There's a big difference there. I would not be happy if my income were cut by "3 to 7 percent", but I wouldn't have to make huge changes in my lifestyle. But if my income were cut by 25 percent, there are quite a few things I'd have to cut back on. I suspect this is true for many people.
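
(The arithmetic, spelled out in Python with the hypothetical hourly figures above.)

wage = 2.50                           # hourly wage paid by the restaurant
tips_before = 7.50                    # hourly tip income when tips averaged 18 percent
tips_after = tips_before * 12 / 18    # tip income scales with the average tip rate

income_before = wage + tips_before    # $10.00 an hour
income_after = wage + tips_after      # $7.50 an hour
print(1 - income_after / income_before)   # 0.25, i.e. a 25 percent pay cut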

For my international readers who are not used to the silly system we have here in the US: restaurants typically pay their servers between $2 and $3 an hour, and the rest of their income comes in tips; this is as opposed to the more civilized system I understand you have in much of the rest of the world, in which tipping is reserved for extraordinary service and restaurant owners actually pay their staff a decent wage.

(Thanks to dan for pointing me to this.)