God Plays Dice: April 2011

17 April 2011

Get drunk not broke

Dreading the upcoming work week, because it's Sunday night and you still don't make enough money to buy fancy booze? Get drunk not broke sorts drinks by their alcohol-to-cost ratio. See, math is good for something.

There's also Get drunk not fat (alcohol to calories).

And finally there's Brian Gawalt's Get drunk but neither broke nor fat. This page finds the three drinks such that every other drink is either more expensive or more caloric per ounce of alcohol.
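The rule described above is what's usually called a Pareto frontier: keep every drink that no other drink beats on both cost and calories (per ounce of alcohol) at once. Here is a minimal Python sketch of that rule. It is not the code behind Gawalt's page; the `Drink` type, its field names, and the numbers are all made up for illustration.

```python
from typing import NamedTuple


class Drink(NamedTuple):
    name: str
    dollars_per_oz_alcohol: float
    calories_per_oz_alcohol: float


def dominates(b: Drink, a: Drink) -> bool:
    """True if b is no worse than a on both measures and strictly better on one."""
    no_worse = (b.dollars_per_oz_alcohol <= a.dollars_per_oz_alcohol
                and b.calories_per_oz_alcohol <= a.calories_per_oz_alcohol)
    strictly_better = (b.dollars_per_oz_alcohol < a.dollars_per_oz_alcohol
                       or b.calories_per_oz_alcohol < a.calories_per_oz_alcohol)
    return no_worse and strictly_better


def pareto_frontier(drinks: list[Drink]) -> list[Drink]:
    """Drinks that no other drink dominates (all-pairs check)."""
    return [a for a in drinks if not any(dominates(b, a) for b in drinks)]


# Hypothetical numbers, purely for illustration.
drinks = [
    Drink("A", 1.00, 200.0),
    Drink("B", 2.00, 120.0),
    Drink("C", 3.00, 100.0),
    Drink("D", 2.50, 150.0),  # dominated by B: pricier and more caloric
    Drink("E", 1.50, 250.0),  # dominated by A
]
print([d.name for d in pareto_frontier(drinks)])  # ['A', 'B', 'C']
```

The all-pairs check is fine for a few dozen drinks; for a long list, ties aside, you could sort by cost and sweep once, keeping a drink whenever its calorie count beats the lowest seen so far.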

12 April 2011

On the inclusion of solutions in textbooks

From the preface of Jones, Game Theory: Mathematical Models of Conflict:

"Some teachers may be displeased with me for including fairly detailed solutions to the problems, but I remain unrepentant [...] In my view any author of a mathematical textbook should be required by the editor to produce detailed solutions for all the problems set, and these should be included where space permits."

By the way, Jones was writing this in 1979; today, if space does not permit, the solutions could presumably be posted on the author's website. (That poses a problem if websites move, though; perhaps an arXiv-like electronic repository of solutions would be a good idea?) A reviewer at Amazon points out that including solutions might be an issue for instructors who assign the textbook in a course where homework is collected and graded. Jones has a PhD from Cambridge and, as far as I can tell, was at Imperial College, London at the time of writing; the willingness to include solutions may have something to do with the difference between the British and American educational systems.

My more conscientious students have expressed frustration at textbooks that don't provide solutions. (This isn't about this text, since I'm not currently teaching game theory, but about other texts I've used in other courses.) They want to do as many problems as they can, which is good. Leaving out the solutions is perhaps aimed at the median student: in my experience the median student does all of the homework problems but would never consider trying anything that's not explicitly assigned. (And although I don't know for sure, the student who goes out of their way to get a bootleg solutions manual is probably not the conscientious student I'm referring to.)

07 April 2011

The probability of Asian-Hispanic food

From tomorrow's New York Times: an article on the growing prevalence of Asian-Hispanic fusion food in California. It's part of a series on the Census. Orange County, California is 17.87% Asian and 34.13% Hispanic -- so the majority of the population, 52.00%, is either Asian or Hispanic. Not surprisingly there's a guy there with a food truck named Dos Chinos, which serves such food as sriracha-tapatito-tamarind cheesecake.

This is accompanied by a little map showing the sum of Asian and Hispanic population in each county. (Well, it might be the sum; to be honest I don't know, because in the Census "Hispanic" vs. "non-Hispanic" is orthogonal to "race", which takes values such as White, Black, Asian, American Indian, and Native Hawaiian.) In many places in the southern half of the state it's over 50%.

But wouldn't the relevant statistic for this article be not 0.1787 + 0.3413, but 2(0.1787)(0.3413) ≈ 0.1220, the probability that if two random Orange County residents run into each other, one of them will be Asian and the other Hispanic? Fresno County, for example, is 50.3% Hispanic and 9.6% Asian, which makes it 59.9% "Hispanic or Asian", but there wouldn't seem to be quite as many opportunities for such fusion, since the probability of a Hispanic-Asian pair in Fresno County is only 2(50.3%)(9.6%) ≈ 9.7%.
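The comparison is easy to reproduce. A minimal sketch, assuming (as the 2pq formula does) that the two residents are drawn independently and that nobody is counted as both Asian and Hispanic, which the Census categories don't actually guarantee. The function name is made up for this example.

```python
def mixed_pair_probability(p_asian: float, p_hispanic: float) -> float:
    """Chance that two independently chosen residents are one Asian and one
    Hispanic, in either order, treating the two groups as disjoint."""
    return 2 * p_asian * p_hispanic


# County shares quoted in the post.
for county, asian, hispanic in [("Orange", 0.1787, 0.3413), ("Fresno", 0.096, 0.503)]:
    print(f"{county}: sum = {asian + hispanic:.3f}, "
          f"mixed pair = {mixed_pair_probability(asian, hispanic):.3f}")
# Orange: sum = 0.520, mixed pair = 0.122
# Fresno: sum = 0.599, mixed pair = 0.097
```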

(Except that 97% of Fresno County's Asians are Hispanic, according to the frustratingly hard-to-navigate American FactFinder. So maybe some "fusion" has already taken place.)

06 April 2011

Folding toilet paper thirteen times

James Tanton, of the St. Mark's Math Institute at St. Mark's School in Southborough, Massachusetts, makes excellent short mathematical videos.

He and his students also folded a very long piece of paper 13 times; that is, they created 2¹³-ply (8192-ply) toilet paper. This is a world record. (There's some question about whether they actually got 13 folds or just 12, since the 13th fold has to be held in place. 12 has been done before.) You can read about it in a local newspaper, or see a video on YouTube. They did it in MIT's "Infinite Corridor", which is not infinite but is very long, about 800 feet; apparently this was on a Sunday, and on what must be the third or fourth floor. They got access thanks to OrigaMIT, MIT's origami club. I am only very mildly surprised that such a club exists.

This whole thing may be the only known good use of single-ply toilet paper.

02 April 2011

A street-fighting approach to the variance of a hypergeometric random variable

So you all¹ know that if I have a biased coin with probability p of coming up heads, and I flip it n times, then the expected number of heads is np and the variance is npq, where q = 1 - p. That's the binomial distribution. Alternatively, if I have an urn containing pN white balls and qN black balls and I draw n balls with replacement, then the number of white balls drawn has that same distribution, with the same mean and variance.

Some of you know that if I sample without replacement from that same urn -- that is, if I take balls out and don't put them back -- then the expected number of white balls is np and the variance is npq(N-n)/(N-1). The distribution of the number of white balls is the hypergeometric distribution.

So it makes sense, I think, to think of (N-n)/(N-1) as a "correction factor" for going from sampling with replacement to sampling without replacement. This is the approach taken in Freedman, Pisani, and Purves, for example, which is the book I'm teaching intro stats from this semester.
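A quick simulation shows the correction factor at work. This is a Monte Carlo sketch; the urn (N = 20 balls, 8 of them white, so p = 0.4) and the sample size n = 10 are arbitrary choices. With these numbers npq = 2.4, and the corrected variance is 2.4 × 10/19 ≈ 1.26. The helper `white_counts` is a name made up for this example.

```python
import random
from statistics import mean, pvariance


def white_counts(N, white, n, trials, replace, rng):
    """Number of white balls in each of `trials` samples of size n drawn
    from an urn of N balls, `white` of which are white."""
    urn = [1] * white + [0] * (N - white)  # 1 = white, 0 = black
    counts = []
    for _ in range(trials):
        if replace:
            draw = rng.choices(urn, k=n)  # sampling with replacement
        else:
            draw = rng.sample(urn, n)     # sampling without replacement
        counts.append(sum(draw))
    return counts


N, white, n = 20, 8, 10    # arbitrary example urn and sample size
p = white / N
q = 1 - p
rng = random.Random(2011)  # fixed seed so the run is reproducible
with_r = white_counts(N, white, n, 100_000, True, rng)
without_r = white_counts(N, white, n, 100_000, False, rng)

print("with replacement:    mean", mean(with_r), "var", pvariance(with_r),
      "theory", n * p * q)
print("without replacement: mean", mean(without_r), "var", pvariance(without_r),
      "theory", n * p * q * (N - n) / (N - 1))
```

Both sample means should land near np = 4; only the variances differ, by roughly the factor (N-n)/(N-1) = 10/19.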

How do you prove this? On this, FPP are silent. The proof I know -- see, for example, Pitman -- is as follows. Write the number of white balls, when sampling without replacement, as

S_n = I₁ + ... + I_n

where I_k is 1 if the kth draw gives a white ball and 0 otherwise. Then E(I_k) is just the probability of getting a white ball on the kth draw, which equals p by symmetry (the kth ball drawn is equally likely to be any of the N balls). By linearity of expectation, E(S_n) = np. To get the variance, it's enough to get E(S_n²), since Var(S_n) = E(S_n²) - (np)². Expanding the square of that sum of indicators, you get

S_n² = (I₁² + ... + I_n²) + (I₁I₂ + I₂I₁ + I₁I₃ + I₃I₁ + ... + I_{n-1}I_n + I_nI_{n-1}).

There are n terms inside the first set of parentheses, and n(n-1) inside the second, which contains I_j I_k for every ordered pair (j, k) with j ≠ k. Since each I_k is 0 or 1, I_k² = I_k. By linearity of expectation and symmetry,

E(S_n²) = nE(I₁) + n(n-1)E(I₁ I₂).

The first term, we already know, is np. The second term is n(n-1) times the probability that both the first and second draws yield white balls. The first draw yields a white ball with probability p. Given that it did, there are N-1 balls left for the second draw, of which pN-1 are white, so the second draw yields a white ball with probability (pN-1)/(N-1). The probability that both are white is the product of these. Subtract (np)² from E(S_n²), do the algebra, let the dust settle, and you get the formula I claimed.
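For the record, here is the algebra, starting from E(S_n²) = np + n(n-1) p (pN-1)/(N-1) and using Var(S_n) = E(S_n²) - (np)² and q = 1 - p:

```latex
\begin{aligned}
\operatorname{Var}(S_n)
  &= np + n(n-1)\,p\,\frac{pN-1}{N-1} - n^2p^2
   = np\left[1 - np + \frac{(n-1)(pN-1)}{N-1}\right] \\
  &= \frac{np}{N-1}\Big[(N-1)(1-np) + (n-1)(pN-1)\Big] \\
  &= \frac{np}{N-1}\Big[N - 1 - npN + np + npN - pN - n + 1\Big] \\
  &= \frac{np}{N-1}\Big[(N-n) - p(N-n)\Big]
   = npq\,\frac{N-n}{N-1}.
\end{aligned}
```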

But this doesn't explain things in terms of the correction factor. It doesn't refer back to the binomial distribution at all! Yet in the limit where your sample is small compared to your population, sampling without replacement and sampling with replacement are essentially the same. So can we use this somehow? Let's try to guess the correction factor without writing down any random variables. We'll write

Variance without replacement = f(N,n) npq

where n is the sample size and N is the population size, and think about what we know about f(N,n).

First, f(N,1) = 1. If you have a sample of size 1, sampling with and without replacement are actually the same thing.

Second, f(N,N) = 0. If your sample is the entire population, you always get the same result.

But most important is that if we sample without replacement, and take samples of size n or of size N-n, we should get the same variance! Taking a sample of size N-n is the same as taking a sample of size n and deciding to take all the other balls instead. So for each sample of size n with w white balls, there's a corresponding sample of size N-n with pN-w white balls. The distributions of numbers of white balls are mirror images of each other, so they have the same variance. So you get

n f(N,n) pq = (N-n) f(N,N-n) pq.

Of course the pq factors cancel. For ease of notation, let g(x) = f(N,x). Then we need a function g such that g(1) = 1, g(N) = 0, and n g(n) = (N-n) g(N-n). Letting n = 1 gives g(1) = (N-1) g(N-1), so g(N-1) = 1/(N-1). The three values of g that we have so far are consistent with the guess that g is linear. So let's assume it is; why should it be anything more complicated? The line through (1, 1) and (N, 0) is g(n) = (N-n)/(N-1), which gives you the formula. This strikes me as the Street-Fighting Mathematics approach to this problem.
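This isn't a proof, but it is easy to check that the linear guess g(n) = (N-n)/(N-1) satisfies the symmetry condition and agrees with the exact hypergeometric variance. A sketch in Python using exact rational arithmetic; the urn (N = 12 balls, 5 of them white) is an arbitrary choice, and the names `hypergeom_variance` and `g` are made up for this example.

```python
from fractions import Fraction
from math import comb


def hypergeom_variance(N, white, n):
    """Exact variance of the number of white balls in a sample of size n
    drawn without replacement from N balls, `white` of them white."""
    total = comb(N, n)
    probs = {w: Fraction(comb(white, w) * comb(N - white, n - w), total)
             for w in range(max(0, n - (N - white)), min(n, white) + 1)}
    mu = sum(w * pr for w, pr in probs.items())
    return sum((w - mu) ** 2 * pr for w, pr in probs.items())


def g(N, n):
    """The linear guess for the correction factor: g(1) = 1, g(N) = 0."""
    return Fraction(N - n, N - 1)


N, white = 12, 5
p = Fraction(white, N)
q = 1 - p
for n in range(1, N + 1):
    # Exact variance equals npq times the guessed correction factor ...
    assert hypergeom_variance(N, white, n) == n * p * q * g(N, n)
    # ... and the guess satisfies n g(n) = (N-n) g(N-n).
    assert n * g(N, n) == (N - n) * g(N, N - n)
print("linear guess matches the exact variance for every n")
```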

Question: Is there a way to rigorize this "guess" -- some functional equation I'm not seeing, for example?

1. I use "all" in the mathematician's sense. This means I wish you knew this, or I think you should know it. Some of you probably don't. That's okay.
