15 December 2011

Solution to distance between random points from a sphere

On Sunday I asked the following question: pick two points on a unit sphere independently and uniformly at random. What is the expected (straight-line) distance between them?

Without loss of generality we can fix one of the points to be (1, 0, 0). The other will be chosen uniformly at random and will be (X, Y, Z). The distance between the two points is therefore

√((1-X)^2 + Y^2 + Z^2)

which does not look all that pleasant. But the point is on the sphere! So X^2 + Y^2 + Z^2 = 1, and this can be rewritten as

√((1-X)^2 + 1 - X^2)

or after some simplification

√(2-2X).

But by a theorem of Archimedes (Wolfram Alpha calls it Archimedes' Hat-Box Theorem but I don't know if this name is standard), X is uniformly distributed on (-1, 1). Let U = 2-2X; U is uniformly distributed on (0, 4). The expectation of √(U) is therefore

∫_0^4 (1/4) u^(1/2) du

and integrating gives 4^(3/2)/6 = 8/6 = 4/3.
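As a quick numerical sanity check on that integral, here is a minimal Python sketch (Python rather than R, purely as a choice for illustration). It approximates the expectation of √U for U uniform on (0, 4) with the midpoint rule. The function name and the number of subintervals are arbitrary choices, not anything from the original problem.

```python
import math

def expected_sqrt_uniform(upper=4.0, steps=100_000):
    """Midpoint-rule approximation of the integral of sqrt(u)/upper over (0, upper)."""
    width = upper / steps
    total = 0.0
    for i in range(steps):
        u = (i + 0.5) * width          # midpoint of the i-th subinterval
        total += math.sqrt(u) * width  # area of one thin rectangle
    return total / upper               # divide by the interval length to get the mean

approx = expected_sqrt_uniform()
print(approx)  # should be close to 4/3
```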

(The commenter "inverno" got this.)

Of course it's not hard to simulate this in, say, R. The joint distribution of three independent standard normals is spherically symmetric, so one way to simulate a random point on a sphere is to take a vector of three standard normals and normalize it to have unit length. This code does that:

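# first point: three independent standard normals, scaled to unit length
xx1=rnorm(10^6,0,1); yy1=rnorm(10^6,0,1); zz1=rnorm(10^6,0,1)
d1=sqrt(xx1^2+yy1^2+zz1^2)
x1=xx1/d1; y1=yy1/d1; z1=zz1/d1
# second point, generated the same way
xx2=rnorm(10^6,0,1); yy2=rnorm(10^6,0,1); zz2=rnorm(10^6,0,1)
d2=sqrt(xx2^2+yy2^2+zz2^2)
x2=xx2/d2; y2=yy2/d2; z2=zz2/d2
# distance between the two points, one entry per simulated pair
d=sqrt((x1-x2)^2+(y1-y2)^2+(z1-z2)^2)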

and then mean(d), where d contains the distances, comes out to 1.333659 in my run; the histogram of the distances d is a right triangle. (The code doesn't assume that one point is (1, 0, 0); that's a massive simplification if you want to do the problem analytically, but not nearly as important in simulation.)
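The right-triangle shape can be read off from the analytic argument: the squared distance is uniform on (0, 4), so the probability that the distance is at most d is d^2/4, and the density is d/2 on (0, 2). Here is a rough Python sketch of the same simulation that checks both the mean and one point of that distribution. It uses only the standard library; the sample size, seed and function names are my own choices for illustration.

```python
import math
import random

def random_unit_vector(rng):
    """Normalize three independent standard normals to get a uniform point on the sphere."""
    while True:
        x, y, z = rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1)
        r = math.sqrt(x * x + y * y + z * z)
        if r > 0:  # guard against the (essentially impossible) zero vector
            return x / r, y / r, z / r

def simulate_distances(n=20_000, seed=2011):
    """Distances between n independent pairs of uniform points on the unit sphere."""
    rng = random.Random(seed)  # fixed seed only so that runs are repeatable
    return [math.dist(random_unit_vector(rng), random_unit_vector(rng)) for _ in range(n)]

distances = simulate_distances()
mean_distance = sum(distances) / len(distances)
share_at_most_one = sum(d <= 1 for d in distances) / len(distances)
print(mean_distance)      # should be near 4/3
print(share_at_most_one)  # should be near 1/4 if the density is d/2 on (0, 2)
```

With a much smaller sample than the 10^6 pairs used in R, the agreement is only good to a couple of decimal places.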

11 December 2011

A geometric probability problem

Here's a cute problem (from Robert M. Young, Excursions in Calculus, p. 244): "What is the average straight line distance between two points on a sphere of radius 1?"

(Answer to follow.)

If any of my students are reading this: no, this should not be interpreted as a hint to what will be on the final exam.

16 November 2011

In which I declare four things which my probability class is not about

In class today, I said approximately this:

So people decide whether to have children by flipping a coin, and if it comes up tails they have a kid, and if it comes up heads they don't. They repeat this until it comes up heads. This is probably not a good model of how people decide whether or not to have children, but maybe it's good in the aggregate. And anyway this isn't a class about how people decide whether to have kids.

Then there are two kinds of children, girls and boys -- well, not always, but this isn't a class about that -- and each child is equally likely to be a boy or a girl -- well, wait, that's not exactly true, but it's not a horrible assumption about how reproduction works on a cellular level, but this isn't a class about that either.

And people's decision to stop having kids is independent of the sex of the children they've had -- which says this isn't China, because people do interesting things under the one-child policy -- but this isn't a class about that.

(Then I actually did some math -- namely, assume that the number of children a random family has is geometrically distributed with some parameter p, and assume that all children are equally likely to be male or female and that their genders are independent of the gender of any other children or the number of children in the family. Pick a random family with no boys. What is the distribution of the number of children they have?)
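For anyone who wants to experiment with that question, here is a small Python sketch. It assumes one particular convention that the text above does not pin down: the number of children N takes values 0, 1, 2, ... with P(N = n) = p(1-p)^n, which matches the coin story if p is the chance of heads. Under that assumption the weight of "n children, all girls" is p((1-p)/2)^n, so the conditional distribution should again be geometric, with each extra child multiplying the probability by (1-p)/2. The function name and the truncation point are hypothetical.

```python
from fractions import Fraction

def children_given_no_boys(p, max_children=60):
    """Conditional pmf of the number of children given no boys, truncated at max_children."""
    half = Fraction(1, 2)
    # P(N = n and all n children are girls) = p * (1 - p)**n * (1/2)**n
    joint = [p * (1 - p) ** n * half ** n for n in range(max_children + 1)]
    total = sum(joint)  # approximately P(no boys); the truncated tail is negligible
    return [w / total for w in joint]

p = Fraction(1, 2)       # the fair-coin case from the story
pmf = children_given_no_boys(p)
ratio = (1 - p) / 2      # each extra child multiplies the weight by (1 - p)/2
print([float(x) for x in pmf[:4]])
print(float(pmf[1] / pmf[0]), float(ratio))
```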

11 November 2011

11/11/11

You may have heard that it's 11/11/11. (Or, if you live in the UK, 11/11/11.) When I was growing up, I'd get confused and think that World War I ended on this day, one hundred years ago. You know, at the eleventh hour of the eleventh day of the eleventh month of the eleventh year.

The New York Times says that marketers are viewing this as a singular event -- but they went on about this four years, four months, and four days ago.

The Corduroy Appreciation Club says it is Corduroy Appreciation Day.

A bit more mathematically, you can watch a video about the number eleven by James Grime, which appears to be the first of a series of Numberphile videos.

Edited, November 12, 12:29 pm: from the New York Times, a hundred years ago: "To-day it is possible to write the date with the repetition six times of a single digit." The article also points out that a digit will probably never occur again seven times in the date -- we'd have to make it to November 11, 10011 for that to happen.

09 November 2011

Small sample sizes lead to high margins of error, unemployment version

The ten college majors with the lowest unemployment rates, from yahoo.com. I've heard about this from a friend who majored in astronomy and a friend who majored in geology; both of these are on the list, with an unemployment rate of zero.

The unemployment rates of the ten majors they list are 0, 0, 0, 0, 0, 0, 1.3, 1.4, 1.6, and 2.2 percent.

I would bet that the six zeroes are just the majors for which there were no unemployed people in the sample. The data apparently comes from the Georgetown Center on Education and the Workforce; there's a summary table at the Wall Street Journal, and indeed the majors which have zero unemployment are among the least popular. Just eyeballing the data, some of the majors with the highest unemployment are also among the least popular. The red flag here would be, say, an unemployment rate of 16.7% (one out of six) or 20.0% (one out of five) for some major near the bottom of the popularity table, but I don't see it; I guess their sample is big enough that no major is that small, or maybe they actually made some adjustments for this issue.
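To get a feel for how easily a small sample produces a zero, here is a back-of-the-envelope Python sketch. It assumes respondents are sampled independently, and both the 5% true rate and the sample sizes are hypothetical values picked for illustration, not figures from the Georgetown data.

```python
def prob_zero_unemployed(true_rate, sample_size):
    """Chance that an independent sample of the given size contains no unemployed people."""
    return (1 - true_rate) ** sample_size

# Hypothetical sample sizes; the 5% rate is only an illustrative value.
for n in (10, 30, 100, 300):
    print(n, round(prob_zero_unemployed(0.05, n), 4))
```

Under those assumptions a major with ten respondents shows zero unemployment more often than not, while one with a few hundred respondents essentially never does.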

The actual Georgetown report seems to be available here but I am having trouble viewing it.

In case you were wondering, mathematics is the 28th most popular major (of 173) and has 5.0% unemployment; "statistics and decision science" is 128th most popular and has 6.9% unemployment, which seems to go against the popular wisdom these days that statistics majors are more employable than math majors. (But I work in a statistics department, so my view of the popular wisdom may be biased.)

06 October 2011

Solution to a puzzle from a few months ago

I never posted a solution to this puzzle, and today one of my students asked me about it.

The puzzle was to find all three-digit numbers that, when multiplied by their successor, give a number concatenated with itself.

So of course when you concatenate a three-digit number x with itself, you get 1001x. So the question becomes: when is k(k+1) a multiple of 1001?
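The 1001x identity is easy to confirm by machine. Here is a tiny Python check over every three-digit x; the helper name is just an illustrative choice.

```python
def concatenate_with_itself(x):
    """Write the decimal digits of x twice in a row and read the result as an integer."""
    return int(str(x) * 2)

# Check the identity for every three-digit number.
all_match = all(concatenate_with_itself(x) == 1001 * x for x in range(100, 1000))
print(all_match)
```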

1001 is 7 times 11 times 13. k and k+1 have no prime factors in common, so some subset of 7, 11, and 13 must divide k, and the rest must divide k + 1. Furthermore this has to be a proper, nonempty subset: if 7, 11, and 13 all divide k then k is at least 1001, and if they all divide k + 1 then k is at least 1000, so in either case k doesn't in fact have three digits.

So we can have the following six situations:

(1) k is a multiple of 7 and k + 1 is a multiple of 143;

(2) k is a multiple of 11 and k + 1 is a multiple of 91;

(3) k is a multiple of 13 and k + 1 is a multiple of 77;

(4) k is a multiple of 77 and k + 1 is a multiple of 13;

(5) k is a multiple of 91 and k + 1 is a multiple of 11;

(6) k is a multiple of 143 and k + 1 is a multiple of 7.

In each case we can then use the Chinese remainder theorem to find k. Let's consider the first case: we have that k ≡ 0 (mod 7), k ≡ -1 (mod 11), k ≡ -1 (mod 13).

The CRT tells us that the solution to the system of congruences

k ≡ a_7 (mod 7), k ≡ a_11 (mod 11), k ≡ a_13 (mod 13)

is

k ≡ a_7 (143)(143^-1)_7 + a_11 (91)(91^-1)_11 + a_13 (77)(77^-1)_13 (mod 1001)

where I'm using (a^-1)_b to stand for the inverse of a mod b. The other solutions differ from this by multiples of 1001. We can find the inverses by brute force:

(143^-1)_7 = (3^-1)_7 = 5, (91^-1)_11 = (3^-1)_11 = 4, (77^-1)_13 = (12^-1)_13 = 12.

So we finally get

k ≡ 715 a_7 + 364 a_11 + 924 a_13 (mod 1001).
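The three coefficients can be reproduced mechanically. Here is a minimal Python sketch of the construction, assuming Python 3.8 or later so that the built-in pow can compute modular inverses. The function names are hypothetical, and the residues passed to solve at the end are just a sanity check (all residues equal to 1 should give k = 1).

```python
MODULI = (7, 11, 13)
N = 7 * 11 * 13  # 1001

def crt_coefficients(moduli=MODULI):
    """For each modulus m, return (N/m) times the inverse of (N/m) mod m, reduced mod N."""
    product = 1
    for m in moduli:
        product *= m
    coefficients = {}
    for m in moduli:
        cofactor = product // m
        inverse = pow(cofactor, -1, m)  # modular inverse; needs Python 3.8+
        coefficients[m] = (cofactor * inverse) % product
    return coefficients

def solve(a7, a11, a13):
    """Smallest non-negative k with the given residues mod 7, 11 and 13."""
    c = crt_coefficients()
    return (c[7] * a7 + c[11] * a11 + c[13] * a13) % N

print(crt_coefficients())
print(solve(1, 1, 1))
```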

Now, the case (1) corresponds to a_7 = 0, a_11 = -1, a_13 = -1; so we get

k ≡ 0 - 364 - 924 (mod 1001)

and so we get the solution k = -364 - 924 + (2)(1001) = 714. Indeed (714)(715) = 510510. (714 and 715 are a Ruth-Aaron pair.) The other five situations lead to, respectively, k = 363, 923, 77, 637, 286 and k(k+1) = 132132, 852852, 6006, 406406, 82082.
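Since there are fewer than a thousand candidates, the whole list can also be cross-checked by brute force. Here is a short Python sketch that does this two ways: a direct search for every k below 1000 with k(k+1) divisible by 1001, and a per-case search in the order of the six situations. The names are illustrative, and the search covers all k below 1000 rather than only k of at least 100, which is why the two-digit 77 shows up here as well.

```python
# Brute force: every k below 1000 for which k(k+1) is a multiple of 1001.
brute_force = [k for k in range(1, 1000) if (k * (k + 1)) % 1001 == 0]

# The six cases as (divisor of k, divisor of k + 1), in the order listed above.
CASES = [(7, 143), (11, 91), (13, 77), (77, 13), (91, 11), (143, 7)]

def solve_case(divides_k, divides_k_plus_1):
    """Find the k in 1..1000 matching one case by direct search (no CRT needed at this size)."""
    for k in range(1, 1001):
        if k % divides_k == 0 and (k + 1) % divides_k_plus_1 == 0:
            return k

by_case = [solve_case(a, b) for a, b in CASES]
for k in by_case:
    print(k, k * (k + 1))
print(sorted(by_case) == brute_force)
```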