26 November 2008

The asymptotics of "27 lines on a cubic"

Charlie Siegel, whose office is just down the hall from mine, had for a while the following statement on the blackboard in his office (paraphrased and renotated a bit): Let g(n) be the number of lines on the generic hypersurface of degree 2n-1 in complex projective (n+1)-space. Then
g(n) = [z^n] (1-z) \prod_{j=0}^{2n-1} (2n-1-j+jz)
where the notation $[z^n] f(z)$ means "the coefficient of zn in f(z)". For example, g(2) is the coefficient of z2 in (1-z)(3)(2+z)(1+2z)(3z), which turns out to be 27; this means that the number of lines on the generic hypersurface of degree 3 in complex projective 3-space is 27. Informally, "there are 27 lines on the cubic". Want to know why? For the cubic case, see this post by Charlie or go to his talk last week.

The first few values of g(n) are 1, 27, 2875, 698005, 305093061, 210480374951, 210776836330775, 289139638632755625... -- so this sequence grows pretty quickly. How quickly? I couldn't tell at first, but it kept nagging at me every time I was in Charlie's office. The sequence is A027363 in the Encyclopedia of Integer Sequences. It turns out that it's in an appendix of the paper that the Encyclopedia links to (arXiv:math/0610286, specifically its appendix by Zagier) but the derivation there is long and involved and it was more fun with the formula myself. There are a few gaps, but here it is.

We can deal easily with the 1-z factor out front, and we want
g(n) = [z^n] f_n(z) - [z^{n-1}] f_n(z)
f_n(z) = \prod_{j=0}^{2n-1} (2n-1 - j + jz)

We can already see it's going to be kind of tricky; the coefficients of fn(z) are probably going to be pretty big and not that far apart.

Now, we can pull out a 2n-1 from each factor in the expression for fn(z), and we get
 f_n(z) = (2n-1)^{2n} \prod_{j=0}^{2n-1} {(2n-1 - j + jz) \over 2n-1 }

Call the product pn(z). Now this is where, out of nowhere, I start having Fun With Probability. (It's kind of amusing because there is nothing probabilistic about the original question.) The term corresponding to j in that product is the probability generating function of a random variable which is 1 with probability (j/(2n-1)) and 0 otherwise. The product is thus the p.g.f. of the sum of such random variables for j = 0, 1, ..., 2n-1.

Now, this sum has mean which is the sum of the means of those random variables, that is,
\sum_{j=0}^{2n-1} {j \over 2n-1} = n
and variance which is the sum of their variances,
\sum_{j=0}^{n-1} {j \over 2n-1} \left( 1 - {j \over 2n-1} \right) = {2n(n-1) \over 3(2n-1)}
which is approximately n2/3. Furthermore, since the sum is a sum of a bunch of small contributions, it's approximately normal. So pn(z) is the probability generating function of a random variable which is roughly normal with mean n and variance n2/3, but integer-valued, and its kth coefficient is the probability that such a random variable is equal to k.

Therefore [z^k] p_n(z) is approximately
q_n(k) = {3 \over \sqrt{2\pi n}} \exp \left( {-(k-n)^2 \over 2n/3} \right)

which is the density of such a normal random variable. We want to know [zn] pn(z) - [zn] pn-1}(z), and this is approximately qn(n) - qn(n-1). You'd think this would be qn'(n), but qn(n) is actually zero -- the fact that the coefficients were "close to each other" is even more troublesome than you'd expect. Still, we can make a Taylor series for qn at n, and the linear term is zero, so we get
q_n(n) - q_n(n-1) \sim {q_n^{\prime\prime}(n) \over 2} = \sqrt{27 \over 8\pi n^3}
And g(n) = [zn] fn(z) - [zn] fn-1(z) = (2n-1)2n ([zn] pn(z) - [zn] pn-1(z)); using the approximation we get
g(n) \sim \sqrt{27 \over 8\pi n^3} (2n-1)^{2n}

and now note that
(2n-1)^{2n} = (2n)^{2n} \left(1-{1 \over 2n}\right)^{2n} \sim {(2n)^{2n} \over e}.

g(n) \sim \sqrt{27 \over 8\pi e^2 n^3} (2n)^{2n}
which matches (with a bit of computation) the result given by Zagier in the arXiv, and the terms reported in the OEIS. I wouldn't have trusted it otherwise; that "take the second derivative" step is particularly sketchy, although it can probably be justified as there are results on the rate of convergence of the central limit theorem. But asymptotic analysis is nice in this way; if it's easy to compute the terms of some sequence, then we can often be confident in results like this even without

Besides, I'd never seen the trick of using the second derivative to approximate a difference before. At least not that I can remember.


Efrique said...

Could you explain how you get to "2n(n-1)/[3(2n-1)]" and then from there to n^2/3?

It may just be that I'm overtired, but that doesn't look right to me.

I could believe that I've lost a factor of two doing the summation, but that last step - surely that becomes n/3, right?

Michael Lugo said...


you're right, it should be n/3; I mistyped. (What follows is correcet based on it being n/3, though.)

Anonymous said...

This is a very nice argument ;), but I think some technical work is required to turn in into a "formal" proof.

Here is something that may be troublesome... You say: "Furthermore, since the sum is a sum of a bunch of small contributions, it's approximately normal."

The variables you use are indeed independent, but not identically distributed. How does the asymptotic normality follow? Moreover, asymptotic normality gives you information about the distribution, and not the individual probabilities, where you need a local limit law.