^n in f(z)". For example, g(2) is the coefficient of z^2 in (1-z)(3)(2+z)(1+2z)(3z), which turns out to be 27; this means that the number of lines on the generic hypersurface of degree 3 in complex projective 3-space is 27. Informally, "there are 27 lines on the cubic". Want to know why? For the cubic case, see this post by Charlie or go to his talk last week.

The first few values of g(n) are 1, 27, 2875, 698005, 305093061, 210480374951, 210776836330775, 289139638632755625... -- so this sequence grows pretty quickly. How quickly? I couldn't tell at first, but it kept nagging at me every time I was in Charlie's office. The sequence is A027363 in the On-Line Encyclopedia of Integer Sequences. It turns out that the asymptotics are worked out in an appendix of the paper that the Encyclopedia links to (arXiv:math/0610286, specifically its appendix by Zagier), but the derivation there is long and involved and it was more fun to work out the formula myself. There are a few gaps, but here it is.
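(The code snippets below are my own sanity checks, not part of the original post.) The values above can be checked directly from the definition of g(n), multiplying out the polynomial with exact integer arithmetic:

```python
def g(n):
    # [z^n] of (1 - z) * product of ((2n-1-j) + j*z) for j = 0, ..., 2n-1
    m = 2 * n - 1
    poly = [1, -1]  # coefficients of the (1 - z) factor
    for j in range(2 * n):
        a, b = m - j, j  # the factor a + b*z
        new = [0] * (len(poly) + 1)
        for i, c in enumerate(poly):
            new[i] += a * c      # contribution of the constant term a
            new[i + 1] += b * c  # contribution of the b*z term
        poly = new
    return poly[n]

print([g(n) for n in range(1, 6)])
# [1, 27, 2875, 698005, 305093061]
```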

We can deal easily with the 1-z factor out front, and we want g(n) = [z^n] f_n(z) - [z^{n-1}] f_n(z), where f_n(z) is the product of the factors (2n-1-j) + jz over j = 0, 1, ..., 2n-1.

We can already see it's going to be kind of tricky; the coefficients of f_n(z) are probably going to be pretty big and not that far apart.

Now, we can pull out a 2n-1 from each factor in the expression for f_n(z), and we get

f_n(z) = (2n-1)^{2n} times the product over j = 0, 1, ..., 2n-1 of (1 - j/(2n-1)) + (j/(2n-1))z.

Call the product p_n(z). Now this is where, out of nowhere, I start having Fun With Probability. (It's kind of amusing because there is *nothing* probabilistic about the original question.) The term corresponding to j in that product is the probability generating function of a random variable which is 1 with probability j/(2n-1) and 0 otherwise. The product is thus the p.g.f. of the sum of such random variables for j = 0, 1, ..., 2n-1.
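Since this observation is the hinge of the whole argument, here's a small check (mine, not in the original) that the coefficients of p_n(z) really do form a probability distribution: nonnegative, and summing to 1.

```python
from fractions import Fraction

def p_coeffs(n):
    # exact coefficients of p_n(z), multiplying in one Bernoulli factor at a time
    m = 2 * n - 1
    coeffs = [Fraction(1)]
    for j in range(2 * n):
        q = Fraction(j, m)  # P(X_j = 1)
        new = [Fraction(0)] * (len(coeffs) + 1)
        for i, c in enumerate(coeffs):
            new[i] += (1 - q) * c  # X_j = 0 leaves the power of z alone
            new[i + 1] += q * c    # X_j = 1 shifts z^i to z^(i+1)
        coeffs = new
    return coeffs

c = p_coeffs(3)
print(sum(c), all(x >= 0 for x in c))
# 1 True
```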

Now, this sum has mean which is the sum of the means of those random variables, that is, 0/(2n-1) + 1/(2n-1) + ... + (2n-1)/(2n-1) = n, and variance which is the sum of their variances, namely 2n(n-1)/[3(2n-1)], which is approximately n/3. Furthermore, since the sum is a sum of a bunch of small contributions, it's approximately normal. So p_n(z) is the probability generating function of a random variable which is roughly normal with mean n and variance n/3, but integer-valued, and its kth coefficient is the probability that such a random variable is equal to k.
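The mean and the exact variance 2n(n-1)/[3(2n-1)] can be checked symbolically; a small sketch (mine), summing the Bernoulli means j/(2n-1) and variances (j/(2n-1))(1 - j/(2n-1)) directly:

```python
from fractions import Fraction

n = 50
m = 2 * n - 1
mean = sum(Fraction(j, m) for j in range(2 * n))
var = sum(Fraction(j, m) * (1 - Fraction(j, m)) for j in range(2 * n))
print(mean == n)                                # True
print(var == Fraction(2 * n * (n - 1), 3 * m))  # True: exactly 2n(n-1)/[3(2n-1)]
print(float(var), n / 3)                        # ~16.498 vs ~16.667, close to n/3
```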

Therefore [z^k] p_n(z) is approximately

q_n(k) = sqrt(3/(2πn)) exp(-3(k-n)^2/(2n)),

which is the density of such a normal random variable. We want to know [z^n] p_n(z) - [z^{n-1}] p_n(z), and this is approximately q_n(n) - q_n(n-1). You'd think this would be q_n'(n), but q_n'(n) is actually zero -- the fact that the coefficients were "close to each other" is even more troublesome than you'd expect. Still, we can make a Taylor series for q_n at n, and the linear term is zero, so we get

q_n(n) - q_n(n-1) ≈ -q_n''(n)/2 = (3/(2n)) q_n(n) = (3/(2n)) sqrt(3/(2πn)).

And g(n) = [z^n] f_n(z) - [z^{n-1}] f_n(z) = (2n-1)^{2n} ([z^n] p_n(z) - [z^{n-1}] p_n(z)); using the approximation we get

g(n) ≈ (2n-1)^{2n} (3/(2n)) sqrt(3/(2πn)),
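That Taylor-series step -- replacing the coefficient difference by half the second derivative of the normal density -- is the part I'd trust least, so here's a numerical check (mine, not in the post) that [z^n] p_n(z) - [z^{n-1}] p_n(z) really does behave like (3/(2n)) sqrt(3/(2πn)):

```python
import math

def p_coeffs(n):
    # float coefficients of p_n(z), one Bernoulli factor at a time
    m = 2 * n - 1
    coeffs = [1.0]
    for j in range(2 * n):
        q = j / m
        new = [0.0] * (len(coeffs) + 1)
        for i, c in enumerate(coeffs):
            new[i] += (1 - q) * c
            new[i + 1] += q * c
        coeffs = new
    return coeffs

for n in (10, 40, 160):
    c = p_coeffs(n)
    exact = c[n] - c[n - 1]
    approx = (3 / (2 * n)) * math.sqrt(3 / (2 * math.pi * n))
    print(n, exact / approx)
# the ratio approaches 1 as n grows
```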

and now note that

(2n-1)^{2n} = (2n)^{2n} (1 - 1/(2n))^{2n} ≈ (2n)^{2n}/e.

Therefore

g(n) ≈ (3 sqrt(3))/(2e sqrt(2π)) (2n)^{2n}/n^{3/2},

which matches (with a bit of computation) the result given by Zagier in the arXiv paper, and the terms reported in the OEIS. I wouldn't have trusted it otherwise; that "take the second derivative" step is particularly sketchy, although it can probably be justified, as there are results on the rate of convergence in the central limit theorem. But asymptotic analysis is nice in this way; if it's easy to compute the terms of some sequence, then we can often be confident in results like this even without a fully rigorous proof.
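As a final check (again mine, not from the post), the constant 3 sqrt(3)/(2e sqrt(2π)) can be tested against the values of g(n) quoted earlier:

```python
import math

# values of g(n) quoted earlier in the post (OEIS A027363)
exact = {2: 27, 3: 2875, 4: 698005, 5: 305093061,
         6: 210480374951, 7: 210776836330775}
const = 3 * math.sqrt(3) / (2 * math.e * math.sqrt(2 * math.pi))
for n, gn in exact.items():
    approx = const * (2 * n) ** (2 * n) / n ** 1.5
    print(n, gn / approx)
# the ratios climb toward 1 from below, roughly like 1 - 1/(2n)
```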

Besides, I'd never seen the trick of using the *second* derivative to approximate a difference before. At least not that I can remember.

## 3 comments:

Could you explain how you get to "2n(n-1)/[3(2n-1)]" and then from there to n^2/3?

It may just be that I'm overtired, but that doesn't look right to me.

I could believe that I've lost a factor of two doing the summation, but that last step - surely that becomes n/3, right?

Efrique,

you're right, it should be n/3; I mistyped. (What follows is correct based on it being n/3, though.)

This is a very nice argument ;), but I think some technical work is required to turn it into a "formal" proof.

Here is something that may be troublesome... You say: "Furthermore, since the sum is a sum of a bunch of small contributions, it's approximately normal."

The variables you use are indeed independent, but not identically distributed. How does the asymptotic normality follow? Moreover, asymptotic normality gives you information about the distribution, and not the individual probabilities, for which you need a local limit law.
