God Plays Dice: "Square roots" of probability distributions

Think about probability distributions supported on the positive integers. Not all of them have a "square root" -- that is, given a random variable X supported on the positive integer, there do not exist independent, identically distributed variables Y₁, Y₂ such that X has the same distribution as Y₁ + Y₂.

You might be wondering why it's natural to refer to this as a "square root". Well, let p_k be the probability of the event X = k. Then the probability generating function for X is

$f(z) = \sum_{k \ge 0} p_k z^k$ .

Similarly, let q_j be the probability of the event Y = j, and define the probability generating function

$g(z) = \sum_{j \ge 0} q_j z^j$ .

Let Y, Y₁, Y₂ be equidistributed. Then X is equidistributed with Y₁ + Y₂ if and only if f(z) = g(z)², since addition of independent random variables corresponds to convolution of distributions and multiplication of their generating functions.

Conversely, the random variable X has a square root in this sense if and only if its generating function f(z) has a square root which has a Taylor series at z = 0 with all coefficients positive.

The simplest case is when X has finite support. Then X has a square root if and only if its generating function f(z) is the square of a polynomial. For example, the random variable which takes values 0, 1, 2 with probabilities 1/4, 1/2, 1/4 has a square root; its generating function is

$f(z) = {1 \over 4} + {1 \over 2} z + {1 \over 4} z^2 = \left( {1 \over 2} + {1 \over 2} z \right)$ .

But the random variable taking values 0, 1, 2 with probability 1/3 each is not, since 1/3 + z/3 + z²/3 is not a square.

But what about distributions with infinite support? Some of these are easy -- the square root of the Poisson distribution with mean λ is Poisson with mean λ/2. This can easily be seen since the probability generating function of a Poisson(λ) random variable is

$\sum_{k=0}^\infty {e^{-\lambda} \lambda^k \over k!} z^k = \exp(\lambda(z-1))$ .

(In fact, the Poisson distribution is infinitely divisible; in the nomenclature used in this post one might say it has roots of all orders.)

Now consider a random variable X, with geometric distribution with parameter 1/2 supported on {0, 1, 2, 3, ...}; this has P(X = k) = 2^-(k+1). This is a special case of the negative binomial distribution, which is infinitely divisible. We have f(z) = 1/2 + z/2² + z²/2³ + ... = 1/(2-z). So the square root distribution has generating function

${1 \over \sqrt{2-z}} = \sqrt{2} \left( {1 \over 2} + {1 \over 8} z + {3 \over 64} z^2 + {5 \over 256} z^3 + {35 \over 4096} z^4 + \cdots \right)$

and in general the coefficient of zⁿ is, by the binomial theorem,

$\sqrt{2} {(-1)^n \over 2^{1+n}} {-1/2 \choose n} = \sqrt{2} {(-1)^n \over 2^{1+n}} (-1)^n {2n-1 \choose n-1} {1 \over 2^{2n-1}} = {1 \over 2^{3n-1/2}} {2n-1 \choose n-1}.$

That binomial coefficient is ${1 \over 2} {2n \choose n}$; we have ${2n \choose n} \sim 4^n/\sqrt{\pi n}$, so the coefficient of zⁿ in our probability generating function, which we'll call q_k, is asymptotic to

${2^{2n-1} \over {\sqrt{\pi n}} } {1 \over 2^{3n-1/2}} = {1 \over 2^n \sqrt{2 \pi n}}$

In particular, the "square root" distribution decays just a bit faster than the distribution that it's the square root of, the main difference being the additional factor of n^-1/2. This is reasonable, if you think about the process of convolution. We have

p_n = q₀ q_n + q₁ q_n-1 + q₂ q_n-2 + ... + q_n-1 q₁ + q_n q₀

and each of the n terms is roughly 1/(n2ⁿ). This is just another negative binomial distribution. (I actually didn't realize that until I started writing this post; the post was originally titled "what's the name of this distribution?" and then I did some research.)

4 comments:

Anonymous said...: Your comment about roots of all orders got me thinking--is there much use to the notion of the log of a distribution?; June 2, 2008 at 7:51 PM
Michael Lugo said...: Cooper,

I can't think of one off the top of my head, but I'm not trying that hard. One might start by thinking of whether the convolution exponential of a distribution (which could be defined as exp(X) = 1 + X + X*X/2 + X*X*X/6 + ...) is useful; the logarithm would then be its inverse.; June 2, 2008 at 7:54 PM
Anonymous said...: This comment has been removed by a blog administrator.; February 23, 2010 at 9:32 AM
Anonymous said...: This comment has been removed by a blog administrator.; March 9, 2010 at 2:48 PM

God Plays Dice

30 May 2008

"Square roots" of probability distributions

4 comments:

Blog Archive

Contributors

Other blogs