02 February 2008

How many fixed points do involutions have?

It's a fairly well-known fact that if we pick a permutation of the set [n] at random, then the expected number of cycles of length k in that permutation is 1/k, for 1 ≤ k ≤ n. (Somewhat more memorably, the average number of elements in a permutation of [n] which are members of a k-cycle is 1.)

So what would you expect to happen if you just considered permutations having only cycles of length 1 and 2 -- that is, involutions -- and sampled uniformly at random from them? If you're like me, your first thought is that the average involution will have twice as many 1-cycles as 2-cycles, and the same number of elements in 1-cycles as in 2-cycles -- that is, the average involution on [n] will have n/2 1-cycles (i.e. fixed points) and n/4 2-cycles, for a total of n/2 fixed points and n/2 elements in 2-cycles.

But then you look at "n/2 fixed points" and think that that seems awfully large...

It turns out the average number of fixed points of an involution chosen uniformly at random from all involutions on [n] is about n^{1/2}. This follows from standard generating function arguments. The exponential generating function of involutions, marked for their number of fixed points, is exp(uz + z^2/2); that is, the coefficient of z^n u^k in that function is the number of involutions on [n] with k fixed points, divided by n!. Standard methods from, say, Flajolet and Sedgewick (which I will probably buy when it comes out in print later this year, because I seem to cite it constantly) give that the expected number of fixed points is
{[z^n] z \exp \left( z + {z^2 \over 2} \right) \over [z^n]  \exp \left( z + {z^2 \over 2} \right)}

and this can actually be rewritten as n a_{n-1}/a_n, where a_n is the number of involutions on [n], that is, $a_n = n! [z^n] \exp \left( z + {z^2 \over 2} \right)$. (There's a nice interpretation for this -- a_{n-1}/a_n is the probability that any given element of an involution is actually a fixed point -- although it's hard to say exactly why this should be true.)
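(As a quick sanity check, the numbers a_n satisfy the recurrence a_n = a_{n-1} + (n-1) a_{n-2}, since the element n is either a fixed point or paired with one of the other n-1 elements; this makes the ratio n a_{n-1}/a_n easy to compute. A minimal Python sketch, not claiming more than the arithmetic:)

```python
# Sanity check: a_n = a_{n-1} + (n-1) a_{n-2}
# (element n is either a fixed point or swapped with one of n-1 others).
def involution_counts(n):
    a = [1, 1]  # a_0 = a_1 = 1
    for m in range(2, n + 1):
        a.append(a[m - 1] + (m - 1) * a[m - 2])
    return a

n = 100
a = involution_counts(n)
expected_fixed = n * a[n - 1] / a[n]  # n a_{n-1} / a_n
print(expected_fixed)  # close to n^{1/2} - 1/2 = 9.5
```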

Then, if you're still like me, you think "alas, I have forgotten how to figure out the asymptotics of coefficients of entire functions". But the asymptotic number of involutions is the last example in Herbert Wilf's generatingfunctionology. After some work, the asymptotic formula he gives for a_n shows that the expected number of fixed points of an involution is n^{1/2} - 1/2 + o(1).

Once you know that fixed points are rare, it's not hard to guess that their distribution should be approximately Poisson, and thus that the variance should be of the same order of magnitude as the mean -- and the variance result turns out to be true. (I don't know about the Poisson result.) The variance is, I believe, n^{1/2} - 1 + o(1), although this is only from numerical evidence. (The generating-function way to calculate the variance relies on the definition of the variance as the mean of the square minus the square of the mean; this means I need better asymptotics in order to verify this. The better asymptotics are certainly achievable, but they're not at my fingertips.)
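(For what it's worth, the full distribution can also be tabulated exactly: an involution on [n] with k fixed points is determined by choosing the k fixed points and then perfectly matching the remaining n - k = 2m elements, giving C(n,k) * (2m)!/(2^m m!) of them. A short Python sketch using that count to check the mean and variance numerically:)

```python
from math import comb, factorial

# Exact count of involutions on [n] with k fixed points: choose the k
# fixed points, then perfectly match the remaining n - k = 2m elements.
def fixed_point_counts(n):
    counts = {}
    for k in range(n % 2, n + 1, 2):
        m = (n - k) // 2
        counts[k] = comb(n, k) * factorial(2 * m) // (2 ** m * factorial(m))
    return counts

n = 100
counts = fixed_point_counts(n)
total = sum(counts.values())
mean = sum(k * c for k, c in counts.items()) / total
var = sum(k * k * c for k, c in counts.items()) / total - mean ** 2
print(mean, var)  # mean near 9.5, variance near 9
```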

The result is a bit surprising, though -- why does cutting out cycles of length 3 and greater so drastically change the relative numbers of 1-cycles and 2-cycles? But involutions make up a vanishingly small proportion of all permutations, and weird things can happen in these asymptotically negligible sets without the bulk of the population caring at all.

19 comments:

I. J. Kennedy said...

"a_(n-1) / a_n is the probability that any given element of an involution is actually a fixed point -- although it's hard to say exactly why this should be true"

Given one of the a_n involutions on [n], choose an element and ask "is this a fixed point?" Surely by symmetry it doesn't matter which point we choose--they all have the same probability of being a fixed point, so choose the last one, n.

It is a fixed point iff the rest of the permutation, which is a permutation on [n-1], is an involution in its own right. Well, this happens a_(n-1) times out of the a_n, hence the probability a_(n-1) / a_n.
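(This correspondence is easy to verify by brute force for small n; a Python sketch, enumerating all involutions directly:)

```python
from itertools import permutations

# Brute force: involutions of [n] that fix the last element correspond
# exactly to involutions of [n-1], as the symmetry argument says.
def involutions_of(n):
    return [p for p in permutations(range(n))
            if all(p[p[i]] == i for i in range(n))]

for n in range(2, 8):
    fixing_last = sum(1 for p in involutions_of(n) if p[n - 1] == n - 1)
    print(n, fixing_last, len(involutions_of(n - 1)))  # last two columns agree
```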

Michael Lugo said...

Jack,

thanks. That's basically what I wanted to say, although for some reason words failed me yesterday.

John Armstrong said...

Random thought: is there a relationship between the n^{1/2} that shows up here and that which shows up in the "birthday paradox"?

Michael Lugo said...

John,

not as far as I know. But I don't have any reasonable explanation for why it should be true yet, other than that that's what the generating functions say. I'm looking for one.

John Armstrong said...

It just seems that there's some similarity, in that you're looking for pairs which satisfy some condition together.

Anonymous said...

Here is a simple argument that we should expect a random involution of n points to have about n^{1/2} fixed points:

Recall (or prove) that the number of fixed point free involutions of 2n elements is
(2n-1)(2n-3)(2n-5)...5*3*1. So the number of involutions of an n element set which have k fixed points is

a_{k,n}=(n choose k) *(n-k-1)(n-k-3)(n-k-5)...5*3*1
=n*(n-1)*(n-2)*...*(n-k+2)*(n-k+1)
*(n-k-1)*(n-k-3)*...*5*3*1/k!

where k must have the same parity as n. Then a_{k,n}/a_{k-1,n}=(n-k)/k(k-1)

So a_{k,n} increases with k as long as (n-k)>k(k-1) and decreases with k thereafter. Rearranging n-k > k(k-1), we get n>k^2.

We should be able to use Stirling's formula to work out the limiting distribution of the number of fixed points.
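(Following this argument, one can check numerically where the counts a_{k,n} peak; here is a small Python sketch using the closed form a_{k,n} = C(n,k) * (n-k-1)!! from above, at n = 400 so that n^{1/2} is an integer:)

```python
from math import comb, factorial, isqrt

# a_{k,n} = C(n,k) * (n-k-1)!!, written as C(n,k) * (2m)!/(2^m m!)
# with m = (n-k)/2; nonzero when n - k is even.
def a(n, k):
    m = (n - k) // 2
    return comb(n, k) * factorial(2 * m) // (2 ** m * factorial(m))

n = 400
mode = max(range(n % 2, n + 1, 2), key=lambda k: a(n, k))
print(mode, isqrt(n))  # the counts peak at k = 20 = n^{1/2}
```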

Anonymous said...

I think that I don't believe your claim about the variance. I haven't been completely rigorous, but I think the variance should be of order n^{1/4}, not n^{1/2}.

Rough argument for why it should be less than n^{1/2}: First, correct the ratio computation above. In fact, r_{k,n}:=a_{k,n}/a_{k-1,n}=(n-k+2)/k(k-1). (This isn't important, but I'd be embarrassed to keep carrying this error around.) Fix any constant c>0. I claim that, if k>(1+c)*n^{1/2}, then a_{k,n}/a_{n^{1/2},n} decays rapidly in n. More precisely, a_{k,n}/a_{n^{1/2},n} is a product of about c n^{1/2}/2 terms of the form r_{j,n}, at least c n^{1/2}/4 of which are less than n/((1+c/2)^2 n) = (1+c/2)^{-2}. So a_{k,n}/a_{n^{1/2},n}=O((1+c/2)^{- c n^{1/2}/2}). In particular, this decays faster than the reciprocal of any polynomial, so terms with this large k will, as n goes to infinity, contribute nothing to the variance (or, more generally, to any moment). The same argument shows that there is no contribution from k<(1-c)n^{1/2}. In other words, in the limit probability distribution, everything is clumped around n^{1/2}, with a variance that is o(n^{1/2}).

I have a more precise argument, explaining why I believe the variance should be n^{1/4}, but I am worried that I have made some dumb error. Does what I have done make sense so far? How good is your data?

David Speyer

Michael Lugo said...

David,

I don't see an obvious flaw there. But I trust my data for now. (And it's pretty easy to generate -- the second moment can similarly be obtained by differentiation. The exact asymptotic expansion is hard to get, of course, but coefficients of z^n in exp(z+z^2/2) arbitrarily far out are easy to find.) Then again, your argument for o(n^{1/2}) variance could be correct with variance something like n^{1/2}/(log log n), which would be hard to detect when I've only calculated the first hundred or so coefficients of the generating function. At some point I'm going to actually do the computations I alluded to and write this and various other results of the same type up more formally; keep an eye out here. Thanks for your comment!

Anonymous said...

Here's one intuitive reason why conditioning on being an involution gives so many more 2-cycles (which may also be able to be made rigorous into an n^{1/2} proof):

Let us generate our random involution as follows.

1. Generate a permutation sigma uniformly at random.

2. Expose the cycle containing n. If that cycle has length 3 or more, give up and start again from step 1.

3. Otherwise, expose the rest of the cycles. If you have an involution, keep sigma. Otherwise, go back to step 1.

After step 2, we're equally likely to have placed n in a 2 cycle as in a 1 cycle. However, we still need to pass step 3.

If n is in a cycle of length 1, we keep sigma in this final step as often as a random permutation on n-1 elements is an involution. If n is in a cycle of length 2, we keep sigma as often as a random permutation on n-2 elements is an involution.

Involutions become significantly more rare as n increases, so a permutation on n-2 elements is much more likely to be an involution than one on n-1 points. Correspondingly, we keep many more of the permutations with n in a 2 cycle than with n in a 1 cycle.
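(This scheme can be sketched in Python. Since rejecting early based on the cycle of n, as in steps 2-3, only speeds up rejection without changing the output distribution, the sketch below simply rejects any non-involution; it is practical only for small n:)

```python
import random

# Rejection sampler in the spirit of the comment: draw a uniform random
# permutation and keep it only if it is an involution.
def random_involution(n, rng):
    while True:
        sigma = list(range(n))
        rng.shuffle(sigma)
        if all(sigma[sigma[i]] == i for i in range(n)):
            return sigma

rng = random.Random(0)
n, samples = 8, 2000
fixed = sum(random_involution(n, rng)[n - 1] == n - 1
            for _ in range(samples))
print(fixed / samples)  # near a_7/a_8 = 232/764, about 0.30 -- well below 1/2
```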

Anonymous said...

OK, I've got some data now and it is -- inconclusive. I can do values of n around 1000 in basically no time; the largest that I have done is 4096.

It is definitely true that the distribution is peaked almost exactly at n^{1/2}. It also looks like a bell curve. I haven't got the code computing standard deviations yet, but visually it looks like n^{1/2}. So this sounds kind of like a bell curve with center and variation n^{1/2}. But that is impossible. We know that the probability is precisely zero of having a negative number of fixed points.

I'll probably put some pictures and data up online once I clean it up.

David Speyer

Michael Lugo said...

David,

the standard deviation can't be n^{1/2}, of course -- that would mean that none of the mass of the distribution is more than one standard deviation below the mean. The only distributions like that, if I remember correctly, are distributions with all the mass at two points; certainly a normal distribution (which I suspected this is even without seeing your data, because that's just how a lot of these things work) won't be of that sort.

But the variance (the square of the standard deviation) could be n^{1/2}, as I conjectured in the original post, making the standard deviation n^{1/4}. Then we would have from the non-negativity of the number of fixed points that none of the mass of the distribution is more than n^{1/4} standard deviations below the mean, which is a reasonable thing to say since the result is true in some asymptotic sense as n goes to infinity.

Anonymous said...

OK, some numerical data.

When n=1024, 2500, 3025, 4096

the mean of k is
31.5116, 49.5074, 54.5068, 63.5058

and the std. dev is
31.0193, 49.0124, 54.0113, 63.0097

That's a great fit for a standard deviation of n^{1/2}. But all the pictures look like bell curves! What the heck?

David Speyer

Anonymous said...

Oh, you're right, I'm dumb :). The data above is variances, not standard deviations. So the standard deviations are n^{1/4}, even though the bell curves look much wider by eye.

OK, now I think we are saying the same thing. Mean=n^{1/2}, Std. Dev. = n^{1/4}. The only thing we disagree on is that I think we are looking at a normal distribution and you think it is Poisson. Let me put up a picture and see if I convince you.

Anonymous said...

Check out this histogram depicting the probabilities of various numbers of fixed points in a random involution of 4096 elements.

David S.

Ori Gurel-Gurevich said...

David,
The Poisson distribution tends to a normal distribution (after proper normalization) as the parameter tends to infinity, so you and Isabel are in complete agreement.
All that is left is to actually prove that it's Poisson. I haven't thought about it, but my guess is that it shouldn't be too hard.
