God Plays Dice: gender

29 October 2007

Bayesian gender spam

A Bayesian explanation of how to determine the gender of a person on the street (from observable cues), by Meep.

It's rather similar to Bayesian spam filtering (Paul Graham, see also here). The major difference is that one can generally assume that most e-mail is spam, whereas one cannot assume that most people are of one or the other of the two canonical genders.

In the spam filtering case, though, it doesn't seem that the prior probability that a message is spam matters; Graham claims that most e-mails are either very likely or very unlikely to be spam. But there are probably more words in an e-mail than there are easily observable cues to a person's gender, so one is much more likely to conclude, say, that a person has a 60% probability of being male than that an e-mail has a 60% probability of being spam.

Also, it's a lot easier to collect the necessary data for spam filtering than for gender determination.
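Here's a minimal sketch of why the prior matters with a handful of cues but washes out with many. It assumes the cues are independent (the usual naive-Bayes simplification), and every likelihood ratio in it is a made-up number chosen only for illustration; `posterior`, `few_cues` and `many_cues` are hypothetical names.

```python
import math

def posterior(prior, likelihood_ratios):
    """Posterior P(A | cues), assuming the cues are independent given the class.

    prior: P(A) before seeing any cue.
    likelihood_ratios: for each cue, P(cue | A) / P(cue | not A).
    """
    log_odds = math.log(prior / (1 - prior))
    for ratio in likelihood_ratios:
        log_odds += math.log(ratio)
    return 1 / (1 + math.exp(-log_odds))

# Made-up likelihood ratios, for illustration only.
few_cues = [1.5, 0.8, 1.25]   # three weak, partly conflicting cues
many_cues = [1.5] * 30        # thirty weak cues that all point the same way

for prior in (0.5, 0.9):
    print(prior,
          round(posterior(prior, few_cues), 3),
          round(posterior(prior, many_cues), 3))
```

With three weak cues the answer moves noticeably when the prior changes; with thirty, the evidence swamps the prior and the posterior is close to 1 either way.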

07 September 2007

More men at the top, and at the bottom.

As has been documented by a lot of people, it seems that a lot of psychological traits have the following properties:

men and women have approximately the same average for this trait, and

both genders have an approximately normal distribution for this trait, but

the distribution of men's values for this trait has a larger standard deviation than the women's values

This has the effect that men are overrepresented at both extremes. The canonical example is skill in science or mathematics; it's been claimed that women and men are on average equally good at mathematics, but most of the best mathematicians are male. This isn't a contradiction, because most of the worst mathematicians are male but we don't notice it. (It actually doesn't matter that the averages are the same; even if men were on average worse than women at math, if they had a larger standard deviation then they'd predominate at the higher levels.)

The article Is There Anything Good About Men?, which was an invited address by Roy Baumeister at the American Psychological Association, addresses this. The larger male variance is thought to arise from the fact that men can have more offspring than women.

Let's say that the X-ability of women is normally distributed with mean 0 and standard deviation 1, and the X-ability of men is normally distributed with mean 0 and standard deviation σ. Then the probability density function for the X-ability of women is

$f(z) = {1 \over \sqrt{2\pi}} \exp \left( {-z^2 \over 2} \right)$

and that for men is

$g(z) = {1 \over \sigma\sqrt{2\pi}} \exp \left( {-z^2 \over 2\sigma^2} \right)$

The ratio f(z)/g(z) is the ratio of women to men at skill level z (assuming there are equally many men and women overall). It's

${f(z) \over g(z)} = \sigma \exp \left( {z^2 \left( \sigma^{-2} - 1 \right) \over 2} \right)$

and this equals 1 when

$z = \pm {\sigma \sqrt{2 \log \sigma \over \sigma^2-1}}$ .

When |z| is smaller than this, women will predominate; when |z| is larger, men will predominate. It turns out that

${\sigma \sqrt{2 \log \sigma \over \sigma^2-1}} = 1 + {\sigma-1 \over 2} + O((\sigma-1)^2)$

and since σ probably isn't much larger than 1, men will predominate at more than about one standard deviation from the mean and women at less than one standard deviation from the mean. Furthermore, we have f(0)/g(0) = σ; again, since σ isn't that much greater than 1, the predominance of women over men at the center of the overall distribution is difficult to see.

Yet if σ = 1.1 -- meaning that men's skills have a standard deviation 1.1 times that of women's -- then g(3)/f(3) = 1.99, so men will be twice as common as women at z=3 (which corresponds to 3 standard deviations above the mean for women, and 2.73 for men). (The same is true at three standard deviations below the mean.) At z=4, men are overrepresented by a factor of 3.6, and at z=5, by a factor of eight.
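These ratios are easy to check numerically. The following is a small sketch that just evaluates the formulas above; the function names are mine, and it assumes equally many men and women overall.

```python
import math

def men_to_women_ratio(z, sigma):
    """g(z)/f(z) for women ~ N(0, 1) and men ~ N(0, sigma^2)."""
    return math.exp(z * z * (1 - sigma ** -2) / 2) / sigma

def crossover(sigma):
    """Positive z at which the two densities are equal (needs sigma != 1)."""
    return sigma * math.sqrt(2 * math.log(sigma) / (sigma ** 2 - 1))

sigma = 1.1
print(f"densities cross at z = {crossover(sigma):.3f}")
for z in (0, 1, 2, 3, 4, 5):
    print(z, round(men_to_women_ratio(z, sigma), 2))
```

Trying other values of `sigma` shows how sensitive the tails are to even a small difference in spread.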

Another thing that occurred to me is the economic ramifications of this difference. It's well-known that there are more obscenely rich men than obscenely rich women. It seems to me that economic ability -- i.e. the ability to earn money -- could be proportional to, say, the exponential of (some constant times general intelligence); so for every ten IQ points you gain, your income goes up by 30%. (I made up these numbers.) This would mean that economic ability is lognormally distributed (the name is a bit counterintuitive, if you don't know it, but it means that the logarithm of economic ability is normally distributed). But the mean of a lognormally distributed variable is $e^{\mu + \sigma^2/2}$, where μ and σ are the mean and standard deviation of the variable's logarithm. So if intelligence is normally distributed in both canonical genders, but men are more spread out than women in intelligence, then the mean of men's earning potential will be greater than that of women. I'm not saying that earning potential is directly tied to general intelligence (I know plenty of people who are smart but not rich) but it wouldn't surprise me to learn that earning potential is lognormally distributed and that something like what I've outlined here is at work.
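To see the size of the effect, here is a short sketch using the lognormal mean formula. The parameters are invented for illustration (nothing here is data): the logarithm of earning ability gets the same mean in both groups, and men's spread is taken to be 1.1 times women's.

```python
import math

def lognormal_mean(mu, sigma):
    """Mean of exp(X) where X ~ N(mu, sigma^2)."""
    return math.exp(mu + sigma ** 2 / 2)

# Hypothetical parameters for the logarithm of earning ability.
mu = 0.0
sigma_women = 0.5
sigma_men = 1.1 * sigma_women

ratio = lognormal_mean(mu, sigma_men) / lognormal_mean(mu, sigma_women)
print(f"mean(men) / mean(women) = {ratio:.4f}")
```

The medians are identical here (both are exp(μ)); only the means differ, because the wider distribution has a heavier right tail.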

05 August 2007

links for 5 August

The Fermi Paradox is back, via Slashdot. The Fermi paradox, for those of you who don't know, is basically the following question: if there are so many examples of extraterrestrial intelligence, as a lot of people believe, how come none of them have contacted us yet? This ties in to one of my favorite overmathematizations, the Drake equation, which computes the number of extraterrestrial civilizations in our galaxy by multiplying seven factors, most of which we have no good idea of. The result is a number with a ridiculously huge margin of error; depending on who you ask, the number of extraterrestrial civilizations that we might be able to hear from could be anywhere between zero and a million or so. Good expositions of the Drake equation usually point out that we have no way of predicting, for example, the average lifetime of a civilization. One particularly interesting resolution I've seen of the Fermi paradox is that other civilizations decide that they just don't care about talking to other species and spend all their time looking at the local equivalents of Internet pornography and reality television. I'm not saying I believe this, just that I've heard it. A bit more plausible, I think, is the idea that civilizations evolve so quickly that a civilization that was where we'll be in the year 3000 (if we don't kill ourselves first) wouldn't be interested in talking to us. (If you think 3000 is too soon, substitute some year further in the future.) I think it would be interesting to talk to a civilization that was where we were a thousand years ago, but a lot of people believe that the evolution of civilization is accelerating; Ray Kurzweil is probably the best-known exponent of this idea, called the Singularity. I'm a bit suspicious of it because a lot of the arguments seem to rely on the fact that we remember what happened in the recent past much better than what happened in the distant past.
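To make the Drake-equation point concrete, here is a rough sketch of how multiplying seven uncertain factors blows up the margin of error. The factor names follow the usual statement of the equation; every number below is a placeholder guess picked for illustration, not a measurement or anyone's published estimate.

```python
from math import prod

# (pessimistic, optimistic) placeholder guesses for each factor.
drake_factors = {
    "R*: new stars formed per year": (1.0, 10.0),
    "f_p: fraction of stars with planets": (0.2, 0.5),
    "n_e: habitable planets per such star": (0.1, 2.0),
    "f_l: fraction where life appears": (1e-3, 1.0),
    "f_i: fraction where intelligence appears": (1e-3, 1.0),
    "f_c: fraction that send detectable signals": (0.01, 0.1),
    "L: years a civilization keeps signalling": (100.0, 1e6),
}

low = prod(pessimistic for pessimistic, _ in drake_factors.values())
high = prod(optimistic for _, optimistic in drake_factors.values())

print(f"pessimistic product: {low:.1e}")
print(f"optimistic product:  {high:.1e}")
print(f"ratio between them:  {high / low:.1e}")
```

Each individual range looks only moderately vague, but the product of seven of them spans many orders of magnitude.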

What autistic girls are made of, by Emily Bazelon in today's New York Times. Disorders on the autistic spectrum are usually thought of as being uniquely the province of boys, but they happen to girls too. There are researchers who think of autism as being an "extreme male brain", and if that's true it kind of makes sense that it would be more common among males than females. Also, apparently it's harder to be an autistic woman than an autistic man because women are expected to understand social networks better than men; I'm kind of curious if this has always been true or if it's a historical accident. Vaguely relatedly, Who's a Nerd, Anyway? by Benjamin Nugent from last Sunday's NYT; people who are considered nerds are "hyperwhite", according to the linguist Mary Bucholtz. (This is "white" in a cultural sense, as in the way white Americans tend to act; I don't think the author intends to say that there's anything genetic about being a nerd.) What I find interesting is that this same tendency towards oversystemization can be called either hyperwhite or hypermale, despite the fact that we usually think of sex and race as being orthogonal to each other. Finally, Mark Liberman comments at Language Log on reactions to Nugent's article, and how in general non-scientific bloggers blogging about science, and non-scientific journalists writing newspaper articles about science, make fools of themselves.

The Probabilistic Method by Noah Snyder at Secret Blogging Seminar. I love when people find out that the probabilistic method exists. For those of you who aren't familiar with it, the probabilistic method is a method used to prove that a collection of objects contains some object with a certain property not by actually finding the thing but by just proving that if you pick an object from the collection, it has probability greater than zero of being the thing you're looking for. It's kind of a mindfuck, because many of its applications are in combinatorics and people expect there to be explicit constructions of things in combinatorics. It's possible to have a group of forty-two people such that there's no five of them who all know each other and no five who don't know each other. But I can't explicitly tell you which people in that group know each other and which don't. (This is an example of a Ramsey number.)
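Here's a small sketch of the simplest version of the argument, applied to the acquaintance problem. Suppose each pair of people knows each other independently with probability 1/2. If the expected number of groups of k who are all acquainted or all strangers is below 1, then some arrangement must have no such group at all. The function names are mine, and this basic counting bound is only an illustration of the technique: it guarantees far fewer people than the forty-two mentioned above, which takes stronger methods.

```python
from math import comb

def expected_uniform_groups(n, k):
    """Expected number of k-subsets of n people that are all mutual
    acquaintances or all mutual strangers, when each pair knows each
    other independently with probability 1/2."""
    return comb(n, k) * 2 ** (1 - comb(k, 2))

def largest_n_guaranteed(k):
    """Largest n (for k >= 3) with expectation below 1, so that some
    arrangement of n people contains no uniform group of size k."""
    n = k
    while expected_uniform_groups(n + 1, k) < 1:
        n += 1
    return n

print(largest_n_guaranteed(5))
```

Notice that the code never produces the arrangement itself; it only shows that one must exist, which is exactly the strange part.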

23 June 2007

Checkout lines and genderfree bathrooms

Yes, the two things mentioned in the title have something in common.

A Long Line for a Shorter Wait -- June 23 New York Times.

Whole Foods stores in New York City have moved from having a line for each checkout register (which is for the most part standard in American food stores) to a single line for the whole store. This means that the line looks longer but customers get through it faster. A few of the commenters at the NYT article have pointed out that you don't actually get through the line faster with this system; however, the probability of waiting a very long time is reduced. And that's really what the store wants to minimize. When you go grocery shopping, you understand that you're going to have to wait in line.

In general, if you wait in a line and there's a line on either side of you, the chances are one in three that your line will be faster than both of the lines adjacent to you. So there's a two in three chance that you'll regret your choice and think "damn, I should have gotten in that one!" -- and that's if you can't see any lines other than those two. I suspect that what a grocery store actually wants to minimize is a combination of average waiting time, some sort of "maximum" waiting time (maybe the 95th percentile?), and the number of people who feel like they got screwed over.
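The one-in-three figure follows from symmetry, and a quick simulation agrees. This sketch assumes the three lines' waiting times are independent and identically distributed; the exponential distribution is an arbitrary choice of mine, and any continuous distribution gives the same answer.

```python
import random

def chance_my_line_is_fastest(trials=100_000, seed=0):
    """Estimate P(my line beats both neighbours) for three i.i.d. waits."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(trials):
        left, mine, right = (rng.expovariate(1.0) for _ in range(3))
        if mine < left and mine < right:
            wins += 1
    return wins / trials

print(chance_my_line_is_fastest())
```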

The article claims that the waits are much longer in NYC than elsewhere, though. If this is true, why? My guess is the following. Let's say your store's checkout people can serve 5 people per minute. Then in any given minute, the line only gets longer if more than 5 people come in. If on average four people come to the checkout per minute (and the number arriving in a minute is Poisson-distributed), then the line will only get longer in 21% of minutes (the minutes when six or more people get in line); it'll stay the same length in 16% of minutes (those when five people get in line); it'll shrink in 63% of minutes. So the line doesn't have much of a chance to get long. If on average 4.8 people come to the checkout per minute (96% of your store's capacity), these probabilities are 35%, 17%, 48% respectively; suddenly it's easier for the line to get long.
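These percentages can be reproduced with a few lines of code, assuming the number of arrivals in a minute follows a Poisson distribution (my assumption; the argument above doesn't depend on the exact distribution). Like the text, the sketch ignores the fact that an empty line can't shrink.

```python
import math

def poisson_pmf(k, rate):
    """P(exactly k arrivals in a minute) for a Poisson count with the given mean."""
    return math.exp(-rate) * rate ** k / math.factorial(k)

def line_change_probabilities(arrival_rate, capacity=5):
    """Return (P(line grows), P(line unchanged), P(line shrinks)) for one minute,
    when the registers can serve `capacity` people per minute."""
    p_same = poisson_pmf(capacity, arrival_rate)
    p_shrink = sum(poisson_pmf(k, arrival_rate) for k in range(capacity))
    p_grow = 1 - p_same - p_shrink
    return p_grow, p_same, p_shrink

for rate in (4.0, 4.8, 5.0):
    grow, same, shrink = line_change_probabilities(rate)
    print(f"rate {rate}: grow {grow:.1%}, same {same:.1%}, shrink {shrink:.1%}")
```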

If on average 5 people come per minute, the line is equally likely to grow or shrink in any given minute. And if the store's understaffed (someone's sick, maybe?) then forget it -- the line is more likely to grow than to shrink and will probably get out of control. Perhaps NYC grocery stores are slightly understaffed relative to grocery stores elsewhere but this translates into big differences for line length.

Incidentally, a related problem comes to mind. There are a large number of small business establishments (restaurants, coffee shops, etc.) which have two bathrooms, each of which has a single toilet in it. In many cases these two bathrooms are marked "men" and "women". This means one has to wait longer than if there were two bathrooms, both of which were marked "bathroom". One might argue, though, that since men on average take less time in the bathroom than women, sharing both bathrooms would actually slow things down for men.

I've seen more and more people taking this into their own hands by using whatever bathroom is free in such establishments. The usual protocol seems to be to try the bathroom marked with one's gender first, and then try the other one if the "right" one is occupied.

And let's not forget that some people face a real quandary trying to decide which bathroom to use! safe2pee.org -- bathrooms for everyone has listings and maps of bathrooms which are safe for such people, either because they are single-occupancy or because they are explicitly genderfree.
