Showing posts with label census. Show all posts

07 April 2011

The probability of Asian-Hispanic food

From tomorrow's New York Times: an article on the growing prevalence of Asian-Hispanic fusion food in California. It's part of a series on the Census. Orange County, California is 17.87% Asian and 34.13% Hispanic -- so the majority of the population, 52.00%, is either Asian or Hispanic. Not surprisingly there's a guy there with a food truck named Dos Chinos, which serves such food as sriracha-tapatito-tamarind cheesecake.

This is accompanied by a little map showing the sum of the Asian and Hispanic populations in each county. (Well, it might be the sum; to be honest I don't know, since in the Census "Hispanic" vs. "non-Hispanic" is orthogonal to "race", which takes the values White, Black, Asian, American Indian, Native Hawaiian, and "some other race".) In many places in the southern half of the state it's over 50%.

But wouldn't the relevant statistic for this article be not (0.1787) + (0.3413), but 2*(0.1787)*(0.3413) = 0.1220, the probability that if two random Orange County residents run into each other, one of them will be Asian and the other will be Hispanic? Fresno County, for example, is 50.3% Hispanic and 9.6% Asian, which makes it 59.9% "Hispanic or Asian". But there wouldn't seem to be quite as many opportunities for such fusion there, since the probability of a Hispanic-Asian pair in Fresno County is only 2*(50.3%)*(9.6%) = 9.7%.
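To make the comparison concrete, here is a minimal Python sketch of the two statistics, using the county percentages quoted above. It assumes, as the argument does, that "Asian" and "Hispanic" are disjoint groups and that the two people who meet are chosen independently at random; the function names are illustrative, not from any Census tool.

```python
# Sketch: the "Asian or Hispanic" share (p + q) versus the chance that two
# randomly chosen residents are one Asian and one Hispanic (2pq).
# Assumes the groups are disjoint and the two residents are chosen independently.


def share_either(p_asian: float, p_hispanic: float) -> float:
    """Fraction of residents who are Asian or Hispanic (groups assumed disjoint)."""
    return p_asian + p_hispanic


def mixed_pair_probability(p_asian: float, p_hispanic: float) -> float:
    """P(one Asian and one Hispanic) for two independently chosen residents.

    Either the first person is Asian and the second Hispanic, or the
    reverse, so the probability is 2 * p_asian * p_hispanic.
    """
    return 2 * p_asian * p_hispanic


# Percentages quoted in the post, as fractions: (Asian, Hispanic).
counties = {
    "Orange County": (0.1787, 0.3413),
    "Fresno County": (0.096, 0.503),
}

for name, (asian, hispanic) in counties.items():
    print(f"{name}: either = {share_either(asian, hispanic):.4f}, "
          f"mixed pair = {mixed_pair_probability(asian, hispanic):.4f}")
```

Fresno has the larger "either" share but the smaller mixed-pair probability, because 2pq is largest when the two groups are of similar size.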

(Except that 97% of Fresno County's Asians are Hispanic, according to the frustratingly hard-to-navigate American FactFinder. So maybe some "fusion" has already taken place.)

02 May 2010

Arithmetic geometers write about statistics

Jordan Ellenberg, in yesterday's Washington Post: The census will be wrong. We could fix it.

This continues a proud tradition of mathematicians whose area of expertise is nowhere near statistics writing newspaper pieces saying that statistical sampling in censuses is a good idea; see Brian Conrad, 1998, New York Times.

In some sense the argument carries more weight when it comes from mathematicians who don't spend most of their time battling randomness of one sort or another. Statisticians, of course, think that doing statistical adjustments to the census in order to make it more accurate is a Good Idea; it gets them, their students, or their friends jobs!

As a combinatorialist I admire the theoretical elegance of our country's once-a-decade exercise in large-scale, brute-force combinatorics. But in practice, well, of course it needs some statistical help.

And here's something interesting:
Since 1970, a mail-in survey has provided the majority of census data, so what we enumerate is not people but numbers written on a form, which are as likely to be fictional as any statistical estimate.
I wonder if people are actually lying on their census forms. I suspect this would skew the count upwards. People who deliberately lie on their census forms, at least the sort of people I know, are likely to give "joke" answers, and large numbers are funnier. I live in a one-bedroom apartment, and if I were the sort of person who lied on government forms I could easily say that ten people live in my apartment. I can't give a comically low number of people living here, because the census insists that a positive integer number of people live in each place. Does the census have some way to correct for this?

09 March 2010

People round their incomes to the nearest $5,000?

Here's something interesting: lots of people, when asked by the US Census Bureau "how much money do you make?", round to the nearest five thousand dollars.

See the Census Bureau's 2006 data tables. These give the number of people whose personal income falls in each interval of the form [2500N, 2500N+2499], for integer N.

One sees, for instance, that the number of people making between $27,500 and $29,999 (which is near the mode of the distribution) is smaller than both the number making $25,000 to $27,499 and the number making $30,000 to $32,499. Something similar occurs at all income levels: the number of people making between 2500N and 2500(N+1)-1 dollars is smaller if N is odd (and thus this interval doesn't contain a multiple of 5000) than if N is even (and so it does).
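The bracket bookkeeping above can be sketched in a few lines of Python. The helper names are hypothetical, and `odd_bracket_dips` expects a list of bracket counts indexed by N; the counts used to exercise it below are made-up toy numbers, not Census figures.

```python
# Sketch: $2,500-wide income brackets indexed by N.
# Bracket N covers [2500*N, 2500*N + 2499] dollars.

WIDTH = 2500


def bracket_index(income: int) -> int:
    """Return N such that a whole-dollar income lies in [2500*N, 2500*N + 2499]."""
    return income // WIDTH


def contains_multiple_of(n: int, base: int = 5000) -> bool:
    """True if bracket n contains a multiple of `base` dollars."""
    lo, hi = WIDTH * n, WIDTH * n + WIDTH - 1
    first = -(-lo // base) * base  # smallest multiple of base that is >= lo
    return first <= hi


def odd_bracket_dips(counts: list[int]) -> list[int]:
    """Return the odd N whose count is below both neighbours, N-1 and N+1.

    counts[N] is the number of people in bracket N.
    """
    return [n for n in range(1, len(counts) - 1, 2)
            if counts[n] < counts[n - 1] and counts[n] < counts[n + 1]]


# $27,500-$29,999 is bracket 11 (odd): it contains no multiple of $5,000.
print(bracket_index(27_500), contains_multiple_of(11))
# Toy counts only: odd brackets 1 and 3 dip below their neighbours, 5 does not.
print(odd_bracket_dips([5, 3, 6, 4, 7, 8, 9]))
```

Since 2500N is a multiple of 5000 exactly when N is even, `contains_multiple_of(n)` reduces to a parity test; the general form is kept so the same helper also works with `base=10000`.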

Surprisingly, the effect occurs even at very low levels of earnings. If you make $87,714 in a year I can see rounding to $90,000 -- but is the person who makes $7,714 in a year really rounding to $10,000?
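A small simulation shows how even modest rounding at low incomes produces the dips. It is a sketch under loudly hypothetical assumptions: true incomes are uniform between $5,000 and $14,999, and 30% of respondents round to the nearest $5,000 (ties going up). Neither figure comes from Census data; `round_half_up` and `simulate_reported` are illustrative names.

```python
import random


def round_half_up(income: int, step: int = 5000) -> int:
    """Round a non-negative whole-dollar income to the nearest multiple of step.

    Ties (e.g. $7,500) go up. Integer arithmetic is used because Python's
    built-in round() rounds ties to even.
    """
    return (income + step // 2) // step * step


def simulate_reported(true_incomes: list[int], p_round: float,
                      rng: random.Random) -> list[int]:
    """Each respondent reports a rounded income with probability p_round."""
    return [round_half_up(x) if rng.random() < p_round else x
            for x in true_incomes]


rng = random.Random(2010)
# HYPOTHETICAL inputs: uniform true incomes on $5,000-$14,999; 30% of people round.
true_incomes = [rng.randrange(5000, 15000) for _ in range(100_000)]
reported = simulate_reported(true_incomes, p_round=0.3, rng=rng)

# Tally reported incomes into $2,500 brackets: bracket N is [2500N, 2500N + 2499].
counts: dict[int, int] = {}
for x in reported:
    n = x // 2500
    counts[n] = counts.get(n, 0) + 1

for n in sorted(counts):
    print(f"${2500 * n:,} to ${2500 * n + 2499:,}: {counts[n]}")
```

Under these assumptions everyone between $7,500 and $12,499 who rounds lands on $10,000, so the $10,000 bracket swells while the odd brackets on either side shrink, even though the true distribution is flat.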

(I found this while trying to answer a question at Metafilter: How many people in the United States make more than $10,000,000 per year? I seem to recall reading somewhere that personal income roughly follows a power law in the tails, but I can't actually find a reference for this.)

There also seems to be a preference for multiples of $10,000 over multiples of $5,000 that are not multiples of $10,000. But I have work to do, so I'm not going to do the statistics.
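One hypothetical way to check this would be to compare each even bracket with the average of its two odd neighbours, which partly controls for the overall shape of the distribution, and then split the even brackets by whether they start at a multiple of $10,000 (N divisible by 4) or at an odd multiple of $5,000 (N ≡ 2 mod 4). This is a sketch only; the "heaping ratio" is an assumed measure, and the counts used to exercise it are toy numbers.

```python
from statistics import mean


def heaping_ratio(counts: dict[int, int], n: int) -> float:
    """Count in bracket n divided by the average count of brackets n-1 and n+1."""
    return counts[n] / ((counts[n - 1] + counts[n + 1]) / 2)


def compare_10k_vs_5k(counts: dict[int, int]) -> tuple[float, float]:
    """Mean heaping ratio for brackets starting at a multiple of $10,000
    (N % 4 == 0) versus an odd multiple of $5,000 (N % 4 == 2).

    counts maps bracket index N (covering [2500N, 2500N + 2499]) to a count.
    Raises statistics.StatisticsError if either class has no bracket with
    both neighbours present.
    """
    interior = [n for n in counts if n - 1 in counts and n + 1 in counts]
    tens = [heaping_ratio(counts, n) for n in interior if n % 4 == 0]
    fives = [heaping_ratio(counts, n) for n in interior if n % 4 == 2]
    return mean(tens), mean(fives)


# Toy counts only, built so that the $10,000 bracket (N = 4) heaps the most.
print(compare_10k_vs_5k({0: 10, 1: 5, 2: 8, 3: 5, 4: 12, 5: 5, 6: 8, 7: 5}))
```

A first ratio noticeably larger than the second on real bracket counts would be consistent with a preference for multiples of $10,000.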