06 October 2007

Are we so different after all?

Mark Liberman writes about The Pirahã and us, from Language Log:
The Pirahã language and culture seem to lack not only the words but also the concepts for numbers, using instead less precise terms like "small size", "large size" and "collection". And the Pirahã people themselves seem to be surprisingly uninterested in learning about numbers, and even actively resistant to doing so, despite the fact that in their frequent dealings with traders they have a practical need to evaluate and compare numerical expressions.

And we're like this too, he claims; to a very good approximation, we don't have words for information about the distribution of representative samples.

We do have these words (the examples he gives are "percentile", "histogram", "standard deviation", "frequency distribution", "variance", and "confidence intervals"), but perhaps only a hundred thousand or so people in the USA actually understand what they mean. There are three hundred Pirahã; if you pick three hundred Americans at random, chances are you won't get one who understands this stuff. (I am not quite bored enough to go out on the street and do this study, and even if I did, I live near a university so the data would be skewed.) I suppose that this is true of any specialized field, though, not just statistics.
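The "chances are you won't get one" claim can be checked with a quick back-of-the-envelope calculation. The sketch below takes the hundred-thousand figure from the paragraph above and assumes a US population of roughly 300 million (my round number, not something from Liberman's post); the function name is just for illustration.

```python
# The 100,000 figure is the guess from the post; the US population of
# roughly 300 million is an assumed round number for illustration.
US_POPULATION = 300_000_000
PEOPLE_WHO_UNDERSTAND = 100_000
SAMPLE_SIZE = 300  # same size as the Pirahã population mentioned above


def chance_of_at_least_one(knowers, population, sample_size):
    """Probability that a random sample contains at least one 'knower'.

    Treats each draw as independent, which is a reasonable approximation
    when the sample is tiny compared with the population.
    """
    p = knowers / population
    return 1 - (1 - p) ** sample_size


prob = chance_of_at_least_one(PEOPLE_WHO_UNDERSTAND, US_POPULATION, SAMPLE_SIZE)
print(f"Chance of at least one: {prob:.1%}")
```

Under those assumptions the chance comes out a bit under one in ten, so "chances are you won't get one" holds up, as long as the hundred-thousand guess is in the right ballpark.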

Notably not among that set of people are the journalists whose job it is to inform other people of these things; as readers of this blog know, this causes much comedy for those of us who do know a little bit about these things.

Although I haven't seriously thought this through, it seems plausible to me that instead of teaching calculus to college students who will take only one math course, we should teach them statistics; statistics might actually be useful.

Other, larger languages have the same issue as Pirahã, though. It's my impression that there are very few things you can't talk about in English due to a lack of vocabulary. However, I admit that I might not know about these things if they exist -- all the languages I know are either English or have a large number of speakers who also speak English. I've heard that in smaller European languages this isn't the case -- there are things that, say, Norwegian or Catalan just doesn't have the words for, that speakers of those languages might want to talk about. (For example, what do they call the Catalan numbers in Catalan?) This is because in a large language community like English-speakers, someone will want to talk about thing X, even if thing X is rare, but this is less likely to be true in a smaller language community. The usual solution seems to be to borrow words from other languages -- but perhaps small numbers are the sort of things you just can't borrow. (Remember that I'm not a linguist.)

4 comments:

dan said...

I have thought about this quite a bit over the years that I have been a prof, and I strongly believe that we shouldn't be teaching calculus to the majority of people we teach it to, in preference for probability and statistics.

An additional difficulty comes because most college students don't see any value to probability or statistics until they actually use it, yet application is rarely integrated into a mathematical curriculum. I have been told by many students that taking my course caused them to understand why they had been previously forced to take the courses on probability and statistics that we require of all students in our Faculty. By contrast, most of our students never have this experience, but they have two calculus courses under their belt...

Anonymous said...

If you ask what number of Americans could provide a perfect definition of every one of those terms, perhaps you are right: 100,000 sounds about right. However, if you provided a multiple-choice test and asked how many Americans would correctly identify what those terms meant, I think it would be more like 1/2%, or 1.5 million.

There are about 700,000 doctors in the U.S. I can't imagine that less than, say, 2/3 of them would have seen every single one of those terms in the course of their training, and I would hope that at least 2/3 of those would remember the meaning well enough to pick it correctly on a multiple choice exam.

That, right there, gets you perhaps 300,000 Americans who understand these terms.

Think, beyond that, of all the scientists and highly skilled engineers, computer programmers, and certain technical workers who came across those terms in their education. I very roughly figure that would multiply the number a few more times.

Then there are the students, teachers and some number of auto-didacts.

I'd be surprised if this didn't bring the total up to over a million. I'd be surprised if the number got over 3,000,000.
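The arithmetic behind this estimate is easy to reproduce. The sketch below uses only the figures quoted in the comment, plus an assumed US population of roughly 300 million (which is what "1/2%, or 1.5 million" implies); the variable names are illustrative.

```python
# Figures quoted in the comment above.
US_DOCTORS = 700_000
SHARE_WHO_SAW_TERMS = 2 / 3      # doctors who met every term in training
SHARE_WHO_REMEMBER = 2 / 3       # of those, who could still pick the meaning

doctors_who_know = US_DOCTORS * SHARE_WHO_SAW_TERMS * SHARE_WHO_REMEMBER

# Assumed round figure for the US population (implied by "1/2%, or 1.5 million").
US_POPULATION = 300_000_000
half_percent = 0.005 * US_POPULATION

print(f"Doctors who know the terms: about {doctors_who_know:,.0f}")
print(f"Half a percent of the population: about {half_percent:,.0f}")
```

The doctors alone come to a little over 300,000, which matches the "perhaps 300,000" in the comment; multiplying that "a few more times" for other technical fields lands in the stated one-to-three-million range.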

Anonymous said...

Oh, and that's the number of people I'd expect to get all of the terms. If we're interested in how many people would understand at least one or two (with "percentile" and "histogram" being the two most likely candidates, in my opinion), I think the number would go way up.

Anyone at the top 20% level of educational achievement has seen two or more of those terms. Standardized test results are reported along with percentiles. I'm willing to hazard a guess that anyone at the top 10% level of educational achievement has probably both seen and learned about the meaning of two or three of those terms.

Plenty of people who don't have that much education encounter percentiles from discussions of sports and politics. Some fraction of them must look the concepts up and learn about them.

So I'd reckon that perhaps 10% of the U.S. adult population would be reasonably familiar with at least two of those terms: probably percentiles and histograms, but maybe percentiles and another of the terms.

Anonymous said...

On the linguistic issue, there are of course things we can talk about even though we don't have words for them. Feynman, in an autobiographical essay, mentioned learning in grade school, from another kid, that we can think in images instead of words. The kid asked Feynman if he knew the shape of a car part -- maybe the crankshaft. Feynman realized that he knew the shape and could easily visualize it, but he didn't have a name for it. If you think of such a part, you might be very hard pressed to give a short, precise description of what you can visualize about it.

These ideas are related to the Sapir-Whorf hypothesis, but that has to do with grammar, not vocabulary. FWIW, English is unquestionably unusual in terms of the size of the active vocabulary. It can't be quantified precisely, but the size and fluidity of the English vocabulary are both enormous.