20 October 2007

The homophony group

From Justin Roberts at UCSD, an old homework problem:

"B: (Fun for all the family, ages 6-66...)The homophony group (of English) is the group with 26 generators a,b, ..., z and one relation for every pair of English words which sound the same. Example: "knight = night" shows that k=1, "earn=urn" shows that ea=u, and so on. (Scrabble rules apply; no proper names or slang!) Prove that the group is trivial! (I would guess it's also trivial in French but probably non-trivial in Spanish and very non-trivial in Japanese!)."

I was looking for something else, but I came across this problem, which I saw before in my first algebra class when I'd just learned what a group was; it's part of the folklore, apparently, and I feel like it's good to have the folklore written down. (It might be written down somewhere on the Internet under another name, but it's hard to find because "group" is such a common word in non-mathematical circumstances.)

I can't do it. I did it before, but I remember it took a lot of looking at lists of homophones; I can't find the particular lists of homophones I used years ago. Most of the relations are simpler than they look, because generally if two words in English are spelled the same they have a lot of letters in common. Here's a partial solution -- I prove that the English homophony group is a subgroup of the free group on two generators, namely m and v. (Or maybe three.)

First, we deal with the vowels. The first homophones that come to anyone's mind are probably to = too = two. From to = too we get o = 1. Thus to = two becomes t = tw, so w = 1. (Yes, I know, w isn't a vowel...) Next, consider the relations

rain = rein = reign, sign = sine, earn = urn.

From rain = rein we get a = e. From rain = reign we get ai = eig; but since a = e we can reduce this to i = ig, or g = 1. Thus we can eliminate the "g" in sign = sine to get sin = sine, so e = 1; recalling a = e, we have a = 1. Finally, from earn = urn we have ea = u; since e = a = 1 we have u = 1.

Another candidate for "most canonical homophone" is probably there = their (I'll ignore "they're" because I don't want to deal with apostrophes); we can clip the "the" in each of these to get re = ir. But e = 1, so this is r = ir, or i = 1. Next, i = eye, but i = e = 1, so this reduces to y = 1. (Does "i" count as a word? Of course I mean "I", the first-person singular pronoun. Roberts said "Scrabble rules apply", but Scrabble rules have never had to weigh in on whether "I" is playable. Scrabble rules out words which are "always capitalized" but the way in which "I" is always capitalized seems different from the way that a proper name is.)

At this point we've shown that the letters a, e, g, i, o, u, w, and y are trivial; this set of letters shouldn't be too surprising to anyone reasonable familiar with English spelling.

There are a lot of silent k's; we take no = know. Since w = o = 1, this reduces to n = kn, so k = 1. Similarly, we can leverage the silent h with wine = whine. lam = lamb gives us b = 1, and dam = damn gives n = 1.

Somewhere around here the problem seems to get harder. One key insight is that groups don't have nontrivial idempotents; if we can show that a generator is equal to its square, then it must be equal to 1. This gets us a few more letters: from bard = barred we have r = rr (recall e = 1), so r is trivial. Similarly, from medal = meddle we get dal = ddle; but a = e = 1, so dl = ddl; thus d = 1. This actually isn't how I take care of t, although there are enough double t's in the language that it's robably possible to handle t this way; I take packed = pact, from which ked = t; but k, e, d have already been shown trivial, so t = 1. A bit more trickily, thinking in terms of double letters, bawl = ball gives us w = l; but w = 1, so l = 1.

That's sixteen letters shown trivial so far: a, b, d, e, g, h, i, k, l, n, o, r, t, u, w, y. The remaining letters are c, f, j, m, p, q, s, v, x, and z. These seem trickier somehow... to my ear they have pretty well-defined sound. The easy letters to get rid of, you'll remember, are the vowels, which can basically sound however you want them to in English.

Still, a few more fall quickly. base = bass (the musical kind, not the fish), so s = 1; razor = raiser can now take advantage of the times when s sounds like z, since all the other letters involved are trivial, to give us s = z, so z = 1. The letters that are phonetically kind of k-like fall next; key = quay gives k = q, so q = 1, and choir = quire (the pair I'm proudest of finding, by the way) gives c = q, so c = 1.

Six letters to go: f, j, m, p, v, x.

lox = locks, which takes care of x. genes = jeans, killing j. phase = faze; only the p and f are nontrivial here, so p = f. There are two cheap tricks to handle this. The first is hiccup = hiccough, where all the letters involved are trivial except p, so p = 1; but are those separate words, or alternative spellings of the same word? The second is if = iff, so f = 1. (The problem here, of course, is that iff is a bit iffy as a word... is it really a word, or just an abbreviation for "if and only if"?) Two half-solutions add up to one whole solution, though, so I say that p and f are trivial.

That leaves two generators, m and v. If there are relations between them, they're trivializing ones; we're not going to get a relation like "mv = vm" (which would bring us down from the free group on two generators to the free abelian group on two generators), since that would require two homophonous words in English, both containing m and v, where in one the m comes first and in one the v comes first. I've tried to use the double-letter trick; the closest I can get is "clamor" and "clammer" to get rid of m, but these aren't really pronounced the same.

Any ideas on how to finish this up?

My vocabulary in other languages isn't rich enough to do this. But I agree with Roberts' judgement about French, Spanish, and Japanese; French is full of silent letters, even more so than English.

Edited, 12:24 pm: In the comments, dan suggests dammed = damned to show that m is trivial. And ruff = rough shows f is trivial, which gets around the objections I had before. I want to take advantage of the fact that the f in of is pronounced like v usually is, but I can't.


dan said...

damned=dammed is good enough for m, no?

Michael Lugo said...

damned = dammed works. I can't believe I didn't think of that one, seeing how I had dam = damn.

(Although now I'm trying to convince myself that I pronounce the n in "damned". I don't think I actually do, but I'm resisting the fact that I didn't see this!)

Anonymous said...

leafs = leaves?

Michael Lugo said...

d. eppstein,

I pronounce "leafs" (as in "She leafs through the pages of the book") and "leaves" (as in "The leaves fall from the trees in October") differently. You might not. One issue with this problem is that different people's pronunciation is different. (Showing "r" is trivial is much easier for a Briton than an American.)

Anonymous said...

How about veld = felt? It's not the first pronunciation listed, but it is an alternate. Not ideal, I know, but...

Anonymous said...

Also, if you're not comfortable with hiccup, how about rapt = wrapped?

Anonymous said...

Of course it's been written down. Right on Larry Washington's door at UMD is a paper in two columns. The left proves in English that the French homophone group is trivial. The right proves in French that the English homophone group is trivial.

Anonymous said...

You can find a similar two-column paper at this link

Theo said...

"I prove that the English homophony group is a subgroup of the free group on two generators, namely m and v."

Of course, you mean quotient group.

Anonymous said...

GRIMACED = GRIMMEST should take care of the M.