23 July 2007

five finger keyboards? I don't think so.

From Slashdot: Five Finger Keyboards from Trevor's Trinkets. Trevor suggests the possibility that one could have five-key "keyboards" that do everything that normal keyboards do, simply by having the possibility of pressing multiple keys at once and having every possible combination of keys mean something.

This seems like a bit of abuse of combinatorics to me. First, he points out that it would be possible to have 31 (2^5 - 1) possible "keys" by allowing any combination of "pressed" and "not pressed" for the five keys. This seems just barely feasible to me. The next suggestion is to take into account the order in which the keys are depressed, giving rise to 5 + 5*4 + 5*4*3 + 5*4*3*2 + 5*4*3*2*1 = 325 combinations. In general (although he doesn't point this out), for an n-key keyboard this would give rise to

n + n(n-1) + n(n-1)(n-2) + ... + n(n-1)(n-2)...1
= n! [1/(n-1)! + 1/(n-2)! + ... + 1/0!]

combinations, which is very nearly e(n!), since we have e = 1/0! + 1/1! + 1/2! + 1/3! + ... and the term in brackets above is this series with the very small terms omitted. (One can't always just chop off the "small terms" like this, but the terms decrease quickly enough that I can say this here.) In fact, it's e(n!)-1, rounded down to the nearest integer. (For the record, 120e = 326.19...)
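
As a quick sanity check on the counts above, here is a minimal Python sketch. The function names are mine, not Trevor's, and the floating-point comparison is only trustworthy for small n, where e(n!) is nowhere near an integer.

```python
from math import e, factorial, floor, perm

def unordered_chords(n: int) -> int:
    """Nonempty subsets of n keys: each key is pressed or not, minus the empty case."""
    return 2 ** n - 1

def ordered_chords(n: int) -> int:
    """Nonempty sequences of distinct keys: n + n(n-1) + ... + n!."""
    return sum(perm(n, k) for k in range(1, n + 1))

for n in range(1, 9):
    exact = ordered_chords(n)
    # The closed form from the text: e(n!) - 1, rounded down.
    closed_form = floor(e * factorial(n)) - 1
    print(n, unordered_chords(n), exact, closed_form)
```

For n = 5 this gives 31 unordered and 325 ordered combinations, and the closed form agrees with the exact sum for every n in the loop.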

Having key combinations that do something and that are also prefixes of other valid key combinations, though, seems like a Bad Idea. On your computer, for example, you might press Ctrl-C to copy something. C by itself does something. But Ctrl by itself doesn't do anything. That way if your fingers slip you haven't accidentally done Something Else. There's a notion of the prefix code (or "prefix-free code") which takes this into account. Huffman codes are perhaps the best-known example of this and are quite useful for data compression, since they allow "letters" that occur more commonly to be encoded with shorter strings. (Morse code similarly gives common letters shorter codes, although it relies on pauses between letters rather than on the prefix property.)
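
To make the prefix problem concrete, here is a small Python sketch that checks whether a set of key sequences is prefix-free. Writing chords as strings of the digits 1 to 5 is my own convention for illustration, not anything from Trevor's post.

```python
from itertools import permutations

def is_prefix_free(codewords) -> bool:
    """Return True if no codeword is a prefix of a different codeword."""
    words = sorted(codewords)
    # In sorted order, a word that prefixes any other word also prefixes
    # its immediate successor, so checking neighbours is enough.
    return not any(b.startswith(a) for a, b in zip(words, words[1:]))

KEYS = "12345"  # hypothetical labels for the five keys

# Every ordered chord of 1 to 5 distinct keys (the 325 combinations).
every_chord = {"".join(p) for k in range(1, 6) for p in permutations(KEYS, k)}

# Only the chords that use all five keys.
full_length = {"".join(p) for p in permutations(KEYS, 5)}

print(is_prefix_free(every_chord))  # False: "1" is a prefix of "12"
print(is_prefix_free(full_length))  # True: all codewords have the same length
```

Insisting on a prefix-free scheme therefore throws away a large share of the 325 combinations; using only the full five-key sequences, for instance, leaves 120.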

Then he points out that if we take the order in which the keys are released into account, we'd have 15,685 combinations; this is again true, but requires far more finger dexterity than I think we can expect the average person to have. Furthermore, this seems like an incredible tax on memory. (And remember that designing for the "average" person is actually foolish, because about half of people are below average.)

In terms of memory, Trevor suggests that menus of some sort could appear which tell people which numbers lead to which keys, for example saying "1. a-i, 2. j-r" and so on; this seems to me like it would only work if there's some natural order to the characters to be entered. This means that Trevor's suggestion that this could be used for entering, say, Chinese characters wouldn't work so well; there's no natural order to those characters, as far as I know.

There's actually a fairly old convention for the input of alphabetic data that I rather like; I remember seeing it used for, say, getting stock quotes by phone. The convention was that one had A = 21, B = 22, C = 23, D = 31, E = 32, ..., Z = 94; that is, the usual telephonic pairing of letters with numbers, with a second digit required to uniquely specify the letter. (In terms of keystrokes per letter, this has very nearly the same efficiency as the code that a lot of present-day phones use, namely A = 2, B = 22, C = 222, D = 3, E = 33, ..., Z = 9999, about two keystrokes per letter.)
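
The "about two keystrokes per letter" comparison can be checked with a short Python sketch. It assumes the standard telephone keypad layout and, unrealistically, that every letter is equally likely; with real letter frequencies the averages would shift a little.

```python
# Standard telephone keypad letter groups (consistent with A = 21 ... Z = 94).
KEYPAD = {2: "ABC", 3: "DEF", 4: "GHI", 5: "JKL",
          6: "MNO", 7: "PQRS", 8: "TUV", 9: "WXYZ"}

def two_digit_code(letter: str) -> str:
    """Key digit followed by the letter's 1-based position on that key."""
    for key, letters in KEYPAD.items():
        if letter in letters:
            return f"{key}{letters.index(letter) + 1}"
    raise ValueError(f"no key for {letter!r}")

def multi_tap_code(letter: str) -> str:
    """Key digit repeated once per position of the letter on that key."""
    for key, letters in KEYPAD.items():
        if letter in letters:
            return str(key) * (letters.index(letter) + 1)
    raise ValueError(f"no key for {letter!r}")

ALPHABET = [c for letters in KEYPAD.values() for c in letters]

def average_keystrokes(encode) -> float:
    # Assumption: all 26 letters are weighted equally.
    return sum(len(encode(c)) for c in ALPHABET) / len(ALPHABET)

print(average_keystrokes(two_digit_code))  # exactly 2 keystrokes per letter
print(average_keystrokes(multi_tap_code))  # 56/26, a little over 2
```

Under the equal-weight assumption the two-digit code costs exactly 2 keystrokes per letter and multi-tap costs 56/26, or roughly 2.15.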

But as one of the people who has already commented at Trevor's blog pointed out, it's probably better to have a voice-based interface than to try to develop this any further. It may be theoretically possible to have very complicated input strings, but in designing a device for actual human beings to use we must take into account that people make mistakes.

5 comments:

Ian Varley said...

I don't think this idea should be abandoned completely, because there are potentially other things you can combine the 5 finger (or 10 finger) input with. For example, in the SciFi book "Rainbows End" (by Vernor Vinge) there's a great idea (not fleshed out, of course) called "Ensemble Coding", where a combination of hand motions & eye movements allows very fast (and virtually unnoticeable) text input by people who've learned it. It's futuristic, to be sure, but not any more far-fetched than current user interfaces might have seemed 40 years ago. Good on you for taking a harder look at the math here, but that doesn't mean the entire idea is going down the wrong path.

Michael Lugo said...

Ian,

you make a good point. I'm a bit skeptical about anything involving eye movement because I don't really think that I can control my eye movements well enough for them to actually be useful. They seem a lot harder to control than finger movement, mostly because finger movements seem easily digitizable (you either press the button or you don't, as I have already done hundreds of times in writing this comment) whereas eye movements seem more analog to me. Then again, we may see user interfaces develop to be more analog in the future; the digital paradigm is probably due more to ease of programming and the underlying digital nature of our current computers than to any aspect of human physiology.

I'm still inclined to think that the future of user interfaces is in some sort of voice-based control, mainly for what you might call the "evolutionary" reason -- human beings have evolved to communicate by voice. If communication by eye movement is such a good idea, why don't we talk by looking at each other funny?

But then, to some extent we do talk by looking at each other funny.

Anonymous said...

I very much doubt that anyone could control her eye movements to any serious degree. What Vinge seems to have forgotten is that one's perception of a steady gaze is actually one's brain correcting for literally thousands of constant jerky movements by our eyes.

What I'm surprised our host hasn't brought up is the explicit notion of information and information density. The problem with these suggested input devices is that their information density is far too high. If humans could input and output information that quickly, we'd all be speaking something like a Hamming code rather than such a redundant language as English. Why do we settle for a mere 3.2 bits of information per character (in its written form)? Because we simply don't process data arbitrarily quickly.

These sorts of input/output devices are all well and good in theory, but they all seem to forget that we humans are using input and output devices as well, and that the two have to fit each other.

Matt Brubeck said...

No need to speculate: There are plenty of existing chording keyboards. Most have multiple thumb buttons (and sometimes multiple buttons for other fingers) to avoid some of the acrobatics described here. See the BAT keyboard, for example.

Unknown said...

There are ways to "type" using eye movement. For instance, Dasher reports up to 20 words per minute after an hour's practice using an eye tracker. (Dasher is very cool; it's free software, and can be used with all sorts of input devices. Check it out!)