In yesterday's New York Times, Dennis Overbye writes about the possibility of hiding secret messages in human DNA.
This seems vaguely plausible. Each strand of DNA is composed of a sequence of the four bases adenine, cytosine, guanine, and thymine. One could use these like the digits 0, 1, 2, and 3 in a base-4 number system; equivalently, they could be used as 00, 01, 10, 11 in a binary number system, so each base represents two bits.
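To make the two-bits-per-base idea concrete, here is a minimal sketch in Python. The assignment A=00, C=01, G=10, T=11 and the function names are assumptions chosen for illustration; nothing in biology fixes this ordering.

```python
# Assumed mapping: index in this string is the 2-bit value (A=0, C=1, G=2, T=3).
BASES = "ACGT"

def bytes_to_bases(data: bytes) -> str:
    """Encode each byte as four bases, most significant bit pair first."""
    out = []
    for byte in data:
        for shift in (6, 4, 2, 0):
            out.append(BASES[(byte >> shift) & 0b11])
    return "".join(out)

def bases_to_bytes(seq: str) -> bytes:
    """Decode a base string (length a multiple of 4) back into bytes."""
    if len(seq) % 4 != 0:
        raise ValueError("sequence length must be a multiple of 4")
    out = bytearray()
    for i in range(0, len(seq), 4):
        value = 0
        for ch in seq[i:i + 4]:
            value = (value << 2) | BASES.index(ch)
        out.append(value)
    return bytes(out)

encoded = bytes_to_bases(b"Hi")
decoded = bases_to_bytes(encoded)
print(encoded, decoded)
```

One byte becomes four bases, so a message of n bytes needs 4n bases under this scheme.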
Humans have done things like this. Freshly allocated memory in certain computing environments is filled with the repeated string (in hexadecimal notation) DEADBEEF; ABADBABE, BAADF00D, and CAFEBABE have also been used. (CAFEBABE is apparently used in Java-related contexts; see this archive of a thread "why CAFEBABE?" on comp.lang.java.) It is of course quite unlikely that any of these strings would be found repeatedly in a computer's memory if the memory were filled at random. Each DEADBEEF is 32 bits, so the chance of getting, say, ten DEADBEEFs in a row (assuming there's not a process that's just copying some string over and over again) is one in 2^320, and 2^320 is more than the number of subatomic particles in the universe.
As you may know, the Central Dogma of molecular biology says that DNA is transcribed into RNA, which is translated into proteins; each triplet of DNA bases maps to a single amino acid, of which there are twenty. There's a code that assigns a letter to each amino acid; the letters B, X, and Z are "special" letters; U, O, and J aren't used. It's possible to spell things with the remaining twenty letters, though, and I've heard that some genetically engineered food includes the name of the company doing the engineering in the junk DNA.
So what if someone designed us? Maybe they'd hide a message in the DNA? (For the record, I don't believe in intelligent design; and if we were intelligently designed, that leads inexorably to the question "who designed the designer?") But how would they hide that message? They don't know what language we speak, and they certainly don't know that we'll invent this twenty-letter way of describing protein sequences concisely. And unlike in the DEADBEEF example, there appear to be reasons why you'd want stretches of DNA to be the same thing over and over again; these occur in the so-called junk DNA, so mere repetition wouldn't be a sign of anything.
Like many mathematicians, I'm inclined to believe that they'd hide the prime numbers. The idea behind this is that the primes should never occur due to a natural process, but any culture which is the least bit mathematically sophisticated should have them. (The idea comes from people who are searching for extraterrestrial intelligence; they assume that both we and the other species involved have radio astronomy, and inventing radios without mathematics is Hard.) The sequence of primes begins
2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, ...
which in base-4 is
2, 3, 11, 13, 23, 31, 101, 103, 113, 131, 133, 211, 221, 223, 233, ...
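The base-4 list above can be reproduced with a few lines of Python. This is only a sketch; the helper names `first_primes` and `to_base` are made up for this example.

```python
def first_primes(count: int) -> list[int]:
    """Return the first `count` primes by trial division."""
    primes: list[int] = []
    candidate = 2
    while len(primes) < count:
        # candidate is prime if no smaller prime up to its square root divides it
        if all(candidate % p for p in primes if p * p <= candidate):
            primes.append(candidate)
        candidate += 1
    return primes

def to_base(n: int, base: int = 4) -> str:
    """Write a non-negative integer in the given base (2 to 10)."""
    if n == 0:
        return "0"
    digits = []
    while n > 0:
        digits.append(str(n % base))
        n //= base
    return "".join(reversed(digits))

PRIMES = first_primes(15)
BASE4 = [to_base(p) for p in PRIMES]
print(", ".join(BASE4))
```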
(Note that most of these base-4 numbers, when read in base 10, are also prime! This is just a coincidence, although the fact that 4 and 10 are both even -- and therefore numbers which are odd remain odd under this transformation -- helps.) Replacing 0, 1, 2, 3 with A, C, G, T and running the digits together, we get
GTCCCTGTTCCACCATCCTCTCCTTGCCGGCGGTGTT
and so if we see this string in DNA, perhaps we should be suspicious? Well, it's 37 base-pairs long; thus we expect it to occur once in every 4^37 base pairs. The human genome is about 3,000,000,000 base-pairs long, so if the genome were random, the probability of this string occurring is about 3,000,000,000/4^37 = 1.6 × 10^-13.
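The arithmetic here is easy to check by machine. The sketch below rebuilds the 37-base string from the base-4 digits listed earlier and repeats the back-of-the-envelope estimate. It assumes a uniformly random genome with independent bases, and it ignores edge effects and the complementary strand, just as the estimate in the text does. The variable names are made up for this example.

```python
# Base-4 digits of the first fifteen primes, as listed in the text.
BASE4_PRIMES = ["2", "3", "11", "13", "23", "31", "101", "103",
                "113", "131", "133", "211", "221", "223", "233"]

# Assumed digit-to-base assignment from the text: 0=A, 1=C, 2=G, 3=T.
DIGIT_TO_BASE = {"0": "A", "1": "C", "2": "G", "3": "T"}

message = "".join(DIGIT_TO_BASE[d] for d in "".join(BASE4_PRIMES))

GENOME_LENGTH = 3_000_000_000  # rough size of the human genome in base pairs

# Expected number of occurrences in a random genome: one chance in
# 4**len(message) at each of roughly GENOME_LENGTH starting positions.
expected = GENOME_LENGTH / 4 ** len(message)

print(message, len(message))
print(f"{expected:.1e}")
```

Because the expected count is so far below 1, it is essentially the same as the probability of seeing the string at least once.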
So if we find it? Then yes, there's probably a Designer. But this doesn't mean that creationists should go fishing for hidden patterns in the genome. First, my choice of how to encode the primes was entirely arbitrary. We could reorder A, C, G, and T. We could have encoded the primes in base 3, using the fourth base to separate them. We could have encoded the primes as
CAACAAACAAAAACAAAAAAAC...
where the number of A's between each pair of consecutive C's is prime. And so on. The more encodings one is willing to try, the more likely it is that some pattern turns up by chance. Creationists looking in DNA would, I suspect, take a Bible code-like approach to the search. And if there were slight errors? They'd blame them on mutations, which are inevitable. (The Times article points out that there are certain "ultraconserved" segments of the genome, but those sections also appear to be functional, so it would be harder to hide a message in them. Then again, if these hypothetical designers are so smart, maybe they can make those sections functional and hide messages in them too.)
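To get a feel for how many equally reasonable encodings there are, here is a small Python sketch of two of the alternatives mentioned above: relabeling the four bases, and the unary scheme in which runs of A's are separated by C's. The function names are hypothetical, and the choice of A as filler and C as separator simply follows the description in the text.

```python
from itertools import permutations

# Base-4 digits of the first fifteen primes, run together.
DIGITS = "2311132331101103113131133211221223233"

def relabelings(digits: str) -> set[str]:
    """All strings obtained by assigning the digits 0-3 to A, C, G, T in some order."""
    return {
        "".join(perm[int(d)] for d in digits)
        for perm in permutations("ACGT")
    }

def unary_encoding(primes: list[int], filler: str = "A", marker: str = "C") -> str:
    """Encode each prime p as p copies of `filler`, with `marker` between runs."""
    return marker + "".join(filler * p + marker for p in primes)

variants = relabelings(DIGITS)
unary = unary_encoding([2, 3, 5, 7])

print(len(variants))
print(unary)
```

Relabeling alone already gives 24 candidate strings, and the unary scheme has its own 12 choices of filler and separator, before even considering other bases or separators. Every extra scheme a searcher is willing to accept multiplies the chance of a spurious hit.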
Sequencing the human genome is good for lots of reasons. But the search for messages from the past probably isn't one of them. They might be there, but we'd be searching for a needle in a haystack. And there would be lots of shiny things that aren't needles there, too.