Terence Tao, at his blog, describes a rather interesting-sounding interdisciplinary conference he recently attended. At this conference, he gave a talk about Structure and Randomness in the Prime Numbers, which he describes as a shortened version of a similar talk he gave at UCLA. You can see the slides and video of his UCLA talk here. The slides are not terribly useful, because Tao is a good speaker. Like a good speaker, he doesn't just read his slides; the slides give you an idea of what he's talking about but leave out perhaps 90% of the context. The video's good, but it's in RealPlayer format, and RealPlayer and Windows XP don't cooperate -- I got a Blue Screen of Death while watching.
Anyway, one of the ideas he mentions in this talk -- and this is something that I was not aware of before hearing this talk -- is that the Riemann hypothesis, which isequivalent to a certain strengthenging of the prime number theorem, is in a sense a probabilistic result. The Riemann hypothesis is, according to Tao, equivalent to the idea that the primes do behave randomly -- they are distributed according to the prime number theorem, with an error term that is exactly what you'd expect from the law of large numbers. This does not appear to be mentioned in, say, the Clay Mathematics Institute description of the problem, which I think is a shame.
All this got me thinking. I knew that there was a certain heuristic that is often used to determine what results about prime numbers "should be" true -- one assumes that a number which is approximately N has probability 1/log N of being prime. This works quite well, which led Paul Erdos to say: "God may not play dice with the universe, but something strange is going on with the prime numbers." This, for example, leads to a probabilistic argument for the Goldbach conjecture. The Goldbach conjecture states that every even integer greater than or equal to four is the sum of two primes: 4=2+2, 6=3+3, 8=3+5, 10=3+7=5+5, 12=5+7, 14=3+11=5+9, 16=3+13=5+11, 18=5+13=7+11, 20=3+17=7+13, and so on. For small numbers this seems like an accident. But there are, for example, six ways to write 100 as a sum of two primes:
100 = 3 + 97 = 11 + 89 = 17 + 83 = 29 + 71 = 41 + 59 = 47 + 53.
and if we pick a random good-sized even number there are lots of ways to write it as a sum of two primes. For example, the even numbers 4000, 4002, ..., 4020 can be written as a sum of two primes in 65, 106, 72, 52, 104, 65, 54, 108, 55, 66, 133 ways respectively. So we start to get the feeling that there are lots of ways to write even integers as sums of primes, and so it would be astonishingly shocking if the Great Mathematician In The Sky had stacked the deck so that, say, 4045580440 couldn't be witten as a sum of two primes.
On the other hand, there are infinitely many even integers. Let's say that by "astonishingly shocking" I meant that the "probability" that 2N can't be written as a sum of two primes is 1/N. Then, since the sum 1 + 1/2 + 1/3 + 1/4 + ... diverges, not only would Goldbach's conjecture be false, but it would be false infinitely many times.
As it turns out, though, we can make a guess at the probability that n can't be written as a sum of two primes. First, we try to estimate the number of ways to write n as the sum of two primes. This is the "probability" that 3 and n-3 are both prime, plus the "probability" that 4 and n-4 are both prime, plus the "probability" that 5 and n-5 are both prime, and so on up to the probability that n/2 and n/2 are both prime. (I know, you're saying "3 is prime! 4 isn't prime! What are you talking about?") Next, we assume that k and n-k being prime are independent, which isn't true. Then the expected number of ways that n can be written as a sum of two primes is
1/(log(3) log(n-3)) + 1/(log(4) log(n-4)) + ... + 1/(log(n/2) log(n/2))
which turns out to be about n/(2 log2 n). This is actually an underestimate, the actual value conjectured by Hardy and Littlewood being something like 1.3 n/(log2 n); the basic idea is that if k is prime, then n-k turns out to be more likely to be prime. For example, since n is even, and k must be odd, then n-k is odd, which makes it more likely to be prime. (This is pointed out at the Wikipedia article.) But even the independent result is quite strong. Since being prime is a "rare" property, I'll assume that the "distribution" (and I am using this word quite loosely) of the number of ways in which n can be written as a sum of two primes is a Poisson distribution with parameter n/(2 log2 n). This means that the "probability" that an even number n can't be written as a sum of two primes is
f(n) = exp(-n/(2 log2 n))
which is quite small for decent-sized numbers. For n = 10,000, for example, it's about 2.5 × 10-26. For n equal to one million, it's around 10-1138. The sum f(2) + f(4) + f(6) + ... converges to a number somewhere around 29, and what's more it converges quite quickly. So if there's a counterexample, it's small; if we can show that Goldbach is true for, say, numbers up to a million (which is easy to do with modern computers), then this heuristic shows it's "almost certainly" true.
There's a similar heuristic argument for why there are no odd perfect numbers, due to Pomerance; unfortunately this argument shows there are no even perfect numbers, as well. But it turns out that the even perfect numbers have a special sort of structure.
And one wants to know that the primes don't have this special sort of structure -- that they are in fact psuedorandom, as Tao calls it. (I'm not sure if "psuedorandom" is a well-defined term in any branch of mathematics; Tao uses it in two different ways in his talk, and explicitly points out that his two uses are different.)
Of course, this whole idea of thinking of primes probabilistically is sort of silly. In A Mathematician's Apology*, G. H. Hardy wrote that "317 is a prime, not because we think so, or because our minds are shaped in one way rather than another, but because it is so, because mathematical reality is built that way." Yet we don't know what the really big primes are. From a philosophical standpoint, why not act as if the primes are randomly distributed until we have evidence to the contrary? Proving the Riemann hypothesis would show that this way of looking at things is indeed valid.
A note on notation: when doing mathematics "log" means the natural logarithm; I can't think of a single place where the common, or base-10, logarithm appears in mathematics. Sometime base-2 logs appear, especially in questions in the analysis of algorithms where the analysis has some sort of structure that involves breaking things up in half. Base-10 logarithms are useful as a computational aid and nothing more. At one point I was teaching a calculus class and had a form on my web page where students could leave anonymous comments about the class. (I do not recommend this technique if you have a thin skin, but it seemed like a good idea at the time.) One of the students said, essentially, "please use 'ln' instead of 'log', it confuses me when you use 'log'. I didn't know how to respond to this. Fortunately, since the comment was anonymous, I didn't have to.)