Here's a book that's available online that I learned about recently: Information Theory, Inference, and Learning Algorithms by David MacKay. I learned about it from Steve at Information Processing; he learned about it from Nerd Wisdom.
I rather like the trend of books being available online. For one thing, it means that I do not have to carry books back and forth between my office and my apartment. (This is unfortunately not true for the calculus book I'm teaching from, which is about 1200 pages; fortunately my officemate has a copy of the book he doesn't use, so I use his copy when I'm on campus and my copy when I'm at home.)
This book seems to be a good introduction to information theory, machine learning, Bayesian inference, etc.; I have not had the chance to read any part of it thoroughly but I have randomly sampled from it and it seems quite interesting.
A few days ago I wrote about the question of finding the next element in the sequence "1, 2, 5, ..." and some of my readers complained that this problem is not well-posed. MacKay, in his chapter on Occam's razor, gives a similar example: what's the next number in the sequence "-1, 3, 7, 11"? You probably say 15. Consider two possible theories -- the sequence is an arithmetic sequence, or each term is given by a cubic function of the previous term, of the form cx³ + dx² + e, where c, d, and e are rational numbers. (The omission of the linear term is deliberate; otherwise the first theory would be a case of the second one.) The first one turns out to be much more likely, for the simple reason that there are fewer parameters to be tweaked! Let's say it's equally likely that the underlying function is linear or cubic; there are just a lot more possible cubic functions, so each particular cubic function is less likely. (For the details, see p. 345 of MacKay.)
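Both theories can be checked directly. Here's a quick Python sketch that fits each one to the data exactly -- the arithmetic theory has one step size to determine, while the cubic theory needs three rational coefficients, solved for here by Gaussian elimination over exact fractions (they come out to -1/11, 9/11, and 23/11, matching MacKay's chapter, and the cubic's prediction for the next term is the rather less intuitive -219/11):

```python
from fractions import Fraction

# MacKay's sequence from the Occam's razor chapter.
seq = [-1, 3, 7, 11]

# Theory 1 (arithmetic): x_{n+1} = x_n + step, one tunable parameter.
step = seq[1] - seq[0]
assert all(b - a == step for a, b in zip(seq, seq[1:]))
arithmetic_next = seq[-1] + step

# Theory 2 (cubic): x_{n+1} = c*x^3 + d*x^2 + e, three tunable rational
# parameters.  The three observed transitions (-1 -> 3, 3 -> 7, 7 -> 11)
# give a 3x3 linear system; solve it exactly by Gauss-Jordan elimination.
rows = [[Fraction(x) ** 3, Fraction(x) ** 2, Fraction(1), Fraction(y)]
        for x, y in zip(seq, seq[1:])]
for i in range(3):
    pivot = rows[i][i]
    rows[i] = [v / pivot for v in rows[i]]          # scale pivot row to 1
    for j in range(3):
        if j != i:
            factor = rows[j][i]                      # clear column i elsewhere
            rows[j] = [vj - factor * vi
                       for vj, vi in zip(rows[j], rows[i])]
c, d, e = (row[3] for row in rows)
cubic_next = c * seq[-1] ** 3 + d * seq[-1] ** 2 + e

print(arithmetic_next)   # 15
print((c, d, e))         # (Fraction(-1, 11), Fraction(9, 11), Fraction(23, 11))
print(cubic_next)        # Fraction(-219, 11)
```

Both theories reproduce the data perfectly; the point of the Occam-factor argument is that the one-parameter theory should still be preferred.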
By logic such as this, though, it's difficult to say what the most likely next element in that sequence is... should we prefer 10, because then the nth term is given by the simple explicit formula (n-1)² + 1? Or should we prefer 12 or 13, which are both given by simple linear recurrences? My instinct is that it depends on where the problem comes from, since the various possible next terms arise from different sorts of combinatorial structures, and in this sense the problem was ill-posed. In reality we wouldn't start out by assuming that all possible theories have equal probability; for one thing, there are infinitely many of them! The "simple" theories (the sequence has an explicit formula which is a polynomial, or satisfies a linear recurrence with constant coefficients, or something like that) have higher prior probabilities... but given enough evidence, any complicated theory that starts out with nonzero probability of being true could turn out to be true. Extraordinary claims, though -- those with low prior probability -- require extraordinary evidence. There's a nice toy example a little further on in MacKay (p. 351) showing that if you see a piece of a box sticking out to the left of a tree, and a piece of a box of the same height and color sticking out to the right, it is nearly certain that those are actually pieces of the same box, and not two different boxes.
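For concreteness, here is a quick check of the three candidate continuations of 1, 2, 5. The explicit formula is the one named above; the two linear recurrences are my guesses at simple ones producing 12 and 13 (the choice is not unique -- that non-uniqueness is rather the point):

```python
seq = [1, 2, 5]

# Explicit formula: a(n) = (n-1)^2 + 1 gives 1, 2, 5, 10, ...
formula = [(n - 1) ** 2 + 1 for n in range(1, 5)]
assert formula[:3] == seq
next_formula = formula[3]

# Guessed recurrence: a(n) = 2*a(n-1) + a(n-2) also fits 1, 2, 5.
assert 2 * seq[1] + seq[0] == seq[2]
next_rec_a = 2 * seq[2] + seq[1]

# Guessed recurrence: a(n) = 3*a(n-1) - a(n-2) also fits 1, 2, 5.
assert 3 * seq[1] - seq[0] == seq[2]
next_rec_b = 3 * seq[2] - seq[1]

print(next_formula, next_rec_a, next_rec_b)   # 10 12 13
```

Three perfectly "simple" theories, three different predictions; only a prior over theories can break the tie.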
(What I've sketched in the previous paragraph is a bit of a lie, though; mathematical reasoning is rarely anywhere near this fuzzy. One could almost argue that it never is, because if we are so unsure about whether a result is true or not then we just don't call it mathematics.)
PS: I realized that I had a bunch of things I wanted to write about that were kind of piling up, so I might be digging into stuff that made its way through the blogosphere a month or so ago. I would feel no need to apologize for this except that blogs move so quickly.
PPS: The title of this post is due to Carl Sagan. I did not realize that when I titled the post, but I knew I'd heard the phrase somewhere before, and Google tells me it's his.