13 June 2008

When it comes to oil, addition is hard

Have we underestimated total oil reserves?, from New Scientist (and, it appears, every other source in the British-speaking world).

Richard Pike, of the Royal Society of Chemistry and previously of the oil industry, points out that it's generally the convention in the oil industry to give, as a one-number estimate for the output of a particular oil field, the 10th percentile of the distribution they expect. This is an inherently conservative estimate. That's fine -- but when they combine the estimates from every oil source, they do it by simply adding them together. If I'm understanding this correctly, the published estimates of how much oil there is correspond to what would happen if every oil well, etc. performed only at the 10th percentile of its expected distribution -- which just isn't going to happen. The 10th percentile is called the "proven reserves", the 50th the "proven plus probable reserves".

I don't know what the underlying distributions are, but it seems that they generally report the 10th, 50th, and 90th percentiles of the underlying distribution -- so says Pike in this friction.tv video. They add these distributions together correctly internally, but Pike claims that governments don't want to think about the probabilistic logic. Pike then claims that security analysts use the resulting very pessimistic estimates; perhaps the current run-up in prices is a result of this, although I'm not sure he'd say so. (It sounds like it's no secret that this is the way things are done, and perhaps the analysts know this.)

Mathematically, the idea is simple. Let's say that a certain oil field is expected to produce 4 megabarrels, and the production is normally distributed with standard deviation 1 megabarrel. The tenth percentile of a normal distribution is 1.28 standard deviations below its average, so the "proven reserves" of this field would be 2.72 megabarrels, and the "proven plus probable" 4 megabarrels.
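To make the arithmetic concrete, here is a minimal Python sketch using the standard library's statistics.NormalDist. It assumes, as the example above does, a single field whose output is normal with mean 4 megabarrels and standard deviation 1 megabarrel; the more precise z-value for the 10th percentile is about -1.2816, which the text rounds to -1.28.

```python
from statistics import NormalDist

# One field, modelled (as in the example) as Normal(mean=4 Mbbl, sd=1 Mbbl).
field = NormalDist(mu=4.0, sigma=1.0)

p10 = field.inv_cdf(0.10)  # "proven reserves": the 10th percentile
p50 = field.inv_cdf(0.50)  # "proven plus probable": the median, equal to the mean here

print(f"z-value of the 10th percentile: {NormalDist().inv_cdf(0.10):.4f}")
print(f"P10 = {p10:.2f} Mbbl, P50 = {p50:.2f} Mbbl")
```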

But now say we have four such fields. The oil industry's techniques would say that those fields together have "proven reserves" of 2.72 megabarrels times four, or 10.88 megabarrels. But for uncorrelated distributions -- and I'm going to assume that the distributions here are uncorrelated -- the variances add. The variance of the distribution for one field is 1 megabarrel², so for four fields it's 4 megabarrel²; the standard deviation is the square root of this, 2 megabarrels. The mean is four times 4, or 16 megabarrels, but the 10th percentile is now 16 - 1.28 × 2 = 13.44 megabarrels. The figure of 10.88 megabarrels is 2.56 standard deviations below the mean, and the chance of getting that low is about one half of one percent; this is what Pike means when he calls this a pessimistic estimate. And of course with more fields the pessimism becomes more extreme.
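The four-field calculation, and what happens as the number of fields grows, can be sketched the same way. This assumes independent, identically distributed normal fields (mean 4, standard deviation 1, in megabarrels), so the total is normal with n times the mean and n times the variance; the helper name compare is just for illustration. The tail probability is computed with math.erfc rather than NormalDist.cdf so that very small values don't round to zero.

```python
from math import erfc, sqrt
from statistics import NormalDist

MEAN, SD = 4.0, 1.0               # per-field mean and sd (megabarrels), as in the example
Z10 = NormalDist().inv_cdf(0.10)  # z-value of the 10th percentile, about -1.2816


def compare(n_fields: int) -> tuple[float, float, float]:
    """Return (sum of per-field P10s, P10 of the total, P(total <= sum of P10s)).

    Assumes independent fields, each Normal(MEAN, SD**2), so the total is
    Normal(n * MEAN, n * SD**2): the means add and so do the variances.
    """
    added_p10s = n_fields * (MEAN + Z10 * SD)  # adding up the "proven" figures
    total_mean = n_fields * MEAN
    total_sd = sqrt(n_fields) * SD             # sd grows like sqrt(n), not like n
    true_p10 = NormalDist(total_mean, total_sd).inv_cdf(0.10)
    z = (added_p10s - total_mean) / total_sd
    prob = 0.5 * erfc(-z / sqrt(2))            # normal CDF at z, accurate far into the tail
    return added_p10s, true_p10, prob


for n in (1, 4, 16, 64):
    added, true_p10, prob = compare(n)
    print(f"{n:2d} fields: added P10s = {added:7.2f} Mbbl, "
          f"P10 of total = {true_p10:7.2f} Mbbl, P(total <= added) = {prob:.1e}")
```

Because the added figure sits 1.2816·√n standard deviations below the mean of the total, the chance of falling that low is Φ(-1.2816·√n): about 0.5% for four fields, and rapidly smaller as n grows.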

The first reference I can find to this is a May 2006 press release of the Royal Society of Chemistry, referring to a June 2006 article by Pike in Petroleum Review. But it seems to have swept through the British blog world in the last few days, after the Times of London made a quick reference to it in an article on North Sea oil, as this press release of the RSC mentions. (For some reason the Times article refers to "this month's" issue of Petroleum Review, which I can't seem to see online even though Penn's library claims to have an electronic version.)

If this is true (I hesitate to think it is, just because it seems surprising that people wouldn't know this!), then it's good news and bad news. It's good news because it means we're not going to run out of oil as soon as we think, which means less economic shock. But it also means more carbon for us to spew into the air before we finally run out.

I encourage you not to read the comments in most places where this has been posted, because they're basically people ranting about global warming and saying either "we've reached peak oil, anybody who says we haven't is a poopyhead" or "we haven't reached peak oil, anybody who says we have is a poopyhead".

The moral here: probabilistic forecasting is tricky. See also Nate Silver's appearance on cnn.com regarding polling for the presidential election, which is totally irrelevant here except that it also goes to show how a lot of people don't know how to aggregate probabilistic data.


CarlBrannen said...

After we go through the oil, we will begin turning coal into gasoline. The US is the world leader in coal reserves, if I recall correctly. If we don't do it, the Chinese are second in supply and will burn their own. World supplies of coal are something like 100x as large as those of oil. The coal process is less efficient and results in even more CO2 per gallon than oil.

In the meantime, the public is exposed to junk science about biofuels which is almost impossible to argue against. Read the debate over at Backreaction and analyze the arguments.

misha said...

There are some speculations that the oil and gas reserves are not really fixed, but are created by some anaerobic and thermophilic bacteria deep down there (the oil being produced by polymerization of gas under pressure), and seep to the surface. If this theory is correct, we will never run out of oil or gas, and our problem becomes just how to deal with our environment and how to keep ourselves from overpopulating the surface of our planet.

Efrique said...

Your post seems to take the position that the sum of two tenth percentiles will have a lower-than-tenth percentile in the distribution of the sum of the two random variables.

While true for your normal example, it is not true in general, and it's not that hard to construct examples - either continuous or discrete - where the percentile of the sum of two tenth percentiles of independent random variables is far past the 50th percentile of the sum.

Example: Consider two independent r.v.s distributed with the following probabilities:

value:        1      2      3      4
probability:  0.09   0.009  0.9    0.001

The tenth percentile is 3.

If you take two independent RVs with this distribution and sum them, 6 is at about the 80th percentile.
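Efrique's example can be checked exactly with a short Python sketch that uses fractions.Fraction for the probabilities. Percentiles of a discrete distribution need a convention; this assumes the common "smallest value whose cumulative probability reaches p" rule (other conventions differ at the jumps), and the helper names are illustrative. Under that rule each variable's tenth percentile is 3, while the sum's tenth percentile is only 4, so adding the two tenth percentiles (giving 6) overshoots. Since P(sum < 6) is about 0.19 and P(sum <= 6) is about 0.998, the value 6 covers every percentile of the sum from roughly the 19th to the 99.8th.

```python
from fractions import Fraction
from itertools import product

# Efrique's example distribution, with exact probabilities.
pmf = {1: Fraction(9, 100), 2: Fraction(9, 1000), 3: Fraction(9, 10), 4: Fraction(1, 1000)}


def quantile(dist, p):
    """Smallest value x with P(X <= x) >= p (one common percentile convention)."""
    running = Fraction(0)
    for x in sorted(dist):
        running += dist[x]
        if running >= p:
            return x
    raise ValueError("p must be in (0, 1]")


def convolve(a, b):
    """Exact distribution of X + Y for independent X ~ a and Y ~ b."""
    out = {}
    for (x, px), (y, py) in product(a.items(), b.items()):
        out[x + y] = out.get(x + y, Fraction(0)) + px * py
    return out


def cdf(dist, v):
    """P(X <= v)."""
    return sum(p for x, p in dist.items() if x <= v)


s = convolve(pmf, pmf)  # distribution of the sum of two independent copies
p = Fraction(1, 10)
print("P10 of one variable:", quantile(pmf, p))
print("P10 of the sum:     ", quantile(s, p))
print("P(sum < 6)  =", float(cdf(s, 5)))
print("P(sum <= 6) =", float(cdf(s, 6)))
```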

Efrique said...

To be a bit clearer:
This implies that the "government estimate" of adding tenth percentiles is not necessarily conservative - it depends on the distribution(s) of the components.

I'm not sure what the conditions are for it to be conservative.

(Of course, with sufficient components, as long as none of them dominate the variance and their moments exist, the CLT ought to start to kick in eventually.)
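The CLT remark can be probed numerically. This Python sketch (helper names illustrative; percentile taken as the smallest value whose cumulative probability reaches 10%) builds the exact distribution of a sum of n independent copies of Efrique's example and compares its tenth percentile with 3n, the figure obtained by adding n tenth percentiles. Under a normal approximation the sum's tenth percentile should be near n·μ - 1.28·σ·√n, where μ = 2.812 is the per-variable mean and σ the per-variable standard deviation. Because the per-variable tenth percentile (3) sits above μ, the added figure should drift further above the sum's true tenth percentile as n grows; a component whose tenth percentile lies below its mean would drift the opposite way.

```python
from fractions import Fraction

# Efrique's example distribution, with exact probabilities.
pmf = {1: Fraction(9, 100), 2: Fraction(9, 1000), 3: Fraction(9, 10), 4: Fraction(1, 1000)}
mu = sum(x * p for x, p in pmf.items())  # per-variable mean, 2.812


def convolve(a, b):
    """Exact distribution of X + Y for independent X ~ a, Y ~ b."""
    out = {}
    for x, px in a.items():
        for y, py in b.items():
            out[x + y] = out.get(x + y, 0) + px * py
    return out


def quantile(dist, p):
    """Smallest value whose cumulative probability is at least p."""
    running = 0
    for x in sorted(dist):
        running += dist[x]
        if running >= p:
            return x
    raise ValueError("p must be in (0, 1]")


total = {0: Fraction(1)}          # distribution of an empty sum
p10_of_sum = {}
for n in range(1, 41):
    total = convolve(total, pmf)  # now the distribution of a sum of n copies
    p10_of_sum[n] = quantile(total, Fraction(1, 10))

print("per-variable mean:", float(mu))
for n in (1, 2, 5, 10, 20, 40):
    print(f"n = {n:2d}: added P10s = {3 * n:3d}, P10 of sum = {p10_of_sum[n]:3d}, "
          f"n * mean = {float(n * mu):6.2f}")
```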

Michael Lugo said...


That's a good point. And I don't know if the distributions for actual oil wells are normal; in particular I'd suspect they're not, because there's no reason why they should be symmetric. But as you point out, I was hoping the CLT would take care of that.

The Petroleum Review article by Pike uses normal distributions, but I'm not sure if that's because the actual distributions are normal or just because it's easy to work with normal distributions.

Michael Lugo said...


I've heard that theory, but it's my understanding that that theory is not generally well received among oil people. And we probably ought to act as if it's false, for environmental reasons.

Anonymous said...

This analysis overlooks the fact that the underlying estimates the distributions are based on may be (are) overly optimistic. For example, the Ghawar Field in Saudi Arabia, the largest in the world, accounting for ~6.25% of global supply, was appraised in the 1970s to have 170 bn barrels of oil with 60 bn retrievable barrels. Since then it has produced 60 bn barrels and is currently producing ~5 mn barrels a day. The Saudis claim it STILL has more than 71 billion barrels of PROVEN reserves remaining. (They haven't let foreigners conduct independent evaluations of the field since ARAMCO was nationalized at the beginning of the 1980s.)

So either:
1) The original estimate was dramatically off (causing one to question the usefulness of such estimates if they can be so wrong)
2) The Saudis are full of shit (probably)
3) The distribution is not normal (also probably)
4) No one has any idea what they're talking about, with any significant degree of confidence (definitely)

Also, no one has any idea what demand will be: the EIA projects total world consumption will rise to 118 million barrels per day in 2030 from 83 million in 2004... of course, they are usually wrong by 4-15% for projections over 10 years. And since oil is priced by the marginal market-clearing units, a small mismatch between expected supply and demand can create dramatic increases in price. That is to say, while the price of oil is 400% higher today than in 2003, demand is obviously only marginally (1-3%) more than was expected in '03.

P.S. Isabel, any interest in working for a hedge fund? You could work on problems like this every day... Before you answer, maybe read physicist-turned-financier Emanuel Derman's My Life as a Quant. http://www.ederman.com/new/index.html
I'm sure DE Shaw or AQR would love to have you, and I'm sure they'd give you time to publish some research as well. Of course, I'm sure you know all this.

Efrique said...

Oh, I forgot to mention - when you said "for normal distributions the variances add" I think you meant to say something else.

For any distribution, the variances add when you have zero correlation - such as when you have independence; it has nothing to do with normality. (The independence assumption makes some sense here, though I have a good argument why the percentile estimates across fields might be correlated.)

(I'd anticipate that the individual-field distributions are skewed, but if the estimates of the percentiles are based on little enough data, who knows what the numbers might actually be doing, or actually represent.)
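To illustrate that variances add for independent variables regardless of shape, here is a small Python sketch using the deliberately skewed discrete distribution from Efrique's earlier example. It builds the exact distribution of the sum of two independent copies and compares variances with exact fractions; the helper names are illustrative.

```python
from fractions import Fraction

# A deliberately non-normal, skewed distribution (Efrique's example).
pmf = {1: Fraction(9, 100), 2: Fraction(9, 1000), 3: Fraction(9, 10), 4: Fraction(1, 1000)}


def mean(dist):
    return sum(x * p for x, p in dist.items())


def variance(dist):
    m = mean(dist)
    return sum((x - m) ** 2 * p for x, p in dist.items())


def sum_of_independent(a, b):
    """Exact distribution of X + Y when X ~ a and Y ~ b are independent."""
    out = {}
    for x, px in a.items():
        for y, py in b.items():
            out[x + y] = out.get(x + y, 0) + px * py
    return out


var_one = variance(pmf)
var_sum = variance(sum_of_independent(pmf, pmf))
print("Var(X)     =", var_one, "=", float(var_one))
print("Var(X + Y) =", var_sum, "=", float(var_sum))
assert var_sum == 2 * var_one  # exact equality, with no normality assumed
```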

Michael Lugo said...


you're absolutely right that it has nothing to do with normality. What I had in mind with normality is just that it's easy to do calculations with normal distributions, and I went a bit too far.

And in reality there is probably some sort of correlation between the percentiles actually attained by different oil fields, because the estimation methods used could have some systematic bias, or because new ways of extracting oil could be discovered.

Efrique said...

What exactly is this probability distribution the distribution of?

Isabel's comment "And I don't know if the distributions for actual oil wells are normal" prompted me to elucidate a point that's often missed. (I don't know that this is the case with Isabel - the quote there just reminded me of it.)

What random variable are we talking about the distribution of?

Note that we're not talking about the distribution of actual recoverable oil in a field - (at any fixed level of technology) that's a fixed quantity (pace theories about continuing production by microbes).

The percentiles represent quantiles of a distribution that largely quantifies our lack of knowledge (it's easiest to think in Bayesian terms here, but you can construct a meaningful discussion outside that).

Across actual oil fields, the distributions are highly right skewed (like almost all size distributions) - there are big and small fields, but there are necessarily more small fields than big. But at a given field, the amount of recoverable oil has a distribution because we are estimating what it contains. The distribution should probably reflect a predictive distribution (which would take into account both typical characteristics of recoverable oil given the known features of this field AND the amount of uncertainty about the parameters of the distribution), but I don't know if that's explicitly what the oil people consider when they produce estimates.

Efrique said...

This is a really interesting post.

(Apologies for all the nitpicks - it can be very hard to write precisely about this stuff without making the text impenetrable; it's probably better to take a shortcut in the text and deal with the nitpickers in comments.)

misha said...


Here we go again, "Global Warming" and the illusion that we can control it... Another showcase example of junk science. A much better motivation would be to reduce our dependence on imported oil and gas, and everybody knows where these are coming from: such wonderful places as Saudi Arabia, Iran and Russia. Let those bastards do some real work for their money!

JimB said...

In your reference to Nate Silver's CNN appearance, were you saying that Nate incorrectly aggregated probabilistic data, or that the show hosts did, or...? It wasn't obvious to me.

Michael Lugo said...


I was referring to the hosts of the show, or perhaps more accurately to the audience that the hosts of the show seemed to have in mind. Nate Silver knows what he's doing.
