God Plays Dice: simulation

15 December 2011

Solution to distance between random points from a sphere

So I asked on Sunday the following question: pick two points on a unit sphere uniformly at random. What is the expected distance between them?

Without loss of generality we can fix one of the points to be (1, 0, 0). The other will be chosen uniformly at random and will be (X, Y, Z). The distance between the two points is therefore

√((1-X)² + Y² + Z²)

which does not look all that pleasant. But the point is on the sphere! So X² + Y² + Z² = 1, and this can be rewritten as

√((1-X)² + 1 - X²)

or after some simplification

√(2-2X).

But by a theorem of Archimedes (Wolfram Alpha calls it Archimedes' Hat-Box Theorem but I don't know if this name is standard), X is uniformly distributed on (-1, 1). Let U = 2-2X; U is uniformly distributed on (0, 4). The expectation of √(U) is therefore

∫₀⁴ (1/4) u^(1/2) du

and integrating gives (1/4)(2/3)·4^(3/2) = 4^(3/2)/6 = 8/6 = 4/3.
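
If you'd rather not trust the arithmetic, here's a minimal sketch in R that checks the integral numerically. The names `integrand`, `numeric_value` and `closed_form` are mine, not part of the original calculation.

```r
# Integrand: the density of U (1/4 on (0, 4)) times sqrt(u)
integrand <- function(u) sqrt(u) / 4

# Numerical value of the integral from 0 to 4
numeric_value <- integrate(integrand, lower = 0, upper = 4)$value

# Closed form from the antiderivative (1/4) * (2/3) * u^(3/2), evaluated at 4
closed_form <- (1 / 4) * (2 / 3) * 4^(3 / 2)

print(c(numeric = numeric_value, closed_form = closed_form))
```

Both numbers should agree with 4/3 up to the tolerance of `integrate`.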

(The commenter "inverno" got this.)

Of course it's not hard to simulate this in, say, R, if you know that the distribution of three independent standard normals is spherically symmetric, and so one way to simulate a random point on a sphere is to take a vector of three standard normals and normalize it to have unit length. This code does that:

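# First point: three independent standard normals, scaled to unit length
xx1=rnorm(10^6,0,1); yy1=rnorm(10^6,0,1); zz1=rnorm(10^6,0,1)
d1=sqrt(xx1^2+yy1^2+zz1^2)
x1=xx1/d1;y1=yy1/d1;z1=zz1/d1;
# Second point, generated the same way
xx2=rnorm(10^6,0,1); yy2=rnorm(10^6,0,1); zz2=rnorm(10^6,0,1)
d2=sqrt(xx2^2+yy2^2+zz2^2)
x2=xx2/d2;y2=yy2/d2;z2=zz2/d2;
# Distance between each pair of points
d=sqrt((x1-x2)^2+(y1-y2)^2+(z1-z2)^2);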

and then mean(d), where d contains the distances, comes out to 1.333659; the histogram of the distances d is a right triangle. (The code doesn't make the assumption that one point is (1, 0, 0); that's a massive simplification if you want to do the problem analytically, but not nearly as important in simulation.)
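
As a cross-check, here is a sketch of the shortcut version, which does fix one point at (1, 0, 0) and uses Archimedes' theorem to draw only the x-coordinate of the other. The seed and the variable names (`x`, `d_short`) are arbitrary choices of mine. It also explains the right triangle: since D = √U with U uniform on (0, 4), P(D ≤ t) = t²/4, so the density of D is t/2 on (0, 2).

```r
set.seed(1)                        # arbitrary seed, only for reproducibility
n <- 10^6
x <- runif(n, min = -1, max = 1)   # x-coordinate of the second point
d_short <- sqrt(2 - 2 * x)         # distance from (1, 0, 0)

mean_dist <- mean(d_short)         # should be close to 4/3
frac_below_1 <- mean(d_short <= 1) # should be close to 1^2 / 4 = 0.25

print(c(mean = mean_dist, frac_below_1 = frac_below_1))
```

With a million draws the standard error of the mean is roughly 0.0005, so agreement with 4/3 to about three decimal places is all one should expect.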

07 February 2008

How to load airplanes faster

Optimal boarding method for airline passengers (arXiv:0802.0733), by Jason Steffen. Via Cosmic Variance. Apparently Steffen is a physicist who was sufficiently annoyed while boarding airplanes to start thinking there has to be a better way.

The conventional method (boarding from back to front), although it looks efficient from the point of view of someone in the airport, is quite bad; Steffen describes it as the second worst method, after the obviously bad front-to-back boarding. If you stop to think about it you realize that the real sticking point is not getting onto the plane, but loading luggage into the overhead compartment. So the trick is to find something to parallelize, by, say, boarding all window seats first, then all middle seats, then all aisle seats. Heuristically, you'd expect a speedup factor on the order of the number of people sharing each overhead compartment. (This isn't exactly Steffen's method -- his simulations show that it's best to divide the people on the plane into four groups, which correspond to people in even- or odd-numbered rows on the left and right sides, and board each group in turn -- but his simulations as far as I can tell treat the entire plane as one long row. Still, either of these makes you realize that there's work to be done.)

Plus, if you board people so that the people loading their luggage at the same time aren't on top of each other, then you get fewer people hitting each other with their bags. Everybody wins! Except perhaps the people in the aisle seat, in my scheme -- since they get to load their bags last, and people try to get some pretty big carry-ons on planes these days, there might not be room in the overhead compartment for them. But that's got to happen to somebody.

15 August 2007

Are we living in a simulation?

The Simulation Argument is discussed at George Dvorsky's sentient developments, after being mentioned in an article yesterday by John Tierney in the New York Times. It's due to Nick Bostrom, whose original paper is available online.

The argument is as follows: "posthumans", that is, the people of the future who have much better computers than we do, will use their computers to run simulations of, well, more primitive humans. These simulations will be so detailed that they include a working virtual nervous system for all the people inside. And you have to figure that they're not going to run just one of these simulations; the future people run these simulations for fun! (This seems reasonable; I've spent way too much time playing SimCity to argue against this.) So we should expect that over the lifetime of humanity there are a very large number of such simulations being run, and that it is therefore very unlikely that we live in the "real world".

For me, this bears a superficial similarity to Pascal's Wager, although the mathematics is different; for one thing, Pascal's Wager involves an infinite payoff and there are no infinities here. But it's probably useful to think of there being effectively an infinite number of these simulations, in which case the probability that we're living in the "real world" turns out to be essentially zero. (Bostrom doesn't go this far; he says he feels there's about a 20 percent chance we're living in a computer simulation, which basically means he figures there's a 20 percent chance that civilization gets to the simulated-reality stage, if you neglect the probability that there are simulated realities but we're in the real one.) I suspect the real reason this reminds me of Pascal's Wager is because it seems natural to equate the runner of the simulation with "God".

What's especially strange, at first, is the idea that the simulations could have simulations within them. This reminds me of a cosmological theory that universes "evolve" by spouting black holes; in this theory, black holes are connections to other universes, where the various physical constants are slightly different than in the parent universe; thus there's a sort of Darwinian selection for universes, where the selective pressure is towards making lots of black holes. (Then why don't we live in a universe with lots of black holes?) As to the simulations within simulations -- if you carry this to the logical extreme, we are likely to live in some very deeply nested simulation. The problem is that infinite nesting probably isn't possible. And does the level of nesting even matter? My first instinct is to think that nested simulations would necessarily be of "lower fidelity" than first-level simulations, but since everything is digital this need not be true, as Bostrom himself points out. However, he also points out the disturbing fact that since a posthuman society would require more computing power to simulate -- you've got to simulate what all the computers are doing quite well -- if we head towards being posthuman we might be shut off! Personally I would like to think that the ethics of these simulations require the Simulator to not just shut us off. Presumably the person running the simulation could get on some sort of loudspeaker and let us know what was going on. (Although that might raise other ethical questions -- do simulated realities have some sort of Prime Directive, where you're not supposed to interfere with them?)

The mathematics of the argument seems so simple, though, that I'm almost inclined to throw it out on those grounds alone, along with other things like Pascal's Wager and the Copernican principle. Surely proving the existence of God (and that's what this is, although people don't put it this way) can't be so easy!

07 July 2007

10000 losses, yet again

I posted a couple weeks ago a forecast of the date of the 10,000th Phillies loss, which I updated on Tuesday. My traffic-tracking software tells me that these have been my most-viewed and fourth-most-viewed pages, respectively. In particular I wondered what the chances were that I'd witness this historic event. I have tickets for July 13. The July 14 game will be televised nationally on FOX, the July 15 on ESPN; I'm certain the announcers will mention it, either as "the Phillies have just lost their 10,000th game" or "the Phillies are trying to avoid losing their 10,000th game" depending on how things play out.

Since then, the Phillies have won three and lost seven; as of the original post they'd lost 9,991 games all-time, so now they're up to 9,998 losses. They need two more for 10,000.

They play tonight, then again tomorrow, then after that not until Friday (which is when I have tickets). Who knows, I might see it.

Here are the probabilities now:

Jul 08	@ Rockies	0.257245
Jul 13	v. Cardinals	0.197124
Jul 14	v. Cardinals	0.157155
Jul 15	v. Cardinals	0.118057
Jul 16	@ Dodgers	0.126696
Jul 17	@ Dodgers	0.071025
Jul 18	@ Dodgers	0.037119
Jul 19	@ Padres	0.018392
Jul 20	@ Padres	0.009024
Jul 21	@ Padres	0.004337
Jul 22	@ Padres	0.002051
Jul 24	v. Nationals	0.000582
Jul 25	v. Nationals	0.000392
Jul 26	v. Nationals	0.000264
Jul 27	v. Pirates	0.000180
Jul 28	v. Pirates	0.000120
Jul 29	v. Pirates	0.000080
Jul 30	@ Cubs	0.000074
Jul 31	@ Cubs	0.000039
Aug 01	@ Cubs	0.000021
Aug 02	@ Cubs	0.000011
Aug 03	@ Brewers	0.000007
Aug 04	@ Brewers	0.000003
Aug 05	@ Brewers	0.000001

The single most likely game is now tomorrow's game; not surprisingly the Phillies have roughly a one-in-four chance of losing tonight and tomorrow afternoon and just getting this whole mess over with. There's almost a fifty-fifty shot of it coming during the Cardinals series (47.2%, to be exact); a 26.9% chance of it happening on the West Coast swing, which is when I originally thought it would happen; and a 0.2% chance of it happening after their return from the West Coast.
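
For anyone who wants to redo the grouping, here's a small R sketch that sums the table by stretch of schedule. The probabilities are copied from the table above; the group names are just labels I made up.

```r
# Probabilities copied from the table, grouped by stretch of the schedule
rockies   <- c(0.257245)
cardinals <- c(0.197124, 0.157155, 0.118057)
west      <- c(0.126696, 0.071025, 0.037119, 0.018392,
               0.009024, 0.004337, 0.002051)
later     <- c(0.000582, 0.000392, 0.000264, 0.000180, 0.000120,
               0.000080, 0.000074, 0.000039, 0.000021, 0.000011,
               0.000007, 0.000003, 0.000001)

totals <- c(rockies      = sum(rockies),
            cardinals    = sum(cardinals),
            west_coast   = sum(west),
            after_return = sum(later))

print(round(100 * totals, 1))   # percentages by stretch
print(sum(totals))              # should be very close to 1
```

The grand total is a useful check that no game has been dropped from the table.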

By the way, current records for all teams are available at baseball-reference.com. The only team to have 10,000 wins so far is the Giants, with 10,150; the Cubs will most likely be next to reach that milestone, with 9,943. The Braves will be the next to cross the 10,000-loss line, but they're 320 short so it'll take a few years.

(If you noticed that those are all National League teams, that's not a coincidence. The Phillies, Giants, Cubs, and Braves started play in 1883, 1883, 1876, and 1876 respectively; the eight original AL teams -- today's Orioles, Red Sox, White Sox, Indians, Tigers, Twins, Yankees, and A's -- all started in 1901, when that league was founded.)

The Phillies aren't the team with the lowest winning percentage, not by a long shot; they're .468 all-time. The Rangers, Rockies, and Padres are a bit worse at .467, .466, and .463; the Devil Rays are .398 all time. But they're all expansion teams, and expansion teams are historically bad; the Phillies do have the worst winning percentage of the original sixteen teams. (The original NL is the Braves, Phillies, Cubs, Cardinals, Dodgers, Giants, Reds, and Pirates.) You can see online the standings of the eight original AL teams and NL teams graphed since 1901. It won't surprise anyone to learn the Yankees are the best AL team in that time, and the Giants are, only slightly less predictably, the best NL team.

03 July 2007

10,000 losses: an update

A week ago, I posted a forecast of the date of the 10,000th Phillies loss. In particular I wondered what the chances were that I'd witness this historic event. I have tickets for July 13.

Since then, the Phillies have won two and lost five (including dropping three out of four to the Mets, which was particularly galling because New York fans have started coming down to Philly in large numbers for the games). At the time that I wrote that previous post, the Phillies had 9,991 losses; now they have 9,996, so there are four more to go.

Fortunately, the 10,000th loss can't come tomorrow, on July 4; even if they lose tonight's game (which is just getting underway) and tomorrow's, that'll "only" be 9,998. The earliest the 10,000th loss could come is against the Rockies on Saturday.

But here are the probabilities now:

Jul 03	@ Astros	0.000000
Jul 04	@ Astros	0.000000
Jul 06	@ Rockies	0.000000
Jul 07	@ Rockies	0.047428
Jul 08	@ Rockies	0.110681
Jul 13	v. Cardinals	0.117107
Jul 14	v. Cardinals	0.121772
Jul 15	v. Cardinals	0.115521
Jul 16	@ Dodgers	0.152669
Jul 17	@ Dodgers	0.118578
Jul 18	@ Dodgers	0.083456
Jul 19	@ Padres	0.054027
Jul 20	@ Padres	0.033566
Jul 21	@ Padres	0.019969
Jul 22	@ Padres	0.011472
Jul 24	v. Nationals	0.003890
Jul 25	v. Nationals	0.002815
Jul 26	v. Nationals	0.002028
Jul 27	v. Pirates	0.001477
Jul 28	v. Pirates	0.001050
Jul 29	v. Pirates	0.000744
Jul 30	@ Cubs	0.000736
Jul 31	@ Cubs	0.000431
Aug 01	@ Cubs	0.000250
Aug 02	@ Cubs	0.000144
Aug 03	@ Brewers	0.000099
Aug 04	@ Brewers	0.000048
Aug 05	@ Brewers	0.000023
Aug 07	v. Marlins	0.000007
Aug 08	v. Marlins	0.000005
Aug 09	v. Marlins	0.000003
Aug 10	v. Braves	0.000002
Aug 11	v. Braves	0.000001
Aug 12	v. Braves	0.000001
Aug 14	@ Nationals	0.000000

In particular, the peak now looks like the Cardinals series; there's a 35% chance of it happening in those three days, and nearly a one-in-eight chance I'll witness the historic event in person. My original prediction had a 66.8% chance of it happening on the West Coast swing July 16-22; now it's only 47.4%. The effect is still helped by the fact that the Dodgers and Padres are strong teams; notice that the probability decreases from the 14th to the 15th and then increases from the 15th to the 16th, and that the 10,000th loss is three times as likely to come on the 22nd as on the 24th. And the distribution doesn't stretch nearly as far into the future; the first game which has a chance of less than one in two million to be the 10000th loss (which rounds to zero) is August 14 in the current simulation, versus August 27 when I ran the numbers last week.

In other baseball-milestone news, Clay Davenport of Baseball Prospectus made a prediction in May that Barry Bonds was most likely to hit his 756th home run in mid-June, with a probability of 80% that he'd have done it by now. He's only up to 751, because he got off to a slow start. If I had to guess, I'd predict as follows: Bonds has hit 17 home runs in his team's first 81 games. (The Giants are currently playing their 81st game, and are in the fourth inning; Bonds hit a home run in the first. I'm assuming he doesn't hit any more tonight.) So it'll take him 81 * 5/17 = 24 more games to reach the record, which projects to July 31 against the Dodgers. This sort of logic is notoriously bad; it's the sort of logic that says that since a player hits 12 home runs in April he'll hit 72 for the season, or that a team that starts its season by winning three out of four will go on to have a 122-40 record, when of course there's really regression to the mean. But it seems at least somewhat sound in this case.
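
The extrapolation is simple enough to spell out in R. This is only the naive straight-line arithmetic described above, with variable names of my own choosing; it does nothing about regression to the mean.

```r
hr_so_far    <- 17          # Bonds' home runs in the team's first 81 games
games_played <- 81
hr_needed    <- 756 - 751   # home runs still needed for the record

# Naive pace: games per home run so far, times home runs still needed
games_needed <- games_played * hr_needed / hr_so_far
print(games_needed)         # a bit under 24
print(round(games_needed))  # 24

# The same straight-line logic applied to a 3-1 start over a 162-game season
wins_projected <- 162 * 3 / 4
print(wins_projected)       # 121.5, i.e. roughly 122-40
```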

27 June 2007

the 10,000th Phillies loss will come on the West Coast

Walking around this morning, I saw the Philadelphia Weekly's cover story: Losing proposition. This is an article about how the Phillies are very close to having ten thousand losses. The New York Times made fun of us a couple weeks ago (but the Times mocks anything involving Philadelphia). There are sites like Countdown to 10000 and Celebrate 10000 in honor of it. They sell T-shirts. Some people claim the 10,000th loss was in June of 2005, against the Red Sox -- but this is only true if you count the Worcester Worcesters of 1880-1882 as being the Phillies. They're not.
(Yes, the Worcester Worcesters. Some sources call them the Brown Stockings, but I like calling them the Worcesters because it shows even less ingenuity in naming than the name "Phillies" does.)
There are three facebook groups. (I wonder if there's a myspace group; the link goes to a paper that's been circulating about the class differences between Facebook and Myspace.)
Then I remembered that I have Phillies tickets for their game against the Cardinals on July 13th, the first game after the All-Star break.
I got to thinking -- what are the chances that I'd see the Phillies' ten thousandth loss? They've lost 9,991 games so far; they've got nine more to go.
Surely the 10,000th loss is a historic moment in all of professional sports. No team has lost this many games. (The San Francisco (formerly New York) Giants have won 10,000.)
It's not so hard to compute this. What I needed to know was the probability that the Phillies lose each particular game. This can be found via a method which for some cryptic reason is called the "log5 method", which I learned about from this article from Diamond Mind which computed the probabilities that each of the 2002 playoff teams would win the World Series. The method is as follows: if team A wins p_A of its games, and team B wins p_B of its games, then the probability that team A wins in any given game against team B is
p_A(1-p_B) / (p_A(1-p_B) + p_B(1-p_A)).
The best justification for this formula is that it works when you test it on actual data. (Actual baseball data, that is; I'm not sure if it's good for other sports.) But an intuitive justification for it is as follows: you have two coins, coin A and coin B. Each coin has "win" on one side and "loss" on the other. Coin A comes up "win" with probability p_A, and coin B comes up "win" with probability p_B. To simulate a game, flip the two coins. If one comes up "win" and one comes up "loss", that gives you the outcome of the game; if they both come up the same, flip again. Notice that the formula passes a couple of sanity checks. If p_A = 0, then it always gives 0 -- that is, if a team never wins, then its probability of winning against any opponent is zero. If p_B = 1/2, then it just gives p_A -- so a team which is playing against average teams performs how it usually performs.
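
Here's a minimal R sketch of the formula together with the coin-flipping story. The function name `log5` and the test values 0.6 and 0.45 are my own illustrative choices, not numbers from any real team. "Flip again when the coins agree" is the same as throwing away the agreeing flips, which is what the simulation does.

```r
# log5 estimate of the probability that team A beats team B
log5 <- function(p_a, p_b) {
  p_a * (1 - p_b) / (p_a * (1 - p_b) + p_b * (1 - p_a))
}

# The two sanity checks from the text
check_never_wins <- log5(0, 0.6)     # a team that never wins: 0
check_vs_average <- log5(0.55, 0.5)  # against an average team: 0.55

# Coin-flip justification: flip both coins, keep only flips that disagree
set.seed(1)                          # arbitrary seed
n <- 10^6
p_a <- 0.6; p_b <- 0.45              # illustrative winning percentages
a_wins <- runif(n) < p_a
b_wins <- runif(n) < p_b
decided <- a_wins != b_wins          # exactly one coin says "win"
sim_estimate <- mean(a_wins[decided])

print(c(simulated = sim_estimate, formula = log5(p_a, p_b)))
```

The simulated and formula values should agree to two or three decimal places.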
To adjust for home field advantage, I added 0.02 to the home team's winning percentage and subtracted 0.02 from the visiting team's winning percentage; this is the method used at Baseball Prospectus' postseason odds simulation, which I'll have more to say about later.
So, for example, the Phillies play the Reds tonight, in Philadelphia. The Reds have won 29 games and lost 48, so their winning percentage is .377; we replace this with .357 since the Reds will be playing on the road. The Phillies have won 40 and lost 36, so their winning percentage is .526; we replace this with .546 since they're playing at home. The formula tells us that the Reds' chance of winning tonight is
(.357)(1-.546) / ((.357)(1-.546) + (.546)(1-.357))
which is 0.315. This is the Phillies' chance of losing, which is what I'm interested in.
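
The same calculation as a short R sketch, using unrounded winning percentages. The names `log5`, `home_edge`, `reds_pct` and `phillies_pct` are mine.

```r
# log5 estimate of the probability that team A beats team B
log5 <- function(p_a, p_b) {
  p_a * (1 - p_b) / (p_a * (1 - p_b) + p_b * (1 - p_a))
}

home_edge    <- 0.02              # home-field adjustment described above
reds_pct     <- 29 / (29 + 48)    # visiting team
phillies_pct <- 40 / (40 + 36)    # home team

# Probability the Reds win, i.e. the Phillies lose
p_phillies_lose <- log5(reds_pct - home_edge, phillies_pct + home_edge)
print(round(p_phillies_lose, 3))
```

With the unrounded percentages this comes out at 0.315, matching the figure in the text; plugging in the rounded .357 and .546 instead gives about 0.316.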
So after tonight, the Phillies will have eight losses to go with probability 0.315; they'll have nine losses to go with probability 1-0.315, or 0.685.
They'll play the Reds again tomorrow night. After that game, they have seven losses to go with probability (0.315)² = 0.099; they have eight losses to go with probability (.315)(.685)+(.685)(.315) = .432; they have nine losses to go with probability (0.685)² = 0.469.
Thus, I set up a spreadsheet which calculates the probability that after each game, they have 9, 8, 7, ..., 1 losses to go. The probability of the Phillies getting their ten-thousandth loss on a certain day is the probability that they have 9,999 losses before that day ("1 loss to go"), times the probability of losing that day.
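
The spreadsheet itself isn't reproduced here, so what follows is only a sketch in R of the same bookkeeping. The function name `milestone_probs` and its arguments are hypothetical, and the example uses a constant 0.315 loss probability purely for illustration rather than the real game-by-game values.

```r
# p_loss: probability of losing each upcoming game, in schedule order
# need:   number of losses still required to reach the milestone
# Returns, for each game, the probability that the milestone loss comes then.
milestone_probs <- function(p_loss, need) {
  # state[j + 1] = P(j losses so far and milestone not yet reached), j = 0..need-1
  state <- c(1, rep(0, need - 1))
  out <- numeric(length(p_loss))
  for (g in seq_along(p_loss)) {
    p <- p_loss[g]
    # "1 loss to go" before this game, times the chance of losing it
    out[g] <- state[need] * p
    # A loss moves j -> j + 1; a win leaves j unchanged
    state <- state * (1 - p) + c(0, state[-need]) * p
  }
  out
}

# Toy example: two losses to go, three games at a constant 0.315
res <- milestone_probs(rep(0.315, 3), need = 2)
print(round(res, 6))
```

In the toy example the first entry is zero, since two losses can't come in one game, and the second is 0.315² ≈ 0.099, the same number that shows up above as the chance of two straight losses to the Reds. Feeding in the actual per-game loss probabilities with need = 9 would give a table like the one in this post.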
The results are as follows. Home games are marked "v." and road games "@". The winning percentages are from mlb.com standings as of June 27.

Date	Opponent	Chance of 10,000th loss
Jun 27	v. Reds	0.000000
Jun 28	v. Reds	0.000000
Jun 29	v. Mets	0.000000
Jun 29	v. Mets	0.000000
Jun 30	v. Mets	0.000000
Jul 01	v. Mets	0.000000
Jul 02	@ Astros	0.000000
Jul 03	@ Astros	0.000000
Jul 04	@ Astros	0.000467
Jul 06	@ Rockies	0.002946
Jul 07	@ Rockies	0.009603
Jul 08	@ Rockies	0.021746
Jul 13	v. Cardinals	0.030071
Jul 14	v. Cardinals	0.041621
Jul 15	v. Cardinals	0.052571
Jul 16	@ Dodgers	0.091757
Jul 17	@ Dodgers	0.106722
Jul 18	@ Dodgers	0.112506
Jul 19	@ Padres	0.108077
Jul 20	@ Padres	0.097745
Jul 21	@ Padres	0.083264
Jul 22	@ Padres	0.067340
Jul 24	v. Nationals	0.031618
Jul 25	v. Nationals	0.026675
Jul 26	v. Nationals	0.022282
Jul 27	v. Pirates	0.018717
Jul 28	v. Pirates	0.015312
Jul 29	v. Pirates	0.012425
Jul 30	@ Cubs	0.014016
Jul 31	@ Cubs	0.010085
Aug 01	@ Cubs	0.007142
Aug 02	@ Cubs	0.004984
Aug 03	@ Brewers	0.004103
Aug 04	@ Brewers	0.002533
Aug 05	@ Brewers	0.001533
Aug 07	v. Marlins	0.000613
Aug 08	v. Marlins	0.000441
Aug 09	v. Marlins	0.000317
Aug 10	v. Braves	0.000251
Aug 11	v. Braves	0.000171
Aug 12	v. Braves	0.000115
Aug 14	@ Nationals	0.000074
Aug 15	@ Nationals	0.000051
Aug 16	@ Nationals	0.000034
Aug 17	@ Pirates	0.000024
Aug 18	@ Pirates	0.000016
Aug 19	@ Pirates	0.000011
Aug 21	v. Padres	0.000008
Aug 22	v. Padres	0.000005
Aug 23	v. Padres	0.000003
Aug 24	v. Dodgers	0.000002
Aug 25	v. Dodgers	0.000001
Aug 26	v. Dodgers	0.000001
Aug 27	v. Mets	0.000000

So it appears most likely that the Phillies will have their ten-thousandth loss on the West Coast, between July 16 and July 22; there's a 66.8% chance of it happening in those seven games. This is where you'd "naturally" expect things to peak anyway -- since the team loses about half the time, you'd expect it to take them 18 games in order to lose 9. That road trip is the 16th through 22nd games if we start counting from today. Plus, they'll be on the road, and the Dodgers and Padres are both good teams. It actually surprised me to see that the 10,000th loss is nearly twice as likely in the first game of that road trip (July 16 @ Dodgers) as in the last game of the preceding homestand (July 15 v. Cardinals), and similarly for the last game of the road trip (July 22) and the first game back (July 24). The soonest it can happen, as of this writing, is July 4, if they lose the next nine -- that would seem somehow appropriate, given what happened in Philadelphia on a long-ago July 4. The tail of the distribution is long -- there's always that slim chance that the Phillies could get ridiculously hot and stretch this out for thirty games or more. I wouldn't bet on it, though.

And I've only got a three percent chance of seeing this historic moment on the 13th of July. I hope I don't see it, because that would mean the Phillies would only win four out of their next thirteen.

edit (Friday, 2:39 pm): Frank at hot dogs and beer features a similar analysis.
