24 November 2007

A simple probability puzzle,

A simple probability puzzle, from Doubting Toshuo via Anarchaia.

Paraphrasing a bit: imagine we flip a fair coin until either THH or THT appear on three consecutive flips. Which is more likely to come up first? Alternatively, which has the smaller expected time to its first appearance? (These are different questions; the first is "which wins the race" and the second is "which has the faster average time". They don't necessarily have the same answer in general.)

The answer to both is THH. This is analyzed in Flajolet and Sedgewick's forthcoming book Analytic Combinatorics, section I.4.2. If you want to figure out either the probability that THH appears before THT or the distribution of the first time at which they appear, it's a basically routine problem in the combinatorics of words.

Most people say that they're equally likely to come up first, and that they have the same expected time to the first appearance. This is because they're confusing this problem with a closely related problem: which pattern appears more often in the long run? And both patterns have probability 1/8 of appearing in the positions n through n+2 for any n; the expected value of the number of appearances of THH, or of THT, which start by position n is in fact n/8. But patterns like THT, which can overlap themselves (the technical term is "autocorrelation"), tend to occur in clusters; for example, we can get three occurrences of THT in just seven coin flips: THTHTHT -- but to get three occurrences of THH takes at least nine coin flips. As Flajolet and Sedgewick write:
On the other hand, patterns of the same length have the same expected number of occurrences, which is puzzling. The catch is that, due to overlaps of [pattern 1] with itself, occurrences of [pattern 1] tend to occur in clusters, but, then, clusters tend to be separated by wider gaps than for [pattern 2]; eventually, there is no contradiction.

They use different patterns, but the same effect occurs for their patterns as for Toshuo's. There might be a halfway decent bar bet here; unfortunately most of the people I would make a bar bet with are my friends, and my friends know I'm a probabilist, so they'd be suspicious. Especially since I've written about this here.

4 comments:

  1. ; Inputs: A and B are bit-strings of length N,
    ; right-justified in EAX and EBX respectively;
    ; 1 represents heads and 0 represents tails.
    ; N ≤ 30 is in ECX, and A ≠ B.
    ;
    ; Output: ESI/EDI is the probability that A
    ; precedes B in a sequence of flips.

    Prob:
    PUSH EAX
    PUSH EBX
    MOV EBX, EAX
    CALL Inner
    MOV ESI, EDX
    POP EBX
    CALL Inner
    SUB ESI, EDX
    MOV EAX,EBX
    CALL Inner
    MOV EDI, EDX
    POP EBX
    CALL Inner
    SUB EDI, EDX
    ADD EDI, ESI
    RET

    Inner:
    PUSHAD
    NEG ECX
    SHL EBX, CL
    MOV EDI, 1
    MOV ESI,80000000H
    XOR EBP,EBP
    next: ROR EAX, 1
    MOV EDX, EAX
    XOR EDX, EBX
    AND EDX, ESI
    JNZ skip
    ADD EBP, EDI
    skip: SAR ESI, 1
    SHL EDI, 1
    INC ECX
    JNZ next
    MOV 20[ESP], EBP
    POPAD
    RET

    ReplyDelete
  2. I think that in the particular case you mention there is a tie for "which wins the race" (though you are completely correct regarding expected number of flips to first match).

    The proof is basically that to complete THH or THT on flip N you must be in state TH on flip N-1. At this point it is equally likely that you flip heads or tails. (At no other point do you have any chance of hitting either pattern, and once you are at that point past history is irrelevant.)

    ReplyDelete
  3. You might be interested in one application of this sort of thing, which is to the area of seeding of local alignments (the heads and tails are whether corresponding DNA positions are the same or different).

    (this might amuse, or it might not...)

    ReplyDelete
  4. Hello,nice post thanks for sharing?. I just joined and I am going to catch up by reading for a while. I hope I can join in soon.

    ReplyDelete