God Plays Dice: A hack I'm disturbingly proud of, and its connection to some real math

31 December 2009

A hack I'm disturbingly proud of, and its connection to some real math

I'm applying for jobs. Many jobs, because that's how academic job searches work these days. So I have a spreadsheet (in OpenOffice) to keep track of them.

Among the things that I track for each job, there is a column with 0, 1, or 2 in it. 0 means that I haven't submitted anything; 1 means I've submitted something, but not everything that was asked for; 2 means the application is complete. Averaging these numbers and dividing by 2 tells me what proportion of the search is complete.

But I also wanted to know how many 0s, 1s, and 2s there were. And as far as I know the built-in functions in OpenOffice won't do that.

What they will do, however, is this. I have a column consisting of 0s, 1s, 2s, and empty cells. By doing

COUNT(J8:J1000)
SUM(J8:J1000)
SUMPRODUCT(J8:J1000;J8:J1000)

I get the number of cells in that column which are nonempty; their sum; and the sum of their squares. (The SUMPRODUCT function takes 2 arrays of the same shape and returns the sum of the products of corresponding cells.) "8" is the row that contains the first job on the list, and "1000" is just a number that is comfortably more than the number of jobs I am applying for. Call these a, b, and c respectively. Let n₀, n₁, and n₂ be the number of entries which are 0, 1, and 2 respectively. Then I have

a = n₀ + n₁ + n₂
b = n₁ + 2n₂
c = n₁ + 4n₂

which is a three-by-three linear system, and can be solved for n₀, n₁, n₂, giving

n₀ = a - 3b/2 + c/2, n₁ = 2b-c, n₂ = (c-b)/2

and so I can recover the number of applications with status 0, 1, or 2 from this. From the sums of the 0th, 1st, and 2nd powers I can recover the distribution of the values themselves. (The actual code is slightly different, but of course equivalent, because I solved the system "by inspection" and never actually explicitly wrote it out until just now.)

Believe it or not, I actually use this trick in a preprint, "The number of cycles of specified normalized length in permutations", to do some actual mathematics! There I find the expectation of X⁰, X¹, X², ..., X^k where X is a certain random variable known to take on the values 0, 1, ..., k, namely the number of cycles of length in the interval [γ n, δ n] in a permutation of [n] chosen uniformly at random where γ and δ are constants. k is the greatest integer less than or equal to 1/γ ; for example, if we're looking at cycles of length at least 0.15n in permutations of n, there can't be more than six of them. This gives a linear system like the one above which gives the probability that X takes on each value 0, 1, ..., k.

6 comments:

christopherdrup said...: Have you tried the Frequency command in OpenOffice? I think it will perform the counting operation you are looking for.; December 31, 2009 at 1:52 PM
Unknown said...: I believe that the COUNTIF function is implemented which should let you count how many items have 0, > 0, 1, > 1, or 2.; December 31, 2009 at 1:54 PM
Mike said...: Good luck on your job search, btw. The process does seem to be good at producing unlimited amounts of anxiety.; December 31, 2009 at 2:34 PM
Anonymous said...: In EXCEL, @COUNTIF(range,criteria)

Counta for non-empty.

Or, if you have a column to waste, set up a column that is 100^n (where n is the corresponding 0, 1, or 2. The sum of the new column will be a nifty 6 digit number, encoding all the info you want.; December 31, 2009 at 10:08 PM
Omar said...: Nice hack, of course, but it sounded silly to me that OpenOffice didn't have a function to count values satisfying some condition (like being equal to some specific number, for example).

So, as I would have done, had I needed that function, I used a popular search engine and found the countif function your other commenters mentioned.

I'm curious: did you do it your way instead of googling the name of the appropriate function because it was more fun? I often do things like that...; January 1, 2010 at 11:34 AM
Unknown said...: I usually use the data pilot for this kind of analysis. By default it presents sums, but it's easy to change to counts instead.; January 1, 2010 at 11:59 AM

God Plays Dice

31 December 2009

A hack I'm disturbingly proud of, and its connection to some real math

6 comments:

Blog Archive

Contributors

Other blogs