Yet another Congressional redistricting proposal, from Brian Olson.
This proposal is based on the theory that the "best" districting is the one which minimizes the sum of distances from people to the centers of their districts.
It sounds like a good idea, but I'm not sure how one would actually find such a districting. Olson claims to. But his program runs by taking the current districting and perturbing it (he has video showing this), which means that his maps end up looking like the current maps. Gerrymandering becomes a bit more difficult, because you can't have districts that are oddly shaped. But, for example, the city of Austin, Texas (whose population is very nearly that of one congressional district, and which is politically quite different from the surrounding areas) is split among three districts in both the old and new maps.
Of course, Olson's method never claimed it wouldn't do that, and in fact Olson addresses the question of his method breaking up communities in his FAQ. But I also wonder whether he's only finding a local minimum for the sum of the distances to the centers, while the global minimum might lie with a map that looks radically different. The space of possible districtings of a state is very high-dimensional, so this is the sort of thing one has to worry about. Olson in fact proposes an Impartial Redistricting Amendment which would enshrine this criterion -- but how do we know that his program, or any other program, actually implements it?
What might be interesting is to assign each census block in a state to a district at random and then run the algorithm, as sketched below. The problem with that is that there's no way the district lines would end up falling in the same place on every run; different random starts will settle into different local minima.
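Here's a minimal sketch of that experiment (Python, the made-up data, and all names are mine, not Olson's actual program): a random assignment followed by crude hill-climbing via single-block perturbations. Note that real redistricting also requires nearly equal district populations, which this ignores.

```python
import math
import random

random.seed(0)  # change the seed and you land in a different local minimum

# Made-up census blocks: (x, y, population) scattered over a unit square.
blocks = [(random.random(), random.random(), random.randint(1, 100))
          for _ in range(200)]
k = 3  # number of districts

def centers(assign):
    """Population-weighted center of each district."""
    acc = [[0.0, 0.0, 0] for _ in range(k)]
    for (x, y, p), d in zip(blocks, assign):
        acc[d][0] += p * x
        acc[d][1] += p * y
        acc[d][2] += p
    return [(sx / sp, sy / sp) if sp else (0.5, 0.5) for sx, sy, sp in acc]

def cost(assign):
    """Sum over people of the distance to their district's center."""
    c = centers(assign)
    return sum(p * math.hypot(x - c[d][0], y - c[d][1])
               for (x, y, p), d in zip(blocks, assign))

assign = [random.randrange(k) for _ in blocks]  # the random start
for _ in range(5000):                           # perturb, keep if better
    i, d = random.randrange(len(blocks)), random.randrange(k)
    before = cost(assign)
    old, assign[i] = assign[i], d
    if cost(assign) >= before:                  # undo unhelpful moves
        assign[i] = old
print(round(cost(assign), 2))
```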
(Found at O'Reilly Radar, via Statistical Modeling, etc., the full name of which I'm tired of typing.)
14 September 2007
13 September 2007
the Bananach-Tarski paradox
From Courtney Gibbons' Brown Sharpie: the Bananach-Tarski theorem. This tells us that a banana can, through some sort of magic, be made into two bananas.
This is of course a restatement of the Banach-Tarski theorem, which tells us that a solid ball can be partitioned into finitely many non-overlapping subsets, which can then be rearranged to form two balls each of the same size as the original ball.
Of course, you couldn't do this with actual bananas (even though a topologist can't tell a banana from an orange, or a doughnut from a coffee cup, or eir pants from a hole in the ground). Damn atoms.
What other fruit-related math jokes are there? The only one I can think of is "What's purple and commutes? An abelian grape," which barely qualifies as a joke.
I found the Bananach-Tarski paradox via the Facebook group "I support the right to choose one element from each set in a collection", which asks:
Do you believe that an infinite product of nonempty sets should be nonempty? Do you feel that non-measurable subsets of the reals should exist? Or games of perfect information with no winning strategy for either player? Do you believe that every set should have a well-ordering, or that any poset in which every chain has an upper bound is entitled to a maximal element? Do nonzero rings have the right to a maximal ideal? Is every vector space entitled to a basis? Should fields have algebraic closures? Should products of compact topological spaces be compact, and countable unions of countable sets be countable? Do you want to be able to cut a sphere up into a finite number of pieces and reassemble them, with only rigid motions, into a sphere twice as large?
Finally, because this blog is at least nominally about probability: imagine a (randomized) algorithm that outputs a string of letters as follows: start with any three letters a_1, a_2, a_3. Then for each n > 3, consider some corpus of English text, and consider all the occurrences of the string a_{n-3} a_{n-2} a_{n-1} in that corpus. Tabulate the distribution of the letter that follows that string in the corpus. Choose a_n from that distribution.
I don't have the programming skills or the corpus to work from, but I bet there's a significant chance that if you start with BAN you get BANANANANANA...
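Here's a minimal sketch of that chain anyway (the code and the toy corpus are mine; a real test would want a large body of English text):

```python
import random
from collections import defaultdict

rng = random.Random(0)

def markov_letters(corpus, seed, length):
    """Order-3 letter-level Markov chain: after any three letters, draw
    the next letter from its empirical distribution in the corpus."""
    follow = defaultdict(list)
    for i in range(len(corpus) - 3):
        follow[corpus[i:i + 3]].append(corpus[i + 3])
    out = seed
    while len(out) < length:
        choices = follow.get(out[-3:])
        if not choices:          # this three-letter context never appears
            break
        out += rng.choice(choices)
    return out

# With real English text, BAN tends to continue with A, then ANA is often
# followed by N, and around it goes.
corpus = "THE MAN ATE A BANANA AND THE BANK TELLER ATE ANOTHER BANANA"
print(markov_letters(corpus, "BAN", 40))
```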
Rosh Hashanah, and somehow the circle of fifths.
Today is Rosh Hashanah, the Jewish New Year.
The Hebrew calendar is a lunisolar calendar, which means that the months remain roughly in step with the phase of the moon (months start approximately at the new moon) but the beginning of the year is always at about the same time with respect to the seasons. This is arranged by having "leap months"; some Hebrew years have 12 months (of, on average, 29.5 or so days each) and some have 13.
It turns out that the leap years in the Hebrew calendar follow the Metonic cycle, which is a name for the fact that 235 lunar months is very nearly 19 solar years. In fact, 235 lunar months (about 6939.69 days) are about 0.087 days longer than 19 solar years (about 6939.60 days), which means the calendar drifts with respect to the seasons by about 0.087 days per 19 years, or one day per roughly 219 years. The Rosh Hashanah article backs this up; currently Rosh Hashanah can occur no earlier than September 5, but after 2089 it will occur no earlier than September 6. (It's legitimate to use the Gregorian calendar here, as the corresponding error in that calendar is something like one day in three thousand years.) The Hebrew calendar was codified in its present form several centuries before the Gregorian calendar, which probably explains why the Gregorian is more accurate; I have no doubt that the creators of the Hebrew calendar would have found a way to make it more accurate if they'd had the data to do so.
I learned from the Wikipedia article on the Hebrew calendar that
Another mnemonic [for remembering the pattern of common and leap years] is that the intervals of the major scale follow the same pattern as do Hebrew leap years: a whole step in the scale corresponds to two common years between consecutive leap years, and a half step to one common year between two leap years.

For a moment this seemed surprising, but then I stopped to think about it. One has to have seven leap years (i. e. 13-month years) out of every nineteen. This is slightly more than one out of every three, so one could imagine taking a 21-year cycle of the form
LCCLCCLCCLCCLCCLCCLCC
where L represents a leap year and C a common year, and then deleting two of the C's. It is reasonable to delete two common years that are as far apart as possible, so that the calendar doesn't get too far ahead or behind. Similarly, in the case of the major scale we have two half steps and five whole steps to be arranged; one "wants" the half steps to be as far apart as possible. (This is a little more dubious, but seems reasonable.)
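As a sanity check, here's a small sketch (the modular rule below is, as I understand it, the standard compact description of which years in the 19-year cycle are leap years; the rest is mine):

```python
# Leap years in the 19-year Hebrew (Metonic) cycle.
leap = [y for y in range(1, 20) if (7 * y + 1) % 19 < 7]
print(leap)      # [3, 6, 8, 11, 14, 17, 19]
print("".join("L" if y in leap else "C" for y in range(1, 20)))
# CCLCCLCLCCLCCLCCLCL

# Gaps between consecutive leap years, wrapping into the next cycle:
gaps = [b - a for a, b in zip(leap, leap[1:] + [leap[0] + 19])]
print(gaps)      # [3, 2, 3, 3, 3, 2, 3] -- reading 3 as a "whole step" and
                 # 2 as a "half step", a rotation of the major scale's 2212221
```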
In music we have the "circle of fifths": each note is a fifth above the one before, which is four scale steps but seven half-steps up; this only breaks down because seven fifths come to twenty-eight scale steps (exactly four octaves) but forty-nine half-steps (four octaves plus a half-step). A similar construction could exist in the Hebrew calendar, with seven replaced by eleven: "usually" a period of eleven years has four leap years and seven common years, but nineteen such periods would give 76 leap years and 133 common years, while eleven nineteen-year cycles ought to have 77 leap years and 132 common years.
In fact, both 4/11 and 7/19 are convergents of a certain continued fraction: the one for the actual number of lunar months per solar year, minus twelve. There is apparently a "tabular Islamic calendar" that uses a similar-looking 11/30 ratio, although for a different reason; Islam forbids leap months, but it turns out that to keep the calendar in sync with the moon one has to add a day every so often. The average lunar month is slightly longer than 29.5 days, and this tabular Islamic calendar begins by giving alternate months 30 and 29 days and then adds a day to the last month of eleven years out of every thirty. I was actually about to state that 11/30 was the next convergent of the continued fraction in question, but it's not; it's actually 123/334, so one could have a cycle of 123 leap years out of every 334. This would be built up from seventeen of the 19-year cycles and one 11-year cycle. Of course, the problem with such a complicated leap-year rule is that nobody could remember which years are leap years!
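Here's a quick sketch of that continued fraction computation (the helper function is mine):

```python
from fractions import Fraction

def convergents(x, n):
    """First n continued-fraction convergents of x, via the standard
    recurrence p_k = a_k * p_{k-1} + p_{k-2} (and likewise for q)."""
    p0, q0, p1, q1 = 0, 1, 1, 0
    out = []
    for _ in range(n):
        a = int(x)
        p0, q0, p1, q1 = p1, q1, a * p1 + p0, a * q1 + q0
        out.append(Fraction(p1, q1))
        if x == a:
            break
        x = 1 / (x - a)
    return out

# Mean lunar months per mean solar year, minus twelve:
x = 365.2422 / 29.530589 - 12
print(convergents(x, 7))
# [Fraction(0, 1), Fraction(1, 2), Fraction(1, 3), Fraction(3, 8),
#  Fraction(4, 11), Fraction(7, 19), Fraction(123, 334)]
```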
(Incidentally, there are also rules about which day of the week Rosh Hashanah can fall on; it has to be Monday, Tuesday, Thursday, or Saturday. The reason for this is as follows:
- Yom Kippur, which is the tenth day of the year, should not fall on a Friday or Sunday, i. e. a day adjacent to a Saturday, for this would make two consecutive days when no work can be done. This means that the year should not begin on a Wednesday or Friday.
- The twenty-first day of the year, which is the last day of Sukkot, called Hoshanah Rabbah, for some reason should not fall on a Saturday. (I once read why, but I forgot.) So the year should not begin on a Sunday.
Therefore Rosh Hashanah doesn't always actually start on the day of the new moon; sometimes it's moved around by a day or two to avoid it falling on such a day of the week.)
12 September 2007
geometric interpretation of vector products
Yesterday, a student asked me if there was a geometric interpretation of the dot product of vectors. I had been talking about the geometric interpretation of the cross product, which is usually given as follows: given two vectors a and b,
- the magnitude of a × b is the area of the parallelogram determined by a and b, and
- the direction of a × b is given by the right-hand rule.
That's all well and good, although I was at one point asked what good the cross product was. This is a tricky question to answer, because it is not immediately obvious to students of calculus why one would want to find a vector that is orthogonal to two given vectors; furthermore, the definition of the cross product is a lot more complicated and mystifying than the definition of the dot product, and I suspect they want something more obviously useful in return for that.
Anyway, I couldn't think of a geometric interpretation of the dot product, in the sense that this student seemed to want. The Wikipedia article says the following: Since |a|cos(θ) is the scalar projection of a onto b, the dot product can be understood geometrically as the product of this projection with the length of b. That's true, and I did say that, but what good is it? The two vectors whose lengths we're multiplying to get a ⋅ b point in the same direction, so they don't naturally bound an area. Is there a geometric interpretation of the dot product that doesn't feel so contrived? (This is sort of a vague question, as some people might not find what I just said contrived.)
We then moved on to the scalar triple product a ⋅ (b × c), which created other frustration; it's weird to write this thing on the board and not mention that it's invariant under even permutations of its arguments, because it's really the determinant of the 3-by-3 matrix whose rows are the three vectors. I mumbled some words about how that sort of determinant comes up in changes of variables in multiple integrals, though; I was thinking of the Jacobian.
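For the record, here's the identity in question written out; in this form the invariance is clear, since an even permutation of the rows amounts to two row swaps, and each swap flips the determinant's sign:

$$\mathbf{a} \cdot (\mathbf{b} \times \mathbf{c}) = \begin{vmatrix} a_1 & a_2 & a_3 \\ b_1 & b_2 & b_3 \\ c_1 & c_2 & c_3 \end{vmatrix}$$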
Of course, there's the classical claim that "you can only define a cross product in one, three, or seven dimensions", which I didn't mention, because nobody asked "can you define a vector product in two dimensions?" -- I would have mentioned it if they'd asked.
But the cross product has that nice geometric interpretation. How do you continue with that? My officemate reminded me that the determinant of the "matrix"
$$\begin{vmatrix} \mathbf{i} & \mathbf{j} & \mathbf{k} & \mathbf{l} \\ a_1 & a_2 & a_3 & a_4 \\ b_1 & b_2 & b_3 & b_4 \\ c_1 & c_2 & c_3 & c_4 \end{vmatrix}$$
(where i, j, k, l are the unit vectors in four-space) is a vector whose magnitude is the 3-volume of the parallelepiped spanned by the three vectors (a_1, a_2, a_3, a_4), (b_1, b_2, b_3, b_4), (c_1, c_2, c_3, c_4). So it seems that we can define some sort of "cross product" in this sense in any number of dimensions; in R^n it'll take n-1 arguments. This is actually the wedge product in disguise.
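Here's a quick numerical sketch of that general construction (the function name and the numpy-based cofactor expansion are mine):

```python
import numpy as np

def generalized_cross(vectors):
    """Formal determinant with unit vectors in the top row: takes
    n-1 vectors in R^n and returns the vector of signed cofactors."""
    M = np.asarray(vectors, dtype=float)            # shape (n-1, n)
    n = M.shape[1]
    return np.array([(-1) ** i * np.linalg.det(np.delete(M, i, axis=1))
                     for i in range(n)])

print(generalized_cross([[1, 0, 0], [0, 1, 0]]))
# [0. 0. 1.] -- the usual cross product, i x j = k
print(generalized_cross([[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0]]))
# [ 0.  0.  0. -1.] -- orthogonal to all three inputs; length = the 3-volume
```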
But then what's the "cross product" (in this sense) of a single vector in 2-space? It's not the vector itself; if you write the "determinant" it's
$$\begin{vmatrix} \mathbf{i} & \mathbf{j} \\ a_1 & a_2 \end{vmatrix}$$
and so applying this operation to the vector (a_1, a_2) returns (a_2, -a_1). The operation takes in a vector and spits out its rotation by 90 degrees. This makes sense in retrospect; in the n-dimensional case the operation spits out a vector orthogonal to its n-1 inputs.
It just seems strange to be calling rotation by 90 degrees a "product"...
11 September 2007
Fencepost errors, intervals, redeeming coupons, and drinking
A friend of mine asked today: if a coupon says it expires on a certain date, is that the last date that it's usable, or the first date that it's not usable? (He was referring in particular to the customized coupons that print out on receipts at CVS if you participate in their "loyalty program".)
My answer: the last day it's usable. This is basically by analogy with more "traditional" coupons, which are likely to expire on, say, "September 30". It seems reasonable that one could use the coupon on every day in September, and on no day in October; it would not seem reasonable to be able to use the coupon on every day in September but one. Coupons printed out by this program expire on seemingly random dates, but it seems that the same principle should apply there.
Similarly, if I said "the homework is due on Thursday" to my students, and then one of them handed it in on Thursday at noon, I would not say that what I really meant is "Thursday is the first day on which I will not accept the homework", and that therefore the homework is late. (In practice I usually say what time the homework is due in order to avoid this question.)
I'm not sure what the law actually has to say about this, though. I do know that laws in various places endorse different interpretations with regard to the "sell by" date on milk: in some jurisdictions the "sell by" date is the last date on which the milk can be sold, and in others it is the first day on which it cannot.
I'm inclined to say that one is always allowed to do the thing on the date which is stated, so it should be permitted to redeem a coupon, sell a carton of milk, etc. on the date listed on it. Similarly, if there's something that I'm allowed to do if I am "at least N years old" I should be able to do it on my Nth birthday, and the law agrees with me. (There's an exception of sorts for drinking, actually; in some states one cannot drink at midnight on one's 21st birthday, but must wait until the bars have closed and re-opened again. A friend of mine turned 21 in Massachusetts and was disappointed to learn this. Apparently the reason for these laws is that some people were trying to take 21 shots between midnight and last call, often 1am or 2am, and they were dying. Dying was seen by the lawmakers to be a Bad Thing.) There's a symmetry here, in that both of these are "permissive" interpretations.
But the scenario in which one is always forbidden to do a thing on the date stated on the appropriate piece of paper (so no redeeming a coupon on its expiration date, no drinking on your 21st birthday) also is symmetrical, in a way -- one might call it a "strict" interpretation. One could also have "early" and "late" interpretations -- the "early" interpretation would be that you can't redeem the coupon but you can drink, and the "late" interpretation the reverse.
These four cases are roughly analogous to the four sets of inequalities a ≤ x ≤ b, a < x < b, a ≤ x < b, and a < x ≤ b, although making the analogy precise takes more time than I want to spend on this.
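To make the analogy slightly more concrete anyway, here's a toy encoding (all of this is mine; the rule names are just the labels used above):

```python
from datetime import date

# Each interpretation decides whether an action is allowed on day d,
# given a start date a (e.g. a 21st birthday) and an end date b
# (e.g. a coupon's expiration date).
interpretations = {
    "permissive": lambda d, a, b: a <= d <= b,  # allowed on both endpoint days
    "strict":     lambda d, a, b: a <  d <  b,  # allowed on neither
    "early":      lambda d, a, b: a <= d <  b,  # drink on the birthday, no coupon
    "late":       lambda d, a, b: a <  d <= b,  # the reverse
}

a, b = date(2007, 9, 1), date(2007, 9, 30)
for name, allowed in interpretations.items():
    print(name, allowed(date(2007, 9, 30), a, b))  # only permissive/late say True
```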
10 September 2007
Asimov on large numbers
Asimov on large numbers, including Skewes' number. (By the way, Skewes is pronounced in two syllables, which I didn't know, and Littlewood's middle name was apparently Edensor.)
Asimov also writes x^y as x\y, which is a nice trick when the exponents get complicated, as they do here. Why is there no non-superscript notation for powers, other than exp(x) for e^x when the base happens to be e?
You can take any real number greater than 1 and write it as 10\10\...\10\k for some number of 10's and some constant 1 < k < 10; Skewes' number is 10\10\10\34, or 10\10\10\10\1.53, so Asimov refers to it as a "quadruple-ten number". If I remember correctly, Douglas Hofstadter uses this same idea of classifying numbers by the number of exponentiations needed in one of his Metamagical Themas columns, though Asimov had it first.
Essentially all "real" numbers are either "single-ten" numbers (between 10 and 10^10) or "double-ten" numbers (between 10^10 = 10\10\1 and 10^10^10 = 10\10\10); note that the number of particles in the observable universe is much less than 10\10\2. (It's something like 10\10\1.90, which is actually much smaller than 10\10\2 even though it doesn't sound like it.)
There's a sense in which, say, 10\x and 10\y are close to each other if x is close to y. This is reflected in some asymptotic notation I've seen, where one writes f(x) ~ g(x) if log f(x)/log g(x) approaches 1 as x approaches infinity; so, for example, exp(x^2) and exp(x^2 + x) are "close together" in this sense. I don't know of cases where one can only show that log log f(x) and log log g(x) are asymptotically equal; this seems like too coarse a definition of "equality" to be of much use.
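Asimov's classification is easy to compute by taking logarithms until the result drops below 10; here's a sketch (the function is mine):

```python
import math

def tower_form(x):
    """Write x > 1 as 10\\10\\...\\10\\k with 1 < k <= 10: returns
    (n, k), where n is the number of tens -- Asimov's "n-tuple-ten"."""
    n = 0
    while x > 10:
        x = math.log10(x)
        n += 1
    return n, x

print(tower_form(6.022e23))  # (2, 1.376...): Avogadro's number is double-ten
print(tower_form(1e80))      # (2, 1.903...): the particle count is 10\10\1.90
```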
How many well-formed formulas are there?
The following is exercise 1.1.2 of "A Mathematical Introduction to Logic", by Herbert Enderton:
Show that there are no [well-formed formulas] of length 2, 3, or 6, but that any other positive length is possible.
Enderton defines a wff as follows: every sentence symbol A_n is a wff. If α and β are wffs, then so are (¬ α), (α ∧ β), (α ∨ β), (α → β), and (α ↔ β); furthermore, that's all of the wffs. (The parentheses count as symbols. The sentence symbol A_n counts as a single symbol.)
The problem is pretty straightforward. A, (A ∧ B), and (A ∧ (B ∧ C)) are wffs of length 1, 5, and 9 respectively. If α is a wff of length n, then (¬ α) is a wff of length n+3; thus we can generate wffs with lengths in the sequences {1, 4, 7, ...}, {5, 8, 11, ...}, and {9, 12, 15, ...}, which together cover all positive integers except 2, 3, and 6.
Now, each of the ways of forming new wffs from old takes one or two wffs, adds them together, and adds three new symbols. So we can't get a wff of length 2 or 3 because it would have to come from a wff of length -1 or 0. Furthermore, there are no wffs of length 6, because they'd either have to be of the form (¬ α) where α has length 3, or (α * β) where * is ∧, ∨, →, or ↔ and α and β have total length 3, which would mean that one of α and β has length 2.
The natural follow-up question, to me at least, is "how many wffs are there?" This is silly to ask if we have infinitely many sentence symbols, since then there are infinitely many wffs of any length for which there is at least one wff. So let's restrict ourselves to one sentence symbol, A. Let f_n be the number of wffs of length n, and let f(z) = Σ_n f_n z^n be the generating function of this sequence. Then we have
f(z) = z + z^3 f(z) + 4z^3 f(z)^2
where the first term on the right-hand side corresponds to the formula consisting of just the sentence symbol, the second term to all the formulas of type (¬ α), and the third term to all the formulas of type (α * β). (The coefficient 4 comes up because there are four things that "*" could be.)
Solving this quadratic for f, and taking the root for which f(0) = 0, we get
$$f(z) = \frac{1 - z^3 - \sqrt{1 - 2z^3 - 16z^4 + z^6}}{8z^3}$$
and so we get
f(z) = z + z^4 + 4z^5 + z^7 + 12z^8 + 32z^9 + z^10 + 24z^11 + 160z^12 + 321z^13 + 40z^14 + 480z^15 + ...
which is a bit surprising; the coefficients don't grow "smoothly". This is because, for example, wffs of length 9 are of the form (A * (A * A)) or ((A * A) * A) where * is as above (and yes, these two forms are different), so there are 32 of these; the only wff of length 10 is (¬ (¬ (¬ A))), since any other such formula would come from two formulas with total length 7, and one of those would have to be of length 2, 3, or 6 which is impossible. But I'm more interested in the asymptotics.
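These coefficients are easy to check by running the recurrence that the functional equation encodes (this quick sketch is mine, not Enderton's):

```python
# f[n] = number of wffs of length n over one sentence symbol, read off
# coefficient by coefficient from f(z) = z + z^3 f(z) + 4 z^3 f(z)^2.
N = 15
f = [0] * (N + 1)
f[1] = 1                                    # the bare sentence symbol A
for n in range(4, N + 1):
    negations = f[n - 3]                    # (¬ α) with α of length n - 3
    binaries = 4 * sum(f[i] * f[n - 3 - i]  # (α * β), four choices of *
                       for i in range(1, n - 3))
    f[n] = negations + binaries
print(f[1:])  # [1, 0, 0, 1, 4, 0, 1, 12, 32, 1, 24, 160, 321, 40, 480]
```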
It's a general principle that the coefficients of a generating function grow roughly like f_n ≈ (1/ρ)^n, where ρ is the modulus of the singularity of f(z) = Σ_n f_n z^n nearest the origin; this is true up to a "subexponential factor" which has to do with the nature of that singularity (polar, algebraic, etc.). The singularities of f(z) in this case arise from taking the square root of numbers near 0; thus they are located at the roots of the polynomial 1 - 2z^3 - 16z^4 + z^6. The smallest of these is about ρ = 0.4728339090, so the number of wffs of length n grows like (1/ρ)^n, or about 2.11^n, times some subexponential factor. If you compute more coefficients, their growth rate does settle down to about 2.11.
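Numerically, that singularity is easy to locate (a sketch using numpy):

```python
import numpy as np

# Roots of 1 - 2z^3 - 16z^4 + z^6, coefficients from z^6 down to z^0.
roots = np.roots([1, 0, -16, -2, 0, 0, 1])
rho = min(abs(r) for r in roots)  # the positive real root is nearest the origin
print(rho, 1 / rho)               # about 0.47283 and 2.1149
```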
The subexponential factor is probably something like n^{-3/2} -- that is, the number of wffs of length n is probably about 2.11^n n^{-3/2} -- because that particular subexponential factor often arises when one analyzes the asymptotics of tree-like structures. Don't ask me to explain that.
The interpretation this brings to mind is that if you write a wff by just writing symbols at random, at any given step there are in some sense "on average" 2.11 symbols that you could write down which would keep the wff grammatical. There are eight possible symbols, so this suggests that the best special-purpose compression algorithm for this formal language would shorten formulas to log(2.11)/log(8), or about 36%, of their original length.
Labels: combinatorics, compression, generating functions, logic