18 November 2008

A couple of questions on the mechanics of writing mathematical papers

I'm writing a paper, and of course this requires the use of LaTeX. As many of you know, the way one creates cross-references inside a LaTeX document is a two-step process. First, you insert the command \label{big-important-theorem} where your Big Important Theorem is in the paper. Then when you want to refer to your theorem, you write something like "And as a consequence of Theorem \ref{big-important-theorem} we can prove Corollary \ref{million-dollar-problem}, and so I claim to be the winner of a million dollar prize."

No, I have not written that sentence. Nor do I plan to. In fact, I would add "claims to be the winner of one of the Clay prizes in the body of the paper" to John Baez's crackpot index. Baez couldn't have put that in his list, because the index was written in 1998, before the Clay prizes were announced. He does mention the Nobel, though, which carries a similar monetary value.

But this leads to a question -- how do you label your theorems, equations, etc. in your own LaTeX code? I try to come up with names that reflect what the labeled object is about, but this isn't so easy, because sometimes there are lots of objects that are "about" the same sort of thing. I'm tempted to just find some source of extra-mathematical names. So I could name my theorems \label{market}, \label{chestnut}, \label{walnut}, \label{locust}, ... (Philadelphia streets), say. Of course, this sequence has the disadvantage that the streets come in order; a set of names with no natural order is probably better, because then I won't feel like I'm moving things out of order when I move them around.
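To make the mechanics concrete, here is a minimal sketch of the two-step process with descriptive label names. The theorem statements and the label names (thm:binomial-row-sum, cor:subset-count) are made up for illustration; they are not from my paper.

```latex
\documentclass{article}
\usepackage{amsmath} % for \binom
\usepackage{amsthm}  % for \newtheorem with standard styling

% Theorem-like environments; the corollary shares the theorem counter.
\newtheorem{theorem}{Theorem}
\newtheorem{corollary}[theorem]{Corollary}

\begin{document}

% Step one: attach a label where the object is stated.
\begin{theorem}\label{thm:binomial-row-sum}
For every integer $n \ge 0$ we have $\sum_{k=0}^{n} \binom{n}{k} = 2^{n}$.
\end{theorem}

\begin{corollary}\label{cor:subset-count}
A set with $n$ elements has exactly $2^{n}$ subsets.
\end{corollary}

% Step two: refer to the labels by name, never by number.
Corollary~\ref{cor:subset-count} follows from
Theorem~\ref{thm:binomial-row-sum}.

\end{document}
```

LaTeX has to be run twice before the numbers resolve; after that, the theorems can be moved around and the references follow.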

On a related note, often one sees bibliographical references of the form
[Be74] Edward A. Bender. Asymptotic methods in enumeration. SIAM Review,
Vol. 16, No. 4, October 1974.
I prefer this form to the form where [Be74] is replaced by a number, because if something is cited more than once in the same paper, I only need to look at the reference once; the second time I see [Be74] in the paper I know it's the paper by somebody whose name starts with Be, written in 1974. (And if it's a paper I've heard of already, sometimes I don't have to look at the citation at all.)

But what's the convention for picking the letters to be used? First two letters of the name seems common for a single-authored paper, but by no means universal; I've definitely seen one-letter citations, the problem being that if you have a reasonably extensive bibliography you'll want to cite two different authors with the same initial. I'm currently using just the first letter of each name for multiple-author works -- [FS08] for Flajolet and Sedgewick's Analytic Combinatorics (coming out sometime in December, preorderable now! and readable online!), for example. I've tried to reverse-engineer whatever convention there is from other people's reference lists and I can't. Is there actually no convention?

Of course, there is no need for a convention, as long as each work cited has a unique identifier.

17 comments:

Ben said...

There is no unique convention. I prefer 2 to 3 letters, with either a date or numbering to distinguish multiple papers by the same authors.

On the other hand, the fact that you are thinking about this means you are making your bibliography by hand, a grievous error. BibTeX is the way to go!

And with labels... just make sure you're using a program smart enough to remember what your labels are (reftex in emacs is great for this).

Michael Lugo said...

Yeah, I keep promising myself that one of these days I'll learn BibTeX. Maybe that day is now.

Derek Jones said...

More than you probably ever want to know about the issues involved in identifier naming. While we are on the topic, Knuth should be taken out and shot for the 'mathematical' style of naming he advocates in his aptly misnamed book "Literate programming".

unapologetic said...

Trust me, Michael: BibTeX is AWESOME. Look up your paper on MathSciNet, find the BibTeX entry for that, c/p it to your running bibliography file, and refer to it as needed. No fuss, no muss!

Boris said...

The convention is that you use BibTeX with the style "amsalpha" (or "alpha" if you don't like the AMS for some reason). Out of curiosity's sake, I'll try to dig up the actual way it generates the names ...

Boris said...

Apparently, amsalpha uses the following logic (modulo some considerations of when you should take editors or organizations instead of authors; there are lots of side cases with these kinds of things):

If there is only one last name, take the first three letters (or as many as you can if there aren't three).

If there is more than one name, take the initials. If there are more than four names, truncate after three and put a superscripted + after them.

Multi-part names are always abbreviated by initials. For example, a single author Mittag-Leffler would be labeled as "ML"; if there were four authors Erdos, Euler, Gauss, and Mittag-Leffler, you'd get "EEGML". Note that you include both initials of Mittag-Leffler so labels can actually be long. Also, accented letters appear to be included as is and alphabetized at the end.

The last two digits of the year are then added, even if the publication is from the 1800s or earlier. (This is sometimes pretty confusing!) If you don't have a year, don't add anything.

If two papers have the same label, disambiguate them by adding "a", "b", "c", etc. in order after them. If you had more than 26 papers with identical labels at this point, crash and burn. (I'm serious; that's the behavior. Never write more than 26 sole-authored papers in a year and cite them all? I'm sure this has had to come up in someone's prolific CV before.)

If you have two papers by the same author(s) with no years, then they do become "Auta" and "Autb". If you had two papers by "Au" with no year, and one paper by "Aua" with no year, then you get two labels that coincide as well as the incorrect ordering Aua < Aub < Aua. Edge cases suck. =)
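A quick way to check these rules is a throwaway document like the sketch below. The file name demo-refs.bib and the citation keys are made up; the two entries are the ones mentioned in the post, with the Flajolet and Sedgewick year taken from the post's [FS08] label. If the rules above are right, the labels should come out as [Ben74] and [FS08].

```latex
% Write a small .bib file next to this document (skipped if it already exists).
\begin{filecontents}{demo-refs.bib}
@article{bender-asymptotic,
  author  = {Edward A. Bender},
  title   = {Asymptotic methods in enumeration},
  journal = {SIAM Review},
  volume  = {16},
  number  = {4},
  year    = {1974}
}
@book{flajolet-sedgewick,
  author    = {Philippe Flajolet and Robert Sedgewick},
  title     = {Analytic Combinatorics},
  publisher = {Cambridge University Press},
  year      = {2008}
}
\end{filecontents}

\documentclass{article}
\begin{document}

One author: \cite{bender-asymptotic}. Two authors: \cite{flajolet-sedgewick}.

% amsalpha builds the bracketed labels from the author names and the year.
\bibliographystyle{amsalpha}
\bibliography{demo-refs}

\end{document}
```

Run latex, then bibtex, then latex twice more, and look at the labels in the reference list.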

Michael Lugo said...

Boris,

thanks! (Although if I actually figure out BibTeX, as people have said I should, I won't need to know that.)

I did have a couple labels that "looked weird" to me, and it turns out they violate those rules. (One of the form [A-B08] for somebody with a hyphenated name, and one of the form [ABCDE07] for a five-author paper.)

Also, I suspect that 27 sole-authored papers in a year isn't something that happens; people who are that prolific usually have collaborators.

Another edge case: I have [FS90] which refers to a paper of Flajolet and Soria, and [FS08] which refers to the book of Flajolet and Sedgewick. It seems like a very slight defect in the system that these two labels have the same alphabetic component but different sets of authors.

david said...

For bibtex, I generally replace amsalpha with the style file math.bst, which you can download at the first google hit for "math.bst".

Suresh said...

And on the matter of label design: you should really be using auctex: at the very least it does section labels automatically, and with some customization can be persuaded to do other environment customization.

auctex is a package for emacs and xemacs, so if you use some other editor, you're out of luck though.

My dream config: xemacs + auctex + reftex (another emacs package for automatically finding cites in a bibtex file). Makes writing papers a breeze.

Dan said...

What, no vim love here? :)

First off, you really, really need to use BibTeX. Grabbing the entry from MathSciNet is easy and results in accurate and standardized citations. I would also try to add a DOI (digital object identifier) field and use a BibTeX style that supports DOIs. It's ridiculous how slow the math community has been to adopt something so simple and useful.

As for naming, I usually use a prefix that identifies what sort of thing the label refers to: "sec" for sections, "eqn" for equations, "tbl" for tables, "fig" for figures, and so on. Then I add a colon and a relatively understandable description, so I might get

eqn:foo-gf-as-cont-frac

for the generating function of foo as a continued fraction. Then, if you're using a smart editor (vim and emacs are both "smart"), you can configure it to do autocompletion: you type "\ref{eqn:f" and then hit some key and you can see all the possible completions and select one. This is a very fast way to work, and doesn't require you to memorize all your tags.
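In a document, the scheme might look like the sketch below. The section title, the label sec:generating-functions, and the particular continued fraction are placeholders I made up for illustration; only the eqn: label is the one from above.

```latex
\documentclass{article}
\usepackage{amsmath} % for \cfrac, \dotsb and \eqref
\begin{document}

\section{Generating functions}\label{sec:generating-functions}

% "eqn:" says it is an equation; the rest says what the equation is about.
\begin{equation}\label{eqn:foo-gf-as-cont-frac}
  F(z) = \cfrac{1}{1 - \cfrac{z}{1 - \cfrac{z}{1 - \dotsb}}}
\end{equation}

In Section~\ref{sec:generating-functions},
equation~\eqref{eqn:foo-gf-as-cont-frac} writes the generating function
of foo as a continued fraction.

\end{document}
```

The prefix also narrows down autocompletion: typing "\ref{eqn:" only has to offer equations.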

Learning to use BibTeX and a smart editor (and configuring it well for editing LaTeX documents) may take some time, but it really pays off.

CarlBrannen said...

I guess I would prefer the letter and year, but the journals have standards you have to follow.

As far as names, my labels are of the form xx:yyyyyyyy where "xx" describes what is labelled, such as "fig" or "eq", and "yyyyyyyy" is a very very long description, with no numbers and no spaces or other strange characters. And I do it that way so that it copies with a single double click.

The other thing I hate is having to remember the rest of the garbage that goes around the "\ref" so I keep a generic copy at the beginning of each subsection border. So it looks something like this:

% Eq.~(\ref{eq:yyyyyy}) Fig.~(\ref{fig:yyyyyy})
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

Then I go to the place where the equation to reference is, grab the reference by double click ^c, bring it back to where I'm writing, substitute it into the yyyyyy with double click ^v, and then copy the whole "Eq.~(...)" into where it's supposed to go. The idea is to minimize the amount of knowledge about LaTeX that has to stay in my brain.

I hate having to write LaTeX so that it gives two columns.

By the way, I am a great fan of the memoir utility class for writing math books and have an example book written in that language. You can find the book, acrobat, and LaTeX source code links here, along with a description of the Lulu print on demand service which allows you to cheaply print nice hardbound scientific books, with pictures of same.

I would highly recommend the service for someone writing a thesis, for example, in order to have a nice copy for their parents, etc.

Boris said...

Frivolous digression on getting 26 identical labels: first of all, if you simply write a bunch of notes (like Dijkstra did and you did/still do) and decide to reference them all somewhere --- say a single bibliography file for your notes --- then it's quite possible to get more than 26 in a year.

But I decided to actually try with a few mathematicians. I checked Frank Harary first. He gets up to [Har67i] by himself and up to [HP66k] with Palmer 14 times, Plummer 1, and Prins 1. With three authors, he gets [HRW78c].

The actual natural first person to check, Paul Erdos, gets up to [Erd62p] by himself*! (And close in other years: [Erd64n], [Erd67l], [Erd65k], ...) With coauthors, we get [ES78g] which is less than Harary, but he gets it with five people: Sarkozy 3, Segal 1, Szabados 1, Szekeres 1, and Szemeredi 1. The best three authors he got was [ESS94c], but he did have 25 overall labels of the form [ESS00x], with 7 distinct subsets of 10 people.

* The Renyi Institute's "Erdos Project" actually lists 17 sole-authored papers from that year. They also say he wrote 49 (!!) published papers in 1978, of which 7 were sole-authored.

Drumi Bainov gets to [Bai67f] and [KB90j]. Let's check some more ...

Aha, finally! Lucien Godeaux has 29 sole-authored papers in 1950! My LaTeX+BibTeX crash on his bibliography. Note that there is actually a paper (called "Lucien Godeaux (1887--1975): sa vie---son œuvre") that lists his entire bibliography. It takes 52 pages and apparently they don't use amsalpha; I don't have access to it, but I presume they simply don't have labels.

John Palmieri said...

One way to quickly download BibTeX references from MathSciNet is to use a perl script that I wrote: Bibweb. Share and Enjoy.

Anonymous said...

I use Kile for latex editing in linux, which keeps track of refs and labels, and JabRef for editing my bibtex files. It is awesome, it can search and add what you need from net, and it is written in java so you can use it anywhere.

Andrew Stacey said...

The main thing about BibTeX is that it follows the LaTeX core ethos: when you are actually writing the paper you shouldn't be concerned with what it looks like.

Thus if you use BibTeX (or another reference generator, there are alternatives), you don't think about what it looks like when you write it and if, at the end, you change what you want then you just have to change the choice of style. This can be particularly important when you decide that your paper is actually Annals material after all and not Chaos, Fractals, and Solitons as you first thought - different journals have different requirements[1].

"Learning" BibTeX is almost a vacuous statement. As "unapologetic" said, assembling a BibTeX reference file is extremely easy: MathSciNet exports BibTeX so it's as simple as Cut-and-Paste (that phrase seems to be replacing "as simple as ABC"). Save the file as, say, "myreferences.bib". Then you just need:

\bibliographystyle{style}

in the preamble[2],

\cite{ref}

in the text, and

\bibliography{myreferences}

just before the end (note no ".bib").
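Put together, a complete file might look like the sketch below. It assumes the paper is saved as paper.tex; the key "ref" and the choice of the "plain" style are just placeholders, and the filecontents block is only there so that the example runs without an existing myreferences.bib.

```latex
% Creates myreferences.bib for the demo (skipped if the file already exists).
\begin{filecontents}{myreferences.bib}
@article{ref,
  author  = {Edward A. Bender},
  title   = {Asymptotic methods in enumeration},
  journal = {SIAM Review},
  volume  = {16},
  number  = {4},
  year    = {1974}
}
\end{filecontents}

\documentclass{article}

\bibliographystyle{plain}   % in the preamble; swap in any other style here

\begin{document}

Asymptotic methods in enumeration are surveyed in \cite{ref}.

\bibliography{myreferences} % just before the end, no ".bib"

\end{document}

% Build sequence, assuming the file is called paper.tex:
%   pdflatex paper
%   bibtex paper
%   pdflatex paper
%   pdflatex paper
```

Changing the look of the whole reference list is then a one-word change to the style name followed by a rebuild.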

I've found some other useful LaTeX stuff for when one is actually writing the paper (as opposed to publishing it). Probably all are in a standard distribution, but links are to CTAN in case not:

srcltx: adds links to the dvi document so that with a "smart" editor, one can click on a place in the dvi and jump to the relevant line in the document.

showlabels: puts the label name in the margin so you can easily find where labels have been defined in the dvi version, can also be used to put labels in the margin where references are used.

changes: useful for collaborations, particularly long-distance ones, as authors can leave comments for each other and mark additions/deletions/modifications.

There's plenty more on CTAN!

Hope this helps a little.

[1] Though often the journal itself is unaware of its own requirements. It is not unknown for an editor to ask "What's this style file you're using? We can't get it to compile." only to receive the answer "It's your style file."

[2] The choice of bibliography style is something that, logically, should be made in the preamble. Internally, however, it can't be made until the document is actually started (it needs to wait until the aux file is opened). So LaTeX cunningly says "I'll save this choice for when the document is actually started." Unfortunately, some class files redefine \bibliographystyle and remove that cunning plan (not looking at anyone in particular *cough* A *cough* M *cough* S), so if something goes wrong with BibTeX, try moving the \bibliographystyle command to after the \begin{document}.

Dan said...

@CarlBrannen:

I agree with your sentiment about "having to remember the rest of the garbage that goes around the "\ref"", but there's an easier way than your suggestion: use the hyperref package, and use \autoref in your document. The \autoref macro automatically finds what sort of thing you've referenced (figure, table, theorem, etc) and hyperlinks the entire phrase -- all of, say, "Theorem 5" instead of just the "5". This makes things easier to click on.

If you look up \autoref in the hyperref manual, you can even change the text it gives, so you can use "Fig." or "Figure" or other languages, etc.

One problem is that \autoref has difficulty distinguishing between theorems, lemmas, propositions, and so on -- the thm-autoref package, part of thmtools, fixes this.
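Here is a minimal sketch of the basic \autoref usage. The labels and the toy statements are made up for illustration, and the \renewcommand line is just one example of changing the printed name; with babel loaded, the renaming would likely need to go through babel's language hooks instead.

```latex
\documentclass{article}
\usepackage{amsthm}
\usepackage{hyperref} % load late; provides \autoref

\newtheorem{theorem}{Theorem}

% Optional: change the word \autoref prints in front of equation numbers.
\renewcommand{\equationautorefname}{Eq.}

\begin{document}

\begin{theorem}\label{thm:example}
Every even integer is divisible by $2$.
\end{theorem}

\begin{equation}\label{eqn:example}
  1 + 1 = 2
\end{equation}

% No hand-written "Theorem~" or "Eq.~" in front of the references.
See \autoref{thm:example} and \autoref{eqn:example}.

\end{document}
```

After two runs, each reference should come out as a name plus a number, something like "Theorem 1", with the whole phrase clickable.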

Writing a Research Paper said...

Many institutions limit access to their online information. Making this information available will be an asset to all.