michael orlitzky

Makeing LaTeX

posted 2016-11-18; updated 2020-05-01

Update 2020-05-01
Wojtek Kosior noticed a bug in my recursive $(MAKE) that ran pdflatex one time too many. Trying to keep this article 100% up-to-date would be a losing battle, so please refer to the copy of GNUmakefile in my gitweb repository for future improvements.

This article is about using GNU Make to build LaTeX documents. If you don't know what those things are, forget I even said anything.

the problem

To create a LaTeX document, you need to compile a source file containing markup, references, and whatever else. Some parts of the document depend on other parts, and full rebuilds can take a long time. Ideally the compilation process could be automated and redundant rebuilds eliminated. That's what a build system does.

…but, LaTeX documents are pathological. Normally, to compile something, you perform an action on it once and you're done. Or maybe two times. Some fixed number of times. LaTeX documents, on the other hand, need to be (re)compiled indefinitely, until the output file stops changing. There's no way to express that idea in existing build systems, because nothing else is so stupid as to require it.

What's more, the LaTeX toolchain currently sticks bullshit like the current date and time into the output file, so in fact, the output document never stops changing. Basically: it isn't easy to use an existing build system to automate the creation of a LaTeX document. Most people wind up doing the following:

user $ pdflatex example.tex

user $ pdflatex example.tex

user $ pdflatex example.tex

That's stupid, and takes three times longer than is usually necessary.

tools

Build system
GNU Make. Some things will be portable to other Make implementations, but later on, I'm going to use conditional expressions.
LaTeX compiler
pdflatex, part of the pdfTeX suite. It creates PDFs, and that's what you want. Neither XeTeX nor LuaTeX is supported by publishers.
Bibliography management
BibTeX. I know BibLaTeX and Biber are better, but if you want to publish, you need to use the old and busted BibTeX.
Other
The kpsewhich utility (from kpathsea) is used to locate bibliography databases. The cmp tool from GNU Diffutils lets us know when the output file stops changing. GNU sed removes timestamps from the output files. And GNU coreutils is basically always assumed.

a fixed point

The rule that we would like to encode is, “rebuild this document until it stops changing.” That isn't possible using the standard rules, but since we can run shell commands, we can fake it. Let's walk through a GNUmakefile from the top.

First, encode the LaTeX compiler command in a variable. This lets you add options onto it at a later point without having to find and replace every invocation of it.

LATEX = pdflatex

Next, let's define the “project name,” or “paper name” if you prefer. This makes it easy to reuse this build system for another document. Don't be a retard and put spaces in your file name. If you do, you're responsible for adding quotes to the rest of this article.

PN = example

Finally, create a variable containing a list of all “source” files—basically, the inputs for your document.

SRCS = $(PN).tex

Everything is nice and simple so far. We're set up to build example.pdf from example.tex, once we create the latter. Here's a sample example.tex file:

\documentclass{article}
\begin{document}
  Hello, world!
\end{document}

Now let's try to build it and see what happens…

user $ pdflatex example.tex

This is pdfTeX...

...

Output written on example.pdf (1 page, 12039 bytes).

Transcript written on example.log.

Great, everything went according to plan, and example.pdf is now sitting right next to example.tex. All we have to do is repeat that process until the PDF file stops changing.

not

Just kidding. If you open example.pdf with a text editor, you'll find some lines like

/CreationDate (D:20161116130421-05'00')
/ModDate (D:20161116130421-05'00')

Those are going to change every time we rebuild the file, so “rebuild until the result stops changing” is going to rebuild for eternity.
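You don't need a text editor to see the offending fields. Here's a minimal sketch, assuming a grep that understands -a (GNU and BSD grep both do) and that each field starts its own line, as in the excerpt above. The function name is made up for illustration.

```shell
# pdf_volatile_fields FILE
# Print the lines of a PDF that change from one pdflatex run to the next.
# (Hypothetical helper; -a tells grep to treat the binary PDF as text.)
pdf_volatile_fields() {
  grep -a -E '^/(CreationDate|ModDate|ID) ' "$1"
}

# Usage, after building the document:
#   pdf_volatile_fields example.pdf
```

Run it after two consecutive builds and the dates will differ, even though nothing in example.tex changed.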

reproducible builds

Before we can do anything else, we need to ditch those timestamps. Thanks to Debian's reproducible builds initiative, that's less horrible than it could be.

The easy way

If you're using a newer version of pdflatex, then it will respect the SOURCE_DATE_EPOCH environment variable, which you can set to zero before building your document. An appropriate version of pdfTeX (v1.40.17 or newer) ships with TeXLive 2016.

In that case, all you have to do is modify your LaTeX compiler command:

# Our LaTeX compiler command. The value of SOURCE_DATE_EPOCH
# will be used as the creation/modification date in the
# resulting PDF file, and setting it to zero lets us get
# repeatable results.
LATEX = SOURCE_DATE_EPOCH=0 pdflatex
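Not sure whether your pdflatex is new enough? Build twice and compare the results byte for byte. The following is only a sketch: the function name and the .first suffix are inventions of mine, and it assumes the document itself is already stable (like the hello-world example).

```shell
# builds_identically OUTPUT CMD [ARGS...]
# Run CMD twice; succeed only if OUTPUT is byte-identical both times.
# Returns 2 if CMD itself fails or OUTPUT is never produced.
builds_identically() {
  bi_output=$1; shift
  "$@" >/dev/null || return 2
  cp "$bi_output" "$bi_output.first" || return 2
  "$@" >/dev/null || { rm -f "$bi_output.first"; return 2; }
  cmp -s "$bi_output" "$bi_output.first"
  bi_status=$?
  rm -f "$bi_output.first"
  return "$bi_status"
}

# Usage:
#   builds_identically example.pdf env SOURCE_DATE_EPOCH=0 pdflatex example.tex \
#     && echo "reproducible" || echo "not reproducible"
```

If that reports "not reproducible", your pdflatex is probably too old to know about SOURCE_DATE_EPOCH, and you're headed for the hard way.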

The hard way

If you're stuck on an older version of pdflatex, then you'll have to clobber the CreationDate and ModDate entries yourself, after the PDF has been created. This isn't as bad as it sounds. The following command will replace those fields with “zero” dates.

user $ sed --in-place \
         -e '/^\/ID \[<.*>\]/d' \
         -e "s/^\/\(ModDate\) (.*)/\/\1 (D:19700101000000Z00'00')/" \
         -e "s/^\/\(CreationDate\) (.*)/\/\\1 (D:19700101000000Z00'00')/" \
         example.pdf
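To see what those expressions do without risking a real PDF, you can point them at a scratch file. This is a sketch that assumes GNU sed. The function name and the sample lines are invented here; the sed expressions are the ones from the command above.

```shell
# strip_pdf_dates FILE
# Delete the /ID line and reset both date fields, in place (GNU sed).
strip_pdf_dates() {
  sed --in-place \
    -e '/^\/ID \[<.*>\]/d' \
    -e "s/^\/\(ModDate\) (.*)/\/\1 (D:19700101000000Z00'00')/" \
    -e "s/^\/\(CreationDate\) (.*)/\/\1 (D:19700101000000Z00'00')/" \
    "$1"
}

# A scratch file imitating the relevant lines of a pdfTeX-generated PDF.
scratch=$(mktemp)
printf '%s\n' \
  "/Producer (pdfTeX-1.40.17)" \
  "/CreationDate (D:20161116130421-05'00')" \
  "/ModDate (D:20161116130421-05'00')" \
  "/ID [<0123ABCD> <0123ABCD>]" > "$scratch"

strip_pdf_dates "$scratch"
cat "$scratch"
rm -f "$scratch"
```

The /Producer line comes through untouched, both dates are reset to the 1970 placeholder, and the /ID line is gone.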

a fixed point, again

Where were we? Right, building the PDF—let's start with something simple. This says that example.pdf depends on all of our source files, and that we should always run the LaTeX compiler on the TeX file at least once to create a PDF:

$(PN).pdf: $(SRCS)
	$(LATEX) $(PN).tex

After we create a PDF, we strip the timestamps out of it. The funny $@ variable is Make's automatic variable for the target of the rule; here, that's the PDF file.

	...
	sed --in-place \
	  -e '/^\/ID \[<.*>\]/d' \
	  -e "s/^\/\(ModDate\) (.*)/\/\1 (D:19700101000000Z00'00')/" \
	  -e "s/^\/\(CreationDate\) (.*)/\/\\1 (D:19700101000000Z00'00')/" \
	  $@

Next, we compare our new PDF to an old one if an old one exists. If no older PDF exists, then we can't possibly be done. Let's handle that case first. If this is the first pass at generating a PDF, we'll simply pretend that the new one is the previous one and invoke the do-over protocol. This renames example.pdf to example.pdf.previous and then starts over:

	...
	if [ ! -f $@.previous ]; then \
	  mv $@ $@.previous; \
	  $(MAKE) $@; \
	fi;

But what will happen the next time around? Since a “previous” file exists, we won't simply rename the new PDF and restart. Instead, we want to use the cmp utility to check whether or not the new PDF is the same as the old one. If it is, we can simply delete the old one, because we're done. Otherwise, we want to start over again. In the latter case, we overwrite example.pdf.previous with our new example.pdf before starting over.

Both cases have to live in one conditional, so the cmp check is folded into the statement we just wrote as an elif. If it were a separate statement, the outermost make would reach it after the recursive make returns. By then example.pdf.previous has already been deleted, so cmp fails and pdflatex runs one time too many.

	...
	if [ ! -f $@.previous ]; then \
	  mv $@ $@.previous; \
	  $(MAKE) $@; \
	elif cmp -s $@ $@.previous; then \
	  rm $@.previous; \
	else \
	  mv $@ $@.previous; \
	  $(MAKE) $@; \
	fi;
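If the recursion makes your head hurt, the same decision logic can be written as an ordinary loop. This is a sketch in plain shell and not a replacement for the makefile. The function name and the pass limit are my own additions, and it assumes the build command already produces timestamp-free output.

```shell
# rebuild_until_stable OUTPUT MAX_PASSES CMD [ARGS...]
# Rerun CMD until OUTPUT is byte-identical to the result of the previous
# pass, giving up after MAX_PASSES.
rebuild_until_stable() {
  rus_output=$1; rus_max=$2; shift 2
  rus_pass=0
  rm -f "$rus_output.previous"
  while [ "$rus_pass" -lt "$rus_max" ]; do
    "$@" || return 1
    rus_pass=$((rus_pass + 1))
    if [ -f "$rus_output.previous" ] && cmp -s "$rus_output" "$rus_output.previous"; then
      # Fixed point reached: the last pass changed nothing.
      rm "$rus_output.previous"
      echo "$rus_output is stable after $rus_pass passes"
      return 0
    fi
    # First pass, or the output changed: remember it and go around again.
    cp "$rus_output" "$rus_output.previous"
  done
  rm -f "$rus_output.previous"
  echo "$rus_output still changing after $rus_max passes" >&2
  return 1
}

# Usage:
#   rebuild_until_stable example.pdf 5 env SOURCE_DATE_EPOCH=0 pdflatex example.tex
```

As in the makefile, the first pass can never be the last one, because there is nothing to compare it against. The pass limit is the one thing the recursive version doesn't have: it keeps a document that never settles from looping forever.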

recap

If you put everything together, you get something like this:

LATEX = pdflatex
PN = example
SRCS = $(PN).tex

$(PN).pdf: $(SRCS)
	$(LATEX) $(PN).tex

	sed --in-place \
	  -e '/^\/ID \[<.*>\]/d' \
	  -e "s/^\/\(ModDate\) (.*)/\/\1 (D:19700101000000Z00'00')/" \
	  -e "s/^\/\(CreationDate\) (.*)/\/\\1 (D:19700101000000Z00'00')/" \
	  $@

	if [ ! -f $@.previous ]; then \
	  mv $@ $@.previous; \
	  $(MAKE) $@; \
	elif cmp -s $@ $@.previous; then \
	  rm $@.previous; \
	else \
	  mv $@ $@.previous; \
	  $(MAKE) $@; \
	fi;

That's not perfect, but it's good enough for simple documents.

bells und vhistles

If your LaTeX document contains citations and cross-references, then it requires an auxiliary file, named, for example, example.aux. The auxiliary file is (re)created during each compilation pass and is used by the bibliography, but our build system doesn't know that yet. Let's specify where the auxiliary file comes from.

$(PN).aux: $(SRCS)
	$(LATEX) $(PN).tex

What about the bibliography? If you want to use BibTeX with a central database, then you'll need more stuff. First, define a variable to hold your bibliography database(s):

# A space-separated list of bib files. These must all belong
# to paths contained in your $BIBINPUTS environment variable
# (or the current directory).
#
# Comment it out if you don't use a bibliography database.
#
BIBS = references.bib

Now, we'll need to add BIBS to SRCS so that changes in the bibliography database trigger a rebuild. We use kpsewhich to find the path to references.bib so that you can store it in a central location (to be used in other documents). Warning: the ifdef/endif below only work with GNU Make.

# Use kpsewhich (from the kpathsea suite) to find the absolute
# paths of the bib files listed in $(BIBS).
ifdef BIBS
BIBPATHS = $(shell kpsewhich $(BIBS))
SRCS += $(BIBPATHS)
endif
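One thing to watch out for: when kpsewhich can't find a file, it prints nothing for it, so a misspelled entry in BIBS quietly disappears from SRCS. Here's a sketch of a sanity check you could run by hand. The function name is hypothetical, and it relies only on kpsewhich printing the path on success and exiting nonzero when it finds nothing.

```shell
# check_bibs FILE...
# Print the resolved path of each bib file and complain about any that
# kpsewhich cannot locate. Returns nonzero if at least one is missing.
check_bibs() {
  cb_missing=0
  for cb_name in "$@"; do
    if cb_path=$(kpsewhich "$cb_name") && [ -n "$cb_path" ]; then
      echo "$cb_path"
    else
      echo "not found: $cb_name" >&2
      cb_missing=1
    fi
  done
  return "$cb_missing"
}

# Usage:
#   check_bibs references.bib
```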

When BibTeX creates the bibliography for example.tex, it puts it in a file named example.bbl. The final PDF document thus depends on example.bbl, so we need to go back and modify the prerequisites of the rule that generates our PDF file.

Old and busted:

$(PN).pdf: $(SRCS)

New hotness:

$(PN).pdf: $(SRCS) $(PN).bbl

All that's left is the rule to (re)generate example.bbl. This rule is a little tricky.

The example.aux file is recreated during every compilation pass, even when its contents don't change. Normally that would trigger a rebuild of example.bbl, but we don't want to do that if the only thing that changed is the timestamp on example.aux. Why? Because example.pdf depends on example.bbl, and if we rebuild the latter, that will trigger a rebuild of the former. But rebuilding example.pdf requires another compilation pass, which recreates example.aux, and would trigger a rebuild of example.bbl… sending us into an infinite loop if we actually allowed it to happen. An order-only prerequisite (the bar-pipe thingy) lets us avoid the problem: example.aux must exist before the rule runs, but its timestamp is ignored.

And since we're not really depending on example.aux any more, we have to add SRCS as a prerequisite in order to rebuild the bibliography when the source document changes.

But there's more: bibtex doesn't like to be called on an auxiliary file that contains no citations. Another set of ifdef/endif lets us do the right thing when BIBS is empty or unset. That means you must unset (or comment out) BIBS if you don't cite anything.

$(PN).bbl: $(SRCS) | $(PN).aux
ifdef BIBS
	bibtex $(PN).aux
else
	echo -n '' > $@
endif
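If remembering to comment out BIBS sounds error-prone, the same decision can be made by looking at the auxiliary file itself, because LaTeX writes a \citation{...} line to example.aux for every \cite. The following sketches that alternative; it is not what the makefile in this article does, and both function names are made up.

```shell
# aux_has_citations FILE
# Succeed if the .aux file records at least one \cite (or \nocite).
aux_has_citations() {
  grep -q '^\\citation{' "$1"
}

# make_bbl NAME
# Run bibtex only when there is something to cite. Otherwise create an
# empty NAME.bbl, which is what the else-branch of the rule above does.
make_bbl() {
  if aux_has_citations "$1.aux"; then
    bibtex "$1.aux"
  else
    : > "$1.bbl"
  fi
}

# Usage, after at least one pdflatex pass:
#   make_bbl example
```

The trade-off is that the check only works once example.aux exists and is current, whereas ifdef BIBS is decided before anything gets built.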

tl;dr

Here, use this. A maintained version is part of my mjotex git repository.

LATEX = pdflatex
PN = example

# A space-separated list of bib files. These must all belong
# to paths contained in your $BIBINPUTS environment variable
# (or the current directory).
#
# Leave commented if you don't use a bibliography database.
#
#BIBS = references.bib


SRCS = $(PN).tex
ifdef BIBS
BIBPATHS = $(shell kpsewhich $(BIBS))
SRCS += $(BIBPATHS)
endif

$(PN).pdf: $(SRCS) $(PN).bbl
	$(LATEX) $(PN).tex

	sed --in-place \
	  -e '/^\/ID \[<.*>\]/d' \
	  -e "s/^\/\(ModDate\) (.*)/\/\1 (D:19700101000000Z00'00')/" \
	  -e "s/^\/\(CreationDate\) (.*)/\/\\1 (D:19700101000000Z00'00')/" \
	  $@

	if [ ! -f $@.previous ]; then \
	  mv $@ $@.previous; \
	  $(MAKE) $@; \
	elif cmp -s $@ $@.previous; then \
	  rm $@.previous; \
	else \
	  mv $@ $@.previous; \
	  $(MAKE) $@; \
	fi;


$(PN).aux: $(SRCS)
	$(LATEX) $(PN).tex


$(PN).bbl: $(SRCS) | $(PN).aux
ifdef BIBS
	bibtex $(PN).aux
else
	echo -n '' > $@
endif