Somehow, [Writing Academic Papers in Markdown](/post/2021/02/writing-academic-papers-in-markdown/) is one of my most popular blog posts. I'm glad so many (presumably academics) are looking to partially ditch LaTeX and separate content from markup! Pandoc is a wonderful tool that takes in a plain `.md` Markdown file and spits out whatever you'd like: Word, HTML, or of course, PDFs using a TeX engine of your choice---which is what we're interested in.
Writing a paper in Markdown is easy enough since most of the post-processing is done by the conference or journal template you slap on afterwards. For my PhD dissertation, things are a bit more complicated, as I wanted to use the [tufte-book](https://www.latextemplates.com/template/tufte-style-book) document style. [Edward Tufte's books](https://www.edwardtufte.com/tufte/books_vdqi) are simply amazing. He's a statistics and visualization expert who has inspired an entire army of design and styling guidelines, including a TeX package. That means we can do things like this:
![](../tufte.jpg "An excerpt of an early chapter in my thesis.")
Tufte makes maximum use of margins: they can house margin figures, footnotes, and references, and images can stretch to full width. The beautiful font face and styling are a free bonus.
But. I want to write primarily in Markdown, which means I'll need Pandoc's ability to convert it into `.tex`, which means tufte-book specific commands like `\newthought{blah}` are impossible to produce unless you start mixing TeX and MD, again muddling the content, and we don't want that. I stumbled on a lot of issues and had to jump through a lot of hoops in order to get the most out of it. In this post, I'll try to summarize all dirty hacks for posterity.
Most custom stuff below is simply a Python script that gets executed _after_ running the `pandoc` command, but _before_ calling upon `xelatex` to render the PDF. This series of commands builds everything:
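As a rough sketch of what such a pipeline can look like, here it is as a small Python driver. The file names (`thesis.md`, `preamble.tex`, `postprocess.py`) are placeholders invented for illustration, and the number of `xelatex` and `bibtex` passes is an assumption that depends on your bibliography setup. A custom Pandoc template would be passed with an extra `--template` option.

```python
import subprocess


def build_commands(source="thesis.md", tex="thesis.tex"):
    """Return the build steps, in order, as argument lists."""
    stem = tex.rsplit(".", 1)[0]
    return [
        # 1. Markdown -> standalone LaTeX, with citations as natbib commands.
        ["pandoc", source, "--standalone", "--natbib",
         "--metadata-file=metadata.yml",
         "--include-in-header=preamble.tex",  # hypothetical preamble file
         "-o", tex],
        # 2. Run the regex hacks from this post on the generated .tex file.
        ["python3", "postprocess.py", tex],   # hypothetical script name
        # 3. Render the PDF; the extra passes resolve citations and \ref{}s.
        ["xelatex", stem],
        ["bibtex", stem],
        ["xelatex", stem],
        ["xelatex", stem],
    ]


def build():
    """Run every step and stop at the first one that fails."""
    for command in build_commands():
        subprocess.run(command, check=True)
```

Calling `build()` runs the whole chain; keeping the command list in its own function makes it easy to print or tweak a single step.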
Because of the [Pandoc citation system](https://pandoc.org/MANUAL.html#citations), `@someref says that... and others say as well [@otherref].` will be translated into `\citet{someref} says that... and others say as well \citep{otherref}.`. That's _excellent_, because tufte-book replaces `\cite{}` to make citations appear in the margin. And I don't want that, as it's a dissertation, quickly overrunning the margin.
We'll want to use `apacite` in conjunction with `natbibapa`, but leave the natbib options empty using these options:
Don't forget the `--metadata-file=metadata.yml` and `--natbib` Pandoc options. The [apacite package](https://www.ctan.org/pkg/apacite) will take care of your citations as long as you stick to the `@` notation that Pandoc translates. I killed a bunch of statements in the Pandoc template that check which citation system you use, because I had trouble compiling, but I can't remember the specifics.
Okay, and what about possessive citations, like "Kaufman's (2009) framework is such and such"? By default, `@kaufman's framework` becomes "Kaufman (2009)'s framework". [This Overleaf hint](https://www.overleaf.com/learn/latex/Questions/How_do_I_create_a_possessive_or_genitive_citation%3F) inspired me to auto-replace `\citet{(\w+)}'s` into `\citeauthor{\1}'s \citeyearpar{\1}`.
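In Python, that replacement could look roughly like the sketch below. The function name is mine, and the pattern assumes citation keys consist only of word characters, as in the regex above; keys with `:` or `-` would need a wider character class.

```python
import re

# \citet{key}'s  ->  \citeauthor{key}'s \citeyearpar{key}
POSSESSIVE = re.compile(r"\\citet\{(\w+)\}'s")


def fix_possessive_citations(tex):
    """Rewrite possessive textual citations in a LaTeX string."""
    return POSSESSIVE.sub(r"\\citeauthor{\1}'s \\citeyearpar{\1}", tex)
```

Feeding it `\citet{kaufman2009}'s framework` yields `\citeauthor{kaufman2009}'s \citeyearpar{kaufman2009} framework`, while a plain `\citet{kaufman2009}` is left alone.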
Also, there are a couple of interesting Pandoc filters made by Tom Duck called [pandoc-fignos](https://github.com/tomduck/pandoc-fignos), -secnos, and -tablenos. They make it possible to avoid using `\ref{}` in your text, but unfortunately rely on header includes, which get overridden by the `--include-in-header` flag I use to pass in a custom preamble. Nevertheless, the filters inspired me to come up with something simple for myself.
This will translate
```
![#fig:label Some Caption](somefig.jpg)
@fig:label shows some cool graph. Blah blah. See also @pt1ch2-something for more details.
```
into
```
\begin{figure}
...
\label{fig:label}
...
\end{figure}
Figure~\ref{fig:label} shows some cool graph. Blah blah. See also Chapter~\ref{pt1ch2-something} for more details.
```
using a simple regex: `re.sub(r"\\citet{fig:(\w+)}", r"Figure~\\ref{fig:\1}", file)` (and the same for producing the image label). But why replace a `\citet{}`? See above; the Pandoc system auto-replaces `@blah` into `\citet{}`. And why add `Figure~`? I know there are packages that pandoc-fignos uses internally that take care of that for you, but I wanted to keep things simple. It also means I don't have to type "Chapter" or "Figure" each time in the source file.
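Put together, the reference hack could be sketched as follows. Two things here are assumptions on my part rather than facts about Pandoc's output or my thesis script: that the `#fig:label` marker arrives at the start of the `\caption{}` (possibly escaped as `\#`), and that chapter labels can be recognised by a `pt<N>ch<N>` prefix so that real citations are left untouched.

```python
import re


def link_references(tex):
    """Turn label markers and @-style references into \\label and \\ref."""
    # \caption{\#fig:label Some Caption} -> \caption{\label{fig:label}Some Caption}
    tex = re.sub(r"\\caption\{\\?#(fig:[\w-]+)\s*",
                 r"\\caption{\\label{\1}", tex)
    # \citet{fig:label} -> Figure~\ref{fig:label}
    tex = re.sub(r"\\citet\{(fig:[\w-]+)\}", r"Figure~\\ref{\1}", tex)
    # \citet{pt1ch2-something} -> Chapter~\ref{pt1ch2-something}
    # (the pt<N>ch<N> naming scheme is an assumption)
    tex = re.sub(r"\\citet\{(pt\d+ch\d+[\w-]*)\}", r"Chapter~\\ref{\1}", tex)
    return tex
```

Placing `\label{}` inside the caption avoids having to find the caption's closing brace, which a regex cannot do reliably once captions contain nested braces.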
## Figures
A lot of figures are misaligned depending on whether they land on a left-hand or right-hand page, since the caption appears in the margin. This is very irritating since adding or removing text moves them around, breaking the layout. That's fixed by hacking in `\checkoddpage \ifoddpage \forcerectofloat \else \forceversofloat \fi` just after each `\begin{figure}`; see [this GitHub issue](https://github.com/Tufte-LaTeX/tufte-latex/issues/144).
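A minimal sketch of that injection, assuming regular (unstarred) figures that may carry an optional placement argument such as `[htbp]`; the function name is made up for this example.

```python
import re

FORCE_FLOAT = (r"\checkoddpage \ifoddpage \forcerectofloat "
               r"\else \forceversofloat \fi")

# \begin{figure}, optionally followed by a placement argument like [htbp].
BEGIN_FIGURE = re.compile(r"\\begin\{figure\}(?:\[[^\]]*\])?")


def force_float_side(tex):
    """Insert the odd/even page check on a new line after \\begin{figure}."""
    # A function is used as the replacement so the backslashes in
    # FORCE_FLOAT are inserted literally instead of being read as escapes.
    return BEGIN_FIGURE.sub(lambda m: m.group(0) + "\n" + FORCE_FLOAT, tex)
```

Note that running it twice on the same file inserts the snippet twice, so it should only ever see freshly generated Pandoc output.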
Another problem: how can you produce `\begin{figure*}`---note the star---to create full-width images spanning across the extended margin? By default, you can't. You can do this:
```
![](sup.jpg){width=100%}
```
And Pandoc will interpret the width ratio and produce `\includegraphics[width=1\textwidth,height=\textheight]`. Which of course does not work, as it's still wrapped in a regular figure block. I had to regex for it, then go back up to find the enclosing block and add a `*`.
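Here is one way to sketch that in Python. It is a variation on what I described: instead of searching for the `\includegraphics` and walking back up, it matches each whole figure block and stars it when the block contains a full-width image. It assumes figure environments are not nested.

```python
import re

# One complete, unstarred figure environment (non-greedy, across lines).
FIGURE_BLOCK = re.compile(r"\\begin\{figure\}(.*?)\\end\{figure\}", re.DOTALL)
# An image whose width option is exactly \textwidth or 1\textwidth.
FULL_WIDTH = re.compile(r"\\includegraphics\[[^\]]*width=1?\\textwidth")


def star_full_width_figures(tex):
    """Turn figure into figure* when it holds a 100% wide image."""
    def promote(match):
        body = match.group(1)
        if FULL_WIDTH.search(body):
            return r"\begin{figure*}" + body + r"\end{figure*}"
        return match.group(0)  # leave narrower figures untouched

    return FIGURE_BLOCK.sub(promote, tex)
```

A figure with `width=0.5\textwidth` does not match `FULL_WIDTH` and keeps its regular environment.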
I have no solution for margin figures except for a custom property within `{}` that does more or less the same.
## Acronyms
Inspired by [pandoc-acro](https://kprussing.github.io/pandoc-acro/), I created a simplified version by replacing `\+([A-Z]\w+)` with `\ac{\1}`. That means you write:
The second `+SE` won't get unfolded but that's customizable, for instance if you want to do so for each new chapter.
Don't forget to include package `acro` and define each acronym in your preamble using `\DeclareAcronym{SE}{short = SE, long = Software Engineering}`. There's all kinds of options there for you to fiddle with as well. I also auto-replace `+SEs` with `\acfp{SE}`---the full plural version. If that's too much effort for you, just try out the original filter, but I wanted more control and already had a script that grepped around, so whatever.
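The two replacements can be sketched like this. The plural rule rests on an assumption: acronyms are written in capitals (and digits) only, so a trailing lowercase `s` can be read as the plural marker. The plural pass has to run first, otherwise `+SEs` would become `\ac{SEs}`.

```python
import re


def expand_acronyms(tex):
    """Replace +ACRONYM markers with acro package commands."""
    # +SEs -> \acfp{SE}  (assumes all-caps acronyms, see above)
    tex = re.sub(r"\+([A-Z][A-Z0-9]+)s\b", r"\\acfp{\1}", tex)
    # +SE -> \ac{SE}, using the pattern mentioned earlier in this section
    tex = re.sub(r"\+([A-Z]\w+)", r"\\ac{\1}", tex)
    return tex
```

Be aware that the pattern is blunt: anything that looks like `+Word`, for example inside a formula, gets replaced too.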
## Layouting
Tufte starts his later books out with a "new thought" in each new chapter and section, where the first three or four words are capitalized and spread out. tufte-book supports this with `\newthought{}`, but I don't want to add this manually in the Markdown file, hence another hack. It's too barebones (and dirty!) to share here but it boils down to:
1. Find all `\section` commands. Take optional `[]`s into account.
2. Scan for the next line that is not empty; a TeX command; or the start of a TeX block---in case of that last one, fast-forward to the first `\end{}`.
3. Break up the line, push the first words into `\newthought{}`, and save.
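The three steps above can be sketched as follows. This is a cleaned-up illustration, not my actual script: the function name, the default of three words, and the choice to treat `\chapter` the same as `\section` are all assumptions made for the example.

```python
import re

# \chapter or \section, optionally starred, with an optional [short title].
HEADING = re.compile(r"\\(?:chapter|section)\*?(?:\[[^\]]*\])?\{")


def add_newthoughts(tex, words=3):
    """Wrap the first words after each heading in \\newthought{}."""
    lines = tex.split("\n")
    pending = False  # a heading was seen, its first paragraph not yet
    depth = 0        # how many \begin{...} blocks we are inside
    for i, line in enumerate(lines):
        stripped = line.strip()
        if HEADING.match(stripped):
            pending, depth = True, 0
            continue
        if not pending:
            continue
        opened = stripped.count(r"\begin{")
        closed = stripped.count(r"\end{")
        if depth or opened or closed:
            # Fast-forward through environments such as figures.
            depth = max(depth + opened - closed, 0)
            continue
        if not stripped or stripped.startswith(("\\", "%")):
            continue  # empty line, TeX command, or comment
        parts = stripped.split(None, words)
        head, rest = parts[:words], parts[words:]
        lines[i] = r"\newthought{" + " ".join(head) + "}"
        if rest:
            lines[i] += " " + rest[0]
        pending = False
    return "\n".join(lines)
```

One consequence of step 2 is that a paragraph opening with a citation command such as `\citet{}` is skipped, so the new thought lands on the next plain text line instead.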
## Other TeX-specific settings
Remember that tufte-book by default doesn't show sections in the table of contents, and that dotted lines are absent. This can be fixed with: