---
title: "Writing a Tufte-book in Markdown"
date: '2022-11-15T19:13:00+01:00'
tags:
- writing
- pandoc
- Markdown
- latex
categories:
- software
---

Somehow, [Writing Academic Papers in Markdown](/post/2021/02/writing-academic-papers-in-markdown/) is one of my most popular blog posts. I'm glad so many people are looking to partially ditch LaTeX and separate content from markup! Pandoc is a wonderful tool that takes in a plain `.md` Markdown file and spits out whatever you'd like: Word, HTML, or of course PDFs using a TeX engine of your choice, which is what we're interested in.

Writing a paper in Markdown is easy enough since most of the post-processing is done by the conference or journal template you slap on afterwards. For my PhD dissertation, things are a bit more complicated, as I wanted to use the [tufte-book](https://www.latextemplates.com/template/tufte-style-book) document style. [Edward Tufte's books](https://www.edwardtufte.com/tufte/books_vdqi) are simply amazing. He's a statistics and visualization expert who has inspired an entire army of design and styling guidelines, including a TeX package. That means we can do things like this:

![](../tufte.jpg "An excerpt of an early chapter in my thesis.")

Tufte makes maximum use of margins: they can house margin figures, footnotes, and references, and images can stretch to full width. The beautiful font face and styling are a free bonus.

But. I want to write primarily in Markdown, which means I'll need Pandoc to convert it into `.tex`, which in turn means tufte-book specific commands like `\newthought{blah}` are technically impossible to use unless you start mixing TeX and MD, again muddling the content, and we don't want that. I stumbled on a lot of issues and had to jump through a lot of hoops to get the most out of it. In this post, I'll try to summarize all the dirty hacks for posterity.

Most custom stuff below is simply a Python script that gets executed _after_ running the `pandoc` command, but _before_ calling upon `xelatex` to render the PDF. This series of commands builds everything:

```
pandoc -f markdown \
  -V documentclass=tufte-book \
  --include-in-header=preamble.tex \
  --include-before-body=voorblad.tex \
  --pdf-engine=xelatex \
  --natbib \
  --template=../pandoc/templates/pandoc-tufte.tex \
  --top-level-division=part \
  --metadata-file=metadata.yml \
  -t latex+smart \
  --highlight-style=haddock > thesis.tex \
  chapters/ch0-preface.md chapters/ch1-introduction.md \
  chapters/pt1.md chapters/pt1ch1-whatever.md
python ../pandoc/filters/tufte-postprocessor.py thesis.tex
xelatex thesis.tex
bibtex thesis
xelatex thesis.tex
```

## References

Because of the [Pandoc citation system](https://pandoc.org/MANUAL.html#citations), `@someref says that... and others say as well [@otherref].` will be translated into `\citet{someref} says that... and others say as well \citep{otherref}.`. That's _excellent_, because tufte-book redefines `\cite{}` to make citations appear in the margin. I don't want that: in a dissertation, citations would quickly overrun the margin.

We'll want to use `apacite` in conjunction with `natbibapa`, but leave the natbib options empty using these settings in `metadata.yml`:

```
classoption: justified,symmetric,marginals=raggedright,notoc,numbers,nobib
# leave these intentionally blank!
natbiboptions:
biblio-style:
```

Don't forget the `--metadata-file=metadata.yml` and `--natbib` Pandoc options. The [apacite package](https://www.ctan.org/pkg/apacite) will take care of your citations as long as you stick to the `@` notation that Pandoc translates. I removed a bunch of statements in the Pandoc template that check which citation system you use because I had trouble compiling, but I can't remember the specifics.

Okay, and what about possessive citations, like "Kaufman's (2009) framework is such and such"? By default, `@kaufman's framework` becomes "Kaufman (2009)'s framework". [This Overleaf hint](https://www.overleaf.com/learn/latex/Questions/How_do_I_create_a_possessive_or_genitive_citation%3F) inspired me to auto-replace `\citet{(\w+)}'s` with `\citeauthor{\1}'s \citeyearpar{\1}`.
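
A minimal sketch of how that possessive replacement could look as a post-processing step in Python. The function name `fix_possessive_citations` is made up for illustration; only the pattern and the replacement come from the text above. Note that `\w+` will not match citation keys containing hyphens or colons, so the pattern may need widening for your own keys.

```python
import re

# Pattern and replacement as described above; the helper name is hypothetical.
POSSESSIVE = re.compile(r"\\citet\{(\w+)\}'s")


def fix_possessive_citations(tex: str) -> str:
    # \citet{key}'s  ->  \citeauthor{key}'s \citeyearpar{key}
    return POSSESSIVE.sub(r"\\citeauthor{\1}'s \\citeyearpar{\1}", tex)


if __name__ == "__main__":
    print(fix_possessive_citations(r"\citet{kaufman}'s framework"))
```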

Also, there are a couple of interesting Pandoc filters made by Tom Duck called [pandoc-fignos](https://github.com/tomduck/pandoc-fignos), -secnos, and -tablenos. They make it possible to avoid using `\ref{}` in your text, but unfortunately they rely on header includes, which get overridden by the `--include-in-header` flag I use to pass in a custom preamble. Nevertheless, the filters inspired me to come up with something simple for myself.

This will translate

```
![#fig:label Some Caption](somefig.jpg)

@fig:label shows some cool graph. Blah blah. See also @pt1ch2-something for more details.
```

into

```
\begin{figure}
...
\label{fig:label}
...
\end{figure}

Figure~\ref{fig:label} shows some cool graph. Blah blah. See also Chapter~\ref{pt1ch2-something} for more details.
```

using a simple regex: `re.sub(r"\\citet{fig:(\w+)}", r"Figure~\\ref{fig:\1}", file)` (and something similar to produce the figure label). But why replace a `\citet{}`? See above; Pandoc auto-replaces `@blah` with `\citet{}`. And why add `Figure~`? I know there are packages that pandoc-fignos uses internally that take care of that for you, but I wanted to keep things simple. It also means I don't have to type "Chapter" or "Figure" each time in the source file.
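
As a rough sketch, both reference rewrites could live in one small function. The name `resolve_cross_references` is hypothetical, and the chapter pattern assumes identifiers shaped like `pt1ch2-something`, which is only inferred from the example above. Ordinary citations are left alone.

```python
import re


def resolve_cross_references(tex: str) -> str:
    # @fig:label arrives from Pandoc as \citet{fig:label}.
    tex = re.sub(r"\\citet\{fig:([\w-]+)\}", r"Figure~\\ref{fig:\1}", tex)
    # Assumption: chapter identifiers look like pt1ch2-something.
    tex = re.sub(r"\\citet\{(pt\d+ch\d+[\w-]*)\}", r"Chapter~\\ref{\1}", tex)
    return tex


if __name__ == "__main__":
    sample = r"\citet{fig:label} shows a graph. See also \citet{pt1ch2-something}."
    print(resolve_cross_references(sample))
```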

## Figures

A lot of figures are misaligned depending on whether they land on a left-hand or right-hand page, since the caption appears in the margin. This is very irritating because adding or removing text moves them around, breaking the layout. That's fixed by hacking in `\checkoddpage \ifoddpage \forcerectofloat \else \forceversofloat \fi` just after each `\begin{figure}`; see [this GitHub issue](https://github.com/Tufte-LaTeX/tufte-latex/issues/144).
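
One way this injection could be scripted, as a sketch only: the helper name `force_float_side` is invented here, and allowing an optional `[placement]` argument after `\begin{figure}` is an assumption about what the generated TeX may contain. Starred `figure*` environments are deliberately left untouched.

```python
import re

# The TeX snippet from the GitHub issue mentioned above.
FLOAT_FIX = r"\checkoddpage \ifoddpage \forcerectofloat \else \forceversofloat \fi"

# \begin{figure}, optionally followed by a placement argument such as [htbp].
FIGURE_START = re.compile(r"\\begin\{figure\}(?:\[[^\]]*\])?")


def force_float_side(tex: str) -> str:
    # A function replacement avoids having to escape backslashes in FLOAT_FIX.
    return FIGURE_START.sub(lambda m: m.group(0) + "\n" + FLOAT_FIX, tex)
```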

You probably also want to make use of _short captions_, otherwise the list of figures will be very confusing. The Pandoc filter [pandoc-shortcaption](https://github.com/martisak/pandoc-shortcaption) is great for that, and converts

```
![a very very long explanation of an image](image.jpg "the short version")
```

into a `\caption[the short version]{a very very long explanation of an image}`. That way, you can simply make use of Markdown's built-in support for _alt_ and title texts.

Another problem: how can you produce `\begin{figure*}` (note the star) to create full-width images spanning across the extended margin? By default, you can't. You can do this:

```
![](sup.jpg){width=100%}
```

And Pandoc will interpret the width ratio and produce `\includegraphics[width=1\textwidth,height=\textheight]`. That of course does not work on its own, as the image is still wrapped in a regular figure block. I had to regex for it, then go back up to find the enclosing block and add a `*`.
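
A sketch of one way to do that starring. Instead of searching backwards from the `\includegraphics` line, this variant matches each whole figure environment and stars it when it contains a full-width image, which amounts to the same result. The function name is hypothetical, and the marker string assumes Pandoc writes `width=1\textwidth` exactly as shown above.

```python
import re

# Non-greedy match of one complete (unstarred) figure environment.
FIGURE_ENV = re.compile(r"\\begin\{figure\}(.*?)\\end\{figure\}", re.DOTALL)
# Assumption: this is how a 100% wide image shows up in the generated TeX.
FULL_WIDTH = r"\includegraphics[width=1\textwidth"


def widen_full_width_figures(tex: str) -> str:
    def star(match):
        body = match.group(1)
        if FULL_WIDTH not in body:
            return match.group(0)  # leave regular figures untouched
        return r"\begin{figure*}" + body + r"\end{figure*}"

    return FIGURE_ENV.sub(star, tex)
```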

I have no solution for margin figures except for a custom property within `{}` that does more or less the same.

As for tables, Pandoc generates `longtable` blocks instead of regular ones, and they're full of weird crap. Most of the tables I have require special TeX commands anyway, for instance to rotate certain column headers, so I gave up and simply relied on TeX for those blocks instead. In that case, if Pandoc acts weird and starts translating `\` to `\textbackslash` instead of passing your TeX command through, try [encapsulating the whole block](https://github.com/jgm/pandoc/issues/4473) in a `{=latex}` Markdown code block.

If you want subtables: do not use the deprecated `subfigure` package, which is incompatible with tufte-book! `booktabs` and `subfig` (with `caption=false`) do the trick; see [this stackexchange post](https://tex.stackexchange.com/questions/87364/problem-with-tufte-book-and-subfigure).

## Acronyms

Inspired by [pandoc-acro](https://kprussing.github.io/pandoc-acro/), I created a simplified version by replacing `\s\+([A-Z]\w+)` with ` \ac{\1}`. That means that what you write as:

```
The +SE world is a peculiar one.

Many students in +SE don't know how to grok Node.
```

will become in the PDF text:

```
The Software Engineering (SE) world is a peculiar one.

Many students in SE don't know how to grok Node.
```

The second `+SE` won't get expanded, but that's customizable, for instance if you want to expand it again in each new chapter.

Don't forget to include the `acro` package and define each acronym in your preamble using `\DeclareAcronym{SE}{short = SE, long = Software Engineering}`. There are all kinds of options there for you to fiddle with as well. I also auto-replace `+SEs` with `\acfp{SE}`, the full plural version. If that's too much effort for you, just try out the original filter, but I wanted more control and already had a script that grepped around, so whatever.
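
A small sketch of how both acronym replacements could be combined. It is stricter than the pattern quoted above: it assumes acronyms consist of uppercase letters and digits only, so that a trailing lowercase `s` can be read as the plural marker. The name `mark_acronyms` is made up for illustration, and the whitespace in front of the `+` is kept as-is rather than normalized to a single space.

```python
import re

# Whitespace, a plus sign, an all-caps acronym, and an optional plural "s".
ACRONYM = re.compile(r"(\s)\+([A-Z][A-Z0-9]+)(s?)\b")


def mark_acronyms(tex: str) -> str:
    def replace(match):
        space, name, plural = match.groups()
        command = r"\acfp" if plural else r"\ac"
        return f"{space}{command}{{{name}}}"

    return ACRONYM.sub(replace, tex)


if __name__ == "__main__":
    print(mark_acronyms("The +SE world has many +SEs."))
```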

## Layouting

Tufte starts out his later books with a "new thought" in each new chapter and section, where the first three or four words are capitalized and spread out. tufte-book supports this with `\newthought{}`, but I don't want to add this manually in the Markdown file, hence another hack. It's too barebones (and dirty!) to share here but it boils down to:

1. Find all `\section{` commands. Take optional `[]`s into account.
2. Scan for the next line that is not empty, not a TeX command, and not the start of a TeX block; in case of that last one, fast-forward to the first `\end{}`.
3. Break up the line, push the first words into `\newthought{}`, and save.
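
The three steps above could be sketched roughly as follows. This is not the actual script, just an illustration under stated assumptions: the function name and the `words` parameter are invented, `\chapter` is matched alongside `\section`, and any line starting with a backslash is treated as "a TeX command" and skipped.

```python
import re

# Step 1: a heading command, with an optional star and optional [short title].
SECTION = re.compile(r"\\(?:chapter|section)\*?(?:\[[^\]]*\])?\{")


def add_newthoughts(tex: str, words: int = 3) -> str:
    lines = tex.split("\n")
    pending = False   # a heading was seen, no paragraph wrapped yet
    skipping = False  # currently inside a \begin{...} ... \end{...} block
    for i, line in enumerate(lines):
        stripped = line.strip()
        if SECTION.match(stripped):
            pending, skipping = True, False
        elif not pending:
            continue
        elif skipping:
            # Step 2: fast-forward to the first \end{...}.
            skipping = not stripped.startswith(r"\end{")
        elif stripped.startswith(r"\begin{"):
            skipping = True
        elif stripped and not stripped.startswith("\\"):
            # Step 3: wrap the first few words of the paragraph.
            parts = stripped.split(" ", words)
            head, rest = " ".join(parts[:words]), parts[words:]
            lines[i] = r"\newthought{" + head + "}" + (" " + rest[0] if rest else "")
            pending = False
    return "\n".join(lines)
```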

As for text alignment, tufte-book uses left alignment instead of justification, as Tufte believes it's easier to read. I think I agree, but as it's an academic, text-heavy work, I still like it to be justified. The `justified` option for the document class breaks more than it fixes though: so many hyphenation errors occurred that fixing them manually with `\hyphenation{}` was fruitless. Thanks to [this blog article](https://sumanta679.wordpress.com/2009/05/20/latex-justify-without-hyphenation/), adding

```
\tolerance=1
\emergencystretch=\maxdimen
\hyphenpenalty=10000
\hbadness=10000
```

creates a Word-like justified style, spreading out words rather than breaking them.

## Other TeX-specific settings

Remember that tufte-book by default doesn't show sections in the table of contents, and that dotted lines are absent. This can be fixed with:

```
\makeatletter
\renewcommand*\l@section{\@dottedtocline{1}{0em}{2.3em}}
\renewcommand*\l@figure{\@dottedtocline{1}{0em}{2.3em}}
\makeatother

\setcounter{secnumdepth}{1}
\setcounter{tocdepth}{1}
```

I also use [titletoc](http://ctan.org/pkg/titletoc) to customize the styling of the title.

If you're interested in getting things up and running but encounter difficulties, feel free to reach out. I'm happy to share scripts and source material!