writing a tufte book in markdown

This commit is contained in:
Wouter Groeneveld 2022-11-15 20:28:52 +01:00
parent c2f160ecb2
commit 1e92eef110
4 changed files with 149 additions and 1 deletions


@ -4,6 +4,7 @@ date: 2022-11-11T10:57:00+01:00
categories:
- webdesign
tags:
- blogging
- searching
---


@ -0,0 +1,147 @@
---
title: "Writing a Tufte-book in Markdown"
date: '2022-11-15T19:13:00+01:00'
tags:
- writing
- pandoc
- Markdown
- latex
categories:
- software
---
Somehow, [Writing Academic Papers in Markdown](/post/2021/02/writing-academic-papers-in-markdown/) is one of my most popular blog posts. I'm glad so many (presumably academics) are looking to partially ditch LaTeX and separate content from markup! Pandoc is a wonderful tool that takes in a plain `.md` Markdown file and spits out whatever you'd like: Word, HTML, or of course, PDFs using a TeX engine of your choice---which is what we're interested in.
Writing a paper in Markdown is easy enough since most of the post-processing is done by the conference or journal template you slap on afterwards. For my PhD dissertation, things are a bit more complicated, as I wanted to use the [tufte-book](https://www.latextemplates.com/template/tufte-style-book) document style. [Edward Tufte's books](https://www.edwardtufte.com/tufte/books_vdqi) are simply amazing. He's a statistics and visualization expert who has inspired an entire army of design and styling guidelines---including a TeX package. That means we can do things like this:
![](../tufte.jpg "An excerpt of an early chapter in my thesis.")
Tufte makes maximum use of margins: they can house margin figures, footnotes, and references, and images can stretch to full width. The beautiful font face and styling are a free bonus.
But. I want to write primarily in Markdown, which means I'll need Pandoc's ability to convert it into `.tex`, which means tufte-book specific commands like `\newthought{blah}` are technically impossible to use unless you start mixing TeX and MD, again muddling the content---we don't want that. I stumbled on a lot of issues and had to jump through a lot of hoops in order to get the most out of it. In this post, I'll try to summarize all the dirty hacks for posterity.
Most custom stuff below is simply a Python script that gets executed _after_ running the `pandoc` command, but _before_ calling upon `xelatex` to render the PDF. This series of commands builds everything:
```
# 1. Markdown -> LaTeX (no PDF yet: the post-processor has to run first)
pandoc -f markdown \
-V documentclass=tufte-book \
--include-in-header=preamble.tex \
--include-before-body=voorblad.tex \
--pdf-engine=xelatex \
--natbib \
--template=../pandoc/templates/pandoc-tufte.tex \
--top-level-division=part \
--metadata-file=metadata.yml \
-t latex+smart \
--highlight-style=haddock \
-o thesis.tex \
chapters/ch0-preface.md chapters/ch1-introduction.md \
chapters/pt1.md chapters/pt1ch1-whatever.md
# 2. Apply the custom hacks described below
python ../pandoc/filters/tufte-postprocessor.py thesis.tex
# 3. Render the PDF; after bibtex, two more passes are usually needed to resolve all citations
xelatex thesis.tex
bibtex thesis
xelatex thesis.tex
xelatex thesis.tex
```
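The post-processor itself isn't shown in this post. As a rough sketch of how such a script could be wired up, assuming each hack is a plain function from string to string (`postprocess`, `rewrite_file`, and the `Step` alias are hypothetical names, not the actual script):
```python
from pathlib import Path
from typing import Callable, Iterable

# Hypothetical alias: every hack takes the full LaTeX source and returns a new version.
Step = Callable[[str], str]


def postprocess(tex: str, steps: Iterable[Step]) -> str:
    """Run every step over the whole document, in the given order."""
    for step in steps:
        tex = step(tex)
    return tex


def rewrite_file(path: str, steps: Iterable[Step]) -> None:
    """Rewrite the .tex file in place so that xelatex picks up the result."""
    target = Path(path)
    source = target.read_text(encoding="utf-8")
    target.write_text(postprocess(source, steps), encoding="utf-8")
```
Keeping every hack as a separate step makes it easy to reorder them or switch one off while debugging the TeX output.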
## References
Because of the [Pandoc citation system](https://pandoc.org/MANUAL.html#citations) and the `--natbib` flag, `@someref says that... and others say as well [@otherref].` will be translated into `\citet{someref} says that... and others say as well \citep{otherref}.`. That's _excellent_, because tufte-book redefines `\cite{}` to make citations appear in the margin, and Pandoc's output sidesteps that command. I don't want margin citations: in a dissertation, they would quickly overrun the margin.
We'll want to use `apacite` with its `natbibapa` option, but leave Pandoc's own natbib settings empty in `metadata.yml`:
```
classoption: justified,symmetric,marginals=raggedright,notoc,numbers,nobib
# leave these intentionally blank!
natbiboptions:
biblio-style:
```
Don't forget the `--metadata-file=metadata.yml` and `--natbib` Pandoc options. The [apacite package](https://www.ctan.org/pkg/apacite) will take care of your citations as long as you stick to the `@` notation that Pandoc translates. I killed a bunch of statements in the Pandoc template that check which citation system you use, because I had trouble compiling, but I can't remember the specifics.
Okay, and what about possessive citations, like "Kaufman's (2009) framework is such and such"? By default, `@kaufman's framework` becomes "Kaufman (2009)'s framework". [This Overleaf hint](https://www.overleaf.com/learn/latex/Questions/How_do_I_create_a_possessive_or_genitive_citation%3F) inspired me to auto-replace `\citet{(\w+)}'s` into `\citeauthor{\1}'s \citeyearpar{\1}`.
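A minimal sketch of that replacement in Python, assuming citation keys consist of word characters only (the function name `fix_possessive_citations` is made up for illustration):
```python
import re

# \citet{key}'s  ->  \citeauthor{key}'s \citeyearpar{key}
# Assumption: keys match \w+; keys containing "-" or ":" need a wider character class.
POSSESSIVE = re.compile(r"\\citet\{(\w+)\}'s")


def fix_possessive_citations(tex: str) -> str:
    """Turn "Kaufman (2009)'s" style citations into "Kaufman's (2009)"."""
    # Backslashes in the replacement are doubled so that re does not treat \c as an escape.
    return POSSESSIVE.sub(r"\\citeauthor{\1}'s \\citeyearpar{\1}", tex)


print(fix_possessive_citations(r"\citet{kaufman}'s framework"))
```
Regular `\citet{}` calls without a trailing `'s` are left alone.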
Also, there are a couple of interesting Pandoc filters made by Tom Duck called [pandoc-fignos](https://github.com/tomduck/pandoc-fignos), -secnos, and -tablenos. They make it possible to avoid using `\ref{}` in your text, but unfortunately rely on header includes which get overridden by my `--include-in-header` flag to pass in a custom preamble. Nevertheless, the filters inspired me to come up with something simple for myself.
This will translate
```
![#fig:label Some Caption](somefig.jpg)
@fig:label shows some cool graph. Blah blah. See also @pt1ch2-something for more details.
```
into
```
\begin{figure}
...
\label{fig:label}
...
\end{figure}
Figure~\ref{fig:label} shows some cool graph. Blah blah. See also Chapter~\ref{pt1ch2-something} for more details.
```
using a simple regex: `re.sub(r"\\citet{fig:(\w+)}", r"Figure~\\ref{fig:\1}", file)` (and a similar one for producing the image label). But why replace a `\citet{}`? See above; the Pandoc system auto-replaces `@blah` into `\citet{}`. But why add `Figure~`? I know there are packages that pandoc-fignos uses internally that take care of that for you, but I wanted to keep things simple. It also means I don't have to type "Chapter" or "Figure" each time in the source file.
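Put together, the reference hacks could look roughly like the sketch below. Two things here are assumptions, not taken from the real script: that Pandoc writes the alt text out as `\caption{\#fig:label Some Caption}`, and that chapter labels follow the `pt1ch2-something` naming of the chapter files. The caption rule only handles captions without nested braces.
```python
import re


def link_references(tex: str) -> str:
    """Rewrite pseudo-citations into \\ref{} calls and move figure labels out of captions."""
    # \citet{fig:label} -> Figure~\ref{fig:label}
    tex = re.sub(r"\\citet\{fig:(\w+)\}", r"Figure~\\ref{fig:\1}", tex)
    # Assumed label scheme: keys such as "pt1ch2-something" or "ch1-introduction" are chapters.
    tex = re.sub(r"\\citet\{((?:pt\d+)?ch\d+[\w-]*)\}", r"Chapter~\\ref{\1}", tex)
    # Assumed Pandoc output: \caption{\#fig:label Some Caption}; captions with nested braces are skipped.
    tex = re.sub(
        r"\\caption\{\\#fig:(\w+)\s+([^{}]*)\}",
        r"\\caption{\2}\\label{fig:\1}",
        tex,
    )
    return tex
```
Real citations such as `\citet{someref}` pass through untouched, although a BibTeX key that happens to look like `ch2020...` would be mistaken for a chapter.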
## Figures
A lot of figures end up misaligned, depending on whether they land on a left-hand or right-hand page, since the caption appears in the margin. This is very irritating since adding or removing text moves them around, breaking the layout. That's fixed by hacking in `\checkoddpage \ifoddpage \forcerectofloat \else \forceversofloat \fi` just after each `\begin{figure}`; see [this GitHub issue](https://github.com/Tufte-LaTeX/tufte-latex/issues/144).
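Injecting that line is a one-liner with `re.sub`. This sketch assumes the figure environments look like `\begin{figure}`, optionally starred and optionally followed by a placement specifier such as `[htbp]`; `pin_figure_side` is a hypothetical name:
```python
import re

FLOAT_FIX = r"\checkoddpage \ifoddpage \forcerectofloat \else \forceversofloat \fi"

# Matches \begin{figure} and \begin{figure*}, with an optional [htbp]-style specifier.
FIGURE_START = re.compile(r"\\begin\{figure\*?\}(?:\[[^\]]*\])?")


def pin_figure_side(tex: str) -> str:
    """Insert the odd/even page check on its own line right after every figure start."""
    # A function replacement avoids having to escape the backslashes in FLOAT_FIX.
    return FIGURE_START.sub(lambda match: match.group(0) + "\n" + FLOAT_FIX, tex)
```
The sketch is not idempotent: running it twice on the same file inserts the line twice, so it should only run on freshly generated Pandoc output.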
Another problem: how can you produce `\begin{figure*}`---note the star---to create full-width images spanning across the extended margin? By default, you can't. You can do this:
```
![](sup.jpg){width=100%}
```
And Pandoc will interpret the width ratio and produce `\includegraphics[width=1\textwidth,height=\textheight]`. Which of course does not work, as it's still wrapped in a regular figure block. I had to regex for it, then go back up to find the enclosing block and add a `*` (the matching `\end{figure}` needs one too).
I have no solution for margin figures except for a custom property within `{}` that does more or less the same.
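A line-based sketch of that search, assuming `\begin{figure}` and `\end{figure}` each sit on their own line as in Pandoc's output (all names are hypothetical):
```python
from typing import Iterable, List, Optional

FULL_WIDTH = r"\includegraphics[width=1\textwidth"


def _find(lines: List[str], indices: Iterable[int], wanted: str, stop: str) -> Optional[int]:
    """Return the first index whose line contains `wanted`, or None if `stop` shows up first."""
    for index in indices:
        if wanted in lines[index]:
            return index
        if stop in lines[index]:
            return None
    return None


def widen_full_width_figures(tex: str) -> str:
    """Star the figure environment around every 100%-wide image."""
    lines = tex.split("\n")
    for i, line in enumerate(lines):
        if FULL_WIDTH not in line:
            continue
        # Walk up to the enclosing \begin{figure} and down to its \end{figure}.
        begin = _find(lines, range(i, -1, -1), r"\begin{figure}", r"\end{figure")
        end = _find(lines, range(i, len(lines)), r"\end{figure}", r"\begin{figure")
        if begin is None or end is None:
            continue  # image is not inside a plain figure block; leave it alone
        lines[begin] = lines[begin].replace(r"\begin{figure}", r"\begin{figure*}", 1)
        lines[end] = lines[end].replace(r"\end{figure}", r"\end{figure*}", 1)
    return "\n".join(lines)
```
Figures with any other width are returned unchanged.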
## Acronyms
Inspired by [pandoc-acro](https://kprussing.github.io/pandoc-acro/), I created a simplified version by replacing `\+([A-Z]\w+)` with `\ac{\1}`. That means you write:
```
The +SE world is a peculiar one.
Many students in +SE don't know how to grok node.
```
which will become, in the PDF text:
```
The Software Engineering (SE) world is a peculiar one.
Many students in SE don't know how to grok node.
```
The second `+SE` won't get unfolded but that's customizable, for instance if you want to do so for each new chapter.
Don't forget to include the `acro` package and define each acronym in your preamble using `\DeclareAcronym{SE}{short = SE, long = Software Engineering}`. There are all kinds of options there for you to fiddle with as well. I also auto-replace `+SEs` with `\acfp{SE}`---the full plural version. If that's too much effort for you, just try out the original filter, but I wanted more control and already had a script that grepped around, so whatever.
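Both replacements fit in a few lines. The sketch below assumes acronyms are written in capitals and digits only, so that a trailing lowercase `s` can be read as the plural marker; `expand_acronyms` is a hypothetical name:
```python
import re


def expand_acronyms(tex: str) -> str:
    """Replace +SE with \\ac{SE} and +SEs with \\acfp{SE}."""
    # Plurals go first, otherwise "+SEs" would be captured as an acronym named "SEs".
    # Assumption: acronyms are all caps/digits, so a trailing "s" means plural.
    tex = re.sub(r"\+([A-Z][A-Z0-9]+)s\b", r"\\acfp{\1}", tex)
    tex = re.sub(r"\+([A-Z]\w+)", r"\\ac{\1}", tex)
    return tex


print(expand_acronyms("The +SE world is a peculiar one. +SEs are everywhere."))
```
A `+` followed by a lowercase letter or a digit, as in `a+b` or `+1`, is not touched.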
## Layouting
Tufte starts his later books out with a "new thought" in each new chapter and section, where the first three or four words are capitalized and spread out. tufte-book supports this with `\newthought{}`, but I don't want to add this manually in the Markdown file, hence another hack. It's too barebones (and dirty!) to share here but it boils down to:
1. Find every `\section` heading. Take optional `[]`s into account.
2. Scan for the next line that is not empty, not a TeX command, and not the start of a TeX block. In case of that last one, fast-forward past the first `\end{}`.
3. Break up the line, push the first words into `\newthought{}`, and save.
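
The actual script isn't shared, but the three steps could be sketched along these lines. Treating `\chapter` the same as `\section`, wrapping exactly three words, and working line by line are all assumptions made for this illustration:
```python
import re
from typing import List

# \chapter or \section, optionally starred, with an optional [short title].
HEADING = re.compile(r"\\(?:chapter|section)\*?(?:\[[^\]]*\])?\{")


def _mark_first_paragraph(lines: List[str], start: int, words: int) -> int:
    """Wrap the first words of the next text line; return the index to resume from."""
    i = start
    while i < len(lines):
        stripped = lines[i].strip()
        if HEADING.match(stripped):
            return i  # another heading came first: nothing to mark here
        if stripped.startswith(r"\begin{"):
            # Fast-forward past the first \end{...} of the block.
            while i < len(lines) and r"\end{" not in lines[i]:
                i += 1
            i += 1
            continue
        if not stripped or stripped.startswith("\\"):
            i += 1  # empty line or a bare TeX command
            continue
        parts = stripped.split(" ", words)
        lead, rest = parts[:words], parts[words:]
        lines[i] = r"\newthought{" + " ".join(lead) + "}" + (" " + rest[0] if rest else "")
        return i + 1
    return i


def add_newthoughts(tex: str, words: int = 3) -> str:
    """Start the first paragraph after each heading with \\newthought{...}."""
    lines = tex.split("\n")
    i = 0
    while i < len(lines):
        if HEADING.match(lines[i].lstrip()):
            i = _mark_first_paragraph(lines, i + 1, words)
        else:
            i += 1
    return "\n".join(lines)
```
Paragraphs that open with a TeX command, such as `\emph{...}`, are skipped by this sketch, in line with step 2.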
## Other TeX-specific settings
Remember that tufte-book by default doesn't show sections in the table of contents, and that dotted lines are absent. This can be fixed with:
```
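% \l@section and friends contain an @, so the preamble needs \makeatletter here
\makeatletter
\renewcommand*\l@section{\@dottedtocline{1}{0em}{2.3em}}
\renewcommand*\l@figure{\@dottedtocline{1}{0em}{2.3em}}
\makeatother
% number sections and list them in the table of contents
\setcounter{secnumdepth}{1}
\setcounter{tocdepth}{1}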
```
I also use [titletoc](http://ctan.org/pkg/titletoc) to customize the styling of the title.
If you're interested in getting things up and running but encounter difficulties, feel free to reach out. I'm happy to share scripts and source material!

Binary file not shown.


@ -86,7 +86,7 @@ pre code
.page-header
padding-bottom: 9px
margin: 3em 0 0.9em
margin: 2em 0 0.9em
border-bottom: 1px solid #eee
text-align: center