
---
title: "Exporting Goodreads to Obsidian"
date: 2021-11-19T08:53:00+01:00
categories:
- software
tags:
- obsidian
- GoodReads
---

Writing a short review after finishing a book has been a soothing ritual for me ever since discovering LibraryThing in 2011. Two years later, Goodreads attracted more and more attention, causing me to jump ship. Most non-fiction books I read produce many analog notes that end up in my personal knowledge management (PKM) system---of which Obsidian has become a permanent and valuable addition. The digital review posted on Goodreads is interesting both for my later self and for my friends.

The problem is, my Goodreads reviews do not make it into my PKM system. All analog notes that get scanned in---including reading notes---are tagged, and thus searchable. But the Goodreads reviews live on Goodreads, an external service beyond the Brain Baking domain. IndieWeb folks' eyes might start twitching now, proclaiming I should use POSSE (Publish on your Own Site, Syndicate Elsewhere) instead of PESOS (Publish Elsewhere, Syndicate to your Own Site), which I did not even employ for Goodreads.

That means pressing ⌘P in Obsidian to fuzzy-match part of a book title yields no results. There goes my attempt to build a working memory extender. The other day, I knew I'd read a book but couldn't remember whether I liked it or not. I usually resort to a gr Alfred workflow to quickly whip up the book's Goodreads page and find my review there.

But what if I took more in-depth notes? These aren't linked to the review.

## Solution 1: one-time CSV export

In your Goodreads account settings, there's a well-hidden button called "export", generating a CSV file of your entire library. Great, we can use that to generate Markdown .md files that blend seamlessly into the Obsidian vault. I simply resorted to a combination of csv-parse and ejs to map each record onto a template, generating one file per book:

csvParse(readFileSync(csvfile), {
	columns: true
}).forEach(csv => {
	const mddata = ejs.render(templates.goodreadsMarkdown, { item: {
		title: csv['Title'],
		// ...
	}})
	// filename is derived from the book title; see the gotchas below
	writeFileSync(`${outputDir}/${filename}.md`, mddata, 'utf-8')
})
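The elided fields come straight from the CSV columns. As a sketch, this is roughly how a parsed row maps onto the template's item (the column names below are from my own export; verify them against your CSV header):

```javascript
// Hypothetical mapping from one parsed Goodreads CSV row to the template's item.
function toItem(csv) {
	return {
		id: csv['Book Id'],
		title: csv['Title'],
		author: csv['Author'],
		isbn: csv['ISBN13'],
		rating: csv['My Rating'],
		average: csv['Average Rating'],
		pages: csv['Number of Pages'],
		year: csv['Year Published'],
		// prefer the read date; fall back to the date the book was shelved
		date: csv['Date Read'] || csv['Date Added'],
		review: csv['My Review']
	}
}
```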

The template itself is a combination of structured frontmatter and unstructured text as human-readable content:

---
title: "<%- item.title %>, <%- item.author %>"
isbn: <%- item.isbn %>
rating: <%- item.rating %>
average: <%- item.average %>
pages: <%- item.pages %>
date: <%- item.date %>
---

# <%- item.title %>

By **<%- item.author %>**

## Book data

[GoodReads ID/URL](https://www.goodreads.com/book/show/<%- item.id %>)

- ISBN13: <%- item.isbn %>
- Rating: <%- item.rating %> (average: <%- item.average %>)
- Published: <%- item.year %>
- Pages: <%- item.pages %>
- Date added/read: <%- item.date %>

## Review

<%- item.review -%>

Here, item.review is the most valuable piece of data, although I also like Goodreads' 5-star rating system.

This is essentially a one-time script. However, another problem arises: I keep on reading books, and I keep on adding their reviews on Goodreads. I don't want to periodically download a CSV file by hand, say once a month. Can we do better?

## Solution 2: automatic RSS export

Yes we can! Goodreads luckily provides a personal RSS feed where your reviews automatically appear (click on any shelf, for example your "My Books: read" shelf, and a tiny RSS icon appears at the bottom right). Partially reusing the above template and code is exactly what I did: instead of reading a CSV file, I fetched the RSS endpoint and parsed it using got and fast-xml-parser:

  const buffer = await got(rssendpoint, {
    responseType: "buffer",
    resolveBodyOnly: true,
    timeout: 5000,
    retry: 5
  })

  const books = parser.parse(buffer.toString(), {
    ignoreAttributes: false
  }).rss.channel.item
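The feed lists the newest items first, so the book_id bookkeeping mentioned in the gotchas below can be a simple cut-off. A minimal sketch (function and variable names are my own):

```javascript
// Keep only the RSS items that appeared after the last run.
// Assumes books is ordered newest-first, as the Goodreads shelf feed is.
function newItems(books, lastSeenId) {
	const idx = books.findIndex(b => String(b.book_id) === String(lastSeenId))
	// If the last-seen id is no longer in the feed, treat everything as new.
	return idx === -1 ? books : books.slice(0, idx)
}
```

After a successful run, persist the book_id of the first (newest) item somewhere and pass it in as lastSeenId next time.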

The only difference after that is the property names of the items in the books array. A few gotchas:

  • The user_date_added property is formatted like Sat, 13 Nov 2021 12:53:08 -0800 in RSS and YYYY-MM-DD in CSV
  • The user_review property can contain HTML; convert <br(.?)\/?> to \n.
  • The title and author_name properties can contain symbols that aren't compatible with your OS' filename requirements.
  • How to determine which entries to parse in the RSS? I solved this by simply keeping track of the latest processed book_id entry and ignoring the rest.
  • What to do when the file already exists---for instance, when I took digital notes in Obsidian before finishing the book and my review on Goodreads? Check with existsSync or similar.
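As a sketch, the first three gotchas boil down to a few one-liners (helper names are my own; the date conversion assumes you don't mind the shift to UTC):

```javascript
// Normalize the RSS pubDate to the CSV's YYYY-MM-DD format (in UTC).
const toIsoDate = rssDate => new Date(rssDate).toISOString().substring(0, 10)
// Replace <br>, <br/> and <br /> in the review HTML with newlines.
const stripBreaks = html => html.replace(/<br\s*\/?>/gi, '\n')
// Drop characters that are illegal in filenames on most OSes.
const safeFilename = title =>
	title.replace(/[\\/:*?"<>|]/g, '').replace(/\s+/g, ' ').trim()
```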

Add the RSS export script to your crontab and you're good to go.
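For instance, a nightly run could look like this (the paths are hypothetical; point them at your own node binary and script):

```shell
# crontab entry: run the Goodreads RSS export every night at 03:00
0 3 * * * /usr/bin/node /home/wouter/scripts/goodreads-rss-export.js
```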

Success: I can now auto-find and link my own reviews in Obsidian!