---
title: Exporting Goodreads to Obsidian
date: 2021-11-19T08:53:00+01:00
categories:
- software
tags:
- obsidian
- GoodReads
---
Writing a short review after reading a book became a soothing ritual for me ever since discovering [LibraryThing](https://www.librarything.com/) in 2011. Two years later, Goodreads attracted more and more attention, causing me to jump ship. Most non-fiction books I read produce many analog notes, ending up in my personal knowledge management (PKM) system---to which [Obsidian](/tags/obsidian) has become a permanent and valuable addition. The digital review posted on Goodreads is interesting both for my later self and for my friends.

The problem is, my Goodreads reviews do not make it into my PKM system. All analog notes that [get scanned in](/post/2021/01/digitizing-journals-using-devonthink/)---including reading notes---are tagged, and thus searchable. But the Goodreads reviews are on Goodreads: an external service beyond the Brain Baking domain. IndieWeb folks might start twitching now, proclaiming that I should [use POSSE instead of the PESOS](/post/2021/03/the-indieweb-mixed-bag/) I did not even employ for Goodreads: _Publish Elsewhere, Syndicate to your Own Site._

That means pressing `⌘P` in Obsidian to apply a fuzzy match on part of a book title results in no matches. There goes my attempt to build a working _memory extender_. The other day, I knew I'd read a book but couldn't remember whether I liked it or not. I usually resort to a `gr` [Alfred workflow](/post/2021/01/the-productive-programmer-on-mac/) to quickly pull up the book's Goodreads page and find my review. But what if I took more in-depth notes? Those aren't linked to the review.
### Solution 1: one-time CSV export
In your Goodreads account settings, there's a well-hidden button called "export" that generates a CSV file of your entire library. Great, we can use that to generate Markdown `.md` files that seamlessly blend into the Obsidian vault. I simply resorted to a combination of `csv-parse` and `ejs` to map each record to a template, generating files:
```js
// `csvfile`, `templates.goodreadsMarkdown`, `outputDir` and `filename` are defined elsewhere.
const { parse: csvParse } = require('csv-parse/sync') // 'csv-parse/lib/sync' in older versions
const ejs = require('ejs')
const { readFileSync, writeFileSync } = require('fs')

csvParse(readFileSync(csvfile), {
    columns: true
}).forEach(csv => {
    const mddata = ejs.render(templates.goodreadsMarkdown, { item: {
        title: csv['Title'],
        // ...
    }})
    writeFileSync(`${outputDir}/${filename}.md`, mddata, 'utf-8')
})
```
The template itself is a combination of structured frontmatter and unstructured, human-readable text:
```md
---
title: "<%- item.title %>, <%- item.author %>"
isbn: <%- item.isbn %>
rating: <%- item.rating %>
average: <%- item.average %>
pages: <%- item.pages %>
date: <%- item.date %>
---
# <%- item.title %>
By **<%- item.author %>**
## Book data
[GoodReads ID/URL](https://www.goodreads.com/book/show/<%- item.id %>)
- ISBN13: <%- item.isbn %>
- Rating: <%- item.rating %> (average: <%- item.average %>)
- Published: <%- item.year %>
- Pages: <%- item.pages %>
- Date added/read: <%- item.date %>
## Review
<%- item.review -%>
```
Here, `item.review` is the most valuable data, although I also like Goodreads' five-star rating system.

This is essentially a one-time script. However, another problem arises: I keep on reading books, and I keep on adding reviews on Goodreads. I don't want to periodically download a new CSV file, say once a month. Can we do better?
### Solution 2: automatic RSS export
Yes we can! Goodreads luckily provides a personal RSS feed where your reviews automatically appear (click on any shelf, for example your _My Books: read_ shelf, and a tiny RSS icon appears at the bottom right). Partially reusing the above template and code is exactly what I did, except that instead of reading a CSV file, I fetched the RSS endpoint and parsed it using `got` and `fast-xml-parser`:
```js
// Inside an async function; `rssendpoint` is the personal shelf's RSS URL.
const got = require('got')
const parser = require('fast-xml-parser') // v3-style API

const buffer = await got(rssendpoint, {
    responseType: "buffer",
    resolveBodyOnly: true,
    timeout: 5000,
    retry: 5
})
const books = parser.parse(buffer.toString(), {
    ignoreAttributes: false
}).rss.channel.item
```
The only differences after that are the property names of the items in the `books` array. A few gotchas (a rough sketch of how to handle them follows this list):

- The `user_date_added` property is formatted like `Sat, 13 Nov 2021 12:53:08 -0800` in RSS and as `YYYY-MM-DD` in CSV.
- The `user_review` property can contain HTML; convert `<br(.?)\/?>` to `\n`.
- The `title` and `author_name` properties can contain symbols that aren't compatible with your OS' filename requirements.
- How do you determine which entries in the RSS feed are new? I solved this by simply keeping track of the latest `book_id` entry and ignoring the rest.
- What to do when the file already exists---for instance, when I took digital notes in Obsidian before finishing the book and my review on Goodreads? Check with `existsSync` or similar.
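Sketched roughly, with illustrative helper names, handling those gotchas while mapping an RSS item onto the template could look like this (reusing `books`, `outputDir`, `ejs`, and `templates.goodreadsMarkdown` from the snippets above):

```js
const { existsSync, writeFileSync } = require('fs')

// "Sat, 13 Nov 2021 12:53:08 -0800" -> "2021-11-13" (note: converted to UTC first)
const toIsoDate = rssDate => new Date(rssDate).toISOString().slice(0, 10)

// Replace <br>, <br/> and <br /> variants with newlines.
const stripHtmlBreaks = review => (review || '').replace(/<br\s*\/?>/gi, '\n')

// Drop characters that are problematic in filenames.
const safeFilename = name => name.replace(/[\\/:*?"<>|]/g, '').trim()

books.forEach(book => {
    // Alternatively: remember the newest book_id from the previous run and skip older entries.
    const filename = safeFilename(`${book.title}, ${book.author_name}`)
    const target = `${outputDir}/${filename}.md`

    // Don't overwrite notes I already started by hand in Obsidian.
    if (existsSync(target)) return

    const mddata = ejs.render(templates.goodreadsMarkdown, { item: {
        title: book.title,
        author: book.author_name,
        date: toIsoDate(book.user_date_added),
        review: stripHtmlBreaks(book.user_review),
        // ... other properties such as ISBN, rating, pages
    }})
    writeFileSync(target, mddata, 'utf-8')
})
```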
Add the RSS export script to your `crontab` and you're good to go.
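Something like this nightly entry would do; the schedule and script path below are just an example:

```
# Fetch new Goodreads reviews and turn them into Obsidian notes, every night at 03:00.
0 3 * * * /usr/bin/node /path/to/goodreads-rss-export.js
```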
Success:
![](../obsidian-fuzzy.jpg)
Now I can auto-find and link my own reviews in Obsidian!