---
title: Exporting Goodreads to Obsidian
date: 2021-11-19T08:53:00+01:00
categories:
- software
tags:
- obsidian
- GoodReads
---
Writing a short review after reading a book became a soothing ritual for me ever since discovering [LibraryThing](https://www.librarything.com/) in 2011. Two years later, Goodreads attracted more and more attention, causing me to jump ship. Most non-fiction books I read produce many analog notes, ending up in my personal knowledge management (PKM) system---to which [Obsidian](/tags/obsidian) has become a permanent and valuable addition. The digital review posted on Goodreads is interesting both for my later self and for my friends.

The problem is, my Goodreads reviews do not make it into my PKM system. All analog notes that [get scanned in](/post/2021/01/digitizing-journals-using-devonthink/)---including reading notes---are tagged, and thus searchable. But the Goodreads reviews are on Goodreads: an external service beyond the Brain Baking domain. IndieWeb folks might start twitching now, proclaiming that I should [use POSSE instead of the PESOS](/post/2021/03/the-indieweb-mixed-bag/) I did not even employ for Goodreads: _Publish Elsewhere, Syndicate to your Own Site._

That means pressing `⌘P` in Obsidian to apply a fuzzy match on part of a book title results in no matches. There goes my attempt to build a working _memory extender_. The other day, I knew I'd read a book but couldn't remember whether I liked it or not. I usually resort to a `gr` [Alfred workflow](/post/2021/01/the-productive-programmer-on-mac/) to quickly pull up the book's Goodreads page and find my review. But what if I took more in-depth notes? Those aren't linked to the review.
### Solution 1: one-time CSV export
In your Goodreads account settings, there's a well-hidden button called "export" that generates a CSV file of your entire library. Great, we can use that to generate Markdown `.md` files that seamlessly blend into the Obsidian vault. I simply resorted to a combination of `csv-parse` and `ejs` to map each record to a template, generating files:
```js
// `csvfile`, `templates.goodreadsMarkdown`, `outputDir` and `filename` are defined elsewhere.
const { parse: csvParse } = require('csv-parse/sync') // 'csv-parse/lib/sync' in older versions
const ejs = require('ejs')
const { readFileSync, writeFileSync } = require('fs')

csvParse(readFileSync(csvfile), {
    columns: true
}).forEach(csv => {
    const mddata = ejs.render(templates.goodreadsMarkdown, { item: {
        title: csv['Title'],
        // ...
    }})
    writeFileSync(`${outputDir}/${filename}.md`, mddata, 'utf-8')
})
```
The template itself is a combination of structured frontmatter and unstructured, human-readable text:
```md
---
title: "<%- item.title %>, <%- item.author %>"
isbn: <%- item.isbn %>
rating: <%- item.rating %>
average: <%- item.average %>
pages: <%- item.pages %>
date: <%- item.date %>
---
# <%- item.title %>
By **<%- item.author %>**
## Book data
[GoodReads ID/URL](https://www.goodreads.com/book/show/<%- item.id %>)
- ISBN13: <%- item.isbn %>
- Rating: <%- item.rating %> (average: <%- item.average %>)
- Published: <%- item.year %>
- Pages: <%- item.pages %>
- Date added/read: <%- item.date %>
## Review
<%- item.review -%>
```
Here, `item.review` is the most valuable data, although I also like Goodreads' five-star rating system.

This is essentially a one-time script. However, another problem arises: I keep on reading books, and I keep on adding reviews on Goodreads. I don't want to periodically download a new CSV file, say once a month. Can we do better?
### Solution 2: automatic RSS export
Yes we can! Goodreads luckily provides a personal RSS feed where your reviews automatically appear (click on any shelf, for example your _My Books: read_ shelf, and a tiny RSS icon appears at the bottom right). Partially reusing the above template and code is exactly what I did, except that instead of reading a CSV file, I fetched the RSS endpoint and parsed it using `got` and `fast-xml-parser`:
```js
// Inside an async function; `rssendpoint` is the personal shelf's RSS URL.
const got = require('got')
const parser = require('fast-xml-parser') // v3-style API

const buffer = await got(rssendpoint, {
    responseType: "buffer",
    resolveBodyOnly: true,
    timeout: 5000,
    retry: 5
})
const books = parser.parse(buffer.toString(), {
    ignoreAttributes: false
}).rss.channel.item
```
The only differences after that are the property names of the items in the `books` array. A few gotchas (a rough sketch of how to handle them follows this list):

- The `user_date_added` property is formatted like `Sat, 13 Nov 2021 12:53:08 -0800` in RSS and as `YYYY-MM-DD` in CSV.
- The `user_review` property can contain HTML; convert `<br(.?)\/?>` to `\n`.
- The `title` and `author_name` properties can contain symbols that aren't compatible with your OS' filename requirements.
- How do you determine which entries in the RSS feed are new? I solved this by simply keeping track of the latest `book_id` entry and ignoring the rest.
- What to do when the file already exists---for instance, when I took digital notes in Obsidian before finishing the book and my review on Goodreads? Check with `existsSync` or similar.
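Sketched roughly, with illustrative helper names, handling those gotchas while mapping an RSS item onto the template could look like this (reusing `books`, `outputDir`, `ejs`, and `templates.goodreadsMarkdown` from the snippets above):

```js
const { existsSync, writeFileSync } = require('fs')

// "Sat, 13 Nov 2021 12:53:08 -0800" -> "2021-11-13" (note: converted to UTC first)
const toIsoDate = rssDate => new Date(rssDate).toISOString().slice(0, 10)

// Replace <br>, <br/> and <br /> variants with newlines.
const stripHtmlBreaks = review => (review || '').replace(/<br\s*\/?>/gi, '\n')

// Drop characters that are problematic in filenames.
const safeFilename = name => name.replace(/[\\/:*?"<>|]/g, '').trim()

books.forEach(book => {
    // Alternatively: remember the newest book_id from the previous run and skip older entries.
    const filename = safeFilename(`${book.title}, ${book.author_name}`)
    const target = `${outputDir}/${filename}.md`

    // Don't overwrite notes I already started by hand in Obsidian.
    if (existsSync(target)) return

    const mddata = ejs.render(templates.goodreadsMarkdown, { item: {
        title: book.title,
        author: book.author_name,
        date: toIsoDate(book.user_date_added),
        review: stripHtmlBreaks(book.user_review),
        // ... other properties such as ISBN, rating, pages
    }})
    writeFileSync(target, mddata, 'utf-8')
})
```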
Add the RSS export script to your `crontab` and you're good to go.
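Something like this nightly entry would do; the schedule and script path below are just an example:

```
# Fetch new Goodreads reviews and turn them into Obsidian notes, every night at 03:00.
0 3 * * * /usr/bin/node /path/to/goodreads-rss-export.js
```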
Success:
![](../obsidian-fuzzy.jpg)
Now I can auto-find and link my own reviews in Obsidian!