| title | date | categories | tags |
|---|---|---|---|
| Exporting Goodreads to Obsidian | 2021-11-19T08:53:00+01:00 | | |
Writing a short review after finishing a book has been a soothing ritual of mine ever since I discovered LibraryThing in 2011. Two years later, Goodreads attracted more and more attention, causing me to jump ship. Most non-fiction books I read produce a lot of analog notes, which end up in my personal knowledge management (PKM) system, of which Obsidian has become a permanent and valuable part. The digital review I post on Goodreads is interesting both for my later self and for my friends.
The problem is, my Goodreads reviews do not make it into my PKM system. All analog notes that get scanned in, including reading notes, are tagged and thus searchable. But the Goodreads reviews live on Goodreads, an external service beyond the Brain Baking domain. IndieWeb folks' eyes might start twitching now, urging me to use POSSE (Publish on your Own Site, Syndicate Elsewhere) instead of PESOS (Publish Elsewhere, Syndicate to your Own Site), which I did not even employ for Goodreads.
That means pressing ⌘P in Obsidian to fuzzy-match part of a book title results in no matches. There goes my attempt to build a working memory extender. The other day, I knew I had read a book but couldn't remember whether I liked it or not. I usually resort to a `gr` Alfred workflow to quickly whip up the book's website and find my review.
But what if I took more in-depth notes? These aren't linked to the review.
Solution 1: one-time CSV export
In your Goodreads account settings, there's a well-hidden button called "export" that generates a CSV file of your entire library. Great, we can use that to generate Markdown `.md` files that seamlessly blend into the Obsidian Vault. I simply resorted to a combination of `csv-parse` and `ejs` to map each record to a template, generating files:
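```javascript
import { readFileSync, writeFileSync } from 'fs'
import { parse as csvParse } from 'csv-parse/sync'
import ejs from 'ejs'

// csvfile, outputDir and templates are defined elsewhere
csvParse(readFileSync(csvfile), {
  columns: true // use the CSV header row as property names
}).forEach(csv => {
  const mddata = ejs.render(templates.goodreadsMarkdown, { item: {
    title: csv['Title'],
    // ...
  } })
  const filename = csv['Title'] // sanitize first, see the gotchas below
  writeFileSync(`${outputDir}/${filename}.md`, mddata, 'utf-8')
})
```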
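The snippet leaves the question of a safe `filename` open. The book's title and author are the obvious source, but titles love colons and question marks. Here is a minimal sketch of a hypothetical `toFilename` helper. It assumes you want to drop the characters Windows forbids in filenames (`<>:"/\|?*` and control characters), plus `#^[]`, which clash with Obsidian's link syntax:

```javascript
// Hypothetical helper: builds a note filename from a book's title and author.
// Strips characters Windows forbids (<>:"/\|?* and control characters)
// as well as #^[], which Obsidian uses in its link syntax.
const UNSAFE_CHARS = /[<>:"\/\\|?*\u0000-\u001f#^\[\]]/g

function toFilename(title, author) {
  return `${title} - ${author}`
    .replace(UNSAFE_CHARS, '') // drop forbidden characters
    .replace(/\s+/g, ' ')      // collapse whitespace left behind
    .trim()
}

console.log(toFilename('Sapiens: A Brief History of Humankind', 'Yuval Noah Harari'))
// prints "Sapiens A Brief History of Humankind - Yuval Noah Harari"
```

Removing characters instead of replacing them with something like `_` keeps the names readable in Obsidian's quick switcher, at the small risk of two different titles collapsing into the same filename.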
The template itself combines structured frontmatter with unstructured, human-readable content:
```ejs
---
title: "<%- item.title %>, <%- item.author %>"
isbn: <%- item.isbn %>
rating: <%- item.rating %>
average: <%- item.average %>
pages: <%- item.pages %>
date: <%- item.date %>
---
# <%- item.title %>
By **<%- item.author %>**
## Book data
[GoodReads ID/URL](https://www.goodreads.com/book/show/<%- item.id %>)
- ISBN13: <%- item.isbn %>
- Rating: <%- item.rating %> (average: <%- item.average %>)
- Published: <%- item.year %>
- Pages: <%- item.pages %>
- Date added/read: <%- item.date %>
## Review
<%- item.review -%>
```
Here `item.review` holds the most valuable data, although I also like Goodreads' 5-star rating system.
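Because `<%-` outputs values unescaped, a title that contains a double quote ends the YAML `title:` string early and breaks the frontmatter. One way to guard against this, assuming you pass a helper into `ejs.render` alongside `item`, relies on the fact that JSON strings are also valid YAML double-quoted scalars, so `JSON.stringify` can do the escaping. The helper name `yamlString` is hypothetical:

```javascript
// Hypothetical helper: turns any value into a safely quoted YAML string.
// JSON string escapes (\" \\ \n \uXXXX) are all valid in YAML double-quoted scalars.
function yamlString(value) {
  return JSON.stringify(String(value))
}

// In the template: title: <%- yamlString(item.title + ', ' + item.author) %>
console.log(`title: ${yamlString('The "Best" Book, Jane Doe')}`)
// prints: title: "The \"Best\" Book, Jane Doe"
```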
This is essentially a one-time script. However, another problem arises: I keep reading books, and I keep posting their reviews on Goodreads. I don't want to periodically download a CSV file, say once a month. Can we do better?
Solution 2: automatic RSS export
Yes, we can! Goodreads luckily provides a personal RSS feed where your reviews automatically appear (open any shelf, for example your My Books: read shelf, and a tiny RSS icon appears on the bottom right). I partially reused the template and code above, except that instead of reading a CSV file, I fetched the RSS endpoint with `got` and parsed it with `fast-xml-parser`:
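```javascript
// got v11 and fast-xml-parser v3 APIs; newer major versions changed these options
import got from 'got'
import parser from 'fast-xml-parser'

// rssendpoint: your shelf's RSS URL, defined elsewhere
const buffer = await got(rssendpoint, {
  responseType: "buffer",
  resolveBodyOnly: true, // resolve with the body instead of the full response
  timeout: 5000, // milliseconds
  retry: 5 // retry failed requests up to five times
})
// each <item> in the feed is one book on the shelf
const books = parser.parse(buffer.toString(), {
  ignoreAttributes: false
}).rss.channel.item
```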
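There is one parser quirk worth guarding against. This is my addition, not part of the original setup. With its default settings, fast-xml-parser turns a lone `<item>` into a plain object instead of a one-element array, and a feed without items yields no `item` property at all. Normalizing the result of `.rss.channel.item` first keeps any loop over the books safe. `asArray` is a hypothetical helper:

```javascript
// Hypothetical helper: normalizes a parsed XML node into an array.
// undefined/null (no <item>) → [], a single object → [object], arrays pass through.
function asArray(node) {
  if (node === undefined || node === null) return []
  return Array.isArray(node) ? node : [node]
}

// e.g. const books = asArray(parsed.rss.channel.item)
console.log(asArray(undefined).length, asArray({ book_id: 1 }).length, asArray([1, 2]).length)
// prints "0 1 2"
```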
After that, the only difference is the property names of the items in the `books` array. A few gotchas:
- The `user_date_added` property is formatted like `Sat, 13 Nov 2021 12:53:08 -0800` in RSS and `YYYY-MM-DD` in CSV.
- The `user_review` property can contain HTML; convert `<br(.?)\/?>` to `\n`.
- The `title` and `author_name` properties can contain symbols that aren't compatible with your OS' filename requirements.
- How do you determine which entries in the RSS feed to parse? I solved this by simply keeping track of the latest `book_id` I processed and ignoring the rest.
- What to do when the file already exists, for instance when I took digital notes in Obsidian before finishing the book and writing my review on Goodreads? Check with `existsSync` or similar.
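For the first gotcha, you can turn the RSS date into the CSV's `YYYY-MM-DD` shape by picking the string apart instead of parsing it with `Date`. Formatting a `Date` with `toISOString()` reports the UTC day, so a review added late in the evening at -0800 would land on the next day. This is a minimal sketch. `rssDateToIso` is a hypothetical name, and it assumes every date follows the `Sat, 13 Nov 2021 12:53:08 -0800` format shown above:

```javascript
const MONTHS = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
  'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']

// Hypothetical helper: "Sat, 13 Nov 2021 12:53:08 -0800" → "2021-11-13".
// Keeps the calendar day as written in the feed, ignoring time and offset.
function rssDateToIso(rssDate) {
  const match = /^\w{3}, (\d{1,2}) (\w{3}) (\d{4})/.exec(rssDate)
  const month = match ? MONTHS.indexOf(match[2]) + 1 : 0
  if (month === 0) throw new Error(`Unexpected date format: ${rssDate}`)
  const [, day, , year] = match
  return `${year}-${String(month).padStart(2, '0')}-${day.padStart(2, '0')}`
}

console.log(rssDateToIso('Sat, 13 Nov 2021 12:53:08 -0800')) // prints "2021-11-13"
```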
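The review cleanup from the second gotcha is a single `replace`. The global flag matters: without `/g`, only the first line break would be converted. This sketch uses the regex given above, which covers `<br>`, `<br/>` and `<br />`, and leaves any other HTML untouched. `reviewToMarkdown` is a hypothetical name:

```javascript
// Hypothetical helper: turns Goodreads' HTML line breaks into newlines.
// <br(.?)\/?> matches <br>, <br/> and <br /> (the optional . absorbs a space or slash).
function reviewToMarkdown(userReview) {
  // String() also covers reviews the XML parser turned into numbers, ?? covers missing ones
  return String(userReview ?? '').replace(/<br(.?)\/?>/g, '\n')
}

console.log(reviewToMarkdown('Great read.<br>Would recommend.'))
// prints two lines: "Great read." and "Would recommend."
```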
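The fourth gotcha, processing only the entries you haven't seen yet, could look like the sketch below. It assumes the feed lists the most recently added books first and that you persist the last `book_id` between cron runs, for example in a small text file. `booksSince` is a hypothetical helper. The ids are compared as strings because the XML parser may turn numeric ids into numbers while a stored id read back from a file is a string:

```javascript
// Hypothetical helper: returns the entries added after lastBookId.
// Assumes newest-first order, so everything before lastBookId is new.
// If lastBookId is not in the feed (first run, or it scrolled out), all entries are returned.
function booksSince(books, lastBookId) {
  const seenAt = books.findIndex(book => String(book.book_id) === String(lastBookId))
  return seenAt === -1 ? books : books.slice(0, seenAt)
}

const feed = [{ book_id: 3 }, { book_id: 2 }, { book_id: 1 }]
console.log(booksSince(feed, 2).map(book => book.book_id)) // prints [ 3 ]
// After processing, remember feed[0].book_id as the new lastBookId.
```

Returning everything when the stored id is missing errs on the side of reprocessing; combined with an existence check on the target file, that should be harmless.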
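The last gotcha is settled with an `existsSync` check. What to do when the note does exist is a matter of taste. One possible policy keeps hand-written notes and appends only the `## Review` part of the rendered template, and leaves the note alone if it already has such a section. The sketch below implements that policy as a hypothetical `mergeReview` helper. It assumes the caller reads the existing note with `readFileSync` when `existsSync` reports it, and passes `null` otherwise:

```javascript
// Hypothetical helper: decides what to write for a book note.
// existingNote: current file content, or null if the file doesn't exist yet.
// fullNote: the complete rendered template (frontmatter included).
// reviewSection: just the "## Review" part of that rendered template.
function mergeReview(existingNote, fullNote, reviewSection) {
  if (existingNote === null) return fullNote // new book: write everything
  if (/^## Review[ \t]*$/m.test(existingNote)) return existingNote // already merged
  return `${existingNote.trimEnd()}\n\n${reviewSection}` // keep notes, append review
}

const existing = '# My notes\nHighlights from chapter 3.\n'
console.log(mergeReview(existing, '(full rendered note)', '## Review\nLoved it.'))
```

Checking for an existing `## Review` heading makes the merge safe to run twice. It does mean a later edit of the review on Goodreads won't be picked up automatically.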
Add the RSS export script to your `crontab` and you're good to go.
Success: now I can auto-find and link my own reviews in Obsidian!