implementing searching in static websites

This commit is contained in:
Wouter Groeneveld 2022-08-04 11:45:03 +02:00
parent afb12ca476
commit b24001dae7
3 changed files with 89 additions and 1 deletions


@@ -3,7 +3,7 @@ date: 2022-08-03T08:44:51+02:00
context: "https://roytang.net/2022/08/twenty-years/"
---
-Excellent summary Roy, cheers! I dug around in your archives and discovered you were into MtG [way back in 2001](https://roytang.net/archives/ancient/tripod/ffmagic/)---that's exactly the same year as I started playing! Since you regularly post updates on your digital Arene grinds, I was wondering how you migrated from analog to digital MtG. I only play with "the real stuff", but as a consequence, I regularly have trouble finding buddies to play with.
+Excellent summary Roy, cheers! I dug around in your archives and discovered you were into MtG [way back in 2001](https://roytang.net/archives/ancient/tripod/ffmagic/)---that's exactly the same year as I started playing! Since you regularly post updates on your digital Arena grinds, I was wondering how you migrated from analog to digital MtG. I only play with "the real stuff", but as a consequence, I regularly have trouble finding buddies to play with.
Since discovering Commander, I much prefer playing it like that: more chaos and politics, more crazy cards, and it's not always the player with the most expensive deck that wins. Most of my stuff is geared towards a budget anyway.


@@ -0,0 +1,10 @@
---
date: 2022-08-03T11:10:41+02:00
context: "https://fundor333.com/social/2022/08/03/1659516036/"
---
Fundor 333 asked:
> In your opinion something like Gitea with a syndication like Mastodon will solve some of the problems and move more people on this “Gitea with Syndication”?
I'd answer: yes and no. Yes, it will solve some problems---hopefully easier collaboration across different instances. With GitHub, that's not a problem, provided that everyone uses GitHub. And no, I don't think it will move more people towards Gitea, since syndication and self-hosting are usually two "complicated" solutions. Note that I didn't say complex. Most people will still find it too troublesome to move. Just look at Mastodon vs. Twitter. The `@user@mastodoninstance` thing already trips most people up.


@@ -0,0 +1,78 @@
---
title: Implementing Searching In Static Websites
date: 2022-08-04T10:59:00+02:00
categories:
- webdesign
tags:
- hugo
- searching
---
In my monthly [July 2022 overview](/post/2022/08/july-2022) write-up, I wrote:
> This website got a new search engine! The baked archives page used to be powered by Lunr.js, which has been replaced by Pagefind.app. I guess this is worth its own blog post; I'll save the details for later.
It's time for those juicy details.
Last month's first HugoConf revealed a lot of interesting JAMStack-related tooling to boost your statically generated blog. For the uninitiated, a "JAMStack" is a _JavaScript, API, and Markup stack_ that (almost) enables static websites to be just as dynamic as true blogging engines such as Wordpress. For example: [a Webmention-based commenting system](/post/2021/05/beyond-webmention-io/) with a queryable API, a few pre- and post-processor scripts like [YouTube link to image converters](/post/2021/06/youtube-play-image-links-in-hugo/), or **search functionality**.
One of those new search tools mentioned during the conference is [Pagefind](https://pagefind.app/). Since I was looking into throwing out [Lunr.js](https://lunrjs.com/) anyway, it was a good opportunity to try out new things. The result is the simple but very fast [search bar in the /archives page](/archives).
How do these tools work?
1. You generate some content in Markdown. Your static site processor, in my case Hugo, converts it to static HTML, ready to be served to visitors.
2. A script needs to be run to create **an index** of your content---either by processing the `.md` source, or the `.html` target. The result is usually a fairly large `.js` file.
3. On a search page, you include two `<script/>` tags: the index file and the tool that uses it. When users enter a query, the search happens **client-side** in JS code, as opposed to submitting a real form like with search engines or Wordpress.
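For Pagefind, step 3 looks roughly like the snippet below: a sketch of dropping in its bundled default UI. The `/_pagefind/` output path and the `PagefindUI` options are assumptions based on the current 0.x defaults; double-check them against the version you install.

```html
<!-- Search page snippet: Pagefind's bundled UI.
     Paths assume pagefind wrote its files to /_pagefind/ in the site root. -->
<link href="/_pagefind/pagefind-ui.css" rel="stylesheet">
<script src="/_pagefind/pagefind-ui.js"></script>

<div id="search"></div>
<script>
  window.addEventListener("DOMContentLoaded", () => {
    // The UI lazily fetches the small index and per-result fragments as you type.
    new PagefindUI({ element: "#search" });
  });
</script>
```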
The problem with step 2 is the index file itself, as it can quickly grow in size. Furthermore, I automatically checked in the changes to the `brainbaking-index.js` file, needlessly bloating the git repository. Even with gzip compression in mind, I found Lunr.js not to be the best approach.
Instead, Pagefind uses **fragmentation**. It never requires the inclusion of a single huge index file. Instead, a tiny JS file (`8.05 kB`) fetches a minimal index file only after you start typing (`45.66 kB`), and for each result to be displayed (usually limited to five), fetches a _fragment_ of the indexed content (between `5` and `8.5 kB`) and, optionally, a (currently non-optimized) thumbnail. The result is a blazing-fast search-as-you-type system that's still self-hosted, highly optimized, and doesn't require a page submit.
Try it out yourself [at /archives](/archives).
There are a few obvious disadvantages of using Pagefind. For one, it's very bleeding edge, currently at [version 0.5.3](https://github.com/CloudCannon/pagefind/releases). Custom placeholder text, proper internationalization support, and more custom options are currently missing, but it is possible to use the lower-level API and come up with something cool yourself. I took a stab at it but decided that most of the default stuff is just fine.
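If you do want to roll your own front-end, that lower-level API is a thin JS module. The sketch below shows the general shape; the `/_pagefind/` path is an assumption from the default output location, and the query and result fields should be verified against the Pagefind docs for your version.

```html
<script type="module">
  // Lower-level Pagefind API sketch: import the generated module directly
  // (path assumes pagefind's default output directory).
  const pagefind = await import("/_pagefind/pagefind.js");

  // search() returns lightweight result handles; each fragment is only
  // fetched when you call .data() on a result.
  const search = await pagefind.search("hugo");
  const results = await Promise.all(
    search.results.slice(0, 5).map((r) => r.data())
  );
  for (const result of results) {
    console.log(result.url, result.excerpt);
  }
</script>
```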
The other downside is that it still requires you to run another executable---this is the JAMStack part, so to speak---after Hugo is done generating. I have a simple shell script that is triggered every hour:
```sh
#!/bin/bash
sites=( brainbaking jefklakscodex redzuurdesem )
export WEBMENTION_TOKEN="supersecret"

echo "building at $(date)... with $1"
for site in "${sites[@]}"
do
    echo "building site $site"
    cd "/var/dev/$site" || exit 1
    git reset --hard
    # only rebuild when the pull brought in new commits, unless --force is passed
    RESULT=$(git pull | grep 'Already up to date')
    if [[ -z "$RESULT" ]] || [[ $1 == "--force" ]]
    then
        /usr/local/bin/hugo --cleanDestinationDir --destination docs
        /usr/local/bin/pagefind --source docs
        rsync --archive --delete docs/ "/var/www/$site/"
        yarn install
        yarn run postdeploy
    else
        echo "nothing to do for $site"
    fi
done
echo "done building."
```
This boils down to:
1. Execute `hugo`, dump HTML output in `docs/`
2. Execute `pagefind`, scour through `docs/` and dump index/JS/fragments in there as well
3. Copy over new files using `rsync` to the deployed location for Nginx to pick up
4. Run `yarn` for an optional post-deploy step. This contains webmention sending.
Pagefind is a self-contained Rust binary, but I had to build it from source for my MacBook, as there's no released ARM64 artifact available. You also have to install it on your web server---although even that is optional: you can run the `pagefind` command locally and simply check in all changes. I did that before with Lunr.js, but do not recommend it: the slightest change to your blog triggers a commit of the index file.
---
Is all this trouble worth it? I'm not sure. [Rubenerd's Archives page](https://rubenerd.com/archives/) resorts to another technique: simply let a _real_ search engine do the searching. By embedding a DuckDuckGo `<form/>`, you delegate all the above to another party, decreasing the complexity of your build pipeline and website theme code. It's worth noting that this alternative works _even with JavaScript disabled in the browser!_ I had to put in a `<noscript/>` tag to bring JS-haters bad news: they can't search.
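A form in that spirit can be surprisingly small. The sketch below uses DuckDuckGo's `sites` parameter to restrict results to one domain; `brainbaking.com` is just an assumption here---substitute your own domain.

```html
<!-- Delegated site search, in the spirit of Rubenerd's archives page.
     Works without any JavaScript: submitting hands the query off to DuckDuckGo. -->
<form action="https://duckduckgo.com/" method="get">
  <input type="search" name="q" placeholder="Search this site...">
  <!-- "sites" scopes the results to a single domain (assumed: brainbaking.com) -->
  <input type="hidden" name="sites" value="brainbaking.com">
  <button type="submit">Search</button>
</form>
```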
On the other hand, DuckDuckGo doesn't immediately index new posts, and you still route users away from your site with a form submit. In the end, Ruben's approach is probably the easiest, albeit the less immersive, option. You'll have to decide for yourself whether or not to go for it. I still like Pagefind's relative simplicity and even [implemented it on my other sites](https://jefklakscodex.com/tags/) that didn't have a search option before.