two interesting use cases for llms

This commit is contained in:
Wouter Groeneveld 2024-02-20 21:06:10 +01:00
parent b97352ca5f
commit c84ac6405f
3 changed files with 67 additions and 0 deletions

@@ -3,6 +3,8 @@ title: "Is Your Website Training AI?"
date: 2023-04-23T14:45:00+02:00
categories:
- webdesign
tags:
- AI
---
The answer will inevitably be _yes_. Both [Jan-Lukas](https://jlelse.blog/links/2023/04/secret-list-of-websites) and [Ton](https://www.zylstra.org/blog/2023/04/how-many-tokens-from-your-blog-are-in-googles-llm/) wrote about the Washington Post's [Inside the secret list of websites that make AI like ChatGPT sound smart](https://www.washingtonpost.com/technology/interactive/2023/ai-chatbot-learning/) today. The Post investigated a massive dataset called the English [Colossal Clean Crawled Corpus](https://www.semanticscholar.org/paper/Documenting-the-English-Colossal-Clean-Crawled-Dodge-Sap/40c3327a6ddb0603b6892344509c7f428ab43d81?itid=lk_inline_enhanced-template) or C4, which drives language model AIs such as Google's T5 and Facebook's LLaMA. OpenAI's ChatGPT presumably uses forty times as much data but OpenAI doesn't want to disclose where it comes from.

@@ -0,0 +1,65 @@
---
title: "Two Interesting Use Cases For LLMs"
date: 2024-02-20T20:11:00+01:00
categories:
- braindump
tags:
- AI
---
I've openly proclaimed my dislike for current trends in AI and Large Language Models (LLMs) before: they're being misused to [generate crap to put on the internet](/post/2023/04/is-your-website-training-ai/) and the availability of hallucinated crap [makes students' learning painfully worse](/post/2023/10/chatgpt-is-worse-for-students-than-stack-overflow/). So I've been wondering: can these ChatGPT-like systems be put to any _real_ use? I think the answer is _yes_: here are two possibly interesting use cases.
## Case 1: Personal Knowledge Management
I have [sixteen analog notebooks](/post/2023/03/creating-journals-that-last/) digitized in an Obsidian vault, in which each note file has been painstakingly tagged by hand to quickly find related information, since OCR software still isn't capable of decrypting my arcane scribbles. Heavy linking and creating ideas at the intersection of those links in true Zettelkasten style aside, wouldn't it be awesome if I could locally spin up an LLM system that ingests my whole knowledge base, significantly easing (1) finding my own notes and (2) querying that database in a natural way?
Are you thinking what I'm thinking, Tony Stark?
![](../jarvis.jpg)
> Jarvis, which blog post ideas in the past month aren't written yet? Jarvis, what goes in my secret gingerbread spice mix again? Jarvis, what were my thoughts on project x again?
Wouldn't that be great? Things I most often look for in my Obsidian vault are:
- Recipes: exact spice mixture percentages, comparisons of waffle recipes, etc;
- TODOs and blog post idea templates;
- Specific information on technologies I've once used but forgotten;
- Photos by tag;
- Contact information of people that I haven't stored in CardDAV yet;
- Summaries and mind maps of e.g. books or papers.
Looking for these things sometimes takes too much time because of my poor tagging skills, and the ever-growing number of notes makes it increasingly hard to link them thoroughly. On top of that, I sometimes can't read my own notes (bad scans, bad handwriting, too little context that I forgot, ...). LLMs could potentially solve all these issues.
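At its simplest, the retrieval half of such a local setup boils down to ranking note files against a query — a local model would then answer on top of those hits. A minimal standard-library sketch with a naive TF-IDF score; the file names and scoring are illustrative, and a serious setup would use embeddings instead:

```python
import math
import re
from collections import Counter
from pathlib import Path

def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(vault: Path) -> dict[str, Counter]:
    """Map each .md note in the vault to its term counts."""
    return {p.name: Counter(tokenize(p.read_text(encoding="utf-8")))
            for p in vault.rglob("*.md")}

def search(index: dict[str, Counter], query: str, top: int = 3) -> list[str]:
    """Rank notes by a simple TF-IDF score for the query terms."""
    n = len(index)
    results = []
    for name, terms in index.items():
        score = 0.0
        for t in tokenize(query):
            df = sum(1 for counts in index.values() if t in counts)
            if df:  # term frequency times inverse document frequency
                score += terms[t] * math.log(n / df)
        if score:
            results.append((score, name))
    return [name for _, name in sorted(results, reverse=True)[:top]]
```

Pointed at a vault, `search(build_index(Path("vault")), "waffle spice percentages")` would surface the recipe notes — no tags needed, though still nowhere near answering a Jarvis-style question.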
Of course the biggest concern here is **privacy**. I have zero interest in feeding my private notes to an existing system maintained by Silicon Valley tech boys, meaning I'd want to run and train it locally. I haven't freed up the time to experiment with this, as I know it'll take a lot of fiddling, and the real bottleneck remains the analog-to-digital conversion: the textual input is still minimal (no OCR, limited tags). And no, I won't switch to digital note-taking.
I guess the Private Jarvis dream will stay a dream for now.
## Case 2: Philosophical research
I'm interested in philosophy, but it's a challenge to find what I'm looking for in ancient works. An example: I want to know what certain philosophers' stances are on laziness, nostalgia, whatever. I have Montaigne's _Essais_ on my shelf but the index doesn't provide an answer right away. A digital version enables `CMD+F`-ing to quickly find stuff but that still isn't very productive. So I thought: what if we could ask these questions through LLMs or a specialized model that is exclusively fed philosophical works? Won't that make it easier for me to find e.g. what Schopenhauer thinks of leisure time? Internet search engines do a poor job answering these questions nowadays.
I put https://anonchatgpt.com/ and https://www.bing.com/chat to the test. Input:
> What does Schopenhauer think about poker playing?
Output, CoPilot:
> Arthur Schopenhauer, the German philosopher, had some rather scathing views on card-playing, including poker. In his work "The Wisdom of Life and Other Essays," he expressed the following opinion: "Hence, in all countries the chief occupation of society is card-playing, and it is the gauge of its value, and an outward sign that it is bankrupt in thought. Because people have no thoughts to deal in, they deal cards, and try and win one another's money. Idiots!"
Output, ChatGPT:
> Schopenhauer believed that poker, like all forms of gambling, is a futile and reckless pursuit driven by the desire for quick and easy wealth. He saw it as a manifestation of the human tendency to seek pleasure and avoid pain without regard for the consequences. Schopenhauer would likely view poker playing as a form of escapism and distraction from the harsh realities of life, ultimately leading to disappointment and suffering.
At least CoPilot adds references (GoodReads & Wikipedia). Both forget to add that Schopenhauer himself was a big poker player, earning a good buck with the game he thought was for empty-headed losers. Or did he? When I ask CoPilot this, the AI denies it, but other AIs confirm it. As I have yet to work my way through all of his works, I don't really know which LLM engine is hallucinating and which one is telling the truth.
I've been discussing the possibilities of using LLMs to engage in a philosophical conversation with [Gibru](https://elusivewordsmith.com/), who's more at home with new (self-hosted) LLM tools, and while he thinks it can be a good way to bootstrap your philosophical thinking mode, he also warned me about hallucination.
That means my use case also falls through: I wanted to use LLMs to quickly _find facts_, not have a discussion---although that might also be interesting just to get the ball rolling. Since I don't know the facts yet, I'd have to believe every answer the system provides. Additionally, only if I run these locally can I train them on the original sources, not on second-hand Wikipedia-style "sources".
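Grounding a local model on the original sources would, in practice, come down to stuffing retrieved passages from the primary texts into the prompt and instructing the model to answer from those alone. A hypothetical sketch; the template wording and the (source, excerpt) pair format are my own assumptions, not any particular tool's API:

```python
def grounded_prompt(question: str, passages: list[tuple[str, str]]) -> str:
    """Build a prompt from (source, excerpt) pairs taken from primary texts."""
    context = "\n\n".join(f"[{src}]\n{text}" for src, text in passages)
    return (
        "Answer using ONLY the sources below. "
        "Cite the source in brackets; otherwise say 'not in the sources'.\n\n"
        f"{context}\n\nQuestion: {question}\nAnswer:"
    )
```

Whether the model actually obeys the instruction is another matter, but at least the answer can be checked against the cited passage instead of against a hallucinated Wikipedia summary.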
---
What would Epicurus think of the Nintendo Switch if he were alive today?
> The immersive gaming experience, captivating worlds, and engaging gameplay could align with Epicurus's emphasis on seeking pleasurable experiences.
Yeah right. That sounds like yet another misinterpretation of Epicurus' principles of ataraxia. I guess the two interesting use cases will stay just that: interesting philosophical thinking exercises.

Binary file not shown.