bookkeeping tagfest

Wouter Groeneveld 2024-01-15 09:53:04 +01:00
parent 5049c684a2
commit 34c827885c
2 changed files with 88 additions and 0 deletions

@ -0,0 +1,88 @@
---
title: Semi-Automating Household Bookkeeping
date: 2024-01-15T09:00:00+01:00
categories:
- learning
tags:
- python
- bookkeeping
---
We've been keeping track of our household expenses on and off over the past decade (more _off_ than _on_), as it's yet another huge chore to pour everything into tables and extract useful information out of it. At first, we scribbled everything in a big expenses notebook, but after a year of frantically trying to write down every transaction, that simply became unmanageable. Then I resorted to Excel[^exc] and started from an exported bank statement log. Now I have finally semi-automated that process, but what remains, the tagging, is still a good chunk of manual labor.
[^exc]: I know there are many cool-looking command-line expense tracking tools out there, but my wife should also be able to easily see what's what, so Excel it is.
Bookkeeping your household expenses has to have a purpose. What questions are you trying to answer? Which patterns are you trying to uncover? For us, these are:
1. How much can we save on average each month?
2. How much do we spend on certain groups, for instance food and drinks or baby supplies?
3. How do the expenditures compare to previous years, taking into account life changes and index adjustments?
4. Are there any big discrepancies in expenses we overlooked?
5. Where can we make smart cuts?
The first question is easy to answer: add up every plus and minus and compare both for each month. The second question is another matter. To answer it, I categorize each expense into one of the following groups: `food/drinks`, `hobby`, `health`, `clothes etc`, `pets`, `once/year`, `utility`, `car`, `baby`, `house`, `other`. Depending on the name of the counterparty and the statement itself on the bank transcript, this can be done automatically. For instance, if they contain the name of our veterinarian, the expense should be labeled `pets`, while if it comes from Carrefour, Delhaize, or Bio-Planet (local supermarkets), it should be labeled `food/drinks`.
I used to label every single statement by hand, but that's just silly: the majority of the transactions, like the weekly grocery shopping, are recurring expenses. So I whipped up a Python script that converts the downloaded export file from our bank into the format and layout I want. It relies on Pandas and [XlsxWriter](https://xlsxwriter.readthedocs.io) and contains a dictionary of categories with arrays of possible regex matches to auto-label each row:
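```python
import pandas as pd
# `file` is the path to the Excel export downloaded from the bank
df = pd.read_excel(file)
rows = []
for rownr, row in df.iterrows():
    # fill in a description when the statement lacks a sensible one
    opm = enhance_description(row['statement'])
    # income (amount > 0) stays unlabeled; expenses get a guessed category
    cat = '' if row['amount'] > 0 else guess_category(str(row['name']) + opm)
    # keep the date up front so the rows can be sorted chronologically later
    rows.append((row['date'], ['', row['date'].strftime('%d-%m'), str(row['name']), row['amount'], opm, cat, '']))
```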
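The helpers called in the loop are not part of the excerpt. As a rough idea of the category lookup, here is a minimal sketch of what `guess_category()` could look like, assuming a dictionary that maps each category to a list of regex patterns. The name `CATEGORY_PATTERNS`, the patterns themselves, and the sample strings are made up for illustration and are not taken from the actual script:

```python
import re

# Hypothetical category table: the patterns are illustrative only.
CATEGORY_PATTERNS = {
    "food/drinks": [r"carrefour", r"delhaize", r"bio[- ]?planet"],
    "pets": [r"dierenarts"],  # stand-in for the veterinarian's name
    "hobby": [r"kobo"],
}


def guess_category(text):
    """Return the first category with a pattern found in text, or '' if none match."""
    for category, patterns in CATEGORY_PATTERNS.items():
        if any(re.search(pattern, text, re.IGNORECASE) for pattern in patterns):
            return category
    return ""  # left empty so the row can be tagged by hand


# Made-up counterparty names and descriptions
print(guess_category("DELHAIZE LEUVEN" + " card payment"))
print(guess_category("Amazon Mrktplc Luxembourg"))
```

Returning an empty string for unknown counterparties keeps the cell blank in the spreadsheet, so the rows that still need manual tagging are easy to spot.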
`enhance_description()` adds information to the description when it's missing, as our local post office loves to omit a sensible description. That was one of the most puzzling statements, with name `MRS 285100 3983`. What? Indeed. `guess_category()` checks whether the name or description contains certain keywords to automatically set the correct label. To easily sort on date, since I want a nice overview of each month, I resort to a tuple to temporarily store the date itself. Then, XlsxWriter exports it:
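```python
# assumes rows still holds the (date, cells) tuples built above:
# sort on the date, then keep only the cells
rows.sort(key=lambda r: r[0])
# DataFrame.append() was removed in pandas 2.0, so build the frame in one go;
# cols is the list of column headers, one per cell
df = pd.DataFrame([cells for _, cells in rows], columns = cols)
writer = pd.ExcelWriter(filename, engine="xlsxwriter")
df.to_excel(writer, sheet_name="Sheet1")
```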
You can even add extra conditional formatting and data validation:
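```python
workbook = writer.book
worksheet = writer.sheets["Sheet1"]
# highlight positive amounts (income) in green
green = workbook.add_format({"bg_color": "#BBFFBB"})
worksheet.conditional_format('E2:E999', {'type': 'cell', 'criteria': '>', 'value': 0, 'format': green})
# turn the category column into a dropdown fed by I2:I<i>; i is the last row of that list
worksheet.data_validation("G2:G999", {"validate": "list", "source": "=$I$2:$I$" + str(i)})
writer.close()  # saves the workbook to disk
```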
Here, the cell range `$I$2:$I$` followed by `i`, the last row of that list, contains all possible categories I listed before.
Are we done now? Not quite.
**What about credit card settlements**? In Belgium, they appear once a month as, for instance, "MasterCard XXXYYY" with a certain amount, and that's it. I don't know about you, but I can't remember what that `€59,89` was for, even a week or two after the purchase, and since it's not in the statement export, I had to manually enrich the data by also downloading my monthly MasterCard summaries. The problem gets worse: those are PDFs. If we want to keep it simple, forget parsing. I try to avoid using my credit card anyway, and you'll have to decide whether or not to split the statement row into multiple rows depending on what you used your card for that month.
**What about Amazon-esque statements**? This includes PayPal and the Dutch Bol.com. Do you know what these are?
```
14/11 PayPal (Europe) S.a r.l. et Cie, S. -9,99
23/11 Amazon Mrktplc Luxembourg -143,43
```
Good luck trying to identify these. Again, you'll need more information: log into your PayPal/Amazon account to find out more. The first line turned out to be our Kobo eBook subscription, and the second one a bulk purchase of some Christmas gifts: the first one got labeled with `hobby`, the second with `other`.
Back to question number 2: after the labeling is done, expenses are grouped by category for each month, and then averages for the entire year are calculated. That's all put into a summary sheet (the numbers here are fictional):
![](../bookkeeping-sheet.jpg "An example of what a summary sheet for each category would look like.")
With a bit of simple conditional formatting, question 4 can also be answered, as it's quite clear that in the above example, we spent too much on our hobbies in November (`-€2000`). I also toss in a few useless graphs to visualize our gradual spending in the `food/drinks` category as the year progresses, and a bar chart to see which category is the most expensive (not depicted). Remember, if your `other` column has big numbers, you're probably in need of a new category.
In 2023, we spent about 15% of our income on `food/drinks`, which is exactly as much as on `utility` (including the mortgage). On average, our two cats and dog cost `€230` a month, a surprisingly high number, since one of the cats and the dog reached senior age and started having health issues. You could go even further and make subcategories: for example, right now I don't know how much of that 15% comes from restaurants, but that's easy to fix with additional filters.
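I build the summary in Excel itself, but the same grouping can also be sketched in Pandas. The snippet below is only an illustration under assumptions: the column names `date`, `amount`, and `category` and all numbers are made up. It computes the monthly balance for question 1 and a month-by-category table with averages for question 2:

```python
import pandas as pd

# Made-up labeled rows; column names are assumptions, not the real export layout.
labeled = pd.DataFrame({
    "date": pd.to_datetime(
        ["2023-01-05", "2023-01-20", "2023-02-03", "2023-02-10", "2023-02-25"]
    ),
    "amount": [-120.0, -80.0, -100.0, -50.0, 2000.0],
    "category": ["food/drinks", "pets", "food/drinks", "pets", ""],
})
labeled["month"] = labeled["date"].dt.month

# Question 1: add up every plus and minus per month.
net_per_month = labeled.groupby("month")["amount"].sum()

# Question 2: one row per month, one column per category (expenses only).
expenses = labeled[labeled["amount"] < 0]
summary = expenses.pivot_table(
    index="month", columns="category", values="amount", aggfunc="sum", fill_value=0
)

# Average monthly spending per category over the months present.
monthly_average = summary.mean()

print(net_per_month)
print(summary)
print(monthly_average)
```

A table like `summary` can be written to a second sheet with `to_excel()`, where the conditional formatting from earlier can flag outliers.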
---
This might seem like yet another nerdy numbers project, but the yearly Excel file does give us a lot of reassurance. We can _very_ quickly calculate how much of a dip our savings will take if our income were to take a hit. We know precisely how much it takes financially to raise our daughter (a lot). We can relate yearly indexation changes to our absolute expenses and see whether the ratio holds up compared to previous years. We know that owning a car cost more than €300 a month last year.
If you don't already have a household bookkeeping system in place, I highly recommend setting one up as soon as possible, starting with the simplest possible dataset: an export of your monthly bank statements. For us, that was simple: we only have one shared current account.
Happy tagging!

Binary file not shown.
