in academia, we write throwaway code

Wouter Groeneveld 2023-11-10 15:21:42 +01:00
parent 0dbbe26f7d
commit 554887edaa
1 changed files with 44 additions and 0 deletions

@ -0,0 +1,44 @@
---
title: "In Academia We Write Throwaway Code"
date: 2023-11-10T14:07:00+01:00
categories:
- programming
tags:
- code quality
- academia
---
Something scary dawned on me recently after having peeked into several source code repositories of funded projects here in academia. Most of these repositories contain code that would instantly make my software developer ex-colleagues sick---and by sick, I mean suddenly-needing-a-bucket sick. What gives?
In academia, we write throwaway code. There's nothing inherently wrong with that. For most projects, that is. Some of them, however, we want to keep in the air for just a little while longer: the adoption rate gradually increases, and the scope is ultimately stretched beyond the lifetime of a typical project. In that case, leaning on something called _clean code_ to increase maintainability is no longer a luxury: it's a requirement. Sadly, that's never taken into account during the development of these prototypes---because that's what these are.
The code I write for my research has the following purposes:
1. It queries and stores data. Bespoke surveys with specific drag-and-drop components fall into this category. In case I help out an engineering or computer science colleague: it produces data.
2. It manipulates, aggregates, and summarizes data. The program and its output are mostly used once, as part of the publication process.
3. It presents data. Simple websites to help disseminate the results are put online, sometimes also as an appendix to a paper.
Presenting data doesn't really come with domain logic. Manipulating and producing data is usually a one-time job where hacking together a few Python scripts can be called good enough. Then it makes sense to question the whole test-first approach: if all there is to do is slap together a few `map()`, `join()`, and `filter()` calls that will only be used once as part of a short-lived project where the budget is---as always---too tight, then who cares about re-usability?
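To make that concrete, here's roughly what such a one-off script tends to look like. This is a made-up sketch, not code from any actual project: the `responses` data and the `summarize()` helper are invented purely for illustration.

```python
# Hypothetical survey responses: (participant id, years of experience, score).
# Invented for illustration only.
responses = [
    ("p01", 2, 3.5),
    ("p02", 7, 4.0),
    ("p03", 0, 2.5),
    ("p04", 12, 4.5),
]


def summarize(rows, min_years=1):
    """Return one 'id: score' line per participant with enough experience."""
    # filter(): keep only participants with at least `min_years` of experience.
    experienced = filter(lambda row: row[1] >= min_years, rows)
    # map(): turn each remaining row into a printable line.
    lines = map(lambda row: f"{row[0]}: {row[2]:.1f}", experienced)
    # join(): glue the lines together into the final report.
    return "\n".join(lines)


if __name__ == "__main__":
    print(summarize(responses))
```

Nobody is going to write a test suite for this. About the only cheap concession is that the pipeline lives in a function instead of at module level, so a test could still be bolted on later if the script unexpectedly survives.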
Sometimes, though, the querying tool gets expanded to query a different population in a follow-up project, or even gets commercialized in the hope of generating a bit more much-needed research funding. Revisiting that source code, which in academia happens years later thanks to painfully slow lead times, then becomes a real ordeal. Not only because the code is throwaway code, but also because chances are the project was passed on to you, as its creator has probably left academia. Congrats, you're now the proud maintainer of something unmaintainable! Now what?
Good question, and honestly, I don't know the answer. If time and budget allow for it, trash the whole thing and re-write it properly, taking into account that it's now a piece of software that should outlive the research project. Unfortunately, time and budget never allow for it. Now imagine the code we write in academia isn't throwaway code, but instead properly tested, easily maintainable, clean code (lol). If that were the case, the above problem would never surface...
This again touches upon a lot of philosophically oriented programming questions I've been struggling with during my career as a software engineer:
- [When is it appropriate not to unit test?](https://softwareengineering.stackexchange.com/questions/66480/when-is-it-appropriate-to-not-unit-test)
- Is it fine not to unit test prototypes? [In an SE-Radio episode Kent Beck says yes](http://www.se-radio.net/2010/09/episode-167-the-history-of-junit-and-the-future-of-testing-with-kent-beck/).
- Should you unit test R? [Some data scientists do it](https://towardsdatascience.com/unit-testing-in-r-68ab9cc8d211), (most) others don't. ([I Pity The Fool Who Doesn't Write Unit Tests](https://blog.codinghorror.com/i-pity-the-fool-who-doesnt-write-unit-tests/)!)
- Most agree that [writing tests for one-time scripts](https://www.quora.com/As-a-software-engineer-would-you-add-unit-tests-for-one-time-scripts) is unnecessary. Except that these "one-timers" might become "multiple-timers", and retroactively injecting quality becomes next to impossible...
It seems that I'm not the only one struggling with this. [Ben at Stack Exchange](https://academia.stackexchange.com/questions/21276/best-practice-models-for-research-code) explains:
> I sometimes feel as though my industrial experience has been a hindrance in my research, as the goals of writing software in a research context feel contradictory to the goals in industry, [where] code needs to be (ideally): maintainable, bug-free, refactored, well-documented, rigorously tested---good quality---best practice says that these things are worth the time (I agree).
He continues:
> In academia, the goal is to write as many quality research papers in the shortest possible time. In this context, code is written to run the experiment, and might never be looked at again (we are judged on our papers, not our code). There seems to be no motivation to write tested, maintainable, documented code---I just need to run it and get the result in my paper or whatever ASAP. Consequently, the "academic" code I've written is poor quality, from a software engineering perspective.
The first person replying throws in a hilarious link: [why do so many talented scientists write horrible software?](https://academia.stackexchange.com/questions/17781/why-do-many-talented-scientists-write-horrible-software) Exactly. [Researchers are not programmers](https://danielhnyk.cz/academia-people-are-terrible-programmers/) (and in my experience also lousy programming teachers). But wait. I've been a programmer for 11 years. I've also been a researcher for the last 5 years. Am I slowly becoming one of those horrible software developers?
Wow. This post has just taken a turn for the worse...