trends in job ads: the analysis

Wouter Groeneveld 2023-11-30 09:35:44 +01:00
parent 69c038e38d
commit 9ecf0de457
3 changed files with 101 additions and 2 deletions

@@ -12,7 +12,7 @@ Almost every few weeks, a new and exciting retro hardware project is announced.
An FPGA or _Field Programmable Gate Array_ is a chip that can be "programmed in the field"---unlike a typical ASIC or _Application-Specific Integrated Circuit_ chip that is engineered to do only one thing. This means that if you get your hands on an FPGA board, you can flash the chip on it to act like a Game Boy. Or a Mega Drive. Or a Commodore 64. Or anything else you can possibly think of, within the limits of that particular board.
Xilinx, one of the most popular manufacturers of such chips, [sells development and evaluation kits](https://www.xilinx.com/products/boards-and-kits/device-family/nav-artix-7.html) at very acceptable prices, depending on the power of the chip you're after. If you're just looking to mess around with the FPGA concept, you can treat yourself with a very hackable [PYNQ board](http://www.pynq.io/). These boards come with all manner of trinkets---8 DMA channels, high-and low-bandwidth controllers, an FPA comparable to the above Artix-7, 512 MB DDR3---and are specifically built for rapid prototyping and teaching. We use them at our faculty to introduce students to various concepts of digital electronic design.
Xilinx, one of the most popular manufacturers of such chips, [sells development and evaluation kits](https://www.xilinx.com/products/boards-and-kits/device-family/nav-artix-7.html) at very acceptable prices, depending on the power of the chip you're after. If you're just looking to mess around with the FPGA concept, you can treat yourself with a very hackable [PYNQ board](http://www.pynq.io/). These boards come with all manner of trinkets---8 DMA channels, high-and low-bandwidth controllers, an FPA comparable to the above Artix-7, 512 MB DDR3---and are specifically built for rapid prototyping and teaching. We use them at our faculty to introduce students to various concepts of digital electronic design and hardware/software co-design.
Even if you're not that technical, I'm sure you've at least heard of the term "FPGA", as it has the tendency to pop up everywhere, especially in groups discussing retro computing. Why? Because you don't need a powerful FPGA chip to emulate an old embedded 8-bit CPU. In other words: slap an FPGA on a board, flash it to perfectly emulate your favorite retro hardware, and you've got yourself a beefed up Game Boy/Commodore/Whatever! A few examples of FPGA-enabled retro hardware projects I've encountered in the last few years:
@@ -33,7 +33,7 @@ However, for other either more finicky (think CPU cycle timing, vertical/horizon
If you were to play the NES game _Mario Bros. 3_ on your official NES Mini, you would be playing it through a software emulator on an Allwinner R16 4x Cortex A7 ASIC system-on-a-chip with 256 MB DDR3 RAM running on 512 MB flash storage. That quad core is, in essence, a conventional mini-computer that's more than powerful enough to emulate 8-bit systems. But if you were to play _Mario Bros. 3_ using an FPGA core on the MiSTer or the Analogue Pocket, that software layer is absent: its hardware behaves exactly like the original NES did.
For those interested in slight latency differences of specific hardware systems, YouTube has got you covered, hosting a plethora of [MiSTer FPGA vs Mednafen emulation](https://www.youtube.com/watch?v=StqrUPawyJI)-alike videos, in this case taking a close look at PC Engine/TurboGrafx 16 software vs. hardware emulation. I honestly don't think I would notice the difference when playing.
For those interested in slight latency or image quality differences of specific hardware systems, YouTube has got you covered, hosting a plethora of [MiSTer FPGA vs Mednafen emulation](https://www.youtube.com/watch?v=StqrUPawyJI)-alike videos, in this case taking a close look at PC Engine/TurboGrafx 16 software vs. hardware emulation. I honestly don't think I would notice the difference when playing.
---

@@ -0,0 +1,99 @@
---
title: "Current Trends In Local Software Development Job Advertisements"
date: 2023-11-30T09:30:00+01:00
categories:
- programming
tags:
- job ads
---
As I once again find myself staring at local software dev job ads, I can't help but wonder: _what are the current trends in local software dev ads?_ In other words, can we identify patterns by data mining job ads? The answer is, of course, yes, but the results are disappointingly comparable with the last time I was flipping through ads, in 2014---nine years ago! Java programming jobs are still the most popular ones, followed by the .NET stack, and HR still thinks recurring buzzwords such as _agile_ and _teamwork_ should be heavily sprinkled throughout the ad descriptions. Let's take a closer look.
## Data mining approach
Instead of digging through company-specific `/jobs` pages, I decided to scrape from the aggregate giant `indeed.com`, except that scraping wasn't that easy as the whole site is a big ball of JavaScript mud---presumably to make scraping more difficult. Most [GitHub Indeed scraping projects](https://github.com/topics/indeed-scraping) stopped working because of that. I approached this semi-automatically by just pasting snippets of JS code in the browser, although WebDriver-alike headless browser steps could be taken to fully automate this. My IP was blocked after a few automated `curl` attempts: their rate-limiter is very aggressive.
The Indeed URL used was https://be.indeed.com/jobs?q=software+developer&l=hasselt&start=1 meaning I searched for "**software developer**" near (within 25 miles) location "**Hasselt**", near where I live. The results are paginated; fiddling with the start parameter (multiply by ten) flips to the next 15 results. Clickable small cards summarizing search results are presented to the visitor, but these do not contain the full ad description, rendering the data useless. Instead, we're interested in the `job_id` and have to find the rest of the info in a separate URL, https://be.indeed.com/viewjob?jk=jobid. Execute this JS:
```js
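// Build the detail-page URL for every result card on the current search page.
[...document.querySelectorAll('.tapItem')]
  .map(itm => [...itm.classList].find(cls => cls.startsWith('job_')))
  .filter(Boolean) // skip cards that carry no job_<id> class
  .map(cls => cls.replace('job_', 'https://be.indeed.com/viewjob?jk='))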
```
And simply click on each link by holding down `CMD` to open them in new tabs. For each tab, execute this JS:
```js
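// Run on a job detail page: collect the fields needed for the analysis.
const job = {
  title: document.querySelector('.jobsearch-JobInfoHeader-title').textContent,
  location: document.querySelector('.jobsearch-CompanyInfoWithoutHeaderImage').textContent,
  description: document.querySelector('#jobDescriptionText').textContent
}
job // evaluates to the object so the console prints it, ready for "copy object"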
```
Resulting in a JSON object containing the title, location, and full description. Right-click, select "copy object", and paste in your text editor, adding a comma (`,`) in-between results to form a giant result array. Fifteen minutes of mindless copy-pasting later, you've got yourself 80+ scraped jobs. Not bad. After that, it's a matter of finding occurrences of terms, but first, a few extra steps of data cleaning are needed:
1. Remove false positives. Mine contained titles like "business manager", "SAP consultant", or "product owner".
2. Remove additional "ad spam". Some descriptions cram in as many cool terms as possible, as if that would improve the "job ad SEO", which muddles the results.
3. Enrich data if needed. I manually analyzed each one and added flags `recruitment` and `consultancy`.
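The first and third steps could be sketched roughly as follows. The `scraped` sample, the `irrelevantTitles` list, and the default `false` flags are assumptions for illustration only; in practice the flags were set by hand after reading each ad.

```js
// Hypothetical sample of the pasted result array; real entries come from the copy-paste step.
const scraped = [
  { title: 'Java Developer', location: 'Hasselt', description: 'Spring and Hibernate experience.' },
  { title: 'SAP Consultant', location: 'Genk', description: 'SAP rollout for a large customer.' },
  { title: 'Business Manager', location: 'Hasselt', description: 'Sales targets and accounts.' },
];

// Assumed ignore list, based on the false positives mentioned above.
const irrelevantTitles = ['business manager', 'sap consultant', 'product owner'];

// An ad is relevant when its title contains none of the ignore-listed phrases.
const isRelevant = job =>
  !irrelevantTitles.some(t => job.title.toLowerCase().includes(t));

// Keep relevant ads and add both flags, defaulting to false until reviewed by hand.
const jobs = scraped
  .filter(isRelevant)
  .map(job => ({ recruitment: false, consultancy: false, ...job }));

console.log(jobs.map(j => j.title));
```

Spreading `job` last means a flag that was already set by hand on an entry is not overwritten by the default.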
Then, `jobs.filter(x => x.description.toLowerCase().indexOf(code) >= 0)`, with `code` being the lowercase term to look for, usually does what it needs to do.
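To turn that filter into an occurrence percentage, something like the sketch below would work. It is a variation on the one-liner above, not the exact code used for the results: it matches whole words instead of substrings (so `java` does not also count `javascript`, and `sql` does not count `nosql`) and accepts several aliases per term, such as `aws`/`amazon`. The sample `jobs` array and the `occurrence` helper are hypothetical.

```js
// Hypothetical cleaned ads; only the description matters here.
const jobs = [
  { description: 'Java developer with Spring and SQL knowledge.' },
  { description: 'C# .NET engineer, Azure cloud, JavaScript frontend.' },
  { description: 'Backend developer: Java, AWS, NoSQL.' },
  { description: 'Python developer for data pipelines on Amazon cloud.' },
];

// Escape regex metacharacters so terms such as ".net" are matched literally.
const escapeRegex = s => s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');

// Percentage of ads that mention at least one alias as a whole word.
function occurrence(jobs, ...aliases) {
  const pattern = new RegExp(
    `(?<![a-z0-9])(${aliases.map(escapeRegex).join('|')})(?![a-z0-9])`, 'i');
  const hits = jobs.filter(job => pattern.test(job.description)).length;
  return Math.round((hits / jobs.length) * 100);
}

console.log(occurrence(jobs, 'java'));          // 50
console.log(occurrence(jobs, 'aws', 'amazon')); // 50
console.log(occurrence(jobs, 'sql'));           // 25
```

Whether whole-word matching is preferable depends on the term: it would miss compounds such as "javaontwikkelaar", which the plain substring search picks up.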
## Results
In total, 68 of the 75 mined job ads were deemed relevant and included, of which a depressingly high number were either consultancy-based (33.8%) or posted via a specialized recruitment agency (32.4%). This means 66.2% of the jobs are not completely what they're pretending to be. We're off to a great start here.
Further programming language and general term usage trends are summarized in the following figure:
![](../jobtrends.jpg "Left: programming language trends (in occurrence %). Right: a word cloud of most used terms.")
A few highlights of the above bar chart:
- 50% of the job ads are Java-based, of which the Spring framework is mentioned in 33% and Hibernate in 9%.
- The second most popular language stack is, unsurprisingly, .NET/C#, at 27%. That's a stark popularity drop, by the way!
- Python and Go jobs are few and far between.
- Only one ad falsely advertised Kotlin (a job as a "Technical Software Development Lead": "Experience with Android Studio / Java / Kotlin is a Plus"). None of them mentioned Rust, Elixir, Scala, or anything else exciting.
- Some more tech term matches:
  + `cloud`: 38% (`azure`: 22%, `aws`/`amazon`: 15%)
  + `sql`: 28% (`nosql`: 3%, `mongo`: 6%, `graphql`: none!)
  + `angular`: 24% (`react`: 19%, `vue`: 4%)
  + `AI`: 7%
The majority of the ads are tailored specifically for one (backend) programming language; I encountered none explicitly asking for a generalist. Some mention the whole stack; e.g. "The technology stack is Java-based docker containers running on Openshift" or "Development of advanced solutions on our new platform with the newest tools and technologies on a Google stack".
The word cloud---courtesy of [node-wordcloud](https://github.com/daidr/node-wordcloud)---is only partially usable, as some job ads are posted in English while most are written in Dutch, meaning I'd have to translate them in order to group translations of the same word. Still, as you can see, English loanwords---the bulk of the industry's buzzwords---easily floated to the top. I had to cut out words shorter than 3 characters, numbers, and words matching a custom ignore list for terms like "such", "with", "our", "from", "join", and so forth.
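The filtering that precedes the word cloud can be sketched as below. This is an illustration under assumptions, not the script that produced the figure: the sample `descriptions`, the contents of `ignore` beyond the five words quoted above, and the `wordFrequencies` helper are all hypothetical, and how the resulting list is handed to the word cloud library is left out.

```js
// Hypothetical descriptions standing in for the scraped ads.
const descriptions = [
  'Join our agile team: 5 years of experience with agile teamwork.',
  'You work in an agile team with our customers.',
];

// Assumed ignore list; only these five entries are named in the text.
const ignore = new Set(['such', 'with', 'our', 'from', 'join']);

function wordFrequencies(texts) {
  const counts = new Map();
  for (const text of texts) {
    // Split on anything that is not a letter or digit (Unicode-aware, so accented Dutch words survive).
    for (const word of text.toLowerCase().split(/[^\p{L}\p{N}]+/u)) {
      if (word.length < 3) continue;         // drop words shorter than 3 characters
      if (/^\p{N}+$/u.test(word)) continue;  // drop numbers
      if (ignore.has(word)) continue;        // drop ignore-listed words
      counts.set(word, (counts.get(word) ?? 0) + 1);
    }
  }
  // Most frequent first, as [word, count] pairs.
  return [...counts.entries()].sort((a, b) => b[1] - a[1]);
}

console.log(wordFrequencies(descriptions).slice(0, 2));
```

Grouping Dutch and English variants of the same word would need an extra translation or stemming step, which this sketch does not attempt.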
Interestingly, the term `creativ` (creativity, creative, the Dutch alternative _creativiteit_) was found in `19%` of all job ads, for instance:
> You have an analytical and creative mind and can get things done.
Or:
> Within our department, there is a lot of attention for innovation and creativity.
These sentences basically mean nothing and are just there to entice you to apply. Creativity is usually put forward as a requirement for problem-solving, but only mentioned in passing, although I was surprised that the occurrence percentage was that high.
Skimming each job ad description is very sobering. The [age of average effect](/post/2023/07/logo-convergence-design-mistakes/) is very much present here: all texts are written in the same obnoxious way, with company cultures and technical opportunities no doubt made to sound better and more interesting than they actually are. You can't derive this from the texts, but I used to work for quite a few of the companies that ended up in the result set, so let's call this anecdotal but relevant evidence.
### Compared to global trends
When we compare these results with the most popular technologies in Stack Overflow's [2023 Developer Survey](https://survey.stackoverflow.co/2023/#technology-most-popular-technologies), Python and even TypeScript beat the crap out of Java (which is, at 30.55%, still more popular than C#, at 27.62%---even for the subgroup of professional developers). Admittedly, we're comparing apples with oranges here, as there's a big difference between a popular language and a language you have to work with at your day job. In addition, Stack Overflow also caters to data scientists and other hackers that might fall into the survey's "professional developer" category.
AWS at 48.62% is much more popular than Azure at 26.03%, while in my findings, more jobs require you to work with Azure. Same thing for React: 40.58%, with Angular lagging (far!) behind at 17.46%. Another striking difference, as I found more jobs in my neighborhood for Angular.
The [State of JS 2022 Libraries](https://2022.stateofjs.com/en-US/libraries/) results make this even more painful, as Angular has an overwhelming number of negative opinions and its adoption rate is slowing down. I guess we in Belgium---or at least the companies in my vicinity, probably building enterprise services---are quite slow to react (ha!). Although to be honest, I'm not sure how trustworthy this data is, since _State of JS_ also ranks PHP and even Rust above .NET/C# in the question "Which other programming languages do you use?". For reference, 17.5% of respondents are from the USA, and only 1.1% from Belgium. Stack Overflow lacks any meaningful demographics information.
Hacker News' [2023 Hiring Trend Analysis](https://www.hntrends.com/2023/june.html) contains graphs of the top 10 programming languages. Since 2020, both Python and TypeScript have been comfortably on top, with even Go coming in third (!). Java sits somewhere at the bottom, and .NET/C# isn't even mentioned. Again, nothing about demographics, but considering AI is the most popular term/trend right now, while it barely shows up in our ads (7%), I'd say the majority of its audience is again from the USA.
Turning to academic papers, publications like _What Soft Skills Does the Software Industry Really Want? An Exploratory Study of Software Positions in New Zealand_ by Galster et al. (2022) categorize the soft side of the terms used in job ads, such as:
- `Communication`: 33%, `Team`: 20%. Well, duh, see the word cloud!
- `Analytical`: 15%
- `Creativ`: 1%. Ouch! Luckily, 17% of the ads were categorized under problem-solving.
---
In a 2023 Europe-specific report entitled the [State of Software Developer Nation](https://www.offerzen.com/reports/software-developer-europe#software_engineer_skills) by OfferZen, Java is the third most used language, but Go and Python are the most wanted programming languages. I guess this confirms all of the above and can serve as a nice conclusion: we're all dying to program with tools that don't scream enterprise, yet we're all stuck with jobs (and job ads) that require us to work in typical corporate software development environments that haven't really changed in the last ten years.

Binary file not shown.
