Fake Data Science Jobs
TLDR
Some of the "data science" job offers you've come across might seem odd. Many of these roles are really about data labeling, a repetitive task that consists of assigning "labels" to data. Labeling can take various forms, but here it means labeling, comparing, and ranking code based on its quality. High-quality labels of this kind are used to train AI systems such as large language models (LLMs) and improve their coding abilities. Once companies hire these "data scientists," they have them label and rank code or LLM outputs, then sell the labeled data to businesses training AI models.
Weird Job Offers
If you work in a field related to data science or data analysis, you might have encountered some peculiar job offers. Their titles usually contain the word "data" and the descriptions mention data analysis, but the work is far from what we usually mean by data analysis (ETL, databases, modeling, A/B testing, dashboards, etc.).
Let's see what these job offers look like:

Interesting things to note:
- They are looking for several people at once (the offer uses the plural)
- No indication of what the projects are about or who the clients are
- Emphasis on being able to explain one's reasoning and logic
- Mentions of "AI", even though the job doesn't seem technically related to AI
I Applied
Since I was very curious, I applied to one of these offers to understand more about it and filled in the basic information they asked for. After that, for some reason, they really wanted me to work with them:

Their messages kept repeating that I only had a few days to complete their tests and that I would otherwise be removed from the recruiting process. What's even more interesting is that they knew very little about my skills, yet they still sent lots of automated messages, which suggests I'm probably not the only one receiving them.
Other Job Offers
Let's have a look at a few job offers from other companies to highlight the nonsense.



The Actual Job
I found a platform where it was easy to access the tasks and tried a few exercises (I even made $0.54). And as I said before, the tasks are nothing close to data analysis. Here is the description of the project I worked on:
The code they are talking about was related to data science, but it covered such a wide range of topics that it makes little sense to look for "data analysts" to assess it. The first question was about using AutoML and Google Cloud, while the second was about backend development in Go.
There is a lot of text to read (the assignment, plus the outputs of two LLMs to compare on long tasks), with very specific questions such as:

This may just be me, but I find this kind of task very boring, even though it requires a lot of focus. You lose the fun part of programming (that is, solving a problem).
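To make the task more concrete, here is a rough sketch of what a single comparison label could look like as data, and how a batch of such labels might be turned into a ranking. This is only an illustration: the `Comparison` fields, the `rank_by_wins` helper, and the model names are hypothetical, and the platform's actual data format was not visible to me.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Comparison:
    """One labeled judgment between two LLM outputs (hypothetical schema)."""
    prompt_id: str
    response_a: str  # identifier of the first output being compared
    response_b: str  # identifier of the second output being compared
    preferred: str   # must be equal to response_a or response_b
    rationale: str   # free-text justification written by the labeler


def rank_by_wins(comparisons):
    """Rank response identifiers by the number of comparisons they won."""
    wins = Counter()
    for c in comparisons:
        if c.preferred not in (c.response_a, c.response_b):
            raise ValueError(f"invalid preference for prompt {c.prompt_id}")
        # Register both responses so that one with zero wins still appears.
        wins[c.response_a] += 0
        wins[c.response_b] += 0
        wins[c.preferred] += 1
    # Most wins first; ties are broken alphabetically for a stable order.
    return sorted(wins, key=lambda name: (-wins[name], name))


labels = [
    Comparison("q1", "model_x", "model_y", "model_x", "Handles the edge cases."),
    Comparison("q2", "model_x", "model_y", "model_y", "Clearer error handling."),
    Comparison("q3", "model_x", "model_y", "model_x", "Answers the actual question."),
]
ranking = rank_by_wins(labels)
print(ranking)
```

Counting wins is the simplest possible aggregation. The point is only that the labeler's output is a preference plus a written justification, not an analysis of any dataset.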
The Problem
One could argue that there is nothing wrong with this kind of job. And in itself, that's true: it creates opportunities for more people. But presenting it as a real data science position can lead to unpleasant surprises.
These jobs will not help you get hired as a data analyst, nor will they help you build related skills. You will not work on real projects, but rather on some sort of "meta data analysis", which consists of labeling and ranking tasks related to data analysis.
Another issue I see is that the offers are not very explicit about what the job involves and how it works. If you want a career in the data world, these jobs are probably the last ones you want to apply to.
Will This Make AI Better?
Probably not. As I mentioned earlier, the main goal of these companies is to sell the labeled data to other businesses. In my opinion, since these jobs are far from enjoyable, skilled developers are unlikely to be interested. Instead, the work will likely be done by people who, unfortunately, haven't been able to secure more desirable positions. Offering high pay could attract better talent, but the best rates I've found are around $50 per hour, and even those are rare and require passing multiple tests.
Feedback
Have a different opinion? A nuance to add? A question to ask? Please share it!
I'm always looking for feedback. The best way to share your thoughts is to open an issue on the GitHub repository of the site.