Roughnecks in the Machine 

Data Annotators are the Roughnecks of AI 



Staff of Juma work on computers at the office in Lagos, Nigeria, on January 20, 2020. REUTERS/Temilade Adelaja 

As I noted in a recent post, in 2017 The Economist magazine declared that data was the oil of the digital era. If that metaphor is accurate, then today’s AI models, such as ChatGPT, are the equivalent of 1970s-style gas guzzlers, sucking up massive amounts of data in order to run. In the “old” economy, roughnecks worked physically dangerous jobs to drill for oil. In the “new” economy, digital roughnecks work mentally and emotionally dangerous jobs to enrich, annotate and moderate the data on which AI models are trained, and without which they couldn’t run. 

A number of service providers employ these digital roughnecks (a.k.a. “data annotators”), who work on the data pipelines of the world’s biggest tech companies pursuing the most ambitious AI projects, such as Google, Meta and OpenAI. My friend and former colleague Matthew McMullen has written an excellent piece on the travails of these new digital roughnecks. As Matthew points out, just like the original oil boom, data annotation is big business: it is projected to reach $43 billion by 2030. 

Treading dangerously close to the recent dispute between Gary Marcus and Geoffrey Hinton about whether large language models (LLMs) “understand” language, Matthew notes: 

While AI can imitate human discernment, thinking, and processes, its performance hinges on the quality of the data it receives. Let’s dispel the illusion of AI’s brilliance—it leans heavily on human-crafted data. … Large language models (LLMs) may generate seemingly genuine information, but this façade can crumble when faced with deceptive inputs, demonstrating AI’s lack of true cognition and emphasizing the importance of human intervention in its training. 

As with the roughnecks of yore, today’s data annotators may face challenging working conditions and exploitative wages. For example, one annotation service provider recently found itself embroiled in a number of disputes with its employees over working conditions. Employees claimed that the content moderation they performed for Meta and OpenAI (1) exposed them to graphic and disturbing content that caused them psychological harm, (2) came without adequate mental health support, and (3) paid less than they were promised. 

According to Matthew, one way in which data annotation vendors maintain their leverage over workers is by siloing ratings, reviews and feedback: 

Data vendors maintain strict control over workers’ ratings, reviews, and feedback, according to Martijn Arets in “Research on Platform Based Reputation Scores Contributes to an Inclusive Labor Market.” Such monopolized ownership of information deprives workers of their professional reputation and can hinder their mobility and career advancement. If workers decide to transition away from their current employment, they would effectively be starting from scratch, losing all the reputation and credibility they built over time. 

Next Step Foundation is trying to address this problem through our Nikkoworkx data annotation training platform. Specifically, we envision Nikkoworkx as a kind of “GitHub for Annotators,” in which annotators can create and maintain a portable profile of their previous work. That profile should capture both accuracy (how accurately an annotator completes a task) and speed (how quickly an annotator completes a task). Nikkoworkx should also enable annotators to rate and review the VENDORS, so annotators know who pays well and on time. Such a system would benefit both sides: annotators would keep the reputations they have earned, while vendors would gain a searchable database of annotators filterable by demographic attributes, geography, task proficiency, availability, and price. For example, a vendor who wants highly accurate annotations but is less concerned about turnaround time could sort candidates on those criteria and select annotators who may charge more per task. 
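To make the idea concrete, here is a minimal sketch, in Python, of what such a portable profile and a vendor-side search might look like. The field names, quality floor, and ranking rule are all illustrative assumptions, not Nikkoworkx’s actual design: it models the accuracy-over-speed vendor described above, who filters out annotators below a quality threshold and then ranks the rest by accuracy, breaking ties on price.

```python
from dataclasses import dataclass

@dataclass
class AnnotatorProfile:
    """Hypothetical portable work record an annotator would own and carry
    between vendors (fields are illustrative, not Nikkoworkx's schema)."""
    name: str
    accuracy: float        # fraction of tasks passing QA review, 0.0-1.0
    tasks_per_hour: float  # throughput on comparable tasks
    price_per_task: float  # asking rate in USD
    region: str

def rank_for_accuracy_focused_vendor(profiles, min_accuracy=0.95):
    """Model the vendor who prizes accuracy over turnaround time:
    drop anyone below a quality floor, then sort by accuracy
    (best first), breaking ties with price (cheapest first)."""
    eligible = [p for p in profiles if p.accuracy >= min_accuracy]
    return sorted(eligible, key=lambda p: (-p.accuracy, p.price_per_task))

annotators = [
    AnnotatorProfile("A", accuracy=0.99, tasks_per_hour=12.0,
                     price_per_task=0.08, region="Lagos"),
    AnnotatorProfile("B", accuracy=0.93, tasks_per_hour=30.0,
                     price_per_task=0.03, region="Nairobi"),
    AnnotatorProfile("C", accuracy=0.99, tasks_per_hour=9.0,
                     price_per_task=0.05, region="Manila"),
]

# B is fast and cheap but falls below the quality floor; C and A tie on
# accuracy, so the cheaper C ranks first.
for p in rank_for_accuracy_focused_vendor(annotators):
    print(p.name, p.accuracy, p.price_per_task)
```

A speed-focused vendor would simply swap in a different sort key (e.g. `-p.tasks_per_hour`), which is the point of keeping the profile data portable and the ranking criteria in the vendor’s hands.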

About the Author

Dr. Christopher Harrison, our Executive Chairman, is a renowned copyright, technology, and antitrust expert. He has leveraged his expertise to drive social impact through the Next Step Foundation. 
