OpenAI’s ChatGPT is an impressive tool, but like many impressive technology products, it has a dark side. A recent Time investigation found that OpenAI used outsourced Kenyan labourers earning less than $2 USD (about $2.67 CAD) to help make ChatGPT less toxic.
The Time piece is long but well worth the read. Time reports that ChatGPT’s predecessor, GPT-3, was a tough sell because it was prone to blurting out violent, sexist, and racist remarks. (ChatGPT is based on GPT-3.5.) The main reason GPT-3 was so toxic was that OpenAI trained it on the internet. On the one hand, the internet is a vast repository of human language. On the other, it’s chock-full of awful content, which tools like GPT-3 pick up during training. To solve the problem, OpenAI set out to build an AI-powered safety mechanism to stop its chatbots from regurgitating that toxic material.
Time reports that OpenAI took a page out of Facebook’s playbook, since the company had already shown it was possible to build AI-powered tools to detect toxic language like hate speech. However, instead of detecting toxic language to remove from a social media platform, OpenAI needed to scrub it from its training data.
To build that AI system, OpenAI needed to label different types of toxic speech to train the AI on. Enter Sama, a San Francisco-based firm that employs workers in Kenya, Uganda, and India to label data for Silicon Valley clients like Google, Facebook’s parent company Meta, and Microsoft. Sama bills itself as an “ethical AI” company.
Starting in November 2021, OpenAI sent tens of thousands of text snippets to Sama that seemed pulled straight from the darkest recesses of the internet. Per Time, some of it described child sexual abuse, bestiality, murder, suicide, torture, self-harm, and incest in graphic detail. Sama paid data labellers a take-home wage of between $1.32 and $2 USD per hour, depending on seniority and performance.
It’s worth noting that OpenAI doesn’t disclose the names of its outsourcing partners, and it’s not clear whether OpenAI used other data labelling firms alongside Sama for the project. OpenAI did confirm in a statement to Time that Sama employees in Kenya contributed to its toxic content detection tool that eventually became part of ChatGPT. Moreover, OpenAI stressed that the work was a “necessary step in minimizing the amount of violent and sexual content included in training data.”
According to documents reviewed by Time, OpenAI signed three contracts with Sama in late 2021. In total, the contracts were worth about $200,000 USD (roughly $267,309 CAD). However, the traumatic nature of the work eventually resulted in Sama cancelling its work for OpenAI in February 2022, eight months earlier than planned.
It’s worth noting that ChatGPT’s popularity has been a massive boon for OpenAI, spurring things like a multibillion-dollar investment from Microsoft. OpenAI is even aiming to launch a ‘Professional’ tier of ChatGPT priced at roughly $56 CAD per month.
Some Sama workers spoke anonymously with Time about the work; one described the content they reviewed as torture and mentally scarring. Employees were entitled to attend sessions with “wellness” counsellors, but those who spoke with Time said the sessions didn’t help, and high productivity demands meant they were rare. Some workers were only offered group sessions, and one employee said repeated requests for a one-on-one session were denied.
The contracts revealed that OpenAI would pay an hourly rate of $12.50 to Sama for the work, significantly more than what the employees actually took home. A Sama spokesperson told Time that the $12.50 rate “covers all costs, like infrastructure expenses, and salary and benefits for the associates and their fully-dedicated quality assurance analysts and team leaders.”
An OpenAI spokesperson told Time that the company didn’t issue productivity targets and said Sama was responsible for managing payments and mental health provisions. The spokesperson also said that OpenAI understood Sama would offer one-on-one counselling and that workers could opt out of any work without penalty.
Despite the collapse of OpenAI’s contract with Sama, a need for human labour in tech, especially AI, remains. Time spoke with AI ethicist Andrew Strait, who warned that ChatGPT and similar systems rely on “massive supply chains of human labour and scraped data, much of which is unattributed and used without consent.” As impressive as ChatGPT is, it’s emblematic of larger, foundational problems in the AI space.
Image credit: Shutterstock