
Exploring the Complex Ethical Challenges of Data Annotation

A cross-disciplinary group of Stanford students examines the ethical challenges faced by data workers and the companies that employ them.


What does human-centered data annotation look like? Stanford student Zach Robertson and others worked on this question as part of Stanford HAI's student affinity group program.

Computer science PhD student Zach Robertson is intrigued by human-AI interaction—figuring out ways to align AI systems with human preferences. Recently, however, he became focused on a different problem: how to ensure that the humans classifying AI training data are protected from the toxic and harmful content it sometimes contains. 

“My interest began when I read an article that talked about data annotation and how some companies aren’t necessarily paying their workers a fair amount of money and are having workers view and evaluate some very disturbing content,” he says. “They’re doing that to help make these models safe for the consumers, but without transparency about what might be in that data. For some of those workers, the effects are traumatizing.” 

Data annotation, the process of adding information to the data used to train machine learning models, gives algorithms the context and information they need to learn and to make predictions. It can take several forms, including labeling, tagging, transcribing, and processing.
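To make the process concrete, here is a minimal, hypothetical sketch of what annotated records might look like for a text toxicity-labeling task. The field names, label values, and annotator IDs are purely illustrative and are not drawn from any particular company's pipeline.

```python
# Hypothetical annotated records for a text toxicity-labeling task.
# Field names and label values are illustrative only.
annotated_examples = [
    {"text": "Have a great day!", "label": "non_toxic", "annotator_id": "worker_017"},
    {"text": "[content flagged as harmful]", "label": "toxic", "annotator_id": "worker_042"},
]

# A training pipeline would typically consume (input, label) pairs like these.
for example in annotated_examples:
    print(f'{example["text"]!r} -> {example["label"]}')
```

Each record like this reflects a human judgment made by an annotator, which is the contribution Robertson argues often goes unseen.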

“I realized that the culture of machine learning doesn’t always have a full awareness of the contribution humans make to these models,” Robertson says. “I wanted to see if I could get some multidisciplinary people together to learn more about this issue and maybe even make some changes.”

Building a Team

Robertson developed his idea as part of a Student Affinity Group through the Stanford Institute for Human-Centered Artificial Intelligence (HAI). Launched in 2022, the program brings a diverse range of voices to AI by inviting students from any of Stanford’s seven schools to identify a topic of interest to pursue. Group members recruit their own interdisciplinary teammates for projects that run for the academic year. Each project is eligible for funding of up to $1,000 to cover basic expenses.

Robertson teamed up with mechanical engineering PhD student Mohammadmahdi Honarmand, computer science PhD student Nava Haghighi, and Graduate School of Business MS student Jon Qian to explore the ethical challenges of data annotation in AI development, particularly concerning toxic and harmful content. They called their project WellLabeled.

“We wanted to see how pervasive these problems were, to understand the core reasons for them, and to come up with strategies that might remedy some of these issues,” Robertson says. “We wanted to know if there were simple fixes that could give workers more transparency, reduce harm to them, and still get the data that companies need.”

Toxic Tasks

Team members met with representatives at Scale AI, a San Francisco-based company that provides labeled data used to train AI, where they learned about the complex challenges companies face in obtaining annotated data. They also met with members of Turkopticon, a worker-run nonprofit organization that supports, among others, data annotators working on Amazon's Mechanical Turk crowdsourcing marketplace.

“At Turkopticon, we were told of an instance where a worker unexpectedly ended up having to classify images of suicide without any guidance or informed consent,” he says. “For some annotators, this work is their primary income, so they often don’t feel they have the choice to pick another project because this kind of work may be the best-paying option. The lack of transparency, the lack of options, leads them to do the work that has increased risk.”

In-Person Experience

As their research moved forward, Robertson decided to take the project a step further by getting real-world experience. He signed up to do annotation work for OpenAI, where he spent his time crafting complex questions designed to test the reasoning and logic of the company's flagship ChatGPT model.

“I was surprised to be paid $100 an hour, since reports I’d read said some contractors around the world were receiving $1 to $2 an hour. That showed me that there’s a stratification of the work ranging from the very graphic, visceral data annotation that doesn’t pay very much, to the more abstract, open-ended data creation that pays more because only experts can do it,” he says. 

The WellLabeled team finished their project this spring with new insights and suggestions for protecting the people who do data work. Among their findings:

  • Data work involves complex ethical challenges. Many data annotation workers are required to classify toxic content without transparency about their assignments, without fair pay, and without resources to cope with the exposure to disturbing material.
  • AI companies also face challenges: data annotation has many components, is highly iterative, and depends on a complicated and sometimes global pipeline with many moving parts that aren't always fully understood or appreciated by AI researchers.

Although the problems associated with data work can be complex, some solutions are not, and those solutions should be developed with workers' perspectives in mind, Robertson says. They include:

  • Leveraging automation and AI to help protect workers, such as pre-processing data to blur graphic images so workers can still identify toxicity while being shielded from the worst emotional harm (see the sketch after this list).
  • Using techniques such as red teaming, in which annotators craft prompts intended to elicit undesirable responses from the model. Red teaming may offer workers more agency than techniques such as content moderation.
  • Establishing industry-wide standards and best practices, such as upfront transparency about the nature of assignments, options for reassignment to less toxic projects, and resources for workers who experience adverse effects or feel compelled to do damaging work.
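To illustrate the first suggestion above, here is a minimal sketch, assuming the Pillow imaging library, of how graphic images might be blurred before they reach annotators. The file paths, blur radius, and function name are placeholders, not part of any company's actual tooling.

```python
# Minimal sketch: blur an image before it is shown to an annotator.
# Assumes the Pillow library; paths, radius, and names are placeholders.
from PIL import Image, ImageFilter

def blur_for_annotation(input_path: str, output_path: str, radius: int = 12) -> None:
    """Apply a Gaussian blur so an annotator can still judge whether an
    image is toxic without viewing it at full fidelity."""
    image = Image.open(input_path)
    blurred = image.filter(ImageFilter.GaussianBlur(radius=radius))
    blurred.save(output_path)

if __name__ == "__main__":
    # Hypothetical paths in an annotation queue.
    blur_for_annotation("raw/flagged_image.jpg", "queue/flagged_image_blurred.jpg")
```

A stronger blur hides more detail, so in practice the radius would need to be tuned with worker input to protect annotators without making their labels unreliable.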

“The annotation process really is where we’re able to direct how AI will adapt to our norms, which makes it an important juncture point in controlling both how these models get deployed and the effect they’ll have on the downstream applications where they’ll be used,” he says. “It’s where we add in our values as a society.”

The project also gave Robertson and his teammates the rare opportunity to practice problem-solving techniques alongside other groups facing similar challenges.

“Forming a group like this isn’t easy, but we were inspired by all the various affinity groups working on very different things,” Robertson says. “It was a supportive environment and an amazing opportunity. I feel like I did meaningful work that I wasn’t going to be able to do without the support of HAI.”

Student Voices

Hear from Robertson and other student affinity group leaders on their experience in the program:

 

Stanford students can apply to the Student Affinity Groups on the HAI website.

