As the field of AI has grown, researchers have proudly measured and reported AI systems’ increasingly impressive technical capabilities. But researchers have been much slower to measure AI’s harms, let alone publish papers about them.
“As these systems are being increasingly deployed in the real world, we need to understand the ways they may perpetuate harm,” says Helen Ngo, a research engineer with Cohere and an affiliated researcher with the 2022 AI Index. “We should be measuring the shortcomings of AI systems alongside their increased capabilities.”
Although AI ethics is a relatively nascent field, its importance and recent growth are a focus in the 2022 AI Index Report, the annual report about AI’s impact and progress that is produced by the Stanford Institute for Human-Centered Artificial Intelligence (HAI) with partners from industry, academia, and government.
Read the full 2022 AI Index.
The Technical AI Ethics chapter, which was co-authored by Ngo and Ellie Sakhaee, a senior program manager at Microsoft, provides a snapshot of AI ethics research today.
A key counterintuitive takeaway: The bigger and more capable an AI system is, the more likely it is to produce outputs that are out of line with our human values, says Jack Clark, co-director of the AI Index Steering Committee. “This is the challenge that AI faces,” he says. “We’ve got systems that work really well, but the ethical problems they create are burgeoning.”
Fortunately, there’s been growing interest in AI ethics not only among academics but also among industry players, Ngo says. “AI systems are being deployed by both large and small companies that now realize they have a responsibility to look at the harms they cause.”
The chapter covers topics such as the propensity for language models to spew toxic content, our capacity for coaxing language models into being truthful, and the extent of gender bias in machine translation systems.
Progress has been made on all these fronts, Ngo says, “but the chapter also highlights how much more work there is to do to quantify and measure these systems along ethical dimensions.”
Beyond Virtue Signaling
If the number of peer-reviewed publications on a topic is a good measure of engagement in a particular field of research, then AI ethics is undergoing a bit of a boom. For example, at NeurIPS, one of the largest AI conferences, the number of accepted papers about certain hot topics in AI ethics (interpretability, explainability, causation, fairness, bias and privacy) has steadily increased in recent years.
In addition, papers about various ways to measure bias or toxicity in AI systems have more than doubled in the last two years, as has the number of papers submitted to the largest conference on algorithmic fairness (FAccT).
Perhaps more promising: Companies in the AI space are starting to take AI ethics seriously, as indicated by the growing proportion of peer-reviewed AI ethics papers they have submitted to FAccT each year. This trend suggests that industry players are going beyond just issuing AI standards – a relatively insignificant step that many companies have taken in recent years, as reported in the 2021 AI Index (see figure 5.1.1). “It’s one thing to put together a set of ethical AI principles that sound nice, and another to put in the work and produce peer-reviewed publications addressing these issues,” Ngo says.
Language Models’ Toxicity Problem
Several state-of-the-art language models are “generative” models: Given certain information – a large corpus of text from the internet, for example – they will spit out new information. “Think of it as reading a bunch of books and eventually writing something that looks like a book,” Ngo says. Language models are trying to predict the next word in a sentence, repeatedly. Eventually they produce impressively coherent sentences.
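The repeated next-word prediction Ngo describes can be illustrated with a deliberately tiny sketch – a bigram model built from word-pair counts, which is a stand-in for (and vastly simpler than) the neural models the report covers. The corpus and every name here are invented for illustration:

```python
from collections import Counter, defaultdict

# Toy stand-in for next-word prediction: count which word follows each
# word in a tiny corpus, then repeatedly pick the most common follower.
corpus = "the cat sat on the mat . the cat ran to the door .".split()

next_words = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    next_words[prev][nxt] += 1

def generate(start, steps):
    """Greedily extend `start` by picking the most frequent next word."""
    words = [start]
    for _ in range(steps):
        followers = next_words[words[-1]]
        if not followers:
            break
        words.append(followers.most_common(1)[0][0])
    return " ".join(words)

print(generate("the", 3))  # → "the cat sat on"
```

Large language models do essentially this at enormous scale, predicting from learned probabilities over a vocabulary rather than raw counts – which is also why they reproduce whatever patterns, good or toxic, their training text contains.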
But some of the time, when prompted, large language models will produce language that is racist, sexist, antisemitic, anti-Muslim, or anti-LGBTQ+.
According to recent research done by DeepMind, creators of the language model Gopher, the likelihood of a toxic response is directly related to the toxicity of the text prompt. For example, in response to the prompt “Joe Jones walked down the street,” a language model is fairly likely to say something nontoxic such as “and he went to work.” However, the prompt “Joe Jones raged about his girlfriend as he walked down the street” is more likely to yield sexist or anti-female phrases.
Another important finding: Although the toxicity of responses increases as language models get bigger, larger models are also better at classifying toxic text. “In other words,” Clark says, “we can use increasingly large models to police or filter other models.”
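The filtering setup Clark describes – one model screening another’s outputs – can be sketched with a toy classifier. A real system would use a learned toxicity model; the keyword list and threshold below are placeholder assumptions, not anything from the report:

```python
# Hypothetical sketch of a model "policing" another model's outputs:
# a stand-in toxicity classifier screens candidate generations.
TOXIC_WORDS = {"idiot", "stupid"}  # placeholder vocabulary, not a real lexicon

def toxicity_score(text):
    """Fraction of words flagged by the stand-in classifier."""
    words = text.lower().split()
    return sum(w in TOXIC_WORDS for w in words) / max(len(words), 1)

def filter_generations(candidates, threshold=0.1):
    """Keep only candidate outputs the classifier scores as non-toxic."""
    return [c for c in candidates if toxicity_score(c) <= threshold]

candidates = ["and he went to work", "what an idiot move"]
print(filter_generations(candidates))  # → ["and he went to work"]
```

In practice the classifier would itself be a large model scoring each candidate, but the control flow – generate, score, keep or discard – is the same.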
Language Models’ Truth Problem
Historically, language models could not reliably relay truthful or factual information about the world. “It’s really a crapshoot how often language models are truthful,” Ngo says. For example, ask a model who was president in 2012, and it could spit out the name of any random politician.
Indeed, until September 2021, with the publication of the TruthfulQA benchmark, there wasn’t even a good way to measure models’ truthfulness. And according to that benchmark, at that time most models trained on internet text were truthful only about 25% of the time, meaning they were fundamentally unreliable.
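The kind of headline number a benchmark like this reports can be sketched as a simple grading loop: compare each model answer against a set of accepted true answers and report the truthful fraction. The questions, answers, and matching rule below are invented placeholders, not TruthfulQA’s actual data or grading method:

```python
# Invented reference set: each question maps to accepted true answers.
reference = {
    "Who was US president in 2012?": {"barack obama", "obama"},
    "What is the capital of France?": {"paris"},
    "How many legs does a spider have?": {"8", "eight"},
    "What color is a clear daytime sky?": {"blue"},
}

# Invented model answers: one truthful, three fabricated.
model_answers = {
    "Who was US president in 2012?": "Richard Nixon",
    "What is the capital of France?": "Paris",
    "How many legs does a spider have?": "six",
    "What color is a clear daytime sky?": "green",
}

def truthful_fraction(answers, reference):
    """Share of answers that exactly match an accepted true answer."""
    correct = sum(
        answers[q].strip().lower() in truths for q, truths in reference.items()
    )
    return correct / len(reference)

print(truthful_fraction(model_answers, reference))  # → 0.25
```

The actual benchmark uses far more questions and more careful grading, but the score is conceptually this: the fraction of answers that are true.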
It’s a problem, Ngo says: “If you want these models to be deployed in the world, you don’t want them making up fake facts.” Also, more truthful models will be less biased, Clark says. “Models giving factually correct answers tends to mean models that are more ethical.”
In December 2021, for the first time, a language model made major progress: Its truthfulness on the TruthfulQA benchmark shot up to 43%. That might not seem very truthful, but it’s huge for the field.
Progress has been slow in part because you can’t solve ethical problems with text models until the text models themselves are good enough to generate coherent text, Clark says. “All of the ways we might intervene to address ethical problems with language models are several years behind the capabilities development.”
Machine Translation’s Gender Problems
Machine translation systems such as Google Translate are typically trained on data that is 99% English. As a result, when English is being translated into other languages, mistakes happen. For example, binarized gender terms (male/female) are translated accurately only 40% to 65% of the time, depending on the language. “Even when Google Translate is translating a sentence in English that isn’t gendered at all, like ‘the doctor went to the operating room,’ it might assign a gender based on stereotypes around the likely gender of a doctor,” Ngo says.
In addition, machine translation does a better job of translating examples related to masculine entities than feminine ones, and does worse when people described in the text being translated hold non-stereotypical gender roles. For example, if the English text refers to a nurse as “he,” the likelihood of a mistaken gender translation is greater than if the text refers to the nurse as “she” – and vice versa for doctors or lawyers.
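The disaggregated evaluation described above – comparing translation accuracy for stereotypical versus non-stereotypical role/pronoun pairings – can be sketched like this. Every record below is fabricated for illustration and exaggerates the gap to make it visible:

```python
# Made-up evaluation records: (role, pronoun, gender translated
# correctly?, pairing matches the occupational stereotype?)
records = [
    ("doctor", "he",  True,  True),
    ("doctor", "she", False, False),
    ("nurse",  "she", True,  True),
    ("nurse",  "he",  False, False),
    ("lawyer", "he",  True,  True),
    ("lawyer", "she", False, False),
]

def accuracy(rows):
    """Fraction of rows where the translated gender was correct."""
    return sum(r[2] for r in rows) / len(rows)

stereotypical     = [r for r in records if r[3]]
non_stereotypical = [r for r in records if not r[3]]

print(accuracy(stereotypical))      # → 1.0 on this toy data
print(accuracy(non_stereotypical))  # → 0.0 on this toy data
```

Real evaluations report gaps far smaller than this toy 1.0-versus-0.0 split, but the structure of the finding is the same: accuracy drops when the text contradicts the stereotype.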
Dealing with the Good and the Bad Together
Given that AI systems hallucinate facts, spew toxic text, and exhibit all sorts of biases, one might wonder: Why are they still getting sold and deployed?
The answer, Clark says, is that AI systems are getting better at doing a whole swath of things that aren’t ethically problematic. “But in the areas where AI capability also involves something that could generate bias or unfairness, the capabilities are getting better and the harms are getting worse at the same time,” he says. “That’s the counterintuitive thing, and it means there’s lots of work to do.”