How Do We Design and Develop Human-Centered AI?
We see evidence of AI in so many aspects of what designers build – from web-based services to social media platforms and recommendation engines. New AI-driven products help determine who gets called for a job interview, who gets through airport customs, who gets a follow-up appointment with their doctor, and who gets released while awaiting trial.
But the design process misses a key element.
“We do not know how to design AI systems to have a positive impact on humans,” said James Landay, vice director of Stanford HAI and host of the fall conference, AI in the Loop: Humans in Charge. “There’s a better way to design AI.”
At the conference, which took place Nov. 15 at Stanford University, panelists proposed a new definition of human-centered AI – one that emphasizes the need for systems that improve human life and challenges problematic incentives that currently drive the creation of AI tools.
Define Human-Centered Design
How do we design AI for constructive and fair human experience? Landay suggested we start by designing and analyzing systems at three levels: user, community, and society.
Take self-driving cars. Designers must consider the needs and abilities of end users when deciding what appears on the console. They need to ensure the technology works with cyclists, pedestrians, and other non-drivers from the start. They must also talk to subject matter authorities, like transportation experts, to determine how these cars could affect the broader ecosystem. In a Texas study Landay cited, researchers predicted that autonomous vehicles would exacerbate traffic congestion as more people move farther from work and commute in. If roads get more crowded and cities become less livable, Landay asked, should we instead redirect resources toward improving public transportation?
“Only by considering these societal impacts near the start of AI technology development can we make the right long-term decisions,” he said.
The goals of good technology design, said Ben Shneiderman, professor of computer science at the University of Maryland and founder of its Human-Computer Interaction Lab, should be to support users’ self-efficacy and human creativity, clarify the responsibility of users and developers, and advance social connectedness. “I want to see AI-infused super tools that are reliable, safe, and trustworthy,” he said. “If it’s not reliable, safe, and trustworthy, shut it down.”
Jodi Forlizzi, professor of computer science and associate dean for diversity, equity, and inclusion at Carnegie Mellon University, said we need a curriculum change to think about design at every level. Her research team partnered with labor union Unite Here to study AI and automation in the hospitality sector and found that the technologies actually increased some employees’ workload. For example, after a casino replaced its bartenders with automated systems, cocktail servers complained about poor quality drinks, the need to send in smaller orders to avoid contamination, the slow pace of the automated bartender, and earning fewer tips.
“Many of these are really hastily considered designs, and there’s even less consideration for the implementation, rollout, training, and study of the impact on the workforce,” Forlizzi noted.
Still, she cautioned against a reductionist view that only sees technology as a detriment to workers. “We want to look at the difference between minimal implementation of automation and a more robust setting.”
Require Multiple Perspectives From the Get-Go
To unpack issues like those revealed by the AI bartender, we need multidisciplinary teams made up of workers, managers, software designers, and others with conflicting perspectives, Forlizzi said. Experts from technology and AI, the social sciences and humanities, and domains such as medicine, law, and environmental science are key. And, Landay added, “These experts must be true partners on a project from the start rather than added near the end.”
Still, having the right people in the room doesn’t guarantee consensus and, in fact, results often come from disagreement and discomfort. “We need to manage with and look toward productive discomfort,” said Genevieve Bell, professor at Australian National University and director of its School of Cybernetics and 3A Institute. “How do you teach people to be good at being in a place where it feels uncomfortable?”
Rethink AI Success Metrics
When we evaluate AI systems, the very framework needs to shift: “We’re most often asking the question of what can these models do, but we really need to be asking what can people do with these models?” explained Saleema Amershi, senior principal research manager at Microsoft Research and co-chair of the company’s Aether working group on human-AI interaction and collaboration. We currently measure AI by optimizing for accuracy, she noted, but accuracy is not the sole measure of value. “Designing for human-centered AI requires human-centered metrics,” she said.
“When measurements don’t reflect what people need and value, bad things can happen,” Amershi said. As an example, she told the story of a Palestinian who posted “Good morning” in Arabic on Facebook, which mistakenly translated the post to “attack them” in Hebrew. The company apologized for the machine error, but only after the man had been arrested and questioned.
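Amershi’s point about accuracy can be made concrete with a little arithmetic. The sketch below is not from the conference; it is a hypothetical, illustrative example of how a system can score high on raw accuracy while failing precisely on the cases people care about most, which is why human-centered metrics matter:

```python
# Hypothetical example: why raw accuracy can mislead as a success metric.
# Suppose 95 of 100 translation requests are routine and 5 are
# safety-critical. A system that handles only the routine ones still
# posts an impressive overall accuracy.
easy_correct = 95      # routine requests handled correctly
critical_correct = 0   # safety-critical requests handled correctly
total = 100
num_critical = 5

accuracy = (easy_correct + critical_correct) / total
critical_recall = critical_correct / num_critical  # performance on the cases that matter most

print(f"overall accuracy:  {accuracy:.0%}")         # 95%
print(f"critical recall:   {critical_recall:.0%}")  # 0%
```

A dashboard reporting only the first number would call this system a success; a human-centered metric that weights the critical cases would flag it as unacceptable.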
Pay Attention to Power Structures
The metrics we use must also be viewed within the larger power structure that drives those metrics – and the problems they’ve been designed to address. Here in the United States, capitalism is the engine behind many advances in AI, which are often focused on boosting productivity, several panelists agreed. “We need to think about how people are productive, not just how machines are productive,” cautioned Elizabeth Gerber, a professor of mechanical engineering and co-director of the Center for Human Computer Interaction and Design at Northwestern University. When we put too sharp a focus on productivity, are we ignoring downstream considerations like individual autonomy, freedom, and happiness?
Forlizzi gave the example of Amazon and UPS, both of which monitor their delivery drivers but with an important difference. Amazon, she said, more often manages by algorithm, which has led to questionable firings, while UPS’ data gets reviewed by humans. “If there’s an error or grievance, they have a human-centered process in place for rectifying it.”
In some countries, Bell pointed out, productivity isn’t the only motivation. “A similar bundle of technologies unfolds differently in various countries because you have different governments, histories, cultures, and variations on capitalism. … Productivity is one measurement,” she said, “but so is control, so is surveillance.”
Other notable conversations in brief:
- Niloufar Salehi, assistant professor in the School of Information at the University of California, Berkeley, pointed out that many doctors with patients who speak a different language rely on Google Translate to deliver vital health information. In one UCSF hospital study, researchers found that 8% of the sentences and instructions had been translated incorrectly. Because such errors can have critical ramifications, her team is building a translation tool based on the most common emergency room instructions.
- Melissa Valentine, associate professor of management science and engineering at Stanford and an HAI Sabbatical Scholar, embedded with a Silicon Valley fashion company to study how employees use a new AI prediction tool in their buying decisions and how it affects their work. She found an environment where buyers and data scientists worked closely to co-produce expertise. The retailers “really started to understand themselves differently,” she said. “They thought of the tool as a way to test their theories or double-check their theories.”
- Michael Bernstein, associate professor of computer science at Stanford, described an approach to improve online content moderation through a “jury learning” system that allows developers to train machine learning tools to select which voices should be heard.
- Meredith Ringel Morris, principal scientist at Google Brain, argued that accessibility should be the north star challenge for human-centered AI research. More than 1 billion people experience some form of disability, and everyone can benefit from accessibility tools. Morris detailed three types that could make the biggest impact while also raising fundamental questions, such as: How do we define safety? What is an acceptable error rate? How do we balance representation in datasets with personal privacy?
- Carla Pugh, a professor of surgery at Stanford School of Medicine, explained how small wins in surgical AI are distracting us from pursuing bigger ones, and how far we remain from the dream of precision surgery. She provided a glimpse of the current state of AI for surgery, which is primarily focused on surgical video, and of what a big win might look like.
- Jeff Bigham, associate professor at the Human-Computer Interaction Institute at Carnegie Mellon University, said too many interfaces aren’t accessible – imagine trying to buy a Coke from a vending machine if you are visually impaired. He detailed his 17 years of work on image description and how the design loop often produces intriguing divergences in efforts to create useful descriptions.
- Tanzeem Choudhury, professor in integrated health and technology at Cornell Tech, offered a bird’s-eye view of how digital technologies in mental health can be scaled and why progress in this space has been so slow.
- Maneesh Agrawala, professor of computer science at Stanford, explained why unpredictable black boxes are terrible interfaces — when a conceptual model is not predictive, users have to resort to trial and error — and offered a few ways to improve these AIs.
Stanford HAI's mission is to advance AI research, education, policy, and practice to improve the human condition.