
2022 HAI Fall Conference on AI in the Loop: Humans in Charge

Presentation Titles and Abstracts

Jodi Forlizzi - The Role of Design in Socially Responsible AI

What we design is changing; therefore, how we design is also changing. In this talk, I will set the context for the role of design in creating purposeful and pragmatic technology, both historically and today. I will then highlight some of our research showing the impact of design in creating, developing, and deploying AI and autonomous systems, with the goal of creating better social systems, better economic relations, and a better world in which to live.

Ben Shneiderman - Human-Centered AI Design Guidelines

Ensuring human control becomes increasingly important for consequential and life-critical applications in medicine, transportation, and the military. User interface designs that promote comprehensibility, predictability, and controllability will gain broader user acceptance, while advancing human self-efficacy, creativity, responsibility, and social connectedness. Guidelines include "Preview first, select and initiate, then manage execution."

Maneesh Agrawala - Unpredictable Black Boxes are Terrible Interfaces

Modern AI models are capable of producing surprisingly high-quality text, images, video, and even program code. Yet the models are black boxes, making it impossible for users to build a mental or conceptual model of how the AI works. Users have no way to predict how the black box transmutes input controls (e.g., natural language prompts) into the output text, images, video, or code. Instead, users have to repeatedly create a prompt, apply the model to produce a result, and then adjust the prompt and try again, until a suitable result is achieved. In this talk I'll assert that such unpredictable black boxes are terrible interfaces, and that they will remain so until we can identify ways to explain how they work. I'll also argue that the ambiguity of natural language and a lack of shared semantics between AI models and human users are partly to blame. Finally, I'll suggest some approaches for improving the interfaces to AI models.

Saleema Amershi - Measuring What Matters for Human-AI Teams

There is a significant discrepancy between the success metrics driving the AI industry and what people value in the real world. In the AI-assisted programming scenario, for example, a key value proposition is the potential for code generation models to dramatically improve developer productivity. Yet the offline metrics used to inform model development decisions, and to gate which models are deployed to people in the real world, currently focus on standalone generation correctness rather than on correctness or effort with a developer in the loop. Similarly, online metrics currently focused on acceptance rates overlook the interaction costs to developers of prompting, reviewing, and editing generated code. In this talk, I will describe ongoing work from the HAX team at Microsoft Research to develop metrics and measurement tools that more faithfully reflect the needs and effectiveness of human-AI teams.

Elizabeth Gerber - Towards A Desired Future of Work: Job Design for Human-AI Partnerships

Job design is the process of creating a job that enables an organization to achieve its goals while motivating and rewarding the employee. AI has the potential to both positively and negatively affect people’s autonomy, job feedback, skill use and variety, and job significance with consequences for people’s well-being and performance. To create a desirable future of work, we need to create interdisciplinary design teams of job designers, experience designers, and AI designers. Together, they can jointly optimize Human-AI partnerships through contextual inquiry and iterative testing.

Niloufar Salehi - Human-centered Machine Translation for High-stakes Situations

Deployments of AI systems have faced significant challenges in high-stakes situations. For instance, machine translation (e.g. Google Translate) has considerable potential to remove language barriers and is widely used in hospitals in the U.S., but research shows that almost 20% of common medical phrases are mistranslated to Chinese, with 8% causing significant clinical harm. Our goal in this work is to develop new techniques and tools for the design of reliable and effective AI in high-stakes real-world contexts. To this end, I will discuss novel methods for algorithmic needs assessment, computational system design, and evaluation of AI systems.

Melissa Valentine - Helping Experts Test Their Theories: How Algorithms Both Rely On and Threaten Occupational Expertise

Prior research on algorithms in the workplace offers competing predictions. Some studies suggest that algorithms threaten knowledge workers' expertise. Yet other studies suggest that data scientists highly value knowledge workers' "domain expertise." Our study, based on a 10-month ethnography at a retail tech company, shows how these competing predictions are in fact connected. At this company, data scientists pushed their domain experts - the company's fashion buyers - to explicitly articulate the theories underlying their decisions, and to use the rigorous analysis enabled by the algorithms to then reject or update those theories. These new cycles of explicit theory testing asked the buyers to help configure the decisions and analysis. Yet the new theory testing also asked the buyers to regularly reject and revise their theories, which was a new practice. Testing the fashion buyers' theories using algorithms was thus a new, co-produced expertise that both built on and threatened their existing expertise. Our study shows that algorithms can render experts' tacit knowledge visible for evaluation and testing, a process that values that knowledge by making it explicit while also threatening the experts' standing by potentially disproving its validity.

Michael Bernstein - Designing Artificial Intelligence to Navigate Societal Disagreement

Whose voices—whose labels—should artificial intelligence (AI) systems learn to emulate? For AI tasks ranging from online comment toxicity to misinformation detection to medical diagnosis, different groups in society may have irreconcilable disagreements about what constitutes ground truth. I will present empirical results demonstrating that current AI metrics overestimate performance in the face of this societal disagreement. In response, I will describe Jury Learning, an AI architecture that resolves these disagreements explicitly through the metaphor of a jury: defining which people or groups, in what proportion, determine the classifier's prediction.

Meredith Ringel Morris - Accessibility as a North Star Challenge for Human-Centered AI Research

The World Health Organization estimates that more than one billion people worldwide experience some form of disability; beyond the roughly 15% of the population who experience permanent or long-term disability, nearly everyone experiences temporary or situational impairments that would benefit from accessible technology solutions. Emerging AI technologies offer huge potential for enhancing or complementing people's sensory, motor, and cognitive abilities. Designing human-centered systems to address accessibility scenarios is a "north star" goal that not only has great societal value, but also provides challenging and meaningful problems that, if solved, will fundamentally advance the state of the art of both AI and HCI. In this talk, I will reflect on challenges and opportunities in designing human-centered AI systems for scenarios including automated image description for people who are blind, efficient and accurate predictive text input for people with limited mobility, and AI-enhanced writing support for people with dyslexia.

Jeffrey Bigham - Discovering Humans and Loops During ML Feature Development

Solving problems with AI is a messy, iterative process that is a far cry from the simple loops we sometimes hear about. Seventeen years ago, I set out to build my first system that would describe visual images for people who are blind, and I'm still working on it. In this talk, I'll review the humans and loops uncovered while working on what seems like a straightforward problem, as an illustrative example of the twists and turns of technology and people that sometimes come together to deepen our understanding.

Carla Pugh - New Directions and New Data to Inspire the Next Evolution of AI in Precision Surgery

Advancements in precision surgery are heavily dependent on seamless and efficient access to value-added information regarding patient anatomy, physiology, and disease, as well as care team coordination and decision making. AI has been heralded as a central tool to facilitate new and exciting advances in precision surgery. To date, there has been notable success in using AI to analyze surgical procedure videos. The ultimate goal is to enable real-time warning capabilities for surgical errors and automated teleconsultation that can help improve patient outcomes. Despite the recent success in using AI to analyze surgical videos, we are far from achieving our long-sought goal of real-time AI integration into the surgical workflow. In this talk, I will discuss the paradigm shift that must take place in our thinking, our research, and our technology development strategies in order to properly situate AI for successful integration into surgical care.

Tanzeem Choudhury - Creating Unified Digital Mental Health Solutions where Patients and Providers are in Charge

We constantly hear about the role of technology and artificial intelligence as a game-changer for healthcare. Mental health is an area that has received increasing attention in tech, and during the pandemic, efforts to integrate AI into mental health services have grown. In my talk, I will discuss why technologists should not assume that AI tools will always perform well when deployed, and I will highlight ways we can create solutions that address both patient and provider needs and improve the access, outcomes, and quality of mental health care.