Shortcomings of Visualizations for Human-in-the-Loop Machine Learning | Stanford HAI



Date
October 09, 2023
Topics
Machine Learning

While visualizations can help developers better design, train, and understand their models, new research shows gaps between ambitions and evidence. 

Because machine learning models are built on data, it makes sense to use data visualization tools to help us interpret how those systems work.

For the last few years, some data visualization researchers have been doing just that, launching a field known as Visualization for Machine Learning, or VIS4ML. The goal: to provide human-in-the-loop domain experts with visualizations that will help them accomplish diverse tasks including designing, training, engineering, interpreting, assessing, and debugging ML models.

But when Hariharan Subramonyam, assistant professor at Stanford Graduate School of Education and a faculty fellow with Stanford’s Institute for Human-Centered AI, and his colleague Jessica Hullman of Northwestern University examined 52 recent VIS4ML research publications, they became concerned that researchers are overstating their accomplishments.

For example, Subramonyam says, researchers in this space are not testing VIS4ML tools in ecologically valid ways and are making inappropriately broad claims about their tools' applicability. The team's analysis, which has been accepted for publication at IEEE VIS, is available now on the preprint server arXiv.

“The VIS4ML community is trying to solve the problem of making ML models more interpretable,” Subramonyam says, “but the way they’re doing it has shortcomings.”

Lofty Aspirations for VIS4ML

VIS4ML researchers aspire to keep humans in the ML design loop because that will improve ML model performance, Subramonyam says. It’s an admirable goal, but also a difficult challenge. Many ML models are complex black box models that evade insight into their inner workings. It will take brand new data visualization tools to help humans understand what’s going on inside those black boxes, he says.

Some VIS4ML researchers have taken a laudable stab at inventing novel data visualization tools that offer a window into some aspects of ML models, Subramonyam says. For example, there are VIS4ML tools for creating a scatter plot that depicts clusters in high-dimensional data, with different colors for each of the categories an ML algorithm finds in a dataset – types of clothing in images, for example, as shown below. This allows an expert to spot items that are mislabeled and re-label them. Other tools might visualize the various layers of a convolutional network in a manner that users can understand, or visualize the nature of various possible features of an ML model so that an expert can make appropriate decisions about which features to include.

[Figure: a sample scatter-plot visualization of clothing types from an online clothing retailer, with ML-assigned categories color coded.]

When an ML model categorizes thousands of items of clothing from several online shopping sites into 14 types of clothing (T-shirt, shirt, jacket, suit, dress, vest, etc.), it is correct only 61% of the time. In this visualization, the categories are color coded, allowing an expert to easily identify and re-label miscategorized items (red dots in a group of purple dots, for example). This type of scatter plot relies on a visualization algorithm that is good at showing clusters when they exist, but can also imply structure that doesn’t actually exist in the data, Hullman says.
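The mislabel-spotting workflow the caption describes can be sketched in a few lines: project high-dimensional item features down to two dimensions, then flag points whose nearest neighbors in the projection mostly carry a different label (the "red dots in a group of purple dots"). This is a minimal NumPy sketch under illustrative assumptions; the synthetic data, the PCA projection, and the neighbor-agreement threshold are stand-ins, not the algorithm of any specific VIS4ML tool (published tools often use t-SNE or UMAP rather than PCA).

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for high-dimensional clothing-image features: three
# well-separated clusters in 20-D, with five deliberately mislabeled points.
centers = rng.normal(size=(3, 20)) * 5
X = np.vstack([c + rng.normal(size=(50, 20)) for c in centers])
labels = np.repeat([0, 1, 2], 50)
labels[:5] = 2  # corrupt five labels in cluster 0

# Project to 2-D with PCA (via SVD), the coordinates a scatter plot would show.
Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
coords = Xc @ Vt[:2].T  # shape (150, 2)

def suspected_mislabels(coords, labels, k=10):
    """Flag points whose k nearest 2-D neighbors mostly disagree on label."""
    d = np.linalg.norm(coords[:, None] - coords[None, :], axis=-1)
    np.fill_diagonal(d, np.inf)           # exclude each point from its own neighbors
    nn = np.argsort(d, axis=1)[:, :k]     # indices of the k nearest neighbors
    agree = (labels[nn] == labels[:, None]).mean(axis=1)
    return np.where(agree < 0.5)[0]       # majority of neighbors disagree

flagged = suspected_mislabels(coords, labels)
print(sorted(flagged.tolist()))
```

In an interactive tool, the expert would inspect the flagged points and re-label them; the point here is only that the "spot the off-color dot" workflow reduces to a neighborhood-agreement check over the projected coordinates.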

The Generalizability Gap

While developing novel VIS4ML tools to aid human-in-the-loop ML is important work, Subramonyam and Hullman's analysis surfaces some troubling findings: these tools are too often evaluated by a small set of experts, frequently the very people who designed them in the first place, and they are typically tested only on the most popular standard datasets. "The measure of each tool's usefulness is quite narrow," Subramonyam says.

In addition, only a third of the 52 VIS4ML papers reviewed went beyond asking an expert if a tool seemed useful and actually reported whether using the tool changed the performance of an ML model. Evidence in the other papers depended on hypothetical claims about a visualization tool’s potential benefits, essentially positing that the tool will improve model performance for any kind of model and dataset. 

“These papers make these claims without providing supporting evidence and without acknowledging their limitations and constraints,” Subramonyam says. 

Recommendations

VIS4ML researchers should curtail the unsupported claims about their tools’ generalizability and be more transparent about their limitations, Subramonyam says.

If these researchers want to truly support human-in-the-loop ML, they need to more thoroughly evaluate VIS4ML tools and build a stronger evidence base for any claims of broad applicability. “Researchers need to connect the dots between the new tools and their usefulness in the real world,” Subramonyam says. To further that aim, he and Hullman set out some concrete guidelines for transparency in their paper.

In addition, Subramonyam says, there’s a need for closer collaboration between the people who are building these visualization solutions and the communities they hope to serve. “Human-centered AI is a multidisciplinary endeavor,” he says. “You can’t have tunnel vision where you build a visualization solution expecting it’s going to work in multiple domains and workflows without actually testing it in those domains and workflows.”

Stanford HAI’s mission is to advance AI research, education, policy and practice to improve the human condition.

Contributor(s)
Katharine Miller
