research

Finding Monosemantic Subspaces and Human-Compatible Interpretations in Vision Transformers through Sparse Coding

Date
January 01, 2025
Topics
Computer Vision
abstract

We present a new method of deconstructing class activation tokens of vision transformers into a new, overcomplete basis, where each basis vector is “monosemantic” and affiliated with a single, human-compatible conceptual description. We achieve this through the use of a highly optimized and customized version of the K-SVD algorithm, which we call Double-Batch K-SVD (DBK-SVD). We demonstrate the efficacy of our approach on the sbucaptions dataset, using CLIP embeddings and comparing our results to a Sparse Autoencoder (SAE) baseline. Our method significantly outperforms SAE in terms of reconstruction loss, recovering approximately 2/3 of the original signal compared to 1/6 for SAE. We introduce novel metrics for evaluating explanation faithfulness and specificity, showing that DBK-SVD produces more diverse and specific concept descriptions. We therefore show empirically for the first time that disentangling of concepts arising in Vision Transformers is possible, a statement that has previously been questioned when applying an additional sparsity constraint. Our research opens new avenues for model interpretability, failure mitigation, and downstream task domain transfer in vision transformer models. An interactive demo showcasing our results can be found at https://disentangling-sbucaptions.xyz, and we make our DBK-SVD implementation openly available at https://github.com/RomeoV/KSVD.jl.
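To make the idea of sparse coding over an overcomplete basis concrete, here is a minimal sketch of textbook K-SVD in Python with NumPy. It is not the authors' Double-Batch K-SVD (that implementation is the Julia package linked above) and it includes none of their batching or optimizations. The function names `omp`, `ksvd`, and `explained_fraction`, the synthetic data, and the choice of orthogonal matching pursuit for the sparse coding step are all assumptions made for illustration. The "explained fraction" is one plausible way to measure how much of a signal is recovered, and may differ from the metric used in the paper.

```python
import numpy as np

def omp(D, y, k):
    """Greedy k-sparse code of y over dictionary D (unit-norm columns)."""
    residual = y.copy()
    support = []
    coef = np.zeros(0)
    for _ in range(k):
        # Pick the atom most correlated with the current residual.
        j = int(np.argmax(np.abs(D.T @ residual)))
        if j in support:
            break  # residual is already (numerically) explained
        support.append(j)
        # Refit all selected atoms jointly by least squares.
        coef, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ coef
    x = np.zeros(D.shape[1])
    x[support] = coef
    return x

def ksvd(Y, n_atoms, k, n_iter=10, seed=0):
    """Plain K-SVD: alternate sparse coding and per-atom rank-1 updates."""
    rng = np.random.default_rng(seed)
    D = rng.standard_normal((Y.shape[0], n_atoms))
    D /= np.linalg.norm(D, axis=0)
    for _ in range(n_iter):
        # Sparse coding step: one k-sparse code per column of Y.
        X = np.column_stack([omp(D, y, k) for y in Y.T])
        # Dictionary update step: refit each atom on the samples that use it.
        for j in range(n_atoms):
            users = np.flatnonzero(X[j])
            if users.size == 0:
                continue  # unused atom is left unchanged
            X[j, users] = 0.0
            E = Y[:, users] - D @ X[:, users]  # error without atom j
            U, S, Vt = np.linalg.svd(E, full_matrices=False)
            D[:, j] = U[:, 0]                  # best rank-1 direction
            X[j, users] = S[0] * Vt[0]         # matching coefficients
    return D, X

def explained_fraction(Y, D, X):
    """Share of the squared signal norm recovered by the reconstruction D @ X."""
    return 1.0 - np.linalg.norm(Y - D @ X) ** 2 / np.linalg.norm(Y) ** 2

# Synthetic stand-in for embeddings: 200 signals in 8 dimensions,
# each a combination of 2 atoms from a hidden 16-atom dictionary.
rng = np.random.default_rng(1)
D_true = rng.standard_normal((8, 16))
D_true /= np.linalg.norm(D_true, axis=0)
X_true = np.zeros((16, 200))
for i in range(200):
    idx = rng.choice(16, size=2, replace=False)
    X_true[idx, i] = rng.standard_normal(2)
Y = D_true @ X_true

D, X = ksvd(Y, n_atoms=16, k=2)
print(explained_fraction(Y, D, X))
```

The dictionary here has twice as many atoms as the signal has dimensions, which is what "overcomplete" means in this setting; the sparsity limit `k` is what forces each signal to be described by only a few atoms.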

Authors
  • Romeo Valentin
  • Vikas Sindhwani
  • Sumeet Singh
  • Vincent Vanhoucke
  • Mykel Kochenderfer
Related Publications

ReMix: Optimizing Data Mixtures for Large Scale Imitation Learning
Joey Hejna, Chethan Anand Bhateja, Yichen Jiang, Karl Pertsch, Dorsa Sadigh
Sep 05, 2024

Increasingly large robotics datasets are being collected to train larger foundation models in robotics. However, despite the fact that data selection has been of utmost importance to scaling in vision and natural language processing (NLP), little work in robotics has questioned what data such models should actually be trained on. In this work we investigate how to weigh different subsets or "domains" of robotics datasets during pre-training to maximize worst-case performance across all possible downstream domains using distributionally robust optimization (DRO). Unlike in NLP, we find that these methods are hard to apply out of the box due to varying action spaces and dynamics across robots. Our method, ReMix, employs early stopping and action normalization and discretization to counteract these issues. Through extensive experimentation on both the Bridge and OpenX datasets, we demonstrate that data curation can have an outsized impact on downstream performance. Specifically, domain weights learned by ReMix outperform uniform weights by over 40% on average and human-selected weights by over 20% on datasets used to train the RT-X models.


The Global AI Vibrancy Tool 2025
Loredana Fattorini, Nestor Maslej, Ray Perrault, Vanessa Parli, John Etchemendy, Yoav Shoham, Katrina Ligett
Deep Dive | Nov 24, 2025

This methodological paper presents the Global AI Vibrancy Tool, an interactive suite of visualizations designed to facilitate cross-country comparisons of AI vibrancy, using indicators organized into pillars. The tool offers customizable features that enable users to conduct in-depth country-level comparisons and longitudinal analyses of AI-related metrics.


AI, Health, and Health Care Today and Tomorrow: The JAMA Summit Report on Artificial Intelligence
Tina Hernandez-Boussard, Michelle Mello, Nigam Shah, Co-authored by 50+ experts
Deep Dive | Oct 13, 2025

Automated real-time assessment of intracranial hemorrhage detection AI using an ensembled monitoring model (EMM)
Zhongnan Fang, Andrew Johnston, Lina Cheuy, Hye Sun Na, Magdalini Paschali, Camila Gonzalez, Bonnie Armstrong, Arogya Koirala, Derrick Laurel, Andrew Walker Campion, Michael Iv, Akshay Chaudhari, David B. Larson
Deep Dive | Oct 13, 2025

Artificial intelligence (AI) tools for radiology are commonly unmonitored once deployed. The lack of real-time case-by-case assessments of AI prediction confidence requires users to independently distinguish between trustworthy and unreliable AI predictions, which increases cognitive burden, reduces productivity, and potentially leads to misdiagnoses. To address these challenges, we introduce Ensembled Monitoring Model (EMM), a framework inspired by clinical consensus practices using multiple expert reviews. Designed specifically for black-box commercial AI products, EMM operates independently without requiring access to internal AI components or intermediate outputs, while still providing robust confidence measurements. Using intracranial hemorrhage detection as our test case on a large, diverse dataset of 2919 studies, we demonstrate that EMM can successfully categorize confidence in the AI-generated prediction, suggest appropriate actions, and help physicians recognize low confidence scenarios, ultimately reducing cognitive burden. Importantly, we provide key technical considerations and best practices for successfully translating EMM into clinical settings.
