Computer Vision

Computer vision is enhancing machines’ ability to interpret and act on visual data, transforming sectors like healthcare, security, and manufacturing.

From Brain to Machine: The Unexpected Journey of Neural Networks

Katharine Miller

Nov 18, 2024

News

How early cognitive research funded by the NSF paved the way for today’s AI breakthroughs—and how AI is now inspiring new understandings of the human mind.

Machine Learning, Computer Vision

Finding Monosemantic Subspaces and Human-Compatible Interpretations in Vision Transformers through Sparse Coding

Romeo Valentin, Vikas Sindhwani, Sumeet Singh, Vincent Vanhoucke, Mykel Kochenderfer

Jan 01, 2025

Research

We present a new method of deconstructing class activation tokens of vision transformers into a new, overcomplete basis, where each basis vector is “monosemantic” and affiliated with a single, human-compatible conceptual description. We achieve this through the use of a highly optimized and customized version of the K-SVD algorithm, which we call Double-Batch K-SVD (DBK-SVD). We demonstrate the efficacy of our approach on the sbucaptions dataset, using CLIP embeddings and comparing our results to a Sparse Autoencoder (SAE) baseline. Our method significantly outperforms SAE in terms of reconstruction loss, recovering approximately 2/3 of the original signal compared to 1/6 for SAE. We introduce novel metrics for evaluating explanation faithfulness and specificity, showing that DBK-SVD produces more diverse and specific concept descriptions. We therefore show empirically for the first time that disentangling of concepts arising in Vision Transformers is possible, a statement that has previously been questioned when applying an additional sparsity constraint. Our research opens new avenues for model interpretability, failure mitigation, and downstream task domain transfer in vision transformer models. An interactive demo showcasing our results can be found at https://disentangling-sbucaptions.xyz, and we make our DBK-SVD implementation openly available at https://github.com/RomeoV/KSVD.jl.

Computer Vision

Peering into the Black Box of AI Medical Programs

Adam Hadhazy

Feb 06, 2024

News

To realize the benefits of AI in detecting diseases such as skin cancer, doctors need to trust in the decisions rendered by AI. That requires better understanding of its internal reasoning.

Computer Vision

ReMix: Optimizing Data Mixtures for Large Scale Imitation Learning

Joey Hejna, Chethan Anand Bhateja, Yichen Jiang, Karl Pertsch, Dorsa Sadigh

Sep 05, 2024

Research

Increasingly large robotics datasets are being collected to train larger foundation models in robotics. However, despite the fact that data selection has been of utmost importance to scaling in vision and natural language processing (NLP), little work in robotics has questioned what data such models should actually be trained on. In this work we investigate how to weigh different subsets or "domains" of robotics datasets during pre-training to maximize worst-case performance across all possible downstream domains using distributionally robust optimization (DRO). Unlike in NLP, we find that these methods are hard to apply out of the box due to varying action spaces and dynamics across robots. Our method, ReMix, employs early stopping and action normalization and discretization to counteract these issues. Through extensive experimentation on both the Bridge and OpenX datasets, we demonstrate that data curation can have an outsized impact on downstream performance. Specifically, domain weights learned by ReMix outperform uniform weights by over 40% on average and human-selected weights by over 20% on datasets used to train the RT-X models.

Computer Vision, Robotics, Natural Language Processing

Meet 12 Asteroid Shots in AI

Shana Lynch

Dec 11, 2023

News

Stanford scholars explore advances in foundation models, examine the next-generation chip, and study causal models at the recent Hoffman-Yee Symposium.

Machine Learning, Computer Vision
