What is a Transformer?

A Generative Model for Raw Audio using Transformer Architectures

Prateek Verma, Chris Chafe

Dec 20

Research

Finding Monosemantic Subspaces and Human-Compatible Interpretations in Vision Transformers through Sparse Coding

Romeo Valentin, Vikas Sindhwani, Sumeet Singh, Vincent Vanhoucke, Mykel Kochenderfer

Jan 01

Research

We present a new method of deconstructing class activation tokens of vision transformers into a new, overcomplete basis, where each basis vector is “monosemantic” and affiliated with a single, human-compatible conceptual description. We achieve this through the use of a highly optimized and customized version of the K-SVD algorithm, which we call Double-Batch K-SVD (DBK-SVD). We demonstrate the efficacy of our approach on the sbucaptions dataset, using CLIP embeddings and comparing our results to a Sparse Autoencoder (SAE) baseline. Our method significantly outperforms SAE in terms of reconstruction loss, recovering approximately 2/3 of the original signal compared to 1/6 for SAE. We introduce novel metrics for evaluating explanation faithfulness and specificity, showing that DBK-SVD produces more diverse and specific concept descriptions. We therefore show empirically for the first time that disentangling of concepts arising in Vision Transformers is possible, a statement that has previously been questioned when applying an additional sparsity constraint. Our research opens new avenues for model interpretability, failure mitigation, and downstream task domain transfer in vision transformer models. An interactive demo showcasing our results can be found at https://disentangling-sbucaptions.xyz, and we make our DBK-SVD implementation openly available at https://github.com/RomeoV/KSVD.jl.

Computer Vision

Research
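The abstract builds on K-SVD dictionary learning. As a rough illustration only, the sketch below implements the textbook K-SVD loop (Orthogonal Matching Pursuit for the sparse codes, then a rank-1 SVD refit of each atom) in Python with NumPy. It is not the authors' Double-Batch K-SVD, whose Julia implementation is in the repository named in the abstract. The `fraction_recovered` helper is an assumption: it reads "share of the original signal recovered" as 1 - ||X - DA||² / ||X||², which may differ from the paper's exact metric.

```python
import numpy as np


def omp(D, x, k):
    """Orthogonal Matching Pursuit: approximate x using at most k columns (atoms) of D."""
    residual = x.copy()
    support = []
    coef = np.zeros(0)
    for _ in range(k):
        # Pick the atom most correlated with what is still unexplained.
        j = int(np.argmax(np.abs(D.T @ residual)))
        if j in support:
            break  # no new atom helps; stop early
        support.append(j)
        # Re-fit all selected coefficients jointly by least squares.
        coef, *_ = np.linalg.lstsq(D[:, support], x, rcond=None)
        residual = x - D[:, support] @ coef
    a = np.zeros(D.shape[1])
    a[support] = coef
    return a


def ksvd(X, n_atoms, sparsity, n_iter=10, seed=0):
    """Plain K-SVD: learn D (d x n_atoms, unit-norm columns) and sparse A with X ~ D @ A.

    Requires n_atoms <= number of signals, since atoms are initialised from data columns.
    """
    rng = np.random.default_rng(seed)
    _, n = X.shape
    D = X[:, rng.choice(n, n_atoms, replace=False)].astype(float)
    D /= np.linalg.norm(D, axis=0, keepdims=True) + 1e-12
    for _ in range(n_iter):
        # 1) Sparse coding: each signal (column of X) uses at most `sparsity` atoms.
        A = np.column_stack([omp(D, X[:, i], sparsity) for i in range(n)])
        # 2) Dictionary update: refit one atom at a time on the signals that use it.
        for j in range(n_atoms):
            users = np.nonzero(A[j])[0]
            if users.size == 0:
                continue  # unused atom is left unchanged in this sketch
            # Residual of those signals with atom j's own contribution added back.
            E = X[:, users] - D @ A[:, users] + np.outer(D[:, j], A[j, users])
            U, S, Vt = np.linalg.svd(E, full_matrices=False)
            D[:, j] = U[:, 0]            # best rank-1 direction becomes the new atom
            A[j, users] = S[0] * Vt[0]   # matching coefficients, sparsity pattern kept
    return D, A


def fraction_recovered(X, D, A):
    """Assumed metric: share of signal energy captured, 1 - ||X - DA||_F^2 / ||X||_F^2."""
    return 1.0 - np.linalg.norm(X - D @ A) ** 2 / np.linalg.norm(X) ** 2


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.standard_normal((16, 200))  # toy stand-in for 16-dim embeddings
    D, A = ksvd(X, n_atoms=32, sparsity=3, n_iter=5)  # 32 atoms: overcomplete for d = 16
    print(fraction_recovered(X, D, A))
```

Because each atom update is the best rank-1 fit over the signals that already use that atom, the reconstruction error never increases during the dictionary step, and the sparsity pattern chosen by OMP is preserved.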

A Composer’s Helper: Using AI to Create New Harmonies

Shana Lynch

Dec 04

news

The Anticipatory Music Transformer assists composers in a way most generative AI cannot.

Arts, Humanities

news

LABOR-LLM: Language-Based Occupational Representations with Large Language Models

Susan Athey, Herman Brunborg, Tianyu Du, Ayush Kanodia, Keyon Vafa

Dec 11

Research

Vafa et al. (2024) introduced a transformer-based econometric model, CAREER, that predicts a worker’s next job as a function of career history (an “occupation model”). CAREER was initially estimated (“pre-trained”) using a large, unrepresentative resume dataset, which served as a “foundation model,” and parameter estimation was continued (“fine-tuned”) using data from a representative survey. CAREER had better predictive performance than benchmarks. This paper considers an alternative where the resume-based foundation model is replaced by a large language model (LLM). We convert tabular data from the survey into text files that resemble resumes and fine-tune the LLMs on these text files with the objective of predicting the next token (word). The resulting fine-tuned LLM is used as an input to an occupation model. Its predictive performance surpasses all prior models. We demonstrate the value of fine-tuning and further show that by adding more career data from a different population, fine-tuning smaller LLMs surpasses the performance of fine-tuning larger models.

Foundation Models

Natural Language Processing

Research
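A minimal sketch of the tabular-to-text step the abstract describes, assuming a hypothetical survey table with one (year, occupation) row per worker-year. The `JobRecord` type, the text template, and the open final line are illustrative choices, not the format used in the paper; the point is only that an LLM fine-tuned with a next-token objective on such text can be asked to continue the last line with an occupation.

```python
from dataclasses import dataclass


@dataclass
class JobRecord:
    """One row of a hypothetical survey table: a worker's occupation in a given year."""
    year: int
    occupation: str


def records_to_resume_text(worker_id: str, records: list[JobRecord]) -> str:
    """Render a worker's job history as plain resume-like text, oldest job first."""
    lines = [f"Worker {worker_id} career history:"]
    for rec in sorted(records, key=lambda r: r.year):
        lines.append(f"{rec.year}: {rec.occupation}")
    return "\n".join(lines)


def next_job_prompt(worker_id: str, records: list[JobRecord]) -> str:
    """Append an open line for the following year; the model's continuation is the predicted job."""
    last_year = max(r.year for r in records)
    return records_to_resume_text(worker_id, records) + f"\n{last_year + 1}:"


if __name__ == "__main__":
    history = [JobRecord(2019, "Cashier"), JobRecord(2018, "Stock clerk")]
    print(next_job_prompt("A17", history))
```

Run as a script, this prints the unsorted rows in chronological order ("2018: Stock clerk", then "2019: Cashier") under the header line, followed by an open "2020:" line for the model to complete.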

Stanford’s Multimodal AI Model Advances Personalized Cancer Care

Vignesh Ramachandran

Jan 27

news

The MUSK model combines clinical notes and images to predict prognosis and immunotherapy response.

Healthcare

news

Unlocking New Frontiers: AI and the Sciences

Shana Lynch

Nov 27

news

At Stanford HAI’s recent fall conference, scholars showed how artificial intelligence is opening up new approaches to studying science.

Machine Learning

news

Transformer mentioned at Stanford HAI