News

Are Universal Self-Supervised Learning Algorithms Within Reach?

Date: January 19, 2022
Topics: Machine Learning

A new benchmarking tool helps AI scholars train algorithms that work on any domain, from images to text, video, medical images, and more — all at the same time.

Machine learning has transformed artificial intelligence. Algorithms can analyze and make sense of spoken words, written text, still and moving images, and much more. Such algorithms are already improving diagnostic medicine, helping people navigate the world by voice, and making autonomous vehicles safer. Remarkable as they are, however, such transformative algorithms remain restricted to their specific domains. That is, if trained to analyze spoken words, they don’t do well with images or written text.

Now, researchers at Stanford have taken a significant first step toward a day when algorithms are no longer wedded to any one specific domain but can tackle them all at once.

The researchers call it DABS, short for “domain-agnostic benchmark for self-supervised learning.” Self-supervised learning (SSL) is a field of AI in which algorithms teach themselves to analyze data with little or even no human oversight, extracting useful structure and information from unlabeled data. A model therefore does not have to start from scratch when learning a new skill, such as identifying a certain kind of rare disease. With SSL, the resulting models are often much faster and better learners, while requiring far less time-consuming human instruction (i.e., labeling) to perform well.
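To make the idea concrete, here is a toy sketch of the self-supervised recipe described above. It is illustrative only, not the method from the paper: the model manufactures its own “labels” by masking part of the raw data and learning to predict the hidden part from the visible context, with a simple linear predictor standing in for a neural network.

```python
import numpy as np

# Toy illustration of self-supervised learning: no human labels anywhere.
# "Unlabeled data": a smooth, sensor-like signal.
signal = np.sin(np.linspace(0, 20, 2000))

# Slice the signal into overlapping 5-sample windows.
windows = np.lib.stride_tricks.sliding_window_view(signal, 5)

# Pretext task: hide the center sample of each window and predict it
# from its four visible neighbors -- the data supervises itself.
context = np.delete(windows, 2, axis=1)  # visible neighbors
target = windows[:, 2]                   # masked value = free "label"

# Fit a linear predictor by least squares (a stand-in for a deep model).
weights, *_ = np.linalg.lstsq(context, target, rcond=None)
error = np.mean((context @ weights - target) ** 2)
```

Because a sinusoid is exactly linearly predictable from its neighbors, the reconstruction error here is essentially zero; real SSL methods apply the same mask-and-predict idea with deep networks over images, text, audio, and other raw data.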

In a recent paper presented at the Conference on Neural Information Processing Systems, the authors show how DABS provides anyone interested in domain-agnostic approaches a complete infrastructure for data processing, training, transfer, and evaluation, absolving other researchers of the considerable task of gathering and processing many datasets across distinct domains.

Read the study: DABS: A Domain-Agnostic Benchmark for Self-Supervised Learning

“In providing a suite of standardized tasks, rules, and criteria,” said Alex Tamkin, a doctoral candidate in computer science and first author of the paper, “we hope DABS might allow other researchers to make clear and consistent comparisons across multiple domains and pinpoint specific algorithmic changes that improve the accuracy of their AI approaches.”

“Ideally,” said Noah Goodman, associate professor of psychology and of computer science and the senior author of the study, “we hope DABS will expedite the era of generalized self-supervised learning in promising fields where it is not yet well established.”

A Living Benchmark

DABS incorporates seven diverse types of data, ranging from images and sensor information to multilingual text and speech to chest X-rays.

DABS could potentially benefit any field where people have lots of raw data, but not the resources to collect labels for millions of data points. Environmental scientists, for instance, might use a DABS algorithm to produce a pretrained/foundation model for their field and, later, distribute it open source, enabling lots of other environmental scientists to use that model to study the things they care about. As the algorithms submitted to DABS get better and better, the resulting models that each subsequent field trains get better, too.
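The reuse pattern described above can be sketched as follows. Everything in this snippet is hypothetical (a frozen random projection stands in for a real pretrained encoder), but it shows the division of labor: one group pretrains and distributes an encoder, and downstream scientists with few labels train only a small head on top of it.

```python
import numpy as np

rng = np.random.default_rng(1)

# Stand-in for a shared pretrained/foundation model: a frozen encoder
# that everyone downstream reuses instead of training from scratch.
W = rng.normal(size=(8, 32))

def pretrained_encoder(x):
    # Frozen: downstream users never update W.
    return np.tanh(x @ W)

# A downstream scientist with only 50 labeled examples...
x_small = rng.normal(size=(50, 8))
y_small = (x_small[:, 0] > 0).astype(float)

# ...trains just a lightweight linear head on the frozen features
# (a bias column gives the head an intercept).
feats = np.column_stack([pretrained_encoder(x_small), np.ones(len(x_small))])
head, *_ = np.linalg.lstsq(feats, y_small, rcond=None)

# Training-set accuracy of the cheap head on top of shared features.
accuracy = ((feats @ head > 0.5).astype(float) == y_small).mean()
```

The point is the cost asymmetry: pretraining the encoder is expensive and done once, while the downstream fit is a single least-squares solve over a handful of labeled points.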

What’s more, Goodman said, new domains will be added down the road.

“We want DABS to be a living benchmark that grows and evolves,” Goodman added. “The best test of a domain-agnostic algorithm is whether it works not only with a domain that you already know a lot about, but with the less well-known one that somebody adds down the road.”

Addressing the “Long Tail”

The need for a benchmark at all is a consequence of a field that has arrived at a crossroads of sorts. Developing self-supervised algorithms demands a lot of trial and error — and that means time and money. People must first evaluate and label the content to help the algorithms understand what they are seeing, hearing, or reading. AI researchers therefore have tended to favor domains with well-established and active communities where most of the labeling is already done — speech recognition, text analysis, and computer vision.

“It’s much easier to borrow a familiar labeled dataset to test your algorithms against than to familiarize yourself with a new one, so many researchers stick to popular domains,” said Tamkin.

Meanwhile, other important domains — like valuable new areas of medical imaging and records analysis and promising industrial diagnostics — remain on the sidelines. DABS lowers the barrier to entry for researchers interested in these long-tail domains by offering a consistent benchmark against which they can gauge success.

It moves the field toward these goals by providing the labeled and unlabeled datasets for several distinct domains in one place, making it easy for researchers to develop algorithms without first having to gather and process the data. In short, DABS does that work for them.

To test the effectiveness of DABS, the team developed two domain-agnostic algorithms and evaluated them against the benchmark. The results showed modest but promising improvements in accuracy over baselines trained without self-supervised pretraining. That outcome, the authors suggest, leaves ample room for progress as future methods, perhaps developed by other researchers around the world, move toward the long-term goal of generalized algorithms.

For Goodman, it’s all a question of balance.

“The human mind displays a similar balance between domain-specific expertise in language and vision, while showing remarkable facility with less common inputs, too,” he said. “AI has much to gain from modeling that sort of adaptability. Hopefully, DABS brings us a little closer to that future.”

Stanford HAI's mission is to advance AI research, education, policy and practice to improve the human condition.

Contributor(s)
Andrew Myers
