
Are Universal Self-Supervised Learning Algorithms Within Reach?

Date: January 19, 2022
Topics: Machine Learning

A new benchmarking tool helps AI scholars train algorithms that work on any domain, from images to text, video, medical images, and more — all at the same time.

Machine learning has transformed artificial intelligence. Algorithms can analyze and make sense of spoken words, written text, still and moving images, and much more. Such algorithms are already improving diagnostic medicine, helping people navigate the world by voice, and making autonomous vehicles safer. Remarkable as they are, however, such transformative algorithms remain restricted to their specific domains. That is, if trained to analyze spoken words, they don’t do well with images or written text.

Now, researchers at Stanford have taken a significant first step toward a day when algorithms are no longer wedded to any one specific domain but can tackle all domains at once.

The researchers call it DABS, short for “domain-agnostic benchmark for self-supervised learning.” Self-supervised learning (SSL) is a field of AI in which algorithms teach themselves to analyze data, with little or even no human oversight. SSL can extract a great deal of useful structure and information from unlabeled data, so a model does not have to start from scratch when learning a new skill, such as identifying a certain kind of rare disease. The resulting models are often much faster and better learners, and they require far less time-consuming human instruction (i.e., labeling) to perform well.
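The general recipe behind those gains is easy to see in miniature: pretrain on plentiful unlabeled data with a self-generated objective, then fine-tune with a small labeled set. The sketch below is a hypothetical illustration of that pattern, not code from the DABS paper; the tiny encoder, the masked-reconstruction objective, and the toy data are all stand-ins.

```python
# A minimal sketch of self-supervised pretraining followed by label-efficient
# fine-tuning. Hypothetical throughout: the encoder, objective, and data are stand-ins.
import torch
import torch.nn as nn

class Encoder(nn.Module):
    """Tiny MLP encoder standing in for a real model (e.g., a transformer)."""
    def __init__(self, dim=128, hidden=256):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, hidden), nn.ReLU(), nn.Linear(hidden, hidden))

    def forward(self, x):
        return self.net(x)

def pretrain(encoder, unlabeled, steps=500, mask_ratio=0.3):
    """Self-supervised phase: reconstruct masked-out input features; no labels needed."""
    head = nn.Linear(256, unlabeled.shape[1])  # reconstruction head, discarded afterward
    opt = torch.optim.Adam(list(encoder.parameters()) + list(head.parameters()), lr=1e-3)
    for _ in range(steps):
        x = unlabeled[torch.randint(len(unlabeled), (64,))]
        mask = (torch.rand_like(x) < mask_ratio).float()          # 1 = hidden position
        recon = head(encoder(x * (1 - mask)))                     # predict the hidden parts
        loss = (((recon - x) ** 2) * mask).mean()
        opt.zero_grad()
        loss.backward()
        opt.step()
    return encoder

def finetune(encoder, x_labeled, y_labeled, num_classes, steps=200):
    """Supervised phase: a small labeled set often suffices after pretraining."""
    clf = nn.Linear(256, num_classes)
    opt = torch.optim.Adam(list(encoder.parameters()) + list(clf.parameters()), lr=1e-3)
    for _ in range(steps):
        loss = nn.functional.cross_entropy(clf(encoder(x_labeled)), y_labeled)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return clf

# Toy setup: 10,000 unlabeled examples but only 100 labeled ones.
unlabeled = torch.randn(10_000, 128)
x_small, y_small = torch.randn(100, 128), torch.randint(0, 5, (100,))
enc = pretrain(Encoder(), unlabeled)
clf = finetune(enc, x_small, y_small, num_classes=5)
```

A model trained only on the 100 labeled examples would usually overfit; the pretrained encoder gives it useful structure to start from.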

In a recent paper presented at the Conference on Neural Information Processing Systems, the authors show how DABS provides anyone interested in domain-agnostic approaches a complete infrastructure for data processing, training, transfer, and evaluation, absolving other researchers of the considerable task of gathering and processing many datasets across distinct domains.

Read the study: DABS: A Domain-Agnostic Benchmark for Self-Supervised Learning

 

“In providing a suite of standardized tasks, rules, and criteria,” said Alex Tamkin, a doctoral candidate in computer science and first author of the paper, “we hope DABS might allow other researchers to make clear and consistent comparisons across multiple domains and pinpoint specific algorithmic changes that improve the accuracy of their AI approaches.”

“Ideally,” said Noah Goodman, associate professor of psychology and of computer science, and the senior author of the study, “we hope DABS will expedite the era of generalized self-supervised learning in promising fields where it is not yet well established.”

A Living Benchmark

DABS incorporates seven diverse types of data, ranging from images and sensor information to multilingual text and speech to chest X-rays.

DABS could potentially benefit any field where people have lots of raw data, but not the resources to collect labels for millions of data points. Environmental scientists, for instance, might use a DABS algorithm to produce a pretrained/foundation model for their field and, later, distribute it open source, enabling lots of other environmental scientists to use that model to study the things they care about. As the algorithms submitted to DABS get better and better, the resulting models that each subsequent field trains get better, too.
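That hand-off can be as simple as publishing the pretrained weights. Continuing the hypothetical sketch above (it reuses the `Encoder` class and the pretrained `enc`), one group saves a checkpoint and another loads it, freezes it, and trains a small task head on its own modest labeled set; the file name and the 12 target classes are invented for illustration.

```python
# Hypothetical hand-off, continuing the sketch above: group A publishes pretrained
# weights; group B reuses them for a new task with only a little labeled data.
torch.save(enc.state_dict(), "shared_encoder.pt")      # group A: share the checkpoint

shared = Encoder()                                     # group B: load and freeze it
shared.load_state_dict(torch.load("shared_encoder.pt"))
for p in shared.parameters():
    p.requires_grad = False

probe = nn.Linear(256, 12)                             # light head for 12 made-up classes
opt = torch.optim.Adam(probe.parameters(), lr=1e-3)
x_new, y_new = torch.randn(200, 128), torch.randint(0, 12, (200,))
for _ in range(100):
    loss = nn.functional.cross_entropy(probe(shared(x_new)), y_new)
    opt.zero_grad()
    loss.backward()
    opt.step()
```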

What’s more, Goodman said, new domains will be added down the road.

“We want DABS to be a living benchmark that grows and evolves,” Goodman added. “The best test of a domain-agnostic algorithm is whether it works not only with a domain that you already know a lot about, but with the less well-known one that somebody adds down the road.”

Addressing the “Long Tail”

The need for a benchmark at all is a consequence of a field that has arrived at a crossroads of sorts. Developing self-supervised algorithms demands a lot of trial and error — and that means time and money. People must first evaluate and label the content to help the algorithms understand what they are seeing, hearing, or reading. AI researchers therefore have tended to favor domains with well-established and active communities where most of the labeling is already done — speech recognition, text analysis, and computer vision.

“It’s much easier to borrow a familiar labeled dataset to test your algorithms against than to familiarize yourself with a new one, so many researchers stick to popular domains,” said Tamkin.

Meanwhile, other important domains, like valuable new areas of medical imaging, records analysis, and promising industrial diagnostics, remain on the sidelines.

DABS lowers the barrier to entry for researchers interested in these long-tail domains by providing labeled and unlabeled datasets for several different domains in one place, along with a consistent benchmark against which they can gauge success. That makes it easy to develop algorithms without first having to gather and process the data; in short, DABS does that work for them.
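Put schematically, a domain-agnostic benchmark run is a loop: the same algorithm is pretrained on each domain's unlabeled pool, transferred to that domain's labeled tasks, and scored, with the per-domain scores averaged into one number. The sketch below illustrates that loop and is not the actual DABS codebase; the domain list, loader functions, and `algorithm` interface are placeholders.

```python
# Schematic of a domain-agnostic benchmark loop (placeholder names, not the DABS API):
# the *same* self-supervised algorithm is pretrained and evaluated on every domain.
from statistics import mean

DOMAINS = ["natural_images", "sensor_data", "english_text", "multilingual_text",
           "speech", "chest_xrays", "captioned_images"]   # seven illustrative domains

def run_benchmark(algorithm, load_unlabeled, load_transfer_tasks):
    """`algorithm.pretrain` / `algorithm.transfer` come from the researcher being
    evaluated; the two loaders stand in for the benchmark's packaged datasets."""
    scores = {}
    for domain in DOMAINS:
        encoder = algorithm.pretrain(load_unlabeled(domain))        # no labels used here
        task_accs = [algorithm.transfer(encoder, task)              # small labeled tasks
                     for task in load_transfer_tasks(domain)]
        scores[domain] = mean(task_accs)
    scores["aggregate"] = mean(scores.values())                     # one cross-domain number
    return scores
```

Because the loop is fixed, any improvement in the aggregate score has to come from the algorithm itself rather than from domain-specific tuning.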

To test the effectiveness of DABS, the team went so far as to develop two domain-agnostic algorithms and evaluate them against the benchmark. Their results showed modest but promising improvements in accuracy over baselines that received no pretraining. The outcome, according to the authors, suggests ample room for progress as future methods, perhaps developed by other researchers around the world, move toward the long-term goal of generalized algorithms.

For Goodman, it’s all a question of balance.

“The human mind displays a similar balance, with domain-specific expertise in language and vision alongside a remarkable facility with less common inputs,” he said. “AI has much to gain from modeling that sort of adaptability. Hopefully, DABS brings us a little closer to that future.”


Contributor(s): Andrew Myers
