Machine learning has transformed artificial intelligence. Algorithms can analyze and make sense of spoken words, written text, still and moving images, and much more. Such algorithms are already improving diagnostic medicine, helping people navigate the world by voice, and making autonomous vehicles safer. Remarkable as they are, however, such transformative algorithms remain restricted to their specific domains. That is, if trained to analyze spoken words, they don’t do well with images or written text.
Now, researchers at Stanford have taken a significant first step toward a day when algorithms are no longer wedded to any one specific domain but can instead tackle many domains at once.
The researchers call it DABS, short for “domain-agnostic benchmark for self-supervised learning.” Self-supervised learning (SSL) is a field of AI in which algorithms teach themselves to analyze data with little or even no human oversight. SSL can extract a great deal of useful structure and information from unlabeled data, so a model does not have to start from scratch when learning a new skill, such as identifying a certain kind of rare disease. The resulting models are often much faster and better learners, while requiring much less time-consuming human instruction (i.e., labeling) to perform well.
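The pretrain-then-adapt recipe described above can be sketched in a few lines. The following is a toy NumPy illustration of the idea only, not DABS code: the pretext task (predicting one feature from another), the data, and every name in it are invented for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "unlabeled" corpus: two noisy views of the same hidden factor.
# (A hypothetical stand-in for the large unlabeled datasets SSL exploits.)
n = 1000
z = rng.normal(size=n)
x_unlabeled = np.stack([z + 0.1 * rng.normal(size=n),
                        z + 0.1 * rng.normal(size=n)], axis=1)

# Self-supervised pretext task: predict feature 1 from feature 0.
# The "label" comes from the data itself -- no human annotation required.
x0, x1 = x_unlabeled[:, 0], x_unlabeled[:, 1]
w = np.dot(x0, x1) / np.dot(x0, x0)  # closed-form least squares

# Adaptation: with only four labeled points, reuse the pretrained weight
# as a (very simple) feature extractor for a downstream classifier.
x_labeled = np.array([[2.0, 2.1], [1.8, 1.9], [-2.0, -2.1], [-1.9, -2.0]])
y_labeled = np.array([1, 1, 0, 0])
features = w * x_labeled[:, 0] + x_labeled[:, 1]
preds = (features > features.mean()).astype(int)
print(preds.tolist())  # the tiny labeled set is classified correctly
```

The point of the sketch is the division of labor: the expensive learning happens on unlabeled data, and the labeled set needed afterward can be very small.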
In a recent paper presented at the Conference on Neural Information Processing Systems, the authors show how DABS provides anyone interested in domain-agnostic approaches a complete infrastructure for data processing, training, transfer, and evaluation, absolving other researchers of the considerable task of gathering and processing many datasets across distinct domains.
Read the study: DABS: A Domain-Agnostic Benchmark for Self-Supervised Learning
“In providing a suite of standardized tasks, rules, and criteria,” said Alex Tamkin, a doctoral candidate in computer science and first author of the paper, “we hope DABS might allow other researchers to make clear and consistent comparisons across multiple domains and pinpoint specific algorithmic changes that improve the accuracy of their AI approaches.”
“Ideally,” said Noah Goodman, associate professor of psychology and of computer science, and the senior author of the study, “we hope DABS will expedite the era of generalized self-supervised learning in promising fields where it is not yet well established.”
A Living Benchmark
DABS incorporates seven diverse types of data, ranging from images and sensor information to multilingual text and speech to chest X-rays.
DABS could potentially benefit any field where people have lots of raw data but lack the resources to collect labels for millions of data points. Environmental scientists, for instance, might use a DABS algorithm to produce a pretrained “foundation” model for their field and later distribute it open source, enabling many other environmental scientists to use that model to study the things they care about. As the algorithms submitted to DABS get better and better, the models that each subsequent field trains get better, too.
What’s more, Goodman said, new domains will be added down the road.
“We want DABS to be a living benchmark that grows and evolves,” Goodman added. “The best test of a domain-agnostic algorithm is whether it works not only with a domain that you already know a lot about, but with the less well-known one that somebody adds down the road.”
Addressing the “Long Tail”
The need for a benchmark at all is a consequence of a field that has arrived at a crossroads of sorts. Developing self-supervised algorithms demands a lot of trial and error — and that means time and money. People must first evaluate and label the content to help the algorithms understand what they are seeing, hearing, or reading. AI researchers therefore have tended to favor domains with well-established and active communities where most of the labeling is already done — speech recognition, text analysis, and computer vision.
“It’s much easier to borrow a familiar labeled dataset to test your algorithms against than to familiarize yourself with a new one, so many researchers stick to popular domains,” said Tamkin.
Meanwhile, other important domains, such as valuable new areas of medical imaging, records analysis, and promising industrial diagnostics, remain on the sidelines. DABS lowers the barrier to entry for researchers interested in these long-tail domains by offering a consistent benchmark against which they can gauge success: it provides labeled and unlabeled datasets for several distinct domains in one place, so researchers can develop algorithms without first having to gather and process the data. In short, DABS does that work for them.
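In spirit, a consistent benchmark run applies the same algorithm and the same scoring rule to every domain, so the resulting numbers are directly comparable. The sketch below illustrates only that idea and assumes nothing about the real DABS codebase: the domains are synthetic toy datasets and the “algorithm” is a one-parameter pretext task.

```python
import numpy as np

def make_domain(rng, corr, n=500):
    """Synthetic stand-in for one domain's unlabeled data: two noisy
    views of a shared hidden factor, with domain-specific correlation."""
    z = rng.normal(size=n)
    noise = np.sqrt(1.0 - corr ** 2)
    a = corr * z + noise * rng.normal(size=n)
    b = corr * z + noise * rng.normal(size=n)
    return np.stack([a, b], axis=1)

def pretext_score(x):
    """Toy self-supervised 'algorithm': regress column 1 on column 0 and
    report the fraction of variance explained (higher is better)."""
    w = np.dot(x[:, 0], x[:, 1]) / np.dot(x[:, 0], x[:, 0])
    residual = x[:, 1] - w * x[:, 0]
    return 1.0 - residual.var() / x[:, 1].var()

rng = np.random.default_rng(1)
# Invented stand-ins for a benchmark's domains (images, sensors, text, ...).
domains = {
    "images": make_domain(rng, corr=0.9),
    "sensors": make_domain(rng, corr=0.5),
    "text": make_domain(rng, corr=0.7),
}

# The same algorithm and metric run on every domain, yielding one number
# per domain that can be compared directly -- the core idea of a benchmark.
scores = {name: pretext_score(x) for name, x in domains.items()}
for name, s in scores.items():
    print(f"{name}: {s:.2f}")
```

Because the metric and procedure are held fixed, a change that raises the scores across all three toy domains is evidence of a genuinely domain-agnostic improvement rather than domain-specific tuning.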
To test the effectiveness of DABS, the team went so far as to develop two domain-agnostic algorithms and evaluate them against the benchmark. Their results showed modest but promising improvements in accuracy over baselines. The outcome, according to the authors, suggests much room for progress as future methods, perhaps developed by other researchers around the world, move toward the long-term goal of generalized algorithms.
For Goodman, it’s all a question of balance.
“The human mind displays a similar balance between domain-specific expertise in language and vision, while showing remarkable facility with less common inputs, too,” he said. “AI has much to gain from modeling that sort of adaptability. Hopefully, DABS brings us a little closer to that future.”
Stanford HAI's mission is to advance AI research, education, policy and practice to improve the human condition.