Biggest AI Headlines in 2022 – At Stanford and Beyond
If we had to distill this year’s biggest advances in AI into one headline, it would be generative AI. This year companies launched major models like ChatGPT, DALL-E 2, and others, highlighting impressive capabilities and sparking debate about potential harms.
Also making news this year? Questions around sentient AI, a benchmark project for large language models, protein structure prediction, policy blueprints, and more.
Large Models Go Public
After OpenAI’s text-to-image generator DALL-E impressed people with avocado armchairs in 2021, 2022 saw the release of more capable giant models. Google’s Imagen launched privately in May, research lab Midjourney launched its DALL-E competitor in open beta in July, Stability.AI’s Stable Diffusion opened to the public in August, and DALL-E 2 became broadly available in September. Just this month, ChatGPT, OpenAI’s chatbot built on its GPT-3.5 series of models, opened for public testing. The latest powerful answer engine has been called both “mindblowing” and “built on bs.”
As impressive as this year’s models are, the tools still exhibit the bias, racism, and sexism inherent in AIs trained on the internet. Scholars studying Stable Diffusion found, for example, that “assertive firefighters” are depicted as white men, while a “committed janitor” is depicted as a person of color. Over on Reddit, moderators banned several groups making celebrity porn with these tools. Copyright issues are also percolating, as artists recognize their work in some of these generated images.
Still, we’re seeing ad agencies, graphic designers, customer service startups, app makers, and more using these image and text models in their work, and Microsoft, which invested $1 billion in OpenAI in 2019, added DALL-E to its Office software.
And around the corner, if whispers at the NeurIPS conference can be believed, will be GPT-4.
Sentient AI Panic
This summer the AI world fell into a sentient AI sinkhole on Twitter after a Google engineer claimed Google’s large language model LaMDA had a soul. Most technologists agreed that sentience is nowhere near a reality (if it ever will be), but also that models are getting good enough to trick people into thinking they are conversing with a real intelligence. This and other AI advances inspired some scholars to revisit the famous Turing Test, which assesses whether a machine can convince a human that they are talking to another human. For example, BIG-Bench (or Beyond the Imitation Game), a benchmark for measuring and extrapolating the capabilities of language models, came out in June.
AI Bill of Rights – A Policy Start
This October, the Biden administration released a Blueprint for an AI Bill of Rights, which laid out a set of protections for the public from algorithmic systems and their potential harms.
Incorporating feedback from researchers and technologists (see HAI’s own recommendations), advocates, journalists, and policymakers, the blueprint listed five key principles to guide AI development, deployment, and use: Systems must be safe and effective; they must not discriminate; data privacy must be respected; we should know when an automated system is being used and understand how it works; and we should be able to opt out and ask for a human alternative.
Critics were quick to note that the recommendations aren’t legally binding and have no teeth, but the blueprint does establish some norms in the U.S. that can help guide developers and inform future legislation.
Protein Structures and Drug Discovery
DeepMind’s AlphaFold tool, which produces highly accurate predictions of many protein structures, launched in 2020. This year researchers used the tool to predict the structures of more than 200 million proteins from some 1 million species. That covers almost every known protein on the planet, Nature notes. Meanwhile, Meta also focused its AI muscle on protein structures this year, predicting 600 million structures of proteins from bacteria, viruses, and other microorganisms that have not yet been characterized.
Because most drugs are designed around the 3D shapes of their protein targets, advances in prediction methods could push the drug market in new directions and expand the possibilities for in-silico drug discovery.
And at Stanford:
As large models grow more powerful and useful, we need transparency into how well they work. That inspired HELM (Holistic Evaluation of Language Models), a benchmark project developed by scholars at Stanford HAI’s Center for Research on Foundation Models. HELM evaluates these giant models on accuracy, calibration, robustness, fairness, bias, toxicity, efficiency, and other metrics. The scholars used this benchmark to analyze 30 prominent language models developed by OpenAI, Anthropic, Meta, Microsoft, BigScience, and more. Read about how these tools did, as well as how HELM works.
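To make one of these metrics concrete: calibration asks whether a model’s stated confidence matches how often it is actually right. The sketch below computes expected calibration error (ECE), one widely used calibration measure, using equal-width confidence bins. It is an illustration under our own assumptions, not HELM’s code; the function name and the 10-bin default are our choices, and HELM’s exact calibration setup may differ.

```python
from typing import List, Sequence, Tuple


def expected_calibration_error(
    confidences: Sequence[float],
    correct: Sequence[bool],
    num_bins: int = 10,
) -> float:
    """Equal-width-bin ECE: the size-weighted average of |accuracy - confidence| per bin."""
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    n = len(confidences)
    if n == 0:
        return 0.0

    # Bin i collects predictions whose confidence falls in [i/num_bins, (i+1)/num_bins).
    bins: List[List[Tuple[float, bool]]] = [[] for _ in range(num_bins)]
    for conf, ok in zip(confidences, correct):
        idx = min(int(conf * num_bins), num_bins - 1)  # confidence 1.0 goes in the last bin
        bins[idx].append((conf, ok))

    ece = 0.0
    for bucket in bins:
        if not bucket:
            continue
        avg_conf = sum(c for c, _ in bucket) / len(bucket)
        accuracy = sum(1 for _, ok in bucket if ok) / len(bucket)
        # Weight each bin's confidence/accuracy gap by the share of predictions in it.
        ece += (len(bucket) / n) * abs(accuracy - avg_conf)
    return ece
```

A model that claims 95% confidence but is always wrong gets an ECE of 0.95, while one whose 90%-confidence answers are right 90% of the time scores close to 0. Accuracy alone would not distinguish an overconfident model from a well-calibrated one with the same hit rate.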
Speeding Up Large Model Training and Inference
Large transformer models mean large computation: The attention layer at their heart is the compute and memory bottleneck, because its cost grows quadratically with sequence length, which makes it difficult to equip models with long context. Stanford scholars proposed FlashAttention to make attention fast and memory-efficient, with no approximation. The algorithm minimizes the number of reads and writes between the GPU’s large but slow high-bandwidth memory and its small, fast on-chip memory, speeding up models such as GPT-2 by up to 3x. It yields the fastest BERT model on cloud instances in the competitive MLPerf benchmark, and it leads to higher-quality models with 4x longer context.
In the short time since its release, FlashAttention has been widely adopted to speed up the training and inference of large language models and image-generating diffusion models (e.g., at OpenAI, Microsoft, Meta, PyTorch, NVIDIA, and many others). It is a strong example of how technical work in academia can be rapidly adopted by industry.
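The “no approximation” claim rests on tiling plus an online softmax: keys and values are processed one block at a time, and a running maximum and running sum let earlier partial results be rescaled as each new block arrives, so the full row of attention scores is never stored at once. The pure-Python sketch below shows only that arithmetic for a single query vector and checks it against ordinary attention. It is a simplification under our own assumptions (function names and block size are hypothetical); the real FlashAttention is a fused GPU kernel whose speedup comes from keeping these tiles in fast on-chip memory, which plain Python cannot demonstrate.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def naive_attention(q: Vector, keys: List[Vector], values: List[Vector]) -> List[float]:
    """Standard softmax(q.K^T / sqrt(d)) V, materializing every score at once."""
    scale = 1.0 / math.sqrt(len(q))
    scores = [dot(q, k) * scale for k in keys]
    m = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[j] for w, v in zip(weights, values)) / total for j in range(dim)]


def tiled_attention(
    q: Vector, keys: List[Vector], values: List[Vector], block_size: int = 2
) -> List[float]:
    """Same result, but only one block of scores exists at a time (online softmax)."""
    scale = 1.0 / math.sqrt(len(q))
    dim = len(values[0])
    running_max = -math.inf
    running_sum = 0.0          # sum of exp(score - running_max) seen so far
    acc = [0.0] * dim          # unnormalized weighted sum of values seen so far

    for start in range(0, len(keys), block_size):
        k_block = keys[start:start + block_size]
        v_block = values[start:start + block_size]
        scores = [dot(q, k) * scale for k in k_block]

        new_max = max(running_max, max(scores))
        # Rescale earlier partial results to the new max (exp(-inf) == 0.0 on the first block).
        correction = math.exp(running_max - new_max)
        running_sum *= correction
        acc = [a * correction for a in acc]

        for s, v in zip(scores, v_block):
            w = math.exp(s - new_max)
            running_sum += w
            for j in range(dim):
                acc[j] += w * v[j]
        running_max = new_max

    # Normalize once at the end.
    return [a / running_sum for a in acc]
```

For any block size, `tiled_attention` agrees with `naive_attention` up to floating-point rounding, which is the sense in which the method is exact rather than approximate.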
Using AI to Understand History
One of Stanford HAI’s Hoffman-Yee Research Grant teams devised an interesting use of AI: as a lens for understanding history. In one paper published this year, they analyzed 140 years of U.S. political speeches about immigration to track attitudes toward immigrants. They found a dramatic rise in pro-immigration attitudes starting in 1940, as well as growing differences between the political parties. Beyond these findings, the work shows how AI tools let scholars study history in a way historians have never been able to before.
Beyond ‘Bigger Is Better’
Over the past few years, we’ve noticed a trend in language models: Bigger scale seems to elicit better performance. But scaling comes with significant compute and energy costs. In a study that won an Outstanding Paper Award at NeurIPS, scholars including HAI Associate Director Surya Ganguli focus on training dataset size. Their research indicates that many training examples are highly redundant, so training datasets can be pruned to much smaller sizes and models trained on them without sacrificing performance. In this work, they develop a new analytic theory of data pruning, show that the theory holds in practice, conduct a benchmarking study of 10 data pruning metrics, and introduce an unsupervised data pruning metric that does not require labels. Their work suggests that, rather than collecting large amounts of random data, we can intelligently collect smaller amounts of carefully selected data and maintain performance at lower energy and compute costs.
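To give a feel for how a label-free pruning metric can work, the sketch below scores each example by its distance to the nearest k-means centroid in an embedding space (an example far from every prototype is treated as harder and less redundant) and keeps a chosen fraction. This is an illustrative assumption, not the authors’ code: it presumes the embeddings already come from some pretrained model, uses a bare-bones k-means with first-k initialization, and all function names are hypothetical. As we understand the paper, keeping hard examples suits large datasets while keeping easy ones suits small datasets, so the `keep_hard` flag exposes both options.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def _dist(a: Vector, b: Vector) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(points: List[Vector], k: int, iters: int = 20) -> List[List[float]]:
    """Bare-bones k-means; uses the first k points as initial centroids (a simplification)."""
    centroids = [list(p) for p in points[:k]]
    for _ in range(iters):
        clusters: List[List[Vector]] = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: _dist(p, centroids[c]))
            clusters[nearest].append(p)
        for c, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster empties out
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return centroids


def prune_by_prototype_distance(
    embeddings: List[Vector], k: int, keep_fraction: float, keep_hard: bool = True
) -> List[int]:
    """Return sorted indices of the examples to keep (hypothetical helper)."""
    centroids = kmeans(embeddings, k)
    # Difficulty score: distance to the closest prototype; no labels needed.
    scores = [min(_dist(e, c) for c in centroids) for e in embeddings]
    order = sorted(range(len(embeddings)), key=lambda i: scores[i], reverse=keep_hard)
    n_keep = max(1, round(keep_fraction * len(embeddings)))
    return sorted(order[:n_keep])
```

With two tight clusters and one point sitting between them, keeping the hardest example selects the in-between point, while keeping the easiest selects a point right at a cluster center.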
Scholars/Community Partner on COVID
Of the many challenges of the pandemic, one was ensuring that the community could act on the best health information. Stanford researchers including HAI Associate Director Daniel E. Ho partnered with the Santa Clara County Public Health Department to create a machine-learning system that predicted people’s language needs and matched them with bilingual members of the COVID-19 contact tracing program. The algorithm reduced people’s overall reluctance to engage with the program, significantly cut case times, and highlighted how AI can be used to improve people’s lives and communities. (Learn more in this article on the research.)
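The article does not describe the model or matching rule the county used, so the following is only a hypothetical sketch of the general pattern: a classifier outputs, for each case, a probability that the person needs a given non-English language, and a simple rule routes confident predictions to a matching bilingual tracer queue while everything else goes to a general queue. The `Case` class, the `route_cases` function, and the 0.5 threshold are all illustrative assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Set


@dataclass
class Case:
    case_id: str
    # Hypothetical model output: probability that this case needs each non-English language.
    language_probs: Dict[str, float] = field(default_factory=dict)


def route_cases(
    cases: Iterable[Case], available_languages: Set[str], threshold: float = 0.5
) -> Dict[str, str]:
    """Map each case_id to a bilingual queue (named by language) or to "general"."""
    assignments: Dict[str, str] = {}
    for case in cases:
        # Most likely non-English language for this case, if the model predicted any.
        best_lang, best_p = max(
            case.language_probs.items(), key=lambda kv: kv[1], default=(None, 0.0)
        )
        if best_lang is not None and best_p >= threshold and best_lang in available_languages:
            assignments[case.case_id] = best_lang
        else:
            # Low confidence, or no tracer speaks the predicted language.
            assignments[case.case_id] = "general"
    return assignments
```

Routing on a prediction before first contact is what lets a bilingual tracer make the initial call, instead of reassigning the case after a failed English-language attempt.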
Sounding the Alarm
Beyond exciting technical advances in AI, Stanford faculty also issued warnings about the technology’s rapid and unhindered progress. In a thoughtful piece on the Turing Trap, Erik Brynjolfsson, faculty director of the Stanford Digital Economy Lab, a center within Stanford HAI, notes that while human-level artificial intelligence could create many economic benefits, it could also lead to a realignment of economic and political power that redistributes benefits from workers to owners and increases the concentration of global wealth and power.
And in an opinion published in Nature on AI in chemical and materials development, Stanford scholars Sadasivan Shankar and Richard N. Zare explain the dual-use dilemma of machine learning in the field: While the technology may help us develop precision medicine for individuals, for example, it could also be used to engineer viruses or toxins that target people with specific genes. And while it could help invent more sustainable materials such as biodegradable plastics, it could also be used to design a tasteless compound that could feasibly poison a community’s water supply.