AI Index: Five Trends in Frontier AI Research | Stanford HAI

Date: April 15, 2024
Topics: Economy, Markets; Natural Language Processing; Machine Learning

The new AI Index spots major advances in multimodal models, robotics, generative AI, and more.

It's easy to be impressed with models like ChatGPT, Gemini, or Claude. Ask these systems to generate dinner recipes, proofread your email, or edit your code, and in seconds they accomplish what might otherwise have taken you hours. But overlooked amid the hype surrounding large language models is a deeper story: the tremendous progress of frontier AI research beyond LLMs.

According to the recently released AI Index, a comprehensive report from the Stanford Institute for Human-Centered AI that analyzes trends in AI research and development, policy, economics, and more, 2023 saw AI gain new multimodal capabilities, exceed human performance on key benchmarks, power more adaptable robots, and drive exciting discoveries in science.

Better, More Flexible Models

In 2023, foundation models set new standards across multiple benchmarks. For example, on MMLU, a popular benchmark for assessing the general reasoning abilities of AI models (can they answer questions in the humanities, science, or mathematics?), an AI system, specifically Google's Gemini Ultra, exceeded a human baseline for the first time ever. Similarly, on MATH, a benchmark of over 10,000 competition-level mathematics problems, a GPT-4-based model posted a score of roughly 84%, not far off the 90% standard set by a three-time international math olympiad gold medalist. For reference, the top score on MATH in 2022 was 65%.
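Scores like these are, at bottom, simple fractions: the share of benchmark problems where the model's final answer matches the reference. The sketch below illustrates that calculation with hypothetical problems and answers, not actual MATH data:

```python
# Minimal sketch of how a benchmark score (e.g., the ~84% on MATH) is
# computed: the fraction of problems whose model answer exactly matches
# the reference answer. All data here is a hypothetical stand-in.

def benchmark_accuracy(model_answers, reference_answers):
    """Return the fraction of exact matches between model and reference."""
    assert len(model_answers) == len(reference_answers)
    correct = sum(m == r for m, r in zip(model_answers, reference_answers))
    return correct / len(reference_answers)

# Hypothetical 5-problem benchmark: the model gets 4 of 5 right.
refs  = ["42", "3.14", "7", "x=2", "9"]
preds = ["42", "3.14", "7", "x=3", "9"]
print(f"{benchmark_accuracy(preds, refs):.0%}")  # 80%
```

Real benchmarks add wrinkles (answer normalization, partial credit, multiple-choice formats), but the headline numbers reported by the AI Index are percentages of this kind.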

The tremendous progress of generative models is evident when you compare how, over time, Midjourney has responded to the prompt: "a hyper-realistic image of Harry Potter."

[Image: Midjourney's depictions of Harry Potter over one year show remarkably improved versions.]

Beyond better models, 2023 brought more flexible ones. Traditionally, AI models were limited in scope: language models that excelled at reading comprehension struggled to generate images, and vice versa. However, some of the newest state-of-the-art models, like Google's Gemini, OpenAI's GPT-4, and Anthropic's Claude 3, demonstrate multimodal flexibility. They can handle images, process audio, and easily generate code. This is the first year in which single models (in this case, GPT-4 and Gemini) topped benchmarks across different task categories, such as reading comprehension and coding.

Language Insights Power Non-Language Models

The last year also saw exciting developments outside of language modeling. In 2023, researchers used insights from building LLMs, specifically transformer architectures for next-token prediction, to drive progress in non-language domains. Examples include Emu Video (video generation) and UniAudio (music generation). You can now make videos and generate music with AI models powered by some of the same ideas that brought you ChatGPT. 
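The shared idea is a simple autoregressive loop: the model repeatedly predicts the next token given the tokens so far, whether those tokens represent words, audio codes, or video patches. The hand-written bigram "model" below is a hypothetical stand-in for illustration, not the architecture of Emu Video or UniAudio:

```python
# Toy illustration of autoregressive next-token prediction: the same loop
# drives LLMs and, with different token vocabularies (audio codes, video
# patches), generative models in other domains. The "model" is a
# hypothetical bigram lookup table, not a real transformer.

BIGRAM_MODEL = {
    "<start>": "the",
    "the": "cat",
    "cat": "sat",
    "sat": "<end>",
}

def generate(model, max_tokens=10):
    """Greedily emit tokens one at a time, each conditioned on the previous."""
    tokens = []
    current = "<start>"
    for _ in range(max_tokens):
        nxt = model.get(current, "<end>")
        if nxt == "<end>":
            break
        tokens.append(nxt)
        current = nxt
    return tokens

print(generate(BIGRAM_MODEL))  # ['the', 'cat', 'sat']
```

In a real transformer, the lookup table is replaced by a learned network that scores every possible next token given the full preceding context, but the generation loop is the same.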

Household Robots That Tell Jokes

Robotics is another domain recently accelerated by language modeling techniques. Two of the most prominent robotic models released in 2023, PaLM-E and RT-2, were both trained on combined corpora of language data and robotic trajectory data. Unlike many of its robotic predecessors, PaLM-E can engage in manipulation tasks that involve some degree of reasoning — for example, sorting blocks by color. More impressive, it can also caption images, generate haikus, and tell jokes. RT-2, on the other hand, is especially skilled at manipulation in never-before-seen environments. Both systems are promising steps toward more general robotic assistants that can intelligently maneuver in the real world and assist humans with tasks like basic housework.

Agentic AI, the Next Frontier?

Agentic AI also saw significant gains. Researchers introduced several new benchmarks — including AgentBench and MLAgentBench — that test how well AI models can operate semi-autonomously. Although there are already promising signs that AI agents can serve as useful computer science assistants, they still struggle with some more complex tasks like conducting our online shopping, managing our households, or independently operating our computers. Still, the introduction of the aforementioned benchmarks suggests that researchers are prioritizing this new field of AI research. 

AI Accelerates Science

Last year's AI Index first noted AI's use in accelerating science. In 2023, significant new systems included GraphCast, a model that can deliver extremely accurate 10-day weather predictions in under a minute; GNoME, which unveiled over 2 million new crystal structures previously overlooked by human researchers; and AlphaMissense, which successfully classified around 89 percent of 71 million possible missense mutations. AI can now perform the kind of brute-force calculations that humans struggle with but that are nevertheless essential for solving some of the most complex scientific problems. On the medical side, new research shows that doctors can use AI to better diagnose breast cancer, interpret X-rays, and detect lethal forms of cancer.

While large language models captured the world’s attention last year, these were not the only technical advancements at the frontier of AI. Promising developments in generation, robotics, agentic AI, science, and medicine show that AI will be much more than just a tool for answering queries and writing cover letters. 

Nestor Maslej is the research manager and editor-in-chief of the AI Index.

The AI Index was first created to track AI development. The index collaborates with such organizations as LinkedIn, Quid, McKinsey, Studyportals, the Schwartz Reisman Institute, and the International Federation of Robotics to gather the most current research and feature important insights on the AI ecosystem. 
