It's easy to be impressed with models like ChatGPT, Gemini or Claude. Ask these systems to generate dinner recipes, proofread your email, or edit your code, and in seconds they accomplish what may have otherwise taken you hours. However, overlooked amid the hype surrounding large language models is a deeper story: The tremendous progress of frontier AI research beyond LLMs. 

According to the recently released AI Index, a comprehensive report from the Stanford Institute for Human-Centered AI that analyzes trends in AI research and development, policy, economics, and more, 2023 saw AI take on new multimodal capabilities, exceed human performance on some benchmarks, power more adaptable robots, and contribute to notable discoveries in science.

Better, More Flexible Models

In 2023, foundation models set new highs across multiple benchmarks. For example, on MMLU, a popular benchmark for assessing the general reasoning abilities of AI models (can they answer questions in the humanities, science, or mathematics?), an AI model, Google’s Gemini Ultra, exceeded a human baseline for the first time. Similarly, on MATH, a benchmark of over 10,000 competition-level mathematics problems, a GPT-4-based model posted a score of roughly 84%, not far off the standard of 90% set by a three-time International Mathematical Olympiad gold medalist. For reference, the top score on MATH in 2022 was 65%.

The tremendous progress of generative models is evident when you compare how, over time, Midjourney has responded to the prompt: "a hyper-realistic image of Harry Potter."

Midjourney's depictions of Harry Potter over one year show remarkably improved versions

Beyond better models, 2023 saw more flexible ones. Traditionally, AI models were limited in scope. For instance, language models that were good at reading comprehension struggled with generating images, and vice versa. However, some of the newest state-of-the-art models, like Google's Gemini, OpenAI's GPT-4, and Anthropic's Claude 3, demonstrate multimodal flexibility. They can handle images, process audio, and easily generate code. This is the first year in which single models (in this case, GPT-4 and Gemini) topped benchmarks across different task categories, such as reading comprehension and coding.

Language Insights Power Non-Language Models

The last year also saw exciting developments outside of language modeling. In 2023, researchers used insights from building LLMs, specifically transformer architectures for next-token prediction, to drive progress in non-language domains. Examples include Emu Video (video generation) and UniAudio (music generation). You can now make videos and generate music with AI models powered by some of the same ideas that brought you ChatGPT. 

Household Robots That Tell Jokes

Robotics is another domain recently accelerated by language modeling techniques. Two of the most prominent robotic models released in 2023, PaLM-E and RT-2, were both trained on combined corpora of language and robotic trajectory data. Unlike many of its robotic predecessors, PaLM-E can engage in manipulation tasks that involve some degree of reasoning, such as sorting blocks by color. More impressive, it can also caption images, generate haikus, and tell jokes. RT-2, on the other hand, is especially skilled at manipulating objects in never-before-seen environments. Both systems are promising steps toward more general robotic assistants that can intelligently maneuver in the real world and assist humans with tasks like basic housework.

Agentic AI, the Next Frontier?

Agentic AI also saw significant gains. Researchers introduced several new benchmarks — including AgentBench and MLAgentBench — that test how well AI models can operate semi-autonomously. Although there are already promising signs that AI agents can serve as useful computer science assistants, they still struggle with some more complex tasks like conducting our online shopping, managing our households, or independently operating our computers. Still, the introduction of the aforementioned benchmarks suggests that researchers are prioritizing this new field of AI research. 

AI Accelerates Science

Last year's AI Index first noted AI’s use in accelerating science. In 2023, significant new systems included GraphCast, a model that can deliver extremely accurate 10-day weather predictions in under a minute; GNoME, which unveiled over 2 million new crystal structures previously overlooked by human researchers; and AlphaMissense, which successfully classified around 89 percent of 71 million possible missense mutations. AI can now perform the kind of brute-force calculations that humans struggle with but that are nevertheless essential for solving some of the most complex scientific problems. On the medical side, new research shows that doctors can use AI to better diagnose breast cancer, interpret X-rays, and detect lethal forms of cancer.

While large language models captured the world’s attention last year, these were not the only technical advancements at the frontier of AI. Promising developments in generation, robotics, agentic AI, science, and medicine show that AI will be much more than just a tool for answering queries and writing cover letters. 

Nestor Maslej is the research manager and editor-in-chief of the AI Index.

The AI Index was first created to track AI development. The index collaborates with such organizations as LinkedIn, Quid, McKinsey, Studyportals, the Schwartz Reisman Institute, and the International Federation of Robotics to gather the most current research and feature important insights on the AI ecosystem. 
