AI+Education: How Large Language Models Could Speed Promising New Classroom Curricula | Stanford HAI


Date: October 14, 2024
Topics: Education, Skills

Stanford computer science scholars propose using language models to create new learning materials for K-12 students.

Developing new classroom curricula is a complex, time-consuming process. Instructors must create lessons and then run experiments with numerous students under different conditions to ensure they work for all learners.

Stanford scholars at the intersection of AI and education posed an interesting question: Could AI improve the process? In a recently published study, they show how large language models (LLMs) can mimic the experts who create and evaluate new materials to assist curriculum designers in getting more high-quality education content to students faster.

“In traditional methods, instructors design every detail, from what topics to cover to example problems for students to solve to supporting videos and other media. Then they test the material on students to see what’s effective,” says Joy He-Yueya, a computer science PhD student who is part of the Stanford AI Lab (SAIL). “It’s a slow process with many logistical challenges. We thought, there might be a better way.”

With support from a multiyear Hoffman-Yee Research Grant, He-Yueya and her co-advisors, Emma Brunskill, associate professor of computer science in the Stanford School of Engineering, and Noah D. Goodman, associate professor of psychology in the Stanford School of Humanities and Sciences, started brainstorming alternative approaches.

Previously, AI researchers had tried to build computational models of student learning that could be used to optimize instructional materials; however, this approach fell short due to the difficulty of modeling the cognitive dynamics of human students. Instead, the trio wondered if a model could be trained to act like a teacher and use its own judgment to evaluate new learning materials.

AI as Instructor

First, the scholars needed to verify whether an LLM could be an effective evaluator of educational materials. In a simulated expert evaluation, the scholars asked GPT-3.5 to consider a student’s prior knowledge of a math concept, along with a specific set of word problems, and predict the student’s performance on test questions administered after the lesson. For this phase of research, the team wanted to understand whether certain learning materials are effective for different student personas, such as eighth graders learning algebra or fifth graders struggling with fractions.
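The evaluation step described above can be sketched in code. This is a minimal, hypothetical illustration of how a simulated-expert prompt might be assembled: the persona fields, prompt wording, and the `call_llm` stub are assumptions for illustration, not the authors' actual implementation.

```python
# Sketch of the simulated-expert evaluation step: assemble a prompt asking an
# LLM to predict a student persona's post-test score on a given worksheet.
# The prompt format and the call_llm stub are illustrative assumptions.

def build_evaluation_prompt(persona: str, prior_knowledge: str,
                            worksheet: list[str],
                            test_questions: list[str]) -> str:
    """Assemble the context the simulated expert reasons over."""
    problems = "\n".join(f"- {p}" for p in worksheet)
    questions = "\n".join(f"- {q}" for q in test_questions)
    return (
        f"Student persona: {persona}\n"
        f"Prior knowledge: {prior_knowledge}\n"
        f"Worksheet problems studied:\n{problems}\n"
        f"Post-test questions:\n{questions}\n"
        "Predict the student's post-test score as a fraction between 0 and 1."
    )

def call_llm(prompt: str) -> float:
    """Stub standing in for a GPT-3.5 call; a real system would send the
    prompt to the model and parse the predicted score from its completion."""
    return 0.5  # placeholder prediction

prompt = build_evaluation_prompt(
    persona="eighth grader learning algebra",
    prior_knowledge="can solve one-variable linear equations",
    worksheet=["Solve the system: x + y = 10 and x - y = 2"],
    test_questions=["Solve the system: 2a + b = 7 and a - b = 2"],
)
predicted_score = call_llm(prompt)
```

In a real run, the parsed prediction would be compared against observed post-test outcomes to check how well the model tracks different student personas.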

To assess the model’s capabilities as a simulated educational expert, the scholars decided to run a small set of basic tests to see if the model’s curriculum evaluations could replicate two well-known phenomena in education psychology. The first is that instructional strategies need to change as a learner’s skills develop. While beginners benefit from structured guidance in the materials, more proficient students perform better with minimal guidance. The Stanford team reasoned that if the LLM replicated this “Expertise Reversal Effect” in its assessments of learning materials, this would be a good indicator of the AI’s potential for mimicking human teachers.

According to the second phenomenon, called the “Variability Effect,” introducing a greater variety of practice problems doesn’t always help students master a concept because it can overload their memory capacity. Less is more, in other words.

When the scholars tasked the model with evaluating math word problems involving systems of equations across different groups of students, the results once again echoed this known pattern of outcomes.

The Instruction Optimization Approach

Once they had confirmed the potential for an AI instructor to evaluate new materials, the scholars turned their attention to the question of whether a pair of models could work together to optimize educational content. They proposed a pipeline approach in which one model generates new educational material and the other evaluates the materials by predicting students’ learning outcomes, as measured by post-test scores. They applied this Instruction Optimization Approach to develop new math word problem worksheets. 
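The pipeline's selection logic can be sketched as a simple generate-and-evaluate loop. Both model calls below are stubs standing in for LLM prompts, and the scoring heuristic is an invented placeholder; only the loop structure reflects the approach described above.

```python
# Minimal sketch of the Instruction Optimization pipeline: one model proposes
# candidate worksheets, a second predicts post-test outcomes for each, and
# the candidate with the highest predicted score is kept.
import random

def generate_worksheet(seed: int) -> list[str]:
    """Stub generator: a real pipeline would prompt an LLM for new problems."""
    rng = random.Random(seed)
    n = rng.randint(3, 6)
    return [f"Solve the system: x + y = {i + 2}, x - y = {i}" for i in range(n)]

def predict_post_test_score(worksheet: list[str]) -> float:
    """Stub evaluator: a real pipeline would ask the LLM judge to predict
    learning outcomes for a given student persona. This placeholder heuristic
    merely echoes the Variability Effect: past a point, more problems on a
    worksheet stop helping."""
    n = len(worksheet)
    return min(n, 4) / 4 - 0.05 * max(0, n - 4)

candidates = [generate_worksheet(seed) for seed in range(8)]
best = max(candidates, key=predict_post_test_score)
```

Swapping the stubs for real generator and evaluator prompts turns this into the two-model loop the scholars propose, with predicted post-test scores driving the search over materials.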

Overall, the AI approach performed well: In a study involving 95 people with teaching experience, those experts generally concurred with the AI evaluator on which AI-generated worksheets would be more effective. The scholars noted a few exceptions, where teachers did not find a significant difference between worksheets that the AI thought were significantly different. The findings from this research are detailed in a 2024 paper published at the Educational Data Mining Conference: Evaluating and Optimizing Educational Content with Large Language Model Judgments.

“While LLMs should not be viewed as a replacement for teaching expertise or real data about what best supports students, our hope is that this approach could help support teachers and instructional designers,” Brunskill said.

Contributor(s)
Nikki Goth Itoi