How Math Teachers Are Making Decisions About Using AI | Stanford HAI


A Stanford summit explored how K-12 educators are selecting, adapting, and critiquing AI tools for effective learning.

Christopher Mah, Dora Demszky, Helen Higgins
September 15, 2025
Education, Skills


Nearly three years since the release of ChatGPT, educators are still grappling with the implications of generative artificial intelligence in schools. As AI technology has expanded in capabilities and reach, so too has the presence of edtech tools built on large language models (LLMs). Companies such as MagicSchool and Khan Academy have developed popular teacher-facing platforms, while frontier labs such as OpenAI, Anthropic, and Google have all released education-specific products or services powered by AI models. Schools, districts, and universities (along with individual educators) are purchasing these products in growing numbers. 

Despite rapid advances in capabilities, LLMs still have limitations. They hallucinate, generate biased content, and present data privacy challenges, all issues of critical importance in education. They may also shortcut the learning process and erode critical thinking skills if students become overreliant on them. The quality of AI edtech tools varies, and their proliferation requires teachers to carefully vet them before using them in the classroom.

Yet there is little research explaining how educators actually decide when, how, and why to use AI tools. In surveys, teachers point to factors that influence their use of AI tools, such as competing priorities, training, district infrastructure, and applicability to their subject or grade level, but systematic, in-depth evidence on teachers' criteria for AI tools is limited. As a result, edtech developers too often design products without input or feedback from their most important users.

Educators bring rich and varied perspectives, experiences, and attitudes toward AI that inform how they evaluate AI tools. Understanding the variation in teachers' reasoning will lead to the development of more inclusive and effective products that serve diverse users.

With that perspective, our research team hosted a summit on the Stanford campus for math educators to explore their needs and feelings toward AI, with the goal of providing educators with practical frameworks for evaluating AI tools and offering AI edtech developers diverse perspectives to inform the design of their tools. 

Image Credit: Stanford

Elevating Practitioner Voices

This summer, our research team (members of the EduNLP Lab at Stanford University Graduate School of Education) brought together over 60 K-12 math educators from around the country. Our participants represented diverse experience levels, grades taught, attitudes toward AI, and familiarity with AI tools. Over the course of a two-day summit hosted at Stanford University, teachers participated in learning activities, group discussions, breakout sessions, and hands-on workshops around teacher-facing AI education tools. As they did so, we invited them to connect their experiences to an educational value that they found meaningful, such as resilience, joy, or community. We stated that the summit would not focus on cheating or a specific AI tool; rather, we hoped to provide a platform for teachers to share their expertise, ask questions, and connect with each other. We also explicitly invited teachers to hold curious, critical, and hopeful perspectives toward AI. 

Our team of facilitators and researchers collected video, survey, and interview data, as well as several artifacts, including handwritten notes and rubrics. Teachers created personal rubrics by listing and ranking evaluation criteria that mattered to them, for a teaching scenario they defined (e.g., “modify an assignment to assess the same skills, but accommodate a student on a 4th-grade reading level”). We analyzed these data and synthesized our findings into the themes below.

Our Findings

Finding 1: Teachers valued many different criteria but placed highest importance on accuracy, inclusiveness, and utility. 

We analyzed 61 rubrics that teachers created to evaluate AI. Teachers generated a diverse set of criteria, which we grouped into ten categories: accuracy, adaptability, contextual awareness, engagingness, fidelity, inclusiveness, output variety, pedagogical soundness, user agency, and utility. We asked teachers to rank their criteria in order of importance and found a relatively flat distribution, with no single criterion ranked highest by a majority of teachers. Still, our results suggest that teachers placed the highest importance on accuracy, inclusiveness, and utility. Thirteen percent of teachers listed accuracy (which we defined as mathematically accurate, grounded in facts, and trustworthy) as their top evaluation criterion. Several teachers cited "trustworthiness" and "mathematical correctness" as their most important evaluation criteria, and another teacher described accuracy as a "gateway" for continued evaluation; in other words, if a tool was not accurate, it was not worth evaluating further. Another 13% ranked inclusiveness (which we defined as accessible to the diverse cognitive and cultural needs of users) as their top evaluation criterion. Teachers required AI tools to be inclusive of both student and teacher users. With respect to student users, teachers suggested that AI tools must be "accessible," free of "bias and stereotypes," and "culturally relevant." They also wanted AI tools to be adaptable for "all teachers." One teacher wrote, "Different teachers/scenarios need different levels/styles of support. There is no 'one size fits all' when it comes to teacher support!" Additionally, 11% of teachers reported utility (which we defined as the benefits of using the tool significantly outweighing the costs) as their top evaluation criterion. Teachers who cited this criterion valued "efficiency" and "feasibility." One added that AI needed to be "directly useful to me and my students."

In addition to accuracy, inclusiveness, and utility, teachers also valued tools that were relevant to their grade level or other context (10%), pedagogically sound (10%), and engaging (7%). Additionally, 8% reported that AI tools should be faithful to their own methods and voice. Several teachers listed “authentic,” “realistic,” and “sounds like me” as top evaluation criteria. One remarked that they wanted ChatGPT to generate questions for coaching colleagues, “in my voice,” adding, “I would only use ChatGPT-generated coaching questions if they felt like they were something I would actually say to that adult.” 
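The percentages above come from tallying which criterion each of the 61 teachers ranked first. As a rough illustration of that tallying step, here is a minimal sketch using hypothetical stand-in data (the actual summit rubrics are not reproduced here):

```python
from collections import Counter

# Hypothetical stand-in data: one top-ranked criterion per teacher.
# Counts are chosen to mirror the reported percentages out of 61 rubrics.
top_criteria = (
    ["accuracy"] * 8 + ["inclusiveness"] * 8 + ["utility"] * 7 +
    ["contextual awareness"] * 6 + ["pedagogically sound"] * 6 +
    ["fidelity"] * 5 + ["engagingness"] * 4 + ["other"] * 17
)

counts = Counter(top_criteria)
total = len(top_criteria)  # 61 rubrics
for criterion, n in counts.most_common():
    print(f"{criterion}: {n}/{total} ({n / total:.0%})")
```

With these counts, accuracy and inclusiveness each come out to 8/61, or 13%, and utility to 7/61, or 11%, matching the flat distribution described above.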

Accuracy: Tool outputs are mathematically accurate, grounded in fact, and trustworthy. Example: "Grounded in actual research and sources (not hallucinations); mathematical correctness."

Adaptability: Tool learns from data and can improve over time or with iterative prompting. Example: "Continue to prompt until it fits the needs of the given scenario; continue to tailor it!"

Contextual Awareness: Tool is responsive and applicable to specific classroom contexts, including grade level, standards, or teacher-specified goals. Example: "Ability to be specific to a context/grade level/community."

Engagingness: Tool evokes users' interest, curiosity, or excitement. Example: "A math problem should be interesting or motivate students to engage with the math."

Fidelity: Tool outputs are faithful to users' intent or voice. Example: "In my voice; I would only use ChatGPT-generated coaching questions if they felt like they were something I would actually say to that adult."

Inclusiveness: Tool is accessible to diverse cognitive and cultural needs of users. Example: "I have to be able to adapt with regard to differentiation and cultural relevance."

Output Variety: Tool can provide a variety of output options for users to evaluate or enhance divergent thinking. Example: "Multiple solutions; not all feedback from chat is useful, so providing multiple options is beneficial."

Pedagogically Sound: Tool adheres to established pedagogical best practices. Example: "Knowledge about educational lingo and pedagogies."

User Agency: Tool promotes users' control over their own teaching and learning experience. Example: "It is used as a tool that enables student curiosity and advocacy for learning rather than a source to find answers."

Utility: Benefits of using the tool significantly outweigh the costs (e.g., risks, resource and time investment). Example: "Efficiency: will it actually help, or is it something I already know?"

Table 1. Codes for the top criteria, along with definitions and examples.

Teachers expressed criteria in their own words, which we categorized and quantified via inductive coding.

We have summarized teachers’ evaluation criteria on the chart below:

Finding 2: Teachers’ evaluation criteria revealed important tensions in AI edtech tool design.

In some cases, teachers listed two or more evaluation criteria that were in tension with one another. For example, many teachers emphasized the importance of AI tools that were relevant to their teaching context, grade level, and student population, while also being easy to learn and use. Yet, providing AI tools with adequate context would likely require teachers to invest significant time and effort, compromising efficiency and utility. Additionally, tools with high degrees of context awareness might also pose risks to student privacy, another evaluation criterion some teachers named as important. Teachers could input student demographics, Individualized Education Plans (IEPs), and health records into an AI tool to provide more personalized support for a student. However, the same data could be leaked or misused in a number of ways, including further training of AI models without consent. 

Image Credit: Stanford

Another tension apparent in our data was between accuracy and creativity. As mentioned above, teachers placed the highest importance on mathematical correctness and trustworthiness, with one stating that they would not even consider other criteria if a tool was unreliable or produced hallucinations. However, several teachers also listed creativity as a top criterion, a trait enabled by LLMs' stochasticity, which in turn also produces hallucinations. The tension here is that while accuracy is paramount for fact-based queries, teachers may want to use AI tools as creative thought partners for generating novel, outside-the-box tasks, potentially with mathematical inaccuracies, that motivate student reasoning and discussion.
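The stochasticity behind this tradeoff is commonly controlled by a sampling temperature: raising it flattens the model's next-token distribution, which increases output diversity (creativity) but also the chance of sampling an incorrect continuation. This toy softmax illustrates the mechanism in general; it is not specific to any tool discussed here:

```python
import math

def softmax_with_temperature(logits, temperature):
    """Convert scores to a sampling distribution; higher temperature
    flattens it, trading reliability for diversity."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    z = sum(exps)
    return [e / z for e in exps]

logits = [4.0, 2.0, 1.0]  # hypothetical scores for three candidate tokens
low = softmax_with_temperature(logits, 0.5)   # sharply favors the top token
high = softmax_with_temperature(logits, 2.0)  # spreads mass to alternatives
```

At low temperature nearly all probability sits on the highest-scoring token; at high temperature the alternatives become much more likely, which is exactly the behavior teachers experience as "creative but occasionally wrong."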

Finding 3: A collaborative approach helped teachers quickly arrive at nuanced criteria. 

One important observation was that, when provided time and structure to explore, critique, and design with AI tools in community with peers, teachers developed nuanced ways of evaluating AI, even without having received training in AI. Grounding the summit in both teachers' own values and concrete problems of practice helped teachers develop specific evaluation criteria tied to realistic classroom scenarios. We used purposeful tactics to organize teachers into groups with peers who held different experiences with and attitudes toward AI than they did, exposing them to diverse perspectives they may not have otherwise considered. Juxtaposing different perspectives informed thoughtful, balanced evaluation criteria, such as, "Teaching students to use AI tools as a resource for curiosity and creativity, not for dependence." One teacher reflected, "There is so much more to learn outside of where I'm from and it is encouraging to learn from other people from all over."

Over the course of the summit, several of our facilitators observed that teachers – even those who arrived with strong positive or strong negative feelings about AI – adopted a stance toward AI that we characterized as “critical but curious.” They moved easily between optimism and pessimism about AI, often in the same sentence. One teacher wrote in her summit reflection, “I’m mostly skeptical about using AI as a teacher for lesson planning, but I’m really excited … it could be used to analyze classroom talk, give students feedback … and help teachers foster a greater sense of community.” Another summed it up well: “We need more people dreaming and creating positive tools to outweigh those that will create tools that will cause challenges to education and our society as a whole.”

A Path Forward

Tools built for educators must also be built with educators. During this summit, we provided teachers with space and resources to interrogate questions about AI, both technical and theoretical, alongside peers of all levels of experience. When teachers were given the opportunity to explore and think deeply about AI tools, they gained a deeper understanding of the technology’s affordances and limitations. They also demonstrated the ways in which pedagogical and content expertise inform thoughtful evaluation of AI tools. We believe that teachers’ expertise should be elevated and valued as much as the expertise of researchers and edtech developers in the design and development of AI tools. We include a few recommendations from teachers to edtech developers below. 

Generative AI is a general-purpose technology with potential use cases constrained only by one’s imagination. At the same time, its capabilities are not consistent within tasks, nor across tasks with different degrees of complexity (solving complex scientific problems while failing to count the r’s in strawberry). The jagged edge of intelligence makes it difficult (and perhaps unproductive) to evaluate AI tools holistically and suggests that teachers may be better served evaluating tools based on their ability to complete specific tasks.
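One way to operationalize task-specific evaluation is to record ratings per tool, per task, per criterion, rather than a single holistic score. The schema below is a hypothetical sketch (the names, tasks, and ratings are illustrative, not from the summit data):

```python
from dataclasses import dataclass

@dataclass
class TaskEvaluation:
    """One task-specific rating of an AI tool (hypothetical schema)."""
    tool: str
    task: str       # e.g., "modify an assignment for a 4th-grade reading level"
    criterion: str  # e.g., "accuracy", "inclusiveness", "utility"
    rating: int     # 1 (poor) to 5 (excellent)
    notes: str = ""

# Illustrative ratings for a hypothetical tool.
evals = [
    TaskEvaluation("ToolX", "generate word problems", "accuracy", 4),
    TaskEvaluation("ToolX", "differentiate a worksheet", "inclusiveness", 2),
]

# Averaging per task preserves the jagged profile that a single
# holistic score would hide.
by_task = {}
for e in evals:
    by_task.setdefault(e.task, []).append(e.rating)
profile = {task: sum(r) / len(r) for task, r in by_task.items()}
```

A tool that scores well on one task and poorly on another keeps both scores visible in `profile`, which is the point of evaluating by task rather than holistically.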

Educators are a diverse population with diverse experiences, perspectives, and pedagogical approaches, so AI tools must strike a difficult balance between universal design and specialized capabilities, while also navigating thorny issues related to ethics and data privacy. In our experience, if provided with a platform and with respect, educators are eager to share their expertise to support edtech R&D. Our summit seeks to pave a path toward a future of edtech guided by practitioner voices. 

Recommendations by Teachers to Edtech Developers
