How Do Governments Track and Understand AI?

September 28, 2020
Photo: Carol M. Highsmith

Researchers discuss the obstacles to measuring AI’s impact.

Artificial intelligence is enabling spectacular advances in fields from medicine to robotics, but it also generates worry about job losses, privacy, fairness and human accountability.

Small wonder that governments worldwide are fixated on policies to both stay competitive and head off dangers. In June alone, for example, U.S. lawmakers introduced seven separate AI bills.

But do policymakers and the public have accurate data? How do we even define AI, much less measure “progress” or competitiveness? Do we have any agreed-upon metrics about benefits and risks?

Those questions were the focus of a recent workshop convened by Stanford HAI and Stanford’s AI Index.

The AI Index may be the world’s most comprehensive public source of data on AI activity, investment and impact. Yet the message from this conference was just how hard it still is to know what’s going on.

We sat down with three of the AI Index’s creators – Saurabh Mishra of Stanford, Ray Perrault of SRI International and Jack Clark of OpenAI – to better understand the challenges.

Why is it hard to measure the progress and impact of AI, and why should we worry?

Perrault: Policymakers want to know what’s happening, but they need good information. People who want more government funding may have incentives to present one set of numbers to warn that we’re underinvesting in AI, while others might present a different set of numbers to claim we’re heavily funding AI and having great impact. So this requires careful thought about what we’re measuring.

The problem is that it’s difficult to put a boundary around what we mean by “artificial intelligence.” AI has borrowed ideas from many disciplines over time, including logic, linguistics and psychology. Machine learning draws many of its foundations from statistics and optimization, and today is being applied to a broad range of other fields, from bioinformatics to finance. Many of those advances are coming not from AI researchers but from people in the fields where it’s being applied.

There’s nothing wrong with that – it’s progress. But it does raise challenges about how to measure investment and advances in artificial intelligence.

For example, should we think that an investment in self-driving cars is all about AI? AI is certainly important, but you can’t give it credit for the whole field. When you actually build a self-driving car, AI is a pretty small share of the total cost. Most applications of AI are driven by a mix of technologies.

How good are we at measuring performance?

Perrault: Strictly from a technical standpoint, there are different metrics of AI performance. These include the amount of data it takes to train a system, but also the amount of computation required and how well a model performs with real-world data that’s different from what it was trained on.

Speech recognition is much more practical now because it’s possible to collect vast numbers of speech samples for the systems to train on. The more data and the more computing power you can throw at the job, the better the results will be.

But accurate results are not the only measure of performance. Another might be: Is there a way to get the same job done, but with less data and computing power? Increasingly, authors of papers about new AI models indicate how much computing was necessary to get their results.
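
To make one of the metrics Perrault lists concrete – how well a model performs on data that differs from what it was trained on – here is a minimal sketch. The dataset, model, and simulated noise shift are illustrative choices only, not anything from the AI Index.

```python
# Minimal sketch: in-distribution vs. shifted-data accuracy.
# The dataset, model, and noise level are illustrative choices only.
import numpy as np
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = LogisticRegression(max_iter=2000).fit(X_train, y_train)

# Accuracy on test data drawn from the same distribution as the training set.
in_dist = model.score(X_test, y_test)

# Simulate a distribution shift the model never saw: additive pixel noise.
rng = np.random.default_rng(0)
X_shifted = X_test + rng.normal(0, 4.0, X_test.shape)
out_dist = model.score(X_shifted, y_test)

print(f"In-distribution accuracy: {in_dist:.1%}")
print(f"Shifted-data accuracy:    {out_dist:.1%}")  # typically much lower
```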

Clark: One way of cutting through the hype is by having better and more standardized metrics for what you want to achieve.

Imagine if car companies hadn’t standardized horsepower. One company could claim its engine had 100,000 “foxpower,” while another claimed its engine had 700,000 “flypower.” That’s a little like where AI is today. It can be very challenging to compare the performance of a system from task to task, or to compare different systems on the same task, because different standards are used to evaluate them.

You can have a system that’s useful but will use enough energy to boil the ocean, or you can have a system that’s just kind of useful but runs on a triple-A battery. You need to talk about those systems in the same universe.
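
A hypothetical sketch of what Clark is describing: report performance and compute together, so the ocean-boiler and the triple-A-battery system can be compared on one scale. The system names and figures below are invented for illustration.

```python
# Hypothetical sketch: pairing task performance with the compute spent on it,
# so systems can be compared "in the same universe." All figures are made up.
from dataclasses import dataclass

@dataclass
class SystemReport:
    name: str
    accuracy: float      # fraction correct on a shared benchmark
    train_pflops: float  # total training compute, in petaFLOPs

def efficiency(r: SystemReport) -> float:
    """Accuracy per petaFLOP of training compute."""
    return r.accuracy / r.train_pflops

systems = [
    SystemReport("ocean-boiler", accuracy=0.92, train_pflops=50_000.0),
    SystemReport("aaa-battery", accuracy=0.88, train_pflops=40.0),
]

for s in sorted(systems, key=efficiency, reverse=True):
    print(f"{s.name}: {s.accuracy:.0%} on the benchmark, "
          f"{efficiency(s):.6f} accuracy per PFLOP")
```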

Mishra: Another metric of progress is in avoiding bias. We know that facial recognition systems are more accurate with some racial groups than others. The National Institute of Standards and Technology has developed a set of systematic evaluation methods to compare the bias of competing facial recognition systems, and it has published reports showing that every system has problems. But those kinds of in-depth standardized measurements and evaluations are still rare in other domains impacted by AI.

Clark: But if you have a single metric, a single performance score, you’re likely to get something wrong. Let’s say you want to measure the bias of facial recognition systems, but the measure is actually a blend of how a system performs for different social or racial groups. What happens if a system is reasonably good overall but weirdly bad at recognizing one particular group?
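
A small sketch of the failure mode Clark describes: a blended accuracy score that looks fine while one group fares badly. The group names, sizes, and accuracies below are invented for illustration.

```python
# Hypothetical sketch: a single blended score can hide a per-group failure.
# Group names, sizes, and accuracies below are invented.
per_group_accuracy = {"group_a": 0.97, "group_b": 0.95,
                      "group_c": 0.96, "group_d": 0.60}  # one group fails badly
group_sizes = {"group_a": 5000, "group_b": 4000,
               "group_c": 3000, "group_d": 200}

total = sum(group_sizes.values())
blended = sum(per_group_accuracy[g] * group_sizes[g] for g in group_sizes) / total
worst = min(per_group_accuracy.values())

print(f"Blended accuracy:     {blended:.1%}")  # ~95.5% -- looks fine
print(f"Worst-group accuracy: {worst:.1%}")    # 60.0% -- the hidden failure
```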

How good are we at measuring the social and economic impact of AI?

Perrault: It’s a challenge. In spite of all the technological advances in AI, for example, productivity growth has been lagging – even in the West. Part of the answer is that not all the uses of AI generate economic consequences. I can ask my phone a question and get an answer, but how much economic impact does that have? You can ask many more questions during the day than you could before, but are you more productive than when you had to go look up the information yourself? And how much is the AI contributing? Google says it’s an AI company, but no one really knows how much of its revenue comes from AI.

Mishra: To put this into a global perspective, we need to think about distributional consequences and inequality. We need to study these trends in terms of their impact on developing countries. We don’t have much clarity about which nations, which domains and which organizations are deploying AI. Who has access to which data? Who has access to the computing power? There’s a real paucity of data about developing countries.


Contributor: Edmund L. Andrews