An AI-based system more effectively — and fairly — identifies tax returns to audit.
Over $400 billion.
That’s how large the gap is between annual U.S. federal income taxes paid and owed, by some estimates.
That’s a problem, given that tax revenues help fund government programs focused on everything from health care to education.
But addressing the problem is no easy task, explain Daniel Ho, Stanford professor of law and political science and associate director for the Stanford Institute for Human-Centered Artificial Intelligence, and Stanford associate professor of law Jacob Goldin. Both are working with the Internal Revenue Service on shaping the future of tax collection.
“The IRS first pioneered an approach in the 1970s, which it has periodically updated, to identify when there is the risk of underpayment of taxes,” Ho says. “The agency is engaging in an ambitious and rigorous plan to explore better ways to do it.”
Ho and Goldin proposed one such better way: an active-learning system that uses an AI algorithm to decide which tax returns are more worthy of an audit. This system offers the promise of reducing the tax gap while using IRS resources more fairly and effectively.
The research team, which includes seven Stanford researchers and collaborators from other schools and the IRS, received one of the institute’s inaugural Hoffman-Yee grants to fund the work.
Tax audits pose a major challenge for the U.S. First, tighter resources restrict the IRS’s ability to audit. Federal funding cuts over the last decade have resulted in 14 percent fewer employees, 20 percent fewer enforcement staff, and the lowest level of individual and business audits in 10 years, notes Rebecca Lester, one of the team members and a Stanford Graduate School of Business associate professor of accounting. “Companies are audited less frequently, and when selected for an audit, the amounts they are assessed are often lower,” she says. “This suggests a considerable reduction in enforcement of how companies comply with tax laws and the amount of tax they contribute to national revenues.”
Furthermore, reductions in audit capabilities cut to the heart of the IRS’s tax collection strategy.
“The tax system is a voluntary compliance system where people self-report how much tax they owe and what that’s based on,” Goldin says. “The IRS is looking for tax returns with discrepancies between what’s paid and what’s actually owed.”
But doing that effectively is easier said than done. The system the IRS developed in the 1970s, key features of which remain in use today, picks a random sample of taxpayer returns for an “intense” research audit, as Ho describes it — a line-by-line review for potential discrepancies. The agency then uses the results of those audits to build a risk-estimation model to select returns for a regular audit by one of its thousands of revenue agents.
The system has faced growing challenges. First, due to the resource constraints, only a tiny fraction of returns are selected for random audits — from 50,000 at peak to just several thousand today — limiting the value of the data yielded. That likely contributes to a higher false-positive rate, or percentage of returns targeted for audits but yielding no discrepancy. Some analysts have suggested that audits excessively focus on lower-income taxpayers, and this collaboration will enable a more complete analysis and approach to these distributive concerns.
“We need a more effective system to protect the tax base,” Ho says. “Historically, random audits have informed regular ones, but there’s been no consequent feedback from regular audits to inform smarter selection.”
The solution the research team has proposed is based on “active learning.” Where conventional machine-learning approaches train a model once with a specific data set to apply to new data, active learning seeks to learn continuously and iteratively, selecting data points intentionally to update the model.
“Many tech companies have been using active learning to deploy resources most effectively,” Ho says, contrasting it with conventional evaluations that can be time-consuming and expensive. He offers the example of Netflix: the traditional approach would show a sample of customers every possible combination of movie banners to learn their preferences, whereas an active-learning approach shows a user a single banner, observes whether they click on it, and updates the model with that outcome before choosing the next banner.
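That banner experiment can be sketched as a simple epsilon-greedy bandit, one basic form of this learn-as-you-go approach. Everything below (the banner names, click-through rates, and parameters) is invented for illustration, not drawn from Netflix:

```python
import random

def epsilon_greedy_banners(click_prob, n_users=10000, epsilon=0.1, seed=0):
    """Show each simulated user one banner, observe the click, and update
    that banner's estimated click rate before serving the next user."""
    rng = random.Random(seed)
    banners = list(click_prob)
    shows = dict.fromkeys(banners, 0)   # times each banner was shown
    clicks = dict.fromkeys(banners, 0)  # clicks each banner received
    for _ in range(n_users):
        if rng.random() < epsilon:      # explore: try a random banner
            banner = rng.choice(banners)
        else:                           # exploit: best observed click rate so far
            banner = max(banners,
                         key=lambda b: clicks[b] / shows[b] if shows[b] else 0.0)
        shows[banner] += 1
        if rng.random() < click_prob[banner]:  # simulated user click
            clicks[banner] += 1
    return shows

# Invented click-through rates for three hypothetical banners.
shows = epsilon_greedy_banners({"action": 0.05, "comedy": 0.12, "drama": 0.08})
```

With enough users, the banner with the highest true click rate ends up shown far more often than the alternatives, without ever testing every combination up front.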
In the same way, rather than doing a set of random audits and then applying what’s learned to regular audits, as it always has, the IRS could use active learning to immediately “learn something with each new audit and then update the model to examine the next taxpayer return,” Ho says.
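A minimal sketch of such a loop, assuming a toy online logistic-regression risk model. The features, outcomes, and learning rate here are invented for illustration and are not the IRS's actual model; a real system would also mix in exploration rather than always auditing the top-scored return:

```python
import math

class AuditRiskModel:
    """Toy online logistic regression: predicts the probability that an
    audit of a given return will uncover a discrepancy, and updates its
    weights immediately after each observed audit outcome."""

    def __init__(self, n_features, lr=0.1):
        self.w = [0.0] * n_features
        self.lr = lr

    def score(self, x):
        # Predicted probability of finding a discrepancy on return x.
        z = sum(wi * xi for wi, xi in zip(self.w, x))
        return 1.0 / (1.0 + math.exp(-z))

    def update(self, x, outcome):
        # One gradient step on the audit outcome (1 = discrepancy found).
        p = self.score(x)
        self.w = [wi + self.lr * (outcome - p) * xi for wi, xi in zip(self.w, x)]

def active_audit_loop(model, pool, run_audit, n_audits):
    """Pick the return the model currently scores as riskiest, audit it,
    and fold the outcome back into the model before the next selection."""
    for _ in range(n_audits):
        x = max(pool, key=model.score)   # select the next return to audit
        pool.remove(x)
        model.update(x, run_audit(x))    # learn from this audit right away
```

The key difference from the legacy pipeline is that `update` runs after every audit, so each selection benefits from everything learned so far instead of waiting for the next batch of random research audits.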
“The goal of the algorithm,” Goldin says, “is to find the returns that are going to result in a meaningful adjustment to the tax liability owed — or elements that suggest a discrepancy between the tax liability stated by the taxpayer and the actual one.” For example, if the system recognized that certain types of deductions were more likely to lead to a miscalculation of tax owed, it would begin to flag returns with these deductions for audit.
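As a hypothetical illustration of that flagging step, a trained model's learned weights could rank returns by the deduction types they claim. The deduction categories and weights below are invented, not actual IRS model parameters:

```python
def flag_returns(returns, weights, top_k):
    """Rank returns by a learned risk score (the sum of weights for the
    deduction types each return claims) and flag the top_k riskiest."""
    def risk(r):
        return sum(weights.get(d, 0.0) for d in r["deductions"])
    return sorted(returns, key=risk, reverse=True)[:top_k]

# Hypothetical weights a trained model might assign to deduction types.
weights = {"home_office": 0.8, "charitable": 0.2, "vehicle": 0.5}
returns = [
    {"id": 1, "deductions": ["charitable"]},
    {"id": 2, "deductions": ["home_office", "vehicle"]},
    {"id": 3, "deductions": ["vehicle"]},
]
flagged = flag_returns(returns, weights, top_k=1)  # flags return 2
```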
Identifying returns more efficiently will mean the IRS is “devoting its scarce resources where they’re likely to actually recover revenue for the federal government,” Ho says.
The proposed work carries several related benefits.
The most obvious, of course, is closing the significant gap between taxes owed and taxes paid, which has been an ongoing IRS priority and one that means more funding for vital government programs. That need is particularly acute now, Lester notes, as the government works to fund the COVID-19 response.
Active learning could also help estimate the actual tax gap, which has proved challenging as the number of audits dwindles.
But there’s also the possibility of a fairer audit system based on these principles. “The goal of our research team and the IRS is to design a system in which the burden of audits is shared in a fair manner,” Goldin says.
Finally, the project could help the wider AI community contribute to challenging problems of public policy and government. Previously, Ho participated in a study that examined the use of AI by major federal regulatory agencies; he directs Stanford’s RegLab, which builds such partnerships. These academic-agency partnerships, Ho notes, can pave the way to fair, effective, and accountable use of AI in government.
Stanford HAI’s mission is to advance AI research, education, policy and practice to improve the human condition.