A Trustworthy AI Assistant for Investigative Journalists

Gathering and analyzing data require time and expertise — two resources that cash-strapped newspapers often don’t have. Can AI help?
In 2023, an average of 2.5 local newspapers shut down every week. More than half of U.S. counties now have little or no reliable local news coverage, and the trend is accelerating.
This is a business problem. It is also, arguably, a democracy problem. For centuries, local journalism has kept voters engaged in local politics and politicians accountable to those voters. Small papers with investigative tenacity have also routinely broken stories of national importance — the Patriot-News uncovering Penn State’s Jerry Sandusky scandal, for instance.
The answer to this crisis? “Everybody says, ‘Let’s use AI to help,’ ” replies Monica Lam, a professor of computer science at Stanford University. The problem with this, she adds, is that most AI tools aren’t reliable. She cites a 2025 study conducted by the BBC in which the media outlet used major AI models to analyze news content on its website. More than half of the AI’s answers had “significant issues,” according to the BBC, including factual errors and fabricated quotations.
“It’s not so easy,” says Lam.
Now, Lam is working with technologists and journalists to develop a more useful tool for the news industry. Working with Cheryl Phillips, the founder of Stanford’s Big Local News, and supported by seed funding from the Stanford Institute for Human-Centered AI and a grant from the Brown Institute for Media Innovation at Stanford and Columbia, Lam created DataTalk, a chatbot specifically designed to help investigative journalists and cash-strapped newsrooms do their work more efficiently without sacrificing factual accuracy. DataTalk is built on top of a large language model and designed to retrieve and analyze information kept in big, sometimes unruly, public databases.
“Journalism is losing a lot of people and deep investigative work is harder than ever,” Lam says. “If more people know about the tool we’re building, and if we can keep improving it and keep generating success stories, then our hope is to bolster this type of journalism into the future.”
What is DataTalk?
Investigative journalists often rely on knowledge of database languages like SQL and the expertise of data scientists to unearth important stories. With DataTalk, they could instead simply type their question into a chat window and get an answer within a few seconds.
Available for anyone to use, the tool is currently focused on campaign finance data, meaning its public use is constrained to questions related to federal political campaigns, such as how much money a candidate for Congress has raised from out of state.
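To make the out-of-state example concrete, here is a minimal sketch of the kind of SQL a journalist would otherwise have to write by hand. It is an illustration only, not DataTalk’s actual code: the `contributions` table, its column names, the `out_of_state_total` helper, and the sample rows are all hypothetical, and real campaign finance data is far larger and messier.

```python
import sqlite3

# Hypothetical, simplified table; not the real campaign finance schema.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE contributions ("
    "candidate TEXT, candidate_state TEXT, donor_state TEXT, amount REAL)"
)

# Made-up rows, for illustration only.
conn.executemany(
    "INSERT INTO contributions VALUES (?, ?, ?, ?)",
    [
        ("Candidate A", "ME", "ME", 250.0),
        ("Candidate A", "ME", "NY", 1000.0),
        ("Candidate A", "ME", "CA", 500.0),
        ("Candidate B", "ME", "ME", 750.0),
    ],
)


def out_of_state_total(conn, candidate):
    """Sum contributions whose donor state differs from the candidate's state."""
    row = conn.execute(
        "SELECT COALESCE(SUM(amount), 0) FROM contributions "
        "WHERE candidate = ? AND donor_state <> candidate_state",
        (candidate,),
    ).fetchone()
    return row[0]


print(out_of_state_total(conn, "Candidate A"))  # 1500.0
```

Even in this toy version, the journalist has to know which table holds donations and how to express "out of state" as a comparison between two columns, which is the kind of knowledge a plain-language interface is meant to supply.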
But the tool is expanding. The Baltimore Banner recently began using DataTalk to discover news stories buried in 311 non-emergency call log data. In the coming months, Big Local News hopes to work with Lam and other journalism organizations to identify other key datasets that could be integrated into DataTalk and to build a system that will make it easy for local journalists to add their own data to the agent. State-level campaign finance records are one example.
Along with its analysis, DataTalk provides the code that it used to conduct the analysis and an explanation, in plain English, of what the code is doing. This lets the journalist check that the question DataTalk posed in technical language is the same question that the journalist asked in plain language. DataTalk also explains the ways in which its analysis may be limited.
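One way to picture a response of this shape is as a record that keeps the answer, the code, the plain-English explanation, and the stated limitations together, so that none of them is shown without the others. The sketch below assumes that structure for illustration; the `AnalysisResponse` class and its field names are hypothetical and do not describe DataTalk’s actual output format.

```python
from dataclasses import dataclass, field


@dataclass
class AnalysisResponse:
    """Hypothetical container for one answer and the material needed to check it."""

    answer: str
    code: str
    explanation: str
    limitations: list = field(default_factory=list)

    def render(self):
        """Return a plain-text report with every section, in a fixed order."""
        lines = [
            "Answer: " + self.answer,
            "Code: " + self.code,
            "Explanation: " + self.explanation,
            "Limitations:",
        ]
        # Always show the limitations section, even when nothing was flagged.
        if self.limitations:
            lines.extend("- " + item for item in self.limitations)
        else:
            lines.append("- none stated")
        return "\n".join(lines)


response = AnalysisResponse(
    answer="(total returned by the query)",
    code="SELECT SUM(amount) FROM contributions WHERE donor_state <> candidate_state",
    explanation="Adds up donations from donors who live outside the candidate's state.",
    limitations=["Donations with a missing donor state are not counted."],
)
print(response.render())
```

Keeping the code and explanation next to the answer is what allows a reporter, or an editor, to rerun and verify the analysis independently.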
To ensure DataTalk is accurate and useful, Lam and Phillips worked with domain expert Derek Willis, one of the country’s foremost campaign finance data journalists, who helped refine how the chatbot conducts its search and interpretation.
“Willis was able to provide really critical instructions to make sure that when a regular journalist asks a question of the agent, it knows which tables to go to and how to form a query out of the general instructions it received,” Phillips says. “Simpler datasets like 311 calls might not need this level of expertise. We consider the structure of the information and the domain we’re looking at to determine what kind of expertise is needed to ensure this model works.”
Once the tool was established, Lam’s group collaborated with Willis and the Big Local team to continually evaluate and improve the DataTalk interface. He also worked with students in Phillips’ class to help improve their understanding of how the agent works and has, since then, continued to maintain and improve the tool’s technical infrastructure.
From Classroom to Newsroom
In the fall of 2024, Phillips piloted the chatbot in her “Big Local Journalism” class. Students focused on campaign finance stories and, over the course of the quarter, published three stories in partnership with local newsrooms. One story compared the pools of donors for two candidates in a Hawaiian congressional race; another story looked at Kamala Harris’s campaign spending on reproductive health ads in Georgia. (The students manually fact-checked each story and replicated DataTalk’s analysis using their own code.)
“The newsrooms that published this work were happy to have it,” Phillips says. “These were stories that they would not otherwise have been able to tell.”
Around this time, the Maine Monitor reached out to do its own analysis comparing campaign contributions from inside and outside of the state. Reassured by the success of the pilot, Phillips helped the journalist at the Monitor conduct her investigation.
An AI Toolbox for Journalism
DataTalk is one piece of Phillips and Lam’s broader plan to support investigative journalism. They have in mind a full toolkit of applications that help newsrooms generate stories, whether those newsrooms are small local operations collapsing under the strain of scarce resources or national outlets with plenty of investigative muscle. The scholars also plan to provide tutorials on how to use these applications and on the kinds of stories to which they might be applied.
Next up, the team hopes to add DataTalk functionality to Agenda Watch, which uses computational methods as well as AI to gather meeting agendas and minutes from city councils, school boards, and other local decision-making bodies around the U.S. Agenda Watch can also alert users to newsworthy items that appear in local documents.
“Taken together, this effort is meant to reduce the cost of producing accountability journalism,” Phillips says. “It makes it possible, we hope, to dig into investigations and produce stories that matter.”



