Erik Altman | Synthetic Data Sets: Use Cases for the Financial Industry | Stanford HAI

Status

Past

Date

Wednesday, May 7, 2025, 12:00 PM - 1:15 PM PDT

Location

Gates Computer Science Building, Room 119, 353 Jane Stanford Way, Stanford, CA 94305

Topics

Finance, Business

IBM Synthetic Data Sets (SDS) have been created for use cases in the financial industry.

Event Contact

Annie Benisch

abenisch@stanford.edu

2099183302

One key focus is fraud and criminal activity, which costs hundreds of billions of dollars per year or more. SDS labels many of these criminal activities, including money laundering, credit card fraud, check fraud, Authorized Push Payment (APP) fraud (scams), and insurance claims fraud. As such, SDS data provides an attractive foundation for training AI detection models.

Unlike much current work on synthetic data generation, SDS is not built using large language models. Instead, SDS uses an agent-based virtual world approach. A key advantage of the SDS design is that all labels are correct: all fraud is labelled fraud, and only fraud is labelled fraud. By contrast, much criminal activity goes undetected in the real world, including 95% of money laundering by a UN estimate. Hence, even when real data is available, it is often of poor quality for training detection models or for generating synthetic data.
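To make the labelling argument concrete, here is a minimal toy sketch of an agent-based generator. It is not the SDS implementation, whose internals the abstract does not describe. Every name in it (`Transaction`, `simulate`, `observed_labels`), the amount ranges, and the agent counts are hypothetical choices for illustration. The sketch shows two things: a simulator knows the ground truth for every transaction it creates, so labels are correct by construction; and a real-world style view, in which only a small share of laundering is detected, yields far noisier labels. The default detection rate of 0.05 mirrors the 95% figure cited above.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Transaction:
    sender: int
    receiver: int
    amount: float
    is_laundering: bool  # ground truth, known because the simulator created it


def simulate(num_agents=50, num_launderers=5, steps=200, seed=0):
    """Toy virtual world: each step, one agent sends money to another.

    Agents chosen as launderers (a hypothetical behaviour model) always
    send large amounts; all other agents send ordinary amounts. The label
    is copied from the sender's intent, so it is correct by construction.
    """
    rng = random.Random(seed)
    launderers = set(rng.sample(range(num_agents), num_launderers))
    transactions = []
    for _ in range(steps):
        sender = rng.randrange(num_agents)
        receiver = rng.choice([a for a in range(num_agents) if a != sender])
        if sender in launderers:
            amount = round(rng.uniform(9000, 9999), 2)  # illustrative range
            transactions.append(Transaction(sender, receiver, amount, True))
        else:
            amount = round(rng.uniform(5, 2000), 2)  # illustrative range
            transactions.append(Transaction(sender, receiver, amount, False))
    return transactions, launderers


def observed_labels(transactions, detection_rate=0.05, seed=1):
    """Labels as an institution might see them: most laundering is missed.

    A laundering transaction is flagged only with probability
    `detection_rate`; legitimate transactions are never flagged.
    """
    rng = random.Random(seed)
    return [t.is_laundering and rng.random() < detection_rate
            for t in transactions]


if __name__ == "__main__":
    txs, _ = simulate()
    truth = sum(t.is_laundering for t in txs)
    seen = sum(observed_labels(txs))
    print(f"laundering transactions (ground truth): {truth}")
    print(f"laundering transactions (observed):     {seen}")
```

In this toy setup, a model trained on the observed labels would be told that most laundering transactions are legitimate, while the simulator's own labels carry no such error.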

In practice, access to real data is generally limited to a small number of people at the institution (e.g., a bank) that owns the data. As such, real data provides only a narrow view of activity at a single institution, as opposed to the global view provided by SDS data. The SDS approach also yields a broad set of synthetic personal information. This information is highly realistic despite using no information from real individuals.

Development of effective techniques for SDS has required deep expertise across diverse areas. It has also required significant manual effort. How to automate some of these efforts remains an open challenge, as do calibration, scaling, and other areas.

Speakers

Erik Altman

IBM Researcher
