Public AI Training Data

To build equitable AI systems, we need advance market commitments (AMCs) for training data. These AMCs would spur private innovation in collecting training datasets that are representative of the underlying population.

By Abhilash Mishra and Bhasi Nair

What is your proposal? A number of problematic applications of AI (from facial recognition to assessing the risk of heart attacks) can be traced to a lack of representative training data for algorithms. Can we create incentives for the private sector to collect population-representative data? A key bottleneck in this approach is the lack of a "guaranteed buyer" for these datasets, which discourages private companies from investing in collecting representative data. Creating an advance market commitment in which the government (or a philanthropic partnership) pays for the collection of equitable datasets can help fix this market failure.

What problem does your proposal address? The proposal helps address the challenge of building equitable AI systems by ensuring that the training data used in these systems represents the populations they are meant to serve.

How does this policy proposal relate to artificial intelligence? AI systems learn from training data, and biases in that data can lead to problematic applications of AI.