The world is rife with risk-prediction algorithms. Algorithms tell lenders whether a borrower is likely to default. Colleges use algorithms to predict which applicants won’t make it to graduation. Doctors rely on algorithms to counsel patients on health concerns. And courts use risk assessment algorithms to predict the likelihood that an offender will commit another crime.
All such algorithms have one thing in common: They rely on data. And that reliance is what led Julian Nyarko, a professor at Stanford Law School and associate director of the Stanford Institute for Human-Centered AI, to study the effectiveness of risk-based prediction models. At stake is whether risk assessment models actually predict the truths they purport to predict.
Nyarko and two colleagues from Harvard University have published a new paper in Science Advances showing that many risk models may not be all they are cracked up to be, not because they lack data, but because they have too much of it. They refer to the conventional wisdom in the field as the “kitchen sink” approach: throw in everything and the kitchen sink, on the theory that more data is always better.
Read the study: “Risk Scores, Label Bias, and Everything But the Kitchen Sink”
“The thinking goes, ‘Let’s just give the model access to as much data as possible. It can’t hurt, right?’” Nyarko explains. “If the data say sunspots, or shoe size, or the price of coffee are good predictors of recidivism, researchers should want to know that and to use that information in their models.”
Proxy Votes
The problem, Nyarko says, is that risk models usually don’t measure the thing they are actually trying to measure — which is often hidden or unmeasurable, as with crime or many medical conditions. Instead, such models measure things indirectly using proxies. The use of inapt proxies leads to a research phenomenon known as label bias. In essence, the proxy has been mislabeled as the truth. So, while the models can get very good at predicting their proxies, they end up off the mark when trying to divine the truth.
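To make that concrete, here is a minimal numeric sketch of label bias in Python. Everything in it is an illustrative assumption rather than anything from the study: the true outcome occurs at a 30 percent rate, but only half of those events are ever recorded, so the proxy label in the training data tells a different story than the truth.

```python
# Minimal sketch of label bias with made-up numbers (not from the study).
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

true_outcome = rng.binomial(1, 0.30, size=n)  # the unobserved truth: ~30% base rate
recorded = rng.binomial(1, 0.50, size=n)      # only half of true events are ever observed
proxy_label = true_outcome * recorded         # the label the training data actually contains

print("true rate: ", round(true_outcome.mean(), 3))  # ~0.30
print("proxy rate:", round(proxy_label.mean(), 3))   # ~0.15

# A model calibrated to this proxy will report roughly half the true base rate,
# and any evaluation that also scores against the proxy will never notice the gap.
```

A model trained this way is doing exactly what it was asked to do; it is just being asked the wrong question.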
The researchers demonstrated the impact of label bias in a few real-world case studies. The first came from the criminal justice system, where judges often use models to estimate the risk to public safety when making bond decisions for arrested individuals. Existing models are trained to predict future arrests. That is, arrests serve only as a proxy; the true outcome of interest, future criminal activity, goes unobserved.
Nyarko and his colleagues showed that arrests can be a poor predictor of risk to public safety because they depend on both behavior and geography. That is, people who engage in the same illicit behavior may be arrested at different rates depending simply on where they live. They point to one well-known study showing how one major U.S. city concentrates police activity in specific neighborhoods, leading to higher arrest rates for Black citizens than for white ones, even though white citizens are just as likely to re-offend. As a result, models trained on arrest records make Black detainees more likely to be denied bail.
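That dynamic is easy to reproduce in a toy simulation. The Python sketch below uses invented rates and features rather than the authors’ data or models: two neighborhoods have identical re-offense behavior, but one is policed more heavily, and an off-the-shelf logistic regression trained on arrests duly rates its residents as roughly twice as risky.

```python
# Hedged simulation of arrests as a proxy for behavior; all rates are invented.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
n = 50_000

# Two neighborhoods with identical underlying behavior ...
neighborhood = rng.binomial(1, 0.5, size=n)         # 0 = lightly policed, 1 = heavily policed
reoffend = rng.binomial(1, 0.25, size=n)            # true re-offense rate: 25% in both

# ... but different policing intensity, so the same behavior is arrested at different rates.
arrest_prob = np.where(neighborhood == 1, 0.80, 0.40)
arrested = reoffend * rng.binomial(1, arrest_prob)  # the proxy label the model is trained on

# Neighborhood enters as a predictor, standing in for the many features that track geography.
other_features = rng.normal(size=(n, 3))            # stand-ins for the usual case features
X = np.column_stack([other_features, neighborhood])
risk_model = LogisticRegression().fit(X, arrested)
risk_score = risk_model.predict_proba(X)[:, 1]

for hood in (0, 1):
    m = neighborhood == hood
    print(f"neighborhood {hood}: true re-offense rate {reoffend[m].mean():.2f}, "
          f"mean predicted risk {risk_score[m].mean():.2f}")

# Same behavior, very different scores: residents of the heavily policed neighborhood
# come out roughly twice as risky, purely because more of their re-offenses are observed.
```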
The researchers then turned to the medical field, looking at a risk assessment tool used to identify patients for high-risk care management programs that can extend or even save lives. Such models typically predict expected future medical costs as a proxy for medical need. Here again, Black patients are less likely to be enrolled than white patients: white patients are more likely to seek medical treatment, so they incur higher costs than their equally sick Black counterparts and therefore score higher on anticipated future medical costs.
Small Ball
Using that example, Nyarko and his collaborators trained two new medical risk models: a simpler one with 128 predictors of risk and a more complex one with 150. They showed that the simpler model repeatedly identifies more high-need patients for high-risk care programs and also enrolls more Black patients in those programs. They attribute this fairer distribution to the simpler model’s focus on immediate medical need, a better proxy for the truth, rather than on future costs.
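The paper’s models are built on real clinical data; the Python sketch below is only a stand-in that simulates the underlying logic with invented variables. It assumes one group has reduced access to care, so that spending understates that group’s need, then trains one linear model on future cost (the kitchen-sink target) and another on an “active conditions” signal meant to track immediate need, and compares whom each would enroll.

```python
# Hedged sketch with simulated patients; not the study's 128- and 150-predictor models.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(2)
n = 40_000

group = rng.binomial(1, 0.5, size=n)            # 1 = group with worse access to care (assumption)
need = rng.gamma(shape=2.0, scale=1.0, size=n)  # true, unobserved medical need

# Two observable training targets:
access = np.where(group == 1, 0.6, 1.0)         # reduced access -> reduced utilization
cost = need * access + rng.normal(0, 0.2, n)    # future spending: a biased proxy for need
conditions = need + rng.normal(0, 0.5, n)       # active conditions: a closer proxy for need

# Both models see the same predictors, group included, kitchen-sink style.
features = need[:, None] + rng.normal(0, 1.0, size=(n, 8))
X = np.column_stack([features, group])

cost_model = LinearRegression().fit(X, cost)        # trained on the cost proxy
need_model = LinearRegression().fit(X, conditions)  # trained on the immediate-need proxy

def enrolled_top(scores, frac=0.03):
    """Flag the patients a program would enroll: the top `frac` by predicted score."""
    return scores >= np.quantile(scores, 1 - frac)

high_need = need >= np.quantile(need, 0.97)         # the patients the program should reach

for name, model in [("cost-trained", cost_model), ("need-trained", need_model)]:
    chosen = enrolled_top(model.predict(X))
    print(f"{name}: captures {chosen[high_need].mean():.2f} of truly high-need patients; "
          f"{group[chosen].mean():.2f} of enrollees are from the low-access group")

# The cost-trained model under-enrolls the low-access group and misses more of the
# genuinely high-need patients; the model trained on immediate need does better on both.
```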
“Researchers should be mindful, and diligent, when they don’t have the data they really care about but instead only a proxy,” Nyarko counsels. “And, when we have only a proxy, being mindful in our choice of proxy and making the models less complex by excluding certain data can improve both the accuracy and equity of risk prediction.”