Is Natural Language Processing Ready to Take on Legal Hearings?
Every year, California holds thousands of parole hearings for eligible prisoners. At the epicenter of America's mass incarceration crisis, the decision about whether to release a prisoner who has served the minimum required sentence comes down to two people: a parole commissioner and a deputy. During a three-hour hearing, they review the life story of a parole candidate, briefly deliberate, and then decide whether or not to grant parole. If they choose to deny, the candidate must wait as long as 15 years for another appearance before the Board of Parole Hearings.
In each of those hearings, a 150-page transcript of the entire conversation is produced for the government and public to review. And most likely, that transcript will never be read. In 2019 alone, the California Board of Parole Hearings held 6,061 hearings and granted parole in 1,181 cases. For a process of this scale, there isn’t much time to review cases to ensure consistency across parole decisions. The governor’s office and parole review unit are tasked with checking parole decisions, but they lack the resources to read every transcript, so as a matter of practicality, they generally only read transcripts for parole approvals. If parole is denied, unless an appellate attorney or another influential stakeholder pushes for a review, the transcript is usually just archived.
Machine learning opens the opportunity to devise a new approach: What if we could “read” thousands of hearing transcripts within minutes, writing out the most important factors for each case? At a glance we would know when a parolee’s last disciplinary infraction was, for example, or whether the prisoner participated in rehabilitation programming. We could then get a picture of how the parole process operates at scale, judge whether it is fair or not, and identify individual cases that appear inconsistent within it. With the knowledge gleaned, we could push for systemic changes where necessary and identify and rectify potential errors in individual cases directly. This approach would center on human discretionary judgment and use technology to ensure transparency and consistency.
We call this the “Recon Approach” and believe it has applications well beyond parole. For example, the approach could be adapted for use in the Social Security Administration, where administrative law judges must decide whether an unemployment claim is valid. It might also be brought to bear in immigration processes, where a single officer must determine whether or not to grant asylum. In a human-led legal decision-making process, machine learning can take the role of making visible mountains of case records — records that would otherwise be boxed up on shelves in dusty archives. We outline this role in a paper forthcoming in the Berkeley Technology Law Journal titled “The Recon Approach: A New Direction for Machine Learning in Criminal Law.” Our team includes Stanford Professor of Computer Science and Electrical Engineering Nick McKeown, University of Oregon Law Professor Kristen Bell, Stanford Professor of Computer Science and Linguistics Christopher Manning (a Stanford HAI associate director), and Stanford PhD students Jenny Hong and Catalin Voss, with support from Stanford HAI.
New Challenges for Natural Language Processing
Our vision requires a different flavor of Natural Language Processing (NLP) than what is commonly used today. Massive language models like BERT and GPT-3 have shown dramatic performance improvements across a large variety of NLP tasks in the last few years. However, even these advanced models struggle with the kinds of complex information aggregation tasks that we need to tackle in order to make legal records accessible. We see three main challenges that, we believe, deserve the focus of the NLP community. Solving them would open up many new ways to apply NLP to the law.
1. We need models that can process longer text.
Most existing models have been applied to short text passages on the order of 500-1,000 words. Parole hearing transcripts average 10,000 words. Written decision records in the Social Security Administration are longer than 3,000 words. Asylum case records are frequently longer than 15,000 words.
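To make the mismatch concrete, here is a minimal sketch, assuming a standard BERT-style tokenizer from the Hugging Face transformers library and its familiar 512-token context window, of how a 10,000-word transcript has to be cut into many overlapping chunks before a typical model can read any of it:

```python
# Sketch: splitting a long transcript into overlapping token windows because
# a typical encoder can only read ~512 tokens at a time.
# Assumes the Hugging Face `transformers` library is installed.
from transformers import AutoTokenizer

MAX_TOKENS = 512   # typical BERT-style context window
STRIDE = 128       # overlap between consecutive chunks

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

def chunk_transcript(text: str) -> list[str]:
    """Split a long transcript into overlapping windows of decoded text."""
    ids = tokenizer.encode(text, add_special_tokens=False)
    chunks = []
    for start in range(0, len(ids), MAX_TOKENS - STRIDE):
        window = ids[start:start + MAX_TOKENS]
        chunks.append(tokenizer.decode(window))
        if start + MAX_TOKENS >= len(ids):
            break
    return chunks

# A 10,000-word hearing yields dozens of such windows, and the model
# never sees the document as a whole.
```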
2. We need to move beyond Named Entity Recognition.
In order to identify a sub-region of a large piece of text where the model can look for the answer to a given question, existing information extraction systems typically rely on Named Entity Recognition (NER). NER spots all instances of entities such as companies, well-known individuals, or other concepts in a long piece of text. This approach works well for the kinds of questions we ask Siri, but parole hearings are not Wikipedia articles. For many of our extraction challenges, there is a single named entity — the parole candidate — about whom we are attempting to answer a large number of questions. The answers to those questions are spread across many sections, so even if we can identify the relevant named entities, we need to piece together information from various places in an unstructured hearing.
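A minimal sketch, using the off-the-shelf spaCy pipeline on an invented hearing-style snippet, shows why NER by itself gives us so little to work with here:

```python
# Sketch: running off-the-shelf NER on a hearing-style snippet.
# Assumes spaCy with the small English model installed
# (pip install spacy && python -m spacy download en_core_web_sm).
import spacy

nlp = spacy.load("en_core_web_sm")

snippet = (
    "Commissioner: You received a write-up for fighting in March 2014. "
    "Since then you completed anger management and a vocational program."
)

for ent in nlp(snippet).ents:
    print(ent.text, ent.label_)

# Typical output tags "March 2014" as a DATE -- useful, but NER alone cannot
# tell us whether that date answers a question about the candidate's last
# disciplinary infraction or about something else entirely.
```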
3. Existing models need to improve multi-step reasoning.
Consider the question: “When was the last disciplinary infraction this parole candidate incurred (if any)?” To answer this question, a human annotator skims the transcript to find whether any write-ups for misconduct are mentioned, then finds the dates corresponding to these, and then identifies the most recent one. This kind of multi-hop reasoning task remains challenging for NLP today.
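Written out as naive code, the three "hops" a human annotator performs might look like the sketch below, with keyword matching and regex date parsing standing in for what would need to be learned components in a real system:

```python
# Sketch: the three hops -- find infraction mentions, find their dates,
# take the most recent -- as hand-coded heuristics on plain text.
import re
from datetime import datetime

# Illustrative keywords only; real transcripts are far more varied.
INFRACTION_TERMS = ("write-up", "115", "rules violation", "disciplinary")
DATE_PATTERN = (r"(January|February|March|April|May|June|July|"
                r"August|September|October|November|December) (\d{4})")

def last_infraction_date(transcript: str):
    candidates = []
    for sentence in transcript.split("."):
        # Hop 1: is this sentence about a disciplinary infraction?
        if not any(term in sentence.lower() for term in INFRACTION_TERMS):
            continue
        # Hop 2: pull out any dates mentioned alongside it.
        for month, year in re.findall(DATE_PATTERN, sentence):
            candidates.append(datetime.strptime(f"{month} {year}", "%B %Y"))
    # Hop 3: keep only the most recent one.
    return max(candidates) if candidates else None
```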
We believe that considerable progress on the first two challenges will come from building NLP models that can consume long text at once, together with “region isolation techniques” beyond NER that can pick out the most relevant part of a document for answering a given question. Some promising approaches, such as skimming neural networks, have been proposed in the literature, but more work is required to see whether they can be helpful for practical information aggregation.
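One simple baseline for region isolation, far cruder than the skimming networks mentioned above, is to score each passage of a chunked transcript against the question and hand only the best matches to a reading-comprehension model. A minimal sketch, assuming scikit-learn and passages produced by chunking a full transcript:

```python
# Sketch: a crude "region isolation" baseline that ranks passages by TF-IDF
# similarity to the question before any QA model is run.
# Assumes scikit-learn is installed.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def top_passages(question: str, passages: list[str], k: int = 3) -> list[str]:
    """Return the k passages most similar to the question."""
    vectorizer = TfidfVectorizer().fit(passages + [question])
    passage_vecs = vectorizer.transform(passages)
    question_vec = vectorizer.transform([question])
    scores = cosine_similarity(question_vec, passage_vecs)[0]
    ranked = sorted(zip(scores, passages), key=lambda t: t[0], reverse=True)
    return [passage for _, passage in ranked[:k]]

# The selected passages can then be fed to a standard QA model; skimming
# networks aim to learn this isolation step end to end instead.
```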
A Pilot in California
Project Recon, a collaboration between computer scientists and legal scholars at Stanford and the University of Oregon, aims to pilot a machine learning system that reads case records for the California parole hearing system. We have obtained a dataset of over 35,000 parole hearing transcripts from the State of California. Last year, we won a lawsuit against the California Department of Corrections and Rehabilitation (CDCR) seeking to obtain race and attorney representation data for the prisoners mentioned in the hearings. We look forward to tackling the many technical challenges that lie before us.
We invite researchers in NLP and computational law to join us on this journey and bring your expertise to the table.
Catalin Voss is a PhD candidate in Artificial Intelligence at Stanford University, and Jenny Hong is a PhD candidate in Management Science and Engineering.