Some reasonable clinical psychologists might insist that artificial intelligence should play no role in mental health care — that the psychotherapeutic relationship is sacrosanct, and AI shouldn’t even be in the room. But in the United States, where only half of those with mental health problems get services, there’s also a reasonable argument that AI — in particular, natural language processing (NLP) tools — could improve care as well as increase access to it.
For example, NLP could help researchers figure out what makes some psychotherapists more effective than others — information that could then be used to improve training for new clinicians. NLP systems could also listen in on therapy sessions to provide feedback to clinical trainees or experienced clinicians, create a first draft of necessary clinical documentation, or spot changes in a patient’s speech patterns that might be an early indicator of a worsening condition. More advanced AI might even converse with patients and be available at all hours of the day — increasing access to care.
“Assuming privacy protections are done right, NLP tools could meaningfully augment clinician experience or insight,” says Adam Miner, a licensed clinical psychologist, epidemiologist, and instructor in psychiatry and behavioral sciences at Stanford School of Medicine.
To take steps in that direction, Miner and his colleagues have begun to lay some of the groundwork for NLP to be used in mental health care. They have explored whether automated speech recognition systems can accurately transcribe therapy sessions; used NLP to identify expressions of empathy in peer support text messages; and pondered how conversational AI might affect the therapeutic relationship while also expanding access to care.
NLP to Augment Care: Are Speech Recognition Tools Good Enough?
Before NLP can be used to augment clinical care, researchers need to know whether automatic speech recognition systems (ASRs) can transcribe therapy sessions accurately enough that they can then be used to detect therapeutic patterns in real-world clinical settings.
“This technology will be useful before it’s perfect,” Miner notes. “So that begs the question, what does ‘good enough’ look like and what does it look like in different populations or for different use cases?”
In a recent paper, Miner and his colleagues examined Google Speech-to-Text transcriptions of 100 therapy sessions involving 100 different patient/therapist pairs. Their aim: to analyze the transcriptions’ overall accuracy, how well they detect key symptom words related to depression or anxiety, and how well they identify suicidal or homicidal thoughts. “In mental health there are a few things that we really want to get right — like a propensity to self-harm — so if systems aren’t doing that well, that’s a big problem — even if they are doing well generally,” Miner says.
For overall accuracy, the team looked at the transcriptions’ word error rate as well as a mathematical measure of how far apart the human and ASR-transcribed phrases were in their meaning — a measure known as semantic distance. For example, if a person says “my grandmother is dead” and the ASR transcribes it as “my grandmother is dying,” that’s a 25 percent word error rate, but because the meaning is similar, the semantic distance is small. On the other hand, if the ASR reports “my grandmother is lying,” the word error rate is still 25 percent but the semantic distance is greater.
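Word error rate is just word-level edit distance divided by the length of the reference transcript. A minimal sketch in Python (semantic distance is omitted here, since it requires an embedding model to compare meanings):

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance between transcripts, divided by reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edits to turn the first i reference words into the first j hypothesis words
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,          # deletion
                           dp[i][j - 1] + 1,          # insertion
                           dp[i - 1][j - 1] + cost)   # substitution
    return dp[len(ref)][len(hyp)] / len(ref)

print(word_error_rate("my grandmother is dead", "my grandmother is dying"))  # 0.25
print(word_error_rate("my grandmother is dead", "my grandmother is lying"))  # 0.25
```

Note that both transcription errors score an identical 25 percent WER even though only the second changes the meaning, which is exactly why the researchers also measured semantic distance.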
Overall in their study, Miner’s team found a word error rate of 25 percent, but the semantic distance between the ASR and manual transcripts was just slightly worse than a human paraphrase — i.e., it was more like “my grandmother is dying” than “my grandmother is lying.”
“That makes us hopeful,” Miner says. “With more training data, and with better modeling in high-stakes medical settings, ASR can get better and better at transcribing therapy sessions.”
The ASR also correctly identified 80 percent of the depression-related symptom words. But the results for harm-related utterances were more interesting and subtle: The word error rate was higher than it was generally (34 percent) but the semantic distance was much lower — i.e., the ASR captured the meaning quite accurately (much better than a human paraphrase).
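A detection figure like that 80 percent amounts to recall: the fraction of symptom words in the human transcript that also appear in the ASR output. A minimal sketch, with an invented symptom lexicon and transcripts for illustration only:

```python
def symptom_recall(reference_words, asr_transcript, lexicon):
    """Fraction of symptom words in the human transcript that the ASR also captured."""
    ref_symptoms = {w for w in reference_words if w in lexicon}
    asr_words = set(asr_transcript.split())
    if not ref_symptoms:
        return None  # no symptom words present, so recall is undefined
    return len(ref_symptoms & asr_words) / len(ref_symptoms)

# Hypothetical lexicon and transcripts, not from the study
lexicon = {"hopeless", "worried", "tired", "panic", "worthless"}
human = "i feel hopeless and tired and i get panic attacks".split()
asr = "i feel hopeless and tired and i get fanatic attacks"
print(symptom_recall(human, asr, lexicon))  # 2 of 3 symptom words captured
```

Here the ASR mishears "panic" as "fanatic," so recall drops to two out of three: a small word-level error, but one that loses a clinically important term.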
“There’s not one perfect measure of language. It’s not like measuring a kilometer or a mile. As a community we have to decide what measures of accuracy we want to use,” Miner says.
It’s also important to make sure the ASR works well for all patient populations, Miner says. For example, the team found that the ASR transcribed female and male patients equally well, but did not look at transcription accuracy for different racial groups, which is an important area for future research, Miner says. A recent paper by a Stanford group led by Sharad Goel found that ASR systems have higher error rates for black people. “That’s clearly concerning,” Miner says. “We need to be doing a better job of making sure that these systems are robust for groups that have been underserved in health care settings.”
Extracting Empathy Cues from Text
The next step after accurate transcription of therapy sessions will be determining whether NLP systems can extract therapeutically meaningful information from them. Can we, for example, determine what makes some therapists more effective than others? Might expressions of empathy be one feature of good therapy?
Currently, it’s hard to use NLP in this way because therapy is a private, intimate experience that is rarely recorded and even more rarely transcribed. So Miner and his colleagues Tim Althoff at the University of Washington, Althoff’s graduate student Ashish Sharma, and David Atkins of the UW Medical School turned to an alternative high-quality dataset of text-based therapeutic communications from the online peer support forum TalkLife and Reddit’s peer support system.
“Because these online forums are text-based, we know exactly what was said [there is no issue of transcription]. And we also know who said what,” Miner says. Thus, using de-identified and privacy-protected datasets of these text messages, researchers can already take the next step toward discovering whether NLP can detect nuanced interpersonal responses.
For example, in a recent paper, Sharma (first author), Miner, Atkins and Althoff developed a framework for identifying empathy in texts, and then looked for markers of empathy in the text messages that the TalkLife and Reddit peer support volunteers wrote in response to people seeking help.
The NLP system, which was trained on 10,000 manually marked up text/response pairs, was able to accurately spot expressions of empathy more than 80 percent of the time.
The system could also spot troubling trends: When the team used the NLP system to analyze 235,000 additional text/response interactions, they found that peer support volunteers showed low levels of empathy and did not get better at showing empathy over time.
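The actual system is a model trained on those annotated pairs, but its scoring interface can be sketched with a toy keyword heuristic. The cue phrases below are invented for illustration and are not the paper's framework:

```python
# Toy stand-in for the trained empathy classifier: scores a response 0-2
# on simple surface cues. Cue phrases are illustrative placeholders.
ACKNOWLEDGE = ("that sounds", "i'm sorry", "i am sorry", "i hear you")
EXPLORE = ("how are you", "what happened", "do you want to talk", "tell me more")

def empathy_score(response):
    text = response.lower()
    score = 0
    if any(cue in text for cue in ACKNOWLEDGE):
        score += 1  # acknowledges the poster's feelings
    if any(cue in text for cue in EXPLORE):
        score += 1  # invites the poster to say more
    return score

print(empathy_score("That sounds really hard. Do you want to talk about it?"))  # 2
print(empathy_score("Just try to stay positive."))  # 0
```

A trained model replaces the keyword lists with learned representations, but the output is the same kind of per-response score that let the team track empathy across 235,000 interactions.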
“It’s exciting that these services increase access to mental health support,” Miner says, “But it’s concerning that the listeners aren’t more skilled.”
These results suggest the need for better training of peer support volunteers. “If we can start to learn from these conversations and coach the helpers, that could lead to a virtuous cycle of feedback and learning that makes the system work better,” Miner says.
In addition, the empathy study sets the stage for advancing Miner’s work with therapy transcripts, since it points to NLP’s ability to detect things clinicians know are important but are hard to measure — such as identifying indicators of high-quality therapy.
But even if AI can help improve the quality of in-person therapy, there just aren’t enough therapists to go around. So the biggest game changer for access to mental health care might come from AI chatbots providing mental health services with — or possibly without — an escape valve to reach a human therapist as needed. “The benefit is that the chatbot is awake at 2 a.m. and on Thanksgiving Day. And it will never hang up on you or judge you,” Miner says.
In a recent paper, Miner and his colleagues explore four models of mental health care delivery: traditional therapy with no AI involvement; traditional therapy augmented with AI (akin to the approaches described above); therapy delivered by AI but supervised by a human; and therapy provided entirely by AI, with no expectation of supervision by a human clinician.
In the case of the human-supervised chatbot, a clinician might hand off specific roles to the AI, or review conversations between patients and the front-line conversational AI. “And if someone mentions self-harm, the AI system could offer a call to 911, or connect the patient with a human as an escape valve or escalation point,” Miner says.
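The escalation logic Miner describes might be sketched as a routing check that runs before any AI reply. The crisis phrases and handler functions below are placeholders for illustration, not a vetted safety protocol:

```python
# Placeholder crisis phrases; a real system would need a clinically validated detector.
CRISIS_TERMS = ("hurt myself", "kill myself", "end my life", "suicide")

def route_message(message, handoff_to_human, respond_with_ai):
    """Check each incoming message for crisis language before the AI responds."""
    if any(term in message.lower() for term in CRISIS_TERMS):
        return handoff_to_human(message)  # the "escape valve" or escalation point
    return respond_with_ai(message)

# Stand-ins for a real clinician queue and chatbot backend
notify_clinician = lambda msg: "escalated to on-call clinician"
chatbot_reply = lambda msg: "chatbot responds"

print(route_message("I want to end my life", notify_clinician, chatbot_reply))
print(route_message("I had a rough day at work", notify_clinician, chatbot_reply))
```

The design choice here is that escalation happens before generation, so the AI never composes a reply to a message flagged as a crisis.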
Because of safety concerns, Miner has a harder time envisioning the use case where only a chatbot talks to the patient. But he wonders if there are patients for whom an AI might be better than a human. “We actually don’t know: Could a piece of software be more trustworthy than a human for certain people or certain conversations?”
In their paper, Miner and his colleagues explore the various ways a chatbot therapist would impact not only access to mental health care but the quality of care, the clinician-patient relationship, and patients’ willingness to disclose their thoughts, as well as issues of safety, privacy, and oversight.
“We have to get the privacy right because if patients don’t trust the system, they might not seek care or might not disclose the sensitive experiences that are important to talk about and share,” Miner says. “We want to avoid having that chilling effect.”
In the long run, Miner envisions a world where all four options might be in use. Clinicians will not be replaced. “To me, these tools augment or supplement clinician experience or insight,” he says. Going forward, he says, there’s plenty of work to do to understand that human/AI collaboration. How do clinicians react to it? How does it fit into their workflow? If systems aren’t implemented well or designed well for clinicians and administrators, they will fail, he says.
Nevertheless, Miner says, “This is coming faster than many people think.”
Stanford HAI's mission is to advance AI research, education, policy and practice to improve the human condition.