Toward Responsible Development and Evaluation of LLMs in Psychotherapy

Date

June 13, 2024

Topics

Healthcare

Abstract

This brief reviews the current landscape of LLMs developed for psychotherapy and proposes a framework for evaluating the readiness of these AI tools for clinical deployment.

Key Takeaways

Large language models (LLMs) hold promise for supporting, augmenting, and even automating psychotherapy through tasks ranging from note-taking during interviews to conducting assessments and delivering therapy.
However, psychotherapy is a uniquely complex, high-stakes domain, and the use of LLMs in this field raises wide-ranging safety, legal, and ethical concerns.
We propose a framework for evaluating and reporting on whether AI applications are ready for clinical deployment in behavioral health contexts based on safety, confidentiality/privacy, equity, effectiveness, and implementation concerns.
Policymakers and behavioral health practitioners should proceed cautiously when integrating LLMs into psychotherapy. Product developers should integrate evidence-based psychotherapy expertise and conduct comprehensive effectiveness and safety evaluations of clinical LLMs.

Executive Summary

There is growing enthusiasm about the potential of OpenAI’s GPT-4, Google’s Gemini, Anthropic’s Claude, and other large language models (LLMs) to support, augment, and even fully automate psychotherapy. By serving as conversational agents, LLMs could help address the shortage of mental healthcare services, problems with individual access to care, and other challenges. In fact, behavioral healthcare specialists are beginning to use LLMs for tasks such as note-taking, while consumers are already conversing with LLM-powered therapy chatbots.

However, psychotherapy is a uniquely complex, high-stakes domain, and responsible, evidence-based therapy requires nuanced expertise. When an LLM is used for productivity tasks, the cost of a failure may be a missed efficiency gain; in behavioral healthcare, it may be the mishandling of a patient’s suicide risk.

Our paper, “Large Language Models Could Change the Future of Behavioral Healthcare,” provides a road map for the responsible application of clinical LLMs in psychotherapy. We provide an overview of the current landscape of clinical LLM applications and analyze the different stages of integration into psychotherapy. We discuss the risks of these LLM applications and offer recommendations for guiding their responsible development.

In a more recent paper, “Readiness for AI Deployment and Implementation (READI): A Proposed Framework for the Evaluation of AI-Mental Health Applications,” we build on our prior work and propose a new framework for evaluating whether AI mental health applications are ready for clinical deployment.

This work underscores the need for policymakers to understand the nuances of how LLMs are already, or could soon be, integrated into psychotherapy settings as researchers and industry race to develop AI mental health applications. Policymakers have the opportunity and responsibility to ensure that the field evaluates these innovations carefully, taking into consideration their potential limitations, ethical considerations, and risks.

Introduction

The use of AI in psychotherapy is not a new phenomenon. Decades before the emergence of mainstream LLMs, researchers and practitioners used AI applications, such as natural language processing models, in behavioral health settings. For instance, research studies have used machine learning and natural language processing to detect suicide risk, identify homework assigned in psychotherapy sessions, and evaluate patient emotions. More recently, mental health chatbots such as Woebot and Tessa have applied rule-based AI techniques to target depression and eating pathology. Yet these chatbots often struggle to respond flexibly to user inputs, and they suffer from high dropout rates and low user engagement.

LLMs have the potential to fill some of these gaps and change many aspects of psychotherapy care thanks to their ability to parse human language, generate human-like and context-dependent responses, annotate text, and flexibly adopt different conversational styles.

However, while LLMs show vast promise in performing certain tasks and skills associated with psychotherapy, clinical LLM products and prototypes are not yet sophisticated enough to replace psychotherapy. There is a gap between simulating therapy skills and applying them in ways that alleviate patient suffering. To bridge this gap, clinical LLMs need to be tailored to psychotherapy contexts through prompt engineering (structuring a set of instructions so that an AI model can interpret and follow them) or through fine-tuning, which further trains the LLM on curated datasets.

As LLMs are increasingly used in psychotherapy, it is essential to understand the complexity and stakes at play: In the worst case, a poorly functioning “LLM co-pilot” could mishandle a patient’s risk of suicide or homicide. Clinical LLMs are, of course, not the only AI applications that may involve life-or-death decisions (consider self-driving cars), but predicting and mitigating risk in psychotherapy is uniquely difficult. It requires conceptualizing complex cases, considering social and cultural contexts, and addressing unpredictable human behavior. Poor outcomes or ethical transgressions by clinical LLMs could seriously harm individuals and, as has happened in other domains, undermine public trust in behavioral healthcare as a field.

Our first paper begins with an overview of the clinical LLMs in use today and reviews the current landscape of clinical LLM development. We examine how clinical LLMs progress across different stages of integration and identify specific ethical and other concerns related to their use in different scenarios. We then make recommendations for responsibly developing LLMs for use in behavioral health settings. In our second paper, we propose a framework that developers, researchers, clinicians, and policymakers could use to evaluate and report on the readiness of generative AI mental health applications for clinical deployment.

Johannes Eichstaedt
