The science behind AI romances, plus the benefits and risks for mental health. A Stanford HAI study shows that because AI companions can provide unlimited affirmation and interaction, they may create unrealistic expectations for relationships.
Abstract
Background: Digital phenotyping has seen a broad increase in application across clinical research; however, little research has implemented passive assessment approaches for suicide risk detection. There is significant potential in a novel form of digital phenotyping, termed screenomics, which captures smartphone activity via screenshots.
Objective: This paper focuses on a comprehensive case review of 2 participants who reported past 1-month active suicidal ideation, detailing their passive (ie, obtained via screenomics screenshot capture) and active (ie, obtained via ecological momentary assessment [EMA]) risk profiles that culminated in suicidal crises and subsequent psychiatric hospitalizations. Through this analysis, we shed light on the timescale of risk processes as they unfold before hospitalization, as well as introduce the novel application of screenomics within the field of suicide research.
Methods: To underscore the potential benefits of screenomics in comprehending suicide risk, the analysis concentrates on a specific type of data gleaned from screenshots—text—captured prior to hospitalization, alongside self-reported EMA responses. Following a comprehensive baseline assessment, participants completed an intensive time sampling period. During this period, screenshots were collected every 5 seconds whenever a participant’s phone was in use over a 35-day period, and EMA data were collected 6 times a day for 28 days. In our analysis, we focus on the following: suicide-related content (obtained via screenshots and EMA), risk factors theoretically and empirically relevant to suicide risk (obtained via screenshots and EMA), and social content (obtained via screenshots).
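To give a sense of the data volume this sampling design implies, the arithmetic can be sketched as follows. The EMA and screenshot parameters come from the abstract; the hours of daily phone use are an illustrative assumption, not a figure reported by the study.

```python
# Back-of-envelope estimate of data volume per participant under the
# sampling design described above.
EMA_PER_DAY = 6          # from the abstract
EMA_DAYS = 28            # from the abstract
total_ema_prompts = EMA_PER_DAY * EMA_DAYS

SCREENSHOT_INTERVAL_S = 5    # one screenshot every 5 seconds of phone use
SCREENSHOT_DAYS = 35         # from the abstract
PHONE_USE_HOURS_PER_DAY = 4  # assumption for illustration only

total_screenshots = (
    SCREENSHOT_DAYS * PHONE_USE_HOURS_PER_DAY * 3600 // SCREENSHOT_INTERVAL_S
)

print(total_ema_prompts)   # 168 EMA prompts per participant
print(total_screenshots)   # 100800 screenshots under the assumed usage
```

Even under a modest usage assumption, the passive stream yields roughly three orders of magnitude more observations than the active EMA stream, which is what makes the "granular look" described in the Conclusions possible.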
Results: Our analysis revealed several key findings. First, there was a notable decrease in EMA compliance during suicidal crises, with both participants completing fewer EMAs in the days prior to hospitalization. This contrasted with an overall increase in phone usage leading up to hospitalization, which was particularly marked by heightened social use. Screenomics also captured prominent precipitating factors in each instance of suicidal crisis that were not well detected via self-report, specifically physical pain and loneliness.
Conclusions: Our preliminary findings underscore the potential of passively collected data in understanding and predicting suicidal crises. The vast number of screenshots from each participant offers a granular look into their daily digital interactions, shedding light on novel risks not captured via self-report alone. When combined with EMA assessments, screenomics provides a more comprehensive view of an individual’s psychological processes in the time leading up to a suicidal crisis.


Synthetic brain MRI technology is supercharging computational neuroscience with massive data.

As Large Language Models (LLMs) become increasingly integrated into our everyday lives, understanding their ability to comprehend human mental states becomes critical for ensuring effective interactions. However, despite recent attempts to assess the Theory-of-Mind (ToM) reasoning capabilities of LLMs, the degree to which these models align with human ToM remains a nuanced topic of exploration. This is primarily due to two distinct challenges: (1) inconsistent results from previous evaluations, and (2) concerns surrounding the validity of existing evaluation methodologies. To address these challenges, we present a novel framework for procedurally generating evaluations with LLMs by populating causal templates. Using our framework, we create a new social reasoning benchmark for LLMs, BigToM, which consists of 25 controls and 5,000 model-written evaluations. We find that human participants rate the quality of our benchmark higher than previous crowd-sourced evaluations and comparable to expert-written evaluations. Using BigToM, we evaluate the social reasoning capabilities of a variety of LLMs and compare model performance with human performance. Our results suggest that GPT-4 has ToM capabilities that mirror human inference patterns, though less reliably, while other LLMs struggle.
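The core of the procedural-generation idea above is filling the slots of a causal template (an agent, a desire, a belief, and a belief-changing event) to produce a false-belief test item. A minimal sketch of that mechanism follows; the template wording, slot names, and story content are illustrative assumptions, not the paper's actual templates.

```python
# Illustrative sketch of populating a causal template to produce one
# theory-of-mind evaluation item, in the spirit of the BigToM approach.
# Template and content are hypothetical, not taken from the benchmark.

TEMPLATE = (
    "{agent} wants {desire}. {agent} believes {belief}. "
    "While {agent} is away, {event}. "
    "Question: when {agent} returns, what does {agent} think is true?"
)

def populate(agent: str, desire: str, belief: str, event: str) -> str:
    """Fill the causal template's slots to produce one evaluation item."""
    return TEMPLATE.format(agent=agent, desire=desire, belief=belief, event=event)

item = populate(
    agent="Noor",
    desire="a hot coffee",
    belief="the pot on the counter is full",
    event="a coworker empties the pot",
)
print(item)
```

In the framework described above, an LLM (rather than a fixed string template) writes the story for each slot assignment, and varying one slot at a time yields matched control conditions, which is how the 25 controls per scenario are obtained.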

