Hoffman-Yee Research Grants | Stanford HAI

Hoffman-Yee Research Grants

Status: Closed
Date: Call for proposals will open in Winter 2025
Topics: Healthcare
Sections: Overview, Call for Proposals, 2024 Grant Recipients, 2022 Grant Recipients, 2020 Grant Recipients, Events

Six Stanford research teams have received funding to solve some of the most challenging problems in AI.

Related
  • Stanford HAI Awards $2.75M in Hoffman-Yee Grants
    Shana Lynch
    Aug 18
    announcement

    This year’s winners propose innovative, bold ideas pushing the boundaries of artificial intelligence.

  • Stanford HAI Announces Hoffman-Yee Grants Recipients for 2024
    Nikki Goth Itoi
    Aug 21
    announcement

    Six interdisciplinary research teams received a total of $3 million to pursue groundbreaking ideas in the field of AI.

  • Stanford HAI Announces Four Hoffman-Yee Grantees
    Shana Lynch
    Oct 25
    announcement

    The second round of funding will sponsor teams that leverage AI to focus on real-world problems in health care, education, and society.

  • Hoffman-Yee Symposium
    Sep 21, 2021
    conference
  • 2023 Hoffman-Yee Symposium
    Sep 19, 2023, 9:00 AM - 5:30 PM
    conference
  • Four Research Teams Awarded New Hoffman-Yee Grant Funding
    Nov 13
    announcement

    This year's research spans foundation models, health care algorithms, social values in social media, and improved chip technology.

  • DSPy: Compiling Declarative Language Model Calls into State-of-the-Art Pipelines
    Christopher Potts, Omar Khattab, Matei Zaharia
    Jan 16
    Research

    The ML community is rapidly exploring techniques for prompting language models (LMs) and for stacking them into pipelines that solve complex tasks. Unfortunately, existing LM pipelines are typically implemented using hard-coded “prompt templates”, i.e. lengthy strings discovered via trial and error. Toward a more systematic approach for developing and optimizing LM pipelines, we introduce DSPy, a programming model that abstracts LM pipelines as text transformation graphs, or imperative computational graphs where LMs are invoked through declarative modules. DSPy modules are parameterized, meaning they can learn how to apply compositions of prompting, finetuning, augmentation, and reasoning techniques. We design a compiler that will optimize any DSPy pipeline to maximize a given metric, by creating and collecting demonstrations. We conduct two case studies, showing that succinct DSPy programs can express and optimize pipelines that reason about math word problems, tackle multi-hop retrieval, answer complex questions, and control agent loops. Within minutes of compiling, DSPy can automatically produce pipelines that outperform out-of-the-box few-shot prompting as well as expert-created demonstrations for GPT-3.5 and Llama2-13b-chat. On top of that, DSPy programs compiled for relatively small LMs like 770M parameter T5 and Llama2-13b-chat are competitive with many approaches that rely on large and proprietary LMs like GPT-3.5 and on expert-written prompt chains. DSPy is available at https://github.com/stanfordnlp/dspy

  • Optimizing Instructions and Demonstrations for Multi-Stage Language Model Programs
    Christopher Potts, Omar Khattab, David Broman, Josh Purtell, Michael J Ryan, Krista Opsahl-Ong, Matei Zaharia
    Nov 14
    Research

    Language Model Programs, i.e. sophisticated pipelines of modular language model (LM) calls, are increasingly advancing NLP tasks, but they require crafting prompts that are jointly effective for all modules. We study prompt optimization for LM programs, i.e. how to update these prompts to maximize a downstream metric without access to module-level labels or gradients. To make this tractable, we factorize our problem into optimizing the free-form instructions and few-shot demonstrations of every module and introduce several strategies to craft task-grounded instructions and navigate credit assignment across modules. Our strategies include (i) program- and data-aware techniques for proposing effective instructions, (ii) a stochastic mini-batch evaluation function for learning a surrogate model of our objective, and (iii) a meta-optimization procedure in which we refine how LMs construct proposals over time. Using these insights we develop MIPRO, a novel algorithm for optimizing LM programs. MIPRO outperforms baseline optimizers on five of seven diverse multi-stage LM programs using a best-in-class open-source model (Llama-3-8B), by up to 13% in accuracy. We have released our new optimizers and benchmark in DSPy at http://dspy.ai.

  • Equitable Implementation of a Precision Digital Health Program for Glucose Management in Individuals with Newly Diagnosed Type 1 Diabetes
    Ramesh Johari, Ananta Addala, Franziska K Bishop, Korey Hood, Ming Yeh Lee, Victoria Y Ding, Priya Prahalad, Dessi P Zaharieva, Johannes Ferstad, Manisha Desai, David Scheinker, David Maahs
    Jul 30
    Research

    Few young people with type 1 diabetes (T1D) meet glucose targets. Continuous glucose monitoring improves glycemia, but access is not equitable. We prospectively assessed the impact of a systematic and equitable digital-health-team-based care program implementing tighter glucose targets (HbA1c < 7%), early technology use (continuous glucose monitoring starts <1 month after diagnosis) and remote patient monitoring on glycemia in young people with newly diagnosed T1D enrolled in the Teamwork, Targets, Technology, and Tight Control (4T Study 1). Primary outcome was HbA1c change from 4 to 12 months after diagnosis; the secondary outcome was achieving the HbA1c targets. The 4T Study 1 cohort (36.8% Hispanic and 35.3% publicly insured) had a mean HbA1c of 6.58%, 64% with HbA1c < 7% and mean time in range (70-180 mg/dl) of 68% at 1 year after diagnosis. Clinical implementation of the 4T Study 1 met the prespecified primary outcome and improved glycemia without unexpected serious adverse events. The strategies in the 4T Study 1 can be used to implement systematic and equitable care for individuals with T1D and translate to care for other chronic diseases.

  • Smart Start—Designing Powerful Clinical Trials Using Pilot Study Data
    Emily Fox, Ramesh Johari, Priya Prahalad, Dessi P Zaharieva, Johannes Ferstad, Manisha Desai, David Scheinker, David Maahs
    Jan 22
    Research

    BACKGROUND

    Digital health interventions may be optimized before evaluation in a randomized clinical trial. Although many digital health interventions are deployed in pilot studies, the data collected are rarely used to refine the intervention and the subsequent clinical trials.

    METHODS

    We leverage natural variation in patients eligible for a digital health intervention in a remote patient-monitoring pilot study to design and compare interventions for a subsequent randomized clinical trial.

    RESULTS

    Our approach leverages patient heterogeneity to identify an intervention with twice the estimated effect size of an unoptimized intervention.

    CONCLUSIONS

    Optimizing an intervention and clinical trial based on pilot data may improve efficacy and increase the probability of success. (Funded by the National Institutes of Health and others; ClinicalTrials.gov number, NCT04336969.)

  • Internal Fractures: The Competing Logics of Social Media Platforms
    Jeanne Tsai, Jeffrey Hancock, Angèle Christin, Michael S. Bernstein, Chenyan Jia, Chunchen Xu
    Aug 21
    Research

    Social media platforms are too often understood as monoliths with clear priorities. Instead, we analyze them as complex organizations torn between starkly different justifications of their missions. Focusing on the case of Meta, we inductively analyze the company’s public materials and identify three evaluative logics that shape the platform’s decisions: an engagement logic, a public debate logic, and a wellbeing logic. There are clear trade-offs between these logics, which often result in internal conflicts between teams and departments in charge of these different priorities. We examine recent examples showing how Meta rotates between logics in its decision-making, though the goal of engagement dominates in internal negotiations. We outline how this framework can be applied to other social media platforms such as TikTok, Reddit, and X. We discuss the ramifications of our findings for the study of online harms, exclusion, and extraction.

  • Measuring receptivity to misinformation at scale on a social media platform
    Nathaniel Persily, Christopher K Tokita, Jonathan Nagler, Joshua A Tucker, Kevin Aslett, Richard Bonneau, William P Godel, Zeve Sanderson
    Sep 10
    Research

    Measuring the impact of online misinformation is challenging. Traditional measures, such as user views or shares on social media, are incomplete because not everyone who is exposed to misinformation is equally likely to believe it. To address this issue, we developed a method that combines survey data with observational Twitter data to probabilistically estimate the number of users both exposed to and likely to believe a specific news story. As a proof of concept, we applied this method to 139 viral news articles and find that although false news reaches an audience with diverse political views, users who are both exposed and receptive to believing false news tend to have more extreme ideologies. These receptive users are also more likely to encounter misinformation earlier than those who are unlikely to believe it. This mismatch between overall user exposure and receptive user exposure underscores the limitation of relying solely on exposure or interaction data to measure the impact of misinformation, as well as the challenge of implementing effective interventions. To demonstrate how our approach can address this challenge, we then conducted data-driven simulations of common interventions used by social media platforms. We find that these interventions are only modestly effective at reducing exposure among users likely to believe misinformation, and their effectiveness quickly diminishes unless implemented soon after misinformation’s initial spread. Our paper provides a more precise estimate of misinformation’s impact by focusing on the exposure of users likely to believe it, offering insights for effective mitigation strategies on social media.

  • ReMix: Optimizing Data Mixtures for Large Scale Imitation Learning
    Dorsa Sadigh, Chethan Anand Bhateja, Joey Hejna, Karl Pertsch, Yichen Jiang
    Sep 05
    Research

    Increasingly large robotics datasets are being collected to train larger foundation models in robotics. However, despite the fact that data selection has been of utmost importance to scaling in vision and natural language processing (NLP), little work in robotics has questioned what data such models should actually be trained on. In this work we investigate how to weigh different subsets or "domains" of robotics datasets during pre-training to maximize worst-case performance across all possible downstream domains using distributionally robust optimization (DRO). Unlike in NLP, we find that these methods are hard to apply out of the box due to varying action spaces and dynamics across robots. Our method, ReMix, employs early stopping and action normalization and discretization to counteract these issues. Through extensive experimentation on both the Bridge and OpenX datasets, we demonstrate that data curation can have an outsized impact on downstream performance. Specifically, domain weights learned by ReMix outperform uniform weights by over 40% on average and human-selected weights by over 20% on datasets used to train the RT-X models.
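
The DSPy and MIPRO entries above describe writing LM pipelines as declarative modules and then compiling them against a metric. The sketch below is illustrative only: the exact DSPy API differs across versions, and the model name, retrieval function, and optimizer call are placeholder assumptions rather than the papers' own setup.

# Illustrative DSPy-style program; API details vary by DSPy version.
import dspy

# Assumption: any chat model supported by your DSPy version can be configured here.
dspy.configure(lm=dspy.LM("openai/gpt-4o-mini"))

def my_search(query: str) -> str:
    """Hypothetical retrieval step; replace with a real search or RAG call."""
    return "stub context for: " + query

class AnswerWithContext(dspy.Module):
    """Two-stage pipeline: generate a search query, then answer from retrieved text."""
    def __init__(self):
        super().__init__()
        self.gen_query = dspy.ChainOfThought("question -> search_query")
        self.answer = dspy.ChainOfThought("context, question -> answer")

    def forward(self, question):
        query = self.gen_query(question=question).search_query
        return self.answer(context=my_search(query), question=question)

# Compiling optimizes each module's prompts/demonstrations against a metric, e.g.
# (optimizer name and signature depend on the DSPy version):
#   optimizer = dspy.MIPROv2(metric=my_metric)
#   compiled = optimizer.compile(AnswerWithContext(), trainset=train_examples)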

Artificial intelligence algorithms underpin social media, influencing everything from feed ranking to moderation to disinformation classification. These algorithms maximize each user's individual experience, as predicted from likes, retweets, and other behavioral data, and in doing so can harm societal values such as wellbeing, social capital, protection of minoritized groups, democracy, and pro-social norms. How can we encode societal values into these algorithms without sacrificing the core of what makes social media compelling? Our project will develop intertwined social scientific, engineering, and policy answers to these questions.

Social scientific research will help us understand the societal values at play, how the algorithms themselves influence those values, and how they may create feedback loops that undercut them. Engineering research will develop new participatory models for collectively determining how to embed these societal values in social media AI (e.g., feed ranking), how to measure the impact of AI decisions on these values from sparse observable data, and how to concretely embed these (potentially conflicting) values into the AIs. Policy proposals will articulate how the societal values in such algorithms ought to be decided and what kinds of regulation and oversight social media algorithms ought to have.

Underlying each of these threads is a measurement challenge at scale, so we will recruit a large participant panel that reaches across political, gender, racial, and cultural identities. These consented participants will form a longitudinal panel for interviews, surveys, data collection, and evaluation of the interventions we develop, enabling a scale of measurement and testing that is typically out of reach for research. Through this work, we hope to paint a future in which social media AIs help us achieve our societal goals rather than undermine them.
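
One way to make the "embed societal values in feed ranking" idea concrete is to score items by a weighted combination of predicted engagement and per-value scores, with the weights elicited from a participant panel. This is a minimal sketch under assumptions of our own (the value names, weights, and scores are invented), not the project's actual method.

# Minimal sketch: re-rank feed items by engagement plus societal-value scores.
# Value names, weights, and scores below are hypothetical placeholders.
from dataclasses import dataclass, field

@dataclass
class FeedItem:
    item_id: str
    engagement: float                                   # predicted probability of like/reshare
    value_scores: dict = field(default_factory=dict)    # e.g. {"wellbeing": 0.7, "civility": 0.3}

def ranking_score(item: FeedItem, value_weights: dict, engagement_weight: float = 1.0) -> float:
    """Combine predicted engagement with panel-elicited value weights (which may be negative)."""
    score = engagement_weight * item.engagement
    for value, weight in value_weights.items():
        score += weight * item.value_scores.get(value, 0.0)
    return score

items = [
    FeedItem("a", engagement=0.9, value_scores={"wellbeing": -0.4, "civility": -0.2}),
    FeedItem("b", engagement=0.6, value_scores={"wellbeing": 0.5, "civility": 0.3}),
]
weights = {"wellbeing": 0.8, "civility": 0.5}   # hypothetical weights elicited from participants
ranked = sorted(items, key=lambda it: ranking_score(it, weights), reverse=True)
print([it.item_id for it in ranked])            # the engagement-heavy item "a" no longer ranks first

Choosing and adjusting those value weights is exactly the kind of decision the project's participatory and policy threads would govern.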

NAME | ROLE | SCHOOL | DEPARTMENTS
Michael Bernstein | Main PI | Engineering | Computer Science
Angele Christin | Co-PI | Humanities and Sciences | Communication
Jeffrey Hancock | Co-PI | Humanities and Sciences | Communication
Tatsunori Hashimoto | Co-PI | Engineering | Computer Science
Nathaniel Persily | Co-PI | Law School | Law School
Jeanne Tsai | Co-PI | Humanities and Sciences | Psychology
Johan Ugander | Co-PI | Engineering | Management Science and Engineering

Humans have a remarkable ability to figure out what happened. From a puddle of milk, we can infer that our roommate must have forgotten to close the fridge, that the milk toppled over and splashed on the floor. When humans infer what happened, they combine evidence from multiple modalities to do so. For example, jury members are often presented with a wide variety of evidence that may include images of the crime scene, surveillance videos, audio recordings, and various kinds of testimony from different witnesses. The jury member's task is to take in all these sources of information and get at the truth of what happened. Current AI systems fail to match the inferential capacities of humans. While great strides have been made in developing models that understand and produce language, as well as models that process visual input, we believe that a key component is missing: we need AI systems that integrate different sources of evidence into a causal model of the world.

In this project, we will take major steps to bridge the gap between vision and language models in AI. We will develop MARPLE (named after the detective Miss Marple) – a computational framework that combines evidence from vision, audio, and language to produce human-understandable explanations of what happened. Current research in cognitive science shows that the human capacity to draw flexible inferences about the physical world, and about each other, is best explained by assuming that people construct mental causal models of the domain, and that they use these models to simulate different counterfactual possibilities. To give explanations of what happened, and to say what caused what, the capacity to go beyond what actually happened and simulate alternative possibilities is critical. An AI system capable of producing explanations from multiple sources of evidence has enormous potential impact. It will improve home assistants, enable meaningful video analysis, support legal fact-finders in court, and help advance our understanding of human inference.
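
As a toy illustration of the kind of inference MARPLE targets, evidence from several modalities can be combined by Bayesian updating over candidate explanations of what happened. The hypotheses, modalities, and likelihood numbers below are invented for illustration; they are not the project's actual model.

# Toy Bayesian combination of multimodal evidence over candidate explanations.
# All hypotheses and probabilities are invented for illustration.
priors = {"fridge_left_open": 0.2, "cat_knocked_milk": 0.3, "no_spill": 0.5}

# P(observation | hypothesis) for each modality, as a per-modality model might estimate.
likelihoods = {
    "vision_puddle":   {"fridge_left_open": 0.80, "cat_knocked_milk": 0.70, "no_spill": 0.05},
    "audio_thud":      {"fridge_left_open": 0.20, "cat_knocked_milk": 0.90, "no_spill": 0.10},
    "roommate_denies": {"fridge_left_open": 0.50, "cat_knocked_milk": 0.80, "no_spill": 0.90},
}

def posterior(priors, likelihoods, observations):
    """Multiply priors by per-modality likelihoods (assumed independent), then normalize."""
    scores = dict(priors)
    for obs in observations:
        for h in scores:
            scores[h] *= likelihoods[obs][h]
    total = sum(scores.values())
    return {h: s / total for h, s in scores.items()}

print(posterior(priors, likelihoods, ["vision_puddle", "audio_thud", "roommate_denies"]))
# The "cat knocked over the milk" explanation dominates once all the evidence is combined.

The counterfactual reasoning the proposal emphasizes would additionally simulate how the evidence would look under each alternative hypothesis, rather than relying on fixed likelihood tables.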

NAME | ROLE | SCHOOL | DEPARTMENTS
Tobias Gerstenberg | Main PI | Humanities and Sciences | Psychology
Chelsea Finn | Co-PI | Engineering | Computer Science
Noah Goodman | Co-PI | Humanities and Sciences | Psychology
Thomas Icard | Co-PI | Humanities and Sciences | Philosophy
Robert MacCoun | Co-PI | Law School | Law School
Jiajun Wu | Co-PI | Engineering | Computer Science

We are entering a new era of artificial intelligence driven by foundation models such as GPT-3, which are trained on broad data (at immense scale, using self-supervision) and can be adapted to a wide range of downstream tasks. These models demonstrate strong capabilities: they learn rich representations of data, generate human-quality text and images, and even exhibit emergent phenomena such as in-context learning. Moreover, they represent a paradigm shift in how AI operates: vast resources are pooled into large-scale data collection and the training of foundation models (much like infrastructure), which then serve as an indispensable resource for almost any downstream application.

At the same time, foundation models are still in their infancy. They are technically immature and poorly understood, and, given the immense commercial incentives to deploy them, they pose new social risks that must be studied and managed. Finally, they have so far been applied mainly to the popular Internet applications that the companies developing them care about; a rich array of other applications across fields such as law and medicine could benefit from foundation models while posing new research challenges.

To address these deficiencies, we propose improving the technical capabilities of foundation models while remaining sensitive to social responsibility and grounded in real-world applications. We have assembled a diverse team with deep multidisciplinary expertise across areas such as machine learning, law, political science, biomedicine, vision, and robotics; this team has already demonstrated its ability to work together by producing a 200+ page report on foundation models. We plan to improve our understanding of training objectives, study the role of data in model behavior, and develop novel model architectures based on structured state space models, diffusion, and retrieval. We will investigate privacy and intellectual property implications and the effects of homogenization, and we will develop frameworks for recourse when downstream systems fail. Finally, we will apply foundation models in biomedicine, law, and robotics. Overall, we believe this multi-faceted, integrated approach will be key to improving the foundations of powerful future AI systems.

NAME | ROLE | SCHOOL | DEPARTMENTS
Percy Liang | Main PI | Engineering | Computer Science
Russ B. Altman | Co-PI | Engineering | Bioengineering
Jeannette Bohg | Co-PI | Engineering | Computer Science
Akshay Chaudhari | Co-PI | Medicine | Radiology
Chelsea Finn | Co-PI | Engineering | Computer Science
Tatsunori Hashimoto | Co-PI | Engineering | Computer Science
Dan E. Ho | Co-PI | Law School | Law School
Fei-Fei Li | Co-PI | Engineering | Computer Science
Tengyu Ma | Co-PI | Engineering | Computer Science
Christopher Manning | Co-PI | Engineering | Computer Science
Christopher Re | Co-PI | Engineering | Computer Science
Rob Reich | Co-PI | Humanities and Sciences | Political Science
Dorsa Sadigh | Co-PI | Engineering | Computer Science
Matei Zaharia | Co-PI | Engineering | Computer Science

Many clinical systems rely on risk stratification of patients to guide care and select interventions. For example, risk scores may be calculated for everything from cardiovascular outcomes to hospital readmission. The three clinical settings we consider are remote monitoring of patients with type 1 diabetes (T1D), opioid overdose risk, and seizure prediction. Traditionally, the cycle of assessing risk and treating patients is informed by clinical training relying on simple decision rules, such as whether a patient's blood glucose met target metrics. AI provides the potential to transform this process through data-driven risk stratification and personalized intervention strategies. However, to serve as clinical decision support, such methods must be (i) explainable – providing factors that meaningfully contribute to a clinician's reasoning, (ii) actionable – leading to insights directly informing intervention decisions, and (iii) equitable – ensuring the scores neither perpetuate patterns of inequality nor induce negative feedback loops.

In the proposed work, we will create EAE Scores, a framework for developing explainable, actionable, and equitable risk scores for healthcare decisions. EAE Scores will both produce new forms of introspection through explainability and enable providers to close the loop between their knowledge of intervention decisions and the AI's inferences. Furthermore, EAE Scores will provide a systematic approach to incorporating equity at every step of the development process. The proposed outcomes of the work are threefold: (i) general AI algorithms and methods with applicability beyond our clinical settings, (ii) robust open-source tools that allow others to create and deploy more explainable, actionable, and equitable risk scores, and (iii) direct improvements in clinical outcomes for T1D, epilepsy, and opioid overdose risk.
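
As a rough sketch of the "explainable" and "equitable" requirements (the features, coefficients, groups, and patients below are hypothetical and are not the EAE Scores framework itself), a linear risk score can report each feature's contribution and be checked for calibration within patient subgroups:

# Sketch: a linear risk score with per-feature explanations and a subgroup calibration check.
# Coefficients, features, and example patients are hypothetical.
import math

COEFS = {"intercept": -2.0, "glucose_above_target": 1.2, "missed_uploads": 0.8, "prior_events": 1.5}

def risk_score(patient):
    """Return predicted risk and each feature's additive contribution (on the log-odds scale)."""
    contributions = {f: COEFS[f] * patient.get(f, 0.0) for f in COEFS if f != "intercept"}
    logit = COEFS["intercept"] + sum(contributions.values())
    return 1.0 / (1.0 + math.exp(-logit)), contributions

def calibration_by_group(patients):
    """Compare mean predicted risk to the observed event rate within each subgroup."""
    groups = {}
    for p in patients:
        pred, _ = risk_score(p)
        g = groups.setdefault(p["group"], {"pred": 0.0, "obs": 0.0, "n": 0})
        g["pred"] += pred; g["obs"] += p["event"]; g["n"] += 1
    return {k: {"mean_pred": v["pred"] / v["n"], "event_rate": v["obs"] / v["n"]} for k, v in groups.items()}

patients = [
    {"group": "publicly_insured", "glucose_above_target": 1, "missed_uploads": 1, "prior_events": 0, "event": 1},
    {"group": "privately_insured", "glucose_above_target": 0, "missed_uploads": 0, "prior_events": 0, "event": 0},
]
print(risk_score(patients[0]))          # predicted risk plus per-feature contributions
print(calibration_by_group(patients))   # persistent group-level gaps would flag an equity problem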

NAME | ROLE | SCHOOL | DEPARTMENTS
Carlos Ernesto Guestrin | Main PI | Engineering | Computer Science
Carissa Carter | Co-PI | Engineering | d.school
Emily Fox | Co-PI | Humanities and Sciences | Statistics, Computer Science (courtesy)
Ramesh Johari | Co-PI | Engineering | Management Science and Engineering, Electrical Engineering (courtesy), Computer Science (courtesy)
David Maahs | Co-PI | Medicine | Pediatrics
Priya Prahalad | Co-PI | Medicine | Pediatrics
Sherri Rose | Co-PI | Medicine | Health Policy
David Scheinker | Co-PI | Medicine | Pediatrics

The place where immigrants settle within a host country has a powerful impact on their lives. This destination can be a stepping stone and provide opportunities to find employment, maximize earnings, learn the host country language, and access services such as education and healthcare. Location decisions therefore not only affect immigrants themselves; they also shape immigrants’ contributions to the local economy and society. This project seeks to develop and test data-driven matching tools (called GeoMatch) for location decision-makers—both governments and immigrants themselves—that generate personalized location recommendations, leveraging insights from historical data and human-centered AI. The goal is to advance both the theoretical and empirical frontiers of algorithmic matching for newcomers. On the theoretical front, our interdisciplinary team of faculty experts will tackle problems at the intersection of estimation and prediction, algorithms and mechanism design, human-AI interaction, and immigrant integration. On the empirical front, we plan to conduct pilot tests via randomized controlled trials on the use of GeoMatch in collaboration with partners.
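
A minimal sketch of the matching idea behind GeoMatch (not the project's actual algorithm; the predicted outcomes and capacities below are invented): given a model's predicted outcome for each newcomer-location pair, assign newcomers to locations to maximize total predicted outcome subject to location capacities, for example with the Hungarian algorithm.

# Sketch: capacity-constrained matching of newcomers to locations using predicted outcomes.
# The predicted outcome values and location capacities are invented.
import numpy as np
from scipy.optimize import linear_sum_assignment

newcomers = ["n1", "n2", "n3"]
locations = {"cityA": 2, "cityB": 1}     # location -> number of open slots

# predicted[i, j]: model-predicted outcome (e.g. employment probability) for newcomer i in location j
predicted = np.array([[0.55, 0.70],
                      [0.60, 0.40],
                      [0.30, 0.65]])

# Expand each location into one column per open slot, then solve a max-weight assignment.
# linear_sum_assignment minimizes cost, so negate the predicted outcomes.
loc_names = list(locations)
slots = [loc for loc, cap in locations.items() for _ in range(cap)]
cost = -predicted[:, [loc_names.index(loc) for loc in slots]]
rows, cols = linear_sum_assignment(cost)

for i, j in zip(rows, cols):
    loc = slots[j]
    print(f"{newcomers[i]} -> {loc} (predicted outcome {predicted[i, loc_names.index(loc)]:.2f})")

In practice, GeoMatch would combine such an assignment step with outcome models estimated from historical data and with the mechanism-design and human-AI-interaction questions the team describes.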

NAME | ROLE | SCHOOL | DEPARTMENTS
Jens Hainmueller | Main PI | Humanities and Sciences | Political Science
Avidit Acharya | Co-PI | Humanities and Sciences | Political Science
Yonatan Gur | Co-PI | Graduate School of Business | Graduate School of Business
Tomas Jimenez | Co-PI | Humanities and Sciences | Sociology
Dominik Rothenhaeusler | Co-PI | Humanities and Sciences | Statistics

Artificial intelligence (AI) now advances by multiplying twice as many floating-point numbers every two months, but the semiconductor industry tiles twice as many digital multipliers on a chip only every two years. Consequently, users must access advanced AI through the cloud, which houses tens of thousands of chips and consumes about 20 megawatts of electricity, enough to power 16,000 homes. We aim to exchange digital multipliers tiled in 2-D for dendrite-like nanodevices integrated in 3-D, moving from learning with synapses to learning with dendrites. This dendrocentric reconception of the brain promises datacenter performance on a smartphone's energy budget. That would rein in AI's unsustainable energy, carbon, and monetary costs, distribute its productivity gains equitably, transform its users' experience, and restore their privacy.

NAME | ROLE | SCHOOL | DEPARTMENTS
Kwabena Boahen | Co-PI | Engineering | Bioengineering
Scott W Linderman | Co-PI | Humanities and Sciences | Statistics
H.-S. Philip Wong | Co-PI | Engineering | Electrical Engineering
Matei Zaharia | Co-PI | Engineering | Computer Science