Student Lightning Talks

During the HAI spring conference Intelligence Augmentation: AI Empowering People to Solve Global Challenges, Stanford students presented their latest research examining how AI can augment critical aspects of medicine, education, and art. Watch their one-minute overviews:

Session I: Healthcare

Kevin Wu

Abstract: The SARS-CoV-2 coronavirus continues to drive a global pandemic, yet there is much we have yet to understand about its molecular functions and pathogenicity. To shed light on these questions using computational analyses, we applied RNA-GPS to SARS-CoV-2’s RNA transcripts. RNA-GPS is a machine learning model we previously developed to predict the localization, or “addressing” of RNA transcripts to distinct sub-cellular neighborhoods and organelles in human cells. RNA molecules are responsible for a plethora of cellular functions ranging from enabling conversion of genetic information into functional proteins, to regulating the abundance of various biomolecules in the cell, and their localization behavior is critical to this functionality. Although RNA-GPS is trained to predict localization of human RNA molecules, the fact that viruses hijack human cellular machinery suggests that RNA-GPS can provide insights for SARS-CoV-2’s mechanisms as well. We most notably reported that RNA-GPS predicts viral transcripts’ affinity for host mitochondria. This hypothesis has since been validated across several experiments. In addition, RNA-GPS’s predicted viral interaction with host mitochondria has been suggested as a key mechanism for immune suppression and viral replication. Overall, our work showcases how machine learning models of biological systems can be used to drive hypothesis generation, streamlining scientific and public health inquiries.

Maya Varma

Abstract: Behavioral and social impairments occur frequently in children with autism spectrum disorder (ASD), giving rise to the possibility of utilizing computer vision techniques to evaluate a child’s social phenotype from home videos. In this study, we use a mobile health application to collect a large video dataset depicting children engaged in gameplay in a natural home environment. We utilize automated data annotations to analyze two social indicators exhibited by children during a 90-second gameplay video: (1) gaze fixation patterns and (2) visual scanning motifs; we then train an LSTM neural network in order to determine if social indicators could be predictive of ASD. Our work identifies one statistically significant region of fixation and one significant gaze transition pattern that differed between our two cohorts during gameplay. In addition, our deep learning model demonstrates mild predictive power in identifying the presence of ASD based on social engagement features. Ultimately, our results demonstrate the utility of game-based mobile health platforms in generating large, unstructured, and natural video datasets; when paired with automated labeling techniques, such systems can provide insights into disease diagnostics. 

Kevin Thomas

Abstract: Knee osteoarthritis (OA) is a leading cause of disability in older adults, with no effective treatments currently available and dissatisfaction rates of nearly 20% in patients who undergo joint replacement surgery. To facilitate the development of treatments, there is a need to make disease staging more efficient. We have developed an end-to-end interpretable model that takes full radiographs as input and predicts Kellgren-Lawrence severity scores with state-of-the-art accuracy, performs as well as musculoskeletal radiologists, and does not require manual image preprocessing. Saliency maps suggest the model’s predictions are based on clinically relevant information. We've deployed this model as a web application that clinicians and researchers can use by simply dragging and dropping images into their browser to have them assessed. This has enabled the Hospital for Special Surgery to conduct external validation of the model on clinical images without doing any computer programming.

Adam Lavertu

Abstract: The scale and speed of the COVID-19 pandemic have strained many parts of the national healthcare infrastructure, including communicable disease monitoring and prevention. Many local health departments now receive hundreds or thousands of COVID-19 case reports a day. Many arrive via faxed handwritten forms, often intermingled with other faxes sent to a general fax line, making it difficult to rapidly identify the highest priority cases for outreach and monitoring. We present an AI-based system capable of real-time identification and triage of handwritten faxed COVID-19 forms. The system relies on two models: one model to identify which received pages correspond to case report forms, and a second model to extract information from the set of identified case reports. We evaluated the system on a set of 1,224 faxes received by a local health department over a two-week period. For the 88% of faxes of sufficient quality, the system detects COVID-19 reports with high precision (0.98) and high recall (0.91). Among all received COVID-19 faxes, the system identifies high priority cases with a specificity of 0.87, a precision of 0.46, and a recall of 0.83. Our system can be adapted to new forms after a brief training period. Covid Fast Fax can support local health departments in their efforts to control the spread of COVID-19 and limit its impact on the community.
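For readers less familiar with the metrics quoted in this abstract, the following is a minimal sketch of how precision, recall, and specificity are computed from confusion-matrix counts. The function name `triage_metrics` and the counts passed to it are hypothetical and chosen only for illustration; they are not data from the study.

```python
def safe_ratio(numerator: int, denominator: int) -> float:
    """Return numerator / denominator, or NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")


def triage_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Compute standard binary-classification metrics from confusion counts.

    tp: high-priority cases flagged as high priority
    fp: other cases wrongly flagged as high priority
    tn: other cases correctly left unflagged
    fn: high-priority cases that were missed
    """
    return {
        "precision": safe_ratio(tp, tp + fp),    # flagged cases that are truly high priority
        "recall": safe_ratio(tp, tp + fn),       # high-priority cases that were caught
        "specificity": safe_ratio(tn, tn + fp),  # other cases correctly left alone
    }


# Hypothetical counts for illustration only (NOT taken from the study).
m = triage_metrics(tp=50, fp=50, tn=350, fn=10)
for name, value in m.items():
    print(f"{name}: {value:.2f}")
```

With counts like these, where high-priority cases are a small share of all reports, specificity and recall can both be high while precision stays moderate, because even a small false-positive rate among the many lower-priority cases produces a number of false alarms comparable to the number of true detections.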

Session II: Art

Jack Atherton

Abstract: Reality by Example is a tool and meta-environment that allows users to create and populate a virtual world in VR, from within VR. It uses interactive machine learning to enable users to shape terrain, music, and creature sounds and animations by providing examples. The mapping from examples to reality is learned instantaneously from the user's provided examples and updates whenever they change the examples or provide more. The overarching goal is to improve creation methods for amateur creators and enable them to understand VR as a medium for creation, not just content consumption. Ongoing work involves the social dimension and how people can feel a sense of belonging and meaningful self-expression while connecting remotely with others.

Camille Noufi

Abstract: In my lightning talk, I give an overview of the larger goal of my PhD work: understanding how context affects the vocal signal. I highlight why it is important for voice-based technologies to recognize context and present my approach of fusing research disciplines (psychoacoustics, sociomusicology, linguistics and signal processing) with machine learning in order to explore and parameterize the space representing vocal context.

Miguel Novelo

Abstract: Interdisciplinary artist from the peninsula of Yucatan, Mexico whose practice explores the contemporary use of language, the ambiguity of translations, the space in memories, and the aesthetics of miscommunication. Novelo uses new media, interactivity, and expanded cinema to create immersive sound and expanded image experiences that generate platforms for participatory storytelling. His research-based practice relies on scientific papers, internet tutorials, and collaboration to compose new forms of communication and engagement. New media allows him to work in the constant unknown. An immigrant state: decode, simulate, adapt! a never-ending resolution in the relationship of identity and space.

Panos Achlioptas

Abstract: We present a novel large-scale dataset and accompanying machine learning models aimed at providing a detailed understanding of the interplay between visual content, its emotional effect, and explanations for the latter in language. In contrast to most existing annotation datasets in computer vision, we focus on the affective experience triggered by visual artworks and ask the annotators to indicate the dominant emotion they feel for a given image and, crucially, to also provide a grounded verbal explanation for their emotion choice. As we demonstrate, this leads to a rich set of signals for both the objective content and the affective impact of an image, creating associations with abstract concepts (e.g., "freedom" or "love"), or references that go beyond what is directly visible, including visual similes and metaphors, or subjective references to personal experiences. We focus on visual art (e.g., paintings, artistic photographs) as it is a prime example of imagery created to elicit emotional responses from its viewers. Our dataset, termed ArtEmis, contains 450K emotion attributions and explanations from humans, on 80K artworks from WikiArt. Building on this data, we train and demonstrate a series of captioning systems capable of expressing and explaining emotions from visual stimuli. Remarkably, the captions produced by these systems often succeed in reflecting the semantic and abstract content of the image, going well beyond systems trained on existing datasets. The collected dataset and developed methods are available at

Session III: Education

Griffin Dietz

Abstract: Computational thinking (CT) education reaches only a fraction of young children, in part because CT learning tools often require expensive hardware or fluent literacy. Informed by needfinding interviews, we developed a smartphone application that leverages natural language processing to introduce computing concepts to preliterate children through storytelling games. The app includes two storytelling games where users create and listen to stories as well as four CT games where users then modify those stories to learn about sequences, loops, events, and variables. We improved upon the app design through wizard-of-oz testing (N = 28) and iterative design testing (N = 22) before conducting an evaluation study (N = 22). Children successfully navigated the app and effectively learned about the target computing concepts; after using the app, they demonstrated above-chance performance on a near-transfer CT concept recognition task. Our results demonstrate that we can use AI to integrate a conversational voice interface with creative activities to effectively introduce computing concepts to children in a manner that is accessible, approachable, and engaging.

Eric Rawn

Abstract: Quickpose is a novel system for interactive and exploratory version control for artistic programming. Built for the creative programming language Processing, Quickpose visualizes a project version history as a directed graph, enabling users to easily navigate across versions, create new branches, and view a version history at a glance. By allowing artists to easily work on multiple, branching, and concurrent artistic ideas in a single project, we hope that Quickpose enables artists to utilize version control as an integral part of the artistic process, affording greater exploratory and iterative capabilities than traditional version control. Of interest to the HAI audience, Quickpose was originally conceived as an AI tool: automatically saving a new version when significant progress had been made. Especially in the Arts, understanding whether AI is the correct solution to a user-centered problem is often non-trivial, and requires careful attention and experimentation. Quickpose is thus a case study of a design process that found an AI solution unnecessary for enabling a novel software interaction.
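To make the idea of a version history stored as a directed graph concrete, here is a minimal sketch. It is not Quickpose's implementation: the class `VersionGraph`, its methods, and the Processing snippets used as snapshots are all hypothetical, and the sketch assumes each version has at most one parent.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class VersionGraph:
    """Hypothetical version history: each node is a saved snapshot of a sketch."""

    parents: dict[int, int | None] = field(default_factory=dict)
    children: dict[int, list[int]] = field(default_factory=dict)
    snapshots: dict[int, str] = field(default_factory=dict)
    current: int | None = None
    _next_id: int = 0

    def commit(self, source: str) -> int:
        """Save a new version as a child of the currently checked-out version."""
        vid = self._next_id
        self._next_id += 1
        self.parents[vid] = self.current
        self.children[vid] = []
        if self.current is not None:
            self.children[self.current].append(vid)
        self.snapshots[vid] = source
        self.current = vid
        return vid

    def checkout(self, vid: int) -> str:
        """Jump to any earlier version; the next commit then starts a new branch."""
        if vid not in self.snapshots:
            raise KeyError(f"unknown version {vid}")
        self.current = vid
        return self.snapshots[vid]

    def branch_points(self) -> list[int]:
        """Versions from which more than one idea was explored."""
        return [v for v, kids in self.children.items() if len(kids) > 1]


g = VersionGraph()
v0 = g.commit("ellipse(50, 50, 20, 20);")
v1 = g.commit("ellipse(50, 50, 40, 40);")
g.checkout(v0)                       # return to the first idea
v2 = g.commit("rect(10, 10, 20, 20);")  # explore a second idea from it
print(g.branch_points())
```

In this sketch, branching needs no explicit command: checking out an older version and committing again adds a second outgoing edge from that node, which is one simple way to support several concurrent ideas in a single project.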

Sherry Ruan

Abstract: In the well-known two sigma problem introduced in 1984, Bloom found that students tutored one-on-one by an expert tutor achieved learning outcomes two standard deviations higher than those taught using traditional classroom methods. Since one-on-one tutoring is too costly to scale up to the majority of students, technology-based approaches have been suggested as promising ways to simulate one-on-one human tutoring experiences. However, current automated computer-based tutors still primarily consist of learning activities with limited interactivity such as multiple-choice questions, review-and-flip flash cards, and listen-and-repeat practices. These tutors tend to be unengaging and thus their effectiveness typically relies on students’ desire to learn. With recent advances in artificial intelligence, we now have the potential to create conversation-based tutoring systems with the ability to provide personalized feedback to make learning more engaging and effective and eventually help bridge the gap between one-on-one human tutoring and computer-based tutoring. In this lightning talk, I present the design, development, and testing of the Smart Primer: a smart conversational tutor with embedded narrative stories to supplement elementary school students’ math learning. The Smart Primer was implemented in two ways: the first version was implemented using Wizard-of-Oz techniques and the second was powered by online reinforcement learning algorithms. I conducted human evaluations with over 400 students using these two versions to better understand how humans interact with AI in these educational systems. Our results show that, compared to traditional learning systems, conversation-based tutoring systems that leverage new natural language processing and reinforcement learning techniques to provide adaptive feedback can engage students more and improve student learning outcomes.