Artificial intelligence (AI) has enormous potential for both positive and negative impact, especially as we move from current-day systems toward more capable systems in the future. However, as a society we lack an understanding of how the developers of this technology, AI researchers, perceive the benefits and risks of their work, both in today's systems and in those to come. In this talk, Gates will present results from over 70 interviews with AI researchers, asking questions ranging from "What do you think are the largest benefits and risks of AI?" to "If you could change your colleagues' perception of AI, what attitudes/beliefs would you want them to have?"
READINGS:
“The case for taking AI seriously as a threat to humanity” by Kelsey Piper (Vox)
Human Compatible, by Stuart Russell
The Alignment Problem, by Brian Christian
The Precipice: Existential Risk and the Future of Humanity, by Toby Ord
The Most Important Century, specifically "Forecasting Transformative AI", by Holden Karnofsky
TECHNICAL READINGS:
Empirical work by DeepMind's Safety team on alignment
Empirical work by Anthropic on alignment
Talk (and transcript) by Paul Christiano describing the AI alignment landscape in 2020
Podcast (and transcript) by Rohin Shah, describing the state of AI value alignment in 2021
Unsolved Problems in ML Safety by Hendrycks et al. (2022)
Interpretability work aimed at alignment: Elhage et al. (2021) and Olah et al. (2020)
AI Safety Resources by Victoria Krakovna (DeepMind) and Technical Alignment Curriculum
FUNDING:
Open Philanthropy Graduate Student Fellowship
Open Philanthropy Faculty Fellowship (faculty and others can reach out to OpenPhil directly as well)
FTX Future Fund
STANFORD RESOURCES:
Contact Vael Gates at vlgates@stanford.edu for further questions or collaboration inquiries.
HAI Network Affiliate
Vael received their Ph.D. in Neuroscience (Computational Cognitive Science) from UC Berkeley in 2021. During their Ph.D., they worked on formalizing and testing computational cognitive models of social collaboration.