AI Alignment means making sure an AI system’s goals and behavior match what people actually want—our values, rules, and intentions. It’s about getting the AI to do the “right thing” even in new situations, not just follow instructions literally in ways that cause harm. In practice, it includes preventing unwanted outcomes like deception, unsafe shortcuts, or optimizing a metric that misses the real objective.
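As a purely illustrative sketch (not drawn from any particular study), the toy Python snippet below shows the last failure mode in the definition above: when a system is optimized for a proxy metric, the proxy can be maximized while the real objective is missed entirely. All names and the scoring rules here are hypothetical.

```python
# Toy illustration (hypothetical) of proxy-metric misalignment:
# the true goal is a correct, concise answer, but the system is
# scored on a proxy that rewards sheer length.

def true_objective(answer: str) -> float:
    """What people actually want: a correct and reasonably concise answer."""
    correct = "paris" in answer.lower()
    concise = len(answer.split()) <= 20
    return float(correct and concise)

def proxy_metric(answer: str) -> float:
    """What the system is optimized for: longer looks 'more thorough'."""
    return float(len(answer.split()))

candidates = [
    "Paris.",
    "The capital of France is Paris.",
    "Well, that depends on many factors... " * 30,  # long but never answers
]

best_by_proxy = max(candidates, key=proxy_metric)
best_by_true = max(candidates, key=true_objective)

# The proxy picks the long, evasive answer, which scores zero on the true objective.
print("Proxy picks:", best_by_proxy[:40], "| true score:", true_objective(best_by_proxy))
print("True objective picks:", best_by_true, "| true score:", true_objective(best_by_true))
```

Running the sketch, the proxy selects the padded non-answer while the true objective selects a short correct one, which is the gap alignment work tries to close.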

A new study on generative AI argues that addressing biases requires a deeper exploration of ontological assumptions, challenging the way we define fundamental concepts like humanity and connection.


Ever-changing lexicons, multilingualism, and varying cultural value systems compromise the accuracy of large language models.
