
November 16 Agenda: Workshop on Sociotechnical AI Safety


NOVEMBER 16

Time

Agenda Item

8:15am

Check-In/Breakfast Available

8:45-9:00am

Welcome & Introduction

Speaker: Seth Lazar (ANU)

9:00-10:00am

Building the Epistemic Community of AI Safety

Speaker: Shazeda Ahmed (UCLA)

Discussant: Alan Chan (MILA)

Abstract: The emerging field of "AI safety" has attracted public attention and large infusions of capital to support its implied promise: the ability to deploy advanced artificial intelligence (AI) while reducing its gravest risks. Ideas from effective altruism, longtermism, and the study of existential risk are foundational to this new field. In this paper, we contend that overlapping communities interested in these ideas have merged into what we refer to as the broader "AI safety epistemic community," which is sustained through its mutually reinforcing community-building and knowledge production practices. We support this assertion through an analysis of four core sites in this community’s epistemic culture: 1) online community-building through career advising and web forums; 2) AI forecasting; 3) AI safety research; and 4) prize competitions. The dispersal of this epistemic community's members throughout the tech industry, academia, and policy organizations ensures their continued input into global discourse about AI. Understanding the epistemic culture that fuses their moral convictions and knowledge claims is crucial to evaluating these claims, which are gaining influence in critical, rapidly changing debates about the harms of AI and how to mitigate them.

10:00-10:30am

Break

10:30-11:30am

Integrating Transdisciplinary Insights Towards a Transgranular Entity Alignment Framework

Speaker: Shiri Dori-Hacohen (UConn)

Discussant: Alondra Nelson (IAS)

Abstract: Most safety and alignment work to date has emphasized AI as the focal point of alignment efforts, often overlooking the role of non-AI components, the heterogeneous and often conflicting nature of human needs, and/or the positionality of AI within a sociotechnical lens rather than a reductive or technodeterministic one. Our recent work [1] addresses misalignment in populations of humans and AI agents, enabling modeling of complex alignment scenarios; recently, the framing of "Sociotechnical AI Safety" [2] was proposed to overcome the latter two challenges, as well. The issue of AI's centrality, however, remains unaddressed: counter-examples abound in which non-AI technical systems have caused significant harms, and can therefore be posited as misaligned with (some) human interests. In this work-in-progress, we interrogate this apparent paradox by challenging the prevailing AI-vs-humanity duality and its dominance of alignment research to date. By training our attention on several alignment scenarios using insights from systems engineering, biology, and the social sciences, we highlight the need for a new framework that steps outside of the bounds of a simplistic AI-human duality. Our proposed Transgranular Entity Alignment Framework, still in early stages of development, hypothesizes that entities spanning widely varied granularity levels - from the macromolecular, through deployed technical systems and the sociotechnical systems they yield, to planet-level entities and the biosphere - can all be considered possible targets for alignment modeling. Our novel approach affords modeling alignment both within and across granularity levels, offering rich explanatory power for inter-entity and intra-entity misalignment, with the potential to shift the conversation on tackling Sociotechnical Alignment (whether of AI, or otherwise). We will share the nascent framework at the workshop, as a spark to stimulate discussion on the gaps in current Alignment and Safety research across disciplinary boundaries.

11:30-11:45am

Break 

11:45-12:45pm

Toward Normative Alignment of AI Systems

Speaker: Mark Riedl (Georgia Tech)

Discussant: Jonathan Stray (Berkeley)

Abstract: As more machine learning agents interact with humans, it is an increasingly real prospect that an agent trained to perform a task optimally, using only a measure of task performance as feedback, can violate societal norms for acceptable behavior or cause harm. I introduce the concept of normative alignment as a bridge between value alignment and responsible AI practices. Normativity refers to behavior that conforms to expected societal norms and contracts, whereas non-normative behavior aligns with values that deviate from these expected norms. In this largely speculative talk, I discuss the goal of normatively aligned agents, describe early work on attempting to learn sociocultural norms for the purpose of informing agent behavior, and speculate on how the field of normative and value alignment can evolve.

12:45-1:45pm

Lunch

1:45-2:45pm

Safety and Geopolitics: A Critical Security Studies Lens 

Speaker: Marie-Therese Png (Oxford)

Discussant: Yonadav Shavit (OpenAI)

Abstract: An important minority of leaders of AI policy or governance initiatives are recognising their responsibility to ensure that AI deployment and regulation do not lock in intranational and international inequalities. In addition, they understand the potential strategic advantages of globally inclusive efforts for consensus building, governance effectiveness, and geopolitical stability: extreme power imbalances between different regions can have long-term detrimental and destabilising effects at a global level.

To support efforts to mitigate extreme power imbalances, and in anticipation of AI systems becoming more pervasive, capable, and a means for concentrating power, this paper offers three areas of consideration to develop theoretically and empirically: 

 

  • An analysis of dehumanisation and safety, distinguishing between systems safety engineering and human safety as defined by critical security studies. The latter acknowledges instrumentalised mechanisms of dehumanisation as an important dimension of analysis to understand how safety is afforded and compromised at scale. It also holds that risks are not adequately defined or addressed by those distanced by structural power and institutional safety, and that a more accurate assessment of harm is possible with participation from at-risk communities. With this lens, safety requires co-governance mechanisms as a more developed framework of inclusive governance.
  • Concentration of corporate and political power - concerns around increased concentrations of power have been expressed by both the AI Safety and sociotechnical communities. However, the mechanisms by which corporate and geopolitical power are accumulated and maintained remain under-examined. This paper identifies constitutive erasures of international relations and geopolitics, such as the “banalisation of race”, which result in a theoretically and empirically flawed picture of world politics.
  • The geopolitics of compute - As AI systems advance in capabilities, they are likely to increase in energy consumption, compute demand, and the natural resource demands of building physical AI infrastructure. This paper examines this increase through the growing geopolitical, rights, and environmental implications of compute, energy, and critical materials within complex multi-country supply chains. Dependency theory frames the existing dynamics between wealthy, infrastructurally developed states and the resource-rich countries they extract from, which remain weaker members of the world market economy. Without adequate safeguards and regulations, extractive and exploitative practices will leave populations in these countries systemically more vulnerable to risks emerging from AI resource demands.

2:45-3:15pm

Break 

3:15-4:15pm

Evaluating the Social Impact of Generative AI Systems Across Modalities

Speaker: Irene Solaiman (Hugging Face)

Discussant: Sanmi Koyejo (Stanford)

Abstract: Generative AI systems across modalities, spanning text, image, audio, and video, have broad social impacts, but there exists no official standard for how to evaluate those impacts or which impacts should be evaluated. We move toward a standard approach for evaluating a generative AI system in any modality, in two overarching categories: what can be evaluated in a base system that has no predetermined application, and what can be evaluated in society. We describe specific social impact categories and how to approach and conduct evaluations in the base technical system, then in people and society. Our framework for a base system defines seven categories of social impact: bias, stereotypes, and representational harms; cultural values and sensitive content; disparate performance; privacy and data protection; financial costs; environmental costs; and data and content moderation labor costs. Suggested methods for evaluation apply to all modalities, and analyses of the limitations of existing evaluations serve as a starting point for necessary investment in future evaluations. We offer five overarching categories for what can be evaluated in society, each with its own subcategories: trustworthiness and autonomy; inequality, marginalization, and violence; concentration of authority; labor and creativity; and ecosystem and environment. Each subcategory includes recommendations for mitigating harm. We are concurrently crafting an evaluation repository for the AI research community to contribute existing evaluations along the given categories.

4:15-4:30pm

Break 

4:30-5:30pm

Transparency for Foundation Models: A Lost Cause or a Valiant Fight?

Speaker: Rishi Bommasani (Stanford)

Discussant: Tyna Eloundou (OpenAI)

Abstract: Foundation models (GPT-4, Stable Diffusion) are transforming society: remarkable capabilities, serious risks, rampant deployment, unprecedented adoption, overflowing funding, and unending controversy. In this talk, I will consider a mixture of initiatives to improve the transparency of foundation models, ranging from documenting the broad ecosystem (Ecosystem Graphs) to evaluating specific models (HELM). While this work bears insight, it also reminds us of the bigger picture: foundation models are increasingly opaque. One can say astonishingly little about the data and compute, the labor practices, the model architecture, and ultimately the use and societal impact of a model like GPT-4. As a result, I will then discuss two in-progress efforts: directly rating companies on their transparency, and shaping public policy to support open foundation models, which are demonstrably more transparent.

5:30pm

Day 1 Closing

5:30-7:00pm

Reception 

This workshop is supported by the ANU Machine Intelligence and Normative Theory Lab, the Institute for Human-Centered AI, Stanford, and the McCoy Family Center for Ethics in Society, Stanford.