Domain Shift and Emerging Questions in Facial Recognition Technology

This brief urges transparent, verifiable standards for facial-recognition systems and calls for a moratorium on government use until rigorous in-domain testing frameworks are established.
Key Takeaways
FRT vendors and developers should ensure their models are built as transparently as possible, can be validated by the user, and are well documented. The effect these systems have on their users' decision making must be understood more deeply, and policymakers should embrace A/B testing as a tool to gauge it (see the sketch following this list).
Users in government and business settings should condition the procurement of FRT systems on in-domain testing and adherence to established protocols.
We support calls for a moratorium on FRT adoption in government and policing while a more responsible testing framework is developed.
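To make the A/B testing recommendation above concrete, the following minimal sketch, assuming a simple two-group study design, compares how often reviewers reach a correct identification with and without FRT assistance using a standard two-proportion z-test. All counts, group sizes, and function names here are hypothetical placeholders, not data from any actual study.

```python
# Hypothetical sketch: a two-proportion z-test comparing how often human
# reviewers reach a correct identification with vs. without FRT assistance.
# All numbers below are illustrative placeholders, not real study data.
from math import sqrt, erfc

def two_proportion_z_test(successes_a: int, n_a: int,
                          successes_b: int, n_b: int) -> tuple[float, float]:
    """Return (z, two-sided p-value) for H0: the two groups' rates are equal."""
    p_a, p_b = successes_a / n_a, successes_b / n_b
    pooled = (successes_a + successes_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    p_value = erfc(abs(z) / sqrt(2))  # two-sided normal tail probability
    return z, p_value

# Group A: reviewers shown FRT candidate matches; Group B: unaided reviewers.
z, p = two_proportion_z_test(successes_a=412, n_a=500,   # 82.4% correct
                             successes_b=371, n_b=500)   # 74.2% correct
print(f"z = {z:.2f}, p = {p:.4f}")
```

A significant difference in either direction would indicate that FRT output is shaping reviewer decisions, which is precisely the effect on human decision making that the takeaway above asks policymakers to measure.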
Executive Summary
Facial recognition technologies (FRT) have grown in sophistication and adoption throughout American society. Consumers now use FRT to unlock their smartphones and cars; retailers use it for targeted advertising and to monitor stores for shoplifters; and, most controversially, law enforcement agencies have turned to FRT to identify suspects. Significant anxieties around the technology have emerged, including privacy concerns, worries about surveillance in both public and private settings, and the perpetuation of racial bias.
In January 2020, Detroit resident Robert Julian-Borchak Williams was wrongfully arrested in what the New York Times described as possibly the first known instance of an arrest based on a flawed FRT match. The incident highlights the role of FRT in the nation's ongoing conversation about racial injustice. The killings of George Floyd, Breonna Taylor, and Ahmaud Arbery, and the public demonstrations that followed in the spring and summer of 2020, compelled a long overdue reckoning with racial injustice in the United States. FRT systems have been documented to perform worse on darker-skinned individuals, so we must examine the potential for this technology to perpetuate existing injustices. This brief points toward an evaluative framework for benchmarking whether FRT works as billed. In the face of calls for a ban or moratorium on government and police use of FRT systems, we embrace the demand for a pause so that the technical and human elements at play can be more deeply understood and so that standards for a more rigorous evaluation of FRT can be developed.
Our recommendations in this brief extend to both the computational and the human side of FRT. In asking how to bridge the gap between testing FRT algorithms in the lab and testing products under real-world conditions, we focus on two sources of uncertainty: first, the differences in model output between development settings and end-user applications, which we term domain shift; and second, the differences in how end users interpret and act on model output across the institutions employing FRT, which we refer to as institutional shift. Policymakers have a crucial role to play in ensuring that responsible protocols for FRT assessment are codified, both as they pertain to the impact FRT has on human decision making and as they pertain to the performance of the technology itself. In building out a framework for responsible testing and development, policymakers should further empower regulators to use stronger auditing authority and the procurement process to prevent FRT from evolving in ways that would be harmful to the broader public. The sketch below illustrates how domain shift can surface in practice.
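To make in-domain testing concrete, here is a minimal sketch, assuming a verification-style matcher that emits similarity scores, of how the same model's error rates can be compared on a vendor's lab benchmark versus imagery drawn from a deploying agency's own operating conditions. The score distributions, the 0.6 threshold, and the function names are illustrative assumptions, not measurements of any real system.

```python
# Hypothetical sketch of "in-domain testing": score the same matcher on the
# vendor's lab benchmark and on imagery from the deploying agency's own
# operating conditions, then compare error rates. The synthetic score
# distributions and the 0.6 threshold are illustrative assumptions.
import numpy as np

def error_rates(genuine_scores: np.ndarray, impostor_scores: np.ndarray,
                threshold: float) -> tuple[float, float]:
    """Return (false non-match rate, false match rate) at a fixed threshold."""
    fnmr = float(np.mean(genuine_scores < threshold))   # true pairs rejected
    fmr = float(np.mean(impostor_scores >= threshold))  # wrong pairs accepted
    return fnmr, fmr

rng = np.random.default_rng(0)
# Lab benchmark: well-lit, frontal images, so scores are cleanly separated.
lab_genuine = rng.normal(0.85, 0.05, 10_000)
lab_impostor = rng.normal(0.30, 0.10, 10_000)
# In-domain imagery (e.g., CCTV stills): score distributions drift toward
# the decision threshold even though the model itself is unchanged.
field_genuine = rng.normal(0.68, 0.12, 10_000)
field_impostor = rng.normal(0.45, 0.12, 10_000)

for name, g, i in [("lab", lab_genuine, lab_impostor),
                   ("field", field_genuine, field_impostor)]:
    fnmr, fmr = error_rates(g, i, threshold=0.6)
    print(f"{name}: FNMR={fnmr:.3f}, FMR={fmr:.3f}")
```

In this toy setup the error rates balloon under field conditions even though the model is identical, which is the gap between lab and deployment that conditioning procurement on in-domain testing is meant to expose.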