This week, the World Economic Forum’s Centre for the Fourth Industrial Revolution in San Francisco will release a new white paper that has profound implications not only for the data that powers today’s data-driven business models, but also for the future of artificial intelligence.
The project, Advancing Digital Agency: The Power of Data Intermediaries, which I coauthored, focuses on the need for intermediaries to protect consumer data, gain meaningful consent, and build trust in data companies.
The paper highlights one particular path – the trusted digital agent, a piece of software similar to an autonomous password manager that could negotiate your data-sharing preferences and terms on your behalf.
But protection through third-party intermediaries more broadly will be instrumental to a data-driven economy that respects individual rights and seeks to directly benefit both data subjects and society at large.
Building Trust
This new WEF paper arose from an earlier project I proposed to the organization in 2019 on the future of consent, which culminated in a white paper published in July 2020 discussing the challenges of notice and consent in data collection. My coauthors and I noted that the problem of meaningful consent was exacerbated by technology and by a lack of attention to the inherently social aspects of technological mediation. The report concluded with several ideas for further exploration, including two that became the basis of this recent project: data trusts (a specific form of data intermediary) and personal user agents.
The Advancing Digital Agency report picks up these two suggestions and develops them in depth. Uniting both concepts beneath the banner of “third-party intermediaries,” it first describes the need for data intermediaries, explores the many forms intermediaries can take, and then focuses on one specific proposal, the trusted digital agent.
In creating data intermediaries and the infrastructure that supports them, the key goal is to build third parties with a set of fiduciary duties to benefit data rights holders, whether those holders are individuals with their personal data, businesses, or even the public sector. A substantial trust deficit exists online today between individuals and the companies collecting, aggregating, and sharing or selling their data. This problem is especially acute in the U.S., where we lack national consumer data privacy legislation. Even if we passed data privacy legislation that curbed some uses of data and provided individuals with a set of data rights, data intermediaries could still play an important role in mediating relationships between individuals and companies. (Learn more about another example of these types of intermediaries, a data collective, in this panel I moderated in November.)
This concept can be broadened beyond the consumer-company relationship to hold for businesses and public sector actors as well. In our 2021 report Building a National AI Research Resource, my coauthors Professor Dan Ho, HAI Director of Policy Russell Wald, and I included a discussion of the role a National Research Cloud could play as a data intermediary that aggregates public data for use by AI researchers. The EU’s Data Governance Act, proposed by the European Commission in 2020, also focuses on creating more opportunities for exchanging public data.
Data Intermediaries and AI
Data intermediaries are particularly relevant for the development of artificial intelligence. First, as my Stanford colleague Professor James Zou suggests, we are experiencing a shift from model-centric to data-centric AI. Access to data at scale is essential to ongoing AI development. Yet obtaining high-quality, large-scale datasets, particularly those containing individual or personal data, is a challenge.
This is where data intermediaries can help. At its simplest, a data intermediary can act as a cooperative where data from multiple sources is pooled together. The intermediary would be responsible for managing access (likely through licensing arrangements) to that data in a manner that reflects the priorities and values of the data subjects. For example, imagine that you would like to use both medical services (e.g., personalized medicine) and consumer services (from a direct-to-consumer DNA analysis company) based on access to your sequenced genome, but aren’t comfortable with giving up your DNA to multiple actors. You could join a data intermediary that allowed these companies access to your genomic data without giving up your rights to the underlying data. You could also allow the intermediary to share access to your data for scientific research projects that met your personal criteria. Ideally, all of these and other interactions could be managed through a trusted digital agent that negotiated these decisions on your behalf.
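To make the mechanics concrete, here is a minimal, purely illustrative Python sketch of how such an agent might apply a person’s sharing criteria to incoming access requests. The names used here (TrustedDigitalAgent, SharingPreferences, AccessRequest, and the example purposes) are hypothetical and do not come from the report; a real agent would negotiate licensing terms and compensation rather than simply approve or deny requests.

```python
from dataclasses import dataclass
from enum import Enum

class Purpose(Enum):
    # Example purposes a data subject might opt into or out of (illustrative only).
    CLINICAL_CARE = "clinical care"
    CONSUMER_SERVICE = "consumer service"
    SCIENTIFIC_RESEARCH = "scientific research"
    ADVERTISING = "advertising"

@dataclass
class AccessRequest:
    requester: str
    purpose: Purpose
    retains_copy: bool  # does the requester want to keep a raw copy of the data?

@dataclass
class SharingPreferences:
    allowed_purposes: set           # purposes the data subject is willing to support
    allow_raw_copies: bool = False  # grant access, never ownership of the raw data

class TrustedDigitalAgent:
    """Hypothetical agent that applies one person's stated criteria to access requests."""

    def __init__(self, prefs: SharingPreferences):
        self.prefs = prefs

    def decide(self, request: AccessRequest) -> bool:
        # Refuse purposes the data subject has not opted into.
        if request.purpose not in self.prefs.allowed_purposes:
            return False
        # Refuse any requester who wants to retain the underlying data itself.
        if request.retains_copy and not self.prefs.allow_raw_copies:
            return False
        return True

# Example: permit clinical and research access; refuse anyone who wants to keep raw genomic data.
prefs = SharingPreferences(allowed_purposes={Purpose.CLINICAL_CARE, Purpose.SCIENTIFIC_RESEARCH})
agent = TrustedDigitalAgent(prefs)
print(agent.decide(AccessRequest("hospital", Purpose.CLINICAL_CARE, retains_copy=False)))       # True
print(agent.decide(AccessRequest("dtc_dna_co", Purpose.CONSUMER_SERVICE, retains_copy=True)))   # False
```

Even in this toy form, the design choice is visible: the intermediary holds the data once, and the agent mediates every use against the subject’s own criteria instead of relying on one-time, take-it-or-leave-it consent screens.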
There are several potential wins here: for consumers, who retain meaningful control over their personal data and receive more direct benefits than simply access to free services; for companies that want access to high-quality data without running afoul of data protection laws; and for society at large, which gains a data ecosystem that operates with transparency and is focused on fair exchange rather than extraction.
Divest Data from Big Tech
Decoupling many forms of data from the platforms that are today their primary gatekeepers yields benefits not only for individuals but for other businesses as well. It is access to vast amounts of data that, in part, enables large platforms to have a disproportionate influence on AI development. And as large platforms continue to use data in ways that violate our expectations of privacy and trust, the pressure to find solutions to this dilemma grows. Federal data privacy regulation is one solution. I suggest another: require large companies to divest themselves of our personal data, which we would then place under the stewardship of a trusted third party to manage on our behalf. Such an action would take companies out of the business of monetizing our data without our consent, and it could potentially compensate us for uses of our data that stretch far beyond the original intent of its collection.
The question of consent here is central. As I discuss in depth in my first WEF report, collecting data with meaningful consent is highly problematic today and often poorly executed. To date, it is in the European Union, backed by the General Data Protection Regulation (GDPR) framework, that data protection authorities have taken the strongest measures to question and change existing online consent mechanisms. In the absence of meaningful regulation by the U.S. Congress, it is this vision of data protection that will increasingly become the global norm outside of countries that show little respect for individual rights. But as long as both data collection and consent to it are unregulated and easily manipulated, the incentives for companies to mine as much data as possible without regard to its source will remain. This portends a data-centric AI industry built upon data acquired under murky circumstances and with questionable ethical practices. For a sector under the close watch of regulators and under tremendous pressure to build transparent, explainable, and unbiased AI, this is not a foundation that will enable that future.
The Advancing Digital Agency report expounds on these and related critical issues to offer a new way forward for data governance and stewardship across the public and private sectors. As AI is poised to revolutionize so many aspects of our society, it is crucial that it not be built upon the existing foundations of data access and collection that jeopardize our individual and collective data rights.
Dr. Jennifer King is a Privacy and Data Policy Fellow at the Stanford Institute for Human-Centered Artificial Intelligence.
Stanford HAI's mission is to advance AI research, education, policy and practice to improve the human condition. Learn more.