Ecosystem Graphs: The Social Footprint of Foundation Models | Stanford HAI

Date: March 29, 2023
Topics: Machine Learning

Researchers develop a framework to capture the vast downstream impact and complex upstream dependencies that define the foundation model ecosystem.

Foundation models (FMs) are the defining paradigm of modern AI. Beginning with language models like GPT-3, the paradigm extends to images, videos, code, proteins, and much more. Foundation models are both the frontier of AI research and the subject of global discourse (NYT, Nature, The Economist, CNN). As one example, UK Prime Minister Rishi Sunak announced an FM taskforce as a top-level national priority. Given their remarkable capabilities, these models are widely deployed: Google and Microsoft are integrating FMs into flagship products like Microsoft Word and Google Slides, while OpenAI’s GPT-4 is powering products and services from Morgan Stanley, Khan Academy, Duolingo, and Stripe. FMs have seen unprecedented adoption — Reuters recently reported that ChatGPT, launched in November 2022, is the fastest-growing consumer application in history.

With FMs on the rise, especially in light of billions of dollars of funding to FM developers (Adept, Anthropic, Character, OpenAI, Perplexity), transparency is on the decline. OpenAI’s GPT-4 report reads: “Given both the competitive landscape and the safety implications of large-scale models like GPT-4, this report contains no further details about the architecture (including model size), hardware, training compute, dataset construction, training method, or similar.” At the Center for Research on Foundation Models, we are committed to ensuring foundation models are transparent. To date, we have built HELM to standardize their evaluation, argued for community norms to govern their release, and advocated for policy to expand these efforts.

To clarify the societal impact of foundation models, we are releasing Ecosystem Graphs, a centralized knowledge graph for documenting the foundation model ecosystem. By consolidating distributed knowledge, Ecosystem Graphs improves the ecosystem’s transparency and helps us answer many questions about it:

  • What are the newest datasets, models, and products?

  • Which modalities are trending? How are models scaling over time?

  • If an influential asset like the OpenAI API goes down, what products would be affected?

  • Which video FMs are permissively licensed, so that I can build an app using them?

  • If I find a bug with an FM-based product, where can I report my feedback?

  • Which organizations have the most impactful assets? How do FMs mediate relationships between institutions? Who has power in the ecosystem?

Ecosystem Graphs currently documents the ecosystem of 200+ assets, including datasets (e.g. The Pile, LAION-5B), models (e.g. BLOOM, Make-a-Video), and applications (e.g. GitHub Copilot X, Notion AI), spanning 60+ organizations and 9 modalities (e.g. music, genome sequences). Ecosystem Graphs will be a community-maintained resource (akin to Wikipedia) that we encourage everyone to contribute to! We envision Ecosystem Graphs will provide value to stakeholders spanning AI researchers, industry professionals, social scientists, auditors, and policymakers.

The Ecosystem


Figure 1. Foundation model pipeline.

Like all technologies, foundation models are situated in a broader social context. In Figure 1, we depict the canonical pipeline for such development, whereby datasets, models, and applications mediate the relationship between the people on either side.

Concretely, consider Stable Diffusion:

  1. People create content (e.g. take photos) that they (or others) upload to the web.

  2. LAION curates the LAION-5B dataset from the CommonCrawl web scrape, filtering out problematic content.

  3. LMU Munich, IWR Heidelberg University, and RunwayML train Stable Diffusion on a filtered version of LAION-5B using 256 A100 GPUs.

  4. Stability AI builds Stable Diffusion Reimagine by replacing the Stable Diffusion text encoder with an image encoder.

  5. Stability AI deploys Stable Diffusion Reimagine as an image editing tool to end users on Clipdrop, allowing users to generate multiple variations of a single image (e.g. imagine a room with different furniture layouts). The application includes a filter to block inappropriate requests and solicits user feedback to improve the system as well as mitigate bias.

While framed technically (e.g. curation, training, adaptation), the process bears broader societal ramifications: for example, ongoing litigation contends that the use of LAION-5B and the subsequent image generation from Stable Diffusion is unlawful, infringing on the rights and styles of data creators.

Framework


Figure 2. Subgraphs identified with Ecosystem Graphs.

The ecosystem graph is defined in terms of assets: each asset has a type (one of dataset, model, or application). Examples include The Pile dataset, the Stable Diffusion model, and the Bing Search application. To define the graph structure, each asset X has a set of dependencies, which are the assets required to build X. For example, LAION-5B is a dependency for Stable Diffusion, and Stable Diffusion is a dependency for Stable Diffusion Reimagine. To supplement the graph structure, we annotate metadata (e.g. who built the asset) for every asset in its ecosystem card. In Figure 2, we give several examples of primitive structures (i.e. subgraphs) that we observe in the full ecosystem graph:

  • The standard pipeline: The Jurassic-1 dataset is used to train Jurassic-1, which is used in the AI21 Playground.

  • A common adaptation process: BLOOM is adapted by fine-tuning on xP3 to produce the instruction-tuned BLOOMZ model.

  • Layering of applications: The GPT-4 API powers Microsoft 365 Copilot, which is integrated into Microsoft Word.

  • Dependence on applications: While applications are often the end of pipelines (i.e. sinks in the graph), applications like Google Search both depend on FMs like MUM and support FMs (e.g. through knowledge retrieval) like Sparrow.
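
The graph structure described above can be sketched in a few lines of code. The asset names and dependency edges below are taken from the examples in this post, but the data layout itself is hypothetical and for illustration only, not the project's actual schema:

```python
# Hypothetical sketch of an ecosystem graph: each asset has a type
# (dataset, model, or application) and a set of dependencies.
ASSETS = {
    "LAION-5B":                   {"type": "dataset",     "dependencies": []},
    "Stable Diffusion":           {"type": "model",       "dependencies": ["LAION-5B"]},
    "Stable Diffusion Reimagine": {"type": "application", "dependencies": ["Stable Diffusion"]},
    "GPT-4 API":                  {"type": "application", "dependencies": []},
    "Microsoft 365 Copilot":      {"type": "application", "dependencies": ["GPT-4 API"]},
    "Microsoft Word":             {"type": "application", "dependencies": ["Microsoft 365 Copilot"]},
}

def downstream(asset: str) -> set[str]:
    """All assets that transitively depend on `asset`."""
    direct = {name for name, info in ASSETS.items()
              if asset in info["dependencies"]}
    result = set(direct)
    for dep in direct:
        result |= downstream(dep)
    return result

print(sorted(downstream("GPT-4 API")))
# ['Microsoft 365 Copilot', 'Microsoft Word']
```

A traversal like `downstream` is what answers questions of the form "if an influential asset like the GPT-4 API goes down, what products would be affected?".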

Value


Figure 3. Hubs identified with Ecosystem Graphs.

Ecosystem Graphs will provide value to a range of stakeholders: developers, AI researchers, social scientists, investors, auditors, policymakers, end users, and more. To demonstrate this value, we zoom into the hubs in the ecosystem: hubs are nodes with unusually high connectivity, featuring prominently in the analysis of graphs and networks. Figure 3 identifies four such hubs.

  1. The Pile is an essential resource for training foundation models from a range of institutions (EleutherAI, Meta, Microsoft, Stanford, Tsinghua, Yandex).

  2. P3 is of growing importance as interest in instruction tuning grows: it is both used directly to train models and included as a component in other instruction-tuning datasets.

  3. PaLM features centrally in Google’s internal foundation models for vision (PaLM-E), robotics (PaLM-SayCan), text (Flan-U-PaLM), reasoning (Minerva), and medicine (Med-PaLM), making the recent announcement of the PaLM API for external use especially relevant.

  4. The ChatGPT API profoundly accelerates deployment with downstream products spanning a range of industry sectors.

For asset developers, hubs indicate their assets are high impact. For economists, hubs communicate emergent market structure and potential consolidation of power. For investors, hubs signal opportunities to further support or acquire. For policymakers and auditors, hubs identify targets to scrutinize to ensure their security and safety. Beyond hubs, we present more extensive arguments for why Ecosystem Graphs is useful in the paper.
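
To make the notion of a hub concrete, here is a minimal sketch that ranks nodes by degree (the number of incident dependency edges). The edges echo examples from this post (e.g. models known to train on The Pile, and Google models built on PaLM) but are illustrative only, not the full ecosystem graph:

```python
from collections import Counter

# Illustrative dependency edges (dependency, dependent); not the full graph.
EDGES = [
    ("The Pile", "GPT-NeoX"), ("The Pile", "GPT-J"), ("The Pile", "OPT"),
    ("LAION-5B", "Stable Diffusion"),
    ("PaLM", "PaLM-E"), ("PaLM", "Minerva"), ("PaLM", "Med-PaLM"),
]

def hubs(edges, top_k=2):
    """Return the top_k nodes with the highest total degree."""
    degree = Counter()
    for src, dst in edges:
        degree[src] += 1
        degree[dst] += 1
    return [node for node, _ in degree.most_common(top_k)]

print(hubs(EDGES))  # ['The Pile', 'PaLM']
```

Real analyses would use richer centrality measures than raw degree, but even this simple count surfaces the assets whose failure or compromise would ripple furthest through the ecosystem.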

Conclusion

The footprint of foundation models is rapidly expanding: this class of emerging technology is pervading society and is only in its infancy. Ecosystem Graphs aims to ensure that foundation models are transparent, establishing the basic facts of how they impact society. Concurrently, we encourage policy uptake to standardize reporting on all assets in Ecosystem Graphs: we will release a brief on Ecosystem Graphs as part of our series on foundation models.

Where to go from here:

  1. Website: Explore the ecosystem and home in on specific datasets, models, and applications.

  2. Paper: Read more about the framework, design challenges, and value proposition.

  3. Add assets: Anyone can contribute new assets, independent of technical expertise!

  4. GitHub: Help build and maintain the ecosystem; we will be adding mechanisms soon to acknowledge significant open-source contributions!

Acknowledgements

Ecosystem Graphs was developed with Dilara Soylu (Stanford University) and Katie Creel (Northeastern University). Our design and implementation were informed by valuable feedback from folks at a range of organizations: we would like to thank Alex Tamkin, Ali Alvi, Amelia Hardy, Ansh Khurana, Ashwin Paranjape, Ayush Kanodia, Chris Manning, Dan Ho, Dan Jurafsky, Daniel Zhang, David Hall, Dean Carignan, Deep Ganguli, Dimitris Tsipras, Erik Brynjolfsson, Iason Gabriel, Irwing Kwong, Jack Clark, Jeremy Kaufmann, Laura Weidinger, Maggie Basta, Michael Zeng, Nazneen Rajani, Rob Reich, Rohan Taori, Tianyi Zhang, Vanessa Parli, Xia Song, Yann Dubois, and Yifan Mai for valuable feedback on this effort at various stages. We specifically thank Ashwin Ramaswami for extensive guidance on prior work in the open-source ecosystem and future directions for the foundation model ecosystem.

Stanford HAI’s mission is to advance AI research, education, policy and practice to improve the human condition. Learn more.

Authors
  • Rishi Bommasani

Related News

Fei-Fei Li Wins Queen Elizabeth Prize for Engineering
Shana Lynch
Nov 07, 2025
News

The Stanford HAI co-founder is recognized for breakthroughs that propelled computer vision and deep learning, and for championing human-centered AI and industry innovation.


Offline “Studying” Shrinks the Cost of Contextually Aware AI
Andrew Myers
Sep 29, 2025
News

By having AI study a user’s context offline, researchers dramatically reduce the memory and cost required to make AI contextually aware.


BEHAVIOR Challenge Charts the Way Forward for Domestic Robotics
Andrew Myers
Sep 22, 2025
News

With a first-of-its-kind competition for roboticists everywhere, researchers at Stanford are hoping to push domestic robotics into a new age of autonomy and capability.
