SEAMS

Overview

SEAMS: Self-improving, Efficient and Accelerated Models and Systems

About

Foundation models, trained on broad data at immense scale, have revolutionized AI and computing in the last few years. They exhibit a qualitative leap in capabilities: they can generate full documents and images, engage in compelling dialogue, and solve a wide range of tasks when simply prompted with natural language instructions. They represent a paradigm shift in how AI systems are built, enabling a new form of rapid prototyping, and they will transform every sector. However, we are still in the early innings of this revolution, much as the Internet was in 1993. Today's systems remain vastly inefficient, functioning only as an existence proof that certain AI capabilities are possible.

Our fundamental thesis is that we can obtain many orders of magnitude of improvement by developing self-improving systems, that is, by using AI to optimize AI. An AI system consists of three layers: the systems layer (including both hardware and software), the modeling layer (e.g., the model architecture and training procedure), and the data layer (e.g., how to supervise the system). Currently, all three layers require human experts to make decisions manually (e.g., designing custom kernels, tuning hyperparameters, weighting data). However, as AlphaGo and AlphaFold have clearly demonstrated, given a clear objective function, automation can optimize it far better than a human can. This frees the human expert to focus on higher-level, strategic planning. Automatic optimization, however, requires simulation, which can be expensive. We will therefore develop a cascade of efficient (possibly approximate) simulators of all three layers of the stack to enable efficient optimization.

SEAMS Faculty

Kunle Olukotun (faculty lead) 
Kayvon Fatahalian 
Percy Liang 
Azalia Mirhoseini 
Christopher Ré

SEAMS Affiliate Program

  • Membership: $500,000 per year 
  • Benefits
    • Invitations to attend up to two retreats per year, which are open to all affiliate members.
    • Active engagement with researchers. 
    • Recognition on Lab website and at Lab events.
    • Option to send a visiting scholar to join the Stanford research team with additional funding to work on a project (in accordance with the Dean of Research’s policies). 
  • Research: Please note that the SEAMS Affiliate Program is governed by the Stanford University Policies Affecting Industrial Affiliates Program Memberships. The site presentations and all information, data, and results arising from such visits will be shared with all members and the public.

SEAMS Affiliate Members

PLACEHOLDER

Research Areas

Systems: Foundation models are on the cusp of serving billions of daily users across numerous applications. Given the large cost of training and inference, customized optimizations for foundation models across the software and hardware stack can yield massive savings and reduce their carbon footprint. We are interested in significantly reducing the cost of foundation models through custom co-design techniques and specialized hardware. Our prior work shows the effectiveness of model, software, and hardware co-design for improving efficiency. For example, FlashAttention is an exact, I/O-aware attention algorithm that speeds up attention by 2-4x and reduces the model’s memory footprint. FlashAttention has already been deployed in many mainstream industry-scale large language models. To address the high cost of inference, we have developed a high-throughput engine for latency-insensitive tasks and sparsity-based techniques for reducing LLM inference time. We have also developed a full-stack accelerator search framework with a large design space spanning hardware datapaths, software schedules, and compiler fusions, demonstrating multi-fold performance gains across vision and language benchmarks. We are excited about developing broader co-design techniques across the entire computing stack, including data, model, software, and hardware, to drastically improve the efficiency and scalability of foundation models.
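To make the I/O-aware idea concrete, the sketch below computes exact attention one key/value block at a time using an online softmax, so the full N x N score matrix is never materialized in slow memory. This is only a toy NumPy rendering of the tiling idea behind FlashAttention, not the actual kernel; the function name, block size, and test data are our own illustrative choices.

# Toy sketch of I/O-aware tiled attention with an online (streaming)
# softmax. Exact, but never forms the full N x N score matrix.
import numpy as np

def tiled_attention(Q, K, V, block=64):
    """Exact softmax attention computed over key/value blocks."""
    N, d = Q.shape
    out = np.zeros((N, d))
    m = np.full(N, -np.inf)   # running row-wise max (for stability)
    l = np.zeros(N)           # running softmax normalizer
    for start in range(0, N, block):
        Kb, Vb = K[start:start + block], V[start:start + block]
        S = Q @ Kb.T / np.sqrt(d)           # scores for this block only
        m_new = np.maximum(m, S.max(axis=1))
        scale = np.exp(m - m_new)           # rescale earlier partial sums
        P = np.exp(S - m_new[:, None])
        l = l * scale + P.sum(axis=1)
        out = out * scale[:, None] + P @ Vb
        m = m_new
    return out / l[:, None]

# Matches a naive implementation up to floating-point error:
rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((256, 32)) for _ in range(3))
S = Q @ K.T / np.sqrt(32)
ref = np.exp(S - S.max(1, keepdims=True))
ref = ref / ref.sum(1, keepdims=True) @ V
assert np.allclose(tiled_attention(Q, K, V), ref)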

Modeling: The training of foundation models involves a dizzying array of choices: for the model architecture, there is the number of layers, the number of heads, the types of normalization, and the types of positional embeddings; for the training procedure, there is the learning rate schedule and the length of training; for the dataset, there is the data mixture, the filtering procedure, and so on. These choices influence quality, throughput, and stability. Tuning these models is highly empirical, and ML developers build intuition through extensive trial and error. Scaling laws, which allow one to predict performance at larger scales, are central to the success of foundation models. Our goal is to capture all of this rich intuition within an AI model itself. This model would be pre-trained on historical data (e.g., papers describing existing models) and further refined via actual experiments. The model should also guide a policy for selecting experiments that both increase its implicit knowledge and improve performance. The successes of AlphaGo and AlphaFold at surpassing human capabilities on problems with a clear objective give us great hope that this methodology will lead to huge gains in performance for foundation models.
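As a minimal illustration of how scaling laws support extrapolation, the sketch below fits a power law to losses from hypothetical small-scale runs and predicts the loss at a larger scale. The functional form is a standard one; the data points and coefficients are made up for illustration, and real studies fit compute, data, and parameter count jointly.

# Fit loss(N) = a * N^-b + c to small-scale runs, then extrapolate.
# The (size, loss) points below are fabricated for illustration.
import numpy as np
from scipy.optimize import curve_fit

def power_law(n, a, b, c):
    return a * n ** (-b) + c

sizes = np.array([1e7, 3e7, 1e8, 3e8, 1e9])     # model sizes (params)
losses = np.array([4.20, 3.85, 3.52, 3.24, 3.01])  # validation losses

params, _ = curve_fit(power_law, sizes, losses, p0=[10.0, 0.1, 2.0])
a, b, c = params
print(f"fit: loss(N) = {a:.2f} * N^-{b:.3f} + {c:.2f}")
print(f"predicted loss at N = 1e10: {power_law(1e10, *params):.2f}")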

Data: Despite their huge success, foundation models are still far from reliable, especially when it comes to long-horizon design and optimization tasks. While fine-tuning models on human feedback data has been shown to improve reliability, this approach is costly and unscalable. We are interested in developing techniques that enable the model to “self-improve” through an interactive and scalable process. In Constitutional AI, we have developed a new approach that relies heavily on the model itself to improve its alignment with intended objectives such as helpfulness and harmlessness. In this approach, we provide the model with a set of rules (the “constitution”). We ask the model to generate self-critiquing data that explains whether the model’s outputs follow the rules (i.e., AI feedback), and then fine-tune the model on this data (as opposed to human feedback data). Our goal is to enable reliable self-improving AI models that can identify their shortcomings, develop new skills, and use available tools as needed to correct themselves. The models should acquire in-depth domain knowledge related to target problems (through books, tutorials, manuals, etc.). The models should also be able to evaluate and critique themselves; to do so, they can use feedback from sources of ground truth such as search engines, databases, Python interpreters, and debugging tools, or from other foundation models. A concrete end goal is to design specialized foundation models that go beyond co-pilots and can be “hired” as domain experts to complete end-to-end software or data science projects.
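The following schematic sketch shows the shape of such a critique-and-revise loop for bootstrapping AI-feedback fine-tuning data. The `generate` function stands in for any chat-model call, and the constitution text and prompt wording are illustrative assumptions, not the published recipe.

# Schematic critique-and-revise loop for generating AI-feedback data.
CONSTITUTION = [
    "Be helpful and answer the question directly.",
    "Avoid content that could cause harm.",
]

def generate(prompt: str) -> str:
    raise NotImplementedError("plug in a model call here")

def self_improve(prompts: list[str]) -> list[tuple[str, str]]:
    """Return (prompt, revised response) pairs for fine-tuning."""
    dataset = []
    for prompt in prompts:
        draft = generate(prompt)
        # Ask the model to critique its own draft against each rule.
        critique = generate(
            f"Response: {draft}\n"
            "Critique this response against these rules:\n"
            + "\n".join(CONSTITUTION)
        )
        # Ask it to revise the draft in light of its own critique.
        revision = generate(
            f"Response: {draft}\nCritique: {critique}\n"
            "Rewrite the response so that it follows the rules."
        )
        dataset.append((prompt, revision))
    return dataset  # fine-tune on this instead of human feedback data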

Simulation: Many of these automatic optimizations are fundamentally combinatorial search problems that rely heavily on intense simulation: optimizing accelerator hardware requires simulating a design's energy and area; training an intelligent AI agent requires simulating the world it must operate in. Recently, we have shown that running extensive training simulations need not incur supercomputing-scale cost: the Madrona Engine provides orders-of-magnitude speedups by rearchitecting simulators for training problem-solving agents. However, simulations for all three layers described above remain complex, and in addition to massively parallel simulator systems, AI should be able to help us with simulation distillation: automatically producing cheaper versions of the simulator itself. AI can help us determine what simulation fidelity is necessary and automatically generate low-cost models that approximate high-fidelity simulations, which can then be used to carry out self-improving system optimization more efficiently.
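As a minimal sketch of simulation distillation, the code below fits a cheap surrogate to samples from an expensive simulator, screens a large candidate pool with the surrogate, and spends real simulation budget only on a shortlist. The simulator here is a stand-in cost function; the model choice and all names are illustrative assumptions.

# Distill an expensive simulator into a cheap surrogate for search.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

def expensive_simulator(design: np.ndarray) -> float:
    """Placeholder for a slow, high-fidelity simulation of one design."""
    return float(np.sum(design ** 2) + 0.1 * np.sin(10 * design).sum())

rng = np.random.default_rng(0)

# 1. Collect a modest training set from the real simulator.
X = rng.uniform(-1, 1, size=(200, 8))           # 8-dim design space
y = np.array([expensive_simulator(x) for x in X])

# 2. Distill it into a cheap surrogate model.
surrogate = GradientBoostingRegressor().fit(X, y)

# 3. Screen a large candidate pool with the surrogate ...
candidates = rng.uniform(-1, 1, size=(100_000, 8))
top = candidates[np.argsort(surrogate.predict(candidates))[:10]]

# 4. ... and spend real simulation budget only on the shortlist.
best = min(top, key=expensive_simulator)
print("best surrogate-screened cost:", expensive_simulator(best))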

Contact

Contact Marc Gough if you have any questions.