
In this brief, Stanford scholars introduce Holistic Evaluation of Language Models (HELM) as a framework for evaluating commercial applications of AI.
Extending the WILDS Benchmark for Unsupervised Adaptation
The new 2.7B-parameter language model trained on biomedical literature advances the state of the art in medical question answering.