HAI Weekly Seminar with Sarah Bana

Using Language Models to Understand Wage Premia

Investments in human capital are increasingly diverse, with the number of online courses, microcredentials, and digital badges exceeding the number of traditional postsecondary education degrees and certificates in 2020. While more workers than ever have access to learning opportunities, information about the potential effects of these certifications on earnings has not scaled at the same rate. Using a new dataset from Greenwich.HR with salary information linked to posting data from Burning Glass Technologies, I apply natural language processing (NLP) techniques to build a model that predicts salaries from job posting text with impressive accuracy. This model serves as the basis for estimating the salary premia associated with various job characteristics, using a hedonic regression framework. I uncover the premia associated with eight in-demand certifications in the fields of business analysis, project management, computer networking, and supply chain. Estimates range from 0.005 log points to 0.048 log points -- almost a tenfold difference for certifications considered "in-demand." The real-time pricing of these attributes can provide additional information to firms and workers about how to invest strategically, improving decisions about human capital accumulation.
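
For readers less familiar with log points, the range quoted above can be translated into approximate percentage salary differences. The sketch below is only an illustration of that arithmetic, not part of the speaker's method; the function name `log_points_to_percent` is hypothetical, and it assumes the premia are coefficients on log salary, so that a premium of b log points corresponds to a change of exp(b) - 1.

```python
import math


def log_points_to_percent(log_points: float) -> float:
    """Convert a premium in log points to a percentage change in salary.

    Assumes the premium is a coefficient on log salary (hypothetical helper).
    """
    return (math.exp(log_points) - 1.0) * 100.0


# The two endpoints of the range quoted in the abstract.
low, high = 0.005, 0.048

low_pct = log_points_to_percent(low)    # roughly 0.5 percent
high_pct = log_points_to_percent(high)  # roughly 4.9 percent
ratio = high / low                      # about 9.6, i.e. "almost tenfold"

print(f"{low} log points  ~ {low_pct:.2f}% salary premium")
print(f"{high} log points ~ {high_pct:.2f}% salary premium")
print(f"ratio of the two estimates: {ratio:.1f}x")
```

For premia this small, the log-point value and the percentage change are nearly the same number, which is why the two are often read interchangeably.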

Sarah Bana

Stanford Digital Economy Lab Postdoctoral Fellow, Stanford University

Dr. Sarah Bana is a postdoctoral fellow at Stanford’s Digital Economy Lab, part of Stanford’s Institute for Human-Centered Artificial Intelligence. Her research uses online data to characterize technologies, tasks, and the interactions between them to answer fundamental questions about the evolution ...