HAI Weekly Seminar with Sarah Bana
Using Language Models to Understand Wage Premia
Investments in human capital are increasingly diverse, with the number of online courses, microcredentials, and digital badges exceeding the number of traditional postsecondary education degrees and certificates in 2020. While more workers than ever have access to learning opportunities, information about the potential effects of these certifications on earnings has not scaled at the same rate. Using a new dataset from Greenwich.HR with salary information linked to posting data from Burning Glass Technologies, I apply natural language processing (NLP) techniques to build a model that predicts salaries from job posting text with impressive accuracy. This model serves as the basis for estimating the salary premia associated with various job characteristics, using a hedonic regression framework. I uncover the premia associated with eight in-demand certifications in the fields of business analysis, project management, computer networking, and supply chain. Estimates range from 0.005 to 0.048 log points, almost a tenfold difference among certifications considered "in-demand." The real-time pricing of these attributes can provide additional information to firms and workers about how to invest strategically, improving decisions about human capital accumulation.
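For readers less familiar with log points, the short sketch below converts the two endpoint estimates quoted above into approximate percentage salary differences. It assumes the premia are coefficients from a regression whose dependent variable is log salary, which is the usual reading of "log points"; the function name `log_points_to_percent` is illustrative and does not come from the talk.

```python
import math


def log_points_to_percent(beta: float) -> float:
    """Convert a log-salary coefficient into a percentage salary difference.

    Assumes beta is a coefficient on an indicator variable in a regression
    of log(salary), so the implied proportional change is exp(beta) - 1.
    """
    return (math.exp(beta) - 1.0) * 100.0


# Endpoints of the range reported in the abstract.
low, high = 0.005, 0.048

for beta in (low, high):
    pct = log_points_to_percent(beta)
    print(f"{beta:.3f} log points is roughly a {pct:.2f}% salary difference")

# Ratio between the largest and smallest estimate.
print(f"ratio of estimates: {high / low:.1f}x")
```

For coefficients this small, the percentage difference is close to 100 times the coefficient, so the range corresponds to premia of roughly 0.5% to 4.9%.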