Published Work
Bana, Sarah H., Brynjolfsson, Erik, Jin, Wang, Steffen, Sebastian and Wang, Xiupeng. (Forthcoming) “Human Capital Acquisition in Response to Data Breaches.” MIS Quarterly.
https://misq.umn.edu/human-capital-acquisition-in-response-to-data-breaches.html
Toups, Connor, Bommasani, Rishi, Creel, Kathleen, Bana, Sarah H., Jurafsky, Dan and Liang, Percy. (2023). “Ecosystem-level Analysis of Deployed Machine Learning Reveals Homogeneous Outcomes.” In Proceedings of the 37th Conference on Neural Information Processing Systems (NeurIPS 2023).
Published PDF
Bana, Sarah, Bedard, Kelly, Rossin-Slater, Maya and Stearns, Jenna. 2023. “Unequal use of social insurance benefits: The role of employers.” Journal of Econometrics, https://doi.org/10.1016/j.jeconom.2022.02.008
Published PDF
Kim, Hung Chau, Bana, Sarah H., Bouvier, Baptiste and Frank, Morgan F. (2023). “Connecting Higher Education to Career Skills.” PLoS One.
https://doi.org/10.1371/journal.pone.0282323
Bana, Sarah H., Bedard, Kelly and Rossin-Slater, Maya. 2020. “The Impacts of Paid Family Leave Benefits: Regression Kink Evidence from California Administrative Data.” Journal of Policy Analysis and Management, doi:10.1002/pam.22242
Published PDF
Bana, Sarah H., Benzell, Seth G. and Solares, Rodrigo Razo. 2020. “Ranking How National Economies Adapt to Remote Work.” Sloan Management Review.
Media Mentions: BBC Business Daily—Homeworking’s Winners and Losers, Valor Economico—Brasil é o quinto país com maior dificuldade para o home office
Working Papers
work2vec: Using Language Models to Understand Wage Premia
Stanford HAI article featuring the research
NotebookLM podcast
Link to working paper
Hedonic regressions have long helped economists understand how job characteristics contribute to earnings, but measurement challenges have limited which attributes could be analyzed systematically. Using a new dataset linking salary information from Greenwich.HR to job posting data from Burning Glass Technologies, I apply natural language processing techniques to decompose how different job characteristics contribute to earnings. The resulting model explains 83 percent of salary variation—a 19 percent improvement over traditional occupation-location controls. Using an attribution method called integrated gradients, I identify which words most strongly predict salaries. I then develop an entity extraction model to categorize posting content into activities, amenities, education, experience, firm names, general and technical job skills, hours, job titles, and location. The analysis reveals that job activities dominate both in frequency and earnings relevance. While skills and job titles have been used as proxies for tasks, directly measuring the activities described in job postings provides better insight into wage determination. This represents the first decomposition to quantify how such a wide range of workplace characteristics—rarely captured in administrative data—shapes earnings.
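To make the attribution step concrete, here is a minimal sketch of integrated gradients on a toy function. It is an illustration only, not the paper's implementation: the function `toy_salary_model`, its coefficients, the zero baseline, and the use of numerical gradients are all assumptions made for the example. Integrated gradients attributes a prediction to each input by integrating the gradient along a straight path from a baseline to the actual input, and the attributions sum to the difference between the two predictions.

```python
from typing import Callable, List


def integrated_gradients(
    f: Callable[[List[float]], float],
    x: List[float],
    baseline: List[float],
    steps: int = 200,
    eps: float = 1e-5,
) -> List[float]:
    """Approximate integrated gradients with a midpoint Riemann sum.

    Gradients are estimated numerically with central differences, so this
    works for any scalar-valued function of a list of floats.
    """
    n = len(x)
    attributions = [0.0] * n
    for k in range(steps):
        # Midpoint of the k-th sub-interval on the path baseline -> x.
        alpha = (k + 0.5) / steps
        point = [b + alpha * (xi - b) for xi, b in zip(x, baseline)]
        for i in range(n):
            up = point.copy()
            down = point.copy()
            up[i] += eps
            down[i] -= eps
            grad_i = (f(up) - f(down)) / (2 * eps)
            attributions[i] += grad_i * (x[i] - baseline[i]) / steps
    return attributions


def toy_salary_model(features: List[float]) -> float:
    """Hypothetical stand-in for a trained model (assumption, not from the paper)."""
    a, b, c = features
    return 40.0 + 10.0 * a + 5.0 * b * c + 2.0 * a * a


x = [1.0, 2.0, 1.0]
baseline = [0.0, 0.0, 0.0]
attr = integrated_gradients(toy_salary_model, x, baseline)

# Completeness: attributions sum to f(x) - f(baseline).
gap = toy_salary_model(x) - toy_salary_model(baseline)
print(attr, sum(attr), gap)
```

In a text setting the inputs would be token embeddings rather than three scalar features, and gradients would come from automatic differentiation rather than finite differences.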
Beyond Code: The Multidimensional Impacts of Large Language Models in Software Development
with Sardar Fatooreh Bonabi, Tingting Nian, and Vijay Gurbaxani
Link to working paper
Large language models (LLMs) are poised to significantly impact software development, especially in the Open-Source Software (OSS) sector. To understand this impact, we first outline the mechanisms through which LLMs may influence OSS through code development, collaborative knowledge transfer, and skill development. We then empirically examine how LLMs affect OSS developers' work in these three key areas. Leveraging a natural experiment from a temporary ChatGPT ban in Italy, we employ a Difference-in-Differences framework with two-way fixed effects to analyze data from all OSS developers on GitHub in three similar countries—Italy, France, and Portugal—totaling 88,022 users. We find that access to ChatGPT increases developer productivity by 6.4%, knowledge sharing by 9.6%, and skill acquisition by 8.4%. These benefits vary significantly by user experience level: novice developers primarily experience productivity gains, whereas more experienced developers benefit more from improved knowledge sharing and accelerated skill acquisition. In addition, we find that LLM-assisted learning is highly context-dependent, with the greatest benefits observed in technically complex, fragmented, or rapidly evolving contexts. We show that the productivity effects of LLMs extend beyond direct code generation to include enhanced collaborative learning and knowledge exchange among developers—dynamics that are essential for gaining a holistic understanding of LLMs' impact in OSS. Our findings offer critical managerial implications: strategically deploying LLMs can accelerate novice developers' onboarding and productivity, empower intermediate developers to foster knowledge sharing and collaboration, and support rapid skill acquisition—together enhancing long-term organizational productivity and agility.
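As a rough illustration of the difference-in-differences logic described in this abstract, the following minimal sketch computes the canonical two-group, two-period estimate from cell means. It is not the paper's specification, which uses two-way fixed effects on developer-level GitHub data. The function name `did_estimate` and every number in the example are hypothetical.

```python
from statistics import mean
from typing import Dict, List, Tuple

Row = Tuple[bool, bool, float]  # (treated group?, post period?, outcome)


def did_estimate(rows: List[Row]) -> float:
    """Two-by-two difference-in-differences from group/period means.

    Returns (treated_post - treated_pre) - (control_post - control_pre).
    """
    cells: Dict[Tuple[bool, bool], List[float]] = {}
    for treated, post, outcome in rows:
        cells.setdefault((treated, post), []).append(outcome)
    m = {key: mean(values) for key, values in cells.items()}
    treated_change = m[(True, True)] - m[(True, False)]
    control_change = m[(False, True)] - m[(False, False)]
    return treated_change - control_change


# Invented outcomes (e.g. weekly contributions per developer), for illustration only.
rows: List[Row] = [
    (True, False, 10.0), (True, False, 12.0),   # treated group, before the change
    (True, True, 9.0), (True, True, 11.0),      # treated group, after the change
    (False, False, 8.0), (False, False, 10.0),  # comparison group, before
    (False, True, 9.0), (False, True, 11.0),    # comparison group, after
]

effect = did_estimate(rows)
print(effect)
```

The estimate is only interpretable as a causal effect under the parallel-trends assumption, that is, that the two groups would have moved together had the policy change not occurred.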
AI-Enabled Job Markets & Market Participation: A Field Experiment on How AI Shapes Jobseekers’ Expectations of Competition
with Kevin Boudreau
Link to working paper
Artificial intelligence increasingly mediates how jobseekers are matched with employers, raising fundamental questions about how its use affects participation in labor markets. We examine whether AI-based matching alters jobseeker behavior, specifically their willingness to participate. Drawing on a field experiment with 4,562 jobseekers randomly assigned to disclosure conditions, we find that participation was about one-quarter lower when AI use was disclosed than in either a human-matching treatment or a control group with no source specified. Participation responses varied systematically, consistent with jobseekers forming expectations about how AI affects (i) predicted match quality, (ii) the types of inputs and information used, and (iii) the scale of competition. These relationships were strongest among jobseekers with greater familiarity with AI, proxied by STEM backgrounds. Our findings highlight how AI disclosure can reshape both the level and composition of participation, as AI alters expectations of payoffs, with implications for platform design, policy, and governance. More broadly, the results underscore the importance of distinguishing structural payoff effects, which may persist over time, from subjective attitudinal reactions to AI, which may be more transient.
Estimating the Cost of Advance Notice for Firms Conducting Mass Layoffs
with Jacob Morris
This paper investigates the impact of advance notice requirements on firm behavior during large-scale workforce reductions. We exploit discontinuous policy rules in the Employment Standards Act (ESA) to estimate the cost of additional notice requirements for firms conducting mass terminations in Ontario, Canada. Our novel empirical evidence shows that firms strategically manipulate the scale of layoffs to circumvent additional notice obligations. Specifically, we use quasi-experimental variation in notice requirements created by the ESA to estimate an approximately 30% increase in the frequency of layoff events that bunch just below the threshold at which the mandatory notice period discontinuously increases from 8 to 12 weeks. This bunching at the discrete jump in mandatory notice reveals that the costs associated with additional notice provisions for displaced workers significantly distort firms' termination behavior during mass layoffs.
Identifying Vulnerable Displaced Workers: The Effect of State-Level Occupation Conditions
Which attribute of a worker's job, their industry or their occupation, plays a larger role in determining future labor market outcomes? Identifying the dominant attribute and the relative weights of the two allows policymakers and researchers to measure potential exposure to labor market shocks more accurately, and to target the relevant populations with interventions. Yet limited government measurement of short-term occupation-level employment has inhibited such a comparison. In this paper, I derive a measure of short-term occupation conditions in a worker's state using a shift-share approach. This measure facilitates a comparison between vulnerability to industry conditions and vulnerability to occupation conditions. I estimate the effect of these conditions on displaced workers' labor market outcomes. While both state-level industry and occupation conditions appear to affect displaced workers' labor market outcomes, variation in occupation conditions completely explains the relationship between industry conditions and subsequent outcomes. This implies that the dominant worker attribute is occupation, and suggests that large negative shocks to occupation-level employment have major labor market consequences for those workers.
Media Coverage: Wall Street Journal - Real Time Economics Blog
work2vec: Learning the Latent Structure of the Labor Market
[New results forthcoming!]
with Erik Brynjolfsson, Daniel Rock, and Sebastian Steffen
Slides ASSA 2023
Job postings provide unique insights about the demand for skills, tasks, and occupations. Using the full text of millions of online job postings, we train and evaluate a natural language processing (NLP) model with over 100 million parameters to classify job postings' occupation labels and salaries. To derive additional insights from the model, we develop a method of injecting deliberately constructed text snippets reflecting occupational content into postings. We apply this text injection technique to understand the returns to several information technology skills, including machine learning itself. We further extract measurements of the topology of the labor market, building a “jobspace” using the relationships learned in the text structure. Our measurements of the jobspace imply an expansion of the types of work available in the U.S. labor market from 2010 to 2019. We also demonstrate that this technique can be used to construct indices of occupational technology exposure, with an application to remote work. Moreover, our analysis shows that data-driven hierarchical taxonomies can be constructed from job postings to augment existing occupational taxonomies like the SOC (Standard Occupational Classification) system. Exploring the model structure further, we find that between 2010 and 2019, occupations became increasingly distinct from each other in their language, suggesting a rise in specialization of tasks in the economy. This trend is strongest for managerial, computer science, and sales occupations.