Published Work

Bana, Sarah H., Brynjolfsson, Erik, Jin, Wang, Steffen, Sebastian and Wang, Xiupeng. (Forthcoming) “Human Capital Acquisition in Response to Data Breaches.” MIS Quarterly.
https://misq.umn.edu/human-capital-acquisition-in-response-to-data-breaches.html

Toups, Connor, Bommasani, Rishi, Creel, Kathleen, Bana, Sarah H., Jurafsky, Dan and Liang, Percy. (2023). “Ecosystem-level Analysis of Deployed Machine Learning Reveals Homogeneous Outcomes.” In Proceedings of the 37th Conference on Neural Information Processing Systems (NeurIPS 2023).
Published PDF

Bana, Sarah, Bedard, Kelly, Rossin-Slater, Maya and Stearns, Jenna. 2023. “Unequal use of social insurance benefits: The role of employers.” Journal of Econometrics, https://doi.org/10.1016/j.jeconom.2022.02.008
Published PDF

Kim, Hung Chau, Bana, Sarah H., Bouvier, Baptiste and Frank, Morgan F. (2023). “Connecting Higher Education to Career Skills.” PLoS One.
https://doi.org/10.1371/journal.pone.0282323

Bana, Sarah H., Bedard, Kelly and Rossin-Slater, Maya. 2020. “The Impacts of Paid Family Leave Benefits: Regression Kink Evidence from California Administrative Data.” Journal of Policy Analysis and Management, doi:10.1002/pam.22242
Published PDF

Bana, Sarah H., Benzell, Seth G. and Solares, Rodrigo Razo. 2020. “Ranking How National Economies Adapt to Remote Work.” Sloan Management Review.
Media Mentions: BBC Business Daily—Homeworking’s Winners and Losers, Valor Economico—Brasil é o quinto país com maior dificuldade para o home office

Bana, Sarah, Bedard, Kelly and Rossin-Slater, Maya. 2018. “Trends and Disparities in Leave Use under California's Paid Family Leave Program: New Evidence from Administrative Data.” AEA Papers & Proceedings, 108: 388-91.

Working Papers

work2vec: Using Language Models to Understand Wage Premia

Stanford HAI article featuring the research
NotebookLM podcast
Hedonic regressions have long helped economists understand how job characteristics contribute to earnings, but measurement challenges have limited which attributes could be analyzed systematically. Using a new dataset linking salary information from Greenwich.HR to job posting data from Burning Glass Technologies, I apply natural language processing techniques to decompose how different job characteristics contribute to earnings. The resulting model explains 83 percent of salary variation—a 19 percent improvement over traditional occupation-location controls. Using an attribution method called integrated gradients, I identify which words most strongly predict salaries. I then develop an entity extraction model to categorize posting content into activities, amenities, education, experience, firm names, general and technical job skills, hours, job titles, and location. The analysis reveals that job activities dominate both in frequency and earnings relevance. While skills and job titles have been used as proxies for tasks, directly measuring the activities described in job postings provides better insight into wage determination. This represents the first decomposition to quantify how such a wide range of workplace characteristics—rarely captured in administrative data—shapes earnings.

The Impacts of Generative AI on Software Development

with Sardar Fatooreh Bonabi, Tingting Nian, and Vijay Gurbaxani
Draft available upon request

Software development, particularly within the Open-Source Software (OSS) sector, is especially likely to be affected by large language models (LLMs). In this study, we examine the real-world impacts of LLMs on various aspects of OSS developers' work. Whereas initial studies have predominantly examined LLMs' productivity impacts in short-term or controlled settings, we investigate the long-term, real-world effects of ChatGPT on three key activities of OSS developers: productivity, knowledge dissemination, and skill acquisition. Leveraging a natural experiment created by a temporary ban on ChatGPT in Italy, we employ a Difference-in-Differences (DiD) framework with two-way fixed effects and analyze data from 88,022 OSS developers on GitHub. Our findings show that losing access to ChatGPT results in a 6.4% decrease in productivity and an 8.4% decrease in the skill acquisition rate, while resuming access leads to a 9.6% increase in knowledge. Additionally, we find that users benefit differently from access to this LLM depending on their experience level. Novice users experience notable productivity gains, while intermediate users see improved knowledge dissemination and skill acquisition rates. This study contributes to the literature on the impacts of LLMs on knowledge worker productivity, knowledge-sharing, and learning in online platforms.

AI-Enabled Job Markets & Market Participation:
Jobseekers’ “Rational Expectations” About Competition vs “AI Aversion”

with Kevin Boudreau
Draft available upon request

Machine-based artificial intelligence (AI) systems are increasingly used to match jobseekers with employers in labor markets. We theorize that the number and types of jobseekers willing to participate in AI-enabled job markets may be influenced by (i) inherent preferences or attitudes towards AI, referred to as "AI aversion," or (ii) expectations about how AI reshapes competition and the benefits of participation. This study reports on a field experiment investigating the unintended consequences of AI systems on jobseeker behavior and market participation, while controlling for the quality of jobseeker-employer matches. The experiment involved 4,562 jobseekers who received job recommendations, with the source randomly assigned as AI matching, human matching, or undisclosed (control). Participants in the AI-Matching treatment were 26% less likely to engage than those in either the Human-Matching or control group. Moreover, AI-Matching altered the composition of participants by educational background, gender, match quality, and network size. Results from the experiment and a follow-up survey confirm that both AI aversion and expectations about competition played significant roles in shaping behavior. These findings offer new insights into AI-human interactions in labor markets and digital platforms, with important implications for organizational theorizing related to human-AI interactions.

Estimating the Cost of Advance Notice for Firms Conducting Mass Layoffs

with Jacob Morris

This paper investigates the impact of advance notice requirements on firm behavior during large-scale workforce reductions. We exploit discontinuous policy rules in the Employment Standards Act (ESA) to estimate the cost of additional notice requirements for firms conducting mass terminations in Ontario, Canada. Our novel empirical evidence shows that firms strategically manipulate the scale of layoffs to circumvent additional notice obligations. Specifically, we use quasi-experimental variation in notice requirements created by the ESA to estimate an approximately 30% increase in the frequency of layoff events that bunch just below the threshold at which the mandatory notice period discontinuously increases from 8 to 12 weeks. These discrete jumps in mandatory notice reveal that the costs associated with additional notice provisions for displaced workers significantly distort firms' termination behavior during mass layoffs.

Identifying Vulnerable Displaced Workers: The Effect of State-Level Occupation Conditions

Which attribute of a worker's job, their industry or their occupation, plays a larger role in determining future labor market outcomes? Identifying the dominant attribute and the relative weights of the two allows policymakers and researchers to more accurately measure potential exposure to labor market shocks, and to target the relevant populations with interventions. Yet limited government measurement of short-term occupation-level employment has inhibited such a comparison. In this paper, I derive a measure of short-term occupation conditions in a worker's state using a shift-share approach. This measure facilitates a comparison between vulnerability to industry conditions and vulnerability to occupation conditions. I estimate the effect of these conditions on displaced workers' labor market outcomes. While both state-level industry and occupation conditions appear to affect displaced workers' labor market outcomes, variation in occupation conditions completely explains the relationship between industry conditions and subsequent outcomes. This implies that the dominant worker attribute is occupation, and suggests that large negative shocks to occupation-level employment have major labor market consequences for the affected workers.

Media Coverage: Wall Street Journal - Real Time Economics Blog

work2vec: Learning the Latent Structure of the Labor Market

[New results forthcoming!]
with Erik Brynjolfsson, Daniel Rock, and Sebastian Steffen
Slides ASSA 2023

Job postings provide unique insights about the demand for skills, tasks, and occupations. Using the full text of millions of online job postings, we train and evaluate a natural language processing (NLP) model with over 100 million parameters to classify job postings' occupation labels and predict their salaries. To derive additional insights from the model, we develop a method of injecting deliberately constructed text snippets reflecting occupational content into postings. We apply this text injection technique to understand the returns to several information technology skills, including machine learning itself. We further extract measurements of the topology of the labor market, building a “jobspace” using the relationships learned in the text structure. Our measurements of the jobspace imply an expansion of the types of work available in the U.S. labor market from 2010 to 2019. We also demonstrate that this technique can be used to construct indices of occupational technology exposure, with an application to remote work. Moreover, our analysis shows that data-driven hierarchical taxonomies can be constructed from job postings to augment existing occupational taxonomies like the SOC (Standard Occupational Classification) system. Exploring the model structure further, we find that between 2010 and 2019, occupations became increasingly distinct from each other in their language, suggesting a rise in the specialization of tasks in the economy. This trend is strongest for managerial, computer science, and sales occupations.