Research
Publications
Human Capital Acquisition in Response to Data Breaches
MIS Quarterly, 2025
Abstract
Given the rise in the frequency and cost of data security threats, it is critical to understand whether and how companies strategically adapt their operational workforce in response to data breaches. We study hiring in the aftermath of data breaches by combining information on data breach events with detailed firm-level job posting data. Using a staggered difference-in-differences approach, we show that breached firms significantly increase their demand for cybersecurity workers. Furthermore, firms respond to data breaches by promptly recruiting public relations personnel -- an act aimed at managing trust and alleviating negative publicity -- often ahead of cybersecurity hires. Following a breach, the likelihood of a firm posting a cybersecurity job rises by approximately two percentage points, which translates to an average willingness to spend an additional $61,961 in annual wages on cybersecurity, public relations, and legal workers. While these hiring adjustments are small for individual affected firms, in aggregate they represent a potential impact of over $300 million on the overall economy. Our findings underscore the vital role of human capital investments in shaping firms' cyber defenses and provide a valuable roadmap for managers and firms navigating cyberthreats in an increasingly digital age.
Course-Skill Atlas: A National Longitudinal Dataset of Skills Taught in U.S. Higher Education Curricula
Scientific Data, 2024
Abstract
Higher education plays a critical role in driving an innovative economy by equipping students with the knowledge and skills demanded by the workforce. While researchers and practitioners have developed data systems to track detailed occupational skills, such as those established by the U.S. Department of Labor (DOL), much less effort has gone into documenting, at a similar granularity, which of these skills are developed in higher education. Here, we fill this gap by presenting Course-Skill Atlas -- a longitudinal dataset of skills inferred from over three million course syllabi taught at nearly three thousand U.S. higher education institutions. To construct Course-Skill Atlas, we apply natural language processing to quantify the alignment between course syllabi and the detailed work activities (DWAs) used by the DOL to describe occupations. We then aggregate these alignment scores to create skill profiles for institutions and academic majors. Our dataset offers a large-scale representation of college education's role in preparing students for the labor market.
Ecosystem-Level Analysis of Deployed Machine Learning Reveals Homogeneous Outcomes
NeurIPS, 2023
Abstract
Machine learning is traditionally studied at the model level: researchers measure and improve the accuracy, robustness, bias, efficiency, and other dimensions of specific models. In practice, however, the societal impact of any machine learning model depends on the context into which it is deployed. To capture this, we introduce ecosystem-level analysis: rather than analyzing a single model, we consider the collection of models that are deployed in a given context. Across three modalities (text, images, speech) and eleven datasets, we establish a clear trend: deployed machine learning is prone to systemic failure, meaning some users are misclassified by every available model. Even when individual models improve over time, we find these improvements rarely reduce the prevalence of systemic failure. In light of these trends, we analyze medical imaging for dermatology, a setting where the costs of systemic failure are especially high.
Unequal Use of Social Insurance Benefits: The Role of Employers
Journal of Econometrics, 2023
Abstract
Disability Insurance (DI) and Paid Family Leave (PFL) programs are important sources of social insurance, but there is considerable inequality in benefit take-up, and little is known about the role of firms in determining benefit use. Using administrative data from California, we find that firms that pay higher earnings premiums also have substantially higher public DI and PFL take-up rates, and that this relationship is particularly strong among the lowest-earning workers within the firm. Our results suggest that changes in firm behavior could affect social insurance use and thereby reduce an important dimension of inequality in America.
Connecting Higher Education to Career Skills
PLOS One, 2023
Abstract
Higher education is a source of skill acquisition for many middle- and high-skilled jobs. But what specific skills do universities impart to students to prepare them for desirable careers? In this study, we analyze a large novel corpus of over one million syllabi from over eight hundred bachelor's-degree-granting US educational institutions to connect material taught in higher education to the detailed work activities in the US economy as reported by the US Department of Labor. First, we show how differences in taught skills both within and between college majors correspond to earnings differences among recent graduates. Further, we use the co-occurrence of taught skills across all of academia to predict the skills that will be taught in a major in the future.
The Impacts of Paid Family Leave Benefits: Regression Kink Evidence from California Administrative Data
Journal of Policy Analysis and Management, 2020
Abstract
We use 10 years of California administrative data with a regression kink design to estimate the causal impacts of benefits in the first state-level paid family leave program for women with earnings near the maximum benefit threshold. We find no evidence that a higher weekly benefit amount (WBA) increases leave duration or leads to adverse future labor market outcomes for this group. In contrast, we document that a rise in the WBA leads to an increased likelihood of returning to the pre-leave firm (conditional on any employment) and of making a subsequent paid family leave claim.
Working Papers
Beyond Code: The Multidimensional Impacts of Large Language Models in Software Development
Working Paper, 2025
Abstract
While large language models (LLMs) appear to significantly impact software development, especially in the open-source software (OSS) sector, there is considerable uncertainty about their multifaceted effects. We empirically examine how LLMs affect OSS developers' work on three key dimensions: code development, knowledge sharing, and skill acquisition. Leveraging a natural experiment from a temporary ChatGPT ban in Italy, we employ a difference-in-differences framework to analyze data on the universe of OSS developers on GitHub in Italy, France, and Portugal. We find that losing access to ChatGPT decreases code development by 6.4% and skill acquisition by 8.4%, while resumed access increases knowledge sharing by 9.6%. These effects vary significantly by user experience level: novice developers primarily gain in code development, whereas more experienced developers benefit more from improved knowledge sharing and accelerated skill acquisition.
AI-Enabled Job Markets and Market Participation: A Field Experiment on How AI Shapes Jobseekers' Expectations of Competition
Working Paper, 2025
Abstract
Artificial intelligence is increasingly mediating how jobseekers are matched with employers, raising fundamental questions about how its use affects participation in labor markets. We examine whether AI-based matching alters jobseeker behavior -- specifically, their willingness to participate. Drawing on a field experiment with 4,562 jobseekers randomly assigned to disclosure conditions, we find that participation was about one-quarter lower when AI use was disclosed than in either a human-matching treatment or a control group with no source specified. Participation responses varied systematically, consistent with jobseekers forming expectations about how AI affects predicted match quality, the types of inputs and information used, and the scale of competition. Our findings highlight how AI disclosure can reshape both the level and composition of participation.
Algorithmic Monocultures in Hiring
Working Paper, 2025
Available upon request.
Fighting Fire with Fire: Infusing AI into Peer Review to Sustain Quality Scholarship
Under review at Management Science, 2025
Abstract
The accelerating use of generative AI tools in research points to a looming crisis in academic publishing: while author productivity increases, peer review capacity remains constrained, threatening the integrity and timeliness of scholarly evaluation. In this paper, we propose a hybrid review framework that integrates large language models (LLMs) into the journal review process as structured first-line reviewers. Our approach outlines a concrete editorial workflow in which LLM-generated reviews are presented to authors for their response before human review. We then analyze our proposal with a formal analytical model and show that this AI-augmented process reduces reviewer burden and improves decision accuracy, particularly as submission volumes increase.
AI-Exposed Jobs Deteriorated Before ChatGPT
Under review at Science, 2025
Abstract
Public debate links worsening job prospects for AI-exposed occupations to the release of ChatGPT in late 2022. Using monthly U.S. unemployment insurance records, we measure occupation- and location-specific unemployment risk and find that risk rose in AI-exposed occupations beginning in early 2022, months before ChatGPT. Analyzing millions of LinkedIn profiles, we show that graduate cohorts from 2021 onward entered AI-exposed jobs at lower rates than earlier cohorts, with gaps opening before late 2022. Finally, from millions of university syllabi, we find that graduates taking more AI-exposed curricula had higher first-job pay and shorter job searches after ChatGPT. Together, these results point to forces pre-dating generative AI and to the ongoing value of LLM-relevant education.
work2vec: Using Language Models to Understand Wage Premia
Working Paper, 2024
Abstract
Hedonic regressions have long helped economists understand how job characteristics contribute to earnings, but measurement challenges have limited which attributes could be analyzed systematically. Using a new dataset linking salary information to job posting data, I apply natural language processing techniques to decompose how different job characteristics contribute to earnings. The resulting model explains 83 percent of salary variation -- a 19 percent improvement over traditional occupation-location controls. Using an attribution method called integrated gradients, I identify which words most strongly predict salaries. I then develop an entity extraction model to categorize posting content into activities, amenities, education, experience, firm names, general and technical job skills, hours, job titles, and location. The analysis reveals that job activities dominate both in frequency and earnings relevance.
Estimating the Cost of Advance Notice for Firms Conducting Mass Layoffs
Working Paper, 2025
Abstract
This paper investigates the impact of advance notice requirements on firm behavior during large-scale workforce reductions in two Canadian provinces. We exploit discontinuous policy rules in Ontario's Employment Standards Act and Quebec's Act Respecting Labour Standards to estimate the cost of additional notice requirements for firms conducting mass terminations. Our novel empirical evidence shows that firms in both provinces strategically manipulate the scale of layoffs to circumvent additional notice obligations. In Ontario, we find an approximately 300% increase in the frequency of layoff events that bunch just below the 200-worker threshold, where the mandatory notice period discontinuously increases from 8 to 12 weeks.