Extending 'GPTs Are GPTs' to Firms

GovAI · Research Paper · 5 pages

Published: May 1, 2025

AEA Papers and Proceedings 2025, 115: 51–55
https://doi.org/10.1257/pandp.20251045

Extending “GPTs Are GPTs” to Firms†

By Benjamin Labaschin, Tyna Eloundou, Sam Manning, Pamela Mishkin, and Daniel Rock*

Given the major role that firms play in the adoption and deployment path of artificial intelligence (AI), corporate data on AI-related investments, opportunities, and risks are important inputs for researchers seeking to understand the impact of AI (Raj and Seamans 2018). We apply the approach and data from Eloundou et al. (2024) (“GPTs Are GPTs”) to build firm-level aggregate large language model (LLM) exposure measures for a set of publicly traded firms.

Eloundou et al. (2024) construct task- and occupation-level measures of “exposure” to LLMs such as OpenAI’s GPT-4 (Achiam et al. 2023). That work uses a rubric to evaluate the set of O*NET tasks (National Center for O*NET Development 2023) to document the extent to which the GPT-4 vintage of LLMs might help workers complete their tasks. The original Eloundou et al. (2024) paper does not include firm-level analyses, and this paper extends that work to demonstrate how task-level LLM exposure scores can be aggregated to draw firm-level descriptive statistics. Since firm-level decision-making is often a critical diffusion and development path for new technologies, examining LLM exposure patterns at this grain can help show which types of work have the potential to change.

There is a handful of other firm-level studies of AI potential and employment, often making use of large online résumé databases (Rock 2019; Alekseeva et al. 2020; Acemoglu et al. 2022; Eisfeldt, Schubert, and Zhang 2023; Babina et al. 2024; McElheran et al. 2024). We merge Eloundou et al.’s (2024) exposure scores by Standard Occupational Classification (SOC) code to online résumé data from Revelio Labs that detail corporate employment counts by SOC code in each month.
We find that (i) companies with higher counts of technology workers tend to be more exposed to LLM capabilities, (ii) differences across firms within an exposure category are not as large as differences between exposure categories, and (iii) companies with more AI-skilled employees also tend to have higher levels of LLM exposure.

* Labaschin: Workhelix (email: ben@workhelix.com); Eloundou: OpenAI (email: tyna@openai.com); Manning: Centre for the Governance of AI (email: sam.manning@governance.ai); Mishkin: independent (email: pamela.mishkin@gmail.com); Rock: University of Pennsylvania and Workhelix (email: rockdi@wharton.upenn.edu). This paper reflects Mishkin’s individual research and recommendations and was not written in her official capacity at OpenAI. We thank Dean Boerner and Ben Zweig of Revelio Labs for providing the data on firm-level occupational compositions and skills over time. We are grateful to Zara Contractor, Marina Mendes Tavares, and participants at the 2025 AEA meetings for helpful comments.
† Go to https://doi.org/10.1257/pandp.20251045 to visit the article page for additional materials and author disclosure statement(s). Copyright American Economic Association; reproduced with permission.
1 Exposure categories in this case refer to combinations of the Eloundou et al. (2024) groupings of E0, E1, and E2. For E1, the task is exposed as is to LLM tools. In E2, additional software tools are required to unlock potential LLM-based productivity gains. For E0, there is no opportunity for LLMs alone to help execute the task. Our categories are E1, E1 + 0.5 × E2, and E1 + E2, treating each label as a binary variable as in the original paper. An exposed task is one in which an LLM like GPT-4 can help reduce the time to complete the task by 50 percent or more without a drop in quality.
2 “Occupation” is operationalized as six-digit SOC codes. We use GPT-4-based labels, but results are qualitatively similar for human ratings.

I. Data

Our data come from two sources: (i) Eloundou et al. (2024) for occupation-level measures of LLM exposure and (ii) firm-level counts of employees in each occupation and total counts of AI skills for publicly traded firms. The second data source is Revelio Labs, which compiles online résumé data and other sources to create a panel of company labor inputs (Revelio Labs 2024).

Eloundou et al. (2024) provide task-level labels of “E1” for tasks with exposure to LLMs of roughly GPT-4-level capabilities deployed through chat interfaces, “E2” for tasks that have exposure conditional on integration of LLMs with other software tooling and systems, and “E0” for no exposure to LLMs. We create aggregated LLM exposure scores by taking an employment-weighted average of the occupation-specific LLM scores within a company-month. We focus on June 2024 as a recent month with high-quality coverage for cross-sectional analyses. In any given month, we compute the firm-level exposure score for firm i in time t ($E_{it}$) from full-time employment (FTE) counts of each occupation category j as follows:

$$E_{it} = \sum_{j} \frac{FTE_{ijt}}{FTE_{it}} \times E_{j} \tag{1}$$

Following Eloundou et al. (2024), $E_j$ is defined three ways: (i) the proportion of tasks in an occupation labeled “E1” (exposed to LLMs as is), (ii) the average within an occupation over tasks of E1 + 0.5 × E2, where “E1” or “E2” are coded as 1 for tasks with either of those labels and 0 for “E0,” and (iii) the average within an occupation over tasks of E1 + E2. The differences in exposure between E1 and E2 indicate the extent to which additional software systems can help yield productive applications of LLMs from the GPT-4 vintage. This suggests that incorporating LLMs into corporate systems, processes, and software greatly expands the potential for productivity gains (E1 and E2) relative to just using LLMs (E1).

Since technology workers are both relatively highly exposed to LLM capabilities and directly valuable in building the technological software capital required to unlock E2 LLM applications, we also build a count measure of technology workers by aggregating FTEs in the following SOC occupations: computer programmers (15-1251), software developers (15-1252), software quality assurance analysts and testers (15-1253), data scientists (15-2051), and computer and information research scientists (15-1221).

In addition to counts of workers in each company by SOC code, Revelio also provides counts of inferred AI skills by company. The “AI skills” measure sums workers with “machine learning,” “scikit-learn,” “tensorflow,” or “artificial intelligence” on their résumés. The Revelio panel that we use spans from January 2013 to June 2024 and includes 7,963 unique companies. Since there is a lag in the availability of online résumé data, we concentrate our analysis on the most recent month for which we have access to a full cross section of 7,894 companies, June 2024.

Summary statistics are in Table 1. The mean firm has E1 exposure of 0.17, E1 + 0.5 × E2 exposure of 0.47, and E1 + E2 exposure (fully deployed software complements) of 0.77. Large corporations in our sample (including only publicly traded firms) have more workers at high levels of exposure than the economy overall. Only the top quartile of AI-hiring firms have multiple tagged AI skills across their workforce, but the average is 36 per company. Likewise, the largest technology employers have many more FTEs in the assigned technology worker categories, with the firm-level average at 451.

Table 1. Summary Statistics for Firm-Level LLM Exposure, Skills, and Technology Workers, June 2024

| | E1 exposure | E1 + 0.5 × E2 exposure | E1 + E2 exposure | AI skills | Total skills | Tech. FTEs |
|---|---|---|---|---|---|---|
| Count | 7,894 | 7,894 | 7,894 | 7,764 | 7,764 | 7,894 |
| Mean | 0.17 | 0.47 | 0.77 | 36 | 65,979 | 451 |
| SD | 0.08 | 0.09 | 0.13 | 804 | 383,874 | 5,039 |
| Min. | 0.00 | 0.00 | 0.00 | 0 | 0 | 0 |
| 25% | 0.13 | 0.42 | 0.69 | 0 | 255 | 0 |
| 50% | 0.16 | 0.47 | 0.79 | 0 | 1,898 | 3 |
| 75% | 0.20 | 0.53 | 0.87 | 2 | 18,419 | 44 |
| Max. | 0.88 | 0.94 | 1.00 | 44,268 | 14,959,961 | 227,940 |

Notes: All exposure measures are GPT-4-based labels from Eloundou et al. (2024), aggregated across tasks with equal weights to occupation-level statistics and merged to firm-occupation pairs by SOC code. Firm-occupation data are provided by Revelio Labs. For “E1 Exposure,” a task is given a score of 1 if E1 exposed, else 0. For E1 + 0.5 × E2 and E1 + E2, a task is given a score of 1 if E1 or E2 exposed. Occupation scores are computed as the average of these binary flags within an SOC category.

II. Firm-Level Heterogeneity in LLM Exposure for Large Companies

A. LLM Exposure Differences

The median firm in our sample has about 16 percent of its tasks exposed by the E1 rating, 47 percent by E1 + 0.5 × E2, and 79 percent at the E1 + E2 rating (which assumes that all complementary software is present). LLM exposure is pervasive across the Revelio sample.
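The aggregation in equation (1), the three occupation-level definitions of the exposure score, and the technology worker count can be illustrated with a minimal Python sketch. The function names, the toy task labels, and the FTE counts are assumptions made up for illustration; they are not the authors' code or data. The sketch also assumes that every SOC code in a firm's headcount has a matching occupation score, and that the E1 + 0.5 × E2 measure equals the share of E1 tasks plus half the share of E2 tasks.

```python
# Technology-worker SOC codes listed in the text.
TECH_SOC_CODES = {"15-1251", "15-1252", "15-1253", "15-2051", "15-1221"}


def occupation_scores(task_labels):
    """Average binary task flags within one occupation.

    task_labels: list of "E0", "E1", or "E2", one label per task.
    Returns the three occupation-level exposure measures.
    """
    n_tasks = len(task_labels)
    share_e1 = sum(label == "E1" for label in task_labels) / n_tasks
    share_e2 = sum(label == "E2" for label in task_labels) / n_tasks
    return {
        "E1": share_e1,
        "E1+0.5*E2": share_e1 + 0.5 * share_e2,
        "E1+E2": share_e1 + share_e2,
    }


def firm_exposure(fte_by_soc, score_by_soc):
    """Equation (1): sum over j of (FTE_ijt / FTE_it) * E_j for one firm-month."""
    total_fte = sum(fte_by_soc.values())
    if total_fte == 0:
        return None  # exposure is undefined for a firm-month with no FTEs
    weighted = sum(fte * score_by_soc[soc] for soc, fte in fte_by_soc.items())
    return weighted / total_fte


def tech_worker_count(fte_by_soc):
    """Total FTEs in the technology-worker SOC codes."""
    return sum(fte for soc, fte in fte_by_soc.items() if soc in TECH_SOC_CODES)


# Toy inputs (hypothetical, not data from the paper): one technology
# occupation and one other occupation, four tasks each.
toy_tasks = {
    "15-1252": ["E1", "E2", "E2", "E0"],
    "43-4051": ["E1", "E0", "E0", "E0"],
}
toy_fte = {"15-1252": 30, "43-4051": 70}

full_scores = {
    soc: occupation_scores(labels)["E1+E2"] for soc, labels in toy_tasks.items()
}
toy_firm_exposure = firm_exposure(toy_fte, full_scores)
toy_tech_fte = tech_worker_count(toy_fte)
```

In this toy firm, 30 percent of FTEs sit in an occupation with an E1 + E2 score of 0.75 and 70 percent in one with a score of 0.25, so the headcount-weighted score is 0.4.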
To the extent that technology workers are required to unlock E2-rated tasks, some firms are better positioned because of their software-focused workforce composition. In Figure 1, we see that companies with more software engineering employees on average have higher levels of exposure. Companies with more software-focused employees may have as many as 10 percent more E1-exposed tasks. Since tech workers have more opportunities to use LLMs, it stands to reason that their employers are more exposed to changes. If software engineering can help enable more LLM use cases, these firms may be at an advantage for faster adoption.

Across rating categories, however, there are much bigger differences. The differences in the share of within-firm workers exposed to LLMs in Figure 1 are as high as 50 percentage points between E1-exposed (no software complements needed) and E1 + E2-exposed (all software complements deployed). This firm-level evidence bolsters the case made by Eloundou et al. (2024) that LLMs satisfy the GPT criteria of pervasiveness and facilitating complementary innovation.

Figure 1. GPT-4 Firm-Level Exposure Proportion versus Total Technology Worker Count by Firm
[Figure: x-axis is log technology worker count; y-axis is GPT-4-based exposure proportion; one series per rating level (E1, E1 + 0.5 × E2, E1 + E2).]
Notes: Figure shows the binscatter of firm-level headcount-weighted average “E1,” “E1 + 0.5 × E2,” and “E1 + E2” exposure scores from Eloundou et al. (2024) plotted against the logged count of technology workers on the x-axis at the same company as of June 2024. We define technology workers as belonging to one of the SOC occupations listed in the text.

B. Artificial Intelligence Skills and Firm LLM Exposure

Since the O*NET categories we used to define software engineering work in the previous section are themselves on the higher end of LLM exposure, we studied the distribution of AI skills in the Revelio database with respect to LLM exposure by firm. AI skills are concentrated in a small portion of the companies in our sample. Only about a quarter of companies have identified AI skills at all. As in Figure 2, the AI workers who have these abilities are more likely to be found in firms with higher levels of exposure. Regressing logged AI skills on the E1 + 0.5 × E2 score and an intercept, we find a slope of 6.374 (robust standard error of 0.359). Comparing two firms with a 0.5 difference in average exposure to LLMs, we would predict the firm with more LLM exposure to have approximately 24.2 times as many employees with AI skills, if each employee has one tagged skill. This suggests that the potential for change from LLMs is perhaps especially high in companies that already have some AI capabilities. Of course, at this time our exposure measures do not say anything more about the adoption, development, and use dynamics of AI tooling.

III. Conclusions and Caveats for Using Exposure Scores for Firm-Level Analyses

AI and related technological exposure measures are limited in purpose and scope. The value of these measures is in identifying possible domains where example tasks (Brynjolfsson and Mitchell 2017; Brynjolfsson, Mitchell, and Rock 2018; Eloundou et al. 2024) or occupations (Felten, Raj, and Seamans 2018; Webb 2019; Felten, Raj, and Seamans 2023) might change as a result of a technological advancement at a specific point in time. They are not designed to directionally predict changes in labor market equilibrium outcomes like wages and employment, or whether AI will augment or automate labor. Likewise for firms, aggregates of exposure scores do not provide evidence as to whether managers will decide to automate tasks or change their demand for various roles. Rather, these measurements are helpful insofar as they can provide another view into the potential general-purpose nature of LLMs.
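The regression result in Section II.B can be checked with a short calculation. The sketch below assumes that “logged” means the natural logarithm, so a slope on exposure translates into a multiplicative factor on the predicted AI skill count rather than an additive headcount. The function name is hypothetical.

```python
import math


def implied_ratio(slope, exposure_gap):
    """Ratio of predicted AI skill counts implied by a log-linear fit.

    With log(skills) = a + slope * exposure, two firms whose exposure
    differs by exposure_gap have predicted skill counts that differ by
    a factor of exp(slope * exposure_gap).
    """
    return math.exp(slope * exposure_gap)


# Slope of 6.374 from the text, applied to a 0.5 difference in exposure.
ratio = implied_ratio(6.374, 0.5)
print(round(ratio, 1))  # 24.2
```

The same function applied to a smaller gap, such as the 0.11 distance between the 25th and 75th percentiles of E1 + 0.5 × E2 exposure in Table 1, gives a factor of about 2.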
Organizations with broad-based exposure have more potential for workforce activity change. High exposure levels across firms suggest another test of the pervasiveness of LLM potential. The presence of technology workers in firms with high levels of exposure indicates organizational potential to integrate software tools with LLMs, allowing workers to execute tasks more effectively. With the important caveat that our sampling frame from Revelio skews toward large, publicly traded firms, this augments the tests of satisfying general-purpose technology criteria of pervasiveness and “spawning complementary innovations” when considering AI as a potential general-purpose technology (GPT) (Bresnahan and Trajtenberg 1995; Lipsey, Carlaw, and Bekar 2005; Goldfarb, Taska, and Teodoridis 2023; Bresnahan 2024; Eloundou et al. 2024).

3 Besides nontechnological barriers to LLM use, the structure of labor markets before and after integration of these tools is not in any manner captured by exposure scores.

Figure 2. Logged AI Skills versus LLM Exposure by Firm
[Figure: x-axis is GPT-4-based LLM exposure score (E1 + 0.5 × E2); y-axis is log (AI skills); annotations: Correlation: 0.29; Slope = 6.374 (SE = 0.359).]
Notes: Figure is a binscatter of firm-level GPT-4-based E1 exposure versus log (technology worker FTEs). GPT-4-based LLM exposure scores are taken from Eloundou et al. (2024) at the task level and aggregated to six-digit SOC codes. These occupation-level scores are then merged to Revelio count data on SOC code. We compute the firm-level exposure as the headcount-weighted average exposure across all SOC codes as of June 2024. Revelio also provides AI skill counts. The y-axis is the log (AI skill count) estimate in June 2024, omitting firms with fewer than one AI skill. Blue dots correspond to binned averages.
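Figures 1 and 2 are described as binscatters, with dots showing binned averages. A minimal sketch of one common way to build such points is given below: sort firms by the x variable, split them into equal-count bins, and average x and y within each bin. The function name, the number of bins, and the toy values are assumptions for illustration; the paper does not state which binning rule or software was used.

```python
def binscatter_points(xs, ys, n_bins):
    """Return (mean_x, mean_y) for each of n_bins equal-count bins of x."""
    pairs = sorted(zip(xs, ys))  # order observations by x
    n_obs = len(pairs)
    points = []
    for b in range(n_bins):
        start = b * n_obs // n_bins
        stop = (b + 1) * n_obs // n_bins
        chunk = pairs[start:stop]
        if not chunk:
            continue  # skip empty bins when there are more bins than points
        mean_x = sum(x for x, _ in chunk) / len(chunk)
        mean_y = sum(y for _, y in chunk) / len(chunk)
        points.append((mean_x, mean_y))
    return points


# Toy values (hypothetical): four firms, two bins.
toy_points = binscatter_points([1, 2, 3, 4], [10, 20, 30, 40], 2)
```

For a plot like Figure 2, the y values would first be restricted to firms with at least one AI skill and then logged before binning, following the figure notes.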

REFERENCES

Acemoglu, Daron, David Autor, Jonathon Hazell, and Pascual Restrepo. 2022. “Artificial Intelligence and Jobs: Evidence from Online Vacancies.” Journal of Labor Economics 40 (S1): S293–340.
Achiam, Josh, Steven Adler, Sandhini Agarwal, Lama Ahmad, Ilge Akkaya, Florencia Leoni Aleman, Diogo Almeida, et al. 2023. “GPT-4 Technical Report.” Preprint, arXiv. https://doi.org/10.48550/arXiv.2303.08774.
Alekseeva, Liudmila, Mireia Gine, Sampsa Samila, and Bledi Taska. 2020. “AI Adoption and Firm Performance: Management versus IT.” Preprint, SSRN. http://dx.doi.org/10.2139/ssrn.3677237.
Babina, Tania, Anastassia Fedyk, Alex He, and James Hodson. 2024. “Artificial Intelligence, Firm Growth, and Product Innovation.” Journal of Financial Economics 151: 103745.
Bresnahan, Timothy. 2024. “What Innovation Paths for AI to Become a GPT?” Journal of Economics and Management Strategy 33 (2): 305–16.
Bresnahan, Timothy F., and Manuel Trajtenberg. 1995. “General Purpose Technologies ‘Engines of Growth?’” Journal of Econometrics 65 (1): 83–108.
Brynjolfsson, Erik, and Tom Mitchell. 2017. “What Can Machine Learning Do? Workforce Implications.” Science 358 (6370): 1530–34.
Brynjolfsson, Erik, Tom Mitchell, and Daniel Rock. 2018. “What Can Machines Learn and What Does It Mean for Occupations and the Economy?” AEA Papers and Proceedings 108: 43–47.
Eisfeldt, Andrea L., Gregor Schubert, and Miao Ben Zhang. 2023. “Generative AI and Firm Values.” NBER Working Paper 31222.
Eloundou, Tyna, Sam Manning, Pamela Mishkin, and Daniel Rock. 2024. “GPTs Are GPTs: Labor Market Impact Potential of LLMs.” Science 384 (6702): 1306–08.
Felten, Edward W., Manav Raj, and Robert Seamans. 2018. “A Method to Link Advances in Artificial Intelligence to Occupational Abilities.” AEA Papers and Proceedings 108: 54–57.
Felten, Edward W., Manav Raj, and Robert Seamans. 2023. “Occupational Heterogeneity in Exposure to Generative AI.” Preprint, SSRN. http://dx.doi.org/10.2139/ssrn.4414065.
Goldfarb, Avi, Bledi Taska, and Florenta Teodoridis. 2023. “Could Machine Learning Be a General Purpose Technology? A Comparison of Emerging Technologies Using Data from Online Job Postings.” Research Policy 52 (1): 104653.
Lipsey, Richard G., Kenneth I. Carlaw, and Clifford T. Bekar. 2005. Economic Transformations: General Purpose Technologies and Long-Term Economic Growth. Oxford University Press.
McElheran, Kristina, J. Frank Li, Erik Brynjolfsson, Zachary Kroff, Emin Dinlersoz, Lucia Foster, and Nikolas Zolas. 2024. “AI Adoption in America: Who, What, and Where.” Journal of Economics and Management Strategy 33 (2): 375–415.
National Center for O*NET Development. 2023. “O*NET OnLine.” www.onetonline.org/ (accessed January 23, 2025).
Raj, Manav, and Robert Seamans. 2018. “Artificial Intelligence, Labor, Productivity, and the Need for Firm-Level Data.” In The Economics of Artificial Intelligence: An Agenda, edited by Ajay Agrawal, Joshua Gans, and Avi Goldfarb, 553–65. University of Chicago Press.
Revelio Labs. 2024. “Revelio Labs Data Dictionary: Datasets.” https://www.data-dictionary.reveliolabs.com/data.html (accessed January 24, 2025).
Rock, Daniel. 2019. “Engineering Value: The Returns to Technological Talent and Investments in Artificial Intelligence.” Preprint, SSRN. http://dx.doi.org/10.2139/ssrn.3427412.
Webb, Michael. 2019. “The Impact of Artificial Intelligence on the Labor Market.” Preprint, SSRN. http://dx.doi.org/10.2139/ssrn.3482150.