-
An Evaluation of the Gender Wage Gap Using Linked Survey and Administrative Data
November 2020
Working Paper Number:
CES-20-34
The narrowing of the gender wage gap has slowed in recent decades. However, current estimates show that, among full-time year-round workers, women earn approximately 18 to 20 percent less than men at the median. Women's human capital and labor force characteristics that drive wages increasingly resemble men's, so remaining differences in these characteristics explain less of the gender wage gap now than in the past. As these factors wane in importance, studies show that others like occupational and industrial segregation explain larger portions of the gender wage gap. However, a major limitation of these studies is that the large datasets required to analyze occupation and industry effectively lack measures of labor force experience. This study combines survey and administrative data to analyze and improve estimates of the gender wage gap within detailed occupations, while also accounting for gender differences in work experience. We find a gender wage gap of 18 percent among full-time, year-round workers across 316 detailed occupation categories. We show the wage gap varies significantly by occupation: while wages are at parity in some occupations, gaps are as large as 45 percent in others. More competitive and hazardous occupations, occupations that reward longer hours of work, and those that have a larger proportion of women workers have larger gender wage gaps. The models explain less of the wage gap in occupations with these attributes. Occupational characteristics shape the conditions under which men and women work and we show these characteristics can make for environments that are more or less conducive to gender parity in earnings.
-
Are Customs Records Consistent Across Countries? Evidence from the U.S. and Colombia
March 2020
Working Paper Number:
CES-20-11
In many countries, official customs records include identifying information on the exporting and importing firms involved in each shipment. This information allows researchers to study international business networks, offshoring patterns, and the micro-foundations of aggregate trade flows. It also provides the government with a basis for tariff assessments at the border. However, there are no mechanisms in place to ensure that the shipment-level information recorded by the exporting country is consistent with the shipment-level information recorded by the importing country. And to the extent that there are discrepancies, it is not clear how prevalent they are or what form they take. In this paper we explore these issues, both to enhance our understanding of the limitations of customs records, and to inform future discussions of possible revisions in the way they are collected.
Specifically, we match U.S.-bound export shipments that appear in Colombian Customs records (DIAN) with their counterparts in the U.S. Customs records (LFTTD): U.S. import shipments from Colombia. Several patterns emerge. First, differences in the coverage of the two countries' customs records lead to significant discrepancies in the official bilateral trade flow statistics of these two countries: the DIAN database records 8 percent fewer transactions than the LFTTD database over the sample period, and the average export shipment size in the DIAN is roughly 4 percent smaller than the corresponding import shipment size in the LFTTD. These discrepancies are not due to differences in minimum shipment sizes and they are not particular to a few sectors, though they are more common among small shipments and they evolve over time.
Second, if we rely exclusively on firms' names and addresses, ignoring other shipment characteristics (value, product code, etc.), we are able to match 85 percent of the value of U.S. imports from Colombia in our LFTTD sample with particular Colombian suppliers in the DIAN. Further, fully 97 percent of the value of Colombian exports to the U.S. can be mapped onto particular importers in the U.S. LFTTD.
Third, however, match rates at the shipment level within buyer-seller pairs are low. That is, while buyers and sellers can be paired up fairly accurately, only 25-30 percent of the individual transactions in the customs records of the two countries can be matched using fuzzy algorithms at reasonable tolerance levels.
Fourth, the manufacturer ID (MANUF_ID) that appears in the LFTTD implies there are roughly twice as many Colombian exporters as actually appear in the DIAN. And similar comments apply to an analogous MANUF_ID variable constructed from importer name and address information in the DIAN. Hence studies that treat each MANUF_ID value as a distinct firm are almost surely overstating the number of foreign firms that engage in trade with the U.S. by a substantial amount.
Finally, we conclude that if countries were to require that exporters report standardized shipment identifiers (either invoice numbers or bill of lading/air waybill numbers), it would be far easier to track individual transactions and to identify international discrepancies in reporting.
-
Nonemployer Statistics by Demographics (NES-D): Exploring Longitudinal Consistency and Sub-national Estimates
December 2019
Working Paper Number:
CES-19-34
Until recently, the quinquennial Survey of Business Owners (SBO) was the only source of information for U.S. employer and nonemployer businesses by owner demographic characteristics such as race, ethnicity, sex and veteran status. Now, however, the Nonemployer Statistics by Demographics series (NES-D) will replace the SBO's nonemployer component with reliable and more frequent (annual) business demographic estimates, with no additional respondent burden and at lower imputation rates and costs. NES-D is not a survey; rather, it exploits existing administrative and census records to assign demographic characteristics to the universe of approximately 25 million (as of 2016) nonemployer businesses.
Although only in the second year of its research phase, NES-D is rapidly moving towards production, with a planned prototype or experimental version release of 2017 nonemployer data in 2020, followed by annual releases of the series. After the first year of research, we released a working paper (Luque et al., 2019) that assessed the viability of estimating nonemployer demographics exclusively with administrative records (AR) and census data. That paper used one year of data (2015) to produce preliminary tabulations of business counts at the national level. This year we expand that research in multiple ways by: i) examining the longitudinal consistency of administrative and census records coverage, and of our AR-based demographics estimates, ii) evaluating further coverage from additional data sources, iii) exploring estimates at the sub-national level, iv) exploring estimates by industrial sector, v) examining demographics estimates of business receipts as well as of counts, and vi) implementing imputation of missing demographic values.
Our current results are consistent with the main findings in Luque et al. (2019), and show that high coverage and demographic assignment rates are not the exception, but the norm. Specifically, we find that AR coverage rates are high and stable over time for each of the three years we examine, 2014-2016. We are able to identify owners for approximately 99 percent of nonemployer businesses (excluding C-corporations), 92 to 93 percent of identified nonemployer owners have no missing demographics, and only about 1 percent are missing three or more demographic characteristics in each of the three years. We also find that our demographics estimates are stable over time, with expected small annual changes that are consistent with underlying population trends in the U.S. Due to data limitations, these results do not include C-corporations, which represent only 2 percent of nonemployer businesses and 4 percent of receipts.
Without added respondent burden and at lower imputation rates and costs, NES-D will provide high-quality business demographics estimates at a higher frequency (annual vs. every 5 years) than the SBO.
-
Predicting the Effect of Adding a Citizenship Question to the 2020 Census
June 2019
Working Paper Number:
CES-19-18
The addition of a citizenship question to the 2020 census could affect the self-response rate, a key driver of the cost and quality of a census. We find that citizenship question response patterns in the American Community Survey (ACS) suggest that it is a sensitive question when asked about administrative record noncitizens but not when asked about administrative record citizens. ACS respondents who were administrative record noncitizens in 2017 frequently choose to skip the question or answer that the person is a citizen. We predict the effect on self-response to the entire survey by comparing mail response rates in the 2010 ACS, which included a citizenship question, with those of the 2010 census, which did not have a citizenship question, among households in both surveys. We compare the actual ACS-census difference in response rates for households that may contain noncitizens (more sensitive to the question) with the difference for households containing only U.S. citizens. We estimate that the addition of a citizenship question will have an 8.0 percentage point larger effect on self-response rates in households that may have noncitizens relative to those with only U.S. citizens. Assuming that the citizenship question does not affect unit self-response in all-citizen households and applying the 8.0 percentage point drop to the 28.1 percent of housing units potentially having at least one noncitizen would predict an overall 2.2 percentage point drop in self-response in the 2020 census, increasing costs and reducing the quality of the population count.
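The arithmetic behind the overall figure can be checked with a short sketch. It is only an illustration of the weighted average described in the abstract, not the authors' code; the function and parameter names are hypothetical, and the only inputs are the 8.0 percentage point differential, the 28.1 percent share, and the stated assumption of no effect in all-citizen households.

```python
def overall_self_response_drop(differential_drop_pp, share_possible_noncitizen,
                               all_citizen_drop_pp=0.0):
    """Weighted average drop in self-response, in percentage points.

    differential_drop_pp: drop among housing units that may contain a noncitizen.
    share_possible_noncitizen: fraction of housing units in that group (0 to 1).
    all_citizen_drop_pp: drop among all-citizen households; the abstract
        assumes this is zero.
    """
    return (share_possible_noncitizen * differential_drop_pp
            + (1.0 - share_possible_noncitizen) * all_citizen_drop_pp)


# Figures quoted in the abstract: 8.0 percentage points and 28.1 percent.
drop = overall_self_response_drop(8.0, 0.281)
print(round(drop, 1))  # 0.281 * 8.0 = 2.248, which rounds to 2.2
```

If the assumption of no effect in all-citizen households were relaxed, the third argument would add that group's drop, weighted by the remaining 71.9 percent of housing units.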
-
Nonemployer Statistics by Demographics (NES-D): Using Administrative and Census Records Data in Business Statistics
January 2019
Working Paper Number:
CES-19-01
The quinquennial Survey of Business Owners or SBO provided the only comprehensive source of information in the United States on employer and nonemployer businesses by the sex, race, ethnicity and veteran status of the business owners. The annual Nonemployer Statistics series (NES) provides establishment counts and receipts for nonemployers but contains no demographic information on the business owners. With the transition of the employer component of the SBO to the Annual Business Survey, the Nonemployer Statistics by Demographics series or NES-D represents the continuation of demographics estimates for nonemployer businesses. NES-D will leverage existing administrative and census records to assign demographic characteristics to the universe of approximately 24 million nonemployer businesses (as of 2015). Demographic characteristics include key demographics measured by the SBO (sex, race, Hispanic origin and veteran status) as well as other demographics (age, place of birth and citizenship status) collected but not imputed by the SBO if missing. A spectrum of administrative and census data sources will provide the nonemployer universe and demographics information. Specifically, the nonemployer universe originates in the Business Register; the Census Numident will provide sex, age, place of birth and citizenship status; race and Hispanic origin information will be obtained from multiple years of the decennial census and the American Community Survey; and the Department of Veterans Affairs will provide administrative records data on veteran status.
The use of blended data in this manner will make possible the production of NES-D, an annual series that will become the only source of detailed and comprehensive statistics on the scope, nature and activities of U.S. businesses with no paid employment by the demographic characteristics of the business owner. Using the 2015 vintage of nonemployers, initial results indicate that demographic information is available for the overwhelming majority of the universe of nonemployers. For instance, information on sex, age, place of birth and citizenship status is available for over 95 percent of the 24 million nonemployers while race and Hispanic origin are available for about 90 percent of them. These results exclude owners of C-corporations, which represent only 2 percent of nonemployer firms. Among other things, future work will entail imputation of missing demographics information (including that of C-corporations), testing the longitudinal consistency of the estimates, and expanding the set of characteristics beyond the demographics mentioned above. Without added respondent burden and at lower imputation rates and costs, NES-D will meet the needs of stakeholders as well as the economy as a whole by providing reliable estimates at a higher frequency (annual vs. every 5 years) and with a more timely dissemination schedule than the SBO.
-
Development of Survey Questions on Robotics Expenditures and Use in U.S. Manufacturing Establishments
October 2018
Working Paper Number:
CES-18-44
The U.S. Census Bureau in partnership with a team of external researchers developed a series of questions on the use of robotics in U.S. manufacturing establishments. The questions include: (1) capital expenditures for new and used industrial robotic equipment in 2018, (2) number of industrial robots in operation in 2018, and (3) number of industrial robots purchased in 2018. These questions are to be included in the 2018 Annual Survey of Manufactures. This paper documents the background and cognitive testing process used for the development of these questions.
-
A Portrait of U.S. Factoryless Goods Producers
October 2018
Working Paper Number:
CES-18-43
This paper evaluates the U.S. Census Bureau's most recent data collection efforts to classify business entities that engage in an extreme form of production fragmentation called 'factoryless' goods production. 'Factoryless' goods-producing entities outsource physical transformation activities while retaining ownership of the intellectual property and control of sales to customers. Responses to a special inquiry on the incidence of purchases of contract manufacturing services, in combination with data on production inputs and outputs, intellectual property, and international trade, are used to identify and document characteristics of 'factoryless' firms in the U.S. economy.
-
Understanding the Quality of Alternative Citizenship Data Sources for the 2020 Census
August 2018
Working Paper Number:
CES-18-38R
This paper examines the quality of citizenship data in self-reported survey responses compared to administrative records and evaluates options for constructing an accurate count of resident U.S. citizens. Person-level discrepancies between survey-collected citizenship data and administrative records are more pervasive than previously reported in studies comparing survey and administrative data aggregates. Our results imply that survey-sourced citizenship data produce significantly lower estimates of the noncitizen share of the population than would be produced from currently available administrative records; both the survey-sourced and administrative data have shortcomings that could contribute to this difference. Our evidence is consistent with noncitizen respondents misreporting their own citizenship status and failing to report that of other household members. At the same time, currently available administrative records may miss some naturalizations and capture others with a delay. The evidence in this paper also suggests that adding a citizenship question to the 2020 Census would lead to lower self-response rates in households potentially containing noncitizens, resulting in higher fieldwork costs and a lower-quality population count.
-
Occupational Classifications: A Machine Learning Approach
August 2018
Working Paper Number:
CES-18-37
Characterizing the work that people do on their jobs is a longstanding and core issue in labor economics. Traditionally, classification has been done manually. If it were possible to combine new computational tools and administrative wage records to generate an automated crosswalk between job titles and occupations, millions of dollars could be saved in labor costs, data processing could be sped up, data could become more consistent, and it might be possible to generate, without a lag, current information about the changing occupational composition of the labor market. This paper examines the potential to assign occupations to job titles contained in administrative data using automated, machine-learning approaches. We use a new extraordinarily rich and detailed set of data on transactional HR records of large firms (universities) in a relatively narrowly defined industry (public institutions of higher education) to identify the potential for machine-learning approaches to classify occupations.
-
An Economic Analysis of Privacy Protection and Statistical Accuracy as Social Choices
August 2018
Working Paper Number:
CES-18-35
Statistical agencies face a dual mandate to publish accurate statistics while protecting respondent privacy. Increasing privacy protection requires decreased accuracy. Recognizing this as a resource allocation problem, we propose an economic solution: operate where the marginal cost of increasing privacy equals the marginal benefit. Our model of production, from computer science, assumes data are published using an efficient differentially private algorithm. Optimal choice weighs the demand for accurate statistics against the demand for privacy. Examples from U.S. statistical programs show how our framework can guide decision-making. Further progress requires a better understanding of willingness-to-pay for privacy and statistical accuracy.