CREAT: Census Research Exploration and Analysis Tool

Papers Containing Keyword(s): 'census records'

The following papers contain the search terms that you selected.

Viewing papers 1 through 10 of 15


  • Working Paper

    Geographic Immobility in the United States: Assessing the Prevalence and Characteristics of Those Who Never Migrate Across State Lines Using Linked Federal Tax Microdata

    March 2025

    Working Paper Number: CES-25-19

    This paper explores the prevalence and characteristics of those who never migrate at the state scale in the U.S. Studying people who never migrate requires regular and frequent observation of their residential location for a lifetime, or at least for many years. A novel U.S. population-sized longitudinal dataset that links individual-level Internal Revenue Service (IRS) and Social Security Administration (SSA) administrative records supplies this information annually, along with information on income and socio-demographic characteristics. We use these administrative microdata to follow a cohort aged between 15 and 50 in 2001 from 2001 to 2016, differentiating those who lived in the same state every year during this period (i.e., never made an interstate move) from those who lived in more than one state (i.e., made at least one interstate move). We find those who never made an interstate move comprised 75 percent of the total population of this age cohort. This percentage varies by year of age but never falls below 62 percent even for those who were teenagers or young adults in 2001. There are also variations in these percentages by sex, race, nativity, and income, with income having the largest effects. We also find substantial variation in these percentages across states. Our findings suggest a need for more research on geographically immobile populations in the U.S.
  • Working Paper

    Gradient Boosting to Address Statistical Problems Arising from Non-Linkage of Census Bureau Datasets

    June 2024

    Working Paper Number: CES-24-27

    This article introduces the twangRDC package, which contains functions to address non-linkage in US Census Bureau datasets. The Census Bureau's Person Identification Validation System facilitates data linkage by assigning unique person identifiers to federal, third-party, decennial census, and survey data. Not all records in these datasets can be linked to the reference file, so not all records are assigned an identifier. This article is a tutorial on using twangRDC to generate nonresponse weights that account for non-linkage of person records across US Census Bureau datasets.
  • Working Paper

    Producing U.S. Population Statistics Using Multiple Administrative Sources

    November 2023

    Working Paper Number: CES-23-58

    We identify several challenges encountered when constructing U.S. administrative record-based (AR-based) population estimates for 2020. Though the AR estimates are higher than the 2020 Census at the national level, they are over 15 percent lower in 5 percent of counties, suggesting that locational accuracy can be improved. Other challenges include how to achieve comprehensive coverage, maintain consistent coverage across time, filter out nonresidents and people not alive on the reference date, uncover missing links across person and address records, and predict demographic characteristics when multiple ones are reported or when they are missing. We discuss several ways of addressing these issues, e.g., building in redundancy with more sources, linking children to their parents' addresses, and conducting additional record linkage for people without Social Security Numbers and for addresses not initially linked to the Census Bureau's Master Address File. We discuss modeling to predict lower levels of geography for people lacking those geocodes, the probability that a person is a U.S. resident on the reference date, the probability that an address is the person's residence on the reference date, and the probability a person is in each demographic characteristic category. Regression results illustrate how many of these challenges and solutions affect the AR county population estimates.
  • Working Paper

    Noncitizen Coverage and Its Effects on U.S. Population Statistics

    August 2023

    Working Paper Number: CES-23-42

    We produce population estimates with the same reference date, April 1, 2020, as the 2020 Census of Population and Housing by combining 31 types of administrative record (AR) and third-party sources, including several new to the Census Bureau with a focus on noncitizens. Our AR census national population estimate is higher than other Census Bureau official estimates: 1.8% greater than the 2020 Demographic Analysis high estimate, 3.0% more than the 2020 Census count, and 3.6% higher than the vintage-2020 Population Estimates Program estimate. Our analysis suggests that inclusion of more noncitizens, especially those with unknown legal status, explains the higher AR census estimate. About 19.8% of AR census noncitizens have addresses that cannot be linked to an address in the 2020 Census collection universe, compared to 5.7% of citizens, raising the possibility that the 2020 Census did not collect data for a significant fraction of noncitizens residing in the United States under the residency criteria used for the census. We show differences in estimates by age, sex, Hispanic origin, geography, and socioeconomic characteristics symptomatic of the differences in noncitizen coverage.
  • Working Paper

    Improving Estimates of Neighborhood Change with Constant Tract Boundaries

    May 2022

    Working Paper Number: CES-22-16

    Social scientists routinely rely on methods of interpolation to adjust available data to their research needs. This study calls attention to the potential for substantial error in efforts to harmonize data to constant boundaries using standard approaches to areal and population interpolation. We compare estimates from a standard source (the Longitudinal Tract Data Base) to true values calculated by re-aggregating original 2000 census microdata to 2010 tract areas. We then demonstrate an alternative approach that allows the re-aggregated values to be publicly disclosed, using 'differential privacy' (DP) methods to inject random noise to protect confidentiality of the raw data. The DP estimates are considerably more accurate than the interpolated estimates. We also examine conditions under which interpolation is more susceptible to error. This study reveals cause for greater caution in the use of interpolated estimates from any source. Until and unless DP estimates can be publicly disclosed for a wide range of variables and years, research on neighborhood change should routinely examine data for signs of estimation error that may be substantial in a large share of tracts that experienced complex boundary changes.
  • Working Paper

    Foreign-Born and Native-Born Migration in the U.S.: Evidence from IRS Administrative and Census Survey Records

    July 2018

    Working Paper Number: carra-2018-07

    This paper details efforts to link administrative records from the Internal Revenue Service (IRS) to American Community Survey (ACS) and 2010 Census microdata for the study of migration among foreign-born and native-born populations in the United States. Specifically, we (1) document our linkage strategy and methodology for inferring migration in IRS records; (2) model selection into and survival across IRS records to determine suitability for research applications; and (3) gauge the efficacy of the IRS records by demonstrating how they can be used to validate and potentially improve migration responses for native-born and foreign-born respondents in ACS microdata. Our results show little evidence of selection or survival bias in the IRS records, suggesting broad generalizability to the nation as a whole. Moreover, we find that the combined IRS 1040, 1099, and W2 records may provide important information on populations, such as the foreign-born, that may be difficult to reach with traditional Census Bureau surveys. Finally, while preliminary, the results of our comparison of IRS and ACS migration responses show that IRS records may be useful in improving ACS migration measurement for respondents whose migration response is proxy, allocated, or imputed. Taking these results together, we discuss the potential application of our longitudinal IRS dataset to innovations in migration research on both the native-born and foreign-born populations of the United States.
  • Working Paper

    The Opportunities and Challenges of Linked IRS Administrative and Census Survey Records in the Study of Migration

    July 2018

    Working Paper Number: carra-2018-06

    This paper details efforts to link administrative records from the Internal Revenue Service (IRS) to American Community Survey (ACS) and 2010 Census microdata for the study of migration in the United States. Specifically, we (1) document our linkage strategy and methodology for inferring migration in IRS records; (2) model selection into and survival across IRS records to determine suitability for research applications; and (3) gauge the efficacy of the IRS records by demonstrating how they can be used to validate and potentially improve migration responses in ACS microdata. Our results show little evidence of selection or survival bias in the IRS records, suggesting broad generalizability to the nation as a whole. Moreover, we find that the combined IRS 1040, 1099, and W2 records may provide important information on populations that are hard to reach with traditional Census surveys. Finally, while preliminary, the results of our comparison of IRS and ACS migration responses show that IRS records may be useful in improving ACS migration measurement for respondents whose migration response is proxy, allocated, or imputed. Taking these results together, we discuss the potential applications of our longitudinal IRS dataset to innovations in migration research in the United States.
  • Working Paper

    The Use of Administrative Records and the American Community Survey to Study the Characteristics of Undercounted Young Children in the 2010 Census

    May 2018

    Working Paper Number: carra-2018-05

    Children under age five are historically one of the most difficult segments of the population to enumerate in the U.S. decennial census. The persistent undercount of young children is highest among Hispanics and racial minorities. In this study, we link 2010 Census data to administrative records from government and third-party data sources, such as Medicaid enrollment data and tenant rental assistance program records from the Department of Housing and Urban Development, to identify differences between children reported and not reported in the 2010 Census. In addition, we link children in administrative records to the American Community Survey to identify various characteristics of households with children under age five who may have been missed in the last census. This research contributes to what is known about the demographic, socioeconomic, and household characteristics of young children undercounted by the census. Our research also sheds light on the potential benefits of using administrative records and surveys to supplement the U.S. Census Bureau's child population enumeration efforts in future decennial censuses.
  • Working Paper

    When Race and Hispanic Origin Reporting are Discrepant Across Administrative Records and Third Party Sources: Exploring Methods to Assign Responses

    December 2015

    Working Paper Number: carra-2015-08

    The U.S. Census Bureau is researching uses of administrative records and third party data in survey and decennial census operations. One potential use of administrative records is to utilize these data when race and Hispanic origin responses are missing. When federal and third party administrative records are compiled, race and Hispanic origin responses are not always the same for an individual across sources. We explore different methods to assign one race and one Hispanic response when these responses are discrepant. We also describe the characteristics of individuals with matching, non-matching, and missing race and Hispanic origin data by demographic, household, and contextual variables. We find that minorities, especially Hispanics, are more likely to have non-matching Hispanic origin and race responses in administrative records and third party data compared to the 2010 Census. Minority groups and individuals ages 0-17 are more likely to have missing race or Hispanic origin data in administrative records and third party data. Larger households tend to have more missing race data in administrative records and third party data than smaller households.
  • Working Paper

    Coverage and Agreement of Administrative Records and 2010 American Community Survey Demographic Data

    November 2014

    Working Paper Number: carra-2014-14

    The U.S. Census Bureau is researching possible uses of administrative records in decennial census and survey operations. The 2010 Census Match Study and American Community Survey (ACS) Match Study represent recent efforts by the Census Bureau to evaluate the extent to which administrative records provide data on persons and addresses in the 2010 Census and 2010 ACS. The 2010 Census Match Study also examines demographic response data collected in administrative records. Building on this analysis, we match data from the 2010 ACS to federal administrative records and third party data as well as to previous census data and examine administrative records coverage and agreement of ACS age, sex, race, and Hispanic origin responses. We find high levels of coverage and agreement for sex and age responses and variable coverage and agreement across race and Hispanic origin groups. These results are similar to findings from the 2010 Census Match Study.