Creating Linked Historical Data: An Assessment of the Census Bureau's Ability to Assign Protected Identification Keys to the 1960 Census
September 2014
Working Paper Number: carra-2014-12
Abstract
Document Tags and Keywords
Keywords
Keywords are automatically generated using KeyBERT, a keyword extraction tool that uses BERT embeddings to select contextually relevant keywords.
By analyzing the content of working papers, KeyBERT identifies terms and phrases that capture the essence of the text, highlighting the most significant topics and trends. This approach not only enhances searchability but also provides connections that go beyond potentially domain-specific author-defined keywords.
data, database, census data, microdata, census research, record, matched, matching, population, demography, ancestry, census bureau, census file, records census, ssa, census use, datasets, census records
Tags
Tags are automatically generated using a pretrained language model from spaCy, which performs several tasks, including entity tagging.
The model labels words and phrases by entity type, including "organizations." Filtering for frequent words and phrases labeled as "organizations" identifies the specific institutions, datasets, and other organizations that a paper refers to.
Social Security Administration, Service Annual Survey, Administrative Records, Current Population Survey, 1940 Census, Research Data Center, American Community Survey, Social Security Number, Protected Identification Key, 2010 Census, Minnesota Population Center, Person Validation System, Center for Administrative Records Research, Person Identification Validation System, Center for Administrative Records Research and Applications, SSA Numident, Personally Identifiable Information
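The frequency filter described for the tags can be sketched in a few lines. This is a minimal illustration, not the site's actual pipeline: it assumes the entity model has already produced (text, label) pairs, that organizations carry the label "ORG" (the label spaCy's pretrained English pipelines use), and that a phrase counts as "frequent" when it appears at least `min_count` times. The function name, the sample data, and the threshold are all hypothetical.

```python
from collections import Counter


def frequent_org_tags(entities, min_count=2):
    """Return organization phrases seen at least min_count times, most frequent first.

    `entities` is an iterable of (text, label) pairs, standing in for the
    output of an entity tagger. The min_count cutoff is an assumption.
    """
    counts = Counter(text for text, label in entities if label == "ORG")
    return [text for text, n in counts.most_common() if n >= min_count]


# Hypothetical tagger output for one paper.
labeled = [
    ("Social Security Administration", "ORG"),
    ("1960", "DATE"),
    ("Social Security Administration", "ORG"),
    ("Minnesota Population Center", "ORG"),
    ("Census Bureau", "ORG"),
    ("Census Bureau", "ORG"),
    ("Census Bureau", "ORG"),
]

tags = frequent_org_tags(labeled)
print(tags)
```

Phrases labeled as anything other than an organization, or seen fewer times than the cutoff, are dropped from the tag list.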
Similar Working Papers
Similarity between working papers is determined by an unsupervised neural network model known as Doc2Vec.
Doc2Vec is a model that represents entire documents as fixed-length vectors, allowing for the
capture of semantic meaning in a way that relates to the context of words within the document. The model learns to
associate a unique vector with each document while simultaneously learning word vectors, enabling tasks such as
document classification, clustering, and similarity detection by preserving the order and structure of words. The
document vectors are compared using cosine similarity/distance to determine the most similar working papers.
Papers identified with 🔥 are in the top 20% of similarity.
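The comparison step can be illustrated with a short sketch. It covers only the cosine-similarity ranking, not the Doc2Vec training that produces the vectors, and it is not the site's implementation: the function names, paper ids, and three-dimensional vectors are hypothetical, and real document vectors would be much longer.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def rank_similar(target, candidates, top_n=10):
    """Return (paper_id, similarity) pairs, most similar first."""
    scored = [(pid, cosine_similarity(target, vec)) for pid, vec in candidates.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


# Hypothetical document vectors for the target paper and three candidates.
target_vec = [1.0, 0.0, 0.0]
candidate_vecs = {
    "paper-a": [0.9, 0.1, 0.0],
    "paper-b": [0.0, 1.0, 0.0],
    "paper-c": [0.5, 0.5, 0.0],
}

ranked = rank_similar(target_vec, candidate_vecs)
for paper_id, score in ranked:
    print(paper_id, round(score, 3))
```

Cosine similarity depends only on the direction of the vectors, not their length, so two documents of different sizes can still score as close matches.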
The 10 most similar working papers to the working paper 'Creating Linked Historical Data: An Assessment of the Census Bureau's Ability to Assign Protected Identification Keys to the 1960 Census' are listed below in order of similarity.
- Working Paper: Playing with Matches: An Assessment of Accuracy in Linked Historical Data 🔥
  June 2016
  Working Paper Number: carra-2016-05
  This paper evaluates linkage quality achieved by various record linkage techniques used in historical demography. I create benchmark, or truth, data by linking the 2005 Current Population Survey Annual Social and Economic Supplement to the Social Security Administration's Numeric Identification Syst...
- Working Paper: Person Matching in Historical Files using the Census Bureau's Person Validation System 🔥
  September 2014
  Working Paper Number: carra-2014-11
  The recent release of the 1940 Census manuscripts enables the creation of longitudinal data spanning the whole of the twentieth century. Linked historical and contemporary data would allow unprecedented analyses of the causes and consequences of health, demographic, and economic change. The Census B...
- Working Paper: Where Are Your Parents? Exploring Potential Bias in Administrative Records on Children 🔥
  March 2024
  Working Paper Number: CES-24-18
  This paper examines potential bias in the Census Household Composition Key's (CHCK) probabilistic parent-child linkages. By linking CHCK data to the American Community Survey (ACS), we reveal disparities in parent-child linkages among specific demographic groups and find that characteristics of chil...
- Working Paper: Assessing Coverage and Quality of the 2007 Prototype Census Kidlink Database
  September 2015
  Working Paper Number: carra-2015-07
  The Census Bureau is conducting research to expand the use of administrative records data in censuses and surveys to decrease respondent burden and reduce costs while improving data quality. Much of this research (e.g., Rastogi and O'Hara (2012), Luque and Bhaskar (2014)) hinges on the ability to i...
- Working Paper: The Nature of the Bias When Studying Only Linkable Person Records: Evidence from the American Community Survey
  April 2014
  Working Paper Number: carra-2014-08
  Record linkage across survey and administrative records sources can greatly enrich data and improve their quality. The linkage can reduce respondent burden and nonresponse follow-up costs. This is particularly important in an era of declining survey response rates and tight budgets. Record linkage a...
- Working Paper: Comparison of Child Reporting in the American Community Survey and Federal Income Tax Returns Based on California Birth Records
  September 2024
  Working Paper Number: CES-24-55
  This paper takes advantage of administrative records from California, a state with a large child population and a significant historical undercount of children in Census Bureau data, dependent information in the Internal Revenue Service (IRS) Form 1040 records, and the American Community Survey to c...
- Working Paper: The Person Identification Validation System (PVS): Applying the Center for Administrative Records Research and Applications' (CARRA) Record Linkage Software
  July 2014
  Working Paper Number: carra-2014-01
  The Census Bureau's Person Identification Validation System (PVS) assigns unique person identifiers to federal, commercial, census, and survey data to facilitate linkages across and within files. PVS uses probabilistic matching to assign a unique Census Bureau identifier for each person. The PVS mat...
- Working Paper: Coverage of Children in the American Community Survey Based on California Birth Records
  September 2023
  Working Paper Number: CES-23-46
  The U.S. Census Bureau's American Community Survey (ACS) collects information on individuals and households. The ACS provides survey-based estimates of children drawn from a sample of the U.S. population. However, survey responses may not match administrative records, such as birth records. Birth re...
- Working Paper: Matching Addresses between Household Surveys and Commercial Data
  July 2015
  Working Paper Number: carra-2015-04
  Matching third-party data sources to household surveys can benefit household surveys in a number of ways, but the utility of these new data sources depends critically on our ability to link units between data sets. To understand this better, this report discusses potential modifications to the exist...
- Working Paper: The Use of Administrative Records and the American Community Survey to Study the Characteristics of Undercounted Young Children in the 2010 Census
  May 2018
  Working Paper Number: carra-2018-05
  Children under age five are historically one of the most difficult segments of the population to enumerate in the U.S. decennial census. The persistent undercount of young children is highest among Hispanics and racial minorities. In this study, we link 2010 Census data to administrative records fro...