This paper develops two algorithms. Algorithm 1 computes the exact Gaussian log-likelihood function, its exact gradient vector, and an asymptotic approximation of its Hessian matrix for discrete-time, linear, dynamic models in state-space form. Algorithm 2, derived from Algorithm 1, computes the exact sample information matrix of this likelihood function. The computed quantities are analytic (not numerical approximations) and should therefore be useful for reliably, quickly, and accurately: (i) checking local identifiability of parameters by checking the rank of the information matrix; (ii) computing maximum likelihood estimates of parameters with Newton methods, using the gradient vector and Hessian matrix; and (iii) computing asymptotic covariances (Cramér-Rao bounds) of the parameter estimates with the Hessian or the information matrix. The principal contribution of the paper is Algorithm 2, which extends to multivariate models the univariate results of Porat and Friedlander (1986). By relying on the Kalman filter instead of the Levinson-Durbin filter used by Porat and Friedlander, Algorithms 1 and 2 can automatically handle any pattern of missing or linearly aggregated data. Although Algorithm 1 is well known, it is treated in detail in order to make the paper self-contained.
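As a rough illustration of the quantity Algorithm 1 evaluates, the following sketch computes the exact Gaussian log-likelihood of a state-space model via the prediction-error decomposition of the Kalman filter. All model matrices and data here are hypothetical placeholders, and the sketch omits the paper's real contributions (the exact gradient, Hessian approximation, and sample information matrix):

```python
import numpy as np

def kalman_loglik(y, A, C, Q, R, x0, P0):
    """Exact Gaussian log-likelihood of y[0..T-1] for the state-space model
        x[t+1] = A x[t] + w[t],  w[t] ~ N(0, Q)
        y[t]   = C x[t] + v[t],  v[t] ~ N(0, R)
    via the Kalman filter's prediction-error decomposition."""
    x, P = x0, P0                          # one-step-ahead state mean/covariance
    m = C.shape[0]                         # observation dimension
    ll = 0.0
    for yt in y:
        e = yt - C @ x                     # innovation (prediction error)
        S = C @ P @ C.T + R                # innovation covariance
        Sinv = np.linalg.inv(S)
        K = A @ P @ C.T @ Sinv             # Kalman gain (predictor form)
        ll += -0.5 * (m * np.log(2 * np.pi)
                      + np.log(np.linalg.det(S))
                      + e @ Sinv @ e)
        x = A @ x + K @ e                  # combined time/measurement update
        P = A @ P @ A.T + Q - K @ S @ K.T
    return ll

# Example: hypothetical scalar AR(1) state observed with noise
rng = np.random.default_rng(0)
y = [np.array([v]) for v in rng.normal(size=50)]
print(kalman_loglik(y, np.array([[0.5]]), np.array([[1.0]]),
                    np.array([[1.0]]), np.array([[1.0]]),
                    np.array([0.0]), np.array([[1.0]])))
```

Because the filter recursion is unchanged by time-varying `C` and `R`, missing or linearly aggregated observations can be handled by adjusting those matrices period by period, which is the property the abstract highlights.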
-
Disclosure Avoidance Techniques Used for the 1970 through 2010 Decennial Censuses of Population and Housing
November 2018
Working Paper Number:
CES-18-47
The U.S. Census Bureau conducts the decennial censuses under Title 13 of the U.S. Code, with the Section 9 mandate not to 'use the information furnished under the provisions of this title for any purpose other than the statistical purposes for which it is supplied; or make any publication whereby the data furnished by any particular establishment or individual under this title can be identified; or permit anyone other than the sworn officers and employees of the Department or bureau or agency thereof to examine the individual reports' (13 U.S.C. § 9 (2007)). The Census Bureau applies disclosure avoidance techniques to its publicly released statistical products in order to protect the confidentiality of its respondents and their data.
-
Total Error and Variability Measures for the Quarterly Workforce Indicators and LEHD Origin-Destination Employment Statistics in OnTheMap
September 2020
Working Paper Number:
CES-20-30
We report results from the first comprehensive total quality evaluation of five major indicators in the U.S. Census Bureau's Longitudinal Employer-Household Dynamics (LEHD) Program Quarterly Workforce Indicators (QWI): total flow-employment, beginning-of-quarter employment, full-quarter employment, average monthly earnings of full-quarter employees, and total quarterly payroll. Beginning-of-quarter employment is also the main tabulation variable in the LEHD Origin-Destination Employment Statistics (LODES) workplace reports as displayed in OnTheMap (OTM), including OnTheMap for Emergency Management. We account for errors due to coverage; record-level nonresponse; edit and imputation of item missing data; and statistical disclosure limitation. The analysis reveals that the five publication variables under study are estimated very accurately for tabulations involving at least 10 jobs. Tabulations involving three to nine jobs are a transition zone, where cells may be fit for use with caution. Tabulations involving one or two jobs, which are generally suppressed on fitness-for-use criteria in the QWI and synthesized in LODES, have substantial total variability but can still be used to estimate statistics for untabulated aggregates as long as the job count in the aggregate is more than 10.
-
A Simulated Reconstruction and Reidentification Attack on the 2010 U.S. Census: Full Technical Report
December 2023
Authors:
Lars Vilhuber,
John M. Abowd,
Ethan Lewis,
Nathan Goldschlag,
Robert Ashmead,
Daniel Kifer,
Philip Leclerc,
Rolando A. Rodríguez,
Tamara Adams,
David Darais,
Sourya Dey,
Simson L. Garfinkel,
Scott Moore,
Ramy N. Tadros
Working Paper Number:
CES-23-63R
For the last half-century, it has been a common and accepted practice for statistical agencies, including the United States Census Bureau, to adopt different strategies to protect the confidentiality of aggregate tabular data products from those used to protect the individual records contained in publicly released microdata products. This strategy was premised on the assumption that the aggregation used to generate tabular data products made the resulting statistics inherently less disclosive than the microdata from which they were tabulated. Consistent with this common assumption, the 2010 Census of Population and Housing in the U.S. used different disclosure limitation rules for its tabular and microdata publications. This paper demonstrates that, in the context of disclosure limitation for the 2010 Census, the assumption that tabular data are inherently less disclosive than their underlying microdata is fundamentally flawed. The 2010 Census published more than 150 billion aggregate statistics in 180 table sets. Most of these tables were published at the most detailed geographic level: individual census blocks, which can have populations as small as one person. Using only 34 of the published table sets, we reconstructed microdata records including five variables (census block, sex, age, race, and ethnicity) from the confidential 2010 Census person records. Using only published data, an attacker using our methods can verify that all records in 70% of all census blocks (97 million people) are perfectly reconstructed. We further confirm, through reidentification studies, that an attacker can, within census blocks with perfect reconstruction accuracy, correctly infer the actual census response on race and ethnicity for 3.4 million vulnerable population uniques (persons with race and ethnicity different from the modal person on the census block) with 95% accuracy.
Having shown the vulnerabilities inherent to the disclosure limitation methods used for the 2010 Census, we proceed to demonstrate that the more robust disclosure limitation framework used for the 2020 Census publications defends against attacks that are based on reconstruction. Finally, we show that available alternatives to the 2020 Census Disclosure Avoidance System would either fail to protect confidentiality, or would overly degrade the statistics' utility for the primary statutory use case: redrawing the boundaries of all of the nation's legislative and voting districts in compliance with the 1965 Voting Rights Act. You are reading the full technical report. For the summary paper see https://doi.org/10.1162/99608f92.4a1ebf70.
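The reconstruction logic can be illustrated at toy scale. The sketch below uses a hypothetical three-person census block and made-up published tables (the actual attack solves a vastly larger system over 34 table sets and billions of statistics): it enumerates every candidate microdata block consistent with the published aggregates, and when exactly one candidate survives, the block is perfectly reconstructed.

```python
from itertools import combinations_with_replacement
from collections import Counter

# Hypothetical record domain for a tiny block: (sex, age)
SEXES = ["M", "F"]
AGES = [25, 40, 66]
DOMAIN = [(s, a) for s in SEXES for a in AGES]

# The confidential "true" block and the tables an agency might publish
true_block = [("M", 25), ("F", 40), ("F", 66)]

def tables(block):
    """Published aggregates: population count, counts by sex, age sums by sex."""
    by_sex_count = Counter(s for s, _ in block)
    by_sex_agesum = Counter()
    for s, a in block:
        by_sex_agesum[s] += a
    return (len(block), by_sex_count, by_sex_agesum)

published = tables(true_block)

# Enumerate every candidate block of the same size; keep those whose
# tables match the published statistics exactly.
solutions = [c for c in combinations_with_replacement(DOMAIN, 3)
             if tables(c) == published]
print(solutions)  # a unique surviving candidate means perfect reconstruction
```

In this toy example the published aggregates admit exactly one consistent block, so the microdata are recovered exactly; the paper's point is that the 2010 tabular publications were detailed enough for this to happen at scale.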
-
Long Term Effects of Military Service on the Distribution of Earnings
August 2009
Working Paper Number:
CES-09-17
I estimate the long term effect of military service on quantiles of earnings and education using the Vietnam draft lottery eligibility status as an instrument. I compare the local quantile treatment effect estimator studied by Abadie, Angrist, and Imbens (2002) to the instrumental variables quantile regression technique developed by Chernozhukov and Hansen (2008). Ordinary quantile regression shows a large negative association between service in Vietnam and earnings of white men, with the effect increasing in magnitude for the upper quantiles. Quantile treatment effects estimates show the opposite pattern, although much smaller in magnitude, with a small negative effect at the lower end of the distribution, and a small positive effect at the upper end. This suggests the ordinary quantile result is due to heterogeneous selection effects. The two methods of quantile treatment effects estimation give similar results.
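For intuition about the estimand, the following sketch computes unconditional quantile differences between two simulated groups, the simplest analogue of a quantile treatment effect. The data and effect sizes are entirely hypothetical, and this is not the IV estimator of Abadie, Angrist, and Imbens (2002) or of Chernozhukov and Hansen (2008), both of which additionally correct for selection into treatment:

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical log-normal earnings for two groups of men
control = rng.lognormal(mean=10.0, sigma=0.6, size=5000)
treated = rng.lognormal(mean=10.0, sigma=0.6, size=5000) * \
          np.where(rng.random(5000) < 0.5, 0.95, 1.02)  # heterogeneous effect

# Naive quantile "treatment effects": differences in sample quantiles,
# which can vary in sign and size across the distribution
for tau in (0.1, 0.5, 0.9):
    qte = np.quantile(treated, tau) - np.quantile(control, tau)
    print(f"tau={tau}: quantile difference = {qte:,.0f}")
```

The abstract's contrast between ordinary quantile regression and quantile treatment effects is precisely that simple quantile comparisons like this one confound the causal effect with heterogeneous selection into military service.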
-
Disclosure Limitation and Confidentiality Protection in Linked Data
January 2018
Working Paper Number:
CES-18-07
Confidentiality protection for linked administrative data is a combination of access modalities and statistical disclosure limitation. We review traditional statistical disclosure limitation methods and newer methods based on synthetic data, input noise infusion and formal privacy. We discuss how these methods are integrated with access modalities by providing three detailed examples. The first example is the linkages in the Health and Retirement Study to Social Security Administration data. The second example is the linkage of the Survey of Income and Program Participation to administrative data from the Internal Revenue Service and the Social Security Administration. The third example is the Longitudinal Employer-Household Dynamics data, which links state unemployment insurance records for workers and firms to a wide variety of censuses and surveys at the U.S. Census Bureau. For each example, we discuss access modalities, disclosure limitation methods, the effectiveness of those methods, and the resulting analytical validity. The final sections discuss recent advances in access modalities for linked administrative data.
-
Environmental Regulation and Industry Employment: A Reassessment
July 2013
Working Paper Number:
CES-13-36
This paper examines the impact of environmental regulation on industry employment, using a structural model based on data from the Census Bureau's Pollution Abatement Costs and Expenditures Survey. The model was developed in an earlier paper by Morgenstern, Pizer, and Shih (2002; hereafter MPS). We extend MPS by examining additional industries and additional years. We find widely varying estimates across industries, including many implausibly large positive employment effects. We explore several possible explanations for these results, without reaching a satisfactory conclusion. Our results call into question the frequent use of the average impacts estimated by MPS as a basis for calculating the quantitative impacts of new environmental regulations on employment.
-
After the Storm: How Emergency Liquidity Helps Small Businesses Following Natural Disasters
April 2024
Working Paper Number:
CES-24-20
Does emergency credit prevent long-term financial distress? We study the causal effects of government-provided recovery loans to small businesses following natural disasters. The rapid financial injection might enable viable firms to survive and grow or might hobble precarious firms with more risk and interest obligations. We show that the loans reduce exit and bankruptcy, increase employment and revenue, unlock private credit, and reduce delinquency. These effects, especially the crowding-in of private credit, appear to reflect resolving uncertainty about repair. We do not find capital reallocation away from neighboring firms and see some evidence of positive spillovers on local entry.
-
Using Partially Synthetic Microdata to Protect Sensitive Cells in Business Statistics
February 2016
Working Paper Number:
CES-16-10
We describe and analyze a method that blends records from both observed and synthetic microdata into public-use tabulations on establishment statistics. The resulting tables use synthetic data only in potentially sensitive cells. We describe different algorithms, and present preliminary results when applied to the Census Bureau's Business Dynamics Statistics and Synthetic Longitudinal Business Database, highlighting accuracy and protection afforded by the method when compared to existing public-use tabulations (with suppressions).
-
Creating Linked Historical Data: An Assessment of the Census Bureau's Ability to Assign Protected Identification Keys to the 1960 Census
September 2014
Working Paper Number:
carra-2014-12
In order to study social phenomena over the course of the 20th century, the Census Bureau is investigating the feasibility of digitizing historical census records and linking them to contemporary data. However, historical censuses have limited personally identifiable information available to match on. In this paper, I discuss the problems associated with matching older censuses to contemporary data files, and I describe the matching process used to match a small sample of the 1960 census to the Social Security Administration Numeric Identification System.
-
Construction of Regional Input-Output Tables from Establishment-Level Microdata: Illinois, 1982
August 1993
Working Paper Number:
CES-93-12
This paper presents a new method for use in the construction of hybrid regional input-output tables, based primarily on individual returns from the Census of Manufactures. Using this method, input-output tables can be completed at a fraction of the cost and time involved in the completion of a full survey table. Special attention is paid to secondary production, a problem often ignored by input-output analysts. A new method to handle secondary production is presented. The method reallocates the amount of secondary production and its associated inputs, on an establishment basis, based on the assumption that the input structure for any given commodity is determined not by the industry in which the commodity was produced, but by the commodity itself -- the commodity-based technology assumption. A biproportional adjustment technique is used to perform the reallocations.
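A biproportional (RAS) adjustment of the kind used for the reallocations can be sketched as iterative proportional fitting: alternately rescale rows and columns of a nonnegative seed matrix until both sets of marginal totals are matched. The matrix and targets below are toy placeholders, not data from the paper:

```python
import numpy as np

def ras(A, row_targets, col_targets, tol=1e-10, max_iter=1000):
    """Biproportional (RAS) adjustment: rescale nonnegative matrix A so its
    row and column sums match the targets (which must share a grand total)."""
    X = A.astype(float).copy()
    for _ in range(max_iter):
        X *= (row_targets / X.sum(axis=1))[:, None]   # scale each row
        X *= (col_targets / X.sum(axis=0))[None, :]   # scale each column
        if np.allclose(X.sum(axis=1), row_targets, atol=tol):
            break
    return X

# Toy seed matrix with hypothetical row/column targets (grand total 10)
A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
X = ras(A, row_targets=np.array([4.0, 6.0]), col_targets=np.array([5.0, 5.0]))
print(X.round(3))
```

For a strictly positive seed matrix and consistent targets the iteration converges, preserving the seed's relative structure while enforcing the new totals, which is what makes it suitable for reallocating secondary production across establishments.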