-
Developing a Residence Candidate File for Use With Employer-Employee Matched Data
January 2017
Working Paper Number:
CES-17-40
This paper describes the Longitudinal Employer-Household Dynamics (LEHD) program's ongoing efforts to use administrative records in a predictive model that describes residence locations for workers. This project was motivated by the discontinuation of a residence file produced elsewhere at the U.S. Census Bureau. The goal of the Residence Candidate File (RCF) process is to provide the LEHD Infrastructure Files with residence information that maintains currency with the changing state of administrative sources and represents uncertainty in location as a probability distribution. The discontinued file provided only a single residence per person/year, even when contributing administrative data may have contained multiple residences. This paper describes the motivation for the project, our methodology, the administrative data sources, the model estimation and validation results, and the file specifications. We find that the best prediction of the person-place model provides similar, but superior, accuracy compared with previous methods and performs well for workers in the LEHD jobs frame. We outline possibilities for further improvement in sources and modeling as well as recommendations on how to use the preference weights in downstream processing.
-
Small Business Growth and Failure during the Great Recession: The Role of House Prices, Race & Gender
November 2016
Working Paper Number:
carra-2016-08
Using 2002-2011 data from the Longitudinal Business Database linked to the 2002 and 2007 Survey of Business Owners, this paper explores whether (through a collateral channel) the rise in home prices over the early 2000s and their subsequent fall associated with the Great Recession had differential impacts on business performance across owner race, ethnicity and gender. We find that the employment growth rate of minority-owned firms, particularly black- and Hispanic-owned firms, is more sensitive to changes in house prices than is that of their nonminority-owned counterparts.
-
Evaluating the Use of Commercial Data to Improve Survey Estimates of Property Taxes
August 2016
Working Paper Number:
carra-2016-06
While commercial data sources offer promise to statistical agencies for use in production of official statistics, challenges can arise as the data are not collected for statistical purposes. This paper evaluates the use of 2008-2010 property tax data from CoreLogic, Inc. (CoreLogic), aggregated from county and township governments from around the country, to improve 2010 American Community Survey (ACS) estimates of property tax amounts for single-family homes. Particularly, the research evaluates the potential to use CoreLogic to reduce respondent burden, to study survey response error and to improve adjustments for survey nonresponse. The research found that the coverage of the CoreLogic data varies between counties as does the correspondence between ACS and CoreLogic property taxes. This geographic variation implies that different approaches toward using CoreLogic are needed in different areas of the country. Further, large differences between CoreLogic and ACS property taxes in certain counties seem to be due to conceptual differences between what is collected in the two data sources. The research examines three counties, Clark County, NV, Philadelphia County, PA and St. Louis County, MO, and compares how estimates would change with different approaches using the CoreLogic data. Mean county property tax estimates are highly sensitive to whether ACS or CoreLogic data are used to construct estimates. Using CoreLogic data in imputation modeling for nonresponse adjustment of ACS estimates modestly improves the predictive power of imputation models, although estimates of county property taxes and property taxes by mortgage status are not very sensitive to the imputation method.
-
A Loan by any Other Name: How State Policies Changed Advanced Tax Refund Payments
June 2016
Working Paper Number:
carra-2016-04
In this work, I examine the impact of state-level regulation of Refund Anticipation Loans (RALs) on the increase in the use of Refund Anticipation Checks (RACs) and on taxpayer outcomes. Both RALs and RACs are products offered by tax preparers that provide taxpayers with an earlier refund (in the case of a RAL) or a temporary bank account from which tax preparation fees can be deducted (in the case of a RAC). Each product is costly compared with the value of the refund, and they are often marketed to low-income taxpayers who may be liquidity constrained or unbanked. States have responded to the potentially predatory nature of RALs through regulation, leading to a switch to RACs. Using zip-code-level tax data, I examine the effects of various state-level policies on RAL activity and the transition of tax preparers to RACs. I then specifically analyze New Jersey's interest rate cap on RALs, a regulation that was accompanied by greater enforcement of existing tax-preparer regulations. Employing an empirical strategy that uses variation in taxpayer location, which should be uninfluenced by tax preparers' decisions to provide these products and a state's decision to regulate them, I find increases in RAL and RAC use for taxpayers living near New Jersey's border with another state. Furthermore, I find that these same border taxpayers reported more social program use and more persons per household, a finding that is in line with the results of similar research into the effects of short-term borrowing on family finances.
-
Assessing Coverage and Quality of the 2007 Prototype Census Kidlink Database
September 2015
Working Paper Number:
carra-2015-07
The Census Bureau is conducting research to expand the use of administrative records data in censuses and surveys to decrease respondent burden and reduce costs while improving data quality. Much of this research (e.g., Rastogi and O'Hara (2012), Luque and Bhaskar (2014)) hinges on the ability to integrate multiple data sources by linking individuals across files. One of the Census Bureau's record linkage methodologies for data integration is the Person Identification Validation System or PVS. PVS assigns anonymous and unique IDs (Protected Identification Keys or PIKs) that serve as linkage keys across files. Prior research showed that integrating 'known associates' information into PVS's reference files could potentially enhance PVS's PIK assignment rates. The term 'known associates' refers to people who are likely to be associated with each other because of a known common link (such as family relationships or people sharing a common address), and thus, to be observed together in different files. One of the results from this prior research was the creation of the 2007 Census Kidlink file, a child-level file linking a child's Social Security Number (SSN) record to the SSN of those identified as the child's parents. In this paper, we examine to what extent the 2007 Census Kidlink methodology was able to link parents' SSNs to children's SSN records, and also evaluate the quality of those links. We find that in approximately 80 percent of cases, at least one parent was linked to the child's record. Younger children and noncitizens have a higher percentage of cases where neither parent could be linked to the child. Using 2007 tax data as a benchmark, our quality evaluation results indicate that in at least 90 percent of the cases, the parent-child link agreed with those found in the tax data. Based on our findings, we propose improvements to the 2007 Kidlink methodology to increase child-parent links, and discuss how the creation of the file could be operationalized moving forward.
-
Business Dynamics of Innovating Firms: Linking U.S. Patents with Administrative Data on Workers and Firms
July 2015
Working Paper Number:
CES-15-19
This paper discusses the construction of a new longitudinal database tracking inventors and patent-owning firms over time. We match granted patents between 2000 and 2011 to administrative databases of firms and workers housed at the U.S. Census Bureau. We use inventor information in addition to the patent assignee firm name to improve on previous efforts linking patents to firms. The triangulated database allows us to maximize match rates and provide validation for a large fraction of matches. In this paper, we describe the construction of the database and explore basic features of the data. We find patenting firms, particularly young patenting firms, disproportionately contribute jobs to the U.S. economy. We find patenting is a relatively rare event among small firms but that most patenting firms are nevertheless small, and that patenting is not as rare an event for the youngest firms compared to the oldest firms. While manufacturing firms are more likely to patent than firms in other sectors, we find most patenting firms are in the services and wholesale sectors. These new data are a product of collaboration within the U.S. Department of Commerce, between the U.S. Census Bureau and the U.S. Patent and Trademark Office.
-
Matching Addresses between Household Surveys and Commercial Data
July 2015
Working Paper Number:
carra-2015-04
Matching third-party data sources to household surveys can benefit household surveys in a number of ways, but the utility of these new data sources depends critically on our ability to link units between data sets. To understand this better, this report discusses modifications to the existing match process that could improve our matches. While many changes to the matching procedure produce marginal improvements in match rates, substantial increases in match rates can only be achieved by relaxing the definition of a successful match. In the end, the results show that the most important factor determining the success of matching procedures is the quality and composition of the data sets being matched.
-
Exploring Administrative Records Use for Race and Hispanic Origin Item Non-Response
December 2014
Working Paper Number:
carra-2014-16
Race and Hispanic origin data are required to produce official statistics in the United States. Data collected through the American Community Survey and decennial census address missing data through traditional imputation methods, often relying on information from neighbors. These methods work well if neighbors share similar characteristics; however, the shape and patterns of neighborhoods in the United States are changing. Administrative records may provide more accurate data compared to traditional imputation methods for missing race and Hispanic origin responses. This paper first describes the characteristics of persons with missing demographic data, then assesses the coverage of administrative records data for respondents who do not answer race and Hispanic origin questions in Census data. The paper also discusses the distributional impact of using administrative records race and Hispanic origin data to complete missing responses in a decennial census or survey context.
-
The EITC over the business cycle: Who benefits?
December 2014
Working Paper Number:
carra-2014-15
In this paper, I examine the impact of the Great Recession on Earned Income Tax Credit (EITC) eligibility. Because the EITC is structurally tied to earnings, the direction of this impact is not immediately obvious. Families who experience complete job loss for an entire tax year lose eligibility, while those experiencing underemployment (part-year employment, a reduction in hours, or spousal unemployment in married households) may become eligible. Determining the direction and magnitude of the impact is important for a number of reasons. The EITC has become the largest cash-transfer program in the U.S., and many low-earning families rely on it as a means of support in tough times. The program has largely been viewed as a replacement for welfare, enticing former welfare recipients into the labor force. However, the effectiveness of the EITC during a period of very high unemployment has not been assessed. To answer these questions, I first use the Current Population Survey (CPS) matched to Internal Revenue Service data from tax years 2005 to 2010 to assess patterns of employment and eligibility over the Great Recession for different labor-force groups. Results indicate that overall, EITC eligibility increased over the recession, but only among groups that were cushioned from total household earnings loss by marriage. I also use the 2006 CPS matched to tax data from 2005 through 2011 to examine changes in eligibility experienced by individuals over time. In assessing three competing causes of eligibility loss, I find that less-educated, unmarried women experienced a greater hazard of eligibility loss due to a yearlong lack of earnings compared with other labor-market groups. I discuss the implications of these findings on the view of the EITC as a safety-net program.
-
Do Doubled-up Families Minimize Household-level Tax Burden?
September 2014
Working Paper Number:
carra-2014-13
This paper examines a method of tax avoidance not previously studied: the sorting of dependent children among related filers who have 'doubled up' in a household for economic reasons. Using the Current Population Survey Annual Social and Economic Supplement (CPS ASEC) linked with 1040 data from the Internal Revenue Service (IRS), we examine households with children and at least two adult tax filers to determine whether the household minimizes income tax burden, and thus maximizes refunds, by optimally claiming dependents. We examine specifically the relationship between the Earned Income Tax Credit (EITC) and the sorting of dependent children among filers in households. We find the following: the propensity to sort increases as the number of filers who are potentially eligible for the EITC increases; sorting probability increases as the optimal household EITC amount increases; and among households with at least one EITC-eligible filer, the propensity to sort increases as the difference between modeled household EITC amount and the optimal amount increases. We also exploit the 2009 change in EITC benefit for families with three or more children, finding that the propensity to sort to exactly three children increased among EITC-eligible filers after the rule change. The results of this analysis improve our understanding of filing behavior, particularly how households form filing units and pool resources, and have implications for poverty measurement in complex households. This presentation was given at the CARRA Seminar, July 16, 2014.