-
Producing U.S. Population Statistics Using Multiple Administrative Sources
November 2023
Working Paper Number:
CES-23-58
We identify several challenges encountered when constructing U.S. administrative record-based (AR-based) population estimates for 2020. Though the AR estimates are higher than the 2020 Census at the national level, they are over 15 percent lower in 5 percent of counties, suggesting that locational accuracy can be improved. Other challenges include how to achieve comprehensive coverage, maintain consistent coverage across time, filter out nonresidents and people not alive on the reference date, uncover missing links across person and address records, and predict demographic characteristics when multiple ones are reported or when they are missing. We discuss several ways of addressing these issues, e.g., building in redundancy with more sources, linking children to their parents' addresses, and conducting additional record linkage for people without Social Security Numbers and for addresses not initially linked to the Census Bureau's Master Address File. We discuss modeling to predict lower levels of geography for people lacking those geocodes, the probability that a person is a U.S. resident on the reference date, the probability that an address is the person's residence on the reference date, and the probability a person is in each demographic characteristic category. Regression results illustrate how many of these challenges and solutions affect the AR county population estimates.
-
The Local Origins of Business Formation
July 2023
Working Paper Number:
CES-23-34
What locations generate more business ideas, and where are ideas more likely to turn into businesses? Using comprehensive administrative data on business applications, we analyze the spatial disparity in the creation of business ideas and the formation of new employer startups from these ideas. Startups per capita exhibit enormous variation across granular units of geography. We decompose this variation into variation in ideas per capita and in their rate of transition to startups, and find that both components matter. Observable local demographic, economic, financial, and business conditions account for a significant fraction of the variation in startups per capita, and more so for the variation in ideas per capita than in transition rate. Income, education, age, and foreign-born share are generally strong positive correlates of both idea generation and transition. Overall, the relationship of local conditions with ideas differs from that with transition rate in magnitude and, sometimes, in sign: certain conditions (notably, the African-American share of the population) are positively associated with ideas, but negatively with transition rates. We also find a close correspondence between the actual rank of locations in terms of startups per capita and the predicted rank based only on observable local conditions, a result useful for characterizing locations with high startup activity.
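As an illustration of the decomposition described in this abstract, startups per capita can be written as ideas per capita times the transition rate, so the variance of the log of startups per capita splits into an ideas term, a transition term, and a covariance term. The sketch below assumes this log-variance form; the abstract does not state which decomposition the paper uses, and the function name and sample figures are hypothetical.

```python
import math


def log_variance_decomposition(population, ideas, startups):
    """Split Var(log startups per capita) into ideas, transition, and covariance parts.

    Assumption (not from the paper): the identity
        startups / population = (ideas / population) * (startups / ideas)
    is taken in logs, so Var(log s) = Var(log i) + Var(log t) + 2 Cov(log i, log t).
    """
    log_i = [math.log(i / p) for i, p in zip(ideas, population)]
    log_t = [math.log(s / i) for s, i in zip(startups, ideas)]
    log_s = [a + b for a, b in zip(log_i, log_t)]

    def cov(x, y):
        # Population covariance; cov(x, x) is the population variance.
        mx = sum(x) / len(x)
        my = sum(y) / len(y)
        return sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)

    return {
        "total": cov(log_s, log_s),
        "ideas": cov(log_i, log_i),
        "transition": cov(log_t, log_t),
        "covariance": 2 * cov(log_i, log_t),
    }


# Hypothetical counts for three locations, for illustration only.
parts = log_variance_decomposition(
    population=[1000, 2000, 1500],
    ideas=[50, 60, 90],
    startups=[5, 12, 9],
)
```

The three components sum to the total by construction, so the share of each in the total indicates how much of the spatial variation runs through idea generation versus transition.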
-
Propagation and Amplification of Local Productivity Spillovers
August 2022
Working Paper Number:
CES-22-32
This paper shows that local productivity spillovers can propagate throughout the economy through the plant-level networks of multi-region firms. Using confidential Census plant-level data, we find that large manufacturing plant openings not only raise the productivity of local plants but also of distant plants hundreds of miles away, which belong to multi-region firms that are exposed to the local productivity spillover through one of their plants. To quantify the significance of plant-level networks for the propagation and amplification of local productivity shocks, we develop and estimate a quantitative spatial model in which plants of multi-region firms are linked through shared knowledge. Counterfactual exercises show that while knowledge sharing through plant-level networks amplifies the aggregate effects of local productivity shocks, it can widen economic disparities between workers and regions in the economy.
-
Changes in Metropolitan Area Definition, 1910-2010
February 2021
Working Paper Number:
CES-21-04
The Census Bureau was established as a permanent agency in 1902, as industrialization and urbanization were bringing about rapid changes in American society. The years following the establishment of a permanent Census Bureau saw the first attempts at devising statistical geography for tabulating statistics for large cities and their environs. These efforts faced several challenges owing to the variation in settlement patterns, political organization, and rates of growth across the United States. The 1910 census proved to be a watershed, as the Census Bureau offered a definition of urban places, established the first census tract boundaries for tabulating data within cities, and introduced the first standardized metropolitan area definition. It was not until the middle of the twentieth century, however, that the Census Bureau, in association with other statistical agencies, established a flexible standard metropolitan definition and a more consistent means of tabulating urban data. Since 1950, the rules for determining the cores and extent of metropolitan areas have been largely regarded as comparable. In the decades that followed, however, a number of rule changes were put into place that accounted for metropolitan complexity in differing ways, and these have been the cause of some confusion. Changes put into effect with the 2000 census represent a consensus of sorts for how to handle these issues.
-
Validating Abstract Representations of Spatial Population Data while considering Disclosure Avoidance
February 2020
Working Paper Number:
CES-20-05
This paper furthers a research agenda for modeling populations along spatial networks and expands an earlier empirical analysis to a full U.S. county (Gaboardi, 2019, Ch. 1, 2). Specific foci are the necessity of, and methods for, validating and benchmarking spatial data when conducting social science research with aggregated and ambiguous population representations. To promote the validation of publicly available data, access to highly restricted census microdata was requested and granted in order to determine the levels of accuracy and error associated with a network-based population modeling framework. Primary findings reinforce the utility of a novel network allocation method, populated polygons to networks (pp2n), in terms of accuracy, computational complexity, and real runtime (Gaboardi, 2019, Ch. 2). Also, a pseudo-benchmark dataset's performance against the true census microdata shows promise in modeling populations along networks.
-
Reservation Nonemployer and Employer Establishments: Data from U.S. Census Longitudinal Business Databases
December 2018
Working Paper Number:
CES-18-50
The presence of businesses on American Indian reservations has been difficult to analyze due to limited data. Akee, Mykerezi, and Todd (AMT; 2017) geocoded confidential data from the U.S. Census Longitudinal Business Database to identify whether employer establishments were located on or off American Indian reservations and then compared federally recognized reservations and nearby county areas with respect to their per capita number of employers and jobs. We use their methods and the U.S. Census Integrated Longitudinal Business Database to develop parallel results for nonemployer establishments and for the combination of employer and nonemployer establishments. Like AMT, we find that reservations and nearby county areas have a similar sectoral distribution of nonemployer and nonemployer-plus-employer establishments, but reservations have significantly fewer of them in nearly all sectors, especially when the area population is below 15,000. In contrast to AMT's findings, the average size of reservation nonemployer establishments, as measured by revenue (instead of the jobs measure AMT used for employers), is smaller than the size of nonemployers in nearby county areas, and this is true in most industries as well. The most significant exception is in the retail sector. Geographic and demographic factors, such as population density and per capita income, statistically account for only a small portion of these differences. However, when we assume that nonemployer establishments create the equivalent of one job and use combined employer-plus-nonemployer jobs to measure establishment size, the employer job numbers dominate and we parallel AMT's finding that, due to large job counts in the Arts/Entertainment/Recreation and Public Administration sectors, reservations on average have slightly more jobs per resident than nearby county areas.
-
Locally Owned Bank Commuting Zone Concentration and Employer Start-Ups in Metropolitan, Micropolitan and Non-Core Rural Commuting Zones from 1970-2010
August 2018
Working Paper Number:
CES-18-34
Access to financial capital is vital for the sustainability of the local business sector in metropolitan and nonmetropolitan communities. Recent research on the restructuring of the financial industry from locally owned banks to interstate conglomerates has raised questions about the impact on rural economies. In this paper, we begin our exploration of the Market Concentration Hypothesis and the Local Bank Hypothesis. The former proposes that there is a negative relationship between the percent of banks that are locally owned in the local economy and the rate of business births and continuations, and a positive effect on business deaths, while the latter proposes that there is a positive relationship between the percent of banks that are locally owned in the local economy and the rate of business births and continuations, and a negative effect on business deaths. To test these hypotheses, we examine the impact of bank ownership concentration (percent of banks that are locally owned in a commuting zone) on business establishment births and deaths in metropolitan, micropolitan and non-core rural commuting zones. We employ panel regression models for the 1980-2010 time frame, demonstrating robustness to several specifications and spatial spillover effects. We find that local bank concentration is positively related to business dynamism in rural commuting zones, providing support to the importance of relational lending in rural areas, while finding support for the importance of market concentration in urban areas. The implications of this research are important for rural sociology, regional economics, and finance.
-
Who are the people in my neighborhood? The 'contextual fallacy' of measuring individual context with census geographies
February 2018
Working Paper Number:
CES-18-11
Scholars deploy census-based measures of neighborhood context throughout the social sciences and epidemiology. Decades of research confirm that variation in how individuals are aggregated into geographic units to create variables that control for social, economic or political contexts can dramatically alter analyses. While most researchers are aware of the problem, they have lacked the tools to determine its magnitude in the literature and in their own projects. By using confidential access to the complete 2010 U.S. Decennial Census, we are able to construct, for all persons in the US, individual-specific contexts, which we group according to the Census-assigned block, block group, and tract. We compare these individual-specific measures to the published statistics at each scale, and we then determine the magnitude of variation in context for an individual with respect to the published measures using a simple statistic, the standard deviation of individual context (SDIC). For three key measures (percent Black, percent Hispanic, and Entropy, a measure of ethno-racial diversity), we find that block-level Census statistics frequently do not capture the actual context of individuals within them. More problematic, we uncover systematic spatial patterns in the contextual variables at all three scales. Finally, we show that within-unit variation is greater in some parts of the country than in others. We publish county-level estimates of the SDIC statistics that enable scholars to assess whether mis-specification in context variables is likely to alter analytic findings when measured at any of the three common Census units.
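To make the two statistics named in this abstract concrete, the sketch below computes a Shannon-style entropy over group shares and a standard deviation of individual-specific context values within one Census unit. Both definitions are assumptions for illustration: the abstract does not give the paper's exact formulas, and the function names and sample values are hypothetical.

```python
import math
import statistics


def entropy(shares):
    """Shannon entropy of group shares (assumed form of the diversity measure).

    Shares should sum to 1; zero shares contribute nothing to the sum.
    """
    return -sum(p * math.log(p) for p in shares if p > 0)


def sdic(individual_contexts):
    """Standard deviation of individual context within one unit.

    Assumption: each value is one resident's individual-specific context
    (for example, percent Black among that person's nearest neighbors),
    and the statistic is the population standard deviation of those values.
    """
    return statistics.pstdev(individual_contexts)


# Hypothetical block with four residents whose individual contexts differ
# even though a single published block-level figure would be reported.
block_contexts = [0.10, 0.20, 0.40, 0.50]
spread = sdic(block_contexts)

# Two equally sized groups give the maximum entropy for two groups, log(2).
two_group_entropy = entropy([0.5, 0.5])
```

A unit in which every resident shares the same context has a spread of zero, so larger values flag units where the published statistic is a poorer proxy for what individuals actually experience.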
-
Reservation Employer Establishments: Data from the U.S. Census Longitudinal Business Database
January 2017
Working Paper Number:
CES-17-57
The presence of employers and jobs on American Indian reservations has been difficult to analyze due to limited data. We are the first to geocode confidential data on employer establishments from the U.S. Census Longitudinal Business Database to identify location on or off American Indian reservations. We identify the per capita establishment count and jobs in reservation-based employer establishments for most federally recognized reservations. Comparisons to nearby non-reservation areas in the lower 48 states across 18 industries reveal that reservations have a similar sectoral distribution of employer establishments but have significantly fewer of them in nearly all sectors, especially when the area population is below 15,000 (as it is on the vast majority of reservations and for the majority of the reservation population). By contrast, the total number of jobs provided by reservation establishments is, on average, on par with or somewhat higher than in nearby county areas but is concentrated among casino-related and government employers. An implication is that average job numbers per establishment are higher in these sectors on reservations, including those with populations below 15,000, while the remaining industries are typically sparser within reservations (in firm count and jobs per capita). Geographic and demographic factors, such as population density and per capita income, statistically account for some but not all of these differences.
-
Geography in Reduced Form
January 2017
Working Paper Number:
CES-17-10
Geography models have introduced and estimated a set of competing explanations for the persistent relationships between firm and location characteristics, but cannot identify these forces. I introduce a solution method for models in arbitrary geographies that generates reduced-form predictions and tests to identify forces acting through geographic linkages. This theoretical approach creates a new strategy for spatial empirics. Using the correct observables, the model shows that geographic forces can be taken into account without being directly estimated; establishment and employment density emerge as sufficient statistics for all geographic forces. I present two applications. First, the model can be used to evaluate whether geographic linkages matter and when simplified models suffice: the mono-centric model is a good fit for business services firms but cannot capture the geography of manufactures. Second, the model generates reduced-form tests that distinguish between spillovers and firm sorting and finds evidence of sorting.