CREAT: Census Research Exploration and Analysis Tool

Are Customs Records Consistent Across Countries? Evidence from the U.S. and Colombia

March 2020

Working Paper Number:

CES-20-11

Abstract

In many countries, official customs records include identifying information on the exporting and importing firms involved in each shipment. This information allows researchers to study international business networks, offshoring patterns, and the micro-foundations of aggregate trade flows. It also provides the government with a basis for tariff assessments at the border. However, there are no mechanisms in place to ensure that the shipment-level information recorded by the exporting country is consistent with the shipment-level information recorded by the importing country. And to the extent that there are discrepancies, it is not clear how prevalent they are or what form they take. In this paper we explore these issues, both to enhance our understanding of the limitations of customs records, and to inform future discussions of possible revisions in the way they are collected. Specifically, we match U.S.-bound export shipments that appear in Colombian Customs records (DIAN) with their counterparts in the US Customs records (LFTTD): U.S. import shipments from Colombia. Several patterns emerge. First, differences in the coverage of the two countries customs records lead to significant discrepancies in the official bilateral trade flow statistics of these two countries: the DIAN database records 8 percent fewer transactions than the LFTTD database over the sample period, and the average export shipment size in the DIAN is roughly 4 percent smaller than the corresponding import shipment size in the LFTTD. These discrepancies are not due to difference in minimum shipment sizes and they are not particular to a few sectors, though they are more common among small shipments and they evolve over time. Second, if we rely exclusively on firms' names and addresses, ignoring other shipment characteristics (value, product code, etc.), we are able to match 85 percent of the value of U.S. imports from Colombia in our LFTTD sample with particular Colombian suppliers in the DIAN. Further, fully 97 percent of the value of Colombian exports to the U.S. can be mapped onto particular importers in the U.S. LFTTD. Third, however, match rates at the shipment level within buyer-seller pairs are low. That is, while buyers and sellers can be paired up fairly accurately, only 25-30 percent of the individual transactions in the customs records of the two countries can be matched using fuzzy algorithms at reasonable tolerance levels. Fourth, the manufacturer ID (MANUF_ID) that appears in the LFTTD implies there are roughly twice as many Colombian exporters as actually appear in the DIAN. And similar comments apply to an analogous MANUF_ID variable constructed from importer name and address information in the DIAN. Hence studies that treat each MANUF_ID value as a distinct firm are almost surely overstating the number of foreign firms that engage in trade with the U.S. by a substantial amount. Finally, we conclude that if countries were to require that exporters report standardized shipment identifiers'either invoice numbers or bill of lading/air waybill numbers'it would be far easier to track individual transactions and to identify international discrepancies in reporting.

Document Tags and Keywords

Keywords Keywords are automatically generated using KeyBERT, a powerful and innovative keyword extraction tool that utilizes BERT embeddings to ensure high-quality and contextually relevant keywords.

By analyzing the content of working papers, KeyBERT identifies terms and phrases that capture the essence of the text, highlighting the most significant topics and trends. This approach not only enhances searchability but provides connections that go beyond potentially domain-specific author-defined keywords.
:
report, import, export, shipment, tariff, exporting, exporter, importing, foreign, multinational, exported, imported, custom, importer

Tags Tags are automatically generated using a pretrained language model from spaCy, which excels at several tasks, including entity tagging.

The model is able to label words and phrases by part-of-speech, including "organizations." By filtering for frequent words and phrases labeled as "organizations", papers are identified to contain references to specific institutions, datasets, and other organizations.
:
Bureau of Labor Statistics, National Science Foundation, Service Annual Survey, Center for Economic Studies, Office of Management and Budget, Bureau of Economic Analysis, Employer Identification Number, Department of Labor, Business Register, United Nations, Longitudinal Firm Trade Transactions Database, Customs and Border Protection, General Education Development, Michigan Institute for Data Science

Similar Working Papers Similarity between working papers are determined by an unsupervised neural network model know as Doc2Vec.

Doc2Vec is a model that represents entire documents as fixed-length vectors, allowing for the capture of semantic meaning in a way that relates to the context of words within the document. The model learns to associate a unique vector with each document while simultaneously learning word vectors, enabling tasks such as document classification, clustering, and similarity detection by preserving the order and structure of words. The document vectors are compared using cosine similarity/distance to determine the most similar working papers. Papers identified with 🔥 are in the top 20% of similarity.

The 10 most similar working papers to the working paper 'Are Customs Records Consistent Across Countries? Evidence from the U.S. and Colombia' are listed below in order of similarity.