An Evaluation of Using Linked Survey and Administrative Data to Impute Nonfilers to the Population of Tax Return Filers: Working Paper 2017-06
Working Paper
Data from the Census Bureau linked to individual tax returns are used to obtain demographic and income characteristics of filers and nonfilers; results are compared with those obtained using statistical matches of publicly available data.
Administrative tax return data are increasingly used for policy analysis and economic research. A potential weakness of that data source is that not everyone is required to file a tax return, so the data omit nonfilers, even though information on their characteristics is desirable for analyzing various tax policies and tax administration. In this paper, I use data from the Census Bureau’s Current Population Survey (CPS) linked to administrative tax return data to obtain demographic and income characteristics of filers and nonfilers. Those linked data are also used to model an individual’s filing decision. In the absence of linked data, researchers rely on statistical matches of publicly available data (typically from the CPS and a sample of tax returns) to simulate filers and nonfilers in the population. I evaluate two statistical matches on the basis of how similar simulated filers and nonfilers are to filers and nonfilers in the linked data. The first method statistically matches records from the CPS and a public use file of tax returns by predicted income; the second method uses the predicted probability of filing. I find that income and demographic characteristics for simulated filers under both methods are generally similar to those of filers in the linked data, but larger income differences appear between simulated nonfilers and nonfilers in the linked data. Both simulation methods produce simulated nonfilers who have lower income than nonfilers in the linked data, although nonfilers simulated using the predicted probability method have higher income, on average, than those simulated using the predicted income method.
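The abstract does not spell out how either statistical match is implemented. As a rough illustration only, the sketch below shows one simple way to split survey records into simulated filers and nonfilers once each record has a score (for example, a predicted probability of filing): rank the records by score and label the highest-ranked ones as filers until their weighted count reaches a target number of filers, such as a total taken from tax data. The record fields, the `split_filers` and `weighted_mean_income` helpers, the target, and the toy data are all hypothetical; this is not the paper's procedure.

```python
from dataclasses import dataclass


@dataclass
class CpsRecord:
    person_id: int   # hypothetical person identifier
    weight: float    # survey weight: number of people the record represents
    score: float     # hypothetical score, e.g., predicted filing probability
    income: float    # income reported in the survey


def split_filers(records, target_filers):
    """Label the highest-scoring records as simulated filers until their
    cumulative weight reaches target_filers; the rest become simulated nonfilers.

    The record that crosses the target is assigned to filers in full, so the
    weighted filer count can overshoot the target by less than that record's weight.
    """
    ranked = sorted(records, key=lambda r: r.score, reverse=True)
    filers, nonfilers = [], []
    cumulative = 0.0
    for rec in ranked:
        if cumulative < target_filers:
            filers.append(rec)
            cumulative += rec.weight
        else:
            nonfilers.append(rec)
    return filers, nonfilers


def weighted_mean_income(records):
    """Survey-weighted mean income of a group of records."""
    total_weight = sum(r.weight for r in records)
    return sum(r.weight * r.income for r in records) / total_weight


# Toy data, invented for illustration only.
cps = [
    CpsRecord(1, 100.0, 0.9, 50000.0),
    CpsRecord(2, 100.0, 0.2, 5000.0),
    CpsRecord(3, 100.0, 0.7, 30000.0),
    CpsRecord(4, 100.0, 0.4, 12000.0),
]
filers, nonfilers = split_filers(cps, target_filers=200.0)
print([r.person_id for r in filers])      # [1, 3]
print([r.person_id for r in nonfilers])   # [4, 2]
print(weighted_mean_income(nonfilers))    # 8500.0
```

Comparing a statistic such as `weighted_mean_income` for simulated nonfilers against the same statistic for nonfilers identified in linked data mirrors, in miniature, the kind of comparison the abstract describes. An actual statistical match would also carry tax variables over from matched return records, which this simplified sketch leaves out.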