Selected Research & Analysis: Data Sets, Linkages, Quality, and Evaluation
See also related Statistics & Data Files.
Research has shown that survey-reported pension and retirement income measures may suffer from reporting errors, which lead to biased estimates of income and poverty of the aged population. In this paper, the authors evaluate income estimates from the Census Bureau's 2016 Current Population Survey (CPS) Annual Social and Economic Supplement (ASEC). The authors compare 2016 CPS ASEC public-use data with public-use survey data from the 2016 Health and Retirement Study (HRS) and with CPS ASEC data that have been merged with administrative data from the Internal Revenue Service (IRS) and the Social Security Administration. They find that for the population aged 65 or older, supplementing the CPS ASEC with IRS and Social Security administrative data results in a higher estimate of pension income's share of aggregate income, less estimated reliance on Social Security, and a lower estimated rate of poverty. They also find that the HRS provides better estimates of the income of the aged population than the public-use CPS data.
When Every Dollar Counts: Comparing Reported Earnings of Social Security Disability Program Beneficiaries in Survey and Administrative Records
This article examines differences between survey- and administrative data–based estimates of employment and earnings for a sample of Social Security Administration (SSA) disability program beneficiaries. The analysis uses linked records from SSA's National Beneficiary Survey and administrative data from the agency's Master Earnings File. The authors find that estimated employment rates and earnings levels based on administrative data are higher than those based on survey data for beneficiaries overall and by sociodemographic subgroup. In proportional terms, the differences between survey and administrative data tend to be greater among subgroups with survey-reported employment rates that are lower than that of beneficiaries overall.
The Longevity Visualizer: An Analytic Tool for Exploring the Cohort Mortality Data Produced by the Office of the Chief Actuary
This note introduces the Longevity Visualizer (LV), a visual-analysis tool that enables users to explore various applications of cohort life-table data compiled and calculated by the Social Security Administration's Office of the Chief Actuary. The LV presents the life-table data in two series—survival functions and age-at-death probability distributions—each of which is generated for each potential age and each sex across a long range of historical and projected birth cohorts. The LV is designed to make complex longevity projections accessible to analysts and researchers, as well as to individuals making financial and retirement plans.
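The two series named above are related by simple arithmetic, which the following minimal sketch illustrates. It assumes a list of one-year death probabilities for a single cohort and sex; the function name and the input values are hypothetical and are not taken from the Office of the Chief Actuary's tables or from the LV itself.

```python
from typing import List, Tuple


def survival_and_age_at_death(q: List[float]) -> Tuple[List[float], List[float]]:
    """Derive a survival function and an age-at-death distribution.

    q[i] is the assumed probability of dying during year i, conditional
    on being alive at the start of year i (counted from the starting age).
    """
    survival = [1.0]          # probability of being alive at the start of each year
    deaths: List[float] = []  # unconditional probability of dying in each year
    for qx in q:
        alive = survival[-1]
        deaths.append(alive * qx)
        survival.append(alive * (1.0 - qx))
    return survival, deaths


# Hypothetical death probabilities, for illustration only.
q_example = [0.1, 0.2, 0.5, 1.0]
survival, deaths = survival_and_age_at_death(q_example)
# survival is approximately [1.0, 0.9, 0.72, 0.36, 0.0]
# deaths is approximately [0.1, 0.18, 0.36, 0.36] and sums to 1
```

Because the last hypothetical probability is 1.0, the age-at-death probabilities sum to one and the survival function reaches zero.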
Why Researchers Now Rely on Surveys for Race Data on OASDI and SSI Programs: A Comparison of Four Major Surveys
Policy interest in the sociodemographic characteristics of beneficiaries of the Old-Age, Survivors, and Disability Insurance (OASDI) and Supplemental Security Income (SSI) programs is increasing as the minority share of the senior and disabled population grows. This note discusses using four major surveys (the Current Population Survey, the Survey of Income and Program Participation, the American Community Survey, and the Health and Retirement Study) to examine OASDI and SSI program use by race and ethnicity. Survey profiles highlight each survey's history, design, and methodology; the categories with which it collects race and ethnicity data; and its strengths and limitations for analyzing SSA's program data.
Comparing Earnings Estimates from the 2006 Earnings Public-Use File and the Annual Statistical Supplement
The Social Security Administration recently released the 2006 Earnings Public-Use File (EPUF). The EPUF contains earnings information for individuals drawn from a systematic random 1-percent sample of all Social Security numbers issued before January 2007. This note presents the process of evaluating the earnings data in EPUF. It also identifies and explains four key differences between the data in EPUF and the estimates published in the Annual Statistical Supplement to the Social Security Bulletin. The note specifically compares EPUF data with Annual Statistical Supplement estimates of earnings, number of workers with earnings, median earnings by sex and age group, and percentage of workers with earnings below the taxable maximum by sex. After accounting for the expected differences, the remaining discrepancies between EPUF and Annual Statistical Supplement estimates are relatively small.
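As a generic illustration of what a systematic random sample is, the sketch below picks a random starting position and then takes every 100th element of an ordered list, which yields roughly a 1-percent sample. This is an assumption-laden textbook version: the function name and the toy population are hypothetical, and the actual selection procedure used for EPUF is not described here.

```python
import random
from typing import List, Sequence


def systematic_sample(population: Sequence[int], interval: int, seed: int = 0) -> List[int]:
    """Select every `interval`-th element after a random start.

    An interval of 100 gives an approximately 1-percent sample.
    """
    rng = random.Random(seed)
    start = rng.randrange(interval)  # random start in [0, interval)
    return list(population[start::interval])


# Toy ordered population standing in for a list of record identifiers.
toy_population = list(range(1000))
one_percent = systematic_sample(toy_population, interval=100)
# len(one_percent) == 10; consecutive selections are exactly 100 apart
```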
This article introduces the 2006 Earnings Public-Use File (EPUF), a data file containing earnings records for individuals drawn from a 1-percent sample of all Social Security numbers issued before January 2007. The EPUF contains selected demographic and earnings information for 4.3 million individuals. It provides aggregate earnings data for 1937 to 1950 and annual earnings data for 1951 to 2006.
Using Matched Survey and Administrative Data to Estimate Eligibility for the Medicare Part D Low-Income Subsidy Program
This article uses matched survey and administrative data to estimate, as of 2006, the size of the population eligible for the Low-Income Subsidy (LIS), which was designed to provide "extra help" with premiums, deductibles, and copayments for Medicare Part D beneficiaries with low income and limited assets. The authors employ individual-level data from the Survey of Income and Program Participation and the Health and Retirement Study to cover the potentially LIS-eligible noninstitutionalized and institutionalized populations of all ages. The survey data are matched to Social Security administrative data to improve on potentially error-ridden survey measures of income components and program participation.
The Social Security Administration (SSA) receives reports of earnings for the U.S. working population each year from employers and the Internal Revenue Service. The earnings information received is stored at SSA as the Master Earnings File (MEF) and is used to administer Social Security programs and to conduct research on the populations served by those programs. This article documents the history, content, limitations, complexities, and uses of the MEF (and data files derived from the MEF). It is intended for researchers who use earnings data to study work patterns and their implications, and for those interested in understanding the data used to administer the current-law programs.
This article discusses the advantages and limitations of using administrative data for research, examines how linking administrative data to survey results can be used to evaluate and improve survey design, and discusses research studies and SSA statistical products and services that are based on administrative data.
The Social Security Administration's Death Master File: The Completeness of Death Reporting at Older Ages
To provide a more detailed assessment of the coverage of deaths of older adults in the Social Security Administration's Death Master File (DMF), this research note compares age-specific death counts from 1960 to 1997 in the DMF with official counts tabulated by the National Center for Health Statistics, the most authoritative source of death information for the U.S. population. Results suggest that for most years since 1973, 93 percent to 96 percent of deaths of individuals aged 65 or older were included in the DMF.
This article describes the development of SSA's administrative records database for the Project NetWork return-to-work experiment targeting persons with disabilities. The article is part of a series of papers on the evaluation of the Project NetWork demonstration. In addition to 8,248 Project NetWork participants randomly assigned to receive case management services and a control group, the simulation identified 138,613 eligible nonparticipants in the demonstration areas. The output data files contain detailed monthly information on Supplemental Security Income (SSI) and Disability Insurance (DI) benefits, annual earnings, and a set of demographic and diagnostic variables. The data allow for the measurement of net outcomes and the analysis of factors affecting participation. The results suggest that it is feasible to simulate complex eligibility rules using administrative records, and create a clean and edited data file for a comprehensive and credible evaluation. The study shows that it is feasible to use administrative records data for selecting control or comparison groups in future demonstration evaluations.
The Health and Retirement Study (HRS) is a major longitudinal study designed for scientific and policy researchers for study of the economics, health, and demography of retirement and aging. This note describes the data from SSA records that have been released for linking to HRS data, linkage rates resulting from the consent process, and subgroup patterns in linkage rates.
The Health and Retirement Study (HRS) is a major longitudinal study designed for scientific and policy researchers for study of the economics, health, and demography of retirement and aging. The primary HRS sponsor is the National Institute on Aging, and the project is being conducted by the Survey Research Center of the Institute for Social Research at the University of Michigan. Several agencies, including the Social Security Administration, are supporting the project. This is the second paper describing SSA's data support for the HRS. It describes the data from SSA records that have been released for linking to HRS data, linkage rates resulting from the consent process, and subgroup patterns in linkage rates.
The Accuracy of Survey-Reported Marital Status: Evidence from Survey Records Matched to Social Security Records
Many researchers have concluded that, in surveys, divorced persons often fail to report accurate marital information. In this paper, I revisit this issue using a new source of data—surveys exactly matched to Social Security data. I find that divorced persons frequently misreport their marital status, but there is evidence that the misreporting is unintentional. A discussion of possible improvements in surveys is presented. Implications for the study of differential mortality and the study of poverty among aged women are discussed.
This article describes the statistical development of the geographic coding system used to identify worker location for the Continuous Work History Sample. The new system—which is planned for implementation for data year 1993—will provide more accurate geographic distributions of workers within a residence concept than the old system could provide within an employer location concept. The article also presents the results of a pilot study that tested the operational aspects of the new system. The results provide some preliminary estimates of the effect of the revised codes on the geographic distribution of workers.
It is well known that for most purposes income size distribution data collected in household surveys are far from ideal. The problems with those data can be separated into two types: the data items that are collected, and the accuracy of the data collected. Usually, although there are important exceptions, the income data collected are confined to cash income before taxes, thus ignoring the effects of both taxes and noncash income of all types. Also, the income estimates usually are for one year, which often is not the best accounting period for analysis. Furthermore, there usually is a lack of adequate detail by income type, and the data ordinarily are not sufficiently detailed to adjust for changes in the composition of the family unit during the income accounting period.
An Example of the Use of Statistical Matching in the Estimation and Analysis of the Size Distribution of Income
This paper discusses the use of statistical matching in the estimation and analysis of the size distribution of family unit personal income. Statistical matching is a relatively new technique that has been used to combine, at the single observation level, data from two different samples, each of which contains some data items that are absent from the other file. In a statistical match, the information brought together from the different files ordinarily is not for the same person but for similar persons; the match is made on the basis of similar characteristics. In contrast, in an "exact" match, information for the same person from two or more files is brought together using personal identifying information.
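The contrast between the two kinds of match can be sketched in a few lines. The example below is a simplified illustration rather than the paper's procedure: the record layouts, the field names (`person_id`, `age`, `survey_income`, `tax_paid`), and the nearest-neighbor distance rule are all assumptions made for this sketch.

```python
from typing import Any, Dict, List

Record = Dict[str, Any]


def exact_match(file_a: List[Record], file_b: List[Record], id_key: str) -> List[Record]:
    """Join records that carry the same personal identifier in both files."""
    index_b = {rec[id_key]: rec for rec in file_b}
    return [{**index_b[a[id_key]], **a} for a in file_a if a[id_key] in index_b]


def statistical_match(file_a: List[Record], file_b: List[Record],
                      match_keys: List[str]) -> List[Record]:
    """Attach to each file_a record the most similar file_b record.

    Similarity is the sum of absolute differences on the shared numeric
    characteristics in match_keys (an assumed, deliberately simple rule).
    The donor is usually a different person from the recipient.
    """
    matched = []
    for a in file_a:
        donor = min(file_b, key=lambda b: sum(abs(a[k] - b[k]) for k in match_keys))
        matched.append({**donor, **a})  # recipient's own values take precedence
    return matched


# Hypothetical records, for illustration only.
survey = [{"person_id": 1, "age": 30, "survey_income": 20000},
          {"person_id": 2, "age": 60, "survey_income": 50000}]
admin = [{"person_id": 2, "age": 61, "tax_paid": 9000},
         {"person_id": 3, "age": 31, "tax_paid": 2000}]

exact = exact_match(survey, admin, "person_id")    # only person 2 is in both files
statistical = statistical_match(survey, admin, ["age"])  # every survey record gets a donor
```

In this toy data the exact match returns a single combined record, for person 2, while the statistical match gives person 1 the tax amount of the 31-year-old donor, who is a similar but different person.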
For the past few years, the Division of Disability Studies has been using simple random and stratified random sampling procedures for many of its studies. The beneficiary sample for the 1978 Survey of Disability and Work was a stratified random sample drawn from the Master Benefit Record. The samples used in the Study of Consistency and Validity of Initial Disability Decisions and the Trial Work Period Folder Study also used simple random sampling procedures. Simple random subsampling has been used to enable multivariate analysis to be performed on files that would otherwise have been too large for existing software.
Because of the Division of Disability Studies' wide use of simple and stratified random sampling designs, software was developed to efficiently accomplish these sampling schemes. This paper describes the algorithm and presents the computer programs that are currently being used in the division.
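For readers unfamiliar with the two designs, the sketch below shows one common way to draw a simple random sample and a stratified random sample in Python. It is not the division's algorithm or software, which this abstract does not reproduce; the function names, the `region` stratifier, and the sample sizes are hypothetical.

```python
import random
from collections import defaultdict
from typing import Any, Callable, Dict, List

Record = Dict[str, Any]


def simple_random_sample(records: List[Record], n: int, seed: int = 0) -> List[Record]:
    """Draw n records without replacement, each subset equally likely."""
    rng = random.Random(seed)
    return rng.sample(records, n)


def stratified_random_sample(records: List[Record],
                             stratum_of: Callable[[Record], str],
                             n_per_stratum: Dict[str, int],
                             seed: int = 0) -> List[Record]:
    """Draw an independent simple random sample within each stratum."""
    rng = random.Random(seed)
    strata: Dict[str, List[Record]] = defaultdict(list)
    for rec in records:
        strata[stratum_of(rec)].append(rec)
    sample: List[Record] = []
    for name in sorted(strata):  # fixed order keeps the draw reproducible
        members = strata[name]
        k = min(n_per_stratum.get(name, 0), len(members))
        sample.extend(rng.sample(members, k))
    return sample


# Hypothetical frame of 100 records split evenly into two strata.
frame = [{"id": i, "region": "east" if i % 2 == 0 else "west"} for i in range(100)]
srs = simple_random_sample(frame, 20)
strat = stratified_random_sample(frame, lambda r: r["region"], {"east": 5, "west": 10})
```

Stratifying lets the sampling rate differ by group, here 5 of 50 records in one stratum and 10 of 50 in the other, which a simple random sample cannot guarantee.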