Handbook on Using Administrative Data for Research and Evidence-based Policy





The Handbook serves as a go-to reference for researchers seeking to use administrative data and for data providers looking to make their data accessible for research. The Handbook is published online under an open licensing model and is freely available to all. It provides information, best practices, and case studies on how to create privacy-protected access to administrative data and how to handle and analyze such data, with the aim of pushing the research frontier and informing evidence-based policy innovations.

The Handbook is edited by Shawn Cole (Harvard Business School), Iqbal Dhaliwal (J-PAL at MIT), Anja Sautmann (World Bank), and Lars Vilhuber (Cornell University).



Reception of the Handbook

“Unlocking administrative data for research access can be daunting, but this Handbook offers a road map for any researcher or institution looking to establish a long-term data sharing partnership or launch a new research project.”

Raj Chetty, William A. Ackman Professor of Economics at Harvard University and Director of Opportunity Insights

“This Handbook offers clear guidance on protecting the confidentiality of data and is an excellent resource for state and local governments for securely sharing their data and using research findings for their decision-making.”

John Abowd, Chief Scientist, US Census Bureau

“The Handbook is a valuable new resource for policymakers and researchers together to make administrative data accessible and use it to generate field research that effectively informs and impacts policy.”

Girija Vaidyanathan, Retired Chief Secretary, Tamil Nadu

“Randomized evaluations are powerful tools to understand policy and promote innovation. This Handbook gives researchers and data practitioners the tools to build these projects using administrative data.”

Michael Kremer, 2019 Nobel Laureate in Economics, University Professor in Economics at the University of Chicago

Read online

Read the full Handbook for FREE online at https://admindatahandbook.mit.edu/book/. You can also download ebook versions.


Bound copy

Purchase a hardcover copy of the Handbook on Amazon US or other retailers.


Ebooks

Purchase a version for your ebook reader, or obtain a FREE version in the appropriate format (works on some devices; identical to the purchased version).


Chapter Overview

Read or print individual chapters, or view a presentation of a chapter by one of its authors. To print the entire Handbook, download the complete version (617 pages).

Foreword
(Daniel L. Goroff)
Using Administrative Data for Research and Evidence-Based Policy: An Introduction
(Shawn Cole, Iqbal Dhaliwal, Anja Sautmann, Lars Vilhuber)
Physically Protecting Sensitive Data
(Jim Shen and Lars Vilhuber)

Keeping sensitive data safe depends heavily on the physical environments in which data are stored, processed, transmitted, and accessed, and from which researchers reach the computers that store and process the data. The physical setting is also the aspect of data protection most dependent on rapidly evolving technology. The chapter provides a snapshot of the technologies available and in use as of 2020, and characterizes them along a multi-dimensional scale, allowing for some comparison across methods.

Model Data Use Agreements: A Practical Guide
(Amy O'Hara)

Data use agreements (DUAs), also referred to as data sharing agreements or data use licenses, are documents that describe what data are being shared, for what purpose, for how long, and any access restrictions or security protocols that the recipient of the data must follow. Creating, negotiating, and finalizing a DUA is one of the most common challenges facing new data partnerships, yet few practical references are available to guide data providers and researchers. The chapter provides a valuable set of model agreements for new engagements with administrative data, along with expert insight on the legal agreements that underpin data access.

Collaborating with the Institutional Review Board (IRB)
(Kathleen Murphy)

The IRB is an administrative body that reviews research involving human subjects (as defined by 45 CFR 46.102(e)(1)) to ensure that participants are ethically protected from the reasonably foreseeable risks of harm caused by the research. For example, inadvertent disclosure of sensitive or identifiable information is a common risk in social and behavioral research, because such a disclosure can result in social, psychological, or legal harm. The goal of this chapter is to give researchers, data providers, data stewards, and other stakeholders the tools they need to understand the IRB process. The chapter focuses on what the IRB does and does not do, and on what researchers, data providers, and related stakeholders can expect from IRB review. IRB review is a key step in launching a project, and understanding the IRB's perspective will help researchers and policymakers navigate this process successfully.

Balancing Privacy and Data Usability: An Overview of Disclosure Avoidance Methods
(Ian M. Schmutte and Lars Vilhuber)

The Five Safes framework (safe projects, safe people, safe settings, safe data, and safe outputs) is one way of thinking about the security of different aspects of a project, and it is used throughout the Handbook and in research with administrative data. Within the Five Safes framework, data providers need to create safe data that can be provided to trusted safe people for use within safe settings, as part of safe projects. Finally, any findings that are shared publicly must be safe outputs. The processes used to create safe data and safe outputs (manipulations that render data less sensitive and therefore more appropriate for public release) are generally referred to as statistical disclosure limitation (SDL). This chapter describes techniques traditionally used in the field of SDL, pointing to methods as well as metrics for assessing the statistical quality and sensitivity of the resulting data. It also offers technical guidance for any data provider or researcher looking for practical tools to reduce the privacy risk of their own data.

Designing Access with Differential Privacy
(Alexandra Wood, Micah Altman, Kobbi Nissim, Salil Vadhan)

Differential privacy has made an initial transition from a subject of academic work to implementations by large organizations and high-tech companies that have the expertise to develop and deploy customized differentially private methods. With a growing collection of software packages for generating differentially private releases, ranging from summary statistics to machine learning models, differential privacy is now becoming usable more widely, including by smaller organizations. The chapter explains how administrative data containing personal information can be collected, analyzed, and published in a way that affords the individuals in the data the strong protections of differential privacy.

Institute for Employment Research, Germany: Access to Administrative Labor Market Data for International Researchers
(Dana Müller and Philipp vom Berge)

The Research Data Center at the Institute for Employment Research (RDC-IAB) in Nuremberg, Germany, founded in 2004, is a research department of the Institute for Employment Research (IAB), which belongs to the Federal Employment Agency (BA) of Germany. The RDC-IAB has three core functions: creating standardized research data for the scientific community, providing access to these data, and conducting research with and about IAB data. The RDC-IAB provides various kinds of standardized labor market data. Administrative research data are based on the notification procedure of the German social security system, while process-generated data originate from the BA. In addition, surveys conducted by the IAB or partner institutes become part of the data portfolio, and linked survey and administrative data are produced. All data products are created specifically to give external researchers access to the data. The chapter describes how the data are made available to researchers at multiple universities around the world, and how the data held by the RDC-IAB are securely accessed through legal, institutional, and practical processes.

Ohio and the Longitudinal Data Archive: Developing Mutually Beneficial Partnerships Between State Government and the Research Community
(Joshua D. Hawley)

A research center at Ohio State University, the Ohio Longitudinal Data Archive (OLDA) is a long-running and successful administrative data partnership that began in 2007. The OLDA's primary research focus is the outcomes of education and training, but it also engages with researchers on human services, housing, and health care as the need arises. This collaboration between the Ohio state government and Ohio State University makes longitudinal data from multiple state agencies available for research, and offers an example of a robust institutional partnership for researchers and data providers looking to launch their own data center.

New Brunswick Institute for Research, Data and Training: A Ten-Year Partnership Between Government and Academia
(Donna Curtis Maillet and Ted McDonald)

This chapter describes the establishment and development of the New Brunswick Institute for Research, Data and Training (NB-IRDT) in Fredericton, NB, Canada. Launched in 2015 with the delivery of its first data set, NB-IRDT now holds and provides research access to more than 45 linkable person-level data sets from across the spectrum of service provision in NB, including data on health, social assistance, education and training, aged care, and workers' compensation. The chapter highlights notable and unique aspects of the NB-IRDT partnership: the legal context for receiving data from across NB public bodies; data access that is not restricted to academic users but also extends to users from government, the non-profit sector, and the private sector; and active engagement with the NB government in collaborative research on government priority areas.

The Private Capital Research Institute: Making Private Data Accessible in an Opaque Industry
(Josh Lerner, Leslie Jeng, and Therese Juneau)

An increasing share of economic activity today takes place in settings that elude traditional federal data collection mechanisms, or in which those mechanisms fail to capture the full richness of the activity at work. Against this backdrop, economists are increasingly turning to private data. The chapter describes the experience of the Private Capital Research Institute (PCRI), specifically the process of creating a database that gives academics access to private equity information while addressing the many major concerns surrounding private data. While this effort is still a work in progress, the hope is that the experience can guide researchers who want to address similar issues in other fields.

Aurora Health Care: Using Electronic Medical Records for a Randomized Evaluation of Clinical Decision Support
(Laura Feeney and Amy Finkelstein)

This case study describes a randomized evaluation using administrative data, focusing on the process for sharing and using individual-level data from electronic medical records (EMR) for a project with Aurora Health Care, a large, private, not-for-profit, integrated health care provider in Wisconsin and Illinois comprising fifteen hospitals and more than 150 clinics in thirty communities. In this case, both the delivery of the intervention and the measurement of outcomes were conducted through the EMR system, making access to administrative data a critical feature of the research project. The chapter describes the process by which the research team sought approval to conduct the study and access the data, worked to understand data not originally designed for research, and addressed the challenges of working with de-identified data. Leveraging the rich administrative data captured in electronic medical records for research on questions of mutual interest led to a successful partnership with a large private health care organization.

The Stanford-SFUSD Partnership: Development of Data-Sharing Structures and Processes
(Eric Bettinger, Moonhawk Kim, Norma Ming, Michelle Reininger, Jim Shen, and Laura Wentworth)

The research-practice partnership between Stanford University and the San Francisco Unified School District (SFUSD) is a long-term, mutualistic, and strategic relationship between researchers and practitioners in education, resulting in research that is both relevant to practical challenges and generalizable to the broader field. The Partnership exemplifies the university-based data center model, which benefits from the academic and technical resources of a large research university. The SFUSD administrative data housed at Stanford University cover more than 55,000 students, more than 3,500 PreK–12 teachers, and almost 10,000 staff in total, from the 2000/2001 academic year to the present. In 2018, the Stanford data warehouse that hosts the district's data received data requests from nine projects.

City of Cape Town, South Africa: Aligning Internal Data Capabilities with External Research Partnerships
(Hugh Cole, Kelsey Jack, Brendan Maughan-Brown, and Derek Strong)

A new data policy adopted by the City of Cape Town government in 2016 led to productive cooperation between the City and academic researchers to create systematic data access. This partnership between local government and university researchers prioritized the strategic use of city administrative data to inform decision-making on key policy challenges, including the 2018 drought crisis and the COVID-19 pandemic. This chapter describes the legal and institutional background of the partnership, and how data have been used to respond to pressing policy challenges through interactive dashboards and randomized evaluations.

Administrative Data in Research at the World Bank: The Case of Development Impact Evaluation (DIME)
(Maria Ruth Jones and Arianna Legovini)

As a global research program, DIME provides tailored impact evaluation services to governments. With about 200 long-term collaborations with government agencies across sixty countries, DIME works with governments to develop the data infrastructure and know-how to strengthen the evidence base for public policy over time. In partnership with about thirty multilateral and bilateral organizations, DIME also invests in transforming the way development finance is used. The chapter describes how DIME generates demand from government agencies and supplies them with research services that augment their data, program management, and policy functions. DIME's work ranges from developing a pilot administrative data system, to digitizing paper-based administrative data, to leveraging existing cross-sector administrative data to develop a country data set, to developing sector-specific data sets across multiple countries.

The Use of Administrative Data at the International Monetary Fund
(Era Dabla-Norris, Federico Diez, and Romain Duval)

The chapter describes the use of administrative data at the International Monetary Fund in the context of its three main operations: macroeconomic surveillance and research, lending to member countries, and technical assistance to build policymaking capacity in member countries. The Fund has a long-standing tradition of using administrative data in some activities, but their systematic use for monitoring economic developments in member countries and for research is still in its infancy. In the future, through its bilateral engagement with its 189 member countries, participation in international data initiatives, and partnerships with universities and research networks, the IMF has the potential to gradually enhance the comparability, accessibility, and use of (selected) administrative data produced by national authorities.

Using Administrative Data to Improve Social Protection in Indonesia
(Vivi Alatas, Farah Amalia, Abhijit Banerjee, Rema Hanna, Ben Olken, Sudarno Sumarto, and Putu Poppy Widyasari)

Researchers at J-PAL Southeast Asia and the World Bank have a longstanding partnership with the Government of Indonesia to evaluate and scale social policy. This partnership began when researchers worked with the Government of Indonesia to use administrative data in implementing the Raskin ID card program, and it has since grown to include multiple evaluations and a unique collaboration on the administrative data collection process through nationwide government surveys. This chapter tells the story of this partnership and describes the policy implications of collaboration between researchers and policymakers.