Pre-conference Workshop A


Using probabilistic linkage to combine injury related databases: A primer for the non-statistically minded


Larry Cook, MStat, PhD
Associate Professor
University of Utah

Dr. Lawrence Cook is an Associate Professor at the University of Utah School of Medicine’s Department of Pediatrics. He has two decades’ experience with probabilistic linkage theory and its application to motor vehicle crash and health care databases. As the principal investigator for the Utah Crash Outcome Data Evaluation System (CODES) Project and the CODES Data Network Technical Resource Center, Dr. Cook led an effort to standardize probabilistic linkage practices and coding of data sets among all participating states. He has authored more than 40 papers and technical reports on probabilistic linkage theory and analysis of linked databases.


Cody Olsen, MS
University of Utah

Cody Olsen is a biostatistician in the Division of Critical Care and Department of Pediatrics at the University of Utah. Cody works with investigators throughout the country to study emergency care for children, rare pediatric diseases, and injury. He provides statistical support for clinical trials carried out by the Pediatric Emergency Care Applied Research Network (PECARN) and for research projects within the Utah Crash Outcome Data Evaluation System (Utah CODES) and the National Pediatric Multiple Sclerosis Centers (NPMSC) research network. Cody has a particular interest in and experience with probabilistic linkage, multiple imputation, centralized statistical monitoring, and non-parametric methods.

Attendee Prerequisites:

For this introductory course, participants should be familiar with basic concepts of collecting and storing variables in databases. Participants should also have an introductory exposure to statistics.

Course Goal:

To provide participants with a foundation for understanding and applying probabilistic linkage methodology.

Learning Outcomes:

At the conclusion of the workshop, participants will be able to:

  • Describe the process of probabilistic linkage and the potential value added to understanding an injury control problem or research project
  • Describe how the properties of linkage variables are used to calculate match weights
  • Determine if the amount of information contained in a database will facilitate a successful linkage
  • Discuss ethical considerations surrounding access to and the use of sensitive variables often used in probabilistic linkages
  • Demonstrate situations in which linkage can be successfully conducted without names

Course Description:

In a post on the Injury Prevention Editor’s Blog, Dr. Scott Parker states, ‘I am not an expert in data linkage, nor am I up to the challenge of linking various data sources, however I am acutely aware that NOT linking data is a huge obstacle for injury prevention.’ This workshop is aimed at researchers who are motivated to overcome this obstacle. Participants will gain a firm understanding of basic probabilistic linkage methodology and will learn how probabilistic linkage can aid injury control research and surveillance efforts.


Often the information required to examine an injury control problem, perform surveillance, or answer a research question resides in separate, disparate databases. For example, event information is often available in motor vehicle crash, poison control or law enforcement databases, while information regarding medical or other outcomes is contained in separate databases, including emergency medical services, hospital billing, vital records or judicial court records. In the era of big data, where many (often large) databases are available electronically, the ability to link data sources will become essential. If the necessary databases do not share a common unique identifier, then obtaining the desired result may seem impossible.


Probabilistic linkage is used in a wide range of injury control areas to link disparate databases when common unique identifiers do not exist. This workshop will cover the essentials of probabilistic linkage for non-statisticians. Through examples and discussion of methodological and practical issues, participants will learn the strengths, weaknesses, and dos and don’ts of probabilistic linkage. We will begin with several motivating examples drawn from injury control, followed by a brief overview of the history, main concepts, and ethical concerns of probabilistic linkage. Technical details will be explained, including how to calculate match weights and probabilities. We will explain how multiple imputation may be used when powerful, commonly used identifiers, such as names or dates of birth, are not available. We will also cover how to determine the feasibility of a linkage based on the available variables, and we will provide an overview of several linkage software packages.
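
As a rough illustration of the match weights and probabilities mentioned above, the sketch below uses the standard Fellegi-Sunter formulation, which is an assumption here since the workshop materials do not specify the exact method. For each linkage variable, m is the probability that the field agrees when two records truly belong to the same person or event, and u is the probability that it agrees by chance between unrelated records. The field names, the m and u values, and the prior are all hypothetical.

```python
import math

def match_weights(m, u):
    """Return (agreement_weight, disagreement_weight) in log base 2.

    m: P(field agrees | records are a true match)
    u: P(field agrees | records are not a match)
    """
    agree = math.log2(m / u)
    disagree = math.log2((1 - m) / (1 - u))
    return agree, disagree

def match_probability(total_weight, prior):
    """Turn a summed weight into a match probability using prior odds."""
    prior_odds = prior / (1 - prior)
    posterior_odds = prior_odds * 2 ** total_weight
    return posterior_odds / (1 + posterior_odds)

# Hypothetical (m, u) values for three linkage variables; no names are used.
fields = {
    "date_of_birth": (0.95, 1 / 365),
    "sex": (0.98, 0.5),
    "county": (0.90, 0.05),
}

# Hypothetical comparison of one crash record with one hospital record.
agreement = {"date_of_birth": True, "sex": True, "county": False}

# Sum the agreement or disagreement weight for each field.
total = 0.0
for name, (m, u) in fields.items():
    agree_w, disagree_w = match_weights(m, u)
    total += agree_w if agreement[name] else disagree_w

# Assumed prior: 1 in 1,000 candidate record pairs is a true match.
probability = match_probability(total, prior=0.001)
print(f"total weight = {total:.2f}, match probability = {probability:.3f}")
```

In this sketch, agreement on a highly discriminating variable such as date of birth adds a large positive weight, agreement on sex adds little, and disagreement on county subtracts from the total. The same arithmetic suggests why feasibility depends on the available variables: if the summed agreement weights cannot overcome low prior odds, the linkage is unlikely to succeed.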

Society for the Advancement of Violence and Injury Research

