Project Tycho, Data for Health: Open Access to Newly Digitized U.S. Weekly Nationally Notifiable Disease Surveillance Data from 1888-Present – International Society for Disease Surveillance


Wednesday, April 23, 2014, 1:00 PM – 2:00 PM EDT (17:00 – 18:00 GMT)


ISDS Research Committee


Wilbert van Panhuis, MD PhD, Assistant Professor, University of Pittsburgh Graduate School of Public Health


Background: Public health agencies in the United States, such as the Public Health Service before 1950 and the Centers for Disease Control after 1950, have published nationally notifiable disease reports for cities and states every week since 1888 in journals such as Public Health Reports and the Morbidity and Mortality Weekly Report. Because most of these reports have been publicly available only in PDF or paper format, opportunities to use this wealth of information for statistical and computational analysis have been greatly restricted.

Methods: We identified all 6,500 weekly nationally notifiable disease surveillance reports published since 1888, available as PDF or paper files, and digitized them into Excel spreadsheets using independent double data entry. All numeric disease reports (defined as counts) and contextual information such as reporting locations, dates, and disease names were extracted from these spreadsheets using semi-automatic computational algorithms. All extracted information was standardized and made publicly available without restrictions through an online user interface.
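The independent double data entry step mentioned above can be illustrated with a minimal sketch. It assumes each of two operators' entries has been loaded as a mapping from a (location, week, disease) key to a count; the function name, the key layout, and the sample values are hypothetical and are not taken from the Project Tycho pipeline or its data.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical layout: (location, week-ending date, disease) -> reported count.
Key = Tuple[str, str, str]
Entry = Dict[Key, int]


def find_discrepancies(
    first: Entry, second: Entry
) -> List[Tuple[Key, Optional[int], Optional[int]]]:
    """Return every cell where two independent entries disagree.

    A cell present in only one entry is reported with None for the other.
    """
    discrepancies = []
    for key in sorted(set(first) | set(second)):
        a = first.get(key)
        b = second.get(key)
        if a != b:
            discrepancies.append((key, a, b))
    return discrepancies


# Made-up values for illustration only; not real surveillance counts.
entry_a = {
    ("PA", "1928-01-07", "MEASLES"): 120,
    ("NY", "1928-01-07", "MEASLES"): 310,
}
entry_b = {
    ("PA", "1928-01-07", "MEASLES"): 102,  # digits transposed by second operator
    ("NY", "1928-01-07", "MEASLES"): 310,
}

mismatches = find_discrepancies(entry_a, entry_b)
for key, a, b in mismatches:
    print(f"check source report for {key}: first={a}, second={b}")
```

Cells flagged this way would be resolved by going back to the original report; agreement between two independent entries is what gives confidence in the unflagged cells.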

Results: The online database, named after Tycho Brahe (1546-1601), provides tools for the exploration and retrieval of datasets selected by users. Available data have been classified into three levels, each with different content. Level 1 includes data that have been standardized into a common format for specific studies. Level 2 includes data that have been reported in a common and consistent format, e.g., diseases reported for a one-week period and without disease subcategories that changed over time. Level 3 includes all data available in raw format. Although level 3 is the most complete level of data, the large heterogeneity in the types and formats of the reports it includes requires extensive standardization before use in any analysis. All levels of data can be freely accessed and used for any purpose after registration and agreement to a Creative Commons Attribution license.
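To make the distinction between raw (level 3) and consistently formatted (level 2) data concrete, the following sketch filters a list of raw records down to those covering exactly one week and carrying no disease subcategory. The `Report` fields and the sample records are hypothetical, and the rule is a simplification: the abstract excludes only subcategories that changed over time, whereas this sketch drops any record with a subcategory.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class Report:
    """Hypothetical raw record; field names are illustrative assumptions."""
    location: str
    disease: str
    subcategory: Optional[str]
    start: date  # first day of the reporting period (inclusive)
    end: date    # last day of the reporting period (inclusive)
    count: int


def is_weekly_without_subcategory(report: Report) -> bool:
    # An inclusive seven-day period spans six days between start and end.
    covers_one_week = (report.end - report.start).days == 6
    return covers_one_week and report.subcategory is None


def select_consistent(reports: List[Report]) -> List[Report]:
    """Keep only one-week reports that have no disease subcategory."""
    return [r for r in reports if is_weekly_without_subcategory(r)]


# Made-up records for illustration only; not real surveillance counts.
raw = [
    Report("PA", "MEASLES", None, date(1928, 1, 1), date(1928, 1, 7), 120),
    Report("PA", "MEASLES", None, date(1928, 1, 1), date(1928, 1, 31), 540),
    Report("NY", "POLIOMYELITIS", "paralytic", date(1928, 1, 1), date(1928, 1, 7), 3),
]

consistent = select_consistent(raw)
```

Only the first record survives: the second covers a month rather than a week, and the third carries a subcategory.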

Conclusions: The Project Tycho database of 125 years of newly digitized weekly U.S. notifiable disease data creates a new paradigm for the availability and use of large-scale public health data. This new resource can help public health officials track and compare their historical data, supporting evidence-based decision making. It will also accelerate new multidisciplinary translational approaches that integrate public health data with large-scale data from other domains, such as electronic medical records, genomic data, climate data, and social media data, maximizing opportunities to use available data for better health.

Learning Objectives

  1. To get a general idea of the history of notifiable disease surveillance across the United States
  2. To become familiar with a new resource for historical notifiable disease surveillance data
  3. To develop an idea of new opportunities provided by large scale historical disease surveillance data for evidence-based decision making

Presentation Slides

Please use #ProjectTychoWebinar in all webinar-related Tweets.

Certified Public Health (CPH) Recertification Credit Available:

You may earn 1 CPH recertification credit for viewing webinars live. CPH recertification credits are free for ISDS members (as a benefit of membership) and are $10/credit for non-ISDS members.