A novel web informatics approach for automated surveillance of cancer mortality trends

Research output: Contribution to journalArticlepeer-review

8 Scopus citations

Abstract

Cancer surveillance data are collected every year in the United States via the National Program of Cancer Registries (NPCR) and the Surveillance, Epidemiology and End Results (SEER) Program of the National Cancer Institute (NCI). General trends are closely monitored to measure the nation's progress against cancer. The objective of this study was to apply a novel web informatics approach for enabling fully automated monitoring of cancer mortality trends. The approach involves automated collection and text mining of online obituaries to derive the age distribution, geospatial, and temporal trends of cancer deaths in the US. Using breast and lung cancer as examples, we mined 23,850 cancer-related and 413,024 general online obituaries spanning the timeframe 2008-2012. There was high correlation between the web-derived mortality trends and the official surveillance statistics reported by NCI with respect to the age distribution (ρ = 0.981 for breast; ρ = 0.994 for lung), the geospatial distribution (ρ = 0.939 for breast; ρ = 0.881 for lung), and the annual rates of cancer deaths (ρ = 0.661 for breast; ρ = 0.839 for lung). Additional experiments investigated the effect of sample size on the consistency of the web-based findings. Overall, our study findings support web informatics as a promising, cost-effective way to dynamically monitor spatiotemporal cancer mortality trends.

Original languageEnglish
Pages (from-to)110-118
Number of pages9
JournalJournal of Biomedical Informatics
Volume61
DOIs
StatePublished - Jun 1 2016

Funding

This research used resources of the Oak Ridge Leadership Computing Facility, which is a DOE Office of Science User Facility supported under Contract DE-AC05-00OR22725 . This manuscript has been authored by UT-Battelle, LLC under Contract No. DE-AC05-00OR22725 with the U.S. Department of Energy (DOE). The United States Government retains and the publisher, by accepting the article for publication, acknowledges that the United States Government retains a non-exclusive, paid-up, irrevocable, world-wide license to publish or reproduce the published form of this manuscript, or allow others to do so, for United States Government purposes. The Department of Energy will provide public access to these results of federally sponsored research in accordance with the DOE Public Access Plan ( http://energy.gov/downloads/doe-public-access-plan ). The study was supported by the National Cancer Institute at the National Institutes of Health (Grant No. 1R01-CA170508 ).

Keywords

  • Breast cancer
  • Cancer mortality
  • Digital epidemiology
  • Lung cancer
  • Web informatics
  • Web mining

Fingerprint

Dive into the research topics of 'A novel web informatics approach for automated surveillance of cancer mortality trends'. Together they form a unique fingerprint.

Cite this