The effect of data sampling on the performance evaluation of artificial neural networks in medical diagnosis

Georgia D. Tourassi, Carey E. Floyd

Research output: Contribution to journal › Article › peer-review

56 Scopus citations

Abstract

Purpose. To study the effect of data sampling on the predictive assessment of artificial neural networks (ANNs) for medical diagnostic tasks. Methods. Three statistical techniques were used to evaluate the diagnostic performances of ANNs: 1) cross validation, 2) round robin, and 3) bootstrap. These techniques are different sampling plans designed to reduce the small-sample estimation bias and variance contributions. The study was based on two networks, one developed for the diagnosis of pulmonary embolism (1,064 cases) and the other developed for the diagnosis of breast cancer (206 cases). Results. The three sampling techniques produced different performance estimates for both networks. The estimates varied substantially depending on the training sample size and the training-stopping criterion. Conclusion. The predictive assessment of ANNs in medical diagnosis can vary substantially based on the complexity of the problem, the data sampling technique, and the number of cases available.
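The three sampling plans named in the abstract can be illustrated with a minimal sketch of how each one divides a set of cases into training and test indices. This is not the authors' implementation. It assumes that "round robin" means holding out one case at a time (often called leave-one-out), that the bootstrap tests on the cases never drawn into a resample, and that the fold count, resample count, and seed are free parameters; none of these details are stated in the abstract. The function names are hypothetical.

```python
import random


def cross_validation_splits(n_cases, k, seed=0):
    """Yield (train, test) index lists for k-fold cross validation.

    Assumption: cases are shuffled once, then dealt into k folds.
    """
    rng = random.Random(seed)
    indices = list(range(n_cases))
    rng.shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield sorted(train), sorted(test)


def round_robin_splits(n_cases):
    """Yield (train, test) index lists, holding out one case at a time.

    Assumption: "round robin" is interpreted here as leave-one-out.
    """
    for held_out in range(n_cases):
        train = [i for i in range(n_cases) if i != held_out]
        yield train, [held_out]


def bootstrap_splits(n_cases, n_resamples, seed=0):
    """Yield (train, test) index lists for bootstrap resampling.

    Assumption: train is drawn with replacement (same size as the data set),
    and test is the set of cases that were never drawn.
    """
    rng = random.Random(seed)
    for _ in range(n_resamples):
        train = [rng.randrange(n_cases) for _ in range(n_cases)]
        drawn = set(train)
        test = [i for i in range(n_cases) if i not in drawn]
        yield train, test
```

With the 206-case breast cancer data set mentioned above, `round_robin_splits(206)` would yield 206 splits of 205 training cases and 1 test case each, whereas `cross_validation_splits(206, 5)` would yield 5 splits with roughly 41 test cases each. The differing training sizes are one reason such plans can give different performance estimates.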

Original language: English
Pages (from-to): 186-192
Number of pages: 7
Journal: Medical Decision Making
Volume: 17
Issue number: 2
State: Published - Apr 1997
Externally published: Yes

Keywords

  • artificial neural networks
  • bootstrap
  • computer-aided diagnosis
  • cross validation
  • receiver operating characteristic analysis
  • round robin
  • sampling
