TY - GEN
T1 - Analysis of mammography reports using maximum variation sampling
AU - Patton, Robert M.
AU - Beckerman, Barbara
AU - Potok, Thomas E.
PY - 2008
Y1 - 2008
N2 - A genetic algorithm (GA) was developed to implement a maximum variation sampling technique to derive a subset of data from a large dataset of unstructured mammography reports. It is well known that a genetic algorithm performs very well for large search spaces and is easily scalable to the size of the data set. In mammography, much effort has been expended to characterize findings in the radiology reports. Existing computer-assisted technologies for mammography are based on machine-learning algorithms that must learn against a training set with known pathologies in order to further refine the algorithms with higher validity of truth. In a large database of reports and corresponding images, automated tools are needed just to determine which data to include in the training set. This work presents preliminary results showing the use of a GA for finding abnormal reports without a training set. The underlying premise is that abnormal reports should consist of unusual or rare words, thereby making the reports very dissimilar in comparison to other reports. A genetic algorithm was developed to test this hypothesis, and preliminary results show that most abnormal reports in a test set are found and can be adequately differentiated.
KW - Genetic algorithms
KW - Maximum variation sampling
KW - Text analysis
KW - Unstructured radiology reports
UR - http://www.scopus.com/inward/record.url?scp=57349084724&partnerID=8YFLogxK
U2 - 10.1145/1388969.1389022
DO - 10.1145/1388969.1389022
M3 - Conference contribution
AN - SCOPUS:57349084724
SN - 9781605581309
T3 - GECCO'08: Proceedings of the 10th Annual Conference on Genetic and Evolutionary Computation 2008
SP - 2061
EP - 2064
BT - GECCO'08
PB - Association for Computing Machinery
T2 - 10th Annual Genetic and Evolutionary Computation Conference, GECCO 2008
Y2 - 12 July 2008 through 16 July 2008
ER -