TY - GEN
T1 - An SVM-based algorithm for identification of photosynthesis-specific genome features
AU - Yu, G. X.
AU - Ostrouchov, G.
AU - Geist, A.
AU - Samatova, N. F.
N1 - Publisher Copyright: © 2003 IEEE.
PY - 2003
Y1 - 2003
N2 - This paper presents a novel algorithm for identification and functional characterization of "key" genome features responsible for a particular biochemical process of interest. The central idea is that individual genome features are identified as "key" features if the discrimination accuracy between two classes of genomes with respect to a given biochemical process is sufficiently affected by the inclusion or exclusion of these features. In this paper, genome features are defined by high-resolution gene functions. The discrimination procedure utilizes the support vector machine classification technique. The application to the oxygenic photosynthetic process resulted in 126 highly confident candidate genome features. While many of these features are well-known components in the oxygenic photosynthetic process, others are completely unknown, even including some hypothetical proteins. It is obvious that our algorithm is capable of discovering features related to a targeted biochemical process.
AB - This paper presents a novel algorithm for identification and functional characterization of "key" genome features responsible for a particular biochemical process of interest. The central idea is that individual genome features are identified as "key" features if the discrimination accuracy between two classes of genomes with respect to a given biochemical process is sufficiently affected by the inclusion or exclusion of these features. In this paper, genome features are defined by high-resolution gene functions. The discrimination procedure utilizes the support vector machine classification technique. The application to the oxygenic photosynthetic process resulted in 126 highly confident candidate genome features. While many of these features are well-known components in the oxygenic photosynthetic process, others are completely unknown, even including some hypothetical proteins. It is obvious that our algorithm is capable of discovering features related to a targeted biochemical process.
KW - genome comparative analysis
KW - key genome features
KW - oxygenic photosynthetic process
KW - support vector machines
UR - http://www.scopus.com/inward/record.url?scp=84960349226&partnerID=8YFLogxK
U2 - 10.1109/CSB.2003.1227323
DO - 10.1109/CSB.2003.1227323
M3 - Conference contribution
C2 - 16452798
AN - SCOPUS:84960349226
T3 - Proceedings of the 2003 IEEE Bioinformatics Conference, CSB 2003
SP - 235
EP - 243
BT - Proceedings of the 2003 IEEE Bioinformatics Conference, CSB 2003
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2nd International IEEE Computer Society Computational Systems Bioinformatics Conference, CSB 2003
Y2 - 11 August 2003 through 14 August 2003
ER -