TY - JOUR
T1 - A computational method for assessing peptide-identification reliability in tandem mass spectrometry analysis with SEQUEST
AU - Razumovskaya, Jane
AU - Olman, Victor
AU - Xu, Dong
AU - Uberbacher, Edward C.
AU - VerBerkmoes, Nathan C.
AU - Hettich, Robert L.
AU - Xu, Ying
PY - 2004/4
Y1 - 2004/4
N2 - High-throughput protein identification in mass spectrometry is predominantly achieved by first identifying tryptic peptides by a database search and then by combining the peptide hits for protein identification. One of the popular tools used for the database search is SEQUEST. Peptide identification is carried out by selecting SEQUEST hits above a specified threshold, the value of which is typically chosen empirically in an attempt to separate true identifications from false ones. These SEQUEST scores are not normalized with respect to the composition, length and other parameters of the peptides. Furthermore, there is no rigorous reliability estimate assigned to the protein identifications derived from these scores. Hence, the interpretation of SEQUEST hits generally requires human involvement, making it difficult to scale up the identification process for genome-scale applications. To overcome these limitations, we have developed a method, which combines a neural network and a statistical model, for normalizing SEQUEST scores, and also for providing a reliability estimate for each SEQUEST hit. This method improves the sensitivity and specificity of peptide identification compared to the standard filtering procedure used in the SEQUEST package, and provides a basis for estimating the reliability of protein identifications.
AB - High-throughput protein identification in mass spectrometry is predominantly achieved by first identifying tryptic peptides by a database search and then by combining the peptide hits for protein identification. One of the popular tools used for the database search is SEQUEST. Peptide identification is carried out by selecting SEQUEST hits above a specified threshold, the value of which is typically chosen empirically in an attempt to separate true identifications from false ones. These SEQUEST scores are not normalized with respect to the composition, length and other parameters of the peptides. Furthermore, there is no rigorous reliability estimate assigned to the protein identifications derived from these scores. Hence, the interpretation of SEQUEST hits generally requires human involvement, making it difficult to scale up the identification process for genome-scale applications. To overcome these limitations, we have developed a method, which combines a neural network and a statistical model, for normalizing SEQUEST scores, and also for providing a reliability estimate for each SEQUEST hit. This method improves the sensitivity and specificity of peptide identification compared to the standard filtering procedure used in the SEQUEST package, and provides a basis for estimating the reliability of protein identifications.
KW - Identification
KW - Mass spectrometry
KW - Neural network
KW - Reliability
KW - SEQUEST
UR - http://www.scopus.com/inward/record.url?scp=1842423492&partnerID=8YFLogxK
U2 - 10.1002/pmic.200300656
DO - 10.1002/pmic.200300656
M3 - Article
C2 - 15048978
AN - SCOPUS:1842423492
SN - 1615-9853
VL - 4
SP - 961
EP - 969
JO - Proteomics
JF - Proteomics
IS - 4
ER -