Detecting differential and correlated protein expression in label-free shotgun proteomics

Bing Zhang, Nathan C. VerBerkmoes, Michael A. Langston, Edward Uberbacher, Robert L. Hettich, Nagiza F. Samatova

Research output: Contribution to journal › Article › peer-review

337 Scopus citations

Abstract

Recent studies have revealed a relationship between protein abundance and sampling statistics, such as sequence coverage, peptide count, and spectral count, in label-free liquid chromatography-tandem mass spectrometry (LC-MS/MS) shotgun proteomics. The use of sampling statistics offers a promising method of measuring relative protein abundance and detecting differentially expressed or coexpressed proteins. We performed a systematic analysis of various approaches to quantifying differential protein expression in eukaryotic Saccharomyces cerevisiae and prokaryotic Rhodopseudomonas palustris label-free LC-MS/MS data. First, we showed that, among three sampling statistics, the spectral count has the highest technical reproducibility, followed by the less-reproducible peptide count and relatively nonreproducible sequence coverage. Second, we used spectral count statistics to measure differential protein expression in pairwise experiments using five statistical tests: Fisher's exact test, G-test, AC test, t-test, and LPE test. Given the S. cerevisiae data set with spiked proteins as a benchmark and the false positive rate as a metric, our evaluation suggested that the Fisher's exact test, G-test, and AC test can be used when the number of replications is limited (one or two), whereas the t-test is useful with three or more replicates available. Third, we generalized the G-test to increase the sensitivity of detecting differential protein expression under multiple experimental conditions. Out of 1622 identified R. palustris proteins in the LC-MS/MS experiment, the generalized G-test detected 1119 differentially expressed proteins under six growth conditions. Finally, we studied correlated expression of these 1119 proteins by analyzing pairwise expression correlations and by delineating protein clusters according to expression patterns. 
Through pairwise expression correlation analysis, we demonstrated that proteins co-located in the same operon were much more strongly coexpressed than those from different operons. Combining cluster analysis with existing protein functional annotations, we identified six protein clusters with known biological significance. In summary, the proposed generalized G-test using spectral count sampling statistics is a viable methodology for robust quantification of relative protein abundance and for sensitive detection of biologically significant differential protein expression under multiple experimental conditions in label-free shotgun proteomics.
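The abstract does not give the test formula, so the following is only a minimal sketch of a standard G-test (log-likelihood ratio goodness-of-fit test) applied to one protein's spectral counts across two or more conditions. It assumes that expected counts are proportional to the total spectral count of each run, and the published generalized G-test may differ in its details. The names `chi2_sf` and `g_test` and the example numbers are hypothetical.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    y = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-y) * sum_{i < df/2} y^i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= y / i
            total += term
        return math.exp(-y) * total
    # Odd df: start from erfc(sqrt(y)) and add half-integer gamma terms.
    total = math.erfc(math.sqrt(y))
    for i in range(1, (df - 1) // 2 + 1):
        total += math.exp(-y) * y ** (i - 0.5) / math.gamma(i + 0.5)
    return total


def g_test(counts, totals):
    """G statistic and p-value for one protein across k >= 2 conditions.

    counts: spectral counts of the protein in each condition.
    totals: total spectral counts (all proteins) in each condition.
    Assumption: under the null hypothesis, the protein's counts are
    distributed across conditions in proportion to `totals`.
    """
    if len(counts) != len(totals) or len(counts) < 2:
        raise ValueError("need matching counts and totals for >= 2 conditions")
    if any(n <= 0 for n in totals):
        raise ValueError("every condition needs a positive total count")
    observed_sum = sum(counts)
    if observed_sum == 0:
        return 0.0, 1.0
    grand_total = sum(totals)
    g = 0.0
    for c, n in zip(counts, totals):
        expected = observed_sum * n / grand_total
        if c > 0:  # zero counts contribute nothing to the sum
            g += c * math.log(c / expected)
    g = max(2.0 * g, 0.0)  # guard against tiny negative rounding error
    return g, chi2_sf(g, len(counts) - 1)


# Hypothetical protein observed under six conditions.
g, p = g_test([12, 30, 8, 15, 40, 9], [9000, 9500, 8800, 9100, 9700, 8900])
print(f"G = {g:.2f}, p = {p:.3g}")
```

With two conditions the same function reduces to a pairwise G-test with one degree of freedom. In practice the p-values from testing every identified protein would also need a multiple-testing correction, which this sketch leaves out.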

Original language: English
Pages (from-to): 2909-2918
Number of pages: 10
Journal: Journal of Proteome Research
Volume: 5
Issue number: 11
State: Published - Nov 2006

Keywords

  • Clustering
  • Correlated expression
  • Differential expression
  • LC-MS/MS
  • Label-free
  • Rhodopseudomonas palustris
  • Saccharomyces cerevisiae
  • Shotgun proteomics
