LipoCLEAN: A Machine Learning Filter to Improve Untargeted Lipid Identification Confidence

Steven L. Tavis, Matthew J. Keller, Andrew J. Stai, Tomás A. Rush, Robert L. Hettich

Research output: Contribution to journal › Article › peer-review

Abstract

In untargeted lipidomics experiments, putative lipid identifications generated by automated analysis software require substantial manual filtering to arrive at usable high-confidence data. However, identification software tools do not make full use of the available data to assess the quality of lipid identifications. Here, we present a machine-learning-based model to provide coherent, holistic quality scores based on multiple lines of evidence. Underutilized metrics such as isotope ratios and chromatographic behavior allow for much higher accuracy of identification confidence. We find that approximately 50% of tandem mass spectrometry-based automated lipid identifications are incorrect but that multidimensional rescoring reduces false discoveries to only 7% while retaining 80% of true positives. Our method works with most chromatography methods and is generalized across a family of MS instruments. LipoCLEAN is available at https://github.com/stavis1/LipoCLEAN.
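The two figures quoted above, the false discovery rate after filtering and the fraction of true positives retained, can be illustrated with a minimal sketch. It assumes each identification carries a single quality score and a manually validated correct/incorrect label. The function name `filter_metrics`, the 0.5 threshold, and the toy data are all hypothetical and are not taken from LipoCLEAN or from the paper's results.

```python
def filter_metrics(scores, is_correct, threshold):
    """Return (fdr, retained_tp_fraction) for identifications with score >= threshold.

    Hypothetical helper for illustration only; not part of LipoCLEAN.
    """
    # Keep the validation label of every identification that passes the filter.
    kept = [ok for s, ok in zip(scores, is_correct) if s >= threshold]
    total_true = sum(is_correct)   # correct identifications before filtering
    kept_true = sum(kept)          # correct identifications after filtering
    # False discovery rate: share of kept identifications that are incorrect.
    fdr = (len(kept) - kept_true) / len(kept) if kept else 0.0
    # Retention: share of all correct identifications that survive the filter.
    retained = kept_true / total_true if total_true else 0.0
    return fdr, retained


# Invented toy data: 10 identifications, half of them incorrect before filtering.
scores = [0.95, 0.90, 0.85, 0.80, 0.60, 0.40, 0.30, 0.20, 0.10, 0.05]
labels = [True, True, True, False, True, False, True, False, False, False]

fdr, retained = filter_metrics(scores, labels, threshold=0.5)
print(f"FDR after filtering: {fdr:.0%}, true positives retained: {retained:.0%}")
```

Raising the threshold in a sketch like this generally lowers the false discovery rate at the cost of retaining fewer true positives, which is the trade-off the reported 7% and 80% figures summarize.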

Original language: English
Journal: Analytical Chemistry
State: Accepted/In press - 2024

Funding

This work was sponsored by the ORNL Plant–Microbe Interface (PMI) Scientific Focus Area funded by the Genomic System Sciences Program, U.S. Department of Energy, Office of Science, Biological and Environmental Research. Fungal isolates were obtained from the ORNL-PMI microbial collection (ORNL; http://pmi.ornl.gov). M.J.K. acknowledges stipend support from project ERKPA14 funded by the Office of Biological & Environmental Research in the Department of Energy (DOE) Office of Science. This manuscript has been authored by UT-Battelle, LLC, under Contract No. DE-AC05-00OR22725 with the U.S. Department of Energy. The United States Government retains and the publisher, by accepting the article for publication, acknowledges that the United States Government retains a nonexclusive, paid-up, irrevocable, worldwide license to publish or reproduce the published form of this manuscript, or allow others to do so, for United States Government purposes. The Department of Energy will provide public access to these results of federally sponsored research in accordance with the DOE Public Access Plan (http://energy.gov/downloads/doe-public-access-plan). This research used resources of the Compute and Data Environment for Science (CADES) at the Oak Ridge National Laboratory, which is supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC05-00OR22725.
