A high-performance computing implementation of iterative random forest for the creation of predictive expression networks

Ashley Cliff, Jonathon Romero, David Kainer, Angelica Walker, Anna Furches, Daniel Jacobson

Research output: Contribution to journal, Article, peer-review

28 Scopus citations

Abstract

As time progresses and technology improves, biological data sets are continuously increasing in size. New methods and new implementations of existing methods are needed to keep pace with this increase. In this paper, we present a high-performance computing (HPC)-capable implementation of Iterative Random Forest (iRF). This new implementation enables the explainable-AI eQTL analysis of SNP sets with over a million SNPs. Using this implementation, we also present a new method, iRF Leave One Out Prediction (iRF-LOOP), for the creation of Predictive Expression Networks on the order of 40,000 genes or more. We compare the new implementation of iRF with the previous R version and analyze its time to completion on two of the world’s fastest supercomputers, Summit and Titan. We also show iRF-LOOP’s ability to capture biologically significant results when creating Predictive Expression Networks. This new implementation of iRF will enable the analysis of biological data sets at scales that were previously not possible.

Original language: English
Article number: 996
Journal: Genes
Volume: 10
Issue number: 12
State: Published - Dec 2019

Funding

Acknowledgments: This research used resources of the Oak Ridge Leadership Computing Facility and the Compute and Data Environment for Science at Oak Ridge National Laboratory, which is supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC05-00OR22725. The manuscript was coauthored by UT-Battelle, LLC under Contract No. DE-AC05-00OR22725 with the US Department of Energy. The US Government retains and the publisher, by accepting the article for publication, acknowledges that the US Government retains a nonexclusive, paid-up, irrevocable, worldwide license to publish or reproduce the published form of this manuscript, or allow others to do so, for US Government purposes. The Department of Energy will provide public access to these results of federally sponsored research in accordance with the DOE Public Access Plan (http://energy.gov/downloads/doe-public-access-plan).

Funding: Funding provided by the Plant-Microbe Interfaces (PMI) Scientific Focus Area in the Genomic Science Program and by The Center for Bioenergy Innovation (CBI). The U.S. Department of Energy Bioenergy Research Centers are supported by the Office of Biological and Environmental Research in the DOE Office of Science. This work is also supported by the Exascale & Petascale Networks for KBase project funded by the Genomic Sciences Program from the U.S. Department of Energy, Office of Science, Office of Biological and Environmental Research.

Funders
Compute and Data Environment for Science
DOE Office of Science
Exascale & Petascale Networks for KBase
Oak Ridge National Laboratory
Office of Biological and Environmental Research
Plant-Microbe Interfaces
U.S. Department of Energy Bioenergy Research Centers
U.S. Department of Energy
Office of Science
Biological and Environmental Research
Center for Bioenergy Innovation

Keywords

• Gene Expression Networks
• High-performance computing
• Iterative Random Forest
• Random Forest
• X-AI-based eQTL
