Evaluating the performance of random forest and iterative random forest based methods when applied to gene expression data

Angelica M. Walker, Ashley Cliff, Jonathon Romero, Manesh B. Shah, Piet Jones, Joao Gabriel Felipe Machado Gazolla, Daniel A. Jacobson, David Kainer

Research output: Contribution to journal > Article > peer-review

19 Scopus citations

Abstract

Gene-to-gene networks, such as Gene Regulatory Networks (GRN) and Predictive Expression Networks (PEN), capture relationships between genes and are beneficial for use in downstream biological analyses. Multiple network inference tools exist to produce these gene-to-gene networks from matrices of gene expression data. Random Forest-Leave One Out Prediction (RF-LOOP), more commonly known as GEne Network Inference with Ensemble of trees (GENIE3), has been shown to be efficient at producing these networks. Random Forest can be replaced in this process by iterative Random Forest (iRF), which performs variable selection and boosting. Here we validate that iterative Random Forest-Leave One Out Prediction (iRF-LOOP) produces higher-quality networks than GENIE3 (RF-LOOP). We use both synthetic and empirical networks from the Dialogue for Reverse Engineering Assessment and Methods (DREAM) Challenges by Sage Bionetworks, as well as two additional empirical networks created from Arabidopsis thaliana and Populus trichocarpa expression data.
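The RF-LOOP (GENIE3) procedure summarized above can be sketched as follows: each gene in turn is treated as the target, a random forest is fit to predict it from all remaining genes, and the forest's feature importances become the weights of edges from those genes to the target. This is a minimal sketch, not the authors' implementation. It assumes an expression matrix laid out as samples by genes and uses scikit-learn's `RandomForestRegressor` with its impurity-based `feature_importances_`; the function name `rf_loop` and the settings `n_trees` and `max_features="sqrt"` are hypothetical, illustrative choices.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor


def rf_loop(expr, n_trees=100, seed=0):
    """Hypothetical sketch of RF-LOOP: one random forest per target gene.

    expr: array of shape (n_samples, n_genes) (assumed layout).
    Returns weights where weights[i, j] is the importance of gene i
    for predicting gene j; the diagonal stays 0 (no self-edges).
    """
    n_genes = expr.shape[1]
    weights = np.zeros((n_genes, n_genes))
    for target in range(n_genes):
        # Leave the target out: every other gene is a candidate predictor.
        predictors = [g for g in range(n_genes) if g != target]
        model = RandomForestRegressor(
            n_estimators=n_trees,
            max_features="sqrt",  # illustrative assumption, not taken from the paper
            random_state=seed,
        )
        model.fit(expr[:, predictors], expr[:, target])
        # Impurity-based importances (normalized to sum to 1) become edge weights.
        weights[predictors, target] = model.feature_importances_
    return weights


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    expr = rng.normal(size=(200, 5))
    # Toy data: make gene 1 a noisy linear function of gene 0.
    expr[:, 1] = 2.0 * expr[:, 0] + 0.1 * rng.normal(size=200)
    w = rf_loop(expr, n_trees=50)
    print("strongest predictor of gene 1:", int(np.argmax(w[:, 1])))
```

The row index of `weights` is the predictor gene and the column index is the target, so sorting the off-diagonal entries yields a ranked edge list. iRF-LOOP would replace the forest with iterative Random Forest, which reweights feature sampling using the previous iteration's importances; scikit-learn's `RandomForestRegressor` offers no per-feature sampling weights, so that step is not shown here.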

Original language: English
Pages (from-to): 3372-3386
Number of pages: 15
Journal: Computational and Structural Biotechnology Journal
Volume: 20
State: Published - Jan 2022

Funding

This research used resources of the Oak Ridge Leadership Computing Facility and the Compute and Data Environment for Science at Oak Ridge National Laboratory, which is supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC05-00OR22725. The manuscript was coauthored by UT-Battelle, LLC under Contract No. DE-AC05-00OR22725 with the US Department of Energy. The US Government retains and the publisher, by accepting the article for publication, acknowledges that the US Government retains a nonexclusive, paid-up, irrevocable, worldwide license to publish or reproduce the published form of this manuscript, or allow others to do so, for US Government purposes. The Department of Energy will provide public access to these results of federally sponsored research in accordance with the DOE Public Access Plan (http://energy.gov/downloads/doe-public-access-plan). This work was supported by the Center for Bioenergy Innovation (CBI) and the Secure Ecosystem Engineering and Design (SEED) project. CBI is a U.S. Department of Energy Bioenergy Research Center supported by the Office of Biological and Environmental Research in the DOE Office of Science. The Secure Ecosystem Engineering and Design (SEED) (https://seed-sfa.ornl.gov/) project is funded by the Genomic Science Program of the U.S. Department of Energy, Office of Science, Office of Biological and Environmental Research (BER) as part of the Secure Biosystems Design Science Focus Area (SFA).

Funders (funder number)
DOE Public Access Plan
Secure Biosystems Design Science Focus Area
Secure Ecosystem Engineering and Design
U.S. Department of Energy Bioenergy Research Center
U.S. Department of Energy (DE-AC05-00OR22725)
Office of Science
Biological and Environmental Research
Center for Bioenergy Innovation
UT-Battelle

    Keywords

    • Gene expression networks
    • Iterative random forest
    • Network biology
    • Random forest
