Deep data analytics for genetic engineering of diatoms linking genotype to phenotype via machine learning

Artem A. Trofimov, Alison A. Pawlicki, Nikolay Borodinov, Shovon Mandal, Teresa J. Mathews, Mark Hildebrand, Maxim A. Ziatdinov, Katherine A. Hausladen, Paulina K. Urbanowicz, Chad A. Steed, Anton V. Ievlev, Alex Belianinov, Joshua K. Michener, Rama Vasudevan, Olga S. Ovchinnikova

Research output: Contribution to journal, Article, peer-review

16 Scopus citations

Abstract

Genome engineering for materials synthesis is a promising avenue for manufacturing materials with unique properties under ambient conditions. Biomineralization in diatoms, unicellular algae that use silica to construct micron-scale cell walls with nanoscale features, is an attractive candidate for functional synthesis of materials for applications including photonics, sensing, filtration, and drug delivery. Therefore, controllably modifying diatom structure through targeted genetic modifications for these applications is a very promising field. In this work, we used gene knockdown in Thalassiosira pseudonana diatoms to create modified strains with changes to structural morphology and linked genotype to phenotype using supervised machine learning. An artificial neural network (NN) was developed to distinguish wild and modified diatoms based on SEM images of frustules exhibiting phenotypic changes caused by a specific protein (Thaps3_21880), resulting in 94% detection accuracy. Class activation maps visualized the physical changes that allowed the NNs to separate diatom strains, subsequently establishing a specific gene that controls pores. A further NN was created to batch process image data, automatically recognize pores, and extract pore-related parameters. Class interrelationships among the extracted parameters were visualized using a multivariate data visualization tool called CrossVis, which made it possible to directly link changes in the morphological diatom phenotype (pore size and distribution) with changes in the genotype.

Original language: English
Article number: 4
Journal: npj Computational Materials
Volume: 5
Issue number: 1
State: Published - Dec 1 2019

Funding

The research was partially conducted at the Center for Nanophase Materials Sciences, which is a DOE Office of Science User Facility. Work by C.A.S. was sponsored by the Department of Energy under the Scientific Discovery through Advanced Computing RAPIDS project. Work by K.A.H. and P.K.U. was enabled through the Oak Ridge High School Math Thesis Program. The research by J.K.M., T.J.M., O.S.O., A.A.P., A.A.T., S.M. and M.H. was sponsored by the Laboratory Directed Research and Development Program of Oak Ridge National Laboratory, managed by UT-Battelle, LLC, for the U.S. Department of Energy. This paper has been authored by UT-Battelle, LLC, under Contract no. DE-AC05-00OR22725 with the U.S. Department of Energy. The Department of Energy will provide public access to these results of federally sponsored research in accordance with the DOE Public Access Plan (http://energy.gov/downloads/doe-public-access-plan).

Funders (no funder numbers listed)
DOE Office of Science
U.S. Department of Energy
Oak Ridge National Laboratory
