Enabling graph appliance for genome assembly

Rina Singh, Jeffrey A. Graves, Sangkeun Lee, Sreenivas R. Sukumar, Mallikarjun Shankar

Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review

2 Scopus citations

Abstract

In recent years, there has been a huge growth in the amount of genomic data available as reads generated from various genome sequencers. The number of reads generated can be huge, ranging from hundreds to billions of nucleotide, each varying in size. Assembling such large amounts of data is one of the challenging computational problems for both biomedical and data scientists. Most of the genome assemblers that have developed use de Bruijn graph techniques. A de Bruijn graph represents a collection of read sequences by billions of vertices and edges, which require large amounts of memory and computational power to store and process. This is the major drawback to de Bruijn graph assembly. Massively parallel, multithreaded, shared memory systems can be leveraged to overcome some of these issues. The objective of our research is to investigate the feasibility and scalability issues of de Bruijn graph assembly on Cray's Urika-GD system; Urika-GD is a high performance graph appliance with a large shared memory and massively multithreaded custom processor designed for executing SPARQL queries over large-scale RDF data sets. However, to the best of our knowledge, there is no research on representing a de Bruijn graph as an RDF graph or finding Eulerian paths in RDF graphs using SPARQL for potential genome discovery. In this paper, we address the issues involved in representing de Bruin graphs as RDF graphs and propose an iterative querying approach for searching cycles to find Eulerian paths in large RDF graphs. We evaluate the performance of our implementation on real world ebola genome datasets and illustrate how genome assembly can be accomplished with Urika-GD using iterative SPARQL queries.

Original languageEnglish
Title of host publicationProceedings - 2015 IEEE International Conference on Big Data, IEEE Big Data 2015
EditorsFeng Luo, Kemafor Ogan, Mohammed J. Zaki, Laura Haas, Beng Chin Ooi, Vipin Kumar, Sudarsan Rachuri, Saumyadipta Pyne, Howard Ho, Xiaohua Hu, Shipeng Yu, Morris Hui-I Hsiao, Jian Li
PublisherInstitute of Electrical and Electronics Engineers Inc.
Pages2583-2590
Number of pages8
ISBN (Electronic)9781479999255
DOIs
StatePublished - Dec 22 2015
Event3rd IEEE International Conference on Big Data, IEEE Big Data 2015 - Santa Clara, United States
Duration: Oct 29 2015Nov 1 2015

Publication series

NameProceedings - 2015 IEEE International Conference on Big Data, IEEE Big Data 2015

Conference

Conference3rd IEEE International Conference on Big Data, IEEE Big Data 2015
Country/TerritoryUnited States
CitySanta Clara
Period10/29/1511/1/15

Fingerprint

Dive into the research topics of 'Enabling graph appliance for genome assembly'. Together they form a unique fingerprint.

Cite this