Abstract
The incomplete identification of structural variants (SVs) from whole-genome sequencing data limits studies of human genetic diversity and disease association. Here, we apply a suite of long-read, short-read, strand-specific sequencing technologies, optical mapping, and variant discovery algorithms to comprehensively analyze three trios to define the full spectrum of human genetic variation in a haplotype-resolved manner. We identify 818,054 indel variants (<50 bp) and 27,622 SVs (≥50 bp) per genome. We also discover 156 inversions per genome and 58 of the inversions intersect with the critical regions of recurrent microdeletion and microduplication syndromes. Taken together, our SV callsets represent a three to sevenfold increase in SV detection compared to most standard high-throughput sequencing studies, including those from the 1000 Genomes Project. The methods and the dataset presented serve as a gold standard for the scientific community allowing us to make recommendations for maximizing structural variation sensitivity for future genome sequencing studies.
Original language | English |
---|---|
Article number | 1784 |
Journal | Nature Communications |
Volume | 10 |
Issue number | 1 |
State | Published - Dec 1 2019 |
Externally published | Yes |
Funding
We thank Nancy Halsema and Karina Wakker-Hoekstra for help with preparing Strand-seq libraries and T. Brown for assistance in editing this manuscript. We also thank the people who generously contributed samples to the 1000 Genomes Project. Funding for this research project by the Human Genome Structural Variation Consortium (HGSVC) came from the following grants: National Institutes of Health (NIH) U41HG007497 (to C.L., E.E.E., J.O.K., M.A.B., M.G., S.A.M., R.E.M., and J.S.), NIH R01CA166661 (to S.E.D.), NIH R01HG002898 (to S.E.D.), NIH F31HG009223 (to E.J.G.), NIH R01HG008628 (to G.T.M.), NIH U01HG006513 (to G.T.M.), NIH 1R21AI117407-01A1 (to A.B.), NIH R01HD081256 (to M.E.T.), NIH 1R01HG007068-01A1 (to R.E.M.), NIH R01HG002385 (to E.E.E.), NIH R15HG009565 (to X.S.), NIH R01HD081256 and MH115957 (to M.E.T.), NIH R01 HG005946 (to P.Y.K. and M.X.), the Wellcome Trust grants WT085532 and WT104947/Z/14/Z and the European Molecular Biology Laboratory (to S.F., L.C., E.L., H.Z.-B., P.F.), grant UM.0000125/KWJ.HI from the University of Malaya (to C.L.K.), a National Health and Medical Research Council (NHMRC) CJ Martin Biomedical Fellowship (#1073726) to S.C., National Science Foundation of China (31671372 to K.Y., 31701739 to L.G.), National Key R&D Program of China (2017YFC0907500 to K.Y., 2018YFC0910400 to K.Y. & L.G.) and an Advanced ERC grant (to P.M.L.). E.E.E. is an investigator of the Howard Hughes Medical Institute. J.O.K. is a European Research Council (ERC) investigator. C.L. is a distinguished Ewha Womans University Professor, supported in part by the Ewha Womans University Research grant of 2016-8.