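Studies have shown that phenotypic variation, environmental adaptation and speciation in almost all organisms are associated with structural variation between genomes. The human genome contains both protein-coding genes and the regulatory information that controls when, and to what extent, those genes are expressed. Structural variants (SVs) are an important cause of phenotypic differences among species and are closely linked to many diseases, especially to the onset and progression of cancer, which makes them an important subject of study. Structural variants are commonly defined as genomic variants at least 50 bp long (older definitions used a threshold of 1 kb). They include several types: insertions, deletions, inversions, translocations and copy number variations (CNVs, such as duplications).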
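To make these SV categories concrete, here is a minimal Python sketch of how SV calls are often represented in VCF files and filtered by size. The `SVTYPE` and `SVLEN` INFO keys and the type codes DEL/INS/DUP/INV/CNV/BND come from the VCF specification. The class and function names (`SVRecord`, `to_record`, `is_structural`) are hypothetical. The length cutoff is a parameter because definitions vary between 1 kb and the 50 bp used by many recent studies, and the 50 bp default here is an assumption. Translocations have no single length and are usually recorded as BND (break-end) pairs, so the sketch keeps them regardless of the cutoff.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class SVType(Enum):
    # SVTYPE codes from the VCF 4.x specification
    DEL = "DEL"  # deletion
    INS = "INS"  # insertion
    DUP = "DUP"  # duplication
    INV = "INV"  # inversion
    CNV = "CNV"  # copy number variable region
    BND = "BND"  # break-end, e.g. one side of a translocation


@dataclass
class SVRecord:
    chrom: str
    pos: int
    svtype: SVType
    svlen: Optional[int]  # absolute length in bp; None when SVLEN is absent


def parse_info(info: str) -> dict:
    """Split a VCF INFO string like 'SVTYPE=DEL;SVLEN=-2500;IMPRECISE' into a dict."""
    fields = {}
    for item in info.split(";"):
        key, sep, value = item.partition("=")
        fields[key] = value if sep else True  # flag fields carry no value
    return fields


def to_record(chrom: str, pos: int, info: str) -> SVRecord:
    """Build an SVRecord, assuming a single ALT allele (one SVLEN value)."""
    fields = parse_info(info)
    svtype = SVType(fields["SVTYPE"])
    # Deletions use a negative SVLEN in VCF 4.2; abs() normalises it
    svlen = abs(int(fields["SVLEN"])) if "SVLEN" in fields else None
    return SVRecord(chrom, pos, svtype, svlen)


def is_structural(rec: SVRecord, min_len: int = 50) -> bool:
    """Keep break-ends (no single length) and SVs of at least min_len bp."""
    if rec.svtype is SVType.BND:
        return True
    return rec.svlen is not None and rec.svlen >= min_len


rec = to_record("chr1", 10_000, "SVTYPE=DEL;SVLEN=-2500;IMPRECISE")
print(rec.svlen, is_structural(rec), is_structural(rec, min_len=5000))  # 2500 True False
```

Passing `min_len=1000` reproduces the older 1 kb definition without changing anything else.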
Callset construction pipeline.
Version of the “B38” callset derived from 14,623 samples
(a) Number of high-confidence and low-confidence SVs by class and frequency bin. SV classes are defined as: DEL, deletion; MEI, mobile element insertion; DUP, duplication; INV, inversion; BND, “break-end”, which is a generic term in the VCF specification for SV breakpoints that cannot be unequivocally classified. Minor allele frequency (MAF) bins are defined as: “ultra-rare” is private to an individual or family; “rare” is MAF<1%; “low-frequency” is 1%<MAF<5%; “common” is MAF>5%.
(b) Number of SVs per sample (x-axis, square-root scaled) by SV type (y-axis) and frequency class (panels labelled at top).
(c) MAF distribution for SNV, indel, deletion (DEL) and duplication (DUP) variants for a subset of 4,298 samples for which GATK-based SNV/indel calls were also available.
(d) CNV length distributions for each frequency class, defined as in part (a).
(e) Histogram showing the resolution of SV breakpoint calls, as defined by the length of the 95% confidence interval of the breakpoint-containing region defined by LUMPY, after cross-sample merging and refinement using svtools.
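The frequency bins in panel (a) and the breakpoint-resolution metric in panel (e) can be sketched in Python as follows. This is a hedged illustration, not the study's code: the function names are hypothetical, MAF is computed for biallelic sites in diploid samples, and a site whose carriers all share one family ID is treated as ultra-rare. Boundary values (exactly 1% or 5%), which the caption leaves open, are placed in the higher bin as an assumption. The confidence-interval helper assumes each breakpoint's interval is stored as a pair of offsets around the breakpoint, as in the CIPOS95/CIEND95 INFO fields of LUMPY-style VCFs.

```python
from typing import Sequence, Tuple


def minor_allele_frequency(alt_allele_count: int, n_samples: int) -> float:
    """MAF of a biallelic site in diploid samples: min(AF, 1 - AF)."""
    af = alt_allele_count / (2 * n_samples)  # 2 alleles per diploid sample
    return min(af, 1.0 - af)


def frequency_bin(maf: float, carrier_families: Sequence[str]) -> str:
    """Frequency class as in panel (a).

    carrier_families holds one family (or individual) ID per carrier; a site
    seen in a single family is ultra-rare. Exact 1% / 5% values go to the
    higher bin (an assumption; the caption leaves them open).
    """
    if not carrier_families:
        raise ValueError("site has no carriers")
    if len(set(carrier_families)) == 1:
        return "ultra-rare"
    if maf < 0.01:
        return "rare"
    if maf < 0.05:
        return "low-frequency"
    return "common"


def ci_length(ci: Tuple[int, int]) -> int:
    """Width of a breakpoint confidence interval stored as offsets (lo, hi)."""
    lo, hi = ci
    return hi - lo


# Hypothetical site: 2 unrelated heterozygous carriers among 14,623 samples
maf = minor_allele_frequency(2, 14_623)
print(frequency_bin(maf, ["fam_A", "fam_B"]))  # rare
print(ci_length((-40, 60)))                      # 100
```

In this example the MAF is 2/29,246 ≈ 0.007%, well under the 1% cutoff. A site carried only by members of one family would instead be ultra-rare at any MAF, which is why the family check runs first.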
A key goal of whole-genome sequencing (WGS) for human genetics studies is to interrogate all forms of variation, including single nucleotide variants (SNV), small insertion/deletion (indel) variants and structural variants (SV). However, tools and resources for the study of SV have lagged behind those for smaller variants. Here, we used a scalable pipeline [22] to map and characterize SV in 17,795 deeply sequenced human genomes. We publicly release site-frequency data to create the largest WGS-based SV resource to date. On average, individuals carry 2.9 rare SVs that alter coding regions, affecting the dosage or structure of 4.2 genes and accounting for 4.0-11.2% of rare high-impact coding alleles. Based on a computational model, we estimate that SVs account for 17.2% of rare alleles genome-wide with predicted deleterious effects equivalent to loss-of-function coding alleles; approximately 90% of such SVs are non-coding deletions (mean 19.1 per genome). We report 158,991 ultra-rare SVs and show that around 2% of individuals carry ultra-rare megabase-scale SVs, nearly half of which are balanced or complex rearrangements. Finally, we infer the dosage sensitivity of genes and non-coding elements, revealing trends related to element class and conservation. This work will help guide SV analysis and interpretation in the era of WGS.
DOI: 10.1038/s41586-020-2371-0
- Permalink: https://oversea.maimengkong.com/zixun/1389.html
- When reposting, please credit: posted by 萌小白 on 3 March 2023 on 卖萌控的博客