Chinese Researchers Develop a Human Genome Variation Database Covering Nearly 1000 Present-day and Ancient Populations
A recent study led by Dr. XU Shuhua from the CAS-MPG Partner Institute for Computational Biology, Shanghai Institute of Nutrition and Health of the Chinese Academy of Sciences (CAS), has created a genome variation database, PGG.SNV (https://www.pggsnv.org), which archives 265 million single nucleotide variations (SNVs) across 220,147 present-day genomes and 1,018 ancient genomes, including 1,009 newly sequenced genomes, representing 977 global populations.
PGG.SNV significantly improves the coverage of Asian populations, which are markedly under-represented in other available databases such as gnomAD. Another feature that sets PGG.SNV apart from existing databases is that it provides estimates of population genetic diversity and evolutionary parameters.
Although Asia is the largest and most populous of Earth's continents, most genomic studies have been conducted in Europe and America. Accordingly, currently available human genome variation resources are based predominantly on populations of European ancestry. For example, nearly half of the genomes in gnomAD are of European ancestry and merely 9% are of African ancestry. As a result, an enormous number of variants harbored in Asian genomes cannot be observed in the extensively studied populations of European ancestry. Moreover, samples in gnomAD are classified into groups mainly at the continental level, leaving most specific ethnic groups unidentified. For example, gnomAD groups East Asians roughly into three categories: “Korean”, “Japanese” and “other East Asians”. Researchers therefore cannot query allele frequencies for most East Asian populations.
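To illustrate why coarse grouping matters, the following minimal sketch compares a pooled allele frequency with per-population frequencies. The population labels and allele counts are hypothetical and are not taken from PGG.SNV or gnomAD; the calculation is simply the alternate-allele count divided by the number of alleles sampled.

```python
# Hypothetical allele counts: (alternate-allele count, total alleles sampled).
# Population labels and numbers are illustrative only, not real data.
counts = {
    "population_A": (2, 2000),
    "population_B": (150, 1000),
    "population_C": (0, 3000),
}


def allele_frequency(alt, total):
    """Return the alternate-allele frequency, or None if nothing was sampled."""
    return alt / total if total else None


# Frequency within each population separately.
per_population = {
    name: allele_frequency(alt, total) for name, (alt, total) in counts.items()
}

# Frequency when all populations are merged under one coarse label.
pooled_alt = sum(alt for alt, _ in counts.values())
pooled_total = sum(total for _, total in counts.values())
pooled = allele_frequency(pooled_alt, pooled_total)

for name, freq in per_population.items():
    print(f"{name}: {freq:.4f}")
print(f"pooled: {pooled:.4f}")
```

In this toy case the pooled frequency is about 2.5%, while population_B alone carries the variant at 15%, a difference that a single pooled label would hide.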
Compared with other frequently used data sets, PGG.SNV documents more genomes and represents a much more comprehensive picture of the genomic diversity of worldwide populations. For instance, PGG.SNV includes 90,514 Asian genomes, compared with 993 and 25,285 in the 1KGP and gnomAD data sets, respectively. Moreover, PGG.SNV includes 1,009 newly generated whole-genome sequences from 16 ethnic groups, notably many indigenous groups living in East Asia and Southeast Asia whose genomes had not been sequenced before. Besides present-day human populations, the database integrates 1,018 ancient genomes spanning time periods from 430,000 years before the present up to the early 20th century, a type of data rarely considered in other existing databases.
With a comprehensive catalogue of genetic variants and annotations, PGG.SNV enables studies of variants that are rare or absent in well-studied populations, reports the prevalence of variants across populations with little ancestral bias, and helps guide mapping studies of Mendelian-inherited diseases. PGG.SNV documents many ancient genomes and compares them with contemporary human genomes, allowing researchers to understand the evolutionary trajectory of genetic variants as well as gene flow or introgression events. Moreover, the database improves the interpretation of putative causal loci for Mendelian diseases and supports analyses of population differentiation and of adaptation to local environments in global populations. Ultimately, PGG.SNV will help advance our understanding of the biological meaning of the human genome sequence in light of human evolution.
PGG.SNV provides a web-based user interface for accessing the data. Users can search for genetic variants by physical position, rsID, genomic region, official gene symbol, or Ensembl gene name, among other options. PGG.SNV also embeds a web-based tool (https://www.pggsnv.org/tools.html) that generates figures from analyses uploaded by users. In addition to the web-based interface, users can query variants through a mobile application (App) by linking to the WeChat official account named PGGbase.
The main user interface of PGG.SNV. (Image provided by Dr. XU Shuhua’s group)
The study, entitled “PGG.SNV: Understanding the evolutionary and medical implications of human single nucleotide variations in diverse populations”, was published online in Genome Biology on October 22, 2019.
This work was conducted by Dr. ZHANG Chao, Dr. LU Yan, PhD students GAO Yang from ShanghaiTech University and NING Zhilin, and other members of Dr. XU Shuhua’s team. It was funded by grants from CAS, the National Natural Science Foundation of China, the National Key Research and Development Program, and the Science and Technology Commission of Shanghai Municipality.
WANG Jin (Ms.)
Shanghai Institute of Nutrition and Health,
Chinese Academy of Sciences