Human Population Genetics and Genomics ISSN 2770-5005
Human Population Genetics and Genomics 2024;4(1):0002 | https://doi.org/10.47248/hpgg2404010002
Original Research | Open Access
Evaluation of ancient DNA imputation: a simulation study
Mariana Escobar-Rodríguez 1,2, Krishna R. Veeramah 3
Correspondence: Krishna R. Veeramah
Academic Editor(s): Joshua Akey
Received: Sep 19, 2023 | Accepted: Dec 24, 2023 | Published: Jan 5, 2024
This article belongs to the Special Issue Paleogenomics, ancient DNA, and genomic tales of human history
Cite this article: Escobar-Rodríguez M, Veeramah K. Evaluation of ancient DNA imputation: a simulation study. Hum Popul Genet Genom 2024; 4(1):0002. https://doi.org/10.47248/hpgg2404010002
Ancient genomic data is becoming increasingly available thanks to recent advances in high-throughput sequencing technologies. Yet post-mortem degradation of endogenous ancient DNA often results in low depth of coverage and, consequently, high levels of genotype missingness and uncertainty. Genotype imputation is a potential strategy for increasing the information available in ancient DNA samples and thus improving the power of downstream population genetic analyses. However, the performance of genotype imputation on ancient genomes under different conditions has not yet been fully explored, as previous work has relied primarily on an empirical approach of downsampling high-coverage paleogenomes. While these studies have provided invaluable insights into best practices for imputation, they rely on a fairly limited number of existing high-coverage samples with significant temporal and geographical biases. As an alternative, we used a coalescent simulation approach to generate genomes with characteristics of ancient DNA in order to more systematically evaluate the performance of two popular imputation tools, BEAGLE and GLIMPSE, under variable divergence times between the target sample and reference haplotypes, as well as different depths of coverage and reference panel sizes. Our results suggest that for genomes with coverage ≤0.1x, imputation performance is poor regardless of the strategy employed. Above 0.1x coverage, imputation generally improves as the size of the reference panel increases, and imputation accuracy decreases with increasing divergence between target and reference populations. It may thus be preferable to compile a smaller set of less diverged reference samples than a larger, more highly diverged dataset. In addition, imputation accuracy may plateau beyond some level of divergence between the reference and target populations.
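To illustrate why coverage at or below 0.1x leaves so little information for imputation, a common simplifying assumption is that per-site read depth follows a Poisson distribution whose mean equals the average coverage. Under that assumption (which ignores ancient DNA damage, mapping bias, and uneven coverage, and is not necessarily the model used in this study), the expected fraction of sites with no reads at mean depth λ is e^(-λ). The Python sketch below computes the expected fraction of sites covered by at least a given number of reads; the function name `fraction_sites_covered` is hypothetical.

```python
import math


def fraction_sites_covered(mean_depth: float, min_reads: int = 1) -> float:
    """Expected fraction of sites with at least `min_reads` reads,
    assuming (hypothetically) that per-site depth ~ Poisson(mean_depth)."""
    # P(depth < min_reads) = sum over k < min_reads of e^(-λ) * λ^k / k!
    p_below = sum(
        math.exp(-mean_depth) * mean_depth**k / math.factorial(k)
        for k in range(min_reads)
    )
    return 1.0 - p_below


if __name__ == "__main__":
    for depth in (0.05, 0.1, 0.5, 1.0, 2.0):
        covered = fraction_sites_covered(depth)
        at_least_two = fraction_sites_covered(depth, min_reads=2)
        print(
            f"{depth}x: {covered:.3f} of sites with >=1 read, "
            f"{1 - covered:.3f} missing, {at_least_two:.3f} with >=2 reads"
        )
```

Under this model, roughly 90% of sites receive no reads at all at 0.1x (e^(-0.1) ≈ 0.905), and even at 1x about 37% of sites remain uncovered. Sites with at least two reads, a minimum for directly observing both alleles of a heterozygote, are rarer still, which is why low-coverage genomes depend so heavily on the reference panel.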
While accuracy at common variants is similar regardless of divergence time, rarer variants are better imputed in less diverged target samples. Furthermore, both imputation tools, but particularly GLIMPSE, overestimate the accuracy of high genotype probability calls, especially at low coverage. Our results provide insight into optimal strategies for ancient genotype imputation under a wide set of scenarios, complementing previous empirical studies based on imputing downsampled high-coverage ancient genomes.
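The overconfidence of high genotype probability calls can be checked with a simple calibration table: group imputed calls by their maximum genotype probability (GP) and compare the mean GP in each bin with the observed concordance against the true (here, simulated) genotypes. For well-calibrated calls the two values should be close, and a mean GP well above concordance indicates overconfidence. The Python sketch below assumes that per-site maximum GP values and a true/false concordance flag have already been extracted from the imputed output and the truth set. The function `calibration_table`, the bin edges, and the toy values are hypothetical and are not taken from the study.

```python
from dataclasses import dataclass


@dataclass
class CalibrationBin:
    lower: float
    upper: float
    n: int
    mean_gp: float
    concordance: float


def calibration_table(max_gp, correct, edges=(0.0, 0.5, 0.7, 0.9, 0.99, 1.0)):
    """Bin calls by maximum GP and compare mean GP with observed concordance.

    mean_gp > concordance in a bin means the GPs there are overconfident.
    The last bin is closed on the right so that GP == 1.0 is included.
    """
    if len(max_gp) != len(correct):
        raise ValueError("max_gp and correct must have the same length")
    bins = []
    for i, (lo, hi) in enumerate(zip(edges[:-1], edges[1:])):
        last = i == len(edges) - 2
        idx = [j for j, p in enumerate(max_gp) if lo <= p < hi or (last and p == hi)]
        if not idx:
            continue  # skip empty bins
        mean_gp = sum(max_gp[j] for j in idx) / len(idx)
        concordance = sum(1 for j in idx if correct[j]) / len(idx)
        bins.append(CalibrationBin(lo, hi, len(idx), mean_gp, concordance))
    return bins


# Toy, made-up values used only to exercise the function (not study results).
gp = [0.95, 0.99, 0.999, 0.98, 0.6, 0.65, 0.995, 0.97]
ok = [True, False, True, True, True, False, True, False]

if __name__ == "__main__":
    for b in calibration_table(gp, ok):
        print(
            f"GP {b.lower:.2f}-{b.upper:.2f}: n={b.n}, "
            f"mean GP={b.mean_gp:.3f}, concordance={b.concordance:.3f}"
        )
```

In practice the bins would typically also be stratified by coverage and by allele frequency, since the abstract reports that both calibration and accuracy vary along those axes.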
Keywords: Paleogenomics, ancient DNA, genomics, imputation, simulations, population genetics
Copyright © 2024 Pivot Science Publications Corp. - unless otherwise stated