Human Population Genetics and Genomics ISSN 2770-5005

Human Population Genetics and Genomics 2024;4(1):0002 | https://doi.org/10.47248/hpgg2404010002

Original Research Open Access

Evaluation of ancient DNA imputation: a simulation study

Mariana Escobar-Rodríguez 1,2, Krishna R. Veeramah 3

  1. Center for Genomic Sciences, National Autonomous University of Mexico, 62209 Cuernavaca, Morelos, Mexico
  2. Institut Pasteur, Université de Paris Cité, CNRS UMR 2000, Microbial Paleogenomics Unit, F-75015 Paris, France
  3. Department of Ecology and Evolution, Stony Brook University, Stony Brook, NY 11794, USA

Correspondence: Krishna R. Veeramah

Academic Editor(s): Joshua Akey

Received: Sep 19, 2023 | Accepted: Dec 24, 2023 | Published: Jan 5, 2024

This article belongs to the Special Issue

Cite this article: Escobar-Rodríguez M, Veeramah K. Evaluation of ancient DNA imputation: a simulation study. Hum Popul Genet Genom 2024; 4(1):0002. https://doi.org/10.47248/hpgg2404010002

Abstract

Ancient genomic data are becoming increasingly available thanks to recent advances in high-throughput sequencing technologies. Yet post-mortem degradation of endogenous ancient DNA often results in low depth of coverage and, consequently, high levels of genotype missingness and uncertainty. Genotype imputation is a potential strategy for increasing the information available in ancient DNA samples and thus improving the power of downstream population genetic analyses. However, the performance of genotype imputation on ancient genomes under different conditions has not yet been fully explored, as previous work has relied primarily on the empirical approach of downsampling high-coverage paleogenomes. While these studies have provided invaluable insights into best practices for imputation, they rely on a fairly limited number of existing high-coverage samples with significant temporal and geographical biases. As an alternative, we used a coalescent simulation approach to generate genomes with the characteristics of ancient DNA, in order to evaluate more systematically the performance of two popular imputation programs, BEAGLE and GLIMPSE, across varying divergence times between the target sample and the reference haplotypes, depths of coverage, and reference panel sizes. Our results suggest that for genomes with coverage ≤0.1×, imputation performance is poor regardless of the strategy employed. Beyond 0.1× coverage, imputation generally improves as the size of the reference panel increases, and imputation accuracy decreases with increasing divergence between the target and reference populations. It may thus be preferable to compile a smaller set of less diverged reference samples than a larger, more highly diverged dataset. In addition, imputation accuracy may plateau beyond some level of divergence between the reference and target populations.
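One way to see why coverage ≤0.1× leaves so little for imputation to work with is a simple read-depth model. The sketch below assumes that reads fall independently and uniformly along the genome, so the depth at each site is Poisson-distributed with mean equal to the nominal coverage. This idealization is our assumption, not the simulation model of the study; real ancient DNA coverage is usually more uneven. The function names are hypothetical.

```python
import math


def poisson_pmf(k: int, mean_depth: float) -> float:
    """Probability that a site is covered by exactly k reads (Poisson model)."""
    return math.exp(-mean_depth) * mean_depth ** k / math.factorial(k)


def site_coverage_summary(mean_depth: float) -> dict:
    """Fraction of sites with 0, exactly 1, and 2 or more reads."""
    p0 = poisson_pmf(0, mean_depth)
    p1 = poisson_pmf(1, mean_depth)
    return {"no_reads": p0, "one_read": p1, "two_or_more": 1.0 - p0 - p1}


if __name__ == "__main__":
    for depth in (0.05, 0.1, 0.5, 1.0):
        s = site_coverage_summary(depth)
        print(f"{depth:>4}x  no reads: {s['no_reads']:.3f}  "
              f"1 read: {s['one_read']:.3f}  >=2 reads: {s['two_or_more']:.3f}")
```

Under this model, at 0.1× about 90% of sites (e^-0.1 ≈ 0.905) have no reads at all, and fewer than 0.5% have the two or more reads needed to observe both alleles of a heterozygote. Almost every genotype must therefore be inferred from the reference haplotypes.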
While accuracy at common variants is similar regardless of divergence time, rarer variants are better imputed in less diverged target samples. Furthermore, both imputation programs, but particularly GLIMPSE, overestimate the accuracy of calls with high genotype probabilities, especially at low coverage. Our results provide insight into optimal strategies for ancient genotype imputation under a wide set of scenarios, complementing previous empirical studies based on imputing downsampled high-coverage ancient genomes.
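Overestimation of high genotype probability calls can be checked when the true genotypes are known, as they are in a simulation. A common approach is a calibration table: group imputed sites by their maximum posterior genotype probability (GP), then compare the mean GP in each group with the fraction of sites whose imputed genotype matches the truth. The sketch below is a hypothetical helper, not the evaluation code or metric used in the study. The bin edges and the toy inputs in the test are illustrative assumptions, not results.

```python
from collections import defaultdict


def calibration_table(max_gp, imputed, truth,
                      bin_edges=(0.0, 0.5, 0.7, 0.9, 0.99, 1.0)):
    """Compare predicted confidence with observed genotype concordance.

    max_gp  -- highest posterior genotype probability per site (0..1)
    imputed -- imputed genotype per site (count of alternate alleles: 0, 1, 2)
    truth   -- true (e.g. simulated) genotype per site
    Returns (bin_low, bin_high, n_sites, mean_gp, concordance) for each
    non-empty bin. mean_gp > concordance indicates overconfident calls.
    """
    bins = defaultdict(list)
    last_edge = bin_edges[-1]
    for gp, g_imp, g_true in zip(max_gp, imputed, truth):
        for lo, hi in zip(bin_edges[:-1], bin_edges[1:]):
            # Bins are [lo, hi); the top bin also includes GP == 1.0.
            if lo <= gp < hi or (hi == last_edge and gp == hi):
                bins[(lo, hi)].append((gp, g_imp == g_true))
                break

    table = []
    for lo, hi in zip(bin_edges[:-1], bin_edges[1:]):
        rows = bins.get((lo, hi), [])
        if not rows:
            continue
        mean_gp = sum(gp for gp, _ in rows) / len(rows)
        concordance = sum(ok for _, ok in rows) / len(rows)
        table.append((lo, hi, len(rows), mean_gp, concordance))
    return table
```

In a well-calibrated imputation, sites called with GP ≈ 0.99 should match the true genotype about 99% of the time. A consistently lower concordance in the top bins is the kind of overestimation described above.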

Keywords

Paleogenomics, ancient DNA, genomics, imputation, simulations, population genetics
