On the Demographic History of Chimpanzees and Some Consequences of Integrating Population Structure in Chimpanzees and Other Great Apes

Sections

Abstract
1. Introduction
2. Materials and Methods
3. Results
4. Discussion
5. Conclusions
Supplementary Materials
Declarations
Abbreviations
Reference

Human Population Genetics and Genomics 2025;5(2):0004 | https://doi.org/10.47248/hpgg2505020004

Original Research Open Access

On the Demographic History of Chimpanzees and Some Consequences of Integrating Population Structure in Chimpanzees and Other Great Apes

Camille Steux ^1,2,3 , Clément Couloigner ^3,4 , Armando Arredondo ^5,6 , Willy Rodríguez ^5,6 , Olivier Mazet ^5,6 , Rémi Tournebize ^1,3,7 , Lounès Chikhi ^1,2,3

1. Centre de Recherche sur la Biodiversité et l’Environnement (CRBE), Université de Toulouse, CNRS, IRD, Toulouse INP, Université Toulouse 3 – Paul Sabatier (UT3), 31062 Toulouse, France
2. Centre for Ecology, Evolution and Environmental Changes (cE3c), Faculdade de Ciências da Universidade de Lisboa, Campo Grande, 1749-016 Lisboa, Portugal
3. Instituto Gulbenkian de Ciência, 2780-156 Oeiras, Portugal
4. Centre d’Etude de la Forêt, Département de Biologie, Université Laval, G1V 0A6 Québec, Canada
5. Université de Toulouse, Institut National des Sciences Appliquées, Institut de Mathématiques de Toulouse, 31077 Toulouse, France
6. Institut de Mathématiques de Toulouse; UMR5219. Université de Toulouse, 31062 Toulouse, France
7. UMR DIADE - Diversité, adaptation, développement des plantes, CIRAD, CNRS, IRD, Université de Montpellier, 34394 Montpellier, France

Correspondence: Camille Steux; Lounès Chikhi

Academic Editor(s): Joshua Akey

Received: Jun 17, 2024 | Accepted: Jan 28, 2025 | Published: May 24, 2025

© 2025 by the author(s). This is an Open Access article distributed under the Creative Commons License Attribution 4.0 International (CC BY 4.0) license, which permits unrestricted use, distribution and reproduction in any medium or format, provided the original work is correctly credited.

Cite this article: Steux C, Couloigner C, Arredondo A, Rodríguez W, Mazet O, Tournebize R, Chikhi L. On the Demographic History of Chimpanzees and Some Consequences of Integrating Population Structure in Chimpanzees and Other Great Apes. Hum Popul Genet Genom 2025; 5(2):0004. https://doi.org/10.47248/hpgg2505020004

Abstract

Reconstructing the evolutionary history of great apes is of particular importance for our understanding of the demographic history of humans. The reason for this is that modern humans and their hominin ancestors evolved in Africa and thus shared the continent with the ancestors of chimpanzees and gorillas. Common chimpanzees (Pan troglodytes) are our closest relatives with bonobos (Pan paniscus) and most of what we know about their evolutionary history comes from genetic and genomic studies. Most evolutionary studies of common chimpanzees have assumed that the four currently recognised subspecies can be modelled using simple tree models where each subspecies is panmictic and represented by one branch of the evolutionary tree. In addition, one recent genetic study claimed to have detected admixture between bonobos and chimpanzees. However, several studies have identified the existence of significant population structure with evidence of isolation-by-distance (IBD) patterns, both within and between subspecies. This suggests that demographic models integrating population structure may be necessary to improve our understanding of their evolutionary history. Here we propose to use n-island models within each subspecies to infer a demographic history integrating population structure and changes in connectivity (i.e., gene flow). For each subspecies, we use SNIF (Structured non-stationary inference framework), a method developed to infer a piecewise stationary n-island model using PSMC (Pairwise sequentially Markovian coalescent) curves as summary statistics. We then propose a general model integrating the four subspecies as metapopulations within a phylogenetic tree. We find that this model correctly predicts estimates of within subspecies genetic diversity and differentiation, but overestimates genetic differentiation between subspecies as a consequence of the tree structure. We argue that spatial models integrating gene flow between subspecies should improve the prediction of between subspecies differentiation and generate the observed IBD patterns. We also simulated data under a simple spatially structured model for bonobos and chimpanzees (without admixture) and found that it generates potentially spurious signals of admixture between the two species that have been reported and could thus be spurious. This may have implications for our understanding of the evolutionary history of the Homo genus.

Keywords

demographic history, population structure, chimpanzees, connectivity, fragmentation, human evolution, spatial models, admixture

1. Introduction

Common chimpanzees (Pan troglodytes) and bonobos (Pan paniscus) are great apes found in western and central Africa, and they are the closest relatives to humans from which they diverged between 5 Mya [1,2] and 7-8 Mya [3]. The current taxonomy of the genus Pan recognises bonobos as one unique species, geographically separated from common chimpanzees by the Congo river and from which it would have diverged between 0.9 and 2 Mya [2,4–6]. Unlike bonobos, it is currently considered that common chimpanzees are further divided into four subspecies [2,7]. Western chimpanzees, P. t. verus, occur in the most western part of the species geographic range, from Senegal on the west to Ghana on the east (see Figure 1). The other three subspecies are separated from Western chimpanzees by the Dahomey gap, and their distribution ranges from Nigeria on the west to Tanzania on the east. From west to east, Nigeria-Cameroon chimpanzees (P. t. ellioti) are separated from Central chimpanzees (P. t. troglodytes) by the Sanaga river, and Eastern chimpanzees (P. t. schweinfurthii) are separated from Central chimpanzees by the Ubangi river (Figure 1).

Figure 1 Distribution of the Pan genus. Data were extracted from the IUCN Red List of Threatened Species [17,18], and the background was made with Natural Earth available at naturalearthdata.com.

Genetic and genomic analyses suggest that the four subspecies of common chimpanzees form two distinct monophyletic groups that split around 400-600 kya, with Western and Nigeria-Cameroon chimpanzees forming one clade and Central and Eastern chimpanzees forming the other clade [2,4,8–11]. Several studies estimate that Western and Nigeria-Cameroon chimpanzees probably split between 250-500 kya [2,6], while Eastern and Central chimpanzees separated more recently, around 90-250 kya [2,6,9,10]. The existence and the magnitude of gene flow between subspecies, today or in the past, remains however unclear [2,9–12]. Indeed, if the delimitation of the subspecies reflects mostly current geographic barriers (the Dahomey gap, the Sanaga and the Ubangi rivers), it is very likely that these barriers were more or less permeable in the past [13], and it has also been suggested that the ancestral population of common chimpanzees used to cover a wider and more continuous geographic range [14]. Furthermore, studies using models of isolation with migration have found signals of gene flow between subspecies, even though there is little consensus regarding the pairs of subspecies involved (see Figure 1 in Brand et al. [12] for a summary). For instance, Brand et al. [12] have identified introgressed segments from Western chimpanzees in Eastern chimpanzees, previously suggested by Hey [9], whereas Wegmann and Excoffier [11] found gene flow from Western to Central chimpanzees. Other studies have identified isolation-by-distance (IBD) patterns, where genetic distance increases with geographic distance [15,16]. Taking the four subspecies as a single unit, Lester et al. [16] computed pairwise F_ST values between samples both within and between subspecies. They suggested that patterns of genetic diversity and differentiation could indeed correspond to a model with continuous gene flow across the whole P. troglodytes species geographic range, with the exceptions of a few highly isolated populations [16]. Patterns of IBD were also identified across Central and Eastern chimpanzees in an earlier study by Fünfstück et al. [15], questioning the classification of these two populations as distinct subspecies [15].

Additionally, several studies have pointed out the existence of structure within subspecies [2,6,15,16,19–21]. Central and Eastern chimpanzees could thus correspond to a set of populations that were connected by gene flow in the recent past, thus explaining the IBD pattern still detectable today [15,19,22]. In Nigeria-Cameroon chimpanzees, Mitchell et al. [20] identified two genetically distinct populations, one located in the forests of western Cameroon and another one in central Cameroon. Prado-Martinez et al. [2] also suggested the existence of substructure as they identified three Nigeria-Cameroon individuals who could belong to a distinct population than the rest of their sample. Finally, population structure has been identified as well in Western chimpanzees [21].

In parallel, much work has been done to reconstruct the demographic history of common chimpanzees using genetic and genomic data [1,2,5,7–11,19,21–23]. Prado-Martinez et al. [2] used the Pairwise sequential Markovian coalescent (PSMC) method of Li and Durbin [24] on common chimpanzee individual genomes to reconstruct what Prado-Martinez et al. [2] interpreted as a history of effective population size (N_e) changes through time. This method is particularly suited for endangered species, for which genomic data can be limited [25–27] because it only requires one single diploid genome. The interpretation of the PSMC curve is however not trivial [28–31]. Indeed, whereas the interpretation of Prado-Martinez et al. [2] in terms of changes in N_e is potentially valid, several studies have shown that ignoring population structure can lead to the inference of spurious population size changes [32,33]. In the case of the PSMC method, Mazet et al. [28] have shown that under structured population models, the PSMC curve will not only be influenced by changes in N_e, but also by population structure, and subsequently by changes in migration rates between populations [28–31]. Given that population structure has been identified in common chimpanzees, both across the four subspecies and within subspecies, this means that there is currently no general model that would allow us to interpret the PSMC curves, while accounting for the observed patterns of IBD. Indeed, the current models of divergence represent the evolutionary history of the species and subspecies as successive splits of constant-size panmictic populations, which are incompatible with the PSMC curves. Furthermore, assuming non-structured models can lead to the detection of spurious signals of admixture [34,35]. For instance, de Manuel et al. [6] recently identified an ancient admixture event between bonobos and chimpanzees, under the assumption that subspecies could be modelled as panmictic units. Population structure in chimpanzees and bonobos could thus explain signals of admixture that have been previously detected [6,9,11,12]. Altogether chimpanzees (and bonobos) may represent interesting models to study ancient population structure and how it influences patterns of genomic diversity in present day populations. The current study benefits from work already done in humans [28,30,34,35] but could also provide some interesting avenues of research for geneticists interested in ancient population structure in humans, by providing comparative data, and prompting similar work in other great apes and vertebrates.

Mazet et al. [28] introduced the IICR (Inverse instantaneous coalescence rate), and showed that the PSMC is actually an estimate of the IICR and corresponds to changes in N_e under total panmixia but not necessarily under other demographic models. Mazet and colleagues also showed in several studies that the IICR can be characterized for any model of population structure under the coalescent, which opened the way to doing demographic inference using the PSMC as a summary statistic [29–31,36]. To that purpose, Arredondo et al. [31] developed a method that allows to infer the parameters of a piecewise stationary n-island model [37] from a PSMC curve. It allows to infer the number of islands or demes, n, their size N (in diploids), and the times, t_i, at which gene flow, M_i = 4Nm_i, may have changed by simply specifying a range of possible values for each one of these parameters. This method is implemented in SNIF (Structured non-stationary inference framework).

In the present study, we ask whether it is possible to integrate population structure within each subspecies of common chimpanzees and infer a reasonable demographic history that explains the PSMC curves within one single model for each subspecies and then for the species as a whole. We acknowledge the existence of more recent methods to reconstruct demographic history from whole genome data [38]. However, the current work is based on the use of published PSMC curves, rather than the original genomic data, without which the other approaches cannot be applied. In addition, the number of genomes available, while important for an endangered species, is rather low for several recent methods which require more individuals per population. Also, our aim here is to continue the work of Chikhi et al. [33] and others on the impact of structure on the inference of demographic events [28–31,39,40]. These studies are often cited as caution but too rarely integrated in the inference process. Here we show that it is possible to make progress in our understanding of past population structure with PSMC curves. We aim to show how this general framework and the SNIF program can be applied to actually interpret PSMC curves, hence providing a potential guide for other researchers interested in these questions. First, we use SNIF [31] to infer piecewise stationary n-island models for each subspecies of common chimpanzees, assuming constant deme size, using the PSMC curves generated by Prado-Martinez et al. [2]. At each step, we validate the SNIF inference by generating IICR and PSMC curves from the inferred demographic models and by applying SNIF to the simulated curves. From the resulting inferences, we then propose a model of demographic history for the four subspecies, integrating the n-island models and a tree model consistent with previous research. We predict genetic diversity (nucleotide diversity) and differentiation (F_ST) both within and between subspecies and compare the predicted values to empirical estimates. We found that a model of structured populations with successive population splits and variable migration rates is sufficient to explain both the PSMC curves and several statistics of genomic diversity. We also find some discrepancies between the observed and predicted F_ST values between subspecies and use these to identify future directions for research. In particular, we suggest that models incorporating gene flow between subspecies and including spatial structure should be explored. As a proof of concept we test a small set of tree models with gene flow between the branches. We also use a simple stepping-stone model and show how signals interpreted as signatures of admixture between chimpanzees and bonobos could be explained by population structure alone. These results are thus of great importance for the analysis of primate genomes in general, including humans where admixture events have been inferred and for which population structure has also been invoked as a possible explanation [34,35,41].

2. Materials and Methods

2.1 Data: PSMC curves

The PSMC curves of the chimpanzees used here were retrieved from the study of Prado-Martinez et al. [2] who kindly shared the .psmc files. We only kept the PSMC curves that were computed on genomes with a coverage higher than 12X. In total, this corresponded to a total of 17 individuals, namely three Eastern chimpanzees (P. t. schweinfurthii), four Central chimpanzees (P. t. troglodytes), five Nigeria-Cameroon chimpanzees (P. t. ellioti) and five Western chimpanzees (P. t. verus). The PSMC files were used to produce the PSMC curves (Figure S6).

2.2 Inference of demographic histories for each subspecies under piecewise stationary n-island models

We used SNIF (Structured non-stationary inferential framework), a freely available program (https://github.com/arredondos/snif) based on a method developed by Arredondo et al. [31] to infer parameters of piecewise stationary n-island models. SNIF assumes that the number of demes (n) and their size (N diploids) are unknown and constant through time, whereas scaled migration rates (M = 4Nm, where m is the proportion of migrating genes at each generation) are allowed to vary over time in a piecewise manner. Note that throughout the whole paper, deme sizes are given in number of diploid individuals.

More specifically, SNIF assumes that the PSMC can be decomposed by dividing time into periods, called "components" (c), during which migration rate, M_i for component c_i, is constant. SNIF will infer the best timing (t_i) and duration (t_i₊₁ – t_i) for these components to fit the observed PSMC/IICR curve for a given and fixed value of c provided by the user. To estimate the parameters of the model, SNIF minimizes a distance computed between the observed PSMC/IICR and the IICR simulated under the piecewise stationary n-island model (see Arredondo et al. [31] for details). The user must specify ω, a parameter that weights the computation of the distance between observed and simulated IICR curves by giving more weight to either recent or ancient times. The parameter space explored by SNIF is defined by the user, who specifies a range of values for the parameters of the model, namely the number of islands n, their size N (in number of diploids), the scaled migration rates M_i = 4Nm_i for each component c_i with i ∈ {0, …, c – 1}, and the times t_i (in generations) separating the components c_i and c_i+₁ (for instance t₁ separates the first component c₁ that starts at t₀ = 0 and the second component c₂ that ends at t₂). To scale and compare IICR curves to PSMC curves, a mutation rate (μ) and a generation time (g) are also required. The following values were used for the chimpanzee data: μ = 1.5 × 10^–8 per bp per generation [12] and g = 25 years [2].

To reduce computation time and improve consistency across runs, we first ran inferences using a wide parameter space to identify the range of parameter values that were more likely to produce IICR curves reasonably close to the observed PSMC. This allowed us to then re-run the analyses on a smaller parameter space, making the optimization algorithm more efficient for the same number of iterations. For this exploratory step, the following ranges were used: n ∈ [2; 100], N ∈ [10; 2 × 10⁴], M_i ∈ [0.01; 100], and t_i ∈ [4 × 10²; 4 × 10⁵]. We also identified values for c and to that would best describe the observed PSMC curves. We tested c ∈ {4, 5, 6, 7, 8} and ω ∈ {0.5, 1} (1 being the default value). SNIF was run ten times for each combination of c and ω value, and each run used 50 iteration steps of the optimization algorithm. The identification of a smaller parameter space necessarily resulted from subjective choices. First, even though SNIF computes a distance between the source IICR or PSMC and the inferred IICR, this distance computation depends on ω, making it impossible to use this distance to compare runs with different ω values and select the "best" ω. Secondly, the inference is expected to improve as c increases, since it allows the algorithm to add more changes in M and thus better fit the observed PSMC/IICR curve, necessarily reducing the distance between source and inferred IICR. We allowed c to vary between four and eight because the minimum number of components required to explain two humps in a PSMC is c = 4 [28] and because we considered that using more than eight components might lead to over-parametrization (see next section on the validation process).

This exploratory step allowed us to significantly reduce the parameter space as can be seen in Table 1, where the ranges for N and n were halved or nearly halved. We also found that ω=0.5 generated the best fit to the observed data, a value which gives more weight to the past than the default value 1. Finally, we identified the most likely time windows (t_i) for the changes of migration rates (see Table S2). The latter significantly reduced the time required by the optimization algorithm as the t_i and M_i values can vary over several orders of magnitude. In particular, the distance between the target and inferred IICR was on average smaller for the same number of iterations when we constrained the t_i values (based on preliminary runs) than when we did not (results not shown). We finally found that using seven components for Western, Nigeria-Cameroon and Eastern chimpanzees, and eight components for Central chimpanzees provided a good balance between model complexity and increase in fit to the observed PSMC (and validation, see next section). For Western chimpanzees, we further specified not to fit the very recent past (< 20 kya), because this part of the PSMC curves are known to be unreliable [24] and clearly differ in magnitude between the individuals of this subspecies. Such a recent increase is not observed for Nigeria-Cameroon and Eastern chimpanzees. In the case of Central chimpanzees, the very high recent increase of IICR curves for two individuals is so high that SNIF did not appear to fit it with our set of parameters (especially ω=0.5 which gives more weight to the past), therefore not impacting the inference. Table 1 summarizes the parameter space used for the analysis of all individuals of the different subspecies and for the results shown further down. Using this final parameter space, SNIF was run ten times on each PSMC curve and each run used 50 iteration steps of the optimization algorithm.

Table 1 Parameter space used to run SNIF

2.3 Validation of the inferred scenarios

Once we had inferred demographic scenarios, we performed a validation step as recommended by Arredondo et al. [31]. They suggested to simulate pseudo-observed data (POD) under an inferred scenario S^*, here in the form of IICR or PSMC curves, and to analyse these POD using SNIF with the same parameter ranges and c and ω values as those used to analyse the original observed PSMC curves. This procedure is somewhat similar to the validation process used in Approximate Bayesian Computation (ABC, [42]). In practice, SNIF allows the user to generate a ms command [43] to simulate coalescent times (T₂) under an inferred scenario S^*. This command is used by SNIF as an input to, first, produce a pseudo-observed IICR curve using simulated T₂ values, and second, infer the best piecewise stationary n-island model. This allows to quantify the discrepancy between the inferred model and the pseudo-observed underlying one. It is also possible to produce a ms command to simulate genomic data (by adapting the ms command using appropriate mutation and recombination rates and generation time) and run the PSMC method to produce a PSMC curve that can then be used as POD. Generating pseudo-observed IICR curves using simulated T₂ is faster than using simulated genomic sequences (several hours may be necessary to produce a pseudo-observed PSMC curve), and is a better approach when performing a high number of inferences with SNIF. Using sequences and the PSMC method should however be preferred to replicate studies with real data. We must note that the latter approach implicitly makes the assumption that the PSMC method correctly approximates the theoretical IICR. Here, we performed the validation step using both simulated IICR and PSMC curves, as explained below and in the Supplementary material.

Different individuals of the same subspecies exhibit PSMC curves that can differ more or less significantly in the recent past (see Figure S6 and Results). In addition, running SNIF on a particular PSMC curve can generate slightly different scenarios (see Results) generally characterized by similar connectivity graphs. Instead of trying to validate many similar scenarios, we arbitrarily chose an average scenario for each subspecies based on parameter values close to the median of the distribution of the inferred values. For instance, we found that the inferred n values varied between 12 and 48 for Western chimpanzees, with 50% of inferred values between 17 and 31, and we thus selected a scenario with n = 25 and extracted the corresponding ms command. This ms command then served to produce as many independent data sets of T₂ or genomic sequences, and therefore IICR or PSMC curves, as there were individuals in that subspecies (to simulate different “pseudo-observed individuals”). This allowed us to quantify the variation of inferred parameter values and compare it to the variation observed when analysing the empirical data. For each run, we simulated 10⁶ T₂ values to produce an IICR curve, and simulated 10 genomic sequences of 100 Mb each to produce PSMC curves, adapting the ms command using a mutation rate of 1.5 × 10^–8 per bp per generation [12], a recombination rate of 0.7 × 10^–8 per bp per generation [4], and a generation time of 25 years [2]. We produced (pseudo-observed) PSMC curves with PSMC [24] (flags -N 25 -t 15 -r 5 -p “4+25*2+4+6”). Altogether, these pseudo-observed IICR and PSMC curves were used by SNIF as POD, and the inference steps were done exactly as described in the previous section for the empirical PSMC curves.

2.4 Integration of the four subspecies n-island models in a general tree model

In the previous section, we inferred and validated demographic scenarios able to reproduce and fit the observed PSMC curves for each subspecies independently. Here we asked whether it was possible to integrate the four sub-subspecies into one unique demographic model that could explain the individual empirical PSMC curves, while incorporating a splitting tree based on the relationships between the subspecies as inferred by Prado-Martinez et al. [2] and the stepwise stationary n-island models within each branch of the tree, instead of assuming a panmictic population or subspecies. We constructed a scenario where an ancestral species is subdivided into n populations, and splits at time T_CENW into two branches which are themselves subdivided in demes. One of these branches will later divides into a set of demes representing the ancestor of the Central chimpanzees and another set of demes corresponding to the Eastern chimpanzees’ ancestors. The other ancestral branch becomes the ancestor to Western and Nigeria-Cameroon chimpanzees following a similar process. These splits thus generate the demes corresponding to the current four subspecies, at T_CE and T_NW for Central/Eastern chimpanzees and Western/Nigeria-Cameroon chimpanzees, respectively.

SNIF cannot infer complex models involving both n-islands and tree models. Consequently, we built a general scenario manually, using the arbitrary “average” scenario used for the validation step above as a starting point. From the subspecies scenarios, we constructed a tree model where the subspecies n-island models merge (backward in time) in a way similar to that used by Rodríguez et al. [30]. For instance, the n-island models of Western and Nigeria-Cameroon chimpanzees were characterized by n = 25 and n = 13 islands respectively, and we thus merged the two sets of islands so as to use the largest n number for the ancestral species as suggested by Rodríguez et al. [30]. The same process was done for Central and Eastern chimpanzees, and then again for the ancestral branches when they merged into the most ancestral meta-population. Different values were tested for T_CENW, T_NW and T_CE to match the times at which the PSMC curves of the sub-populations were merging backward in time, namely T_CENW ∈ {900, 800, 700}, T_NW ∈ {900, 800, 700} and T_CE ∈ {600, 500, 400} in kya (thousands of years ago). Alternatively, one could have estimated these divergence times using methods such as MSMC2 [44]. However, given that we only had access to the PSMC curves, we decided not to go in that direction and only use the PSMC curves. For each scenario we generated the IICR plots using a script developed by W. Rodriguez [29]. We used the script to simulate 10⁶ T₂ values with ms, sampling two haploid individuals in one deme, repeating the process for each of the four current meta-populations.

2.5 Prediction of genomic diversity and differentiation statistics

To test whether the general tree model (Figure 7) was able to predict genetic diversity and differentiation statistics in addition to the IICR, we simulated 100 segments of 1 Mb under the four subspecies models and under variations of the general model using ms [43], where we allowed for the splitting times (or joining times, with time going backward) to take several values (see previous section). We used a mutation rate of 1.5 × 10^–8 per bp per generation [12] and a recombination rate of 0.7 × 10^–8 per bp per generation [4]. We estimated genetic diversity by computing the individual observed heterozygosity (H_o) in 10 diploid individuals sampled in the present from one deme for each subspecies. We computed H_o as the number of heterozygous sites divided by the total length of the simulated genomes (100*1Mb). We also computed genetic differentiation (Hudson’s F_ST) between demes of the same subspecies and between demes from different subspecies, sampling ten diploid individuals per deme. Pairwise F_ST were computed using original scripts from Tournebize and Chikhi [35].

Empirical values of genetic diversity were retrieved from Prado-Martinez et al. [2] (from Suppl. Table 12.4.1) and de Manuel et al. [6] (Table S4). Both studies used the same genomic data produced by Prado-Martinez et al. [2] and computed observed individual heterozygosity. Empirical values of F_ST were retrieved from Fischer et al. [22] (computed on autosomal sequences using Hudson’s estimator) and Lester et al. [16] (computed on microsatellites). Lester et al. [16] only published F′_ST, another estimate of genetic distance derived from F_ST, and they kindly shared with us the original F_ST values. More information regarding the retrieved empirical genomic statistics and the corresponding studies are available in Table S5.

3. Results

3.1 Independent demographic history of the different subspecies

We found that by using the parameter space described in Table 1 and applying SNIF to all individual PSMC curves with high enough coverage within each subspecies, we were able to produce IICR curves that were similar to the observed PSMC plots, as displayed in Figure 2 for an example and Figures S7-S10 for all inferences. The inferred parameters are displayed in Figure 3. Panels A and B show the distribution of the inferred number of islands n and their size N (in number of diploid individuals), respectively (see also Table S4 and Table S5). The largest number of islands and the smallest deme size were inferred for Western chimpanzees, with a median n equal to 21 (50% of the inferred n values being between 17 to 31) and a median N equal to 305 (50% of the inferred N values between 239 and 335). The inferred number of islands was similar for Nigerian-Cameron and Eastern chimpanzees, with median values being 11 and 13, respectively and median N being 1154 and 800, respectively. Finally, for Central chimpanzees, 50% of the inferred n and N fall within 16-20 and 589-834, respectively.

Figure 2 Inferred IICR curves and empirical PSMC curves for one individual per subspecies. The IICR curves are in red, and the empirical PSMC curves are in blue, and each panel corresponds to one individual from a different chimpanzee subspecies. Panel A. Western chimpanzee (Clint). Panel B. Nigerian-Cameroon chimpanzee (Damian). Panel C. Central chimpanzee (Vaillant). Panel D. Eastern chimpanzee (Kidongo). Only one repetition of SNIF is displayed. See Figures S7-S10 for all the inferences. The vertical red lines highlight the times at which there is an inferred change in migration rate and therefore delimit the SNIF components. The grey zone in panel A corresponds to a part of the source PSMC which was not taken into account in the fitting of the curve by SNIF (see Material and Methods).

Figure 3 Distribution of the number and size of the demes inferred with SNIF for the four chimpanzee subspecies. Panel A: Inferred number of islands. Panel B: Deme size (in number of diploid individuals). For both panels the results are plotted for each subspecies separately for Western (W), Nigeria-Cameroon (NC), Central (C) and Eastern (E) chimpanzees across the ten independent repetitions/inferences carried out per individual for all individuals, using the parameter space shown Table 1. Each dot corresponds to one repetition of SNIF done on one individual. The horizontal lines inside the violins correspond to the 25%, 50% (median) and 75% quantiles.

The connectivity graphs (Figure 4) show the inferred migration rates (M_i = 4Nm_i) through time. For the four subspecies, we observe a significant increase in connectivity (forward in time) between 2 and 3 Mya, with a higher support for the period 2.5-3 Mya, followed by a decrease around 1 Mya. This period of higher connectivity is characterized by M_i values above 3 and up to 50 migrants per generation across the n-island for Nigeria-Cameroon, Central and Eastern chimpanzees, while values for Western chimpanzees are between 0.4 and 4. This period corresponds to a time when the four subspecies had a likely common ancestor. For Western and Nigeria-Cameroon chimpanzees, we infer a second more recent increase in connectivity between 500-600 kya and 200 kya, not observed in Central and Eastern chimpanzees, with M_i values ranging between 0.8 and 50. Finally, in the more recent past, all subspecies exhibit an increase in gene flow, occurring around 100-150 kya for Eastern chimpanzees, 50 kya for Central chimpanzees, 40 kya for Western chimpanzees and between 70 and 10 kya for Nigeria-Cameroon chimpanzees.

Figure 4 Connectivity graphs inferred by SNIF for the four chimpanzee subspecies. The y-axis represents scaled migration rates between demes (M_i), and the x-axis represents time in years on a logarithmic scale. Top panel shows connectivity for Western and Nigeria-Cameroon chimpanzees. Bottom panel shows the same for Central and Eastern chimpanzees. Each coloured curve corresponds to one inference (one repetition of SNIF) using one PSMC curve (or individual), giving therefore 50, 50, 40 and 30 curves in total for Western, Nigeria-Cameroon, Central and Eastern chimpanzees respectively. Backward in time, the vertical coloured intervals represent respectively: C-E divergence time (in pink, bottom panel), W-NC divergence time (in blue, top panel), C-E and W-NC ancestral populations divergence time (in yellow, both panels), the mid-Pleistocene transition (light green, both panels) and the Pliocene-Pleistocene transition (dark green, both panels).

While the connectivity graphs are rather robust across individuals from the same subspecies and for more ancient periods for individuals from different subspecies, there is some variability in the inferred scenarios, as expected from the fact that different individuals exhibit different PSMC plots (see next section). We also observe variability in the number of islands or demes inferred for Western chimpanzees (extreme values range: 12-48 islands), and in the deme size inferred for Nigerian-Cameroon chimpanzees (extreme values range: 616-1980 diploid individuals). We also observe much variability in the most recent and most ancient parts of the connectivity graphs. For instance, there is a great variability in M_i values for Central chimpanzees in the last 50 kya and before 10 Mya, and in the M_i and t_i values for Nigeria-Cameroon chimpanzees in the last 1 Mya.

3.2 Validation step

We simulated T₂ values from an average scenario identified for each subspecies and obtained IICR curves that were then analyzed using SNIF as a validation test of our inferential procedure (see Material and Methods). We were able to recover the original scenario with great precision, as shown in Figures 5A-B and Figure 6. Black dots and lines are the pseudo-observed data, and the coloured patterns are the values inferred by SNIF. The inferred n and N are centered around the values that were used to generate the pseudo-observed IICRs, suggesting that SNIF is able to infer the complex scenarios inferred from the real data. We observe some variability in the inferences, as expected in any inference process and as we observed for the real data. Similarly, the inferred connectivity graphs were generally very good, with nearly no variability in the inferred t_i values, and slightly more variability in the larger M_i values. The shape of the inferred connectivity graphs is however perfectly inferred for the complex scenarios that had seven or eight components.

Figure 5 Distribution of the number and the size of the demes inferred for simulated data under an inferred scenario for each subspecies of chimpanzees. A. The number of islands and B. the deme size inferred by SNIF on pseudo-observed IICR computed from simulated T₂, and C. the number of islands and D. the deme size inferred by SNIF on pseudo-observed PSMC computed from simulated genomic sequences. Is shown the distribution across the 10 repetitions of SNIF per pseudo-observed IICR or PSMC and across all individuals for Western (W), Nigeria-Cameroon (NC), Central (C) and Eastern (E) chimpanzees. Black dots are the pseudo-observed data. Horizontal lines in the violins represent the 25%, 50% (median) and 75% quantiles.

Figure 6 Inferred connectivity graph using a scenario inferred by SNIF as pseudo-observed data for each subspecies of common chimpanzee. A. Western chimpanzees, B. Nigeria-Cameroon chimpanzees, C. Central chimpanzees and D. Eastern chimpanzees. Black lines are the pseudo-observed data, and each colored line corresponds to one inference done on one simulated (or pseudo-observed) individual.

When we simulated genomic data under inferred scenarios, generated PSMC curves and provided them to SNIF as POD (Figure 5C-D and Figure S14), we also recovered the original scenarios with good precision, though SNIF inferred fewer and larger islands for Eastern chimpanzees, suggesting that we might be underestimating n and overestimating N for this particular subspecies. Inferred connectivity graphs were also generally good for the four subspecies, with higher variability in the inferred t_i and M_i values than for pseudo-observed IICRs computed on simulated T₂, which approaches the variability observed when running SNIF on empirical data. Altogether these results confirm the ability to infer a complex history of changes in connectivity under the piecewise stationary n-island model.

3.3 General n-island model

The general tree model incorporating within its branches n-island models based on the inferences presented above for each subspecies is represented Figure 7. As explained in the Materials and Methods section, it was obtained by selecting for each subspecies an “average” inferred scenario for the different subspecies. The scenarios we kept had n = 25 islands of size N = 352 (diploid) individuals for Western chimpanzees, n = 12 islands of size N = 1162 individuals for Nigeria-Cameron chimpanzees, n = 17 islands of size N = 819 individuals for Central chimpanzees and n = 13 islands of size N = 723 individuals for Eastern chimpanzees. The sets of demes were then successively divided at times corresponding to the estimated split times for the pairs of subspecies. These split times are not known but can be approximated by using the times at which the different PSMC curves join. For instance, at time T_CE, the 17 demes of Central chimpanzees and the 13 demes of Eastern chimpanzees are all assumed to derive (forward in time) from the demes of their ancestor. As noted in the Materials and Methods, this ancestor was assumed to have 17 demes as 17 is the largest of the two values of n. For simplicity, the 13 Eastern chimpanzee demes were assumed to derive from 13 demes rather than from all 17 demes of the ancestral meta-population. Similarly, at time T_NW, 12 demes of Nigeria-Cameroon chimpanzees join backward in time 12 (out of 25) demes of the ancestor they share with Western chimpanzees. Finally, at time T_CENW, the 17 islands of the ancestral meta-population of Central and Eastern chimpanzees join 17 (out of 25) islands of the ancestral meta-population of Western and Nigeria-Cameroon chimpanzees. Thus, the most ancestral meta-population is represented by 25 demes.

Figure 7 Proposed general n-island model for the history of the four subspecies of common chimpanzee. Dark full circles symbolize subpopulations of a n-island model, and the number in the middle corresponds to the number of demes n. Deme size N is given in number of diploid individuals (in blue). T_CE, T_NW and T_CENW correspond to the times at which two meta-populations join each other (backward in time), for Central/Eastern, Nigeria-Cameroon/Western, and the two ancestral chimpanzee meta-populations respectively. In blue, along the vertical intervals, are the migration rates M_i = 4Nm_i.

Several values for T_CE, T_NW, and T_CENW were tested as it is unclear how closely splitting times of meta-populations correspond to splitting times of IICR or PSMC curves (see for instance Rodríguez et al. [30] and Chikhi et al. [29]). Empirical and simulated IICR curves presented in Figure 8 were obtained for T_CE = 600 kya, T_NW = 800 kya and T_CENW = 900 kya, which are the values that gave the best visual fit of the estimated IICR curve to the observed PSMC curves (in particular for Nigeria-Cameroon and Eastern chimpanzees, see below and Figures S15-S18). For each subspecies, the simulated IICR curve produced for a sample taken in a deme is represented and closely follows the corresponding observed PSMC. We found that changing the splitting times (Figures S15-S18), and especially testing more recent values, did not significantly change the IICR curves, even though the estimated IICR could depart from the observed PSMC at the splitting time when the latter occurred too recently (see the case of Nigeria-Cameroon and Eastern chimpanzees in Figure S16 and Figure S18 respectively). Altogether, we could construct a general model able to explain all the observed PSMC curves while incorporating both a tree model and intra-subspecies population structure, without any population size change within each branch.

Figure 8 Empirical PSMC (solid line) and simulated IICR (dotted lines) for each subspecies of common chimpanzees under the n-island model displayed Figure 7, with T_CE = 600 Mya, T_NW = 800 Mya and T_CENW = 900 Mya. A. Western chimpanzees, B. Nigeria-Cameroon chimpanzees, C. Central chimpanzees and D. Eastern chimpanzees.

3.4 Prediction of other genomic statistics of diversity and differentiation

Though we stress that none of the models above (individual and global) should be taken at face value, the results obtained suggest that we were able to generate IICR plots that were similar to the observed PSMC plots under (1) n-island models for each subspecies independently and (2) a general model incorporating the four subspecies. We simulated genomic data under our general model and computed statistics representing genetic diversity and genetic differentiation to see if it could predict values close to empirical ones. As can be seen in Figure 9, we found that our simulated diversity measures were close to the observed values computed by de Manuel et al. [6], and slightly lower than those computed by Prado-Martinez et al. [2] for Nigeria-Cameroon, Central and Eastern chimpanzees. We also found that our simulations recovered the ranking of genetic diversity for three subspecies, with Central chimpanzees being the most genetically diverse and Western chimpanzees harbouring the lowest level of genetic diversity, as observed empirically (Figure 9). However, the simulated Nigeria-Cameroon chimpanzees showed the same genetic diversity as Western chimpanzees, which is not consistent with what is found empirically [2]. These diversity estimates were identical across the different splitting times we tested.

Figure 9 Genetic diversity of genomic data simulated under the model of Figure 7. Panel A corresponds to the simulated and empirical results for the Western (left) and Nigeria-Cameroon (right) chimpanzees. Panel B corresponds to the simulated and empirical results for the Central (left) and Eastern (right) chimpanzees. In this figure, we used several values for splitting times and sampled 10 diploid individuals in one of the demes of each of the subspecies. We plotted the simulated genetic diversity measures (observed heterozygosity) with different colours according to the subspecies splitting times. The empirical values were retrieved from (*) Prado-Martinez et al. [2] and (**) de Manuel et al. [6], each dot corresponding to an estimate of individual observed heterozygosity.

Regarding the pairwise F_ST values, Figure 10 (and Figures S19-S21) shows that the general model predicts levels of within subspecies genetic differentiation that are within the empirical distribution, with the exception of Nigeria-Cameroon chimpanzees, a subspecies for which there are nearly no observed pairwise values. Figure 10 also shows that we strongly over-estimate between subspecies differences. As expected, the F_ST values between subspecies were sensitive to splitting times. For T_CE = 600 kya, T_NW = 800 kya and T_CENW = 900 kya, the values of splitting times that follow most closely the empirical PSMC (see Figure 8), our model overestimated genetic differentiation in all pairs of subspecies. The F_ST values estimated from our model were three to five times higher than their respective empirical values for all pairs of subspecies. For instance, the median of the empirical F_ST^W-NC was 0.12 whereas for the simulated F_ST^W-NC it was 0.50. For the other pairs we had similar results (median empirical F_ST^W-C = 0.10 vs. simulated F_ST^W-C = 0.40, median empirical F_ST^W-E = 0.14 vs. simulated F_ST^W-E = 0.46, median empirical F_ST^NC-C = 0.08 vs. simulated F_ST^NC-C = 0.41, median empirical F_ST^NC-E = 0.08 vs. simulated F_ST^NC-E = 0.47, median empirical F_ST^C-E = 0.08 vs. simulated F_ST^C-E = 0.28).

Figure 10 Predicted genetic differentiation within and between subspecies under the general tree model. F_ST were computed on genomic data simulated under the model Figure 7 with T_CE = 600 kya, T_NW = 500 kya and T_CENW ∈ {700, 800, 900} kya. See Figures S19-S21 for other splitting time values. In blue are the empirical values: histograms were retrieved from Lester et al. [16] (who used microsatellite data) and the blue vertical lines were retrieved from Fischer et al. [22] (who did not have samples of Nigeria-Cameroon chimpanzees in their study).

As expected we found that having more recent splitting times reduced our F_ST estimates which were getting closer to empirical values, though they never reached them for the parameter values we tested (Figures S19-S21). Genetic differentiation between demes of the same subspecies was much lower than between demes from different subspecies, and were low and similar for three subspecies (Western, Central and Eastern) but much higher for the Nigeria-Cameroon subspecies.

As expected as well, our general model predicted lower F_ST values between Western and Nigeria-Cameroon chimpanzees and between Central and Eastern chimpanzees than between other pairs of subspecies. This what is empirically observed, but we note that this is the reason why the original authors proposed the topology that we used. Finally, Eastern chimpanzees are the most differentiated subspecies to both Western and Nigeria-Cameroon chimpanzees, and Nigeria-Cameroon chimpanzees are the most distant subspecies to Central and Eastern chimpanzees, in both observed and simulated estimates.

4. Discussion

Reconstructing the demographic history of species from genetic data is a complex endeavor and a major challenge because many factors have likely influenced the genetic patterns we observe today. These factors include population structure, changes in connectivity and in population size, selection, social structure (mating systems), among others [9,24,38,45]. Major progress in human population genetics and genomics, including paleogenomics, have revolutionized our understanding of present-day and past genetic variation [46,47]. Ideas and methods coming from human population genetics have influenced our understanding of the genetic diversity of other species [6,12,48,49]. The work presented here is an attempt at increasing our understanding of the ancient structure of chimpanzees by applying some of the concepts developed in Mazet et al. [28], Rodríguez et al. 2018 [30] and Arredondo et al. 2021 [31] on human genomic data.

4.1 Towards a demographic history of chimpanzees incorporating population structure

We managed to explain patterns of genetic diversity within each subspecies of common chimpanzees with simple models of population structure and variable migration rates only, and we obtained demographic scenarios for each subspecies with shared periods of connectivity change. The validation step we applied (see Supplementary Material) suggests that if real genomic data had been generated under the inferred demographic scenarios, our method would have been able to infer them. While we do take inferred scenarios with a grain of salt, this validation step suggests that complex population structure can be inferred from single genomes. This is particularly notable because we recovered the simulated scenarios for each subspecies independently. Though this does not confirm that chimpanzees evolved under the inferred scenarios, it suggests that SNIF can infer different complex scenarios from the chimpanzee PSMC curves. A similar validation process is often applied in ABC studies [42,50] but it is not that common in the literature, and we argue that it should be implemented more often to validate scenarios proposed on the basis of other inference methods.

Despite some variability in the inferences, we observed consistency across inferences within and between subspecies. We found that Western chimpanzees are characterized by a higher number of demes with a smaller size than the other three subspecies. This could be consistent with the fact that this subspecies lives in a drier habitat, mostly in savanna [51], leading to a forest habitat that is more fragmented. However, it is unclear if it was the case throughout the evolutionary history of Western chimpanzees, and thus drawing strong conclusions is difficult at this stage. More generally, interpreting the number of islands and their size is not trivial and relating these parameters to empirical observations is not straightforward, in the same way that effective population size (N_e) values inferred in previous studies assuming panmixia are not easily interpreted. We assumed for simplicity that the number of demes and their size were constant and allowing for the number of demes to vary would be more realistic but would also significantly increase the number of parameters in the models. At this stage, the inferred values of n and N should thus be interpreted with care.

Our models suggest that the four subspecies share a similar history of connectivity until approximately 500 kya (forward in time). We found a period of high connectivity between 3 and 1 Mya, followed by a decrease in connectivity until approximately 600-500 kya in all subspecies. The start of this period coincides with the Pliocene-Pleistocene transition boundary dated to around 2.6 Mya, whereas the drop in connectivity around 1 Mya falls within the Middle Pleistocene transition thought to have occurred between 1.2 Mya and 700 kya [52]. A second and more recent period of high connectivity, between 600-500 and 200-150 kya is also observed, although in Western and Nigeria-Cameroon chimpanzees only. The fact that only two chimpanzee subspecies were affected by this increase in migration during the most recent Middle Pleistocene period suggests that whichever environmental disturbances caused the genetic signal observed in the PSMC, these disturbances were localised mostly in Western Africa.

We must be careful in interpreting genomic data and environmental changes together, and more work would be needed to validate these results since they are based on a simple interpretation of the PSMC curves and on the assumption that PSMC curves infer the IICR with sufficient precision. An interesting perspective would be to compare our results with other demographic tools, for instance MSMC-IM such as used in Wang et al. 2020 [38] who inferred migration rates over time between human populations. Such endeavor would however be more complicated than it seems, given that under the model of MSMC-IM, the different chimpanzee individuals would have to be attributed to a specific population to infer the within-subspecies migration rates such as seen on our Figure 4. At this stage, we still lack a clear demographic model that would integrate the Pliocene-Pleistocene and Middle Pleistocene transitions and that can explain how environmental changes would have affected habitat connectivity (or changes in effective population size). Also, we must acknowledge that reconstructing paleo-environments and habitats is still a complex endeavour [53,54].

From a more technical point of view, we should also note that the inferred migration rates for Western chimpanzees, and to a lesser extant for Eastern chimpanzees, were lower than for the other two subspecies (Figure 4). This is surprising since all subspecies should provide similar values for the periods where there was one ancestral species to all four, and the PSMC curves overlap. This is likely due to the inferred total size of the different metapopulations. They are smaller for Western and Eastern chimpanzees (8,800 and 9,399, respectively) compared to Central and Nigeria-Cameroon chimpanzees (13,900 and 13,944, respectively). At constant metapopulation size n × N, a reduction in M leads to an increase of the IICR. If a metapopulation is smaller, then for the same M its IICR will be lower. Thus a smaller metapopulation needs a smaller M to reach the same IICR value. Again, caution is necessary, as the real chimpanzees likely experienced changes in deme size.

Similarly, we must note that the PSMC curves exhibit a large variance in the recent past with an apparent increase (forward in time) in the recent past that is interpreted in the connectivity graph as an increase in gene flow. We stress that this observed increase (forward in time) could also be due to a recent increase in the deme size, or to an uncertainty in the inference of the IICR. Indeed, the large variance in PSMC estimates in the recent past has been noted since the publication of the method of Li and Durbin [24]. Given that SNIF assumes models with constant size, it cannot typically fit this section of the IICR without making major changes in M [28]. At this stage, we thus considered that the recent increase in migration rate inferred for all chimpanzees should be interpreted very cautiously, if not ignored.

4.2 The general tree model and the limits of trees without gene flow

Using the demographic scenarios inferred for each subspecies of common chimpanzees, we successfully integrated the results of SNIF for each subspecies within a general tree model inspired by previous research on the four subspecies [2,6,12]. The difference with previous research was thus that we used n-island models instead of panmictic populations within each of the branches of the phylogenetic tree. We found that this model explained the PSMC curves of the four subspecies, and predicted well observed heterozygosity within demes and genetic differentiation between demes from the same subspecies. However, even using the shortest split times that would be consistent with the PSMC curves, the model led to an overestimation of genetic differentiation between subspecies. This suggests that the within-subspecies statistics, that depended on the SNIF inference, were generally better predicted than the between-subspecies statistics, that depended on the general tree model. This suggests that using a tree topology without gene flow may not be appropriate.

A recent study identified IBD patterns across the four subspecies considered as a single unit, suggesting that the four subspecies were part of a very large spatial metapopulation with gene flow between neighbouring populations, including populations currently attributed to different subspecies [16]. To account for genetic continuity between the four subspecies, we made an attempt at adding symmetrical and identical migration rates between subspecies in our general tree model, whose results are explained and shown in the Supplementary Section S2 (Figure S1, Figure S2 and Figure S3). Under such scenario, we found a set of parameters for which the simulated IICR fit the empirical curves for three of the four subspecies (Central, Eastern and Western chimpanzees). As expected, F_ST values were reduced and closer to empirical values. This makes us confident that it is possible to find a general tree model with structured subspecies and gene flow between them that would be compatible with the PSMC curves and other empirical genomic statistics. However, to account for the aforementioned IBD patterns, a way forward would be to add different between-subspecies migration rates depending on the pairs of subspecies involved to integrate subspecies spatial distribution. Finally, and as we discuss further below, another way forward would be to switch from n-islands to spatial models such as stepping-stones, which would allow to account for within-species IBD patterns and might therefore be even better at representing the evolutionary history of chimpanzees.

The idea that gene exchange may have taken place between subspecies has been present in the literature [2,6,8–12,55]. However, in most cases gene flow was seen as discrete events that could be dated, or that were limited to a pair of subspecies, which were in some cases not in geographical contact. Brand et al. [12] reviewed the literature on this question and found that at least 14 admixture events had been identified by eight different studies (Figure 1 of Brand et al. [12]) including one admixture event from a mysterious ghost species into the ancestors of bonobos [55]. Among these putative admixture events, some were identified by only one study whereas others were identified by two to five. It also appeared that some studies identified only one admixture event [7] whereas others identified as many as eight [11]. Brand et al. [12] themselves used an inferential method (Legofit, [56]) and a model that allowed for up to seven admixture events but only found support for two.

Altogether, this suggests that it has been difficult to find a consistent history of admixture or gene flow among previous genetic studies. We must stress that these studies are not always easy to compare, and some differences may arise from the fact that their sampling was different. For instance, several studies have no sample from one or two chimpanzee subspecies, whereas in other cases, the authors used samples from individuals with unknown geographic origin. However, one common feature of all these studies is that they consider tree models that ignore population structure below the subspecies level. They usually assume that the chimpanzee subspecies and the bonobos should be modelled as independent and panmictic lineages of an evolutionary tree where the only gene flow allowed is through these discrete admixture events.

4.3 The limits of the piecewise n-island model: towards spatial models

The approach used throughout the manuscript follows the theoretical and simulation-based work of several authors who found that structured stationary and non-stationary models can generate genetic signatures that will be interpreted in terms of population size change when population structure is ignored [28,32,33,39,40,57]. However, the n-island models we assumed with SNIF ignore spatial processes. As a consequence, IBD patterns observed by [15] and [16] cannot be reproduced, and suggest that the different subspecies were genetically connected in the recent past even if the chimpanzees’s habitat is currently highly fragmented and discontinuous.

A difficulty that could arise from the use of spatial models is that the parameter space may increase significantly making the inference process more difficult. However, if we wish to improve our understanding of the evolutionary history of great apes, including humans, we may have no choice but integrate spatial models [58,59]. As a simple test example and to illustrate the importance of spatial models, we developed a simple 1D stepping-stone model inspired by the demographic model proposed by de Manuel et al. [6] to study the demographic history of chimpanzees and bonobos. These authors assumed a tree model and allowed for the possibility of admixture events between subspecies and between bonobos and chimpanzees. They computed D-statistics and found evidence for admixture between bonobos and chimpanzees. Details of our 1D stepping-stone models and most of the results can be found in the Supplementary Material, but here we mainly wish to stress that we were able to reproduce the D-statistics with our spatial model without any introgression between bonobos and chimpanzees. By changing gene flow and deme size in the structured ancestral species, we found that we were recovering even higher D values than those observed today. This suggests that the bonobo admixture signals detected in chimpanzees might be the simple result of both ancient and recent population structure.

5. Conclusions

In this work, we showed that it is possible to propose a demographic model for common chimpanzees that accounts for population structure and gives a coherent interpretation of PSMC curves produced by previous studies. Although we stress that the general model we propose here should not be taken as face value, it manages to explain several patterns of genetic diversity within subspecies despite the limits of the n-island models. We noted the importance of using spatial models to account for the genetic differentiation between the subspecies and also showed that spatial models might also explain possibly spurious signatures of admixture with bonobos. This work is a first step towards more complex models, though we recognise the difficulty of such endeavour. There is an increasing recognition that ignoring population structure and spatial processes may lead to the inference of events that may never have happened during the evolutionary history of the species studied [33–35,41,60]. Our study shows an example of how the oversimplifying assumption of panmixia can be avoided, which has implications for many species where population structure has been detected and too often ignored in genomic inference approaches.

Supplementary Materials

The following supplementary materials are available on the website of this paper: HPGG2505020004SupplementaryMaterials.zip:

Supplementary Information S1: On the Validation Process.
Supplementary Information S2: Adding Between Sub-species Migration Rates in the General Tree Model (including Figure S1, Figure S2, Figure S3, Table S1, and Table S2).
Supplementary Information S3: Spatial Structure Confounds Ancient Admixture Estimates (including Figures S4 and Figure S5).
Table S3: Prior ranges of the t_i (times at which migrations rates are allowed to change, delimiting the components) given to SNIF for the final analysis.
Table S4: Distribution of inferred n, the number of demes of the n-island models.
Table S5: Distribution of inferred N, the deme size of the n-island models (in number of diploids).
Figure S6: Empirical PSMC curves of chimpanzees used as input data for the inferential process.
Figure S7: Inferred IICR curves (in red) and empirical PSMC (in blue) for the five Western common chimpanzees.
Figures S8: Inferred IICR curves (in red) and empirical PSMC (in blue) for the five Nigeria-Cameroon common chimpanzees.
Figure S9: Inferred IICR curves (in red) and empirical PSMC (in blue) for the four Central common chimpanzees.
Figure S10: Inferred IICR curves (in red) and empirical PSMC (in blue) for the three Eastern common chimpanzees.
Figure S11: Distribution of the inferred number and size of the demes for each chimpanzee individual.
Figure S12: Inferred connectivity graph (migration rates along successive time components) inferred by SNIF coloured by chimpanzee individuals.
Figures S13: Empirical PSMC (in blue), simulated PSMC computed on simulated sequences (10x100Mb) (in orange) and simulated IICR computed on simulated coalescent times (T2) (in red), given to SNIF as pseudo-observed data for the validation procedure.
Figures S14: Inferred connectivity graph for PSMC curves computed on simulated genomic data.
Figures S15: Empirical PSMC (solid line) and simulated IICR (dotted lines) for Western common chimpanzees under the general n-island model Figure 7 in the main text of the paper for different values of splitting times.
Figures S16: Empirical PSMC (solid line) and simulated IICR (dotted lines) for Nigeria-Cameroon common chimpanzees under the general n-island model Figure 7 for different values of splitting times.
Figures S17: Empirical PSMC (solid line) and simulated IICR (dotted lines) for Central common chimpanzees under the general n-island model Figure 7 in the main text of the paper for different values of splitting times.
Figures S18: Empirical PSMC (solid line) and simulated IICR (dotted lines) for Eastern common chimpanzees under the general n-island model Figure 7 in the main text of the paper for different values of splitting times.
Figures S19: Genetic distance (F_ST) between demes of the same subspecies or between demes from different subspecies computed on genomic data simulated under the model Figure 7 in the main text of the paper with T_CE = 500 kya, T_NW = 800 kya and T_CENW ∈ {700, 800, 900} kya.
Figures S20: Genetic distance (F_ST) between demes of the same subspecies or between demes from different subspecies computed on genomic data simulated under the model Figure 7 in the main text of the paper with T_CE = 500 kya, T_CENW = 900 kya and T_NW ∈ {500,600,800} kya.
Figures S21: Genetic distance (F_ST) within and between subspecies computed on genomic data simulated under the model Figure 7 in the main text of the paperwith T_CENW = 900 kya, T_NW = 800 kya and T_CE ∈ {400, 500, 600} kya.

Declarations

Ethics Statement

Not applicable.

Consent for Publication

Not applicable.

Availability of Data and Material

All code used for this paper is available at https://github.com/camillesteux/structurechimpanzees.

Funding

We acknowledge the financial support of a PhD studentship from the Ministère de l’Enseignement Supérieur et de la Recherche to CS. LC’s research was supported by the DevOCGen project, funded by the Occitanie Regional Council’s “Key challenges BiodivOc” program. This work was also supported by the LABEX entitled TULIP (ANR-10-558 LABX-41 and ANR-11-IDEX-0002-02) as well as the Investissement d’Avenir grant of the Agence Nationale de la Recherche (CEBA: ANR-10-LABX-25-01). We thank the IRP BEEG-B (International Research Project Bioinformatics, Ecology, Evolution, Genomics and Behaviour) for facilitating travel and collaboration between Toulouse (CRBE, IMT and INSA) and Lisbon (IGC and cE3c).

Competing Interests

Lounès Chikhi is a member of the Editorial Board of the journal Human Population Genetics and Genomics. The author was not involved in the journal’s review or decisions related to this manuscript. The authors have declared that no other competing interests exist.

Author Contributions

Conceptualization: LC; Methodology: LC, CS, CC, AA, WR, OM, RT; Investigation: LC, CS; Visualization: LC, CS, RT; Supervision: LC; Writing—original draft: LC, CS, RT ; Writing—review: CC, AA, WR, OM.

Acknowledgement

We thank all the members of the Population and Conservation Genetic group at the Instituto Gulbenkian de Ciência (IGC) for their support and for insightful discussions on the topic. We also thank the Bioinformatics Unit and the Informatics Team of the IGC for their help and support with computational resources, and the Genotoul bioinformatics platform. We also thank Mimi Arandjelovic, Jack Lester and Javier Prado-Martinez for sharing their data with us. Finally, we thank the two reviewers for their constructive comments and for encouraging us to add gene flow between subspecies in the general tree model, which gave results we find promising for future works.

Abbreviations

The following abbreviations are used in this manuscript:

ABC:: Approximate bayesian computation

IICR:: Inverse instantaneous coalescence rate

IBD:: Isolation-by-distance

POD:: Pseudo-observed data

kya:: Thousand years ago

Mya:: Millions years ago

bp:: base pair

C:: Central

E:: Eastern

NC:: Nigeria-Cameroon

W:: Western

Subspecies	Sample size	c	ω	n_min, n_max	N_min, N_max	M_i,_min, M_i,_max	t_i
Western	5	7	0.5	2, 60	10, 10000	0.01, 50	Specific time windows
Nigeria-Cameroon	5	7	0.5	2, 30	10, 10000	0.01, 50
Central	4	8	0.5	2, 60	10, 10000	0.01, 50
Eastern	3	7	0.5	2, 50	10, 10000	0.01, 50

References

1.	Chen FC, Li WH. Genomic divergences between humans and other hominoids and the effective population size of the common ancestor of humans and chimpanzees. Am J Hum Genet. 2001;68(2):444-456. [Google Scholar] [CrossRef]

2.	Prado-Martinez J, Sudmant PH, Kidd JM, Li H, Kelley JL, Lorente-Galdos B, et al. Great ape genetic diversity and population history. Nature. 2013;499(7459):471-475. [Google Scholar] [CrossRef]

3.	Langergraber KE, Prüfer K, Rowney C, Boesch C, Crockford C, Fawcett K, et al. Generation times in wild chimpanzees and gorillas suggest earlier divergence times in great ape and human evolution. Proc Natl Acad Sci U S A. 2012;109(39):15716-15721. [Google Scholar] [CrossRef]

4.	Kuhlwilm M, de Manuel M, Nater A, Greminger MP, Krützen M, Marques-Bonet T. Evolution and demography of the great apes. Curr Opin Genet Dev. 2016;41:124-129. [Google Scholar] [CrossRef]

5.	Lobon I, Tucci S, de Manuel M, Ghirotto S, Benazzo A, Prado-Martinez J, et al. Demographic history of the genus Pan inferred from whole mitochondrial genome reconstructions. Genome Biol Evol. 2016;8(6):2020-2030. [Google Scholar] [CrossRef]

6.	de Manuel M, Kuhlwilm M, Frandsen P, Sousa VC, Desai T, Prado-Martinez J, et al. Chimpanzee genomic diversity reveals ancient admixture with bonobos. Science. 2016;354(6311):477-481. [Google Scholar] [CrossRef]

7.	Caswell JL, Mallick S, Richter DJ, Neubauer J, Schirmer C, Gnerre S, et al. Analysis of chimpanzee history based on genome sequence alignments. PLoS Genet. 2008;4(4):e1000057. [Google Scholar] [CrossRef]

8.	Won YJ, Hey J. Divergence population genetics of chimpanzees. Mol Biol Evol. 2005;22(2):297-307. [Google Scholar] [CrossRef]

9.	Hey J. The divergence of chimpanzee species and subspecies as revealed in multipopulation isolation-with-migration analyses. Mol Biol Evol. 2010;27(4):921-933. [Google Scholar] [CrossRef]

10.	Becquet C, Przeworski M. A new approach to estimate parameters of speciation models with application to apes. Genome Res. 2007;17(10):1505-1519. [Google Scholar] [CrossRef]

11.	Wegmann D, Excoffier L. Bayesian inference of the demographic history of chimpanzees. Mol Biol Evol. 2010;27(6):1425-1435. [Google Scholar] [CrossRef]

12.	Brand CM, White FJ, Rogers AR, Webster TH. Estimating bonobo (Pan paniscus) and chimpanzee (Pan troglodytes) evolutionary history from nucleotide site patterns. Proc Natl Acad Sci U S A. 2022;119(17):1-12. [Google Scholar] [CrossRef]

13.	Barratt CD, Lester JD, Gratton P, Onstein RE, Kalan AK, McCarthy MS, et al. Quantitative estimates of glacial refugia for chimpanzees (Pan troglodytes) since the Last Interglacial (120,000 BP). Am J Primatol. 2021;83(10):e23320. [Google Scholar] [CrossRef]

14.	McBrearty S, Jablonski NG. First fossil chimpanzee. Nature. 2005;437(7055):105-108. [Google Scholar] [CrossRef]

15.

Fünfstück T, Arandjelovic M, Morgan DB, Sanz C, Reed P, Olson SH, et al. The sampling scheme matters: Pan troglodytes troglodytes and P. t. schweinfurthii are characterized by clinal genetic variation rather than a strong subspecies break. Am J Phys Anthropol. 2015;156(2):181-191. [Google Scholar] [CrossRef]

16.	Lester JD, Vigilant L, Gratton P, McCarthy MS, Barratt CD, Dieguez P, et al. Recent genetic connectivity and clinal variation in chimpanzees. Commun Biol. 2021;4(1):1-11. [Google Scholar] [CrossRef]

17.	Humle T, Maisels F, Oates JF, Plumptre A, Williamson EA. Pan troglodytes. The IUCN Red List of Threatened Species [Internet]; 2016 [cited 2023 Oct 10]. Available from: https://dx.doi.org/10.2305/IUCN.UK.2016-2.RLTS.T15933A17964454.en.

18.	Fruth BI, Hickey J, André C, Furuichi T, Hart J, Hart T, et al. Pan paniscus. The IUCN Red List of Threatened Species [Internet]; 2016 [cited 2024 May 31]. Available from: https://www.iucnredlist.org/species/15932/102331567.

19.	Fontsere C, Kuhlwilm M, Morcillo-Suarez C, Alvarez-Estape M, Lester JD, Gratton P, et al. Population dynamics and genetic connectivity in recent chimpanzee history. Cell Genom. 2022;2(6). [Google Scholar] [CrossRef]

20.	Mitchell MW, Locatelli S, Ghobrial L, Pokempner AA, Clee PRS, Abwe EE, et al. The population genetics of wild chimpanzees in Cameroon and Nigeria suggests a positive role for selection in the evolution of chimpanzee subspecies. BMC Evol Biol. 2015;15(1):1-15. [Google Scholar] [CrossRef]

21.	Becquet C, Patterson N, Stone AC, Przeworski M, Reich D. Genetic structure of chimpanzee populations. PLoS Genet. 2007;3(4):e66. [Google Scholar] [CrossRef]

22.	Fischer A, Pollack J, Thalmann O, Nickel B, Pääbo S. Demographic history and genetic differentiation in apes. Curr Biol. 2006;16(11):1133-1138. [Google Scholar] [CrossRef]

23.	Fischer A, Wiebe V, Pääbo S, Przeworski M. Evidence for a complex demographic history of chimpanzees. Mol Biol Evol. 2004;21(5):799-808. [Google Scholar] [CrossRef]

24.	Li H, Durbin R. Inference of human population history from individual whole-genome sequences. Nature. 2011;475(7357):493-496. [Google Scholar] [CrossRef]

25.	Fan H, Wu Q, Wei F, Yang F, Ng BL, Hu Y. Chromosome-level genome assembly for giant panda provides novel insights into Carnivora chromosome evolution. Genome Biol. 2019;20(1):1-12. [Google Scholar] [CrossRef]

26.	Bursell MG, Dikow RB, Figueiró HV, Dudchenko O, Flanagan JP, Aiden EL, et al. Whole genome analysis of clouded leopard species reveals an ancient divergence and distinct demographic histories. Iscience. 2022;25(12):105647. [Google Scholar] [CrossRef]

27.	Liu S, Westbury MV, Dussex N, Mitchell KJ, Sinding MHS, Heintzman PD, et al. Ancient and modern genomes unravel the evolutionary history of the rhinoceros family. Cell. 2021;184(19):4874-4885. [Google Scholar] [CrossRef]

28.	Mazet O, Rodríguez W, Grusea S, Boitard S, Chikhi L. On the importance of being structured: instantaneous coalescence rates and human evolution—lessons for ancestral population size inference? Heredity. 2016;116(4):362-371. [Google Scholar] [CrossRef]

29.	Chikhi L, Rodríguez W, Grusea S, Santos P, Boitard S, Mazet O. The IICR (inverse instantaneous coalescence rate) as a summary of genomic diversity: insights into demographic inference and model choice. Heredity. 2018;120(1):13-24. [Google Scholar] [CrossRef]

30.	Rodríguez W, Mazet O, Grusea S, Arredondo A, Corujo JM, Boitard S, et al. The IICR and the non-stationary structured coalescent: towards demographic inference with arbitrary changes in population structure. Heredity. 2018;121(6):663-678. [Google Scholar] [CrossRef]

31.	Arredondo A, Mourato B, Nguyen K, Boitard S, Rodríguez W, Mazet O, et al. Inferring number of populations and changes in connectivity under the n-island model. Heredity. 2021;126(6):896-912. [Google Scholar] [CrossRef]

32.	Wakeley J. Nonequilibrium migration in human history. Genetics. 1999;153(4):1863-1871. [Google Scholar] [CrossRef]

33.	Chikhi L, Sousa VC, Luisi P, Goossens B, Beaumont MA. The confounding effects of population structure, genetic diversity and the sampling scheme on the detection and quantification of population size changes. Genetics. 2010;186(3). [Google Scholar] [CrossRef]

34.	Eriksson A, Manica A. Effect of ancient population structure on the degree of polymorphism shared between modern human populations and ancient hominins. Proc Natl Acad Sci U S A. 2012;109(35):13956-13960. [Google Scholar] [CrossRef]

35.	Tournebize R, Chikhi L. Ignoring population structure in hominin evolutionary models can lead to the inference of spurious admixture events. Nat Ecol Evol. 2025;9(2):225-236. [Google Scholar] [CrossRef]

36.	Grusea S, Rodríguez W, Pinchon D, Chikhi L, Boitard S, Mazet O. Coalescence times for three genes provide sufficient information to distinguish population structure from population size changes. J Math Biol. 2019;78:189-224. [Google Scholar] [CrossRef]

37.	Wright S. Evolution in Mendelian populations. Genetics. 1931;16(2):97. [Google Scholar] [CrossRef]

38.	Wang K, Mathieson I, O’Connell J, Schiffels S. Tracking human population structure through time from whole genome sequences. PLoS Genet. 2020;16(3):e1008552. [Google Scholar] [CrossRef]

39.	Heller R, Chikhi L, Siegismund HR. The confounding effect of population structure on Bayesian skyline plot inferences of demographic history. PloS one. 2013;8(5):e62992. [Google Scholar] [CrossRef]

40.	Paz-Vinas I, Quéméré E, Chikhi L, Loot G, Blanchet S. The demographic history of populations experiencing asymmetric gene flow: combining simulated and empirical data. Mol Ecol. 2013;22(12):3279-3291. [Google Scholar] [CrossRef]

41.	Eriksson A, Manica A. The doubly conditioned frequency spectrum does not distinguish between ancient population structure and hybridization. Mol Biol Evol. 2014;31(6):1618-1621. [Google Scholar] [CrossRef]

42.	Beaumont MA, Zhang W, Balding DJ. Approximate Bayesian computation in population genetics. Genetics. 2002;162(4):2025-2035. [Google Scholar] [CrossRef]

43.	Hudson RR. Generating samples under a Wright–Fisher neutral model of genetic variation. Bioinformatics. 2002;18(2):337-338. [Google Scholar] [CrossRef]

44.	Schiffels S, Wang K. MSMC and MSMC2: the multiple sequentially markovian coalescent. In: Statistical Population Genomics. Methods in Molecular Biology. In: Dutheil JY, editor. New York, NY, USA: Humana; 2020. p.147-165. [CrossRef]

45.	Parreira BR, Chikhi L. On some genetic consequences of social structure, mating systems, dispersal, and sampling. Proc Natl Acad Sci U S A. 2015;112(26):E3318-E3326. [Google Scholar] [CrossRef]

46.	Speidel L, Forest M, Shi S, Myers SR. A method for genome-wide genealogy estimation for thousands of samples. Nat Genet. 2019;51(9):1321-1329. [Google Scholar] [CrossRef]

47.	Yang MA. A genetic history of migration, diversification, and admixture in Asia. Hum Popul Genet Genom. 2022;2(1). [Google Scholar] [CrossRef]

48.	Kozak KM, Joron M, McMillan WO, Jiggins CD. Rampant Genome-Wide Admixture across the Heliconius Radiation. Genome Biol Evol. 2021 05;13(7):evab099. [Google Scholar] [CrossRef]

49.	Teixeira H, Salmona J, Arredondo A, Mourato B, Manzi S, Rakotondravony R, et al. Impact of model assumptions on demographic inferences: the case study of two sympatric mouse lemurs in northwestern Madagascar. BMC Ecol Evol. 2021;21:1-18. [Google Scholar] [CrossRef]

50.	Beaumont MA. Approximate Bayesian computation in evolution and ecology. Annu Rev Ecol Evol Syst. 2010;41:379-406. [Google Scholar] [CrossRef]

51.	Humle T, Boesch C, Campbell G, Junker J, Koops K, et al. Pan troglodytes ssp. verus. The IUCN Red List of Threatened Species. 2016. Accessed on 30 May 2023. [Google Scholar] [CrossRef]

52.	Clark PU, Archer D, Pollard D, Blum JD, Rial JA, Brovkin V, et al. The middle Pleistocene transition: characteristics, mechanisms, and implications for long-term changes in atmospheric pCO2. Quat Sci Rev. 2006;25(23-24):3150-3184. [Google Scholar] [CrossRef]

53.	Berner N, Trauth MH, Holschneider M. Bayesian inference about Plio-Pleistocene climate transitions in Africa. Quat Sci Rev. 2022;277:107287. [Google Scholar] [CrossRef]

54.	Demenocal PB. Plio-pleistocene African climate. Science. 1995;270(5233):53-59. [Google Scholar] [CrossRef]

55.	Kuhlwilm M, Han S, Sousa VC, Excoffier L, Marques-Bonet T. Ancient admixture from an extinct ape lineage into bonobos. Nat Ecol Evol. 2019;3(6):957-965. [Google Scholar] [CrossRef]

56.	Rogers AR. Legofit: estimating population history from genetic data. BMC Bioinformatics. 2019;20:1-10. [Google Scholar] [CrossRef]

57.	Beaumont MA. Recent developments in genetic data analysis: what can they tell us about human demographic history? Heredity. 2004;92(5):365-379. [Google Scholar] [CrossRef]

58.	Barbujani G, Sokal RR, Oden NL. Indo-European origins: A computer-simulation test of five hypotheses. Am J Phys Anthropol. 1995;96(2):109-132. [Google Scholar] [CrossRef]

59.	Ray N, Currat M, Berthier P, Excoffier L. Recovering the geographic origin of early modern humans by realistic and spatially explicit simulations. Genome Res. 2005;15(8):1161-1167. [Google Scholar] [CrossRef]

60.	Boitard S, Arredondo A, Chikhi L, Mazet O. Heterogeneity in effective size across the genome: effects on the inverse instantaneous coalescence rate (IICR) and implications for demographic inference under linked selection. Genetics. 2022;220(3):iyac008. [Google Scholar] [CrossRef]

Article Comments

Download PDF Article Comments

Share this article