Genomic epidemiology of Mpox virus in Sierra Leone

johnatsandi · May 28, 2025, 11:32am

This post has been updated to include new sequences, new analysis, and revised estimates.

Background

Sierra Leone confirmed its first Mpox case on January 10, 2025, with the Minister of Health declaring a Public Health Emergency shortly thereafter. Cases have surged in recent weeks, with over 2,800 new infections reported as of May 22, 2025, strongly indicating ongoing human-to-human transmission similar to patterns observed in the early stages of the ongoing outbreaks in Nigeria and the DRC (Parker et al. 2025; O’Toole et al. 2023; Vakaniaki et al. 2024). Sierra Leone accounted for up to half of all confirmed cases in Africa in early May, with weekly suspected and confirmed cases increasing by 71% and 61%, respectively, relative to previous weeks.

The scarcity of full-length MPXV genomes from the region has left several critical questions unresolved, including whether a new lineage capable of sustained human transmission has emerged, and whether it is connected to ongoing epidemics in West or Central Africa. In this study, we analyze 77 newly generated, high-quality genomes from Sierra Leone (44 generated at the Central Public Health Reference Laboratory (CPHRL) and 33 at Kenema Government Hospital (KGH)), from samples collected between January 10 and May 7, 2025 in eight districts (Bo, Kenema, Kono, Port Loko, Bonthe, Kailahun, Western Area Urban, and Western Area Rural).

Results

Of the 77 MPXV genome sequences we generated, 76 fall within the hMPXV-1 (A) lineage of Clade IIb, which emerged in southern Nigeria in August 2014 and has since sustained the Nigerian epidemic through ongoing human-to-human transmission (Figure 1; Parker et al. 2025). According to the nomenclature proposed by Happi et al. (2022), hMPXV-1 is designated as Lineage A, with direct descendants designated as e.g. A.1, and subsequent subdivisions identified as e.g. A.1.1, similar to the Pango nomenclature for SARS-CoV-2. Within hMPXV-1, our sequences form a well-supported monophyletic group (posterior support ~1 in the Delphy analysis) descended from Lineage A.2.2 (Figure 1). In accordance with the nomenclature, the newly identified lineage from Sierra Leone is likely to be designated as G.1, the alias of A.2.2.1 (Figure 1).

The closest sequence to the novel G.1 lineage is PP_0015JY8.1, sampled in Chicago, USA, in mid-February 2025. G.1 is separated from its shared ancestor with PP_0015JY8.1 by 8 APOBEC-like and 2 non-APOBEC-like mutations along its stem branch, with an additional 9 APOBEC-like and 1 non-APOBEC-like mutation on PP_0015JY8.1’s terminal branch (Figure 1). PP_0015JY8.1 clusters with three other sequences sampled in the USA between January 2024 and January 2025 (Figure 1). Given the long branches separating the US sequences, they more likely represent independent viral imports from the Nigerian A.2.2 lineage than an established lineage that has been cryptically diversifying in the USA. The four US sequences were sampled in three different states, Illinois (2), California and Massachusetts, with confirmed travel history to Nigeria for at least two of the sequences. This suggests that G.1 descends from the ongoing epidemic in Nigeria, although we cannot infer a direct import to Sierra Leone because we cannot exclude the possibility of unsampled intermediates. The closest Nigerian sequence to G.1 is TRM326, sampled in Rivers state in September 2022, which has a zero-length branch to its common ancestor with the USA-sampled A.2.2 diversity. There is also evidence of G.1 viral export from Sierra Leone, including two sequences sampled in the USA in March and April 2025, and two samples from Germany in March 2025 (sampled from a single patient).

To estimate the time to the most recent common ancestor (tMRCA) of the Sierra Leone G.1 lineage we employed BEAST (Suchard et al. 2018) with an exponential growth model applied to the G.1 lineage while allowing the rest of the tree (representing the source population) to evolve under an independent exponential growth model. This allows for the distinct epidemiological dynamics observed in Sierra Leone.

We estimated that the tMRCA of the G.1 lineage is 16 November 2024 [95% HPD: 4 October – 26 December 2024] (Figure 2). The tMRCA represents an upper bound on the time at which the sampled G.1 lineage was established in Sierra Leone, i.e. the latest date by which it must already have been circulating. The estimates suggest that G.1 may have circulated for about 1-2 months before detection in January 2025. The estimated growth rate for the G.1 lineage corresponds to a doubling time of ~2.2 weeks, which is consistent with the observed sharp increase in incidence over a short time period. In contrast, the estimated growth rate for the Nigerian epidemic, represented by the remainder of lineage A from which G.1 is descended, corresponds to a doubling time of ~2.00 years, indicative of much slower growth in a more established epidemic.
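To make the contrast between the two doubling times concrete, the sketch below converts a doubling time into the exponential growth rate r in N(t) = N0 · exp(r·t), using r = ln(2) / doubling time. It is only an illustration of that standard relationship, not the BEAST parameterization; the function names and the 365.25/7 weeks-per-year conversion are our assumptions.

```python
import math

WEEKS_PER_YEAR = 365.25 / 7  # assumption used only for unit conversion


def growth_rate_from_doubling_time(doubling_time: float) -> float:
    """Exponential growth rate r (per time unit) for N(t) = N0 * exp(r * t)."""
    if doubling_time <= 0:
        raise ValueError("doubling_time must be positive")
    return math.log(2) / doubling_time


def fold_increase(doubling_time: float, elapsed: float) -> float:
    """Fold change in population size after `elapsed` time units."""
    return 2 ** (elapsed / doubling_time)


# Doubling times quoted in the text, both expressed in weeks.
g1_rate_per_week = growth_rate_from_doubling_time(2.2)
nigeria_rate_per_week = growth_rate_from_doubling_time(2.0 * WEEKS_PER_YEAR)

print(f"G.1 growth rate:      {g1_rate_per_week:.3f} per week")
print(f"Lineage A growth rate: {nigeria_rate_per_week:.4f} per week")
print(f"Ratio of growth rates: {g1_rate_per_week / nigeria_rate_per_week:.0f}x")
```

Because the rate is inversely proportional to the doubling time, the ratio of the two rates is simply the ratio of the doubling times once both are in the same units.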

The closest outgroups to G.1 were sampled in the USA but likely represent independent viral exports from Nigeria, based on the long terminal branches of the USA outgroup genomes and confirmed travel histories for at least 2 of the 4 sequences. The tMRCA of the G.1 lineage and the closest of these outgroups is estimated to be 2 November 2023 [95% HPD: 24 June 2023 – 6 March 2024]. This represents a lower bound on the time of the viral introduction into Sierra Leone, i.e. the earliest date at which it could have occurred. However, given the rapid exponential growth and the lack of related samples from Nigeria, it is likely that the import occurred not long before the tMRCA of G.1.

We also provide real-time Bayesian phylogenetic analysis using Delphy (Varilly et al. 2025). We find a tMRCA for lineage G.1 of 14 November 2024 [95% HPD: 29 September – 25 December 2024], with a doubling time of ~2.3 weeks, based on a focused analysis of the G.1 clade. These estimates are consistent with the BEAST results, and the approach likewise models the growth of G.1 with its own single exponential curve, which is appropriate given how differently G.1 is growing from the rest of the tree. These results are consistent across analyses and reflect the current version of Delphy. We note that an earlier version of this post reported a preliminary tMRCA of 5 July 2024 [95% HPD: 7 April – 8 September 2024], based on a whole-population analysis using a single exponential growth model. That approach did not fully account for the significantly faster growth rate of the G.1 lineage and thus produced an earlier estimate. This discrepancy has since been resolved through clade-specific modeling, which better reflects the underlying epidemiological dynamics.

To infer epidemiological parameters and investigate possible host-to-host links, we ran JUNIPER (Specht et al. 2025) on the 76 Sierra Leone G.1 genomes. JUNIPER explicitly reconstructs the underlying transmission tree behind an epidemic using a stochastic branching model, allowing it to explore the space of all possible networks among both sampled and unsampled cases. We estimated the reproductive number to be 2.09 [95% HPD: 1.80–2.45], which, assuming a generation time of 11.4 days for Clade IIb (Marziano et al. 2024), is consistent with the doubling time estimated above. We also inferred that the 76 samples in our dataset represent 0.643% [95% HPD: 0.195%–1.38%] of all cases in the outbreak, equivalent to an estimated 11,800 [95% HPD: 5,500–40,000] total cases up to 7 May 2025 (the date of our latest sequence). We estimated that the outbreak originated on 15 November 2024 [95% HPD: 21 September – 16 December 2024], again consistent with the above results. JUNIPER did not infer any direct transmissions among the 76 sequenced cases with high confidence, suggesting that continued genomic surveillance is necessary to better understand spread patterns.
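The link between the sampled fraction and the total case count is simple division: total cases ≈ sequenced cases / sampled fraction. The sketch below reproduces that arithmetic for the point estimate and for the interval endpoints quoted above. It is a back-of-the-envelope check rather than JUNIPER's inference, and the function name is ours.

```python
def total_cases(n_sequenced: int, sampled_fraction: float) -> float:
    """Implied outbreak size if `n_sequenced` cases are `sampled_fraction` of the total."""
    if not 0 < sampled_fraction <= 1:
        raise ValueError("sampled_fraction must be in (0, 1]")
    return n_sequenced / sampled_fraction


N_SEQUENCED = 76  # Sierra Leone G.1 genomes used in the JUNIPER analysis

# Fractions quoted in the text, converted from percent to proportions.
point = total_cases(N_SEQUENCED, 0.643 / 100)
low = total_cases(N_SEQUENCED, 1.38 / 100)   # largest fraction -> fewest cases
high = total_cases(N_SEQUENCED, 0.195 / 100)  # smallest fraction -> most cases

print(f"point estimate: {point:,.0f}")
print(f"from interval endpoints: {low:,.0f} to {high:,.0f}")
```

The endpoint-derived range need not match the reported interval exactly: an HPD interval is not preserved under a nonlinear transformation such as 1/x, so an interval computed directly on the posterior of total cases can differ slightly from one obtained by transforming the endpoints.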

There is no evident geographic structuring in the phylogeny, i.e. sequences do not cluster in lineages according to the district of collection, but this may change with more sequencing and greater phylogenetic resolution. Identical sequences were found in geographically distinct districts, including Bo, Kenema, and Bonthe, which is not unexpected given the short timeline of collection and the estimated evolutionary rate (O’Toole et al. 2023; Parker et al. 2025).

Our findings indicate that there was cryptic circulation and geographic spread prior to detection in these areas, underscoring the urgent need to strengthen surveillance systems and improve diagnostic and monitoring infrastructure. Enhanced case surveillance is essential to uncover the underlying transmission network and identify associated risk factors, including possible sexual networks, enabling the implementation of targeted interventions before the outbreak becomes even more widespread regionally and globally.

Figure 1: Clade IIb phylogeny with reconstructed SNPs mapped onto branches. We performed ancestral state reconstruction across our Clade IIb phylogeny to map SNPs to their relevant branches. We annotated APOBEC3 characteristic substitutions, i.e. C→T or G→A in the correct dimer context, along branches and calculated their relative proportion across internal branches. APOBEC3 substitutions along the branches are annotated in yellow and red, with the remainder in gray and black. Our new sequences are annotated in red as enlarged tips and as Lineage G.1. The tree was rooted to the new zoonotic outgroup identified in (Parker et al. 2025).

Figure 2: Time-resolved global phylogeny of Clade IIb. The new G.1 lineage in Sierra Leone is annotated in red.

The only non-Clade IIb sequence we identified was a Clade IIa sequence, collected in mid-January 2025, in the Western Area Urban, during the earliest phase of the outbreak. Although highly divergent, it clusters with some of the earliest Clade IIa sequences, including those from 1958, 1961, and 1968, originating from the DRC and historical export events (Figure 3). This sequence has 33 non-APOBEC and 2 APOBEC-like mutations along its terminal branch, and is an additional 6 mutations (4 non-APOBEC, 2 APOBEC) away from its closest sequence. The lack of an APOBEC-like mutational signature in the sequence suggests that this case represents a zoonotic spillover rather than being part of the ongoing human outbreak.

Figure 3: Clade IIa phylogeny with reconstructed SNPs mapped onto branches. We performed ancestral state reconstruction across our Clade IIa phylogeny to map SNPs to their relevant branches. We annotated APOBEC3 characteristic substitutions, i.e. C→T or G→A in the correct dimer context, along branches and calculated their relative proportion across internal branches. APOBEC3 substitutions along the branches are annotated in yellow and red, with the remainder in gray and black. Our new sequence is annotated in red as an enlarged tip. The tree was rooted to the new Clade IIb zoonotic outgroup identified in (Parker et al. 2025).

Methods

We generated 77 high-quality sequences collected between 10 January and 7 May 2025 from the districts of Bo (7), Bonthe (5), Kailahun (4), Kenema (15), Kono (1), Port Loko (1), Western Area Urban (42), and Western Area Rural (2). Samples were sequenced using the Twist Viral Surveillance Panel hybrid capture enrichment kit followed by Illumina sequencing. Sequences from KGH were assembled using the viral-ngs assemble_denovo_metagenomic pipeline (Park et al. 2025) with automated reference genome selection from a set of 16 MPXV reference genomes (Park and Mendes 2025). Sequences from CPHRL were assembled by reference-based assembly with an in-house pipeline. Briefly, we mapped reads against a Clade IIb reference genome (NC_063383, an early hMPXV-1 genome from Nigeria) with bwa-mem (Li 2013), and called consensus using samtools (Li et al. 2009) and iVar (Grubaugh et al. 2019). The mean coverage across all sequences ranged from 25x to 8100x, with 75 of the 77 genomes exceeding 96% completeness.

We combined our 76 genomes with all high-quality, publicly available Clade IIb MPXV genomes from Pathoplexus (Dalla Vecchia 2024). As the multi-country outbreak lineage B.1 was not our research focus, we included only a single representative. Additionally, we included the closest zoonotic outgroup to Clade IIb (PP852949.1) to root the tree. In total, the dataset comprises 276 sequences. We aligned our dataset to the Clade IIb reference genome (NC_063383) using the ‘squirrel’ package (https://github.com/aineniamh/squirrel; O’Toole et al. 2023). Using the same package, we trimmed the alignment and masked the 3′ terminal repeat region, regions of repetition or low complexity, and clustered mutations.

We investigated the initial phylogenetic placement of our sequences within the global mpox genome phylogeny constructed from all available GenBank sequences across clades. The full MPXV phylogeny was reconstructed using IQ-TREE v2.0 under the Jukes-Cantor substitution model (Minh et al. 2020). We also generated a separate phylogeny for Clade IIb using the same parameters as the global tree. The tree was rooted with PP852949.1, the closest zoonotic outgroup, which was subsequently removed. Branches with zero length were collapsed. Ancestral state reconstruction was performed on the Clade IIb phylogeny using IQ-TREE2 (Minh et al. 2020), and all nucleotide mutations were mapped to their related internal branches of the phylogeny (O’Toole et al. 2023). Lineage assignments for our sequences were made using the Nextclade tool, following the nomenclature established by Happi et al. (Happi et al. 2022).
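As an illustration of how a reconstructed mutation can be labeled APOBEC3-like, the sketch below applies the dimer-context rule described by O’Toole et al. (2023): a C→T change counts when the C is preceded by a T (TC→TT), and a G→A change counts when the G is followed by an A (GA→AA). This is a simplified stand-in for the actual annotation step, not the pipeline used here: it checks context against a single sequence string, whereas in practice the context would come from the reconstructed sequence at the parent node. The function name and the toy sequence are hypothetical.

```python
def is_apobec3_like(parent_seq: str, pos: int, alt: str) -> bool:
    """Return True if the substitution at 0-based `pos` matches the APOBEC3 signature.

    TC -> TT: C -> T with a T immediately upstream.
    GA -> AA: G -> A with an A immediately downstream
              (the same edit seen on the opposite strand).
    """
    seq = parent_seq.upper()
    ref, alt = seq[pos], alt.upper()
    if ref == "C" and alt == "T":
        return pos > 0 and seq[pos - 1] == "T"
    if ref == "G" and alt == "A":
        return pos + 1 < len(seq) and seq[pos + 1] == "A"
    return False


# Toy example (hypothetical sequence, not MPXV).
toy = "ATCGACCGT"
for pos, alt in [(2, "T"), (3, "A"), (5, "T"), (7, "A")]:
    label = "APOBEC3-like" if is_apobec3_like(toy, pos, alt) else "other"
    print(f"{toy[pos]}{pos + 1}{alt}: {label}")
```

Counting the two labels per branch gives the APOBEC-like versus non-APOBEC-like tallies of the kind reported for the G.1 stem branch and the Clade IIa terminal branch.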

To estimate the time of G.1’s emergence in Sierra Leone, we adopted the partitioned model developed by O’Toole et al. (O’Toole et al. 2023) to model APOBEC-mediated evolution in the software package BEAST (Suchard et al. 2018) with the BEAGLE high-performance computing library (Ayres et al. 2019). We used a nested exponential coalescent model: the tree from the most recent common ancestor (MRCA) of the G.1 lineage onward was modeled with an exponential growth model, with the earlier phase of Lineage A modeled as a separate exponential growth coalescent model. We ran two independent chains of 70 million states to ensure convergence, discarding the initial 10% of each chain as burn-in. The chains were then combined with LogCombiner. For all subsequent analyses, we assessed convergence using Tracer, and constructed a maximum clade credibility (MCC) tree in TreeAnnotator 1.10 (Rambaut et al. 2018).
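The date ranges reported in the Results are 95% highest posterior density (HPD) intervals. For readers unfamiliar with the term, the sketch below shows one common way to compute an HPD interval from posterior samples: the narrowest window that contains the requested share of the sorted samples. Tracer and TreeAnnotator compute these summaries themselves; this is an illustrative reimplementation under that definition, with a hypothetical function name and made-up sample values.

```python
import math
from typing import Sequence, Tuple


def hpd_interval(samples: Sequence[float], mass: float = 0.95) -> Tuple[float, float]:
    """Narrowest interval containing at least `mass` of the samples."""
    if not samples:
        raise ValueError("samples must be non-empty")
    if not 0 < mass <= 1:
        raise ValueError("mass must be in (0, 1]")
    xs = sorted(samples)
    n = len(xs)
    k = max(1, math.ceil(mass * n))  # number of samples the window must hold
    # Slide a window of k consecutive sorted samples and keep the narrowest one.
    start = min(range(n - k + 1), key=lambda i: xs[i + k - 1] - xs[i])
    return xs[start], xs[start + k - 1]


# Made-up values standing in for post-burn-in samples of one parameter.
toy_samples = [0, 10, 11, 12, 13, 14, 30, 40, 50, 100]
print(hpd_interval(toy_samples, mass=0.5))
```

Unlike an equal-tailed interval, the HPD window shifts toward the densest part of the posterior, which matters for skewed quantities such as tMRCA estimates.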

For the Delphy analysis (Varilly et al. 2025), we reconstructed a time-resolved tree of the 80 G.1 sequences under an HKY model with fixed overall mutation rate of 7.45 mutations / year, as estimated from the dataset in Parker et al. 2025. The G.1 sequences alone are not sufficiently informative to infer a mutation rate or to use the APOBEC-aware substitution model adapted from O’Toole et al. (O’Toole et al. 2023) that is built into Delphy, though we expect this to change as more samples are sequenced. Chains were run for 500 million steps, sampling every 100,000 steps and discarding the initial 10% of the chain. Convergence was assessed using Tracer, with an estimated effective sample size above 500 for all traced observables.

For JUNIPER analysis, we assumed a prior on the generation interval of 11.4 days per Marziano et al. (2024) and a prior on the sojourn interval (time between infection date and sample collection date) of 15 days. We filtered the 76 Sierra Leone G.1 sequences to APOBEC3 target sites and set a fixed evolution rate of 7e-7 APOBEC-like mutations per site per day, calculated as 6 APOBEC-like mutations per genome per year (O’Toole et al. 2023) / 23,580 APOBEC3 target sites (TC or GA dimer) per genome / 365 days per year. We chose not to estimate the mutation rate due to the relatively low variance in sample collection dates, leading to identifiability concerns. All other parameters were set to their default values. Within-host variants were not used in the analysis.
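The fixed rate above follows directly from the three quantities given in the text. The sketch below reproduces the unit conversion and, as an aside, shows one way TC and GA dimers could be counted on a sequence. The counting function is our assumption about what "target site" means operationally; the 23,580 figure itself is taken from the text, not recomputed.

```python
APOBEC_MUTATIONS_PER_GENOME_PER_YEAR = 6   # O'Toole et al. 2023, as cited in the text
APOBEC3_TARGET_SITES_PER_GENOME = 23_580   # TC or GA dimers, as stated in the text
DAYS_PER_YEAR = 365

apobec_rate_per_site_per_day = (
    APOBEC_MUTATIONS_PER_GENOME_PER_YEAR
    / APOBEC3_TARGET_SITES_PER_GENOME
    / DAYS_PER_YEAR
)
print(f"{apobec_rate_per_site_per_day:.2e} APOBEC-like mutations per site per day")


def count_apobec3_target_sites(seq: str) -> int:
    """Count C bases preceded by T plus G bases followed by A (illustrative only)."""
    s = seq.upper()
    tc = sum(1 for i in range(1, len(s)) if s[i - 1] == "T" and s[i] == "C")
    ga = sum(1 for i in range(len(s) - 1) if s[i] == "G" and s[i + 1] == "A")
    return tc + ga


# Toy sequence (hypothetical, not MPXV): one TC dimer and one GA dimer.
print(count_apobec3_target_sites("ATCGA"))
```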

Data Availability

A total of 77 assembled monkeypox virus (MPXV) genomes have been deposited in Pathoplexus. These include 33 genomes available under SeqSet PP_SS_170.1 and 44 genomes under SeqSet PP_SS_171.1. Both datasets will be mirrored to the International Nucleotide Sequence Database Collaboration (INSDC), and associated Illumina short-read sequencing data are being prepared for submission. Although these data are made publicly available, please see the Pathoplexus Open Data Terms of Use.

Citations

Ayres, Daniel L., Michael P. Cummings, Guy Baele, Aaron E. Darling, Paul O. Lewis, David L. Swofford, John P. Huelsenbeck, Philippe Lemey, Andrew Rambaut, and Marc A. Suchard. 2019. “BEAGLE 3: Improved Performance, Scaling, and Usability for a High-Performance Computing Library for Statistical Phylogenetics.” Systematic Biology 68 (6): 1052–61.

Dalla Vecchia, Elena. 2024. “Pathoplexus: Towards Fair and Transparent Sequence Sharing.” The Lancet. Microbe 5 (12): 100995.

Grubaugh, Nathan D., Karthik Gangavarapu, Joshua Quick, Nathaniel L. Matteson, Jaqueline Goes De Jesus, Bradley J. Main, Amanda L. Tan, et al. 2019. “An Amplicon-Based Sequencing Framework for Accurately Measuring Intrahost Virus Diversity Using PrimalSeq and iVar.” Genome Biology 20 (1): 8.

Happi, Christian, Ifedayo Adetifa, Placide Mbala, Richard Njouom, Emmanuel Nakoune, Anise Happi, Nnaemeka Ndodo, et al. 2022. “Urgent Need for a Non-Discriminatory and Non-Stigmatizing Nomenclature for Monkeypox Virus.” PLoS Biology 20 (8): e3001769.

Li, Heng. 2013. “Aligning Sequence Reads, Clone Sequences and Assembly Contigs with BWA-MEM.” arXiv [q-bio.GN]. http://arxiv.org/abs/1303.3997

Li, Heng, Bob Handsaker, Alec Wysoker, Tim Fennell, Jue Ruan, Nils Homer, Gabor Marth, Goncalo Abecasis, Richard Durbin, and 1000 Genome Project Data Processing Subgroup. 2009. “The Sequence Alignment/Map Format and SAMtools.” Bioinformatics 25 (16): 2078–79.

Marziano, Valentina, Giorgio Guzzetta, Ira Longini, and Stefano Merler. 2024. “Epidemiologic Quantities for Monkeypox Virus Clade I from Historical Data with Implications for Current Outbreaks, Democratic Republic of the Congo.” Emerging Infectious Diseases 30 (10): 2042–46. https://doi.org/10.3201/eid3010.240665

O’Toole, Áine, Richard A. Neher, Nnaemeka Ndodo, Vitor Borges, Ben Gannon, João Paulo Gomes, Natalie Groves, et al. 2023. “APOBEC3 Deaminase Editing in Mpox Virus as Evidence for Sustained Human Transmission since at Least 2016.” Science 382 (6670): 595–600.

Park, Daniel, and Inês Mendes. 2025. Broadinstitute/viral-References: 1.0.0. Zenodo. https://doi.org/10.5281/ZENODO.15496867

Park, Daniel, Chris Tomkins-Tinch, Simon Ye, Irwin Jungreis, Flavia, Ilya Shlyakhter, Hayden Metsky, et al. 2025. Broadinstitute/viral-Pipelines: v2.4.1.1. Zenodo. https://doi.org/10.5281/ZENODO.15428507

Parker, Edyth, Ifeanyi F. Omah, Delia Doreen Djuicy, Andrew Magee, Christopher H. Tomkins-Tinch, James Richard Otieno, Patrick Varilly, et al. 2025. “Genomics Reveals Zoonotic and Sustained Human Mpox Spread in West Africa.” Nature, May. https://doi.org/10.1038/s41586-025-09128-2

Rambaut, Andrew, Alexei J. Drummond, Dong Xie, Guy Baele, and Marc A. Suchard. 2018. “Posterior Summarization in Bayesian Phylogenetics Using Tracer 1.7.” Systematic Biology 67 (5): 901–4.

Specht, Ivan, Gage K. Moreno, Taylor Brock-Fisher, Lydia A. Krasilnikova, Brittany A. Petros, Jonathan E. Pekar, et al. 2025. “JUNIPER: Reconstructing Transmission Events from Next-Generation Sequencing Data at Scale.” medRxiv. https://doi.org/10.1101/2025.03.02.25323192

Suchard, Marc A., Philippe Lemey, Guy Baele, Daniel L. Ayres, Alexei J. Drummond, and Andrew Rambaut. 2018. “Bayesian Phylogenetic and Phylodynamic Data Integration Using BEAST 1.10.” Virus Evolution 4 (1): vey016.

Vakaniaki, Emmanuel Hasivirwe, Cris Kacita, Eddy Kinganda-Lusamaki, Áine O’Toole, Tony Wawina-Bokalanga, Daniel Mukadi-Bamuleka, Adrienne Amuri-Aziza, et al. 2024. “Sustained Human Outbreak of a New MPXV Clade I Lineage in Eastern Democratic Republic of the Congo.” Nature Medicine, June. https://doi.org/10.1038/s41591-024-03130-3

Varilly, Patrick, Mark Schifferli, Katherine Yang, Tim Burcham, Paul Cronan, Olivia Glennon, Olivia Jacks, et al. 2025. “Delphy: Scalable, near-Real-Time Bayesian Phylogenetics for Outbreaks.” bioRxiv. https://doi.org/10.1101/2025.03.25.645253

Partners and Collaborators

National Public Health Agency (NPHA), Sierra Leone

Sierra Leone Ministry of Health

Central Public Health Reference Laboratory, Sierra Leone

Kenema Government Hospital, Sierra Leone

Institute of Genomics and Global Health, Redeemer’s University, Ede, Osun State, Nigeria

The Broad Institute of MIT and Harvard, Cambridge, MA, USA

The Scripps Research Institute, La Jolla, CA, USA

Africa Centres for Disease Control and Prevention (Africa CDC) - Africa PGI

School of Community Health Sciences, Njala University, Sierra Leone

Institute for Ecology and Evolution, University of Edinburgh, Edinburgh, UK

Faculty of Medical Laboratory Sciences and Diagnostics, College of Medicine and Allied Health Sciences, Sierra Leone

Funding

Africa CDC through the Africa PGI supported the establishment of in-country sequencing capacity in Sierra Leone through the provision of sequencing equipment, training, testing and sequencing reagents that enabled in-country sequencing.

This work is made possible by support from Flu Lab and a cohort of generous donors through TED’s Audacious Project, including the ELMA Foundation, MacKenzie Scott, the Skoll Foundation, and Open Philanthropy; The Rockefeller Foundation [Grant Number #2021 HTH 017]; and The World Bank grants projects ACE-019, ACE-IMPACT and HEPR TF0B8412.