As of 2020-05-26, 127 SARS-CoV-2 genome sequences from the Democratic Republic of the Congo (DRC) had been made globally available on GISAID, providing insight into the diversity and dynamics of SARS-CoV-2 in the country. The sampled cases from the DRC are the result of repeated introductions of the virus from a range of locations, followed by local transmission.
Results & Discussion
Figure 1: Counts of Lineages present in DRC data
The lineage system that pangolin uses to assign global lineages is hierarchical; details can be found in Rambaut et al. 2020. Seven distinct global lineages are found in the DRC. Lineage A is the ancestral lineage, the diversity of which originated in China. Lineage B.1 is the dominant circulating lineage and corresponds approximately to the large exponential increase of cases in Italy earlier this year. The B.1 lineage has been exported globally from Europe. Lineages B.1.1.1 and B.1.1 are also European lineages, nested within the genetic diversity of B.1. B.1.6 is a lineage represented by sequences that derive from Austria. Lineage B.2.1 is a large lineage with representation from the UK, Europe, Jordan, Australia, the USA, India and Ghana. Lineage B.4 has been associated with the large Iranian outbreak and exports from Iran. More detailed information about each lineage can be found at cov-lineages.org.
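The hierarchical naming can be read directly from the lineage labels. The sketch below is a minimal illustration, not part of pangolin: the helper name `is_nested_within` is hypothetical, and it assumes that nesting is fully encoded in the dot-separated name, which holds for the seven lineages listed here.

```python
def is_nested_within(lineage: str, ancestor: str) -> bool:
    """Return True if `lineage` is `ancestor` itself or sits below it in the name hierarchy."""
    child_parts = lineage.split(".")
    parent_parts = ancestor.split(".")
    # A descendant's name starts with every component of its ancestor's name.
    return child_parts[:len(parent_parts)] == parent_parts


# The seven global lineages reported for the DRC in the text above.
drc_lineages = ["A", "B.1", "B.1.1", "B.1.1.1", "B.1.6", "B.2.1", "B.4"]

nested_in_b1 = [name for name in drc_lineages if is_nested_within(name, "B.1")]
print(nested_in_b1)
```

Comparing name components rather than raw string prefixes avoids treating a label such as B.10 as a descendant of B.1.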
To further place the DRC sequences within the context of global SARS-CoV-2 diversity, Figure 2 shows the downsampled maximum likelihood global phylogenetic tree with the 127 DRC sequences highlighted. A number of groups of sequences are inset to highlight specific cases that share genetic similarity.
Figure 2: Maximum likelihood phylogeny with the 127 DRC sequences highlighted.
Group 1 sequences correspond to the sequences that have been assigned Lineage B.1.1.1, a European lineage. A minimum of two movements of people would be required to explain this phylogenetic relationship, but more imports are possible (between 2 and 5 imports). The minimum of two movements follows from the unique SNP shared by DRC/KN-0038/2020, DRC/236/2020 and Switzerland/110507/2020. The directionality of these movements is not known: it could be two imports into the DRC, or one import and one export.
Additionally, intermediate nodes are not known. It is most important to bear in mind how sampling bias shapes the tree. Sampling of SARS-CoV-2 across the globe is very heterogeneous, and the nearest neighbour in this phylogeny may not be related in any way to the true transmission event. Given the mutation rate of the virus, transmission cannot be inferred from sequence data alone: many transmission events may occur without the virus acquiring a new mutation. Group 3 could be a single introduction or many introductions; it is impossible to tell from the genome data alone.
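The point about transmission events occurring without new mutations can be illustrated with a rough calculation. The sketch below assumes mutations accumulate as a Poisson process; the evolutionary rate, genome length and interval between linked cases are illustrative assumptions and are not values estimated in this report.

```python
import math

# Illustrative assumptions only; none of these values come from this report.
ASSUMED_SUBS_PER_SITE_PER_YEAR = 8e-4   # assumed evolutionary rate
ASSUMED_GENOME_LENGTH = 29_903          # assumed genome length in nucleotides
ASSUMED_DAYS_PER_TRANSMISSION = 5.0     # assumed time between linked cases


def expected_mutations(days: float) -> float:
    """Expected number of new substitutions across the genome after `days`."""
    return ASSUMED_SUBS_PER_SITE_PER_YEAR * ASSUMED_GENOME_LENGTH * days / 365.0


def prob_identical_genomes(n_transmissions: int,
                           days_per_transmission: float = ASSUMED_DAYS_PER_TRANSMISSION) -> float:
    """Poisson probability that no substitution arises along a chain of transmissions."""
    total_days = n_transmissions * days_per_transmission
    return math.exp(-expected_mutations(total_days))


for n in (1, 2, 5):
    print(n, round(prob_identical_genomes(n), 2))
```

Under these assumed values a single transmission leaves the genome unchanged roughly 70% of the time, so identical or near-identical sequences say little about who infected whom.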
A detailed figure of Group 2 containing all global sequences available is shown in Supplementary Figure 1. Supplementary Figure 2 shows the entire downsampled ML phylogeny containing all 127 DRC sequences with labels on all tips.
The analysis estimating the number of introductions into the DRC from the downsampled phylogeny gives a range of 44 to 58 introductions (lower bound from the DELTRAN and upper bound from the ACCTRAN parsimony reconstruction of location across the downsampled tree).
Table 1: Manually curated DRC lineages based on minimum number of introductions required to explain diversity observed within the DRC sequence data.
|DRC Lineage||Num Taxa||Lineage||Date range|
Methods
We downloaded all available GISAID data on 2020-05-26 and filtered out any sequences that were not full genomes and those with >5% N content. 127 sequences from the DRC were in the dataset at the time of analysis. We masked out the untranslated regions at either end of the genome, leaving only coding sequences. We ran pangolin on the DRC sequence data to estimate the global lineages present in the DRC. The global dataset was then subsampled to contain only representative sequences from each country over time, with emphasis on the lineages present in the DRC. All sequences from the DRC were included. After subsampling, a total of 1,312 sequences were included in this analysis. These sequences were aligned using MAFFT, and IQ-TREE [2,3] was used to estimate a maximum-likelihood phylogeny with 10,000 ultrafast bootstraps. We used a custom script to collapse polytomies, available on request.
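The filtering and masking steps can be sketched as follows. This is a minimal illustration under stated assumptions, not the pipeline used for this report: the minimum length standing in for "full genome" and the coding-region coordinates (taken to be those of the Wuhan-Hu-1 reference) are assumptions, and the function names are hypothetical. Masking assumes the sequence is already in reference coordinates.

```python
# Assumptions for illustration; not taken from this report.
ASSUMED_MIN_GENOME_LENGTH = 29_000   # stand-in threshold for a "full genome"
ASSUMED_CDS_START = 265              # 0-based index of the first coding base
ASSUMED_CDS_END = 29_674             # 0-based index one past the last coding base


def n_fraction(seq: str) -> float:
    """Fraction of the sequence made up of ambiguous N bases."""
    return seq.upper().count("N") / len(seq) if seq else 1.0


def passes_filter(seq: str, max_n_fraction: float = 0.05,
                  min_length: int = ASSUMED_MIN_GENOME_LENGTH) -> bool:
    """Keep sequences that are long enough and have at most 5% N content."""
    return len(seq) >= min_length and n_fraction(seq) <= max_n_fraction


def mask_ends(aligned_seq: str, keep_start: int = ASSUMED_CDS_START,
              keep_end: int = ASSUMED_CDS_END) -> str:
    """Replace everything outside [keep_start, keep_end) with N, preserving length."""
    head = aligned_seq[:keep_start]
    core = aligned_seq[keep_start:keep_end]
    tail = aligned_seq[keep_end:]
    return "N" * len(head) + core + "N" * len(tail)


# Toy records with a small length threshold so the example stays readable.
records = {
    "complete": "ACGT" * 5,
    "too_many_n": "ACGT" * 4 + "NNNN",
    "too_short": "ACGT",
}
kept = [name for name, seq in records.items() if passes_filter(seq, min_length=10)]
print(kept)
print(mask_ends("AACCGGTT", keep_start=2, keep_end=6))
```

Filtering before masking matters in this sketch: the N bases introduced by masking the ends would otherwise count towards the 5% threshold.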
We labelled the tips of the tree True/False for DRC versus the rest of the world and reconstructed the ancestral states using the ACCTRAN and DELTRAN parsimony algorithms to get upper-bound estimates for the number of introduction events into the DRC. To get the minimum number of introductions that would explain the diversity seen within the DRC sequence data in the context of global sequences, we manually inspected the global tree of all GISAID sequences, including the DRC sequences. DRC sequences were assigned to an introduction based on this conservative minimum estimate.
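To make the difference between the two reconstructions concrete, the sketch below applies Fitch parsimony to a True/False tip label on a small, strictly bifurcating toy tree and counts non-DRC to DRC changes under each resolution. It is not the script used for this report: the tree, tip names and function names are hypothetical, polytomies are not handled, and an ambiguous root is assumed to lie outside the DRC.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str = ""
    children: list = field(default_factory=list)
    prelim: frozenset = frozenset()   # Fitch downpass set
    final: frozenset = frozenset()    # Fitch uppass (most parsimonious) set


def build(tree, is_drc):
    """Build nodes from nested 2-tuples and fill in the downpass sets."""
    if isinstance(tree, str):
        state = frozenset([is_drc[tree]])
        return Node(name=tree, prelim=state, final=state)
    node = Node(children=[build(child, is_drc) for child in tree])
    left, right = (child.prelim for child in node.children)
    node.prelim = (left & right) or (left | right)
    return node


def uppass(node, parent_final=None):
    """Fill in the final sets for internal nodes using Fitch's second pass."""
    if not node.children:
        return
    left, right = (child.prelim for child in node.children)
    if parent_final is None:                  # root
        node.final = node.prelim
    elif parent_final <= node.prelim:
        node.final = parent_final
    elif not (left & right):                  # downpass set was a union
        node.final = node.prelim | parent_final
    else:                                     # downpass set was an intersection
        node.final = node.prelim | (parent_final & (left | right))
    for child in node.children:
        uppass(child, node.final)


def count_introductions(node, parent_state=False, delayed=False):
    """Count False -> True changes. delayed=False is ACCTRAN, delayed=True is DELTRAN."""
    options = node.final if delayed else node.prelim
    state = parent_state if parent_state in options else next(iter(options))
    here = 1 if (state and not parent_state) else 0
    return here + sum(count_introductions(c, state, delayed) for c in node.children)


# Hypothetical toy tree: three DRC tips, a nested non-DRC pair, then one more DRC tip.
toy_tree = ("other_1", ("other_2", ("drc_1", ("drc_2", ("drc_3",
            ("other_3", ("other_4", "drc_4")))))))
is_drc = {tip: tip.startswith("drc") for tip in
          ["other_1", "other_2", "other_3", "other_4", "drc_1", "drc_2", "drc_3", "drc_4"]}

root = build(toy_tree, is_drc)
uppass(root)
acctran_count = count_introductions(root, delayed=False)
deltran_count = count_introductions(root, delayed=True)
print(acctran_count, deltran_count)
```

On this toy tree ACCTRAN places an export followed by a re-introduction and so counts more introductions than DELTRAN, which instead places two separate exports. Which algorithm gives the larger count depends on the tree, so the two results are best read together as a range.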
Prof. Jean-Jacques Muyembe-Tamfum (1,2)
Prof. Steve Ahuka-Mundeke (1,2)
Prof. Placide Mbala-Kingebeni (1,2,3)
Changa-Changa Jean Claude (1)
Akonga Okito Marceline (1)
Nsunda Makanzu Bibiche (1)
Dr. Michael Wiley (4)
(1) Institut National de Recherche Biomédicale (INRB)
(2) School of Medicine Kinshasa University
(3) TransVIHMI, IRD, INSERM, University of Montpellier
(4) University of Nebraska Medical Center
(5) Scripps Research Center, La Jolla, CA, USA
(6) Institute of Microbiology and Infection, School of Biosciences, University of Birmingham, UK
(7) Institute of Evolutionary Biology, University of Edinburgh, Ashworth Laboratories, Edinburgh, UK
(8) Division of Virology, Department of Pathology, University of Cambridge, Cambridge, UK
Rambaut, A., Holmes, E.C., O’Toole, Á. et al. A dynamic nomenclature proposal for SARS-CoV-2 lineages to assist genomic epidemiology. Nat Microbiol (2020). https://doi.org/10.1038/s41564-020-0770-5
Minh BQ, Schmidt HA, Chernomor O, Schrempf D, Woodhams MD, von Haeseler A, et al. IQ-TREE 2: New Models and Efficient Methods for Phylogenetic Inference in the Genomic Era. Mol Biol Evol. 2020;37: 1530–1534.
Supplementary Table 1: Taxa included in each DRC lineage. This information, along with other metadata for the sequences, is in DRC_metadata.csv.
|DRC Lineage||Num Taxa||Taxa|
|DRC_1||57||DRC/248/2020 DRC/82/2020 DRC/253/2020 DRC/521/2020 DRC/1326/2020 DRC/KN-0058/2020 DRC/1565/2020 DRC/1486/2020 DRC/2864/2020 DRC/3504/2020 DRC/1422/2020 DRC/2120/2020 DRC/2121/2020 DRC/2122/2020 DRC/2125/2020 DRC/2384/2020 DRC/2496/2020 DRC/1423/2020 DRC/3683/2020 DRC/1952/2020 DRC/1516/2020 DRC/1131/2020 DRC/1234/2020 DRC/3089/2020 DRC/3552/2020 DRC/3440/2020 DRC/3453/2020 DRC/2644/2020 DRC/1715/2020 DRC/3764/2020 DRC/2728/2020 DRC/2363/2020 DRC/3768/2020 DRC/1767/2020 DRC/2580/2020 DRC/1982/2020 DRC/2364/2020 DRC/2819/2020 DRC/2063/2020 DRC/2827/2020 DRC/2727/2020 DRC/2128/2020 DRC/2299/2020 DRC/2855/2020 DRC/2939/2020 DRC/3791/2020 DRC/3041/2020 DRC/2536/2020 DRC/3633/2020 DRC/3632/2020 DRC/3803/2020 DRC/3806/2020 DRC/3490/2020 DRC/3787/2020 DRC/3482/2020 DRC/3481/2020 DRC/3483/2020|
|DRC_2||14||DRC/191/2020 DRC/108/2020 DRC/3778/2020 DRC/2133/2020 DRC/1324/2020 DRC/1319/2020 DRC/3688/2020 DRC/3664/2020 DRC/3662/2020 DRC/3653/2020 DRC/3659/2020 DRC/3661/2020 DRC/3837/2020 DRC/3841/2020|
|DRC_3||11||DRC/2047/2020 DRC/1378/2020 DRC/1398/2020 DRC/1377/2020 DRC/1375/2020 DRC/1382/2020 DRC/1397/2020 DRC/1376/2020 DRC/215/2020 DRC/1779/2020 DRC/2813/2020|
|DRC_4||8||DRC/73/2020 DRC/3451/2020 DRC/KN-0070/2020 DRC/80/2020 DRC/KN-0072/2020 DRC/KN-0043/2020 DRC/KN-0054/2020 DRC/KN-0051/2020|
|DRC_5||8||DRC/402/2020 DRC/1249/2020 DRC/2169/2020 DRC/2529/2020 DRC/3827/2020 DRC/94/2020 DRC/2904/2020 DRC/3595/2020|
|DRC_6||4||DRC/KN-0060/2020 DRC/254/2020 DRC/243/2020 DRC/2942/2020|
|DRC_7||3||DRC/397/2020 DRC/396/2020 DRC/376/2020|