Phylogenetic analysis of SARS-CoV-2 in DRC

As of 2020-05-26, 127 SARS-CoV-2 genome sequences from the Democratic Republic of the Congo (DRC) had been made available on GISAID, providing insight into the diversity and dynamics of SARS-CoV-2 in the DRC. The sampled cases from the DRC are the result of repeated introductions of the virus from a range of locations, followed by local transmission.

Results & Discussion

Figure 1: Counts of lineages present in the DRC data

The lineage system that pangolin uses to assign global lineages is hierarchical; details can be found in Rambaut et al. 2020 [1]. There are seven distinct global lineages found in the DRC. Lineage A is the ancestral lineage, whose diversity originated in China. Lineage B.1 is the dominant circulating lineage and is associated with the large exponential increase of cases in Italy earlier this year; it has since been exported globally from Europe. Lineages B.1.1 and B.1.1.1 are also European lineages, nested within the genetic diversity of B.1. B.1.6 is a lineage represented by sequences that derive from Austria. Lineage B.2.1 is a large lineage with representation from the UK, Europe, Jordan, Australia, the USA, India and Ghana. Lineage B.4 has been associated with the large Iranian outbreak and exports from Iran. More detailed information about each lineage can be found at cov-lineages.org.
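Because unaliased lineage names encode their position in the hierarchy, checking whether one lineage is nested inside another reduces to comparing dot-separated name components. The Python sketch below is our own illustration, not part of pangolin: `is_within` is a hypothetical helper, it assumes names are unaliased (letter aliases for deeply nested lineages are not expanded), and the (lineage, taxa) pairs are copied from Table 1.

```python
# Unaliased pangolin names are dot-separated paths from a root lineage,
# so nesting can be tested by comparing whole name components.

def is_within(lineage: str, ancestor: str) -> bool:
    """True if `lineage` is `ancestor` itself or nested inside it.

    Compares whole components, so B.10 is NOT treated as inside B.1.
    Assumption: both names are unaliased.
    """
    parts = lineage.split(".")
    anc_parts = ancestor.split(".")
    return parts[: len(anc_parts)] == anc_parts


# (pangolin lineage, number of taxa) for DRC_1 ... DRC_27, copied from Table 1.
TABLE_1 = [
    ("B.1", 57), ("B.1", 14), ("B.1.1.1", 11), ("B.1", 8), ("B.1", 8),
    ("B.1", 4), ("B.1", 3), ("B.1", 2), ("B.1", 2), ("A", 1), ("A", 1),
    ("B.1", 1), ("B.1", 1), ("B.1", 1), ("B.1.1", 1), ("B.1.1.1", 1),
    ("B.1", 1), ("B.1.6", 1), ("B.1", 1), ("B.4", 1), ("B.2.1", 1),
    ("B.1", 1), ("B.1", 1), ("B.2", 1), ("B.1", 1), ("B.2.1", 1), ("B.1", 1),
]

total = sum(n for _, n in TABLE_1)
in_b1 = sum(n for lin, n in TABLE_1 if is_within(lin, "B.1"))
print(f"{in_b1} of {total} DRC sequences fall within B.1 or its sublineages")
```

Comparing components rather than using a plain string prefix test avoids treating a hypothetical lineage such as B.10 as a descendant of B.1.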

To further place the DRC sequences within the context of global SARS-CoV-2 diversity, Figure 2 shows the downsampled maximum-likelihood global phylogenetic tree with the 127 DRC sequences highlighted. Several groups of sequences are shown as insets to highlight specific cases that share genetic similarity.

Figure 2: Maximum likelihood phylogeny with the 127 DRC sequences highlighted.

Group 1 contains the sequences assigned to lineage B.1.1.1, a European lineage. At least two movements of people are required to explain this phylogenetic relationship, although more are possible (between two and five imports). The minimum of two follows from the unique SNP shared by DRC/KN-0038/2020, DRC/236/2020 and Switzerland/110507/2020. The direction of these movements is not known: they could be two imports into the DRC, or one import and one export.

Additionally, intermediate nodes are not known. Most importantly, bear in mind how sampling bias shapes the tree. SARS-CoV-2 has been sampled very heterogeneously across the globe, so the nearest neighbour in this phylogeny may be unrelated to the true transmission event. Given the virus's rate of mutation, sequence data alone cannot resolve individual transmission events: many transmission events may occur without the virus acquiring a new mutation. Group 3 could be a single introduction or many introductions; it is impossible to tell from the genome data alone.
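As a rough worked example of why closely related genomes cannot pin down individual transmission events, this Python sketch treats substitutions along a transmission chain as a Poisson process. The substitution rate (about 1e-3 substitutions per site per year), the 29,903 nt length of the Wuhan-Hu-1 reference and a 5-day serial interval are illustrative assumptions, not values estimated in this analysis.

```python
import math

# Illustrative assumptions, not estimates from this analysis.
GENOME_LENGTH = 29_903         # nt, Wuhan-Hu-1 reference length
SUBS_PER_SITE_PER_YEAR = 1e-3  # assumed order-of-magnitude substitution rate
SERIAL_INTERVAL_DAYS = 5.0     # assumed mean days between successive infections


def expected_substitutions(days: float) -> float:
    """Expected number of genome-wide substitutions accumulated over `days`."""
    return SUBS_PER_SITE_PER_YEAR * GENOME_LENGTH * days / 365.0


def p_no_new_mutation(generations: int) -> float:
    """Poisson probability that a chain of `generations` transmissions
    accumulates no substitution at all."""
    return math.exp(-expected_substitutions(generations * SERIAL_INTERVAL_DAYS))


for g in (1, 2, 5):
    print(f"{g} generation(s): P(no new mutation) = {p_no_new_mutation(g):.2f}")
```

Under these assumptions about 66% of single transmission steps, and about 13% of five-step chains, leave the genome unchanged, so identical or near-identical sequences are compatible with many different transmission histories.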

A detailed figure of Group 2 containing all global sequences available is shown in Supplementary Figure 1. Supplementary Figure 2 shows the entire downsampled ML phylogeny containing all 127 DRC sequences with labels on all tips.

Based on the downsampled phylogeny, we estimate between 44 and 58 introductions into the DRC (the lower bound from DELTRAN and the upper bound from ACCTRAN parsimony reconstruction of location across the downsampled tree).

Table 1: Manually curated DRC lineages based on minimum number of introductions required to explain diversity observed within the DRC sequence data.

DRC lineage  No. of taxa  Pangolin lineage  Date range
DRC_1        57           B.1               2020-03-17 to 2020-04-24
DRC_2        14           B.1               2020-03-19 to 2020-04-25
DRC_3        11           B.1.1.1           2020-03-21 to 2020-04-16
DRC_4        8            B.1               2020-03-15 to 2020-04-22
DRC_5        8            B.1               2020-03-19 to 2020-04-24
DRC_6        4            B.1               2020-03-17 to 2020-04-17
DRC_7        3            B.1               2020-03-25 to 2020-03-26
DRC_8        2            B.1               2020-04-15 to 2020-04-15
DRC_9        2            B.1               2020-03-14 to 2020-03-22
DRC_10       1            A                 2020-03-22
DRC_11       1            A                 2020-04-25
DRC_12       1            B.1               2020-03-18
DRC_13       1            B.1               2020-04-17
DRC_14       1            B.1               2020-03-09
DRC_15       1            B.1.1             2020-03-22
DRC_16       1            B.1.1.1           2020-03-28
DRC_17       1            B.1               2020-03-26
DRC_18       1            B.1.6             2020-03-21
DRC_19       1            B.1               2020-04-03
DRC_20       1            B.4               2020-03-20
DRC_21       1            B.2.1             2020-03-20
DRC_22       1            B.1               2020-04-18
DRC_23       1            B.1               2020-04-24
DRC_24       1            B.2               2020-04-05
DRC_25       1            B.1               2020-03-11
DRC_26       1            B.2.1             2020-03-17
DRC_27       1            B.1               2020-04-14

Methods

We downloaded all available GISAID data on 2020-05-26 and filtered out any sequences that were not full genomes or that had >5% N content. At the time of analysis, 127 sequences from the DRC were in the dataset. We masked the untranslated regions at either end of the genome, leaving only the coding region. We ran pangolin on the DRC sequence data to estimate the global lineages present in the DRC. The global dataset was subsampled to contain only representative sequences from each country over time, with emphasis on the lineages present in the DRC; all sequences from the DRC were retained. After subsampling, a total of 1,312 sequences were included in this analysis. These sequences were aligned using MAFFT [2], and IQ-TREE [3] was used to estimate a maximum-likelihood phylogeny with 10,000 ultrafast bootstraps. We used a custom script, available on request, to collapse polytomies.
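A minimal Python sketch of the sequence-level steps described above: the N-content and completeness filter, UTR masking, and per-country subsampling that keeps every DRC sequence. Several details are assumptions rather than taken from the analysis: the 29,000 nt "full genome" cut-off, the UTR bounds (positions 1 to 265 and 29,675 to 29,903 of the Wuhan-Hu-1 reference, MN908947.3), the record layout, the `per_group` quota and the country label "DRC". The emphasis on DRC lineages during subsampling is not modelled.

```python
import random
from collections import defaultdict

# QC thresholds: the >5% N limit is from the text; the 29,000 nt
# length cut-off for a "full genome" is a hypothetical stand-in.
MIN_LENGTH = 29_000
MAX_N_FRACTION = 0.05

# Assumed UTR bounds (1-based, inclusive) in Wuhan-Hu-1 (MN908947.3) coordinates.
REF_LENGTH = 29_903
UTR_5_END = 265
UTR_3_START = 29_675


def passes_qc(seq: str) -> bool:
    """Keep near-complete genomes with at most 5% N bases."""
    seq = seq.upper()
    return len(seq) >= MIN_LENGTH and seq.count("N") / len(seq) <= MAX_N_FRACTION


def mask_utrs(aligned: str) -> str:
    """Replace both UTRs of a reference-length alignment row with N."""
    if len(aligned) != REF_LENGTH:
        raise ValueError("expected a row aligned to the reference length")
    return (
        "N" * UTR_5_END
        + aligned[UTR_5_END : UTR_3_START - 1]
        + "N" * (REF_LENGTH - UTR_3_START + 1)
    )


def subsample(records, per_group=5, focal="DRC", seed=1):
    """Keep all focal-country records plus up to `per_group` random others
    per (country, month). Records are dicts with hypothetical keys
    'country' and 'date' (YYYY-MM-DD)."""
    rng = random.Random(seed)
    kept = [r for r in records if r["country"] == focal]
    groups = defaultdict(list)
    for r in records:
        if r["country"] != focal:
            groups[(r["country"], r["date"][:7])].append(r)
    for members in groups.values():
        kept.extend(rng.sample(members, min(per_group, len(members))))
    return kept
```

Grouping by country and calendar month is one simple way to keep "representative sequences from each country over time"; a fixed seed makes the subsample reproducible.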

We labelled the tips of the tree True/False for DRC versus the rest of the world and reconstructed the ancestral states using the ACCTRAN and DELTRAN parsimony algorithms to obtain upper-bound estimates of the number of introduction events into the DRC. To get the minimum number of introductions that would explain the diversity seen within the DRC sequence data, relative to the context of global sequences, we manually inspected the global tree of all GISAID sequences, including the DRC sequences. Each DRC sequence was then assigned to an introduction based on this conservative minimum estimate.
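The location reconstruction can be illustrated with a minimal Fitch small-parsimony sketch in Python on a binary tree whose tips are labelled True (DRC) or False (rest of world), counting introductions as branches where the reconstructed state changes from False to True. This is plain Fitch with a deterministic tie-break (ambiguous nodes resolve to False), not the ACCTRAN or DELTRAN resolution used in the analysis; those can place the same number of changes on different branches, which is why they bracket the introduction count. The nested-tuple tree format and all function names are hypothetical.

```python
from typing import Union

# A tree is a tip label (True = DRC, False = rest of world) or a
# 2-tuple of subtrees; this nested-tuple layout is a hypothetical format.
Tree = Union[bool, tuple]


def downpass(tree: Tree):
    """Fitch bottom-up pass. Returns (annotated node, number of changes)."""
    if isinstance(tree, bool):
        return {"states": frozenset([tree]), "children": []}, 0
    if len(tree) != 2:
        raise ValueError("this sketch handles binary trees only")
    (left, c_left), (right, c_right) = downpass(tree[0]), downpass(tree[1])
    common = left["states"] & right["states"]
    changes = c_left + c_right
    if common:
        states = common
    else:
        states = left["states"] | right["states"]
        changes += 1
    return {"states": states, "children": [left, right]}, changes


def uppass(node, parent_state=None):
    """Top-down pass: keep the parent's state when it is in the node's
    Fitch set, otherwise pick one; min() prefers False when ambiguous."""
    if parent_state in node["states"]:
        node["state"] = parent_state
    else:
        node["state"] = min(node["states"])
    for child in node["children"]:
        uppass(child, node["state"])


def count_introductions(node) -> int:
    """Count branches where the state flips from non-DRC to DRC."""
    total = 0
    for child in node["children"]:
        if not node["state"] and child["state"]:
            total += 1
        total += count_introductions(child)
    return total


def parsimony_introductions(tree: Tree):
    """Return (minimum number of state changes, introductions into the DRC)."""
    root, changes = downpass(tree)
    uppass(root)
    return changes, count_introductions(root)
```

For example, `parsimony_introductions((((True, True), False), (False, (True, False))))` returns `(2, 2)`: two state changes, both of them introductions. A root reconstructed as DRC is not itself counted as an introduction.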

Contributors

Prof. Jean-Jacques Muyembe-Tamfum (1, 2)
Prof. Steve Ahuka-Mundeke (1, 2)
Prof. Placide Mbala-Kingebeni (1, 2, 3)
Edith Nkwembe-Mgabana (1, 2)
Eddy Kinganda-Lusamaki (1, 2)
Adrienne Amuri-Aziza (1)
Francisca Muyembe-Mawete (1, 2)
Emmanuel Lokilo-Lofiko (1)
Changa-Changa Jean Claude (1)
Akonga Okito Marceline (1)
Nsunda Makanzu Bibiche (1)
Dr. Michael Wiley (4)
Catherine Pratt (4)
Matthias Pauthner (5)
Kristian Andersen (5)
Josh Quick (6)
Nick Loman (6)
Áine O’Toole (7)
Andrew Rambaut (7)
Ian Goodfellow (8)

(1) Institut National de Recherche Biomédicale (INRB)
(2) School of Medicine Kinshasa University
(3) TransVIHMI, IRD, INSERM, University of Montpellier
(4) University of Nebraska Medical Center
(5) Scripps Research Center, La Jolla, CA, USA
(6) Institute of Microbiology and Infection, School of Biosciences, University of Birmingham, UK
(7) Institute of Evolutionary Biology, University of Edinburgh, Ashworth Laboratories, Edinburgh, UK
(8) Division of Virology, Department of Pathology, University of Cambridge, Cambridge, UK

Acknowledgements
We acknowledge all those who have shared data on GISAID; a full table of acknowledgements can be found here.

References

  1. Rambaut A, Holmes EC, O’Toole Á, et al. A dynamic nomenclature proposal for SARS-CoV-2 lineages to assist genomic epidemiology. Nat Microbiol. 2020. https://doi.org/10.1038/s41564-020-0770-5

  2. Katoh K, Misawa K, Kuma K-I, Miyata T. MAFFT: a novel method for rapid multiple sequence alignment based on fast Fourier transform. Nucleic Acids Res. 2002;30: 3059–3066.

  3. Minh BQ, Schmidt HA, Chernomor O, Schrempf D, Woodhams MD, von Haeseler A, et al. IQ-TREE 2: New Models and Efficient Methods for Phylogenetic Inference in the Genomic Era. Mol Biol Evol. 2020;37: 1530–1534.

Supplementary Materials

Full_resolution_Figure_2.pdf
Supplementary_Figure_1.pdf
Supplementary_Figure_2.pdf

Supplementary Table 1: Taxa included in each DRC lineage. This information, along with other metadata for the sequences, is provided in DRC_metadata.csv.

DRC lineage  No. of taxa  Taxa
DRC_1 57 DRC/248/2020 DRC/82/2020 DRC/253/2020 DRC/521/2020 DRC/1326/2020 DRC/KN-0058/2020 DRC/1565/2020 DRC/1486/2020 DRC/2864/2020 DRC/3504/2020 DRC/1422/2020 DRC/2120/2020 DRC/2121/2020 DRC/2122/2020 DRC/2125/2020 DRC/2384/2020 DRC/2496/2020 DRC/1423/2020 DRC/3683/2020 DRC/1952/2020 DRC/1516/2020 DRC/1131/2020 DRC/1234/2020 DRC/3089/2020 DRC/3552/2020 DRC/3440/2020 DRC/3453/2020 DRC/2644/2020 DRC/1715/2020 DRC/3764/2020 DRC/2728/2020 DRC/2363/2020 DRC/3768/2020 DRC/1767/2020 DRC/2580/2020 DRC/1982/2020 DRC/2364/2020 DRC/2819/2020 DRC/2063/2020 DRC/2827/2020 DRC/2727/2020 DRC/2128/2020 DRC/2299/2020 DRC/2855/2020 DRC/2939/2020 DRC/3791/2020 DRC/3041/2020 DRC/2536/2020 DRC/3633/2020 DRC/3632/2020 DRC/3803/2020 DRC/3806/2020 DRC/3490/2020 DRC/3787/2020 DRC/3482/2020 DRC/3481/2020 DRC/3483/2020
DRC_2 14 DRC/191/2020 DRC/108/2020 DRC/3778/2020 DRC/2133/2020 DRC/1324/2020 DRC/1319/2020 DRC/3688/2020 DRC/3664/2020 DRC/3662/2020 DRC/3653/2020 DRC/3659/2020 DRC/3661/2020 DRC/3837/2020 DRC/3841/2020
DRC_3 11 DRC/2047/2020 DRC/1378/2020 DRC/1398/2020 DRC/1377/2020 DRC/1375/2020 DRC/1382/2020 DRC/1397/2020 DRC/1376/2020 DRC/215/2020 DRC/1779/2020 DRC/2813/2020
DRC_4 8 DRC/73/2020 DRC/3451/2020 DRC/KN-0070/2020 DRC/80/2020 DRC/KN-0072/2020 DRC/KN-0043/2020 DRC/KN-0054/2020 DRC/KN-0051/2020
DRC_5 8 DRC/402/2020 DRC/1249/2020 DRC/2169/2020 DRC/2529/2020 DRC/3827/2020 DRC/94/2020 DRC/2904/2020 DRC/3595/2020
DRC_6 4 DRC/KN-0060/2020 DRC/254/2020 DRC/243/2020 DRC/2942/2020
DRC_7 3 DRC/397/2020 DRC/396/2020 DRC/376/2020
DRC_8 2 DRC/2563/2020 DRC/2670/2020
DRC_9 2 DRC/KN-0038/2020 DRC/236/2020
DRC_10 1 DRC/300/2020
DRC_11 1 DRC/3834/2020
DRC_12 1 DRC/81/2020
DRC_13 1 DRC/2938/2020
DRC_14 1 DRC/KN-13/2020
DRC_15 1 DRC/241/2020
DRC_16 1 DRC/523/2020
DRC_17 1 DRC/431/2020
DRC_18 1 DRC/214/2020
DRC_19 1 DRC/998/2020
DRC_20 1 DRC/299/2020
DRC_21 1 DRC/158/2020
DRC_22 1 DRC/3070/2020
DRC_23 1 DRC/3829/2020
DRC_24 1 DRC/1151/2020
DRC_25 1 DRC/KN-0017/2020
DRC_26 1 DRC/KN-0059/2020
DRC_27 1 DRC/2369/2020