Introduction and local transmission of SARS-CoV-2 cases in Kenya

KEMRI-CGMRC Kilifi, KEMRI-CVR Nairobi, The National Public Health Laboratory-National Influenza Centre (NPHL-NIC)


Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2) is a single-stranded positive-sense RNA virus and the causative agent of COVID-19. From its origin in Wuhan, China, in 2019, SARS-CoV-2 spread rapidly to nearly all countries in the world, resulting in over 5 million cases globally as of May 2020. In this report we present 122 SARS-CoV-2 genomes from COVID-19-positive samples collected from individuals residing in Nairobi (n=20) and Coastal Kenya (n=102). The genomes are from symptomatic and asymptomatic cases whose samples were collected by the respective county rapid response teams. Nucleic acid was extracted from nasopharyngeal swabs and PCR-amplified using a tiled amplicon strategy. A sequencing library was then prepared and sequenced on a portable MinION device. Consensus genomes were constructed by aligning the approximately 400-700 base-pair reads to the reference genome (GenBank Accession: MN908947.3). A phylogenetic tree was constructed that included random subsets of sequences collected from countries across the world (n=78) and the reference genome sequence. Our short report provides evidence for multiple introductions (n>10) of SARS-CoV-2 into Kenya and subsequent local transmission. There was evidence of transmission between Nairobi and the Coast before restrictions on movement into and out of the Nairobi Metropolis and the Coastal counties were introduced.


Near real-time whole genome sequencing can provide valuable information on the status of the epidemic, for example, whether transmission results from introductions (e.g. at border crossings or ports of entry) or from local onward spread (clusters within or between counties), offering insights into the effectiveness of contact tracing. Sequence data can also provide estimates of the rate of spread of the virus (phylodynamics), which is useful where case surveillance and contact tracing are sparse. Furthermore, whole genome sequence data allow testing reagents to be adapted to new mutations in the virus, reducing false-negative rates.

The first SARS-CoV-2 case in Kenya was reported on 13th March 2020. As of 25th May 2020, more than 1000 SARS-CoV-2 cases have been detected and confirmed, with Mombasa and Nairobi counties recording the largest numbers of confirmed infections. Here we report genome sequences from cases that were detected and confirmed between 13th March and 30th April 2020. Most of the sequences we report are from individuals residing in the coastal city of Mombasa and include some of the earliest infections detected nationally. We show that SARS-CoV-2 viruses circulating in Kenya do not differ from viruses circulating elsewhere in the world, and we provide evidence for ongoing local transmission in Mombasa county. This is the first report of SARS-CoV-2 sequencing in Kenya.



In brief, RNA was extracted from confirmed SARS-CoV-2 positive samples using standard molecular methods followed by a tiled Polymerase Chain Reaction (PCR) amplification procedure to enrich for SARS-CoV-2 specific viral material using V1 or V3 ARTIC primers (

Genome assembly

Whole genome sequencing was conducted on the Oxford Nanopore Technologies (ONT) MinION platform as described in the ARTIC protocol. The resulting FAST5 files were base called and demultiplexed using Guppy, and the FASTQ reads associated with each sample were concatenated. A consensus SARS-CoV-2 sequence was assembled for each sample by aligning its reads to the reference genome (GenBank Accession: MN908947.3) and trimming primer sequences, followed by polishing using the raw FAST5 signal files. Positions with insufficient genome coverage were masked with N.
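The masking step can be illustrated with a minimal Python sketch. It assumes that per-position read depths are already available for the consensus and uses a hypothetical minimum depth of 20 reads; neither that threshold nor the function name mask_low_coverage comes from the ARTIC protocol used in this report.

```python
from typing import Sequence

# HYPOTHETICAL minimum read depth, chosen for illustration only.
MIN_DEPTH = 20


def mask_low_coverage(consensus: str, depths: Sequence[int], min_depth: int = MIN_DEPTH) -> str:
    """Replace every base whose read depth is below min_depth with 'N'."""
    if len(consensus) != len(depths):
        raise ValueError("consensus and depths must have the same length")
    # Keep the called base where coverage is sufficient, otherwise mask it.
    return "".join(
        base if depth >= min_depth else "N"
        for base, depth in zip(consensus, depths)
    )


if __name__ == "__main__":
    seq = "ATTAAAGGTT"
    depths = [0, 5, 25, 40, 40, 40, 19, 20, 100, 3]
    print(mask_low_coverage(seq, depths))  # NNTAAANGTN
```

Masking rather than deleting low-coverage positions keeps every consensus the same length as the reference, which simplifies the downstream alignment.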

Phylogenetics and lineage assignment

The global collection of sequences was obtained from GISAID. Five sequences per country, from countries representing Europe, Asia, America and Africa, were randomly sampled without replacement using the sample function in the R statistical environment. Sequences that were incomplete or had obvious annotation errors were removed from the analysis. The final selection of 78 global sequences was combined with the 122 sequences from Kenya and aligned using MAFFT v7.310. A maximum likelihood phylogenetic tree was inferred using RAxML-NGS v0.9.0 under the GTR+FO+G4m model with 1000 bootstrap replicates. Pangolin v1.1.14 was used to assign putative dynamic lineages as defined by Rambaut et al. 2020 (
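The sampling step was performed with R's sample function. The Python sketch below shows an equivalent stratified draw of up to five sequences per country without replacement. The input format (pairs of sequence ID and country), the dummy IDs and the function name sample_per_country are hypothetical, and the subsequent removal of incomplete or mis-annotated sequences is not shown.

```python
import random
from collections import defaultdict
from typing import Iterable, List, Optional, Tuple


def sample_per_country(
    records: Iterable[Tuple[str, str]], k: int = 5, seed: Optional[int] = None
) -> List[str]:
    """Draw up to k sequence IDs per country, without replacement.

    records: (sequence_id, country) pairs (HYPOTHETICAL input format).
    """
    rng = random.Random(seed)  # seeded generator so the draw is reproducible
    by_country = defaultdict(list)
    for seq_id, country in records:
        by_country[country].append(seq_id)
    chosen: List[str] = []
    for country in sorted(by_country):  # fixed country order for reproducibility
        ids = by_country[country]
        # random.sample draws without replacement; take every ID if fewer than k exist
        chosen.extend(rng.sample(ids, min(k, len(ids))))
    return chosen


if __name__ == "__main__":
    demo = [(f"IT-{i}", "Italy") for i in range(10)] + [(f"CN-{i}", "China") for i in range(3)]
    picked = sample_per_country(demo, k=5, seed=1)
    print(len(picked))  # 8: five from Italy, all three from China
```

Capping each country at k keeps heavily sequenced countries from dominating the global context set.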


Our sequencing efforts yielded 122 genomes with >80% completeness. We used these 122 genome sequences to analyse SARS-CoV-2 cases circulating in Nairobi in March 2020 and a large number of cases circulating at the Coast. Our basic phylogenetic analysis provides evidence for multiple introductions of SARS-CoV-2 into the country. SARS-CoV-2 strains circulating in Kenya are similar to strains observed elsewhere in the world (Figure 1) and belong to 10 global lineages. Most of the cases at the Coast belong to lineage B.1 (Figures 1 and 2), and we find evidence for multiple introductions of these European-centric lineages into the country. A number of sequences collected in March (NIC_228, NIC_168, S304, NIC_448) fall within the group of global sequences from Africa, Asia and Europe. Among sequences from coastal Kenya, we observe several clusters of highly related cases (n~8) belonging to a single dynamic lineage, supporting ongoing local transmission. Sequences from the Coast collected in April (P269, P293) are highly related to early sequences collected in March from Nairobi (NIC_089, NIC_087), suggesting potential transmission between Nairobi and the Coast. Overall, the large number of highly related Kenyan virus sequences provides evidence for local transmission.
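The report does not define how genome completeness was computed. One common convention, assumed here for illustration, is the fraction of unambiguous bases (A, C, G, T) in the consensus relative to the 29,903-nt length of the MN908947.3 reference. The sketch below applies the >80% threshold under that assumption. The function names are hypothetical.

```python
# Length of the MN908947.3 reference genome in nucleotides.
REFERENCE_LENGTH = 29903


def completeness(consensus: str, ref_len: int = REFERENCE_LENGTH) -> float:
    """Fraction of the reference length covered by unambiguous base calls.

    ASSUMPTION: only A/C/G/T count as called; N and IUPAC ambiguity codes do not.
    """
    called = sum(1 for base in consensus.upper() if base in "ACGT")
    return called / ref_len


def passes_completeness(consensus: str, threshold: float = 0.80) -> bool:
    """True if the genome is more than `threshold` complete (the report uses >80%)."""
    return completeness(consensus) > threshold


if __name__ == "__main__":
    good = "A" * 25000 + "N" * 4903  # 25,000 called bases, about 83.6% complete
    poor = "A" * 20000 + "N" * 9903  # 20,000 called bases, about 66.9% complete
    print(passes_completeness(good), passes_completeness(poor))  # True False
```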


Our genomic data analysis suggests that infections detected and confirmed in March largely resulted from virus importation into the country. There was one definite cluster of onward transmission, and a possibility that, before restrictions on movement came into effect on 25th March 2020, one of the infections on the Coast was related to an infection in Nairobi. Sequencing additional SARS-CoV-2 genomes in Kenya will provide a more detailed picture of local transmission patterns, and sequencing capacity should be established in other laboratories in the country to build a national dataset of SARS-CoV-2 genomes.


This work was conducted as a collaboration between the KEMRI-CGMRC in Kilifi, the KEMRI-CVR Nairobi, the National Public Health Laboratory-National Influenza Centre (NPHL-NIC) and the Mombasa and Kilifi County COVID-19 Rapid Response Teams.


We are grateful to all those who have deposited and shared genome data on GISAID. We also thank the ARTIC Network ( and Oxford Nanopore Technologies for technical and material support. The work was funded by the National Institute for Health Research (NIHR) (project reference 17/63/82) using UK aid from the UK Government to support global health research, Tackling Infections to Benefit Africa (TIBA) (grant # 16/136/33), and the UK Department for International Development (DfID) and Wellcome (grant # 102975; 220985). The views expressed in this publication are those of the authors and not necessarily those of any of the funders. This report is published with permission from the Director General, KEMRI.


Figure 1: A maximum likelihood phylogenetic tree showing the evolutionary relationships between whole genome sequences (>80% complete) of SARS-CoV-2 cases collected from Nairobi and the coastal regions of Kenya between 12th March and 30th April 2020. The Kenyan sequences were analysed together with a global dataset of 78 sequences randomly sampled from countries across Africa, Asia, Europe and America. Sequences were aligned using MAFFT v7.310, and the tree was constructed using RAxML-NGS v0.9.0. Sequences from Kenya are represented by filled circles (Nairobi = purple, Coast = red) and global sequences by filled grey triangles. Sequence labels denote the isolation and diagnosis laboratory (P for KEMRI-CGMRC, NIC_XX for NPHL/National Influenza Centre, and S for KEMRI-CVR). A total of 11 genomes (Table 1) were from repeated sampling of the same individuals. Where applicable, lineages are shown in brackets.


Figure 2: A bar plot showing the proportions of the 10 SARS-CoV-2 global lineages circulating in Kenya between 12th March and 30th April 2020. Lineages were assigned to each Kenyan whole genome sequence using the Pangolin toolkit (v1.1.14), based on an updated (2020-05-19) proposed dynamic nomenclature for SARS-CoV-2 sequences (Rambaut et al. 2020, The B lineage originates in China and has several global exports. B.1 and B.1.1 represent a large lineage of sequences circulating in Europe, corresponding to the Italian, UK and French outbreaks. B.4 represents the Iran lineage; sequences in this lineage are associated with travel from Iran.

Initial sample    Repeated samples
P232              P486
P169              P2671, P731, P1889
P293              P593
P332              P555, P1018
P338              P640
P037              P139
P572              P1638
P596              P1214

Table 1: Initial diagnostic samples (column 1) and the associated repeated samples (column 2) taken at different time points during the course of infection.
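Table 1 can be encoded as a mapping from each initial sample to its repeated samples. This makes it straightforward to check the count of 11 repeated genomes given in the Figure 1 caption and to trace any sample label back to its first diagnostic sample. This is an illustrative Python sketch, and the names REPEATED_SAMPLES and initial_sample are hypothetical.

```python
# Table 1: initial diagnostic sample -> repeated samples from the same individual.
REPEATED_SAMPLES = {
    "P232": ["P486"],
    "P169": ["P2671", "P731", "P1889"],
    "P293": ["P593"],
    "P332": ["P555", "P1018"],
    "P338": ["P640"],
    "P037": ["P139"],
    "P572": ["P1638"],
    "P596": ["P1214"],
}


def initial_sample(label: str) -> str:
    """Return the initial diagnostic sample for a label; labels not in Table 1 map to themselves."""
    for initial, repeats in REPEATED_SAMPLES.items():
        if label == initial or label in repeats:
            return initial
    return label


if __name__ == "__main__":
    n_repeats = sum(len(r) for r in REPEATED_SAMPLES.values())
    print(n_repeats)  # 11, matching the Figure 1 caption
    print(initial_sample("P731"))  # P169
```

Collapsing labels this way is one option for counting individuals rather than genomes when summarising clusters.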

Data Availability
Sequence data have been deposited in GISAID and are awaiting curation and release.