Pre-release - Thirteen EBOV genomes from Nigeria 2014

The African Center of Excellence for Genomics of Infectious Diseases (ACEGID), Redeemers University Nigeria (Ede, Osun State, Nigeria), in collaboration with two partner institutions in Nigeria (the Nigerian Center for Disease Control, Abuja, Nigeria, and the Virology Unit, Central Research Laboratory, College of Medicine, University of Lagos, Lagos State, Nigeria) as well as the Viral Hemorrhagic Fever Consortium, is releasing Ebola virus (EBOV) genomes from thirteen cases from the 2014 EVD outbreak in Nigeria.

The sequences can be downloaded here:
EBOV_NGA_combined.fasta.txt (242.8 KB – this is a fasta file ending in “.txt” because virological only allows a specific set of filename extensions to be attached)
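
For anyone loading the attachment programmatically, here is a minimal Python sketch. It assumes the file is ordinary multi-record FASTA despite the ".txt" suffix. The names `read_fasta` and `load_genomes` are illustrative and are not part of any pipeline mentioned in this thread.

```python
from typing import Dict, Iterable, List


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA-formatted lines into a {record_id: sequence} dict.

    The record id is taken as the first whitespace-delimited token of
    each header line (an assumption about how the headers are laid out).
    """
    chunks: Dict[str, List[str]] = {}
    name = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            tokens = line[1:].split()
            name = tokens[0] if tokens else ""
            chunks[name] = []
        elif name is not None:
            chunks[name].append(line)  # sequence may wrap over many lines
    return {rec_id: "".join(parts) for rec_id, parts in chunks.items()}


def load_genomes(path: str = "EBOV_NGA_combined.fasta.txt") -> Dict[str, str]:
    """Read the attachment from disk; the extension does not matter."""
    with open(path) as handle:
        return read_fasta(handle)
```

Calling `load_genomes()` in the directory containing the download should return one entry per genome, keyed by sample name.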

Samples were processed and extracted, and Nextera libraries were constructed, by ACEGID/Redeemers University staff using protocols described in Gire et al. (2014) and Matranga et al. (2014). Sequences were generated by 101 bp paired-end Illumina sequencing on MiSeq and HiSeq machines at the Broad Institute. Genomes were assembled using the software pipeline described in Park et al. (2015).

We are in the process of gathering metadata, but at this moment we have not finalized the dates or locations, which is why we haven’t yet posted the data to NCBI. The total duration of the Nigerian EVD epidemic was relatively short, however. All cases are from Lagos, except for one that is from Port Harcourt (DHB_015_09_14). The index case that entered the country from Liberia is DHB_001_07_14.

Disclaimer:
Please feel free to download, share, use, and analyze this data. We are currently in the process of preparing a publication and will post progress on this forum. If you intend to use these sequences for publication prior to the release of our paper, please contact us directly. If you are interested in joining our collaboration–or if you have any other questions–then please also contact us directly.

Thanks Danny! Here’s a quick ML tree (based on the Gire et al alignment) showing the Nigerian lineages sprouting off the SL2 lineage as expected. A couple of long branches, but I assume that’s likely due to assembly quality?

I haven’t placed them into the larger phylogeny yet, but essentially this supports the GIN > SLE > LBR > NGA pattern of spread that we previously suggested.

Very cool. Definitely consistent with a source in Liberia. These Nigerian sequences fall into a sub-lineage of SL2 we have been calling LB5. This is the same sub-lineage that caused the infections in Mali (via Guinea).

Ooh, thanks @jtladner. LB5 really spread around huh? It’s the biggest Liberian group in your data set, right?

From looking at a figure you made previously, the earliest LB5 genome you have is from early August 2014. The Nigerian genomes span August, but the first few (especially the index) were from mid/late July, so does that make these your chronologically earliest LB5 genomes?

Your new figure here implies that the Nigerian index genome is only one SNP away from the basal LB5 genome?

@dpark Yeah, our earliest Liberian sequences from LB5 were tested on Aug 8th 2014. However, we have pretty limited sampling from June and July. We estimated the TMRCA for LB5 to be June 19th 2014 (HPD95 late May - mid-July). It is the second most common sub-lineage in our dataset.

It actually looks like there are three substitutions separating the basal LB5 from the first Nigerian sequence (4,037, 17,016 and 18,754), but two were collapsed in the haplotype network because of missing data in other sequences.
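
To make the missing-data point concrete, here is a small Python sketch that lists substitutions between two aligned sequences while skipping any column where either sequence is uncalled. It is only an illustration on toy strings, not the method used to build the haplotype network. The function name `substitutions`, the set of characters treated as missing, and the 1-based alignment-column numbering are all assumptions.

```python
from typing import List, Tuple

# Characters treated as "no call" (an assumption; adjust for your alignment).
MISSING = set("N-?")


def substitutions(ref: str, query: str) -> List[Tuple[int, str, str]]:
    """Return (column, ref_base, query_base) for each informative difference.

    Columns are 1-based positions in the alignment, which need not match
    genome coordinates. Columns where either sequence is missing are
    skipped, so a real substitution can go uncounted at such a site.
    """
    if len(ref) != len(query):
        raise ValueError("sequences must be aligned to the same length")
    diffs = []
    for col, (a, b) in enumerate(zip(ref.upper(), query.upper()), start=1):
        if a in MISSING or b in MISSING:
            continue  # cannot tell whether this column differs
        if a != b:
            diffs.append((col, a, b))
    return diffs


# Toy example: column 5 is uncalled in the query, so it is not counted.
toy_ref = "ACGTACGT"
toy_query = "ACCTNCGA"
print(substitutions(toy_ref, toy_query))  # [(3, 'G', 'C'), (8, 'T', 'A')]
```

A pairwise comparison like this keeps every difference that both sequences cover. A network built across many sequences may have to drop a site if too many of them lack a call there.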

@dpark, what’s the coverage of the long-branch lineages? Real or artifact?

4,037 and 17,016 appear to be unique to the Nigerian samples (in the current data set). The substitution at 18,754 is also present in two Liberian sequences, both tested Aug. 4th 2014.

@Kristian_Andersen, two of them are quite nice (DHB_019 = 233X and DHB_009 = 144X), but the 061304 one is not as nice (12X). @swohl is staring at coverage around specific SNPs at the moment, as well as between-library concordance. I think we feel good about the two higher coverage ones.

@jtladner, we also noticed that possible homoplasy towards the end of the genome and weren’t yet sure how much to believe it. Do you feel good about the sequencing coverage at that site on your two Liberian sequences?

Just FYI, I take back what I said about the homoplasy… I haven’t been as close to the data this time and I got it wrong.

I would be slightly suspicious of the 061304 genome though, it seems a bit odd on a few levels. The rest look pretty good.

Gui>SL>Lib,Gui>Nig,Mal?

Probably GIN>SLE>LBR>GIN>MLI and GIN>SLE>LBR>NGA

… where the bifurcation point is in LBR. That is, both the GIN->MLI and NGA branches stem from LB5 (which itself stems from SL2 which came from GIN).

Yeah, exactly. The GIN>SLE>LBR parts are the same in both of those chains.
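
For what it's worth, the shared part of the two chains above can be checked mechanically. This is a toy Python sketch that treats each chain as a list of country codes. The helper name `shared_prefix` is made up for illustration.

```python
from typing import List, Sequence


def shared_prefix(chain_a: Sequence[str], chain_b: Sequence[str]) -> List[str]:
    """Return the leading steps that two transmission chains have in common."""
    prefix = []
    for a, b in zip(chain_a, chain_b):
        if a != b:
            break
        prefix.append(a)
    return prefix


mali_chain = "GIN>SLE>LBR>GIN>MLI".split(">")
nigeria_chain = "GIN>SLE>LBR>NGA".split(">")

common = shared_prefix(mali_chain, nigeria_chain)
print(common)      # ['GIN', 'SLE', 'LBR']
print(common[-1])  # LBR, the last shared step, i.e. where the chains split
```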

@dpark SL2 came from SLE, right? Clade 1 in GIN and Clade 2 in SLE - but yes, what @jtladner is saying is my understanding of the data as well (although the G>S>L>G>M scenario is based on epidemiology to place the last link from G>M). Could also have gone directly from SLE to GIN and then to MLI - I don’t think LBR is necessarily implicated?

@Kristian_Andersen Based on the available data, the LB5 sub-lineage originated in Liberia and was then re-introduced to Guinea. This is illustrated in the haplotype network in this thread, where the small, central pink node (from which most of the major lineages radiate) represents the basal SL2 haplotype.

Great, thanks for the clarification - I didn’t see Sierra Leone in that network, so I was unsure where those lineages would fall in the overall picture. Comparing the haplotype network from this thread with Figure 1 from the Cell paper makes the transmission chain very clear though.

I guess for SL2 we don’t actually know whether this one comes from GIN or SLE because of lack of sampling. We first see it in SLE, but given the adjustment of when that funeral happened in SLE, that diversity could have come from either country. SLE is probably the most likely source, though.

Yeah, that network only includes the Liberian sequences from SL2 + sequences from other countries that branch from Liberian sub-lineages.

The SL1 to SL2 transition is definitely still a little mysterious. There are four substitutions that differentiate the two, but no intermediate sequences have been sampled thus far. However, given that the basal haplotypes of both SL1 and SL2 were observed in SLE in May 2014, it seems likely that the transition took place in SLE. The basal SL1 haplotype has been seen in Guinea, but not the basal SL2 haplotype.