Mid/Early release - 152 EBOV genomes from Sierra Leone

Kristian_Andersen · March 26, 2015, 1:21pm

Update:
Please use the BioProject link instead, since the sequences have all been finalized:
http://www.ncbi.nlm.nih.gov/bioproject/PRJNA257197/

As part of our continued collaboration with the Sierra Leone Ministry of Health and Sanitation, Kenema Government Hospital and the VHFC, we’re continuing our early release of EBOV genomes sequenced at the Broad Institute.

This release contains all the EBOV genomes sequenced from December 2014 to March 2015:

152 early-mid release genomes containing all sequenced samples that gave us ‘reasonable’ assemblies (including 99 previously released).
124 mid-release genomes containing only the highest-quality assemblies from the 152 samples above.

The sequences can be downloaded here:

All:
http://cl.ly/2S0c0P360P3d

High quality only:
http://cl.ly/2j0V0e1h0K1J

The sequences were generated by 101bp PE Illumina sequencing using the protocols described in the Gire et al. and Matranga et al. papers. The genomes were assembled using Trinity, followed by an alignment refinement step with NovoAlign.

We are in the process of gathering metadata, but at this moment we don’t have any exact dates or other metadata for the individual samples.

Please note that this is an early release, so accuracy can’t be guaranteed at this stage.

Disclaimer:
Please feel free to download, share, use, and analyze this data. We are currently in the process of preparing a publication and will post progress on this forum. If you intend to use these sequences for publication prior to the release of our paper, please contact us directly. If you are interested in joining our collaboration - or if you have any other questions - then please also contact us directly.

system · March 31, 2015, 7:44pm

I know the emphasis has been on generating well-curated full length sequences, but it strikes me, as the battle against Ebola in Sierra Leone is becoming geographically more limited, that real-time sequencing of a shorter region might be helpful. Those estimating progress in limiting the spread of the virus would be able to determine how many viral transmission networks remain by knowing whether the diversity of Ebola sequences was being reduced in the process. Even limited PCR and sequencing equipment in the area, the equivalent of that in a Coroner’s office in the US, would enable comparisons of short 300bp sequences in the carboxy-terminal half of GP1 that are adequate for most phylogenetic applications. Viral eradication has always been a daunting task, and the folks on the ground could use all the helpful information they can get.
Respectfully submitted,
Bill Gallaher

arambaut · March 31, 2015, 9:16pm

It is an interesting question, but 300bp just doesn’t seem enough. Some back-of-the-envelope calculations suggest that, with a rate of 1.25x10^-3 substitutions per site per year (a recent estimate), even after a month of evolution separating two sequences there is a 96% probability that there would be no differences between them. With a 1000bp region there would be an 88% chance the sequences are identical. With the full 18kbp, this is down to 10%. With a 300bp fragment there is a 50% chance two will be identical even after a year and a half of evolution.
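
For anyone who wants to try this kind of estimate themselves, here is a minimal sketch. It assumes substitutions accumulate as a Poisson process, that the rate is in substitutions per site per year, and that "a month" means 1/12 of a year of total divergence between the two sequences. None of these assumptions is stated in the post, and the function name `prob_identical` is made up for illustration. Under these assumptions the results land in the same general range as the figures quoted above, but not on identical values.

```python
import math

# Assumed units: substitutions per site per year.
RATE = 1.25e-3


def prob_identical(rate, length_bp, years):
    """Probability of zero substitutions between two sequences.

    Assumes a Poisson model: the expected number of differences is
    rate * length * time, and P(no differences) = exp(-expected).
    """
    expected_differences = rate * length_bp * years
    return math.exp(-expected_differences)


# Scenarios from the discussion: (label, region length in bp, time in years).
scenarios = [
    ("300 bp, 1 month", 300, 1 / 12),
    ("1000 bp, 1 month", 1000, 1 / 12),
    ("18 kbp, 1 month", 18000, 1 / 12),
    ("300 bp, 1.5 years", 300, 1.5),
]

for label, length_bp, years in scenarios:
    p = prob_identical(RATE, length_bp, years)
    print(f"{label}: P(identical) = {p:.1%}")
```

The point of the sketch is the scaling: the chance of seeing no differences falls off exponentially with region length times elapsed time, so a short fragment needs a much longer time span before it becomes informative.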

Even if you found a particularly fast-evolving 300bp intergenic region, you wouldn’t improve things much. The solution is to put NGS machines in diagnostic labs, and this is starting to happen.