Continuing the discussion from
Mid/Early release - 45 new EBOV genomes from Sierra Leone:
As part of our continued collaboration with the Sierra Leone Ministry of Health and Sanitation, Kenema Government Hospital and the
VHFC, we’re continuing our early release of EBOV genomes sequenced at the Broad Institute.
This release contains all the genomes sequenced in December of this year:
10 new early-release genomes
35 previously released genomes (on this site) that have been refined, including additional sequencing
We have turned a couple of knobs on our assembly pipeline and believe most of these genomes to be accurate. A couple of them will still need additional sequencing before final release to NCBI. The sequences can be downloaded here:
The sequences were generated by 101 bp paired-end Illumina sequencing using the protocols described in the Gire et al. and Matranga et al. papers. The genomes were assembled using Trinity, followed by an alignment refinement step with NovoAlign.
We are in the process of gathering metadata, but at the moment we don’t have exact dates or other metadata for the individual samples.
We are currently in the process of preparing the data for GenBank and SRA submissions. Please note that this is an early release, so accuracy can’t be guaranteed at this stage. We have run into a couple of speed bumps releasing the raw data - please contact us directly if you need the raw reads before final release to NCBI (should be completed shortly).
Please feel free to download, share, use, and analyze this data. We are currently in the process of preparing a publication and will post progress on this forum. If you intend to use these sequences for publication prior to the release of our paper, please contact us directly.
We have dates of reporting for many of these sequences, but some are missing from the WHO line list. We can impute these dates from other samples with adjacent patient IDs, so here is some documentation of the logic used for these imputations. The dates here (in dd/mm/yy form) are the dates of reporting of the cases, but these are almost always the same as the date of initial sample collection where this is known.
G4955 is likely from 2014-08-13:
G5119 likely from 2014-08-19 or 2014-08-20:
G5640 is likely from 2014-09-10 to 2014-09-12:
G5982, G5983, G5997, G6012 & G6020 are likely from 2014-09-23 to 2014-09-25
The remaining 3 - G6089, G6091, G6104 - are likely to be on or after 2014-09-25 but probably not by much:
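For anyone wanting to reproduce this kind of imputation, here is a minimal Python sketch of the adjacent-ID logic described above. It assumes that patient IDs were assigned in roughly chronological order, so the nearest dated IDs below and above a missing one bracket its likely date. The function names and the sample IDs and dates in the usage lines are hypothetical placeholders, not values from the line list.

```python
from datetime import date, datetime


def parse_report_date(text):
    """Parse a dd/mm/yy string (the form used in the line list) into a date."""
    return datetime.strptime(text, "%d/%m/%y").date()


def id_number(sample_id):
    """Numeric part of a sample ID, e.g. 'G1001' -> 1001."""
    return int(sample_id.lstrip("G"))


def impute_date_range(sample_id, known_dates):
    """Bracket an undated sample using its nearest dated neighbours by ID.

    Assumption: IDs increase roughly with reporting date.
    Returns (earliest, latest); a bound is None when there is no dated
    neighbour on that side.
    """
    target = id_number(sample_id)
    lower = [s for s in known_dates if id_number(s) < target]
    upper = [s for s in known_dates if id_number(s) > target]
    earliest = known_dates[max(lower, key=id_number)] if lower else None
    latest = known_dates[min(upper, key=id_number)] if upper else None
    return earliest, latest


# Hypothetical example values, for illustration only.
known = {
    "G1000": parse_report_date("12/08/14"),
    "G1003": parse_report_date("14/08/14"),
}
earliest, latest = impute_date_range("G1001", known)
print(earliest, latest)
```

When both neighbours share a date, the range collapses to a single day. When only a lower neighbour exists, the result is an "on or after" bound, as for the last three samples above.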
@arambaut. Do you have a .csv file with all the various dates (imputed and otherwise) that you could please share?
Here is a .csv file with dates for all 45 sequences: