NCBI release: NGS reads for 278 new samples now on SRA

As part of our ongoing collaboration with the Sierra Leone Ministry of Health and Sanitation, Kenema Government Hospital, and the VHFC, we are continuing our early release of EBOV sequence data produced at the Broad Institute.

Raw Illumina read data has now been posted to the NCBI Sequence Read Archive (SRA) for 278 samples from Ebola-positive patients in Sierra Leone, sequenced in December 2014. Because of sample quality issues in this batch, many of these samples did not sequence well and produced limited read coverage, but all are included in this submission.

Please note that the NCBI BioProject linked above also includes all previous samples from the original Gire et al. 2014 paper. To help distinguish the new data from the old, we provide a short table listing the 278 NCBI BioSample IDs for the new batch only (biosamples-20141224.txt).
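For anyone who wants to separate the new batch programmatically, here is a minimal Python sketch. It rests on several assumptions: that BioSample accessions in the table start with "SAMN", that the table is whitespace- or comma-delimited, and that you have exported a run table for the BioProject as CSV with a "BioSample" column. The function names and the accessions in the example are placeholders, not real IDs from this release.

```python
import csv
import io


def load_biosample_ids(handle):
    """Collect BioSample accessions from a text table.

    Assumption: accessions are tokens starting with 'SAMN'; any other
    columns (sample names, etc.) are ignored.
    """
    ids = set()
    for line in handle:
        for token in line.replace(",", " ").split():
            if token.startswith("SAMN"):
                ids.add(token)
    return ids


def filter_runs(runinfo_handle, new_ids, column="BioSample"):
    """Yield run-table rows whose BioSample accession is in new_ids.

    Assumption: the run table is a CSV with a header row that includes
    the given column name.
    """
    for row in csv.DictReader(runinfo_handle):
        if row.get(column) in new_ids:
            yield row


if __name__ == "__main__":
    # Placeholder data standing in for biosamples-20141224.txt and a run table.
    table = io.StringIO("SAMN00000001\tsample_A\nSAMN00000002\tsample_B\n")
    runinfo = io.StringIO(
        "Run,BioSample\n"
        "SRR0000001,SAMN00000001\n"
        "SRR0000002,SAMN99999999\n"
    )
    new_ids = load_biosample_ids(table)
    for row in filter_runs(runinfo, new_ids):
        print(row["Run"], row["BioSample"])
```

With real files, replace the two StringIO objects with open("biosamples-20141224.txt") and the path to your own run-table export.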

The sequences were generated by 101 bp paired-end (PE) Illumina sequencing using the protocols described in the Gire et al. and Matranga et al. papers. Most samples in this batch have been run on at least six lanes of Illumina HiSeq 2500, but most do not have independent library construction replicates (which are preferred for reliable intrahost variant calling).

Reads have been depleted of all human genetic material using a combination of BMTagger and BLASTN, and PCR duplicate removal has been performed with M-Vicuna (alignment-free). The resulting read sets should contain all microbial bloodborne pathogens and have not been filtered to specific taxa of interest. Metagenomic analyses of non-EBOV pathogens are in progress. As mentioned earlier, some samples do not contain many microbial reads (or EBOV reads), but all samples have been provided to SRA in case they are of interest.
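To illustrate what "alignment-free" duplicate removal means in general, here is a deliberately simplified Python sketch that drops read pairs whose two sequences exactly match a pair already seen, without aligning anything to a reference. This is not M-Vicuna's algorithm and will not reproduce its output; the function name and the toy reads are hypothetical.

```python
def dedup_pairs(pairs):
    """Keep the first occurrence of each (read 1, read 2) sequence pair.

    pairs: iterable of (name, seq1, seq2) tuples.
    Simplification: only exact, case-insensitive sequence matches count
    as duplicates; no mismatches or base qualities are considered.
    """
    seen = set()
    kept = []
    for name, seq1, seq2 in pairs:
        key = (seq1.upper(), seq2.upper())
        if key not in seen:
            seen.add(key)
            kept.append((name, seq1, seq2))
    return kept


if __name__ == "__main__":
    # Toy read pairs; r2 duplicates r1 apart from letter case.
    reads = [
        ("r1", "ACGT", "TTGA"),
        ("r2", "acgt", "TTGA"),
        ("r3", "ACGT", "TTGC"),
    ]
    for name, _, _ in dedup_pairs(reads):
        print(name)
```

Because no reference is involved, this kind of filtering treats EBOV and non-EBOV reads alike, which matters for read sets that have not been restricted to particular taxa.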

Of these 278 samples, 45 produced sufficient sequencing coverage for high-quality assembly. Those assemblies were previously announced on this forum: "Mid/Early release - 45 new EBOV genomes from Sierra Leone". Because of ongoing technical calibration of the assembly and annotation process, the finalized versions of those genomes will likely be submitted to NCBI GenBank in January 2015.

This represents approximately half of the Sierra Leone samples recently received by Harvard/Tulane. Stay tuned in the new year for sequencing results of the other half.

Disclaimer:
Please feel free to download, share, use, and analyze this data. We are currently in the process of preparing a publication and will post progress on this forum. If you intend to use these sequences for publication prior to the release of our paper, please contact us directly.