Raw data release from Zika virus Illumina sequencing pilot


As part of a new collaboration between the Santa Fe de Bogota Foundation, Scripps Translational Science Institute, The Scripps Research Institute, Colorado State University, VHFC, and the Broad Institute, we recently received plasma samples from de-identified Zika virus (ZIKV) patients in Colombia. We performed standard QC for ZIKV levels by qPCR; unfortunately, most of the patient samples had no detectable ZIKV or only very low levels on two separate qPCRs.

Since the patient samples contain so little ZIKV material, these data are probably of limited use for studying the virus itself. However, we wanted to make them available to the research community: the patient samples are a potentially useful metagenomic resource, and the positive control is a high-quality Illumina dataset that should be useful for tuning computational pipelines, etc.

We prepared RNA from two of the qPCR-positive patient samples (Z184 and Z186), as well as a positive control (seed stock of the Malaysian strain P6-740 passaged once on BHK-21 cells) and a negative control:


  • Z184, 42 year old female with fever, rash, joint pain, myalgia, eye pain, and cephalgia. Symptom onset December, 2015 [2 ZIKV reads]

  • Z186, 33 year old male with fever, rash, joint pain, myalgia, eye pain, and cephalgia. Symptom onset December, 2015 [33 ZIKV reads]

  • P6-740, positive control of Malaysian strain P6-740 passaged once on BHK-21 cells [20,729 ZIKV reads]

  • Extraction, negative extraction control for metagenomic comparisons. [0 ZIKV reads]


Total RNA was depleted for rRNA and sequenced on the Illumina MiSeq using previously published protocols by Matranga *et al.* with no specific amplification. A water-extraction control was also run. Reads were aligned to the P6-740 reference using Novoalign, duplicates removed using Picard, and realigned using GATK. Filters based on the water control were used to remove contaminants from the samples and Kraken was used to do metagenomic analyses on both the raw and filtered reads. Human reads have been removed from all raw files using bmtagger and SNAP.
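For readers who want to try a control-based filter on the metagenomic counts, here is a minimal sketch of one way it could work. It is not the filter used for this release, whose details are not given here. The function name `filter_by_control`, the `min_fold` threshold, and all counts in the example other than the 33 ZIKV reads for Z186 are assumptions made up for illustration.

```python
def filter_by_control(sample_counts, control_counts, min_fold=10.0):
    """Keep taxa whose read count in the sample exceeds min_fold times
    the count seen in the water-extraction control.

    sample_counts / control_counts: dicts mapping taxon name -> read count.
    min_fold is a hypothetical threshold, not a value from this release.
    """
    kept = {}
    for taxon, n in sample_counts.items():
        background = control_counts.get(taxon, 0)
        # A taxon absent from the control has background 0, so any
        # non-zero sample count passes the filter.
        if n > min_fold * background:
            kept[taxon] = n
    return kept


# Toy counts for illustration only (not taken from the released data,
# apart from the 33 ZIKV reads reported for Z186).
sample = {"Zika virus": 33, "Escherichia coli": 120, "Pseudomonas": 40}
control = {"Escherichia coli": 100, "Pseudomonas": 2}

filtered = filter_by_control(sample, control)
print(filtered)
```

A fold-change threshold is only one possible design. It keeps taxa that are clearly enriched over background rather than dropping everything that appears in the control at all, which matters when the control contains a few stray reads of a taxon that is genuinely abundant in the sample.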


Link To Data Repository

Basic Insights from the data

Interestingly, the reads from Z184 and Z186 most closely match the Malaysian strain. Since we use this very strain as our positive control, one might suspect contamination. However, we have a couple of reasons to believe that this might not be the case:

  1. We do not have any ZIKV reads in our water-only control.

  2. The % identity between the reads from the patient samples and P6-740 is ~97%. If these reads came from contamination by the positive control, one would expect the % identity to be closer to 100%, even taking Illumina errors into consideration (typically we observe >99.5% identity in our Lassa and Ebola studies when we observe seed stock contaminations). Still, since we have so few reads we cannot draw any firm conclusions and cannot rule out contamination at this point in time. More sequencing, as well as higher quality inbound samples, will help us resolve these issues.
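The identity argument in point 2 can be made concrete with a small sketch. It assumes ungapped read-to-reference alignments and a per-base Illumina error rate of 0.5%, which is an illustrative assumption rather than a measured value for this run. The function names and the toy sequences are hypothetical.

```python
def percent_identity(read, ref_segment):
    """Percent of matching positions between a read and the reference
    segment it aligns to (ungapped, equal length)."""
    if len(read) != len(ref_segment):
        raise ValueError("read and reference segment must have equal length")
    matches = sum(a == b for a, b in zip(read, ref_segment))
    return 100.0 * matches / len(read)


def expected_identity_if_contaminant(per_base_error):
    """If a read derives from the reference strain itself, mismatches
    should come only from sequencing error."""
    return 100.0 * (1.0 - per_base_error)


# Toy 100-base reference segment and a read with 3 mismatches.
ref = "ACGT" * 25
read_bases = list(ref)
for pos in (10, 50, 90):      # each of these positions is a 'G' in ref
    read_bases[pos] = "A"
read = "".join(read_bases)

observed = percent_identity(read, ref)
expected = expected_identity_if_contaminant(0.005)  # assumed error rate
print(f"observed {observed:.1f}% vs about {expected:.1f}% expected for a contaminant")
```

With only a handful of reads the observed value has a wide uncertainty, which is why the gap between ~97% and >99.5% is suggestive rather than conclusive.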

As always, please feel free to contact us directly if you have any questions or comments. We have a larger set of samples currently going through QC and we will continue to make data immediately available as it is generated.


Is there any sequence data on any of the Zika viruses in Cape Verde? Seems like a large outbreak.

Cape Verde is an island nation off the coast of Senegal in West Africa, but it has a long-term connection to Brazil.

Curious to see if the strain is African or Brazilian - potentially important public health implications.


I asked this very same question at the WHO teleconference/internet Zika briefing a couple of weeks ago, and was told “we don’t know”. If it turns out to be African - and Senegal is only a few hundred kilometres to the east - then it would show that there is no inherent reason why African strains can’t also cause epidemic outbreaks, and argue against the theory that the Asian/American variants are in some way more aggressive. On the other hand, if it is a Brazilian one, then the question would still be open.


Didn’t WHO parachute in there months ago?

Seems like this is exactly the type of data that should not be hoarded.


You can hear the recording of Margaret Chan’s answer here: http://www.who.int/emergencies/zika-virus/mediacentre/webcast-22-3-2016/en/ My question starts at -15.34 (that is, 15:34 from the end, not the start).



So important

Did the Brazilian variant adapt to Aedes africanus?

If this is African Zika, did it just now get to the island?