As part of a new collaboration between the Santa Fe de Bogota Foundation, Scripps Translational Science Institute, The Scripps Research Institute, Colorado State University, VHFC, and the Broad Institute, we recently received plasma samples from de-identified Zika virus (ZIKV) patients in Colombia. We performed standard QC for ZIKV levels by qPCR; unfortunately, most of the patient samples had no detectable ZIKV, or only very low levels, in two separate qPCR runs.
Since the patient samples contain so little ZIKV material, this data probably isn't all that helpful for analyzing the virus itself. However, we wanted to make it available to the research community, as it represents a potentially useful metagenomic resource, as well as a high-quality Illumina dataset for the positive control that should prove useful for tuning computational pipelines, etc.
We prepared RNA from two of the qPCR-positive patient samples (Z184 and Z186), as well as a positive control (seed stock of the Malaysian strain P6-740 passaged once on BHK-21 cells) and a negative control:
Z184, 42 year old female with fever, rash, joint pain, myalgia, eye pain, and cephalgia. Symptom onset December, 2015 [2 ZIKV reads]
Z186, 33 year old male with fever, rash, joint pain, myalgia, eye pain, and cephalgia. Symptom onset December, 2015 [33 ZIKV reads]
P6-740, positive control of Malaysian strain P6-740 passaged once on BHK-21 cells [20,729 ZIKV reads]
Extraction, negative extraction control for metagenomic comparisons [0 ZIKV reads]
Total RNA was depleted for rRNA and sequenced on the Illumina MiSeq using previously published protocols by Matranga *et al.* with no specific amplification. A water-extraction control was also run. Reads were aligned to the P6-740 reference using Novoalign, duplicates removed using Picard, and realigned using GATK. Filters based on the water control were used to remove contaminants from the samples and Kraken was used to do metagenomic analyses on both the raw and filtered reads. Human reads have been removed from all raw files using bmtagger and SNAP.
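To illustrate the idea behind a water-control filter, here is a minimal Python sketch. It is not the filter we actually ran: the function name `filter_by_water_control`, the read-ID-to-taxon dictionaries, the `min_control_reads` threshold, and the toy reads are all assumptions made up for this example. It simply drops any sample read whose assigned taxon also shows up in the water control.

```python
from collections import Counter


def filter_by_water_control(sample_hits, control_hits, min_control_reads=1):
    """Drop sample reads whose taxon was also seen in the water control.

    sample_hits / control_hits: dict mapping read ID -> taxon label.
    This input format is hypothetical (e.g. parsed from a classifier's
    per-read output); min_control_reads is an illustrative threshold.
    """
    control_counts = Counter(control_hits.values())
    # Any taxon seen at least min_control_reads times in the control
    # is treated as a likely reagent or lab contaminant.
    contaminants = {
        taxon for taxon, n in control_counts.items() if n >= min_control_reads
    }
    kept = {
        read_id: taxon
        for read_id, taxon in sample_hits.items()
        if taxon not in contaminants
    }
    return kept, contaminants


# Toy data, invented for illustration only.
sample = {
    "r1": "Zika virus",
    "r2": "Pseudomonas",
    "r3": "Zika virus",
    "r4": "Ralstonia",
}
water = {"w1": "Ralstonia", "w2": "Ralstonia", "w3": "Pseudomonas"}

kept, contaminants = filter_by_water_control(sample, water)
print(sorted(kept))          # ['r1', 'r3']
print(sorted(contaminants))  # ['Pseudomonas', 'Ralstonia']
```

A taxon-level filter like this is deliberately conservative: it removes everything that overlaps with the control, which is why it matters that the water control itself contained no ZIKV reads.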
Link To Data Repository
Basic Insights from the data
Interestingly, the reads from Z184 and Z186 most closely match the Malaysian strain. Since we use this very strain as our positive control, one might suspect contamination. However, we have two reasons to believe that this might not be the case:
We do not have any ZIKV reads in our water-only control.
The % identity between the reads from the patient samples and P6-740 is ~97%. If the patient reads came from the positive control, then even taking Illumina errors into consideration one would expect the % identity to be closer to 100% (typically we observe >99.5% identity in our Lassa and Ebola studies when we observe seed-stock contamination). Still, since we have so few reads, we cannot draw any firm conclusions and cannot rule out contamination at this point. More sequencing, as well as higher-quality inbound samples, will help us resolve these issues.
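For readers who want to repeat the % identity comparison on the released reads, the following is a minimal Python sketch of one common way to compute it from a pairwise alignment. The function name `percent_identity`, the choice to skip gap and `N` columns, and the toy sequences are assumptions for illustration, not necessarily the exact calculation behind the ~97% figure above.

```python
def percent_identity(read_aln, ref_aln):
    """Percent identity over aligned, unambiguous columns.

    read_aln and ref_aln are two rows of a pairwise alignment of equal
    length, with '-' marking gaps. Columns containing a gap or an 'N'
    in either row are skipped (an assumption; other definitions count
    gaps as mismatches).
    """
    if len(read_aln) != len(ref_aln):
        raise ValueError("aligned sequences must have the same length")

    compared = 0
    matches = 0
    for a, b in zip(read_aln.upper(), ref_aln.upper()):
        if a in "-N" or b in "-N":
            continue
        compared += 1
        if a == b:
            matches += 1

    if compared == 0:
        raise ValueError("no comparable columns in the alignment")
    return 100.0 * matches / compared


# Toy sequences, invented for illustration only.
print(percent_identity("ACGTACGTAC", "ACGTACGTAT"))  # 90.0
print(percent_identity("ACG-AC", "ACGTAC"))          # 100.0
```

On a 100-base read, 97% identity corresponds to about three mismatches, so with only a handful of reads a few sequencing errors can shift the estimate noticeably.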
As always, please feel free to contact us directly if you have any questions or comments. We have a larger set of samples currently going through QC and we will continue to make data immediately available as it is generated.