A large number of Zika virus genome sequences are going to start appearing in GenBank and elsewhere. Four new genomes of Brazilian isolates just appeared in GenBank, adding to the one Brazilian isolate from Dec 2015.
An ongoing challenge to anyone analyzing viral genomes is the limited annotation that often accompanies GenBank records, in terms of Collection Date, Collection Location, etc.
For example, the four new Brazilian isolates are annotated at the level of:
/strain="BeH818995" /host="Homo sapiens" /country="Brazil" /collection_date="2015"
For many purposes this is just fine, but the more information that is provided, the better.
I would ask the community to consider expanding the annotation as follows:
Collection date - provide the full date, or at least the month and year
Location - provide the geographic region within the country - state, province, nearest city
Patient - sex, age
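To illustrate why the extra detail matters downstream, here is a minimal Python sketch that parses source qualifiers written as in the example record above and reports how precise the collection date is. The function names and the set of date layouts it accepts (2015, Oct-2015, 15-Oct-2015, 2015-10, 2015-10-15) are my own assumptions for illustration, not part of any GenBank tool.

```python
import re

# Matches qualifiers written as /key="value", as in the example record above.
QUALIFIER_RE = re.compile(r'/(\w+)="([^"]*)"')


def parse_qualifiers(text):
    """Return a dict of qualifier name -> value from /key="value" pairs."""
    return dict(QUALIFIER_RE.findall(text))


def date_precision(collection_date):
    """Classify a collection date as 'day', 'month', 'year' or 'unknown'.

    Assumed layouts: 2015, Oct-2015, 15-Oct-2015, 2015-10, 2015-10-15.
    """
    patterns = [
        ("day", r"\d{1,2}-[A-Za-z]{3}-\d{4}|\d{4}-\d{2}-\d{2}"),
        ("month", r"[A-Za-z]{3}-\d{4}|\d{4}-\d{2}"),
        ("year", r"\d{4}"),
    ]
    value = collection_date.strip()
    for label, pattern in patterns:
        if re.fullmatch(pattern, value):
            return label
    return "unknown"


record = '/strain="BeH818995" /host="Homo sapiens" /country="Brazil" /collection_date="2015"'
qualifiers = parse_qualifiers(record)
print(qualifiers["strain"], date_precision(qualifiers["collection_date"]))
```

For the record above this reports year-level precision only, which is too coarse for most outbreak dating work.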
Adopting a nomenclature standard along the lines of those used for Flu and Ebola, among others, would also be helpful.
For example: Ebola virus/H.sapiens-wt/SLE/2014/Makona-0106_C2_KT2315
This scheme is imperfect, but it is more informative than a lab’s internal sample ID.
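A slash-delimited name like the Ebola example is also easy to handle in code. The following Python sketch splits such a name into fields and builds one from its parts. The field labels (virus, host, country, year, isolate) are my reading of the example rather than an official definition, and the Zika name it builds is purely hypothetical, put together from the GenBank annotation quoted earlier.

```python
from collections import namedtuple

# Field labels are an assumption based on the Ebola example, not a formal spec.
IsolateName = namedtuple("IsolateName", "virus host country year isolate")


def parse_isolate_name(name):
    """Split a slash-delimited isolate name into its five fields."""
    fields = name.split("/")
    if len(fields) != 5:
        raise ValueError(f"expected 5 slash-separated fields, got {len(fields)}")
    return IsolateName(*fields)


def build_isolate_name(virus, host, country, year, isolate):
    """Join the five fields back into a single slash-delimited name."""
    return "/".join([virus, host, country, str(year), isolate])


ebola = parse_isolate_name("Ebola virus/H.sapiens-wt/SLE/2014/Makona-0106_C2_KT2315")
print(ebola.country, ebola.year)

# Hypothetical Zika name, for illustration only (not an agreed standard).
zika = build_isolate_name("Zika virus", "H.sapiens-wt", "BRA", 2015, "BeH818995")
print(zika)
```

Because every field sits in a fixed position, sorting or filtering by country or year needs no lookup table, which is the main advantage over an internal sample ID.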
One more thing: providing data prior to publication is invaluable in outbreaks like Ebola and now Zika, but there have been cases where the pre-publication data use one naming scheme for samples and the published GenBank records use a different one. That means an end user has to match up records by sequence and then figure out which records refer to the same isolate. It would be great if we could avoid that with Zika data.
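For anyone who does end up reconciling two naming schemes, this is roughly what matching by sequence looks like. It is a minimal Python sketch that pairs records whose sequences are identical after removing whitespace and ignoring case. The record names and sequences are made-up toy data, and the function names are mine.

```python
import hashlib


def sequence_key(sequence):
    """Hash a sequence after removing whitespace and upper-casing it."""
    normalized = "".join(sequence.split()).upper()
    return hashlib.sha256(normalized.encode("ascii")).hexdigest()


def match_by_sequence(prepub, published):
    """Map each pre-publication name to published names with the same sequence.

    Both arguments are dicts of record name -> sequence string.
    """
    by_key = {}
    for name, seq in published.items():
        by_key.setdefault(sequence_key(seq), []).append(name)
    return {name: by_key.get(sequence_key(seq), []) for name, seq in prepub.items()}


# Toy, made-up records for illustration only.
prepub = {"sample_A": "acgt acgt", "sample_B": "TTTTGGGG"}
published = {"REC0001": "ACGTACGT", "REC0002": "CCCCAAAA"}
print(match_by_sequence(prepub, published))
```

This only catches exact matches. If a sequence was trimmed or corrected between release and publication, it would need an alignment-based comparison instead, which is exactly the extra work that consistent naming avoids.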
PS Can the administrators set up a Zika virus category on the site? I think we’ll be needing it…