Some of you may have been contacted by Eneida Hatcher from my group already, but I thought it would be worth posting that our Zika virus resource is now available for testing:
This is a beta release, and there are a couple of improvements that will be made in the next 10 days. That said, we wanted to get this out to the community as quickly as possible. We only ask that you please document any suggestions to improve the resource.
There are several improvements that should be released within the next couple of weeks. These include the option to download only a selected sequence region from a larger GenBank sequence, the naming of partial proteins by our de novo annotation pipeline, and the ability to search by authors.
On another note, having participated in several isolate naming efforts over the past few years, I have found there is always a give and take between machine-parsable and human-readable formats. IMHO, if you are parsing metadata from deflines, you are better off using a resource like ours, where you can download sequences with customized FASTA deflines that incorporate standardized metadata, or simply download a table of the standardized metadata that accompanies each sequence.
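To illustrate why a customized defline is easier to work with, here is a minimal Python sketch of parsing one. It assumes a pipe-delimited defline with four fields in a fixed order; the field names, the order, the separator, and the accession in the example are all made up for illustration and are not the resource's actual output format.

```python
from typing import Dict, Sequence

# Hypothetical field order: use whatever order you chose when customizing the download.
FIELDS = ("accession", "host", "country", "collection_year")


def parse_defline(defline: str, fields: Sequence[str] = FIELDS, sep: str = "|") -> Dict[str, str]:
    """Split a delimited FASTA defline into a dict of named metadata fields."""
    # Drop the leading '>' and surrounding whitespace, then split on the separator.
    values = [v.strip() for v in defline.lstrip(">").strip().split(sep)]
    if len(values) != len(fields):
        # Fail loudly instead of silently misaligning metadata.
        raise ValueError(f"expected {len(fields)} fields, got {len(values)}: {defline!r}")
    return dict(zip(fields, values))


# Made-up example defline (placeholder accession, not a real record).
record = parse_defline(">XX000001|H.sapiens|Brazil|2015")
print(record["host"], record["country"])
```

Because every field is standardized and sits in a known position, nothing has to be guessed from free text, which is the main advantage over parsing isolate names by hand.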
BTW, there is not supposed to be a space in H.sapiens or other species names. It looks like something got lost in the edits. The host field was conceived the same way as it was for the filovirus isolate names.
Also, the rationale for the placement of sample ID was that the other fields in front of it are absolutely required for sequence submission to GenBank, and so should always be available for use in the naming construct. This is the same approach that groups I worked with took toward a number of organisms (Human adenovirus, Rotavirus, Filovirus), and it will be part of the basis for a universal scheme.
BTW, you can search for these IDs in our resource under “Additional filters” if they are part of the defline.
Reading through the comments, it looks like we can do some things on the submission side to enable easier access to lab IDs and other unique IDs. One of the problems is there is no specific field in GenBank records to accommodate this, but there is in BioSample…