Continuing the discussion from Mid/Early release - 45 new EBOV genomes from Sierra Leone:

We have dates of reporting for many of these sequences, but some are missing from the WHO line list. We can impute these dates from other samples with adjacent patient IDs, so here is some documentation of the logic used for these imputations. The dates here (in dd/mm/yy form) are the dates of reporting of the cases, but these are almost always the same as the date of initial sample collection where this is known.

G4955 is likely from 2014-08-13:

G4942 12/08/14
G4946 13/08/14
G4950 13/08/14
G4956 13/08/14
G4960 14/08/14

G5119 is likely from 2014-08-19 or 2014-08-20:

G5117 19/08/14
G5118 19/08/14
G5134 20/08/14
G5212 22/08/14

G5640 is likely from 2014-09-10 to 2014-09-12:

G5621 09/09/14
G5643 10/09/14
G5661 12/09/14
G5684 13/09/14

G5982, G5983, G5997, G6012 & G6020 are likely from 2014-09-23 to 2014-09-25:

G5948 23/09/14
G5950 23/09/14
G6050 25/09/14
G6060 25/09/14

The remaining 3 - G6089, G6091, G6104 - are likely to be on or after 2014-09-25 but probably not by much:

G6069 25/09/14
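
For anyone who wants to repeat this, the bracketing idea can be sketched in a few lines of Python. This is only an illustration, not necessarily the exact procedure used: it assumes patient IDs are a `G` followed by a number, that IDs were assigned roughly in date order, and that the nearest dated ID on each side bounds the missing date. The function names (`bracket_date`, `id_number`, `parse_date`) are made up for this example, and the ranges given above may also reflect judgement that this simple rule does not capture.

```python
from bisect import bisect_left
from datetime import date, datetime

# Known reporting dates (dd/mm/yy), copied from the G4955 list above.
known = {
    "G4942": "12/08/14",
    "G4946": "13/08/14",
    "G4950": "13/08/14",
    "G4956": "13/08/14",
    "G4960": "14/08/14",
}

def id_number(sample_id):
    """Numeric part of an ID such as 'G4955' (assumed format: 'G' + digits)."""
    return int(sample_id.lstrip("G"))

def parse_date(text):
    """Parse a dd/mm/yy string into a datetime.date."""
    return datetime.strptime(text, "%d/%m/%y").date()

def bracket_date(sample_id, known_dates):
    """Return (earliest, latest) from the nearest dated IDs below and above.

    Either bound is None when there is no dated neighbour on that side.
    """
    items = sorted((id_number(k), parse_date(v)) for k, v in known_dates.items())
    numbers = [n for n, _ in items]
    i = bisect_left(numbers, id_number(sample_id))
    lower = items[i - 1][1] if i > 0 else None
    upper = items[i][1] if i < len(items) else None
    return lower, upper

lo, hi = bracket_date("G4955", known)
print(lo.isoformat(), hi.isoformat())  # 2014-08-13 2014-08-13
```

When both bounds agree, as for G4955, the date is effectively pinned down. When only a lower bound exists, as for the last three samples, the function returns `None` for the upper bound, which corresponds to "on or after".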


Great, thanks @arambaut. Do you have a .csv file with all the various dates (imputed and otherwise) that you could please share?


Here is a .csv file with dates for all 45 sequences: