A few updates. Regarding the GP mutations in G5119.1 that @swohl, @alin and @evogytis are discussing: I haven't yet produced iSNV calls systematically, since we don't yet have replicate sequencing runs from independent library constructions. But since I've been staring at the reads for G5119.1 between 6675 and 6170, I can say that the typical Makona-2014 sequence in this region is present at about 5% in the reads and the four SNPs of interest are at about 95%. They all appear to be linked (at least, the three that are really close together are definitely linked), so basically there are just two intrahost haplotypes here. The read support looks quite solid, and I see no reason to suspect sequencing errors in this region.
@arambaut - I've fixed the assembly errors in the palindromic region at the very end of the genome by increasing some of our edge trimming parameters. Those SNPs all go away with the exception of 18910, which continues to be well supported by reads. I still have reason to be suspicious of 18910 because of where it is, but I can't come up with a good reason to exclude it, so my latest assemblies keep it in.
As for the other potential homoplasies pointed out by @evogytis, the ones I've managed to spot check look well supported. My guess is that the tree topology might shift around a bit as we add new sequences each week. Looking at the sequences, is it impossible to come up with a tree that eliminates these recurrent mutations? Or is it possible that at this point we've seen enough evolutionary time pass (within just 2014 itself) that a recurrent mutation is believable?
I'm not sure I have a good sense of scale for that. @arambaut, if you say 1e-3 / site / year, then that means a single genome would experience 38 mutations a year? Right now, the WHO case totals for 2014 are about the same as the number of base pairs in the genome, so does that mean every position in the genome has had 38 opportunities to turn over this year? I must be wrong about the scale of this; that doesn't seem right.
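For a sense of scale, here is a rough back-of-envelope sketch of the arithmetic behind that question. It assumes a genome length of roughly 19,000 nt (my round number, not a figure from this thread), a uniform rate of 1e-3 substitutions / site / year, and a purely hypothetical total tree length of 50 branch-years; all function and variable names are made up for illustration. Under those assumptions the per-genome figure comes out nearer 19 per year than the 38 quoted above, and a simple Poisson model gives a crude estimate of how many sites might be hit twice independently somewhere on the tree.

```python
import math

def expected_subs_per_genome(rate_per_site_per_year, genome_length, years=1.0):
    """Expected substitutions along a single lineage over the given time."""
    return rate_per_site_per_year * genome_length * years

def prob_recurrent_at_site(rate_per_site_per_year, tree_length_years):
    """P(at least two independent hits at one site), Poisson model.

    lam is the expected number of hits at a site summed over every
    branch of the tree: rate * total branch length in years.
    """
    lam = rate_per_site_per_year * tree_length_years
    # 1 - P(0 hits) - P(1 hit)
    return 1.0 - math.exp(-lam) * (1.0 + lam)

def expected_recurrent_sites(rate_per_site_per_year, genome_length, tree_length_years):
    """Expected number of sites carrying two or more independent hits."""
    return genome_length * prob_recurrent_at_site(
        rate_per_site_per_year, tree_length_years
    )

# Assumed inputs (not taken from the thread, except the rate):
RATE = 1e-3                # substitutions / site / year, as quoted above
GENOME_LENGTH = 19_000     # approximate genome length in nt (assumption)
TREE_LENGTH_YEARS = 50.0   # hypothetical total branch length of the sampled tree

per_genome = expected_subs_per_genome(RATE, GENOME_LENGTH)
recurrent = expected_recurrent_sites(RATE, GENOME_LENGTH, TREE_LENGTH_YEARS)

print(f"expected substitutions per genome per year: {per_genome:.1f}")
print(f"expected sites hit twice or more on the tree: {recurrent:.1f}")
```

The quantity that matters for homoplasy is the total branch length of the sampled tree, not the WHO case count, since only sampled lineages contribute observable changes. The uniform-rate Poisson model is also likely to understate recurrence, because real rates vary across sites, so treat the output as an order-of-magnitude guide only.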