Question: When running a phylogeographic analysis, is there a reason I'm missing for why TMRCAs would be 'pulled back in time' when you run the full Asian/African dataset versus just the Asian dataset?
I've run the E gene and the full genome for the full dataset (African + Asian). This dataset does not include any of Faye's sequences, and all recombinants detected by RDPv4 or by incongruence were also removed. This left 7 African lineage isolates and 23 Asian lineage isolates in the analysis. The full genome and E gene analyses agree: the TMRCA estimated for Brazil was 2010.5406 [2007.1657, 2013.0183].
However, when I ran the Asian lineage alone (again, the full genome and E gene analyses agree), I got a Brazil TMRCA of 2013.1389 [2012.5698, 2013.8562].
I'm actually interested in the TMRCA of our isolates from Thailand, but I noticed that running the Asian lineage separately gives quite different TMRCAs than running the African and Asian sequences together. Including the African isolates pulls the TMRCAs back in time and widens the HPDs.
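To put the two Brazil estimates side by side, here is a minimal Python sketch (not part of any BEAST output) that converts the decimal-year values quoted above to calendar dates, computes the width of each interval, and checks whether the two intervals overlap. The function names and the dictionary layout are made up for illustration; the only inputs are the numbers already given in this post.

```python
from datetime import datetime

# Brazil TMRCA estimates quoted above: (point estimate, HPD lower, HPD upper),
# all in decimal years.
estimates = {
    "African+Asian": (2010.5406, 2007.1657, 2013.0183),
    "Asian only": (2013.1389, 2012.5698, 2013.8562),
}


def decimal_year_to_date(y):
    """Convert a decimal year (e.g. 2013.1389) to a calendar date."""
    year = int(y)
    start = datetime(year, 1, 1)
    end = datetime(year + 1, 1, 1)
    # Scale the fractional part by the actual length of that year.
    return (start + (end - start) * (y - year)).date()


def hpd_width(lower, upper):
    """Width of an interval in years."""
    return upper - lower


def intervals_overlap(a, b):
    """True if intervals a = (lower, upper) and b = (lower, upper) share any range."""
    return max(a[0], b[0]) <= min(a[1], b[1])


if __name__ == "__main__":
    for name, (point, lower, upper) in estimates.items():
        print(
            f"{name}: {decimal_year_to_date(point)} "
            f"[{decimal_year_to_date(lower)}, {decimal_year_to_date(upper)}], "
            f"width = {hpd_width(lower, upper):.4f} years"
        )
    combined = estimates["African+Asian"][1:]
    asian = estimates["Asian only"][1:]
    print("Intervals overlap:", intervals_overlap(combined, asian))
```

With the numbers above, the combined-dataset interval is several times wider than the Asian-only one, and the two intervals share a short range around late 2012 to early 2013, although the Asian-only point estimate falls just outside the combined-dataset interval.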
So which analysis would be the more accurate one? When estimating a TMRCA, is it more important to have more samples and sampling dates, or to be lineage-specific?
I was unable to get a clock rate for the African lineage alone, as I only had 7 samples and didn't think this would yield an accurate rate. For the Asian lineage I was getting 8E-4, and when I ran the whole dataset including the African isolates I still got around 8E-4 for meanRate. So I don't think I have reason to believe that the lineages have (or had) different rates of evolution that would affect the TMRCA estimates.
The analysis used a lognormal relaxed clock. I ran it with both gamma and CTMC priors, and both gave the same clock rates. The nucleotide substitution model was GTR+G. The runs of 100 and 250 million iterations returned good ESS values. Nothing in the outputs of any run led me to believe that one run was inferior to the others.