Divergence of nCoV-2019 to closest non-human relative

To date, the closest described relative to the nCoV-2019 novel coronavirus in a non-human host is a bat SARSr-CoV called RaTG13 (genbank accession MN996532, Zhou et al (2020) [1]) sampled from a Rhinolophus affinis bat in Yunnan Province in 2013.

Although RaTG13 has 96.1% identity with nCoV-2019, at the rate that coronaviruses evolve this represents significant evolutionary time. The divergence time can be estimated using BEAST [3,4] by assuming a rate of evolution. Here I show estimates for rates of 1e-3 and 5e-4 substitutions per site per year. The former rate seems to be close to that observed in the human outbreak and the latter is closer to rates estimated for some other coronaviruses.

Figure 1 | The divergence time estimate between nCoV-2019 and RaTG13 SARSr-CoV. Bars represent the 95% HPD credible intervals, lines represent the complete range of sampled values. Analysis in BEAST v1.10.4 with GTR+G8 substitution model, strict molecular clock, constant size population coalescent prior.

With the faster rate this analysis suggests the human nCoV-2019 and the bat SARSr-CoV last shared a common ancestor around the end of 1992 and likely before mid-1997 (the upper 95% credible interval). For the slower rate this estimate is proportionally older — likely before 1978.

Note: this is an estimate of when the ancestors of RaTG13 and nCoV-2019 were in the same host individual (very likely a bat). It is certainly not an estimate of when the virus jumped into humans (that was likely very shortly before the earliest recorded cases in December 2019; see this post).

Recently it has been announced that a coronavirus recovered from a pangolin may have 99% identity to nCoV-2019, which would make it a far closer relative. Assuming the faster rate of evolution, this would imply a most recent common ancestor approximately 6 years ago (2014).
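As a rough cross-check on how an identity percentage and an assumed rate translate into a divergence time, here is a minimal Python sketch. It is not the BEAST analysis: it assumes a simple Jukes-Cantor correction, a strict clock, and a sampling date of 2020.0, and the function names are made up for illustration.

```python
import math

SAMPLING_YEAR = 2020.0  # assumption: approximate sampling date of the human genome


def jc69_distance(identity):
    """Jukes-Cantor corrected distance (substitutions per site) from fractional identity."""
    p = 1.0 - identity  # observed proportion of differing sites
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def tmrca_years(identity, rate):
    """Years back to the common ancestor under a strict clock.

    Both lineages accumulate substitutions since the split,
    so distance = 2 * rate * time.
    """
    return jc69_distance(identity) / (2.0 * rate)


# Identities and rates quoted in the post
for label, identity in [("RaTG13", 0.961), ("pangolin virus", 0.99)]:
    for rate in (1e-3, 5e-4):
        t = tmrca_years(identity, rate)
        print(f"{label}: rate={rate:g} -> {t:.1f} years (about {SAMPLING_YEAR - t:.0f})")
```

At the faster rate this crude calculation gives roughly 5 years for 99% identity, in line with the approximate figure above, and roughly 20 years for RaTG13. The latter is younger than the BEAST estimate, plausibly because the GTR+G model infers more hidden substitutions than this simple correction does.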

Addendum — as @R_H_Ebright pointed out on Twitter the exact phrase used was “as high as 99%”.

| virus | isolate | host | accession | reference |
|---|---|---|---|---|
| SARSr-CoV | RaTG13 | Rhinolophus affinis | MN996532 | Zhou et al (2020) [1] |
| nCoV-2019 | Wuhan-Hu-1 | human | NC_045512 | Wu et al (2020) [2] |

Table 1 | Virus genome sequences used in this analysis.


  1. Zhou, P., Yang, X., Wang, X. et al. A pneumonia outbreak associated with a new coronavirus of probable bat origin. Nature (2020).
  2. Wu, F., Zhao, S., Yu, B. et al. A new coronavirus associated with human respiratory disease in China. Nature (2020).
  3. Drummond AJ, Suchard MA, Xie D, Rambaut A. Bayesian Phylogenetics with BEAUti and the BEAST 1.7. Mol Biol Evol (2012) 29: 1969–1973.
  4. Suchard MA, Lemey P, Baele G, Ayres DL, Drummond AJ, Rambaut A. Bayesian phylogenetic and phylodynamic data integration using BEAST 1.10. Virus Evol (2018) 4: vey016.

Following on from the recombination analyses discussed in a different thread, specifically the analyses by David Robertson and Maciek Boni, we performed similar dating analyses using a full genome alignment where recombination was masked and an alignment that concatenates segments with consistent phylogenetic signal (11.5 kb). As calibration we used rate distributions based on OC43 (0.00024 [0.00019,0.00029] subst./site/yr) and MERS (0.00078 [0.00063,0.00092] subst./site/yr).

Using the OC43 rate, we arrive at a mean estimate for the 2019-nCoV/RaTG13 split of about 1925-1930 for both data sets, whereas the MERS rate results in a younger estimate (a mean age of about 1988). The differences from Andrew’s estimates and the variability of all these estimates appear to be almost entirely due to the choice of rates, raising the question of which rate estimates are more appropriate here.
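To illustrate how strongly the date depends on the assumed rate, the sketch below rescales one reference estimate to other rates. It assumes the split age is simply inversely proportional to the clock rate, which ignores priors, rate uncertainty, and differences between alignments. The reference year of 1927.5 (midpoint of the 1925-1930 range), the sampling year of 2020.0, and the function name are assumptions made for illustration.

```python
SAMPLING_YEAR = 2020.0  # assumption: approximate sampling date


def rescale_split_year(ref_year, ref_rate, new_rate, sampling_year=SAMPLING_YEAR):
    """Rescale a split date to a different clock rate.

    Assumes age * rate (the branch length in substitutions per site)
    stays fixed, so the age scales as ref_rate / new_rate.
    """
    ref_age = sampling_year - ref_year
    return sampling_year - ref_age * ref_rate / new_rate


# Reference point: OC43-based mean rate and the midpoint of the 1925-1930 estimate
REF_YEAR, REF_RATE = 1927.5, 0.00024

# Mean rates mentioned in this thread (subst./site/yr)
rates = {"OC43": 0.00024, "5e-4": 5e-4, "MERS": 0.00078, "1e-3": 1e-3}

for name, rate in rates.items():
    year = rescale_split_year(REF_YEAR, REF_RATE, rate)
    print(f"{name:>5}: rate={rate:g} -> split around {year:.0f}")
```

Under these assumptions the MERS rate maps to a split in the early 1990s, close to the reported mean of about 1988, which is consistent with the rate choice driving most of the difference between estimates.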


On Feb 6 I posted a much simpler method based on tracking the rate of wobble base mutagenesis in highly conserved regions. Different clocks and different assumptions obviously give different estimates, but we all agree that these “close” coronavirus relatives in bats are not really close at all: their TMRCAs go back decades. No smoking gun.

Bill Gallaher