I have been privately dealing with rumors and inquiries, focused on the potential RRAR furin cleavage site, suggesting that nCoV2019 may have a suspicious origin: an engineered, laboratory-generated virus released, accidentally or deliberately, in the area of the Wuhan seafood and animal market. The publication about a week ago of the highly similar RaTG13 sequence has fueled this type of speculation.
As I have told people privately, I see no evidence at all to support such a claim. On the contrary, having studied the question in detail using the RaTG13 and Wuhan sequences at the S1/S2 boundary, I find convincing proof of exactly the opposite conclusion: that RaTG13 could NOT be a proximal source of the Wuhan virus.
At first glance at an alignment of the two S protein sequences, it is natural to consider whether the insertion was engineered. On either side of the new furin site the amino acid sequences are identical from aa614 to aa1133: an apparent insert of PRRA is the only difference in an otherwise 100% conserved 519-amino-acid region.
But that is only at first glance.
Consider, first, that PRRA is an unusual sequence to introduce in order to generate a furin site; others, even among coronaviruses such as MHV A59, are much better. Consider also that the underlying coding sequence, CCTCGGCGGGCA, introduces an unnecessarily G- and C-rich region where none otherwise exists. Neither is a choice a gene jockey would be likely to make.
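Both properties of the insert named above, that it encodes PRRA and that it is G/C-rich, can be checked from the 12-nt sequence alone. A minimal Python sketch follows; the helpers `translate` and `gc_content` are illustrative names rather than a library API, and because the flanking sequence is not quoted here, the sketch does not compare the insert against the surrounding region.

```python
# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = dict(zip(CODONS, AMINO_ACIDS))


def translate(seq: str) -> str:
    """Translate complete codons, reading from the first base of `seq`."""
    seq = seq.upper().replace("U", "T")  # accept RNA or cDNA letters
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq) - 2, 3))


def gc_content(seq: str) -> float:
    """Fraction of bases that are G or C."""
    seq = seq.upper()
    return sum(base in "GC" for base in seq) / len(seq)


insert = "CCTCGGCGGGCA"
print(translate(insert))            # PRRA
print(f"{gc_content(insert):.1%}")  # 83.3%
```

Ten of the twelve bases are G or C; only one T and one A break the run.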
Next, look at the actual RNA alignment. The “insert” is not actually in frame: it is CTCCTCGGCGGG, which is offset two bases (-2) from the reading frame. Again, who does that?
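The -2 offset follows from the two 12-nt readings quoted in the text: they overlap by 10 bases, so both are windows on a single 14-nt stretch. The sketch below (the helper `merge_overlapping` is hypothetical) recovers that stretch from the two quoted strings and measures how far apart the windows start, assuming CCTCGGCGGGCA, the reading that translates to PRRA, is the codon-aligned one.

```python
def merge_overlapping(left: str, right: str) -> str:
    """Join two strings on the longest suffix of `left` that is a prefix of `right`."""
    for k in range(min(len(left), len(right)), 0, -1):
        if left.endswith(right[:k]):
            return left + right[k:]
    return left + right  # no overlap found


in_frame = "CCTCGGCGGGCA"  # codon-aligned reading: CCT CGG CGG GCA = P R R A
reported = "CTCCTCGGCGGG"  # the 12-nt "insert" as it falls in the RNA alignment

context = merge_overlapping(reported, in_frame)
shift = context.index(reported) - context.index(in_frame)
print(context, shift)  # CTCCTCGGCGGGCA -2
```

A shift that is not a multiple of 3 means the block, as placed in the alignment, does not begin on a codon boundary.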
But the PROOF lies in the 288 alignable nucleotides on either side of the “insert”. Although they encode identical protein sequence, the RNA is not identical at all: it differs by 6.6%, with 19 mutations out of 288. All 19 fall in the wobble (third) base of their respective codons. There are so many that the reading frame can be inferred from the 2/1 pattern (two conserved bases, then one variable base) without knowing where the coding sequence begins or ends, or even that the encoded protein is identical; those facts are self-evident from the RNA itself.
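The frame-inference argument can be made concrete. Assuming two gap-free, equal-length aligned coding fragments, the sketch below lists the positions where they differ, finds the single codon frame (if any) in which every difference falls on a third base, and then translates both fragments in that frame to confirm the proteins match. The sequences are a hypothetical toy, not the actual 288-nt alignment, and the helper names are illustrative.

```python
from typing import List, Optional

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
CODON_TABLE = dict(zip(
    (a + b + c for a in BASES for b in BASES for c in BASES),
    "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG",
))


def translate(seq: str) -> str:
    """Translate complete codons, reading from the first base of `seq`."""
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq) - 2, 3))


def mismatch_positions(a: str, b: str) -> List[int]:
    """0-based indices at which two gap-free, equal-length alignments differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    return [i for i, (x, y) in enumerate(zip(a, b)) if x != y]


def infer_frame(positions: List[int]) -> Optional[int]:
    """Index (0, 1 or 2) where the first full codon starts if every difference
    sits on a third (wobble) codon position; None if no single frame fits."""
    for offset in range(3):
        if all((i - offset) % 3 == 2 for i in positions):
            return offset
    return None


# Hypothetical toy fragments (NOT real nCoV2019/RaTG13 data): synonymous
# third-base changes, with the first base trimmed so the frame is not obvious.
seq_a = "ATGGCTAAAGGTCTTGAA"[1:]
seq_b = "ATGGCCAAGGGACTCGAG"[1:]

diffs = mismatch_positions(seq_a, seq_b)
frame = infer_frame(diffs)
print(diffs)  # [4, 7, 10, 13, 16]
print(frame)  # 2
print(translate(seq_a[frame:]), translate(seq_b[frame:]))  # AKGLE AKGLE
```

Applied to a real alignment, `len(diffs) / len(seq_a)` gives the divergence fraction, the figure the text reports as 19/288.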
We know from influenza H1N1, for which we have serial isolates from 1918 to the present, that wobble-base mutagenesis occurs at a rate of 0.95% per decade. Dividing the 6.6% wobble-base divergence by that rate gives about 6.95 decades, which places the time to the most recent common ancestor (TMRCA) of nCoV2019 and RaTG13 about 69.5 years ago: roughly 1950, give or take 10 years.
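The estimate in this paragraph is a single division: observed wobble-base divergence over the quoted per-decade rate. The sketch below reproduces that arithmetic under the same assumption of a constant, linear rate; `reference_year = 2020` is an assumption about when the estimate is made, not a figure from the text.

```python
# Inputs quoted in the text.
wobble_differences = 19   # wobble-base mismatches
aligned_nt = 288          # alignable nucleotides flanking the "insert"
rate_per_decade = 0.0095  # 0.95% wobble-base change per decade (H1N1 serial isolates)
reference_year = 2020     # assumption: year in which the estimate is made

# Divergence rounded to 6.6% as in the text (unrounded 19/288 gives ~69.4 years).
divergence = round(wobble_differences / aligned_nt, 3)

# Constant linear clock: decades = divergence / rate, then convert to years.
years_back = divergence / rate_per_decade * 10
estimated_year = int(round(reference_year - years_back, -1))  # nearest decade

print(f"{divergence:.1%} divergence -> TMRCA ~{years_back:.1f} years ago, ~{estimated_year}")
# 6.6% divergence -> TMRCA ~69.5 years ago, ~1950
```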
RaTG13, or anything nearly identical to it at the RNA level, simply could not be a proximal source of nCoV2019. It just LOOKS like it might be…at first glance.
Because furin cleavage signals are present in other coronaviruses at exactly that point in the S1/S2 boundary region, the site only LOOKS unusual, especially against the backdrop of SARS. The preponderance of evidence, together with Ockham’s razor (the principle that the simplest explanation is preferred), dictates that the PRRA sequence has been conserved in nCoV2019 from a long-ago ancestor virus. It is not of suspicious origin. The closest bat virus sequence is really not close at all.
RNA don’t lie.