GENOMIC GYMNASTICS AND THE NATURAL ORIGIN OF SARS-CoV-2
William R. Gallaher, Ph.D.
In SARS-CoV-2, reference strain Hu-2, the nucleotide sequence, including the out-of-frame “12 nucleotide insert” encoding the furin site, is:
23582 tat cag act cag act aat t/ct cct cgg cgg g\ca cgt agt
encoding Y Q T Q T N S P R R A R S
The original form was presumably derived from a divergent relative of Bat RaTG13, specifically:
23582 tat cag act cag act aat tca cgt agt
encoding Y Q T Q T N S R S
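The length, frame, and coding consequence of the marked insert can be checked with a few lines of Python. This is only a minimal sketch: the codon table is deliberately limited to the codons that occur in these two fragments, and the helper names (`translate`, `split_marked`) are illustrative, not taken from any library.

```python
# Partial codon table: only the codons present in the two fragments above.
CODONS = {
    "tat": "Y", "cag": "Q", "act": "T", "aat": "N",
    "tct": "S", "tca": "S", "agt": "S",
    "cct": "P", "cgg": "R", "cgt": "R", "gca": "A",
}

def translate(nt):
    # Translate an in-frame nucleotide string, three bases at a time.
    return "".join(CODONS[nt[i:i + 3]] for i in range(0, len(nt), 3))

def split_marked(marked):
    # Split a fragment annotated with "/" (insert start) and a backslash (insert end).
    flat = marked.replace(" ", "")
    prefix, rest = flat.split("/")
    insert, suffix = rest.split("\\")
    return prefix, insert, suffix

hu2_marked = r"tat cag act cag act aat t/ct cct cgg cgg g\ca cgt agt"
precursor = "tat cag act cag act aat tca cgt agt".replace(" ", "")

prefix, insert, suffix = split_marked(hu2_marked)
hu2 = prefix + insert + suffix

print(len(insert))                  # 12
print(len(prefix) % 3)              # 1: the insert begins after the first base of a codon
print(translate(hu2))               # YQTQTNSPRRARS
print(translate(precursor))         # YQTQTNSRS
print(prefix + suffix == precursor) # True: removing the insert restores the shorter form
```

A non-zero remainder for `len(prefix) % 3` is what "out-of-frame" means here: the insert splits the tca codon rather than falling between two codons.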
The redundant breakpoint oligonucleotides CAGAC encoding QTQT (1) are key to this and all subsequent changes in this region over the last 16 months, creating a “hot spot” for genomic gymnastics at the S1/S2 interface of the spike protein.
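That CAGAC occurs twice in this short stretch can be confirmed by a simple motif scan. The sketch below assumes that the coordinate 23582 given above refers to the first base of the tat codon; the variable names are illustrative.

```python
import re

# Hu-2 fragment from above, written without spaces or insert markers.
segment = "tatcagactcagactaattctcctcggcgggcacgtagt"
start_coordinate = 23582  # assumed to be the position of the first "t" of "tat"

# A zero-width lookahead reports every occurrence, including overlapping ones.
hits = [m.start() for m in re.finditer("(?=cagac)", segment)]

print(hits)                                  # [3, 9]
print([start_coordinate + h for h in hits])  # [23585, 23591]
```

Both copies lie within cag act cag act, the codons for QTQT, immediately upstream of the insert.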
I previously proposed that the bulk of the insert came from a downstream region of S in Bat CoV HKU9 that shares 10 identical nucleotides with the last part of the insert, leaving the first dinucleotide CT orphaned and unexplained (see post 3 of 23 in this thread, from May 2020).
We have since seen, in the noncoding interface between ORF8 and N within the B lineage, additional evidence that the SARS-CoV-2 replicase is capable, even within the human population, of producing direct tandem repeats. This occurs just after the known splice acceptor breakpoint sequence ACGAAC. To wit:
Hu-2 2019      28260 acgaacaaa        ct aaaatgtctg
Michigan 2021        acgaacaaa \caaa/ ct aaaatgtctg
(GenBank MZO57968)
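Whether an insertion such as this one is a direct tandem repeat can be tested mechanically. The following is a minimal sketch, assuming the two sequences differ by exactly one insertion; the function names are hypothetical. Trimming the shared prefix first places the insertion as far to the right as the alignment allows, so the repeat unit may be reported as a rotation of the one marked above (aaac rather than caaa), which describes the same event.

```python
def find_insertion(ref, var):
    # Locate a single insertion in var relative to ref by trimming shared ends.
    p = 0
    while p < len(ref) and ref[p] == var[p]:
        p += 1
    s = 0
    while s < len(ref) - p and ref[-1 - s] == var[-1 - s]:
        s += 1
    if p + s != len(ref):
        raise ValueError("sequences differ by more than one simple insertion")
    return p, var[p:len(var) - s]

def is_tandem_repeat(var, pos, inserted):
    # True if the inserted unit duplicates the bases immediately upstream of it.
    k = len(inserted)
    return pos >= k and var[pos - k:pos] == inserted

hu2_junction = "acgaacaaactaaaatgtctg"
michigan_junction = "acgaacaaacaaactaaaatgtctg"

pos, unit = find_insertion(hu2_junction, michigan_junction)
print(pos, unit)                                       # 10 aaac
print(is_tandem_repeat(michigan_junction, pos, unit))  # True
```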
This reinforces the two locations, comparing SARS-CoV-2 and Bat RaTG13, where a direct tandem repeat of three nucleotides in SARS-CoV-2 follows a CAGAC breakpoint location (1).
So…
I now propose that the intermediate sequence in the insert involved only NINE identical nucleotides from the same region of HKU9 downstream in S encoding TSAG, but inserted here in a different frame, to yield the recombinant:
23582 tat cag act cag act aat t/ct cgg cgg g\ca cgt agt
Y Q T Q T N S R R A R S
While this insert created the furin site, the site would be less accessible and less efficient without an additional amino acid in the peptide loop, particularly if that missing amino acid were proline, which introduces a kink in the otherwise freely rotating peptide chain.
The KEY BREAKTHROUGH MUTATION fully enabling the furin site, and producing a SARS-CoV-2 with higher pathogenicity and transmissibility, would be the next step: a direct tandem repeat of ctc just downstream of the redundant CAGAC breakpoint sequence, finally yielding what was seen in the early clinical isolates of SARS-CoV-2 in Wuhan. The key was what I would call “the missing kink” in the direct precursor to the pandemic version of the virus.
That this region of sequence is a “hot spot” for mutation has been amply demonstrated by the multiple nucleotide and amino acid substitutions that have subsequently appeared independently in multiple sub-lineages of the virus while circulating in the human population.
Thus the 12 nucleotide insert occurred in TWO stages: a nine nucleotide recombinant followed later by a three nucleotide direct repeat. Each stage, like the sequential mutational events at this same site, has thorough precedent in the genomic gymnastics of the coronavirus replicase, as demonstrated by the known genetic rearrangements found among bat coronaviruses and even in SARS-CoV-2 while circulating in the human population.
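The two stages can be replayed on the short fragments given earlier to confirm that they reproduce the Hu-2 sequence exactly. This is a sketch under stated assumptions: the offset 19 is the position of the "/" marker within the fragment (not a genome coordinate), the codon table covers only the codons that occur here, and the helper names are illustrative.

```python
# Partial codon table: only the codons present in these fragments.
CODONS = {
    "tat": "Y", "cag": "Q", "act": "T", "aat": "N",
    "tct": "S", "tca": "S", "agt": "S",
    "cct": "P", "cgg": "R", "cgt": "R", "gca": "A",
}

def translate(nt):
    # Translate an in-frame nucleotide string, three bases at a time.
    return "".join(CODONS[nt[i:i + 3]] for i in range(0, len(nt), 3))

def insert_at(seq, pos, fragment):
    # Stage 1: place a foreign fragment at a fixed offset.
    return seq[:pos] + fragment + seq[pos:]

def duplicate_in_tandem(seq, start, length):
    # Stage 2: copy seq[start:start+length] directly after itself.
    end = start + length
    return seq[:end] + seq[start:end] + seq[end:]

precursor = "tatcagactcagactaattcacgtagt"
hu2 = "tatcagactcagactaattctcctcggcgggcacgtagt"

stage1 = insert_at(precursor, 19, "ctcggcggg")  # nine nucleotide recombinant
stage2 = duplicate_in_tandem(stage1, 19, 3)     # direct repeat of ctc

print(translate(precursor))  # YQTQTNSRS
print(translate(stage1))     # YQTQTNSRRARS
print(translate(stage2))     # YQTQTNSPRRARS
print(stage2 == hu2)         # True
```

The second step adds the cct codon, that is, the proline at the start of PRRAR.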
Combined with the earlier work of Boni et al. (2) and my own earlier work (1), this scenario fully accounts for a natural origin for every single nucleotide in the SARS-CoV-2 genome, as well as for a breakthrough mutation that was the last step in enabling the pandemic potential of the virus.
1. Gallaher WR. A palindromic RNA sequence as a common breakpoint contributor to copy-choice recombination in SARS-COV-2. Arch Virol. 2020 Oct;165(10):2341-2348. doi: 10.1007/s00705-020-04750-z. Epub 2020 Jul 31. PMID: 32737584; PMCID: PMC7394270.
2. Boni MF, Lemey P, Jiang X, Lam TT, Perry BW, Castoe TA, Rambaut A, Robertson DL. Evolutionary origins of the SARS-CoV-2 sarbecovirus lineage responsible for the COVID-19 pandemic. Nat Microbiol. 2020 Nov;5(11):1408-1417. doi: 10.1038/s41564-020-0771-4. Epub 2020 Jul 28. PMID: 32724171.