I have posted a copy of my preprint entitled "A Palindromic RNA Sequence as Common Breakpoint Contributor to Copy-choice Recombination in SARS-CoV-2" to the ResearchSquare preprint site, as an interim measure while Archives of Virology finishes processing the paper for online publication.
Readers should be aware that I have already assigned copyright to the publisher, effective once the online version appears.
Bill Gallaher

This article has now appeared online in Archives of Virology.

A palindromic RNA sequence as a common breakpoint contributor to copy-choice recombination in SARS-COV-2
Archives of Virology, (), 1-8

Bill Gallaher

Has the RRAR Phoenix arisen? https://science.sciencemag.org/content/early/2020/10/19/science.abd3072.full


William R. Gallaher, Ph.D.


In SARS-CoV-2, reference strain Hu-2, the nucleotide sequence, including the out-of-frame “12 nucleotide insert” encoding the furin site, is:

23582 tat cag act cag act aat t/ct cct cgg cgg g\ca cgt agt
encoding Y Q T Q T N S P R R A R S

The original form was presumably derived from a divergent relative of Bat RaTG13, specifically:

23582 tat cag act cag act aat tca cgt agt
encoding Y Q T Q T N S R S
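
The two translations above can be checked mechanically. The following is a minimal Python sketch, assuming the standard genetic code and treating the two displayed fragments as plain strings read in frame from the first base shown. The names `clean`, `translate`, `find_insert`, `HU_FRAGMENT` and `PRECURSOR_FRAGMENT` are illustrative only and do not come from any library.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# Fragments as displayed in the post, starting at position 23582.
HU_FRAGMENT = "tat cag act cag act aat tct cct cgg cgg gca cgt agt"
PRECURSOR_FRAGMENT = "tat cag act cag act aat tca cgt agt"


def clean(seq):
    """Drop spacing and uppercase a displayed nucleotide fragment."""
    return "".join(seq.split()).upper()


def translate(seq):
    """Translate complete codons, reading from the first base shown."""
    seq = clean(seq)
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq) - 2, 3))


def find_insert(shorter, longer):
    """Return (offset, inserted bases), assuming `longer` is `shorter`
    plus exactly one contiguous insertion.

    The shared suffix is matched first, so when the placement is ambiguous
    the insert is reported at its leftmost possible position.
    """
    shorter, longer = clean(shorter), clean(longer)
    suffix = 0
    while suffix < len(shorter) and shorter[-1 - suffix] == longer[-1 - suffix]:
        suffix += 1
    prefix = 0
    while prefix < len(shorter) - suffix and shorter[prefix] == longer[prefix]:
        prefix += 1
    return prefix, longer[prefix:len(longer) - suffix]


if __name__ == "__main__":
    print(translate(HU_FRAGMENT))
    print(translate(PRECURSOR_FRAGMENT))
    print(find_insert(PRECURSOR_FRAGMENT, HU_FRAGMENT))
```

Because the base on either side of the insert is a `c`, the 12 nucleotides can be bracketed in two equivalent ways. Matching the suffix first reproduces the bracketing used in the display above.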

The redundant breakpoint oligonucleotides CAGAC encoding QTQT (1) are key to this and all subsequent changes in this region over the last 16 months, creating a “hot spot” for genomic gymnastics at the S1/S2 interface of the spike protein.

I previously proposed that the bulk of the insert came from a downstream region of S in Bat CoV HKU9, involving an identical 10 nucleotides to the last part of the insert, leaving the first dinucleotide CT still orphan and unexplained (see 3 of 23 posts in this thread, from May 2020).

We have since seen, in the noncoding interface between orf 8 and N within the B lineage, additional evidence that the SARS-CoV-2 replicase is capable, even within the human population, of producing direct tandem repeats. This occurs just after the known splice acceptor breakpoint sequence ACGAAC. To wit:

Hu-2 2019 28260 acgaacaaa ct aaaatgtctg

Michigan 2021 acgaacaaa \caaa/ ct aaaatgtctg

This reinforces the two locations, comparing SARS-CoV-2 and Bat RaTG13, where a direct tandem repeat of three nucleotides in SARS-CoV-2 follows a CAGAC breakpoint location (1).
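
Whether an insertion is a direct tandem repeat can be tested directly on the two orf 8/N fragments shown above. This is a hedged sketch rather than a published method: the function name `tandem_duplications` and the two constants are hypothetical, and the only assumption is that the longer sequence differs from the shorter one by a single insertion that copies the bases immediately upstream of it.

```python
# Fragments from the orf 8 / N interface as displayed in the post,
# with the bracket markers removed.
HU_2019 = "acgaacaaa ct aaaatgtctg"
MICHIGAN_2021 = "acgaacaaa caaa ct aaaatgtctg"


def clean(seq):
    """Drop spacing and uppercase a displayed nucleotide fragment."""
    return "".join(seq.split()).upper()


def tandem_duplications(before, after):
    """List every (offset, unit) such that copying the `unit` that ends at
    `offset` in `before`, and inserting the copy right there, gives `after`.

    An empty list means the length difference is not explained by a
    direct tandem repeat.
    """
    before, after = clean(before), clean(after)
    k = len(after) - len(before)
    hits = []
    if k <= 0:
        return hits
    for offset in range(k, len(before) + 1):
        unit = before[offset - k:offset]
        if before[:offset] + unit + before[offset:] == after:
            hits.append((offset, unit))
    return hits


if __name__ == "__main__":
    for offset, unit in tandem_duplications(HU_2019, MICHIGAN_2021):
        print(offset, unit)
```

More than one placement is returned because a tandem repeat can be bracketed at any rotation of its unit. The `caaa` bracketing shown in the Michigan sequence above is one of these equivalent placements.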


I now propose that the intermediate sequence in the insert involved only NINE identical nucleotides from the same region of HKU9 downstream in S encoding TSAG, but inserted here in a different frame, to yield the recombinant:

23582 tat cag act cag act aat t/ct cgg cgg g\ca cgt agt

While this insert created the furin site, the site would be less accessible and inefficiently cleaved without an additional amino acid in the peptide loop, particularly if that missing amino acid were proline, which introduces a kink in the otherwise freely rotating peptide chain.

The KEY BREAKTHROUGH MUTATION fully enabling the furin site, and producing a SARS-CoV-2 with higher pathogenicity and transmissibility, would be the next step – a direct tandem repeat of ctc just downstream of the redundant CAGAC breakpoint sequence - finally yielding what was seen in the early clinical isolates of SARS-CoV-2 in Wuhan. The key was what I would call “the missing kink” in the direct precursor to the pandemic version of the virus.

That this region of sequence is a “hot spot” for mutation has been amply demonstrated by the multiple nucleotide and amino acid substitutions that have subsequently appeared independently in multiple sub-lineages of the virus while circulating in the human population.

Thus the 12 nucleotide insert occurred in TWO stages, a nine nucleotide recombinant followed later by a three nucleotide direct repeat. Each stage has thorough precedent in the genomic gymnastics of the coronavirus replicase, as well as sequential mutational events at this same site, as demonstrated in the known genetic rearrangements found among bat coronaviruses and even in SARS-CoV-2 while circulating in the human population.
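
The two stages can be replayed on the fragments given in this post to confirm that they reproduce the Hu-2 sequence exactly. This is a minimal sketch under stated assumptions: the standard genetic code, 0-based offsets counted from the first base of each displayed fragment, and illustrative helper names (`insert_at`, `duplicate_at`, `translate`) that are not part of any library. It checks the arithmetic of the proposal only and says nothing about the mechanism itself.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

PRECURSOR = "TATCAGACTCAGACTAATTCACGTAGT"            # RaTG13-like form
HU = "TATCAGACTCAGACTAATTCTCCTCGGCGGGCACGTAGT"       # early Wuhan isolates
NINE_NT = "CTCGGCGGG"                                # proposed HKU9-derived bases


def translate(seq):
    """Translate complete codons, reading from the first base."""
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq) - 2, 3))


def insert_at(seq, offset, fragment):
    """Stage 1: place `fragment` in front of position `offset`."""
    return seq[:offset] + fragment + seq[offset:]


def duplicate_at(seq, start, length):
    """Stage 2: direct tandem repeat of seq[start:start + length]."""
    end = start + length
    return seq[:end] + seq[start:end] + seq[end:]


# Stage 1: nine-nucleotide recombinant, inserted after "...aat t".
stage1 = insert_at(PRECURSOR, 19, NINE_NT)
# Stage 2: tandem repeat of the "ctc" that now starts at offset 19.
stage2 = duplicate_at(stage1, 19, 3)

if __name__ == "__main__":
    print(stage1, translate(stage1))
    print(stage2, translate(stage2))
    print(stage2 == HU)
```

In this replay the first stage translates to Y Q T Q T N S R R A R S, with the basic residues present but no proline, and the second stage adds the single proline codon that completes the sequence seen in Hu-2.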

Combined with the earlier work of Boni et al. (2) and my own earlier work (1), this scenario fully accounts for a natural origin for every single nucleotide in the SARS-CoV-2 genome, as well as for a breakthrough mutation that was the last step in enabling the pandemic potential of the virus.

  1. Gallaher WR. A palindromic RNA sequence as a common breakpoint contributor to copy-choice recombination in SARS-COV-2. Arch Virol. 2020 Oct;165(10):2341-2348. doi: 10.1007/s00705-020-04750-z. Epub 2020 Jul 31. PMID: 32737584; PMCID: PMC7394270.

  2. Boni MF, Lemey P, Jiang X, Lam TT, Perry BW, Castoe TA, Rambaut A, Robertson DL. Evolutionary origins of the SARS-CoV-2 sarbecovirus lineage responsible for the COVID-19 pandemic. Nat Microbiol. 2020 Nov;5(11):1408-1417. doi: 10.1038/s41564-020-0771-4. Epub 2020 Jul 28. PMID: 32724171.

If we assume nefarious purposes, pretty much anything can be explained away if we assume they are competent and want to cover their tracks.
However, if this was made AND released for nefarious purposes AND they wanted to make it less traceable AND it was linked to research in China, THEN it makes no sense that it would be released on the doorstep of the WIV.
Either this was natural, it was an accident, or it was an attempt to frame the WIV.
I only entertain the natural or accidental scenarios, not a nefarious scenario.

We already have a fairly good idea of the exact cave that the sample came from. Lots of people agree with your proposal and would love to go sample there. The Chinese government does not seem to agree with the proposal, however.

This statement, as written, is true. The only scientists with authority are those whose statements are backed, not just by their reputation, but also by data.

In science, jurisprudence, and gathering of intelligence, the coin of the realm is EVIDENCE. Cold, hard facts and data, meticulously analyzed, whether by epidemiological, molecular or analytical means. This thread, dating to 6 Feb 2020, has been a stream of such data and analysis as the bedrock for hypothesis and conclusions. The first post was indeed the first public counterargument, based on sequence analysis of SARS-CoV-2 vs. RaTG13, decrying the lab origin from RaTG13 as bogus. Since that time, no data at all have emerged to link the COVID pandemic to any known virus at any virology laboratory, period.

I would note that when West Nile virus was detected in Flushing, Queens, in NYC, no one raced to the Rockefeller University on the other side of the East River, to hurl accusations at the virologists there. Proximity across the river is not evidence of complicity, not in 1999 and not in 2019.

In this it should be noted that intelligence services have extraordinary means of collecting data from digital devices covertly, analyzing surveillance records that are incredibly detailed, and evaluating chatter also by covert means. They have uncovered nothing of value, or surely we would have heard of it. No alarms, no scurrying, no suspicious activity at the WIV. Just confirmation that the Chinese were caught as flat-footed as anyone by the emergence of COVID in their midst, going into high gear only after what turned out to be the two sub-species of the novel virus had already been found in and around the market neighborhoods.

Beginning with Louis IX in France in the 13th century, the presumption of innocence has been the bedrock of criminal law – and the conjecture of accidental release concerns nothing less than negligent homicide on a mass scale. Presumption of innocence is a universally applicable principle that is violated whenever surmise, speculation, suspicion or prejudicial conjecture is substituted for gathering and presentation of evidence. Regardless of past accomplishments of scientists who engage in such fact-free banter, one’s authority is always based on one’s evidence. Where there is no evidence, there is no authority.

To summarize the case for natural origin and natural emergence:

Holmes et al. (1) have recently laid out a thorough evidentiary case for a natural origin of COVID, including most especially detailed epidemiological data that excludes the WIV as an epicenter of the Wuhan outbreak.

We know that coronaviruses have all of the capabilities necessary to produce SARS-CoV-2, as it emerged, by entirely natural means.

By Spring of 2020, with my discovery of the CAGACTCAGACT direct repeat immediately 5’ to the furin insert, it has been clear that the nature and sequence of the RNA at the S1/S2 junction is unusually permissive of inserts, deletions and mutations at that site, as an inherent hypervariable property of the RNA genome in that region. In post 24 above, I provide a specific nucleotide by nucleotide hypothesis of how the furin insert was generated at this site from another bat coronavirus, HKU9, whose Rousettus host range extensively overlaps that of Rhinolophus affinis, the suspected bat source of SARS-CoV-2. This is as close to witnessing the insertional event as we are ever likely to get, in terms of how and why that insert appeared where it did. It is simply what coronaviruses are wont to do.

As a matter of simple fact, a sequence that was publicly described as of suspicious origin, launching conspiracy banter, was instead found to be bona fide bat coronavirus RNA by May 2, 2020, from the same geographic range but outside of the Sarbecovirus subfamily. Elapsed time from first appearance of the SARS-CoV-2 sequence? Less than 80 days to find it and post it here.

Copy-choice recombination at non-random locations in coronaviruses has been documented since the late 1980s. Its capability to even insert a large block of RNA from an exogenous source was documented by 1991 (2), when it was shown that the bulk of a Hemagglutinin-Esterase gene, 1.5 kb, was inserted into the common ancestor of several coronaviruses from a common ancestor of that insert and the corresponding protein in influenza C viruses. Copy-choice recombination has been documented in the history of SARS-CoV-2 and has in fact occurred during the human pandemic several times. Natural substitution of the RBD from another virus, or insertion of the furin site, at sites of preferred genetic variability, has thorough background evidentiary support. Likewise, RNA viruses in general, and coronaviruses in particular, have been shown to create tandem repeats or duplications at some distance from the original sequence, similar to the length polymorphisms detected between SARS-CoV-2 and RaTG13, as well as within SARS-CoV-2 during the human pandemic. SARS-CoV-2 has even duplicated the ACGAAC transcriptional regulatory sequence 600 nt downstream in the N gene, effectively creating a new gene, to a region of partial identity at the end of the SR-rich region, during replication in humans while still in China, right under our sequencing noses. Literally every nucleotide of SARS-CoV-2 can be accounted for, within bat coronavirus species with natural ranges that thoroughly overlap in southern China, by these natural mechanisms of genetic variation, recombination and genomic gymnastics. These have been explained in the above posts and other threads here on Virological or in the published literature.

When some other investigators can specify the who, what, when, where and how, by specific evidentiary means, for an accidental laboratory release, let them do so. Till then, the only authority is the data, and that points solely to a natural evolution of the pandemic virus, that then emerged as countless viruses have in the past, in the long progression of viral infection around the globe before there was a single virology lab anywhere.

As for the specific arrival of SARS-CoV-2 in Wuhan, viruses have been professional hitchhikers for millennia. Every place on earth can now be reached from any other place in less time than the incubation period of any virus in humans. The 1500 km from Yunnan to Wuhan is nothing to them; West Nile hopped much further from the Old to the New World, along a well-worn path of many viruses before it. We have lived in the pandemic age for hundreds of years, and this will not be the last virus to emerge from an only seemingly isolated habitat. Emergence happens.

William R. Gallaher Ph.D.

  1. Holmes EC, Goldstein SA, Rasmussen AL, Robertson DL, Crits-Christoph A, Wertheim JO, Anthony SJ, Barclay WS, Boni MF, Doherty PC, Farrar J, Geoghegan JL, Jiang X, Leibowitz JL, Neil SJD, Skern T, Weiss SR, Worobey M, Andersen KG, Garry RF, Rambaut A. The origins of SARS-CoV-2: A critical review. Cell. 2021 Aug 19:S0092-8674(21)00991-0. doi: 10.1016/j.cell.2021.08.017. Epub ahead of print. PMID: 34480864; PMCID: PMC8373617.
  2. Zhang XM, Kousoulas KG, Storz J. The hemagglutinin/esterase glycoprotein of bovine coronaviruses: sequence and functional comparisons between virulent and avirulent strains. Virology. 1991 Dec;185(2):847-52. doi: 10.1016/0042-6822(91)90557-r. PMID: 1962455; PMCID: PMC7131179.