Naturally occurring indels in multiple coronavirus spikes

Robert F. Garry1 and William R. Gallaher 2,3

1 Department of Microbiology and Immunology, Tulane University Medical Center, 1430 Tulane Avenue, New Orleans, Louisiana 70112, USA
2 Mockingbird Nature Research Group, PO Box 568, Pearl River, LA 70452
3 Emeritus Faculty, Department of Microbiology, Immunology & Parasitology, Louisiana State University Health Sciences Center, 1901 Perdido Street, New Orleans, Louisiana 70112, USA

Proponents of theories for the unnatural origin of severe acute respiratory syndrome coronavirus-2 (SARS-CoV-2) have asserted that the 12-nucleotide insert in the spike gene, which results in acquisition of a furin cleavage site in spike, may have arisen by laboratory manipulation (Relman, 2020; Segreto and Deigin, 2020; Seyran et al., 2020; Sirotkin and Sirotkin, 2020). Here, we compile evidence demonstrating that insertion/deletion (indel) events at the S1/S2 and S2’ protease cleavage sites of the spike precursors are commonly occurring natural features of coronavirus evolution. We also identify heretofore undescribed similarities in the S1/S2 and S2’ cleavage sites of multiple diverse coronavirus spikes that provide further evidence against a laboratory origin of SARS-CoV-2.

The Orthocoronavirinae includes four genera, Alpha-, Beta-, Gamma- and the newly described Deltacoronavirus (ICTV, 2020). The Betacoronavirus genus is further subdivided into six subgenera, three of which (Sarbecovirus, Merbecovirus and Embecovirus) include human pathogens. We performed an alignment of the S1/S2 and S2’ cleavage sites of representative alpha-, beta-, and gammacoronavirus spikes (Fig. 1). Accession numbers: SARS-CoV-2 YP_009724390, SARS-CoV AAP13441.1, RaTG13 QHR63300.2, RmYN02 EPI_ISL_412977, MERS-CoV AGG22542.1, HKU4 MH002339.1, HKU5 AGP04943.1, HKU1a ABD75561_1, HKU1b ABD96196_1, OC43 AIX10760.1, Bovine CoV CCE89341.1, HKU24 YP_009113025.1, MHV A59 ATN37896.1, MHV A59 fusion defective ACO72893.1, MHV-3 ACN89743.1, MHV-1 ACN89742.1, MHV ML11 AAF68923.1, MHV-S/3239-17 AFD97607.1, MHV-DVIM AAW47240.1, FIPV14F AIL54258.1, C1Je ABI14448.1, XXN QDM36987.1, KUK-H/L BAN67909.1, FIPV79 AAY32596.1, IBV Beaudette NP_040831.1, IBV SES_15Sk AZP23949.1, Avian CoV CV10 QIM61640.1, Avian CoV SCYB ANI21149.1. To facilitate the identification of insertions, we aligned conserved amino acids and other features that flank the cleavage sites, and included spikes from viruses that appear to be ancestral to the subgenera where known. O-linked glycosylation sites were predicted by Net-O-Glyc v. 4.0 (Steentoft et al., 2013).
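As a minimal sketch of how insertions can be read off once two spikes have been aligned on their conserved flanking residues, the following Python function scans a pair of gapped alignment rows and reports each run of residues that faces gaps in the reference row. The function name `find_insertions` and the two short alignment rows are illustrative assumptions; they are not the alignment of Fig. 1 and were not extracted from the listed accessions.

```python
from typing import List, Tuple


def find_insertions(ref_aln: str, query_aln: str) -> List[Tuple[int, str]]:
    """Report insertions in the query relative to the reference.

    Both arguments are rows of a pairwise alignment with '-' marking gaps.
    Each result is (n, residues), where n is the number of ungapped
    reference residues that precede the inserted residues.
    """
    if len(ref_aln) != len(query_aln):
        raise ValueError("aligned rows must have equal length")

    insertions: List[Tuple[int, str]] = []
    ref_pos = 0   # ungapped reference residues seen so far
    run = ""      # query residues currently facing reference gaps

    for r, q in zip(ref_aln, query_aln):
        if r == "-" and q == "-":
            continue  # column is empty in both rows; ignore it
        if r == "-":
            run += q  # query residue with no reference counterpart
            continue
        if run:
            insertions.append((ref_pos, run))
            run = ""
        ref_pos += 1

    if run:  # insertion at the very end of the alignment
        insertions.append((ref_pos, run))
    return insertions


# Hypothetical alignment rows for illustration only.
ref_row = "QTNS----RSV"
query_row = "QTNSPRRARSV"
insertions = find_insertions(ref_row, query_row)
print(insertions)
```

Deletions in the query can be found with the same function by swapping the two rows.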

Figure 1. Alignment of the S1/S2 and S2’ cleavage sites of representative alpha-, beta-, and gammacoronavirus spikes. Panel A: Betacoronaviruses. Panel B: Mouse hepatitis virus. Panel C: Alphacoronaviruses. Panel D: Gammacoronaviruses.

While recent analyses suggest the existence of as yet unsampled sarbecovirus lineages, it is likely that SARS-CoV-2, and recently described coronaviruses isolated from pangolins, share common ancestors with the bat sarbecovirus RaTG13 (Boni et al., 2020). As previously discussed (Andersen et al., 2020; Coutard et al., 2020; Gallaher, 2020a), alignment of the spike protein of SARS-CoV-2 with spike proteins of RaTG13 and other sarbecoviruses demonstrates that the 12-base insertion in the SARS-CoV-2 spike gene adds 4 amino acids (PRRA) at the S1/S2 junction and converts a monobasic cleavage site (R) to a minimal furin (polybasic) cleavage site (RRAR) (Fig. 1A). As previously described (Zhou et al., 2020), alignment of the spike of newly detected RmYN02 with other sarbecovirus spikes demonstrates that the S1/S2 junction of sarbecoviruses is variable. The neural network algorithm Net-O-Glyc predicts that the S1/S2 junction of SARS-CoV-2 spike contains 3 O-linked glycans.

The Merbecovirus subgenus of the Betacoronavirus genus includes bat coronaviruses Hong Kong University-4 (HKU4) and HKU5 as well as Middle East respiratory syndrome coronavirus (MERS-CoV), which infects camels and humans. Phylogenetic analyses place HKU4 at a basal position leading separately to the HKU5 and MERS-CoV lineages (Lau et al., 2013). Relative to the HKU4 spike, the HKU5 spike has an insertion of 3 amino acids (RFR) at the S1/S2 junction (Fig. 1A). This insertion generates an optimal furin cleavage site (RFRR). The S1/S2 junction of the HKU5 spike is predicted to contain two O-linked glycans. Relative to HKU4, the MERS-CoV spike displays an insertion of 6 amino acids (LTPRSV). The insertion in the MERS-CoV spike produces a minimal furin cleavage site (RSVR), albeit without predicted O-linked glycans.
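The distinction drawn here between monobasic, minimal furin and optimal furin sites can be sketched as a simple pattern check on the four residues preceding the cleaved bond (P4 to P1). The patterns below are assumptions that follow the commonly used consensus, R-X-X-R for a minimal site and R-X-(K/R)-R for an optimal site, and treating any other site that ends in arginine as monobasic; the text does not define them explicitly, and the function name `classify_site` is hypothetical.

```python
import re

# Assumed consensus patterns for the P4-P3-P2-P1 residues (not defined in the text).
OPTIMAL_FURIN = re.compile(r"R.[KR]R")  # R-X-(K/R)-R
MINIMAL_FURIN = re.compile(r"R..R")     # R-X-X-R


def classify_site(p4_to_p1: str) -> str:
    """Classify a four-residue P4-P1 sequence under the assumed patterns."""
    site = p4_to_p1.upper()
    if len(site) != 4:
        raise ValueError("expected exactly four residues (P4 to P1)")
    if OPTIMAL_FURIN.fullmatch(site):
        return "optimal furin"
    if MINIMAL_FURIN.fullmatch(site):
        return "minimal furin"
    if site.endswith("R"):
        return "monobasic"
    return "no basic cleavage site"


# The first three motifs are those named in the text; "TNSR" is an
# illustrative monobasic example.
for motif in ("RRAR", "RFRR", "RSVR", "TNSR"):
    print(motif, "->", classify_site(motif))
```

Under these assumed patterns the motifs named in the text sort the same way the text describes them: RRAR and RSVR as minimal, RFRR as optimal. A pattern match is only a sequence-level label and says nothing about whether furin actually cleaves the site.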

The Embecovirus subgenus of the Betacoronavirus genus includes the seasonal coronaviruses OC43 and HKU1. Of additional importance is Betacoronavirus 1, an embecovirus notable for its spread to a large number of diverse animal species (Corman et al., 2018). Betacoronavirus 1 is represented here by a bovine coronavirus. The rat embecovirus HKU24 is ancestral to each of these viruses (Lau et al., 2015). HKU24, OC43 and Betacoronavirus 1 spikes have optimal furin cleavage sites (Fig. 1A). The junctional sequences in Betacoronavirus 1 and OC43 spikes are predicted to contain 4 and 2 O-linked glycans, respectively. Relative to HKU24 spike, some variants of HKU1 spike have an insert of 6 amino acids (PSSSS) near the S1/S2 junction. Other variants have inserts of 2 amino acids (PS). It is unclear whether the HKU1 spike inserts have occurred independently or sequentially. HKU1 S1/S2 junctions are predicted to contain 2 or 3 O-linked glycans.

Mouse hepatitis virus (MHV), a well-studied embecovirus, further illustrates the natural variability in the S1/S2 junction of betacoronavirus spikes, with examples of monobasic, minimal furin and optimal furin cleavage sites existing in spikes of various isolates (Fig. 1B). As with certain other betacoronaviruses, various serine and threonine residues at the MHV spike S1/S2 junctions are predicted O-linked glycosylation sites. The MHV spike, like other coronavirus spikes, undergoes additional proteolytic cleavages, including cleavage at a site referred to as S2’ (Belouzard et al., 2009; Millet and Whittaker, 2014). Cleavage at S2’ exposes a fusion peptide that interacts with a host cell membrane, permitting fusion with the viral envelope. Compared to the spike of ancestral embecovirus HKU24, the S2’ site of MHV spike contains insertions of variable lengths, although deletions may also contribute to variability at the S2’ sites in spikes of different MHV variants. Variable indels are also observed in the S2’ site of certain other embecoviruses, including OC43.

Furin cleavage sites are present in spikes of other coronaviruses. The S1/S2 junctions of feline alphacoronavirus spikes bear similarities to the corresponding regions in sarbeco- and merbecovirus spikes. Feline alphacoronaviruses are divided into two different types (Jaimes et al., 2020). Type I feline coronavirus spikes have an optimal furin cleavage site, whereas Type II feline coronavirus spikes lack either a monobasic or polybasic cleavage site at the S1/S2 junction (Fig. 1C). Relative to Type II feline coronavirus spikes, the optimal furin cleavage sites in Type I spikes are included in 17 or 18 amino acid insertions at the S1/S2 junction. The modification adds 2 or 3 predicted O-linked glycosylation sites. Type I feline coronavirus spikes have a monobasic cleavage site at S2’, whereas Type II feline coronavirus spikes have a three amino acid indel at S2’. In some cases (for example, strain KUK-H/L) the S2’ junction contains a minimal furin cleavage site.

Gammacoronavirus spikes, including that of infectious bronchitis virus (IBV), contain optimal furin cleavage sites at the S1/S2 junction (Fig. 1D). Some avian gammacoronavirus spikes have predicted O-linked glycans at the S1/S2 cleavage site. Avian gammacoronavirus spikes display sequence variability at the S2’ cleavage site; most are monobasic, but the spike of IBV strain Beaudette has acquired an optimal furin cleavage site. The S2’ site of the spike of IBV variant SES_15SK provides an example of a predicted O-linked glycosylation site that is associated with a monobasic cleavage site. Because the evolutionary history of gammacoronaviruses is not well described, it cannot be inferred whether evolution of their S1/S2 or S2’ spike cleavage sites has involved insertions or deletions.

Furin cleavage sites have been generated naturally in the spike proteins of members of at least 3 of 4 orthocoronavirus genera, including three betacoronavirus subgenera, via insertions and/or deletions. Both the S1/S2 and S2’ junctions of coronavirus spike genes are hotspots for RNA recombination. Furthermore, deletions, but not insertions, in the S1/S2 junction appear to arise commonly on serial passage of SARS-CoV-2 in cell culture, and have also been detected as quasi-species in infected humans (Lau et al., 2020; Liu et al., 2020). These observations stand in direct contrast to the assertions of proponents of theories that SARS-CoV-2 has a non-natural origin. A recent analysis by one of us (WRG) suggests that this variability is facilitated by short oligonucleotide “breakpoint sequences” that direct recombination to certain positions in the genome (Gallaher, 2020b).

Further evidence for the natural origin of the furin cleavage site in the SARS-CoV-2 spike is the observation that the indels in other coronavirus spikes often encode sequences with a propensity for O-linked glycosylation. As noted previously, computational prediction of O-linked glycosylation sites does not ensure that these sites are utilized by SARS-CoV-2 (Andersen et al., 2020). The sites may be used only in some cell types or species, or only under specific conditions. They may not be utilized at all. This caveat also applies to the current analyses. However, the frequency with which O-glycosylation sites are predicted across spike proteins from various coronavirus genera suggests that their presence is not due to chance. Regulation of polyprotein cleavage by O-linked glycosylation has been documented in model systems (Anderson and Wharton, 2017). Mucin-like domains, which contain O-linked glycans, are characterized by an abundance of serines, threonines and prolines. These amino acids form turns in protein structures. Turns potentially contribute to accessibility of furin cleavage sites, and therefore may be under positive selection.
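The enrichment of serine, threonine and proline described above can be checked crudely by counting those residues in a window around a cleavage site. The sketch below does only that. It is not a substitute for Net-O-Glyc, which is a neural network predictor, and a high fraction does not by itself indicate glycosylation. The function name `stp_fraction`, the default window size and the example peptide are assumptions made for illustration.

```python
def stp_fraction(seq: str, site: int, flank: int = 10) -> float:
    """Fraction of S, T and P residues within `flank` residues of a site.

    `site` is the 0-based index of the first residue after the cleaved
    bond, so the window covers seq[site - flank : site + flank], clipped
    to the ends of the sequence.
    """
    start = max(0, site - flank)
    window = seq[start:site + flank].upper()
    if not window:
        raise ValueError("window is empty; check `site` and `flank`")
    return sum(residue in "STP" for residue in window) / len(window)


# Hypothetical peptide for illustration: the window spans all six residues.
example_fraction = stp_fraction("STPAAA", site=3, flank=3)
print(example_fraction)
```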

The pattern of proline and serine/threonine residues in or near insertions of the polybasic residues, as noted in SARS-CoV-2, MERS-CoV, HKU1 and Type I feline coronavirus spikes, has not to our knowledge been previously discussed. With notable exceptions, including a predicted mucin-like patch in the carboxy terminal domain of embecovirus spikes, predicted O-glycans are rare in coronavirus spikes other than near cleavage sites. While computational algorithms such as Net-O-Glyc are capable of determining sites that are likely to be O-glycosylated, this pattern would not have been obvious to anyone constructing SARS-CoV-2 in a laboratory, either for gain-of-function research or nefarious purposes.

Previously, one of us (WRG) presented additional strong evidence that the furin cleavage site insertion in SARS-CoV-2 was generated via a natural process (Gallaher, 2020c). Although the 12-base insertion preserves the reading frame, the insertion is out-of-frame: it does not begin at a codon boundary. It is highly implausible that any scientist attempting to insert a furin cleavage site would do so by making an out-of-frame insertion. Previous studies that introduced furin cleavage sites in SARS-CoV and MERS-CoV spike genes did not introduce insertions (Follis et al., 2006; Yang et al., 2015). Nor is it likely that any laboratorian would have engineered a change in the SARS-CoV-2 spike that purposefully resulted in prediction of O-linked glycan sites.
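The difference between an insertion that preserves the reading frame and one that is itself in-frame can be made concrete with a short Python sketch. The nucleotide strings below are illustrative assumptions modeled on the junction discussed here; they were not extracted from the cited accessions and should be checked against the deposited sequences before any reuse. The sketch inserts 12 bases at an offset that is not a multiple of three and translates both sequences with the standard genetic code.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(nt: str) -> str:
    """Translate from the first base, ignoring any trailing partial codon."""
    usable = len(nt) - len(nt) % 3
    return "".join(CODON_TABLE[nt[i:i + 3]] for i in range(0, usable, 3))


def insert_at(parent: str, offset: int, inserted: str) -> str:
    """Return `parent` with `inserted` placed before position `offset`."""
    return parent[:offset] + inserted + parent[offset:]


# Illustrative sequences (assumptions, see the lead-in above).
parent = "CAGACTAATTCACGTAGTGTA"   # translates to QTNSRSV
inserted = "TCCTCGGCGGGC"          # 12 bases
offset = 11                        # falls inside the TCA codon

child = insert_at(parent, offset, inserted)

preserves_frame = len(inserted) % 3 == 0   # downstream codons are unchanged
on_codon_boundary = offset % 3 == 0        # False: the insert splits a codon

print(translate(parent))
print(translate(child))
print(preserves_frame, on_codon_boundary)
```

In this toy case the split codon is rebuilt as a synonymous serine codon and four residues (PRRA) are added, so the protein-level result looks like a clean four amino acid insertion even though the nucleotide insertion does not sit on a codon boundary.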

We are grateful to Kristian G. Andersen, Edward C. Holmes and Andrew Rambaut for essential input and discussion. Work on emerging viruses in the Garry Laboratory is supported by the National Institutes of Health, the Coalition for Epidemic Preparedness Innovations, the Burroughs Wellcome Fund, the Wellcome Trust, the Center for Disease Prevention and Control, and the European & Developing Countries Clinical Trials Partnership.

Andersen, K.G., Rambaut, A., Lipkin, W.I., Holmes, E.C., and Garry, R.F. (2020). The proximal origin of SARS-CoV-2. Nat Med 26, 450-452.

Anderson, E.N., and Wharton, K.A. (2017). Alternative cleavage of the bone morphogenetic protein (BMP), Gbb, produces ligands with distinct developmental functions and receptor preferences. J Biol Chem 292, 19160-19178.

Belouzard, S., Chu, V.C., and Whittaker, G.R. (2009). Activation of the SARS coronavirus spike protein via sequential proteolytic cleavage at two distinct sites. Proc Natl Acad Sci U S A 106, 5871-5876.

Boni, M.F., Lemey, P., Jiang, X., Lam, T.T., Perry, B.W., Castoe, T.A., Rambaut, A., and Robertson, D.L. (2020). Evolutionary origins of the SARS-CoV-2 sarbecovirus lineage responsible for the COVID-19 pandemic. Nature microbiology 5, 1408-1417.

Corman, V.M., Muth, D., Niemeyer, D., and Drosten, C. (2018). Hosts and Sources of Endemic Human Coronaviruses. Adv Virus Res 100, 163-188.

Coutard, B., Valle, C., de Lamballerie, X., Canard, B., Seidah, N.G., and Decroly, E. (2020). The spike glycoprotein of the new coronavirus 2019-nCoV contains a furin-like cleavage site absent in CoV of the same clade. Antiviral Res 176, 104742.

Follis, K.E., York, J., and Nunberg, J.H. (2006). Furin cleavage of the SARS coronavirus spike glycoprotein enhances cell–cell fusion but does not affect virion entry. Virology 350, 358-369.

Gallaher, W.R. (2020a). Analysis of Wuhan Coronavirus: Deja Vu.

Gallaher, W.R. (2020b). A palindromic RNA sequence as a common breakpoint contributor to copy-choice recombination in SARS-COV-2. Arch Virol 165, 2341-2348.

Gallaher, W.R. (2020c). Tackling Rumors of a Suspicious Origin of nCoV2019.

Jaimes, J.A., Millet, J.K., Stout, A.E., André, N.M., and Whittaker, G.R. (2020). A Tale of Two Viruses: The Distinct Spike Glycoproteins of Feline Coronaviruses. Viruses 12.

Lau, S.K., Li, K.S., Tsang, A.K., Lam, C.S., Ahmed, S., Chen, H., Chan, K.H., Woo, P.C., and Yuen, K.Y. (2013). Genetic characterization of Betacoronavirus lineage C viruses in bats reveals marked sequence divergence in the spike protein of pipistrellus bat coronavirus HKU5 in Japanese pipistrelle: implications for the origin of the novel Middle East respiratory syndrome coronavirus. J Virol 87, 8638-8650.

Lau, S.K., Woo, P.C., Li, K.S., Tsang, A.K., Fan, R.Y., Luk, H.K., Cai, J.P., Chan, K.H., Zheng, B.J., Wang, M., et al. (2015). Discovery of a novel coronavirus, China Rattus coronavirus HKU24, from Norway rats supports the murine origin of Betacoronavirus 1 and has implications for the ancestor of Betacoronavirus lineage A. J Virol 89, 3076-3092.

Lau, S.Y., Wang, P., Mok, B.W., Zhang, A.J., Chu, H., Lee, A.C., Deng, S., Chen, P., Chan, K.H., Song, W., et al. (2020). Attenuated SARS-CoV-2 variants with deletions at the S1/S2 junction. Emerging microbes & infections 9, 837-842.

Liu, Z., Zheng, H., Lin, H., Li, M., Yuan, R., Peng, J., Xiong, Q., Sun, J., Li, B., Wu, J., et al. (2020). Identification of Common Deletions in the Spike Protein of Severe Acute Respiratory Syndrome Coronavirus 2. J Virol 94.

Millet, J.K., and Whittaker, G.R. (2014). Host cell entry of Middle East respiratory syndrome coronavirus after two-step, furin-mediated activation of the spike protein. Proc Natl Acad Sci U S A 111, 15214-15219.

Relman, D.A. (2020). Opinion: To stop the next pandemic, we need to unravel the origins of COVID-19. Proc Natl Acad Sci U S A.

Segreto, R., and Deigin, Y. (2020). The genetic structure of SARS-CoV-2 does not rule out a laboratory origin: SARS-COV-2 chimeric structure and furin cleavage site might be the result of genetic manipulation. BioEssays, e2000240.

Seyran, M., Pizzol, D., Adadi, P., El-Aziz, T.M.A., Hassan, S.S., Soares, A., Kandimalla, R., Lundstrom, K., Tambuwala, M., Aljabali, A.A.A., et al. (2020). Questions concerning the proximal origin of SARS-CoV-2. J Med Virol.

Sirotkin, K., and Sirotkin, D. (2020). Might SARS-CoV-2 Have Arisen via Serial Passage through an Animal Host or Cell Culture?: A potential explanation for much of the novel coronavirus’ distinctive genome. BioEssays 42, e2000091.

Steentoft, C., Vakhrushev, S.Y., Joshi, H.J., Kong, Y., Vester-Christensen, M.B., Schjoldager, K.T., Lavrsen, K., Dabelsteen, S., Pedersen, N.B., Marcos-Silva, L., et al. (2013). Precision mapping of the human O-GalNAc glycoproteome through SimpleCell technology. The EMBO journal 32, 1478-1488.

International Committee for the Taxonomy of Viruses (2020). The ICTV Report on Virus Classification and Taxon Nomenclature.

Yang, Y., Liu, C., Du, L., Jiang, S., Shi, Z., Baric, R.S., and Li, F. (2015). Two Mutations Were Critical for Bat-to-Human Transmission of Middle East Respiratory Syndrome Coronavirus. J Virol 89, 9119-9123.

Zhou, H., Chen, X., Hu, T., Li, J., Song, H., Liu, Y., Wang, P., Liu, D., Yang, J., Holmes, E.C., et al. (2020). A Novel Bat Coronavirus Closely Related to SARS-CoV-2 Contains Natural Insertions at the S1/S2 Cleavage Site of the Spike Protein. Curr Biol 30, 2196-2203.e2193.


This is great stuff, Bob. Also, now that we know how critically important the cleavage site is to SARS-CoV-2 pathogenesis, transmission, and its ability to infect a wide range of cells and tissues compared to, say, SARS-CoV, a virus with this particular site was plausibly much more likely to be able to cause a pandemic than one without it. This is especially true since we know that these cleavage sites likely play a role in expanding the host range of coronaviruses, i.e., a virus with this site is more likely to jump species.

So with all this additional knowledge in mind - polybasic cleavage sites are frequent across the coronavirus family, they increase host range, and they’re important for expanding host tropism - a SARS-like coronavirus with a polybasic cleavage site was almost inevitable to emerge in the human population and cause a pandemic at some point. I.e., the likelihood that the first known pandemic of a SARS-like coronavirus would be caused by a virus with a polybasic cleavage site is much higher than one being caused by a virus without (e.g., SARS-CoV).

Nice commentary from both of you. I fully agree with all of the above. It’s notable perhaps that all three betacoronaviruses that have swept/are sweeping through the human population (OC43, HKU1, SARS-CoV-2) have a polybasic cleavage site and are of zoonotic origin. Compelling (if limited by a small sample size) that capacity for furin cleavage may be important for successful host-switches. BetaCoV1 in particular is notoriously promiscuous among hosts. To be fair, the two human alphacoronaviruses don’t have such sites, so perhaps not strictly necessary in all contexts. As far as I can tell spike cleavage of SADSr-CoVs hasn’t been experimentally verified - S1/S2 for these has a perfectly conserved “VRRM” sequence.

We are perhaps lucky a MERSr-CoV that binds a receptor abundant in the upper airway hasn’t emerged.

Thanks to Gary R. Whittaker at Cornell, who has done some of the very best work on coronavirus spike processing over the years, for pointing out that there is an error in the FIPV79 S2’ sequence - which should be as below:


This error doesn’t affect the conclusions. Rather, it demonstrates an extra indel and additional variability near the S2’ junction.

We’ll correct this in a future publication.

Thank you Gary!