Early appearance of two distinct genomic lineages of SARS-CoV-2 in different Wuhan wildlife markets suggests SARS-CoV-2 has a natural origin

Robert F. Garry1,2
1Department of Microbiology and Immunology, Tulane University Medical Center, 1430 Tulane Avenue, New Orleans, Louisiana 70112 USA; E-Mail: rfgarry@tulane.edu
2Zalgen Labs, LLC, Germantown, MD, USA

Our previous commentary on the Proximal Origins of SARS-CoV-2 (Andersen et al., 2020) concluded that “SARS-CoV-2 is not a laboratory construct or a purposefully manipulated virus.” The possibility of a laboratory release or Lab Leak was also considered, but it was determined that a natural origin of SARS-CoV-2 is much more likely. Recently, a team of scientists under the auspices of the World Health Organization (WHO) reached similar conclusions (WHO, 2021). Other groups have speculated that SARS-CoV-2 may be the result of undisclosed research on SARS-CoV-2 or a close progenitor and accidental release (Relman, 2020; Butler et al., 2021). It has also been suggested that SARS-CoV-2 may be the product of laboratory passage of a progenitor virus in cell culture or animals or Gain-of-Function research (Relman, 2020; Segreto and Deigin, 2020; Sirotkin and Sirotkin, 2020; Yan et al., 2020; Zhan et al., 2020). New data presented by the WHO study provide clear findings in support of the natural origin of SARS-CoV-2.

The study of SARS-CoV-2 origins conducted by the WHO team provided important new data regarding the role of wildlife markets in the emergence of SARS-CoV-2. The large Huanan seafood market, which also sold wildlife and wildlife products, was a focus of attention because it was linked to the majority of early cases of COVID-19 in Wuhan. The WHO report documented that early cases were linked not only to the Huanan market but also to different markets that sold wildlife or wildlife products. Among the first 168 diagnosed cases of COVID-19 in Wuhan with an onset date prior to December 31, 2019 and a known exposure history, 55.4% (93/168) reported exposure to wildlife markets. Of the 168 cases, 28% (47/168) had only been to the Huanan market, 22.6% (38/168) had exposure to another wildlife market and 4.8% (8/168) had exposure to the Huanan market and another market (Annex E2, Table 1 of WHO, 2021).
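
The proportions above can be rechecked with a short tally. This is a minimal sketch: the counts are the ones quoted from Annex E2, Table 1, while the variable and function names are illustrative assumptions. Any small difference from the percentages quoted in the text would come from rounding.

```python
# Counts as quoted in the text (Annex E2, Table 1 of WHO, 2021).
total_cases = 168
exposure_counts = {
    "Huanan market only": 47,
    "another wildlife market": 38,
    "Huanan market and another market": 8,
}


def percent(count, total):
    """Return count/total as a percentage rounded to one decimal place."""
    return round(100 * count / total, 1)


# Cases reporting any wildlife market exposure.
any_market = sum(exposure_counts.values())
shares = {label: percent(n, total_cases) for label, n in exposure_counts.items()}

print(f"any market exposure: {any_market}/{total_cases} = {percent(any_market, total_cases)}%")
for label, share in shares.items():
    print(f"{label}: {share}%")
```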

The genetic lineages of SARS-CoV-2 associated with early cases in Wuhan were documented in the WHO report. Previously, Rambaut et al. (2020) noted that at the root of the phylogeny of SARS-CoV-2 are two lineages designated lineage A and B. Early lineage A viruses include SARS-CoV-2 isolate EPI_ISL_529213 sampled on 30 December 2019 from a person linked to a wildlife market other than the Huanan market (Molecular Epidemiology Table 7, sample 13 in WHO, 2021). Lineage A viruses share two nucleotides (T8,782 in ORF1ab and C28,144 in ORF8) with the bat viruses RaTG13 and RmYN02 and other sarbecoviruses. It is likely that the most recent common ancestor (MRCA) of SARS-CoV-2 shares the same genome sequence as these early lineage A sequences (Rambaut et al., 2020). Different nucleotides (C8,782 in ORF1ab and T28,144 in ORF8) are present at those sites in viruses assigned to lineage B, such as SARS-CoV-2 isolate Wuhan-Hu-1 (GenBank accession no. MN908947, Molecular Epidemiology Table 7, sample 06 in WHO, 2021) sampled from the Huanan market on 30 December 2019. All virus-positive samples from the Huanan market, whether from vendors or customers of the market or from environmental samples, contained SARS-CoV-2 of lineage B. There was limited genetic diversity in the lineage B samples from the Huanan market, which is consistent with the market as a site of a super-spreader event. Lineage A and lineage B viruses spread throughout Wuhan and to other countries (Rambaut et al., 2020; Worobey et al., 2020).
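
The two-site distinction described above can be written as a small lookup. This is an illustrative sketch only, not the lineage assignment procedure of Rambaut et al. (2020). The function name and the input format (a mapping from 1-based genome position to nucleotide) are assumptions made for the example.

```python
# Lineage-defining nucleotides as stated in the text (1-based genome positions).
LINEAGE_SITES = {
    "A": {8782: "T", 28144: "C"},
    "B": {8782: "C", 28144: "T"},
}


def classify_early_lineage(bases_by_position):
    """Return "A" or "B" when both sites match that lineage, otherwise None.

    bases_by_position is a hypothetical input format: a dict mapping a
    genome position to the nucleotide observed at that position.
    """
    for lineage, sites in LINEAGE_SITES.items():
        if all(bases_by_position.get(pos) == base for pos, base in sites.items()):
            return lineage
    return None  # mixed or missing calls are left unassigned
```

For instance, calling `classify_early_lineage({8782: "C", 28144: "T"})` returns `"B"`, the pattern the text attributes to Wuhan-Hu-1, while a genome with a call at only one of the two sites is left unassigned.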

Competing hypotheses have been put forward to explain the emergence of SARS-CoV-2. Natural spillover directly from a bat or via an intermediate animal host could have occurred via several different scenarios (Fig. 1). Likewise, several variations of the Lab Leak hypothesis have been proposed (Fig. 2). A dispassionate science-based discourse on the topic of the origin of SARS-CoV-2 must account for the new data revealed by the WHO study showing that: 1. multiple markets were linked to the early cases, and 2. divergence of SARS-CoV-2 into lineages A and B was an early occurrence. These facts are represented by yellow boxes in each figure.

Figure 1. Natural scenarios of SARS-CoV-2 origin.

Natural spillover directly from a bat or via an intermediate host may have involved trade in wildlife susceptible to SARS-CoV-2, either trapping and hunting of wildlife in nature or farming of “wildlife species.” There is now clear evidence that SARS-CoV-2 is capable of effective spread not only in humans but also in a diverse group of mammals (Garry, 2020). The multi-market aspect of the early outbreak can be explained by distribution of SARS-CoV-2 infected animals to more than one market (Fig. 1). In natural scenarios, diversification of SARS-CoV-2 into lineages A and B could have occurred prior to the distribution, either at a wildlife farm or during transport of the animals to the markets. It is possible that humans involved in the wildlife trade were also infected and involved in this pathway.

Figure 2. Lab Leak scenarios of SARS-CoV-2 origin.

Proponents of Lab Leak theories speculate that SARS-CoV-2 or a close progenitor was present in the Wuhan Institute of Virology (WIV) or another Wuhan virology lab before the start of the COVID-19 pandemic and the appearance of the first cases in Wuhan (Fig. 2). It has been proposed that SARS-CoV-2 was released either via an infected laboratory worker, an escaped lab animal or via waste disposal (Relman, 2020; Butler et al., 2021). Pangolins carry and appear to be naturally infected with at least two lineages of sarbecoviruses (Liu et al., 2019; Lam et al., 2020). The Guangdong (GD) strain of pangolin coronavirus carries a receptor binding domain (RBD) that is highly similar to the RBD of SARS-CoV-2 (Lam et al., 2020; Andersen et al., 2020). Some proponents of the Lab Leak theory have speculated that the RBD of the GD pangolin coronavirus was recombined via genetic engineering with the backbone of an undisclosed sarbecovirus to produce SARS-CoV-2. Alternatively, it has been suggested that an undisclosed close progenitor of SARS-CoV-2 was passaged on human cells or in experimental animals (such as humanized mice) in gain-of-function type experiments to adapt it for human replication (Relman, 2020; Segreto and Deigin, 2020; Sirotkin and Sirotkin, 2020). A direct role of the Chinese military in conducting classified research on sarbecoviruses at the WIV has also been proposed (United States, 2021). In Lab Leak scenarios, diversification of SARS-CoV-2 into its two separate early lineages A and B would have had to occur in the laboratory setting. Lab Leak scenarios must also account for the fact that the majority of early cases were associated with different wildlife markets in Wuhan.

Lab Leak scenarios are inconsistent with several established facts regarding the origin of SARS-CoV-2. The majority of early cases were linked to different markets that sold wildlife or wildlife products in Wuhan. All theories of the origin of SARS-CoV-2 must account for the linkage to different markets engaged in wildlife trade. Theories on SARS-CoV-2 must also account for the fact that two distinct lineages of SARS-CoV-2 were distributed at different Wuhan wildlife markets. Scenarios where an infected laboratory worker, an escaped lab animal or faulty waste disposal spread not one but two lineages of SARS-CoV-2 specifically to different wildlife markets are difficult to rationalize.

The original SARS-CoV outbreaks in 2002-2004 were linked to the wildlife trade (Guan et al., 2003). In contrast to Lab Leak theories, linkage of the origin of SARS-CoV-2 to wildlife or the wildlife trade provides several plausible scenarios for the appearance of SARS-CoV-2 at different wildlife markets. It fully accounts for the fact that the majority of early Wuhan COVID-19 cases were linked to different wildlife markets in a straightforward manner. It also provides simple explanations for the fact that two different lineages of SARS-CoV-2 were linked to markets. In one possible scenario divergence of SARS-CoV-2 to lineages A and B occurred prior to the transport of infected animals to Wuhan and the infected animals were subsequently distributed to different wildlife markets.

Proponents of the Lab Leak theory will point out that none of the animals sold at the Huanan market tested positive for SARS-CoV-2 (WHO, 2021). Although this is mentioned in the WHO report but not discussed in detail, several independent sources indicate that wildlife species susceptible to SARS-CoV-2, including civets and raccoon dogs, were sold at the Huanan market (Stout, 2020; Yee, 2020; Zhang and Holmes, 2020). Similar species were likely to have been available to purchase at other wildlife markets in Wuhan. Certain species of animals may have been removed after the appearance of the first COVID-19 cases and the linkage of COVID-19 cases to the market, but prior to the closure of the market on January 1, 2020. It should also be noted that environmental samples that did test positive were associated with the portion of the market where wildlife or wildlife products were sold. A temporal analysis of the early human cases at the Huanan market confirms the pattern of spread from the areas of the market where wildlife products were sold to other parts of the market (WHO, 2021).

Hybrids of natural and Lab Leak scenarios have been suggested (Relman, 2020; Baker, 2021). Some variations on a hybrid scenario suggest that SARS-CoV-2 is a natural virus that infected a scientist doing field work, resulting in mildly symptomatic or asymptomatic spread, or that SARS-CoV-2 was brought back to a laboratory and released unknowingly without having been successfully cultured or otherwise manipulated. Compared to the millions of worldwide encounters of humans with wildlife, including the trapping of bats for food, the number of high-risk exposures of scientists doing field or laboratory work with samples from wildlife is minuscule. There are no documented cases of laboratory infections with previously unknown, but pathogenic, viruses. Thus, hybrid scenarios are of very low probability. They also fail to explain how two lineages of SARS-CoV-2 came to be distributed at the different wildlife markets.

New data compiled by the WHO team regarding the presence of distinct lineages of SARS-CoV-2 in different Wuhan wildlife markets are inconsistent with a laboratory-based origin of SARS-CoV-2. No data or other evidence has emerged in support of the Lab Leak theory. In contrast, the WHO report significantly adds to the large volume of epidemiological and genomic data that support emergence of SARS-CoV-2 from a zoonotic reservoir, either wildlife or farmed animals.

The important work of the WHO-convened Global Study of Origins of SARS-CoV-2 team is gratefully acknowledged. Kristian G. Andersen, Edward C. Holmes, Andrew Rambaut and William R. Gallaher provided essential input and discussion. Work on emerging viruses in the Garry Laboratory is supported by the National Institutes of Health, the Coalition for Epidemic Preparedness Innovations, the Burroughs Wellcome Fund, the Wellcome Trust, the Center for Disease Prevention and Control, and the European & Developing Countries Clinical Trials Partnership.

Andersen KG, Rambaut A, Lipkin WI, Holmes EC and Garry RF (2020). The proximal origin of SARS-CoV-2. Nat Med 26:450-452.

Baker N. (2021). The Lab-Leak hypothesis. Did the Coronavirus Escape From a Lab?

Butler CD, Canard B, Cap H, Chan YA, Claverie J-M, Colombo F, Courtier V, de Ribera FA, Decroly E, Maistre R, Demaneuf G, Ebright RH, Goffinet A, Graner F, Halloy J, Leitenberg M, Lentzos F, McFarlane R, Metzl J, Petrovsky N, Quay S, Rahalkar MC, Segreto R, Theißen G, van Helden J. (2021). Call for a full and unrestricted international forensic investigation into the origins of COVID-19. Investigation into Covid Origins Sought - The New York Times

Garry RF (2020). Mutations arising in SARS-CoV-2 spike on sustained human-to-human transmission and human-to-animal passage

Guan Y, Zheng BJ, He YQ, Liu XL, Zhuang ZX, Cheung CL, Luo SW, Li PH, Zhang LJ, Guan YJ, Butt KM, Wong KL, Chan KW, Lim W, Shortridge KF, Yuen KY, Peiris JS, Poon LL. (2003). Isolation and characterization of viruses related to the SARS coronavirus from animals in southern China. Science 302:276-8.

Lam TT, Jia N, Zhang YW, Shum MH, Jiang JF, Zhu HC, et al. (2020). Identifying SARS-CoV-2-related coronaviruses in Malayan pangolins. Nature 583:282-5.

Liu P, Chen W and Chen JP. (2019). Viral metagenomics revealed Sendai virus and coronavirus infection of Malayan pangolins (Manis javanica). Viruses 11:979.

Rambaut A, Holmes EC, O’Toole et al. (2020). A dynamic nomenclature proposal for SARS-CoV-2 lineages to assist genomic epidemiology. Nat Microbiol 5:1403–1407.

Segreto R and Deigin Y. (2020). The genetic structure of SARS-CoV-2 does not rule out a laboratory origin: SARS-COV-2 chimeric structure and furin cleavage site might be the result of genetic manipulation. BioEssays : news and reviews in molecular, cellular and developmental biology, e2000240.

Sirotkin K. and Sirotkin D. (2020). Might SARS-CoV-2 have arisen via serial passage through an animal host or cell culture?: A potential explanation for much of the novel coronavirus’ distinctive genome. BioEssays : news and reviews in molecular, cellular and developmental biology 42, e2000091

Stout KL (2020). “Wuhan SARS”: Tracing the origin of the new virus to China’s wild animal markets. https://www.youtube.com/watch?v=Je0_U2ym_r0.

United States (Department of State). (2021). Fact Sheet: Activity at the Wuhan Institute of Virology. Fact Sheet: Activity at the Wuhan Institute of Virology - United States Department of State.

WHO (2021). “WHO-convened global study of origins of SARS-CoV-2: China part”; www.who.int/publications/i/item/who-convened-global-study-of-origins-of-sars-cov-2-china-part.

Worobey M, Pekar J, Larsen BB, Nelson MI, Hill V, Joy JB, Rambaut A, Suchard MA, Wertheim JO, Lemey P. (2020). The emergence of SARS-CoV-2 in Europe and North America. Science 370:564-570.

Yan L-M, Kang S, Guan J and Hu S. (2020). Unusual features of the SARS-CoV-2 genome suggesting sophisticated laboratory modification rather than natural evolution and delineation of its probable synthetic route. https://zenodo.org/record/4028830#.YBQc_3dKg0_.

Yee J. (2020). Bizarre Wuhan wet market menu shows over 100 wild animals sold as food, link with virus unclear.

Zhan S, Deverman B, and Chan Y. (2020). SARS-CoV-2 is well adapted for humans. What does this mean for re-emergence? https://doi.org/10.1101/2020.05.01.073262.

Zhang YZ and Holmes EC. (2020). A Genomic Perspective on the Origin and Emergence of SARS-CoV-2. Cell 181:223-227.


Thank you for this very interesting and necessary discussion about the origin of SARS-CoV-2.
I think that an alternative scenario to explain the presence of ancestral viral lineage A in other wildlife markets (OWLM) and of lineage B in the Huanan seafood and wildlife market (HSWLM) is that the outbreak in the HSWLM was initiated by an infected human and not by an infected animal. In this scenario, a wild animal infected with lineage A was transported to an OWLM where the first spillover to humans occurred. Then, intra-host evolution from lineage A to lineage B may have occurred in an individual who probably worked at or frequently visited different wildlife markets, and this individual initiated the secondary SARS-CoV-2 lineage B outbreak in the HSWLM. This scenario may explain why none of the animals sold at the HSWLM tested positive for SARS-CoV-2, because transmission was initiated by humans who mainly circulated where wildlife or wildlife products are sold.


Response on the Origin of SARS-CoV-2

Those who know of the longtime proximity and collaboration between Bob Garry and myself will not find it surprising that I concur with the above post. Indeed, I publicly endorsed the “natural origin” hypothesis for SARS-CoV-2 at midnight Feb 6, 2020, two weeks before the Andersen et al. analysis appeared.

The reader will be surprised that, given the long collaborative history of Bob and myself, I will now publicly correct him.

Just prior to the Discussion in the foregoing post, Garry states that: “The Guangdong (GD) strain of pangolin coronavirus carries a receptor binding domain (RBD) that is highly similar to the RBD of SARS-CoV-2.” He cites references that deal only with the amino acid sequences of the pangolin and SARS-CoV-2 isolates. At the amino acid level, this statement is true.

However, as Boni et al. (1) alluded to in passing, and I detailed later (2), this is far from true at the RNA level. Over a relevant span of 268 nucleotides, the GD pangolin and SARS-CoV-2 RNA sequences differ by 10.4%, with virtually all of that difference arising from an accumulation of synonymous wobble-base substitutions. This degree of difference indicates several decades of evolution, and is completely incompatible with the sequence found in pangolins being the proximal source of the sequence found in the RBD of SARS-CoV-2.

SARS-CoV-2 is a mosaic derived from distinct bat coronavirus lineages. The proximal sources of these mosaic segments are far from identical to any known viral isolates, but rather to inferred second or third cousin divergent RNA sequences with common ancestors dating back decades.

In addition to the RBD just discussed, I also detailed (2) the dissimilar region in orf1A that follows the acidic region of nsp3. From nt 3059 to nt 3335, the RNA sequences of SARS-CoV-2 and Bat RaTG13 differ by 9.3% and the corresponding amino acid sequences by 18.1%. The sequence in SARS-CoV-2 bears no significant resemblance to any known sampled source.

One can create suspicion by talking about amino acids, but the proof must account for the divergence of RNA that takes a very long time. After sixteen months of evolution in humans, isolates from April 2021 differ by about 0.1% from the December 2019 Hu-1 reference strain, roughly 30 out of 30,000 nucleotides.
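
The scale of that comparison can be laid out as simple arithmetic. This is only a rough sketch using the figures quoted in this reply (10.4% over 268 nucleotides, and about 30 differences over roughly 30,000 nucleotides). The function and variable names are illustrative, and it compares raw percentages across different regions rather than modeling substitution rates.

```python
def percent_divergence(differences, length):
    """Percentage of positions that differ between two aligned sequences."""
    return 100 * differences / length


# Figures quoted in the text above.
genome_wide_pct = percent_divergence(30, 30_000)   # Hu-1 vs April 2021 isolates
rbd_span_length = 268                              # nucleotides in the RBD span
rbd_pct = 10.4                                     # GD pangolin vs SARS-CoV-2

# Approximate number of differing nucleotides implied by 10.4% of 268.
rbd_differences = round(rbd_pct / 100 * rbd_span_length)

# How many times larger the RBD-span divergence is than the genome-wide figure.
fold = rbd_pct / genome_wide_pct

print(f"genome-wide divergence: {genome_wide_pct:.2f}%")
print(f"RBD span: about {rbd_differences} differing nucleotides")
print(f"ratio of the two percentages: about {fold:.0f}-fold")
```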

My second correction relates to the ongoing speculation about animal intermediates. There are zero data showing that any animal but a bat served as a host to SARS-CoV-2 prior to its introduction into humans. The 2012 Tongguan mine outbreak in Yunnan province, in which six miners developed a COVID-like pneumonia with thromboembolism and three of them died, is ample precedent for direct infection of humans from bats (3). The closest known relative to SARS-CoV-2, Bat RaTG13, was isolated from that same mineshaft the following year. That mine is now closed, and is referred to by locals as the “mine of death”. That sounds like a “hot zone” to me.

My final correction to my good friend concerns hypothesis vs. theory. In common parlance, “hypothetical” and “theoretical” tend to be used interchangeably. However, in science there is a very clear distinction. A “Theory” rises to that status only when there is a substantial body of evidence that supports what was previously regarded as a hypothesis, as in the Theory of Evolution, the Germ Theory of Disease, or Tectonic Plate Theory. Even the Fusion Peptide is still only a hypothesis. So, we should not be throwing around the designation of Theory for hypotheses with little to no factual information to support them.

The bedrock of science must continue to be data and the RNA code.

William R. Gallaher, Ph.D.


  1. Boni MF, Lemey P, Jiang X, Lam TT, Perry BW, Castoe TA, Rambaut A, Robertson DL. Evolutionary origins of the SARS-CoV-2 sarbecovirus lineage responsible for the COVID-19 pandemic. Nat Microbiol. 2020 Nov;5(11):1408-1417. doi: 10.1038/s41564-020-0771-4. Epub 2020 Jul 28. PMID: 32724171.

  2. Gallaher WR. A palindromic RNA sequence as a common breakpoint contributor to copy-choice recombination in SARS-COV-2. Arch Virol. 2020 Oct;165(10):2341-2348. doi: 10.1007/s00705-020-04750-z. Epub 2020 Jul 31. PMID: 32737584; PMCID: PMC7394270.

  3. Rahalkar MC, Bahulikar RA. Lethal Pneumonia Cases in Mojiang Miners (2012) and the Mineshaft Could Provide Important Clues to the Origin of SARS-CoV-2. Front Public Health. 2020 Oct 20;8:581569. doi: 10.3389/fpubh.2020.581569. PMID: 33194988; PMCID: PMC7606707.


I was wondering if you could expand upon your reasoning for a few of the following statements, as I find the statements to be potentially problematic, or I do not understand how the conclusion was reached.

#1) “In Lab Leak scenarios diversification of SARS-CoV-2 to its two separate early lineages A and B would have had to occur in the laboratory setting.”
Why would the split into lineage A and B have had to occur in the laboratory setting? Why could the virus not have circulated in humans after the initial “leak”, and diversified into two lineages before infected people reached the various markets where ideal super-spreader conditions were found? It is worth noting that the differences between lineage A and B were very small in early December 2019.

#2) “Lab Leak scenarios must also account for the fact that the majority of early cases were associated with different wildlife markets in Wuhan.”
While not false, I think there isn’t much to account for, and this statement may reflect “target fixation”. While many cases did have links to markets, we could also probably find links between early cases and using cell phones, with 100% of the cases having a link with eating food. These links don’t need to be accounted for. The markets were good places for a virus to spread, and people, infected or not, visit those markets.

#3) “The multi-market aspect of the early outbreak can be explained by distribution of SARS-CoV-2 infected animals to more than one market”
This seems unlikely to me, as it implies multiple independent animal-human outbreaks, from multiple infected animals, when we have yet to detect evidence of any animal infected with early SARS-CoV-2. This explanation requires multiple near-simultaneous sources of infection, or a single source that went to multiple markets, only in Wuhan, with the infection missed by the search for infected animals on farms in Hubei province. If the source came from outside of Hubei, it seems unlikely that multiple markets in Wuhan received infected animals, but no other cities did.

Overall, my takeaway is that the presence of links to multiple markets, not just the main seafood market, suggests neither that the virus did nor that it did not have a natural origin. It suggests that there was cryptic spread in humans in Wuhan that was noticed after super-spreading at markets.

As noted with the main Wuhan seafood market where lineage B spread, it was probably just an early super-spreader event. What evidence is there that the cases associated with other markets were anything but similar h2h spreader events, as it is established that these markets are an effective place for h2h transmission?

It seems that we have no reason to think that the cases linked to other markets are any different. It seems unlikely that there would be multiple near simultaneous animal-human spillovers at different markets unless they all came from the same source.
However, a single source seems like it should be easy to identify by cross-referencing a list of market suppliers and identifying which supplier the markets had in common. In such a case, the identification of a supplier with an apparently rampant infection among its animals should also be linked with earlier cases at the source, outside of Wuhan.

In my view, I haven’t seen anything in the association between early confirmed cases and the markets that suggests anything about the origin of SARS-CoV-2, other than that the time of the origin had to be prior to December 2.