Response to “On the origin and continuing evolution of SARS-CoV-2”


Oscar A. MacLean*, Richard Orton, Joshua B. Singer, David L. Robertson.
MRC-University of Glasgow Centre for Virus Research (CVR).

*To whom correspondence should be addressed: [email protected].

An analysis of genetic data from the ongoing COVID-19 outbreak was recently published in the journal National Science Review by Tang et al. (2020). Two of the key claims made by this paper appear to have been reached by misunderstanding and over-interpretation of the SARS-CoV-2 data, with an additional analysis suffering from methodological limitations.

The first claim
This criticism concerns the claim that there are two clearly definable “major types” of SARS-CoV-2 in this outbreak and that they have different transmission rates.

Tang et al. term these two types L and S type: “two major types (L and S types): the S type is ancestral, and the L type evolved from S type. Intriguingly, the S and L types can be clearly defined by just two tightly linked SNPs at positions 8,782 (orf1ab: T8517C, synonymous) and 28,144 (ORF8: C251T, S84L).”

A single nonsynonymous mutation, which has not been assessed for functional significance, is not sufficient to define a distinct “type”, let alone a “major type”. As of 2nd March 2020, 111 nonsynonymous mutations have been identified in the outbreak; these are catalogued in the CoV-GLUE resource and visualised in Figure 1. At present, there is no evidence that any of these 111 mutations has functional significance for within-host infection or for transmission rates. Moreover, when “types” are defined purely on the basis of two mutations, it is not intriguing that these “types” then differ by those two mutations.

Figure 1. A visualisation of the 111 nonsynonymous mutations (red) observed to date in the COVID-19 outbreak by plotting a grid of mutations where each column is a sample and each row is one of the observed mutations in the phylogeny. The columns are ordered by the position of each sample in the phylogeny. Synonymous mutations are shown in yellow. The C251T (nonsynonymous) and T8517C (synonymous) mutations are visible on the right side of the plot.

Tang et al. further claim that these two types have different transmission rates: “Thus far, we found that, although the L type is derived from the S type, L (~70%) is more prevalent than S (~30%) among the sequenced SARS-CoV-2 genomes we examined. This pattern suggests that L has a higher transmission rate than the S type.” The abstract of the paper goes even further, stating outright that: “the S type, which is evolutionarily older and less aggressive…”

It is, however, important to appreciate that finding a majority of samples with a particular mutation is not evidence that viruses with that mutation transmit more readily. Making this claim would, at the very minimum, require a comparison with expectations under a null distribution that assumes equal transmission rates. As the authors have not performed such a comparison, we believe there is insufficient evidence for this suggestion, and it is therefore incorrect (and irresponsible) to state that there is any difference in transmission rates. Differences in the observed numbers of samples with and without this mutation are far more likely to be due to stochastic epidemiological effects.

Basic evolutionary theory predicts that selectively neutral mutations change in frequency over time through the process of genetic drift. In a viral outbreak, each transmission event from one infected person to another is a random probabilistic event, with some infected individuals transmitting more or less often than others. People may transmit at higher rates than others for a variety of reasons, e.g., because they cough onto their palms and use overcrowded public transport, or just because their friends and coworkers got lucky (or unlucky!). These small-scale epidemiological phenomena add up over time to create substantial variation in the frequencies of mutations observed during an outbreak.
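The null comparison called for above can be sketched with a neutral Wright-Fisher simulation, in which every genome has the same chance of being passed on (equal transmission rates) and frequencies change only by drift. This is a minimal illustration, not the authors' analysis: the starting frequency (0.5), population size (50), number of generations (20) and replicate count are arbitrary assumptions, and a real outbreak is a growing, structured population rather than a constant one.

```python
import random

def neutral_final_frequencies(p0, pop_size, generations, replicates, seed=1):
    """Simulate neutral Wright-Fisher drift and return the final mutation frequencies.

    Every genome has the same chance of being 'transmitted' (equal transmission
    rates): each generation, pop_size genomes are resampled with replacement.
    """
    rng = random.Random(seed)
    finals = []
    for _ in range(replicates):
        p = p0
        for _ in range(generations):
            # Binomial resampling: each new genome carries the mutation with probability p
            carriers = sum(1 for _ in range(pop_size) if rng.random() < p)
            p = carriers / pop_size
        finals.append(p)
    return finals

finals = neutral_final_frequencies(p0=0.5, pop_size=50, generations=20, replicates=1000)

# Share of purely neutral runs that end at a 70:30 split or more extreme, in either direction
extreme = sum(1 for f in finals if f >= 0.7 or f <= 0.3) / len(finals)
print(f"P(final frequency >= 0.7 or <= 0.3 | neutral drift) ~ {extreme:.2f}")
```

Under these assumptions a substantial share of neutral runs end at 70:30 or more extreme splits, which is why an observed majority on its own says little about transmission rates.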

Additionally, when a virus spreads to a previously uninfected area or country, a founder effect can occur. As the few viruses that seed the new area rapidly grow into an epidemic, any mutations they carry will quickly become very common, even if those mutations were rare in the country that seeded the transmission. This is particularly likely for a novel virus such as SARS-CoV-2, as there are a large number of susceptible hosts. Founder effects have been observed in previous studies of viral outbreaks (e.g., Foley et al. 2000; Rai et al. 2010; Tsetsarkin et al. 2011). Combined, these factors mean that the frequency of a particular mutation is not, in and of itself, suggestive of any functional significance. The widespread media uptake of this article (35 articles at last count) and the many comments on social media in response to it suggest that the unsupported claims made by Tang et al. have already spread undue fear.
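The founder effect can be illustrated with binomial sampling: if a new area is seeded by k viruses drawn at random from a source in which a mutation has frequency p, the chance that most founders carry it is a binomial tail. The helper name and the values of p and k below are hypothetical, and the sketch ignores any drift after seeding.

```python
from math import comb

def prob_founders_majority(p, k):
    """Probability that a strict majority of k founding viruses carry a mutation
    present at frequency p in the source population (independent draws)."""
    return sum(comb(k, i) * p**i * (1 - p)**(k - i) for i in range(k // 2 + 1, k + 1))

# A mutation that is rare (10%) in the source population
for k in (1, 2, 3, 5, 10):
    print(f"{k:2d} founder(s): P(majority carry the mutation) = {prob_founders_majority(0.1, k):.4f}")
```

With a single founder the new epidemic carries the mutation with probability p, so even a mutation at 10% in the source would found roughly one in ten such single-introduction epidemics.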

It is also important to appreciate that the smaller the viral population, the more these small-scale variations are likely to affect mutation frequencies (in the same way that the more coins you flip, the closer you expect the proportion of heads to be to 0.5). Given that this mutation appears to have occurred very early in the outbreak, when few individuals were infected, its frequency is very likely to have been particularly influenced by genetic drift.
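The coin analogy can be made concrete with the binomial standard deviation of an observed proportion $\hat{p}$ after $N$ independent draws with underlying probability $p$. Treating transmissions as independent draws is a simplifying assumption; the point is only how the spread shrinks with $N$.

$$
\mathrm{SD}(\hat{p}) = \sqrt{\frac{p(1-p)}{N}}, \qquad
p = 0.5:\quad N = 10 \Rightarrow \mathrm{SD} \approx 0.158, \qquad N = 1000 \Rightarrow \mathrm{SD} \approx 0.0158
$$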

The second claim
Tang et al. compare the frequencies of nonsynonymous and synonymous mutations in the data, claiming that there is significant evidence of selection suppressing the frequency of nonsynonymous mutations in the outbreak. This analysis is flawed on three grounds:

(1) The numbers in Tang et al.’s Figure 2 do not make sense. According to the presented data, seven (synonymous) mutations have a derived frequency of >50%, and two of these have derived frequencies greater than 95% in the population. A cursory glance at the tree (Figure 2; taken from Nextstrain) shows that this cannot be true. “Derived” in this context should mean “arisen since the most recent common ancestor of the outbreak”. For two mutations to have derived frequencies greater than 95%, a small number of samples would need to branch as a sister lineage to the rest of the outbreak tree. This is not the case.

Figure 2. A screenshot of the SARS-CoV-2 time-tree phylogeny from Nextstrain. Colours indicate the geographic location of each sample; the sampling date is shown below the tree.

The only way Tang et al. can obtain the results they present is by defining the ancestral state at some point far back in the bat coronavirus tree, before the outbreak began. They then estimate the ancestral state for each mutation independently, ignoring the highly informative tree of the current outbreak. This approach only makes sense with a much more closely related outgroup, and for inferring ancestral states in a freely recombining species whose mutations are unlinked and have independent ancestry. By contrast, SARS-CoV-2 and the closest known bat sarbecovirus last shared a common ancestor many decades ago (as discussed elsewhere on Virological). Additionally, such methods should incorporate the inherent uncertainty in inferring the ancestral state (e.g., est-sfs; Keightley and Jackson 2018), which Tang et al.’s implementation does not.

Implementing this method of inferring ancestral states in a viral context, where we assume there is no recombination, means that “high frequency derived mutations” are actually just new mutations in the outbreak that have mutated back to the inferred ancestral state (in bats). This is a completely meaningless definition of “derived”. These high frequency derived mutations should instead be classed as low frequency derived mutations.
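To illustrate how the choice of polarising sequence changes what counts as “derived”, the sketch below computes the derived-allele frequency at one hypothetical site under two choices of ancestral state. The site, alleles and counts are invented for illustration and are not taken from Tang et al. or CoV-GLUE.

```python
def derived_frequency(sample_alleles, ancestral_allele):
    """Frequency of sampled alleles that differ from the chosen ancestral (polarising) state."""
    derived = sum(1 for a in sample_alleles if a != ancestral_allele)
    return derived / len(sample_alleles)

# Hypothetical site: the outbreak root carries C, the bat outgroup carries T,
# and a new C->T mutation in the outbreak has reached 5 of 100 sampled genomes.
samples = ["C"] * 95 + ["T"] * 5
outbreak_root, bat_outgroup = "C", "T"

print(derived_frequency(samples, outbreak_root))  # polarised on the outbreak: a rare, recent mutation
print(derived_frequency(samples, bat_outgroup))   # polarised on the bat: looks like a high-frequency "derived" allele
```

The same low-frequency outbreak mutation is reported at 5% or 95% depending only on which sequence is treated as ancestral.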

Tang et al. claim that 16.3% (7 out of 43) of synonymous mutations have a derived frequency >0.5. However, given the level of synonymous divergence, and remembering that mutation probabilities are biased (which increases the likelihood of back-mutations), this 16.3% figure is broadly in line with the expected proportion of synonymous mutations that would back-mutate to the nucleotide found in bat-infecting strains. Because nonsynonymous sites are much less diverged (<4%) than synonymous sites (19%) from the most closely related bat sequence, new nonsynonymous mutations are much more likely than new synonymous mutations to move away from the inferred ancestral state in bats. Therefore, under this flawed definition of “derived”, a much smaller proportion of nonsynonymous mutations is expected to appear as high-frequency “derived” mutations, without any action of natural selection at all.
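A crude back-of-the-envelope version of this argument, under explicitly stated assumptions: new mutations fall uniformly across sites of each class, so the share landing on a site that already differs from the bat sequence is roughly the divergence; and such a mutation returns to the bat base with some probability q, between 1/3 (all alternative bases equally likely) and 1 (strong mutational bias). The divergences are those quoted above; the model itself is an illustrative assumption, not the analysis behind the comparison in the text.

```python
def apparent_derived_fraction(divergence, q):
    # Share of new mutations that land on a site already differing from the bat
    # sequence (~ divergence, assuming uniform mutation) AND revert to the bat base (q).
    return divergence * q

divergence = {"synonymous": 0.19, "nonsynonymous": 0.04}  # 0.04 is the quoted upper bound (<4%)
bounds = {cls: (apparent_derived_fraction(d, 1 / 3), apparent_derived_fraction(d, 1.0))
          for cls, d in divergence.items()}

for cls, (low, high) in bounds.items():
    print(f"{cls}: {low:.3f} to {high:.3f}")
print(f"Tang et al. synonymous (7 of 43): {7 / 43:.3f}")
```

Under these assumptions the observed synonymous fraction falls inside the expected range, while even the most generous nonsynonymous bound lies below the least generous synonymous one.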

(2) The way the data are presented in Tang et al.’s Figure 2 will falsely suggest that purifying selection is acting even if their methodology were sound and no such selection were occurring. The bar heights in their figure compare the raw numbers of mutations at each frequency without scaling for the total number of mutations in each class. Because there are more nonsynonymous than synonymous polymorphisms in the population, and most mutations are expected to be at low frequency (regardless of the action of natural selection), this presentation will always make it look as though there are proportionately more low-frequency nonsynonymous mutations.
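The effect of unscaled bar heights can be shown with hypothetical counts in which nonsynonymous mutations are simply twice as numerous as synonymous ones but have exactly the same frequency spectrum, i.e., no selection at all. The counts and bin labels are invented for illustration.

```python
# Hypothetical counts per derived-frequency bin (low -> high); not Tang et al.'s data.
bins = ["singleton", "<10%", "10-50%", ">50%"]
nonsyn_counts = [60, 24, 12, 4]
syn_counts = [30, 12, 6, 2]   # same shape, half as many mutations

def proportions(counts):
    """Scale bin counts by the total for that mutation class."""
    total = sum(counts)
    return [c / total for c in counts]

for b, n, s, pn, ps in zip(bins, nonsyn_counts, syn_counts,
                           proportions(nonsyn_counts), proportions(syn_counts)):
    print(f"{b:>9}: raw N={n:3d} S={s:3d} | scaled N={pn:.2f} S={ps:.2f}")
```

The raw gap between the classes is largest in the lowest-frequency bin simply because that bin holds the most mutations, while the scaled proportions are identical.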

(3) When interpreting their results, Tang et al. do not consider that sequencing error could drive a relative excess of singleton nonsynonymous mutations. This possibility is important because sequencing errors will be at low frequency, as they are rare and cannot be transmitted, whereas real mutations can be at any frequency because they can be transmitted. Additionally, purifying selection can only act on real mutations, not on sequencing errors. Sequencing errors may therefore have a higher nonsynonymous-to-synonymous ratio than real mutations while also being at low frequency, which would mimic purifying selection suppressing the frequency of nonsynonymous mutations.
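A toy calculation of this effect, using invented numbers: real mutations have the same nonsynonymous fraction at every frequency (so there is no frequency-dependent selection), while errors are always singletons and fall on nonsynonymous and synonymous sites in proportion to the 2.43 : 1 site ratio estimated in this post. All counts are hypothetical.

```python
# Hypothetical real mutations: 60% nonsynonymous among both singletons and shared mutations
real_singletons_n, real_singletons_s = 30, 20
real_shared_n, real_shared_s = 30, 20

# Hypothetical sequencing errors: never transmitted, so always singletons, and not
# filtered by selection, so they hit N and S sites in proportion to site numbers.
errors = 20
error_n_fraction = 2.43 / 3.43
err_n = errors * error_n_fraction

singleton_n_frac = (real_singletons_n + err_n) / (real_singletons_n + real_singletons_s + errors)
shared_n_frac = real_shared_n / (real_shared_n + real_shared_s)

print(f"Nonsynonymous fraction among singletons:        {singleton_n_frac:.2f}")
print(f"Nonsynonymous fraction among shared mutations:  {shared_n_frac:.2f}")
```

The singleton class ends up enriched for nonsynonymous changes even though no selection acts on any real mutation in this toy setup.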

Taken together, Tang et al.’s analysis tells us absolutely nothing about purifying selection within the viral outbreak. We have performed an additional analysis, described below, to test for signatures of purifying selection in the SARS-CoV-2 outbreak.

Additional methodological issue
The authors used the software PAML (Yang 2007) to estimate selection parameters. PAML does not allow for synonymous rate variation, yet the authors explicitly state in the paper that they believe there are mutational hotspots. Recent work has shown that false-positive rates of positive-selection inference are unacceptably high when such synonymous rate variation occurs (Wisotsky et al. 2020). Therefore, if there truly is synonymous rate variation, reliably identifying signatures of positive selection within the SARS-CoV-2 phylogeny requires methods that model this rate variation (e.g., many of the models in the HyPhy package).

Given these flaws, we believe that Tang et al. should retract their paper, as the claims made in it are clearly unfounded and risk spreading dangerous misinformation at a crucial time in the outbreak.

Our additional analysis
To test for potential purifying selection in a simple and robust manner, we compared the observed numbers of synonymous and nonsynonymous mutations with a null expectation based on the relative numbers of synonymous and nonsynonymous sites. The data for this analysis were taken from the CoV-GLUE resource, with four samples removed due to concerns about their error rates.

The relative number of sites was estimated using the Goldman and Yang (1994) codon model. This model estimates mutation probabilities between all 61 possible coding codons using the observed frequencies of each of the 61 codons weighted by the transition to transversion ratio estimated from the data (2.9). It estimates there are 2.43 times more nonsynonymous than synonymous sites in the SARS-CoV2 genome.
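Counting synonymous and nonsynonymous sites can be sketched as follows. This is not the Goldman and Yang (1994) maximum-likelihood procedure and not CoV-GLUE's implementation; it is a simpler Nei-Gojobori-style count in which each possible single-nucleotide change is weighted by a transition/transversion ratio κ (default 2.9, as estimated above). Codon equilibrium frequencies are not used, and changes to stop codons are ignored; these are assumptions of the sketch, and other conventions give different ratios.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order ('*' = stop)
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
PURINES = {"A", "G"}

def is_transition(a, b):
    # A<->G and C<->T are transitions; all other changes are transversions
    return (a in PURINES) == (b in PURINES)

def codon_sites(codon, kappa):
    """Synonymous and nonsynonymous site counts for one sense codon (they sum to 3)."""
    syn = 0.0
    for pos in range(3):
        syn_w = total_w = 0.0
        for base in BASES:
            if base == codon[pos]:
                continue
            mutant = codon[:pos] + base + codon[pos + 1:]
            if CODE[mutant] == "*":
                continue  # changes to stop codons are excluded in this sketch
            w = kappa if is_transition(codon[pos], base) else 1.0
            total_w += w
            if CODE[mutant] == CODE[codon]:
                syn_w += w
        syn += syn_w / total_w
    return syn, 3.0 - syn

def count_sites(seq, kappa=2.9):
    """Sum site counts over an in-frame coding sequence, skipping stop or ambiguous codons."""
    seq = seq.upper().replace("U", "T")
    s_total = n_total = 0.0
    for i in range(0, len(seq) - len(seq) % 3, 3):
        codon = seq[i:i + 3]
        if CODE.get(codon, "*") == "*":
            continue
        s, n = codon_sites(codon, kappa)
        s_total += s
        n_total += n
    return s_total, n_total

s, n = count_sites("ATGTTTCTGTAA")
print(f"S sites = {s:.3f}, N sites = {n:.3f}, N:S = {n / s:.2f}")
```

Applied to a full genome alignment, the ratio printed at the end is the quantity being debated in this thread; different weighting schemes move it noticeably.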

This null expectation under no selection was compared with the outbreak data using a chi-squared test on the table below. This yielded a non-significant P-value of 0.113. This result is not unexpected, as the current rapid growth of the viral population is likely to allow viruses carrying unfit mutations, as well as those carrying neutral mutations, to be transmitted. However, we urge caution in over-analysing these results, as statistical power is limited until more sequencing data accumulate.

                        Nonsynonymous mutations   Synonymous mutations
Null expectation        119.7                     49.2
Observed in outbreak    105                       64

Table 1. Neutral null expectation under no selection, compared with the observed counts. Adjusting for the number of sites, the point estimate of the ratio of these classes of mutations (dN/dS) is 0.68.
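The post does not state which form of chi-squared test was applied to Table 1. As an assumption, treating its two rows as a 2x2 contingency table with Yates' continuity correction (the default in R's chisq.test for 2x2 tables) appears to reproduce the reported P-value; the dN/dS point estimate follows as (105/64)/2.43. Whether this layout is the appropriate test is debated later in this thread.

```python
from math import erfc, sqrt

def chi2_2x2_yates(table):
    """Pearson chi-squared test with Yates' continuity correction on a 2x2 table.
    Returns (statistic, P); with df = 1 the upper tail is erfc(sqrt(stat / 2))."""
    (a, b), (c, d) = table
    n = a + b + c + d
    rows, cols = (a + b, c + d), (a + c, b + d)
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / n
            stat += max(abs(table[i][j] - expected) - 0.5, 0.0) ** 2 / expected
    return stat, erfc(sqrt(stat / 2))

# Rows of Table 1 (nonsynonymous, synonymous): neutral expectation, then observed counts
table1 = [[119.7, 49.2], [105, 64]]
stat, p = chi2_2x2_yates(table1)

site_ratio = 2.43  # nonsynonymous : synonymous sites, as estimated in the text
dn_ds = (105 / 64) / site_ratio
print(f"chi2 = {stat:.2f}, P = {p:.3f}, dN/dS = {dn_ds:.2f}")
```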

Foley, B., et al. “Apparent founder effect during the early years of the San Francisco HIV type 1 epidemic (1978–1979).” AIDS Research and Human Retroviruses 16.15 (2000): 1463-1469.

Goldman, N., and Ziheng Y… “A codon-based model of nucleotide substitution for protein-coding DNA sequences.” Molecular Biology and Evolution 11.5 (1994): 725-736.

Keightley, P. D., and Jackson, B. C. “Inferring the probability of the derived vs. the ancestral allelic state at a polymorphic site.” Genetics 209.3 (2018): 897-906.

Rai, Mohammad A., et al. “Evidence for a ‘founder effect’ among HIV-infected injection drug users (IDUs) in Pakistan.” BMC Infectious Diseases 10.1 (2010): 7.

Tsetsarkin, Konstantin A., et al. “Chikungunya virus emergence is constrained in Asia by lineage-specific adaptive landscapes.” Proceedings of the National Academy of Sciences 108.19 (2011): 7872-7877.

Wisotsky, Sadie R., et al. “Synonymous site-to-site substitution rate variation dramatically inflates false positive rates of selection analyses: ignore at your own peril.” Molecular Biology and Evolution (2020).

Yang, Ziheng. “PAML 4: phylogenetic analysis by maximum likelihood.” Molecular Biology and Evolution 24.8 (2007): 1586-1591.

We would like to thank all the authors who have kindly deposited and shared genome data on GISAID. A table with genome sequence acknowledgments can be found on the CoV-GLUE website.

We thank Joseph Hughes for helpful comments.


In our phylogenetic analyses, the “L type” likely forms a monophyletic derived clade.
The “L type” in Tang et al. is the same as “Group CTC” in our phylogenetic tree.

Response to MacLean et al.’s ‘Response to “On the origin and continuing evolution of SARS-CoV-2”’

Jian Lu
State Key Laboratory of Protein and Plant Gene Research, Center for Bioinformatics, School of Life Sciences, Peking University, Beijing, 100871, China

To whom correspondence should be addressed: [email protected]

The criticisms by MacLean et al. (Response to “On the origin and continuing evolution of SARS-CoV-2”) of Tang et al.’s recent publication (“On the origin and continuing evolution of SARS-CoV-2”) in National Science Review (NSR) are briefly answered below. Both parties have agreed that full-length exchanges should appear in NSR, which has generously offered to host this very public and open debate.

There are two main criticisms by MacLean et al. First, they argued that S and L types have no functional significance since we did not measure within-host infections. To evolutionary geneticists like ourselves, the “functional” test is the frequency of each mutation in the population. If a mutation has a fitness (or transmission) advantage, it will be much more common than the neutral prediction. Here, we use the synonymous mutations that do not alter protein sequences as the neutral reference.

As shown in Fig. 2 of Tang et al., only one nonsynonymous mutation appears in > 50% of the samples, and that is the L mutation. In contrast, there are 7 such synonymous mutations. In the complete absence of selection, one expects to see a nonsynonymous : synonymous ratio of ~3.5 : 1. In short, while we expect to see 7 x 3.5 = 24.5 nonsynonymous mutations at > 50% frequency in the population, we see only one, which is the L mutation. It is not clear why MacLean et al. failed to see this simple and central point.

We may then flip the coin and look at the ancestral S variant. It occurs in only one of the 27 samples (3.7%) collected before Jan. 7 (mainly from Wuhan). In contrast, it occurs in 28 of the 73 samples (38.4%) collected after Jan. 7 (mainly from outside Wuhan). Again, the difference is startling and highly significant by any statistical test. MacLean et al. claimed that the two patterns can be explained by purely stochastic forces, such as population bottlenecks. These finer points will be debated in NSR. We only wish to add one point: stochastic changes are pronounced only when the population size is small. Even if each infected patient carried only ONE viral particle, the effective population size of the virus would not be small.

The second criticism of MacLean et al. is that there is no purifying (i.e., negative) selection against non-synonymous mutations in the circulating viral population. We shall again refer to the ratio of mutations at > 50% frequency. Recall that the expected nonsynonymous : synonymous ratio should be 24.5 : 7, but we observed only 1 : 7. If this 24.5-fold difference is not due to negative selection, what is it then?

This 24.5-fold difference stands in sharp contrast with the test that MacLean et al. used to argue against purifying selection (their Table 1). The technical errors will be discussed in detail in the formal debate. In particular, because deleterious mutations occur at low frequencies, often as singletons (i.e., one occurrence across all samples), studies of purifying selection have to filter out deleterious mutations that have not yet been purged. Apparently, these authors fail to take into account the fundamental rule of population genetics in their analyses.

As the editor of NSR has stated publicly, the proper place for a thoughtful debate is in NSR, which will invite experts to review the submissions. Our public statements on this matter prior to publication in NSR will be limited to the messages here.


While the debate above is important, it’s fairly academic. What is missing is discussion of the use of “aggressive”, which is not a standard epidemiological term. Any work on COVID-19 is going to draw attention, so we all need to be very mindful about public health messaging. I assume you are using “aggressive” to imply transmission rate or fitness, as you are only looking at the frequencies of the lineages. Many in the public, however, see “aggressive” and think severity or virulence. It’s this part that is drawing the most attention to your paper and inciting unnecessary fear. The press loves this, as articles about fear generate more clicks. There are now common threads all over Twitter suggesting that if you are infected with the “L” strain, you will be more likely to have severe disease and die. We now have to spend considerable energy undoing this misinformation, and much of the damage is already done. Please think about the messaging around future papers to avoid confusion and unnecessary panic.

Please also address this in your response. It will be very important to have clarification from the authors.

Sorry for the tone, I do not intend any direct offense to you or your co-authors. This is a learning experience for all of us.




@grubaughlab Thanks. The advice is well taken.

The genomic sequence hCoV-19/USA/IL1/2020 was interpreted in the NSR paper as a patient carrying a mixture of L and S types. The patient later infected a family member. The virus isolated from the family member is hCoV-19/USA/IL2/2020, which is the S type. This single case doesn’t support S type being “less aggressive”.

@fuyutao
Thank you for providing the information about hCoV-19/USA/IL2/2020. We did not discuss this patient in our paper. Could you please tell me in which part of our paper we presented this as evidence to support “S type being less aggressive”?


Response to Lu’s response to MacLean et al.’s Response to “On the origin and continuing evolution of SARS-CoV-2”

Oscar A. MacLean*, Richard Orton, Joshua B. Singer, David L. Robertson.

MRC-University of Glasgow Centre for Virus Research (CVR).

*To whom correspondence should be addressed: [email protected].

As Jian Lu, on behalf of Tang et al., has chosen to post their response here, we see no reason to replicate our critique in the NSR journal. With respect to these authors, their response misses our key issues with their manuscript:

As we pointed out in our initial response, many epidemiological factors that involve no natural selection can drive changes in the frequency of a mutation in a global viral outbreak. Together, these processes cause mutation frequencies to change, and such changes are fully expected. It is thus meaningless to state, as the authors have, that “the “functional” test is the frequency of each mutation in the population”.

The claim that the difference in S and L frequencies across time is “highly significant by any statistical test” is the key issue here. For example, a simple Fisher’s exact test (which was used in Tang et al.’s original paper) measures the probability of seeing the observed difference in frequencies across countries or time if the underlying frequencies were identical. This test does not assess whether the observed changes in mutation frequency over that period are driven by selection or by genetic drift under a neutral null model. The question being addressed is not “is there a true difference in frequencies?” (stochastic processes will change mutation frequencies, which will likely produce significant Fisher’s exact tests), but rather whether this change in frequency over time is distinct from that expected under a neutral null evolutionary model. These two questions are very different, with greatly differing wider significance. We would very much like to see an analysis answering the latter question, but would be immensely surprised if it had the power to detect any signatures of selection given the current data.
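The distinction drawn above can be made concrete. A two-sided Fisher's exact test on the S-type counts quoted in Lu's response (1 of 27 samples before 7 January, 28 of 73 after) asks only whether the two proportions differ; it does not model drift, founder effects or biased sampling. The helper below is a minimal stdlib implementation written for illustration, using the usual "sum all tables no more probable than the observed one" definition.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, col1, n = a + b, a + c, a + b + c + d

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins held fixed
        return comb(col1, x) * comb(n - col1, row1 - x) / comb(n, row1)

    p_obs = prob(a)
    lo, hi = max(0, row1 + col1 - n), min(row1, col1)
    probs = [prob(x) for x in range(lo, hi + 1)]
    # Small relative tolerance guards against floating-point ties
    return sum(q for q in probs if q <= p_obs * (1 + 1e-7))

# Rows: before / after 7 January; columns: S type / L type
p = fisher_exact_two_sided(1, 26, 28, 45)
print(f"Fisher's exact test P = {p:.5f}")
```

A small P-value here answers only the first of the two questions; a test of selection would instead compare the change against frequency trajectories simulated under a neutral model that also reflects how samples were collected.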

Any analysis of allele frequencies also needs to consider another important factor that we neglected to mention in our initial reply: biased sampling. Sequenced samples are not independent of one another. For example, as contact tracing is a significant driver of case detection, detected samples will be correlated, leading to oversampling of particular genotypes and mutations. Additionally, the sampling of infections for sequencing is strongly biased by country: for example, 80% of COVID-19 cases to date (9/3/2020) come from China, but a far smaller proportion (~40% as of 2/3/2020) of sequenced genomes do. This biased sampling will further exaggerate the epidemiological variation that drives differences in observed mutation frequencies.

Lu has also restated their claim that “there is only one nonsynonymous mutation that appears in > 50% of the samples and that is the L mutation. In contrast, there are 7 synonymous mutations” without any further qualification, completely ignoring our explanation in our first posting of why their methodology is flawed and why there are not seven such mutations.

Some additional points:

We did not make the claim “…that there is no purifying (i.e., negative) selection against non-synonymous mutations in the circulating viral population.” We simply highlighted a lack of power to detect it with the limited data currently available. The authors are over-interpreting our results here.

Whilst the paper by Charlesworth and Eyre-Walker (2008), which suggests removing variants under 5% frequency in count-based analyses of polymorphisms, is fairly well cited (158 citations as of 9/3/2020), we think describing this data-processing step as “the fundamental rule of population genetics” is unwarranted. However, if this frequency threshold were used, only three nonsynonymous and two synonymous mutations would remain for analysis (using data from 2/3/2020). This further highlights the lack of power in the current SARS-CoV-2 polymorphism data to make inferences about the pattern of purifying selection.

Lu provides a 3.5:1 nonsynonymous to synonymous ratio with no explanation of the methodology that has been used to generate it, or why it differs from our 2.43:1 estimate. Counting the relative numbers of sites for a given gene/genome is a non-trivial process, and the choice of model will be a significant driver in the estimated ratios.

To reiterate our first posting, Tang et al. have not provided any evidence that there are two major types of SARS-CoV-2, and certainly no evidence that part of the outbreak has been “more aggressive”. We agree with Nathan Grubaugh’s post that such a claim is misleading and has led to the spread of misinformation in the press. Rather than focus on our response, the authors should urgently correct the confusion they are responsible for.

Charlesworth, Jane, and Adam Eyre-Walker. “The McDonald–Kreitman test and slightly deleterious mutations.” Molecular Biology and Evolution 25.6 (2008): 1007-1015.


It was shown experimentally during the 2013-2016 Ebola virus outbreak in West Africa that some nonsynonymous mutations in the glycoprotein increased infectivity in human cells (Diehl et al. 2016; Urbanowicz et al. 2016). In the case of SARS-CoV-2, it seems premature to label mutations “aggressive” based only on frequencies and without any experimental validation, as many epidemiological factors, as well as sampling issues, can have a major influence.

The S and L “types” relate to a C/T mutation at genome position 28,144, resulting in the amino acid serine (S) or leucine (L) at this site. Both variants are still circulating in the epidemic, and as the virus naturally evolves (and passes through various bottlenecks), further distinct clusters of viral samples arise as new mutations are acquired and passed on. For example, within the L cluster, a subcluster is now apparent that contains nonsynonymous mutations at genome positions 14,408 and 23,403 and a synonymous mutation at 3,037, which consistently appear together on the viral genome; within this, a further subcluster additionally carrying mutations at 3 neighbouring positions is also apparent. Meanwhile, the S cluster has a subcluster containing nonsynonymous mutations at genome positions 17,747 and 17,858 along with a synonymous mutation at 18,060 (many of the recent sequences from the USA fall within this cluster).

All these clusters (or “types”) are readily apparent from the phylogenetic trees, but it is not clear whether any of the mutations give the virus a fitness advantage. Only experiments can truly determine the role of any of these mutations and whether they affect transmissibility or infectivity, or whether their accumulation is simply due to random stochastic or bottleneck effects. Both the S and L viruses are still circulating, and many other mutations are accumulating across the viral genome.

Figure. Heatmap visualisation of all synonymous (yellow) and non-synonymous (red) nucleotide mutations from all complete human SARS-CoV-2 genome sequences available from GISAID on 11th March 2020. Created in R using d3heatmap (the hclust algorithm for grouping related samples is a little crude). Columns represent individual samples, rows represent genome positions. The S and L clusters are labelled with blue and purple boxes respectively, and genome positions mentioned in the text are labelled, numbering relates to the genome positions in the GenBank sequence MN908947. We would like to thank all the authors who have kindly deposited and shared genome data on GISAID; a table with genome sequence acknowledgments can be found on the CoV-GLUE website.


@oscar.maclean

An update of Response to MacLean et al.’s ‘Response to “On the origin and continuing evolution of SARS-CoV-2”’

Jian Lu

State Key Laboratory of Protein and Plant Gene Research, Center for Bioinformatics, School of Life Sciences, Peking University, Beijing, 100871, China

To whom correspondence should be addressed: [email protected]

Since MacLean et al. saw “no reason to replicate their critique in the NSR journal”, on behalf of all authors, I update our response here.
The first debated issue concerns the two types (S and L) we defined to describe the circulating virus population. Our definition was based on two loci that are ~20 kb apart in the ~30 kb viral genome; each has a minor allele frequency of ~30%, and the two show nearly complete linkage (r² = 0.95). Hence, it is a simple FACT that there are two major clones or types circulating in the population. As for the functional interpretation, the frequency of the S type changed from 3.8% to 37.8% between samples collected before and after Jan. 7 2020. We believed this nearly 10-fold difference, supported by P = 0.0008, was worth reporting to the community. Nevertheless, as we stated in Tang et al., we were fully aware that the finding might be biased by the limited samples available at the time. We note that it is important for scientists to strike a balance between staying silent and reporting observations during such a pandemic. We encourage people to keep an eye on potential connections between virus types and clinical symptoms. Meanwhile, we recognize that within the context of Tang et al. the term “aggressive” is misleading and should be replaced by a more neutral term, “a higher frequency”. We have made an addendum to clarify this.
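Linkage between two biallelic loci is usually summarised as r² = D² / (pA(1 − pA) pB(1 − pB)), where D is the observed frequency of one haplotype minus the product of its two allele frequencies. The sketch below computes it from haplotype counts; the function name and all counts are hypothetical, not Tang et al.'s data.

```python
def linkage_r2(n_ab, n_aB, n_Ab, n_AB):
    """r^2 between two biallelic loci from haplotype counts.
    Lowercase/uppercase letters mark the two alleles at each locus."""
    n = n_ab + n_aB + n_Ab + n_AB
    p_A = (n_Ab + n_AB) / n        # allele frequency of A at locus 1
    p_B = (n_aB + n_AB) / n        # allele frequency of B at locus 2
    d = n_AB / n - p_A * p_B       # linkage disequilibrium coefficient D
    return d * d / (p_A * (1 - p_A) * p_B * (1 - p_B))

print(linkage_r2(70, 0, 0, 30))   # only two haplotypes observed: complete linkage
print(linkage_r2(69, 1, 1, 29))   # two discordant genomes out of 100
print(linkage_r2(25, 25, 25, 25)) # alleles combined at random: no linkage
```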

The second debated issue concerns the detection of purifying selection. MacLean et al. argued that they did not detect purifying selection acting on nonsynonymous mutations (P = 0.113, Table 1). To do so, MacLean et al. estimated the relative number of sites using the Goldman and Yang (1994) codon model. They concluded that “It estimates there are 2.43 times more nonsynonymous than synonymous sites in the SARS-CoV2 genome.” Based on their calculation, the ratio of nonsynonymous (N) to synonymous (S) sites should be 3.43 : 1. We obtained very similar ratios (roughly 22,789 N to 6,443 S sites, i.e., roughly 3.5 : 1) whenever we used the parameter settings MacLean et al. used, or various other parameters.

MacLean et al. used an expected N:S site ratio of 2.43:1 to test if the observed 169 mutations (105 N + 64 S) have been shaped by purifying selection. They failed to detect purifying selection because of a non-significant P value (= 0.113) they obtained by using a chi-square test. This is simply WRONG.

First, MacLean et al. used a ratio of 2.43 : 1 to calculate the expected numbers of N and S mutations, which led them to a wrong conclusion (P = 0.113, as shown in Table 1 of MacLean et al.). When a ratio of 3.43 : 1 is used instead, the P value is 0.003 with the same chi-squared test that MacLean et al. used.

Second, the chi-squared test MacLean et al. used is not appropriate. They should have contrasted the numbers of mutated N and S sites (105 and 64, respectively) with the numbers of non-mutated N and S sites in the virus genome, which are 22,684 and 6,379, respectively, according to our calculation. These numbers are consistent with MacLean et al.’s claim that “there are 2.43 times more nonsynonymous than synonymous sites in the SARS-CoV2 genome.” The P-value would then be 1.04 × 10⁻⁶.
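Both recalculations can be checked with a short sketch. As an assumption, it uses a Yates-corrected 2x2 chi-squared test (the default in R's chisq.test for 2x2 tables), which is the form that appears to reproduce the P-values quoted in this thread; the site counts are Lu's. The results should land close to the reported 0.003 and 1.04 × 10⁻⁶, with small differences possible from rounding of the site counts.

```python
from math import erfc, sqrt

def chi2_2x2_yates(table):
    """Yates-corrected Pearson chi-squared test on a 2x2 table; returns (stat, P), df = 1."""
    (a, b), (c, d) = table
    n = a + b + c + d
    rows, cols = (a + b, c + d), (a + c, b + d)
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / n
            stat += max(abs(table[i][j] - expected) - 0.5, 0.0) ** 2 / expected
    return stat, erfc(sqrt(stat / 2))

obs_n, obs_s = 105, 64
total = obs_n + obs_s

# Approach 1: Table 1 layout (expected vs observed) with an N:S site ratio of 3.43:1
ratio = 3.43
exp_n, exp_s = total * ratio / (ratio + 1), total / (ratio + 1)
_, p_ratio = chi2_2x2_yates([[exp_n, exp_s], [obs_n, obs_s]])

# Approach 2: mutated vs non-mutated sites for each class
_, p_sites = chi2_2x2_yates([[obs_n, 22684], [obs_s, 6379]])

print(f"Approach 1 (3.43:1 ratio): P = {p_ratio:.3f}")
print(f"Approach 2 (mutated vs non-mutated sites): P = {p_sites:.2e}")
```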

Hence, both ways would overturn MacLean et al.’s conclusion. MacLean et al. failed to detect the signature of purifying selection based on erroneous calculations.

Moreover, consistent with our results, purifying selection on SARS-CoV-2 was also detected in a recent independent study (

We are pleased to see the addendum added to the Tang et al. manuscript in National Science Review, which we have copied below for reference. However, we note that the online abstract, which will be by far the most read part of the paper, is unchanged in the strength of its claims: “On the other hand, the S type, which is evolutionarily older and less aggressive”.

On the specific methodological comment on our purifying selection analysis:
To investigate the origin of the nonsynonymous-to-synonymous site ratios given by Tang et al. above, we ran the PAML software they cited (Yang 2007) using two different models. These models differ in how they estimate codon frequencies: the F1x4 model uses average base frequencies across sites, while the codon-frequency model uses the observed counts of each codon in the alignment. The ratios of nonsynonymous to synonymous sites estimated by these models ranged from 2.76 to 3.75. All models for these count estimates are fundamentally wrong, as they are approximations based on limited data. Even the gold standard for generating such data, mutation accumulation experiments, is limited by the difficulty of observing lethal mutations. However, given that PAML uses a more powerful maximum-likelihood framework and has been cited over 5,000 times (as of 16/3/20), we will happily use these estimated ratios rather than our own (2.43). Both of these ratios from PAML produce a significant chi-squared test on our count data in Table 1 of the original post (P < 0.036).

We thank the authors for bringing our attention to this. We therefore agree it is a fair conclusion that significant evidence of purifying selection can be observed, which is filtering out nonsynonymous mutations before they can be observed in the outbreak. We would note that this is a subtly different result from that in the original Tang et al. paper, which suggested evidence of purifying selection suppressing the frequency of the observed mutations in the outbreak. Our criticism of that analysis remains unchanged.

Tang et al’s addendum for reference:
“In our recent publication, we showed that among circulating SARS-CoV-2 (with 103 genomes analyzed) two different viral genomes co-exist. We identified them as lineages L and S. The concerned amino acid we used to define the L and S lineages is located in ORF8 (open reading frame 8), which plays a yet undefined role in the viral life cycle. Based on the finding that “L” lineage has a higher frequency than lineage S, we described the L lineage as aggressive. We now recognize that within the context of our study the term “aggressive” is misleading and should be replaced by a more precise term “a higher frequency”. In short, while we have shown that the two lineages naturally co-exist, we provided no evidence supporting any epidemiological conclusion regarding the virulence or pathogenicity of SARS-CoV-2. By saying so, corrections will be made in the print version of this paper to avoid being misleading.”

Yang, Ziheng. “PAML 4: phylogenetic analysis by maximum likelihood.” Molecular Biology and Evolution 24.8 (2007): 1586-1591.


A quick comment: MacLean et al. still did not perform the statistical test properly. To test whether the observed N/S numbers (105/64) depart from the expected N/S ratio (2.76 to 3.75, according to their updated analysis), they should use a binomial test rather than the chi-squared test presented in their Table 1. Using the binomial test, we get P = 0.0009 under an expected N/S ratio of 2.76, and P = 4.41e-07 under an expected N/S ratio of 3.75.
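An exact two-sided binomial test of 105 nonsynonymous out of 169 mutations can be sketched as below. It uses the common definition that sums the probabilities of all outcomes no more likely than the observed one (the definition used by R's binom.test); whether the reported values came from exactly this implementation is an assumption, so the output should be read as matching in order of magnitude rather than as an exact reproduction.

```python
from math import comb

def binom_test_two_sided(k, n, p):
    """Exact two-sided binomial test: sum the probabilities of all outcomes
    that are no more likely than the observed count k."""
    pmf = [comb(n, i) * p**i * (1 - p)**(n - i) for i in range(n + 1)]
    # Small relative tolerance guards against floating-point ties
    return sum(x for x in pmf if x <= pmf[k] * (1 + 1e-7))

obs_n, total = 105, 105 + 64
results = {}
for ratio in (2.76, 3.75):
    p_n = ratio / (ratio + 1)  # expected share of nonsynonymous mutations under no selection
    results[ratio] = binom_test_two_sided(obs_n, total, p_n)
    print(f"N:S sites = {ratio}:1 -> P = {results[ratio]:.2g}")
```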

Alternatively, if MacLean et al. insisted on using the chi-squared test, the most appropriate approach is to compare the number of mutated N and S sites to the number of non-mutated N and S sites, as I showed previously.