Response to “On the origin and continuing evolution of SARS-CoV-2”

Response to MacLean et al.’s ‘Response to “On the origin and continuing evolution of SARS-CoV-2”’

Jian Lu
State Key Laboratory of Protein and Plant Gene Research, Center for Bioinformatics, School of Life Sciences, Peking University, Beijing, 100871, China

To whom correspondence should be addressed: [email protected]

The criticisms by MacLean et al. (Response to “On the origin and continuing evolution of SARS-CoV-2”) of Tang et al.’s recent publication (“On the origin and continuing evolution of SARS-CoV-2”) in National Science Review (NSR) are briefly answered below. Both parties have agreed that full-length exchanges should appear in NSR, which has generously offered to host this very public and open debate.

There are two main criticisms by MacLean et al. First, they argued that S and L types have no functional significance since we did not measure within-host infections. To evolutionary geneticists like ourselves, the “functional” test is the frequency of each mutation in the population. If a mutation has a fitness (or transmission) advantage, it will be much more common than the neutral prediction. Here, we use the synonymous mutations that do not alter protein sequences as the neutral reference.

As shown in Fig. 2 of Tang et al., there is only one nonsynonymous mutation that appears in > 50% of the samples, and that is the L mutation. In contrast, there are 7 synonymous mutations at that frequency. In the complete absence of selection, one expects to see a nonsynonymous : synonymous ratio of ~ 3.5 : 1. In short, while we expect to see 7 × 3.5 = 24.5 nonsynonymous mutations in the population, we see only one, which is the L mutation. It is not clear why MacLean et al. failed to see this simple and central point.
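The arithmetic above can be checked with a short sketch. The counts (7 synonymous, 1 nonsynonymous, neutral ratio of about 3.5 : 1) come from the text; the binomial tail at the end is an added illustration, not an analysis from Tang et al., and it assumes that each high-frequency mutation is independently nonsynonymous with probability 3.5 / 4.5 under neutrality. All names are hypothetical.

```python
from math import comb

# Counts taken from the text (mutations present in > 50% of samples).
SYN_HIGH_FREQ = 7       # synonymous mutations observed
NONSYN_HIGH_FREQ = 1    # nonsynonymous mutations observed (the L mutation)
NEUTRAL_RATIO = 3.5     # approximate nonsynonymous : synonymous ratio without selection


def expected_nonsyn(syn_count, ratio):
    """Nonsynonymous count expected under neutrality, given the synonymous count."""
    return syn_count * ratio


def binom_tail_at_most(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


expected = expected_nonsyn(SYN_HIGH_FREQ, NEUTRAL_RATIO)
fold_deficit = expected / NONSYN_HIGH_FREQ

# ASSUMPTION (not in the original text): under neutrality each of the
# 8 high-frequency mutations is nonsynonymous with probability 3.5 / 4.5.
p_nonsyn = NEUTRAL_RATIO / (NEUTRAL_RATIO + 1)
n_total = SYN_HIGH_FREQ + NONSYN_HIGH_FREQ
p_value = binom_tail_at_most(NONSYN_HIGH_FREQ, n_total, p_nonsyn)

print(f"expected nonsynonymous: {expected}")
print(f"fold deficit: {fold_deficit}")
print(f"P(at most {NONSYN_HIGH_FREQ} nonsynonymous of {n_total}): {p_value:.2e}")
```

Under that simplifying assumption, seeing at most one nonsynonymous mutation among eight is very unlikely, which matches the direction of the argument made in the text.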

We may then flip the coin and observe the ancestral S variant. It occurs in only one of the 27 samples (3.7%) collected before Jan. 7 (mainly from Wuhan). In contrast, it occurs in 28 of the 73 samples (38.4%) after Jan. 7 (mainly from outside of Wuhan). Again, the difference is startling and highly significant by any statistical test. MacLean et al. claimed that the two patterns can be explained by pure stochastic forces, such as population bottleneck. These finer points will be debated in NSR. We only wish to add one point. Stochastic changes are pronounced only when the population size is small. Even if each infected patient has only ONE viral particle, the effective population size of the virus is not small.
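As one concrete example of such a test, the two sample counts quoted above (1 of 27 before Jan. 7, 28 of 73 after) can be placed in a 2 × 2 table and compared with a two-sided Fisher's exact test. This is only a sketch: the choice of test is ours for illustration, the function names are hypothetical, and the test is implemented directly from the hypergeometric distribution so that it runs with the standard library alone.

```python
from math import comb


def hypergeom_pmf(k, total, successes, draws):
    """P(k successes in `draws` draws without replacement from `total` items)."""
    return comb(successes, k) * comb(total - successes, draws - k) / comb(total, draws)


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]].

    Sums the probabilities of all tables with the same margins that are
    no more likely than the observed table.
    """
    total = a + b + c + d
    row1 = a + b
    col1 = a + c
    lo = max(0, row1 + col1 - total)
    hi = min(row1, col1)
    p_obs = hypergeom_pmf(a, total, col1, row1)
    probs = [hypergeom_pmf(k, total, col1, row1) for k in range(lo, hi + 1)]
    # Small relative tolerance guards against floating-point ties.
    return sum(p for p in probs if p <= p_obs * (1 + 1e-9))


# Counts of the S variant as quoted in the text.
early_s, early_total = 1, 27    # samples collected before Jan. 7
late_s, late_total = 28, 73     # samples collected after Jan. 7

early_freq = early_s / early_total
late_freq = late_s / late_total
p_value = fisher_exact_two_sided(
    early_s, early_total - early_s,
    late_s, late_total - late_s,
)

print(f"S frequency before Jan. 7: {early_freq:.1%}")
print(f"S frequency after Jan. 7:  {late_freq:.1%}")
print(f"two-sided Fisher exact p-value: {p_value:.2e}")
```

The test treats the samples as independent draws, which is a simplification; it addresses only whether the two frequencies differ, not why they differ.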

The second criticism of MacLean et al. is that there is no purifying (i.e., negative) selection against nonsynonymous mutations in the circulating viral population. We shall again refer to the ratio of mutations at > 50% in frequency. Recall that the expected nonsynonymous : synonymous ratio is 24.5 : 7, but we observed only 1 : 7. If this 24.5-fold difference is not due to negative selection, what is it then?

This 24.5-fold difference stands in sharp contrast with the test that MacLean et al. used to argue against purifying selection (their Table 1). The technical errors will be discussed in detail in the formal debate. In particular, because deleterious mutations occur at low frequencies, often as singletons (i.e., one occurrence in all samples), the study of purifying selection has to filter out deleterious mutations that have not been purged yet. Apparently, these authors failed to take this fundamental rule of population genetics into account in their analyses.

As the editor of NSR has stated publicly, the proper place to have a thoughtful debate is in NSR, which will invite experts to review the submissions. Our public statement on this matter prior to the account published by NSR will be limited to the messages here.