Response to “On the origin and continuing evolution of SARS-CoV-2”

oscar.maclean · March 10, 2020, 12:58pm

Response to Lu’s response to MacLean et al.’s Response to “On the origin and continuing evolution of SARS-CoV-2”

Oscar A. MacLean*, Richard Orton, Joshua B. Singer, David L. Robertson.

MRC-University of Glasgow Centre for Virus Research (CVR).

*To whom correspondence should be addressed: [email protected].

As Jian Lu, on behalf of Tang et al., has chosen to post their response here, we see no reason to replicate our critique in the journal NSR. With respect to these authors, their response misses our key concerns with their manuscript:

As we pointed out in our initial response, many epidemiological factors can change the frequency of a mutation during a global viral outbreak without any involvement of natural selection. Together these processes drive changes in mutation frequency, and such changes are fully expected. It is thus meaningless to state, as the authors have, that “the ‘functional’ test is the frequency of each mutation in the population”.

The claim that a difference in S and L frequencies across time would be “significant by any measure” is the key issue here. For example, a simple Fisher’s exact test (which was used in Tang et al.’s original paper) gives the probability of observing a difference in frequencies across countries or time points at least as extreme as the one seen, if the underlying frequencies were identical. This test does not assess whether the mutation frequency changes observed over the time period are driven by selection or are consistent with a neutral null model of genetic drift. The question being addressed is not “is there a true difference in frequencies?” (stochastic processes will change mutation frequencies, and those changes will likely produce significant Fisher’s exact tests), but whether the change in frequency over time departs from that expected under a neutral null evolutionary model. These two questions are very different and have very different wider implications. We would very much like to see an analysis addressing the latter question, but would be immensely surprised if, given the current data, it had the power to detect any signature of selection.
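To make the distinction concrete, here is a minimal Python sketch; it is not the analysis used by either set of authors. It implements a two-sided Fisher’s exact test from scratch (summing the probabilities of every table with the observed margins that is no more likely than the observed table) and applies it to samples drawn from a neutral Wright–Fisher model, in which allele frequencies change by drift alone. The function names and all parameters (population size, number of generations, sample size, starting frequency) are hypothetical, chosen for illustration rather than fitted to SARS-CoV-2 data.

```python
import random
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c

    # Hypergeometric weights share the denominator comb(row1 + row2, col1),
    # so they can be compared exactly as integers.
    def weight(x: int) -> int:
        return comb(row1, x) * comb(row2, col1 - x)

    observed = weight(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    extreme = sum(w for w in (weight(x) for x in range(lo, hi + 1)) if w <= observed)
    return extreme / comb(row1 + row2, col1)


def binomial(rng: random.Random, n: int, p: float) -> int:
    """Successes in n Bernoulli(p) trials (simple rather than fast)."""
    return sum(rng.random() < p for _ in range(n))


def wright_fisher(rng: random.Random, pop_size: int, p0: float, generations: int) -> float:
    """Neutral drift only: each generation resamples the whole population."""
    count = round(p0 * pop_size)
    for _ in range(generations):
        count = binomial(rng, pop_size, count / pop_size)
    return count / pop_size


def fraction_significant(replicates=100, pop_size=500, generations=50,
                         sample_size=100, p0=0.5, alpha=0.05, seed=1) -> float:
    """Share of replicates where Fisher's test rejects equal frequencies
    between an early and a late sample, with no selection in the model."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(replicates):
        early = binomial(rng, sample_size, p0)  # sample at time 0
        p_late = wright_fisher(rng, pop_size, p0, generations)
        late = binomial(rng, sample_size, p_late)  # sample after drift
        p_value = fisher_exact_two_sided(early, sample_size - early,
                                         late, sample_size - late)
        hits += p_value < alpha
    return hits / replicates


print(f"share of significant tests under drift alone: {fraction_significant():.2f}")
```

Under these assumed settings the test rejects well above the nominal 5% rate even though no selection is modelled, which is the sense in which a significant Fisher’s exact test says nothing about selection. Setting `generations=0` keeps the underlying frequency fixed, and the rejection rate should then fall to roughly `alpha` or below.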

Any analysis of allele frequencies also needs to consider an important factor that we neglected to mention in our initial reply: biased sampling. Sequenced samples are not independent of one another. For example, as contact tracing is a significant driver of case detection, detected cases will be correlated, leading to oversampling of particular genotypes and mutations. Additionally, the sampling of infections for sequencing is strongly biased by the country in which they occur: 80% of COVID-19 cases to date (9/3/2020) come from China, but a far smaller proportion of sequenced genomes do (~40% as of 2/3/2020). This biased sampling will further amplify the epidemiological effects that drive variation in observed mutation frequencies.
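The effect of uneven sequencing effort on a pooled frequency is simple arithmetic, sketched below in Python. The per-group mutation frequencies are invented for illustration; the case and genome shares reuse the ~80% and ~40% figures above even though they refer to different dates, and the sketch ignores the within-country clustering from contact tracing, which would add further bias. The function name `pooled_frequency` is hypothetical.

```python
def pooled_frequency(weights: dict[str, float], freqs: dict[str, float]) -> float:
    """Frequency of a mutation pooled across groups, weighted by each group's share."""
    total = sum(weights.values())
    return sum(weights[g] * freqs[g] for g in weights) / total


# Hypothetical per-group frequencies of some mutation (illustrative only).
mutation_freq = {"China": 0.3, "elsewhere": 0.7}
case_share = {"China": 0.8, "elsewhere": 0.2}      # ~80% of cases (9/3/2020)
sequence_share = {"China": 0.4, "elsewhere": 0.6}  # ~40% of genomes (2/3/2020)

print(round(pooled_frequency(case_share, mutation_freq), 2))      # 0.38
print(round(pooled_frequency(sequence_share, mutation_freq), 2))  # 0.54
```

In this toy case the same mutation is carried by 38% of cases but 54% of sequenced genomes, purely because of where sequencing took place.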

Lu has also restated their claim that “there is only one nonsynonymous mutation that appears in > 50% of the samples and that is the L mutation. In contrast, there are 7 synonymous mutations” without any further qualification, completely ignoring our explanation in our first posting of why their methodology is flawed and why there are not seven such mutations.

Some additional points:

We did not make the claim “…that there is no purifying (i.e., negative) selection against non-synonymous mutations in the circulating viral population.” We simply highlighted a lack of power to detect it with the limited data currently available. The authors are over-interpreting our results here.

Whilst the paper by Charlesworth and Eyre-Walker (2008), which suggests removing variants below 5% frequency in count-based analyses of polymorphisms, is fairly well cited (158 citations as of 9/3/2020), we think describing this data-processing step as “the fundamental rule of population genetics” is unwarranted. In any case, if this frequency threshold were applied, only three nonsynonymous and two synonymous mutations would remain for analysis (using data from 2/3/2020). This further highlights the lack of power in the current SARS-CoV-2 polymorphism data to make inferences about the pattern of purifying selection.
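The thresholding step itself is straightforward; a minimal Python sketch follows. It assumes the frequency being filtered is that of the non-reference allele among sequenced genomes (minor-allele or derived-allele frequency would be alternative choices), and the `Variant` records are toy values, not the SARS-CoV-2 data referred to above. All names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    position: int       # genome coordinate (hypothetical)
    alt_count: int      # genomes carrying the non-reference allele
    nonsynonymous: bool


def count_common_variants(variants, n_genomes, min_freq=0.05):
    """Return (nonsynonymous, synonymous) counts for variants at >= min_freq."""
    kept = [v for v in variants if v.alt_count / n_genomes >= min_freq]
    n_nonsyn = sum(v.nonsynonymous for v in kept)
    return n_nonsyn, len(kept) - n_nonsyn


toy = [
    Variant(100, 1, True),
    Variant(200, 2, False),
    Variant(300, 30, True),
    Variant(400, 12, False),
    Variant(500, 3, True),
]
print(count_common_variants(toy, n_genomes=100))  # (1, 1)
```

With the threshold removed, all five toy variants (three nonsynonymous, two synonymous) would be counted, so most of the signal in such data sits in the low-frequency class that the filter discards.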

Lu provides a 3.5:1 ratio of nonsynonymous to synonymous sites with no explanation of the methodology used to generate it, or of why it differs from our 2.43:1 estimate. Counting the relative numbers of sites for a given gene or genome is a non-trivial process, and the choice of model will be a significant driver of the estimated ratio.
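As an illustration of how modelling choices move this ratio, the Python sketch below counts sites per codon in the spirit of the Nei–Gojobori approach: each codon position is split into synonymous and nonsynonymous fractions according to its possible single-base changes. It compares two conventions for changes that create a stop codon (count them as nonsynonymous, or exclude them) and weights all changes equally, ignoring transition/transversion bias, which is a further modelling choice. The function name and toy sequence are hypothetical, and the sketch is not intended to reproduce either the 3.5:1 or the 2.43:1 figure.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI table 1), codons ordered TTT, TTC, TTA, ...
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)}


def site_counts(seq: str, exclude_stop_changes: bool) -> tuple[float, float]:
    """Return (nonsynonymous, synonymous) site counts for an in-frame sequence.

    Each codon position contributes the fraction of its possible point
    changes that are synonymous (S) or nonsynonymous (N); all changes
    are weighted equally.
    """
    n_sites = s_sites = 0.0
    for i in range(0, len(seq) - len(seq) % 3, 3):
        codon = seq[i:i + 3]
        aa = CODE[codon]
        if aa == "*":
            continue  # skip stop codons in the input sequence
        for pos in range(3):
            syn = nonsyn = 0
            for base in BASES:
                if base == codon[pos]:
                    continue
                new_aa = CODE[codon[:pos] + base + codon[pos + 1:]]
                if new_aa == "*" and exclude_stop_changes:
                    continue  # one convention: ignore changes to a stop codon
                if new_aa == aa:
                    syn += 1
                else:
                    nonsyn += 1  # other convention: stop changes count here
            total = syn + nonsyn
            if total:
                s_sites += syn / total
                n_sites += nonsyn / total
    return n_sites, s_sites


seq = "AGACGA"  # toy sequence (Arg-Arg), not taken from SARS-CoV-2
for exclude in (False, True):
    n, s = site_counts(seq, exclude)
    print(f"exclude_stop_changes={exclude}: N/S = {n / s:.2f}")
# exclude_stop_changes=False: N/S = 2.00
# exclude_stop_changes=True: N/S = 1.57
```

The same six bases give noticeably different N:S ratios depending only on how stop-creating changes are handled, before any other model choice is considered.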

To reiterate our first posting, Tang et al. have not provided any evidence that there are two major types of SARS-CoV-2, and certainly no evidence that part of the outbreak has been “more aggressive”. We agree with Nathan Grubaugh’s post that such a claim is misleading and has led to the spread of misinformation in the press. Rather than focus on our response, the authors should urgently correct the confusion they are responsible for.

Reference
Charlesworth, Jane, and Adam Eyre-Walker. “The McDonald–Kreitman test and slightly deleterious mutations.” Molecular Biology and Evolution 25.6 (2008): 1007–1015.