Selection analysis of GISAID SARS-CoV-2 data

We are performing daily screens of individual genes in the SARS-CoV-2 genomes available from GISAID to keep track of variants that may be positively or negatively selected. This analysis covers only the viruses circulating in the current human pandemic; it does not consider selection during zoonosis or in other species, which plenty of other studies have already addressed.

Given the rate of sequencing, this is a unique opportunity to see whether any of these analyses are useful for identifying potentially interesting sites as the data come in; traditionally they are used to examine evolution retrospectively.

At the moment it appears that (as expected) there is not a whole lot going on, with many “high diversity” sites like S D614 not showing much signal for positive selection. There are a few potentially interesting sites where frequencies are increasing over time and the sequences carrying the variants do not fall into a single clade of the tree, or where something else is going on (nsp2 T85I, nsp13 P504L, spike S943P/T). See http://covid19.datamonkey.org and https://observablehq.com/@spond/natural-selection-analysis-of-sars-cov-2-covid-19 for details.
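To make the two criteria above concrete (variant frequency rising over time, and variant sequences spread over more than one clade), here is a minimal Python sketch. It is not the selection analysis itself and not the code behind the sites linked above; the function name `variant_summary`, the record layout, the weekly binning, and the toy records are all assumptions made up for illustration.

```python
from collections import defaultdict
from datetime import date


def variant_summary(records, reference_residue):
    """Summarise one site from (collection_date, clade_label, residue) records.

    Hypothetical helper for illustration only. Returns the sorted ISO weeks,
    the fraction of sequences per week that differ from the reference residue,
    and the set of clade labels in which a variant was seen.
    """
    totals = defaultdict(int)
    variants = defaultdict(int)
    clades_with_variant = set()
    for collected, clade, residue in records:
        week = tuple(collected.isocalendar()[:2])  # (ISO year, ISO week)
        totals[week] += 1
        if residue != reference_residue:
            variants[week] += 1
            clades_with_variant.add(clade)
    weeks = sorted(totals)
    freqs = [variants[w] / totals[w] for w in weeks]
    return weeks, freqs, clades_with_variant


# Made-up toy records, not GISAID data: a site with reference residue "A".
toy_records = [
    (date(2020, 3, 2), "c1", "A"),
    (date(2020, 3, 3), "c1", "A"),
    (date(2020, 3, 4), "c2", "V"),
    (date(2020, 3, 5), "c1", "A"),
    (date(2020, 3, 9), "c1", "V"),
    (date(2020, 3, 10), "c2", "V"),
    (date(2020, 3, 11), "c1", "A"),
    (date(2020, 3, 12), "c2", "A"),
]

weeks, freqs, clades = variant_summary(toy_records, "A")

# Crude flag (an assumption, not a statistical test): the frequency is higher
# in the last week than in the first, and variants occur in multiple clades.
flagged = freqs[-1] > freqs[0] and len(clades) > 1
print(weeks, freqs, sorted(clades), flagged)
```

A flag like this only says a site is worth a closer look; sampling bias across time and geography can produce the same pattern without any selection.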

We are also working with @anekrut on a parallel, comprehensive analysis of intra-host variation from NGS data, and on linking the two levels of analysis (some of it is already there).