Transparent analysis of raw COVID-19 data: lack and low quality of raw data

anekrut · February 25, 2020, 6:09pm

The Galaxy Project and HyPhy/Datamonkey teams are partnering in making analyses of COVID-19 data globally available:

Most of the analyses we describe begin with raw data (sequencing reads). The bottom line so far is:

There are very few raw datasets
Some raw datasets contain no COVID-19 data at all
Lack of raw data prevents assessing the extent of viral heterogeneity, for example an A-to-C substitution (MAF 38%) at position 24,323 (resulting in Lys921Gln in protein S) in sample “wuhan2”
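
As a sanity check on the coordinates in the last point, here is a minimal Python sketch that maps a genomic position to a codon in protein S and checks that an A-to-C change at that codon position can turn Lys into Gln. It assumes the positions are relative to the NC_045512.2 reference, where the S coding sequence starts at position 21,563 (1-based); the post does not say which reference was used, so treat that constant as an assumption. The function names and the partial codon table are illustrative only and are not part of the Galaxy or HyPhy workflows.

```python
# Assumption: coordinates follow NC_045512.2, where the S coding
# sequence starts at genomic position 21563 (1-based).
S_CDS_START = 21563

# Partial standard codon table: only the two amino acids discussed here.
CODON_TO_AA = {"AAA": "Lys", "AAG": "Lys", "CAA": "Gln", "CAG": "Gln"}


def locate_codon(genome_pos, cds_start):
    """Return (codon number, position within codon), both 1-based."""
    offset = genome_pos - cds_start
    if offset < 0:
        raise ValueError("position lies upstream of the coding sequence")
    return offset // 3 + 1, offset % 3 + 1


def consistent_codons(ref_base, alt_base, pos_in_codon, ref_aa, alt_aa):
    """List (ref codon, alt codon) pairs that match the reported change."""
    i = pos_in_codon - 1
    pairs = []
    for codon, aa in CODON_TO_AA.items():
        if aa != ref_aa or codon[i] != ref_base:
            continue
        mutated = codon[:i] + alt_base + codon[i + 1:]
        if CODON_TO_AA.get(mutated) == alt_aa:
            pairs.append((codon, mutated))
    return pairs


codon_number, pos_in_codon = locate_codon(24323, S_CDS_START)
print(codon_number, pos_in_codon)  # 921 1
print(consistent_codons("A", "C", pos_in_codon, "Lys", "Gln"))
```

Under that assumption, position 24,323 is the first base of codon 921, which agrees with the reported Lys921Gln: an A-to-C change at the first codon position turns either Lys codon (AAA, AAG) into a Gln codon (CAA, CAG).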

anekrut · February 27, 2020, 8:16pm

New high-quality raw reads from the University of Wisconsin. This is the first truly high-quality set of raw reads.

david_h_oconnor · February 27, 2020, 10:25pm

I had to figure out how to use the emoji