Transparent analysis of raw COVID-19 data: lack and low quality of raw data

The Galaxy Project and HyPhy/Datamonkey teams are partnering to make analyses of COVID-19 data globally available.

Most of the analyses we describe begin with raw data (sequencing reads). The bottom line so far is:

  1. There are very few raw datasets
  2. Some raw datasets contain no COVID-19 data at all
  3. Lack of raw data prevents assessing the extent of viral heterogeneity, for example
    an A-to-C substitution (minor allele frequency, MAF, of 38%) at position 24,323 (resulting in Lys921Gln in protein S) in sample “wuhan2”
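
The arithmetic behind the example in item 3 can be sketched as follows. This is an illustration only, not the pipeline used for the analyses. It assumes the S gene coding sequence starts at position 21,563 (1-based) in the NC_045512.2 reference, a coordinate that is not stated in the text above, and the read counts are hypothetical values chosen to give a 38% minor allele. The function names `codon_number` and `minor_allele_frequency` are made up for this sketch.

```python
# Assumption: 1-based start of the S gene CDS in the NC_045512.2 reference.
# This coordinate does not come from the text above.
S_GENE_START = 21563


def codon_number(genome_pos, cds_start=S_GENE_START):
    """Return (1-based codon number, 1-based position within that codon)."""
    offset = genome_pos - cds_start
    return offset // 3 + 1, offset % 3 + 1


def minor_allele_frequency(base_counts):
    """Fraction of reads carrying the second most common base at a site."""
    total = sum(base_counts.values())
    if total == 0:
        raise ValueError("no reads cover this site")
    ranked = sorted(base_counts.values(), reverse=True)
    minor = ranked[1] if len(ranked) > 1 else 0
    return minor / total


# Genomic position 24,323 maps to the first base of codon 921 of S.
print(codon_number(24323))  # (921, 1)

# Hypothetical read counts at that site, for illustration only.
counts = {"A": 62, "C": 38, "G": 0, "T": 0}
print(minor_allele_frequency(counts))  # 0.38
```

Under that assumed coordinate, the substitution falls on the first base of codon 921. Lysine is encoded by AAA or AAG, and changing the first A to C gives CAA or CAG, both of which encode glutamine, which is consistent with Lys921Gln. Estimating a frequency like this requires per-site read counts, which is why assembled genomes alone are not enough.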

New high-quality raw reads from the University of Wisconsin: this is the first truly high-quality set of raw reads.
