Ready-to-use global platform for variant detection in SARS-CoV-2

anekrut · April 2, 2021, 6:45pm

Wolfgang Maier¹, Simon Bray¹, Marius van den Beek², Dave Bouvier², Nathaniel Coraor², Milad Miladi¹, Babita Singh³, Jordi Rambla De Argila³, Dannon Baker⁴, Nathan Roach⁵, Simon Gladman⁶, Frederik Coppens⁷˒⁸, Darren P Martin⁹, Andrew Lonie⁶, Björn Grüning¹, Sergei L. Kosakovsky Pond¹⁰, and Anton Nekrutenko²

¹ University of Freiburg / usegalaxy.eu, Freiburg, Germany
² The Pennsylvania State University / usegalaxy.org, University Park, PA, USA
³ GalaxyWorks Inc, Baltimore, MD, USA
⁴ Centre for Genomic Regulation, Viral Beacon Project, Barcelona, Spain
⁵ Johns Hopkins University / usegalaxy.org, Baltimore, MD, USA
⁶ University of Melbourne / usegalaxy.org.au, Melbourne, Australia
⁷ Ghent University / usegalaxy.be, Ghent, Belgium
⁸ VIB Center for Plant Systems Biology, Ghent, Belgium
⁹ University of Cape Town, Cape Town, South Africa
¹⁰ Temple University / datamonkey.org, Philadelphia, PA, USA

Why?

Global inequality in the COVID-19 response is not limited to vaccine development and distribution. Virtually no one talks about another type of inequality: inequality at the research level, and specifically at the level of data analysis. We continuously hear about the UK, South African, and Brazilian “variants of concern”. This is not because there is something peculiar about these countries; rather, it reflects the monitoring infrastructure they possess. Surely, there are other “variants of concern”. We simply do not know about them because the countries where they originate lack the means for reliable monitoring and early detection.

Furthermore, even in developed countries, different research groups analyze sequence data in different ways, and some approaches are more appropriate than others. This multitude of analytical approaches makes it hard to integrate newly acquired knowledge and to compare results across studies.

Our work is a direct response to these challenges. We assembled a global data analysis consortium that leverages cutting-edge public computational infrastructure in the US, Europe, and Australia to deliver a platform that any researcher from any country can use now. Our workflows can analyze existing or new data, no matter how large or complex the datasets are. We also emphasize that access to raw data is essential, and we demonstrate how sequencing read data can uncover potentially important viral heterogeneity within samples.
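To make the point about raw data concrete: a consensus sequence keeps only one base per genome position, so an allele carried by a minority of the reads in a sample disappears from it, while the per-position read counts still show it. The following is a toy Python illustration of that idea under stated assumptions, not the variant-calling method used in our workflows; the thresholds `MIN_AF` and `CONSENSUS_AF`, the helper functions, and the 70/30 pileup are all hypothetical.

```python
from collections import Counter

# Hypothetical thresholds chosen for illustration only; they are not the
# settings used by the Galaxy workflows.
MIN_AF = 0.05       # alleles below 5% of reads are treated as likely noise
CONSENSUS_AF = 0.5  # simple majority rule for calling a consensus base


def allele_frequencies(bases):
    """Return {base: fraction of reads} for the bases observed at one site."""
    counts = Counter(bases)
    depth = sum(counts.values())
    return {base: n / depth for base, n in counts.items()}


def consensus_base(freqs):
    """Most frequent base if it exceeds CONSENSUS_AF, otherwise 'N'."""
    base, af = max(freqs.items(), key=lambda item: item[1])
    return base if af > CONSENSUS_AF else "N"


def minority_alleles(freqs):
    """Alleles present in the reads (>= MIN_AF) but absent from the consensus."""
    cons = consensus_base(freqs)
    return {b: af for b, af in freqs.items() if b != cons and af >= MIN_AF}


# Hypothetical pileup at one position: 70 reads carry A, 30 carry G.
site = "A" * 70 + "G" * 30
freqs = allele_frequencies(site)
print(consensus_base(freqs))    # A
print(minority_alleles(freqs))  # {'G': 0.3}
```

The consensus reports only `A`, whereas the read-level frequencies reveal that 30% of the reads in this hypothetical sample carry `G`; that within-sample signal is lost if only consensus sequences are shared.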

What do we want from Virological?

We need your help testing and fine-tuning the workflows, documentation, and tutorials! Our system is hosted on three globally distributed public Galaxy instances: in the US (http://usegalaxy.org), the EU (http://usegalaxy.eu), and Australia (http://usegalaxy.org.au). Each can support thousands of users running hundreds of thousands of analyses per month. Anyone can create an account and get immediate access to as much computation as they might reasonably need (with a limit on the number of concurrent analyses) and 250 GB of disk space, which can be increased to meet an individual user's needs.

We currently have five workflows for ampliconic data generated with Illumina and Oxford Nanopore (ONT), as well as for RNA-seq (WGS) data:

How do I use it?

We created a documentation page: “Global platform | COVID-19 analysis on usegalaxy.★”.

Where do I get help?

You can either email us ([email protected]) or use our help system at https://help.galaxyproject.org