Seasonal CoV & SARS CoV Live Full-Genome Builds

28 Jan 2020
Emma Hodcroft, Biozentrum, University of Basel, Switzerland
emma.hodcroft(at)unibas.ch

To provide context and background for the 2019 novel coronavirus (nCoV-2019), we have prepared a Nextstrain/augur/auspice pipeline that automates full-genome, human-focused builds of Betacoronavirus 1, Human coronavirus 229E, and SARS-CoV, using data from ViPR.

Analyses

Betacoronavirus 1

The phylogenetic analysis of the Betacoronavirus 1 sequences is filtered to human and chimpanzee infections, and thus contains only Human coronavirus OC43 sequences.

HCoV-OC43 is distributed worldwide, considered a ‘seasonal’ coronavirus, and is one of the viruses responsible for the common cold.

Sequences date from 1997 to 2019 and cover 9 countries, and the estimated mutation rate is 2-3 x 10^-4 subs/site/yr.
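For a rough sense of scale, the per-site rate can be converted into expected substitutions per genome per year. This is only a back-of-the-envelope sketch: the genome length of ~30,000 bp is an assumption taken from the tanglegram split described in this section (first ~20,000 bp plus last ~10,000 bp), and the function name is made up for illustration.

```python
# Assumption: genome length of ~30,000 bp, i.e. the ~20,000 bp + ~10,000 bp
# halves used for the tanglegram. Not a measured value from the build.
GENOME_LENGTH_BP = 30_000


def subs_per_genome_per_year(rate_per_site_per_year, genome_length_bp=GENOME_LENGTH_BP):
    """Convert a per-site substitution rate into substitutions per genome per year."""
    return rate_per_site_per_year * genome_length_bp


# Lower and upper ends of the estimated range (2-3 x 10^-4 subs/site/yr).
low = subs_per_genome_per_year(2e-4)
high = subs_per_genome_per_year(3e-4)
print(f"roughly {low:.0f}-{high:.0f} substitutions per genome per year")
```

Under that assumed length, the estimate works out to somewhere around 6 to 9 substitutions per genome per year.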

The build can be viewed live here:

A fairly rough tanglegram comparing the first ~20,000bp to the last ~10,000bp can be viewed here:

229E

Human coronavirus 229E can be found in animals, including camels and bats, but the phylogenetic analysis here is filtered to only human samples.

Like HCoV-OC43, HCoV-229E is distributed worldwide and responsible for the common cold.

Sequences date from 1993 to 2019 and cover 5 countries, with an estimated mutation rate of 2-3 x 10^-4 subs/site/yr.

The build can be viewed live here:

A fairly rough tanglegram comparing the first ~20,000bp to the last ~10,000bp can be viewed here:

SARS

The Severe acute respiratory syndrome-related coronavirus phylogenetic analysis was filtered to exclude samples from bats, as these are more divergent*. The majority of the remaining samples are from palm civets and humans.
*nCoV-2019 sits among these bat samples.

SARS-CoV was responsible for an outbreak of severe respiratory illness in Asia in 2002-2003 (with secondary cases worldwide). There have been no cases of SARS in humans since 2004.

Sequences date from 2002 to 2004 and cover 6 countries, with a mutation rate estimated at ~3.6 x 10^-4 subs/site/yr.

The build can be viewed live here:

About the builds

All code and data, instructions for running the builds, the assumptions made for each run, and some details on filtering and exclusions can be found at github.com/nextstrain/cov.

The builds are filtered to be human-focused and can be easily updated if/when new sequences become available on ViPR: the pipeline automatically detects which sequences are new since the last run, and only these are downloaded from GenBank and aligned.
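As a minimal sketch of the "only download what is new" idea, assuming each sequence is identified by its accession: compare the accessions currently listed against those fetched in earlier runs and keep the difference. The function name and accession strings below are hypothetical placeholders, not taken from the actual pipeline in the repository.

```python
def find_new_accessions(available, already_downloaded):
    """Return accessions in `available` that were not fetched in a previous run.

    Order of `available` is preserved, and repeats within it are dropped.
    """
    seen = set(already_downloaded)
    new = []
    for accession in available:
        if accession not in seen:
            new.append(accession)
            seen.add(accession)  # avoid returning the same accession twice
    return new


if __name__ == "__main__":
    # Placeholder accessions for illustration only.
    previous_run = ["ACC0001", "ACC0002"]
    current_listing = ["ACC0001", "ACC0002", "ACC0003", "ACC0004"]
    print(find_new_accessions(current_listing, previous_run))
```

Only the accessions returned by such a comparison would then need to be downloaded and aligned, which keeps repeat runs cheap.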


Hi Emma,
There is some odd stuff going on with your SARS tree. The first wave of the epidemic ran from March to June 2003 (it probably started well before that, but there are no sequences). There was a lab escape in Singapore in October 2003 involving some secondary cases. Then there were some more cases in 2004 associated with civets (which is that red clade at the bottom). So I think there are some wildly-off dates. I managed to find 16 genomes for which I could reliably find a date (mostly late March, early April and on in May).

Hi Andrew, Yes, there are plenty where I know the year is 2003 but, for whatever reason, they are more divergent (they hit the barrier at 31-Dec-03 and line up on the tree). There are also plenty with almost no information. For the moment I have left a lot in, but it needs more work, and I agree there must be many that are wrong.

I’d love to compare notes on dates with you and see where we match and where we don’t. I managed to find one today sampled 11 Feb 03 and another with ‘symptom onset’ 7 Feb 03. (However, the first was excluded from another paper for having too many irregularities…) I’ll be in Edinburgh next week from Tues, or happy to chat on Slack or by email.