A resource for comparing SARS-CoV-2 to other coronaviruses: Coordinate mapping and residue annotation

Delphine Lariviere¹, Sergei Kosakovsky Pond², Anton Nekrutenko¹ + HyPhy and Galaxy Teams

¹Penn State / ²Temple / covid19.galaxyproject.org | covid19.datamonkey.org


The gist

Here we describe an online, curated resource that serves two functions:

  1. It provides a table for converting amino-acid coordinates between SARS-CoV-2 and a set of well-studied coronaviruses;
  2. It attempts to annotate amino-acid residues of SARS-CoV-2 using the available coronavirus literature.

Why compare coronaviruses

While SARS-CoV-2 is new, coronaviruses have been studied for decades. A good deal is known about the structure and function of individual genes, and of the proteins they encode, in different viruses (e.g., [1, 2, 3, 4, 5, 6]), and comparing homologous residues is the "go-to" technique for annotation. For example, what might be the significance of a mutation or polymorphism at amino acid 921 of the SARS-CoV-2 Spike protein? A straightforward but tedious alignment procedure shows that the corresponding residue in SARS-CoV-1 is 903. A literature review reveals that this residue in SARS-CoV-1 is involved in a salt bridge and is highly conserved in betacoronaviruses. These annotations suggest that amino-acid changes at position 921 could have a large impact on Spike protein function (see [7]). Yet correspondences between homologous residues in different viral genomes are usually worked out on a per-study basis, and the procedure is rarely described in enough detail to be reused.


We aligned each gene of the SARS-CoV-2 genome with its homologs and used these alignments to build coordinate conversion tables between coronavirus species. The tables give the coordinates of corresponding residues across viral taxa, so annotations can be transferred directly from one species to another. We also reviewed a subset of the available literature describing functional regions of the genome and produced per-residue annotation tables that link each SARS-CoV-2 residue to the functional annotations reported for its corresponding residue in other coronavirus species.
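The step from alignment to conversion table can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the pipeline behind the resource: it assumes one gene's protein sequences are already aligned with gaps written as `-`, and the function name `conversion_table` and the toy sequences are hypothetical.

```python
from typing import Dict, List, Optional

Row = Dict[str, Optional[int]]  # species -> 1-based residue position, None for a gap


def conversion_table(msa: Dict[str, str]) -> List[Row]:
    """Turn one gene's protein alignment into coordinate-conversion rows.

    `msa` maps species name -> aligned sequence, with gaps written as '-'
    (an assumption about the input format). Each alignment column becomes
    one row giving, for every species, the 1-based position of the residue
    in that column, or None if that species has a gap there.
    """
    lengths = {len(seq) for seq in msa.values()}
    if len(lengths) != 1:
        raise ValueError("aligned sequences must all have the same length")
    (n_columns,) = lengths
    seen = {name: 0 for name in msa}  # residues counted so far, per species
    rows: List[Row] = []
    for col in range(n_columns):
        row: Row = {}
        for name, seq in msa.items():
            if seq[col] == "-":
                row[name] = None
            else:
                seen[name] += 1
                row[name] = seen[name]
        rows.append(row)
    return rows


# Toy, hypothetical alignment (not real coronavirus sequence).
TOY_MSA = {"SARS-CoV-2": "MKV-LSA", "SARS-CoV-1": "M-VTLSA"}
table = conversion_table(TOY_MSA)

# Print the table as tab-separated text, '-' marking a gap.
print("\t".join(TOY_MSA))
for row in table:
    print("\t".join("-" if pos is None else str(pos) for pos in row.values()))
```

Reading across a row gives the equivalent residue in each species: in this toy alignment, SARS-CoV-2 residue 3 corresponds to SARS-CoV-1 residue 2, the same kind of lookup as 921 → 903 above. Columns where one species has a gap are kept as rows, so residues inserted in only one lineage are not silently dropped.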

The tables we developed can be accessed through online ObservableHQ notebooks linked below. To find the positions equivalent to a residue or region of interest:

  • Select the gene
  • Select the species whose coordinates you are starting from
  • Select the region of interest
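The three selections above amount to a range query on a per-gene table. The Python sketch below assumes, hypothetically, that each gene's table is a list of rows in alignment-column order (species → position, `None` for a gap); the names `TABLES` and `equivalent_region` and the toy rows are illustrative, not the notebooks' actual code.

```python
from typing import Dict, List, Optional

Row = Dict[str, Optional[int]]  # species -> 1-based residue position, None for a gap

# Hypothetical per-gene tables; rows are kept in alignment-column order.
TABLES: Dict[str, List[Row]] = {
    "S": [
        {"SARS-CoV-2": 1, "SARS-CoV-1": 1},
        {"SARS-CoV-2": 2, "SARS-CoV-1": None},
        {"SARS-CoV-2": 3, "SARS-CoV-1": 2},
        {"SARS-CoV-2": None, "SARS-CoV-1": 3},
        {"SARS-CoV-2": 4, "SARS-CoV-1": 4},
    ],
}


def equivalent_region(gene: str, species: str, start: int, end: int) -> Dict[str, List[int]]:
    """Return the positions in every other species that correspond to the
    1-based, inclusive region [start, end] given in `species` coordinates."""
    rows = TABLES[gene]  # step 1: select the gene
    # Steps 2-3: find the alignment rows covered by the region in `species`.
    hits = [i for i, row in enumerate(rows)
            if row.get(species) is not None and start <= row[species] <= end]
    if not hits:
        return {}
    result: Dict[str, List[int]] = {}
    # Scan the whole column span so residues inserted in other species are kept.
    for row in rows[hits[0]: hits[-1] + 1]:
        for other, pos in row.items():
            if other != species and pos is not None:
                result.setdefault(other, []).append(pos)
    return result


print(equivalent_region("S", "SARS-CoV-2", 2, 4))
```

Scanning the full span between the first and last matching rows, rather than only the matching rows, means that a residue present only in the other species (here SARS-CoV-1 residue 3) is still reported as part of the equivalent region.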

The tables are here:


Please help curate and maintain!

Since the beginning of this project, we have been gathering functional information from the literature for each residue of each SARS-CoV-2 gene. The literature covers multiple coronavirus species, and we used the coordinate tables above to transfer annotations between species. Where the SARS-CoV-2 residue differs from the residue that was originally annotated, the table says so, and every annotation is linked to the article it comes from.
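Annotation transfer with a residue-difference flag could look roughly like the Python sketch below. It is an illustration only: the row format (species → `(position, amino acid)` or `None`), the `Annotation` record and the `transfer` function are hypothetical, and the toy data are invented placeholders rather than real annotations.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

Residue = Optional[Tuple[int, str]]  # (1-based position, amino acid), None for a gap
Row = Dict[str, Residue]


@dataclass
class Annotation:
    species: str   # species in which the residue was annotated
    position: int  # 1-based position in that species
    text: str      # functional annotation
    source: str    # article the annotation comes from


def transfer(rows: List[Row], annotations: List[Annotation],
             target: str = "SARS-CoV-2") -> Dict[int, List[dict]]:
    """Attach each annotation to the corresponding residue of `target`."""
    # Index every (species, position) so an annotation can find its alignment row.
    index: Dict[Tuple[str, int], Row] = {}
    for row in rows:
        for species, res in row.items():
            if res is not None:
                index[(species, res[0])] = row

    out: Dict[int, List[dict]] = {}
    for ann in annotations:
        row = index.get((ann.species, ann.position))
        if row is None or row.get(target) is None:
            continue  # unknown position, or target has a gap in this column
        src_aa = row[ann.species][1]
        tgt_pos, tgt_aa = row[target]
        out.setdefault(tgt_pos, []).append({
            "text": ann.text,
            "source": ann.source,  # keep the link to the article of origin
            "from": f"{ann.species} {src_aa}{ann.position}",
            "residue_differs": src_aa != tgt_aa,  # flag non-identical residues
        })
    return out


# Hypothetical toy data.
ROWS: List[Row] = [
    {"SARS-CoV-2": (1, "M"), "SARS-CoV-1": (1, "M")},
    {"SARS-CoV-2": (2, "K"), "SARS-CoV-1": (2, "R")},
    {"SARS-CoV-2": None, "SARS-CoV-1": (3, "T")},
]
ANNOTATIONS = [
    Annotation("SARS-CoV-1", 2, "example annotation A", "Article A"),
    Annotation("SARS-CoV-1", 3, "example annotation B", "Article B"),
]
result = transfer(ROWS, ANNOTATIONS)
print(result)
```

In this sketch the first annotation lands on SARS-CoV-2 residue 2 with `residue_differs` set, because the annotated SARS-CoV-1 residue is R while SARS-CoV-2 has K; the second is skipped because SARS-CoV-2 has a gap in that column.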

Given the large volume of available literature, the annotations are still incomplete, but we are adding to them every day. Please contribute to the annotation effort by adding annotations to the tables on GitHub!


References

  1. Yang D, Leibowitz JL. The structure and functions of coronavirus genomic 3’ and 5’ ends. Virus Res. 2015 Aug 3;206:120–133. PMCID: PMC4476908
  2. Venkataraman S, Prasad BVLS, Selvarajan R. RNA Dependent RNA Polymerases: Insights from Structure, Function and Evolution. Viruses. 2018 Feb 10;10(2). PMCID: PMC5850383
  3. Narayanan K, Huang C, Makino S. SARS coronavirus accessory proteins. Virus Res. 2008 Apr;133(1):113–121. PMCID: PMC2720074
  4. Posthuma CC, Te Velthuis AJW, Snijder EJ. Nidovirus RNA polymerases: Complex enzymes handling exceptional RNA genomes. Virus Res. 2017 Apr 15;234:58–73. PMCID: PMC7114556
  5. Denison MR, Zoltick PW, Hughes SA, Giangreco B, Olson AL, Perlman S, Leibowitz JL, Weiss SR. Intracellular processing of the N-terminal ORF 1a proteins of the coronavirus MHV-A59 requires multiple proteolytic events. Virology. 1992 Jul;189(1):274–284. PMCID: PMC7130892
  6. Shang J, Ye G, Shi K, Wan Y, Luo C, Aihara H, Geng Q, Auerbach A, Li F. Structural basis of receptor recognition by SARS-CoV-2. Nature. 2020 May;581(7807):221–224. PMCID: PMC7328981
  7. Galaxy and HyPhy Development Teams, Nekrutenko A, Kosakovsky Pond SL. No more business as usual: agile and effective responses to emerging pathogen threats require open data and open analytics. Preprint. 2020. Available from: http://dx.doi.org/10.1101/2020.02.21.959973