Genomic coordinates for B.1.1.7, P.1, and B.1.351

For those of you who are driven mad by local (gene-based) coordinate systems, below are Python dicts for the mutation sets listed on the PANGO lineages website.

Python dicts

The values are 0-based genomic coordinates of the first nucleotide of the corresponding codon (deletions excepted, of course).
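To make the convention concrete, here is a minimal sketch of how a gene-based amino acid position maps to such a coordinate. The function name and the `CDS_START_1BASED` table are illustrative rather than part of the original dicts, and the CDS start positions are assumed from the NC_045512.2 annotation.

```python
# 1-based start of each coding sequence.
# Assumption: values taken from the NC_045512.2 annotation.
CDS_START_1BASED = {
    "ORF1ab": 266,
    "S": 21563,
    "ORF3a": 25393,
    "E": 26245,
    "ORF8": 27894,
    "N": 28274,
}


def codon_start_0based(gene, aa_position):
    """Return the 0-based genomic coordinate of the first nucleotide
    of the codon encoding residue `aa_position` (1-based) of `gene`.

    For ORF1ab this simple arithmetic only holds upstream of the
    ribosomal frameshift site.
    """
    return CDS_START_1BASED[gene] - 1 + (aa_position - 1) * 3


# N501Y in the spike gene
print(codon_start_0based("S", 501))  # 23062
```

The same arithmetic reproduces, for example, `'T1001I':3265` (ORF1ab) and `'S235F':28975` (N) in the B.1.1.7 dict.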

B.1.1.7

B117 = {
'T1001I':3265,
'A1708D':5386,
'I2230T':6952,
'del9':11288,
'del6':21765,
'del3':21991,
'N501Y':23062,
'A570D':23269,
'P681H':23602,
'T716I':23707,
'S982A':24505,
'D1118H':24913,
'Q27stop':27971,
'Y73C':28109,
'D3L':28279,
'S235F':28975
}

P.1

P1 = {
'S1188L':3826,
'K1795Q':5647,
'del':11288,
'L18F':21613,
'T20N':21619,
'P26S':21637,
'D138Y':21973,
'R190S':22129,
'K417T':22810,
'E484K':23011,
'N501Y':23062,
'H655Y':23524,
'T1027I':24640,
'G174C':25911,
'E92K':28166,
'P80R':28510
}

B.1.351

B1351 = {
'P71L':26454,
'T205I':28885,
'K1655N':5227,
'D80A':21799,
'D215G':22204,
'K417N':22810,
'A701V':23662,
'N501Y':23062,
'E484K':23011
}

Notebook

This Colab Notebook validates these coordinates against the SARS-CoV-2 genome.
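The notebook itself is not reproduced here. As a rough sketch of what such a validation could look like, the hypothetical helper below translates the codon at each coordinate and compares it with the reference amino acid, taken as the first letter of the mutation name. It assumes the genome is available as a plain string, and it skips deletion entries. The names `CODON_TABLE` and `check_reference_residues` are illustrative only.

```python
from itertools import product

# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}


def check_reference_residues(genome, coords):
    """Return {mutation: (expected_aa, observed_aa)} for every entry
    whose reference codon does not translate to the expected residue.

    `genome` is the reference sequence as a string; `coords` maps
    mutation names such as 'N501Y' to 0-based codon start positions.
    """
    mismatches = {}
    for name, pos in coords.items():
        if name.startswith("del"):
            continue  # deletions do not follow the codon convention
        expected = name[0]  # reference amino acid, e.g. 'N' in 'N501Y'
        codon = genome[pos:pos + 3].upper()
        observed = CODON_TABLE.get(codon, "?")
        if observed != expected:
            mismatches[name] = (expected, observed)
    return mismatches


# Toy example on a made-up 9-nucleotide sequence (M, E, N):
toy_genome = "ATGGAAAAT"
print(check_reference_residues(toy_genome, {"E2K": 3, "N3Y": 6}))  # {}
```

Run against the real reference sequence, an empty result would mean every listed coordinate points at a codon encoding the expected reference residue. A wrong coordinate would most likely show up as a mismatch.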


Two important corrections

1. E484K in B.1.351 should be 23011

Erik Alm from ECDC has pointed out an incorrect coordinate for E484K. The correct coordinate is 23011.

2. P71L and T205I in B.1.351 were flipped

The correct coordinates are:

  • P71L:26454
  • T205I:28885

GitHub Gist with correct dicts …

… is here → VOC coordinates in NC_045512.2 · GitHub

The same error affects E484K in P.1: its coordinate should also be 23011.