Pango Lineage Nomenclature: provisional rules for naming recombinant lineages

The rules outlined below constitute a provisional naming convention for identifiable SARS-CoV-2 recombinant lineages within the Pango dynamic nomenclature system (https://doi.org/10.1038/s41564-020-0770-5). These rules are being considered for ratification by the Pango nomenclature committee.

NOTE: New lineage names are designated by the Pango committee and not by individual researchers or groups. If you would like to make a lineage suggestion, please read the nomenclature documentation and submit a request here.

General naming conventions:

• Pango lineage names comprise an alphabetical prefix and a numerical suffix. The alphabetical prefix contains only Latin characters and is case-insensitive.

• The letters I, O and X are not used in the prefixes of standard lineage names.

• Each dot in the numerical suffix means “descendant of” and is applied when a single ancestor can be clearly identified. So lineage B.1.1.7 is the seventh named descendant of lineage B.1.1, and C.1 is the first named descendant of lineage C.

• The suffix can contain a maximum of 3 hierarchical levels, referred to as the primary, secondary and tertiary suffixes.

• To avoid four or more suffix levels, a new alphabetical prefix is introduced that acts as an alias. For example, C is an alias of B.1.1.1, hence the first descendant of B.1.1.1 is called C.1 (rather than B.1.1.1.1). Consequently, the name C by itself is never applied directly to a sequence.

• In some instances, it is not possible to unambiguously identify an ancestral lineage within the Pango nomenclature for a given lineage of interest. This is the case for lineages A and B, because of their position near the root of the phylogeny. For these “special case ancestors”, the alphabetical part alone can be applied directly to sequences. In all other cases the suffix is mandatory.
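The conventions above are mechanical enough to sketch in code. The Python below is a minimal illustration under stated assumptions, not the committee's own tooling: `Lineage`, `parse`, `expand`, `parent`, `child` and the one-entry `ALIASES` table are hypothetical names, and only the C → B.1.1.1 alias comes from the text. It parses a standard (non-recombinant) name, enforces the letter and three-level rules, treats A and B as the only names usable without a suffix, and shows how an alias replaces a fourth suffix level.

```python
from dataclasses import dataclass

# Letters kept out of standard prefixes (X is reserved for recombinants).
RESERVED = set("IOX")
# Special-case ancestors that may be used without a numerical suffix.
SPECIAL_CASE_ANCESTORS = {"A", "B"}
# Hypothetical alias table: only the C -> B.1.1.1 entry comes from the text.
ALIASES = {"C": "B.1.1.1"}


@dataclass(frozen=True)
class Lineage:
    prefix: str              # alphabetical part, e.g. "B" or "C"
    suffix: tuple[int, ...]  # numerical levels, e.g. (1, 1, 7)

    def __str__(self) -> str:
        return ".".join([self.prefix, *map(str, self.suffix)])


def parse(name: str) -> Lineage:
    """Split a standard (non-recombinant) name into prefix and suffix levels."""
    prefix, *levels = name.upper().split(".")  # the prefix is case-insensitive
    if not (prefix.isascii() and prefix.isalpha()):
        raise ValueError(f"prefix must be Latin letters only: {name!r}")
    if RESERVED & set(prefix):
        raise ValueError(f"I, O and X are not used in standard prefixes: {name!r}")
    if not all(level.isdecimal() for level in levels):
        raise ValueError(f"suffix levels must be numbers: {name!r}")
    if len(levels) > 3:
        raise ValueError(f"more than 3 suffix levels; an alias is needed: {name!r}")
    if not levels and prefix not in SPECIAL_CASE_ANCESTORS:
        raise ValueError(f"only A and B may be used without a suffix: {name!r}")
    return Lineage(prefix, tuple(int(level) for level in levels))


def expand(name: str) -> str:
    """Write a name in unaliased form, e.g. C.1 -> B.1.1.1.1."""
    lin = parse(name)
    head = ALIASES.get(lin.prefix, lin.prefix)
    return ".".join([head, *map(str, lin.suffix)])


def parent(name: str) -> str:
    """Return the direct ancestor by dropping the last suffix level."""
    lin = parse(name)
    if not lin.suffix:
        raise ValueError(f"{name!r} has no ancestor within the nomenclature")
    if len(lin.suffix) == 1:
        # C.1 -> C, which is only an alias, so report the lineage it stands for.
        return ALIASES.get(lin.prefix, lin.prefix)
    return str(Lineage(lin.prefix, lin.suffix[:-1]))


def child(parent_name: str, n: int) -> str:
    """Name the n-th descendant of parent_name, switching to an alias past 3 levels."""
    lin = parse(parent_name)
    if len(lin.suffix) < 3:
        return str(Lineage(lin.prefix, lin.suffix + (n,)))
    # A fourth level would be needed: use the alias that stands for the parent.
    for alias, target in ALIASES.items():
        if target == str(lin):
            return f"{alias}.{n}"
    raise KeyError(f"no alias has been assigned to {lin} yet")
```

For example, `parent("B.1.1.7")` gives `"B.1.1"`, `child("B.1.1.1", 1)` gives `"C.1"`, and `parse("C")` raises because C is never used without a suffix. Keeping aliases in a separate table means any aliased name can be expanded back to its full ancestry.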

Recombinant lineage naming rules:

• Every new recombinant lineage is given a new top-level lineage prefix.

• The minimum number of genomes required to designate a new recombinant lineage is the same as the number required to designate a non-recombinant lineage.

• All top-level lineages that are recombinants have a prefix that begins with X.

• In order of discovery, recombinant lineage prefixes are XA, XB, XC, …, XAA, XAB, …, XBA, etc.

• Recombinant lineage names do not contain information about their putative parental lineages. Any such information (which might be uncertain or incomplete) can be provided in the Pango lineage summary table (currently available here).

• Recombinant lineages are “special case ancestors” because they don’t have a single unambiguous ancestral lineage within the Pango nomenclature (see above). Therefore sequences can be directly allocated to the prefix without a numerical suffix. In this way, recombinant lineages behave like lineages A and B.

• Non-recombinant descendant lineages follow the usual suffixing rules, e.g. XA.1.1, XA.1.2, etc.

• When the maximum number of suffix levels is reached, the usual aliasing rules apply. So if AJ is the next available top-level prefix, then XA.1.1.1.1 becomes AJ.1. (AJ is an alias of XA.1.1.1 but is not used without a suffix because it is not a special case ancestor.)

• Any recombinant of recombinants is given the next available top-level name that starts with an X. Information about ancestry is added to the lineage summary table (see above).
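The recombinant rules can be sketched the same way. This Python is a hypothetical illustration, not an official Pango tool: `recombinant_prefixes`, `is_valid`, `child` and the `ALIASES` entry (taken from the AJ example above) are assumed names. The prefix generator follows the listed order literally (XA to XZ, then XAA, XAB, …). The rules do not say whether I, O and X are skipped after the leading X, so that is left to an optional `skip` argument.

```python
from itertools import count, product
from string import ascii_uppercase

# Hypothetical alias table mirroring the example in the text: AJ stands for XA.1.1.1.
ALIASES = {"AJ": "XA.1.1.1"}


def recombinant_prefixes(skip: str = ""):
    """Yield recombinant prefixes in order of discovery: XA..XZ, XAA..XAZ, XBA, ...

    `skip` lists letters to leave out after the leading X (an assumption; the
    rules above do not say), so the default follows the listed order literally.
    """
    alphabet = [c for c in ascii_uppercase if c not in skip]
    for length in count(1):
        for letters in product(alphabet, repeat=length):
            yield "X" + "".join(letters)


def is_valid(name: str) -> bool:
    """Check a name against the prefix and suffix rules, treating X... as recombinant."""
    prefix, *levels = name.upper().split(".")
    if not (prefix.isascii() and prefix.isalpha()):
        return False
    recombinant = prefix.startswith("X")
    if not recombinant and set(prefix) & set("IOX"):
        return False  # standard prefixes never use I, O or X
    if not all(level.isdecimal() for level in levels) or len(levels) > 3:
        return False
    if not levels:
        # Recombinants and A/B are special-case ancestors and may stand alone;
        # aliases such as AJ always need a suffix.
        return recombinant or prefix in {"A", "B"}
    return True


def child(parent_name: str, n: int) -> str:
    """Name the n-th non-recombinant descendant, switching to an alias past 3 levels."""
    parent = parent_name.upper()
    if parent.count(".") < 3:
        return f"{parent}.{n}"
    for alias, target in ALIASES.items():
        if target == parent:
            return f"{alias}.{n}"
    raise KeyError(f"no alias has been assigned to {parent} yet")
```

Here `is_valid("XA")` is true while `is_valid("AJ")` is false, and `child("XA.1.1.1", 1)` returns `"AJ.1"`. Passing `skip="IOX"` would make the sequence run …, XH, XJ, … instead.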

Posted by Oliver Pybus on behalf of the Pango Nomenclature Committee
Email: enquiries@pango.network
Web: http://pango.network
Twitter: @PangoNetwork