A fool’s errand: predicting the evolutionary future of the Omicron SARS-CoV-2 lineage

Sergei Kosakovsky Pond and Darren Martin

As Omicron spreads worldwide and more sequence data become available, we can begin making short-term predictions as to which evolutionary trajectories the virus could follow, what the fate of individual Omicron mutations might be, and which patterns of sequence variation would be expected under each scenario. These patterns can be detected using comparative sequence analysis and used to rank the relative evolutionary likelihoods of different scenarios.

Fate of the original Omicron lineage

Scenario 1: spread with no selective pressure. This scenario would recapitulate the early spread of SARS-CoV-2 in 2020: a situation where the virus is already sufficiently adapted to global spread and almost all nucleotide variation detected amongst sequenced variants would be either neutral or mildly deleterious. This pattern of evolution would be associated with a decrease, relative to past SARS-CoV-2 lineages, in the numbers of codon sites across the genome that are detectably evolving under positive selection (i.e. with an excess of non-synonymous substitutions) and an increase in the numbers of codons evolving either under negative selection (i.e. with an excess of synonymous substitutions) or neutrally. We view this scenario as unlikely because the fitness landscape upon which SARS-CoV-2 is evolving in December 2021 is very different from the slowly shifting fitness landscape that it was evolving on during the first 11 months of the pandemic. Prior to the emergence of the first VOCs between October and December 2020, the average human was neither vaccinated nor previously infected and, in most instances upon entering a person, SARS-CoV-2 would encounter no strong immune pressures before it had the opportunity to be onwardly transmitted. In December 2021 most people have been vaccinated and/or previously infected with SARS-CoV-2, and the virus must now contend with an immunologically heterogeneous population of human hosts, a substantial fraction of whom possess at least partly efficacious immune responses. This continually changing immunological environment means that, moving forward, there will be no persistently effective combinations of SARS-CoV-2 immune evasion mutations that allow the virus to “coast”.
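The distinction between synonymous and non-synonymous substitutions that underlies these selection signals can be illustrated with a minimal Python sketch. It assumes the standard genetic code and simply translates two codons and compares the amino acids; the function name and the example codons are illustrative, and this is not a selection test in itself (real analyses fit codon substitution models to phylogenies rather than counting changes).

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, amino acids listed in TCAG order for codon
# positions 1, 2 and 3 ("*" marks a stop codon).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_substitution(ref_codon, alt_codon):
    """Label a codon change as 'synonymous' or 'non-synonymous'.

    Hypothetical helper for illustration: it compares the translated
    amino acids only and ignores multi-step mutational pathways.
    """
    ref_aa = CODON_TABLE[ref_codon.upper()]
    alt_aa = CODON_TABLE[alt_codon.upper()]
    if ref_codon.upper() == alt_codon.upper():
        return "identical"
    return "synonymous" if ref_aa == alt_aa else "non-synonymous"


# Illustrative codons: AAT and AAC both encode Asn; TAT encodes Tyr.
print(classify_substitution("AAT", "AAC"))  # synonymous
print(classify_substitution("AAT", "TAT"))  # non-synonymous
```

An excess of the second kind of change at a codon site, relative to the first, is the signature of positive selection described above; an excess of the first kind is the signature of negative selection.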

Scenario 2: spread with weak/moderate ongoing selective pressures. This scenario would be consistent with a general selective sweep, where Omicron becomes the dominant strain and evolves relatively slowly without significant changes in function/transmissibility/resistance. This would mirror the spread of Alpha and Delta. Over a six to eight month timeframe, and contingent on no other VOCs unexpectedly emerging and competitively displacing Omicron, this is a very likely scenario. An impending selective sweep is likely because Delta, the only viable competitor with Omicron at the moment, will almost certainly be outperformed by Omicron in head-to-head competitions to infect susceptible hosts. This is already being borne out by the observed epidemiological dynamics in Africa and Europe. Initially at least, the complement of immune evasion mutations carried by Omicron is likely to provide it with a clear selective advantage over Delta when spreading in vaccinated/recovered individuals; neutralization and vaccine efficacy data are still being collected, but initial reports strongly support a significant degree of immune evasion in Omicron. This advantage over non-Omicron lineages will, however, wane over time either naturally as individuals who have been infected with Omicron acquire additional immunity, or if Omicron-targeted vaccines are deployed. The evolutionary dynamics associated with competition between different Omicron lineages against the backdrop of a changing immunological environment should be detectable as positive selection signals at particular codon sites that encode amino acids that fall within or adjacent to both the surface-exposed binding sites on the Spike, matrix or envelope proteins that are targeted by neutralizing and non-neutralizing antibody classes, and CTL epitopes spread throughout the SARS-CoV-2 proteome (but particularly common in the Spike and nucleocapsid proteins).
Further, Omicron sublineages that acquire genuinely adaptive mutations at these positively selected sites would be expected to increase (potentially quite slowly) in frequency over time relative to Omicron lineages that lacked mutations at these sites. However, such mutations would not be expected to reach fixation deterministically as they would only ever be advantageous within subsets of infections. Signals of balancing selection (such as mutation toggling) might be detectable at the codon sites where such mutations occur if the mutations involve a trade-off between immune evasion and replicative or transmission efficiency.
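To give a feel for how slowly or quickly a sublineage carrying an adaptive mutation might rise in frequency, the following Python sketch uses a simple deterministic two-lineage model in which the log-odds of the sublineage frequency increase linearly with time. This is a textbook simplification, not a model proposed in this post: it assumes a constant selective advantage (which would not hold for mutations that are only advantageous in a subset of infections), and the starting frequency and advantage values are hypothetical.

```python
import math


def logit(p):
    """Log-odds of a frequency p in (0, 1)."""
    return math.log(p / (1.0 - p))


def logistic_frequency(p0, s, t):
    """Frequency after t days of a sublineage that starts at frequency p0
    and has a constant log-odds growth advantage s per day (assumed)."""
    return 1.0 / (1.0 + math.exp(-(logit(p0) + s * t)))


def days_to_reach(p0, p_target, s):
    """Days needed to move from frequency p0 to p_target under the same model."""
    return (logit(p_target) - logit(p0)) / s


# Hypothetical values: a sublineage at 1% with a weak (0.05/day) or a
# strong (0.2/day) advantage over the other Omicron lineages.
for s in (0.05, 0.2):
    print(s, round(days_to_reach(0.01, 0.5, s), 1))
```

Under these assumed values, the weakly advantaged sublineage needs about 92 days to go from 1% to 50%, whereas the strongly advantaged one needs about 23 days, which is the kind of contrast that separates slow frequency increases from rapid sweeps.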

Scenario 3: spread with strong ongoing selective pressures. In this scenario, Omicron experiences significant short-term selective pressures, and resolves these by acquiring and rapidly fixing additional mutations (potentially including reversion mutations). The sources of selection could be (i) competition with the fittest Delta sublineages or other variants yet to be discovered; (ii) low genetic barriers to acquiring additional mutations that confer escape from major neutralizing antibody classes (e.g., the S/346 mutation seen in >5% of current Omicron sequences and showing evidence of positive and directional selection), enhanced receptor binding or enhanced cell fusion; (iii) if Omicron evolved within the context of a long-term infection, pressure to “transition” from intra-host evolution to community spread (which could involve reversions of some host-specific immune evasion mutations); or (iv) if Omicron evolved within an alternative animal species, pressure to reoptimize binding to human cellular receptors and evasion of human antibody classes or HLA alleles. This scenario would be expected to play out over a considerably shorter timeframe than scenarios one or two and would be expected to transition into either of these other two scenarios. Scenario 3 would be perceptible as one or more selective sweeps, with Omicron lineages carrying new highly advantageous mutations initially rising rapidly in frequency but then being displaced by newer Omicron lineages carrying additional highly adaptive mutations. When all of the easily accessible mutations with large positive fitness effects have been discovered, selective sweeps will become less discernible, with conditionally adaptive mutations rising more slowly in frequency within the Omicron population until they are either fixed or reach an equilibrium frequency that reflects the proportion of infections within which they are advantageous (i.e. a transition to scenario two).

Scenario 4: recombination wildcard. There is already evidence of two distinct Omicron lineages (BA.1 and BA.2) and more may be discovered at any time. The degree of genetic difference both between the different Omicron lineages and between Omicron and other SARS-CoV-2 lineages is large enough that recombination between these could yield genetic variants with substantially altered biological characteristics relative to parental viruses. The increased infectivity of lineages such as Omicron and Delta also substantially increases the probability of mixed infections of these viruses. However, ongoing co-circulation of Delta and Omicron seems unlikely given the outcomes of past head-to-head competitions between cocirculating variants. Under this scenario, in any given region of the world where Omicron is displacing Delta, there would be only a limited time-window (perhaps no longer than two to four weeks) when the two variants were co-circulating at high enough frequencies for large numbers of mixed Omicron/Delta infections to occur. When such infections occur, we presently have no idea whether the potential even exists for recombinants to arise that have a higher degree of fitness than both Delta and Omicron and, if it does, what the biological properties of such recombinants might be. Several credible reports of recombination in SARS-CoV-2 have been published (Ref 1; Ref 2), yet the recombinant variants did not outcompete “pure” lineage forms. Nevertheless, it may be possible to detect these recombinants when they are sampled during routine genomic surveillance, and, if so, it will also be possible to infer whether the recombinants have a fitness advantage relative to the non-recombinant Omicron lineages among which they cocirculate. If, for example, a recombinant lineage repeatedly displaces Omicron in the regions of the world where it occurs, we would expect the evolutionary dynamics of the recombinant to then proceed according to scenarios one through three.
It is important to note that SARS-CoV-2 sequences remain very genetically homogeneous (on the order of 1/1000 nucleotide differences between contemporaneous strains). Recombination is much more efficient at providing evolutionary “short-cuts” when genetic diversity is higher and a single recombination event can bring together genomic regions that would otherwise require numerous mutations to reach the same end result.
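The length of the co-circulation window mentioned under scenario 4 can be explored with a rough Python sketch. It assumes, purely for illustration, that one variant displaces the other following a logistic curve with a constant log-odds growth advantage, and it defines the window as the period during which both variants exceed a chosen frequency threshold. The threshold and growth advantage used here are hypothetical inputs, not estimates from this post.

```python
import math


def co_circulation_days(min_freq, s):
    """Days during which both variants exceed min_freq (< 0.5) when one
    variant replaces the other with a constant log-odds advantage s per day.

    Under logistic replacement the invading variant's log-odds rise
    linearly, so both frequencies lie in [min_freq, 1 - min_freq] while
    the log-odds move from -L to +L, where L = ln((1 - min_freq) / min_freq).
    """
    if not 0.0 < min_freq < 0.5:
        raise ValueError("min_freq must be between 0 and 0.5")
    if s <= 0:
        raise ValueError("s must be positive")
    log_odds_bound = math.log((1.0 - min_freq) / min_freq)
    return 2.0 * log_odds_bound / s


# Hypothetical inputs: both variants above 10%, advantage of 0.2 per day.
print(round(co_circulation_days(0.10, 0.2), 1))
```

With these assumed inputs the window is about 22 days. A larger growth advantage or a stricter frequency threshold shortens the window, which narrows the opportunity for mixed infections.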

Fate of individual mutations

1. Population-level maladaptive mutations, if carried over from intra-host evolution, will revert. This could happen rapidly if these mutations involved substantial fitness trade-offs (for example, between immune evasion and transmissibility), or more slowly if they involved less substantial fitness trade-offs (for example, between cell-entry efficiency and immune escape). When the costs of such trade-offs outweigh their benefits within the context of whatever host population the virus is spreading within, the trade-off can be resolved either by back-mutation or via genetic recombination with a variant that never made the trade-off in the first place. If the trade-off involved a single mutation then back-mutation would likely be the most effective mechanism of reversion. However, if the trade-off involved primary and secondary compensatory mutations that are clustered together within a gene then genetic recombination with a variant that never made the trade-off is likely to be the most efficient mechanism of reversion. Reversion might never occur in more complex situations where the trade-off involved multiple interacting mutations distributed over a large region of the genome; in this case additional compensatory mutations may arise.
2. Adaptive mutations at previously negatively selected sites should be maintained by negative selection. If clusters of mutations in Omicron that have occurred at codon sites that were detectably evolving under negative selection in non-Omicron lineages are adaptive (Ref 3), we would expect that, going forward, the codons where these mutations occurred will resume evolution under negative selection to maintain the adaptive combinations of mutations within these clusters. Further, mutations previously identified as being adaptive in other lineages will in general also be maintained by negative selection unless there is a fitness trade-off involved (i.e. sites evolving under balancing selection). For example, the perpetually changing host immunological landscape will be expected to shift the balance between the fitness benefits of a given immune evasion mutation and whatever fitness costs the mutation imposed on some other aspect of the virus’ biology.
3. We expect some of the mutations seen in the 501Y meta-signature (Ref 4) to appear in the Omicron lineage (e.g. 5F, 18F, 701V, etc.) and rise in frequency over time, although, based on tracked mutation frequency changes in previously dominant VOCs, it is unlikely that these will be fixed before Omicron is eventually replaced by whatever comes next.
4. It is especially important that the acquisition of mutations that could confer additional immune escape is carefully monitored. It might be difficult to identify these mutations from sequence data alone, since the Omicron Spike sequence has changed so substantially from that of Wuhan-Hu-1 that RBD immune escape mutations identified by deep mutational scanning on the Wuhan-Hu-1 genetic background might no longer accurately predict phenotypic impact in the Omicron genetic background.
5. We would expect all the “easily accessible” large effect immune escape and transmission advantage mutations to arise and propagate quite rapidly. It will take considerably longer for more complex novel constellations of epistatically interacting mutations to coalesce on the Omicron genetic background. The difference between the Omicron spike and those of other SARS-CoV-2 variants is so substantial that there likely exist completely novel two or three mutation combinations, analogous to the E484K and N501Y combination seen in other SARS-CoV-2 lineages, that will have high fitness impacts but have never before been seen in other Spike proteins.
6. We should make better use of deep sequencing data to systematically investigate intra-host variability, especially variation occurring at sub-consensus levels, because the fate of such mutations has been shown to correlate with eventual inter-host evolutionary dynamics in pathogens such as HIV (sites of intra-host adaptation are strongly correlated with sites of inter-host adaptation).
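As one illustration of what screening for sub-consensus intra-host variation (point 6) could involve, the Python sketch below flags minority bases at each genome position from per-position read counts. The input format, function name, and the depth and frequency thresholds are all assumptions made for this example; a real pipeline would also need to account for sequencing error, strand bias and amplicon artifacts.

```python
def minor_variants(base_counts, min_depth=100, min_freq=0.02):
    """Return {position: [(base, frequency), ...]} for non-majority bases.

    base_counts maps a genome position to a dict of read counts per base,
    e.g. {23063: {"A": 950, "T": 50}} (hypothetical input format).
    Positions with fewer than min_depth reads are skipped, and minority
    bases below min_freq are ignored (both thresholds are assumed values).
    """
    flagged = {}
    for position, counts in base_counts.items():
        depth = sum(counts.values())
        if depth < min_depth:
            continue
        majority_base = max(counts, key=counts.get)
        minors = [
            (base, n / depth)
            for base, n in counts.items()
            if base != majority_base and n / depth >= min_freq
        ]
        if minors:
            # Report the most frequent minority base first.
            flagged[position] = sorted(minors, key=lambda item: -item[1])
    return flagged


# Made-up read counts at three positions.
example = {
    100: {"A": 950, "G": 50},   # minority G at 5%
    200: {"C": 1000},           # no variation
    300: {"T": 30, "C": 20},    # too few reads to assess
}
print(minor_variants(example))
```

Tracking whether the positions flagged this way later show signals of selection between hosts would be one way to test the correlation between intra-host and inter-host adaptation described above.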

Ref 1:


Ref 2:

Ref 3:

Ref 4: