To count introductions, we use the (limited) direct data on travel that we have. We split our Russian samples into five distinct groups, depending on their phylogenetic position relative to other Russian and non-Russian samples (see figure):
For Russian singletons and Russian transmission lineages, we use maximum parsimony, assuming that each of them results from a distinct introduction.
For stem clusters, stem-derived singletons and stem-derived transmission lineages, the situation is more complex. For example, the pattern in the left panel of the figure above could result from anywhere between 1 and 8 distinct introductions, depending on which of the transmissions corresponding to the ancestral node occurred prior to introduction, and which occurred within Russia.
Facing a similar problem (on a much larger UK dataset), Pybus et al. (Preliminary analysis of SARS-CoV-2 importation & establishment of UK transmission lineages) assumed that the ancestral state was non-UK, so that each transmission lineage resulted from a distinct introduction. It would be tempting to use a similar simple rule to estimate the number of introductions for stem clusters and stem-derived singletons.
However, the travel data show that no simple rule would work. For example, for some of the stem-derived transmission lineages, we know that most individuals have not travelled (a Russian flag in the figure means no travel), which suggests that such a lineage could have resulted from transmission within Russia.
In other lineages, however, we see multiple individuals who have travelled:
To address this as well as we can, we use a mixed approach. We assume that the number of introductions for each of the categories above is proportional to the fraction of individuals who have travelled, among all individuals in that category with known travel history. This gives us ~0.33 imports per stem-derived transmission lineage; ~0.14 imports per stem-derived singleton; and ~0.36 imports per sequence in a stem cluster. For details, see here: https://www.medrxiv.org/content/medrxiv/suppl/2020/07/17/2020.07.14.20150979.DC1/2020.07.14.20150979-1.pdf
This yields our estimate of 67 introductions overall giving rise to the sampled diversity.
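To make the arithmetic of the mixed approach concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not the code used for the analysis: the function names, the category keys and the example counts are hypothetical. Only the three fractional weights (~0.33, ~0.14, ~0.36) and the rule of one introduction per Russian singleton or Russian transmission lineage come from the text above.

```python
# Imports assigned to one unit of each category.
# 1.0 reflects the parsimony assumption (one introduction per unit);
# the fractional weights are the approximate values quoted in the text.
# The dictionary keys are hypothetical labels chosen for this sketch.
IMPORTS_PER_UNIT = {
    "russian_singleton": 1.0,
    "russian_transmission_lineage": 1.0,
    "stem_derived_transmission_lineage": 0.33,
    "stem_derived_singleton": 0.14,
    "stem_cluster_sequence": 0.36,
}


def travel_fraction(n_travelled, n_with_history):
    """Fraction of travellers among individuals with known travel history."""
    if n_with_history <= 0:
        raise ValueError("need at least one individual with known travel history")
    return n_travelled / n_with_history


def estimate_introductions(counts):
    """Sum, over categories, of (number of units) x (imports per unit)."""
    unknown = set(counts) - set(IMPORTS_PER_UNIT)
    if unknown:
        raise ValueError(f"unknown categories: {sorted(unknown)}")
    return sum(IMPORTS_PER_UNIT[cat] * n for cat, n in counts.items())


if __name__ == "__main__":
    # Hypothetical counts, for illustration only (not the study's data).
    example_counts = {
        "russian_singleton": 2,
        "russian_transmission_lineage": 1,
        "stem_derived_transmission_lineage": 3,
        "stem_derived_singleton": 10,
        "stem_cluster_sequence": 5,
    }
    print(estimate_introductions(example_counts))
```

With the real counts of units in each of the five groups, a sum of this form is what produces an overall estimate; the counts in the usage example are placeholders and are not meant to reproduce the figure of 67.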