Number of mutations along a transmission chain

Time, mutations, and transmissions

There have been a number of interesting posts, here and on Twitter, discussing the time scale of the current 2019-nCoV tree and what we can and can’t infer from the genetic data, given how sparse the temporal signal is. Most of the discussion has focused on estimating the TMRCA in order to compare these estimates with epidemiological reports.

At a finer scale, it is also interesting to ask how long (in time) a transmission chain is, given that we observe some number of mutations along it, and how many transmission events this represents. Conversely, how many mutations (and transmissions) do we expect along a transmission chain of n days? These questions have been discussed in detail by Xavier Didelot and Caroline Colijn, and the calculations are implemented for full phylogenies in their TransPhylo package (here). Recently, @trvrb sketched out similar calculations in his informative thread.
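As a rough illustration of the forward question (mutations and transmissions expected over n days), here is a minimal sketch. It assumes mutations accumulate as a Poisson process at a constant rate, and that the number of transmissions is simply elapsed time divided by a fixed serial interval. The rate (1e-3 substitutions/site/year), genome length (about 29,900 sites), and 7-day serial interval are placeholder values chosen for illustration, not estimates from this post, and the function names are hypothetical.

```python
import math

# Illustrative assumptions, not estimates from the post.
GENOME_LENGTH = 29_900          # approximate genome length in sites
RATE_PER_SITE_PER_YEAR = 1e-3   # placeholder evolutionary rate


def expected_mutations(days, rate=RATE_PER_SITE_PER_YEAR,
                       genome_length=GENOME_LENGTH):
    """Expected number of mutations along a chain lasting `days`."""
    return rate * genome_length * days / 365.0


def prob_mutations(k, days, rate=RATE_PER_SITE_PER_YEAR,
                   genome_length=GENOME_LENGTH):
    """Poisson probability of exactly k mutations over `days`."""
    lam = expected_mutations(days, rate, genome_length)
    return math.exp(-lam) * lam ** k / math.factorial(k)


def expected_transmissions(days, serial_interval_days=7.0):
    """Crude expected number of transmissions: time / serial interval."""
    return days / serial_interval_days


if __name__ == "__main__":
    for days in (7, 14, 30):
        p0 = prob_mutations(0, days)
        print(f"{days:>3} days: E[mutations] = {expected_mutations(days):.2f}, "
              f"P(0 mutations) = {p0:.2f}, "
              f"E[transmissions] = {expected_transmissions(days):.1f}")
```

Treating the serial interval as fixed ignores its variance; a fuller treatment would draw each interval from a distribution (for example a gamma), which is closer in spirit to what TransPhylo does for whole phylogenies.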

To get an intuitive idea of these probabilities, I built a small interactive web app, which should be displayed below (if not, it is also available here). It lets you adjust the evolutionary rate, the serial interval, and either the number of observed mutations or the number of days along a transmission chain, and it plots the corresponding probability densities.

As expected, any branch marked by a few mutations likely represents a long transmission chain, both in time and in number of transmission events. Conversely, long chains marked by no mutations are possible, and even likely in some cases. Granted, all of this is conditional on knowing that there is a direct transmission path between the two samples.
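The inverse question (how long is a chain, given k observed mutations) can be sketched in the same spirit. The example below assumes a Poisson mutation process and a flat prior on the elapsed time, under which the time in days follows a gamma distribution with shape k + 1 and rate equal to the per-day mutation rate. This is one simple choice of model and not necessarily the one the app uses; the rate and genome length are the same illustrative placeholders as before, and the function names are hypothetical.

```python
import math

# Illustrative assumptions, not estimates from the post.
GENOME_LENGTH = 29_900          # approximate genome length in sites
RATE_PER_SITE_PER_YEAR = 1e-3   # placeholder evolutionary rate


def mutations_per_day(rate=RATE_PER_SITE_PER_YEAR,
                      genome_length=GENOME_LENGTH):
    """Genome-wide mutation rate per day."""
    return rate * genome_length / 365.0


def days_density(t, k, per_day=None):
    """Density of chain duration t (days) given k mutations.

    Assumes a flat prior on t, giving Gamma(shape=k+1, rate=per_day).
    """
    r = mutations_per_day() if per_day is None else per_day
    if t < 0:
        return 0.0
    return r ** (k + 1) * t ** k * math.exp(-r * t) / math.factorial(k)


def posterior_mean_days(k, per_day=None):
    """Mean of the gamma density above: (k + 1) / rate."""
    r = mutations_per_day() if per_day is None else per_day
    return (k + 1) / r


if __name__ == "__main__":
    serial_interval_days = 7.0  # placeholder value
    for k in (0, 1, 2, 3):
        mean_days = posterior_mean_days(k)
        print(f"{k} mutations: mean duration = {mean_days:.1f} days, "
              f"roughly {mean_days / serial_interval_days:.1f} transmissions")
```

The flat prior is what makes even zero observed mutations compatible with a chain of non-trivial length; an informative prior on the duration (for example from sampling dates) would tighten these densities.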