Early phylodynamics analysis of the COVID19 epidemics in France
21Apr2020
Gonché Danesh, Baptiste Elie and Samuel Alizon
ETE Modelling group, MIVEGEC, CNRS, IRD, Université de Montpellier
Introduction
This report focuses on the COVID19 epidemic in France using the SARSCov2 genome data available from the GISAID  Global Initiative on Sharing All Influenza Data on April 10, 2020.
The first cases of COVID19 were detected in France in Jan 2020, mostly from travelers, but they remained isolated. Incidence based on screening data only started to increase steadily in France on Feb 27. Limited measures were announced on Feb 28, but schools were closed from Mar 16 and a nationwide lockdown was implemented from Mar 17. On Apr 19, the prime minister E Philippe gave the first official estimate of the basic reproduction number (\mathcal{R}_0), which was 3.5. He also announced the temporal reproduction number had been dropped to 0.6.
We performed phylodynamics analyses using an exponential growth coalescent model in Beast 1.8.3 and a BDSKY model in Beast 2.3. Due to the low temporal signal in the data, we used molecular clocks with fixed values.
Here, we report results on the origin of the tree, the epidemic doubling time, the generation time and the temporal reproduction number \mathcal{R}(t).
Data used
On April 10, 194 sequences were available from samples originating from France. The vast majority of these sequences (94%) belong to the same clade labelled as A2 by nextstrain.
These sequences were aligned and cleaned using the Augur pipeline developed by nextstrain. One sequence was removed due to low qualityn, which led to a dataset of 186 sequences. The list is shown in Appendix.
We screened the dataset with RDP, which did not detect any recombination events.
The following graph shows the 186 sequences used, classified according to the date on which the sampling was carried out and the sampling region and France.
Some dates are overrepresented in the dataset, which could bias the analysis. To correct for this, we sampled 7 sequences for each of the dates where more than 7 sequences were available. This was done twice to generate two datesets with 122 sequences (France122 and France122b).
To investigate temporal effects using the coalescent model, we created three other subsets of the France122 dataset:

France611 contains the 61 sequences sampled first (i.e. from Feb 21 to Mar 12),

France612 contains the 61 sequences sampled more recently (i.e. from Mar 12 to Mar 24),

France81 contains the 81 sequences sampled first (i.e. from Feb 21 to Mar 17).
With the exponential coalescent model (DT), we analysed the five subsets of data (Fra611, Fra612, Fra81, Fra122, and Fra122b), whereas we only report the most complete datasets (Fra122 and Fra122b) for the BDSKY model.
Results
Dating the TMRCA
We first report the estimation of the time to the most recent common ancestor (TMRCA) of all the sequences in the sample. Although this is the ancestor of the vast majority of the French sequences grouped in the A2 clade, it needs not be a French infection because there might have been multiple introductions of the epidemics in the country.
As pointed out by Louis Du Plessis, estimates of molecular clock should be treated with care. This is true in our case since we are analysing a small subset of the data. We therefore fix the default molecular clock to 8.8\cdot 10^{4} substitutions/position/year as estimated by Andrew Rambaut on a larger phylogeny. We also investigated a low value (4.4\cdot 10^{4} subst./pos./year) and a high value (13.2\cdot 10^{4} subst./pos./year).
As expected, the molecular clock value directly affected the time to the most recent common ancestor. Note that the sampling of the 122 sequences amongst the 183 has a much smaller impact.
The table below shows the dates for a mean evolution rate value (8.8\cdot 10^{4} subst/position/year).
Model  Size  Most recent sample  lower bound (95%)  median  upper bound (95%) 

Fix8.8DT  122  24 Mar 2020  19 Jan 2020  31 Jan 2020  09 Feb 2020 
Fix8.8BDSKY  122  24 Mar 2020  20 Jan 2020  31 Jan 2020  11 Feb 2020 
Fix8.8DT  122b  24 Mar 2020  20 Jan 2020  02 Feb 2020  10 Feb 2020 
Fix13.2DT  122  24 Mar 2020  30 Jan 2020  08 Feb 2020  15 Feb 2020 
Fix4.4DT  122  24 Mar 2020  11 Dec 2019  01 Jan 2020  17 Jan 2020 
Focusing on our largest datasets (with 122 sequences), our medium and high molecular clock assumptions date the common ancestor of the main clade regrouping the vast majority of the French sequences between midJanuary and midFebruary. This large interval is due to the rarity of “old” sequences (the first one collected in this clade dates from Feb 21) and on the fact that this clade averages the epidemics in several regions of France, which could have been seeded by independent introductions from outside France. Note that the “slow” molecular clock finds a root which seems very early given the data.
These dates are consistent with those obtained by Andrew Rambaut regarding the beginning of the epidemic in China, which is dated November 17, 2019 with a confidence interval between August 27 and December 19, 2020.
Doubling time
Using a coalescent model with exponential growth and serial sampling (Drummond et al 2002 Genetics), we can estimate the doubling time, which corresponds to the number of days for the epidemic to double in size. This parameter is key to calculate the basic reproduction number \mathcal{R}_0 (Wallinga & Lipsitch 2007 Proc B).
We show this doubling time for three of our datasets. Since the France122 dataset includes more recent sequences than France81, which itself includes more recent sequences than France61, our hypothesis is that we can detect variations in doubling time over the course of the epidemic.
As can be seen in the figure, adding more recent sequence data leds to an increase in epidemic doubling time. Initially, with the first 61 sequences (which run from Feb 21 to Mar 12), the epidemic spreads rapidly, with a median doubly time of 2.5
days. With the addition of sequences sampled between March 12 and 17, the doubling time increases to 3.3
days. Finally, by adding sequences sampled between March 17 and 24, the doubling time rises to 3.7
days.
We also studied the effect of the molecular clock on the doubling time. For our realistic molecular clocks, the effect is limited: the median is 3.4
days assuming a high value for the molecular clock and 3.7
days for our default (medium) value.
The effect of tip sampling is much less pronounced. The low value of the molecular clock, which already led to unrealistic estimates for the origin of the epidemics, also led to a high median doubling time of 5.6
days. This is unrealistic given the incidence data in France, which indicates an exponential growth rate of 0.23 days ^{1} which corresponds to a doubling time of 3 days.
In comparison, phylodynamic inferences made from data from China (with 86 genomes, Andrew Rambaut) found a median doubling time of about 7 days with a confidence interval between 4.7 and 16.3 days). The reason for the slower growth rate of the epidemic compared to ours is that we have focused on one rapidly expanding clade of the epidemic and neglected the smaller clades.
Duration of contagiousness
The birthdeath skyline (BDSKY) model (Stadler et al 2013 Proc Nat Acad Sci USA) allows us to estimate of the duration of contagiousness and the reproduction number of the epidemic (i.e. the number of secondary infections caused by an infected host). The exponential growth coalescent model described above cannot distinguish between these two quantities.
Note that the BDSKY model requires more parameter values to be estimated. In addition, it is also necessary to estimate the sampling rate after Feb 21 (the date on which the oldest sequence was sampled) whereas the coalescent model assumes that this sampling is negligible.
Posterior distribution for the rate of end of contagiousness estimated by Beast.
From this rate, we can obtain the duration of contagiousness, knowing that it is also necessary to account for the sampling rate because patients whose infections are sequenced can be assumed not to transmit the infection after this detection. The sampling rate is estimated at 0.093
days ^{1} with a (wide) 95% confidence interval between 0.006
and 0.627
days ^{1} (it is set to 0 prior to Feb 21).
The distribution of contagiousness durations is obtained by taking the inverse of the sum of the sampling rate and the contagiousness end rate (the inverse of a rate is a duration). The median of this distribution is 5.19
days and 95% of its values are between 1.52
and
8.52
days.
This result obtained using only dated sequences are consistent with those inferred using contact tracing. The latter measure serial intervals, which are used to approximate the length of the contagiousness period in the calculation of \mathcal{R}_0. For instance, Nishiura et al. (2020, Int J Infect Dis) find a distribution with a median of 4 days and a 95% confidence interval between 3.1 and 4.9 days.
Reproduction number
With the BDSKY model, we can estimate the effective reproductive number noted \mathcal{R}(t). Here, given the limited size of the dataset, we only divided the time into 3 intervals:

\mathcal{R}_1 estimates the temporal reproduction number before Feb 19,

\mathcal{R}_2 estimates the temporal reproduction number between Feb 19 and Mar 7,

\mathcal{R}_3 estimates the temporal reproduction number between Mar 7 and Mar 24.
These results are very consistent with those obtained for the doubling time, even if the time periods are different. For the period before Feb 19, the estimate is the least accurate with values of \mathcal{R}_1 included at 95% between 0.13
and 7.01
. This is consistent with the fact that the oldest sequence dates from February 21, while the tree root is estimated at the beginning of February. Over the second time period, rapid growth is detected since the values of \mathcal{R}_2 between 1.69
and 8.77
. Finally, the most recent period after Mar 7 detects a slowing down of the epidemic with a \mathcal{R}_3 between 0.8
and 2.36
.
Discussion and limits
Analysing SRASCov2 genome sequences with a known date of sampling allows to infer phylogenies of infections and to estimate epidemiological parameters. We performed this analysis based on the sequences representing the largest French clade within the phylogeny of all available SARSCov2 sequences. This clade can be visualized on the nextstrain representation.
Before summarizing the results, we prefer to point out several limitations of our analysis:

the French clade we analyzed is in fact a European clade (and even international since many US sequences origate from it): although French most French sequences appear to be grouping into two subclades within this calde, it is possible that the variations in epidemic growth that we detect are more due to European than French control policies;

some French regions (e.g. AuvergneRhôneAlpes) are more represented than others, which could bias the analysis at the national level;

the molecular clock had to be set in this analysis because we do not have sufficient sampling during the month of February in France (the results with our average and high molecular clock are similar and consistent with the incidence data).
Despite these limitations, our results suggest a slowing down of the epidemic in France. Thus, by adding sequences sampled between March 12 and 24 to the phylogeny, the doubling time of the epidemic estimated by a exponential growth coalescent model increased by 48
%. This slowdown more clearly detected using a birth death model via the temporal reproduction number \mathcal{R}(t): the median value increased from 3.88
before Mar 12 to a median value of 1.22
ater Mar 12. This is consistent with the implementation of strict control measures in France as of March 16. These variations and even these orders of magnitude are consistent with our estimates based on the time series of incidence of new hospitalizations and deaths in our Report 5.
Finally, the BDSKY model also provides us with an estimate of the duration of contagiousness, which is essential in the calculation of \mathcal{R}_0. The median value obtained (5.19
days) is consistent with some serial interval estimates for COVID19 epidemics. To date, there is no estimate of the serial interval in France.
By increasing the number of SARSCoV2 genomic sequences from the French epidemic (and the number of people working on the subject), in particular sequences collected at the beginning of the epidemic, it would be possible to :

better estimate the date at which the epidemic took off in France,

better understand the spread between the different French regions,

estimate the number of virus introductions into the country.
Appendix
Priors
Exponential coalescent
Parameter  Value 

Molecular clock  fixed 
Evolution model  GTR 
kappa  lognormal[1, 1.25] 
frequencies  uniform[0,1] 
popsize  1/x 
growth rate  Gamma[0.001,1000] 
BDSKY
Parameter  Value 

Molecular clock  fixed 
Evolution model  GTR 
kappa  lognormal[1, 1.25] 
frequencies  uniform[0,1] 
Rate of end of infection  Unif(1.2,\infty) 
Sampling rate  Beta(1,1) 
Reprodution number  LogNorm(0,1.2) with max at 10 
French sequence IDs
Click to see the whole table.
Accession Number  Sampling Date  Region 

EPI_ISL_418415  20200315  AuvergneRhôneAlpes 
EPI_ISL_418414  20200315  AuvergneRhôneAlpes 
EPI_ISL_418413  20200315  AuvergneRhôneAlpes 
EPI_ISL_418412  20200315  AuvergneRhôneAlpes 
EPI_ISL_418419  20200316  AuvergneRhôneAlpes 
EPI_ISL_418418  20200316  AuvergneRhôneAlpes 
EPI_ISL_418417  20200316  AuvergneRhôneAlpes 
EPI_ISL_418416  20200316  AuvergneRhôneAlpes 
EPI_ISL_418431  20200318  AuvergneRhôneAlpes 
EPI_ISL_418430  20200318  AuvergneRhôneAlpes 
EPI_ISL_418422  20200317  AuvergneRhôneAlpes 
EPI_ISL_418421  20200317  AuvergneRhôneAlpes 
EPI_ISL_418420  20200317  AuvergneRhôneAlpes 
EPI_ISL_418426  20200317  AuvergneRhôneAlpes 
EPI_ISL_418425  20200317  AuvergneRhôneAlpes 
EPI_ISL_418424  20200317  AuvergneRhôneAlpes 
EPI_ISL_418423  20200317  AuvergneRhôneAlpes 
EPI_ISL_418429  20200318  AuvergneRhôneAlpes 
EPI_ISL_418428  20200317  AuvergneRhôneAlpes 
EPI_ISL_418427  20200317  AuvergneRhôneAlpes 
EPI_ISL_416750  20200306  AuvergneRhôneAlpes 
EPI_ISL_416754  20200306  AuvergneRhôneAlpes 
EPI_ISL_416751  20200305  AuvergneRhôneAlpes 
EPI_ISL_416752  20200304  AuvergneRhôneAlpes 
EPI_ISL_416757  20200307  AuvergneRhôneAlpes 
EPI_ISL_416758  20200308  AuvergneRhôneAlpes 
EPI_ISL_416756  20200306  AuvergneRhôneAlpes 
EPI_ISL_416748  20200304  AuvergneRhôneAlpes 
EPI_ISL_416749  20200304  AuvergneRhôneAlpes 
EPI_ISL_416746  20200303  AuvergneRhôneAlpes 
EPI_ISL_416747  20200304  AuvergneRhôneAlpes 
EPI_ISL_416745  20200310  AuvergneRhôneAlpes 
EPI_ISL_418231  20200315  Hauts de France 
EPI_ISL_418230  20200313  Île de France 
EPI_ISL_418235  20200316  Île de France 
EPI_ISL_418234  20200314  Île de France 
EPI_ISL_418233  20200315  Île de France 
EPI_ISL_418232  20200315  Île de France 
EPI_ISL_418239  20200316  Hauts de France 
EPI_ISL_418238  20200316  Hauts de France 
EPI_ISL_418237  20200316  Hauts de France 
EPI_ISL_418236  20200316  Hauts de France 
EPI_ISL_418220  20200228  Hauts de France 
EPI_ISL_418224  20200308  Hauts de France 
EPI_ISL_418223  20200305  Hauts de France 
EPI_ISL_418222  20200304  CentreVal de Loire 
EPI_ISL_418221  20200302  Hauts de France 
EPI_ISL_418228  20200312  Hauts de France 
EPI_ISL_418227  20200312  Hauts de France 
EPI_ISL_418226  20200309  Hauts de France 
EPI_ISL_418225  20200308  Hauts de France 
EPI_ISL_418229  20200312  Île de France 
EPI_ISL_418240  20200316  Île de France 
EPI_ISL_416493  20200308  Hauts de France 
EPI_ISL_416496  20200310  Hauts de France 
EPI_ISL_416497  20200310  Hauts de France 
EPI_ISL_416494  20200304  Normandie 
EPI_ISL_416495  20200310  Hauts de France 
EPI_ISL_416498  20200311  Île de France 
EPI_ISL_416499  20200311  Île de France 
EPI_ISL_418219  20200226  AuvergneRhôneAlpes 
EPI_ISL_418218  20200221  Hauts de France 
EPI_ISL_414631  20200304  Grand Est 
EPI_ISL_414630  20200303  Hauts de France 
EPI_ISL_414633  20200304  Île de France 
EPI_ISL_414632  20200304  Grand Est 
EPI_ISL_414635  20200304  Hauts de France 
EPI_ISL_414634  20200304  Hauts de France 
EPI_ISL_414626  20200229  Hauts de France 
EPI_ISL_414625  20200226  Pays de la Loire 
EPI_ISL_414627  20200302  Hauts de France 
EPI_ISL_414629  20200303  Hauts de France 
EPI_ISL_414624  20200226  Normandie 
EPI_ISL_414637  20200304  Hauts de France 
EPI_ISL_414636  20200304  Hauts de France 
EPI_ISL_414638  20200304  Hauts de France 
EPI_ISL_420607  20200323  AuvergneRhôneAlpes 
EPI_ISL_420606  20200322  AuvergneRhôneAlpes 
EPI_ISL_420609  20200323  AuvergneRhôneAlpes 
EPI_ISL_420608  20200323  AuvergneRhôneAlpes 
EPI_ISL_420605  20200322  AuvergneRhôneAlpes 
EPI_ISL_420604  20200323  AuvergneRhôneAlpes 
EPI_ISL_420625  20200324  AuvergneRhôneAlpes 
EPI_ISL_420624  20200324  AuvergneRhôneAlpes 
EPI_ISL_420621  20200324  AuvergneRhôneAlpes 
EPI_ISL_420620  20200323  AuvergneRhôneAlpes 
EPI_ISL_420623  20200324  AuvergneRhôneAlpes 
EPI_ISL_420622  20200324  AuvergneRhôneAlpes 
EPI_ISL_420618  20200323  AuvergneRhôneAlpes 
EPI_ISL_420617  20200323  AuvergneRhôneAlpes 
EPI_ISL_420619  20200323  AuvergneRhôneAlpes 
EPI_ISL_420614  20200323  AuvergneRhôneAlpes 
EPI_ISL_420613  20200323  AuvergneRhôneAlpes 
EPI_ISL_420616  20200323  AuvergneRhôneAlpes 
EPI_ISL_420615  20200323  AuvergneRhôneAlpes 
EPI_ISL_420610  20200323  AuvergneRhôneAlpes 
EPI_ISL_420612  20200323  AuvergneRhôneAlpes 
EPI_ISL_420611  20200323  AuvergneRhôneAlpes 
EPI_ISL_416511  20200307  Bretagne 
EPI_ISL_416512  20200307  Bretagne 
EPI_ISL_416510  20200306  Bretagne 
EPI_ISL_416513  20200307  Bretagne 
EPI_ISL_416508  20200306  Bretagne 
EPI_ISL_416509  20200306  Bretagne 
EPI_ISL_416506  20200303  Bretagne 
EPI_ISL_416500  20200311  Île de France 
EPI_ISL_416501  20200310  Île de France 
EPI_ISL_416504  20200302  Bretagne 
EPI_ISL_416505  20200302  Bretagne 
EPI_ISL_416502  20200226  Bretagne 
EPI_ISL_417340  20200307  AuvergneRhôneAlpes 
EPI_ISL_417333  20200304  AuvergneRhôneAlpes 
EPI_ISL_417336  20200306  AuvergneRhôneAlpes 
EPI_ISL_417337  20200307  AuvergneRhôneAlpes 
EPI_ISL_417334  20200304  AuvergneRhôneAlpes 
EPI_ISL_417338  20200307  AuvergneRhôneAlpes 
EPI_ISL_417339  20200308  AuvergneRhôneAlpes 
EPI_ISL_420056  20200322  Hauts de France 
EPI_ISL_420055  20200317  Grand Est 
EPI_ISL_420058  20200320  Île de France 
EPI_ISL_420057  20200322  Hauts de France 
EPI_ISL_420052  20200320  Île de France 
EPI_ISL_420051  20200319  Île de France 
EPI_ISL_420054  20200320  Île de France 
EPI_ISL_420053  20200319  Hauts de France 
EPI_ISL_420050  20200319  Hauts de France 
EPI_ISL_420049  20200319  Hauts de France 
EPI_ISL_420048  20200318  Île de France 
EPI_ISL_420045  20200317  Grand Est 
EPI_ISL_420044  20200318  Hauts de France 
EPI_ISL_420047  20200317  Île de France 
EPI_ISL_420046  20200317  Île de France 
EPI_ISL_420041  20200317  Hauts de France 
EPI_ISL_420040  20200312  Grand Est 
EPI_ISL_420043  20200318  Île de France 
EPI_ISL_420042  20200317  Île de France 
EPI_ISL_420038  20200317  AuvergneRhôneAlpes 
EPI_ISL_420039  20200312  Grand Est 
EPI_ISL_420063  20200322  Île de France 
EPI_ISL_420062  20200322  Île de France 
EPI_ISL_420064  20200323  Île de France 
EPI_ISL_420061  20200323  Île de France 
EPI_ISL_420060  20200320  Île de France 
EPI_ISL_420059  20200321  Île de France 
EPI_ISL_421513  20200323  Île de France 
EPI_ISL_421514  20200320  CentreVal de Loire 
EPI_ISL_421511  20200323  Hauts de France 
EPI_ISL_421512  20200323  Île de France 
EPI_ISL_421510  20200323  Hauts de France 
EPI_ISL_421508  20200323  Île de France 
EPI_ISL_421509  20200323  Hauts de France 
EPI_ISL_421506  20200321  Île de France 
EPI_ISL_421507  20200323  Île de France 
EPI_ISL_421504  20200314  Île de France 
EPI_ISL_421505  20200315  Île de France 
EPI_ISL_421502  20200312  Île de France 
EPI_ISL_421503  20200312  Île de France 
EPI_ISL_421500  20200311  Hauts de France 
EPI_ISL_421501  20200312  Île de France 
EPI_ISL_415650  20200302  Île de France 
EPI_ISL_415652  20200305  AuvergneRhôneAlpes 
EPI_ISL_415651  20200305  AuvergneRhôneAlpes 
EPI_ISL_415654  20200309  Hauts de France 
EPI_ISL_415653  20200308  Hauts de France 
EPI_ISL_415649  20200305  Hauts de France 
EPI_ISL_419169  20200321  AuvergneRhôneAlpes 
EPI_ISL_419168  20200317  AuvergneRhôneAlpes 
EPI_ISL_419188  20200322  AuvergneRhôneAlpes 
EPI_ISL_419187  20200322  AuvergneRhôneAlpes 
EPI_ISL_419186  20200322  AuvergneRhôneAlpes 
EPI_ISL_419185  20200322  AuvergneRhôneAlpes 
EPI_ISL_419180  20200322  AuvergneRhôneAlpes 
EPI_ISL_419184  20200322  AuvergneRhôneAlpes 
EPI_ISL_419183  20200322  AuvergneRhôneAlpes 
EPI_ISL_419182  20200322  AuvergneRhôneAlpes 
EPI_ISL_419181  20200322  AuvergneRhôneAlpes 
EPI_ISL_419177  20200322  AuvergneRhôneAlpes 
EPI_ISL_419176  20200321  AuvergneRhôneAlpes 
EPI_ISL_419175  20200321  AuvergneRhôneAlpes 
EPI_ISL_419174  20200320  AuvergneRhôneAlpes 
EPI_ISL_419179  20200322  AuvergneRhôneAlpes 
EPI_ISL_419178  20200322  AuvergneRhôneAlpes 
EPI_ISL_419173  20200321  AuvergneRhôneAlpes 
EPI_ISL_419172  20200321  AuvergneRhôneAlpes 
EPI_ISL_419171  20200321  AuvergneRhôneAlpes 
EPI_ISL_419170  20200321  AuvergneRhôneAlpes 
Software used

Sequence alignment was obtained through the Augur pipeline developed by nextstrain.

The choice of the surrogate genetic model was made using SMS software.

The maximum likelihood phylogeny was inferred using PhyML.

The search for temporal signal of molecular evolution was carried out using the software TempEst.

The phylodynamic analyses were performed using version 1.8.3 of the software Beast and version 2.3 of the software Beast2.

Additional manipulations were performed in R using the ape package.
Sources and acknowledgements

DNA sequence data have been provided by several international laboratories and made available through GISAID  Global Initiative on Sharing All Influenza Data.

We thank the patients, nurses, doctors and all the French laboratories who made this work possible by generating and sharing the virus genome sequences.

The authors thank the highperformance computer itrop (platform South Green) of the IRD of Montpellier for the provision of the highperformance computing resources, which contributed to the results presented in this work (more details on bioinfo.ird.fr).

The ETE modelling team is composed of Samuel Alizon, Thomas Bénéteau, Marc Choisy, Gonché Danesh, Ramsès DjidjouDemasse, Baptiste Elie, Yannis Michalakis, Bastien Reyné, Quentin Richard, Christian Selinger, Mircea T. Sofonea.

Contribution to this work:

conception of the work: the whole team

carrying out analyses: GD, BE and SA

report writing: SA

contacts: samuel.alizon@cnrs.fr, gonche.danesh@ird.fr, baptiste.elie@ird.fr

approval: the whole team


This work is made available under the terms of the Creative Commons AttributionNoncommercial 4.0 International License.