# Early phylodynamics analysis of the COVID-19 epidemics in France using 194 genomes - April 10, 2020


21-Apr-2020

Gonché Danesh, Baptiste Elie and Samuel Alizon

ETE Modelling group, MIVEGEC, CNRS, IRD, Université de Montpellier

# Introduction

This report focuses on the COVID-19 epidemic in France, using the SARS-CoV-2 genome data available from GISAID (Global Initiative on Sharing All Influenza Data) on April 10, 2020.

The first cases of COVID-19 were detected in France in Jan 2020, mostly in travelers, but they remained isolated. Incidence based on screening data only started to increase steadily in France on Feb 27. Limited measures were announced on Feb 28; schools were then closed from Mar 16, and a nationwide lockdown was implemented from Mar 17. On Apr 19, Prime Minister Édouard Philippe gave the first official estimate of the basic reproduction number ($\mathcal{R}_0$), 3.5. He also announced that the temporal reproduction number had dropped to 0.6.

We performed phylodynamic analyses using an exponential growth coalescent model in Beast 1.8.3 and a birth-death skyline (BDSKY) model in Beast 2.3. Because the temporal signal in the data is low, we used fixed molecular clock rates.

Here, we report results on the origin of the tree, the epidemic doubling time, the generation time and the temporal reproduction number $\mathcal{R}(t)$.

# Data used

On April 10, 194 sequences were available from samples originating from France. The vast majority of these sequences (94%) belong to the same clade, labelled A2 by Nextstrain.

These sequences were aligned and cleaned using the Augur pipeline developed by Nextstrain. One sequence was removed due to low quality, which led to a dataset of 186 sequences. The list is given in the Appendix.

We screened the dataset with RDP, which did not detect any recombination events.

The following graph shows the 186 sequences used, classified by sampling date and by sampling region in France.

Some dates are over-represented in the dataset, which could bias the analysis. To correct for this, for each date with more than 7 sequences, we sampled 7 of them. This was done twice to generate two datasets of 122 sequences (France122 and France122b).
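A minimal sketch of this per-date subsampling in Python, assuming the sequences are available as (accession, sampling date) pairs and that the 7 sequences kept for an over-represented date are drawn uniformly at random without replacement. The function name and seeds are hypothetical; this is not the authors' code.

```python
import random
from collections import defaultdict


def subsample_per_date(records, max_per_date=7, seed=None):
    """Keep at most `max_per_date` sequences per sampling date.

    `records` is a list of (accession, date) pairs. Dates with more than
    `max_per_date` sequences are subsampled uniformly without replacement;
    sequences from the other dates are all kept.
    """
    rng = random.Random(seed)
    by_date = defaultdict(list)
    for accession, sampling_date in records:
        by_date[sampling_date].append(accession)

    kept = []
    for sampling_date in sorted(by_date):
        accessions = by_date[sampling_date]
        if len(accessions) > max_per_date:
            accessions = rng.sample(accessions, max_per_date)
        kept.extend((accession, sampling_date) for accession in accessions)
    return kept
```

Running the function twice with two different (hypothetical) seeds yields two replicate datasets in the spirit of France122 and France122b: the sparsely sampled dates are identical in both, and only the over-represented dates differ.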

To investigate temporal effects using the coalescent model, we created three other subsets of the France122 dataset:

• France61-1 contains the 61 sequences sampled first (i.e. from Feb 21 to Mar 12),

• France61-2 contains the 61 sequences sampled more recently (i.e. from Mar 12 to Mar 24),

• France81 contains the 81 sequences sampled first (i.e. from Feb 21 to Mar 17).
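These nested subsets amount to ordering the sequences by sampling date and keeping the earliest or latest ones. A minimal Python sketch, assuming (accession, date) pairs; the report does not say how sequences sharing the boundary date (Mar 12 appears in both France61-1 and France61-2) were split, so the tie-breaking on accession used here is a hypothetical choice made only to keep the split reproducible.

```python
from datetime import date


def first_and_last(records, k):
    """Return the k earliest and the k latest sequences by sampling date.

    `records` is a list of (accession, date) pairs. Ties on a date are broken
    by accession (an assumption, not specified in the report).
    """
    ordered = sorted(records, key=lambda rec: (rec[1], rec[0]))
    return ordered[:k], ordered[-k:]


# Tiny illustrative input (hypothetical accessions, not the report's data).
demo = [("a", date(2020, 3, 1)), ("b", date(2020, 2, 21)),
        ("c", date(2020, 3, 24)), ("d", date(2020, 3, 12))]
early, late = first_and_last(demo, 2)
print(early, late)
```

Applied to a 122-sequence dataset, `first_and_last(data, 61)` would give the France61-1 and France61-2 analogues (which partition the dataset since 61 + 61 = 122), and the first element of `first_and_last(data, 81)` the France81 analogue.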

With the exponential coalescent model (DT), we analysed all five datasets (France61-1, France61-2, France81, France122 and France122b), whereas for the BDSKY model we only report the most complete datasets (France122 and France122b).

# Results

## Dating the TMRCA

We first report the estimated time to the most recent common ancestor (TMRCA) of all the sequences in the sample. Although this is the ancestor of the vast majority of the French sequences, which group in the A2 clade, it need not be a French infection, because the virus may have been introduced into the country several times.

As pointed out by Louis Du Plessis, molecular clock rate estimates should be treated with care, especially here since we are analysing a small subset of the data. We therefore fixed the default clock rate to $8.8\cdot 10^{-4}$ substitutions/position/year, as estimated by Andrew Rambaut on a larger phylogeny. We also investigated a low value ($4.4\cdot 10^{-4}$ subst./position/year) and a high value ($13.2\cdot 10^{-4}$ subst./position/year), i.e. 50% below and 50% above the default.
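To see why the temporal signal is weak, it helps to convert these clock rates into expected substitutions per genome per day. A short Python sketch, assuming a genome length of 29,903 nucleotides (the length of the Wuhan-Hu-1 reference genome, an assumption not stated in the report) and a 365.25-day year:

```python
GENOME_LENGTH = 29_903  # assumed: length of the Wuhan-Hu-1 reference genome (nt)


def substitutions_per_day(rate_per_site_per_year, genome_length=GENOME_LENGTH):
    """Expected substitutions accumulating on one lineage per day, whole genome."""
    return rate_per_site_per_year * genome_length / 365.25


# The three fixed clock rates explored in this report.
for rate in (4.4e-4, 8.8e-4, 13.2e-4):
    per_day = substitutions_per_day(rate)
    print(f"{rate:.1e} subst/site/year -> {per_day:.3f} subst/day, "
          f"one substitution every {1 / per_day:.1f} days")
```

At the default rate this amounts to roughly one substitution every two weeks along a lineage, so the month separating the oldest (Feb 21) and most recent (Mar 24) French samples leaves room for only a few substitutions per lineage.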

As expected, the clock rate directly affects the TMRCA. In contrast, which 122 sequences are subsampled among the 183 has a much smaller impact.

The table below shows the estimated TMRCA dates, mostly for the default evolution rate ($8.8\cdot 10^{-4}$ subst./position/year); the last two rows use the high and low rates.

| Model | Dataset | Most recent sample | Lower bound (95%) | Median | Upper bound (95%) |
|---|---|---|---|---|---|
| Fix8.8-DT | France122 | 24 Mar 2020 | 19 Jan 2020 | 31 Jan 2020 | 09 Feb 2020 |
| Fix8.8-BDSKY | France122 | 24 Mar 2020 | 20 Jan 2020 | 31 Jan 2020 | 11 Feb 2020 |
| Fix8.8-DT | France122b | 24 Mar 2020 | 20 Jan 2020 | 02 Feb 2020 | 10 Feb 2020 |
| Fix13.2-DT | France122 | 24 Mar 2020 | 30 Jan 2020 | 08 Feb 2020 | 15 Feb 2020 |
| Fix4.4-DT | France122 | 24 Mar 2020 | 11 Dec 2019 | 01 Jan 2020 | 17 Jan 2020 |

Focusing on our largest datasets (122 sequences), our medium and high clock rates date the common ancestor of the main clade, which groups the vast majority of the French sequences, between mid-January and mid-February. This wide interval is due to the rarity of “old” sequences (the first one collected in this clade dates from Feb 21) and to the fact that this clade pools the epidemics of several French regions, which could have been seeded by independent introductions from outside France. Note that the “slow” molecular clock yields a root date (median Jan 1, 2020) that seems very early given the data.

These dates are consistent with those obtained by Andrew Rambaut for the beginning of the epidemic in China, which is dated November 17, 2019, with a confidence interval between August 27 and December 19, 2019.

## Doubling time

Using a coalescent model with exponential growth and serial sampling (Drummond et al 2002 Genetics), we can estimate the doubling time, i.e. the number of days it takes for the epidemic to double in size. This parameter is key to calculating the basic reproduction number $\mathcal{R}_0$ (Wallinga & Lipsitch 2007 Proc B).
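The step from doubling time to $\mathcal{R}_0$ can be sketched in Python. A doubling time $T_d$ gives the growth rate $r = \ln 2 / T_d$, and Wallinga & Lipsitch relate $r$ to the reproduction number through the generation interval distribution, $\mathcal{R} = 1/M(-r)$, where $M$ is its moment generating function. Assuming a gamma-distributed generation interval with mean $T$ and shape $k$ (an assumption made here; the report does not specify one), this gives $\mathcal{R} = (1 + rT/k)^k$. The 5-day mean used below is a hypothetical value, not an estimate from this report.

```python
import math


def growth_rate_from_doubling_time(doubling_time):
    """Exponential growth rate r (per day) from a doubling time (days): ln 2 / T_d."""
    return math.log(2) / doubling_time


def r0_gamma_generation_interval(r, mean_gi, shape):
    """Wallinga & Lipsitch (2007): R = 1 / M(-r), with M the moment generating
    function of the generation interval. For a gamma distribution with the
    given mean and shape (scale = mean / shape), this is (1 + r * scale) ** shape.
    """
    scale = mean_gi / shape
    return (1.0 + r * scale) ** shape


r = growth_rate_from_doubling_time(3.7)
for shape in (1, 3, 100):  # shape 1 = exponential; large shape ~ fixed interval
    print(shape, round(r0_gamma_generation_interval(r, mean_gi=5.0, shape=shape), 2))
```

With shape 1 the formula reduces to $1 + rT$, and as the shape grows (a less variable generation interval) it approaches $e^{rT}$, so the same doubling time maps to different $\mathcal{R}_0$ values depending on the assumed generation interval.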

We show this doubling time for three of our datasets. Since France122 includes more recent sequences than France81, which itself includes more recent sequences than France61-1, we hypothesise that comparing them can reveal variations in doubling time over the course of the epidemic.

As can be seen in the figure, adding more recent sequence data led to an increase in the epidemic doubling time. Initially, with the first 61 sequences (sampled from Feb 21 to Mar 12), the epidemic spreads rapidly, with a median doubling time of 2.5 days. Adding the sequences sampled between March 12 and 17 increases the doubling time to 3.3 days. Finally, adding the sequences sampled between March 17 and 24 raises it to 3.7 days.
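Expressing these median doubling times as exponential growth rates, $r = \ln 2 / T_d$, makes the slowdown easier to compare with incidence-based estimates. A short Python sketch using only the medians quoted above; the relative increase it computes is the 48% figure quoted in the Discussion.

```python
import math


def growth_rate(doubling_time):
    """Exponential growth rate r (per day) corresponding to a doubling time (days)."""
    return math.log(2) / doubling_time


# Median doubling times (days) reported above for the nested datasets.
doubling_times = {"France61-1": 2.5, "France81": 3.3, "France122": 3.7}

for name, td in doubling_times.items():
    print(f"{name}: doubling time {td} days -> growth rate {growth_rate(td):.3f} per day")

first, last = doubling_times["France61-1"], doubling_times["France122"]
relative_increase = (last - first) / first
print(f"relative increase in doubling time: {relative_increase:.0%}")
```

Note that these are ratios of posterior medians; a proper comparison of the datasets would use the full posterior distributions.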

We also studied the effect of the molecular clock on the doubling time. For our realistic clock rates, the effect is limited: the median is 3.4 days for the high clock rate and 3.7 days for our default (medium) value. The low clock rate, which already led to an unrealistic origin of the epidemic, also leads to a long median doubling time of 5.6 days. This is unrealistic given the incidence data in France, which indicate an exponential growth rate of $0.23\ \text{days}^{-1}$, i.e. a doubling time of about 3 days. The effect of tip sampling (France122 vs. France122b) is much less pronounced than that of the clock rate.
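The conversion from the incidence-based growth rate to a doubling time is the inverse relation, $T_d = \ln 2 / r$. A minimal Python check using the $0.23\ \text{days}^{-1}$ figure above:

```python
import math


def doubling_time(growth_rate):
    """Doubling time (days) of an exponential epidemic with growth rate r (per day)."""
    if growth_rate <= 0:
        raise ValueError("an epidemic only doubles if its growth rate is positive")
    return math.log(2) / growth_rate


print(f"{doubling_time(0.23):.2f} days")
```

The guard clause reflects that a non-positive growth rate corresponds to a stable or declining epidemic, for which no doubling time exists.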

In comparison, phylodynamic inferences based on data from China (86 genomes, Andrew Rambaut) found a median doubling time of about 7 days, with a confidence interval between 4.7 and 16.3 days. The epidemic grows more slowly in that analysis than in ours because we focused on one rapidly expanding clade of the epidemic and neglected the smaller clades.

## Duration of contagiousness

The birth-death skyline (BDSKY) model (Stadler et al 2013 Proc Nat Acad Sci USA) allows us to estimate both the duration of contagiousness and the reproduction number of the epidemic (i.e. the number of secondary infections caused by an infected host). The exponential growth coalescent model described above cannot distinguish between these two quantities, because it only captures their combined effect on the epidemic growth rate.
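Why these two quantities cannot be separated from growth alone can be illustrated with a simple linear birth-death process with constant rates (an assumption made here for illustration, simpler than the full BDSKY model). If an infected host transmits at rate $\mathcal{R}/D$ and stops being contagious at rate $1/D$, where $D$ is the mean duration of contagiousness, the epidemic grows at rate $r = (\mathcal{R} - 1)/D$. Different pairs $(\mathcal{R}, D)$ therefore give the same growth rate. The parameter values below are hypothetical.

```python
def bd_growth_rate(reproduction_number, duration):
    """Growth rate r = (R - 1) / D of a linear birth-death epidemic in which each
    infected host transmits at rate R / D and stops being contagious at rate 1 / D."""
    transmission_rate = reproduction_number / duration
    become_uninfectious_rate = 1.0 / duration
    return transmission_rate - become_uninfectious_rate


# Two hypothetical parameter sets with the same growth rate (0.2 per day).
print(f"{bd_growth_rate(2.0, 5.0):.2f}")  # R = 2 with 5 days of contagiousness
print(f"{bd_growth_rate(1.5, 2.5):.2f}")  # R = 1.5 with 2.5 days of contagiousness
```

A birth-death model can break this tie because the tree shape and the sampling process carry information about the rate at which lineages stop transmitting, not only about the net growth rate.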

Note that the BDSKY model requires more parameters to be estimated. In particular, it also requires estimating the sampling rate after Feb 21 (the date on which the oldest sequence was sampled), whereas the coalescent model assumes that sampling is negligible.

Figure: posterior distribution of the rate of end of contagiousness estimated by Beast.

From this rate, we can obtain the duration of contagiousness, provided we also account for the sampling rate: patients whose infections are sequenced can be assumed not to transmit the infection after this detection. The sampling rate is estimated at $0.093\ \text{days}^{-1}$, with a (wide) 95% confidence interval between $0.006$ and $0.627\ \text{days}^{-1}$ (it is set to 0 before Feb 21).

The distribution of contagiousness durations is obtained by taking the inverse of the sum of the sampling rate and the rate of end of contagiousness (the inverse of a rate is a duration). The median of this distribution is 5.19 days, and 95% of its values lie between 1.52 and 8.52 days.
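Because the duration is a nonlinear function of two jointly estimated rates, it has to be computed sample by sample along the posterior trace and summarised afterwards; combining the separate medians of the two rates would not in general give the median duration. A minimal Python sketch, assuming the two rates are available as equal-length lists of posterior samples (for instance read from a Beast log file). The synthetic traces below are placeholders, not the report's posterior.

```python
import random
import statistics


def contagious_durations(sampling_rates, end_rates):
    """Per-sample duration of contagiousness, 1 / (sampling rate + end rate), in days."""
    if len(sampling_rates) != len(end_rates):
        raise ValueError("posterior traces must have the same length")
    return [1.0 / (psi + mu) for psi, mu in zip(sampling_rates, end_rates)]


def interval_95(values):
    """Empirical 2.5% and 97.5% quantiles of a list of samples."""
    cuts = statistics.quantiles(values, n=40, method="inclusive")
    return cuts[0], cuts[-1]


# Synthetic traces standing in for the Beast posterior (not the report's samples).
rng = random.Random(42)
psi = [rng.uniform(0.01, 0.3) for _ in range(2000)]
mu = [rng.uniform(0.05, 0.4) for _ in range(2000)]

durations = contagious_durations(psi, mu)
low, high = interval_95(durations)
print(f"median {statistics.median(durations):.2f} days, "
      f"95% interval [{low:.2f}, {high:.2f}]")
```

Pairing the two rates sample by sample preserves any posterior correlation between them, which is lost if each rate is summarised on its own.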

This result, obtained using only dated sequences, is consistent with estimates from contact tracing. Contact-tracing studies measure serial intervals, which are used to approximate the length of the contagiousness period in the calculation of $\mathcal{R}_0$. For instance, Nishiura et al. (2020, Int J Infect Dis) find a distribution with a median of 4 days and a 95% confidence interval between 3.1 and 4.9 days.

## Reproduction number

With the BDSKY model, we can estimate the effective reproduction number, denoted $\mathcal{R}(t)$. Given the limited size of the dataset, we divided time into only three intervals:

• $\mathcal{R}_1$ estimates the temporal reproduction number before Feb 19,

• $\mathcal{R}_2$ estimates the temporal reproduction number between Feb 19 and Mar 7,

• $\mathcal{R}_3$ estimates the temporal reproduction number between Mar 7 and Mar 24.
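In a skyline model, $\mathcal{R}(t)$ is piecewise constant between change points. A minimal Python sketch that returns which of the three reproduction numbers applies on a given date; the report gives only the dates, so treating each change point as the first day of the new interval is an assumption, and the names are hypothetical.

```python
from bisect import bisect_right
from datetime import date

# Change points between the three skyline intervals (assumed to open the new interval).
CHANGE_POINTS = [date(2020, 2, 19), date(2020, 3, 7)]
LABELS = ["R1", "R2", "R3"]


def skyline_interval(day):
    """Return the label of the piecewise-constant reproduction number in force on `day`."""
    return LABELS[bisect_right(CHANGE_POINTS, day)]


for day in (date(2020, 2, 1), date(2020, 3, 1), date(2020, 3, 20)):
    print(day, skyline_interval(day))
```

Using `bisect_right` keeps the lookup logarithmic in the number of change points, which matters little with three intervals but keeps the same function usable for finer skylines.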

These results are consistent with those obtained for the doubling time, even though the time periods differ. The estimate for the period before Feb 19 is the least precise, with a 95% interval for $\mathcal{R}_1$ between 0.13 and 7.01. This is expected since the oldest sequence dates from February 21, whereas the tree root is estimated around the beginning of February. Over the second period, rapid growth is detected, with $\mathcal{R}_2$ between 1.69 and 8.77. Finally, the most recent period (after Mar 7) shows a slowdown of the epidemic, with $\mathcal{R}_3$ between 0.8 and 2.36.

# Discussion and limits

Analysing SARS-CoV-2 genome sequences with known sampling dates makes it possible to infer phylogenies of infections and to estimate epidemiological parameters. We performed this analysis on the sequences forming the largest French clade within the phylogeny of all available SARS-CoV-2 sequences. This clade can be visualized in the Nextstrain representation.

Before summarizing the results, we wish to point out several limitations of our analysis:

• the French clade we analyzed is in fact a European clade (and even an international one, since many US sequences originate from it): although most French sequences appear to group into two subclades within this clade, the variations in epidemic growth that we detect may be due more to European than to French control policies;

• some French regions (e.g. Auvergne-Rhône-Alpes) are more represented than others, which could bias the analysis at the national level;

• the molecular clock rate had to be fixed in this analysis because we do not have sufficient sampling in France during the month of February (the results with our default and high clock rates are similar and consistent with the incidence data).

Despite these limitations, our results suggest a slowdown of the epidemic in France. Adding the sequences sampled between March 12 and 24 to the phylogeny increased the doubling time estimated by the exponential growth coalescent model by 48%. This slowdown is detected more clearly with the birth-death model via the temporal reproduction number $\mathcal{R}(t)$: its median decreased from 3.88 between Feb 19 and Mar 7 to 1.22 after Mar 7. This is consistent with the implementation of strict control measures in France from March 16. These variations, and even their orders of magnitude, are consistent with our estimates based on the time series of new hospitalizations and deaths in our Report 5.

Finally, the BDSKY model also provides an estimate of the duration of contagiousness, which is essential for calculating $\mathcal{R}_0$. The median value obtained (5.19 days) is consistent with some serial interval estimates for the COVID-19 epidemic. To date, there is no estimate of the serial interval in France.

By increasing the number of SARS-CoV-2 genomic sequences from the French epidemic (and the number of people working on the subject), in particular sequences collected at the beginning of the epidemic, it would be possible to:

• better estimate the date at which the epidemic took off in France,

• better understand the spread between the different French regions,

• estimate the number of virus introductions into the country.

# Appendix

## Priors

### Exponential coalescent

| Parameter | Value |
|---|---|
| Molecular clock | fixed |
| Evolution model | GTR |
| kappa | lognormal[1, 1.25] |
| frequencies | uniform[0,1] |
| popsize | 1/x |
| growth rate | Gamma[0.001,1000] |

### BDSKY

| Parameter | Value |
|---|---|
| Molecular clock | fixed |
| Evolution model | GTR |
| kappa | lognormal[1, 1.25] |
| frequencies | uniform[0,1] |
| Rate of end of infection | $\mathrm{Unif}(1.2, \infty)$ |
| Sampling rate | Beta(1,1) |
| Reproduction number | LogNorm(0,1.2), with upper bound 10 |

### French sequence IDs

| Accession number | Sampling date | Region |
|---|---|---|
| EPI_ISL_418415 | 2020-03-15 | Auvergne-Rhône-Alpes |
| EPI_ISL_418414 | 2020-03-15 | Auvergne-Rhône-Alpes |
| EPI_ISL_418413 | 2020-03-15 | Auvergne-Rhône-Alpes |
| EPI_ISL_418412 | 2020-03-15 | Auvergne-Rhône-Alpes |
| EPI_ISL_418419 | 2020-03-16 | Auvergne-Rhône-Alpes |
| EPI_ISL_418418 | 2020-03-16 | Auvergne-Rhône-Alpes |
| EPI_ISL_418417 | 2020-03-16 | Auvergne-Rhône-Alpes |
| EPI_ISL_418416 | 2020-03-16 | Auvergne-Rhône-Alpes |
| EPI_ISL_418431 | 2020-03-18 | Auvergne-Rhône-Alpes |
| EPI_ISL_418430 | 2020-03-18 | Auvergne-Rhône-Alpes |
| EPI_ISL_418422 | 2020-03-17 | Auvergne-Rhône-Alpes |
| EPI_ISL_418421 | 2020-03-17 | Auvergne-Rhône-Alpes |
| EPI_ISL_418420 | 2020-03-17 | Auvergne-Rhône-Alpes |
| EPI_ISL_418426 | 2020-03-17 | Auvergne-Rhône-Alpes |
| EPI_ISL_418425 | 2020-03-17 | Auvergne-Rhône-Alpes |
| EPI_ISL_418424 | 2020-03-17 | Auvergne-Rhône-Alpes |
| EPI_ISL_418423 | 2020-03-17 | Auvergne-Rhône-Alpes |
| EPI_ISL_418429 | 2020-03-18 | Auvergne-Rhône-Alpes |
| EPI_ISL_418428 | 2020-03-17 | Auvergne-Rhône-Alpes |
| EPI_ISL_418427 | 2020-03-17 | Auvergne-Rhône-Alpes |
| EPI_ISL_416750 | 2020-03-06 | Auvergne-Rhône-Alpes |
| EPI_ISL_416754 | 2020-03-06 | Auvergne-Rhône-Alpes |
| EPI_ISL_416751 | 2020-03-05 | Auvergne-Rhône-Alpes |
| EPI_ISL_416752 | 2020-03-04 | Auvergne-Rhône-Alpes |
| EPI_ISL_416757 | 2020-03-07 | Auvergne-Rhône-Alpes |
| EPI_ISL_416758 | 2020-03-08 | Auvergne-Rhône-Alpes |
| EPI_ISL_416756 | 2020-03-06 | Auvergne-Rhône-Alpes |
| EPI_ISL_416748 | 2020-03-04 | Auvergne-Rhône-Alpes |
| EPI_ISL_416749 | 2020-03-04 | Auvergne-Rhône-Alpes |
| EPI_ISL_416746 | 2020-03-03 | Auvergne-Rhône-Alpes |
| EPI_ISL_416747 | 2020-03-04 | Auvergne-Rhône-Alpes |
| EPI_ISL_416745 | 2020-03-10 | Auvergne-Rhône-Alpes |
| EPI_ISL_418231 | 2020-03-15 | Hauts de France |
| EPI_ISL_418230 | 2020-03-13 | Île de France |
| EPI_ISL_418235 | 2020-03-16 | Île de France |
| EPI_ISL_418234 | 2020-03-14 | Île de France |
| EPI_ISL_418233 | 2020-03-15 | Île de France |
| EPI_ISL_418232 | 2020-03-15 | Île de France |
| EPI_ISL_418239 | 2020-03-16 | Hauts de France |
| EPI_ISL_418238 | 2020-03-16 | Hauts de France |
| EPI_ISL_418237 | 2020-03-16 | Hauts de France |
| EPI_ISL_418236 | 2020-03-16 | Hauts de France |
| EPI_ISL_418220 | 2020-02-28 | Hauts de France |
| EPI_ISL_418224 | 2020-03-08 | Hauts de France |
| EPI_ISL_418223 | 2020-03-05 | Hauts de France |
| EPI_ISL_418222 | 2020-03-04 | Centre-Val de Loire |
| EPI_ISL_418221 | 2020-03-02 | Hauts de France |
| EPI_ISL_418228 | 2020-03-12 | Hauts de France |
| EPI_ISL_418227 | 2020-03-12 | Hauts de France |
| EPI_ISL_418226 | 2020-03-09 | Hauts de France |
| EPI_ISL_418225 | 2020-03-08 | Hauts de France |
| EPI_ISL_418229 | 2020-03-12 | Île de France |
| EPI_ISL_418240 | 2020-03-16 | Île de France |
| EPI_ISL_416493 | 2020-03-08 | Hauts de France |
| EPI_ISL_416496 | 2020-03-10 | Hauts de France |
| EPI_ISL_416497 | 2020-03-10 | Hauts de France |
| EPI_ISL_416494 | 2020-03-04 | Normandie |
| EPI_ISL_416495 | 2020-03-10 | Hauts de France |
| EPI_ISL_416498 | 2020-03-11 | Île de France |
| EPI_ISL_416499 | 2020-03-11 | Île de France |
| EPI_ISL_418219 | 2020-02-26 | Auvergne-Rhône-Alpes |
| EPI_ISL_418218 | 2020-02-21 | Hauts de France |
| EPI_ISL_414631 | 2020-03-04 | Grand Est |
| EPI_ISL_414630 | 2020-03-03 | Hauts de France |
| EPI_ISL_414633 | 2020-03-04 | Île de France |
| EPI_ISL_414632 | 2020-03-04 | Grand Est |
| EPI_ISL_414635 | 2020-03-04 | Hauts de France |
| EPI_ISL_414634 | 2020-03-04 | Hauts de France |
| EPI_ISL_414626 | 2020-02-29 | Hauts de France |
| EPI_ISL_414625 | 2020-02-26 | Pays de la Loire |
| EPI_ISL_414627 | 2020-03-02 | Hauts de France |
| EPI_ISL_414629 | 2020-03-03 | Hauts de France |
| EPI_ISL_414624 | 2020-02-26 | Normandie |
| EPI_ISL_414637 | 2020-03-04 | Hauts de France |
| EPI_ISL_414636 | 2020-03-04 | Hauts de France |
| EPI_ISL_414638 | 2020-03-04 | Hauts de France |
| EPI_ISL_420607 | 2020-03-23 | Auvergne-Rhône-Alpes |
| EPI_ISL_420606 | 2020-03-22 | Auvergne-Rhône-Alpes |
| EPI_ISL_420609 | 2020-03-23 | Auvergne-Rhône-Alpes |
| EPI_ISL_420608 | 2020-03-23 | Auvergne-Rhône-Alpes |
| EPI_ISL_420605 | 2020-03-22 | Auvergne-Rhône-Alpes |
| EPI_ISL_420604 | 2020-03-23 | Auvergne-Rhône-Alpes |
| EPI_ISL_420625 | 2020-03-24 | Auvergne-Rhône-Alpes |
| EPI_ISL_420624 | 2020-03-24 | Auvergne-Rhône-Alpes |
| EPI_ISL_420621 | 2020-03-24 | Auvergne-Rhône-Alpes |
| EPI_ISL_420620 | 2020-03-23 | Auvergne-Rhône-Alpes |
| EPI_ISL_420623 | 2020-03-24 | Auvergne-Rhône-Alpes |
| EPI_ISL_420622 | 2020-03-24 | Auvergne-Rhône-Alpes |
| EPI_ISL_420618 | 2020-03-23 | Auvergne-Rhône-Alpes |
| EPI_ISL_420617 | 2020-03-23 | Auvergne-Rhône-Alpes |
| EPI_ISL_420619 | 2020-03-23 | Auvergne-Rhône-Alpes |
| EPI_ISL_420614 | 2020-03-23 | Auvergne-Rhône-Alpes |
| EPI_ISL_420613 | 2020-03-23 | Auvergne-Rhône-Alpes |
| EPI_ISL_420616 | 2020-03-23 | Auvergne-Rhône-Alpes |
| EPI_ISL_420615 | 2020-03-23 | Auvergne-Rhône-Alpes |
| EPI_ISL_420610 | 2020-03-23 | Auvergne-Rhône-Alpes |
| EPI_ISL_420612 | 2020-03-23 | Auvergne-Rhône-Alpes |
| EPI_ISL_420611 | 2020-03-23 | Auvergne-Rhône-Alpes |
| EPI_ISL_416511 | 2020-03-07 | Bretagne |
| EPI_ISL_416512 | 2020-03-07 | Bretagne |
| EPI_ISL_416510 | 2020-03-06 | Bretagne |
| EPI_ISL_416513 | 2020-03-07 | Bretagne |
| EPI_ISL_416508 | 2020-03-06 | Bretagne |
| EPI_ISL_416509 | 2020-03-06 | Bretagne |
| EPI_ISL_416506 | 2020-03-03 | Bretagne |
| EPI_ISL_416500 | 2020-03-11 | Île de France |
| EPI_ISL_416501 | 2020-03-10 | Île de France |
| EPI_ISL_416504 | 2020-03-02 | Bretagne |
| EPI_ISL_416505 | 2020-03-02 | Bretagne |
| EPI_ISL_416502 | 2020-02-26 | Bretagne |
| EPI_ISL_417340 | 2020-03-07 | Auvergne-Rhône-Alpes |
| EPI_ISL_417333 | 2020-03-04 | Auvergne-Rhône-Alpes |
| EPI_ISL_417336 | 2020-03-06 | Auvergne-Rhône-Alpes |
| EPI_ISL_417337 | 2020-03-07 | Auvergne-Rhône-Alpes |
| EPI_ISL_417334 | 2020-03-04 | Auvergne-Rhône-Alpes |
| EPI_ISL_417338 | 2020-03-07 | Auvergne-Rhône-Alpes |
| EPI_ISL_417339 | 2020-03-08 | Auvergne-Rhône-Alpes |
| EPI_ISL_420056 | 2020-03-22 | Hauts de France |
| EPI_ISL_420055 | 2020-03-17 | Grand Est |
| EPI_ISL_420058 | 2020-03-20 | Île de France |
| EPI_ISL_420057 | 2020-03-22 | Hauts de France |
| EPI_ISL_420052 | 2020-03-20 | Île de France |
| EPI_ISL_420051 | 2020-03-19 | Île de France |
| EPI_ISL_420054 | 2020-03-20 | Île de France |
| EPI_ISL_420053 | 2020-03-19 | Hauts de France |
| EPI_ISL_420050 | 2020-03-19 | Hauts de France |
| EPI_ISL_420049 | 2020-03-19 | Hauts de France |
| EPI_ISL_420048 | 2020-03-18 | Île de France |
| EPI_ISL_420045 | 2020-03-17 | Grand Est |
| EPI_ISL_420044 | 2020-03-18 | Hauts de France |
| EPI_ISL_420047 | 2020-03-17 | Île de France |
| EPI_ISL_420046 | 2020-03-17 | Île de France |
| EPI_ISL_420041 | 2020-03-17 | Hauts de France |
| EPI_ISL_420040 | 2020-03-12 | Grand Est |
| EPI_ISL_420043 | 2020-03-18 | Île de France |
| EPI_ISL_420042 | 2020-03-17 | Île de France |
| EPI_ISL_420038 | 2020-03-17 | Auvergne-Rhône-Alpes |
| EPI_ISL_420039 | 2020-03-12 | Grand Est |
| EPI_ISL_420063 | 2020-03-22 | Île de France |
| EPI_ISL_420062 | 2020-03-22 | Île de France |
| EPI_ISL_420064 | 2020-03-23 | Île de France |
| EPI_ISL_420061 | 2020-03-23 | Île de France |
| EPI_ISL_420060 | 2020-03-20 | Île de France |
| EPI_ISL_420059 | 2020-03-21 | Île de France |
| EPI_ISL_421513 | 2020-03-23 | Île de France |
| EPI_ISL_421514 | 2020-03-20 | Centre-Val de Loire |
| EPI_ISL_421511 | 2020-03-23 | Hauts de France |
| EPI_ISL_421512 | 2020-03-23 | Île de France |
| EPI_ISL_421510 | 2020-03-23 | Hauts de France |
| EPI_ISL_421508 | 2020-03-23 | Île de France |
| EPI_ISL_421509 | 2020-03-23 | Hauts de France |
| EPI_ISL_421506 | 2020-03-21 | Île de France |
| EPI_ISL_421507 | 2020-03-23 | Île de France |
| EPI_ISL_421504 | 2020-03-14 | Île de France |
| EPI_ISL_421505 | 2020-03-15 | Île de France |
| EPI_ISL_421502 | 2020-03-12 | Île de France |
| EPI_ISL_421503 | 2020-03-12 | Île de France |
| EPI_ISL_421500 | 2020-03-11 | Hauts de France |
| EPI_ISL_421501 | 2020-03-12 | Île de France |
| EPI_ISL_415650 | 2020-03-02 | Île de France |
| EPI_ISL_415652 | 2020-03-05 | Auvergne-Rhône-Alpes |
| EPI_ISL_415651 | 2020-03-05 | Auvergne-Rhône-Alpes |
| EPI_ISL_415654 | 2020-03-09 | Hauts de France |
| EPI_ISL_415653 | 2020-03-08 | Hauts de France |
| EPI_ISL_415649 | 2020-03-05 | Hauts de France |
| EPI_ISL_419169 | 2020-03-21 | Auvergne-Rhône-Alpes |
| EPI_ISL_419168 | 2020-03-17 | Auvergne-Rhône-Alpes |
| EPI_ISL_419188 | 2020-03-22 | Auvergne-Rhône-Alpes |
| EPI_ISL_419187 | 2020-03-22 | Auvergne-Rhône-Alpes |
| EPI_ISL_419186 | 2020-03-22 | Auvergne-Rhône-Alpes |
| EPI_ISL_419185 | 2020-03-22 | Auvergne-Rhône-Alpes |
| EPI_ISL_419180 | 2020-03-22 | Auvergne-Rhône-Alpes |
| EPI_ISL_419184 | 2020-03-22 | Auvergne-Rhône-Alpes |
| EPI_ISL_419183 | 2020-03-22 | Auvergne-Rhône-Alpes |
| EPI_ISL_419182 | 2020-03-22 | Auvergne-Rhône-Alpes |
| EPI_ISL_419181 | 2020-03-22 | Auvergne-Rhône-Alpes |
| EPI_ISL_419177 | 2020-03-22 | Auvergne-Rhône-Alpes |
| EPI_ISL_419176 | 2020-03-21 | Auvergne-Rhône-Alpes |
| EPI_ISL_419175 | 2020-03-21 | Auvergne-Rhône-Alpes |
| EPI_ISL_419174 | 2020-03-20 | Auvergne-Rhône-Alpes |
| EPI_ISL_419179 | 2020-03-22 | Auvergne-Rhône-Alpes |
| EPI_ISL_419178 | 2020-03-22 | Auvergne-Rhône-Alpes |
| EPI_ISL_419173 | 2020-03-21 | Auvergne-Rhône-Alpes |
| EPI_ISL_419172 | 2020-03-21 | Auvergne-Rhône-Alpes |
| EPI_ISL_419171 | 2020-03-21 | Auvergne-Rhône-Alpes |
| EPI_ISL_419170 | 2020-03-21 | Auvergne-Rhône-Alpes |

## Software used

• Sequence alignment was performed with the Augur pipeline developed by Nextstrain.

• The nucleotide substitution model was selected using the SMS (Smart Model Selection) software.

• The maximum likelihood phylogeny was inferred using PhyML.

• The temporal signal of molecular evolution was assessed using the software TempEst.

• The phylodynamic analyses were performed using version 1.8.3 of the software Beast and version 2.3 of the software Beast2.

• Additional manipulations were performed in R using the ape package.

# Sources and acknowledgements

• DNA sequence data were provided by several international laboratories and made available through GISAID (Global Initiative on Sharing All Influenza Data).

• We thank the patients, nurses, doctors and all the French laboratories who made this work possible by generating and sharing the virus genome sequences.

• The authors thank the itrop high-performance computing platform (South Green platform) of the IRD in Montpellier for providing the computing resources that contributed to the results presented in this work (more details on bioinfo.ird.fr).

• The ETE modelling team is composed of Samuel Alizon, Thomas Bénéteau, Marc Choisy, Gonché Danesh, Ramsès Djidjou-Demasse, Baptiste Elie, Yannis Michalakis, Bastien Reyné, Quentin Richard, Christian Selinger, Mircea T. Sofonea.

• Contribution to this work:

• conception of the work: the whole team

• carrying out analyses: GD, BE and SA

• report writing: SA

• approval: the whole team

• This work is made available under the terms of the Creative Commons Attribution-Noncommercial 4.0 International License.

Correct sequence data table

We apologize: the uploaded table of sequence data was missing important columns listing the laboratories and teams who did all the work of generating the sequences.

The correct table can be seen here

We again thank the patients, nurses, doctors and sequencing teams who allowed us to perform this work.

Thanks for sharing this interesting analysis!
Quick comment: as the sampling rate in the bdsky model is the time between infection and swabbing (to get the RNA to be sequenced), this period is the time until symptoms plus the time from symptoms to swabbing. Since people only show symptoms after 5 days, I assume the period until sampling must be longer than 5 days? (In CH, we estimate it to be around 8-9 days.) Becoming noninfectious is indeed estimated in the literature to take around 5 days.

Thanks for the comment!
Actually we reported the sampling rate but did not discuss it.
The median value is $0.093\ \text{days}^{-1}$, which means 10.8 days. This kind of makes sense since in France most of the sampling (and even the screening in general) was done on severe cases upon hospital admission.
However, the confidence interval on the sampling rate is also huge with the current data (from 1.6 to 166 (!) days). We could look into the importance of the prior shape on the result but I guess what we need is more phylogenetic signal…