A paper by Ji et al. suggesting that snakes might serve as a likely reservoir for the novel nCoV-2019 virus was recently published after accelerated review and widely circulated by the news media (https://onlinelibrary.wiley.com/doi/abs/10.1002/jmv.25682).
The authors' claim was based on the observation that the codon usage of nCoV-2019 was more similar to that of snakes than to that of the other potential hosts they investigated. However, this premise is incorrect: only rarely is the codon usage of a virus most closely matched to that of a known reservoir host.
To investigate the claim by Ji et al., and to build on the argument put forward by @david.l.robertson (in the post "nCoV's relationship to bat coronaviruses & recombination signals (no snakes) - no evidence the 2019-nCoV lineage is recombinant") that there are no data to support snakes as a likely reservoir for nCoV-2019, I calculated the codon usage of SARS-CoV (likely/known reservoir: bats), MERS-CoV (known reservoir: camels), and nCoV-2019 (reservoir: unknown). I then investigated how closely each matched the codon usage of a range of different species, including known reservoir hosts.
I used the same codon tables as Ji et al., taken from the commonly used Kazusa Codon Usage Database. However, several of these codon tables are out of date and, for some of the species investigated, severely undersampled. I also obtained codon tables for a larger set of species from the more comprehensive "HIVE-CUT" resource (A new and updated resource for codon usage tables, BMC Bioinformatics). The raw "codon adaptation index" (CAI) was calculated for each virus sequence from SARS-CoV, MERS-CoV, and nCoV-2019 against each of the potential reservoir species using the command-line version of CAIcal (CAIcal: A combined set of tools to assess codon usage adaptation, Biology Direct). The codon adaptation index is effectively a measure of how well the codon usage of a particular sequence matches that of a putative host. To normalize for GC content and amino acid usage, an expected CAI was also calculated for each species (note: it is unclear to me whether Ji et al. also normalized their values).
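For readers unfamiliar with the measure, here is a minimal sketch of how a raw and a normalized CAI can be computed from a host codon table. It uses the textbook definition of CAI (geometric mean of each codon's relative adaptiveness) and is not the CAIcal code used for the analysis. All function names are hypothetical, and the expected CAI here is a simplification: it averages over random synonymous recodings of the same protein, which keeps amino acid usage fixed but, unlike the normalization described above, does not control for GC content.

```python
import math
import random
from collections import defaultdict

# Standard genetic code in the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
SYNONYMS = defaultdict(list)
for _codon, _aa in CODON_TO_AA.items():
    SYNONYMS[_aa].append(_codon)


def split_codons(sequence):
    """Split a coding sequence into complete codons (RNA is mapped to DNA)."""
    seq = sequence.upper().replace("U", "T")
    return [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]


def relative_adaptiveness(host_counts):
    """w(codon) = host count / highest count among its synonymous codons."""
    weights = {}
    for aa, codons in SYNONYMS.items():
        if aa == "*" or len(codons) == 1:
            continue  # stop codons, Met and Trp carry no usage information
        # Assumption: a 0.5 pseudocount stands in for codons absent from the table.
        counts = {c: max(host_counts.get(c, 0), 0.5) for c in codons}
        top = max(counts.values())
        for c in codons:
            weights[c] = counts[c] / top
    return weights


def cai(sequence, weights):
    """Raw CAI: geometric mean of w over the informative codons."""
    logs = [math.log(weights[c]) for c in split_codons(sequence) if c in weights]
    if not logs:
        raise ValueError("sequence contains no informative codons")
    return math.exp(sum(logs) / len(logs))


def expected_cai(sequence, weights, n=200, seed=0):
    """Mean CAI of n random synonymous recodings of the same protein.

    Simplified stand-in for an expected CAI: amino acids are preserved,
    GC content is not.
    """
    rng = random.Random(seed)
    aas = [CODON_TO_AA[c] for c in split_codons(sequence) if c in CODON_TO_AA]
    total = 0.0
    for _ in range(n):
        recoded = "".join(rng.choice(SYNONYMS[aa]) for aa in aas)
        total += cai(recoded, weights)
    return total / n


def normalized_cai(sequence, weights, n=200, seed=0):
    """Raw CAI divided by the expected CAI for the same protein."""
    return cai(sequence, weights) / expected_cai(sequence, weights, n, seed)
```

In use, `host_counts` would be a dictionary mapping each codon to its count (or frequency) in one host's codon table, and `normalized_cai(virus_cds, relative_adaptiveness(host_counts))` would be evaluated once per virus and host pair. A normalized value above 1 means the sequence uses the host's preferred codons more often than a random recoding of the same protein would.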
As can be seen in Figure 1, while nCoV-2019 does indeed have a high CAI against several different snake species, the same is true for both MERS-CoV and SARS-CoV, which have camels and bats as known reservoir species, respectively. In fact, both MERS-CoV and SARS-CoV sequences have higher normalized CAI values than nCoV-2019. In addition, nCoV-2019 (as well as MERS-CoV and SARS-CoV) also has high CAI values against hosts that are even more unlikely than snakes to serve as the reservoir, including several fungi. And no, fungi are not likely to have started the outbreak in Wuhan.
Figure 2 shows the most relevant host species, giving a clearer picture of the issues noted above.
In conclusion, the study by Ji et al. is flawed and there is no evidence for snakes being the reservoir for nCoV-2019. This does not mean that snakes couldn't be the reservoir. However, there are currently no data to support this claim, and I find the hypothesis unlikely given that nCoV-2019 is closely related to SARS-CoV-like viruses circulating in bats. At this stage, we still do not know what the reservoir for nCoV-2019 is or how widespread the virus is in it (although bats seem likely). Finally, the premise that a single simple measure such as codon adaptation can be used to identify reservoir hosts for novel viruses is incorrect.
A somewhat interesting finding from these analyses is that nCoV-2019 overall has a lower CAI than SARS-CoV and MERS-CoV against almost all species tested. I wouldn't read too much into that, though - we have seen the same for other virus genera in the past.
Codon tables as well as raw and normalized CAI values can be downloaded below.
codon_tables.hive_cut.zip (44.9 KB) codon_tables.kazusa.zip (20.3 KB) csv_files.zip (8.6 KB)
Fig1.pdf (3.1 MB) Fig2.pdf (1.8 MB)