Preliminary in silico assessment of the specificity of published molecular assays and design of new assays using the available whole genome sequences of 2019-nCoV

Authors:

Mitchell Holland1, Daniel Negrón1, Shane Mitchell1, Mychal Ivancich1, Katharine W. Jennings1, Bruce Goodwin2, and Shanmuga Sozhamannan2,3

1Noblis, Reston, VA 20191

2Defense Biological Product Assurance Office, Frederick, MD 21702

3Logistics Management Institute, Tysons, VA 22102

Acknowledgement: We thank all the investigators and labs that published sequence data at GISAID; see also the table at the end.

Similar analyses/References: Marion Koopmans; Initial assessment of the ability of published coronavirus primers sets to detect the Wuhan coronavirus: http://virological.org/t/initial-assessment-of-the-ability-of-published-coronavirus-primers-sets-to-detect-the-wuhan-coronavirus/321

WGS: 25 whole genome sequences were downloaded from GISAID; all other data are from NCBI BLAST databases (nt, gss, and env_nt - last updated December 2019)

Attached zip file with data visualizations: nCoV_pset_report.zip (459.5 KB)

Analyses:

BioLaboro is an application for rapidly designing de novo assays and validating existing PCR detection assays. It is a user-friendly assay discovery pipeline composed of three tools: BioVelocity®, Primer3, and PSET.

BioVelocity® uses a rapid, accurate hashing algorithm to align sequencing reads to a large set of references (e.g. GenBank) (Sozhamannan et al., 2015). It creates a k-mer index to determine all possible matches between query sequences and references simultaneously using a large RAM system (i.e. an IBM Power8). This algorithm makes it possible to very quickly identify sequences conserved within or omitted from a set of target references.

Primer3 (http://primer3.sourceforge.net/) is a tool for designing primers and probes for real-time PCR reactions. It considers a range of criteria such as oligonucleotide melting temperature, size, GC content, and primer-dimer possibilities. We use Primer3 along with our signature detection process to identify potential new primer sets.

PSET (PCR Signature Erosion Tool) tests PCR assays in silico against the latest versions of public sequence repositories, or other reference datasets, to determine whether primers and probes match only their intended targets. Using this information, an assay provider can be better aware of potential false hits and be better prepared to design new primers when false hits become intractable.
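
The idea of finding sequences "conserved within or omitted from" a set of references can be illustrated with a small k-mer sketch. This is not the BioVelocity® algorithm, whose implementation is not described here; the function names and the toy sequences below are hypothetical, and plain Python sets stand in for the k-mer index.

```python
def kmers(seq, k):
    """Return the set of all k-length substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def signature_kmers(targets, background, k):
    """k-mers present in every target sequence and absent from all background sequences."""
    conserved = set.intersection(*(kmers(t, k) for t in targets))
    for b in background:
        conserved -= kmers(b, k)
    return conserved


# Toy sequences for illustration only (not real genomes).
targets = ["AAACGTTT", "CCACGTGG"]
background = ["TTTTACGA"]
print(signature_kmers(targets, background, k=4))
```

On real genomes the k-mer sets become very large, which is presumably why a large-memory system is used; the sketch only shows the set logic.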

Results:

The BioLaboro application detected four highly specific signature sequence regions that hit all 25 (available at the time of analysis) Wuhan genomes (Table 1). The detected signatures were found to occur in disparate locations on the genome (Figure 1). All four signatures were found to target all current Wuhan genomes, and three out of four of these signature regions did not sufficiently align to any known coronavirus or other organism in NCBI BLAST databases (Table 2).

Table 1. List of PCR assays evaluated in this analysis. The first four assays were newly created using BioLaboro to be specific to Wuhan coronavirus. The last four assays are from "Diagnostic detection of Wuhan coronavirus 2019 by real-time RT-PCR" by Corman et al. 2020.

Identifier length forward probe reverse
2019-nCoV-noblis_1 165 TGATGGTGGTGTCACTCGTG TGGTTTAGCCAGCGTGGTGGT GAAGTGGGTTTTGTCGTGCC
2019-nCoV-noblis_2 168 GCCGCTGTTGATGCACTATG ACGTGCTCGTGTAGAGTGTTTTGAT ATGCATTGCCTGAGACGACA
2019-nCoV-noblis_3 272 CGGATGGCTTATTGTTGGCG TGCTCGTTGCTGCTGGCCTT TTGGCTTTGCTGGAAATGCC
2019-nCoV-noblis_4 218 TGTCGTTGACAGGACACGAG TTCGTCCGTGTTGCAGCCGA CGTACGTGGCTTTGGAGACT
ncov_e_gene 113 ACAGGTACGTTAATAGTTAATAGCGT ACACTAGCCATCCTTACTGCGCTTCG TGTGTGCGTACTGCTGCAATAT
ncov_n_gene 128 CACATTGGCACCCGCAATC ACTTCCTCAAGGAACAACATTGCCA CAAGCCTCTTCTCGTTCCTC
ncov_rdrp_1 100 GTGARATGGTCATGTGTGGCGG CCAGGTGGWACRTCATCMGGTGATGC TATGCTAATAGTGTSTTTAACATYTG
ncov_rdrp_2 100 GTGARATGGTCATGTGTGGCGG CAGGTGGAACCTCATCAGGAGATGC TATGCTAATAGTGTSTTTAACATYTG
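
Two of the criteria mentioned for Primer3, GC content and melting temperature, can be approximated by hand for the Noblis forward primers in Table 1. The sketch below is a rough illustration: it uses the simple Wallace rule (2 °C per A/T, 4 °C per G/C), which is only a coarse estimate and is not the thermodynamic model Primer3 itself uses. The function names are hypothetical.

```python
def gc_fraction(seq):
    """Fraction of bases in seq that are G or C."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def wallace_tm(seq):
    """Rough melting temperature in degrees C by the Wallace rule: 2*(A+T) + 4*(G+C)."""
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


# Forward primers of the four Noblis assays, copied from Table 1.
FORWARD_PRIMERS = {
    "2019-nCoV-noblis_1": "TGATGGTGGTGTCACTCGTG",
    "2019-nCoV-noblis_2": "GCCGCTGTTGATGCACTATG",
    "2019-nCoV-noblis_3": "CGGATGGCTTATTGTTGGCG",
    "2019-nCoV-noblis_4": "TGTCGTTGACAGGACACGAG",
}

for name, primer in FORWARD_PRIMERS.items():
    print(name, len(primer), f"GC={gc_fraction(primer):.2f}", f"Tm~{wallace_tm(primer)}C")
```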

Figure 1. Map of the Wuhan genome (NCBI Accession: MN908947.3) with assay signature locations (created using DNA Features Viewer Python library). Corman assays in blue, Noblis assays in red, and gene regions in green.

Table 2. Results from PSET analysis. The four new Noblis assays were compared alongside the four assays from Corman. Each assay was tested using Wuhan coronavirus (25 genomes) as the intended target. All off-target hits (PF, FP) are to entries in NCBI BLAST databases (nt, gss, and env_nt).

Assay Confusion
identifier type target PT TP TN PF FP FN
2019-nCoV-noblis_1 Probe Wuhan 22 3 415 NA NA NA
2019-nCoV-noblis_2 Probe Wuhan 25 NA 691 NA NA NA
2019-nCoV-noblis_3 Probe Wuhan 25 NA 656 NA NA NA
2019-nCoV-noblis_4 Probe Wuhan 25 NA 356 NA 3 NA
ncov_e_gene Probe Wuhan 25 NA 75 353 15 NA
ncov_n_gene Probe Wuhan 25 NA 73 NA 339 NA
ncov_rdrp_1 Probe Wuhan NA* 25 169 433 85 NA
ncov_rdrp_2 Probe Wuhan NA* 25 586 1 61 NA
PT = Perfect True Positive = All assay components hit with 100% identity to the correct target
TP = True Positive = All assay components hit with >=90% identity over >=90% of the component length to the correct target
TN = True Negative = Partial hit to assay amplicon but insufficient assay component alignments to an incorrect target
PF = Perfect False Positive = All assay components hit with 100% identity to an incorrect target
FP = False Positive = All assay components hit with >=90% identity over >=90% of the component length to an incorrect target
FN = False Negative = Partial hit to assay amplicon but insufficient assay component alignments to the correct target
* The ncov_rdrp_1 and ncov_rdrp_2 assays have one mismatch in the reverse primer to all current Wuhan coronavirus genomes, which prevents them from being called perfect hits. The mismatch is between the ambiguous base code “S” and the reference base “T” in the middle of the primer, and will likely not affect binding.
NA = 0.
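
The legend and footnote above can be read as a small decision rule. The sketch below is a simplified interpretation, not PSET's actual code: `identity` compares an oligo to a same-length binding site using the standard IUPAC ambiguity codes, and `classify` applies the 100% and >=90% thresholds from the legend. The `site` string is reconstructed from the footnote (reference “T” at the “S” position); resolving the “Y” position to T is an assumption made only for illustration.

```python
# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC", "B": "CGT", "D": "AGT",
    "H": "ACT", "V": "ACG", "N": "ACGT",
}


def identity(oligo, site):
    """Fraction of positions where the site base is allowed by the oligo's IUPAC code."""
    if len(oligo) != len(site):
        raise ValueError("oligo and site must have the same length")
    matches = sum(base in IUPAC[code] for code, base in zip(oligo, site))
    return matches / len(oligo)


def classify(components, is_intended_target, threshold=0.90):
    """components: one (identity, coverage) pair per primer/probe. Returns PT, TP, FN, PF, FP or TN."""
    perfect = all(i == 1.0 and c == 1.0 for i, c in components)
    passing = all(i >= threshold and c >= threshold for i, c in components)
    if perfect:
        return "PT" if is_intended_target else "PF"
    if passing:
        return "TP" if is_intended_target else "FP"
    return "FN" if is_intended_target else "TN"


reverse = "TATGCTAATAGTGTSTTTAACATYTG"   # ncov_rdrp reverse primer as listed in Table 1
site = "TATGCTAATAGTGTTTTTAACATTTG"      # illustrative site: T at the S position (assumed)
ident = identity(reverse, site)
print(round(ident, 3), classify([(1.0, 1.0), (1.0, 1.0), (ident, 1.0)], True))
```

With one mismatch in 26 bases the reverse primer stays above the 90% threshold, so under this reading the assay is counted as TP rather than PT, which is the pattern seen for the two rdrp assays in Table 2.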

Conclusions and Caveats:

This is a preliminary analysis based on the 25 sequences available at the time.

New sequences are added on an hourly basis and these Noblis signatures need to be tested against the new sequences to verify that no signature erosion is occurring, as described in Sozhamannan et al 2015 for Ebola sequences. These designs were generated entirely in silico, and have yet to be tested in the lab. Although the BioLaboro pipeline is designed on sound scientific principles and the results from analyses of Ebola and Lassa viruses using the in silico components have been demonstrated (Sozhamannan et al 2015 and Wiley et al 2019), these assays await validation before conclusions regarding their use for clinical testing can be made.

Our intent in publishing these nCoV real-time PCR assays is to make the community aware of the existence of these potential unique signature regions as well as the availability of BioLaboro for rapid evaluation of existing assays and design of new assays.

References:

  1. Diagnostic detection of Wuhan coronavirus 2019 by real-time RT-PCR - Protocol and preliminary evaluation as of Jan 13, 2020. Victor Corman, Tobias Bleicker, Sebastian Brünink, Christian Drosten, Charité Virology, Berlin, Germany; Olfert Landt, Tib-Molbiol, Berlin, Germany; Marion Koopmans, Erasmus MC, Rotterdam, The Netherlands; Maria Zambon, Public Health England, London. Additional advice by Malik Peiris, University of Hong Kong. Contact: christian.drosten@charite.de https://virologie-ccm.charite.de/en/

  2. Sozhamannan, Shanmuga, et al. “Evaluation of signature erosion in Ebola virus due to genomic drift and its impact on the performance of diagnostic assays.” Viruses 7.6 (2015): 3130-3154.

  3. Wiley, Michael R., et al. “Lassa virus circulating in Liberia: a retrospective genomic characterisation.” The Lancet Infectious Diseases 19.12 (2019): 1371-1378.

Appendix 1: Acknowledgement for 2019-nCoV genome sequences

The following table is from this blog at Virological.org: Phylogenetic analysis of 23 nCoV-2019 genomes, 2020-01-23.

Table 3. nCoV-2019 genome sequences used in this analysis, with the GISAID accession numbers and submitting labs.

GISAID Accession Strain Location Collection date Lab
EPI_ISL_404227 BetaCoV/Zhejiang/WZ-01/2020 Zhejiang, China 2020-01-16 1
EPI_ISL_404228 BetaCoV/Zhejiang/WZ-02/2020 Zhejiang, China 2020-01-17 1
EPI_ISL_402132 BetaCoV/Wuhan/HBCDC-HB-01/2019 China/Hubei Province 2019-12-30 2
EPI_ISL_402127 BetaCoV/Wuhan/WIV02/2019 China / Hubei Province / Wuhan City 2019-12-30 3
EPI_ISL_402128 BetaCoV/Wuhan/WIV05/2019 China / Hubei Province / Wuhan City 2019-12-30 3
EPI_ISL_402129 BetaCoV/Wuhan/WIV06/2019 China / Hubei Province / Wuhan City 2019-12-30 3
EPI_ISL_402130 BetaCoV/Wuhan/WIV07/2019 China / Hubei Province / Wuhan City 2019-12-30 3
EPI_ISL_403963 BetaCoV/Nonthaburi/74/2020 Thailand/ Nonthaburi Province 2020-01-13 4
EPI_ISL_403962 BetaCoV/Nonthaburi/61/2020 Thailand/ Nonthaburi Province 2020-01-08 4
EPI_ISL_402120 BetaCoV/Wuhan/IVDC-HB-04/2020 China / Hubei Province / Wuhan City 2020-01-01 5
EPI_ISL_402119 BetaCoV/Wuhan/IVDC-HB-01/2019 China / Hubei Province / Wuhan City 2019-12-30 5
EPI_ISL_402121 BetaCoV/Wuhan/IVDC-HB-05/2019 China / Hubei Province / Wuhan City 2019-12-30 5
EPI_ISL_402124 BetaCoV/Wuhan/WIV04/2019 China / Hubei Province / Wuhan City 2019-12-30 6
EPI_ISL_402123 BetaCoV/Wuhan/IPBCAMS-WH-01/2019 China / Hubei Province / Wuhan City 2019-12-24 7
EPI_ISL_402125 BetaCoV/Wuhan-Hu-1/2019 China 2019-12 8
EPI_ISL_403931 BetaCoV/Wuhan/IPBCAMS-WH-02/2019 China / Hubei Province / Wuhan City 2019-12-30 9
EPI_ISL_403928 BetaCoV/Wuhan/IPBCAMS-WH-05/2020 China / Hubei Province / Wuhan City 2020-01-01 9
EPI_ISL_403930 BetaCoV/Wuhan/IPBCAMS-WH-03/2019 China / Hubei Province / Wuhan City 2019-12-30 9
EPI_ISL_403929 BetaCoV/Wuhan/IPBCAMS-WH-04/2019 China / Hubei Province / Wuhan City 2019-12-30 9
EPI_ISL_403937 BetaCoV/Guangdong/20SF040/2020 Guangdong, China 2020-01-18 10
EPI_ISL_403936 BetaCoV/Guangdong/20SF028/2020 Guangdong, China 2020-01-17 10
EPI_ISL_403935 BetaCoV/Guangdong/20SF025/2020 Guangdong, China 2020-01-15 10
EPI_ISL_403934 BetaCoV/Guangdong/20SF014/2020 Guangdong, China 2020-01-15 10
EPI_ISL_403933 BetaCoV/Guangdong/20SF013/2020 Guangdong, China 2020-01-15 10
EPI_ISL_403932 BetaCoV/Guangdong/20SF012/2020 Guangdong, China 2020-01-14 10

[1] Department of Microbiology, Zhejiang Provincial Center for Disease Control and Prevention

[2] Hubei Provincial Center for Disease Control and Prevention

[3] Wuhan Institute of Virology, Chinese Academy of Sciences

[4] Department of Medical Sciences, Ministry of Public Health, Thailand & Thai Red Cross Emerging Infectious Diseases - Health Science Centre & Department of Disease Control, Ministry of Public Health, Thailand

[5] National Institute for Viral Disease Control and Prevention, China CDC

[6] Wuhan Institute of Virology, Chinese Academy of Sciences

[7] Institute of Pathogen Biology, Chinese Academy of Medical Sciences & Peking Union Medical College

[8] National Institute for Communicable Disease Control and Prevention (ICDC) Chinese Center for Disease Control and Prevention (China CDC)

[9] Institute of Pathogen Biology, Chinese Academy of Medical Sciences & Peking Union Medical College

[10] Department of Microbiology, Guangdong Provincial Center for Diseases Control and Prevention

I couldn’t match the primers from the Lancet article (by Huang et al.) with the nCoV sequences.
Can anybody check these primers?

https://doi.org/10.1016/S0140-6736(20)30183-5
forward primer 5′-TCAGAATGCCAATCTCCCCAAC-3′;
reverse primer 5′-AAAGGTCCACCCGATACATTGA-3′;

https://www.thelancet.com/action/showPdf?pii=S0140-6736%2820%2930183-5

Reported in Methods section, in Procedures subsection on page 2 of article:

“The primers and probe target to envelope gene of CoV were used and the sequences were as follows: forward primer 5′-TCAGAATGCCAATCTCCCCAAC-3′; reverse primer 5′-AAAGGTCCACCCGATACATTGA-3′; and the probe 5′CY5-CTAGTTACACTAGCCATCCTTACTGC-3′BHQ1.”

The forward and reverse primers reported here do not seem to correspond to nCoV sequences. However, I found them in another paper, listed below, where they are described as primers for Saffold cardiovirus.

Saffold Cardiovirus in Children with Acute Gastroenteritis, Beijing, China. Lili Ren, Richard Gonzalez, Yan Xiao, Xiwei Xu, Lan Chen, Guy Vernet, Gláucia Paranhos-Baccalà, Qi Jin, and Jianwei Wang

“Because VP1 genes of 2 SAFV-positive samples could not be amplified in this way, a newly designed primer pair (cardioVP1Fn: TCAGAATGCCAATCTCCCCAAC and cardioVP1Rn: AAAGGTCCACCCGATACATTGA) was used in combination with cardioVP1-2F/3R to amplify the VP1 gene based on the sequences obtained from our positive samples.”

Emerging Infectious Diseases • www.cdc.gov/eid • Vol. 15, No. 9, September 2009

I got it. Thank you so much for your nice explanation and nice work!
I checked Saffold virus sequences.
The primer pair perfectly matches to Saffold virus sequences.
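
For anyone who wants to repeat this kind of check, a minimal exact-match sketch is given here. It looks for the forward primer as written and for the reverse complement of the reverse primer downstream of it. The function names are hypothetical, the template is a synthetic string built only for illustration (not a real Saffold virus or nCoV sequence), and no mismatches are tolerated, unlike a BLAST-based search.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an unambiguous DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def find_amplicon(template, forward, reverse):
    """Return (start, end) of the exact-match amplicon on the template's plus strand, or None."""
    start = template.find(forward)
    if start == -1:
        return None
    reverse_site = reverse_complement(reverse)
    pos = template.find(reverse_site, start + len(forward))
    if pos == -1:
        return None
    return start, pos + len(reverse_site)


# Primer pair quoted from the Lancet article above.
forward = "TCAGAATGCCAATCTCCCCAAC"
reverse = "AAAGGTCCACCCGATACATTGA"

# Synthetic template for illustration only.
template = "GG" + forward + "ACGTACGT" + reverse_complement(reverse) + "CC"
print(find_amplicon(template, forward, reverse))
print(find_amplicon("ACGT" * 20, forward, reverse))
```

Running the same function over a real genome string would return None when either primer has no exact match, which is the situation described above for the nCoV sequences.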

PSET results updated with three new 2019-nCoV genomes; 28 total genomes. Also adding three assays for comparison from the CDC (https://www.cdc.gov/coronavirus/2019-ncov/downloads/rt-pcr-panel-primer-probes.pdf). New genomes have resulted in mismatches in previously perfect alignments for some assays.

Table 1. Results from PSET analysis. The four Noblis assays were compared alongside the four assays from Corman and three assays from the CDC. Each assay was tested using 2019-nCoV (28 genomes) as the intended target. All off-target hits (TN, PF, FP) are to entries in NCBI BLAST databases (nt, gss, and env_nt).

identifier provider PT TP TN PF FP FN
2019-nCoV-noblis_1 Noblis 23 5 415 NA NA NA
2019-nCoV-noblis_2 Noblis 27 1 691 NA NA NA
2019-nCoV-noblis_3 Noblis 27 1 638 NA NA NA
2019-nCoV-noblis_4 Noblis 28 NA 356 NA 3 NA
ncov_e_gene Corman et al 28 NA 75 353 15 NA
ncov_n_gene Corman et al 27 NA 73 NA 339 1
ncov_rdrp_1 Corman et al NA 28 169 433 85 NA
ncov_rdrp_2 Corman et al 1 27 586 1 61 NA
2019-nCoV_N1 CDC 27 1 381 NA NA NA
2019-nCoV_N2 CDC 27 1 371 NA NA NA
2019-nCoV_N3 CDC 28 NA 48 NA 346 NA

PSET results updated with new 2019-nCoV genomes; 46 total genomes.

Table 1. Results from PSET analysis. The four Noblis assays were compared alongside the four assays from Corman and three assays from the CDC. Each assay was tested using 2019-nCoV (46 genomes) as the intended target. All off-target hits (TN, PF, FP) are to entries in NCBI BLAST databases (nt, gss, and env_nt).

identifier provider PT TP TN PF FP FN
2019-nCoV-noblis_1 Noblis 36 10 415 NA NA NA
2019-nCoV-noblis_2 Noblis 45 1 689 NA NA NA
2019-nCoV-noblis_3 Noblis 45 1 637 NA NA NA
2019-nCoV-noblis_4 Noblis 46 NA 356 NA 3 NA
ncov_e_gene Corman et al 46 NA 75 353 15 NA
ncov_n_gene Corman et al 44 1 73 NA 339 1
ncov_rdrp_1 Corman et al NA 46 169 433 85 NA
ncov_rdrp_2 Corman et al 1 45 586 1 61 NA
cdc_n1 CDC 44 2 381 NA NA NA
cdc_n2 CDC 45 1 371 NA NA NA
cdc_n3 CDC 45 1 48 NA 346 NA

Figure 1. Updated genome map showing locations of all assays. Map of the Wuhan genome (NCBI Accession: MN908947.3) with assay signature locations (created using DNA Features Viewer Python library). Corman assays in blue, Noblis assays in red, CDC assays in purple, and gene regions in green.

PSET results updated with new 2019-nCoV genomes; 66 total genomes.

Table 1. Results from PSET analysis. The four Noblis assays were compared alongside the four assays from Corman and three assays from the CDC. Each assay was tested using 2019-nCoV (66 genomes) as the intended target. All off-target hits (TN, PF, FP) are to entries in NCBI BLAST databases (nt, gss, and env_nt).

identifier provider PT TP TN PF FP FN
2019-nCoV-noblis_1 Noblis 47 19 417 NA NA NA
2019-nCoV-noblis_2 Noblis 65 1 688 NA NA NA
2019-nCoV-noblis_3 Noblis 65 1 682 NA NA NA
2019-nCoV-noblis_4 Noblis 65 NA 357 NA 3 1
ncov_e_gene Corman et al 66 NA 76 353 15 NA
ncov_n_gene Corman et al 64 1 74 NA 339 1
ncov_rdrp_1 Corman et al NA 66 169 433 85 NA
ncov_rdrp_2 Corman et al 1 65 586 1 61 NA
cdc_n1 CDC 64 2 381 NA NA NA
cdc_n2 CDC 65 1 371 NA NA NA
cdc_n3 CDC 65 1 48 NA 346 NA

PSET results updated with new 2019-nCoV genomes; 96 total genomes. Now just using the subset of genomes on GISAID marked as high quality. Sequence IDs tested in this analysis listed here: ncov_ids_tested.zip (386 Bytes)

Assays continue to perform well against new genome submissions. Only one assay, 2019-nCoV-noblis_4, is showing potential false negatives. Assay 2019-nCoV-noblis_1 has just one mismatch in the probe for 27 genomes, resulting in fewer perfect matches, but it is still likely functional for all 96 genomes.

Table 1. Results from PSET analysis. The four Noblis assays were compared alongside the four assays from Corman and three assays from the CDC. Each assay was tested using 2019-nCoV (96 genomes) as the intended target. All off-target hits (TN, PF, FP) are to entries in NCBI BLAST databases (nt, gss, and env_nt).

identifier provider PT TP TN PF FP FN
2019-nCoV-noblis_1 Noblis 69 27 372 NA NA NA
2019-nCoV-noblis_2 Noblis 96 NA 568 NA NA NA
2019-nCoV-noblis_3 Noblis 96 NA 439 NA NA NA
2019-nCoV-noblis_4 Noblis 94 NA 343 NA 3 2
ncov_e_gene Corman et al 96 NA 42 353 15 NA
ncov_n_gene Corman et al 95 1 55 NA 339 NA
ncov_rdrp_1 Corman et al NA 96 75 433 87 NA
ncov_rdrp_2 Corman et al NA 96 526 1 66 NA
cdc_n1 CDC 95 1 363 NA NA NA
cdc_n2 CDC 95 1 361 NA NA NA
cdc_n3 CDC 94 2 17 NA 346 NA

PSET results updated with new 2019-nCoV genomes; 129 total genomes. Now just using the subset of genomes on GISAID marked as high quality. Sequence IDs tested in this analysis listed here: ncov_ids_tested.zip (437 Bytes)

Assays have more potential false negatives, although the majority of FNs (38 out of 42) are from samples collected from pangolins. Only assays ncov_rdrp_1 and cdc_n3 currently have no potential for FNs.

Table 1. Results from PSET analysis. The four Noblis assays were compared alongside the four assays from Corman and three assays from the CDC. Each assay was tested using 2019-nCoV (129 genomes) as the intended target. All off-target hits (TN, PF, FP) are to entries in NCBI BLAST databases (nt, gss, and env_nt).

identifier provider PT TP TN PF FP FN
2019-nCoV-noblis_1 Noblis 91 32 372 NA NA 6
2019-nCoV-noblis_2 Noblis 123 NA 568 NA NA 6
2019-nCoV-noblis_3 Noblis 122 2 439 NA NA 5
2019-nCoV-noblis_4 Noblis 121 5 343 NA 3 3
ncov_e_gene Corman et al 127 1 42 353 15 1
ncov_n_gene Corman et al 122 2 55 NA 339 5
ncov_rdrp_1 Corman et al NA 129 75 433 87 NA
ncov_rdrp_2 Corman et al NA 124 526 1 66 5
cdc_n1 CDC 122 2 363 NA NA 5
cdc_n2 CDC 122 1 361 NA NA 6
cdc_n3 CDC 121 8 17 NA 346 NA
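
The FN total quoted for this update can be reproduced from the table with a few lines of arithmetic. The sketch below copies the PT, TP and FN columns from the 129-genome table (NA taken as 0) and also computes the in silico fraction of target genomes hit, (PT + TP) / (PT + TP + FN); that ratio is an illustrative summary, not a metric reported by PSET.

```python
# (PT, TP, FN) per assay, copied from the 129-genome table; NA is treated as 0.
ROWS = {
    "2019-nCoV-noblis_1": (91, 32, 6),
    "2019-nCoV-noblis_2": (123, 0, 6),
    "2019-nCoV-noblis_3": (122, 2, 5),
    "2019-nCoV-noblis_4": (121, 5, 3),
    "ncov_e_gene": (127, 1, 1),
    "ncov_n_gene": (122, 2, 5),
    "ncov_rdrp_1": (0, 129, 0),
    "ncov_rdrp_2": (0, 124, 5),
    "cdc_n1": (122, 2, 5),
    "cdc_n2": (122, 1, 6),
    "cdc_n3": (121, 8, 0),
}

total_fn = sum(fn for _, _, fn in ROWS.values())
no_fn = [name for name, (_, _, fn) in ROWS.items() if fn == 0]
print("total FN:", total_fn)
print("assays without FN:", no_fn)

for name, (pt, tp, fn) in ROWS.items():
    hit_fraction = (pt + tp) / (pt + tp + fn)
    print(f"{name}: {hit_fraction:.3f}")
```

Each row sums to the 129 genomes tested, which is a quick consistency check on the table.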

PSET results updated with new 2019-nCoV genomes; 152 total genomes. Now just using the subset of genomes on GISAID marked as high quality. Sequence IDs tested in this analysis listed here: ncov_ids_tested.zip (475 Bytes)

Table 1. Results from PSET analysis. The four Noblis assays were compared alongside the four assays from Corman and three assays from the CDC. Each assay was tested using 2019-nCoV (152 genomes) as the intended target. All off-target hits (TN, PF, FP) are to entries in NCBI BLAST databases (nt, gss, and env_nt).

identifier provider PT TP TN PF FP FN
2019-nCoV-noblis_1 Noblis 106 39 373 NA NA 7
2019-nCoV-noblis_2 Noblis 145 NA 568 NA NA 7
2019-nCoV-noblis_3 Noblis 144 2 439 NA NA 6
2019-nCoV-noblis_4 Noblis 144 5 343 NA 3 3
ncov_e_gene Corman et al 149 2 42 353 15 1
ncov_n_gene Corman et al 144 3 55 NA 339 5
ncov_rdrp_1 Corman et al NA 152 75 433 87 NA
ncov_rdrp_2 Corman et al NA 147 526 1 66 5
cdc_n1 CDC 144 3 363 NA NA 5
cdc_n2 CDC 144 1 361 NA NA 7
cdc_n3 CDC 143 9 17 NA 346 NA

PSET results updated with new 2019-nCoV genomes; 190 total genomes. Now just using the subset of genomes on GISAID marked as high quality sampled from humans (pangolin samples removed). Sequence IDs tested in this analysis listed here: ncov_ids_tested.zip (534 Bytes)

Table 1. Results from PSET analysis. The four Noblis assays were compared alongside the four assays from Corman and three assays from the CDC. Each assay was tested using 2019-nCoV (190 genomes) as the intended target. All off-target hits (TN, PF, FP) are to entries in NCBI BLAST databases (nt, gss, and env_nt).

identifier provider PT TP TN PF FP FN
2019-nCoV-noblis_1 Noblis 134 56 373 0 0 0
2019-nCoV-noblis_2 Noblis 188 2 568 0 0 0
2019-nCoV-noblis_3 Noblis 189 1 439 0 0 0
2019-nCoV-noblis_4 Noblis 184 0 343 0 3 6
ncov_e_gene Corman et al 187 2 42 353 15 1
ncov_n_gene Corman et al 189 1 55 0 339 0
ncov_rdrp_1 Corman et al 190 0 75 433 87 0
ncov_rdrp_2 Corman et al 190 0 526 1 66 0
cdc_n1 CDC 188 2 363 0 0 0
cdc_n2 CDC 189 1 361 0 0 0
cdc_n3 CDC 183 7 17 0 346 0

PSET results updated with new 2019-nCoV genomes; 432 total genomes. Just using the subset of genomes on GISAID marked as high quality sampled from humans. Sequence IDs tested in this analysis listed here: 432_ids.zip (833 Bytes)

Noblis assay 4 is showing some potential FNs, but most of these appear to be the result of gaps or sequencing errors for some sequences at the very start of the genome. This assay falls within the first 500 bp of the genome. All other assays are still performing very well in silico against new sequences.
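
One way to tell this kind of apparent FN from a real mismatch is to look for runs of N or gap characters in the region the assay covers. The sketch below is a hypothetical helper, not part of PSET; the 500 bp window follows the statement above, and the example sequence is synthetic.

```python
import re


def ambiguous_runs(seq, start=0, end=500, min_len=1):
    """Return (start, end) positions of runs of N or '-' within seq[start:end]."""
    window = seq[start:end].upper()
    pattern = re.compile(r"[N-]{%d,}" % min_len)
    return [(m.start() + start, m.end() + start) for m in pattern.finditer(window)]


# Synthetic example: a sequence that begins with unresolved bases and contains a short gap.
example = "NNNACGT--AC" + "ACGT" * 10
print(ambiguous_runs(example))
```

A genome whose first 500 bp contain such runs would be expected to miss an assay located there regardless of the primer design.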

Table 1. Results from PSET analysis. The four Noblis assays were compared alongside the four assays from Corman and three assays from the CDC. Each assay was tested using 2019-nCoV (432 genomes) as the intended target. All off-target hits (TN, PF, FP) are to entries in NCBI BLAST databases (nt, gss, and env_nt).

identifier provider PT TP TN PF FP FN
2019-nCoV-noblis_1 Noblis 325 105 373 0 0 2
2019-nCoV-noblis_2 Noblis 422 10 568 0 0 0
2019-nCoV-noblis_3 Noblis 431 1 439 0 0 0
2019-nCoV-noblis_4 Noblis 409 4 343 0 3 19
ncov_e_gene Corman et al 428 3 42 353 15 1
ncov_n_gene Corman et al 431 1 55 0 339 0
ncov_rdrp_1 Corman et al 0 432 75 433 87 0
ncov_rdrp_2 Corman et al 0 432 526 1 66 0
cdc_n1 CDC 429 3 363 0 0 0
cdc_n2 CDC 430 2 361 0 0 0
cdc_n3 CDC 412 20 17 0 346 0

PSET results updated with new 2019-nCoV genomes; 655 total genomes. Just using the subset of genomes on GISAID marked as high quality sampled from humans. Sequence IDs tested in this analysis listed here: 655_ids.zip (1.4 KB)

Previous Noblis assays showed some false negatives due to Ns, sequence gaps, and in one case the assay’s position at the very start of the genome. These have been replaced with five new assays generated at a later date using 96 complete genomes. The Noblis.57 assay and the German ncov_e_gene assay each have one FN that’s due to a stretch of Ns. All other assays still performing very well in silico against new sequences.

Table 1. Results from PSET analysis. The five Noblis assays were compared alongside the four assays from Corman and three assays from the CDC. Each assay was tested using 2019-nCoV (655 genomes) as the intended target. All off-target hits (TN, PF, FP) are to entries in NCBI BLAST databases (nt, gss, and env_nt).

identifier provider PT TP TN PF FP FN
Noblis.12 Noblis 653 2 2 0 0 0
Noblis.40 Noblis 635 20 316 0 0 0
Noblis.42 Noblis 655 0 275 0 0 0
Noblis.44 Noblis 655 0 277 0 0 0
Noblis.57 Noblis 654 0 359 0 0 1
ncov_e_gene Corman et al 651 3 42 353 15 1
ncov_n_gene Corman et al 652 3 55 0 339 0
ncov_rdrp_1 Corman et al 0 655 75 433 87 0
ncov_rdrp_2 Corman et al 1 654 526 1 66 0
cdc_n1 CDC 647 8 363 0 0 0
cdc_n2 CDC 653 2 361 0 0 0
cdc_n3 CDC 628 27 17 0 346 0

Figure 1. Map of the SARS-CoV-2 genome (NCBI Accession: MN908947.3) with assay signature locations (created using DNA Features Viewer Python library). Noblis assays in red, Corman assays in blue, CDC assays in purple, and gene regions in green.

PSET results updated with new 2019-nCoV genomes; 1620 total genomes. Just using the subset of genomes on GISAID marked as high quality sampled from humans. Sequence IDs tested in this analysis listed here: 1620_ids.zip (12.2 KB)

The Noblis.57 assay and the German ncov_e_gene assay still each have one FN that’s due to a stretch of Ns. All other assays still performing very well in silico against new sequences.

Table 1. Results from PSET analysis. The five Noblis assays were compared alongside the four assays from Corman and three assays from the CDC. Each assay was tested using 2019-nCoV (1620 genomes) as the intended target. All off-target hits (TN, PF, FP) are to entries in NCBI BLAST databases (nt, gss, and env_nt).

identifier provider PT TP TN PF FP FN
Noblis.12 Noblis 1616 4 2 0 0 0
Noblis.40 Noblis 1546 74 316 0 0 0
Noblis.42 Noblis 1620 0 275 0 0 0
Noblis.44 Noblis 1619 1 277 0 0 0
Noblis.57 Noblis 1605 14 359 0 0 1
ncov_e_gene Corman et al 1616 3 42 353 15 1
ncov_n_gene Corman et al 1608 12 55 0 339 0
ncov_rdrp_1 Corman et al 0 1620 75 433 87 0
ncov_rdrp_2 Corman et al 1 1619 526 1 66 0
cdc_n1 CDC 1591 29 363 0 0 0
cdc_n2 CDC 1616 4 361 0 0 0
cdc_n3 CDC 1560 60 17 0 346 0