Fieldbioinformatics Updates

Sam_W · January 13, 2025, 10:37am

Fieldbioinformatics updated to 1.6.0

Fieldbioinformatics has been the ARTIC network's first-line recommendation for generating consensus sequences from Nanopore viral amplicon sequencing data since the West African Ebola outbreak in 2016. The pipeline has changed a great deal since then, but it had remained mostly unchanged since the early stages of the SARS-CoV-2 (SC2) outbreak, when the sudden need for reliable analysis pipelines prompted rapid development. Its widespread use in the global SC2 surveillance network then discouraged further modification. In a similar way, the ongoing surveillance of Mpox in central Africa has now prompted long-overdue development, both to fit the changing needs of the community and to deal with specific complexities presented by Mpox as a virus.

What has changed?

Rapid barcoding is now officially supported. Previously the pipeline relied on reads covering the full length of amplicons for primer trimming and read depth normalisation, but the algorithm has been modified to support reads from amplicons that were fragmented during the tagmentation stage of the rapid barcoding protocol.

Previously fieldbioinformatics supported two separate workflows, which differed by the tool used to call variants against the reference sequence: Nanopolish or Medaka. Both workflows have issues processing modern sequencing data. Nanopolish does not support R10.4 sequencing chemistries, pod5 files, or modern VBZ pod5/fast5 signal data compression, whereas Medaka does not support calling long insertions or deletions, such as a lineage-defining deletion in Mpox clade 1b. As demonstrated by Hall et al. 2024, Clair3 offers superior variant calling performance in almost all use cases. To support Clair3 variant calling with minimal inconvenience to the end user, we have added tooling to the pipeline that automatically determines the appropriate pre-trained Clair3 model from the reads themselves. This is achieved by checking the basecall_model_version_id header added to reads by all ONT basecallers; the model may also be provided manually with the --model parameter if desired. Most pre-trained models are not provided by default, but the new command artic_get_models will automatically fetch all pre-trained models kindly provided by ONT.
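To see which basecaller model your own reads carry before running the pipeline, something like the shell sketch below may help. It is not the pipeline's implementation: the function name basecall_models is hypothetical, and it assumes four-line FASTQ records whose header line carries whitespace-separated key=value tags, one of which is basecall_model_version_id.

```shell
# List the distinct basecall_model_version_id values found in a FASTQ file.
# Assumption: 4-line FASTQ records, with key=value tags on each header line.
basecall_models() {
    awk 'NR % 4 == 1 {
        # Field 1 is the read ID; scan the remaining tags on the header line.
        for (i = 2; i <= NF; i++) {
            if ($i ~ /^basecall_model_version_id=/) {
                sub(/^basecall_model_version_id=/, "", $i)
                print $i
            }
        }
    }' "$1" | sort -u
}

# Usage (file name taken from the example command in this post):
#   basecall_models run_name_barcode03.fastq
```

More than one line of output would suggest the file mixes reads from different basecalling models, which is probably worth resolving before variant calling.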

Longshot was previously used in the pipeline to help filter variants produced by the above variant callers, especially Medaka, which could produce a large number of spurious insertions and deletions in some circumstances. With the higher performance demonstrated by Clair3, Longshot was no longer required and has been removed from the pipeline.

We now fetch primer schemes from the canonical store for schemes generated by primalscheme3 if they are not available locally. Readers may wish to browse the primalscheme labs website, which details all schemes available for automatic fetching. The command line arguments for selecting a scheme have been modified to make this simpler: in many cases users will only have to provide a scheme name and scheme version. For custom primer schemes, users may also provide primer BED and reference FASTA files directly.

For primer schemes that have been developed to support multiple possible reference sequences (currently only artic-inrb-mpox), logic has been added to automatically pick the most appropriate reference. This is especially useful for Mpox due to its large inter-clade divergence.

Segmented virus and multi-pathogen schemes are now supported by the pipeline. All segments / pathogens are output in the same consensus FASTA file, so for multi-pathogen schemes care should be taken to separate the consensus sequences downstream.
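One possible way to separate the records downstream is sketched below. This is an illustration rather than part of the pipeline: the function name split_fasta and the output layout are hypothetical, and it assumes the first word of each FASTA header is unique and identifies the segment or pathogen.

```shell
# Write each record of a multi-record FASTA to its own file in an output directory.
# Assumption: the first word of every header is unique within the input file.
split_fasta() {
    fasta=$1
    outdir=$2
    mkdir -p "$outdir"
    awk -v outdir="$outdir" '
        /^>/ {
            # New record: close the previous file and derive a safe file name.
            if (out != "") close(out)
            name = substr($1, 2)
            gsub(/[^A-Za-z0-9._-]/, "_", name)
            out = outdir "/" name ".fasta"
        }
        # Header and sequence lines both go to the current record file.
        out != "" { print > out }
    ' "$fasta"
}

# Usage (hypothetical file names):
#   split_fasta samplename.consensus.fasta split_consensus
```

Characters outside letters, digits, dot, underscore and hyphen in the header are replaced with underscores in the file name; the header line inside each file is left untouched.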

Docker images are now built automatically on every GitHub release and pushed to Quay.io at quay.io/artic/fieldbioinformatics, e.g. quay.io/artic/fieldbioinformatics:1.6.0. These containers are preloaded with pre-trained Clair3 models.

A longstanding issue with fieldbioinformatics occurs when a passing variant and a failing variant overlap, causing bcftools to raise an error when assembling the final consensus (described very well by Sam Sims here). We have added a step to the pipeline that normalises all passing variants against the pre-consensus FASTA to prevent the bcftools issue from occurring.

MAFFT is now used instead of MUSCLE to align the output consensus sequence against the reference, and this alignment step has been made optional.

Too many words, just tell me what I need to know! (TL;DR)

The example fieldbioinformatics command now looks like this:

artic minion --normalise 200 --scheme-directory ~/primer_schemes --scheme-name artic-inrb-mpox --scheme-length 2500 --scheme-version v1.0.0 --read-file run_name_barcode03.fastq samplename

Medaka, Nanopolish, and Longshot have been removed and replaced with Clair3.
You may need to run artic_get_models after installation; the pipeline will tell you to do so if necessary.
Primer schemes are now specified by their --scheme-name and --scheme-version. Where multiple amplicon lengths are available for a scheme, as in the case of artic-inrb-mpox, a --scheme-length argument should also be provided; if it is missing, the pipeline will raise an error listing the available lengths.
A custom scheme can also be provided directly with the --bed and --ref arguments.
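
For a custom scheme, the invocation might look like the one printed by the sketch below. Treat it as an assumption rather than documented usage: the helper custom_scheme_cmd is hypothetical, it only prints the command instead of running it, and it assumes --bed and --ref simply take the place of the scheme name and version arguments in the example command above.

```shell
# Print (not run) an artic minion command for a custom primer scheme,
# after checking that the BED, reference and read files exist.
# Assumption: --bed/--ref replace the --scheme-* arguments.
custom_scheme_cmd() {
    bed=$1 ref=$2 reads=$3 sample=$4
    for f in "$bed" "$ref" "$reads"; do
        [ -f "$f" ] || { echo "missing file: $f" >&2; return 1; }
    done
    printf 'artic minion --normalise 200 --bed %s --ref %s --read-file %s %s\n' \
        "$bed" "$ref" "$reads" "$sample"
}

# Usage (hypothetical file names):
#   custom_scheme_cmd my_scheme.bed my_reference.fasta run_name_barcode03.fastq samplename
```

Printing the command first makes it easy to check the paths before committing to a full run. Paths containing spaces would need quoting before the printed line is pasted into a shell.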

Please enjoy some AI generated fieldbioinformatics slop!