Note: This is included here for reference because it was an issue in earlier versions of the Illumina pipeline, and it can cause spurious reversions to reference bases.
bcftools consensus builds a consensus sequence by “applying” the called variants to a reference sequence. However, the alignment file may contain regions of low coverage (for example, due to amplicon dropout), where the depth is not sufficient to call variants reliably. If these low-coverage regions are not properly masked (typically with N), the consensus sequence will keep the reference bases there, even where real variants may be present in the “true” consensus sequence. To avoid this, mask low-coverage regions in the reference, for example with bedtools genomecov (to report coverage, from which low-coverage intervals can be selected) followed by bedtools maskfasta, and supply this masked reference to bcftools consensus to produce a reliable consensus sequence.
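To illustrate why masking matters, here is a minimal, tool-free Python sketch of the same idea. It assumes a per-base depth list (similar in spirit to the per-position output of bedtools genomecov -d) and SNVs given as 0-based positions. The function names, the min_depth threshold, and the simplified "apply SNVs" step are hypothetical stand-ins for bedtools genomecov, bedtools maskfasta, and bcftools consensus, not their actual implementations.

```python
from typing import Dict, List, Tuple

# 0-based, half-open intervals, the same convention BED files use.
Interval = Tuple[int, int]


def low_coverage_intervals(depth: List[int], min_depth: int) -> List[Interval]:
    """Merge consecutive positions with depth < min_depth into intervals.

    Hypothetical helper: stands in for selecting low-coverage regions
    from bedtools genomecov output.
    """
    intervals: List[Interval] = []
    start = None
    for pos, d in enumerate(depth):
        if d < min_depth:
            if start is None:
                start = pos  # open a new low-coverage run
        elif start is not None:
            intervals.append((start, pos))  # close the current run
            start = None
    if start is not None:
        intervals.append((start, len(depth)))  # run extends to the end
    return intervals


def mask_sequence(seq: str, intervals: List[Interval], mask_char: str = "N") -> str:
    """Replace every base inside the intervals with mask_char (like maskfasta)."""
    chars = list(seq)
    for start, end in intervals:
        for i in range(start, end):
            chars[i] = mask_char
    return "".join(chars)


def apply_snvs(seq: str, snvs: Dict[int, str]) -> str:
    """Simplified consensus step: substitute alt bases at 0-based positions."""
    chars = list(seq)
    for pos, alt in snvs.items():
        chars[pos] = alt
    return "".join(chars)


if __name__ == "__main__":
    ref = "ACGTACGTAC"
    depth = [30, 30, 2, 0, 0, 25, 40, 40, 1, 30]
    snvs = {6: "T"}  # only called where coverage was sufficient

    low = low_coverage_intervals(depth, min_depth=10)
    print(low)  # [(2, 5), (8, 9)]

    # Without masking, low-coverage positions silently keep reference bases.
    print(apply_snvs(ref, snvs))  # ACGTACTTAC
    # With masking, they become N and are not mistaken for reference calls.
    print(apply_snvs(mask_sequence(ref, low), snvs))  # ACNNNCTTNC
```

The unmasked result looks like a confident reference call at positions 2 to 4 and 8, even though almost no reads covered them; the masked result makes that uncertainty explicit with N.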