Ragged 5' and 3' ends of ebolavirus genome sequences


Many of the ebolavirus genome sequences have uneven, or ragged, 5’ and 3’ sequence ends.

For example, here are the 5’ ends of a few sequences from the Broad/Harvard et al. genomes:

KM233109  cggacacacaaaaagaaagaagaatttttaggatcttttgtgtgcgaataactatgaggaagattaataattttcctctcattgaaatttatatcggaat
KM233108  --------------------------------------------cgaataactatgaggaagattaataattttcctctcattgaaatttatatcggaat
KM233107  ---------------------------------------------gaataactatgaggaagattaataattttcctctcattgaaatttatatcggaat
KM233106  -------------agaaagaagaatttttaggatcttttgtgtgcgaataactatgaggaagattaataattttcctctcattgaaatttatatcggaat
KM233105  --------------------------------------------cgaataactatgaggaagattaataattttcctctcattgaaatttatatcggaat
KM233104  cggacacacaaaaagaaagaagaatttttaggatcttttgtgtgcgaataactatgaggaagattaataattttcctctcattgaaatttatatcggaat
KM233103  --------caaaaagaaagaagaatttttaggatcttttgtgtgcgaataactatgaggaagattaataattttcctctcattgaaatttatatcggaat
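
One quick way to put numbers on this raggedness is to count the leading gap characters in each row. This is only a minimal sketch: it assumes the alignment is plain text with one ID and one gapped sequence per line, as above, and the function name `leading_gaps` is made up for illustration.

```python
def leading_gaps(alignment_text):
    """Return {sequence_id: number of leading '-' characters}."""
    counts = {}
    for line in alignment_text.strip().splitlines():
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        seq_id, seq = parts
        # Length lost when leading dashes are stripped = missing 5' bases
        counts[seq_id] = len(seq) - len(seq.lstrip("-"))
    return counts


if __name__ == "__main__":
    demo = """
    seqA  acgtacgt
    seqB  ---tacgt
    """
    for seq_id, n in leading_gaps(demo).items():
        print(seq_id, n)
```

Running the same function over the rows above gives the number of 5’ bases missing from each record relative to the longest one.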

Are these artifacts of sequencing or assembly, or is the variation real?
I assume the former, but I have not seen this addressed directly in any of the papers.

Can anyone offer any insight?


–Rob Jones


Hi Rob,

What you’re seeing represents variation in how well we were able to sequence and assemble the ends of the genome for each sample. This is not likely to be biological variation. Our assemblies report unambiguous bases only at positions where we have good coverage of the genome. Missing bases at the end simply lacked sufficient coverage there. N’s in the middle of the genome mean the same thing.
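
To illustrate the idea (this is not our actual pipeline, just a rough sketch), the Python below masks positions whose read depth falls under a threshold and then trims the masked bases from both ends. The function name `call_consensus` and the `min_depth` default are assumptions chosen for the example, not the values used for the published assemblies.

```python
def call_consensus(bases, depths, min_depth=2):
    """Mask low-coverage positions as N, then trim N's from both ends.

    bases  : string of called bases, one per reference position (uppercase)
    depths : read depth at each position, same length as bases
    min_depth is an illustrative threshold, not a published value.
    """
    if len(bases) != len(depths):
        raise ValueError("bases and depths must have the same length")
    # Keep a base only where coverage is sufficient
    masked = "".join(b if d >= min_depth else "N" for b, d in zip(bases, depths))
    # Low-coverage ends are dropped entirely, which produces ragged ends;
    # internal low-coverage positions remain as N
    return masked.strip("N")


if __name__ == "__main__":
    print(call_consensus("ACGTACGT", [0, 1, 5, 5, 1, 5, 5, 0]))
```

Two samples with identical true genomes but different coverage at the ends would come out of a step like this with different start and end positions, which is the pattern in the alignment above.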



Just to add to Danny’s reply: some of the sequences out there (not ours) were generated with primers placed at the ends of the genome. This means that the primer sequences were being sequenced instead of the actual genomic sequence, so the true ends are missing from those sequences too.

Given our experience with both Lassa virus and Ebola virus, I think it’s fair to say that the ends are most likely fully conserved and no variation is observed.