Interesting SNPs in EBOV GP

I have been looking into the four unique mutations (all T->C) mentioned by @evogytis in EBOV_2014_G5119.1. All four fall in the GP gene, so they might be of interest. Two of them (positions 6675 and 6677) are in the codon for Y213 and together cause a Y->H amino acid change. The mutation at position 6678 also causes a Y->H change, this time at Y214. The fourth mutation (position 6710) is synonymous, at residue G224.
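As a quick sanity check on the codon assignments above, here is a minimal Python sketch. It assumes (inferred from the positions quoted in the post, not from an annotation file) that genomic position 6675 is the first base of GP codon 213 and that the reading frame is contiguous over this stretch. The example codons `TAT` and `GGT` are illustrative only; the actual codons in G5119.1 are not given here.

```python
# Standard genetic code, with bases ordered T, C, A, G at each codon position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# Assumption inferred from the post: position 6675 is base 1 of GP codon 213.
ANCHOR_POS, ANCHOR_RESIDUE = 6675, 213


def locate(pos):
    """Return (residue number, position within codon 1-3) for a genomic position."""
    offset = pos - ANCHOR_POS
    return ANCHOR_RESIDUE + offset // 3, offset % 3 + 1


def t_to_c(codon, codon_pos):
    """Apply a T->C change at codon_pos (1-based); return (new codon, old aa, new aa)."""
    i = codon_pos - 1
    if codon[i] != "T":
        raise ValueError(f"no T at position {codon_pos} of {codon}")
    mutated = codon[:i] + "C" + codon[i + 1:]
    return mutated, CODON_TABLE[codon], CODON_TABLE[mutated]


for pos in (6675, 6677, 6678, 6710):
    print(pos, locate(pos))

# Illustrative codons only: a first-position T->C in a Tyr codon gives His,
# while a third-position T->C in a Gly codon is synonymous.
print(t_to_c("TAT", 1))
print(t_to_c("GGT", 3))
```

Under that assumption, 6675 and 6677 fall at the first and third positions of codon 213, 6678 at the first position of codon 214, and 6710 at the third position of codon 224, which matches the Y->H, Y->H and synonymous calls described above.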

The non-synonymous mutations at Y213 and Y214 might be interesting because residues 190-213 are thought to be the site of endosomal cathepsin cleavage, which removes the mucin-like region of GP, perhaps exposing the receptor-binding region during viral infection (see paper). Does anyone have other thoughts on this region? @alin

Two other observations:

  • the synonymous mutation at position 6710 was also seen in three sequences from the 1994/1995 outbreak. This suggests to me that it’s probably not just an error, since the mutation has been seen before.

  • position 6677 is a T in all 2014 sequences (except G5119.1), but a C in all previously published sequences. Is this a back-mutation to the sequence before 2014?

If those are real mutations, the clustering in a single region is interesting. Given that it’s in GP, my first guess would be immune escape mutations. There is some evidence that this region may lie in a B cell epitope (http://www.viprbrc.org/brc/curatedEpitopeDetail.spg?iedbId=13781&decorator=filo_ebola&context=1418836196599#referenceSection, based on http://www.ncbi.nlm.nih.gov/pubmed/24914933). Other groups have also tested overlapping k-mers for binding to MHC as putative T cell epitopes (e.g., http://www.viprbrc.org/brc/curatedEpitopeDetail.spg?iedbId=91993&decorator=filo_ebola&context=1418836196599#referenceSection). H differs from Y in both size and charge (it is smaller and can be protonated), so the change could easily disrupt Ab/TCR binding, but it’s hard to say given that this is a totally unique mutation. Maybe we can scan the intrahost variants for the presence of these mutations?
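A scan of the intrahost variants could be sketched roughly as follows. The record layout (sample, position, reference base, alternate base, frequency) is a hypothetical format chosen for illustration, and the toy records are made up; a real run would read whatever the intrahost variant calls actually look like.

```python
from collections import defaultdict

# The four positions discussed in this thread.
POSITIONS_OF_INTEREST = {6675, 6677, 6678, 6710}


def scan_variants(variants, positions=POSITIONS_OF_INTEREST, min_freq=0.0):
    """Collect T->C intrahost variants at the positions of interest.

    `variants` is an iterable of (sample, position, ref, alt, frequency)
    tuples; this layout is an assumption, not a real file format.
    Returns {position: [(sample, frequency), ...]}.
    """
    hits = defaultdict(list)
    for sample, pos, ref, alt, freq in variants:
        if pos in positions and (ref, alt) == ("T", "C") and freq >= min_freq:
            hits[pos].append((sample, freq))
    return dict(hits)


# Made-up toy records, for illustration only.
toy_variants = [
    ("sample_A", 6675, "T", "C", 0.04),
    ("sample_A", 1234, "G", "A", 0.10),
    ("sample_B", 6678, "T", "C", 0.01),
    ("sample_C", 6710, "T", "C", 0.30),
]

hits = scan_variants(toy_variants, min_freq=0.02)
for pos in sorted(hits):
    print(pos, hits[pos])
```

The `min_freq` threshold is there because very low-frequency calls are the ones most likely to be artefacts; what cutoff is sensible would depend on the coverage and error profile of the data.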

If these are B-cell immune escape mutations, then I doubt they act at the population level (i.e., allowing repeat infection), but they might reflect within-host evasion of the elicited immune response. The same mutations may be cropping up in different patients, perhaps those with longer infection durations. If true, this would certainly have implications for therapeutics. We probably need to rule out artefacts first.

T-cell escape might more plausibly increase in frequency at the population level if the population is homogeneous in HLA type. You could hypothesise that a CTL escape mutation would spread if it were more likely to mismatch the HLA of the new host and establish an infection (or if it increased viremia, and so the probability of transmission).

In scanning the 5119 sequence I noted a number of T to C mutations, some shared by other isolates but most not. I wondered whether this was a host effect, albeit slight, since it seemed unusually limited to 5119. With the highly repetitive Illumina sequencing I have no doubt of the validity at the sequencing level, but either the host or the initial amplification might have skewed the product slightly towards C for this isolate. It is otherwise similar to 4861 and 5112, with which it shares at least 5 unique SNPs (based on the first 21 new releases only). It will be interesting to see whether the double Y to H mutation persists in later isolates.
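One way to put a number on a possible T to C skew is to tabulate the substitution spectrum of an isolate against a reference and see whether T>C stands out. The sketch below assumes two already-aligned, equal-length sequences; the short strings are toy data, not real EBOV sequence.

```python
from collections import Counter


def substitution_spectrum(reference, query):
    """Count each substitution type (e.g. 'T>C') between two aligned sequences.

    Positions with gaps or ambiguous bases are skipped.
    """
    if len(reference) != len(query):
        raise ValueError("sequences must be aligned to the same length")
    counts = Counter()
    for r, q in zip(reference.upper(), query.upper()):
        if r != q and r in "ACGT" and q in "ACGT":
            counts[f"{r}>{q}"] += 1
    return counts


# Toy aligned sequences, for illustration only.
ref = "ATTGTCATTA"
qry = "ACTGCCATCG"

spectrum = substitution_spectrum(ref, qry)
for change, n in spectrum.most_common():
    print(change, n)
```

Running the same tally for 5119 and for its close relatives 4861 and 5112 against a common reference would show whether the excess of T to C changes really is specific to 5119.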

Overall, as mutations accumulate in the West African sequence set, it is remarkable that we see so little sign of immune escape mutations in GP. In general, the many amino acid substitutions throughout the proteome have hit variable amino acids, when viewed against the variation seen among Ebola species. Correct me if I am wrong, but thus far I see a maximum of 66 amino acid substitutions relative to Mayinga, genome-wide, out of a total (if I recall correctly) of 4832 amino acids, or about 1.4%, and that is in G4999. The proteome is remarkably stable, even as SNPs proliferate…
Bill Gallaher