Technologies for neoantigen discovery are critical for developing personalized cancer vaccines and neoantigen-based biomarkers. Precision neoantigen discovery entails comprehensive detection of tumor-specific genomic variants and accurate prediction of MHC presentation of epitopes originating from such variants. Our ImmunoID NeXT™ Platform enables a comprehensive survey of putative neoantigens by combining highly sensitive, exome-scale DNA and RNA sequencing with advanced analytics. Here, we present the Systematic HLA Epitope Ranking Pan Algorithm (SHERPA™), our pan-predictive machine learning model for predicting MHC class I presentation and identifying potentially immunogenic patient-specific neoantigens.
To train our algorithms, we generated high-quality, unambiguous training data using approximately 60 genetically engineered mono-allelic K562 cell lines. Briefly, MHC-peptide complexes were immunoprecipitated using the W6/32 antibody, followed by peptide elution and sequencing by tandem mass spectrometry. When visualized on a heatmap of all known IMGT/HLA alleles clustered by binding pocket similarity, our profiled alleles effectively capture binding pocket diversity. The population coverage of our mono-allelic dataset, estimated using allele frequencies from the Allele Frequency Net Database, is robust across several ethnic populations worldwide.
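
The population coverage estimate can be illustrated with a minimal sketch. The abstract does not state the exact method, so the calculation below is an assumption: it takes coverage to be the fraction of individuals carrying at least one profiled allele, assumes Hardy-Weinberg proportions within each locus, and treats loci as independent (ignoring linkage disequilibrium). The function name `population_coverage` and the frequencies are hypothetical placeholders, not values from the Allele Frequency Net Database.

```python
from collections import defaultdict


def population_coverage(allele_freqs, covered_alleles):
    """Estimate the fraction of a population carrying >= 1 covered allele.

    Assumptions (not taken from the abstract): Hardy-Weinberg proportions
    within each locus and independence between loci.
    """
    # Sum the frequencies of the covered alleles separately for each locus.
    covered_by_locus = defaultdict(float)
    for allele in set(covered_alleles):
        locus = allele.split("*")[0]  # "A*02:01" -> "A"
        covered_by_locus[locus] += allele_freqs.get(allele, 0.0)

    # Probability that an individual carries no covered allele at any locus:
    # both chromosomes must carry an uncovered allele at every locus.
    p_uncovered = 1.0
    for total in covered_by_locus.values():
        total = min(total, 1.0)  # guard against rounding above 1
        p_uncovered *= (1.0 - total) ** 2

    return 1.0 - p_uncovered


# Placeholder frequencies for a single hypothetical population.
example_freqs = {"A*01:01": 0.2, "A*02:01": 0.3, "B*07:02": 0.1}
example_coverage = population_coverage(example_freqs, list(example_freqs))
print(f"Estimated coverage: {example_coverage:.4f}")
```

Repeating this calculation with the allele frequencies of each population gives a per-population coverage estimate that can be compared across populations.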