11.05.2026
HLA Binding Prediction: How Alithea Bio Outperforms MHCflurry 2.0
HLA binding prediction is one of the most critical — and most limiting — steps in neoantigen discovery, TCR therapy development, and cancer vaccine design. Yet most prediction tools remain blind to peptides outside the standard 9-mer, leaving a massive gap in your analysis. At Alithea Bio, we built a better approach.
Most HLA binding prediction models are trained on limited datasets, predominantly standard 9-mer peptides and a narrow range of common HLA alleles. This creates critical blind spots for researchers, particularly for rare alleles and for peptides that are not 9-mers.
The result: researchers miss critical insights in their samples — potential neoantigens go undetected, vaccine candidates are deprioritised incorrectly, and TCR therapy programmes carry avoidable risk.
To address this gap, Alithea Bio’s team developed HLA-Compass AI — a deep learning-powered HLA binding prediction model trained on a record-breaking dataset of 1.4 million unique peptides.
The model delivers high-resolution accuracy across 124 HLA class I alleles, covering both common and rare alleles that traditional tools miss. It significantly outperforms MHCflurry 2.0 — the current benchmark — across peptide lengths and allele types, ensuring researchers extract the maximum amount of information from every sample.
| Feature | HLA-Compass AI | MHCflurry 2.0 |
|---|---|---|
| Training dataset size | 1.4 million unique peptides | Smaller curated datasets |
| HLA class I alleles covered | 124 alleles | Fewer alleles |
| Peptide length range | 8–12+ mers | Primarily 9-mer focused |
| Rare allele performance | High-resolution accuracy | Limited coverage |
| Data source | Large-scale immunopeptidomics | Mixed sources |
The foundation of HLA-Compass AI’s superior HLA binding prediction performance is the scale and quality of its training data. Alithea Bio’s platform draws from the world’s largest quantitative HLA peptidomics database — comprising over 17 million peptide identifications across 4,160 high-quality immunopeptidomics samples from healthy tissue, cancer cell lines, and tumour tissue.
This dataset powers a deep learning architecture designed to capture the nuanced sequence preferences of each HLA allele — including those that appear rarely in population studies but are critically important for individual patients.
Dr. Margarita Pertseva, Scientist in Proteomics and AI at Alithea Bio, led the development of this approach. The full technical details are documented in Alithea Bio’s exclusive white paper: “Advancing Peptide–HLA Binding Prediction with Deep Learning and High-Resolution Immunopeptidomics Data.”
The central thesis of Alithea Bio’s approach is straightforward: scale unlocks accuracy. Here is why large-scale data integration is essential for next-generation HLA binding prediction:
Most existing HLA binding prediction tools perform well on the most common HLA alleles — those that appear frequently in well-studied populations. But for patients carrying less common alleles, prediction accuracy drops significantly. HLA-Compass AI’s training on 1.4 million peptides across 124 alleles ensures meaningful coverage even for rare alleles — an essential capability for precision medicine and global population studies.
HLA class I molecules present peptides ranging from 8 to 12 or more amino acids. Focusing exclusively on 9-mers, as many tools do, means discarding a significant proportion of biologically relevant peptides. HLA-Compass AI is trained to handle the full range of peptide lengths, capturing binding interactions that 9-mer-only tools miss.
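To make the length argument concrete, here is a minimal Python sketch of how the candidate space changes when a sequence is scanned for every 8- to 12-mer instead of 9-mers alone. The sequence and the `enumerate_peptides` helper are hypothetical and purely illustrative. This is not HLA-Compass AI code, and counting candidate windows says nothing about which peptides actually bind.

```python
from collections import Counter


def enumerate_peptides(protein, min_len=8, max_len=12):
    """Yield every contiguous peptide of length min_len..max_len (inclusive)."""
    for length in range(min_len, max_len + 1):
        for start in range(len(protein) - length + 1):
            yield protein[start:start + length]


# Hypothetical 20-residue sequence, used only for illustration.
protein = "MKTAYIAKQRQISFVKSHFS"

all_lengths = list(enumerate_peptides(protein))          # 8- to 12-mers
nine_mers_only = list(enumerate_peptides(protein, 9, 9))  # 9-mers only
counts_by_length = Counter(len(p) for p in all_lengths)

print(len(nine_mers_only), "candidate 9-mers")
print(len(all_lengths), "candidates across lengths 8-12")
print(dict(counts_by_length))
```

On this toy sequence, a 9-mer-only scan produces 12 candidate windows, while the 8-to-12 scan produces 55. A pipeline restricted to 9-mers never scores the other 43.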
For neoantigen discovery, the accuracy of HLA binding prediction directly determines which candidates are prioritised for experimental validation and clinical development. False negatives — neoantigens missed by low-resolution tools — represent lost therapeutic opportunities. HLA-Compass AI’s high-resolution predictions reduce this risk substantially.
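The link between prediction scores and missed candidates can be sketched in a few lines of Python. Everything below is an assumption made for illustration: the peptide labels, the scores, the 0.5 cutoff, and the set of "validated" binders are invented placeholders, not output from HLA-Compass AI or any other tool.

```python
def select_candidates(scores, threshold):
    """Return the peptides whose predicted score meets the threshold."""
    return {pep for pep, score in scores.items() if score >= threshold}


def false_negatives(selected, validated_binders):
    """Return true binders that the selection step failed to prioritise."""
    return validated_binders - selected


# Hypothetical predicted presentation scores (placeholder labels, not real data).
scores = {"pep_A": 0.91, "pep_B": 0.84, "pep_C": 0.42, "pep_D": 0.05}

# Hypothetical set of peptides later confirmed experimentally.
validated = {"pep_A", "pep_B", "pep_C"}

selected = select_candidates(scores, threshold=0.5)
missed = false_negatives(selected, validated)
recall = len(selected & validated) / len(validated)

print("selected:", sorted(selected))
print("missed:", sorted(missed))
print(f"recall: {recall:.2f}")
```

In this toy case one of the three true binders scores below the cutoff and is never sent for validation. A predictor that scores true binders more accurately shrinks that missed set at the same cutoff.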
Alithea Bio has released an exclusive technical white paper for researchers who want to go deeper on the science behind HLA-Compass AI’s HLA binding prediction model.
The white paper — “Advancing Peptide–HLA Binding Prediction with Deep Learning and High-Resolution Immunopeptidomics Data” — explores the strategies used to improve peptide–HLA class I prediction accuracy, with a specific focus on the challenges of large-scale data integration and rare allele performance. It is essential reading for researchers working on advanced neoantigen discovery, TCR therapy development, and cancer vaccine design.
HLA binding prediction is the computational process of estimating whether a given peptide will bind to a specific HLA (Human Leukocyte Antigen) molecule and be presented on the surface of a cell for immune recognition. It is a foundational step in neoantigen discovery, cancer vaccine design, and TCR therapy development.
Larger, higher-quality training datasets allow models to learn the binding preferences of a wider range of HLA alleles — including rare ones — and across a broader range of peptide lengths. HLA-Compass AI was trained on 1.4 million unique peptides, giving it a significant accuracy advantage over tools trained on smaller datasets.
HLA-Compass AI significantly outperforms MHCflurry 2.0 across peptide lengths and allele types, particularly for rare alleles and non-9-mer peptides. The performance advantage is a direct result of training on Alithea Bio’s large-scale immunopeptidomics dataset of 1.4 million unique peptides across 124 HLA class I alleles.
HLA-Compass AI covers 124 HLA class I alleles, including both common and rare alleles that are underrepresented or absent in most existing prediction tools.
HLA-Compass AI supports peptide lengths from 8 to 12+ amino acids, going well beyond the 9-mer focus of most standard HLA binding prediction tools.
Download the white paper via alithea-bio.com. It was authored by Dr. Margarita Pertseva, Scientist in Proteomics and AI at Alithea Bio, and covers the full technical methodology behind HLA-Compass AI’s binding prediction model.
HLA-Compass AI is part of Alithea Bio’s broader immunopeptidomics platform — combining the world’s largest quantitative HLA peptide database with AI-powered tools for binding prediction, off-target toxicity assessment, and automated reanalysis.