11.05.2026

HLA Binding Prediction: How Alithea Bio Outperforms MHCflurry 2.0


HLA binding prediction is one of the most critical — and most limiting — steps in neoantigen discovery, TCR therapy development, and cancer vaccine design. Yet most prediction tools remain blind to peptides outside the standard 9-mer, leaving a massive gap in your analysis. At Alithea Bio, we built a better approach.

Contents

1 The Problem with Standard HLA Binding Prediction Tools

2 HLA-Compass AI: A New Standard for HLA Binding Prediction

3 How HLA-Compass AI’s HLA Binding Prediction Model Was Built

4 Why Scale Is the Missing Piece in HLA Binding Prediction

4.1 Rare Allele Coverage

4.2 Varied Peptide Lengths

4.3 Neoantigen Discovery

5 Access the White Paper: Advancing Peptide–HLA Binding Prediction

6 Frequently Asked Questions About HLA Binding Prediction

6.1 What is HLA binding prediction?

6.2 Why does training dataset size matter for HLA binding prediction?

6.3 How does HLA-Compass AI compare to MHCflurry 2.0?

6.4 Which HLA alleles does HLA-Compass AI cover?

6.5 What peptide lengths does HLA-Compass AI support?

6.6 How can I access the HLA-Compass AI white paper?

7 Explore HLA-Compass AI

The Problem with Standard HLA Binding Prediction Tools

Most HLA binding prediction models are trained on limited datasets — predominantly standard 9-mer peptides and a narrow range of common HLA alleles. This creates three critical blind spots for researchers:

Non-9-mer peptides are ignored. Biologically relevant peptides of 8, 10, 11, and 12+ amino acids fall outside the prediction window entirely.
Rare alleles are underrepresented. Patients with less common HLA alleles receive lower-confidence predictions — or no predictions at all.
Small training datasets limit accuracy. Without large-scale, high-resolution immunopeptidomics data, models cannot capture the full complexity of peptide-HLA interactions.

The result: researchers miss critical insights in their samples — potential neoantigens go undetected, vaccine candidates are deprioritised incorrectly, and TCR therapy programmes carry avoidable risk.

HLA-Compass AI: A New Standard for HLA Binding Prediction

To address this gap, Alithea Bio’s team developed HLA-Compass AI — a deep learning-powered HLA binding prediction model trained on a record-breaking dataset of 1.4 million unique peptides.

The model delivers high-resolution accuracy across 124 HLA class I alleles, covering both common and rare alleles that traditional tools miss. It significantly outperforms MHCflurry 2.0, a widely used benchmark, across peptide lengths and allele types, so researchers can extract more information from every sample.

Feature	HLA-Compass AI	MHCflurry 2.0
Training dataset size	1.4 million unique peptides	Smaller curated datasets
HLA class I alleles covered	124 alleles	Fewer alleles
Peptide length range	8–12+ mers	Primarily 9-mer focused
Rare allele performance	High-resolution accuracy	Limited coverage
Data source	Large-scale immunopeptidomics	Mixed sources

How HLA-Compass AI’s HLA Binding Prediction Model Was Built

The foundation of HLA-Compass AI’s superior HLA binding prediction performance is the scale and quality of its training data. Alithea Bio’s platform draws from the world’s largest quantitative HLA peptidomics database — comprising over 17 million peptide identifications across 4,160 high-quality immunopeptidomics samples from healthy tissue, cancer cell lines, and tumour tissue.

This dataset powers a deep learning architecture designed to capture the nuanced sequence preferences of each HLA allele — including those that appear rarely in population studies but are critically important for individual patients.

Dr. Margarita Pertseva, Scientist in Proteomics and AI at Alithea Bio, led the development of this approach. The full technical details are documented in Alithea Bio’s exclusive white paper: “Advancing Peptide–HLA Binding Prediction with Deep Learning and High-Resolution Immunopeptidomics Data.”

Why Scale Is the Missing Piece in HLA Binding Prediction

The central thesis of Alithea Bio’s approach is straightforward: scale unlocks accuracy. Here is why large-scale data integration is essential for next-generation HLA binding prediction:

Rare Allele Coverage

Most existing HLA binding prediction tools perform well on the most common HLA alleles — those that appear frequently in well-studied populations. But for patients carrying less common alleles, prediction accuracy drops significantly. HLA-Compass AI’s training on 1.4 million peptides across 124 alleles ensures meaningful coverage even for rare alleles — an essential capability for precision medicine and global population studies.
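
One way to check whether rare alleles are being served is to report performance per allele rather than as a single aggregate figure. The sketch below does this for binary binder calls. The function name, the record layout, and the placeholder allele labels are assumptions made for illustration, and the records are made-up toy values, not benchmark results for any tool.

```python
from collections import defaultdict

def per_allele_accuracy(records):
    """Compute accuracy separately for each allele.

    `records` is an iterable of (allele, predicted_binder, true_binder)
    tuples; this layout is an assumption for the sketch.
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for allele, predicted, actual in records:
        totals[allele] += 1
        hits[allele] += int(predicted == actual)
    return {allele: hits[allele] / totals[allele] for allele in totals}

# Toy records with placeholder allele labels, only to exercise the function.
records = [
    ("allele_common", True, True),
    ("allele_common", False, False),
    ("allele_common", True, True),
    ("allele_common", True, False),
    ("allele_rare", True, False),
    ("allele_rare", False, False),
]

accuracy = per_allele_accuracy(records)
overall = sum(p == a for _, p, a in records) / len(records)
print(accuracy, overall)
```

Reporting the per-allele figures next to the overall one keeps a weak result on a sparsely sampled allele from being averaged away by the well-sampled ones.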

Varied Peptide Lengths

HLA class I molecules present peptides ranging from 8 to 12+ amino acids. Focusing exclusively on 9-mers — as many tools do — means discarding a significant proportion of biologically relevant peptides. HLA-Compass AI’s model is trained to handle the full range of peptide lengths, capturing binding interactions that standard tools miss entirely.
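
As a rough illustration of why length coverage matters, the sketch below enumerates every candidate window of 8 to 12 residues in a short sequence and reports what share of them are 9-mers. The function name and the example sequence are illustrative assumptions, not part of HLA-Compass AI. The counts describe candidate windows only, not the length distribution of naturally presented ligands.

```python
from collections import Counter

def enumerate_peptides(protein, lengths=range(8, 13)):
    """Yield every contiguous peptide of the requested lengths."""
    for k in lengths:
        for start in range(len(protein) - k + 1):
            yield protein[start:start + k]

# Illustrative 25-residue sequence; any protein string would do.
protein = "MTEYKLVVVGAGGVGKSALTIQLIQ"

all_peptides = list(enumerate_peptides(protein))
by_length = Counter(len(p) for p in all_peptides)
nine_mer_share = by_length[9] / len(all_peptides)

print(dict(by_length))
print(f"9-mers: {by_length[9]} of {len(all_peptides)} candidate windows")
```

For this sequence, 17 of the 80 candidate windows are 9-mers, so a 9-mer-only workflow would leave the other 63 windows unscored.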

Neoantigen Discovery

For neoantigen discovery, the accuracy of HLA binding prediction directly determines which candidates are prioritised for experimental validation and clinical development. False negatives — neoantigens missed by low-resolution tools — represent lost therapeutic opportunities. HLA-Compass AI’s high-resolution predictions reduce this risk substantially.
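
To make the candidate pool concrete, the following sketch lists every 8- to 12-residue peptide that spans a single point mutation. This is the set a binding predictor would then score and rank. It is a minimal sketch under stated assumptions: the helper name, the 1-based position convention, and the example sequence and substitution are illustrative choices, and no real predictor is called.

```python
def mutant_peptides(wild_type, position, new_residue, lengths=range(8, 13)):
    """Return peptides of the given lengths that contain the mutated residue.

    `position` is 1-based, matching the usual way substitutions are written.
    """
    index = position - 1
    mutant = wild_type[:index] + new_residue + wild_type[index + 1:]
    peptides = []
    for k in lengths:
        # Window starts that keep the mutated residue inside the peptide
        # and the peptide inside the sequence.
        first = max(0, index - k + 1)
        last = min(index, len(mutant) - k)
        for start in range(first, last + 1):
            peptides.append(mutant[start:start + k])
    return peptides

# Illustrative sequence and substitution (glycine at position 12 to aspartate).
wild_type = "MTEYKLVVVGAGGVGKSALTIQLIQ"
candidates = mutant_peptides(wild_type, position=12, new_residue="D")
nine_mers = [p for p in candidates if len(p) == 9]

print(len(candidates), len(nine_mers))
```

Here 50 peptides span the substitution and only 9 of them are 9-mers. A workflow restricted to 9-mers would never score the remaining 41, which is one way false negatives arise before any model accuracy is considered.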

Access the White Paper: Advancing Peptide–HLA Binding Prediction

Alithea Bio has released an exclusive technical white paper for researchers who want to go deeper on the science behind HLA-Compass AI’s HLA binding prediction model.

The white paper — “Advancing Peptide–HLA Binding Prediction with Deep Learning and High-Resolution Immunopeptidomics Data” — explores the strategies used to improve peptide–HLA class I prediction accuracy, with a specific focus on the challenges of large-scale data integration and rare allele performance. It is essential reading for researchers working on advanced neoantigen discovery, TCR therapy development, and cancer vaccine design.


Frequently Asked Questions About HLA Binding Prediction

What is HLA binding prediction?

HLA binding prediction is the computational process of estimating whether a given peptide will bind to a specific HLA (Human Leukocyte Antigen) molecule and be presented on the surface of a cell for immune recognition. It is a foundational step in neoantigen discovery, cancer vaccine design, and TCR therapy development.

Why does training dataset size matter for HLA binding prediction?

Larger, higher-quality training datasets allow models to learn the binding preferences of a wider range of HLA alleles — including rare ones — and across a broader range of peptide lengths. HLA-Compass AI was trained on 1.4 million unique peptides, giving it a significant accuracy advantage over tools trained on smaller datasets.

How does HLA-Compass AI compare to MHCflurry 2.0?

HLA-Compass AI significantly outperforms MHCflurry 2.0 across peptide lengths and allele types, particularly for rare alleles and non-9-mer peptides. The performance advantage is a direct result of training on Alithea Bio’s large-scale immunopeptidomics dataset of 1.4 million unique peptides across 124 HLA class I alleles.

Which HLA alleles does HLA-Compass AI cover?

HLA-Compass AI covers 124 HLA class I alleles, including both common and rare alleles that are underrepresented or absent in most existing prediction tools.

What peptide lengths does HLA-Compass AI support?

HLA-Compass AI supports peptide lengths from 8 to 12+ amino acids, going well beyond the 9-mer focus of most standard HLA binding prediction tools.

How can I access the HLA-Compass AI white paper?

Download the white paper via alithea-bio.com. It was authored by Dr. Margarita Pertseva, Scientist in Proteomics and AI at Alithea Bio, and covers the full technical methodology behind HLA-Compass AI’s binding prediction model.

Explore HLA-Compass AI

HLA-Compass AI is part of Alithea Bio’s broader immunopeptidomics platform — combining the world’s largest quantitative HLA peptide database with AI-powered tools for binding prediction, off-target toxicity assessment, and automated reanalysis.