CHIMERA-Bench v1.0
A unified benchmark for epitope-specific antibody CDR sequence-structure co-design.
Paper: CHIMERA-Bench: A Benchmark Dataset for Epitope-Specific Antibody Design (ICLR 2026 GEM Workshop)
Code: github.com/mansoorbaloch/chimera-bench
Dataset Summary
| Property | Value |
|---|---|
| Complexes | 2,922 |
| PDB structures | 2,721 |
| Pre-computed features | 2,941 .pt files |
| Splits | 3 (epitope-group, antigen-fold, temporal) |
| Numbering schemes | IMGT, Chothia |
| Contact cutoff | 4.5 Å |
| Resolution cutoff | 4.0 Å |
| Baselines evaluated | 11 methods, 6 paradigms |
Download
This dataset contains binary PyTorch files (.pt) and PDB structures that require manual download. Use the HuggingFace CLI:
export CHIMERA_DATA_ROOT=/path/to/chimera-bench-v1.0
huggingface-cli download mansoorbaloch/chimera-bench --repo-type dataset --local-dir "$CHIMERA_DATA_ROOT"
Directory Structure
chimera-bench-v1.0/
  README.md
  metadata/
    final_summary.csv           # 2,922 complexes with 32 columns
    excluded_complexes.csv      # 59 excluded complexes with reasons
    antibody_sequences.fasta    # VH+VL sequences for all complexes
    contamination_audit.json    # PLM training data overlap analysis
  splits/
    epitope_group.json          # Primary split (2338/292/292)
    antigen_fold.json           # Fold-based generalization (2338/292/292)
    temporal.json               # Prospective evaluation (2337/292/293)
  complex_features/             # Per-complex PyTorch tensors (2,941 files)
    {complex_id}.pt
  structures/                   # PDB structure files (2,721 files)
    {pdb}.pdb
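After downloading, it can help to confirm that the local copy matches the layout listed above. The following is a minimal sketch that only checks for the paths named in that listing; the helper `missing_entries` is hypothetical and not part of the CHIMERA-Bench code.

```python
from pathlib import Path

# Relative paths taken from the directory listing above.
EXPECTED = [
    "metadata/final_summary.csv",
    "metadata/excluded_complexes.csv",
    "metadata/antibody_sequences.fasta",
    "metadata/contamination_audit.json",
    "splits/epitope_group.json",
    "splits/antigen_fold.json",
    "splits/temporal.json",
    "complex_features",
    "structures",
]


def missing_entries(root):
    """Return the expected relative paths that do not exist under `root`."""
    root = Path(root)
    return [rel for rel in EXPECTED if not (root / rel).exists()]
```

Calling `missing_entries(os.environ["CHIMERA_DATA_ROOT"])` should return an empty list for a complete download; any returned path points at a part of the dataset that is missing.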
Complex Features Format
Each .pt file is a Python dict with:
Sequences
- complex_id: str -- unique identifier ({pdb}{Hchain}{Lchain}_{Agchain})
- heavy_sequence, light_sequence, antigen_sequence: str -- one-letter AA
Coordinates
- heavy_atom14_coords: float32 (N_h, 14, 3) -- 14-atom representation
- heavy_atom14_mask: bool (N_h, 14) -- valid atom flags
- heavy_ca_coords: float32 (N_h, 3) -- CA-only coordinates
- Same for light_* and antigen_*
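Because the 14-atom arrays are padded, the mask has to be applied before any distance computation. The sketch below computes the minimum atom-atom distance for every residue pair between two chains and thresholds it at the 4.5 Å contact cutoff from the summary table. The function name `min_residue_distances` and the toy tensors are hypothetical, and whether the stored annotations were derived with exactly this criterion is an assumption.

```python
import torch


def min_residue_distances(coords_a, mask_a, coords_b, mask_b):
    """Minimum valid atom-atom distance per residue pair, shape (N_a, N_b)."""
    n_a, n_b = coords_a.shape[0], coords_b.shape[0]
    # Flatten residues and atom slots into one atom axis: (N * 14, 3).
    d = torch.cdist(coords_a.reshape(-1, 3), coords_b.reshape(-1, 3))
    # A pair of atoms counts only if both slots are flagged valid.
    valid = mask_a.reshape(-1, 1) & mask_b.reshape(1, -1)
    d = d.masked_fill(~valid, float("inf"))
    # Restore the (residue, atom, residue, atom) layout and reduce over atoms.
    return d.reshape(n_a, 14, n_b, 14).amin(dim=(1, 3))


# Toy tensors invented for illustration: only atom slot 0 is valid,
# and the padded slots sit at the origin.
ab_coords = torch.zeros(2, 14, 3)
ab_mask = torch.zeros(2, 14, dtype=torch.bool)
ab_coords[1, 0] = torch.tensor([10.0, 0.0, 0.0])
ab_mask[:, 0] = True

ag_coords = torch.zeros(1, 14, 3)
ag_mask = torch.zeros(1, 14, dtype=torch.bool)
ag_coords[0, 0] = torch.tensor([3.0, 0.0, 0.0])
ag_mask[0, 0] = True

dist = min_residue_distances(ab_coords, ab_mask, ag_coords, ag_mask)
contacts = (dist <= 4.5).nonzero().tolist()  # [antibody index, antigen index] pairs
```

In the toy data the second antibody residue is 7 Å from the antigen residue, but its padded atoms at the origin are only 3 Å away; without the mask it would be reported as a false contact.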
Annotations
- epitope_residues: list of (chain, resid, resname) tuples
- paratope_residues: list of (chain, resid, resname) tuples
- contact_pairs: list of (ab_chain, ab_resid, ab_resname, ag_chain, ag_resid, ag_resname, distance)
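The contact tuples carry enough information to rebuild residue sets on both sides of the interface. A minimal sketch, assuming the tuple layout described above; the helper `residues_from_contacts` and the toy contacts are hypothetical, and it is an assumption (not stated here) that the stored epitope and paratope lists coincide exactly with the sets derived this way.

```python
def residues_from_contacts(contact_pairs, cutoff=4.5):
    """Collect antibody-side and antigen-side residues from contacts within `cutoff`."""
    paratope, epitope = set(), set()
    for ab_chain, ab_resid, ab_resname, ag_chain, ag_resid, ag_resname, dist in contact_pairs:
        if dist <= cutoff:
            paratope.add((ab_chain, ab_resid, ab_resname))
            epitope.add((ag_chain, ag_resid, ag_resname))
    return paratope, epitope


# Toy contacts invented for illustration (not taken from the dataset).
toy_contacts = [
    ("H", 101, "TYR", "A", 45, "LYS", 3.2),
    ("H", 101, "TYR", "A", 46, "GLU", 4.1),
    ("L", 32, "SER", "A", 45, "LYS", 3.9),
]
paratope, epitope = residues_from_contacts(toy_contacts)
```

Using sets removes the duplicates that arise when one residue takes part in several contacts, as residue 45 of chain A does in the toy data.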
Numbering
- numbering: dict with imgt and chothia sub-dicts, each containing heavy and light lists of (resnum, icode, aa) tuples
- cdr_masks: dict with imgt and chothia sub-dicts, each containing heavy and light int lists (-1=framework; heavy: 0=H1, 1=H2, 2=H3; light: 3=L1, 4=L2, 5=L3)
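The integer labels make it straightforward to pull out individual CDR sequences. The sketch below groups residues by label using the encoding given above. It assumes the mask is aligned one-to-one with the corresponding chain sequence, which is not stated explicitly here, and the helper `cdr_sequences` and the toy inputs are hypothetical.

```python
# Label encoding as documented for cdr_masks.
CDR_NAMES = {0: "H1", 1: "H2", 2: "H3", 3: "L1", 4: "L2", 5: "L3"}


def cdr_sequences(sequence, mask):
    """Map each CDR name to its residues; framework positions (-1) are skipped."""
    if len(sequence) != len(mask):
        raise ValueError("sequence and mask must have the same length")
    grouped = {}
    for aa, label in zip(sequence, mask):
        if label != -1:
            grouped.setdefault(CDR_NAMES[label], []).append(aa)
    return {name: "".join(residues) for name, residues in grouped.items()}


# Toy heavy chain and mask invented for illustration.
toy_sequence = "AAGFTAAISAAARDAA"
toy_mask = [-1, -1, 0, 0, 0, -1, -1, 1, 1, -1, -1, 2, 2, 2, -1, -1]
toy_cdrs = cdr_sequences(toy_sequence, toy_mask)
```

With a loaded complex, the same call would take `feat['heavy_sequence']` and `feat['cdr_masks']['imgt']['heavy']`, provided the alignment assumption holds.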
Surface Features
- ag_surface_points: float32 (128, 3) -- sampled antigen surface points
- ag_surface_normals: float32 (128, 3)
- ag_surface_curvatures: float32 (128, 2) -- mean and Gaussian curvature
- ag_surface_chemical_feats: float32 (128, 6) -- hydropathy, charge, H-bond donor/acceptor, aromaticity, polarity
- Same for heavy_surface_* and light_surface_*
Splits
| Split | Train | Val | Test | Generalization Axis |
|---|---|---|---|---|
| epitope_group | 2,338 | 292 | 292 | Unseen epitope patterns |
| antigen_fold | 2,338 | 292 | 292 | Unseen antigen folds |
| temporal | 2,337 | 292 | 293 | Prospective (by deposition date) |
Each split JSON has keys train, val, test mapping to lists of complex_id strings.
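A quick sanity check on a loaded split is to confirm that no complex_id appears in more than one subset and that the sizes match the table above. A minimal sketch, assuming the train/val/test key layout just described; the helper `check_split` and the toy IDs are hypothetical.

```python
from itertools import combinations

SUBSETS = ("train", "val", "test")


def check_split(split):
    """Return subset sizes; raise ValueError if any complex_id is shared."""
    for a, b in combinations(SUBSETS, 2):
        overlap = set(split[a]) & set(split[b])
        if overlap:
            raise ValueError(f"{len(overlap)} complex_id(s) shared by {a} and {b}")
    return {name: len(split[name]) for name in SUBSETS}


# Toy split with placeholder IDs, invented for illustration.
toy_split = {"train": ["c1", "c2", "c3"], "val": ["c4"], "test": ["c5"]}
sizes = check_split(toy_split)
```

For the real files, pass the dict returned by `json.load` on one of the split JSONs.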
Evaluation Metrics
| Group | Metrics |
|---|---|
| Sequence quality | AAR, CAAR, PPL |
| Structural accuracy | RMSD (Kabsch-aligned CA), TM-score |
| Binding interface | Fnat, iRMSD, DockQ |
| Epitope specificity | EpiF1 (precision, recall, F1) |
| Designability | n_liabilities (NG, DG, DS, DD, NS, NT, M motifs) |
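To make three of these metrics concrete, the sketch below uses common textbook-style definitions: AAR as the fraction of identical positions, precision/recall/F1 over residue sets, and a liability count over the motifs listed in the table. These are assumptions for illustration; the benchmark's own implementation (for example which positions AAR is restricted to, or how overlapping motifs are counted) may differ, and all function names here are hypothetical.

```python
def aar(designed, native):
    """Amino acid recovery: fraction of positions where the two sequences agree."""
    if len(designed) != len(native) or not native:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(d == n for d, n in zip(designed, native)) / len(native)


def set_f1(predicted, reference):
    """Precision, recall and F1 between two collections of residues."""
    predicted, reference = set(predicted), set(reference)
    tp = len(predicted & reference)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(reference) if reference else 0.0
    f1 = 2 * precision * recall / (precision + recall) if tp else 0.0
    return precision, recall, f1


# Motifs as listed in the metrics table.
LIABILITY_MOTIFS = ("NG", "DG", "DS", "DD", "NS", "NT", "M")


def count_liabilities(sequence):
    """Count motif occurrences at every start position (overlaps are counted)."""
    return sum(
        sequence.startswith(motif, i)
        for i in range(len(sequence))
        for motif in LIABILITY_MOTIFS
    )
```

For example, `aar("ARDYW", "ARGYW")` gives 0.8 because four of five positions match, and `count_liabilities("ANGDDM")` gives 3 (one NG, one DD, one M).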
Quick Start
import torch, json, pandas as pd
# Load metadata
summary = pd.read_csv("metadata/final_summary.csv")
# Load a split
with open("splits/epitope_group.json") as f:
    split = json.load(f)
print(f"Train: {len(split['train'])}, Val: {len(split['val'])}, Test: {len(split['test'])}")
# Load a complex
feat = torch.load(f"complex_features/{split['test'][0]}.pt", weights_only=False)
print(feat['complex_id'], feat['heavy_sequence'][:20], "...")
print(f"Epitope residues: {len(feat['epitope_residues'])}")
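# CDR-H3 positions are those labeled 2 in the IMGT heavy-chain mask
h3_positions = [i for i, label in enumerate(feat['cdr_masks']['imgt']['heavy']) if label == 2]
print(f"CDR-H3 (IMGT): {len(h3_positions)} positions where cdr_masks['imgt']['heavy'] == 2")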
Citation
@inproceedings{
ahmed2026chimerabench,
title={{CHIMERA}-Bench: A Benchmark Dataset for Epitope-Specific Antibody Design},
author={Mansoor Ahmed and Nadeem Taj and Imdad Ullah Khan and Hemanth Venkateswara and Murray Patterson},
booktitle={ICLR 2026 Workshop on Generative and Experimental Perspectives for Biomolecular Design},
year={2026},
url={https://openreview.net/forum?id=PyZvVIJbSy}
}
License
Data: CC-BY 4.0. Code: MIT.