---
license: apache-2.0
metrics:
- accuracy
tags:
- denovo
- antibody
- sequence
- NLP
---

# AbNovoBench: A Comprehensive, Standardized, and Reliable Benchmarking System for Evaluating Monoclonal Antibody De Novo Sequencing Analysis

This repository contains a curated collection of state-of-the-art de novo peptide sequencing models specifically benchmarked for monoclonal antibody (mAb) sequencing from mass spectrometry data. AbNovoBench provides the largest high-quality dataset to date, comprising 1,638,248 peptide-spectrum matches derived from 131 mAbs across six species and 11 proteases, supplemented by eight mAbs with known sequence information for assessing full-length reconstruction.

## Models

This repository includes the following models that have been comprehensively evaluated in our benchmark:

### **AdaNovo**

- **Model**: `AdaNovo/epoch=2-step=170451.ckpt`
- **Description**: Adaptive de novo peptide sequencing model with enhanced accuracy for complex spectra
- **Repository**: [https://github.com/Westlake-OmicsAI/adanovo_v1](https://github.com/Westlake-OmicsAI/adanovo_v1)

### **CasaNovo**

- **Models**:
  - `CasaNovoV1/epoch=10-step=600000.ckpt` (V1)
  - `CasaNovoV2/epoch=7-step=400000.ckpt` (V2)
- **Description**: High-throughput de novo peptide sequencing models with improved performance
- **Repository**: [https://github.com/Noble-Lab/casanovo](https://github.com/Noble-Lab/casanovo)

### **ContraNovo**

- **Model**: `ContraNovo/ControNovo.ckpt`
- **Description**: Contrastive learning-based de novo peptide sequencing model
- **Repository**: [https://github.com/BEAM-Labs/ContraNovo](https://github.com/BEAM-Labs/ContraNovo)

### **DeepNovo**

- **Model**: `DeepNovo/translate.ckpt-283400.*`
- **Description**: Deep learning-based de novo peptide sequencing with attention mechanisms
- **Repository**: [https://github.com/nh2tran/DeepNovo](https://github.com/nh2tran/DeepNovo)

### **InstaNovo**

- **Model**: `InstaNovo/epoch=59-step=1700000.ckpt`
- **Description**: Real-time de novo peptide sequencing model optimized for speed and accuracy
- **Repository**: [https://github.com/instadeepai/InstaNovo](https://github.com/instadeepai/InstaNovo)

### **PepNet**

- **Model**: `PepNet/model.h5`
- **Description**: Neural network-based peptide sequence prediction model
- **Repository**: [https://github.com/lkytal/pepnet](https://github.com/lkytal/pepnet)

### **PGPointNovo**

- **Models**:
  - `PGPointNovo/backward_deepnovo.pth`
  - `PGPointNovo/forward_deepnovo.pth`
- **Description**: Point-based graph neural network for de novo peptide sequencing
- **Repository**: [https://github.com/shallFun4Learning/PGPointNovo](https://github.com/shallFun4Learning/PGPointNovo)

### **pi-HelixNovo**

- **Model**: `pi-HelixNovo/epoch=14-step=800000.ckpt`
- **Description**: Helix-inspired architecture for peptide sequence prediction
- **Repository**: [https://github.com/PHOENIXcenter/pi-HelixNovo](https://github.com/PHOENIXcenter/pi-HelixNovo)

### **pi-PrimeNovo**

- **Model**: `pi-PrimeNovo/model_massive.ckpt`
- **Description**: Prime-based de novo peptide sequencing model with massive training
- **Repository**: [https://github.com/PHOENIXcenter/pi-PrimeNovo](https://github.com/PHOENIXcenter/pi-PrimeNovo)

### **PointNovo**

- **Models**:
  - `PointNovo/backward_deepnovo.pth`
  - `PointNovo/forward_deepnovo.pth`
- **Description**: Point cloud-based approach for de novo peptide sequencing
- **Repository**: [https://github.com/irleader/PointNovo](https://github.com/irleader/PointNovo)

### **SMSNet**

- **Model**: `SMSNet/translate.ckpt-680000.*`
- **Description**: Sequence-to-sequence model for mass spectrometry-based peptide sequencing
- **Repository**: [https://github.com/cmb-chula/SMSNet](https://github.com/cmb-chula/SMSNet)

## Usage

For detailed usage instructions, implementation examples, and model-specific documentation, please refer to the original repositories listed above for each model. Each repository contains:

- **Installation instructions**
- **Model loading examples**
- **Training procedures**
- **Inference code**
- **Performance benchmarks**
- **Dataset information**

This collection serves as a centralized repository of pre-trained models for easy access and comparison.
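
As a starting point for fetching checkpoints, the following is a minimal sketch, assuming the files are hosted on the Hugging Face Hub (which the metadata header of this card suggests). `REPO_ID` is a hypothetical placeholder, and `CHECKPOINTS`, `patterns_for`, and `download_checkpoints` are illustrative names, not part of any model's API. The checkpoint paths are copied from the Models section. Loading a downloaded checkpoint is model-specific and is covered by each original repository.

```python
from typing import Dict, List

# Checkpoint paths exactly as listed in the Models section of this README.
CHECKPOINTS: Dict[str, List[str]] = {
    "AdaNovo": ["AdaNovo/epoch=2-step=170451.ckpt"],
    "CasaNovo": [
        "CasaNovoV1/epoch=10-step=600000.ckpt",
        "CasaNovoV2/epoch=7-step=400000.ckpt",
    ],
    "ContraNovo": ["ContraNovo/ControNovo.ckpt"],
    "DeepNovo": ["DeepNovo/translate.ckpt-283400.*"],
    "InstaNovo": ["InstaNovo/epoch=59-step=1700000.ckpt"],
    "PepNet": ["PepNet/model.h5"],
    "PGPointNovo": [
        "PGPointNovo/backward_deepnovo.pth",
        "PGPointNovo/forward_deepnovo.pth",
    ],
    "pi-HelixNovo": ["pi-HelixNovo/epoch=14-step=800000.ckpt"],
    "pi-PrimeNovo": ["pi-PrimeNovo/model_massive.ckpt"],
    "PointNovo": [
        "PointNovo/backward_deepnovo.pth",
        "PointNovo/forward_deepnovo.pth",
    ],
    "SMSNet": ["SMSNet/translate.ckpt-680000.*"],
}

# Hypothetical placeholder: replace with the actual Hub repository id.
REPO_ID = "<namespace>/<repo-name>"


def patterns_for(model: str) -> List[str]:
    """Return the file patterns belonging to one model in this collection."""
    try:
        return CHECKPOINTS[model]
    except KeyError:
        known = ", ".join(sorted(CHECKPOINTS))
        raise ValueError(f"Unknown model {model!r}; expected one of: {known}") from None


def download_checkpoints(model: str, repo_id: str = REPO_ID) -> str:
    """Download only the files of one model and return the local snapshot folder."""
    # Imported lazily so the mapping above is usable without huggingface_hub.
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=repo_id, allow_patterns=patterns_for(model))
```

Using `allow_patterns` keeps the download limited to one model and also handles the wildcard entries (`translate.ckpt-283400.*`), which expand to several files.
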

## Benchmark Results

Our comprehensive evaluation of 13 deep learning-based de novo peptide sequencing algorithms across six metric categories revealed:

### **Peptide Sequencing Performance**

- **Transformer-based models** (ContraNovo, Casanovo V1, and InstaNovo) showed superior performance
- **Precision and recall**: 0.73–0.79 for amino acids and 0.60–0.67 for peptides
- **High efficacy** in detecting post-translational modifications
- **Excellent generalization** across diverse enzymes and species
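
To make the amino-acid and peptide-level precision and recall figures concrete, here is a simplified sketch of how such metrics can be computed. It is not the benchmark's scoring code: it compares residues position by position with exact string equality, whereas published evaluation scripts typically match residues by mass within a tolerance and treat isobaric residues such as I and L as equivalent. The function names are illustrative.

```python
from typing import Dict, Optional, Sequence


def count_matching_residues(predicted: str, reference: str) -> int:
    """Count aligned positions carrying the same residue (simplified matching)."""
    return sum(p == r for p, r in zip(predicted, reference))


def sequencing_metrics(
    predictions: Sequence[Optional[str]], references: Sequence[str]
) -> Dict[str, float]:
    """Compute amino-acid and peptide precision/recall over paired spectra.

    A prediction of None (or "") means the model produced no sequence
    for that spectrum; it lowers recall but not precision.
    """
    matched = predicted_total = reference_total = 0
    correct_peptides = predicted_peptides = 0

    for pred, ref in zip(predictions, references):
        reference_total += len(ref)
        if not pred:
            continue
        predicted_peptides += 1
        predicted_total += len(pred)
        matched += count_matching_residues(pred, ref)
        correct_peptides += int(pred == ref)

    def ratio(numerator: int, denominator: int) -> float:
        return numerator / denominator if denominator else 0.0

    return {
        "aa_precision": ratio(matched, predicted_total),
        "aa_recall": ratio(matched, reference_total),
        "peptide_precision": ratio(correct_peptides, predicted_peptides),
        "peptide_recall": ratio(correct_peptides, len(references)),
    }
```

For example, with predictions `["PEPTIDE", "PEPTLDE", None]` against references `["PEPTIDE", "PEPTIDE", "ACDK"]`, 13 of 14 predicted residues match, one of two predicted peptides is fully correct, and one of three spectra is recovered.
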

### **Assembly Performance**

- **Template-guided Fusion assembler** achieved error-free reconstruction of all chains and complementarity-determining regions (CDRs)
- **Superior coverage, accuracy, and gap minimization** when using high-quality peptide reads from six algorithms
- **Comprehensive evaluation** across coverage depth and assembly score metrics
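
Coverage depth can be illustrated with a small sketch that maps peptide reads onto a known chain sequence. This assumes exact substring matching and illustrative function names; the benchmark's assemblers and scoring may align reads differently (for example, allowing mismatches), so treat it as a conceptual aid only.

```python
from typing import List


def coverage_depth(chain: str, peptides: List[str]) -> List[int]:
    """Return, for each chain position, how many peptide reads cover it."""
    depth = [0] * len(chain)
    for peptide in peptides:
        if not peptide:
            continue  # skip empty reads
        start = chain.find(peptide)
        while start != -1:
            for i in range(start, start + len(peptide)):
                depth[i] += 1
            # Look for further (possibly overlapping) occurrences.
            start = chain.find(peptide, start + 1)
    return depth


def coverage_fraction(depth: List[int]) -> float:
    """Fraction of chain positions covered by at least one read."""
    return sum(d > 0 for d in depth) / len(depth) if depth else 0.0
```

For a toy chain `"EVQLVESGGG"` and reads `["EVQL", "QLVE", "GGG", "XYZ"]`, the serine at index 6 is the only uncovered position, so the coverage fraction is 0.9; positions with depth 0 correspond to assembly gaps.
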

## Research Applications

AbNovoBench is specifically designed for monoclonal antibody research and applications:

- **Antibody Discovery**: De novo sequencing of monoclonal antibodies from mass spectrometry data
- **Therapeutic Development**: Characterization of antibody sequences for drug development
- **Clinical Diagnostics**: Antibody sequencing for diagnostic applications
- **Proteomics Research**: Standardized benchmarking for antibody-specific algorithm development

## Citation

If you use AbNovoBench in your research, please cite our paper:

```bibtex
@misc{jiang2025abnovobench,
  title = {AbNovoBench: A Comprehensive, Standardized, and Reliable Benchmarking System for Evaluating Monoclonal Antibody De Novo Sequencing Analysis},
  author = {Wenbin Jiang and Ling Luo and Lihong Huang and Jin Xiao and Zihan Lin and Yijie Qiu and Jiying Wang and Ouyang Hu and Sainan Zhang and Mengsha Tong and Ningshao Xia and Yueting Xiong and Quan Yuan and Rongshan Yu},
  year = {2025},
  howpublished = {https://github.com/dumbgoos/AbNovoBench}
}
```

## Contributing

We welcome contributions to improve the models or add new ones. Please:

1. Fork the repository
2. Create a feature branch
3. Make your changes
4. Submit a pull request

## Acknowledgments

We thank the original authors of each model for their contributions to the field of de novo peptide sequencing. This collection represents the collaborative effort of the proteomics community. AbNovoBench is available at [https://abnovobench.com](https://abnovobench.com) and provides a scalable, community-driven platform enriched with an extensive antibody MS data resource to accelerate antibody-specific algorithm development and enhance proteomic reproducibility.

## Contact

For questions or support, please open an issue on this repository or contact the maintainers.

---

**Note**: These models are provided for research purposes. Please ensure you have the appropriate licenses and permissions for your specific use case.