PepForge: Model Weights
Pre-trained model weights for PepForge, a hierarchical deep learning framework for generating peptides with special connections using HELM notation.
Architecture
PepForge uses a three-stage cascade (Layout → Content → Connection) for generation and a 4-model MCC-weighted ensemble for AMP activity prediction. The prediction ensemble was retrained on 2026-04-28/29 on CLSI MIC-only DBAASP data, with members selected by validation MCC; the test set was never consulted at the selection step.
Generation Models
| Stage | File | Architecture | Test PPL / Metric |
|---|---|---|---|
| Layout | Generation/Layout/260210_GPT.pt | GPT (d=64, L=1) | PPL = 2.24 |
| Content (autoregressive, default) | Generation/Content/GPT_L_260226.pt | GPT (d=768, L=12) | PPL = 6.61 |
| Content (masked, infilling) | Generation/Content/BERT_L_260301.pt | BERT (d=768, L=12) | PPL = 9.15 |
| Connection | Generation/Connection/GAT_L_260226.pt | GAT (d=768, L=6) | Exist F1 = 0.971, Type Macro-F1 = 0.912 |
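The PPL values in the table are perplexities. As a minimal sketch, assuming the standard definition (the exponential of the mean per-token negative log-likelihood, with natural-log inputs) rather than the repository's own evaluation code, the relationship looks like this. The function name `perplexity` and the sample log-probabilities are illustrative, not taken from PepForge.

```python
import math


def perplexity(token_log_probs):
    """Perplexity = exp(mean negative log-likelihood), natural-log inputs."""
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    mean_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(mean_nll)


# Illustrative input only: a model that gives every token probability 0.5
# has a perplexity of about 2, i.e. it is "choosing between two options".
uniform_half = [math.log(0.5)] * 8
print(perplexity(uniform_half))
```

Under the same assumption, the Layout model's PPL of 2.24 corresponds to a mean negative log-likelihood of roughly 0.81 nats per token.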
Prediction Models: AMP Ensemble (260428/260429)
Each member is the best of its (encoding, model-type) quadrant by validation MCC.
| File | Type | Encoding | Test Acc | Test Macro-F1 | Test MCC | Weight (val MCC) |
|---|---|---|---|---|---|---|
| Prediction/AMP/LSTM_L_260428_SMILES.pt | LLM | SMILES | 0.7167 | 0.5663 | 0.5871 | 0.6121 |
| Prediction/AMP/LSTM_M_260429_HELM.pt | LLM | HELM | 0.7058 | 0.5811 | 0.5717 | 0.6021 |
| Prediction/AMP/GCN_L_260429_HELM.pt | GNN | HELM | 0.6355 | 0.5047 | 0.4844 | 0.5136 |
| Prediction/AMP/GCN_L_260428_SMILES.pt | GNN | SMILES | 0.6165 | 0.4478 | 0.4630 | 0.4791 |
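For readers unfamiliar with the MCC columns above, the following is a minimal sketch of the Matthews correlation coefficient in its multi-class (Gorodkin) form, which reduces to the usual binary formula when there are two classes. This README does not state how many activity classes the models predict or which implementation was used, so the function name `multiclass_mcc` and the toy labels are assumptions for illustration only.

```python
import math
from collections import Counter


def multiclass_mcc(y_true, y_pred):
    """Matthews correlation coefficient, multi-class (Gorodkin) form."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    s = len(y_true)                                   # total samples
    c = sum(t == p for t, p in zip(y_true, y_pred))   # correct predictions
    t_k = Counter(y_true)                             # true count per class
    p_k = Counter(y_pred)                             # predicted count per class
    numerator = c * s - sum(t_k[k] * p_k[k] for k in t_k)
    denominator = math.sqrt(
        (s * s - sum(v * v for v in p_k.values()))
        * (s * s - sum(v * v for v in t_k.values()))
    )
    # By convention, return 0 when one side has no variation (zero denominator).
    return numerator / denominator if denominator else 0.0


# Toy three-class labels, purely illustrative
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 0, 1, 2, 2, 2]
print(round(multiclass_mcc(y_true, y_pred), 4))
```

MCC ranges from -1 to 1 and stays near 0 for a classifier that ignores its input, which makes it less flattering than accuracy on imbalanced data.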
Held-out ensemble performance (test split, 2,206 samples; full report in ensemble_test_eval.json):
| Strategy | Acc | Macro-F1 | Weighted-F1 | MCC |
|---|---|---|---|---|
| soft_vote (uniform 0.25 each) | 0.7393 | 0.6049 | 0.7377 | 0.6175 |
| weighted_vote (val-MCC weights, default) | 0.7421 | 0.6092 | 0.7403 | 0.6216 |
The weighted ensemble exceeds the best single member (LSTM/L SMILES, MCC 0.5871) by +0.0345.
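The weighted_vote strategy can be sketched as a weighted average of each member's class probabilities. The weights below are the validation-MCC values from the member table. Everything else is an assumption made for illustration: the function names, the normalization of weights to sum to 1, the use of probabilities rather than logits, and the two-class example inputs. The shipped ensemble_config.json and the repository code remain the authority on the actual procedure.

```python
# Validation-MCC weights copied from the ensemble member table
VAL_MCC_WEIGHTS = {
    "LSTM_L_260428_SMILES": 0.6121,
    "LSTM_M_260429_HELM": 0.6021,
    "GCN_L_260429_HELM": 0.5136,
    "GCN_L_260428_SMILES": 0.4791,
}


def weighted_soft_vote(member_probs, weights):
    """Average per-class probabilities using normalized member weights."""
    total = sum(weights[name] for name in member_probs)
    n_classes = len(next(iter(member_probs.values())))
    combined = [0.0] * n_classes
    for name, probs in member_probs.items():
        if len(probs) != n_classes:
            raise ValueError(f"{name}: expected {n_classes} class probabilities")
        for k, p in enumerate(probs):
            combined[k] += weights[name] / total * p
    return combined


def predict_class(member_probs, weights):
    """Index of the class with the highest combined probability."""
    combined = weighted_soft_vote(member_probs, weights)
    return max(range(len(combined)), key=combined.__getitem__)


# Hypothetical per-member probabilities for one peptide (two classes)
example_probs = {
    "LSTM_L_260428_SMILES": [0.4, 0.6],
    "LSTM_M_260429_HELM": [0.4, 0.6],
    "GCN_L_260429_HELM": [0.6, 0.4],
    "GCN_L_260428_SMILES": [0.6, 0.4],
}
print(weighted_soft_vote(example_probs, VAL_MCC_WEIGHTS))
print(predict_class(example_probs, VAL_MCC_WEIGHTS))
```

In this made-up case the members split two against two, and the higher-weighted LSTM members decide the outcome. Passing equal weights for all four members instead corresponds to the uniform soft_vote row.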
Quick Start
git clone https://github.com/wqx1999/PepForge.git
cd PepForge
python install.py # Installs env + downloads all models & data
# Generation + AMP prediction in one cascade call
python Pipelines/Inference.py --num_samples 100 --predict amp
For details, see the GitHub repository.
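When the pipeline needs to be driven from another Python script, one option is to assemble the same command as an argument list. This is only a sketch: it uses the two flags shown above (`--num_samples` and `--predict amp`) and nothing else, the helper name `build_inference_command` is hypothetical, and it assumes the working directory is the repository root.

```python
import shlex
import sys


def build_inference_command(num_samples=100, predict="amp"):
    """Assemble the documented Pipelines/Inference.py call as an argv list."""
    if num_samples < 1:
        raise ValueError("num_samples must be a positive integer")
    return [
        sys.executable, "Pipelines/Inference.py",
        "--num_samples", str(num_samples),
        "--predict", predict,
    ]


cmd = build_inference_command(num_samples=100)
print(shlex.join(cmd))  # shell-quoted form, handy for logging

# To actually run it from the repository root:
#   import subprocess
#   subprocess.run(cmd, check=True)
```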
File Structure
pepforge-model/
├── Generation/
│   ├── Layout/260210_GPT.pt (534 KB)
│   ├── Content/GPT_L_260226.pt (1.0 GB)
│   ├── Content/BERT_L_260301.pt (1.0 GB)
│   ├── Connection/GAT_L_260226.pt (606 MB)
│   └── MODEL_REGISTRY.md
└── Prediction/AMP/
    ├── ensemble_config.json
    ├── ensemble_test_eval.json
    ├── LSTM_L_260428_SMILES.pt (812 MB, LLM, SMILES)
    ├── LSTM_M_260429_HELM.pt (270 MB, LLM, HELM)
    ├── GCN_L_260429_HELM.pt (545 MB, GNN, HELM)
    ├── GCN_L_260428_SMILES.pt (1.3 GB, GNN, SMILES)
    └── MODEL_REGISTRY.md
Total size: ~5.5 GB
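After a download of this size, it can be useful to confirm that nothing is missing. The sketch below checks for the weight and JSON files listed in the file structure. The helper name `missing_files` and the local directory name `pepforge-model` are assumptions; adjust the root to wherever the weights were actually placed.

```python
from pathlib import Path

# Relative paths copied from the file structure (weights and ensemble JSON only)
EXPECTED_FILES = [
    "Generation/Layout/260210_GPT.pt",
    "Generation/Content/GPT_L_260226.pt",
    "Generation/Content/BERT_L_260301.pt",
    "Generation/Connection/GAT_L_260226.pt",
    "Prediction/AMP/ensemble_config.json",
    "Prediction/AMP/ensemble_test_eval.json",
    "Prediction/AMP/LSTM_L_260428_SMILES.pt",
    "Prediction/AMP/LSTM_M_260429_HELM.pt",
    "Prediction/AMP/GCN_L_260429_HELM.pt",
    "Prediction/AMP/GCN_L_260428_SMILES.pt",
]


def missing_files(root):
    """Return the expected relative paths that are absent under root."""
    root = Path(root)
    return [rel for rel in EXPECTED_FILES if not (root / rel).is_file()]


if __name__ == "__main__":
    # "pepforge-model" is an assumed local download directory
    missing = missing_files("pepforge-model")
    print("all files present" if not missing else f"missing: {missing}")
```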
Related Resources
- Code: wqx1999/PepForge
- Training data: pepforge-training-data
- Generated library: pepforge-generated-data
- Figure data: pepforge-fig-data
Citation
@article{wang2026pepforge,
title={PepForge: Hierarchical HELM-Based Peptide Generation},
author={Wang, Qingxin and Süssmuth, Roderich D.},
journal={bioRxiv},
year={2026},
doi={10.64898/2026.05.29.728379},
url={https://www.biorxiv.org/content/10.64898/2026.05.29.728379v1}
}
License
CC-BY-4.0