PepForge — Model Weights

Pre-trained model weights for PepForge, a hierarchical deep learning framework for generating peptides with special connections using HELM notation.

Architecture

PepForge uses a three-stage cascade (Layout → Content → Connection) for generation and a four-model, MCC-weighted ensemble for antimicrobial peptide (AMP) activity prediction. The prediction ensemble was retrained on 2026-04-28 and 2026-04-29 on CLSI MIC-only DBAASP data. Members were selected by validation MCC; the test set was never consulted during selection.
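To illustrate how the three stages could map onto the parts of a HELM string, here is a minimal sketch. It is not PepForge's implementation: the three stage functions are hypothetical stand-ins that return fixed toy values instead of calling a model, and what each real stage emits is an assumption. Only the final assembly follows HELM syntax, where the polymer list and the connection list are separated by `$`.

```python
def layout_stage():
    # Hypothetical stand-in for the Layout model: chain IDs and their lengths.
    return {"PEPTIDE1": 4}


def content_stage(layout):
    # Hypothetical stand-in for the Content model: one monomer per position.
    toy_monomers = {"PEPTIDE1": ["C", "A", "A", "C"]}
    return {cid: toy_monomers[cid][:length] for cid, length in layout.items()}


def connection_stage(chains):
    # Hypothetical stand-in for the Connection model.
    # Each record: (chain, chain, position, R-group, position, R-group).
    return [("PEPTIDE1", "PEPTIDE1", 1, "R3", 4, "R3")]


def assemble_helm(chains, connections):
    # HELM sections are separated by "$": polymers, connections, then
    # two sections left empty here.
    polymers = "|".join(
        cid + "{" + ".".join(monomers) + "}" for cid, monomers in chains.items()
    )
    links = "|".join(
        f"{a},{b},{i}:{ra}-{j}:{rb}" for a, b, i, ra, j, rb in connections
    )
    return f"{polymers}${links}$$$"


def run_cascade():
    layout = layout_stage()
    chains = content_stage(layout)
    connections = connection_stage(chains)
    return assemble_helm(chains, connections)


print(run_cascade())  # PEPTIDE1{C.A.A.C}$PEPTIDE1,PEPTIDE1,1:R3-4:R3$$$
```

The toy output describes a four-residue peptide whose two cysteines are linked through their side-chain R3 attachment points, the kind of connection that a plain amino-acid sequence cannot express.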

Generation Models

| Stage | File | Architecture | Test PPL / Metric |
| --- | --- | --- | --- |
| Layout | `Generation/Layout/260210_GPT.pt` | GPT (d=64, L=1) | PPL = 2.24 |
| Content (autoregressive, default) | `Generation/Content/GPT_L_260226.pt` | GPT (d=768, L=12) | PPL = 6.61 |
| Content (masked, infilling) | `Generation/Content/BERT_L_260301.pt` | BERT (d=768, L=12) | PPL = 9.15 |
| Connection | `Generation/Connection/GAT_L_260226.pt` | GAT (d=768, L=6) | Exist F1 = 0.971, Type Macro-F1 = 0.912 |
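For readers less familiar with perplexity (PPL), the sketch below uses the standard definition: the exponential of the mean negative log-likelihood per token. This is an assumption about how the values in the table were computed; the model card does not state the exact procedure, and for the masked (BERT) model the figure is presumably computed over masked positions only.

```python
import math


def perplexity(token_log_probs):
    """Return exp(mean negative log-likelihood), using natural-log probabilities."""
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    mean_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(mean_nll)


# A model that assigns probability 0.5 to every token has perplexity 2.
toy_log_probs = [math.log(0.5)] * 4
toy_ppl = perplexity(toy_log_probs)

# Inverse direction: a reported PPL maps back to a mean loss in nats per token.
layout_mean_nll = math.log(2.24)
```

Under this definition, the Layout model's PPL of 2.24 corresponds to a mean loss of about 0.81 nats per token.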

Prediction Models — AMP Ensemble (260428/260429)

Each member is the best of its (encoding, model-type) quadrant by validation MCC.

| File | Type | Encoding | Test Acc | Test Macro-F1 | Test MCC | Weight (val MCC) |
| --- | --- | --- | --- | --- | --- | --- |
| `Prediction/AMP/LSTM_L_260428_SMILES.pt` | LLM | SMILES | 0.7167 | 0.5663 | 0.5871 | 0.6121 |
| `Prediction/AMP/LSTM_M_260429_HELM.pt` | LLM | HELM | 0.7058 | 0.5811 | 0.5717 | 0.6021 |
| `Prediction/AMP/GCN_L_260429_HELM.pt` | GNN | HELM | 0.6355 | 0.5047 | 0.4844 | 0.5136 |
| `Prediction/AMP/GCN_L_260428_SMILES.pt` | GNN | SMILES | 0.6165 | 0.4478 | 0.4630 | 0.4791 |
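The selection rule described above (one member per encoding and model-type combination, chosen by validation MCC) can be sketched as follows. The candidate names and scores are made-up placeholders for illustration, not PepForge training runs, and the record layout is an assumption.

```python
def select_members(candidates):
    """Keep the highest-validation-MCC candidate per (type, encoding) pair."""
    best = {}
    for cand in candidates:
        key = (cand["type"], cand["encoding"])
        if key not in best or cand["val_mcc"] > best[key]["val_mcc"]:
            best[key] = cand
    return best


# Placeholder candidates: names and scores are invented for this example.
candidates = [
    {"name": "run_a", "type": "LLM", "encoding": "SMILES", "val_mcc": 0.55},
    {"name": "run_b", "type": "LLM", "encoding": "SMILES", "val_mcc": 0.60},
    {"name": "run_c", "type": "LLM", "encoding": "HELM", "val_mcc": 0.58},
    {"name": "run_d", "type": "GNN", "encoding": "HELM", "val_mcc": 0.50},
    {"name": "run_e", "type": "GNN", "encoding": "SMILES", "val_mcc": 0.47},
    {"name": "run_f", "type": "GNN", "encoding": "SMILES", "val_mcc": 0.45},
]

members = select_members(candidates)
for (model_type, encoding), cand in sorted(members.items()):
    print(model_type, encoding, cand["name"], cand["val_mcc"])
```

The records carry only validation scores, which mirrors the point that test metrics play no part in choosing members.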

Held-out ensemble performance (test split, 2,206 samples; full report in `ensemble_test_eval.json`):

| Strategy | Acc | Macro-F1 | Weighted-F1 | MCC |
| --- | --- | --- | --- | --- |
| `soft_vote` (uniform 0.25 each) | 0.7393 | 0.6049 | 0.7377 | 0.6175 |
| `weighted_vote` (val-MCC weights, default) | 0.7421 | 0.6092 | 0.7403 | 0.6216 |

The weighted ensemble reaches a test MCC of 0.6216, which exceeds the best single member (`LSTM_L_260428_SMILES.pt`, MCC 0.5871) by 0.0345.
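As a rough illustration of the two voting strategies, the sketch below averages per-member class probabilities with either uniform weights or the validation-MCC weights from the member table. It assumes that weights are normalised to sum to 1 and that each member outputs a probability vector. The three-class toy probabilities are invented, and the actual combination logic in PepForge may differ.

```python
# Validation-MCC weights from the member table, keyed by checkpoint file name.
VAL_MCC_WEIGHTS = {
    "LSTM_L_260428_SMILES.pt": 0.6121,
    "LSTM_M_260429_HELM.pt": 0.6021,
    "GCN_L_260429_HELM.pt": 0.5136,
    "GCN_L_260428_SMILES.pt": 0.4791,
}


def weighted_vote(member_probs, weights):
    """Return (predicted class index, weighted-average class probabilities)."""
    total = sum(weights[name] for name in member_probs)
    n_classes = len(next(iter(member_probs.values())))
    averaged = [0.0] * n_classes
    for name, probs in member_probs.items():
        share = weights[name] / total  # normalised weight (assumption)
        for k, p in enumerate(probs):
            averaged[k] += share * p
    best = max(range(n_classes), key=averaged.__getitem__)
    return best, averaged


# Invented probabilities for one peptide over three hypothetical classes.
toy_probs = {
    "LSTM_L_260428_SMILES.pt": [0.10, 0.60, 0.30],
    "LSTM_M_260429_HELM.pt": [0.20, 0.50, 0.30],
    "GCN_L_260429_HELM.pt": [0.50, 0.30, 0.20],
    "GCN_L_260428_SMILES.pt": [0.55, 0.25, 0.20],
}

label, probs = weighted_vote(toy_probs, VAL_MCC_WEIGHTS)

# Uniform soft voting is the same computation with equal weights.
uniform_label, uniform_probs = weighted_vote(toy_probs, dict.fromkeys(toy_probs, 0.25))
```

After normalisation the four weights are roughly 0.277, 0.273, 0.233, and 0.217, so the weighted vote shifts the uniform average only modestly toward the two sequence models.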

Quick Start

```bash
git clone https://github.com/wqx1999/PepForge.git
cd PepForge
python install.py          # Installs env + downloads all models & data

# Generation + AMP prediction in one cascade call
python Pipelines/Inference.py --num_samples 100 --predict amp
```

For details, see the GitHub repository: https://github.com/wqx1999/PepForge

File Structure

```
pepforge-model/
├── Generation/
│   ├── Layout/260210_GPT.pt              (534 KB)
│   ├── Content/GPT_L_260226.pt           (1.0 GB)
│   ├── Content/BERT_L_260301.pt          (1.0 GB)
│   ├── Connection/GAT_L_260226.pt        (606 MB)
│   └── MODEL_REGISTRY.md
└── Prediction/AMP/
    ├── ensemble_config.json
    ├── ensemble_test_eval.json
    ├── LSTM_L_260428_SMILES.pt           (812 MB, LLM, SMILES)
    ├── LSTM_M_260429_HELM.pt             (270 MB, LLM, HELM)
    ├── GCN_L_260429_HELM.pt              (545 MB, GNN, HELM)
    ├── GCN_L_260428_SMILES.pt            (1.3 GB, GNN, SMILES)
    └── MODEL_REGISTRY.md
```

Total size: ~5.5 GB
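After downloading, a quick completeness check against the layout above can catch interrupted transfers. The following is a minimal sketch, not part of PepForge: the helper name `missing_files` and the local root `pepforge-model` are assumptions, and `install.py` may place the files elsewhere.

```python
from pathlib import Path

# Relative paths taken from the file structure listed above.
EXPECTED_FILES = [
    "Generation/Layout/260210_GPT.pt",
    "Generation/Content/GPT_L_260226.pt",
    "Generation/Content/BERT_L_260301.pt",
    "Generation/Connection/GAT_L_260226.pt",
    "Generation/MODEL_REGISTRY.md",
    "Prediction/AMP/ensemble_config.json",
    "Prediction/AMP/ensemble_test_eval.json",
    "Prediction/AMP/LSTM_L_260428_SMILES.pt",
    "Prediction/AMP/LSTM_M_260429_HELM.pt",
    "Prediction/AMP/GCN_L_260429_HELM.pt",
    "Prediction/AMP/GCN_L_260428_SMILES.pt",
    "Prediction/AMP/MODEL_REGISTRY.md",
]


def missing_files(root):
    """Return the expected relative paths that are not regular files under root."""
    root = Path(root)
    return [rel for rel in EXPECTED_FILES if not (root / rel).is_file()]


# Hypothetical download location; adjust to wherever the weights were saved.
missing = missing_files("pepforge-model")
print(f"{len(EXPECTED_FILES) - len(missing)} of {len(EXPECTED_FILES)} files present")
for rel in missing:
    print("missing:", rel)
```

This only checks presence, not integrity; comparing file sizes against the figures listed above would be a natural next step.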

Related Resources

Code: wqx1999/PepForge
Training data: pepforge-training-data
Generated library: pepforge-generated-data
Figure data: pepforge-fig-data

Citation

@article{wang2026pepforge,
  title={PepForge: Hierarchical HELM-Based Peptide Generation},
  author={Wang, Qingxin and Süssmuth, Roderich D.},
  journal={bioRxiv},
  year={2026},
  doi={10.64898/2026.05.29.728379},
  url={https://www.biorxiv.org/content/10.64898/2026.05.29.728379v1}
}

License

CC-BY-4.0
