PepForge: Model Weights

Pre-trained model weights for PepForge, a hierarchical deep learning framework for generating peptides with special connections using HELM notation.

Architecture

PepForge uses a three-stage cascade (Layout → Content → Connection) for generation and a 4-model MCC-weighted ensemble for AMP activity prediction. The prediction ensemble was retrained on 2026-04-28 and 2026-04-29 on CLSI MIC-only DBAASP data, with members selected by validation MCC; the test set was not consulted during selection.

Generation Models

| Stage | File | Architecture | Test PPL / Metric |
|---|---|---|---|
| Layout | Generation/Layout/260210_GPT.pt | GPT (d=64, L=1) | PPL = 2.24 |
| Content (autoregressive, default) | Generation/Content/GPT_L_260226.pt | GPT (d=768, L=12) | PPL = 6.61 |
| Content (masked, infilling) | Generation/Content/BERT_L_260301.pt | BERT (d=768, L=12) | PPL = 9.15 |
| Connection | Generation/Connection/GAT_L_260226.pt | GAT (d=768, L=6) | Exist F1 = 0.971, Type Macro-F1 = 0.912 |

Prediction Models: AMP Ensemble (260428/260429)

Each member is the best of its (encoding, model-type) quadrant by validation MCC.

| File | Type | Encoding | Test Acc | Test Macro-F1 | Test MCC | Weight (val MCC) |
|---|---|---|---|---|---|---|
| Prediction/AMP/LSTM_L_260428_SMILES.pt | LLM | SMILES | 0.7167 | 0.5663 | 0.5871 | 0.6121 |
| Prediction/AMP/LSTM_M_260429_HELM.pt | LLM | HELM | 0.7058 | 0.5811 | 0.5717 | 0.6021 |
| Prediction/AMP/GCN_L_260429_HELM.pt | GNN | HELM | 0.6355 | 0.5047 | 0.4844 | 0.5136 |
| Prediction/AMP/GCN_L_260428_SMILES.pt | GNN | SMILES | 0.6165 | 0.4478 | 0.4630 | 0.4791 |
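
The selection rule stated above the table (keep the candidate with the highest validation MCC in each type/encoding quadrant) can be sketched as follows. This is a minimal illustration, not the repository's selection code: the record layout, the function name `select_members`, and every candidate marked "hypothetical" are assumptions. Only the four winning validation MCC values are taken from the table.

```python
# Candidate records: (name, model_type, encoding, val_mcc).
# The four winning val-MCC values come from the table above; entries marked
# "hypothetical" are invented losers, included only to show the selection step.
candidates = [
    ("LSTM_L_260428_SMILES", "LLM", "SMILES", 0.6121),
    ("hypothetical_LSTM_S_SMILES", "LLM", "SMILES", 0.5900),  # hypothetical
    ("LSTM_M_260429_HELM", "LLM", "HELM", 0.6021),
    ("hypothetical_LSTM_L_HELM", "LLM", "HELM", 0.5800),      # hypothetical
    ("GCN_L_260429_HELM", "GNN", "HELM", 0.5136),
    ("GCN_L_260428_SMILES", "GNN", "SMILES", 0.4791),
    ("hypothetical_GCN_M_SMILES", "GNN", "SMILES", 0.4500),   # hypothetical
]


def select_members(cands):
    """Return the best candidate per (model_type, encoding) by validation MCC."""
    best = {}
    for name, model_type, encoding, val_mcc in cands:
        key = (model_type, encoding)
        # Keep the candidate only if it beats the current best in its quadrant.
        if key not in best or val_mcc > best[key][1]:
            best[key] = (name, val_mcc)
    return best


members = select_members(candidates)
for (model_type, encoding), (name, val_mcc) in sorted(members.items()):
    print(f"{model_type:4s} {encoding:7s} {name}  val MCC = {val_mcc:.4f}")
```

Because only validation scores enter the comparison, the test metrics in the table remain an unbiased estimate for the selected members.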

Held-out ensemble performance (test split, 2,206 samples; full report in ensemble_test_eval.json):

| Strategy | Acc | Macro-F1 | Weighted-F1 | MCC |
|---|---|---|---|---|
| soft_vote (uniform 0.25 each) | 0.7393 | 0.6049 | 0.7377 | 0.6175 |
| weighted_vote (val-MCC weights, default) | 0.7421 | 0.6092 | 0.7403 | 0.6216 |

The weighted ensemble exceeds the best single member (LSTM/L SMILES, test MCC 0.5871) by 0.0345 MCC (0.6216 vs. 0.5871).
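
To make the weighted_vote strategy concrete, here is a minimal sketch of weighted soft voting using the validation MCC weights from the member table. It assumes the weights are normalized to sum to 1 and applied to per-class probabilities; the actual combination rule lives in ensemble_config.json and the repository code and may differ. The function names, the three-class setup, and the example probabilities are all hypothetical.

```python
# Validation MCC weights, copied from the member table above.
VAL_MCC = {
    "LSTM_L_260428_SMILES": 0.6121,
    "LSTM_M_260429_HELM": 0.6021,
    "GCN_L_260429_HELM": 0.5136,
    "GCN_L_260428_SMILES": 0.4791,
}


def normalize(weights):
    """Scale weights so they sum to 1 (assumed; not confirmed by the source)."""
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}


def weighted_vote(probs, weights):
    """Combine per-member class probabilities with normalized weights.

    probs maps member name -> list of class probabilities.
    Returns (predicted class index, combined probabilities).
    """
    norm = normalize(weights)
    n_classes = len(next(iter(probs.values())))
    combined = [
        sum(norm[name] * probs[name][c] for name in probs)
        for c in range(n_classes)
    ]
    return combined.index(max(combined)), combined


# Hypothetical probabilities for one peptide over three made-up classes.
example_probs = {
    "LSTM_L_260428_SMILES": [0.2, 0.5, 0.3],
    "LSTM_M_260429_HELM": [0.1, 0.6, 0.3],
    "GCN_L_260429_HELM": [0.5, 0.3, 0.2],
    "GCN_L_260428_SMILES": [0.6, 0.2, 0.2],
}

label, combined = weighted_vote(example_probs, VAL_MCC)
print(label, [round(p, 4) for p in combined])
```

Setting all four weights equal reduces this to the soft_vote row of the table, so the two strategies differ only in how much each member's probabilities count.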

Quick Start

git clone https://github.com/wqx1999/PepForge.git
cd PepForge
python install.py          # Installs env + downloads all models & data
# Generation + AMP prediction in one cascade call
python Pipelines/Inference.py --num_samples 100 --predict amp

For details, see the GitHub repository: https://github.com/wqx1999/PepForge

File Structure

pepforge-model/
├── Generation/
│   ├── Layout/260210_GPT.pt              (534 KB)
│   ├── Content/GPT_L_260226.pt           (1.0 GB)
│   ├── Content/BERT_L_260301.pt          (1.0 GB)
│   ├── Connection/GAT_L_260226.pt        (606 MB)
│   └── MODEL_REGISTRY.md
└── Prediction/AMP/
    ├── ensemble_config.json
    ├── ensemble_test_eval.json
    ├── LSTM_L_260428_SMILES.pt           (812 MB, LLM, SMILES)
    ├── LSTM_M_260429_HELM.pt             (270 MB, LLM, HELM)
    ├── GCN_L_260429_HELM.pt              (545 MB, GNN, HELM)
    ├── GCN_L_260428_SMILES.pt            (1.3 GB, GNN, SMILES)
    └── MODEL_REGISTRY.md

Total size: ~5.5 GB
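
After a download of this size, it can be worth confirming that every file in the tree above is present. The sketch below checks a local copy against that listing. The helper name `missing_files` and the local root path `pepforge-model` are assumptions; where install.py actually places the weights is not stated here.

```python
from pathlib import Path

# Relative paths taken from the file tree above.
EXPECTED_FILES = [
    "Generation/Layout/260210_GPT.pt",
    "Generation/Content/GPT_L_260226.pt",
    "Generation/Content/BERT_L_260301.pt",
    "Generation/Connection/GAT_L_260226.pt",
    "Generation/MODEL_REGISTRY.md",
    "Prediction/AMP/ensemble_config.json",
    "Prediction/AMP/ensemble_test_eval.json",
    "Prediction/AMP/LSTM_L_260428_SMILES.pt",
    "Prediction/AMP/LSTM_M_260429_HELM.pt",
    "Prediction/AMP/GCN_L_260429_HELM.pt",
    "Prediction/AMP/GCN_L_260428_SMILES.pt",
    "Prediction/AMP/MODEL_REGISTRY.md",
]


def missing_files(root):
    """Return the expected relative paths that are not regular files under root."""
    root = Path(root)
    return [rel for rel in EXPECTED_FILES if not (root / rel).is_file()]


# "pepforge-model" is a hypothetical local directory; adjust to your setup.
missing = missing_files("pepforge-model")
if missing:
    print("Missing files:")
    for rel in missing:
        print("  " + rel)
else:
    print("All expected files are present.")
```

This only checks for presence, not integrity; comparing file sizes against the listing would catch truncated downloads as well.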

Citation

@article{wang2026pepforge,
  title={PepForge: Hierarchical HELM-Based Peptide Generation},
  author={Wang, Qingxin and Süssmuth, Roderich D.},
  journal={bioRxiv},
  year={2026},
  doi={10.64898/2026.05.29.728379},
  url={https://www.biorxiv.org/content/10.64898/2026.05.29.728379v1}
}

License

CC-BY-4.0
