Retrieval-augmented prediction discovers antimicrobials across the phage pan-proteome
Intended use
Predict log10 MIC (minimum inhibitory concentration, in µM) for peptide × strain pairs, for research use and proteome mining.
Architecture (from exported config)
- Branches: meanpool, cnn, attnpool, bi_ssm
- Conditioning: concat
- LoRA: layers=9, rank=8, alpha=16
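The exported config names the branches and the conditioning mode but not their internals. The sketch below illustrates, on toy pure-Python vectors, what a mean-pool branch, a softmax attention-pool branch, `concat` conditioning and a LoRA adapter with rank=8, alpha=16 typically compute. These are generic textbook forms, not this repository's code: all function names are hypothetical, the cnn and bi_ssm branches are omitted, and treating the genome features as a vector appended after the branch outputs is an assumption.

```python
import math


def matvec(m, v):
    """Dense matrix-vector product on nested lists."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def mean_pool(tokens):
    """meanpool-style branch: average the per-residue embeddings."""
    n = len(tokens)
    return [sum(col) / n for col in zip(*tokens)]


def attn_pool(tokens, query):
    """attnpool-style branch: softmax over token-query dot products, then a weighted average."""
    scores = [sum(t * q for t, q in zip(tok, query)) for tok in tokens]
    top = max(scores)  # subtract the max score for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [
        sum(w * tok[d] for w, tok in zip(weights, tokens)) / total
        for d in range(len(tokens[0]))
    ]


def concat_condition(branch_outputs, genome_vec):
    """cond=concat (assumed form): join every branch summary and the genome vector end to end."""
    fused = []
    for out in branch_outputs:
        fused.extend(out)
    fused.extend(genome_vec)
    return fused


def lora_forward(W, A, B, x, rank=8, alpha=16):
    """Standard LoRA update: y = W x + (alpha / rank) * B (A x).

    W (d_out x d_in) stays frozen; A (rank x d_in) and B (d_out x rank) are trained.
    With rank=8 and alpha=16 the low-rank update is scaled by 2.
    """
    scale = alpha / rank
    base = matvec(W, x)
    delta = matvec(B, matvec(A, x))
    return [b + scale * d for b, d in zip(base, delta)]


if __name__ == "__main__":
    tokens = [[1.0, 2.0], [3.0, 4.0]]  # two residues with toy 2-d embeddings
    pooled = [mean_pool(tokens), attn_pool(tokens, query=[0.0, 0.0])]
    print(concat_condition(pooled, genome_vec=[9.0]))  # [2.0, 3.0, 2.0, 3.0, 9.0]
```

The alpha/rank scaling means the adapter's contribution is doubled relative to an unscaled low-rank update, independent of the frozen ESM C weights.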
Data split
- mmseqs90_39species: MMseqs2 90% identity cluster-disjoint splits (train/val/test).
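Cluster-disjoint splitting means every sequence in a 90% identity cluster lands in the same split, so near-duplicates cannot leak from train into val or test. A minimal sketch, assuming clusters come from something like `mmseqs easy-cluster peptides.fasta clusterRes tmp --min-seq-id 0.9` and its representative-TAB-member TSV output; the greedy fill order, the 80/10/10 proportions and all function names are hypothetical, since the card does not state how its splits were assigned.

```python
import random
from collections import defaultdict


def read_cluster_tsv(lines):
    """Parse 'representative<TAB>member' rows into {member: representative}."""
    assignment = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        rep, member = line.split("\t")[:2]
        assignment[member] = rep
    return assignment


def cluster_disjoint_split(assignment, train_frac=0.8, val_frac=0.1, seed=0):
    """Assign whole clusters to train/val/test so no cluster spans two splits.

    Clusters fill train first, then val; whatever remains goes to test.
    """
    clusters = defaultdict(list)
    for member, rep in assignment.items():
        clusters[rep].append(member)
    reps = sorted(clusters)
    random.Random(seed).shuffle(reps)  # reproducible cluster order
    total = len(assignment)
    n_train, n_val = train_frac * total, val_frac * total
    counts = {"train": 0, "val": 0, "test": 0}
    split_of = {}
    for rep in reps:
        if counts["train"] < n_train:
            name = "train"
        elif counts["val"] < n_val:
            name = "val"
        else:
            name = "test"
        counts[name] += len(clusters[rep])
        for member in clusters[rep]:
            split_of[member] = name
    return split_of
```

Splitting by cluster rather than by sequence makes the test split harder but closer to the proteome-mining setting, where query peptides are usually not near-copies of training peptides.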
Reference metrics (confirmatory test; the winning configuration was selected on validation only)
- Val R² (mean±std over seeds): 0.415 ± 0.004
- Test ensemble: R²=0.467, RMSE=0.588, MAE=0.437 (RMSE and MAE in log10 µM units), W1D=0.452
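The sketch below gives standard definitions of the reported regression metrics plus a simple seed ensemble. Two assumptions are loud here: W1D is not defined on the card, and reading it as the one-dimensional Wasserstein-1 distance between predicted and measured log10 MIC distributions is a guess; likewise, the "ensemble" being an element-wise mean of per-seed predictions is assumed. Function names are hypothetical.

```python
import math


def ensemble_mean(per_seed_preds):
    """Average predictions element-wise across seeds (one list per seed)."""
    return [sum(vals) / len(vals) for vals in zip(*per_seed_preds)]


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def rmse(y_true, y_pred):
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mae(y_true, y_pred):
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def w1_distance(y_true, y_pred):
    """1-D Wasserstein-1 distance between two equal-size empirical samples.

    Compares sorted values, i.e. the distributions, ignoring which prediction
    belongs to which pair (unlike MAE).
    """
    if len(y_true) != len(y_pred):
        raise ValueError("this shortcut needs equal sample sizes")
    a, b = sorted(y_true), sorted(y_pred)
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)
```

For example, swapped predictions `[2.0, 1.0]` against targets `[1.0, 2.0]` give an MAE of 1.0 but a W1 distance of 0.0, which is why a distribution-level metric is worth reporting alongside pointwise errors.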
Quickstart
Recommended: use the Colab notebook shipped with the code repository for end-to-end inference.
If running locally, install the code and point it at this bundle's artifacts:
# 1) Install code (example: editable install from the GitHub repo)
git clone https://github.com/RRocaP/peptaidev5.git
pip install -e ./peptaidev5
# 2) Run screening inference (single checkpoint)
phamp screen \
  --input peptides.fasta \
  --output preds.csv \
  --checkpoint checkpoints/seed0/best_model.pt \
  --genome-features genome_features.pt \
  --species-policy none \
  --genome-policy mean \
  --retrieval off \
  --device cpu
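The screening command reads peptides from FASTA and, given the model's target, produces predictions on a log10 µM scale. A minimal sketch for preparing the input and back-transforming the output, assuming single-line FASTA records are accepted and that preds.csv holds log10 MIC values in one column; the column name `pred_log10_mic` and all function names are hypothetical, so check the real CSV header before use.

```python
import csv


def write_fasta(records, path):
    """Write (name, sequence) pairs as single-line FASTA records."""
    with open(path, "w") as fh:
        for name, seq in records:
            fh.write(f">{name}\n{seq}\n")


def log10_mic_to_um(value):
    """Invert the log10 transform: a prediction of 1.0 means 10 µM."""
    return 10.0 ** value


def add_mic_um_column(in_csv, out_csv, column="pred_log10_mic"):
    """Copy a predictions CSV, adding the MIC back-transformed to µM.

    `column` is a HYPOTHETICAL name; check the actual header of preds.csv.
    """
    with open(in_csv, newline="") as fin:
        reader = csv.DictReader(fin)
        fields = list(reader.fieldnames) + ["pred_mic_uM"]
        rows = list(reader)
    with open(out_csv, "w", newline="") as fout:
        writer = csv.DictWriter(fout, fieldnames=fields)
        writer.writeheader()
        for row in rows:
            row["pred_mic_uM"] = log10_mic_to_um(float(row[column]))
            writer.writerow(row)
```

Because the target is log10-transformed, an RMSE of about 0.59 corresponds to a typical error factor of roughly 10^0.59, i.e. about 4x in µM.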
Config summary: meanpool + cnn + attnpool + bi_ssm, cond=concat
Paper
- Title: Retrieval-augmented prediction discovers antimicrobials across the phage pan-proteome
Limitations
- Predictions are species/strain-conditioned and depend on genome feature coverage.
- Retrieval artifacts (if exported) must be built from TRAIN ONLY to avoid leakage.
- Computational predictions require experimental validation before translational use.
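The train-only rule for retrieval artifacts comes down to filtering by split before anything enters the index. The toy sketch below uses 3-mer Jaccard similarity as a stand-in retriever; this is a swapped-in technique, not this model's retrieval method (which the card does not describe), and the `(peptide_id, sequence, split)` row layout and all function names are hypothetical.

```python
def kmers(seq, k=3):
    """Set of overlapping k-mers in a peptide sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def build_retrieval_index(rows, k=3):
    """rows: (peptide_id, sequence, split). Only TRAIN rows enter the index."""
    return [(pid, kmers(seq, k)) for pid, seq, split in rows if split == "train"]


def retrieve(index, query_seq, top_k=2, k=3):
    """Return the ids of the top_k indexed peptides by k-mer Jaccard similarity."""
    q = kmers(query_seq, k)

    def jaccard(s):
        union = q | s
        return len(q & s) / len(union) if union else 0.0

    ranked = sorted(index, key=lambda item: jaccard(item[1]), reverse=True)
    return [pid for pid, _ in ranked[:top_k]]
```

Because validation and test rows never enter the index, a test peptide cannot retrieve itself or its own measured MIC, which is the leakage the limitation warns about.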
Model tree for rrp31/phamp-mic-ablate-ssm
- Base model: EvolutionaryScale/esmc-600m-2024-12
Evaluation results (self-reported)
- R² on mmseqs90_39species test split: 0.467
- MAE on mmseqs90_39species test split: 0.437
- RMSE on mmseqs90_39species test split: 0.588