Retrieval-augmented prediction discovers antimicrobials across the phage pan-proteome

Intended use

Predicts log10 MIC (minimum inhibitory concentration, in µM) for peptide × strain pairs. Intended for research use and proteome mining.
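Because the output is on a log10 scale, a prediction is converted back to a concentration by raising 10 to that value. A minimal sketch, assuming predictions are available as plain floats; the function name is illustrative and not part of the phamp package:

```python
def log10_mic_to_um(log10_mic: float) -> float:
    """Convert a predicted log10 MIC (µM) back to an MIC in µM."""
    return 10.0 ** log10_mic

# Illustrative inputs only: log10 values of 0, 1 and 2
# correspond to 1, 10 and 100 µM respectively.
mics = [log10_mic_to_um(p) for p in (0.0, 1.0, 2.0)]
```

Lower values mean stronger predicted activity: a difference of 1.0 in the model output is a tenfold difference in MIC.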

Architecture (from exported config)

  • Branches: meanpool, cnn, attnpool, bi_ssm
  • Conditioning: concat
  • LoRA: layers=9, rank=8, alpha=16
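To make the LoRA settings concrete: in the standard LoRA formulation, a frozen weight W is adjusted by a low-rank product scaled by alpha / rank, so rank=8 and alpha=16 give a scale of 2. The sketch below assumes that standard formulation; the matrix sizes and helper names are hypothetical and are not taken from the exported config.

```python
def matmul(X, Y):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def lora_merge(W, A, B, alpha, rank):
    """Return W + (alpha / rank) * (B @ A), the usual LoRA merged weight."""
    scale = alpha / rank
    delta = matmul(B, A)
    return [[w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(W, delta)]

rank, alpha = 8, 16                      # values from the exported config
W = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]   # hypothetical 2x3 frozen weight
A = [[1.0, 0.0, 0.0] for _ in range(rank)]   # rank x d_in
B_zero = [[0.0] * rank for _ in range(2)]    # d_out x rank, zero-initialised
B_demo = [[0.125] * rank, [0.0] * rank]      # arbitrary non-zero example

W_init = lora_merge(W, A, B_zero, alpha, rank)   # equals W when B is zero
W_demo = lora_merge(W, A, B_demo, alpha, rank)
scale = alpha / rank
```

With B at zero the merged weight equals the frozen weight, which is why LoRA training starts from the base model's behaviour.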

Data split

  • mmseqs90_39species: sequences are clustered with MMseqs2 at 90% identity, and the train/val/test splits are cluster-disjoint (no cluster contributes to more than one split).
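The idea of a cluster-disjoint split can be sketched as follows: whole clusters, not individual sequences, are assigned to a split. This is an illustration under stated assumptions, not the procedure used to build mmseqs90_39species. The 80/10/10 fractions, the seed, and the sequence-to-cluster mapping (which in practice might come from MMseqs2 clustering output) are all hypothetical.

```python
import random
from collections import defaultdict

def cluster_disjoint_split(seq_to_cluster, fractions=(0.8, 0.1, 0.1), seed=0):
    """Assign whole clusters to train/val/test so no cluster spans two splits."""
    clusters = defaultdict(list)
    for seq_id, cluster_id in seq_to_cluster.items():
        clusters[cluster_id].append(seq_id)

    cluster_ids = sorted(clusters)              # sort first for reproducibility
    random.Random(seed).shuffle(cluster_ids)

    n_train = int(fractions[0] * len(cluster_ids))
    n_val = int(fractions[1] * len(cluster_ids))
    parts = {
        "train": cluster_ids[:n_train],
        "val": cluster_ids[n_train:n_train + n_val],
        "test": cluster_ids[n_train + n_val:],  # remainder goes to test
    }
    return {name: [s for c in ids for s in clusters[c]]
            for name, ids in parts.items()}

# Toy mapping: 20 placeholder peptides in 10 clusters of 2.
seq_to_cluster = {f"pep{i}": f"c{i // 2}" for i in range(20)}
splits = cluster_disjoint_split(seq_to_cluster)
```

Splitting at the cluster level keeps near-duplicate peptides (here, anything at or above the clustering identity threshold) out of the evaluation sets.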

Reference metrics (confirmatory test; winner selected on validation only)

  • Val R² (mean±std over seeds): 0.415 ± 0.004
  • Test ensemble: R²=0.467, RMSE=0.588, MAE=0.437, W1D=0.452
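For readers who want to compute comparable numbers on their own predictions, R², RMSE and MAE can be sketched as below. The sketch assumes the ensemble prediction is the plain mean of per-seed predictions, which this card does not confirm. W1D is omitted because its definition is not given here. All values are toy numbers, not model outputs.

```python
import math
from statistics import mean

def ensemble_mean(per_seed_preds):
    """Average predictions across seeds (assumed ensembling rule)."""
    return [mean(vals) for vals in zip(*per_seed_preds)]

def regression_metrics(y_true, y_pred):
    """Return R², RMSE and MAE for paired true and predicted values."""
    n = len(y_true)
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(r * r for r in residuals)
    y_bar = mean(y_true)
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    return {
        "r2": 1.0 - ss_res / ss_tot,
        "rmse": math.sqrt(ss_res / n),
        "mae": sum(abs(r) for r in residuals) / n,
    }

# Toy log10 MIC values for illustration only.
y_true = [0.0, 1.0, 2.0, 3.0]
per_seed = [[0.4, 1.1, 1.9, 2.4], [0.6, 0.9, 2.1, 2.6]]
y_ens = ensemble_mean(per_seed)
metrics = regression_metrics(y_true, y_ens)
```

All three metrics are computed on the log10 scale, so an RMSE of 0.588 corresponds to a typical error of a bit under one order of magnitude in MIC.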

Quickstart

Recommended: use the Colab notebook shipped with the code repository for end-to-end inference.

If running locally, install the code and point it at this bundle's artifacts:

# 1) Install code (example: editable install from the GitHub repo)
# git clone https://github.com/RRocaP/peptaidev5.git
# pip install -e ./peptaidev5

# 2) Run screening inference (single checkpoint)
phamp screen \
  --input peptides.fasta \
  --output preds.csv \
  --checkpoint checkpoints/seed0/best_model.pt \
  --genome-features genome_features.pt \
  --species-policy none \
  --genome-policy mean \
  --retrieval off \
  --device cpu
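The --input file in the command above is named peptides.fasta. A minimal sketch for producing such a file from Python, assuming phamp screen accepts standard FASTA (one header line starting with ">" followed by the sequence); the helper name, peptide IDs and sequences are placeholders, not real candidates:

```python
from pathlib import Path

def write_fasta(path, records):
    """Write (identifier, sequence) pairs to `path` in FASTA format."""
    lines = []
    for seq_id, seq in records:
        lines.append(f">{seq_id}")
        lines.append(seq.upper())   # normalise residues to upper case
    text = "\n".join(lines) + "\n"
    Path(path).write_text(text)
    return text

# Placeholder peptides for illustration only.
example_records = [("pep_1", "KKLLKKLLKK"), ("pep_2", "gigkflkk")]
```

Calling write_fasta("peptides.fasta", example_records) would create the input file expected by the command above.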

Config summary: meanpool + cnn + attnpool + bi_ssm, cond=concat

Paper

  • Title: Retrieval-augmented prediction discovers antimicrobials across the phage pan-proteome

Limitations

  • Predictions are species/strain-conditioned and depend on genome feature coverage.
  • Retrieval artifacts (if exported) must be built from TRAIN ONLY to avoid leakage.
  • Computational predictions require experimental validation before translational use.
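
The train-only requirement for retrieval artifacts can be checked mechanically before an index is used. A minimal sketch, assuming the retrieval index and the training split can each be listed as sets of identifiers; the function and the IDs are hypothetical:

```python
def assert_train_only(retrieval_ids, train_ids):
    """Raise if any retrieval entry is not part of the training split."""
    leaked = sorted(set(retrieval_ids) - set(train_ids))
    if leaked:
        raise ValueError(
            f"{len(leaked)} retrieval entries are outside TRAIN, e.g. {leaked[:5]}"
        )

# Placeholder identifiers for illustration only.
train_ids = {"pep0", "pep1", "pep2"}
assert_train_only({"pep0", "pep2"}, train_ids)   # passes silently

try:
    assert_train_only({"pep0", "pep9"}, train_ids)
    leak_detected = False
except ValueError:
    leak_detected = True                         # pep9 is not in TRAIN
```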