Retrieval-augmented prediction discovers antimicrobials across the phage pan-proteome

Intended use

Predicts log10 MIC (MIC in µM) for peptide × strain pairs. Intended for research use and proteome mining.
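
Because the output is on a log10 scale, getting back to µM is a single exponentiation. A minimal sketch; the function name is illustrative and not part of the phamp code:

```python
def log10_mic_to_um(log10_mic: float) -> float:
    """Convert a predicted log10 MIC value to an MIC in µM."""
    return 10.0 ** log10_mic


# A prediction of 0.0 corresponds to 1 µM, 1.0 to 10 µM, 2.0 to 100 µM.
for pred in (0.0, 1.0, 2.0):
    print(pred, "->", log10_mic_to_um(pred), "µM")
```

On this scale, an error of 0.588 log10 units (the reported test RMSE) corresponds to a factor of about 3.9 in µM.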

Architecture (from exported config)

Branches: meanpool, cnn, attnpool, bi_ssm
Conditioning: concat
LoRA: layers=9, rank=8, alpha=16
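
As a rough guide to what the LoRA line implies, the sketch below assumes the standard LoRA formulation, in which a frozen weight W is replaced by W + (alpha / rank) * B @ A. The exported config does not say which weight matrices are adapted or whether a scaling variant is used, so the helper names and the layer width are hypothetical.

```python
def lora_scale(alpha: float, rank: int) -> float:
    """Scaling factor applied to the low-rank update B @ A in standard LoRA."""
    return alpha / rank


def lora_extra_params(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters added for one adapted (d_out x d_in) weight matrix."""
    # A has shape (rank, d_in) and B has shape (d_out, rank).
    return rank * (d_in + d_out)


scale = lora_scale(alpha=16, rank=8)
# d_in = d_out = 1024 is a hypothetical width, chosen only for illustration.
extra = lora_extra_params(d_in=1024, d_out=1024, rank=8)
print(scale, extra)
```

With rank=8 and alpha=16 the update is scaled by 2, and each adapted matrix adds rank * (d_in + d_out) trainable parameters regardless of how large the frozen matrix is.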

Data split

mmseqs90_39species: MMseqs2 90% identity cluster-disjoint splits (train/val/test).
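
"Cluster-disjoint" means that whole clusters, not individual sequences, are assigned to a split, so no test peptide shares a 90% identity cluster with a training peptide. The sketch below illustrates the idea on toy data. It is not the procedure used to build mmseqs90_39species; the function name, split fractions, and seed are assumptions.

```python
import random
from collections import defaultdict


def cluster_disjoint_split(seq_to_cluster, fractions=(0.8, 0.1, 0.1), seed=0):
    """Assign whole clusters to train/val/test so no cluster spans two splits."""
    clusters = defaultdict(list)
    for seq_id, cluster_id in seq_to_cluster.items():
        clusters[cluster_id].append(seq_id)

    # Shuffle cluster IDs (not sequences) with a fixed seed for reproducibility.
    cluster_ids = sorted(clusters)
    random.Random(seed).shuffle(cluster_ids)

    n = len(cluster_ids)
    n_train = int(fractions[0] * n)
    n_val = int(fractions[1] * n)
    names = ["train"] * n_train + ["val"] * n_val + ["test"] * (n - n_train - n_val)

    split = {}
    for cluster_id, name in zip(cluster_ids, names):
        for seq_id in clusters[cluster_id]:
            split[seq_id] = name
    return split


# Toy input: sequence ID -> cluster ID (for example, parsed from a clustering TSV).
toy = {f"pep{i}": f"clu{i // 2}" for i in range(20)}
assignment = cluster_disjoint_split(toy)
```

Splitting at the cluster level keeps near-duplicate sequences together, which is what makes the held-out metrics a test of generalization to new sequence families.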

Reference metrics (confirmatory test; winner selected on validation only)

Val R² (mean±std over seeds): 0.415 ± 0.004
Test ensemble: R²=0.467, RMSE=0.588, MAE=0.437, W1D=0.452
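
For readers who want to recompute comparable numbers on their own predictions, here is a minimal sketch of R², RMSE, and MAE on log10 MIC values. It assumes "ensemble" means averaging per-seed predictions, which this card does not state explicitly. W1D is omitted because its exact definition is not given here. All function names and values are illustrative.

```python
import math


def ensemble_mean(per_seed_preds):
    """Average predictions across seeds, position by position."""
    return [sum(vals) / len(vals) for vals in zip(*per_seed_preds)]


def regression_metrics(y_true, y_pred):
    """Compute R², RMSE, and MAE for paired true and predicted values."""
    n = len(y_true)
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    mean_true = sum(y_true) / n
    ss_res = sum(r * r for r in residuals)
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return {
        "r2": 1.0 - ss_res / ss_tot,
        "rmse": math.sqrt(ss_res / n),
        "mae": sum(abs(r) for r in residuals) / n,
    }


# Toy log10 MIC values, not taken from the model or its test set.
y_true = [0.0, 1.0, 2.0, 3.0]
seed_preds = [[0.5, 1.0, 2.0, 2.5], [0.5, 1.0, 2.0, 3.5]]
y_ens = ensemble_mean(seed_preds)
metrics = regression_metrics(y_true, y_ens)
print(metrics)
```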

Quickstart

Recommended: use the Colab notebook shipped with the code repository for end-to-end inference.

If running locally, install the code and point it at this bundle's artifacts:

# 1) Install code (example: editable install from the GitHub repo)
# git clone https://github.com/RRocaP/peptaidev5.git
# pip install -e ./peptaidev5

# 2) Run screening inference (single checkpoint)
phamp screen \
  --input peptides.fasta \
  --output preds.csv \
  --checkpoint checkpoints/seed0/best_model.pt \
  --genome-features genome_features.pt \
  --species-policy none \
  --genome-policy mean \
  --retrieval off \
  --device cpu

Config summary: meanpool + cnn + attnpool + bi_ssm, cond=concat

Paper

Title: Retrieval-augmented prediction discovers antimicrobials across the phage pan-proteome

Limitations

Predictions are species/strain-conditioned and depend on genome feature coverage.
Retrieval artifacts (if exported) must be built from TRAIN ONLY to avoid leakage.
Computational predictions require experimental validation before translational use.
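
The train-only rule for retrieval artifacts can be enforced with a simple check when the index is built. This is a sketch under assumed data structures: the record fields (`id`, `sequence`, `split`), the function names, and the toy sequences are hypothetical and do not reflect the phamp export format.

```python
def build_train_only_index(records):
    """Build a retrieval index from TRAIN records only."""
    return {r["id"]: r["sequence"] for r in records if r["split"] == "train"}


def assert_no_leakage(index, records):
    """Raise if any validation or test ID appears in the retrieval index."""
    held_out = {r["id"] for r in records if r["split"] != "train"}
    leaked = held_out & set(index)
    if leaked:
        raise ValueError(f"held-out IDs in retrieval index: {sorted(leaked)}")


# Toy records with made-up sequences.
records = [
    {"id": "pep1", "sequence": "KKLLKKLLKK", "split": "train"},
    {"id": "pep2", "sequence": "GGAVLKKAGG", "split": "val"},
    {"id": "pep3", "sequence": "RRWWRRWWRR", "split": "test"},
]
index = build_train_only_index(records)
assert_no_leakage(index, records)
print(sorted(index))
```

Running the check on the final artifact, rather than trusting the build step alone, also catches indexes that were assembled before the split was finalized.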

Model tree for rrp31/phamp-mic-ablate-ssm

Base model: EvolutionaryScale/esmc-600m-2024-12
Adapter: this model

Evaluation results (self-reported, mmseqs90_39species test split)

R²: 0.467
MAE: 0.437
RMSE: 0.588