PriorProtoNet: checkpoints & embedding stores
Artifacts for "Integrating Qualitative Context and Quantitative Evidence: Dynamic Information Blending in Few-Shot Bioactivity Prediction" (Piguave, Maier, & Savoie).
PriorProtoNet is a minimal text-conditioned prototypical network for few-shot small-molecule bioactivity prediction. It synthesizes per-class prototypes from a frozen language-model embedding of the assay description, so it can predict before any labeled molecules are seen (zero-shot) and blend the description with labeled evidence in the low-shot regime (k ≤ 5). It is evaluated on FS-Mol (N = 157 test tasks).
- Code: https://github.com/bryanpiguave/PriorProtoNet
- Benchmark: microsoft/FS-Mol (MIT)
Repository contents
priorprotonet/
├── embeddings/                              # frozen assay-description embedding stores (the prior)
│   └── <store>/                             # one dir per store
│       ├── task_embeddings.safetensors
│       ├── embedding_meta.json
│       ├── chembl_to_row.json
│       └── row_to_aid.json                  # when present
└── checkpoints/headline/
    ├── geometry_ecfp_dim512_cectr_l0p05/    # λ=0.05 geometry runs (ECFP, dim 512, euclidean)
    │   ├── sbiobert/best_validation.pt
    │   ├── qwen3_prefix/best_validation.pt
    │   └── qwen3_aug/best_validation.pt
    └── encoder_ablation/
        ├── protonet_ecfp_baseline/best_validation.pt
        ├── protonet_gnn_ecfp_baseline/best_validation.pt
        ├── priorproto_ecfp_sbiobert/best_validation.pt
        └── priorproto_gnn_ecfp_sbiobert/best_validation.pt
Embedding stores
Each embeddings/<store>/ directory holds the description prior for one text
encoder or ablation. A usable store contains task_embeddings.safetensors,
embedding_meta.json, and chembl_to_row.json. Pass the directory to training or
evaluation via --embedding-dir. Text encoders covered: s-BioBERT and Qwen3-8B
(prefix and augmented-assay variants), plus input-ablation and semantic-probe
stores.
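Before passing a directory to --embedding-dir, it can help to confirm that the three required files are present. The following is a minimal sketch, not part of the PriorProtoNet code: the helper names `missing_store_files` and `load_row_index` are hypothetical, and reading chembl_to_row.json as a flat JSON object that maps ChEMBL assay IDs to rows of the embedding matrix is an assumption inferred from the filename.

```python
import json
from pathlib import Path

# Files listed above as required for a usable store.
REQUIRED_FILES = (
    "task_embeddings.safetensors",
    "embedding_meta.json",
    "chembl_to_row.json",
)


def missing_store_files(store_dir):
    """Return the required file names that are absent from store_dir."""
    store = Path(store_dir)
    return [name for name in REQUIRED_FILES if not (store / name).is_file()]


def load_row_index(store_dir):
    """Load chembl_to_row.json.

    Assumption: the file is a flat JSON object mapping a ChEMBL assay ID
    to the row of that assay in the embedding matrix.
    """
    with open(Path(store_dir) / "chembl_to_row.json", encoding="utf-8") as fh:
        return json.load(fh)
```

`missing_store_files(...)` returns an empty list when the store is complete. The tensors themselves can be read with `safetensors.numpy.load_file`; the tensor key names inside the file are not documented here, so inspect the keys of the returned dict rather than assuming one.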
Checkpoints
Seven headline best_validation.pt PyTorch checkpoints:
| Path | Model | Mol encoder | Prior |
|---|---|---|---|
| geometry_…_l0p05/sbiobert | PriorProtoNet | ECFP, dim 512 | s-BioBERT prism prefix |
| geometry_…_l0p05/qwen3_prefix | PriorProtoNet | ECFP, dim 512 | Qwen3 prism prefix |
| geometry_…_l0p05/qwen3_aug | PriorProtoNet | ECFP, dim 512 | Qwen3 augmented assay |
| encoder_ablation/protonet_ecfp_baseline | ProtoNet (no prior) | ECFP+FC | - |
| encoder_ablation/protonet_gnn_ecfp_baseline | ProtoNet (no prior) | GNN+ECFP+FC | - |
| encoder_ablation/priorproto_ecfp_sbiobert | PriorProtoNet | ECFP | s-BioBERT prism prefix |
| encoder_ablation/priorproto_gnn_ecfp_sbiobert | PriorProtoNet | GNN+ECFP | s-BioBERT prism prefix |
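To check that a local copy holds all seven checkpoints, one option is to search for best_validation.pt under checkpoints/headline/. This is only a sketch: `find_checkpoints` is a hypothetical helper, and `root` stands for wherever the repository snapshot was downloaded.

```python
from pathlib import Path


def find_checkpoints(root):
    """Return run paths, relative to checkpoints/headline, that hold a best_validation.pt."""
    headline = Path(root) / "checkpoints" / "headline"
    return sorted(
        ckpt.parent.relative_to(headline).as_posix()
        for ckpt in headline.rglob("best_validation.pt")
    )
```

With a complete download the result should have seven entries, for example geometry_ecfp_dim512_cectr_l0p05/sbiobert and encoder_ablation/protonet_ecfp_baseline.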
Usage
Download a checkpoint and its matching embedding store, then run the evaluation script from the code repo, replacing <local> with the path returned by snapshot_download:
from huggingface_hub import snapshot_download

local = snapshot_download(
    "bryanpiguavellano/priorprotonet",
    allow_patterns=[
        "checkpoints/headline/geometry_ecfp_dim512_cectr_l0p05/sbiobert/*",
        "embeddings/s_biobert_prism_unified_prefix/*",
    ],
)
python LLM_FP/PriorProtoNet/zero_shot_prior_proto_ablation.py \
  --checkpoint <local>/checkpoints/headline/geometry_ecfp_dim512_cectr_l0p05/sbiobert/best_validation.pt \
  --data-path datasets/fs-mol \
  --task-list-file datasets/fsmol-0.1.json \
  --embedding-dir <local>/embeddings/s_biobert_prism_unified_prefix
Match each PriorProtoNet checkpoint to the embedding store of the same text encoder. ProtoNet baselines need no embedding store.
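The matching rule can be made explicit when building the download patterns and the evaluation command. The sketch below is illustrative only: `allow_patterns_for` and `eval_command` are hypothetical helpers, while the paths, script name, and flags are copied from the usage example. Whether the zero-shot script also accepts a ProtoNet baseline checkpoint is not stated here, so `eval_command` covers only the PriorProtoNet case.

```python
from pathlib import Path

HEADLINE = "checkpoints/headline"


def allow_patterns_for(run, store=None):
    """Build snapshot_download allow_patterns for one run.

    run   -- path under checkpoints/headline/, e.g.
             "geometry_ecfp_dim512_cectr_l0p05/sbiobert"
    store -- embedding store name for PriorProtoNet runs; leave as None
             for ProtoNet baselines, which need no embedding store.
    """
    patterns = [f"{HEADLINE}/{run}/*"]
    if store is not None:
        patterns.append(f"embeddings/{store}/*")
    return patterns


def eval_command(local, run, store):
    """Return the zero-shot evaluation command as an argument list."""
    local = Path(local)
    return [
        "python", "LLM_FP/PriorProtoNet/zero_shot_prior_proto_ablation.py",
        "--checkpoint", str(local / HEADLINE / run / "best_validation.pt"),
        "--data-path", "datasets/fs-mol",
        "--task-list-file", "datasets/fsmol-0.1.json",
        "--embedding-dir", str(local / "embeddings" / store),
    ]
```

The list from `allow_patterns_for` can be passed as `allow_patterns` to `snapshot_download`, and the list from `eval_command` to `subprocess.run(cmd, check=True)` when run from the root of the code repo.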
License
MIT. Vendored FS-Mol utilities are © Microsoft (MIT).
Citation
@article{piguave_priorprotonet,
  title  = {Integrating Qualitative Context and Quantitative Evidence:
            Dynamic Information Blending in Few-Shot Bioactivity Prediction},
  author = {Piguave, Bryan and Maier and Savoie},
  year   = {2026}
}