PriorProtoNet: checkpoints & embedding stores

Artifacts for "Integrating Qualitative Context and Quantitative Evidence: Dynamic Information Blending in Few-Shot Bioactivity Prediction" (Piguave, Maier, & Savoie).

PriorProtoNet is a minimal text-conditioned prototypical network for few-shot small-molecule bioactivity prediction. It synthesizes per-class prototypes from a frozen language-model embedding of the assay description. This lets it predict before any labeled molecules are seen (zero-shot) and blend the description with labeled evidence in the low-shot regime (k ≤ 5). Evaluated on FS-Mol (N = 157 test tasks).
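This README does not spell out the blending rule, so the sketch below only illustrates the general idea under a loudly stated assumption. It uses a simple count-weighted shrinkage in which the text-derived prior prototype counts as `prior_weight` pseudo-examples. With zero labeled molecules the prototype is the prior itself, and as k grows the mean of the labeled support embeddings takes over. Queries are then scored with the standard ProtoNet head, a softmax over negative squared Euclidean distances. The names `blend_prototype`, `class_probabilities`, and `prior_weight` are hypothetical, the prior vectors are given directly rather than produced from a text embedding, and the paper's learned blending may differ.

```python
import math
from typing import List, Sequence

Vector = List[float]


def blend_prototype(prior: Vector, support: Sequence[Vector], prior_weight: float = 1.0) -> Vector:
    """Hypothetical blending rule (not the paper's): treat the text-derived
    prior prototype as `prior_weight` pseudo-examples and average it with the
    labeled support embeddings of one class. With no support (zero-shot) the
    prior is returned unchanged; as labeled examples accumulate, their mean
    dominates."""
    total = [prior_weight * p for p in prior]
    for x in support:
        total = [t + xi for t, xi in zip(total, x)]
    denom = prior_weight + len(support)
    return [t / denom for t in total]


def class_probabilities(query: Vector, prototypes: Sequence[Vector]) -> List[float]:
    """Standard ProtoNet head: softmax over negative squared Euclidean distances."""
    logits = [-sum((q - c) ** 2 for q, c in zip(query, proto)) for proto in prototypes]
    shift = max(logits)  # subtract the max logit for numerical stability
    exps = [math.exp(z - shift) for z in logits]
    norm = sum(exps)
    return [e / norm for e in exps]


if __name__ == "__main__":
    # Toy 2-D example with made-up prior prototypes for "inactive" and "active".
    priors = [[0.0, 1.0], [1.0, 0.0]]
    zero_shot = [blend_prototype(p, []) for p in priors]
    one_shot = [
        blend_prototype(priors[0], [[0.0, 3.0]]),
        blend_prototype(priors[1], [[3.0, 0.0]]),
    ]
    print(class_probabilities([1.0, 0.2], zero_shot))
    print(class_probabilities([1.0, 0.2], one_shot))
```

The shrinkage form is chosen only because it makes the zero-shot and low-shot behavior described above explicit in a few lines.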

Repository contents

priorprotonet/
├── embeddings/          # frozen assay-description embedding stores (the prior)
│   └── <store>/         # one dir per store
│       ├── task_embeddings.safetensors
│       ├── embedding_meta.json
│       ├── chembl_to_row.json
│       └── row_to_aid.json        # when present
└── checkpoints/headline/
    ├── geometry_ecfp_dim512_cectr_l0p05/   # λ₁=0.05 geometry runs (ECFP, dim 512, euclidean)
    │   ├── sbiobert/best_validation.pt
    │   ├── qwen3_prefix/best_validation.pt
    │   └── qwen3_aug/best_validation.pt
    └── encoder_ablation/
        ├── protonet_ecfp_baseline/best_validation.pt
        ├── protonet_gnn_ecfp_baseline/best_validation.pt
        ├── priorproto_ecfp_sbiobert/best_validation.pt
        └── priorproto_gnn_ecfp_sbiobert/best_validation.pt

Embedding stores

Each embeddings/<store>/ directory holds the description prior for one text encoder or ablation. A usable store contains task_embeddings.safetensors, embedding_meta.json, and chembl_to_row.json; row_to_aid.json is present in some stores only. Pass the directory to the training or evaluation scripts via --embedding-dir. Text encoders covered: s-BioBERT and Qwen3-8B (prefix and augmented-assay variants), plus input-ablation and semantic-probe stores.
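Before passing a directory via --embedding-dir, it can help to check that it holds the three required files. The minimal sketch below does that and also loads the ChEMBL-ID mapping. It assumes chembl_to_row.json is a flat JSON object of the form {ChEMBL ID: row index}, which the file name suggests but this README does not document. Reading the tensors themselves would go through the safetensors library (for example `safetensors.torch.load_file`), but the tensor key names inside task_embeddings.safetensors are not documented here, so the sketch stops at the file check and the ID mapping. The helper names are hypothetical.

```python
import json
from pathlib import Path
from typing import Dict, List, Union

PathLike = Union[str, Path]

# Files a usable embeddings/<store>/ directory must contain, per this README.
REQUIRED_FILES = ("task_embeddings.safetensors", "embedding_meta.json", "chembl_to_row.json")
# Present in some stores only.
OPTIONAL_FILES = ("row_to_aid.json",)


def missing_store_files(store_dir: PathLike) -> List[str]:
    """Return the required files that are absent from a store directory."""
    store = Path(store_dir)
    return [name for name in REQUIRED_FILES if not (store / name).is_file()]


def is_usable_store(store_dir: PathLike) -> bool:
    """True if every file a usable store needs is present."""
    return not missing_store_files(store_dir)


def load_chembl_to_row(store_dir: PathLike) -> Dict[str, int]:
    """Load the ChEMBL-ID -> embedding-row mapping.

    Assumption (hypothetical schema): chembl_to_row.json is a flat JSON
    object {chembl_id: row_index}."""
    with open(Path(store_dir) / "chembl_to_row.json", encoding="utf-8") as fh:
        return json.load(fh)
```

For example, after downloading the s-BioBERT store, `missing_store_files("<local>/embeddings/s_biobert_prism_unified_prefix")` would be expected to return an empty list.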

Checkpoints

Seven headline best_validation.pt PyTorch checkpoints (paths relative to checkpoints/headline/):

| Path | Model | Molecule encoder | Prior |
|---|---|---|---|
| geometry_…_l0p05/sbiobert | PriorProtoNet | ECFP, dim 512 | s-BioBERT prism prefix |
| geometry_…_l0p05/qwen3_prefix | PriorProtoNet | ECFP, dim 512 | Qwen3 prism prefix |
| geometry_…_l0p05/qwen3_aug | PriorProtoNet | ECFP, dim 512 | Qwen3 augmented assay |
| encoder_ablation/protonet_ecfp_baseline | ProtoNet (no prior) | ECFP+FC | none |
| encoder_ablation/protonet_gnn_ecfp_baseline | ProtoNet (no prior) | GNN+ECFP+FC | none |
| encoder_ablation/priorproto_ecfp_sbiobert | PriorProtoNet | ECFP | s-BioBERT prism prefix |
| encoder_ablation/priorproto_gnn_ecfp_sbiobert | PriorProtoNet | GNN+ECFP | s-BioBERT prism prefix |

Usage

Download a checkpoint and its matching embedding store, then run the evaluation script from the code repository. In Python:

from huggingface_hub import snapshot_download

# Fetch only the s-BioBERT geometry checkpoint and its embedding store.
local = snapshot_download(
    "bryanpiguavellano/priorprotonet",
    allow_patterns=[
        "checkpoints/headline/geometry_ecfp_dim512_cectr_l0p05/sbiobert/*",
        "embeddings/s_biobert_prism_unified_prefix/*",
    ],
)
print(local)  # local snapshot directory; use it in place of <local> below

Then, from the shell, with <local> replaced by the printed path:

python LLM_FP/PriorProtoNet/zero_shot_prior_proto_ablation.py \
    --checkpoint     <local>/checkpoints/headline/geometry_ecfp_dim512_cectr_l0p05/sbiobert/best_validation.pt \
    --data-path      datasets/fs-mol \
    --task-list-file datasets/fsmol-0.1.json \
    --embedding-dir  <local>/embeddings/s_biobert_prism_unified_prefix

Match each PriorProtoNet checkpoint to the embedding store of the same text encoder. ProtoNet baselines need no embedding store.
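The matching rule can be turned into the allow_patterns list used in the Usage snippet. The sketch below is a hypothetical helper, not part of the released code. It treats a checkpoint as a ProtoNet baseline when its directory name ends in "_baseline", a heuristic that holds for the seven checkpoints listed above but is an assumption, and it requires an embedding store for every other checkpoint. Only the s-BioBERT store name (s_biobert_prism_unified_prefix) appears in this README, so for the Qwen3 checkpoints the caller must supply the name of the corresponding Qwen3 store directory under embeddings/.

```python
from typing import List, Optional

CHECKPOINT_ROOT = "checkpoints/headline"


def needs_store(checkpoint_subdir: str) -> bool:
    """Heuristic (assumption): among the listed checkpoints, only the two
    ProtoNet baselines end in "_baseline", and they use no prior."""
    name = checkpoint_subdir.rstrip("/").rsplit("/", 1)[-1]
    return not name.endswith("_baseline")


def allow_patterns(checkpoint_subdir: str, store: Optional[str] = None) -> List[str]:
    """Build snapshot_download allow_patterns for one checkpoint plus, for
    PriorProtoNet runs, the embedding store of the same text encoder."""
    if needs_store(checkpoint_subdir) and store is None:
        raise ValueError(f"{checkpoint_subdir} is a PriorProtoNet checkpoint; pass its embedding store")
    patterns = [f"{CHECKPOINT_ROOT}/{checkpoint_subdir.strip('/')}/*"]
    if store is not None:
        patterns.append(f"embeddings/{store.strip('/')}/*")
    return patterns
```

For the s-BioBERT geometry run, `allow_patterns("geometry_ecfp_dim512_cectr_l0p05/sbiobert", "s_biobert_prism_unified_prefix")` reproduces the two patterns in the Usage snippet, and the result can be passed directly as `allow_patterns=` to `snapshot_download`. For a baseline such as encoder_ablation/protonet_ecfp_baseline, it returns only the checkpoint pattern.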

License

MIT. Vendored FS-Mol utilities are © Microsoft (MIT).

Citation

@article{piguave_priorprotonet,
  title  = {Integrating Qualitative Context and Quantitative Evidence:
            Dynamic Information Blending in Few-Shot Bioactivity Prediction},
  author = {Piguave, Bryan and Maier and Savoie},
  year   = {2026}
}