pid-runs-v2.2 — V2.2 canonical H100 checkpoints
Trained-model checkpoints from the canonical V2.2 evidence run of
brandon-behring/prompt-injection-sdd.
This repo ships 14 fragment checkpoints (LoRA adapters +
full-FT DeBERTa-v3-base) so a colleague can reproduce the canonical V2.2
numbers without re-training on an H100.
Provenance
- source run id: 20260511T181707Z-6a180a3a
- source repo commit: see the run's run_metadata.json for the canonical SHA
- result schema: v2.2-evidence-1
- hardware: NVIDIA H100 80 GB HBM3 (RunPod)
- evidence package: GitHub Release v2.2-evidence
- claim-gate status: 10/10 claim gates passed; evidence-package gate passed; stronger-model claim gate failed (V2.2 is a successful evidence-package run, not a promoted stronger-model result).
Fragments
- full_ft_lr1e-5_seed_42/
- full_ft_v21_seed_42/
- full_ft_v21_seed_43/
- full_ft_v21_seed_44/
- lora_no_notinject_seed_42/
- lora_no_notinject_seed_43/
- lora_no_notinject_seed_44/
- lora_r16_qv_seed_42/
- lora_r16_qv_seed_43/
- lora_r16_qv_seed_44/
- lora_v21_seed_42/
- lora_v21_seed_43/
- lora_v21_seed_44/
- lora_v21_seed_45/
Each fragment directory contains the files needed to reload the model for
inference: tokenizer, config, weights, and the training config that
produced them. The reference scorers (frozen_probe, lr_tfidf,
protectai_v1, protectai_v2) are inference-only and are not included
in this repo.
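The fragment names listed above appear to follow a <family>_seed_<N> pattern. The sketch below, assuming that pattern holds for every directory, groups the 14 fragments by family so that seeds of the same configuration can be compared side by side. The helper names split_fragment and group_by_family are hypothetical and are not part of the source repo.

```python
from collections import defaultdict

# Fragment directory names as listed in this repo.
FRAGMENTS = [
    "full_ft_lr1e-5_seed_42",
    "full_ft_v21_seed_42", "full_ft_v21_seed_43", "full_ft_v21_seed_44",
    "lora_no_notinject_seed_42", "lora_no_notinject_seed_43",
    "lora_no_notinject_seed_44",
    "lora_r16_qv_seed_42", "lora_r16_qv_seed_43", "lora_r16_qv_seed_44",
    "lora_v21_seed_42", "lora_v21_seed_43", "lora_v21_seed_44",
    "lora_v21_seed_45",
]


def split_fragment(name: str) -> tuple[str, int]:
    """Split '<family>_seed_<N>' into (family, seed).

    Assumption: every fragment name follows this pattern.
    """
    family, sep, seed = name.rstrip("/").rpartition("_seed_")
    if not sep or not seed.isdigit():
        raise ValueError(f"unexpected fragment name: {name!r}")
    return family, int(seed)


def group_by_family(names):
    """Map each family to the sorted list of seeds available for it."""
    groups = defaultdict(list)
    for name in names:
        family, seed = split_fragment(name)
        groups[family].append(seed)
    return {family: sorted(seeds) for family, seeds in groups.items()}


if __name__ == "__main__":
    for family, seeds in group_by_family(FRAGMENTS).items():
        print(f"{family}: seeds {seeds}")
```

Grouping this way makes it easy to see that most families ship three seeds, while lora_v21 has a fourth and full_ft_lr1e-5 has only one.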
Usage
from huggingface_hub import snapshot_download

# Download a single fragment:
local_dir = snapshot_download(
    repo_id="BBehring/pid-runs-v2.2",
    allow_patterns=["lora_r16_qv_seed_43/*"],
)
Or fetch the entire repo for an end-to-end reanalyze workflow, as documented
in docs/DIAGNOSTICS.md (Level 4, path 4a).
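A middle ground between one fragment and the whole repo is to pull a handful of fragments in a single call, since allow_patterns accepts a list of glob patterns. The sketch below builds one pattern per fragment directory; the helper names patterns_for and download_fragments are illustrative and are not part of the source repo.

```python
def patterns_for(fragments):
    """Return one glob pattern per fragment directory name."""
    return [f"{name.rstrip('/')}/*" for name in fragments]


def download_fragments(fragments, repo_id="BBehring/pid-runs-v2.2"):
    """Download only the requested fragment directories.

    Returns the local snapshot path reported by huggingface_hub.
    """
    # Imported here so patterns_for() works without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id=repo_id,
        allow_patterns=patterns_for(fragments),
    )


if __name__ == "__main__":
    # All three seeds of one LoRA configuration.
    wanted = [f"lora_r16_qv_seed_{seed}" for seed in (42, 43, 44)]
    print(patterns_for(wanted))
    # local_dir = download_fragments(wanted)  # uncomment to actually download
```

Keeping the pattern builder separate from the download call lets you inspect exactly what will be fetched before any network traffic happens.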
How to read the V2.2 evidence
Per the comprehensive evidence report:
- Eval slices answer different claim questions; do not macro-average them.
- older_poc_holdout is the external-shift anchor; treat ProtectAI v2's 0.938 PR-AUC there as leakage-suspected (see analysis/deep_dive/protectai_leakage_refinement.json in the source repo).
- lakera_within_source_heldout is saturated (all scorers ≥ 0.989 PR-AUC); use it as a split-hygiene check, not a robustness claim.
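One way to follow this guidance when re-analyzing is to report each slice on its own and flag any slice where every scorer sits near the ceiling, rather than folding slices into a single average. The sketch below assumes a hypothetical {slice: {scorer: PR-AUC}} mapping and reuses the 0.989 floor quoted above as the saturation threshold. The input layout, the function name, and the placeholder numbers in the usage example are illustrative only and are not results from the run.

```python
SATURATION_FLOOR = 0.989  # floor quoted for the saturated slice in this README


def saturated_slices(pr_auc, floor=SATURATION_FLOOR):
    """Return the slices where every scorer's PR-AUC is at or above `floor`.

    `pr_auc` is assumed to map slice name -> {scorer name: PR-AUC}.
    """
    return sorted(
        name
        for name, scores in pr_auc.items()
        if scores and min(scores.values()) >= floor
    )


if __name__ == "__main__":
    # Placeholder values for illustration; not taken from the evidence run.
    example = {
        "slice_a": {"scorer_x": 0.995, "scorer_y": 0.990},
        "slice_b": {"scorer_x": 0.910, "scorer_y": 0.992},
    }
    # Report each slice separately instead of macro-averaging.
    for name, scores in example.items():
        print(name, scores)
    print("saturated:", saturated_slices(example))
```

A slice flagged here separates scorers poorly, so it is better read as a sanity check on the split than as evidence of robustness.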
License
MIT (matches the source repo).
Base model
microsoft/deberta-v3-base