| --- |
| license: mit |
| language: |
| - en |
| tags: |
| - text-classification |
| - prompt-injection |
| - deberta-v3 |
| - lora |
| base_model: microsoft/deberta-v3-base |
| datasets: |
| - deepset/prompt-injections |
| - Lakera/gandalf_ignore_instructions |
| - Lakera/gandalf_summarization |
| - OpenAssistant/oasst1 |
| - microsoft/llmail-inject-challenge |
| - leolee99/NotInject |
| --- |
| |
| # pid-runs-v2.2 — V2.2 canonical H100 checkpoints |
|
|
| Trained-model checkpoints from the canonical V2.2 evidence run of |
| [`brandon-behring/prompt-injection-sdd`](https://github.com/brandon-behring/prompt-injection-sdd). |
| This repo ships **14 fragment checkpoints** (LoRA adapters + |
| full-FT DeBERTa-v3-base) so a colleague can reproduce the canonical V2.2 |
| numbers without re-training on an H100. |
|
|
| ## Provenance |
|
|
| - source run id: `20260511T181707Z-6a180a3a` |
| - source repo commit: see the run's `run_metadata.json` for the canonical SHA |
| - result schema: `v2.2-evidence-1` |
| - hardware: NVIDIA H100 80 GB HBM3 (RunPod) |
| - evidence package: [GitHub Release `v2.2-evidence`](https://github.com/brandon-behring/prompt-injection-sdd/releases/tag/v2.2-evidence) |
| - claim-gate status: **10/10 claim gates passed**; evidence-package gate **passed**; |
| stronger-model claim gate **failed** (V2.2 is a successful evidence-package |
| run, not a promoted stronger-model result). |
|
|
| ## Fragments |
|
|
| - `full_ft_lr1e-5_seed_42/` |
| - `full_ft_v21_seed_42/` |
| - `full_ft_v21_seed_43/` |
| - `full_ft_v21_seed_44/` |
| - `lora_no_notinject_seed_42/` |
| - `lora_no_notinject_seed_43/` |
| - `lora_no_notinject_seed_44/` |
| - `lora_r16_qv_seed_42/` |
| - `lora_r16_qv_seed_43/` |
| - `lora_r16_qv_seed_44/` |
| - `lora_v21_seed_42/` |
| - `lora_v21_seed_43/` |
| - `lora_v21_seed_44/` |
| - `lora_v21_seed_45/` |
|
|
| Each fragment directory contains the files needed to reload the model for |
| inference: tokenizer, config, weights, and the training config that |
| produced them. The reference scorers (`frozen_probe`, `lr_tfidf`, |
| `protectai_v1`, `protectai_v2`) are inference-only and are not included |
| in this repo. |
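
The directory names above follow a `<config>_seed_<N>` pattern, so they can be grouped by configuration to see which seeds are available for each. The sketch below is illustrative only: reading the prefix as a training configuration and the suffix as a random seed is an assumption drawn from the naming, and `group_by_config` is a hypothetical helper, not part of the source repo.

```python
import re
from collections import defaultdict

# Fragment names copied from the list above.
FRAGMENTS = [
    "full_ft_lr1e-5_seed_42",
    "full_ft_v21_seed_42", "full_ft_v21_seed_43", "full_ft_v21_seed_44",
    "lora_no_notinject_seed_42", "lora_no_notinject_seed_43",
    "lora_no_notinject_seed_44",
    "lora_r16_qv_seed_42", "lora_r16_qv_seed_43", "lora_r16_qv_seed_44",
    "lora_v21_seed_42", "lora_v21_seed_43", "lora_v21_seed_44",
    "lora_v21_seed_45",
]

# Assumption: everything before "_seed_<digits>" identifies the configuration.
_SEED_RE = re.compile(r"^(?P<config>.+)_seed_(?P<seed>\d+)$")


def group_by_config(names):
    """Map each configuration prefix to its sorted list of seeds."""
    groups = defaultdict(list)
    for name in names:
        match = _SEED_RE.match(name.rstrip("/"))
        if match is None:
            raise ValueError(f"unexpected fragment name: {name!r}")
        groups[match["config"]].append(int(match["seed"]))
    return {config: sorted(seeds) for config, seeds in groups.items()}


GROUPS = group_by_config(FRAGMENTS)
for config, seeds in GROUPS.items():
    print(f"{config}: seeds {seeds}")
```

Grouping this way makes it easy to spot that the configurations do not all have the same number of seeds, which matters when comparing seed-to-seed variance across them.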
|
|
| ## Usage |
|
|
| ```python |
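| from pathlib import Path |
| |
| from huggingface_hub import snapshot_download |
| |
| # Download a single fragment (only files under this directory are fetched): |
| local_dir = snapshot_download( |
|     repo_id="BBehring/pid-runs-v2.2", |
|     allow_patterns=["lora_r16_qv_seed_43/*"], |
| ) |
| |
| # snapshot_download returns the snapshot root; the fragment is a subdirectory. |
| fragment_dir = Path(local_dir) / "lora_r16_qv_seed_43" |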
| ``` |
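
To pull every seed of one configuration in a single call, the same `allow_patterns` argument accepts several globs. This is a minimal sketch: `seed_patterns` and `download_config` are hypothetical helper names introduced here, and the `<config>_seed_<N>/` layout is taken from the fragment list in this card.

```python
def seed_patterns(config, seeds):
    """Build one glob per seed directory, e.g. 'lora_r16_qv_seed_42/*'."""
    return [f"{config}_seed_{seed}/*" for seed in seeds]


def download_config(config, seeds, repo_id="BBehring/pid-runs-v2.2"):
    """Download only the listed seed directories and return the snapshot root."""
    # Imported lazily so seed_patterns can be used without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id=repo_id,
        allow_patterns=seed_patterns(config, seeds),
    )


# Example (requires network access):
# local_dir = download_config("lora_r16_qv", [42, 43, 44])
```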
|
|
| Or fetch the entire repo for the end-to-end reanalysis workflow documented |
| in [`docs/DIAGNOSTICS.md`](https://github.com/brandon-behring/prompt-injection-sdd/blob/main/docs/DIAGNOSTICS.md) |
| (Level 4, path 4a). |
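
Once a fragment is on disk, reloading it for inference might look like the sketch below. It rests on several assumptions that this card does not confirm: that `lora_*` fragments are saved in PEFT adapter format on top of `microsoft/deberta-v3-base`, that `full_ft_*` fragments are complete `transformers` checkpoints, and that the classifier is binary (`num_labels=2`). `is_lora_fragment` and `load_fragment` are hypothetical helpers; check each fragment's config files before relying on them.

```python
from pathlib import Path

BASE_MODEL = "microsoft/deberta-v3-base"  # from this card's `base_model` field


def is_lora_fragment(fragment):
    """Assumption: the directory prefix distinguishes LoRA from full fine-tunes."""
    return fragment.startswith("lora_")


def load_fragment(snapshot_dir, fragment):
    """Return (tokenizer, model) for one fragment directory under snapshot_dir."""
    # Heavy imports are kept local so the helper above works without them.
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    path = str(Path(snapshot_dir) / fragment)
    tokenizer = AutoTokenizer.from_pretrained(path)

    if is_lora_fragment(fragment):
        from peft import PeftModel

        # Assumption: binary classifier head; adjust num_labels if the
        # fragment's config says otherwise.
        base = AutoModelForSequenceClassification.from_pretrained(
            BASE_MODEL, num_labels=2
        )
        model = PeftModel.from_pretrained(base, path)
    else:
        model = AutoModelForSequenceClassification.from_pretrained(path)

    return tokenizer, model.eval()


# Example (requires a downloaded snapshot):
# tokenizer, model = load_fragment(local_dir, "lora_r16_qv_seed_43")
```

The label order of the output logits is not stated in this card, so read it from the fragment's config rather than assuming which index means "injection".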
|
|
| ## How to read the V2.2 evidence |
|
|
| Per the |
| [comprehensive evidence report](https://github.com/brandon-behring/prompt-injection-sdd/blob/main/docs/v2-2-comprehensive-evidence-report.md): |
|
|
| - Eval slices answer different claim questions; do **not** macro-average them. |
| - `older_poc_holdout` is the external-shift anchor; treat `ProtectAI v2`'s |
| 0.938 PR-AUC there as leakage-suspected (see |
| `analysis/deep_dive/protectai_leakage_refinement.json` in the source repo). |
| - `lakera_within_source_heldout` is saturated (all scorers ≥ 0.989 PR-AUC); |
| use it as a split-hygiene check, not as support for a robustness claim. |
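
The reading rules above can be sketched as a per-slice report that never averages across slices and flags a slice as saturated when even its weakest scorer clears a floor. This is illustrative only: `is_saturated` and `report_slices` are hypothetical helpers, the default floor of 0.989 simply mirrors the figure quoted above (choosing it as a cutoff is an assumption), and any numbers passed in should come from the evidence package, not from this snippet.

```python
def is_saturated(pr_auc_by_scorer, floor=0.989):
    """True when every scorer's PR-AUC on the slice is at or above the floor."""
    if not pr_auc_by_scorer:
        raise ValueError("need at least one scorer")
    return min(pr_auc_by_scorer.values()) >= floor


def report_slices(results, floor=0.989):
    """Summarise each slice on its own; deliberately no cross-slice average.

    `results` maps slice name -> {scorer name -> PR-AUC}.
    """
    return {
        slice_name: {
            "min_pr_auc": min(scores.values()),
            "max_pr_auc": max(scores.values()),
            "saturated": is_saturated(scores, floor),
        }
        for slice_name, scores in results.items()
    }
```

A slice flagged as saturated cannot separate the scorers, which is why it serves as a sanity check on the split rather than as a ranking signal.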
|
|
| ## License |
|
|
| MIT (matches the source repo). |
|
|