license: mit
language:
  - en
tags:
  - text-classification
  - prompt-injection
  - deberta-v3
  - lora
base_model: microsoft/deberta-v3-base
datasets:
  - deepset/prompt-injections
  - Lakera/gandalf_ignore_instructions
  - Lakera/gandalf_summarization
  - OpenAssistant/oasst1
  - microsoft/llmail-inject-challenge
  - leolee99/NotInject

pid-runs-v2.2 — V2.2 canonical H100 checkpoints

Trained-model checkpoints from the canonical V2.2 evidence run of brandon-behring/prompt-injection-sdd. This repo ships 14 fragment checkpoints (10 LoRA adapters and 4 full fine-tunes of DeBERTa-v3-base) so a colleague can reproduce the canonical V2.2 numbers without re-training.

Provenance

source run id: 20260511T181707Z-6a180a3a
source repo commit: see the run's run_metadata.json for the canonical SHA
result schema: v2.2-evidence-1
hardware: NVIDIA H100 80 GB HBM3 (RunPod)
evidence package: GitHub Release v2.2-evidence
claim-gate status: 10/10 claim gates passed; evidence-package gate passed; stronger-model claim gate failed (V2.2 is a successful evidence-package run, not a promoted stronger-model result).

Fragments

full_ft_lr1e-5_seed_42/
full_ft_v21_seed_42/
full_ft_v21_seed_43/
full_ft_v21_seed_44/
lora_no_notinject_seed_42/
lora_no_notinject_seed_43/
lora_no_notinject_seed_44/
lora_r16_qv_seed_42/
lora_r16_qv_seed_43/
lora_r16_qv_seed_44/
lora_v21_seed_42/
lora_v21_seed_43/
lora_v21_seed_44/
lora_v21_seed_45/

Each fragment directory contains the files needed to reload the model for inference: tokenizer, config, weights, and the training config that produced them. The reference scorers (frozen_probe, lr_tfidf, protectai_v1, protectai_v2) are inference-only and are not included in this repo.
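As a minimal sketch of reloading one fragment from a local directory: the file layout is not spelled out here, so the code below assumes that LoRA fragments were saved with PEFT (and therefore contain an adapter_config.json) and that full-FT fragments load directly with transformers. The helper names fragment_kind and load_fragment are hypothetical, and the label mapping of the classification head should be checked against each fragment's config.

```python
from pathlib import Path


def fragment_kind(fragment_dir: str) -> str:
    """Guess the checkpoint type from the files in the directory.

    ASSUMPTION: LoRA fragments were saved with PEFT and so contain an
    adapter_config.json; anything else is treated as a full fine-tune.
    """
    has_adapter = (Path(fragment_dir) / "adapter_config.json").is_file()
    return "lora" if has_adapter else "full_ft"


def load_fragment(fragment_dir: str):
    """Return (tokenizer, model) in eval mode.

    Requires transformers, plus peft for LoRA fragments. Imports are lazy
    so fragment_kind() stays usable without those packages installed.
    """
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(fragment_dir)
    if fragment_kind(fragment_dir) == "lora":
        from peft import AutoPeftModelForSequenceClassification

        # Loads the base model named in adapter_config.json, then applies the adapter.
        model = AutoPeftModelForSequenceClassification.from_pretrained(fragment_dir)
    else:
        model = AutoModelForSequenceClassification.from_pretrained(fragment_dir)
    return tokenizer, model.eval()


# Example (hypothetical local path):
# tokenizer, model = load_fragment("/path/to/snapshot/lora_r16_qv_seed_43")
```

If a fragment turns out to use a different layout, the training config stored alongside the weights is the place to check how it was saved.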

Usage

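from huggingface_hub import snapshot_download

# Download a single fragment. snapshot_download returns the local snapshot
# root; the fragment's files live under local_dir/lora_r16_qv_seed_43/.
local_dir = snapshot_download(
    repo_id="BBehring/pid-runs-v2.2",
    allow_patterns=["lora_r16_qv_seed_43/*"],
)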

Or fetch the entire repo to run the end-to-end reanalyze workflow documented in docs/DIAGNOSTICS.md (Level 4, path 4a).
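Between a single fragment and the whole repo, it can be handy to pull every seed of one configuration. The sketch below builds the allow_patterns list from the directory naming seen under Fragments (`<family>_seed_<n>`). The helper names fragment_patterns and download_family are hypothetical and not part of the source repo.

```python
from typing import Iterable, List


def fragment_patterns(family: str, seeds: Iterable[int]) -> List[str]:
    """Build allow_patterns globs such as 'lora_v21_seed_42/*'."""
    return [f"{family}_seed_{seed}/*" for seed in seeds]


def download_family(family: str, seeds: Iterable[int]) -> str:
    """Download the listed seeds of one family and return the snapshot root."""
    # Imported lazily so fragment_patterns() works without huggingface_hub.
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id="BBehring/pid-runs-v2.2",
        allow_patterns=fragment_patterns(family, seeds),
    )


# Example: root = download_family("lora_v21", [42, 43, 44, 45])
```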

How to read the V2.2 evidence

Per the comprehensive evidence report:

Eval slices answer different claim questions; do not macro-average them.
older_poc_holdout is the external-shift anchor; treat ProtectAI v2's 0.938 PR-AUC there as leakage-suspected (see analysis/deep_dive/protectai_leakage_refinement.json in the source repo).
lakera_within_source_heldout is saturated (all scorers ≥ 0.989 PR-AUC) — use it as a split-hygiene check, not a robustness claim.
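One way to follow these reading rules in practice is to report each slice separately and flag saturated slices instead of folding them into a single average. The sketch below assumes PR-AUC values have already been computed elsewhere and are passed in as a nested dict. The function names are hypothetical, and the 0.989 default floor is simply the saturation level quoted above.

```python
from typing import Dict, List

# slice name -> scorer name -> PR-AUC (values come from your own evaluation)
SliceResults = Dict[str, Dict[str, float]]


def saturated_slices(results: SliceResults, floor: float = 0.989) -> List[str]:
    """Return the slices where every scorer reaches `floor` PR-AUC or higher."""
    return sorted(
        name
        for name, by_scorer in results.items()
        if by_scorer and all(value >= floor for value in by_scorer.values())
    )


def per_slice_report(results: SliceResults, floor: float = 0.989) -> List[str]:
    """Format one line per (slice, scorer); no cross-slice average is produced."""
    flagged = set(saturated_slices(results, floor))
    lines = []
    for name in sorted(results):
        note = " [saturated: split-hygiene check only]" if name in flagged else ""
        for scorer in sorted(results[name]):
            lines.append(f"{name}\t{scorer}\t{results[name][scorer]:.3f}{note}")
    return lines
```

Keeping the report as one row per slice and scorer makes it harder to quote a single blended number by accident.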

License

MIT (matches the source repo).