robustintelligence/pi-mmbert-v3.5

@@ -15,7 +15,7 @@ Fine-tuned from [**jhu-clsp/mmBERT-base**](https://huggingface.co/jhu-clsp/mmBER
 This model outputs two scores: `prompt_injection` (index 0) and `toxic` (index 1). A **tiered detection strategy** combines both heads to achieve higher recall than a single PI threshold alone.
 **Usage:**
-For a single text input, tokenize and split into overlapping chunks of ≤512 tokens (overlap=200, stride=312), run them in a batch, and take the **maximum logit across chunks** per head before applying sigmoid. Apply the tiered rule to the resulting PI and toxic probabilities.
+For a single text input, tokenize and split into overlapping chunks of ≤512 tokens (overlap=100, stride=412), run them in a batch, and take the **maximum logit across chunks** per head before applying sigmoid. Apply the tiered rule to the resulting PI and toxic probabilities.
 ---
@@ -37,8 +37,8 @@ flag = (pi >= pi_thresh) OR (pi >= pi_lower_bound AND toxic >= toxic_thresh)
 | Dataset | Recall | FPR |
 |:--------|-------:|----:|
-| test (262K) | 43.59% | 0.127% |
-| customer_test (1.4M) | 42.42% | 0.388% |
+| test (262K) | 43.70% | 0.136% |
+| customer_test (1.4M) | 44.28% | 0.612% |
 ### Thresholds at 0.5% FPR
@@ -50,8 +50,8 @@ flag = (pi >= pi_thresh) OR (pi >= pi_lower_bound AND toxic >= toxic_thresh)
 | Dataset | Recall | FPR |
 |:--------|-------:|----:|
-| test (262K) | 70.56% | 0.625% |
-| customer_test (1.4M) | 73.34% | 2.691% |
+| test (262K) | 70.32% | 0.608% |
+| customer_test (1.4M) | 70.11% | 2.474% |
 ### Thresholds at 1% FPR
@@ -63,8 +63,8 @@ flag = (pi >= pi_thresh) OR (pi >= pi_lower_bound AND toxic >= toxic_thresh)
 | Dataset | Recall | FPR |
 |:--------|-------:|----:|
-| test (262K) | 75.26% | 1.008% |
-| customer_test (1.4M) | 76.30% | 3.271% |
+| test (262K) | 75.00% | 0.992% |
+| customer_test (1.4M) | 73.10% | 2.550% |
 ### Thresholds for POV
@@ -76,8 +76,8 @@ flag = (pi >= pi_thresh) OR (pi >= pi_lower_bound AND toxic >= toxic_thresh)
 | Dataset | Recall | FPR |
 |:--------|-------:|----:|
-| test (262K) | 97.34% | 9.336% |
-| customer_test (1.4M) | 94.82% | 6.338% |
+| test (262K) | 97.33% | 9.281% |
+| customer_test (1.4M) | 94.83% | 6.268% |
 ---
@@ -85,9 +85,9 @@ flag = (pi >= pi_thresh) OR (pi >= pi_lower_bound AND toxic >= toxic_thresh)
 | Test FPR | pi-mmbert-v2 Recall | pi-mmbert-v3.5 Recall | Δ |
 |:---------|--------------------:|----------------------:|--:|
-| 0.127% | 35.31% | **43.59%** | +8.28pp |
-| 0.625% | 60.46% | **70.56%** | +10.10pp |
-| 1.008% | 67.13% | **75.26%** | +8.13pp |
+| 0.136% | 35.31% | **43.70%** | +8.39pp |
+| 0.608% | 60.46% | **70.32%** | +9.86pp |
+| 0.992% | 67.13% | **75.00%** | +7.87pp |
 > v2 is a single PI-head model; v3.5 adds a toxic head with tiered detection. v2 thresholds calibrated on test benign to match v3.5 FPR exactly.
@@ -127,8 +127,8 @@ tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
 # --- Inference parameters ---
 max_length = 512
-chunk_overlap = 200
-stride = max_length - chunk_overlap  # 312
+chunk_overlap = 100
+stride = max_length - chunk_overlap  # 412
 # --- Tiered thresholds (0.1% FPR — PI-only, no tier rescue) ---
 # pi_thresh = 0.996
@@ -142,7 +142,7 @@ toxic_thresh = 0.98
 # pi_thresh = 0.982
 # pi_lower_bound = 0.5
 # toxic_thresh = 0.90
-# --- Thresholds for POV (test: recall=97.34%, FPR=9.336%) ---
+# --- Thresholds for POV (test: recall=97.33%, FPR=9.281%) ---
 # pi_thresh = 0.29
 # pi_lower_bound = 0.10
 # toxic_thresh = 0.89
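The chunking described in the usage note can be sketched as a small helper that computes token windows. This is a minimal sketch under two assumptions. First, the window length counts raw token positions, so the helper does not reserve room for special tokens such as `[CLS]`/`[SEP]`; the actual inference code may handle these differently. Second, `chunk_spans` is a hypothetical helper name and is not part of the model card's code. With the updated parameters (`max_length = 512`, `chunk_overlap = 100`), consecutive windows start `stride = 412` tokens apart, so neighbouring windows share 100 tokens.

```python
from typing import List, Tuple


def chunk_spans(num_tokens: int, max_length: int = 512,
                chunk_overlap: int = 100) -> List[Tuple[int, int]]:
    """Return (start, end) token spans of overlapping windows (hypothetical helper).

    Window starts are `stride = max_length - chunk_overlap` tokens apart, so
    each pair of neighbouring windows shares `chunk_overlap` tokens. The last
    window may be shorter than `max_length`.
    """
    if chunk_overlap >= max_length:
        raise ValueError("chunk_overlap must be smaller than max_length")
    if num_tokens <= 0:
        return []
    stride = max_length - chunk_overlap
    spans = []
    start = 0
    while True:
        end = min(start + max_length, num_tokens)
        spans.append((start, end))
        if end >= num_tokens:  # this window reaches the end of the input
            break
        start += stride
    return spans


if __name__ == "__main__":
    print(chunk_spans(1000))  # prints [(0, 512), (412, 924), (824, 1000)]
```

A 1,000-token input yields three windows. The last of these is shorter than `max_length`.

In this sketch, as in the model card, `stride` is the step between window starts. The `stride` argument of a Hugging Face tokenizer call with `return_overflowing_tokens=True` means something different: it sets the number of overlapping tokens. Reproducing this chunking through the tokenizer would therefore pass the overlap value, not 412.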
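The per-head aggregation and the tiered rule `flag = (pi >= pi_thresh) OR (pi >= pi_lower_bound AND toxic >= toxic_thresh)` can be sketched in plain Python. This minimal sketch assumes that per-chunk logits arrive as rows of `[prompt_injection, toxic]` (indices 0 and 1, as the model card states) from a batched forward pass. The names `aggregate_chunks` and `tiered_flag` are hypothetical. Sigmoid is monotonic, so taking the max logit and then applying sigmoid gives the same probability as taking the max of the per-chunk probabilities.

```python
import math
from typing import Sequence, Tuple

# Output indices as documented: prompt_injection = 0, toxic = 1.
PI_INDEX, TOXIC_INDEX = 0, 1


def sigmoid(x: float) -> float:
    """Numerically stable logistic function (avoids overflow for large |x|)."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def aggregate_chunks(chunk_logits: Sequence[Sequence[float]]) -> Tuple[float, float]:
    """Take the max logit per head across chunks, then apply sigmoid.

    Returns (pi_probability, toxic_probability) for the whole input text.
    """
    if not chunk_logits:
        raise ValueError("expected logits for at least one chunk")
    max_pi = max(row[PI_INDEX] for row in chunk_logits)
    max_toxic = max(row[TOXIC_INDEX] for row in chunk_logits)
    return sigmoid(max_pi), sigmoid(max_toxic)


def tiered_flag(pi: float, toxic: float, pi_thresh: float,
                pi_lower_bound: float, toxic_thresh: float) -> bool:
    """Flag on PI alone, or on moderate PI backed by a high toxic score."""
    return pi >= pi_thresh or (pi >= pi_lower_bound and toxic >= toxic_thresh)


if __name__ == "__main__":
    # Hypothetical logits for a three-chunk input.
    logits = [[-3.0, 0.5], [0.0, 2.5], [-1.0, 3.0]]
    pi, toxic = aggregate_chunks(logits)
    # POV thresholds from the model card.
    print(tiered_flag(pi, toxic, pi_thresh=0.29, pi_lower_bound=0.10, toxic_thresh=0.89))  # prints True
```

In the example, the input is flagged on the PI threshold alone (`pi = 0.5 >= 0.29`). If `pi` falls between `pi_lower_bound` and `pi_thresh`, the text is flagged only when its toxic probability also clears `toxic_thresh`.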
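In the v2 vs v3.5 comparison table, Δ is the v3.5 recall minus the v2 recall at the matched test FPR, in percentage points. A quick arithmetic check of the updated rows, with values copied from that table:

```python
# Updated comparison rows: (test FPR %, v2 recall %, v3.5 recall %).
rows = [
    (0.136, 35.31, 43.70),
    (0.608, 60.46, 70.32),
    (0.992, 67.13, 75.00),
]

# Δ in percentage points, rounded to the table's two decimals.
deltas_pp = [round(v35 - v2, 2) for _, v2, v35 in rows]
print(deltas_pp)  # prints [8.39, 9.86, 7.87]
```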