robustintelligence/pi-mmbert-v3.5

@@ -44,14 +44,14 @@ flag = (pi >= pi_thresh) OR (pi >= pi_lower_bound AND toxic >= toxic_thresh)
 | Parameter | Value |
 |:----------|------:|
-| `pi_thresh` | 0.990 |
+| `pi_thresh` | 0.984 |
 | `pi_lower_bound` | 0.50 |
-| `toxic_thresh` | 0.95 |
+| `toxic_thresh` | 0.98 |
 | Dataset | Recall | FPR |
 |:--------|-------:|----:|
-| test (262K) | 69.83% | 0.644% |
-| customer_test (1.4M) | 69.83% | 2.977% |
+| test (262K) | 70.56% | 0.625% |
+| customer_test (1.4M) | 73.34% | 2.691% |
 ### Thresholds at 1% FPR
@@ -66,6 +66,19 @@ flag = (pi >= pi_thresh) OR (pi >= pi_lower_bound AND toxic >= toxic_thresh)
 | test (262K) | 75.26% | 1.008% |
 | customer_test (1.4M) | 76.30% | 3.271% |
+### Thresholds for POV
+| Parameter | Value |
+|:----------|------:|
+| `pi_thresh` | 0.29 |
+| `pi_lower_bound` | 0.10 |
+| `toxic_thresh` | 0.89 |
+| Dataset | Recall | FPR |
+|:--------|-------:|----:|
+| test (262K) | 97.34% | 9.336% |
+| customer_test (1.4M) | 94.82% | 6.338% |
 ---
 ## Comparison vs pi-mmbert-v2
@@ -73,7 +86,7 @@ flag = (pi >= pi_thresh) OR (pi >= pi_lower_bound AND toxic >= toxic_thresh)
 | Test FPR | pi-mmbert-v2 Recall | pi-mmbert-v3.5 Recall | Δ |
 |:---------|--------------------:|----------------------:|--:|
 | 0.127% | 35.31% | **43.59%** | +8.28pp |
-| 0.644% | 60.46% | **69.83%** | +9.37pp |
+| 0.625% | 60.46% | **70.56%** | +10.10pp |
 | 1.008% | 67.13% | **75.26%** | +8.13pp |
 > v2 is a single PI-head model; v3.5 adds a toxic head with tiered detection. v2 thresholds calibrated on test benign to match v3.5 FPR exactly.
@@ -122,9 +135,9 @@ stride = max_length - chunk_overlap  # 312
 # pi_lower_bound = 0.5
 # toxic_thresh = 1.00  # effectively disabled
 # --- Tiered thresholds (0.5% FPR) ---
-pi_thresh = 0.990
+pi_thresh = 0.984
 pi_lower_bound = 0.5
-toxic_thresh = 0.95
+toxic_thresh = 0.98
 # --- Tiered thresholds (1% FPR) ---
 # pi_thresh = 0.982
 # pi_lower_bound = 0.5
@@ -209,3 +222,9 @@ print(f"PI probability:    {pi_prob:.4f}")
 print(f"Toxic probability: {toxic_prob:.4f}")
 print(f"Prompt injection detected? {'FLAG' if is_flagged else 'ALLOW'}")
 ```
+---
+## Author
+**Karthick** — [karthkal@cisco.com](mailto:karthkal@cisco.com)
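The tiered rule quoted in the hunk headers, `flag = (pi >= pi_thresh) OR (pi >= pi_lower_bound AND toxic >= toxic_thresh)`, can be checked against the updated presets with a minimal sketch like the one below. It assumes `pi` and `toxic` are the per-input probabilities from the PI and toxic heads. The names `TieredThresholds`, `PRESETS`, and `tiered_flag` are hypothetical and are not part of the model card's code. The 1% FPR preset is omitted because its `toxic_thresh` is not visible in this diff.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TieredThresholds:
    """One threshold preset for the two-head (PI + toxic) decision rule."""
    pi_thresh: float
    pi_lower_bound: float
    toxic_thresh: float


# Values copied from the updated tables; the dict and its keys are hypothetical names.
PRESETS = {
    "fpr_0.5": TieredThresholds(pi_thresh=0.984, pi_lower_bound=0.50, toxic_thresh=0.98),
    "pov": TieredThresholds(pi_thresh=0.29, pi_lower_bound=0.10, toxic_thresh=0.89),
}


def tiered_flag(pi: float, toxic: float, t: TieredThresholds) -> bool:
    """flag = (pi >= pi_thresh) OR (pi >= pi_lower_bound AND toxic >= toxic_thresh)."""
    # Tier 1: a high PI score flags on its own.
    if pi >= t.pi_thresh:
        return True
    # Tier 2: a moderate PI score flags only when the toxic head also fires.
    return pi >= t.pi_lower_bound and toxic >= t.toxic_thresh


if __name__ == "__main__":
    strict = PRESETS["fpr_0.5"]
    for pi_prob, toxic_prob in [(0.99, 0.00), (0.60, 0.99), (0.60, 0.50), (0.40, 0.99)]:
        verdict = "FLAG" if tiered_flag(pi_prob, toxic_prob, strict) else "ALLOW"
        print(f"pi={pi_prob:.2f} toxic={toxic_prob:.2f} -> {verdict}")
```

The POV preset lowers all three thresholds, which is consistent with its higher recall and higher FPR in the tables: a pair such as `pi=0.40, toxic=0.99` is allowed under the 0.5% FPR preset (0.40 is below `pi_lower_bound = 0.50`) but flagged under POV (0.40 clears `pi_thresh = 0.29`).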