Commit 57a1249 by Karthkal (verified) · 1 parent: c2a0384

Upload README.md with huggingface_hub

Files changed (1)
  1. README.md +15 -15
README.md CHANGED
```diff
@@ -15,7 +15,7 @@ Fine-tuned from [**jhu-clsp/mmBERT-base**](https://huggingface.co/jhu-clsp/mmBER
  This model outputs two scores: `prompt_injection` (index 0) and `toxic` (index 1). A **tiered detection strategy** combines both heads to achieve higher recall than a single PI threshold alone.

  **Usage:**
- For a single text input, tokenize and split into overlapping chunks of ≤512 tokens (overlap=200, stride=312), run them in a batch, and take the **maximum logit across chunks** per head before applying sigmoid. Apply the tiered rule to the resulting PI and toxic probabilities.
+ For a single text input, tokenize and split into overlapping chunks of ≤512 tokens (overlap=100, stride=412), run them in a batch, and take the **maximum logit across chunks** per head before applying sigmoid. Apply the tiered rule to the resulting PI and toxic probabilities.

  ---

@@ -37,8 +37,8 @@ flag = (pi >= pi_thresh) OR (pi >= pi_lower_bound AND toxic >= toxic_thresh)

  | Dataset | Recall | FPR |
  |:--------|-------:|----:|
- | test (262K) | 43.59% | 0.127% |
- | customer_test (1.4M) | 42.42% | 0.388% |
+ | test (262K) | 43.70% | 0.136% |
+ | customer_test (1.4M) | 44.28% | 0.612% |

  ### Thresholds at 0.5% FPR

@@ -50,8 +50,8 @@ flag = (pi >= pi_thresh) OR (pi >= pi_lower_bound AND toxic >= toxic_thresh)

  | Dataset | Recall | FPR |
  |:--------|-------:|----:|
- | test (262K) | 70.56% | 0.625% |
- | customer_test (1.4M) | 73.34% | 2.691% |
+ | test (262K) | 70.32% | 0.608% |
+ | customer_test (1.4M) | 70.11% | 2.474% |

  ### Thresholds at 1% FPR

@@ -63,8 +63,8 @@ flag = (pi >= pi_thresh) OR (pi >= pi_lower_bound AND toxic >= toxic_thresh)

  | Dataset | Recall | FPR |
  |:--------|-------:|----:|
- | test (262K) | 75.26% | 1.008% |
- | customer_test (1.4M) | 76.30% | 3.271% |
+ | test (262K) | 75.00% | 0.992% |
+ | customer_test (1.4M) | 73.10% | 2.550% |

  ### Thresholds for POV

@@ -76,8 +76,8 @@ flag = (pi >= pi_thresh) OR (pi >= pi_lower_bound AND toxic >= toxic_thresh)

  | Dataset | Recall | FPR |
  |:--------|-------:|----:|
- | test (262K) | 97.34% | 9.336% |
- | customer_test (1.4M) | 94.82% | 6.338% |
+ | test (262K) | 97.33% | 9.281% |
+ | customer_test (1.4M) | 94.83% | 6.268% |

  ---

@@ -85,9 +85,9 @@ flag = (pi >= pi_thresh) OR (pi >= pi_lower_bound AND toxic >= toxic_thresh)

  | Test FPR | pi-mmbert-v2 Recall | pi-mmbert-v3.5 Recall | Δ |
  |:---------|--------------------:|----------------------:|--:|
- | 0.127% | 35.31% | **43.59%** | +8.28pp |
- | 0.625% | 60.46% | **70.56%** | +10.10pp |
- | 1.008% | 67.13% | **75.26%** | +8.13pp |
+ | 0.136% | 35.31% | **43.70%** | +8.39pp |
+ | 0.608% | 60.46% | **70.32%** | +9.86pp |
+ | 0.992% | 67.13% | **75.00%** | +7.87pp |

  > v2 is a single PI-head model; v3.5 adds a toxic head with tiered detection. v2 thresholds calibrated on test benign to match v3.5 FPR exactly.

@@ -127,8 +127,8 @@ tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)

  # --- Inference parameters ---
  max_length = 512
- chunk_overlap = 200
- stride = max_length - chunk_overlap # 312
+ chunk_overlap = 100
+ stride = max_length - chunk_overlap # 412

  # --- Tiered thresholds (0.1% FPR — PI-only, no tier rescue) ---
  # pi_thresh = 0.996
@@ -142,7 +142,7 @@ toxic_thresh = 0.98
  # pi_thresh = 0.982
  # pi_lower_bound = 0.5
  # toxic_thresh = 0.90
- # --- Thresholds for POV (test: recall=97.34%, FPR=9.336%) ---
+ # --- Thresholds for POV (test: recall=97.33%, FPR=9.281%) ---
  # pi_thresh = 0.29
  # pi_lower_bound = 0.10
  # toxic_thresh = 0.89
```
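The usage line and the inference parameters in this commit describe sliding-window chunking: windows of at most `max_length` tokens whose start positions advance by `stride = max_length - chunk_overlap` (412 after this commit, 312 before it). Below is a minimal sketch in plain Python. It assumes the input is already a list of token ids and that `max_length` counts only content tokens; if the tokenizer adds special tokens to each window, the content budget per window is smaller. `chunk_token_ids` is a hypothetical helper for illustration, not the model card's own code. Be aware that the Hugging Face tokenizer's `stride` argument (used with `return_overflowing_tokens=True`) counts the number of *overlapping* tokens, not the step size, so the README's meaning of "stride" should not be passed to it directly.

```python
from typing import List


def chunk_token_ids(token_ids: List[int], max_length: int = 512,
                    chunk_overlap: int = 100) -> List[List[int]]:
    """Split token ids into windows of at most max_length tokens (hypothetical helper).

    Window starts advance by stride = max_length - chunk_overlap, so each window
    repeats the last chunk_overlap tokens of the previous one.
    """
    if not 0 <= chunk_overlap < max_length:
        raise ValueError("chunk_overlap must satisfy 0 <= chunk_overlap < max_length")
    stride = max_length - chunk_overlap  # 412 with this commit's values
    chunks = []
    start = 0
    while True:
        chunks.append(token_ids[start:start + max_length])
        if start + max_length >= len(token_ids):
            break  # this window already reaches the end of the input
        start += stride
    return chunks


ids = list(range(1000))  # stand-in for real token ids
chunks = chunk_token_ids(ids)
print([len(c) for c in chunks])  # [512, 512, 176]
print([c[0] for c in chunks])    # [0, 412, 824]

long_ids = list(range(2000))
print(len(chunk_token_ids(long_ids)), len(chunk_token_ids(long_ids, chunk_overlap=200)))  # 5 6
```

For a 2,000-token input, the new overlap of 100 produces 5 windows, versus 6 with the previous overlap of 200, so each input contributes fewer chunks to the batch.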
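The usage line also says to take the maximum logit per head across chunks, apply a sigmoid, and then apply the tiered rule quoted in the hunk headers, `flag = (pi >= pi_thresh) OR (pi >= pi_lower_bound AND toxic >= toxic_thresh)`. The following is a minimal sketch on plain lists of per-chunk logits. It assumes head 0 is `prompt_injection` and head 1 is `toxic`, as the card states. The function names are hypothetical, and `POV_THRESHOLDS` copies the commented-out POV preset from the diff. Because the sigmoid is monotonically increasing, taking the max before or after it selects the same chunk; applying it once after the max simply avoids extra work.

```python
import math
from typing import Sequence, Tuple

PI_HEAD, TOXIC_HEAD = 0, 1  # head order stated in the model card

# Values copied from the commented "Thresholds for POV" preset in this commit's diff.
POV_THRESHOLDS = {"pi_thresh": 0.29, "pi_lower_bound": 0.10, "toxic_thresh": 0.89}


def stable_sigmoid(x: float) -> float:
    """Logistic sigmoid that does not overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def aggregate_chunks(chunk_logits: Sequence[Sequence[float]]) -> Tuple[float, float]:
    """Max logit per head across chunks, then sigmoid -> (pi, toxic) probabilities."""
    if not chunk_logits:
        raise ValueError("need logits for at least one chunk")
    pi_logit = max(row[PI_HEAD] for row in chunk_logits)
    toxic_logit = max(row[TOXIC_HEAD] for row in chunk_logits)
    return stable_sigmoid(pi_logit), stable_sigmoid(toxic_logit)


def tiered_flag(pi: float, toxic: float, pi_thresh: float,
                pi_lower_bound: float, toxic_thresh: float) -> bool:
    """Flag on a confident PI score, or on a moderate PI score backed by a high toxic score."""
    return pi >= pi_thresh or (pi >= pi_lower_bound and toxic >= toxic_thresh)


# Hypothetical logits for two chunks of one input: [prompt_injection, toxic]
pi, toxic = aggregate_chunks([[-2.0, 3.0], [-1.5, 0.5]])
print(round(pi, 3), round(toxic, 3))             # 0.182 0.953
print(tiered_flag(pi, toxic, **POV_THRESHOLDS))  # True
```

Here the PI probability (about 0.18) is below the POV `pi_thresh` of 0.29 but above `pi_lower_bound`, and the toxic probability (about 0.95) clears `toxic_thresh`, so the input is flagged through the toxic tier. Under the stricter preset shown in the diff (`pi_thresh = 0.982`, `pi_lower_bound = 0.5`, `toxic_thresh = 0.90`), the same input would not be flagged.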