Correct Key Results table + add FPR explanation
README.md
CHANGED
|
@@ -43,11 +43,17 @@ A 184M-parameter DeBERTa-v3-base guardrail classifier trained on 57,000+ real-wo
|
|
| 43 |
- B-10 Regulatory Enquiry Mishandling — EU AI Act Art. 52
|
| 44 |
- B-11 AML/Sanctions Evasion — FATF Recommendation 10
|
| 45 |
|
| 46 |
-
|
| 47 |
-
|
| 48 |
-
|
| 49 |
-
|
| 50 |
-
|
| 51 |
|
| 52 |
---
|
| 53 |
|
|
@@ -75,18 +81,6 @@ Generated using **Quantum Circuit Born Machine (QCBM)** sampling on PennyLane
|
|
| 75 |
|
| 76 |
1,273 adversarial prompts generated via QCBM + simulated annealing targeting Semalith's decision boundary. Covers professional and retail registers. Overall Semalith miss rate: 14.3%.
|
| 77 |
|
| 78 |
-
Techniques: SA Annealing (344), QCBM 8q boundary (173), QCBM 8q gradient (125), 10q Paraphrase Fix1 (123), QCBM 8q joint (100), retail customer mobile (157), RM internal (105), PG-miss professional (84), PG adversarial B-03 (3).
|
| 79 |
-
|
| 80 |
-
---
|
| 81 |
-
|
| 82 |
-
## Key Results
|
| 83 |
-
|
| 84 |
-
| Model | Size | HackaPrompt R | AgentHarm FPR | WildGuardMix F1 | Latency |
|
| 85 |
-
|---|---|---|---|---|---|
|
| 86 |
-
| **Semalith v1.5** | **184M** | **0.994** | **0.000** | **0.62** | **11.6ms** |
|
| 87 |
-
| LlamaGuard-3-8B | 8B | 0.941 | 0.063 | 0.58 | ~180ms |
|
| 88 |
-
| PromptGuard-86M | 86M | 0.981 | 0.126 | 0.41 | 8ms |
|
| 89 |
-
|
| 90 |
---
|
| 91 |
|
| 92 |
## Research
|
|
|
|
| 43 |
- B-10 Regulatory Enquiry Mishandling — EU AI Act Art. 52
|
| 44 |
- B-11 AML/Sanctions Evasion — FATF Recommendation 10
|
| 45 |
|
| 46 |
+
---
|
| 47 |
+
|
| 48 |
+
## Key Results
|
| 49 |
+
|
| 50 |
+
| Model | Size | HackaPrompt R | AgentHarm FPR | WildGuardMix F1 | Latency |
|
| 51 |
+
|---|---|---|---|---|---|
|
| 52 |
+
| **Semalith v1.5** | **184M** | **0.994** | **0.5%** | **0.303** | **11.6ms** |
|
| 53 |
+
| PromptGuard-86M | 86M | 1.000 | 96.9% | 0.095 | 8ms |
|
| 54 |
+
| LlamaGuard-3-8B | 8B | — | — | — | ~180ms |
|
| 55 |
+
|
| 56 |
+
**FPR (False Positive Rate)** is the fraction of legitimate, benign requests that a model incorrectly flags as attacks. A model with a high FPR cannot be deployed in production because it blocks real customers. PromptGuard's 96.9% FPR means it flags nearly every legitimate agentic task as an injection attack, so on this benchmark it behaves more like a syntax detector than a safety classifier. Semalith's 0.5% FPR (1 false alarm in 208 agentic tasks) reflects domain-specific training that distinguishes attack intent from legitimate BFSI queries.
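
As a rough illustration of how the Semalith figure follows from the counts quoted above, the sketch below computes FPR as FP / (FP + TN). It assumes all 208 agentic tasks are benign, so that 1 false alarm leaves 207 true negatives. The function name `false_positive_rate` is hypothetical and not part of any Semalith tooling.

```python
def false_positive_rate(false_positives: int, true_negatives: int) -> float:
    """Return FP / (FP + TN), measured over benign inputs only."""
    benign_total = false_positives + true_negatives
    if benign_total == 0:
        raise ValueError("FPR is undefined when there are no benign examples")
    return false_positives / benign_total


# Assumption: all 208 agentic tasks are benign, with exactly 1 flagged in error.
semalith_fpr = false_positive_rate(false_positives=1, true_negatives=207)
print(f"{semalith_fpr:.1%}")  # prints 0.5%
```

The unrounded value is about 0.48%, which the table reports as 0.5%.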
|
| 57 |
|
| 58 |
---
|
| 59 |
|
|
|
|
| 81 |
|
| 82 |
1,273 adversarial prompts generated via QCBM + simulated annealing targeting Semalith's decision boundary. Covers professional and retail registers. Overall Semalith miss rate: 14.3%.
|
| 83 |
|
| 84 |
---
|
| 85 |
|
| 86 |
## Research
|