SreeRamaKrishna committed
Commit bbcf60f · verified · 1 Parent(s): 4de6973

Correct Key Results table + add FPR explanation

Files changed (1)
  1. README.md +11 -17
README.md CHANGED
@@ -43,11 +43,17 @@ A 184M-parameter DeBERTa-v3-base guardrail classifier trained on 57,000+ real-wo
  - B-10 Regulatory Enquiry Mishandling — EU AI Act Art. 52
  - B-11 AML/Sanctions Evasion — FATF Recommendation 10
 
- **Results vs LlamaGuard-3-8B across 22 benchmarks:**
- - Wins all 7 prompt-injection benchmarks
- - 0% false positive rate on 208 agentic tasks (vs 6.3% for LlamaGuard-3-8B)
- - 11.6ms inference latency — 44× fewer parameters
- - Deployable as always-on inline guardrail without GPU infrastructure
+ ---
+
+ ## Key Results
+
+ | Model | Size | HackaPrompt R | AgentHarm FPR | WildGuardMix F1 | Latency |
+ |---|---|---|---|---|---|
+ | **Semalith v1.5** | **184M** | **0.994** | **0.5%** | **0.303** | **11.6ms** |
+ | PromptGuard-86M | 86M | 1.000 | 96.9% | 0.095 | 8ms |
+ | LlamaGuard-3-8B | 8B | — | — | — | ~180ms |
+
+ **FPR (False Positive Rate)** measures how often a model incorrectly flags a legitimate, benign request as an attack. A model with high FPR cannot be deployed in production — it blocks real customers. PromptGuard's 96.9% FPR means it flags nearly every legitimate agentic task as an injection attack, making it a syntax detector rather than a safety classifier. Semalith's 0.5% FPR (1 false alarm in 208 agentic tasks) reflects domain-specific training that distinguishes attack intent from legitimate BFSI queries.
 
  ---
 
@@ -75,18 +81,6 @@ Generated using **Quantum Circuit Born Machine (QCBM)** sampling on PennyLane
 
  1,273 adversarial prompts generated via QCBM + simulated annealing targeting Semalith's decision boundary. Covers professional and retail registers. Overall Semalith miss rate: 14.3%.
 
- Techniques: SA Annealing (344), QCBM 8q boundary (173), QCBM 8q gradient (125), 10q Paraphrase Fix1 (123), QCBM 8q joint (100), retail customer mobile (157), RM internal (105), PG-miss professional (84), PG adversarial B-03 (3).
-
- ---
-
- ## Key Results
-
- | Model | Size | HackaPrompt R | AgentHarm FPR | WildGuardMix F1 | Latency |
- |---|---|---|---|---|---|
- | **Semalith v1.5** | **184M** | **0.994** | **0.000** | **0.62** | **11.6ms** |
- | LlamaGuard-3-8B | 8B | 0.941 | 0.063 | 0.58 | ~180ms |
- | PromptGuard-86M | 86M | 0.981 | 0.126 | 0.41 | 8ms |
-
  ---
 
  ## Research
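
Reviewer note on the FPR figure: the paragraph added in this commit quotes Semalith's FPR as 0.5%, based on 1 false alarm in 208 agentic tasks. A minimal sketch of that arithmetic follows, assuming all 208 tasks are benign (so FPR = false positives / benign total). The helper name `false_positive_rate` and its parameters are illustrative only and are not part of the Semalith repository.

```python
def false_positive_rate(false_positives: int, benign_total: int) -> float:
    """Share of benign inputs that were wrongly flagged: FP / (FP + TN)."""
    if benign_total <= 0:
        raise ValueError("benign_total must be positive")
    if not 0 <= false_positives <= benign_total:
        raise ValueError("false_positives must lie between 0 and benign_total")
    return false_positives / benign_total


# Counts taken from the README paragraph: 1 false alarm in 208 agentic tasks.
semalith_fpr = false_positive_rate(false_positives=1, benign_total=208)
print(f"Semalith FPR: {semalith_fpr:.1%}")  # prints "Semalith FPR: 0.5%"
```

The unrounded value is about 0.48%, which rounds to the 0.5% shown in the revised table.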