Expand all 9 PI attack types + 11 BFSI categories in coverage
README.md
CHANGED
|
@@ -11,6 +11,7 @@ pinned: true
|
|
| 11 |
|
| 12 |
**Zytra** builds domain-specific AI safety infrastructure for banking, financial services, and insurance (BFSI). We publish open models, benchmarks, and evaluation tooling purpose-built for regulated financial environments.
|
| 13 |
|
|
|
|
| 14 |
|
| 15 |
## Models
|
| 16 |
|
|
@@ -19,9 +20,28 @@ pinned: true
|
|
| 19 |
A 184M-parameter DeBERTa-v3-base guardrail classifier trained on 57,000+ real-world prompts.
|
| 20 |
|
| 21 |
**Coverage:**
|
| 22 |
-
- 9 prompt-injection attack types
|
| 23 |
-
-
|
| 24 |
-
-
| 25 |
|
| 26 |
**Results vs LlamaGuard-3-8B across 22 benchmarks:**
|
| 27 |
- Wins all 7 prompt-injection benchmarks
|
|
@@ -29,13 +49,13 @@ A 184M-parameter DeBERTa-v3-base guardrail classifier trained on 57,000+ real-wo
|
|
| 29 |
- 11.6ms inference latency – 44× fewer parameters
|
| 30 |
- Deployable as always-on inline guardrail without GPU infrastructure
|
| 31 |
|
| 32 |
-
|
| 33 |
|
| 34 |
## Benchmarks
|
| 35 |
|
| 36 |
-
### FinProof v1 – BFSI Adversarial Benchmark
|
| 37 |
|
| 38 |
-
5,389-prompt adversarial benchmark covering 7 attack categories across three deployment registers:
|
| 39 |
|
| 40 |
| Register | Description | Prompts |
|
| 41 |
|---|---|---|
|
|
@@ -47,36 +67,38 @@ Generated using **Quantum Circuit Born Machine (QCBM)** sampling on PennyLane
|
|
| 47 |
|
| 48 |
| Tier | Prompts | Access |
|
| 49 |
|---|---|---|
|
| 50 |
-
| Easy attacks | 1,606 |
|
| 51 |
-
| Medium attacks (QCBM-generated) | 2,036 | Research agreement |
|
| 52 |
| Hard attacks – official test set | 1,747 | Zytra-evaluated only |
|
| 53 |
|
| 54 |
### ASSAY-QI v2.0 – Quantum-Augmented Attack Suite
|
| 55 |
|
| 56 |
-
1,273 adversarial prompts via QCBM + simulated annealing.
|
| 57 |
|
|
|
|
| 58 |
|
|
|
|
| 59 |
|
| 60 |
## Key Results
|
| 61 |
|
| 62 |
-| Model | Size | HackaPrompt R | AgentHarm FPR | Latency |
-|---|---|---|---|---|
-| **Semalith v1.5** | **184M** | **0.994** | **0.000** | **11.6ms** |
-| LlamaGuard-3-8B | 8B | 0.941 | 0.063 | ~180ms |
-| PromptGuard-86M | 86M | 0.981 | 0.126 | 8ms |
|
| 67 |
-
|
| 68 |
|
|
|
|
| 69 |
|
| 70 |
## Research
|
| 71 |
|
| 72 |
-
- **Paper**: *Semalith: A Regulatory-Aware Safety Classifier for AI-Assisted Financial Services*
|
| 73 |
-
- **QCBM augmentation**: Quantum-inspired distribution sampling for adversarial test case generation
|
| 74 |
-
- **FinProof framework**: PINT-inspired four-tier release
|
| 75 |
-
|
| 76 |
|
|
|
|
| 77 |
|
| 78 |
## Contact
|
| 79 |
|
| 80 |
- [zytratechnologies.com](http://zytratechnologies.com)
|
| 81 |
- India · BFSI-focused AI safety
|
| 82 |
-
- For benchmark access and enterprise licensing: reach out via the organisation page
|
|
|
|
| 11 |
|
| 12 |
**Zytra** builds domain-specific AI safety infrastructure for banking, financial services, and insurance (BFSI). We publish open models, benchmarks, and evaluation tooling purpose-built for regulated financial environments.
|
| 13 |
|
| 14 |
+
---
|
| 15 |
|
| 16 |
## Models
|
| 17 |
|
|
|
|
| 20 |
A 184M-parameter DeBERTa-v3-base guardrail classifier trained on 57,000+ real-world prompts.
|
| 21 |
|
| 22 |
**Coverage:**
|
| 23 |
+- **9 prompt-injection attack types:**
+  - System Override (D1) – direct instruction hijack, role reassignment, prompt delimiter attacks
+  - Extraction (D1) – password/secret extraction, system prompt leakage, context exfiltration
+  - Jailbreak (D1) – DAN, developer mode, policy bypass via persona
+  - Narrative Frame (D1) – roleplay, fiction, hypothetical framing to bypass refusals
+  - Authority Claim (D1) – impersonating admins, developers, or system roles to elevate privilege
+  - Social Engineering (D1) – pretext, urgency, emotional manipulation to lower guardrails
+  - Evasion (D5) – obfuscation, encoding, typo injection, token splitting to evade detection
+  - Agentic Injection (D6) – tool-call hijacking, memory poisoning, multi-agent prompt injection
+  - Indirect Injection (D7) – attacks embedded in retrieved documents, emails, or web content
+- **11 BFSI compliance categories:**
+  - B-01 Investment Advice Elicitation – SEBI IA Regulations 2013 §3
+  - B-02 KYC/AML Bypass – RBI Master Directions KYC
+  - B-03 Regulatory Misrepresentation – SEBI FPI Regulations + RBI circulars
+  - B-04 Regulatory Document Hallucination – EU AI Act Art. 9(4)
+  - B-05 Consent & Data Rights Violations – DPDP Act 2023
+  - B-06 Transaction Integrity Violations – RBI NACH/NEFT Frameworks
+  - B-07 Account/Document Authenticity Bypass – RBI Digital Banking Security
+  - B-08 Fraud & Scam Facilitation – FCA SYSC 6.1
+  - B-09 Unlicensed Financial Advice – SEC IA Act §202(a)(11)
+  - B-10 Regulatory Enquiry Mishandling – EU AI Act Art. 52
+  - B-11 AML/Sanctions Evasion – FATF Recommendation 10
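One way to make the BFSI taxonomy above machine-readable is a small lookup table keyed by category code. The sketch below is illustrative only: the `BFSI_CATEGORIES` name and the `describe` helper are hypothetical and not part of any published Semalith interface. The codes, names, and regulatory references are copied from the list above.

```python
# Hypothetical lookup table for the 11 BFSI compliance categories listed above.
# The structure and helper are illustrative, not a published Semalith interface.
BFSI_CATEGORIES = {
    "B-01": ("Investment Advice Elicitation", "SEBI IA Regulations 2013 §3"),
    "B-02": ("KYC/AML Bypass", "RBI Master Directions KYC"),
    "B-03": ("Regulatory Misrepresentation", "SEBI FPI Regulations + RBI circulars"),
    "B-04": ("Regulatory Document Hallucination", "EU AI Act Art. 9(4)"),
    "B-05": ("Consent & Data Rights Violations", "DPDP Act 2023"),
    "B-06": ("Transaction Integrity Violations", "RBI NACH/NEFT Frameworks"),
    "B-07": ("Account/Document Authenticity Bypass", "RBI Digital Banking Security"),
    "B-08": ("Fraud & Scam Facilitation", "FCA SYSC 6.1"),
    "B-09": ("Unlicensed Financial Advice", "SEC IA Act §202(a)(11)"),
    "B-10": ("Regulatory Enquiry Mishandling", "EU AI Act Art. 52"),
    "B-11": ("AML/Sanctions Evasion", "FATF Recommendation 10"),
}


def describe(code: str) -> str:
    """Return 'code name (reference)' for a category code; raises KeyError if unknown."""
    name, reference = BFSI_CATEGORIES[code]
    return f"{code} {name} ({reference})"


print(describe("B-05"))  # B-05 Consent & Data Rights Violations (DPDP Act 2023)
```

A mapping like this keeps the regulatory reference next to each label, which can be handy when routing a flagged prompt to the relevant compliance team.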
|
| 45 |
|
| 46 |
**Results vs LlamaGuard-3-8B across 22 benchmarks:**
|
| 47 |
- Wins all 7 prompt-injection benchmarks
|
|
|
|
| 49 |
- 11.6ms inference latency – 44× fewer parameters
|
| 50 |
- Deployable as always-on inline guardrail without GPU infrastructure
|
| 51 |
|
| 52 |
+
---
|
| 53 |
|
| 54 |
## Benchmarks
|
| 55 |
|
| 56 |
+
### [FinProof v1](https://huggingface.co/datasets/Zytra/finproof-bench) – BFSI Adversarial Benchmark
|
| 57 |
|
| 58 |
+
5,389-prompt adversarial benchmark covering 7 attack categories (B-01 through B-07) across three deployment registers:
|
| 59 |
|
| 60 |
| Register | Description | Prompts |
|
| 61 |
|---|---|---|
|
|
|
|
| 67 |
|
| 68 |
| Tier | Prompts | Access |
|
| 69 |
|---|---|---|
|
| 70 |
+
| Easy attacks | 1,606 | [Public – no registration](https://huggingface.co/datasets/Zytra/finproof-bench) |
|
| 71 |
+
| Medium attacks (QCBM-generated) | 2,036 | [Research agreement](https://huggingface.co/datasets/Zytra/finproof-research) |
|
| 72 |
| Hard attacks – official test set | 1,747 | Zytra-evaluated only |
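As a quick consistency check, the three tier sizes in the table add up to the 5,389 prompts quoted for the benchmark. The snippet is plain arithmetic on the figures above; the variable names are illustrative.

```python
# Tier sizes copied from the FinProof v1 tier table above.
tiers = {"easy": 1606, "medium": 2036, "hard": 1747}

total = sum(tiers.values())
print(total)  # 5389

# Share of the benchmark contributed by each tier.
for name, count in tiers.items():
    print(f"{name}: {count / total:.1%}")
```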
|
| 73 |
|
| 74 |
### ASSAY-QI v2.0 – Quantum-Augmented Attack Suite
|
| 75 |
|
| 76 |
+
1,273 adversarial prompts generated via QCBM + simulated annealing targeting Semalith's decision boundary. Covers professional and retail registers. Overall Semalith miss rate: 14.3%.
|
| 77 |
|
| 78 |
+
Techniques: SA Annealing (344), QCBM 8q boundary (173), QCBM 8q gradient (125), 10q Paraphrase Fix1 (123), QCBM 8q joint (100), retail customer mobile (157), RM internal (105), PG-miss professional (84), PG adversarial B-03 (3).
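The per-technique counts can be tallied with a few lines of Python. Taking the figures exactly as listed, the nine techniques sum to 1,214 prompts, which leaves 59 of the 1,273 prompts not attributed to a technique in this list, and a 14.3% miss rate corresponds to roughly 182 missed prompts. The dictionary and variable names are illustrative.

```python
# Per-technique prompt counts copied from the "Techniques" line above.
techniques = {
    "SA Annealing": 344,
    "QCBM 8q boundary": 173,
    "QCBM 8q gradient": 125,
    "10q Paraphrase Fix1": 123,
    "QCBM 8q joint": 100,
    "retail customer mobile": 157,
    "RM internal": 105,
    "PG-miss professional": 84,
    "PG adversarial B-03": 3,
}

suite_size = 1273   # total prompts quoted for ASSAY-QI v2.0
miss_rate = 0.143   # overall Semalith miss rate quoted above

listed = sum(techniques.values())
print(listed)                         # 1214
print(suite_size - listed)            # 59
print(round(suite_size * miss_rate))  # 182
```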
|
| 79 |
|
| 80 |
+
---
|
| 81 |
|
| 82 |
## Key Results
|
| 83 |
|
| 84 |
+| Model | Size | HackaPrompt R | AgentHarm FPR | WildGuardMix F1 | Latency |
+|---|---|---|---|---|---|
+| **Semalith v1.5** | **184M** | **0.994** | **0.000** | **0.62** | **11.6ms** |
+| LlamaGuard-3-8B | 8B | 0.941 | 0.063 | 0.58 | ~180ms |
+| PromptGuard-86M | 86M | 0.981 | 0.126 | 0.41 | 8ms |
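For readers comparing the columns, the sketch below shows how recall, false-positive rate, and F1 are conventionally computed from confusion-matrix counts. It assumes that "R" in the HackaPrompt column denotes recall, which the table does not state explicitly. The function names and the example counts are hypothetical and are not Semalith results.

```python
def recall(tp: int, fn: int) -> float:
    """Fraction of actual attacks that were flagged: TP / (TP + FN)."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def false_positive_rate(fp: int, tn: int) -> float:
    """Fraction of benign prompts wrongly flagged: FP / (FP + TN)."""
    return fp / (fp + tn) if (fp + tn) else 0.0


def f1_score(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall: 2TP / (2TP + FP + FN)."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Hypothetical counts for illustration only (not taken from any benchmark).
tp, fp, tn, fn = 90, 5, 95, 10

print(recall(tp, fn))                  # 0.9
print(false_positive_rate(fp, tn))     # 0.05
print(round(f1_score(tp, fp, fn), 3))  # 0.923
```

A low false-positive rate matters for an always-on inline guardrail because every benign prompt that is wrongly blocked is a failed customer interaction.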
|
|
|
|
| 89 |
|
| 90 |
+
---
|
| 91 |
|
| 92 |
## Research
|
| 93 |
|
| 94 |
+- **Paper**: *Semalith: A Regulatory-Aware Safety Classifier for AI-Assisted Financial Services* – DeBERTa-v3 + BFSI taxonomy + 22-benchmark evaluation
+- **QCBM augmentation**: Quantum-inspired distribution sampling for adversarial test case generation in underrepresented BFSI attack categories
+- **FinProof framework**: PINT-inspired four-tier release – public taxonomy, email-gated easy examples, research-agreement medium examples, withheld hard test set
|
|
|
|
| 97 |
|
| 98 |
+
---
|
| 99 |
|
| 100 |
## Contact
|
| 101 |
|
| 102 |
- [zytratechnologies.com](http://zytratechnologies.com)
|
| 103 |
- India · BFSI-focused AI safety
|
| 104 |
+
- For benchmark access and Semalith enterprise licensing: reach out via the organisation page
|