SreeRamaKrishna committed on
Commit 4de6973 · verified · 1 Parent(s): 09b37fe

Expand all 9 PI attack types + 11 BFSI categories in coverage

Files changed (1)
  1. README.md +42 -20
README.md CHANGED
@@ -11,6 +11,7 @@ pinned: true

  **Zytra** builds domain-specific AI safety infrastructure for banking, financial services, and insurance (BFSI). We publish open models, benchmarks, and evaluation tooling purpose-built for regulated financial environments.

  ## Models

@@ -19,9 +20,28 @@ pinned: true
  A 184M-parameter DeBERTa-v3-base guardrail classifier trained on 57,000+ real-world prompts.

  **Coverage:**
- - 9 prompt-injection attack types (system override, extraction, jailbreak, indirect injection, social engineering…)
- - 11 BFSI compliance categories: investment advice, KYC/AML bypass, regulatory misrepresentation, document hallucination, consent & data rights, transaction integrity, account bypass, fraud, AML/sanctions, unlicensed advice, regulatory enquiry
- - Regulatory anchors: MiFID II, PSD2, FATF Recommendations, EU AI Act Art. 52, DPDP Act 2023, RBI Master Directions, SEBI IA Regulations

  **Results vs LlamaGuard-3-8B across 22 benchmarks:**
  - Wins all 7 prompt-injection benchmarks
@@ -29,13 +49,13 @@ A 184M-parameter DeBERTa-v3-base guardrail classifier trained on 57,000+ real-wo
  - 11.6ms inference latency - 44× fewer parameters
  - Deployable as always-on inline guardrail without GPU infrastructure

-

  ## Benchmarks

- ### FinProof v1 - BFSI Adversarial Benchmark *(coming soon)*

- 5,389-prompt adversarial benchmark covering 7 attack categories across three deployment registers:

  | Register | Description | Prompts |
  |---|---|---|
@@ -47,36 +67,38 @@ Generated using **Quantum Circuit Born Machine (QCBM)** sampling on PennyLane

  | Tier | Prompts | Access |
  |---|---|---|
- | Easy attacks | 1,606 | Email registration |
- | Medium attacks (QCBM-generated) | 2,036 | Research agreement |
  | Hard attacks - official test set | 1,747 | Zytra-evaluated only |

  ### ASSAY-QI v2.0 - Quantum-Augmented Attack Suite

- 1,273 adversarial prompts via QCBM + simulated annealing. Professional and retail registers. Semalith miss rate: 14.3%.

  ## Key Results

- | Model | Size | HackaPrompt R | AgentHarm FPR | Latency |
- |---|---|---|---|---|
- | **Semalith v1.5** | **184M** | **0.994** | **0.000** | **11.6ms** |
- | LlamaGuard-3-8B | 8B | 0.941 | 0.063 | ~180ms |
- | PromptGuard-86M | 86M | 0.981 | 0.126 | 8ms |
-

  ## Research

- - **Paper**: *Semalith: A Regulatory-Aware Safety Classifier for AI-Assisted Financial Services*
- - **QCBM augmentation**: Quantum-inspired distribution sampling for adversarial test case generation
- - **FinProof framework**: PINT-inspired four-tier release with withheld official test set
-

  ## Contact

  - 🌐 [zytratechnologies.com](http://zytratechnologies.com)
  - 🏢 India · BFSI-focused AI safety
- - 💬 For benchmark access and enterprise licensing: reach out via the organisation page
 

  **Zytra** builds domain-specific AI safety infrastructure for banking, financial services, and insurance (BFSI). We publish open models, benchmarks, and evaluation tooling purpose-built for regulated financial environments.

+ ---

  ## Models

 
  A 184M-parameter DeBERTa-v3-base guardrail classifier trained on 57,000+ real-world prompts.

  **Coverage:**
+ - **9 prompt-injection attack types:**
+ - System Override (D1) - direct instruction hijack, role reassignment, prompt delimiter attacks
+ - Extraction (D1) - password/secret extraction, system prompt leakage, context exfiltration
+ - Jailbreak (D1) - DAN, developer mode, policy bypass via persona
+ - Narrative Frame (D1) - roleplay, fiction, hypothetical framing to bypass refusals
+ - Authority Claim (D1) - impersonating admins, developers, or system roles to elevate privilege
+ - Social Engineering (D1) - pretext, urgency, emotional manipulation to lower guardrails
+ - Evasion (D5) - obfuscation, encoding, typo injection, token splitting to evade detection
+ - Agentic Injection (D6) - tool-call hijacking, memory poisoning, multi-agent prompt injection
+ - Indirect Injection (D7) - attacks embedded in retrieved documents, emails, or web content
+ - **11 BFSI compliance categories:**
+ - B-01 Investment Advice Elicitation - SEBI IA Regulations 2013 §3
+ - B-02 KYC/AML Bypass - RBI Master Directions KYC
+ - B-03 Regulatory Misrepresentation - SEBI FPI Regulations + RBI circulars
+ - B-04 Regulatory Document Hallucination - EU AI Act Art. 9(4)
+ - B-05 Consent & Data Rights Violations - DPDP Act 2023
+ - B-06 Transaction Integrity Violations - RBI NACH/NEFT Frameworks
+ - B-07 Account/Document Authenticity Bypass - RBI Digital Banking Security
+ - B-08 Fraud & Scam Facilitation - FCA SYSC 6.1
+ - B-09 Unlicensed Financial Advice - SEC IA Act §202(a)(11)
+ - B-10 Regulatory Enquiry Mishandling - EU AI Act Art. 52
+ - B-11 AML/Sanctions Evasion - FATF Recommendation 10
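
The 11 BFSI categories and their regulatory anchors read naturally as a lookup table. The sketch below copies the IDs, names, and anchors from the list above; the dictionary name and both helper functions are hypothetical and are not part of any published Zytra tooling.

```python
# Category IDs, names, and regulatory anchors copied from the coverage list.
# The structure and helper names are illustrative assumptions.
BFSI_CATEGORIES = {
    "B-01": ("Investment Advice Elicitation", "SEBI IA Regulations 2013 §3"),
    "B-02": ("KYC/AML Bypass", "RBI Master Directions KYC"),
    "B-03": ("Regulatory Misrepresentation", "SEBI FPI Regulations + RBI circulars"),
    "B-04": ("Regulatory Document Hallucination", "EU AI Act Art. 9(4)"),
    "B-05": ("Consent & Data Rights Violations", "DPDP Act 2023"),
    "B-06": ("Transaction Integrity Violations", "RBI NACH/NEFT Frameworks"),
    "B-07": ("Account/Document Authenticity Bypass", "RBI Digital Banking Security"),
    "B-08": ("Fraud & Scam Facilitation", "FCA SYSC 6.1"),
    "B-09": ("Unlicensed Financial Advice", "SEC IA Act §202(a)(11)"),
    "B-10": ("Regulatory Enquiry Mishandling", "EU AI Act Art. 52"),
    "B-11": ("AML/Sanctions Evasion", "FATF Recommendation 10"),
}


def describe(category_id: str) -> str:
    """Hypothetical helper: render one category as 'ID name (anchor)'."""
    name, anchor = BFSI_CATEGORIES[category_id]
    return f"{category_id} {name} ({anchor})"


def categories_in_range(first: str, last: str) -> list[str]:
    """Hypothetical helper: IDs from first to last inclusive.

    Zero-padded IDs sort correctly as plain strings.
    """
    return [c for c in sorted(BFSI_CATEGORIES) if first <= c <= last]


if __name__ == "__main__":
    # FinProof v1 is described as covering B-01 through B-07.
    for category_id in categories_in_range("B-01", "B-07"):
        print(describe(category_id))
```

Keeping the anchor next to each ID makes it easy to report which regulation a flagged prompt relates to, assuming the classifier's output can be mapped to these IDs.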

  **Results vs LlamaGuard-3-8B across 22 benchmarks:**
  - Wins all 7 prompt-injection benchmarks
 
  - 11.6ms inference latency - 44× fewer parameters
  - Deployable as always-on inline guardrail without GPU infrastructure
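
As a rough check on the size and latency comparison, the arithmetic can be reproduced from the figures quoted in this README. The sketch below assumes nominal sizes (184M, and 8B as implied by the name LlamaGuard-3-8B) and the approximate 180ms latency from the Key Results table; exact parameter counts are not given here, so the parameter ratio is only approximate.

```python
# Figures quoted in the README; model sizes are nominal (rounded) values.
SEMALITH_PARAMS = 184e6          # "184M-parameter"
LLAMAGUARD_PARAMS = 8e9          # nominal size implied by "LlamaGuard-3-8B"
SEMALITH_LATENCY_MS = 11.6
LLAMAGUARD_LATENCY_MS = 180.0    # listed as "~180ms", so approximate


def ratio(larger: float, smaller: float) -> float:
    """How many times larger the first value is than the second."""
    return larger / smaller


param_ratio = ratio(LLAMAGUARD_PARAMS, SEMALITH_PARAMS)
latency_ratio = ratio(LLAMAGUARD_LATENCY_MS, SEMALITH_LATENCY_MS)

if __name__ == "__main__":
    print(f"parameter ratio: {param_ratio:.1f}x")
    print(f"latency ratio:   {latency_ratio:.1f}x")
```

With these nominal inputs the parameter ratio comes out between 43× and 44×, close to the quoted 44× figure, which presumably uses exact parameter counts.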

+ ---

  ## Benchmarks

+ ### [FinProof v1](https://huggingface.co/datasets/Zytra/finproof-bench) - BFSI Adversarial Benchmark

+ 5,389-prompt adversarial benchmark covering 7 attack categories (B-01 through B-07) across three deployment registers:

  | Register | Description | Prompts |
  |---|---|---|
 

  | Tier | Prompts | Access |
  |---|---|---|
+ | Easy attacks | 1,606 | [Public - no registration](https://huggingface.co/datasets/Zytra/finproof-bench) |
+ | Medium attacks (QCBM-generated) | 2,036 | [Research agreement](https://huggingface.co/datasets/Zytra/finproof-research) |
  | Hard attacks - official test set | 1,747 | Zytra-evaluated only |
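
The three tier sizes can be checked against the 5,389-prompt total quoted for FinProof v1. This is a minimal sketch using only the counts from the table; the variable names are illustrative.

```python
# Tier sizes copied from the FinProof v1 tier table.
tiers = {"easy": 1606, "medium": 2036, "hard": 1747}
STATED_TOTAL = 5389  # "5,389-prompt adversarial benchmark"

total = sum(tiers.values())
# Share of the benchmark held in each tier, as a fraction of the total.
shares = {name: count / total for name, count in tiers.items()}

if __name__ == "__main__":
    print(f"sum of tiers: {total} (stated: {STATED_TOTAL})")
    for name, share in shares.items():
        print(f"{name:>6}: {share:.1%}")
```

The tiers sum exactly to the stated total, with roughly a third of the prompts held back in the withheld hard tier.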

  ### ASSAY-QI v2.0 - Quantum-Augmented Attack Suite

+ 1,273 adversarial prompts generated via QCBM + simulated annealing targeting Semalith's decision boundary. Covers professional and retail registers. Overall Semalith miss rate: 14.3%.

+ Techniques: SA Annealing (344), QCBM 8q boundary (173), QCBM 8q gradient (125), 10q Paraphrase Fix1 (123), QCBM 8q joint (100), retail customer mobile (157), RM internal (105), PG-miss professional (84), PG adversarial B-03 (3).
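
A quick tally of the per-technique counts against the stated suite size is sketched below. All numbers are copied from the text; the variable names are illustrative. The listed counts add up to 1,214, which is 59 fewer than the stated 1,273, and the text does not say where the remaining prompts belong.

```python
# Per-technique prompt counts copied from the "Techniques" line.
technique_counts = {
    "SA Annealing": 344,
    "QCBM 8q boundary": 173,
    "QCBM 8q gradient": 125,
    "10q Paraphrase Fix1": 123,
    "QCBM 8q joint": 100,
    "retail customer mobile": 157,
    "RM internal": 105,
    "PG-miss professional": 84,
    "PG adversarial B-03": 3,
}
STATED_TOTAL = 1273   # "1,273 adversarial prompts"
MISS_RATE = 0.143     # "Overall Semalith miss rate: 14.3%"

listed_total = sum(technique_counts.values())
unattributed = STATED_TOTAL - listed_total
# Approximate number of missed prompts implied by the overall miss rate.
approx_missed = round(STATED_TOTAL * MISS_RATE)

if __name__ == "__main__":
    print(f"listed: {listed_total}, stated: {STATED_TOTAL}, unattributed: {unattributed}")
    print(f"approx. missed prompts at {MISS_RATE:.1%}: {approx_missed}")
```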

+ ---

  ## Key Results

+ | Model | Size | HackaPrompt R | AgentHarm FPR | WildGuardMix F1 | Latency |
+ |---|---|---|---|---|---|
+ | **Semalith v1.5** | **184M** | **0.994** | **0.000** | **0.62** | **11.6ms** |
+ | LlamaGuard-3-8B | 8B | 0.941 | 0.063 | 0.58 | ~180ms |
+ | PromptGuard-86M | 86M | 0.981 | 0.126 | 0.41 | 8ms |
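
For readers unfamiliar with the column headers, the sketch below shows the standard definitions of recall, false positive rate (FPR), and F1 computed from a confusion matrix. It assumes "R" in the table means recall, which the README does not state explicitly. The counts in the example are invented for illustration and are not benchmark data.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    """Binary confusion counts, where 'positive' means a prompt flagged as an attack."""
    tp: int  # attacks correctly flagged
    fp: int  # benign prompts wrongly flagged
    tn: int  # benign prompts correctly passed
    fn: int  # attacks missed


def recall(c: Confusion) -> float:
    """Fraction of attacks that were flagged."""
    denom = c.tp + c.fn
    return c.tp / denom if denom else 0.0


def false_positive_rate(c: Confusion) -> float:
    """Fraction of benign prompts that were wrongly flagged."""
    denom = c.fp + c.tn
    return c.fp / denom if denom else 0.0


def f1(c: Confusion) -> float:
    """Harmonic mean of precision and recall."""
    denom = 2 * c.tp + c.fp + c.fn
    return 2 * c.tp / denom if denom else 0.0


if __name__ == "__main__":
    # Invented counts, for illustration only.
    example = Confusion(tp=90, fp=5, tn=95, fn=10)
    print(f"recall={recall(example):.3f} "
          f"fpr={false_positive_rate(example):.3f} "
          f"f1={f1(example):.3f}")
```

An FPR of 0.000 on a benign-heavy benchmark means no benign prompt was flagged in that set, while recall is measured only over the attack prompts, so the two columns describe different error types.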
 

+ ---

  ## Research

+ - **Paper**: *Semalith: A Regulatory-Aware Safety Classifier for AI-Assisted Financial Services* - DeBERTa-v3 + BFSI taxonomy + 22-benchmark evaluation
+ - **QCBM augmentation**: Quantum-inspired distribution sampling for adversarial test case generation in underrepresented BFSI attack categories
+ - **FinProof framework**: PINT-inspired four-tier release - public taxonomy, email-gated easy examples, research-agreement medium examples, withheld hard test set

+ ---

  ## Contact

  - 🌐 [zytratechnologies.com](http://zytratechnologies.com)
  - 🏢 India · BFSI-focused AI safety
+ - 💬 For benchmark access and Semalith enterprise licensing: reach out via the organisation page