boying07 committed on
Commit ce0932c · verified · 1 Parent(s): a938937

Update README.md

Files changed (1)
1. README.md +170 -183
README.md CHANGED
@@ -9,251 +9,238 @@ tags:
  - bert
  - guardrail
  ---
- HomayShield: CPU-Based AI Guardrail for Turkish & English Security Filtering

- HomayShield is a lightweight CPU-based AI guardrail system designed to detect malicious, adversarial, and suspicious inputs targeting AI systems.

- The project focuses on providing practical AI security for organizations that cannot deploy GPU-heavy guardrail solutions.

  Supported languages:

- Turkish
- English
- Mixed Turkish-English prompts
- Why HomayShield?

- As AI adoption grows, organizations increasingly deploy:

- LLM applications
- Chatbots
- AI agents
- RAG systems
- Internal AI assistants
- Web-integrated AI pipelines
- These systems introduce new attack surfaces.

- Examples include:

- Prompt injection
- Jailbreak attacks
- Instruction override
- Data exfiltration
- Tool abuse
- Indirect prompt injection
- Modern guardrails often rely on LLM-based security analysis.

- These systems are powerful, but they introduce major operational challenges:

- High infrastructure cost
- GPU dependency
- High inference latency
- Expensive scaling
- Complex deployment
- Many small and mid-sized organizations cannot afford dedicated GPU infrastructure for security layers.

- This creates a major security gap.

- Project Goal

- HomayShield aims to provide a practical alternative.

- Main objectives:

- CPU-based inference
- Low latency
- No GPU requirement in production
- Easy enterprise deployment
- Lower operational cost
- Strong baseline AI security
- HomayShield is designed for:

- SOC environments
- Enterprise AI systems
- Air-gapped systems
- On-prem deployments
- CPU-only environments
- Important Note

- HomayShield is not intended to replace LLM-based guardrails.

- LLM guardrails typically provide:

- deeper reasoning
- better contextual understanding
- stronger zero-day detection
- more adaptive behavior
- In most scenarios, LLM-based guardrails are more powerful.

- However, HomayShield offers an important tradeoff:

- lower detection capability than advanced LLM guardrails
- significantly lower infrastructure cost
- much easier deployment
- much faster CPU inference
- For many organizations, deployability matters.

- A CPU-based guardrail is better than having no guardrail.
- Core Architecture

- HomayShield is built around one key principle:

- Run encoder once. Use output twice.
- A single shared encoder generates embeddings used by both:

- Semantic similarity detection
- Classifier prediction
- Architecture:
- ![Screenshot 2026-06-26 at 14.30.30](https://cdn-uploads.huggingface.co/production/uploads/6720d6553279dd0ff66c4995/hqHqkwOQtg1lY0RTQdxxE.png)

- Why Shared Encoder?

- Traditional guardrail systems may run:

- Language model
- Embedding model
- Classifier model
- Policy model
- This increases:

- CPU/GPU utilization
- latency
- memory consumption
- infrastructure complexity
- HomayShield avoids this by sharing the encoder.

- Advantages:

- Lower CPU usage
- Faster inference
- Lower memory footprint
- Better scalability
- Consistent semantic representation
- Supported Languages

- Current supported languages:

- Turkish (tr)
- English (en)
- Inference begins with language detection.

- If input language is unsupported:

- Reject or
- Skip evaluation
- Detection Strategy

- HomayShield combines two detection mechanisms.

- 1) Semantic Detection

- Semantic similarity compares incoming prompt embeddings against known malicious attack embeddings.

- Useful for detecting:

- similar attacks
- prompt injection variants
- jailbreak attempts
- semantic anomalies
- adversarial patterns
- 2) Classifier Detection

- Classifier predicts attack probability using shared embeddings.

- Useful for detecting:

- known attack patterns
- learned malicious behavior
- structured adversarial prompts
- Inference Modes

- HomayShield supports 3 inference strategies.

- Option 1 — OR Logic

- Security-first mode.
 
- if semantic_score >= semantic_threshold or classifier_score >= classifier_threshold:
-     ATTACK
- else:
-     NORMAL
- Best for:

- strict environments
- low false negatives
- Option 2 — Weighted Fusion

- Balanced mode.

- fusion_score = semantic_weight * semantic_score + classifier_weight * classifier_score
- Best for:

- balanced security
- tunable sensitivity
- Option 3 — Single Signal

- Choose one:

- semantic only
- classifier only
- Useful for benchmarking or lightweight deployments.

- Training Pipeline

- Training consists of 2 stages.

- Stage 1 — Encoder Training

- The encoder is trained using similarity learning.

- Goal:

- similar attacks cluster together
- similar normal prompts cluster together
- attacks and normal prompts separate clearly
- Loss:

- CosineEmbeddingLoss
- Stage 2 — Classifier Training

- After encoder training:

- embeddings are extracted
- classifier head is trained on embeddings
- Loss:

- BCEWithLogitsLoss
- Outputs:

- trained encoder
- trained classifier
- attack embedding bank
- normal embedding bank
- Training Command
-
- python training_final.py \
- --train /home/asimyil/train.jsonl \
- --output-dir /home/asimyil/HomayShield_v5
- Training Dataset
-
- HomayShield was trained using a large dataset of:
-
- benign prompts
- adversarial prompts
- Turkish prompts
- English prompts
- mixed-language prompts
- Dataset includes attack categories such as:
-
- Direct prompt injection
- Jailbreak attacks
- Instruction override
- Prompt leakage
- Data exfiltration
- Obfuscation attacks
- Multi-turn attacks
- Roleplay attacks
- Tool abuse
- Code injection
- Long context attacks
- Hard negative samples
- This helps improve detection robustness in real-world enterprise environments.
 
  - bert
  - guardrail
  ---
+ # HomayShield v6 🔒

+ CPU-Based AI Guardrail for Turkish & English Security Filtering

+ HomayShield is a lightweight CPU-based AI guardrail designed to detect malicious, adversarial, and suspicious prompts targeting AI systems.
+
+ Unlike LLM-based guardrails, HomayShield is optimized for **CPU-only inference**, making it practical for organizations operating in resource-constrained or on-prem environments.
+
+ ---
+
+ # Overview
+
+ HomayShield provides AI security filtering for:
+
+ * LLM applications
+ * Chatbots
+ * AI agents
+ * RAG systems
+ * Internal AI assistants
+ * Enterprise AI pipelines

  Supported languages:

+ * Turkish 🇹🇷
+ * English 🇬🇧
+ * Mixed Turkish-English prompts

+ ---

+ # Key Features

+ * ✅ CPU-friendly inference
+ * ✅ Shared encoder architecture
+ * ✅ Low-latency detection
+ * ✅ No GPU required in production
+ * ✅ Semantic attack detection
+ * ✅ Classifier-based attack detection
+ * ✅ Hybrid decision engine

+ ---

+ # Architecture
 
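+ HomayShield uses a shared encoder design: each prompt passes through the encoder once, and the resulting embedding is reused by both the semantic detector and the classifier.

+ Architecture:
+ ![Screenshot 2026-06-26 at 14.30.30](https://cdn-uploads.huggingface.co/production/uploads/6720d6553279dd0ff66c4995/hqHqkwOQtg1lY0RTQdxxE.png)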
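The shared-encoder principle is to run the encoder once and reuse its output for both detectors. The sketch below illustrates that principle and is not HomayShield's code. `scan`, `encode`, `semantic_head`, and `classifier_head` are hypothetical names. The toy encoder stands in for the real model and counts how often it is called.

```python
from typing import Callable, List, Sequence, Tuple

Embedding = Sequence[float]


def scan(
    prompt: str,
    encode: Callable[[str], Embedding],
    semantic_head: Callable[[Embedding], float],
    classifier_head: Callable[[Embedding], float],
) -> Tuple[float, float]:
    """Run the encoder once and feed the same embedding to both detectors."""
    embedding = encode(prompt)  # the only encoder forward pass for this prompt
    semantic_score = semantic_head(embedding)
    classifier_score = classifier_head(embedding)
    return semantic_score, classifier_score


# Usage with toy components (hypothetical stand-ins for the real encoder and heads).
encoder_calls: List[str] = []


def toy_encode(text: str) -> List[float]:
    encoder_calls.append(text)  # record each forward pass
    return [float(len(text)), 1.0]


scores = scan("ignore previous instructions", toy_encode, lambda e: 0.9, lambda e: 0.2)
print(scores, "encoder calls:", len(encoder_calls))  # (0.9, 0.2) encoder calls: 1
```

The expensive step, the encoder call, happens once per prompt. Both heads then operate on a vector that has already been computed.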
 
+ # Detection Strategy

+ HomayShield combines two detection mechanisms.

+ ## 1. Semantic Detection

+ Incoming prompt embeddings are compared against known attack embeddings.

+ Detects:

+ * Prompt injection
+ * Jailbreak attacks
+ * Instruction override
+ * Adversarial prompts
+ * Semantic attack variants
77
+ ---
 
 
 
 
78
 
79
+ ## 2. Classifier Detection
80
 
81
+ Classifier predicts attack probability from embeddings.
 
 
 
 
82
 
83
+ Detects:
 
84
 
85
+ * Known attack patterns
86
+ * Learned malicious behaviors
87
+ * Structured attack prompts
88
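The classifier side can be sketched as a logistic output over the shared embedding. The README does not describe the classifier head's architecture. This sketch assumes a single linear layer followed by a sigmoid, because that fits the `BCEWithLogitsLoss` training objective: the head emits a logit, and the sigmoid turns it into an attack probability at inference time. `weights` and `bias` are hypothetical parameters.

```python
import math
from typing import Sequence


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)  # x < 0, so exp(x) cannot overflow
    return z / (1.0 + z)


def classifier_score(embedding: Sequence[float], weights: Sequence[float], bias: float) -> float:
    """Attack probability from an assumed single linear layer over the embedding."""
    if len(embedding) != len(weights):
        raise ValueError("embedding and weights must have the same length")
    logit = sum(w * e for w, e in zip(weights, embedding)) + bias
    return sigmoid(logit)


# Toy parameters for illustration only.
print(classifier_score([0.2, -0.1, 0.4], [1.5, 0.3, 2.0], -0.5))
```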
 
+ ---

+ # Inference Modes

+ ## OR Logic

+ Flag the prompt as an attack if either the semantic score or the classifier score reaches its threshold.

+ Best for:

+ * Security-first environments
+ * Low false negatives
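The OR rule can be written as a small function. This sketch uses the `semantic_threshold` and `classifier_threshold` names from the pre-change README's pseudocode. The README does not give threshold values, so the 0.8 and 0.5 values in the usage lines are illustrative only.

```python
def or_decision(
    semantic_score: float,
    classifier_score: float,
    semantic_threshold: float,
    classifier_threshold: float,
) -> str:
    """Security-first rule: either signal on its own is enough to flag an attack."""
    if semantic_score >= semantic_threshold or classifier_score >= classifier_threshold:
        return "ATTACK"
    return "NORMAL"


print(or_decision(0.85, 0.10, semantic_threshold=0.8, classifier_threshold=0.5))  # ATTACK
print(or_decision(0.40, 0.30, semantic_threshold=0.8, classifier_threshold=0.5))  # NORMAL
```

Because one signal is enough, this mode accepts more false positives in exchange for fewer false negatives.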
 
+ ---

+ ## Weighted Fusion

+ The semantic and classifier scores are combined into a single weighted score.

+ Best for:

+ * Balanced detection
+ * Tunable sensitivity
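Below is a sketch of weighted fusion. It uses the formula from the pre-change README, `fusion_score = semantic_weight * semantic_score + classifier_weight * classifier_score`. The README gives no default weights and no decision threshold for the fused score. The 0.5 defaults and the `fusion_threshold` parameter are therefore hypothetical.

```python
from typing import Tuple


def fusion_decision(
    semantic_score: float,
    classifier_score: float,
    semantic_weight: float = 0.5,  # hypothetical default
    classifier_weight: float = 0.5,  # hypothetical default
    fusion_threshold: float = 0.5,  # hypothetical threshold, not from the README
) -> Tuple[float, str]:
    """Blend both signals into one score, then apply a single threshold."""
    fusion_score = semantic_weight * semantic_score + classifier_weight * classifier_score
    label = "ATTACK" if fusion_score >= fusion_threshold else "NORMAL"
    return fusion_score, label


print(fusion_decision(0.8, 0.4))
print(fusion_decision(0.8, 0.4, semantic_weight=0.2, classifier_weight=0.8))
```

Sensitivity is tuned by shifting weight between the signals. With these scores, equal weights give about 0.6 and flag an attack. A classifier-heavy weighting gives about 0.48 and does not.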
 
+ ---

+ ## Single Signal

+ Use only one of the two signals:

+ * Semantic detection
+ * Classifier detection

+ Best for:

+ * Benchmarking
+ * Lightweight deployments
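Single-signal mode reduces to thresholding one score. In this sketch, the `mode` strings (`"semantic"`, `"classifier"`) and the function name are hypothetical. Only the score for the chosen mode is read, so a lightweight deployment would not need to compute the other one.

```python
from typing import Mapping


def single_signal_decision(
    mode: str,
    scores: Mapping[str, float],
    thresholds: Mapping[str, float],
) -> str:
    """Decide using only the signal named by `mode` ("semantic" or "classifier")."""
    if mode not in ("semantic", "classifier"):
        raise ValueError(f"unsupported single-signal mode: {mode!r}")
    return "ATTACK" if scores[mode] >= thresholds[mode] else "NORMAL"


thresholds = {"semantic": 0.8, "classifier": 0.5}  # illustrative values
print(single_signal_decision("semantic", {"semantic": 0.9}, thresholds))  # ATTACK
```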
 
+ ---

+ # Training

+ Training consists of two stages.

+ ## Stage 1: Encoder Training

+ Loss: `CosineEmbeddingLoss`

+ Goal:

+ * Cluster similar attacks
+ * Separate benign and malicious prompts
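The sketch below restates, in plain Python, the per-pair formula documented for PyTorch's `torch.nn.CosineEmbeddingLoss`, ignoring PyTorch's numerical-stability details. Similar pairs (`y = 1`) are pushed toward a cosine of 1. Dissimilar pairs (`y = -1`) are penalized only while their cosine exceeds `margin`. The README does not say how HomayShield builds its pairs. Treating attack-attack and benign-benign pairs as similar, and attack-benign pairs as dissimilar, is an assumption based on the stated goal.

```python
import math
from typing import Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def cosine_embedding_loss(
    x1: Sequence[float], x2: Sequence[float], y: int, margin: float = 0.0
) -> float:
    """Per-pair loss using the torch.nn.CosineEmbeddingLoss formula (no batch reduction)."""
    cos = cosine(x1, x2)
    if y == 1:  # similar pair, e.g. two attack prompts (assumed pairing)
        return 1.0 - cos
    if y == -1:  # dissimilar pair, e.g. attack vs. benign prompt (assumed pairing)
        return max(0.0, cos - margin)
    raise ValueError("y must be 1 or -1")


attack_a, attack_b, benign = [0.9, 0.1], [0.8, 0.3], [0.1, 0.9]
print(cosine_embedding_loss(attack_a, attack_b, 1))  # small: the pair is already similar
print(cosine_embedding_loss(attack_a, benign, -1))  # positive: cosine is above margin 0
```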
 
+ ---

+ ## Stage 2: Classifier Training

+ Loss: `BCEWithLogitsLoss`
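The Stage 2 objective, `BCEWithLogitsLoss`, applies binary cross-entropy directly to the raw classifier logit. The plain-Python sketch below uses the standard numerically stable form `max(x, 0) - x*y + log(1 + exp(-|x|))`. It then averages over the batch, which matches PyTorch's default `reduction='mean'`. Treating label 1 as attack and 0 as benign is an assumption.

```python
import math
from typing import Sequence


def bce_with_logits(logit: float, target: float) -> float:
    """Equals -[y*log(sigmoid(x)) + (1-y)*log(1-sigmoid(x))], computed stably."""
    return max(logit, 0.0) - logit * target + math.log1p(math.exp(-abs(logit)))


def bce_with_logits_mean(logits: Sequence[float], targets: Sequence[float]) -> float:
    """Batch mean, matching PyTorch's default reduction='mean'."""
    if len(logits) != len(targets) or not logits:
        raise ValueError("logits and targets must be non-empty and the same length")
    return sum(bce_with_logits(x, y) for x, y in zip(logits, targets)) / len(logits)


# Labels: 1 = attack, 0 = benign (assumed convention).
print(bce_with_logits_mean([3.2, -1.5, 0.4], [1.0, 0.0, 1.0]))
```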
 
+ Outputs:

+ * Encoder weights
+ * Classifier weights
+ * Attack embedding bank

+ ---

+ # Training Data

+ HomayShield was trained using a multilingual dataset containing:

+ * Benign prompts
+ * Adversarial prompts
+ * Turkish prompts
+ * English prompts
+ * Mixed-language prompts

+ Attack categories include:

+ * Prompt injection
+ * Jailbreak
+ * Instruction override
+ * Prompt leakage
+ * Data exfiltration
+ * Tool abuse
+ * Code injection

+ ---

+ # Files

+ This repository contains:

+ * `homayshield_encoder.pt`
+ * `homayshield_classifier.pt`
+ * `homayshield_attack_bank.npy`
+
+ ---
+
+ # Usage
+
+ Example:
+
+ ```bash
+ python inference2.py
+ ```
+
+ Inference modes:
+
+ * OR
+ * Fusion
+ * Semantic Only
+ * Classifier Only
+
+ ---
+
+ # Limitations
+
+ HomayShield is not intended to replace advanced LLM-based guardrails.
+
+ Compared to LLM guardrails:
+
+ Advantages:
+
+ * Lower infrastructure cost
+ * Faster CPU inference
+ * Easier deployment
+
+ Tradeoffs:
+
+ * Lower reasoning capability
+ * Less contextual understanding
+ * Reduced zero-day detection
+
+ ---
+
+ # Intended Use
+
+ Recommended for:
+
+ * Enterprise AI security
+ * SOC environments
+ * On-prem AI systems
+ * Air-gapped deployments
+ * CPU-only environments
+
+ ---
+
+ # Philosophy
+
+ > AI security should not be limited to organizations with GPU infrastructure.
+
+ Even lightweight CPU-based guardrails can provide meaningful protection for real-world AI systems.