---
language:
- tr
- en
base_model:
- dbmdz/bert-base-turkish-128k-cased
pipeline_tag: text-classification
tags:
- bert
- guardrail
---
# HomayShield 🔒
CPU-Based AI Guardrail for Turkish & English Security Filtering
HomayShield is a lightweight CPU-based AI guardrail designed to detect malicious, adversarial, and suspicious prompts targeting AI systems.
Unlike LLM-based guardrails, HomayShield is optimized for **CPU-only inference**, making it practical for organizations operating in resource-constrained or on-prem environments.
---
# Overview
HomayShield provides AI security filtering for:
* LLM applications
* Chatbots
* AI agents
* RAG systems
* Internal AI assistants
* Enterprise AI pipelines
Supported languages:
* Turkish 🇹🇷
* English 🇬🇧
* Mixed Turkish-English prompts
---
# Key Features
* ✅ CPU-friendly inference
* ✅ Shared encoder architecture
* ✅ Low-latency detection
* ✅ No GPU required in production
* ✅ Semantic attack detection
* ✅ Classifier-based attack detection
* ✅ Hybrid decision engine
---
# Architecture
HomayShield uses a shared encoder design:
![Screenshot 2026-06-26 at 14.30.30](https://cdn-uploads.huggingface.co/production/uploads/6720d6553279dd0ff66c4995/hqHqkwOQtg1lY0RTQdxxE.png)
# Detection Strategy
HomayShield combines two detection mechanisms.
## 1. Semantic Detection
Incoming prompt embeddings are compared against known attack embeddings.
Detects:
* Prompt injection
* Jailbreak attacks
* Instruction override
* Adversarial prompts
* Semantic attack variants
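The comparison against the attack bank can be sketched as a nearest-neighbour lookup. This is a minimal illustration, not the repository's implementation: it assumes cosine similarity as the measure (plausible given the cosine-based encoder loss, but not confirmed here) and takes the maximum similarity over the bank as the semantic score. The function names and the toy 3-dimensional vectors are hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # a zero vector has no direction; treat it as dissimilar
    return dot / (norm_a * norm_b)

def semantic_score(prompt_embedding, attack_bank):
    """Highest similarity between the prompt and any stored attack embedding.

    Assumption: the score is the maximum over the bank; an empty bank yields 0.0.
    """
    return max(
        (cosine_similarity(prompt_embedding, attack) for attack in attack_bank),
        default=0.0,
    )

# Toy vectors stand in for real encoder outputs.
attack_bank = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
suspicious = [0.9, 0.1, 0.0]   # close to the first attack embedding
benign = [0.0, 0.0, 1.0]       # orthogonal to every attack embedding

print(semantic_score(suspicious, attack_bank))
print(semantic_score(benign, attack_bank))
```

A prompt that paraphrases a known attack should land near one of the stored vectors and score high, which is what lets this layer catch variants that share meaning but not wording.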
---
## 2. Classifier Detection
A classifier predicts the attack probability from the prompt embedding.
Detects:
* Known attack patterns
* Learned malicious behaviors
* Structured attack prompts
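As a rough illustration of mapping an embedding to an attack probability, the sketch below uses a single linear layer followed by a sigmoid. The actual classifier architecture is not described in this card, so the layer shape, the function names, and the toy weights are all assumptions.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def classifier_score(embedding, weights, bias):
    """Attack probability from one linear layer plus a sigmoid (assumed head)."""
    logit = sum(w * x for w, x in zip(weights, embedding)) + bias
    return sigmoid(logit)

# Hypothetical toy parameters; real values would come from the trained classifier.
weights = [2.0, -1.0, 0.5]
bias = -0.5

print(classifier_score([1.0, 0.0, 0.0], weights, bias))
```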
---
# Inference Modes
## OR Logic
A prompt is flagged as an attack if either the semantic score or the classifier score exceeds the threshold.
Best for:
* Security-first environments
* Low false negatives
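The OR rule is small enough to state directly in code. The sketch below assumes each signal has its own threshold and that "exceeds" means strictly greater than. The function name and the default thresholds are hypothetical, not HomayShield's shipped values.

```python
def is_attack_or(semantic_score, classifier_score,
                 semantic_threshold=0.5, classifier_threshold=0.5):
    """Flag the prompt when at least one signal exceeds its threshold."""
    return (semantic_score > semantic_threshold
            or classifier_score > classifier_threshold)

print(is_attack_or(0.9, 0.1))  # only the semantic signal fires
print(is_attack_or(0.1, 0.1))  # neither signal fires
```

Because one confident signal is enough, this mode favours recall: a false positive from either layer becomes a false positive overall.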
---
## Weighted Fusion
The final score is a weighted combination of the semantic and classifier scores.
Best for:
* Balanced detection
* Tunable sensitivity
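One common way to realize a weighted combination is a convex blend of the two scores followed by a single threshold. The sketch below assumes that form. The weight, the threshold, and the function names are illustrative and not taken from the repository.

```python
def fused_score(semantic_score, classifier_score, semantic_weight=0.5):
    """Convex combination of the two signals (assumed fusion form)."""
    if not 0.0 <= semantic_weight <= 1.0:
        raise ValueError("semantic_weight must be in [0, 1]")
    return (semantic_weight * semantic_score
            + (1.0 - semantic_weight) * classifier_score)

def is_attack_fusion(semantic_score, classifier_score,
                     semantic_weight=0.5, threshold=0.5):
    """Flag the prompt when the fused score exceeds the threshold."""
    return fused_score(semantic_score, classifier_score, semantic_weight) > threshold

# Trusting the semantic layer three times as much as the classifier.
print(fused_score(0.8, 0.2, semantic_weight=0.75))
```

Setting `semantic_weight` to 1.0 or 0.0 reduces this to a single-signal decision, so the same function can cover every mode except OR.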
---
## Single Signal
Use only one signal: either semantic detection or classifier detection.
Best for:
* Benchmarking
* Lightweight deployments
---
# Training
Training consists of two stages.
## Stage 1: Encoder Training
Loss: `CosineEmbeddingLoss`
Goal:
* Cluster similar attacks
* Separate benign and malicious prompts
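To show how a cosine embedding loss produces the clustering and separation goals above, here is a plain-Python restatement of the per-pair formula documented for PyTorch's `CosineEmbeddingLoss`: `1 - cos` for a target of 1, and `max(0, cos - margin)` for a target of -1. How HomayShield builds its training pairs is not stated here. Treating attack/attack pairs as target 1 and attack/benign pairs as target -1 is an assumption, and the helper names are hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def cosine_embedding_loss(a, b, target, margin=0.0):
    """Per-pair cosine embedding loss.

    target = 1  pulls the pair together:  loss = 1 - cos(a, b)
    target = -1 pushes the pair apart:    loss = max(0, cos(a, b) - margin)
    """
    cos = cosine(a, b)
    if target == 1:
        return 1.0 - cos
    if target == -1:
        return max(0.0, cos - margin)
    raise ValueError("target must be 1 or -1")

attack_a = [1.0, 0.0]
attack_b = [1.0, 0.0]
benign = [0.0, 1.0]

print(cosine_embedding_loss(attack_a, attack_b, target=1))   # similar pair, already aligned
print(cosine_embedding_loss(attack_a, benign, target=-1))    # dissimilar pair, already separated
print(cosine_embedding_loss(attack_a, attack_b, target=-1))  # dissimilar label but aligned vectors
```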
---
## Stage 2: Classifier Training
Loss: `BCEWithLogitsLoss`
Outputs:
* Encoder weights
* Classifier weights
* Attack embedding bank
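The Stage 2 loss combines a sigmoid with binary cross-entropy in one numerically stable expression. The sketch below restates the per-sample formula in plain Python for illustration. It is not the training code, and the label convention (1 = attack, 0 = benign) and the function name are assumptions.

```python
import math

def bce_with_logits(logit, target):
    """Binary cross-entropy on a raw logit, in overflow-safe form.

    Equivalent to -[t * log(sigmoid(x)) + (1 - t) * log(1 - sigmoid(x))],
    rewritten as max(x, 0) - x * t + log(1 + exp(-|x|)).
    """
    return max(logit, 0.0) - logit * target + math.log1p(math.exp(-abs(logit)))

print(bce_with_logits(0.0, 1.0))    # undecided prediction
print(bce_with_logits(10.0, 1.0))   # confident and correct: small loss
print(bce_with_logits(10.0, 0.0))   # confident and wrong: large loss
```

Working on logits rather than probabilities avoids taking `log(0)` when the classifier becomes very confident.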
---
# Training Data
HomayShield was trained using a multilingual dataset containing:
* Benign prompts
* Adversarial prompts
* Turkish prompts
* English prompts
* Mixed-language prompts
Attack categories include:
* Prompt injection
* Jailbreak
* Instruction override
* Prompt leakage
* Data exfiltration
* Tool abuse
* Code injection
---
# Files
This repository contains:
* `homayshield_encoder.pt`
* `homayshield_classifier.pt`
* `homayshield_attack_bank.npy`
---
# Usage
The example below shows the project layout, the training command, and the inference command.
## Folder Structure
```text
HomayShield/
│
├── datasets/
│   ├── token_level_adversarial_tr_v2.jsonl
│   ├── token_level_adversarial_en_v2.jsonl
│   └── final_classifier_merged_all.jsonl
│
├── output/
│   └── Homayv6/
│       ├── homayshield_encoder.pt
│       ├── homayshield_classifier.pt
│       └── homayshield_attack_bank.npy
│
├── training2.py
└── inference3.py
```
---
## Training Command
```bash
python training2.py \
--train \
./datasets/token_level_adversarial_tr_v2.jsonl \
./datasets/token_level_adversarial_en_v2.jsonl \
./datasets/final_classifier_merged_all.jsonl \
--output-dir ./output/Homayv6
```
---
## Output Files After Training
Training generates:
```text
output/Homayv6/
├── homayshield_encoder.pt
├── homayshield_classifier.pt
└── homayshield_attack_bank.npy
```
---
## Inference Command
```bash
python inference3.py
```
Inference loads:
* `homayshield_encoder.pt`
* `homayshield_classifier.pt`
* `homayshield_attack_bank.npy`
from:
```text
./output/Homayv6/
```
Inference modes:
* OR
* Fusion
* Semantic Only
* Classifier Only
---
# Limitations
HomayShield is not intended to replace advanced LLM-based guardrails.
Compared to LLM guardrails:
Advantages:
* Lower infrastructure cost
* Faster CPU inference
* Easier deployment
Tradeoffs:
* Lower reasoning capability
* Less contextual understanding
* Reduced zero-day detection
---
# Intended Use
Recommended for:
* Enterprise AI security
* SOC environments
* On-prem AI systems
* Air-gapped deployments
* CPU-only environments
# Example Usage
![Screenshot 2026-06-26 at 10.36.34](https://cdn-uploads.huggingface.co/production/uploads/6720d6553279dd0ff66c4995/aydhqYnQOQfDmnQTw5Wpq.png)
---
# Final Verdict (Attack Detection)
| Threshold | Attack Recall | Precision |
| --------- | ------------: | --------: |
| 0.57 | 100% | 78.2% |
| 0.58 | 80.2% | ~100% |
| 0.59 | 38.6% | 100% |
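For readers who want to run a similar threshold sweep on their own labelled data, the sketch below shows how recall and precision are computed at a given threshold. The scores and labels are invented toy values for illustration only and are not the evaluation data behind the table. The function name and the strict `>` comparison are assumptions.

```python
def precision_recall(scores, labels, threshold):
    """Precision and recall for the attack class at one threshold.

    labels: 1 = attack, 0 = benign. A prompt is flagged when score > threshold.
    Returns 0.0 for a metric whose denominator is zero.
    """
    flagged = [s > threshold for s in scores]
    tp = sum(1 for f, y in zip(flagged, labels) if f and y == 1)
    fp = sum(1 for f, y in zip(flagged, labels) if f and y == 0)
    fn = sum(1 for f, y in zip(flagged, labels) if not f and y == 1)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Invented toy data, NOT HomayShield's evaluation set.
scores = [0.60, 0.585, 0.575, 0.572, 0.55]
labels = [1, 1, 1, 0, 0]

for threshold in (0.57, 0.58, 0.59):
    precision, recall = precision_recall(scores, labels, threshold)
    print(f"threshold={threshold:.2f} precision={precision:.3f} recall={recall:.3f}")
```

Raising the threshold trades recall for precision, which is the pattern the table shows across its three operating points.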
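In this evaluation HomayShield is highly effective at attack detection, largely because of the semantic layer. The operating point is sensitive to the threshold: recall falls from 100% at 0.57 to 38.6% at 0.59.
Attack detection rating:
* Semantic layer: 9.5/10
* Classifier layer: 7.5/10
* Overall attack detection: 9/10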
# Philosophy
> AI security should not be limited to organizations with GPU infrastructure.
Even lightweight CPU-based guardrails can provide meaningful protection for real-world AI systems.
![ChatGPT Image Jun 26, 2026 at 12_02_58 AM(2)](https://cdn-uploads.huggingface.co/production/uploads/6720d6553279dd0ff66c4995/7Q144jZJTTxgIlNd_jTIu.png)