HomayShield
CPU-Based AI Guardrail for Turkish & English Security Filtering
HomayShield is a lightweight CPU-based AI guardrail designed to detect malicious, adversarial, and suspicious prompts targeting AI systems.
Unlike LLM-based guardrails, HomayShield is optimized for CPU-only inference, making it practical for organizations operating in resource-constrained or on-prem environments.
Overview
HomayShield provides AI security filtering for:
- LLM applications
- Chatbots
- AI agents
- RAG systems
- Internal AI assistants
- Enterprise AI pipelines
Supported languages:
- Turkish 🇹🇷
- English 🇬🇧
- Mixed Turkish-English prompts
Key Features
- CPU-friendly inference
- Shared encoder architecture
- Low-latency detection
- No GPU required in production
- Semantic attack detection
- Classifier-based attack detection
- Hybrid decision engine
Architecture
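HomayShield uses a shared encoder design: one encoder produces the prompt embedding, and both the semantic detector and the classifier operate on that embedding.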
Detection Strategy
HomayShield combines two detection mechanisms.
1. Semantic Detection
Incoming prompt embeddings are compared against known attack embeddings.
Detects:
- Prompt injection
- Jailbreak attacks
- Instruction override
- Adversarial prompts
- Semantic attack variants
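The comparison against known attack embeddings can be sketched as a nearest-neighbour similarity lookup. This is a minimal illustration, not HomayShield's actual code: the use of cosine similarity, taking the maximum over the bank as the score, the function names, the toy 3-dimensional vectors, and the threshold value are all assumptions.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as matching nothing
    return dot / (norm_a * norm_b)

def semantic_score(prompt_embedding, attack_bank):
    """Highest similarity between the prompt and any stored attack embedding."""
    return max(
        (cosine_similarity(prompt_embedding, attack) for attack in attack_bank),
        default=0.0,  # an empty bank yields no match
    )

# Toy embeddings (hypothetical values, not real encoder output).
attack_bank = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
prompt_embedding = [0.9, 0.1, 0.0]

score = semantic_score(prompt_embedding, attack_bank)
is_attack = score > 0.5  # hypothetical threshold
```

Because the score depends only on proximity to stored attack embeddings, a paraphrased attack can still be flagged if its embedding lands near a known one.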
2. Classifier Detection
A classifier predicts the attack probability from the prompt embedding.
Detects:
- Known attack patterns
- Learned malicious behaviors
- Structured attack prompts
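As a rough illustration of how a classifier turns an embedding into an attack probability, the sketch below applies a single linear layer followed by a sigmoid. The real classifier architecture is not described here, so the linear head, the weights, the bias, and the function names are hypothetical.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def classifier_score(embedding, weights, bias):
    """Attack probability from a linear head over the embedding (assumed design)."""
    logit = sum(w * x for w, x in zip(weights, embedding)) + bias
    return sigmoid(logit)

# Hypothetical parameters and a toy embedding, for illustration only.
weights = [2.0, -1.0, 0.5]
bias = -0.5
attack_probability = classifier_score([0.9, 0.1, 0.0], weights, bias)
```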
Inference Modes
OR Logic
A prompt is flagged as an attack if either the semantic score or the classifier score exceeds the threshold.
Best for:
- Security-first environments
- Low false negatives
Weighted Fusion
The semantic and classifier scores are merged into a single weighted score, which is then compared against the threshold.
Best for:
- Balanced detection
- Tunable sensitivity
Single Signal
Use only:
- Semantic detection or
- Classifier detection
Best for:
- Benchmarking
- Lightweight deployments
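The three modes can be summarised in one small decision function. This is a sketch under stated assumptions: the mode names, the single shared threshold, the default values, and the convex weighting (`semantic_weight` and `1 - semantic_weight`) are illustrative choices, not the interface of the inference script.

```python
def decide(semantic, classifier, mode="or", threshold=0.5, semantic_weight=0.5):
    """Return True if the prompt should be flagged as an attack."""
    if mode == "or":
        # Either signal alone is enough to flag the prompt.
        return semantic > threshold or classifier > threshold
    if mode == "fusion":
        # Weighted combination of both scores.
        fused = semantic_weight * semantic + (1.0 - semantic_weight) * classifier
        return fused > threshold
    if mode == "semantic":
        return semantic > threshold
    if mode == "classifier":
        return classifier > threshold
    raise ValueError(f"unknown mode: {mode}")

# A prompt with a strong semantic match but a weak classifier score.
flag_or = decide(0.6, 0.2, mode="or")          # semantic signal alone triggers
flag_fusion = decide(0.6, 0.2, mode="fusion")  # fused score is 0.4, below threshold
```

The example shows why OR logic favours low false negatives: one confident signal is sufficient, whereas fusion lets a weak signal pull the combined score back under the threshold.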
Training
Training consists of two stages.
Stage 1: Encoder Training
Loss: CosineEmbeddingLoss
Goal:
- Cluster similar attacks
- Separate benign and malicious prompts
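For readers unfamiliar with CosineEmbeddingLoss, the per-pair formula documented for PyTorch's loss of that name can be written out in plain Python. How HomayShield builds its training pairs is not stated, so the pairing in the comments (similar pairs labelled 1, dissimilar pairs labelled -1) is an assumption, and this sketch stands in for the library call rather than reproducing the training script.

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0.0 else 0.0

def cosine_embedding_loss(x1, x2, target, margin=0.0):
    """Per-pair loss: pull pairs with target 1 together, push target -1 apart."""
    cos = cosine_similarity(x1, x2)
    if target == 1:
        return 1.0 - cos              # zero only when the vectors align
    if target == -1:
        return max(0.0, cos - margin)  # penalise similarity above the margin
    raise ValueError("target must be 1 or -1")

# Assumed pairing: two similar attacks (target 1), an attack and a benign prompt (target -1).
similar_pair_loss = cosine_embedding_loss([1.0, 0.0], [1.0, 0.0], target=1)
dissimilar_pair_loss = cosine_embedding_loss([1.0, 0.0], [0.0, 1.0], target=-1)
```

Minimising this loss moves embeddings of similar attacks toward each other and keeps benign and malicious prompts apart, which matches the two goals listed above.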
Stage 2: Classifier Training
Loss: BCEWithLogitsLoss
Outputs:
- Encoder weights
- Classifier weights
- Attack embedding bank
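The Stage 2 loss, BCEWithLogitsLoss, combines a sigmoid with binary cross-entropy in one numerically stable expression. The plain-Python version below illustrates that formula for a single example; the label convention (1 for attack, 0 for benign) is an assumption about the training data, and the function name is hypothetical.

```python
import math

def bce_with_logits(logit, target):
    """Binary cross-entropy on a raw logit, in a form that cannot overflow.

    Equivalent to -[target * log(sigmoid(logit))
                    + (1 - target) * log(1 - sigmoid(logit))].
    """
    return max(logit, 0.0) - logit * target + math.log1p(math.exp(-abs(logit)))

# Assumed labels: 1 = attack, 0 = benign.
loss_confident_correct = bce_with_logits(4.0, 1)  # small loss
loss_confident_wrong = bce_with_logits(4.0, 0)    # large loss
```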
Training Data
HomayShield was trained using a multilingual dataset containing:
- Benign prompts
- Adversarial prompts
- Turkish prompts
- English prompts
- Mixed-language prompts
Attack categories include:
- Prompt injection
- Jailbreak
- Instruction override
- Prompt leakage
- Data exfiltration
- Tool abuse
- Code injection
Files
This repository contains:
- homayshield_encoder.pt
- homayshield_classifier.pt
- homayshield_attack_bank.npy
Usage
Example:
Folder Structure
HomayShield/
│
├── datasets/
│   ├── token_level_adversarial_tr_v2.jsonl
│   ├── token_level_adversarial_en_v2.jsonl
│   └── final_classifier_merged_all.jsonl
│
├── output/
│   └── Homayv6/
│       ├── homayshield_encoder.pt
│       ├── homayshield_classifier.pt
│       └── homayshield_attack_bank.npy
│
├── training2.py
└── inference3.py
Training Command
python training2.py \
--train \
./datasets/token_level_adversarial_tr_v2.jsonl \
./datasets/token_level_adversarial_en_v2.jsonl \
./datasets/final_classifier_merged_all.jsonl \
--output-dir ./output/Homayv6
Output Files After Training
Training generates:
output/Homayv6/
├── homayshield_encoder.pt
├── homayshield_classifier.pt
└── homayshield_attack_bank.npy
Inference Command
python inference3.py
Inference loads:
- homayshield_encoder.pt
- homayshield_classifier.pt
- homayshield_attack_bank.npy
from:
./output/Homayv6/
Inference modes:
- OR
- Fusion
- Semantic Only
- Classifier Only
Limitations
HomayShield is not intended to replace advanced LLM-based guardrails.
Compared to LLM guardrails:
Advantages:
- Lower infrastructure cost
- Faster CPU inference
- Easier deployment
Tradeoffs:
- Lower reasoning capability
- Less contextual understanding
- Reduced zero-day detection
Intended Use
Recommended for:
- Enterprise AI security
- SOC environments
- On-prem AI systems
- Air-gapped deployments
- CPU-only environments
Example Usage
Final Verdict (Attack Detection)
| Threshold | Attack Recall | Precision |
|---|---|---|
| 0.57 | 100% | 78.2% |
| 0.58 | 80.2% | ~100% |
| 0.59 | 38.6% | 100% |
The guardrail is highly effective for attack detection, largely because of the semantic layer.
Attack detection rating:
- Semantic layer: 9.5/10
- Classifier layer: 7.5/10
- Overall attack detection: 9/10
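The recall and precision trade-off in the threshold table can be reproduced in miniature with a small helper. The scores and labels below are made-up values chosen only to show the same trend (raising the threshold trades recall for precision); they are not HomayShield evaluation data, and the zero-denominator conventions are an arbitrary choice.

```python
def precision_recall(scores, labels, threshold):
    """Precision and recall for the attack class at a given score threshold."""
    predicted = [s >= threshold for s in scores]
    tp = sum(1 for p, y in zip(predicted, labels) if p and y == 1)
    fp = sum(1 for p, y in zip(predicted, labels) if p and y == 0)
    fn = sum(1 for p, y in zip(predicted, labels) if not p and y == 1)
    precision = tp / (tp + fp) if tp + fp else 1.0  # nothing flagged: no false alarms
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Hypothetical scores; label 1 = attack, 0 = benign.
scores = [0.60, 0.585, 0.575, 0.572, 0.50]
labels = [1, 1, 1, 0, 0]

for threshold in (0.57, 0.58, 0.59):
    precision, recall = precision_recall(scores, labels, threshold)
    print(f"threshold={threshold:.2f} precision={precision:.2f} recall={recall:.2f}")
```

Sweeping the threshold like this on a held-out set is one way to pick an operating point for a given deployment's tolerance for false positives.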
Philosophy
AI security should not be limited to organizations with GPU infrastructure.
Even lightweight CPU-based guardrails can provide meaningful protection for real-world AI systems.
Base model for boying07/CPU-Based-AI-Guardrail: dbmdz/bert-base-turkish-128k-cased

