---
language:
- tr
- en
base_model:
- dbmdz/bert-base-turkish-128k-cased
pipeline_tag: text-classification
tags:
- bert
- guardrail
---

# HomayShield 🔒

CPU-Based AI Guardrail for Turkish & English Security Filtering

HomayShield is a lightweight, CPU-based AI guardrail that detects malicious, adversarial, and suspicious prompts targeting AI systems.

Unlike LLM-based guardrails, HomayShield is optimized for **CPU-only inference**. This makes it practical for organizations operating in resource-constrained or on-prem environments.

---

# Overview

HomayShield provides AI security filtering for:

* LLM applications
* Chatbots
* AI agents
* RAG systems
* Internal AI assistants
* Enterprise AI pipelines

Supported languages:

* Turkish 🇹🇷
* English 🇬🇧
* Mixed Turkish-English prompts

---

# Key Features

* ✅ CPU-friendly inference
* ✅ Shared encoder architecture
* ✅ Low-latency detection
* ✅ No GPU required in production
* ✅ Semantic attack detection
* ✅ Classifier-based attack detection
* ✅ Hybrid decision engine

---

# Architecture

HomayShield uses a shared encoder design. A single encoder, based on `dbmdz/bert-base-turkish-128k-cased`, produces the prompt embedding used by both detection mechanisms:

![HomayShield shared encoder architecture](https://cdn-uploads.huggingface.co/production/uploads/6720d6553279dd0ff66c4995/hqHqkwOQtg1lY0RTQdxxE.png)

# Detection Strategy

HomayShield combines two detection mechanisms.

## 1. Semantic Detection

The embedding of each incoming prompt is compared against a bank of known attack embeddings.

Detects:

* Prompt injection
* Jailbreak attacks
* Instruction override
* Adversarial prompts
* Semantic attack variants

---

## 2. Classifier Detection

A classifier predicts an attack probability from the prompt embedding.

Detects:

* Known attack patterns
* Learned malicious behaviors
* Structured attack prompts

---

# Inference Modes

## OR Logic

A prompt is flagged as an attack if either the semantic score or the classifier score exceeds its threshold.

Best for:

* Security-first environments
* Minimizing false negatives

---

## Weighted Fusion

The semantic and classifier scores are combined into a single weighted score.
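The two scores and their weighted fusion can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not HomayShield's actual code: it assumes the semantic score is the maximum cosine similarity between the prompt embedding and the rows of the attack embedding bank (the kind of array stored in `homayshield_attack_bank.npy`), that the classifier emits a raw logit (consistent with training on `BCEWithLogitsLoss`) which a sigmoid turns into a probability, and that fusion is a convex combination with a tunable weight. The function names and the `w_semantic` parameter are hypothetical.

```python
import numpy as np


def semantic_score(prompt_emb: np.ndarray, attack_bank: np.ndarray) -> float:
    """Hypothetical semantic signal: max cosine similarity to any known attack.

    Assumes attack_bank has shape (num_attacks, dim) and prompt_emb has shape (dim,).
    """
    prompt = prompt_emb / np.linalg.norm(prompt_emb)
    bank = attack_bank / np.linalg.norm(attack_bank, axis=1, keepdims=True)
    return float(np.max(bank @ prompt))


def classifier_score(logit: float) -> float:
    """Assumed classifier signal: sigmoid of the raw logit gives an attack probability."""
    return float(1.0 / (1.0 + np.exp(-logit)))


def fused_score(sem: float, cls: float, w_semantic: float = 0.5) -> float:
    """Hypothetical weighted fusion: convex combination of the two scores."""
    if not 0.0 <= w_semantic <= 1.0:
        raise ValueError("w_semantic must be in [0, 1]")
    return w_semantic * sem + (1.0 - w_semantic) * cls


# Toy 3-dimensional example; real embeddings come from the shared encoder.
bank = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
prompt = np.array([0.6, 0.8, 0.0])
sem = semantic_score(prompt, bank)  # closest attack row gives 0.8
cls = classifier_score(0.0)         # sigmoid(0) = 0.5
print(round(fused_score(sem, cls, w_semantic=0.5), 3))
```

Raising `w_semantic` leans the decision toward the semantic layer; lowering it leans toward the classifier, which is what makes this mode's sensitivity tunable.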
Best for:

* Balanced detection
* Tunable sensitivity

---

## Single Signal

Uses only one signal:

* Semantic detection, or
* Classifier detection

Best for:

* Benchmarking
* Lightweight deployments

---

# Training

Training consists of two stages.

## Stage 1 — Encoder Training

Loss: `CosineEmbeddingLoss`

Goal:

* Cluster similar attacks together
* Separate benign prompts from malicious ones

---

## Stage 2 — Classifier Training

Loss: `BCEWithLogitsLoss`

Training produces:

* Encoder weights
* Classifier weights
* Attack embedding bank

---

# Training Data

HomayShield was trained on a multilingual dataset containing:

* Benign prompts
* Adversarial prompts
* Turkish prompts
* English prompts
* Mixed-language prompts

Attack categories include:

* Prompt injection
* Jailbreak
* Instruction override
* Prompt leakage
* Data exfiltration
* Tool abuse
* Code injection

---

# Files

This repository contains:

* `homayshield_encoder.pt`
* `homayshield_classifier.pt`
* `homayshield_attack_bank.npy`

---

# Usage Example

## Folder Structure

```text
HomayShield/
│
├── datasets/
│   ├── token_level_adversarial_tr_v2.jsonl
│   ├── token_level_adversarial_en_v2.jsonl
│   └── final_classifier_merged_all.jsonl
│
├── output/
│   └── Homayv6/
│       ├── homayshield_encoder.pt
│       ├── homayshield_classifier.pt
│       └── homayshield_attack_bank.npy
│
├── training2.py
└── inference3.py
```

---

## Training Command

```bash
python training2.py \
  --train \
  ./datasets/token_level_adversarial_tr_v2.jsonl \
  ./datasets/token_level_adversarial_en_v2.jsonl \
  ./datasets/final_classifier_merged_all.jsonl \
  --output-dir ./output/Homayv6
```

---

## Output Files After Training

Training generates:

```text
output/Homayv6/
├── homayshield_encoder.pt
├── homayshield_classifier.pt
└── homayshield_attack_bank.npy
```

---

## Inference Command

```bash
python inference3.py
```

Inference loads:

* `homayshield_encoder.pt`
* `homayshield_classifier.pt`
* `homayshield_attack_bank.npy`

from:

```text
./output/Homayv6/
```

Inference modes:

* OR
* Fusion
* Semantic Only
* Classifier Only
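The four inference modes apply different decision rules to the same two scores. The sketch below is a hypothetical illustration: the `Mode` enum, the `is_attack` function, and every default threshold and weight are placeholders chosen for demonstration, not names or values taken from the project's inference script. Following the wording of the OR Logic section, a score must strictly exceed its threshold to count.

```python
from enum import Enum


class Mode(Enum):
    OR = "or"
    FUSION = "fusion"
    SEMANTIC_ONLY = "semantic"
    CLASSIFIER_ONLY = "classifier"


def is_attack(
    sem: float,
    cls: float,
    mode: Mode,
    sem_threshold: float = 0.6,     # placeholder value
    cls_threshold: float = 0.5,     # placeholder value
    fusion_threshold: float = 0.5,  # placeholder value
    w_semantic: float = 0.5,        # placeholder fusion weight
) -> bool:
    """Hypothetical decision rule for each inference mode.

    sem: semantic similarity score; cls: classifier attack probability.
    """
    if mode is Mode.OR:
        # Security-first: either signal alone is enough to flag the prompt.
        return sem > sem_threshold or cls > cls_threshold
    if mode is Mode.FUSION:
        # Balanced: a single threshold on the weighted combination of both scores.
        fused = w_semantic * sem + (1.0 - w_semantic) * cls
        return fused > fusion_threshold
    if mode is Mode.SEMANTIC_ONLY:
        return sem > sem_threshold
    if mode is Mode.CLASSIFIER_ONLY:
        return cls > cls_threshold
    raise ValueError(f"unknown mode: {mode}")


# A prompt close to a known attack that the classifier scores low.
for m in Mode:
    print(m.value, is_attack(sem=0.62, cls=0.30, mode=m))
```

With these placeholder values, OR and Semantic Only flag the prompt, while Fusion (0.46 after weighting) and Classifier Only let it through. This illustrates why OR logic is the mode for minimizing false negatives.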
---

# Limitations

HomayShield is not intended to replace advanced LLM-based guardrails.

Compared to LLM guardrails:

Advantages:

* Lower infrastructure cost
* Faster CPU inference
* Easier deployment

Tradeoffs:

* Lower reasoning capability
* Less contextual understanding
* Weaker detection of novel (zero-day) attacks

---

# Intended Use

Recommended for:

* Enterprise AI security
* SOC environments
* On-prem AI systems
* Air-gapped deployments
* CPU-only environments

# Example Usage

![HomayShield example usage](https://cdn-uploads.huggingface.co/production/uploads/6720d6553279dd0ff66c4995/aydhqYnQOQfDmnQTw5Wpq.png)

---

# Final Verdict (Attack Detection)

| Threshold | Attack Recall | Precision |
| --------- | ------------: | --------: |
| 0.57      |          100% |     78.2% |
| 0.58      |         80.2% |     ~100% |
| 0.59      |         38.6% |      100% |

Raising the threshold trades recall for precision: at 0.57 every attack is caught at 78.2% precision, while at 0.59 precision reaches 100% but recall drops to 38.6%.

HomayShield is highly effective at attack detection, largely because of its semantic layer.

Attack detection rating:

* Semantic layer: 9.5/10
* Classifier layer: 7.5/10
* Overall attack detection: 9/10

# Philosophy

> AI security should not be limited to organizations with GPU infrastructure.

Even lightweight CPU-based guardrails can provide meaningful protection for real-world AI systems.

![ChatGPT Image Jun 26, 2026 at 12_02_58 AM(2)](https://cdn-uploads.huggingface.co/production/uploads/6720d6553279dd0ff66c4995/7Q144jZJTTxgIlNd_jTIu.png)