---
language:
- tr
- en
base_model:
- dbmdz/bert-base-turkish-128k-cased
pipeline_tag: text-classification
tags:
- bert
- guardrail
---
# HomayShield

CPU-Based AI Guardrail for Turkish & English Security Filtering

HomayShield is a lightweight, CPU-based AI guardrail designed to detect malicious, adversarial, and suspicious prompts targeting AI systems.

Unlike LLM-based guardrails, HomayShield is optimized for **CPU-only inference**, which makes it practical for organizations operating in resource-constrained or on-prem environments.

---

# Overview

HomayShield provides AI security filtering for:

* LLM applications
* Chatbots
* AI agents
* RAG systems
* Internal AI assistants
* Enterprise AI pipelines

Supported languages:

* Turkish 🇹🇷
* English 🇬🇧
* Mixed Turkish-English prompts

---

# Key Features

* ✅ CPU-friendly inference
* ✅ Shared encoder architecture
* ✅ Low-latency detection
* ✅ No GPU required in production
* ✅ Semantic attack detection
* ✅ Classifier-based attack detection
* ✅ Hybrid decision engine

---
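
# Architecture

HomayShield uses a shared encoder design: a single BERT encoder, fine-tuned from `dbmdz/bert-base-turkish-128k-cased`, converts each prompt into an embedding, and that same embedding feeds both the semantic detector and the classifier.
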
# Detection Strategy

HomayShield combines two detection mechanisms.

## 1. Semantic Detection

Each incoming prompt embedding is compared against the embeddings of known attacks stored in the attack embedding bank (`homayshield_attack_bank.npy`).

Detects:

* Prompt injection
* Jailbreak attacks
* Instruction override
* Adversarial prompts
* Semantic attack variants
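
The semantic check can be sketched in plain Python. This is a minimal illustration, not HomayShield's actual code: it assumes the semantic score is the maximum cosine similarity between the prompt embedding and any vector in the attack bank, and it uses toy 3-dimensional vectors and a hypothetical threshold in place of real encoder outputs.

```python
import math
from typing import Sequence

# Hypothetical threshold; tune it on validation data.
SEMANTIC_THRESHOLD = 0.80


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat an all-zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)


def semantic_score(prompt_emb: Sequence[float],
                   attack_bank: Sequence[Sequence[float]]) -> float:
    """Assumed aggregation: best match against any known attack embedding."""
    return max(cosine_similarity(prompt_emb, attack) for attack in attack_bank)


# Toy 3-d vectors standing in for encoder outputs and the attack bank.
attack_bank = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
prompt_emb = [0.9, 0.1, 0.0]

score = semantic_score(prompt_emb, attack_bank)
is_attack = score >= SEMANTIC_THRESHOLD
print(f"semantic score={score:.3f} attack={is_attack}")
```

Taking the maximum means a single close match to any stored attack is enough to raise the score, which is what lets paraphrased variants of a known attack be caught.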

---

## 2. Classifier Detection

A trained classifier uses the same prompt embedding to predict the probability that the prompt is an attack.

Detects:

* Known attack patterns
* Learned malicious behaviors
* Structured attack prompts
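
A sketch of the classifier path, assuming the classifier is a single linear layer over the embedding (the model card does not state its architecture). Because Stage 2 trains it with `BCEWithLogitsLoss`, its raw output is a logit, and a sigmoid turns that logit into an attack probability. The weights and bias below are hypothetical.

```python
import math
from typing import Sequence


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def classifier_probability(embedding: Sequence[float],
                           weights: Sequence[float], bias: float) -> float:
    """Assumed linear head: logit = w . x + b, then P(attack) = sigmoid(logit)."""
    logit = sum(w * x for w, x in zip(weights, embedding)) + bias
    return sigmoid(logit)


# Hypothetical head parameters for a toy 3-d embedding.
weights = [2.0, -1.0, 0.5]
bias = -0.5

prob = classifier_probability([1.0, 0.0, 0.0], weights, bias)
print(f"P(attack) = {prob:.3f}")
```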

---

# Inference Modes

## OR Logic

A prompt is flagged as an attack if either the semantic score or the classifier score exceeds its threshold.

Best for:

* Security-first environments
* Minimizing false negatives
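
OR logic reduces to a single boolean expression. The sketch assumes each signal has its own threshold (both default values are hypothetical). The first example is a paraphrased attack that the classifier scores low but the semantic layer still catches, which is why this mode keeps false negatives down.

```python
def or_decision(semantic_score: float, classifier_prob: float,
                semantic_threshold: float = 0.80,
                classifier_threshold: float = 0.50) -> bool:
    """Flag the prompt if either signal reaches its own (hypothetical) threshold."""
    return (semantic_score >= semantic_threshold
            or classifier_prob >= classifier_threshold)


# A paraphrased jailbreak: close to a known attack, but unfamiliar to the classifier.
print(or_decision(semantic_score=0.91, classifier_prob=0.30))  # True
# Neither signal fires, so the prompt passes.
print(or_decision(semantic_score=0.35, classifier_prob=0.10))  # False
```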

---

## Weighted Fusion

The semantic and classifier scores are combined as a weighted sum, and the combined score decides the verdict.

Best for:

* Balanced detection
* Tunable sensitivity
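
One way to implement weighted fusion, assuming a convex combination (the two weights sum to 1) and a single decision threshold. Both `semantic_weight` and `threshold` are hypothetical tuning knobs, not documented HomayShield parameters. Shifting the weight toward one signal changes the verdict for the same pair of scores:

```python
def fusion_decision(semantic_score: float, classifier_prob: float,
                    semantic_weight: float = 0.6,
                    threshold: float = 0.5) -> bool:
    """Weighted sum of both scores compared against one threshold."""
    if not 0.0 <= semantic_weight <= 1.0:
        raise ValueError("semantic_weight must be between 0 and 1")
    fused = semantic_weight * semantic_score + (1.0 - semantic_weight) * classifier_prob
    return fused >= threshold


# 0.6 * 0.91 + 0.4 * 0.30 = 0.666 -> flagged
print(fusion_decision(0.91, 0.30, semantic_weight=0.6))  # True
# 0.2 * 0.91 + 0.8 * 0.30 = 0.422 -> not flagged
print(fusion_decision(0.91, 0.30, semantic_weight=0.2))  # False
```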

---

## Single Signal

Uses only one of the two signals:

* Semantic detection, or
* Classifier detection

Best for:

* Benchmarking
* Lightweight deployments

---

# Training

Training consists of two stages.

## Stage 1: Encoder Training

Loss: `CosineEmbeddingLoss`

Goal:

* Cluster similar attacks
* Separate benign and malicious prompts
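
For reference, `torch.nn.CosineEmbeddingLoss` computes `1 - cos(x1, x2)` for pairs labeled `1` and `max(0, cos(x1, x2) - margin)` for pairs labeled `-1`. The pure-Python sketch below reproduces that per-pair formula. How HomayShield builds its training pairs is not documented; here it is assumed that two attacks form a positive pair and an attack plus a benign prompt form a negative pair.

```python
import math
from typing import Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; assumes neither vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def cosine_embedding_loss(x1: Sequence[float], x2: Sequence[float],
                          y: int, margin: float = 0.0) -> float:
    """Per-pair loss following the torch.nn.CosineEmbeddingLoss definition."""
    cos = cosine(x1, x2)
    if y == 1:   # similar pair: pull the embeddings together
        return 1.0 - cos
    if y == -1:  # dissimilar pair: penalize similarity above the margin
        return max(0.0, cos - margin)
    raise ValueError("y must be 1 or -1")


# Hypothetical embeddings for two attack prompts and one benign prompt.
attack_a = [1.0, 1.0, 0.0]
attack_b = [1.0, 0.9, 0.1]
benign = [0.0, 0.2, 1.0]

pull = cosine_embedding_loss(attack_a, attack_b, y=1)  # attack vs attack
push = cosine_embedding_loss(attack_a, benign, y=-1)   # attack vs benign
print(f"positive-pair loss={pull:.4f}, negative-pair loss={push:.4f}")
```

Minimizing the first term clusters similar attacks, and minimizing the second pushes benign prompts away from them, which matches the two Stage 1 goals.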

---

## Stage 2: Classifier Training

Loss: `BCEWithLogitsLoss`

Outputs:

* Encoder weights
* Classifier weights
* Attack embedding bank
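
`BCEWithLogitsLoss` fuses a sigmoid with binary cross-entropy for numerical stability. The sketch below reproduces the unweighted per-example formula in its stable form and averages it over a hypothetical mini-batch, using `1` for attack and `0` for benign (the label encoding is an assumption).

```python
import math
from typing import Sequence


def bce_with_logits(logit: float, target: float) -> float:
    """Stable form of -[t*log(sigmoid(x)) + (1 - t)*log(1 - sigmoid(x))]."""
    return max(logit, 0.0) - logit * target + math.log1p(math.exp(-abs(logit)))


def mean_bce_with_logits(logits: Sequence[float], targets: Sequence[float]) -> float:
    """Mean reduction, the default of torch.nn.BCEWithLogitsLoss."""
    losses = [bce_with_logits(x, t) for x, t in zip(logits, targets)]
    return sum(losses) / len(losses)


# Hypothetical mini-batch: classifier logits with labels 1 = attack, 0 = benign.
logits = [2.5, -1.0, 0.3]
targets = [1.0, 0.0, 1.0]
print(f"batch loss = {mean_bce_with_logits(logits, targets):.4f}")
```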

---

# Training Data

HomayShield was trained on a multilingual dataset containing:

* Benign prompts
* Adversarial prompts
* Turkish prompts
* English prompts
* Mixed-language prompts

Attack categories include:

* Prompt injection
* Jailbreak
* Instruction override
* Prompt leakage
* Data exfiltration
* Tool abuse
* Code injection

---

# Files

This repository contains:

* `homayshield_encoder.pt`: encoder weights
* `homayshield_classifier.pt`: classifier weights
* `homayshield_attack_bank.npy`: attack embedding bank

---

# Usage

## Folder Structure

```text
HomayShield/
│
├── datasets/
│   ├── token_level_adversarial_tr_v2.jsonl
│   ├── token_level_adversarial_en_v2.jsonl
│   └── final_classifier_merged_all.jsonl
│
├── output/
│   └── Homayv6/
│       ├── homayshield_encoder.pt
│       ├── homayshield_classifier.pt
│       └── homayshield_attack_bank.npy
│
├── training2.py
└── inference3.py
```

---

## Training Command

```bash
python training2.py \
  --train \
    ./datasets/token_level_adversarial_tr_v2.jsonl \
    ./datasets/token_level_adversarial_en_v2.jsonl \
    ./datasets/final_classifier_merged_all.jsonl \
  --output-dir ./output/Homayv6
```
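
The training command passes several JSONL files to a single `--train` flag. A hypothetical `argparse` declaration that would accept exactly this command line is sketched below; the real argument handling in `training2.py` may differ.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical CLI matching the flags used in the training command."""
    parser = argparse.ArgumentParser(description="Train HomayShield (sketch)")
    # nargs="+" collects every path until the next option such as --output-dir.
    parser.add_argument("--train", nargs="+", required=True, metavar="JSONL",
                        help="one or more JSONL training files")
    parser.add_argument("--output-dir", required=True,
                        help="directory that receives the encoder, classifier and attack bank")
    return parser


args = build_parser().parse_args([
    "--train",
    "./datasets/token_level_adversarial_tr_v2.jsonl",
    "./datasets/token_level_adversarial_en_v2.jsonl",
    "./datasets/final_classifier_merged_all.jsonl",
    "--output-dir", "./output/Homayv6",
])
print(args.train)       # list of the three dataset paths
print(args.output_dir)  # ./output/Homayv6
```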

---

## Output Files After Training

Training generates:

```text
output/Homayv6/
├── homayshield_encoder.pt
├── homayshield_classifier.pt
└── homayshield_attack_bank.npy
```

---

## Inference Command

```bash
python inference3.py
```

Inference loads the following files from `./output/Homayv6/`:

* `homayshield_encoder.pt`
* `homayshield_classifier.pt`
* `homayshield_attack_bank.npy`

Supported inference modes:

* OR
* Fusion
* Semantic Only
* Classifier Only

---

# Limitations

HomayShield is not intended to replace advanced LLM-based guardrails.

Advantages over LLM guardrails:

* Lower infrastructure cost
* Faster CPU inference
* Easier deployment

Tradeoffs compared to LLM guardrails:

* Weaker reasoning capability
* Less contextual understanding
* Reduced detection of zero-day (previously unseen) attacks

---

# Intended Use

Recommended for:

* Enterprise AI security
* SOC environments
* On-prem AI systems
* Air-gapped deployments
* CPU-only environments

---

# Final Verdict (Attack Detection)

| Threshold | Attack Recall | Precision |
| --------- | ------------: | --------: |
| 0.57 | 100% | 78.2% |
| 0.58 | 80.2% | ~100% |
| 0.59 | 38.6% | 100% |
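
Each row of the table is a recall/precision pair measured at one threshold. The helper below shows that arithmetic on hypothetical scores and labels (`1` = attack); it is not the evaluation set or script behind the reported numbers. A prompt is assumed to be flagged when its score is at or above the threshold.

```python
from typing import Sequence, Tuple


def precision_recall(scores: Sequence[float], labels: Sequence[int],
                     threshold: float) -> Tuple[float, float]:
    """Precision and recall for the attack class (label 1) at one threshold."""
    tp = fp = fn = 0
    for score, label in zip(scores, labels):
        flagged = score >= threshold
        if flagged and label == 1:
            tp += 1  # attack correctly flagged
        elif flagged:
            fp += 1  # benign prompt wrongly flagged
        elif label == 1:
            fn += 1  # attack missed
    precision = tp / (tp + fp) if tp + fp else float("nan")  # undefined if nothing flagged
    recall = tp / (tp + fn) if tp + fn else float("nan")     # undefined if no attacks
    return precision, recall


# Hypothetical scores and labels (1 = attack, 0 = benign).
scores = [0.95, 0.80, 0.62, 0.58, 0.40, 0.20]
labels = [1, 1, 1, 0, 1, 0]

for t in (0.50, 0.60, 0.90):
    p, r = precision_recall(scores, labels, t)
    print(f"threshold={t:.2f}  recall={r:.1%}  precision={p:.1%}")
```

As in the table, raising the threshold removes false positives first (precision rises) and then starts dropping true attacks (recall falls).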

HomayShield is highly effective at attack detection, largely because of the semantic layer. The threshold must be chosen carefully: raising it from 0.57 to 0.59 lifts precision from 78.2% to 100% but cuts attack recall from 100% to 38.6%.

Attack detection rating:

* Semantic layer: 9.5/10
* Classifier layer: 7.5/10
* Overall attack detection: 9/10

# Philosophy

> AI security should not be limited to organizations with GPU infrastructure.

Even lightweight CPU-based guardrails can provide meaningful protection for real-world AI systems.