boying07/CPU-Based-AI-Guardrail

@@ -9,251 +9,238 @@ tags:
 - bert
 - guardrail
 ---
-HomayShield: CPU-Based AI Guardrail for Turkish & English Security Filtering
-HomayShield is a lightweight CPU-based AI guardrail system designed to detect malicious, adversarial, and suspicious inputs targeting AI systems.
-The project focuses on providing practical AI security for organizations that cannot deploy GPU-heavy guardrail solutions.
 Supported languages:
-Turkish
-English
-Mixed Turkish-English prompts
-Why HomayShield?
-As AI adoption grows, organizations increasingly deploy:
-LLM applications
-Chatbots
-AI agents
-RAG systems
-Internal AI assistants
-Web-integrated AI pipelines
-These systems introduce new attack surfaces.
-Examples include:
-Prompt injection
-Jailbreak attacks
-Instruction override
-Data exfiltration
-Tool abuse
-Indirect prompt injection
-Modern guardrails often rely on LLM-based security analysis.
-These systems are powerful, but they introduce major operational challenges:
-High infrastructure cost
-GPU dependency
-High inference latency
-Expensive scaling
-Complex deployment
-Many small and mid-sized organizations cannot afford dedicated GPU infrastructure for security layers.
-This creates a major security gap.
-Project Goal
-HomayShield aims to provide a practical alternative.
-Main objectives:
-CPU-based inference
-Low latency
-No GPU requirement in production
-Easy enterprise deployment
-Lower operational cost
-Strong baseline AI security
-HomayShield is designed for:
-SOC environments
-Enterprise AI systems
-Air-gapped systems
-On-prem deployments
-CPU-only environments
-Important Note
-HomayShield is not intended to replace LLM-based guardrails.
-LLM guardrails typically provide:
-deeper reasoning
-better contextual understanding
-stronger zero-day detection
-more adaptive behavior
-In most scenarios, LLM-based guardrails are more powerful.
-However, HomayShield offers an important tradeoff:
-lower detection capability than advanced LLM guardrails
-significantly lower infrastructure cost
-much easier deployment
-much faster CPU inference
-For many organizations, deployability matters.
-A CPU-based guardrail is better than having no guardrail.
-Core Architecture
-HomayShield is built around one key principle:
-Run encoder once. Use output twice.
-A single shared encoder generates embeddings used by both:
-Semantic similarity detection
-Classifier prediction
-Architecture:
-![Screenshot 2026-06-26 at 14.30.30](https://cdn-uploads.huggingface.co/production/uploads/6720d6553279dd0ff66c4995/hqHqkwOQtg1lY0RTQdxxE.png)
-Why Shared Encoder?
-Traditional guardrail systems may run:
-Language model
-Embedding model
-Classifier model
-Policy model
-This increases:
-CPU/GPU utilization
-latency
-memory consumption
-infrastructure complexity
-HomayShield avoids this by sharing the encoder.
-Advantages:
-Lower CPU usage
-Faster inference
-Lower memory footprint
-Better scalability
-Consistent semantic representation
-Supported Languages
-Current supported languages:
-Turkish (tr)
-English (en)
-Inference begins with language detection.
-If input language is unsupported:
-Reject or
-Skip evaluation
-Detection Strategy
-HomayShield combines two detection mechanisms.
-1) Semantic Detection
-Semantic similarity compares incoming prompt embeddings against known malicious attack embeddings.
-Useful for detecting:
-similar attacks
-prompt injection variants
-jailbreak attempts
-semantic anomalies
-adversarial patterns
-2) Classifier Detection
-Classifier predicts attack probability using shared embeddings.
-Useful for detecting:
-known attack patterns
-learned malicious behavior
-structured adversarial prompts
-Inference Modes
-HomayShield supports 3 inference strategies.
-Option 1 — OR Logic
-Security-first mode.
-if semantic_score >= semantic_threshold or classifier_score >= classifier_threshold:
-    ATTACK
-else:
-    NORMAL
-Best for:
-strict environments
-low false negatives
-Option 2 — Weighted Fusion
-Balanced mode.
-fusion_score = semantic_weight * semantic_score + classifier_weight * classifier_score
-Best for:
-balanced security
-tunable sensitivity
-Option 3 — Single Signal
-Choose one:
-semantic only
-classifier only
-Useful for benchmarking or lightweight deployments.
-Training Pipeline
-Training consists of 2 stages.
-Stage 1 — Encoder Training
-The encoder is trained using similarity learning.
-Goal:
-similar attacks cluster together
-similar normal prompts cluster together
-attacks and normal prompts separate clearly
-Loss:
-CosineEmbeddingLoss
-Stage 2 — Classifier Training
-After encoder training:
-embeddings are extracted
-classifier head is trained on embeddings
-Loss:
-BCEWithLogitsLoss
-Outputs:
-trained encoder
-trained classifier
-attack embedding bank
-normal embedding bank
-Training Command
-python training_final.py \
-  --train /home/asimyil/train.jsonl \
-  --output-dir /home/asimyil/HomayShield_v5
-Training Dataset
-HomayShield was trained using a large dataset of:
-benign prompts
-adversarial prompts
-Turkish prompts
-English prompts
-mixed-language prompts
-Dataset includes attack categories such as:
-Direct prompt injection
-Jailbreak attacks
-Instruction override
-Prompt leakage
-Data exfiltration
-Obfuscation attacks
-Multi-turn attacks
-Roleplay attacks
-Tool abuse
-Code injection
-Long context attacks
-Hard negative samples
-This helps improve detection robustness in real-world enterprise environments.

 - bert
 - guardrail
 ---
+# HomayShield v6 🔒
+CPU-Based AI Guardrail for Turkish & English Security Filtering
+HomayShield is a lightweight CPU-based AI guardrail designed to detect malicious, adversarial, and suspicious prompts targeting AI systems.
+Unlike LLM-based guardrails, HomayShield is optimized for **CPU-only inference**, making it practical for organizations operating in resource-constrained or on-prem environments.
+---
+# Overview
+HomayShield provides AI security filtering for:
+* LLM applications
+* Chatbots
+* AI agents
+* RAG systems
+* Internal AI assistants
+* Enterprise AI pipelines
 Supported languages:
+* Turkish 🇹🇷
+* English 🇬🇧
+* Mixed Turkish-English prompts
+---
+# Key Features
+* ✅ CPU-friendly inference
+* ✅ Shared encoder architecture
+* ✅ Low-latency detection
+* ✅ No GPU required in production
+* ✅ Semantic attack detection
+* ✅ Classifier-based attack detection
+* ✅ Hybrid decision engine
+---
+# Architecture
+HomayShield uses a shared encoder design: the encoder runs once per prompt, and its embedding feeds both semantic similarity detection and the classifier.
+![Screenshot 2026-06-26 at 14.30.30](https://cdn-uploads.huggingface.co/production/uploads/6720d6553279dd0ff66c4995/hqHqkwOQtg1lY0RTQdxxE.png)
+# Detection Strategy
+HomayShield combines two detection mechanisms.
+## 1. Semantic Detection
+Incoming prompt embeddings are compared against known attack embeddings.
+Detects:
+* Prompt injection
+* Jailbreak attacks
+* Instruction override
+* Adversarial prompts
+* Semantic attack variants
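The comparison against known attack embeddings can be sketched as a nearest-neighbor lookup. This is a minimal illustration, not the repository's implementation: it assumes cosine similarity as the metric and the maximum over the bank as the aggregation, and the names `cosine_similarity`, `semantic_score`, and the toy 3-dimensional vectors are hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as matching nothing
    return dot / (norm_a * norm_b)

def semantic_score(prompt_embedding, attack_bank):
    """Highest similarity between the prompt and any stored attack embedding.

    Taking the maximum is an assumption; the README does not say how
    similarities over the bank are aggregated.
    """
    return max(
        (cosine_similarity(prompt_embedding, attack) for attack in attack_bank),
        default=0.0,  # an empty bank never flags anything
    )

# Toy 3-dimensional embeddings; real encoder outputs would be much larger.
attack_bank = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
prompt_embedding = [0.9, 0.1, 0.0]
score = semantic_score(prompt_embedding, attack_bank)
```

A prompt whose embedding points in nearly the same direction as a stored attack scores close to 1, which is what lets paraphrased variants of a known attack be caught.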
+---
+## 2. Classifier Detection
+A classifier head predicts the attack probability from the same shared embeddings.
+Detects:
+* Known attack patterns
+* Learned malicious behaviors
+* Structured attack prompts
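As a rough sketch of how a probability could be produced from an embedding, the snippet below assumes a single linear layer followed by a sigmoid. The head's actual shape is not described in the README, so `classifier_score`, `weights`, and `bias` are hypothetical; the sigmoid itself is consistent with training on `BCEWithLogitsLoss`, which operates on raw logits.

```python
import math

def sigmoid(logit):
    """Logistic function, written to avoid overflow for large |logit|."""
    if logit >= 0:
        return 1.0 / (1.0 + math.exp(-logit))
    exp_logit = math.exp(logit)
    return exp_logit / (1.0 + exp_logit)

def classifier_score(embedding, weights, bias):
    """Attack probability from a linear head (assumed head shape)."""
    logit = sum(w * x for w, x in zip(weights, embedding)) + bias
    return sigmoid(logit)

# Hypothetical parameters for a 3-dimensional toy embedding.
weights = [2.0, -1.0, 0.5]
bias = -0.5
probability = classifier_score([0.9, 0.1, 0.0], weights, bias)
```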
+---
+# Inference Modes
+## OR Logic
+Flags a prompt as an attack if either the semantic score or the classifier score meets or exceeds its own threshold.
+Best for:
+* Security-first environments
+* Low false negatives
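The OR rule can be written as a small function. This sketch follows the pseudocode from the previous version of the README (`semantic_threshold`, `classifier_threshold`, and the `ATTACK`/`NORMAL` labels); the default threshold values are placeholders chosen for illustration, not tuned settings.

```python
def or_decision(semantic_score, classifier_score,
                semantic_threshold=0.80, classifier_threshold=0.50):
    """Return "ATTACK" if either signal reaches its threshold, else "NORMAL"."""
    if (semantic_score >= semantic_threshold
            or classifier_score >= classifier_threshold):
        return "ATTACK"
    return "NORMAL"

# One strong signal is enough to flag the prompt.
semantic_only_hit = or_decision(semantic_score=0.91, classifier_score=0.10)
no_hit = or_decision(semantic_score=0.30, classifier_score=0.20)
```

Because a single signal can trigger the verdict, this mode trades more false positives for fewer missed attacks.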
+---
+## Weighted Fusion
+Combines the semantic and classifier scores as a weighted sum: `fusion_score = semantic_weight * semantic_score + classifier_weight * classifier_score`.
+Best for:
+* Balanced detection
+* Tunable sensitivity
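A minimal sketch of the fusion mode follows. The weighted sum matches the formula in the previous version of the README; comparing the result against a single `fusion_threshold` is an assumption, and the default weights and threshold are illustrative values only.

```python
def fusion_decision(semantic_score, classifier_score,
                    semantic_weight=0.4, classifier_weight=0.6,
                    fusion_threshold=0.5):
    """Combine both signals into one score and compare it to a threshold."""
    fusion_score = (semantic_weight * semantic_score
                    + classifier_weight * classifier_score)
    label = "ATTACK" if fusion_score >= fusion_threshold else "NORMAL"
    return fusion_score, label

# A high semantic score alone is not enough when the classifier disagrees.
weak_score, weak_label = fusion_decision(0.9, 0.2)
strong_score, strong_label = fusion_decision(0.9, 0.6)
```

Shifting weight toward one signal, or moving the threshold, is how the sensitivity of this mode would be tuned.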
+---
+## Single Signal
+Use exactly one signal:
+* Semantic detection only
+* Classifier detection only
+Best for:
+* Benchmarking
+* Lightweight deployments
+---
+# Training
+Training consists of two stages.
+## Stage 1 — Encoder Training
+Loss:
+CosineEmbeddingLoss
+Goal:
+* Cluster similar attacks
+* Separate benign and malicious prompts
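To make the objective concrete, the sketch below reimplements the per-pair formula that PyTorch documents for `CosineEmbeddingLoss`: `1 - cos(a, b)` for a pair labeled `1` (should be similar) and `max(0, cos(a, b) - margin)` for a pair labeled `-1` (should be dissimilar). It is plain Python for illustration, assumes non-zero vectors, and does not show how the project builds its training pairs.

```python
import math

def cosine(a, b):
    """Cosine similarity of two non-zero, equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a))
                  * math.sqrt(sum(y * y for y in b)))

def cosine_embedding_loss(a, b, target, margin=0.0):
    """Per-pair loss: pull similar pairs together, push dissimilar ones apart."""
    similarity = cosine(a, b)
    if target == 1:       # e.g. two attack prompts
        return 1.0 - similarity
    if target == -1:      # e.g. one attack prompt and one benign prompt
        return max(0.0, similarity - margin)
    raise ValueError("target must be 1 or -1")

# Identical directions: no loss for a similar pair, full loss for a dissimilar one.
same_pair_loss = cosine_embedding_loss([1.0, 0.0], [2.0, 0.0], target=1)
clash_loss = cosine_embedding_loss([1.0, 0.0], [2.0, 0.0], target=-1)
```

Minimizing this loss moves same-class embeddings toward the same direction and pushes attack and benign embeddings apart, which is what the semantic detector later relies on.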
+---
+## Stage 2 — Classifier Training
+Loss:
+BCEWithLogitsLoss
+Outputs:
+* Encoder weights
+* Classifier weights
+* Attack embedding bank
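For reference, `BCEWithLogitsLoss` applies binary cross-entropy directly to raw logits. The sketch below computes the same per-sample quantity in plain Python using the usual numerically stable rearrangement; `bce_with_logits` and `mean_bce` are illustrative names, and the logits and labels are made-up values (label `1` for attack, `0` for benign).

```python
import math

def bce_with_logits(logit, target):
    """Binary cross-entropy on a raw logit.

    Equal to -[t*log(sigmoid(x)) + (1 - t)*log(1 - sigmoid(x))],
    rearranged so exp() never overflows.
    """
    return max(logit, 0.0) - logit * target + math.log1p(math.exp(-abs(logit)))

def mean_bce(logits, targets):
    """Average loss over a batch (mean reduction)."""
    losses = [bce_with_logits(x, t) for x, t in zip(logits, targets)]
    return sum(losses) / len(losses)

# A confident correct prediction costs almost nothing; a confident wrong one costs a lot.
confident_right = bce_with_logits(10.0, 1.0)
confident_wrong = bce_with_logits(10.0, 0.0)
batch_loss = mean_bce([0.0, 0.0], [1.0, 0.0])
```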
+---
+# Training Data
+HomayShield was trained using a multilingual dataset containing:
+* Benign prompts
+* Adversarial prompts
+* Turkish prompts
+* English prompts
+* Mixed-language prompts
+Attack categories include:
+* Prompt injection
+* Jailbreak
+* Instruction override
+* Prompt leakage
+* Data exfiltration
+* Tool abuse
+* Code injection
+---
+# Files
+This repository contains:
+* `homayshield_encoder.pt`
+* `homayshield_classifier.pt`
+* `homayshield_attack_bank.npy`
+---
+# Usage
+Example:
+```bash
+python inference2.py
+```
+Inference modes:
+* OR
+* Fusion
+* Semantic Only
+* Classifier Only
+---
+# Limitations
+HomayShield is not intended to replace advanced LLM-based guardrails.
+Compared to LLM guardrails:
+Advantages:
+* Lower infrastructure cost
+* Faster CPU inference
+* Easier deployment
+Tradeoffs:
+* Lower reasoning capability
+* Less contextual understanding
+* Reduced zero-day detection
+---
+# Intended Use
+Recommended for:
+* Enterprise AI security
+* SOC environments
+* On-prem AI systems
+* Air-gapped deployments
+* CPU-only environments
+---
+# Philosophy
+> AI security should not be limited to organizations with GPU infrastructure.
+Even lightweight CPU-based guardrails can provide meaningful protection for real-world AI systems.