HomayShield 🔒

CPU-Based AI Guardrail for Turkish & English Security Filtering

HomayShield is a lightweight CPU-based AI guardrail designed to detect malicious, adversarial, and suspicious prompts targeting AI systems.

Unlike LLM-based guardrails, HomayShield is optimized for CPU-only inference, making it practical for organizations operating in resource-constrained or on-prem environments.


Overview

HomayShield provides AI security filtering for:

  • LLM applications
  • Chatbots
  • AI agents
  • RAG systems
  • Internal AI assistants
  • Enterprise AI pipelines

Supported languages:

  • Turkish 🇹🇷
  • English 🇬🇧
  • Mixed Turkish-English prompts

Key Features

  • ✅ CPU-friendly inference
  • ✅ Shared encoder architecture
  • ✅ Low-latency detection
  • ✅ No GPU required in production
  • ✅ Semantic attack detection
  • ✅ Classifier-based attack detection
  • ✅ Hybrid decision engine

Architecture

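HomayShield uses a shared encoder design: a single encoder produces the prompt embedding that both the semantic detector and the classifier consume.

[Figure: HomayShield architecture diagram]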

Detection Strategy

HomayShield combines two detection mechanisms.

1. Semantic Detection

Incoming prompt embeddings are compared against known attack embeddings.

Detects:

  • Prompt injection
  • Jailbreak attacks
  • Instruction override
  • Adversarial prompts
  • Semantic attack variants
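As a rough illustration of the comparison described above, the sketch below scores a prompt embedding by its highest cosine similarity to a bank of attack embeddings. The function names, the toy 3-dimensional vectors, and the choice of the maximum as the semantic score are assumptions made for illustration; the source does not state how HomayShield aggregates similarities.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as matching nothing
    return dot / (norm_a * norm_b)

def semantic_score(prompt_embedding, attack_bank):
    """Highest similarity to any known attack embedding (assumed aggregation)."""
    return max(cosine_similarity(prompt_embedding, attack) for attack in attack_bank)

# Hypothetical toy data, not taken from homayshield_attack_bank.npy
attack_bank = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
close_to_attack = [0.9, 0.1, 0.0]
unrelated = [0.0, 0.0, 1.0]

print(semantic_score(close_to_attack, attack_bank))
print(semantic_score(unrelated, attack_bank))
```

A prompt whose score exceeds a chosen threshold would then be treated as a semantic match to a known attack.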

2. Classifier Detection

A classifier predicts the attack probability from the prompt embedding.

Detects:

  • Known attack patterns
  • Learned malicious behaviors
  • Structured attack prompts
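Because Stage 2 trains with BCEWithLogitsLoss, the classifier presumably emits a raw logit that a sigmoid turns into a probability. The sketch below shows that step with a single linear layer; the layer shape, the weights, and the function names are hypothetical, since the source does not describe the classifier architecture.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def attack_probability(embedding, weights, bias):
    """Linear head followed by a sigmoid (assumed classifier shape)."""
    logit = sum(w * x for w, x in zip(weights, embedding)) + bias
    return sigmoid(logit)

# Hypothetical weights, not the contents of homayshield_classifier.pt
weights = [2.0, -1.0, 0.5]
bias = -0.5

print(attack_probability([1.0, 0.0, 0.0], weights, bias))
```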

Inference Modes

OR Logic

A prompt is flagged as an attack if either the semantic score or the classifier score exceeds the threshold.

Best for:

  • Security-first environments
  • Low false negatives
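A minimal sketch of the OR rule, assuming both scores lie in [0, 1]. The function name, the per-signal thresholds, and their default values are hypothetical and are not taken from the HomayShield code.

```python
def is_attack_or(semantic_score, classifier_score,
                 semantic_threshold=0.5, classifier_threshold=0.5):
    """Flag the prompt if either signal alone exceeds its threshold."""
    return (semantic_score > semantic_threshold
            or classifier_score > classifier_threshold)

print(is_attack_or(0.9, 0.1))  # only the semantic signal fires
print(is_attack_or(0.1, 0.1))  # neither signal fires
```

Since one signal is enough to block a prompt, this mode trades extra false positives for fewer missed attacks.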

Weighted Fusion

The verdict is based on a weighted combination of the semantic and classifier scores.

Best for:

  • Balanced detection
  • Tunable sensitivity
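Weighted fusion can be sketched as a convex combination of the two scores followed by a single threshold. The weight, the threshold, and the function names below are hypothetical illustration values, not HomayShield defaults.

```python
def fused_score(semantic_score, classifier_score, semantic_weight=0.6):
    """Convex combination of the two signals; the weight must lie in [0, 1]."""
    if not 0.0 <= semantic_weight <= 1.0:
        raise ValueError("semantic_weight must be between 0 and 1")
    return (semantic_weight * semantic_score
            + (1.0 - semantic_weight) * classifier_score)

def is_attack_fused(semantic_score, classifier_score,
                    semantic_weight=0.6, threshold=0.5):
    """Flag the prompt when the fused score exceeds the threshold."""
    return fused_score(semantic_score, classifier_score, semantic_weight) > threshold

print(fused_score(0.8, 0.4))
print(is_attack_fused(0.8, 0.4))
```

Raising the weight makes the verdict lean on the semantic layer; lowering it favors the classifier.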

Single Signal

Use only:

  • Semantic detection or
  • Classifier detection

Best for:

  • Benchmarking
  • Lightweight deployments

Training

Training consists of two stages.

Stage 1: Encoder Training

Loss: CosineEmbeddingLoss

Goal:

  • Cluster similar attacks
  • Separate benign and malicious prompts
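To make the objective concrete, the sketch below re-implements the per-pair formula documented for PyTorch's CosineEmbeddingLoss in plain Python: 1 - cos(a, b) for a pair labeled 1 (pull together) and max(0, cos(a, b) - margin) for a pair labeled -1 (push apart). How HomayShield builds its pairs is not stated in the source, so the example pairs are assumptions.

```python
import math

def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def cosine_embedding_loss(a, b, target, margin=0.0):
    """Per-pair loss: target 1 pulls embeddings together, target -1 pushes them apart."""
    cos = cosine(a, b)
    if target == 1:
        return 1.0 - cos
    if target == -1:
        return max(0.0, cos - margin)
    raise ValueError("target must be 1 or -1")

# Hypothetical pairs: two similar attacks, then an attack and a benign prompt
print(cosine_embedding_loss([1.0, 0.2], [0.9, 0.3], target=1))
print(cosine_embedding_loss([1.0, 0.2], [-0.5, 1.0], target=-1))
```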

Stage 2: Classifier Training

Loss: BCEWithLogitsLoss
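For reference, BCEWithLogitsLoss applies binary cross-entropy directly to a raw logit. The plain-Python sketch below uses the standard numerically stable form for a single example; the function name is an illustration, and the actual training loop in training2.py is not shown in the source.

```python
import math

def bce_with_logits(logit, label):
    """Binary cross-entropy on a raw logit; label is 1.0 (attack) or 0.0 (benign)."""
    # Equivalent to -[y*log(sigmoid(x)) + (1-y)*log(1-sigmoid(x))], without overflow
    return max(logit, 0.0) - logit * label + math.log1p(math.exp(-abs(logit)))

print(bce_with_logits(0.0, 1.0))   # undecided prediction
print(bce_with_logits(10.0, 1.0))  # confident and correct
print(bce_with_logits(10.0, 0.0))  # confident and wrong
```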

Outputs:

  • Encoder weights
  • Classifier weights
  • Attack embedding bank

Training Data

HomayShield was trained using a multilingual dataset containing:

  • Benign prompts
  • Adversarial prompts
  • Turkish prompts
  • English prompts
  • Mixed-language prompts

Attack categories include:

  • Prompt injection
  • Jailbreak
  • Instruction override
  • Prompt leakage
  • Data exfiltration
  • Tool abuse
  • Code injection

Files

This repository contains:

  • homayshield_encoder.pt
  • homayshield_classifier.pt
  • homayshield_attack_bank.npy

Usage

Example:

Folder Structure

HomayShield/
│
├── datasets/
│   ├── token_level_adversarial_tr_v2.jsonl
│   ├── token_level_adversarial_en_v2.jsonl
│   └── final_classifier_merged_all.jsonl
│
├── output/
│   └── Homayv6/
│       ├── homayshield_encoder.pt
│       ├── homayshield_classifier.pt
│       └── homayshield_attack_bank.npy
│
├── training2.py
└── inference3.py

Training Command

python training2.py \
  --train \
  ./datasets/token_level_adversarial_tr_v2.jsonl \
  ./datasets/token_level_adversarial_en_v2.jsonl \
  ./datasets/final_classifier_merged_all.jsonl \
  --output-dir ./output/Homayv6

Output Files After Training

Training generates:

output/Homayv6/
├── homayshield_encoder.pt
├── homayshield_classifier.pt
└── homayshield_attack_bank.npy

Inference Command

python inference3.py

Inference loads:

  • homayshield_encoder.pt
  • homayshield_classifier.pt
  • homayshield_attack_bank.npy

from:

./output/Homayv6/

Inference modes:

  • OR
  • Fusion
  • Semantic Only
  • Classifier Only

Limitations

HomayShield is not intended to replace advanced LLM-based guardrails.

Compared to LLM guardrails:

Advantages:

  • Lower infrastructure cost
  • Faster CPU inference
  • Easier deployment

Tradeoffs:

  • Lower reasoning capability
  • Less contextual understanding
  • Reduced zero-day detection

Intended Use

Recommended for:

  • Enterprise AI security
  • SOC environments
  • On-prem AI systems
  • Air-gapped deployments
  • CPU-only environments

Example Usage

[Screenshot: example usage]


Final Verdict (Attack Detection)

Threshold   Attack Recall   Precision
0.57        100%            78.2%
0.58        80.2%           ~100%
0.59        38.6%           100%
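The recall and precision trade-off in the table can be reproduced in miniature with a threshold sweep. The sketch below is only an illustration: the scores and labels are made-up toy values, not HomayShield evaluation data, and treating "score greater than threshold" as an attack verdict is an assumption.

```python
def precision_recall(scores, labels, threshold):
    """Precision and recall for the attack class (label 1) at a given threshold."""
    tp = sum(1 for s, y in zip(scores, labels) if s > threshold and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s > threshold and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s <= threshold and y == 1)
    precision = tp / (tp + fp) if tp + fp else 1.0  # no positive predictions
    recall = tp / (tp + fn) if tp + fn else 0.0     # no attacks in the data
    return precision, recall

# Hypothetical toy scores: three attacks (1) and two benign prompts (0)
scores = [0.60, 0.585, 0.575, 0.572, 0.50]
labels = [1, 1, 1, 0, 0]

for threshold in (0.57, 0.58, 0.59):
    precision, recall = precision_recall(scores, labels, threshold)
    print(f"{threshold:.2f}  recall={recall:.1%}  precision={precision:.1%}")
```

As in the table, raising the threshold by small steps removes false positives first and then starts dropping real attacks.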

HomayShield is highly effective for attack detection, largely because of the semantic layer.

Attack detection rating:

  • Semantic layer: 9.5/10
  • Classifier layer: 7.5/10
  • Overall attack detection: 9/10

Philosophy

AI security should not be limited to organizations with GPU infrastructure.

Even lightweight CPU-based guardrails can provide meaningful protection for real-world AI systems.
