HomayShield 🔒

CPU-Based AI Guardrail for Turkish & English Security Filtering

HomayShield is a lightweight CPU-based AI guardrail designed to detect malicious, adversarial, and suspicious prompts targeting AI systems.

Unlike LLM-based guardrails, HomayShield is optimized for CPU-only inference, making it practical for organizations operating in resource-constrained or on-prem environments.

Overview

HomayShield provides AI security filtering for:

LLM applications
Chatbots
AI agents
RAG systems
Internal AI assistants
Enterprise AI pipelines

Supported languages:

Turkish 🇹🇷
English 🇬🇧
Mixed Turkish-English prompts

Key Features

✅ CPU-friendly inference
✅ Shared encoder architecture
✅ Low-latency detection
✅ No GPU required in production
✅ Semantic attack detection
✅ Classifier-based attack detection
✅ Hybrid decision engine

Architecture

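HomayShield uses a shared encoder design: a single encoder produces the prompt embedding, and that embedding feeds both the semantic detector and the classifier.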

Detection Strategy

HomayShield combines two detection mechanisms.

1. Semantic Detection

Incoming prompt embeddings are compared against known attack embeddings.

Detects:

Prompt injection
Jailbreak attacks
Instruction override
Adversarial prompts
Semantic attack variants
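As a rough illustration of the comparison step, the sketch below scores a prompt by its highest cosine similarity to any vector in an attack bank. It assumes cosine similarity is the metric and that the best match is the score; the source states neither. The function names and toy vectors are hypothetical, and pure Python is used only to keep the example dependency-free.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen here for zero vectors (assumption)
    return dot / (norm_a * norm_b)

def semantic_score(prompt_embedding, attack_bank):
    """Highest similarity between the prompt and any known attack embedding."""
    return max(cosine_similarity(prompt_embedding, attack) for attack in attack_bank)

# Toy 3-dimensional vectors, for illustration only.
toy_bank = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
toy_prompt = [0.9, 0.1, 0.0]
score = semantic_score(toy_prompt, toy_bank)
```

A score close to 1 means the prompt points in nearly the same direction as a stored attack embedding.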

2. Classifier Detection

A classifier predicts the attack probability from the prompt embedding.

Detects:

Known attack patterns
Learned malicious behaviors
Structured attack prompts
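One simple way to turn an embedding into a probability is a linear layer followed by a sigmoid. The sketch below assumes that shape for the classifier head; the actual HomayShield classifier may be deeper. All names, weights, and values are hypothetical.

```python
import math

def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

def classifier_probability(embedding, weights, bias):
    """Attack probability from one linear layer plus sigmoid (assumed head)."""
    logit = sum(w * e for w, e in zip(weights, embedding)) + bias
    return sigmoid(logit)

# Toy values chosen so that the logit is exactly zero.
toy_embedding = [0.5, -1.0, 2.0]
toy_weights = [1.0, 0.5, 0.25]
toy_bias = -0.5
prob = classifier_probability(toy_embedding, toy_weights, toy_bias)
```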

Inference Modes

OR Logic

A prompt is flagged as an attack if either the semantic score or the classifier score exceeds the threshold.

Best for:

Security-first environments
Low false negatives
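The OR rule can be sketched in a few lines. The function name and the default thresholds below are hypothetical placeholders, and using a separate threshold per signal is an assumption.

```python
def is_attack_or(semantic_score, classifier_score,
                 semantic_threshold=0.8, classifier_threshold=0.5):
    """Flag the prompt if either signal exceeds its threshold (placeholder defaults)."""
    return (semantic_score > semantic_threshold
            or classifier_score > classifier_threshold)
```

Because one strong signal is enough, this mode favors recall and will tend to produce more false positives than a fused score.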

Weighted Fusion

The final score is a weighted combination of the semantic and classifier scores.

Best for:

Balanced detection
Tunable sensitivity
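A minimal sketch of weighted fusion, assuming a convex combination of the two scores compared against one threshold. The weight, the threshold, and the function names are hypothetical and are not HomayShield's actual settings.

```python
def fused_score(semantic_score, classifier_score, semantic_weight=0.6):
    """Convex combination of the two scores; the weight is a placeholder."""
    if not 0.0 <= semantic_weight <= 1.0:
        raise ValueError("semantic_weight must be in [0, 1]")
    return (semantic_weight * semantic_score
            + (1.0 - semantic_weight) * classifier_score)

def is_attack_fusion(semantic_score, classifier_score,
                     semantic_weight=0.6, threshold=0.5):
    """Flag the prompt when the fused score exceeds the threshold."""
    return fused_score(semantic_score, classifier_score, semantic_weight) > threshold
```

Raising `semantic_weight` shifts trust toward the attack-bank comparison, and lowering it shifts trust toward the classifier.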

Single Signal

Use only one signal:

Semantic detection, or
Classifier detection

Best for:

Benchmarking
Lightweight deployments

Training

Training consists of two stages.

Stage 1 — Encoder Training

Loss: CosineEmbeddingLoss

Goal:

Cluster similar attacks
Separate benign and malicious prompts
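For reference, the per-pair formula behind PyTorch's `CosineEmbeddingLoss` can be written out in plain Python: a pair labeled 1 is pulled together, and a pair labeled -1 is pushed apart until its similarity drops below the margin. How HomayShield builds its training pairs and which margin it uses are not stated, so the inputs below are illustrative only.

```python
import math

def cosine(a, b):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def cosine_embedding_loss(a, b, target, margin=0.0):
    """Per-pair loss: 1 - cos for target 1, max(0, cos - margin) for target -1."""
    cos = cosine(a, b)
    if target == 1:
        return 1.0 - cos
    if target == -1:
        return max(0.0, cos - margin)
    raise ValueError("target must be 1 or -1")
```

Under this loss, two attack prompts labeled as a similar pair cost nothing once their embeddings align, while a benign and malicious pair keeps incurring loss as long as its similarity stays above the margin.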

Stage 2 — Classifier Training

Loss: BCEWithLogitsLoss

Outputs:

Encoder weights
Classifier weights
Attack embedding bank
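The Stage 2 loss, `BCEWithLogitsLoss`, applies binary cross-entropy directly to the raw classifier logit. The sketch below writes the per-example value in a numerically stable form; the function name is hypothetical, and the label convention (1 for attack, 0 for benign) is an assumption.

```python
import math

def bce_with_logits(logit, label):
    """Stable form of -[y*log(sigmoid(x)) + (1 - y)*log(1 - sigmoid(x))]."""
    return max(logit, 0.0) - logit * label + math.log1p(math.exp(-abs(logit)))
```

Working on logits instead of probabilities avoids taking the logarithm of a value that has been rounded to 0 or 1.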

Training Data

HomayShield was trained using a multilingual dataset containing:

Benign prompts
Adversarial prompts
Turkish prompts
English prompts
Mixed-language prompts

Attack categories include:

Prompt injection
Jailbreak
Instruction override
Prompt leakage
Data exfiltration
Tool abuse
Code injection

Files

This repository contains:

homayshield_encoder.pt
homayshield_classifier.pt
homayshield_attack_bank.npy

Usage

Example:

Folder Structure

HomayShield/
│
├── datasets/
│   ├── token_level_adversarial_tr_v2.jsonl
│   ├── token_level_adversarial_en_v2.jsonl
│   └── final_classifier_merged_all.jsonl
│
├── output/
│   └── Homayv6/
│       ├── homayshield_encoder.pt
│       ├── homayshield_classifier.pt
│       └── homayshield_attack_bank.npy
│
├── training2.py
├── inference3.py

Training Command

python training2.py \
  --train \
  ./datasets/token_level_adversarial_tr_v2.jsonl \
  ./datasets/token_level_adversarial_en_v2.jsonl \
  ./datasets/final_classifier_merged_all.jsonl \
  --output-dir ./output/Homayv6
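The command passes several dataset files to a single `--train` flag. One way a script could accept that shape is sketched below with `argparse`; this is a guess at the interface based only on the command shown, not the actual contents of `training2.py`.

```python
import argparse

def build_parser():
    """Hypothetical parser that accepts the training command shown above."""
    parser = argparse.ArgumentParser(description="Sketch of a training CLI")
    parser.add_argument("--train", nargs="+", required=True,
                        help="one or more JSONL training files")
    parser.add_argument("--output-dir", required=True,
                        help="directory that receives the trained artifacts")
    return parser

# Parse the same arguments as the documented command.
args = build_parser().parse_args([
    "--train",
    "./datasets/token_level_adversarial_tr_v2.jsonl",
    "./datasets/token_level_adversarial_en_v2.jsonl",
    "./datasets/final_classifier_merged_all.jsonl",
    "--output-dir", "./output/Homayv6",
])
```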

Output Files After Training

Training generates:

output/Homayv6/
├── homayshield_encoder.pt
├── homayshield_classifier.pt
└── homayshield_attack_bank.npy

Inference Command

python inference3.py

Inference loads:

homayshield_encoder.pt
homayshield_classifier.pt
homayshield_attack_bank.npy

from:

./output/Homayv6/
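Before running inference, it can help to confirm that all three artifacts are present. The helper below is a hypothetical convenience check and not part of the repository; it only tests that the files exist and does not load them.

```python
from pathlib import Path

# File names taken from the list above.
EXPECTED_ARTIFACTS = (
    "homayshield_encoder.pt",
    "homayshield_classifier.pt",
    "homayshield_attack_bank.npy",
)

def missing_artifacts(model_dir):
    """Return the expected artifact names that are absent from model_dir."""
    directory = Path(model_dir)
    return [name for name in EXPECTED_ARTIFACTS
            if not (directory / name).is_file()]
```

Calling `missing_artifacts("./output/Homayv6")` returns an empty list when the directory is complete.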

Inference modes:

OR
Fusion
Semantic Only
Classifier Only

Limitations

HomayShield is not intended to replace advanced LLM-based guardrails.

Compared to LLM guardrails:

Advantages:

Lower infrastructure cost
Faster CPU inference
Easier deployment

Tradeoffs:

Lower reasoning capability
Less contextual understanding
Reduced zero-day detection

Intended Use

Recommended for:

Enterprise AI security
SOC environments
On-prem AI systems
Air-gapped deployments
CPU-only environments

Example Usage

Final Verdict (Attack Detection)

Threshold	Attack Recall	Precision
0.57	100%	78.2%
0.58	80.2%	~100%
0.59	38.6%	100%
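To compare the three thresholds on a single number, precision and recall can be combined into an F1 score. The source does not report F1; the sketch below only derives it from the table rows, treating "~100%" as 1.0, which is an approximation.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0.0:
        return 0.0
    return 2.0 * precision * recall / (precision + recall)

# (threshold, recall, precision) rows copied from the table above.
rows = [
    (0.57, 1.000, 0.782),
    (0.58, 0.802, 1.000),  # "~100%" precision treated as 1.0 (assumption)
    (0.59, 0.386, 1.000),
]
f1_by_threshold = {t: f1_score(p, r) for t, r, p in rows}
```

The steep drop in recall between 0.58 and 0.59 shows how sensitive detection is to small threshold changes in this range.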

The guardrail is highly effective at attack detection, largely because of the semantic layer.

Attack detection rating:

Semantic layer: 9.5/10
Classifier layer: 7.5/10
Overall attack detection: 9/10

Philosophy

AI security should not be limited to organizations with GPU infrastructure.

Even lightweight CPU-based guardrails can provide meaningful protection for real-world AI systems.

Model tree for boying07/CPU-Based-AI-Guardrail

Base model: dbmdz/bert-base-turkish-128k-cased (this model is a fine-tune of it)