---
language:
- tr
- en
base_model:
- dbmdz/bert-base-turkish-128k-cased
pipeline_tag: text-classification
tags:
- bert
- guardrail
---
# HomayShield 🔒

CPU-Based AI Guardrail for Turkish & English Security Filtering

HomayShield is a lightweight CPU-based AI guardrail designed to detect malicious, adversarial, and suspicious prompts targeting AI systems.

Unlike LLM-based guardrails, HomayShield is optimized for **CPU-only inference**, making it practical for organizations operating in resource-constrained or on-prem environments.

---

# Overview

HomayShield provides AI security filtering for:

* LLM applications
* Chatbots
* AI agents
* RAG systems
* Internal AI assistants
* Enterprise AI pipelines

Supported languages:

* Turkish 🇹🇷
* English 🇬🇧
* Mixed Turkish-English prompts

---

# Key Features

* ✅ CPU-friendly inference
* ✅ Shared encoder architecture
* ✅ Low-latency detection
* ✅ No GPU required in production
* ✅ Semantic attack detection
* ✅ Classifier-based attack detection
* ✅ Hybrid decision engine

---

# Architecture

HomayShield uses a shared encoder design:

![Screenshot 2026-06-26 at 14.30.30](https://cdn-uploads.huggingface.co/production/uploads/6720d6553279dd0ff66c4995/hqHqkwOQtg1lY0RTQdxxE.png)

# Detection Strategy

HomayShield combines two detection mechanisms.

## 1. Semantic Detection

Incoming prompt embeddings are compared against known attack embeddings.

Detects:

* Prompt injection
* Jailbreak attacks
* Instruction override
* Adversarial prompts
* Semantic attack variants
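
A minimal sketch of the comparison step, under assumptions the card does not confirm: that the similarity measure is cosine similarity (suggested by the `CosineEmbeddingLoss` used in training) and that the semantic score is the best match across the bank. The names `cosine_similarity` and `semantic_score` are hypothetical.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def semantic_score(
    prompt_embedding: Sequence[float],
    attack_bank: Sequence[Sequence[float]],
) -> float:
    """Highest similarity between the prompt and any known attack embedding.

    Assumption: max aggregation. The real engine may aggregate differently.
    """
    return max(
        (cosine_similarity(prompt_embedding, attack) for attack in attack_bank),
        default=0.0,  # an empty bank yields no semantic signal
    )
```

Taking the maximum means a single close match in the bank is enough to raise the score, which suits a bank of distinct attack families.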

---

## 2. Classifier Detection

A classifier head predicts the attack probability from the prompt embedding.

Detects:

* Known attack patterns
* Learned malicious behaviors
* Structured attack prompts
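
As an illustration only, the sketch below treats the classifier as a single linear layer that emits one logit, which a sigmoid turns into a probability. The single-logit output is consistent with training on `BCEWithLogitsLoss`, but the linear head, `classifier_score`, `weights`, and `bias` are assumptions; the actual head in `homayshield_classifier.pt` may be deeper.

```python
import math
from typing import Sequence


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def classifier_score(
    embedding: Sequence[float],
    weights: Sequence[float],
    bias: float,
) -> float:
    """Attack probability from a hypothetical linear head over the embedding."""
    logit = sum(w * x for w, x in zip(weights, embedding)) + bias
    return sigmoid(logit)
```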

---

# Inference Modes

## OR Logic

A prompt is flagged as an attack if either the semantic score or the classifier score exceeds its threshold.

Best for:

* Security-first environments
* Low false negatives
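
The OR rule can be sketched in a few lines. The function name and both default thresholds are placeholders chosen for illustration, not values taken from HomayShield.

```python
def is_attack_or(
    semantic_score: float,
    classifier_score: float,
    semantic_threshold: float = 0.8,    # placeholder, not a shipped default
    classifier_threshold: float = 0.5,  # placeholder, not a shipped default
) -> bool:
    """Flag the prompt if either signal exceeds its own threshold."""
    return (
        semantic_score > semantic_threshold
        or classifier_score > classifier_threshold
    )
```

Because one signal alone can trigger a block, this mode trades extra false positives for fewer missed attacks.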

---

## Weighted Fusion

The semantic and classifier scores are combined into a single weighted score, which is then compared against one threshold.

Best for:

* Balanced detection
* Tunable sensitivity
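
One way to sketch weighted fusion, assuming a convex combination of two scores in a comparable range. The names `fused_score` and `is_attack_fusion`, the weight, and the threshold are all hypothetical.

```python
def fused_score(
    semantic_score: float,
    classifier_score: float,
    semantic_weight: float = 0.5,  # placeholder weight
) -> float:
    """Convex combination of the two signals."""
    if not 0.0 <= semantic_weight <= 1.0:
        raise ValueError("semantic_weight must be between 0 and 1")
    return (
        semantic_weight * semantic_score
        + (1.0 - semantic_weight) * classifier_score
    )


def is_attack_fusion(
    semantic_score: float,
    classifier_score: float,
    semantic_weight: float = 0.5,
    threshold: float = 0.5,  # placeholder threshold
) -> bool:
    """Flag the prompt if the fused score exceeds a single threshold."""
    return fused_score(semantic_score, classifier_score, semantic_weight) > threshold
```

Raising `semantic_weight` makes the decision lean on similarity to known attacks; lowering it favors the learned classifier.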

---

## Single Signal

Use only one of the two signals:

* Semantic detection
* Classifier detection

Best for:

* Benchmarking
* Lightweight deployments

---

# Training

Training consists of two stages.

## Stage 1 — Encoder Training

Loss: `CosineEmbeddingLoss`

Goal:

* Cluster similar attacks
* Separate benign and malicious prompts
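
For reference, the per-pair value that PyTorch's `CosineEmbeddingLoss` computes can be reproduced in plain Python. How HomayShield builds its training pairs is not documented here, so the label convention below (`1` for pairs that should be close, `-1` for pairs that should be apart) is simply the one the loss itself defines, and `margin` is left at its PyTorch default of 0.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def cosine_embedding_loss(
    a: Sequence[float],
    b: Sequence[float],
    label: int,
    margin: float = 0.0,
) -> float:
    """Loss for one pair: pull label=1 pairs together, push label=-1 pairs apart."""
    cos = cosine_similarity(a, b)
    if label == 1:
        return 1.0 - cos           # zero when the pair points the same way
    if label == -1:
        return max(0.0, cos - margin)  # zero once similarity drops to the margin
    raise ValueError("label must be 1 or -1")
```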

---

## Stage 2 — Classifier Training

Loss: `BCEWithLogitsLoss`

Outputs:

* Encoder weights
* Classifier weights
* Attack embedding bank
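
The Stage 2 loss, `BCEWithLogitsLoss`, applies a sigmoid to the raw classifier logit and then takes binary cross-entropy. A plain-Python sketch of the per-example value, written in the usual numerically stable form, is shown below; the function name is hypothetical and the target convention (1.0 for attack, 0.0 for benign) is an assumption.

```python
import math


def bce_with_logits(logit: float, target: float) -> float:
    """Binary cross-entropy on a raw logit.

    Equivalent to -[t*log(s) + (1-t)*log(1-s)] with s = sigmoid(logit),
    rearranged so that large |logit| values do not overflow.
    """
    return max(logit, 0.0) - logit * target + math.log1p(math.exp(-abs(logit)))
```

A confident, correct prediction (large positive logit with target 1.0) gives a loss near zero, while an undecided logit of 0 gives `log(2)` for either target.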

---

# Training Data

HomayShield was trained using a multilingual dataset containing:

* Benign prompts
* Adversarial prompts
* Turkish prompts
* English prompts
* Mixed-language prompts

Attack categories include:

* Prompt injection
* Jailbreak
* Instruction override
* Prompt leakage
* Data exfiltration
* Tool abuse
* Code injection

---

# Files

This repository contains:

* `homayshield_encoder.pt`
* `homayshield_classifier.pt`
* `homayshield_attack_bank.npy`

---

# Usage

## Folder Structure

The commands in this section assume the following layout:

```text
HomayShield/
│
├── datasets/
│   ├── token_level_adversarial_tr_v2.jsonl
│   ├── token_level_adversarial_en_v2.jsonl
│   └── final_classifier_merged_all.jsonl
│
├── output/
│   └── Homayv6/
│       ├── homayshield_encoder.pt
│       ├── homayshield_classifier.pt
│       └── homayshield_attack_bank.npy
│
├── training2.py
└── inference3.py
```

---

## Training Command

```bash
python training2.py \
  --train \
  ./datasets/token_level_adversarial_tr_v2.jsonl \
  ./datasets/token_level_adversarial_en_v2.jsonl \
  ./datasets/final_classifier_merged_all.jsonl \
  --output-dir ./output/Homayv6
```

---

## Output Files After Training

Training generates:

```text
output/Homayv6/
├── homayshield_encoder.pt
├── homayshield_classifier.pt
└── homayshield_attack_bank.npy
```

---

## Inference Command

```bash
python inference3.py
```

Inference loads:

* `homayshield_encoder.pt`
* `homayshield_classifier.pt`
* `homayshield_attack_bank.npy`

from:

```text
./output/Homayv6/
```


Inference modes:

* OR
* Fusion
* Semantic Only
* Classifier Only

---

# Limitations

HomayShield is not intended to replace advanced LLM-based guardrails.

Compared to LLM guardrails:

Advantages:

* Lower infrastructure cost
* Faster CPU inference
* Easier deployment

Tradeoffs:

* Lower reasoning capability
* Less contextual understanding
* Reduced zero-day detection

---

# Intended Use

Recommended for:

* Enterprise AI security
* SOC environments
* On-prem AI systems
* Air-gapped deployments
* CPU-only environments

# Example Usage


![Screenshot 2026-06-26 at 10.36.34](https://cdn-uploads.huggingface.co/production/uploads/6720d6553279dd0ff66c4995/aydhqYnQOQfDmnQTw5Wpq.png)

---
# Final Verdict (Attack Detection)

| Threshold | Attack Recall | Precision |
| --------- | ------------: | --------: |
| 0.57      |          100% |     78.2% |
| 0.58      |         80.2% |     ~100% |
| 0.59      |         38.6% |      100% |

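The threshold has a sharp effect on the trade-off. At 0.57, HomayShield recalls every attack at 78.2% precision; at 0.59, precision reaches 100% but recall falls to 38.6%. Attack detection is strongest in the semantic layer.

Attack detection rating:

* Semantic layer: 9.5/10
* Classifier layer: 7.5/10
* Overall attack detection: 9/10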
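
To show how precision and recall at a given threshold are derived, here is a small helper. It is a generic sketch, not the evaluation script behind the table above; the function name is hypothetical and any scores and labels passed to it are the caller's own data.

```python
from typing import Sequence, Tuple


def precision_recall(
    scores: Sequence[float],
    labels: Sequence[int],
    threshold: float,
) -> Tuple[float, float]:
    """Precision and recall when prompts scoring above `threshold` are flagged.

    `labels` uses 1 for attack and 0 for benign. If nothing is flagged,
    precision is reported as 0.0 by convention; likewise recall if there
    are no attacks in the data.
    """
    tp = fp = fn = 0
    for score, label in zip(scores, labels):
        flagged = score > threshold
        if flagged and label == 1:
            tp += 1
        elif flagged and label == 0:
            fp += 1
        elif not flagged and label == 1:
            fn += 1
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall
```

Sweeping `threshold` over a labeled validation set with this helper is one way to pick an operating point for a given deployment.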


# Philosophy

> AI security should not be limited to organizations with GPU infrastructure.

Even lightweight CPU-based guardrails can provide meaningful protection for real-world AI systems.

![ChatGPT Image Jun 26, 2026 at 12_02_58 AM(2)](https://cdn-uploads.huggingface.co/production/uploads/6720d6553279dd0ff66c4995/7Q144jZJTTxgIlNd_jTIu.png)