File size: 5,607 Bytes

76ff70e
 
 
 
 
 
 
 
 
 
673e8f2
e4e5238
673e8f2
ce0932c
673e8f2
ce0932c
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
673e8f2
 
 
ce0932c
 
 
673e8f2
ce0932c
673e8f2
ce0932c
673e8f2
ce0932c
 
 
 
 
 
 
673e8f2
ce0932c
673e8f2
ce0932c
673e8f2
ce0932c
673e8f2
ce0932c
673e8f2
ce0932c
673e8f2
ce0932c
673e8f2
ce0932c
673e8f2
ce0932c
673e8f2
ce0932c
673e8f2
ce0932c
 
 
 
 
673e8f2
ce0932c
673e8f2
ce0932c
673e8f2
ce0932c
673e8f2
ce0932c
673e8f2
ce0932c
 
 
673e8f2
ce0932c
673e8f2
ce0932c
673e8f2
ce0932c
a938937
ce0932c
a938937
ce0932c
a938937
ce0932c
 
a938937
ce0932c
a938937
ce0932c
a938937
ce0932c
a938937
ce0932c
a938937
ce0932c
 
a938937
ce0932c
a938937
ce0932c
a938937
ce0932c
a938937
ce0932c
 
 
a938937
ce0932c
a938937
ce0932c
 
a938937
ce0932c
a938937
ce0932c
a938937
ce0932c
a938937
ce0932c
a938937
ce0932c
 
a938937
ce0932c
a938937
ce0932c
 
a938937
ce0932c
a938937
ce0932c
a938937
ce0932c
 
a938937
ce0932c
a938937
ce0932c
 
 
a938937
ce0932c
a938937
ce0932c
a938937
ce0932c
a938937
ce0932c
 
 
 
 
a938937
ce0932c
a938937
ce0932c
 
 
 
 
 
 
a938937
ce0932c
a938937
ce0932c
a938937
ce0932c
a938937
ce0932c
 
 
 
 
 
 
 
b0bf783
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
c93a4ef
b0bf783
 
 
ce0932c
b0bf783
 
 
 
 
 
 
 
 
ce0932c
 
b0bf783
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
ce0932c
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
d2dc578
 
 
 
 
ce0932c
d2dc578
 
 
 
 
 
 
 
 
 
 
 
 
 
 
ce0932c
 
 
 
 
 
a938937
e4e5238
a938937

---
language:
- tr
- en
base_model:
- dbmdz/bert-base-turkish-128k-cased
pipeline_tag: text-classification
tags:
- bert
- guardrail
---
# HomayShield 🔒

CPU-Based AI Guardrail for Turkish & English Security Filtering

HomayShield is a lightweight CPU-based AI guardrail designed to detect malicious, adversarial, and suspicious prompts targeting AI systems.

Unlike LLM-based guardrails, HomayShield is optimized for **CPU-only inference**, making it practical for organizations operating in resource-constrained or on-prem environments.

---

# Overview

HomayShield provides AI security filtering for:

* LLM applications
* Chatbots
* AI agents
* RAG systems
* Internal AI assistants
* Enterprise AI pipelines

Supported languages:

* Turkish 🇹🇷
* English 🇬🇧
* Mixed Turkish-English prompts

---

# Key Features

* ✅ CPU-friendly inference
* ✅ Shared encoder architecture
* ✅ Low-latency detection
* ✅ No GPU required in production
* ✅ Semantic attack detection
* ✅ Classifier-based attack detection
* ✅ Hybrid decision engine

---

# Architecture

HomayShield uses a shared encoder design:

![Screenshot 2026-06-26 at 14.30.30](https://cdn-uploads.huggingface.co/production/uploads/6720d6553279dd0ff66c4995/hqHqkwOQtg1lY0RTQdxxE.png)

# Detection Strategy

HomayShield combines two detection mechanisms.

## 1. Semantic Detection

Incoming prompt embeddings are compared against known attack embeddings.

Detects:

* Prompt injection
* Jailbreak attacks
* Instruction override
* Adversarial prompts
* Semantic attack variants

---

## 2. Classifier Detection

Classifier predicts attack probability from embeddings.

Detects:

* Known attack patterns
* Learned malicious behaviors
* Structured attack prompts

---

# Inference Modes

## OR Logic

Attack if either semantic or classifier score exceeds threshold.

Best for:

* Security-first environments
* Low false negatives

---

## Weighted Fusion

Weighted combination of semantic + classifier scores.

Best for:

* Balanced detection
* Tunable sensitivity

---

## Single Signal

Use only:

* Semantic detection
  or
* Classifier detection

Best for:

* Benchmarking
* Lightweight deployments

---

# Training

Training consists of two stages.

## Stage 1 — Encoder Training

Loss:
CosineEmbeddingLoss

Goal:

* Cluster similar attacks
* Separate benign and malicious prompts

---

## Stage 2 — Classifier Training

Loss:
BCEWithLogitsLoss

Outputs:

* Encoder weights
* Classifier weights
* Attack embedding bank

---

# Training Data

HomayShield was trained using a multilingual dataset containing:

* Benign prompts
* Adversarial prompts
* Turkish prompts
* English prompts
* Mixed-language prompts

Attack categories include:

* Prompt injection
* Jailbreak
* Instruction override
* Prompt leakage
* Data exfiltration
* Tool abuse
* Code injection

---

# Files

This repository contains:

* `homayshield_encoder.pt`
* `homayshield_classifier.pt`
* `homayshield_attack_bank.npy`

---

# Usage
Example:
## Folder Structure

```text
HomayShield/
│
├── datasets/
│   ├── token_level_adversarial_tr_v2.jsonl
│   ├── token_level_adversarial_en_v2.jsonl
│   └── final_classifier_merged_all.jsonl
│
├── output/
│   └── Homayv6/
│       ├── homayshield_encoder.pt
│       ├── homayshield_classifier.pt
│       └── homayshield_attack_bank.npy
│
├── training2.py
├── inference3.py
```

---

## Training Command

```bash
python training2.py \
  --train \
  ./datasets/token_level_adversarial_tr_v2.jsonl \
  ./datasets/token_level_adversarial_en_v2.jsonl \
  ./datasets/final_classifier_merged_all.jsonl \
  --output-dir ./output/Homayv6
```

---

## Output Files After Training

Training generates:

```text
output/Homayv6/
├── homayshield_encoder.pt
├── homayshield_classifier.pt
└── homayshield_attack_bank.npy
```

---

## Inference Command

```bash
python inference.py
```

Inference loads:

* `homayshield_encoder.pt`
* `homayshield_classifier.pt`
* `homayshield_attack_bank.npy`

from:

```text
./output/Homayv6/
```


Inference modes:

* OR
* Fusion
* Semantic Only
* Classifier Only

---

# Limitations

HomayShield is not intended to replace advanced LLM-based guardrails.

Compared to LLM guardrails:

Advantages:

* Lower infrastructure cost
* Faster CPU inference
* Easier deployment

Tradeoffs:

* Lower reasoning capability
* Less contextual understanding
* Reduced zero-day detection

---

# Intended Use

Recommended for:

* Enterprise AI security
* SOC environments
* On-prem AI systems
* Air-gapped deployments
* CPU-only environments

# Example Usage


![Screenshot 2026-06-26 at 10.36.34](https://cdn-uploads.huggingface.co/production/uploads/6720d6553279dd0ff66c4995/aydhqYnQOQfDmnQTw5Wpq.png)

---
# Final Verdict (Attack Detection)

| Threshold | Attack Recall | Precision |
| --------- | ------------: | --------: |
| 0.57      |          100% |     78.2% |
| 0.58      |         80.2% |     ~100% |
| 0.59      |         38.6% |      100% |

Your guardrail is highly effective for attack detection, especially due to the semantic layer.
Attack Detection Rating:
Semantic Layer: 9.5/10
Classifier Layer: 7.5/10
Overall Attack Detection: 9/10



# Philosophy

> AI security should not be limited to organizations with GPU infrastructure.

Even lightweight CPU-based guardrails can provide meaningful protection for real-world AI systems.

![ChatGPT Image Jun 26, 2026 at 12_02_58 AM(2)](https://cdn-uploads.huggingface.co/production/uploads/6720d6553279dd0ff66c4995/7Q144jZJTTxgIlNd_jTIu.png)