	---
	language:
	- tr
	- en
	base_model:
	- dbmdz/bert-base-turkish-128k-cased
	pipeline_tag: text-classification
	tags:
	- bert
	- guardrail
	---
	# HomayShield 🔒

	CPU-Based AI Guardrail for Turkish & English Security Filtering

	HomayShield is a lightweight CPU-based AI guardrail designed to detect malicious, adversarial, and suspicious prompts targeting AI systems.

	Unlike LLM-based guardrails, HomayShield is optimized for CPU-only inference, making it practical for organizations operating in resource-constrained or on-prem environments.

	---

	# Overview

	HomayShield provides AI security filtering for:

	* LLM applications
	* Chatbots
	* AI agents
	* RAG systems
	* Internal AI assistants
	* Enterprise AI pipelines

	Supported languages:

	* Turkish 🇹🇷
	* English 🇬🇧
	* Mixed Turkish-English prompts

	---

	# Key Features

	* ✅ CPU-friendly inference
	* ✅ Shared encoder architecture
	* ✅ Low-latency detection
	* ✅ No GPU required in production
	* ✅ Semantic attack detection
	* ✅ Classifier-based attack detection
	* ✅ Hybrid decision engine

	---

	# Architecture

	HomayShield uses a shared encoder design:

	![Screenshot 2026-06-26 at 14.30.30](https://cdn-uploads.huggingface.co/production/uploads/6720d6553279dd0ff66c4995/hqHqkwOQtg1lY0RTQdxxE.png)

	# Detection Strategy

	HomayShield combines two detection mechanisms.

	## 1. Semantic Detection

	Incoming prompt embeddings are compared against known attack embeddings.

	Detects:

	* Prompt injection
	* Jailbreak attacks
	* Instruction override
	* Adversarial prompts
	* Semantic attack variants
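
As a rough illustration of the comparison described above, the sketch below scores a prompt by its highest cosine similarity to any entry in an attack embedding bank. Taking the maximum is an assumption made for illustration (the model card does not state how similarities are aggregated), and the function name `semantic_score` and the toy 3-dimensional vectors are hypothetical.

```python
import numpy as np

def semantic_score(prompt_emb: np.ndarray, attack_bank: np.ndarray) -> float:
    """Return the highest cosine similarity between a prompt and any bank entry."""
    # Normalize both sides so that a dot product equals cosine similarity.
    prompt_unit = prompt_emb / np.linalg.norm(prompt_emb)
    bank_unit = attack_bank / np.linalg.norm(attack_bank, axis=1, keepdims=True)
    similarities = bank_unit @ prompt_unit  # shape: (num_attacks,)
    # Assumption: the closest known attack determines the score.
    return float(similarities.max())

# Toy embeddings for illustration; real ones would come from the encoder.
attack_bank = np.array([[1.0, 0.0, 0.0],
                        [0.0, 1.0, 0.0]])
prompt_emb = np.array([0.9, 0.1, 0.0])

score = semantic_score(prompt_emb, attack_bank)
print(f"semantic score: {score:.3f}")
```

A prompt that paraphrases a known attack should land near one of the bank entries and receive a score close to 1, even when it shares few exact tokens with that attack.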

	---

	## 2. Classifier Detection

	A classifier head predicts the probability that a prompt is an attack from its embedding.

	Detects:

	* Known attack patterns
	* Learned malicious behaviors
	* Structured attack prompts
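
To make the classifier signal concrete, here is a minimal sketch that maps an embedding to an attack probability. It assumes a single linear layer followed by a sigmoid; the actual head in `homayshield_classifier.pt` may be deeper, and the names `attack_probability`, `weights`, and `bias` are hypothetical.

```python
import numpy as np

def attack_probability(embedding: np.ndarray, weights: np.ndarray, bias: float) -> float:
    """Map an embedding to a probability in (0, 1) using a linear layer and a sigmoid."""
    logit = float(embedding @ weights + bias)
    return float(1.0 / (1.0 + np.exp(-logit)))

# Toy 2-dimensional example; the values are illustrative, not trained weights.
embedding = np.array([1.0, 2.0])
weights = np.array([0.5, -0.25])
prob = attack_probability(embedding, weights, bias=0.0)
print(f"attack probability: {prob:.2f}")
```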

	---

	# Inference Modes

	## OR Logic

	A prompt is flagged as an attack if either the semantic score or the classifier score exceeds its threshold.

	Best for:

	* Security-first environments
	* Low false negatives
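
The OR rule in this section could be written as the short function below. The threshold defaults are placeholders chosen for illustration, not HomayShield's tuned values, and `is_attack_or` is a hypothetical name.

```python
def is_attack_or(
    semantic_score: float,
    classifier_score: float,
    semantic_threshold: float = 0.8,    # placeholder value, not a tuned setting
    classifier_threshold: float = 0.5,  # placeholder value, not a tuned setting
) -> bool:
    """Flag the prompt when at least one signal exceeds its own threshold."""
    return (semantic_score > semantic_threshold
            or classifier_score > classifier_threshold)

# One strong signal is enough to flag the prompt.
print(is_attack_or(semantic_score=0.95, classifier_score=0.10))
print(is_attack_or(semantic_score=0.30, classifier_score=0.20))
```

Because a single signal can trigger a block, this mode favors recall: it lowers false negatives at the cost of more false positives.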

	---

	## Weighted Fusion

	The semantic and classifier scores are combined into a single weighted score.

	Best for:

	* Balanced detection
	* Tunable sensitivity
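
One plausible form of weighted fusion is a convex combination of the two scores, sketched below. The weight parameter, its default value, and the name `fused_score` are assumptions for illustration; the model card does not specify the weights HomayShield uses.

```python
def fused_score(
    semantic_score: float,
    classifier_score: float,
    semantic_weight: float = 0.6,  # hypothetical default
) -> float:
    """Blend the two signals; the classifier receives the remaining weight."""
    if not 0.0 <= semantic_weight <= 1.0:
        raise ValueError("semantic_weight must be between 0 and 1")
    return (semantic_weight * semantic_score
            + (1.0 - semantic_weight) * classifier_score)

# Equal weighting of a strong semantic hit and a weak classifier hit.
print(fused_score(0.9, 0.3, semantic_weight=0.5))
```

Raising `semantic_weight` makes the decision track the embedding-bank similarity more closely, while lowering it defers to the classifier, which is what makes the sensitivity tunable.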

	---

	## Single Signal

	Use only one signal:

	* Semantic detection, or
	* Classifier detection

	Best for:

	* Benchmarking
	* Lightweight deployments

	---

	# Training

	Training consists of two stages.

	## Stage 1 — Encoder Training

	Loss:
	CosineEmbeddingLoss

	Goal:

	* Cluster similar attacks
	* Separate benign and malicious prompts
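
For readers unfamiliar with the loss named above, the NumPy sketch below reproduces the standard definition of cosine embedding loss: pairs labeled `1` are pulled together, and pairs labeled `-1` are penalized only while their cosine similarity stays above a margin. This illustrates the loss itself, not the author's training script; how HomayShield builds its pairs and which margin it uses are not stated in the model card.

```python
import numpy as np

def cosine_embedding_loss(x1, x2, target, margin=0.0):
    """Mean cosine embedding loss over a batch of embedding pairs.

    target is 1 for pairs that should be similar, -1 for pairs that should not.
    """
    cos = np.sum(x1 * x2, axis=1) / (
        np.linalg.norm(x1, axis=1) * np.linalg.norm(x2, axis=1)
    )
    similar_loss = 1.0 - cos                        # applied where target == 1
    dissimilar_loss = np.maximum(0.0, cos - margin) # applied where target == -1
    return float(np.mean(np.where(target == 1, similar_loss, dissimilar_loss)))

# Two identical pairs: the first is labeled similar, the second dissimilar.
x1 = np.array([[1.0, 0.0], [1.0, 0.0]])
x2 = np.array([[1.0, 0.0], [1.0, 0.0]])
target = np.array([1, -1])
loss = cosine_embedding_loss(x1, x2, target)
print(loss)
```

In this toy batch the first pair contributes no loss because it is already aligned, while the second pair is penalized for being identical despite its dissimilar label.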

	---

	## Stage 2 — Classifier Training

	Loss:
	BCEWithLogitsLoss
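
As a reference for the loss named above, the sketch below computes binary cross-entropy directly on raw logits using the usual numerically stable form. It is an illustration of the loss definition in NumPy, not the author's training code, and the label convention (1 for attack, 0 for benign) is an assumption.

```python
import numpy as np

def bce_with_logits(logits: np.ndarray, targets: np.ndarray) -> float:
    """Mean binary cross-entropy computed on raw logits.

    Uses max(x, 0) - x * y + log(1 + exp(-|x|)), which avoids overflow
    for logits of large magnitude.
    """
    per_example = (
        np.maximum(logits, 0.0)
        - logits * targets
        + np.log1p(np.exp(-np.abs(logits)))
    )
    return float(per_example.mean())

# Assumed label convention: 1 = attack, 0 = benign.
logits = np.array([0.0, 0.0])
targets = np.array([1.0, 0.0])
loss = bce_with_logits(logits, targets)
print(f"{loss:.4f}")
```

Working on logits rather than probabilities means the classifier does not need a sigmoid layer during training; the sigmoid is applied only at inference time to obtain a probability.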

	Outputs after both stages:

	* Encoder weights
	* Classifier weights
	* Attack embedding bank

	---

	# Training Data

	HomayShield was trained using a multilingual dataset containing:

	* Benign prompts
	* Adversarial prompts
	* Turkish prompts
	* English prompts
	* Mixed-language prompts

	Attack categories include:

	* Prompt injection
	* Jailbreak
	* Instruction override
	* Prompt leakage
	* Data exfiltration
	* Tool abuse
	* Code injection

	---

	# Files

	This repository contains:

	* `homayshield_encoder.pt`
	* `homayshield_classifier.pt`
	* `homayshield_attack_bank.npy`

	---

	# Usage

	## Folder Structure

	```text
	HomayShield/
	│
	├── datasets/
	│   ├── token_level_adversarial_tr_v2.jsonl
	│   ├── token_level_adversarial_en_v2.jsonl
	│   └── final_classifier_merged_all.jsonl
	│
	├── output/
	│   └── Homayv6/
	│       ├── homayshield_encoder.pt
	│       ├── homayshield_classifier.pt
	│       └── homayshield_attack_bank.npy
	│
	├── training2.py
	└── inference3.py
	```

	---

	## Training Command

	```bash
	python training2.py \
	--train \
	./datasets/token_level_adversarial_tr_v2.jsonl \
	./datasets/token_level_adversarial_en_v2.jsonl \
	./datasets/final_classifier_merged_all.jsonl \
	--output-dir ./output/Homayv6
	```

	---

	## Output Files After Training

	Training generates:

	```text
	output/Homayv6/
	├── homayshield_encoder.pt
	├── homayshield_classifier.pt
	└── homayshield_attack_bank.npy
	```

	---

	## Inference Command

	```bash
	python inference3.py
	```

	Inference loads:

	* `homayshield_encoder.pt`
	* `homayshield_classifier.pt`
	* `homayshield_attack_bank.npy`

	from:

	```text
	./output/Homayv6/
	```


	Inference modes:

	* OR
	* Fusion
	* Semantic Only
	* Classifier Only

	---

	# Limitations

	HomayShield is not intended to replace advanced LLM-based guardrails.

	Compared to LLM guardrails:

	Advantages:

	* Lower infrastructure cost
	* Faster CPU inference
	* Easier deployment

	Tradeoffs:

	* Lower reasoning capability
	* Less contextual understanding
	* Weaker detection of novel (zero-day) attacks

	---

	# Intended Use

	Recommended for:

	* Enterprise AI security
	* SOC environments
	* On-prem AI systems
	* Air-gapped deployments
	* CPU-only environments

	# Example Usage


	![Screenshot 2026-06-26 at 10.36.34](https://cdn-uploads.huggingface.co/production/uploads/6720d6553279dd0ff66c4995/aydhqYnQOQfDmnQTw5Wpq.png)

	---
	# Final Verdict (Attack Detection)

	| Threshold | Attack Recall | Precision |
	| --------- | ------------: | --------: |
	| 0.57 | 100% | 78.2% |
	| 0.58 | 80.2% | ~100% |
	| 0.59 | 38.6% | 100% |
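
The table shows the usual tradeoff: raising the threshold improves precision and lowers recall. The sketch below shows how precision and recall at a given threshold can be computed from scored, labeled prompts. The scores and labels are made-up toy values, not the evaluation data behind the table, and `precision_recall` is a hypothetical helper.

```python
def precision_recall(scores, labels, threshold):
    """Return (precision, recall) for the attack class at a given threshold."""
    predicted = [s >= threshold for s in scores]
    tp = sum(1 for p, y in zip(predicted, labels) if p and y == 1)
    fp = sum(1 for p, y in zip(predicted, labels) if p and y == 0)
    fn = sum(1 for p, y in zip(predicted, labels) if not p and y == 1)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

# Toy data: 1 = attack, 0 = benign. Values are illustrative only.
scores = [0.60, 0.58, 0.57, 0.57, 0.40]
labels = [1, 1, 1, 0, 0]

for threshold in (0.57, 0.58):
    precision, recall = precision_recall(scores, labels, threshold)
    print(f"threshold={threshold}: precision={precision:.2f}, recall={recall:.2f}")
```

Sweeping the threshold over a held-out set in this way is one method for choosing an operating point that matches a deployment's tolerance for false positives.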

	The guardrail is highly effective at attack detection, largely because of the semantic layer.

	Attack detection rating:

	* Semantic layer: 9.5/10
	* Classifier layer: 7.5/10
	* Overall attack detection: 9/10



	# Philosophy

	> AI security should not be limited to organizations with GPU infrastructure.

	Even lightweight CPU-based guardrails can provide meaningful protection for real-world AI systems.

	![ChatGPT Image Jun 26, 2026 at 12_02_58 AM(2)](https://cdn-uploads.huggingface.co/production/uploads/6720d6553279dd0ff66c4995/7Q144jZJTTxgIlNd_jTIu.png)