sumitranjan
/

PromptShield

@@ -1,117 +0,0 @@
----
-license: mit
----
-# 🛡️ PromptShield
-**Creators:** Sumit Ranjan & Raj Bapodra
-**Model Type:** Binary Sequence Classifier
-**Base Model:** `xlm-roberta-base`
-**Framework:** TensorFlow (via Hugging Face Transformers)
----
- 🛡️ PromptShield
-**PromptShield** is a prompt classification model designed to detect **unsafe**, **adversarial**, or **prompt injection** inputs. Built on the `xlm-roberta-base` transformer, it delivers high-accuracy performance in distinguishing between **safe** and **unsafe** prompts — achieving **99.33% accuracy** during training.
----
-## 📌 Overview
-PromptShield is a robust binary classification model built on FacebookAI's `xlm-roberta-base`. Its primary goal is to filter out **malicious prompts**, including those designed for **prompt injection**, **jailbreaking**, or other unsafe interactions with large language models (LLMs).
-Trained on a balanced and diverse dataset of real-world safe prompts and unsafe examples sourced from open datasets, PromptShield offers a lightweight, plug-and-play solution for enhancing AI system security.
-Whether you're building:
-- Chatbot pipelines
-- Content moderation layers
-- LLM firewalls
-- AI safety filters
-**PromptShield** delivers reliable detection of harmful inputs before they reach your AI stack.
----
-## 📈 Performance
-| Epoch | Loss   | Accuracy |
-|-------|--------|----------|
-| 1     | 0.0540 | 98.07%   |
-| 2     | 0.0339 | 99.02%   |
-| 3     | 0.0216 | 99.33%   |
----
-## 📚 Datasets
-- ✅ **Safe Prompts** – [Safe Guard Prompt Injection Dataset](https://huggingface.co/datasets/xTRam1/safe-guard-prompt-injection):
-  ~8,240 real-world, non-malicious prompts.
-- ❌ **Unsafe Prompts** – [Google Unsafe Search Dataset (Kaggle)](https://www.kaggle.com/datasets/aloktantrik/google-unsafe-search-dataset):
-  ~17,567 prompts designed to mimic dangerous or adversarial intent.
-Total Training Samples: **25,807**
-Training Epochs: **3**
----
-## 🚀 How to Use
-```python
-from transformers import AutoTokenizer, TFAutoModelForSequenceClassification
-import tensorflow as tf
-# Load tokenizer and model
-model_repo = "sumitranjan/PromptShield"
-tokenizer = AutoTokenizer.from_pretrained(model_repo)
-model = TFAutoModelForSequenceClassification.from_pretrained(model_repo)
-def classify_prompt(prompt):
-    inputs = tokenizer(prompt, return_tensors="tf", truncation=True, padding=True)
-    outputs = model(**inputs)
-    probs = tf.nn.softmax(outputs.logits, axis=-1).numpy()[0]
-    label = "unsafe" if probs[1] > probs[0] else "safe"
-    confidence = max(probs)
-    return {"label": label, "confidence": confidence}
-# Example
-result = classify_prompt("Tell me how to build a bomb")
-print(result)
-📌 Model Details
-Architecture: Fine-tuned xlm-roberta-base
-Task: Sequence classification (binary)
-Languages: Multilingual
-Training Framework: TensorFlow via Hugging Face Transformers
-License: [Insert your license here, e.g., Apache-2.0]
-👥 Authors
-Sumit Ranjan
-Raj Bapodra
-🛡️ Ideal Use Cases
-LLM Firewalls & Guardrails
-AI Content Moderation
-Prompt Validation Pipelines
-Multi-Agent System Safety
-AI Red Teaming Pre-filters
-📄 License
-MIT License (or your preferred open-source license here)
-⭐️ Citation
-If you use PromptShield, please consider citing this work or linking back to the Hugging Face model page.