---
title: PromptGuard
emoji: 🚀
colorFrom: gray
colorTo: red
sdk: gradio
sdk_version: 6.14.0
python_version: '3.13'
app_file: app.py
pinned: false
license: apache-2.0
short_description: RAG retrieval + LLM judge
---
# ๐Ÿ›ก๏ธ PromptGuard
An LLM prompt-safety evaluator powered by RAG retrieval and an LLM judge.
[![HF Config Docs](https://img.shields.io/badge/HF%20Spaces-Config%20Reference-yellow?logo=huggingface)](https://huggingface.co/docs/hub/spaces-config-reference)
[![GitHub Repo](https://img.shields.io/badge/GitHub-PromptGuard%20Repo-blue?logo=github)](https://github.com/dralsarrani/PromptGuard-LLM-Prompt-Safety-Evaluator)
## What it does
Given any prompt, PromptGuard evaluates whether it is safe or unsafe and returns a verdict with a confidence score, a
category, reasoning, and similar examples retrieved from a 180k-prompt safety dataset.
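Since this is a Gradio Space, it can also be queried programmatically. A minimal sketch using `gradio_client` follows; the Space id and `api_name` here are assumptions, so check the Space's "Use via API" panel for the actual endpoint name and argument signature.

```python
from gradio_client import Client

# Assumed Space id (user/space); verify against the actual Space URL.
client = Client("dralsarrani/PromptGuard")

# "/predict" is a placeholder api_name -- the real one is listed in the
# Space's "Use via API" panel.
result = client.predict(
    "How do I reset my router password?",
    api_name="/predict",
)
print(result)  # verdict + confidence + category + reasoning
```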
## How it works
1. Your prompt is embedded and compared against 180k labeled prompts
2. The top 5 most similar prompts are retrieved as context
3. An LLM judge evaluates safety using that context
4. Returns: verdict + confidence + category + reasoning
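A minimal sketch of that pipeline, assuming a Sentence-Transformers embedder, a local ChromaDB collection, and an OpenRouter chat completion as the judge. The embedding model, collection name, judge model id, and prompt wording below are illustrative placeholders, not the app's actual configuration.

```python
import os

import chromadb
import requests
from sentence_transformers import SentenceTransformer

# Placeholder embedder and collection; the app's real choices may differ.
embedder = SentenceTransformer("all-MiniLM-L6-v2")
chroma = chromadb.PersistentClient(path="./chroma_db")
collection = chroma.get_or_create_collection("prompt_safety")

def evaluate(prompt: str) -> str:
    # Steps 1-2: embed the prompt and retrieve the 5 most similar
    # labeled prompts as context.
    embedding = embedder.encode(prompt).tolist()
    hits = collection.query(query_embeddings=[embedding], n_results=5)
    context = "\n".join(
        f"- {doc} (label: {meta.get('label')})"
        for doc, meta in zip(hits["documents"][0], hits["metadatas"][0])
    )

    # Step 3: ask an LLM judge via OpenRouter, grounded in the retrieved
    # examples. The model id is a placeholder.
    response = requests.post(
        "https://openrouter.ai/api/v1/chat/completions",
        headers={"Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY']}"},
        json={
            "model": "openai/gpt-4o-mini",
            "messages": [
                {
                    "role": "system",
                    "content": (
                        "Classify the user's prompt as SAFE or UNSAFE, using "
                        "the labeled examples as context. Return a verdict, "
                        "confidence, category, and reasoning."
                    ),
                },
                {
                    "role": "user",
                    "content": f"Examples:\n{context}\n\nPrompt:\n{prompt}",
                },
            ],
        },
        timeout=60,
    )

    # Step 4: the judge's reply carries verdict + confidence + category + reasoning.
    return response.json()["choices"][0]["message"]["content"]
```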
## Tech stack
![Python](https://img.shields.io/badge/Python-3.10+-blue?logo=python)
![ChromaDB](https://img.shields.io/badge/ChromaDB-Vector%20Store-orange)
![SentenceTransformers](https://img.shields.io/badge/Sentence--Transformers-Embeddings-green)
![OpenRouter](https://img.shields.io/badge/OpenRouter-LLM%20API-purple)
![Gradio](https://img.shields.io/badge/Gradio-UI%20Framework-yellow?logo=gradio)
![HuggingFace](https://img.shields.io/badge/HuggingFace-Spaces-black?logo=huggingface)
## Dataset
Built on a custom 180k-prompt safety dataset -> [![Dataset](https://img.shields.io/badge/HF%20Dataset-Prompt%20Aggregation-blue?logo=huggingface)](https://huggingface.co/datasets/dralsarrani/Prompt-Aggregation-Dataset-Custom-Dataset)
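To explore the dataset yourself, it should load with the standard `datasets` API. The split name and column schema here are assumptions; check the dataset card for the exact fields.

```python
from datasets import load_dataset

# Assumes a "train" split exists; see the dataset card for the real schema.
ds = load_dataset(
    "dralsarrani/Prompt-Aggregation-Dataset-Custom-Dataset", split="train"
)
print(ds[0])  # inspect one record (e.g. prompt text and its safety label)
```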