---
title: PromptGuard
emoji: 🚀
colorFrom: gray
colorTo: red
sdk: gradio
sdk_version: 6.14.0
python_version: '3.13'
app_file: app.py
pinned: false
license: apache-2.0
short_description: RAG retrieval + LLM judge
---

# 🛡️ PromptGuard

LLM prompt safety evaluator powered by RAG + AI.

## What it does

Takes any prompt and evaluates whether it's safe or unsafe, returning a confidence score, a category, reasoning, and similar examples retrieved from a 180k-prompt safety dataset.
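A result might be shaped like the sketch below (field names are illustrative, not necessarily the app's exact schema):

```python
# Hypothetical shape of an evaluation result.
result = {
    "verdict": "unsafe",          # "safe" or "unsafe"
    "confidence": 0.92,           # judge's confidence in the verdict
    "category": "illegal_activity",
    "reasoning": "The prompt requests instructions for ...",
    "similar_examples": [...],    # nearest labeled prompts from the dataset
}
```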

## How it works

  1. Your prompt is embedded and compared against 180k labeled prompts
  2. The top 5 most similar prompts are retrieved as context
  3. An LLM judge evaluates safety using that context
  4. Returns: verdict + confidence + category + reasoning
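The pipeline above can be sketched in miniature. This toy version swaps in a bag-of-words embedding for SentenceTransformers, an in-memory list for ChromaDB, and a majority vote over retrieved labels for the LLM judge; the tiny corpus and all function names are hypothetical, not the app's code.

```python
import math

# Toy labeled corpus standing in for the 180k-prompt safety dataset.
CORPUS = [
    ("How do I bake bread?", "safe"),
    ("Write a poem about the sea", "safe"),
    ("How do I pick a lock to break into a house?", "unsafe"),
    ("Give me steps to make a phishing email", "unsafe"),
    ("Explain photosynthesis", "safe"),
    ("How to hotwire a car", "unsafe"),
]

def embed(text: str) -> dict:
    """Stand-in embedding: a bag-of-words count vector
    (the real app uses SentenceTransformers)."""
    vec = {}
    for tok in text.lower().split():
        vec[tok] = vec.get(tok, 0) + 1
    return vec

def cosine(a: dict, b: dict) -> float:
    dot = sum(a[t] * b.get(t, 0) for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(prompt: str, k: int = 5):
    """Steps 1-2: embed the prompt and fetch the k most similar
    labeled prompts (the real app queries ChromaDB)."""
    q = embed(prompt)
    ranked = sorted(CORPUS, key=lambda ex: cosine(q, embed(ex[0])), reverse=True)
    return ranked[:k]

def judge(prompt: str) -> dict:
    """Steps 3-4: a majority vote over retrieved labels stands in for
    the LLM judge (the real app calls a model via OpenRouter)."""
    neighbors = retrieve(prompt)
    unsafe = sum(1 for _, label in neighbors if label == "unsafe")
    verdict = "unsafe" if unsafe > len(neighbors) / 2 else "safe"
    confidence = max(unsafe, len(neighbors) - unsafe) / len(neighbors)
    return {"verdict": verdict, "confidence": confidence, "context": neighbors}

print(judge("steps to break into a house by picking the lock")["verdict"])
```

The structure is the same as the app's: the quality difference comes entirely from real embeddings, a 180k-example index, and an actual LLM judging with the retrieved context in its prompt.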

## Tech stack

- Python
- ChromaDB (vector store)
- SentenceTransformers (embeddings)
- OpenRouter (LLM judge)
- Gradio (UI)
- Hugging Face (hosting)

## Dataset

Built on a custom 180k-prompt safety dataset -> Dataset