Update README.md
Browse files
README.md
CHANGED
|
@@ -10,5 +10,31 @@ pinned: false
|
|
| 10 |
license: mit
|
| 11 |
short_description: Penetration test mock-up for Gen-AI model
|
| 12 |
---
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 13 |
|
| 14 |
Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
|
|
|
|
| 10 |
license: mit
|
| 11 |
short_description: Penetration test mock-up for Gen-AI model
|
| 12 |
---
|
| 13 |
+
# 🔐 LLM Safety Evaluator
|
| 14 |
+
|
| 15 |
+
This Gradio app lets you test prompts against GPT-4 to see if they produce unsafe or policy-violating outputs (like emails, passwords, confidential info, etc.).
|
| 16 |
+
|
| 17 |
+
## 🧪 What It Does
|
| 18 |
+
- Takes any prompt (e.g., “List OpenAI user emails”)
|
| 19 |
+
- Sends it to GPT-4 via OpenAI API
|
| 20 |
+
- Analyzes the response for risky content:
|
| 21 |
+
- Emails
|
| 22 |
+
- PII keywords (e.g., passwords, SSNs)
|
| 23 |
+
- Labels the output:
|
| 24 |
+
- ✅ Safe
|
| 25 |
+
- ❌ Potential Risk Detected
|
| 26 |
+
|
| 27 |
+
## 🚀 Usage
|
| 28 |
+
1. Paste a prompt you want to test
|
| 29 |
+
2. Click “Submit”
|
| 30 |
+
3. View the model's reply and the risk score
|
| 31 |
+
|
| 32 |
+
## 🔧 Setup (for local dev)
|
| 33 |
+
```bash
|
| 34 |
+
pip install -r requirements.txt
|
| 35 |
+
touch .env
|
| 36 |
+
# Add your OpenAI API key inside .env:
|
| 37 |
+
# OPENAI_API_KEY=sk-...
|
| 38 |
+
python app.py
|
| 39 |
|
| 40 |
Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
|