metadata
title: AI Agent with Content Moderation
emoji: 🛡️
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 5.49.1
app_file: app.py
pinned: false
license: mit
short_description: AI agent with automatic content moderation
AI Agent with Content Moderation
A chatbot powered by LangChain and Groq that automatically moderates all user input for:
- Prompt injection attempts
- Policy violations
- Unsafe content (suicide/self-harm)
Features
- Automatic Moderation: Every message is checked before processing
- Three-Level Classification:
  - -1 (UNSAFE): Suicide, self-harm, or serious safety concerns
  - 0 (SAFE): Legitimate questions and normal conversation
  - 1 (VIOLATES): Prompt injection or policy bypass attempts
- Transparent: See the moderation results for every message
- Fast: Powered by GPT-OSS models on Groq
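The moderate-then-answer flow described above can be sketched as follows. This is a minimal illustration, not the code in app.py: the names `Verdict`, `parse_verdict`, and `handle_message` are hypothetical, the refusal messages are placeholders, and treating unparseable moderator output as a violation is an assumption made for the sketch.

```python
from enum import IntEnum
from typing import Callable


class Verdict(IntEnum):
    """Hypothetical names for the three moderation levels listed above."""
    UNSAFE = -1    # suicide, self-harm, or serious safety concerns
    SAFE = 0       # legitimate questions and normal conversation
    VIOLATES = 1   # prompt injection or policy bypass attempts


def parse_verdict(raw: str) -> Verdict:
    """Turn the moderator's raw text into a Verdict.

    Anything that is not exactly -1, 0, or 1 is treated as VIOLATES,
    a conservative assumption made for this sketch.
    """
    try:
        return Verdict(int(raw.strip()))
    except ValueError:
        return Verdict.VIOLATES


def handle_message(
    message: str,
    moderate: Callable[[str], str],
    answer: Callable[[str], str],
) -> tuple[Verdict, str]:
    """Moderate first; only SAFE messages reach the base agent."""
    verdict = parse_verdict(moderate(message))
    if verdict is Verdict.SAFE:
        return verdict, answer(message)
    if verdict is Verdict.UNSAFE:
        # Placeholder wording; a real app would point to support resources.
        return verdict, "This message raised a safety concern and was not processed."
    return verdict, "This request was blocked by the moderation policy."
```

Passing `moderate` and `answer` in as callables keeps the gate logic testable without network access; in the app they would presumably wrap calls to the two Groq-hosted models.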
Models Used
- Base Agent: `openai/gpt-oss-20b`
- Moderation: `openai/gpt-oss-safeguard-20b`
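One way the two models might be instantiated is sketched below, assuming the `langchain-groq` package and its `ChatGroq` class. The constant names and `build_models` are hypothetical, and setting `temperature=0` on the moderator is an assumption intended to make classifications more repeatable; the actual app.py may configure things differently.

```python
# Model IDs as listed above.
BASE_MODEL = "openai/gpt-oss-20b"
MODERATION_MODEL = "openai/gpt-oss-safeguard-20b"


def build_models():
    """Return (base_agent_llm, moderation_llm) as ChatGroq instances.

    ChatGroq picks up GROQ_API_KEY from the environment, so no key is
    passed explicitly here.
    """
    # Imported lazily so the constants above are usable without the package.
    from langchain_groq import ChatGroq

    base = ChatGroq(model=BASE_MODEL)
    moderator = ChatGroq(model=MODERATION_MODEL, temperature=0)
    return base, moderator
```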
Setup
- Get a free API key from Groq Console
- Set the `GROQ_API_KEY` environment variable:
  - Locally: Create a `.env` file with `GROQ_API_KEY=your_key_here`
  - HuggingFace Spaces: Add it to your Space secrets
- Run `python app.py` or `gradio app.py`
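To illustrate how the key from either source could be picked up at startup, here is a small standard-library sketch. The function name `load_groq_api_key` is hypothetical, and the app itself may rely on a package such as python-dotenv instead; on HuggingFace Spaces, secrets are exposed as environment variables, so the first branch covers that case.

```python
import os
from pathlib import Path


def load_groq_api_key(env_path: str = ".env") -> str:
    """Return GROQ_API_KEY from the environment, falling back to a .env file."""
    key = os.environ.get("GROQ_API_KEY")
    if key:
        return key

    path = Path(env_path)
    if path.is_file():
        for line in path.read_text().splitlines():
            line = line.strip()
            # Skip blanks, comments, and lines that are not KEY=VALUE pairs.
            if not line or line.startswith("#") or "=" not in line:
                continue
            name, _, value = line.partition("=")
            if name.strip() == "GROQ_API_KEY":
                # Drop optional surrounding quotes.
                return value.strip().strip("'\"")

    raise RuntimeError(
        "GROQ_API_KEY is not set; export it or add it to a .env file."
    )
```

Failing fast with a clear error at startup tends to be easier to debug than an authentication failure on the first chat message.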