Jonathan Grizou

Fix short_description length for HuggingFace

4dfe741 5 months ago


metadata

title: AI Agent with Content Moderation
emoji: 🛡️
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 5.49.1
app_file: app.py
pinned: false
license: mit
short_description: AI agent with automatic content moderation

AI Agent with Content Moderation

A chatbot powered by LangChain and Groq that automatically moderates all user input for:

Prompt injection attempts
Policy violations
Unsafe content (suicide/self-harm)

Features

Automatic Moderation: Every message is checked before processing
Three-Level Classification:
- -1 (UNSAFE): Suicide, self-harm, or serious safety concerns
- 0 (SAFE): Legitimate questions and normal conversation
- 1 (VIOLATES): Prompt injection or policy bypass attempts
Transparent: See the moderation results for every message
Fast: Powered by GPT-OSS models on Groq
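The three labels above can be modeled as a small enum with a parser for the moderation model's reply. This is a minimal sketch, not the code in app.py: the names Verdict, parse_verdict and should_answer are hypothetical, and it assumes the moderation model replies with either the number or the label name.

```python
from enum import IntEnum


class Verdict(IntEnum):
    """The three moderation levels listed in this README."""
    UNSAFE = -1    # suicide, self-harm, or serious safety concerns
    SAFE = 0       # legitimate questions and normal conversation
    VIOLATES = 1   # prompt injection or policy bypass attempts


def parse_verdict(raw: str) -> Verdict:
    """Turn a raw model reply such as "-1" or "safe" into a Verdict.

    Assumption: the reply is a bare number or a bare label name.
    Anything else raises ValueError instead of being guessed at.
    """
    text = raw.strip()
    try:
        return Verdict(int(text))
    except ValueError:
        pass  # not a number, or a number outside -1..1; try the label name
    try:
        return Verdict[text.upper()]
    except KeyError:
        raise ValueError(f"unrecognized moderation label: {raw!r}") from None


def should_answer(verdict: Verdict) -> bool:
    """Only SAFE messages are passed on to the base agent."""
    return verdict is Verdict.SAFE
```

Raising on an unrecognized reply keeps the policy decision (block or retry) with the caller rather than hiding it in the parser.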

Models Used

Base Agent: openai/gpt-oss-20b
Moderation: openai/gpt-oss-safeguard-20b
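One way the two models might be wired together is sketched below: the moderation model is called first, and the base model only sees messages labeled 0. The handle_message function, the Turn type and the (model, prompt) callable are assumptions for illustration; the real app goes through LangChain and Groq, which are replaced here by a stand-in function so the sketch runs offline.

```python
from typing import Callable, Optional, TypedDict

BASE_MODEL = "openai/gpt-oss-20b"
MODERATION_MODEL = "openai/gpt-oss-safeguard-20b"

# Hypothetical signature: (model_name, prompt) -> reply text.
ModelCall = Callable[[str, str], str]


class Turn(TypedDict):
    moderation: int        # -1, 0 or 1, shown to the user for transparency
    reply: Optional[str]   # None when the message was blocked


def handle_message(message: str, call_model: ModelCall) -> Turn:
    """Moderate first; call the base model only for messages labeled 0."""
    # Assumes the moderation model replies with a bare "-1", "0" or "1".
    label = int(call_model(MODERATION_MODEL, message).strip())
    if label != 0:
        return {"moderation": label, "reply": None}
    return {"moderation": label, "reply": call_model(BASE_MODEL, message)}


def fake_call(model: str, prompt: str) -> str:
    """Offline stand-in for a real Groq request, for demonstration only."""
    if model == MODERATION_MODEL:
        return "1" if "ignore previous instructions" in prompt.lower() else "0"
    return f"(answer from {model})"
```

Calling handle_message("What is Gradio?", fake_call) returns a moderation label of 0 together with the stand-in reply, while a message containing "ignore previous instructions" returns label 1 and no reply.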

Setup

1. Get a free API key from the Groq Console.
2. Set the GROQ_API_KEY environment variable:
   - Locally: create a .env file containing GROQ_API_KEY=your_key_here
   - HuggingFace Spaces: add it to your Space secrets
3. Run python app.py, or gradio app.py to get auto-reload while developing.
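The key lookup described in these steps can be sketched with the standard library alone. The helper names parse_env and require_groq_key are hypothetical; the actual app may well rely on a package such as python-dotenv instead, and this parser only handles simple KEY=value lines.

```python
import os
from typing import Dict, Mapping, Optional


def parse_env(text: str) -> Dict[str, str]:
    """Parse simple KEY=value lines, skipping blank lines and # comments."""
    values: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip("\"'")
    return values


def require_groq_key(environ: Optional[Mapping[str, str]] = None) -> str:
    """Return GROQ_API_KEY, or fail early with an actionable message."""
    environ = os.environ if environ is None else environ
    key = environ.get("GROQ_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "GROQ_API_KEY is not set; add it to a .env file or to your Space secrets"
        )
    return key
```

Locally, the contents of the .env file could be passed through parse_env and merged into os.environ before calling require_groq_key(). On HuggingFace Spaces, secrets are exposed as environment variables, so require_groq_key() alone should be enough.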

Tech Stack

Gradio - UI framework
LangChain - Agent framework
Groq - Fast LLM inference