Jonathan Grizou

metadata
title: AI Agent with Content Moderation
emoji: 🛡️
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 5.49.1
app_file: app.py
pinned: false
license: mit
short_description: AI agent with automatic content moderation

AI Agent with Content Moderation

A chatbot powered by LangChain and Groq that automatically screens every user message for:

  • Prompt injection attempts
  • Policy violations
  • Unsafe content (suicide/self-harm)

Features

  • Automatic Moderation: Every message is checked before processing
  • Three-Level Classification:
    • -1 (UNSAFE): Suicide, self-harm, or serious safety concerns
    • 0 (SAFE): Legitimate questions and normal conversation
    • 1 (VIOLATES): Prompt injection or policy bypass attempts
  • Transparent: See the moderation results for every message
  • Fast: Powered by GPT-OSS models on Groq
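
The three-level classification above can be sketched as a small enum plus a parser. This is a minimal illustration, not the code in app.py: the names Verdict and parse_verdict are hypothetical, and it assumes the moderation model replies with just the integer label. Treating unparseable replies as VIOLATES (failing closed) is also an assumption.

```python
from enum import IntEnum


class Verdict(IntEnum):
    """The three moderation levels listed above (names are illustrative)."""
    UNSAFE = -1    # suicide, self-harm, or serious safety concerns
    SAFE = 0       # legitimate questions and normal conversation
    VIOLATES = 1   # prompt injection or policy bypass attempts


def parse_verdict(raw: str) -> Verdict:
    """Map a moderation model's raw text reply to a Verdict.

    Assumes the reply is just the integer label. Anything that cannot be
    parsed, or is outside {-1, 0, 1}, is treated as VIOLATES so the
    message is not passed through to the agent.
    """
    try:
        return Verdict(int(raw.strip()))
    except ValueError:
        return Verdict.VIOLATES
```

Failing closed keeps a malformed moderation reply from silently letting a message through; an app that prefers availability over strictness might choose a different default.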

Models Used

  • Base Agent: openai/gpt-oss-20b
  • Moderation: openai/gpt-oss-safeguard-20b
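
One way the two models could fit together is a moderate-then-answer flow: the moderation model labels the message first, and the base agent is only called when the label is 0 (SAFE). The sketch below is an assumption about that flow, not the logic in app.py. The function handle_message, its moderate and respond parameters, and the canned refusal texts are all hypothetical; the two callables stand in for whatever LangChain calls the app makes to each model.

```python
from typing import Callable

# Model IDs as listed above.
BASE_MODEL = "openai/gpt-oss-20b"
MODERATION_MODEL = "openai/gpt-oss-safeguard-20b"


def handle_message(
    message: str,
    moderate: Callable[[str], int],
    respond: Callable[[str], str],
) -> tuple[int, str]:
    """Run moderation first; call the base agent only for SAFE (0) input.

    `moderate` is expected to return -1, 0, or 1. The refusal texts are
    placeholders, not the app's actual wording.
    """
    label = moderate(message)
    if label == -1:
        return label, "This message raised a safety concern and was not sent to the agent."
    if label == 1:
        return label, "This message was flagged as a policy violation or prompt injection."
    return label, respond(message)
```

Returning the label alongside the reply lets the UI display the moderation result for each message, which matches the "Transparent" feature described earlier. Passing the model calls in as plain callables also makes the flow easy to exercise with stubs, without an API key.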

Setup

  1. Get a free API key from Groq Console
  2. Set the GROQ_API_KEY environment variable:
    • Locally: Create a .env file with GROQ_API_KEY=your_key_here
    • HuggingFace Spaces: Add it to your Space secrets
  3. Run python app.py, or gradio app.py to reload automatically when the code changes
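
A quick way to catch a missing key before the app starts is a fail-fast check at startup. The sketch below is illustrative only: get_groq_api_key is a hypothetical helper, not a function from app.py. It reads from the process environment, which is where HuggingFace Spaces exposes secrets. For local runs it assumes the .env file has already been loaded into the environment, for example with python-dotenv's load_dotenv().

```python
import os
from typing import Mapping, Optional


def get_groq_api_key(env: Optional[Mapping[str, str]] = None) -> str:
    """Return GROQ_API_KEY, or raise a clear error if it is missing.

    `env` defaults to os.environ; passing a dict makes the check easy to test.
    """
    env = os.environ if env is None else env
    key = env.get("GROQ_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "GROQ_API_KEY is not set. Add it to a .env file locally "
            "or to your Space secrets on HuggingFace."
        )
    return key
```

Calling this once at the top of the app turns a missing or empty key into an immediate, readable error instead of a failed API request on the first chat message.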

Tech Stack