---
title: ContentGuardEnv
emoji: 🛡️
colorFrom: indigo
colorTo: blue
sdk: docker
app_port: 7860
tags:
  - openenv
  - trust-and-safety
  - meta
  - llama-3
  - moderation-research
pinned: false
---

# ContentGuardEnv

I built ContentGuardEnv for the **Meta x Hugging Face Hackathon 2026** as a practical moderation environment where an AI agent has to do more than just classify text.

Instead of only asking "is this toxic?", the environment asks the model to:

1. Detect the policy category.
2. Choose a proportional enforcement action.
3. Explain a decision in an appeal-style format.

## Live Deployment

- Hugging Face Space: https://mj064-contentguardenv.hf.space
- Hugging Face repo: https://huggingface.co/spaces/mj064/ContentGuardEnv
- GitHub repo: https://github.com/mj064/meta_hack

## What This Project Does

ContentGuardEnv is an OpenEnv-style environment with three difficulty tiers:

- Easy: category detection
- Medium: enforcement action + severity
- Hard: appeal ruling + policy references

It includes:

- A FastAPI backend for reset/step/state APIs
- A WebSocket reasoning stream for live agent traces
- A browser dashboard to run episodes and inspect rewards
- A grading pipeline that returns reward + feedback for each decision

## Why I Built It

The goal was to simulate the type of moderation decisions that are messy in real systems: ambiguous context, policy tradeoffs, and high-cost mistakes.

This project is meant to be usable both as:

- A demo app for human-in-the-loop moderation testing
- A benchmark harness for agent evaluation loops

## Stack

- Python + FastAPI
- Vanilla JS/CSS frontend
- OpenAI/Hugging Face compatible inference routing
- Dockerized runtime for Hugging Face Spaces

## Run Locally

1. Install dependencies.

```bash
pip install -r requirements.txt
```

2. Set environment variables (or use a local `.env`).

```bash
API_BASE_URL=https://api.openai.com/v1
MODEL_NAME=gpt-4o-mini
HF_TOKEN=your_token_here
```

3. Start the app.
```bash
python server/app.py
```

Open http://localhost:7860

## API Overview

- POST `/reset`
- POST `/step/{episode_id}`
- GET `/state/{episode_id}`
- GET `/health`
- WS `/ws`

## Deploy to Hugging Face Space

This repo includes a helper script:

```bash
python sync_repo.py
```

It syncs the project folder to the Space while ignoring local-only artifacts.

## Project Layout

- `server/app.py`: FastAPI app + WebSocket gateway
- `server/env/`: environment, tasks, graders, data generation
- `server/static/`: dashboard HTML/CSS/JS
- `inference.py`: script for benchmark/evaluation flows
- `sync_repo.py`: one-command Hugging Face Space sync

## Notes

This project is still being iterated on during hackathon development, so the UI and evaluation behavior may keep changing as edge cases are discovered.
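
## Example API Client (Sketch)

The HTTP endpoints listed under API Overview could be driven from a small Python client like the one below. This is a minimal sketch, not the project's own client: the `ContentGuardClient` class, the `http_transport` helper, and the assumption that every endpoint accepts and returns JSON are all hypothetical. The README does not document request or response schemas, so payloads are passed through as plain dictionaries; check `server/app.py` for the actual field names.

```python
import json
import urllib.request
from typing import Callable, Optional

# Local address from the "Run Locally" section.
BASE_URL = "http://localhost:7860"

# A transport takes (method, url, payload) and returns the decoded JSON body.
Transport = Callable[[str, str, Optional[dict]], dict]


def http_transport(method: str, url: str, payload: Optional[dict]) -> dict:
    """Send one JSON request with the standard library (assumes JSON responses)."""
    data = json.dumps(payload).encode("utf-8") if payload is not None else None
    request = urllib.request.Request(
        url,
        data=data,
        method=method,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.loads(response.read().decode("utf-8"))


class ContentGuardClient:
    """Hypothetical thin wrapper around the routes named in API Overview."""

    def __init__(self, base_url: str = BASE_URL, transport: Transport = http_transport):
        self.base_url = base_url.rstrip("/")
        self.transport = transport

    def reset(self, payload: Optional[dict] = None) -> dict:
        # POST /reset; the request body shape is an assumption.
        return self.transport("POST", f"{self.base_url}/reset", payload or {})

    def step(self, episode_id: str, action: dict) -> dict:
        # POST /step/{episode_id}; the action fields are defined by the server.
        return self.transport("POST", f"{self.base_url}/step/{episode_id}", action)

    def state(self, episode_id: str) -> dict:
        # GET /state/{episode_id}
        return self.transport("GET", f"{self.base_url}/state/{episode_id}", None)

    def health(self) -> dict:
        # GET /health
        return self.transport("GET", f"{self.base_url}/health", None)
```

With the app running locally, `ContentGuardClient().health()` would be a reasonable first call to confirm the server is reachable. The transport is injectable so the client can be exercised in tests without a live server.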