---
title: ContentGuardEnv
emoji: 🛡️
colorFrom: indigo
colorTo: blue
sdk: docker
app_port: 7860
tags:
- openenv
- trust-and-safety
- meta
- llama-3
- moderation-research
pinned: false
---
# ContentGuardEnv
I built ContentGuardEnv for the **Meta x Hugging Face Hackathon 2026** as a practical moderation environment where an AI agent has to do more than just classify text.
Instead of only asking "is this toxic?", the environment asks the model to:
1. Detect the policy category.
2. Choose a proportional enforcement action.
3. Explain the decision in an appeal-style format.
## Live Deployment
- Hugging Face Space: https://mj064-contentguardenv.hf.space
- Hugging Face repo: https://huggingface.co/spaces/mj064/ContentGuardEnv
- GitHub repo: https://github.com/mj064/meta_hack
## What This Project Does
ContentGuardEnv is an OpenEnv-style environment with three difficulty tiers:
- Easy: category detection
- Medium: enforcement action + severity
- Hard: appeal ruling + policy references
It includes:
- A FastAPI backend for reset/step/state APIs
- A WebSocket reasoning stream for live agent traces
- A browser dashboard to run episodes and inspect rewards
- A grading pipeline that returns reward + feedback for each decision
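To make the tiers and the reward/feedback loop concrete, here is a minimal sketch of what an agent decision and a graded result might look like. The field names (`category`, `action`, `severity`, `policy_refs`), the 0-3 severity scale, the `grade` function, and the equal-weight scoring are all hypothetical illustrations, not the schema or grader implemented in `server/env/`.

```python
from dataclasses import dataclass, field


@dataclass
class Decision:
    """Hypothetical agent output; the real schema lives in server/env/."""
    category: str                  # Easy tier: detected policy category
    action: str = ""               # Medium tier: enforcement action
    severity: int = 0              # Medium tier: assumed 0-3 scale
    policy_refs: list[str] = field(default_factory=list)  # Hard tier


@dataclass
class GradeResult:
    reward: float
    feedback: str


def grade(decision: Decision, expected: Decision) -> GradeResult:
    """Toy grader: equal partial credit for category, action, and severity."""
    checks = {
        "category": decision.category == expected.category,
        "action": decision.action == expected.action,
        "severity": decision.severity == expected.severity,
    }
    reward = sum(checks.values()) / len(checks)
    missed = [name for name, ok in checks.items() if not ok]
    feedback = "all fields match" if not missed else "mismatch: " + ", ".join(missed)
    return GradeResult(reward=round(reward, 2), feedback=feedback)
```

A grader shaped like this gives partial credit, so an agent that finds the right category but picks a disproportionate action still receives a signal about which part of the decision was wrong.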
## Why I Built It
The goal was to simulate the type of moderation decisions that are messy in real systems: ambiguous context, policy tradeoffs, and high-cost mistakes.
It is meant to be usable as both:
- A demo app for human-in-the-loop moderation testing
- A benchmark harness for agent evaluation loops
## Stack
- Python + FastAPI
- Vanilla JS/CSS frontend
- OpenAI- and Hugging Face-compatible inference routing
- Dockerized runtime for Hugging Face Spaces
## Run Locally
1. Install dependencies.
```bash
pip install -r requirements.txt
```
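2. Set environment variables in your shell (or put the same `KEY=value` pairs, without `export`, in a local `.env`).
```bash
# export makes the variables visible to the Python process started in step 3
export API_BASE_URL=https://api.openai.com/v1
export MODEL_NAME=gpt-4o-mini
export HF_TOKEN=your_token_here
```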
3. Start the app.
```bash
python server/app.py
```
Open http://localhost:7860
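
The variables from step 2 could be read in Python roughly as follows. This is only a sketch: the `load_settings` helper, the fallback defaults, and the choice to treat `HF_TOKEN` as required are assumptions for illustration, and `server/app.py` may read its configuration differently.

```python
import os
from typing import Mapping, NamedTuple


class Settings(NamedTuple):
    api_base_url: str
    model_name: str
    hf_token: str


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read inference settings from the environment (hypothetical helper)."""
    token = env.get("HF_TOKEN", "")
    if not token:
        # Assumption: the token is required; fail early with a clear message.
        raise RuntimeError("HF_TOKEN is not set; export it or add it to .env")
    return Settings(
        # Strip a trailing slash so paths can be appended safely later.
        api_base_url=env.get("API_BASE_URL", "https://api.openai.com/v1").rstrip("/"),
        model_name=env.get("MODEL_NAME", "gpt-4o-mini"),
        hf_token=token,
    )
```

Passing a plain dict instead of `os.environ` makes the helper easy to exercise in tests without touching the real environment.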
## API Overview
- POST `/reset`
- POST `/step/{episode_id}`
- GET `/state/{episode_id}`
- GET `/health`
- WS `/ws`
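
A client for the HTTP routes above might look like the following sketch, which uses only the standard library. The routes come from the list above; everything about the payloads is an assumption, including the `task` key sent to `/reset`, the `episode_id` field read from its response, and the shape of the action passed to `/step`. Check the request models in `server/app.py` for the real fields.

```python
import json
import urllib.request
from typing import Optional

BASE_URL = "http://localhost:7860"  # local server from "Run Locally"


def endpoint(name: str, episode_id: str = "", base_url: str = BASE_URL) -> str:
    """Build the URL for one of the documented routes."""
    path = f"/{name}/{episode_id}" if episode_id else f"/{name}"
    return base_url.rstrip("/") + path


def call(method: str, url: str, payload: Optional[dict] = None) -> dict:
    """Send a JSON request and decode the JSON response."""
    data = json.dumps(payload).encode() if payload is not None else None
    req = urllib.request.Request(
        url, data=data, method=method,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)


def run_one_step(task: str, action: dict) -> dict:
    """Reset, take one step, then fetch state. Payload keys are assumptions."""
    episode = call("POST", endpoint("reset"), {"task": task})
    episode_id = episode["episode_id"]  # assumed response field
    call("POST", endpoint("step", episode_id), action)
    return call("GET", endpoint("state", episode_id))
```

With the server running, `call("GET", endpoint("health"))` is a quick way to confirm the API is reachable before starting an episode.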
## Deploy to Hugging Face Space
This repo includes a helper script:
```bash
python sync_repo.py
```
It syncs the project folder to the Space while ignoring local-only artifacts.
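
For readers who want to see the general shape of such a sync, here is a sketch built on `HfApi.upload_folder` from `huggingface_hub`. It is not the contents of `sync_repo.py`: the ignore patterns, the `is_ignored` preview helper, and the `sync` function are assumptions chosen for illustration. The helper is a local approximation of which paths simple wildcard patterns would skip.

```python
from fnmatch import fnmatchcase

# Hypothetical ignore list; the real script may use different patterns.
IGNORE_PATTERNS = [".git/*", ".env", ".venv/*", "*__pycache__/*", "*.pyc"]


def is_ignored(rel_path: str, patterns=IGNORE_PATTERNS) -> bool:
    """Return True if a repo-relative POSIX path matches any ignore pattern."""
    return any(fnmatchcase(rel_path, pattern) for pattern in patterns)


def sync(folder: str, repo_id: str) -> None:
    """Upload `folder` to a Space, skipping files that match the patterns."""
    # Imported here so is_ignored works without huggingface_hub installed.
    from huggingface_hub import HfApi

    HfApi().upload_folder(
        folder_path=folder,
        repo_id=repo_id,
        repo_type="space",
        ignore_patterns=IGNORE_PATTERNS,
    )
```

Running `is_ignored` over the project tree before uploading is a cheap way to confirm that secrets such as `.env` stay out of the Space.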
## Project Layout
- `server/app.py`: FastAPI app + WebSocket gateway
- `server/env/`: environment, tasks, graders, data generation
- `server/static/`: dashboard HTML/CSS/JS
- `inference.py`: script for benchmark/evaluation flows
- `sync_repo.py`: one-command Hugging Face Space sync
## Notes
This project is still under active development for the hackathon, so the UI and evaluation behavior may change as edge cases are discovered.