---
title: ContentGuardEnv
emoji: 🛡️
colorFrom: indigo
colorTo: blue
sdk: docker
app_port: 7860
tags:
- openenv
- trust-and-safety
- meta
- llama-3
- moderation-research
pinned: false
---
# ContentGuardEnv
I built ContentGuardEnv for the **Meta x Hugging Face Hackathon 2026** as a practical moderation environment where an AI agent has to do more than just classify text.
Instead of only asking "is this toxic?", the environment asks the model to:
1. Detect the policy category.
2. Choose a proportional enforcement action.
3. Explain the decision in an appeal-style format.
## Live Deployment
- Hugging Face Space: https://mj064-contentguardenv.hf.space
- Hugging Face repo: https://huggingface.co/spaces/mj064/ContentGuardEnv
- GitHub repo: https://github.com/mj064/meta_hack
## What This Project Does
ContentGuardEnv is an OpenEnv-style environment with three difficulty tiers:
- Easy: category detection
- Medium: enforcement action + severity
- Hard: appeal ruling + policy references
It includes:
- A FastAPI backend for reset/step/state APIs
- A WebSocket reasoning stream for live agent traces
- A browser dashboard to run episodes and inspect rewards
- A grading pipeline that returns reward + feedback for each decision
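To make the "reward + feedback" idea concrete, here is a minimal sketch of what an easy-tier grader could look like. It is not the project's actual grader: the `GradeResult` type, the `grade_category` function, the exact-match rule, and the 1.0/0.0 reward values are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GradeResult:
    """Hypothetical container for one graded decision."""
    reward: float
    feedback: str


def grade_category(predicted: str, expected: str) -> GradeResult:
    """Score an easy-tier decision by exact match on the policy category.

    Labels are compared case-insensitively, ignoring surrounding whitespace.
    The binary reward is an assumption, not the project's scoring rule.
    """
    if predicted.strip().lower() == expected.strip().lower():
        return GradeResult(reward=1.0, feedback="Correct policy category.")
    return GradeResult(
        reward=0.0,
        feedback=f"Expected category '{expected}', got '{predicted}'.",
    )
```

The medium and hard tiers would presumably need partial credit (for example, right action but wrong severity), which a binary rule like this one cannot express.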
## Why I Built It
The goal was to simulate the type of moderation decisions that are messy in real systems: ambiguous context, policy tradeoffs, and high-cost mistakes.
It is meant to be usable as both:
- A demo app for human-in-the-loop moderation testing
- A benchmark harness for agent evaluation loops
## Stack
- Python + FastAPI
- Vanilla JS/CSS frontend
- OpenAI- and Hugging Face-compatible inference routing
- Dockerized runtime for Hugging Face Spaces
## Run Locally
1. Install dependencies.
```bash
pip install -r requirements.txt
```
2. Set environment variables (or use a local `.env`).
```bash
# In a .env file, drop the `export` prefix.
export API_BASE_URL=https://api.openai.com/v1
export MODEL_NAME=gpt-4o-mini
export HF_TOKEN=your_token_here
```
3. Start the app.
```bash
python server/app.py
```
Then open http://localhost:7860 in a browser.
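For reference, the three variables from step 2 could be read in Python roughly as follows. This is a sketch, not the code in `server/app.py` or `inference.py`: the `InferenceConfig` type, the `load_config` helper, and the fallback defaults (copied from the example values above) are assumptions.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class InferenceConfig:
    """Hypothetical holder for the inference settings."""
    api_base_url: str
    model_name: str
    hf_token: Optional[str]


def load_config(env: Optional[Mapping[str, str]] = None) -> InferenceConfig:
    """Read settings from a mapping, defaulting to the process environment."""
    source = os.environ if env is None else env
    return InferenceConfig(
        # Assumed defaults: the example values shown in step 2.
        api_base_url=source.get("API_BASE_URL", "https://api.openai.com/v1"),
        model_name=source.get("MODEL_NAME", "gpt-4o-mini"),
        # Treat an unset or empty token as "no token".
        hf_token=source.get("HF_TOKEN") or None,
    )
```

Passing an explicit mapping, as in `load_config({"MODEL_NAME": "my-model"})`, makes the helper easy to test without touching the real environment.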
## API Overview
- POST `/reset`
- POST `/step/{episode_id}`
- GET `/state/{episode_id}`
- GET `/health`
- WS `/ws`
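The HTTP endpoints above can be exercised with a small standard-library client like the sketch below. Only the paths and methods come from this README; the request body sent to `/reset` and the `episode_id` field read from its response are assumptions, so check the server code for the real schema before relying on them.

```python
import json
from typing import Any, Optional
from urllib.request import Request, urlopen

BASE_URL = "http://localhost:7860"  # local address from "Run Locally"


def build_request(method: str, path: str, payload: Optional[dict] = None,
                  base_url: str = BASE_URL) -> Request:
    """Build a JSON request for one of the endpoints listed above."""
    data = None if payload is None else json.dumps(payload).encode("utf-8")
    headers = {"Content-Type": "application/json"} if data is not None else {}
    url = f"{base_url.rstrip('/')}{path}"
    return Request(url, data=data, headers=headers, method=method)


def call(request: Request, timeout: float = 10.0) -> Any:
    """Send the request and decode the JSON response body."""
    with urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def run_demo_episode() -> None:
    """Walk through health, reset, and state. Requires a running server."""
    print(call(build_request("GET", "/health")))
    # Assumed request body and response field; not confirmed by this README.
    episode = call(build_request("POST", "/reset", {"task": "easy"}))
    episode_id = episode["episode_id"]
    print(call(build_request("GET", f"/state/{episode_id}")))
```

With the app running locally, calling `run_demo_episode()` should print the health response followed by the initial episode state, provided the assumed field names match the server.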
## Deploy to Hugging Face Space
This repo includes a helper script:
```bash
python sync_repo.py
```
It syncs the project folder to the Space while ignoring local-only artifacts.
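A sync of this kind can be sketched with `huggingface_hub`'s `HfApi.upload_folder`, which accepts `ignore_patterns` for skipping local files. The code below is not the contents of `sync_repo.py`: the helper names and the ignore list are hypothetical, and only the Space id is taken from this README.

```python
from typing import Any, Dict, List

SPACE_REPO_ID = "mj064/ContentGuardEnv"  # from the Live Deployment section

# Hypothetical ignore list; the patterns the real script uses are not documented here.
IGNORE_PATTERNS: List[str] = [".env", "*.pyc"]


def build_upload_kwargs(folder_path: str = ".") -> Dict[str, Any]:
    """Assemble the keyword arguments for HfApi.upload_folder."""
    return {
        "folder_path": folder_path,
        "repo_id": SPACE_REPO_ID,
        "repo_type": "space",
        "ignore_patterns": list(IGNORE_PATTERNS),
    }


def sync(folder_path: str = ".") -> None:
    """Upload the folder to the Space, skipping the ignored patterns."""
    # Imported lazily so build_upload_kwargs works without huggingface_hub installed.
    from huggingface_hub import HfApi

    # Authentication relies on a stored login or the HF_TOKEN environment variable.
    HfApi().upload_folder(**build_upload_kwargs(folder_path))
```

Keeping the argument assembly separate from the upload call makes it possible to inspect what would be sent before pushing anything to the Space.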
## Project Layout
- `server/app.py`: FastAPI app + WebSocket gateway
- `server/env/`: environment, tasks, graders, data generation
- `server/static/`: dashboard HTML/CSS/JS
- `inference.py`: script for benchmark/evaluation flows
- `sync_repo.py`: one-command Hugging Face Space sync
## Notes
This project is still being iterated on as part of hackathon development, so the UI and evaluation behavior may change as edge cases are discovered.