
title: ContentGuardEnv
emoji: 🛡️
colorFrom: indigo
colorTo: blue
sdk: docker
app_port: 7860
tags:
  - openenv
  - trust-and-safety
  - meta
  - llama-3
  - moderation-research
pinned: false

ContentGuardEnv

I built ContentGuardEnv for the Meta x Hugging Face Hackathon 2026 as a practical moderation environment where an AI agent has to do more than just classify text.

Instead of only asking "is this toxic?", the environment asks the model to:

Detect the policy category.
Choose a proportional enforcement action.
Explain the decision in an appeal-style format.

Live Deployment

Hugging Face Space: https://mj064-contentguardenv.hf.space
Hugging Face repo: https://huggingface.co/spaces/mj064/ContentGuardEnv
GitHub repo: https://github.com/mj064/meta_hack

What This Project Does

ContentGuardEnv is an OpenEnv-style environment with three difficulty tiers:

Easy: category detection
Medium: enforcement action + severity
Hard: appeal ruling + policy references
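As a rough illustration of how the three tiers differ, the sketch below models what an agent might submit at each level. The field names (`category`, `enforcement`, `severity`, `ruling`, `policy_refs`) and the `Tier` enum are assumptions made for this example; the actual schema lives in server/env/ and may look different.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class Tier(str, Enum):
    EASY = "easy"
    MEDIUM = "medium"
    HARD = "hard"


@dataclass
class ModerationAction:
    # Hypothetical field names, chosen to mirror the tier descriptions.
    category: Optional[str] = None        # easy: detected policy category
    enforcement: Optional[str] = None     # medium: enforcement action
    severity: Optional[int] = None        # medium: severity level
    ruling: Optional[str] = None          # hard: appeal ruling
    policy_refs: List[str] = field(default_factory=list)  # hard: cited policies


# Which fields each tier expects the agent to fill in (assumed mapping).
REQUIRED_FIELDS = {
    Tier.EASY: ("category",),
    Tier.MEDIUM: ("enforcement", "severity"),
    Tier.HARD: ("ruling", "policy_refs"),
}


def missing_fields(tier: Tier, action: ModerationAction) -> List[str]:
    """Return the fields the tier expects but the action left unset or empty."""
    missing = []
    for name in REQUIRED_FIELDS[tier]:
        value = getattr(action, name)
        if value is None or value == []:
            missing.append(name)
    return missing
```

A check like `missing_fields(Tier.HARD, action)` could run before grading so that incomplete submissions get clear feedback instead of a silent zero reward.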

It includes:

A FastAPI backend for reset/step/state APIs
A WebSocket reasoning stream for live agent traces
A browser dashboard to run episodes and inspect rewards
A grading pipeline that returns reward + feedback for each decision
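To make the "reward + feedback" idea concrete, here is a minimal sketch of what a grader for the easy tier could look like. `GradeResult` and `grade_category` are hypothetical names, and exact-match scoring is a simplification chosen for illustration; the project's real graders may score differently.

```python
from dataclasses import dataclass


@dataclass
class GradeResult:
    reward: float   # numeric score for the decision
    feedback: str   # human-readable explanation of the score


def grade_category(predicted: str, expected: str) -> GradeResult:
    """Toy exact-match grader: 1.0 for the right category, otherwise 0.0."""
    # Compare case-insensitively and ignore surrounding whitespace.
    if predicted.strip().lower() == expected.strip().lower():
        return GradeResult(1.0, "Correct policy category.")
    return GradeResult(0.0, f"Expected '{expected}', got '{predicted}'.")
```

Returning feedback alongside the reward lets the dashboard show why a decision scored the way it did, not just the number.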

Why I Built It

The goal was to simulate the kinds of moderation decisions that are messy in real systems: ambiguous context, policy tradeoffs, and high-cost mistakes.

This project is meant to be usable as both:

A demo app for human-in-the-loop moderation testing
A benchmark harness for agent evaluation loops

Stack

Python + FastAPI
Vanilla JS/CSS frontend
OpenAI- and Hugging Face-compatible inference routing
Dockerized runtime for Hugging Face Spaces

Run Locally

Install dependencies.

pip install -r requirements.txt

Set environment variables (or use a local .env).

API_BASE_URL=https://api.openai.com/v1
MODEL_NAME=gpt-4o-mini
HF_TOKEN=your_token_here

Start the app.

python server/app.py

Open http://localhost:7860
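For reference, the three environment variables from the setup step could be read in Python roughly as follows. This is a sketch, not the project's actual loader: `InferenceSettings` and `load_settings` are hypothetical names, and the fallback defaults simply reuse the example values shown above.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class InferenceSettings:
    api_base_url: str
    model_name: str
    hf_token: str


def load_settings(env=os.environ) -> InferenceSettings:
    """Read the inference settings, failing early if the token is missing."""
    token = env.get("HF_TOKEN", "")
    if not token:
        raise RuntimeError("HF_TOKEN is not set")
    return InferenceSettings(
        # Defaults below are the example values from this README (assumed).
        api_base_url=env.get("API_BASE_URL", "https://api.openai.com/v1"),
        model_name=env.get("MODEL_NAME", "gpt-4o-mini"),
        hf_token=token,
    )
```

Passing a plain dict instead of `os.environ` makes the loader easy to exercise in tests without touching the real environment.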

API Overview

POST /reset
POST /step/{episode_id}
GET /state/{episode_id}
GET /health
WS /ws
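The HTTP endpoints above can be exercised with a small standard-library client. The sketch below only assumes the paths listed here; the JSON bodies (a `task` field for reset, a free-form action dict for step) and the JSON response format are guesses for illustration and should be checked against server/app.py.

```python
import json
import urllib.request
from typing import Optional
from urllib.parse import quote

BASE_URL = "http://localhost:7860"  # local default from "Run Locally"


def build_request(method: str, path: str,
                  payload: Optional[dict] = None) -> urllib.request.Request:
    """Build (but do not send) a request, JSON-encoding the payload if given."""
    data = None
    headers = {}
    if payload is not None:
        data = json.dumps(payload).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(BASE_URL + path, data=data,
                                  headers=headers, method=method)


def reset_request(task: str) -> urllib.request.Request:
    # "task" is a hypothetical body field for choosing the difficulty tier.
    return build_request("POST", "/reset", {"task": task})


def step_request(episode_id: str, action: dict) -> urllib.request.Request:
    # The episode id is percent-encoded so it is safe inside the URL path.
    return build_request("POST", f"/step/{quote(episode_id, safe='')}", action)


def state_request(episode_id: str) -> urllib.request.Request:
    return build_request("GET", f"/state/{quote(episode_id, safe='')}")


def send(request: urllib.request.Request) -> dict:
    """Send a request and decode the response, assuming it is JSON."""
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the app running locally, an episode would start with `send(reset_request("easy"))`, followed by `send(step_request(episode_id, action))` using whatever episode identifier the reset response provides.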

Deploy to Hugging Face Space

This repo includes a helper script:

python sync_repo.py

It syncs the project folder to the Space while ignoring local-only artifacts.
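The filtering step can be pictured as a glob-based filter over relative paths. The pattern list and helper names below are assumptions for illustration, not the contents of sync_repo.py. If the script uploads via `huggingface_hub`, its `upload_folder` function accepts an `ignore_patterns` argument that serves a similar purpose.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath
from typing import Iterable, List

# Hypothetical ignore list; adjust to match the real local-only artifacts.
IGNORE_PATTERNS = [".git/*", ".env", ".venv/*", "__pycache__/*", "*.pyc"]


def is_ignored(relative_path: str,
               patterns: Iterable[str] = IGNORE_PATTERNS) -> bool:
    """True if the full relative path or its file name matches any pattern."""
    path = PurePosixPath(relative_path)
    return any(
        fnmatchcase(path.as_posix(), pattern) or fnmatchcase(path.name, pattern)
        for pattern in patterns
    )


def files_to_upload(paths: Iterable[str]) -> List[str]:
    """Keep only the paths that should be synced to the Space."""
    return [p for p in paths if not is_ignored(p)]
```

Keeping `.env` in the ignore list matters most here, since it may hold the `HF_TOKEN` used for inference.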

Project Layout

server/app.py: FastAPI app + WebSocket gateway
server/env/: environment, tasks, graders, data generation
server/static/: dashboard HTML/CSS/JS
inference.py: script for benchmark/evaluation flows
sync_repo.py: one-command Hugging Face Space sync

Notes

This project is under active development during the hackathon, so the UI and evaluation behavior may change as edge cases are discovered.