	---
	title: Content Moderation OpenEnv
	emoji: 🛡️
	colorFrom: blue
	colorTo: indigo
	sdk: docker
	app_file: app.py
	pinned: false
	---

	# Content Moderation OpenEnv

	An AI content moderation environment built to the OpenEnv specification. Agents triage realistic (synthetic) content, including spam emails, harmful social media posts, and AI-generated deepfakes, using a standard `step()` / `reset()` / `state()` API.

	[![OpenEnv Spec](https://img.shields.io/badge/OpenEnv-Spec-blue)](https://github.com/openenv-core/spec)
	[![Python 3.11+](https://img.shields.io/badge/Python-3.11+-blue.svg)](https://www.python.org/downloads/)
	[![FastAPI](https://img.shields.io/badge/FastAPI-0.111.0-green.svg)](https://fastapi.tiangolo.com/)
	[![Docker](https://img.shields.io/badge/Docker-Ready-blue.svg)](https://www.docker.com/)
	[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](LICENSE)

	---

	## 📋 Table of Contents

	- [Environment Description & Motivation](#environment-description--motivation)
	- [Task Descriptions](#task-descriptions)
	- [Observation Space](#observation-space)
	- [Action Space](#action-space)
	- [Reward Functions](#reward-functions)
	- [Baseline Scores](#baseline-scores)
	- [Setup & Usage](#setup--usage)
	  - [Requirements](#requirements)
	  - [Local Installation](#local-installation)
	  - [Docker Deployment](#docker-deployment)
	  - [HuggingFace Spaces Deployment](#huggingface-spaces-deployment)
	- [Running the Inference Script](#running-the-inference-script)
	- [API Reference](#api-reference)
	- [Project Structure](#project-structure)
	- [Environment Variables](#environment-variables)
	- [Running Tests](#running-tests)
	- [Troubleshooting](#troubleshooting)
	- [Citation](#citation)
	- [Acknowledgements](#acknowledgements)

	---

	## Environment Description & Motivation

	Content moderation is a high-stakes, high-volume real-world task. Human moderators review millions of items daily across platforms and inboxes. This environment simulates a realistic moderation pipeline across three difficulty levels, enabling AI agents to learn decision-making strategies under resource constraints.

	Key Challenges:
	- Multi-label classification with imbalanced datasets
	- Confidence calibration under uncertainty
	- Real-world content variability (spam, deepfakes, policy violations)
	- Escalation vs. immediate action tradeoffs

	| Task | Difficulty | Content Type | Metrics | Description |
	|---|---|---|---|---|
	| `text_spam` | Easy | Email / SMS | Binary classification + confidence | Spam vs. legitimate email filtering with sender reputation signals |
	| `content_moderation` | Medium | Social media posts | Multi-label (up to 11 labels) | Detect policy violations (hate speech, harassment, violence, etc.) |
	| `deepfake_detection` | Hard | Multimodal (image + text) | Detection accuracy + labels | Identify AI-manipulated media using vision transformer + metadata |

	Models Used:
	- Deepfake Detection: `dima806/deepfake_vs_real_image_detection` (ViT, ~300MB)
	- Language Model: `meta-llama/Llama-3.1-8B-Instruct` (8B params, via transformers or HF API)
	- Infrastructure: FastAPI + Uvicorn, Docker containerization

	---

	## Task Descriptions

	### Task 1: Text Spam Classification (Easy)
	Objective: Binary classification of emails/SMS as spam or legitimate.

	Dataset:
	- 50 items (30 spam, 20 legitimate)
	- Features: text content, sender reputation score, link count, source
	- Ground truth: decision + labels

	Example:
	```json
	{
	"content_id": "ts_001",
	"content_type": "text",
	"text": "CONGRATULATIONS! You've won $1,000,000! Click here NOW to claim your prize!!!",
	"metadata": {"source": "email", "sender_reputation": 0.05, "link_count": 3},
	"ground_truth": {
	"decision": "reject",
	"labels": ["spam", "scam"],
	"is_harmful": true
	}
	}
	```

	---

	### Task 2: Content Moderation (Medium)
	Objective: Multi-label classification of social media posts for policy violations.

	Dataset:
	- 40 items across diverse platforms
	- Labels: spam, scam, phishing, hate_speech, violence, harassment, misinformation, adult_content, deepfake, political_manipulation, fraud
	- Features: post text, engagement metrics, user reputation, report count

	Violation Categories:
	| Category | Definition | Examples |
	|---|---|---|
	| Hate Speech | Dehumanizing content targeting identity | Slurs, discrimination, incitement |
	| Violence | Threats or glorification of violence | Physical harm, weapon promotion |
	| Harassment | Coordinated or severe personal attacks | Doxxing, targeted campaigns |
	| Misinformation | False claims with societal impact | Election fraud claims, health hoaxes |

	---

	### Task 3: Deepfake Detection (Hard)
	Objective: Detect AI-manipulated media and classify content appropriately.

	Dataset:
	- 30 items (multimodal: images + descriptions)
	- Deepfake detection model outputs raw confidence scores (0-1)
	- Features: image description, detector_score, metadata

	Detector Score Interpretation:
	- `0.0-0.3`: Likely real/authentic
	- `0.3-0.7`: Uncertain, may require additional analysis
	- `0.7-1.0`: Likely deepfake/manipulated
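
The three bands above can be expressed as a small helper. This is a minimal sketch, not code from the repository: the function name and the returned band names are hypothetical, and because the listed ranges share their endpoints, it assumes that `0.3` and `0.7` belong to the upper band.

```python
def interpret_detector_score(score: float) -> str:
    """Map a raw detector score in [0, 1] to one of the three bands.

    Assumption: boundary values (0.3, 0.7) fall into the higher band.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"detector_score must be in [0, 1], got {score}")
    if score < 0.3:
        return "likely_real"
    if score < 0.7:
        return "uncertain"
    return "likely_deepfake"


print(interpret_detector_score(0.82))  # likely_deepfake
```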

	Example:
	```json
	{
	"content_id": "df_001",
	"content_type": "multimodal",
	"image_description": "Portrait of person in business attire, lighting appears natural",
	"detector_score": 0.82,
	"metadata": {"platform": "social_media", "report_count": 3}
	}
	```

	---

	## Observation Space

	Every step returns a `ContentObservation` with the following structure:

	```json
	{
	  "content_id": "string",
	  "content_type": "text | multimodal",
	  "text": "string (optional, for text tasks)",
	  "image_description": "string (optional, deepfake task only)",
	  "detector_score": 0.0-1.0 (optional, deepfake task only),
	  "metadata": {
	    "source": "email | social_media | platform",
	    "sender_reputation": 0.0-1.0,
	    "link_count": 0,
	    "report_count": 0,
	    "timestamp": "ISO8601"
	  },
	  "step_num": 1,
	  "total_steps": 10
	}
	```

	| Field | Type | Task | Description |
	|---|---|---|---|
	| `content_id` | string | All | Unique identifier for the content item |
	| `content_type` | string | All | Type of content: `text` or `multimodal` |
	| `text` | string | text_spam, content_moderation | The actual email/post body |
	| `image_description` | string | deepfake_detection | AI-generated description of the image |
	| `detector_score` | float | deepfake_detection | Raw output from deepfake model (0-1) |
	| `metadata` | object | All | Platform-specific signals (reputation, reports, etc.) |
	| `step_num` | int | All | Current step in episode |
	| `total_steps` | int | All | Total steps in this episode |
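
On the agent side, an observation can be loaded into a typed container before building a prompt. The sketch below uses only the standard library; the `Observation` class and `to_prompt` helper are illustrative names, not the Pydantic `ContentObservation` model defined in `server/models.py`. It assumes task-specific fields are simply absent when they do not apply.

```python
from dataclasses import dataclass, field, fields
from typing import Any, Optional


@dataclass
class Observation:
    """Client-side mirror of the observation fields listed above (hypothetical)."""
    content_id: str
    content_type: str
    step_num: int
    total_steps: int
    metadata: dict[str, Any] = field(default_factory=dict)
    text: Optional[str] = None
    image_description: Optional[str] = None
    detector_score: Optional[float] = None

    @classmethod
    def from_dict(cls, raw: dict[str, Any]) -> "Observation":
        # Ignore unknown keys so extra server fields do not break the client.
        known = {f.name for f in fields(cls)}
        return cls(**{k: v for k, v in raw.items() if k in known})


def to_prompt(obs: Observation) -> str:
    """Render only the fields that are present for the current task."""
    lines = [f"Item {obs.step_num}/{obs.total_steps} ({obs.content_type})"]
    if obs.text is not None:
        lines.append(f"text: {obs.text}")
    if obs.image_description is not None:
        lines.append(f"image_description: {obs.image_description}")
    if obs.detector_score is not None:
        lines.append(f"detector_score: {obs.detector_score:.2f}")
    lines.append(f"metadata: {obs.metadata}")
    return "\n".join(lines)
```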

	---

	## Action Space

	Agents must respond with a `ModerationAction` object:

	```json
	{
	  "decision": "approve | reject | escalate | flag",
	  "reason": "string (max 200 chars, explanation)",
	  "confidence": 0.0-1.0,
	  "labels": ["spam", "scam", "phishing", "hate_speech", "violence",
	             "harassment", "misinformation", "adult_content",
	             "deepfake", "political_manipulation", "fraud"]
	}
	```

	| Field | Type | Range | Description |
	|---|---|---|---|
	| `decision` | enum | {approve, reject, escalate, flag} | approve: Safe content; reject: Clear violation, remove immediately; flag: Borderline, needs review; escalate: Uncertain, route to specialist |
	| `reason` | string | 0-200 chars | Brief explanation of the decision |
	| `confidence` | float | [0.0, 1.0] | Agent's confidence in this decision. Calibration is rewarded. |
	| `labels` | array | 0-11 labels | Detected violation categories. Each label must come from the allowed set. |

	Valid Labels:
	- `spam` — Unsolicited bulk messaging
	- `scam` — Financial or identity theft attempts
	- `phishing` — Social engineering attacks
	- `hate_speech` — Dehumanizing content targeting identity groups
	- `violence` — Threats, glorification, or instruction of violence
	- `harassment` — Personal attacks, doxxing, coordinated abuse
	- `misinformation` — False claims with societal impact
	- `adult_content` — NSFW or sexually explicit material
	- `deepfake` — AI-manipulated media
	- `political_manipulation` — Coordinated inauthentic behavior
	- `fraud` — Financial scams, false claims
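
Before sending an action, a client may want to check it against the constraints above. The following is a standard-library sketch under that assumption; the server itself validates with Pydantic, and `validate_action` and its error messages are hypothetical.

```python
VALID_DECISIONS = {"approve", "reject", "escalate", "flag"}
VALID_LABELS = {
    "spam", "scam", "phishing", "hate_speech", "violence", "harassment",
    "misinformation", "adult_content", "deepfake", "political_manipulation", "fraud",
}


def validate_action(action: dict) -> list[str]:
    """Return a list of problems; an empty list means the action looks valid."""
    errors = []
    if action.get("decision") not in VALID_DECISIONS:
        errors.append(f"decision must be one of {sorted(VALID_DECISIONS)}")
    reason = action.get("reason", "")
    if not isinstance(reason, str) or len(reason) > 200:
        errors.append("reason must be a string of at most 200 characters")
    confidence = action.get("confidence")
    is_number = isinstance(confidence, (int, float)) and not isinstance(confidence, bool)
    if not is_number or not 0.0 <= confidence <= 1.0:
        errors.append("confidence must be a number between 0.0 and 1.0")
    labels = action.get("labels", [])
    if not isinstance(labels, list):
        errors.append("labels must be a list")
    else:
        unknown = [l for l in labels if not isinstance(l, str) or l not in VALID_LABELS]
        if unknown:
            errors.append(f"invalid labels: {unknown}")
    return errors
```

Running it on the `/step` request example from the API reference returns an empty list.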

	---

	## Reward Functions

	Rewards are computed per task based on decision accuracy, label coverage (F1), and confidence calibration.
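
As a point of reference for the label component, set-based F1 between predicted and true labels can be computed as below. This is a sketch, not the code in `server/graders.py`; in particular, treating two empty label sets as a perfect match is an assumption.

```python
def label_f1(predicted: list[str], truth: list[str]) -> float:
    """Set-based F1 between predicted and ground-truth labels."""
    pred, true = set(predicted), set(truth)
    if not pred and not true:
        return 1.0  # assumption: correctly predicting "no labels" earns full credit
    true_positives = len(pred & true)
    # Equivalent to 2 * precision * recall / (precision + recall).
    return 2 * true_positives / (len(pred) + len(true))
```

For example, predicting `["spam"]` when the truth is `["spam", "scam"]` gives precision 1.0 and recall 0.5, so F1 is 2/3.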

	### text_spam (Easy)

	| Component | Reward | Condition |
	|---|---|---|
	| Correct decision | +0.65 | `decision` matches ground truth |
	| Escalate on harmful | +0.30 | Harmful content + escalate/flag (partial credit) |
	| Label F1 contribution | +0.20 | F1 score of predicted vs. true labels |
	| Confidence calibration | ±0.10 | Bonus if confident on correct, penalty if confident on wrong |
	| Max per step | 1.00 | Sum of components (capped) |
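
One plausible reading of how these components combine is sketched below. It is an interpretation of the table, not the grader's actual code: it assumes partial credit applies only when the decision is not already correct, that the label and calibration terms are supplied precomputed, and that only the upper cap of 1.00 is enforced. The function name is hypothetical.

```python
def text_spam_step_reward(
    decision: str,
    truth_decision: str,
    is_harmful: bool,
    label_f1: float,
    calibration: float,
) -> float:
    """Combine the text_spam components; `calibration` is the signed +/-0.10 term."""
    reward = 0.0
    if decision == truth_decision:
        reward += 0.65
    elif is_harmful and decision in {"escalate", "flag"}:
        reward += 0.30  # assumption: partial credit replaces, not adds to, the 0.65
    reward += 0.20 * label_f1
    reward += calibration
    return min(reward, 1.0)  # the table states a cap; no lower bound is assumed
```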

	### content_moderation (Medium)

	| Component | Reward | Condition |
	|---|---|---|
	| Correct decision | +0.50 | `decision` matches ground truth |
	| Partial credit | +0.25 | Harmful content + flag/escalate (conservative approach) |
	| Label F1 contribution | +0.35 | Multi-label F1 score (up to 11 labels) |
	| Confidence calibration | ±0.10 | Brier score penalty for miscalibration |
	| Max per step | 1.00 | Sum of components (capped) |

	### deepfake_detection (Hard)

	| Component | Reward | Condition |
	|---|---|---|
	| Correct decision | +0.40 | `decision` matches ground truth |
	| Deepfake detection | +0.30 | Accuracy vs. detector_score threshold |
	| Detector alignment | +0.10 | Bonus for leveraging model signals |
	| Label F1 contribution | +0.20 | Multi-label F1 (fewer labels than medium task) |
	| Confidence calibration | ±0.10 | Calibration error penalty |
	| Max per step | 1.00 | Sum of components (capped) |

	Calibration Bonus Formula:
	```
	bonus = 0.1 × (confidence if correct else -confidence)
	```
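
The formula translates directly into Python. The function name is illustrative, and the range check is an addition based on the action space's `[0.0, 1.0]` constraint.

```python
def calibration_bonus(confidence: float, correct: bool) -> float:
    """Signed calibration term: +0.1*confidence if correct, -0.1*confidence if not."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError(f"confidence must be in [0, 1], got {confidence}")
    return 0.1 * (confidence if correct else -confidence)
```

Because the term is linear in confidence, a confident wrong answer costs exactly as much as a confident right answer gains. If an agent is right with probability p, the expected bonus is 0.1 × confidence × (2p − 1), which is positive only when p exceeds 0.5.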

	---

	## Baseline Scores

	Scores reported for Llama-3.1-8B-Instruct with `temperature=0.2` and `top_p=0.95`:

	| Task | Score | Steps | Notes |
	|---|---|---|---|
	| `text_spam` | 0.72 | 5 | Strong on obvious spam; struggles with phishing disguised as legitimate |
	| `content_moderation` | 0.58 | 8 | Good binary decisions; incomplete label coverage (F1 ≈0.52) |
	| `deepfake_detection` | 0.44 | 10 | Relies on image descriptions; independent detector signals underutilized |

	---

	## Setup & Usage

	### Requirements

	- Python: 3.11 or higher
	- Docker (optional, for containerized deployment)
	- GPU (optional, recommended for deepfake models): CUDA 12.1+
	- Memory: 8GB+ RAM (16GB recommended for local LLM inference)
	- Disk: 10GB+ (models cached in `~/.cache/huggingface/`)

	### Local Installation

	1. Clone and navigate:
	```bash
	git clone https://github.com/Anidipta/Content-Moderation-env.git
	cd Content-Moderation-env
	```

	2. Create virtual environment:
	```bash
	python -m venv venv
	source venv/bin/activate # On Windows: venv\Scripts\activate
	```

	3. Install dependencies:
	```bash
	pip install -r server/requirements.txt
	```

	4. Start the server:
	```bash
	uvicorn server.main:app --host 0.0.0.0 --port 7860
	```

	Server runs at `http://localhost:7860`

	5. Access API documentation:
	- Swagger UI: `http://localhost:7860/docs`
	- ReDoc: `http://localhost:7860/redoc`

	### Docker Deployment

	#### Build the Image

	```bash
	# Basic build
	docker build -f server/Dockerfile -t content-moderation-env .

	# Build with memory allocation (recommended)
	docker build --memory=4g -f server/Dockerfile -t content-moderation-env .

	# Build with progress output
	docker build --progress=plain -f server/Dockerfile -t content-moderation-env .
	```

	#### Run the Container

	```bash
	# Basic run
	docker run -p 7860:7860 content-moderation-env

	# Run with environment variables
	docker run -p 7860:7860 \
	-e API_BASE_URL="https://router.huggingface.co/v1" \
	-e MODEL_NAME="meta-llama/Llama-3.1-8B-Instruct" \
	-e HF_TOKEN="hf_your_token_here" \
	content-moderation-env

	# Run with GPU support
	docker run --gpus all -p 7860:7860 content-moderation-env

	# Run with volume mounts (cache models locally)
	docker run -p 7860:7860 \
	-v ~/.cache/huggingface:/app/.cache/huggingface \
	content-moderation-env

	# Run in background
	docker run -d -p 7860:7860 --name moderation-env content-moderation-env

	# Check logs
	docker logs moderation-env

	# Stop container
	docker stop moderation-env
	```

	#### Dockerfile Details

	The [server/Dockerfile](server/Dockerfile) uses:
	- Base Image: `python:3.11-slim` (~300MB) — minimal footprint with Python runtime
	- System Dependencies: `libgl1 libglib2.0-0 curl` — required for vision models and health checks
	- Dependencies Installation: Multi-stage approach with pip cache optimization
	- Model Preloading: Deepfake detection model downloaded during build for faster startup
	- Environment Setup: HuggingFace cache directories and Python settings pre-configured
	- Entry Point: FastAPI app via Uvicorn on port 7860

	```dockerfile
	# Key optimizations:
	# - --no-cache-dir: Reduces image size by 50%
	# - --no-build-isolation: Prevents memory spikes during pip install
	# - Pre-downloaded models: Eliminates first-run delays
	# - Minimal dependencies: Only libraries needed for the environment
	```

	#### Deployment to Production

	Docker Compose:
	```yaml
	version: '3.8'
	services:
	  moderation-api:
	    build:
	      context: .
	      dockerfile: server/Dockerfile
	    ports:
	      - "7860:7860"
	    environment:
	      - API_BASE_URL=https://router.huggingface.co/v1
	      - MODEL_NAME=meta-llama/Llama-3.1-8B-Instruct
	      - HF_TOKEN=${HF_TOKEN}
	    volumes:
	      - ~/.cache/huggingface:/app/.cache/huggingface
	    restart: unless-stopped
	    healthcheck:
	      test: ["CMD", "curl", "-f", "http://localhost:7860/health"]
	      interval: 30s
	      timeout: 10s
	      retries: 3
	```

	Run with: `docker-compose up -d`
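
After starting the stack, a script may need to wait until `/health` responds before sending requests. The sketch below mirrors the healthcheck settings from the Compose file (3 attempts, 30 s apart, 10 s timeout) using only the standard library. The helper names are hypothetical, and the `probe` and `sleep` parameters exist so the loop can be exercised without a live server.

```python
import time
import urllib.request
from typing import Callable


def http_ok(url: str, timeout: float = 10.0) -> bool:
    """Return True if a GET to `url` answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:  # covers URLError, HTTPError, refused connections, timeouts
        return False


def wait_until_healthy(
    url: str,
    attempts: int = 3,
    interval: float = 30.0,
    probe: Callable[[str], bool] = http_ok,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll `url` up to `attempts` times, pausing `interval` seconds between tries."""
    for attempt in range(attempts):
        if probe(url):
            return True
        if attempt < attempts - 1:
            sleep(interval)
    return False


# Example (requires a running server):
# wait_until_healthy("http://localhost:7860/health")
```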

	### HuggingFace Spaces Deployment

	1. Create a new Space with the Docker SDK
	2. Add Secrets (Settings → Repository secrets):
	   - `HF_TOKEN`: Your HuggingFace API token
	3. Add Variables (Settings → Repository variables):
	   - `API_BASE_URL`: `https://router.huggingface.co/v1`
	   - `MODEL_NAME`: `meta-llama/Llama-3.1-8B-Instruct`
	4. Push this repository to the Space
	5. The Space URL becomes your `PING_URL` for validation scripts

	---

	## Running the Inference Script

	```bash
	# API mode (HF inference endpoint)
	export API_BASE_URL="https://router.huggingface.co/v1"
	export MODEL_NAME="meta-llama/Llama-3.1-8B-Instruct"
	export HF_TOKEN="hf_your_token_here"
	export SERVER_URL="http://localhost:7860"
	export TASK_NAME="text_spam"

	python inference.py

	# Local transformers pipeline mode
	export USE_LOCAL_MODEL="true"
	python inference.py
	```

	### Output Format

	```
	[START] task=text_spam env=content_moderation_env model=meta-llama/Llama-3.1-8B-Instruct
	[STEP] step=1 action={"decision":"reject","confidence":0.9,"labels":["spam"]} reward=0.85 done=false error=null
	[STEP] step=2 action={"decision":"approve","confidence":0.8,"labels":[]} reward=0.75 done=false error=null
	[STEP] step=3 action={"decision":"escalate","confidence":0.5,"labels":["scam"]} reward=0.30 done=false error=null
	[STEP] step=4 action={"decision":"reject","confidence":0.85,"labels":["phishing"]} reward=0.70 done=false error=null
	[STEP] step=5 action={"decision":"approve","confidence":0.88,"labels":[]} reward=0.75 done=true error=null
	[END] success=true steps=5 score=0.720 rewards=0.85,0.75,0.30,0.70,0.75
	```

	| Field | Type | Description |
	|---|---|---|
	| `task` | string | The task being evaluated |
	| `step` | int | Current step number in episode |
	| `decision` | string | Agent's moderation decision |
	| `confidence` | float | Agent's confidence (0-1) |
	| `labels` | array | Detected violation labels |
	| `reward` | float | Reward received for this step |
	| `done` | boolean | Episode completion flag |
	| `error` | string/null | Error message if applicable |
	| `score` | float | Final episode score |
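
For post-processing runs, the `[STEP]` lines can be parsed back into structured records. This is a sketch that assumes the exact layout shown in the sample output above (single spaces between fields, a JSON object for `action`); `parse_step_line` is a hypothetical helper and is not part of `inference.py`.

```python
import json
import re

STEP_RE = re.compile(
    r"^\[STEP\] step=(?P<step>\d+) action=(?P<action>\{.*\}) "
    r"reward=(?P<reward>-?\d+(?:\.\d+)?) done=(?P<done>true|false) error=(?P<error>.*)$"
)


def parse_step_line(line: str):
    """Parse one [STEP] log line into a dict, or return None if it does not match."""
    match = STEP_RE.match(line.strip())
    if match is None:
        return None
    return {
        "step": int(match["step"]),
        "action": json.loads(match["action"]),
        "reward": float(match["reward"]),
        "done": match["done"] == "true",
        "error": None if match["error"] == "null" else match["error"],
    }
```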

	---

	## API Reference

	### Server Endpoints

	Request and response bodies are JSON, and FastAPI validates incoming requests automatically.

	#### 1. Reset Episode
	POST `/reset`

	Start a new moderation episode.

	Request Body:
	```json
	{
	"task": "text_spam"
	}
	```

	Response (200 OK):
	```json
	{
	"observation": {
	"content_id": "ts_001",
	"content_type": "text",
	"text": "CONGRATULATIONS! You've won $1,000,000!...",
	"metadata": {"source": "email", "sender_reputation": 0.05, "link_count": 3},
	"step_num": 1,
	"total_steps": 10
	},
	"info": {}
	}
	```

	Error (400):
	```json
	{
	"detail": "Unknown task 'invalid_task'. Valid: ['text_spam', 'content_moderation', 'deepfake_detection']"
	}
	```

	---

	#### 2. Submit Action
	POST `/step`

	Submit a moderation action for the current content.

	Request Body:
	```json
	{
	"decision": "reject",
	"reason": "Email contains typical spam patterns and suspicious links",
	"confidence": 0.92,
	"labels": ["spam", "scam"]
	}
	```

	Response (200 OK):
	```json
	{
	"observation": {
	"content_id": "ts_002",
	"content_type": "text",
	"text": "Hi Sarah, confirming our meeting tomorrow...",
	"metadata": {"source": "email", "sender_reputation": 0.92, "link_count": 0},
	"step_num": 2,
	"total_steps": 10
	},
	"reward": 0.85,
	"done": false,
	"info": {}
	}
	```
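
Putting `/reset` and `/step` together, a minimal client loop might look like the sketch below. It uses only the standard library and assumes the request and response shapes shown above. `build_request`, `call`, and `run_episode` are illustrative names, and the `transport` parameter is there so the loop can be tested without a running server.

```python
import json
import urllib.request
from typing import Any, Callable, Optional


def build_request(base_url: str, path: str, payload: Optional[dict] = None) -> urllib.request.Request:
    """Build a JSON POST when a payload is given, otherwise a plain GET."""
    data = None if payload is None else json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + path,
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST" if payload is not None else "GET",
    )


def call(base_url: str, path: str, payload: Optional[dict] = None) -> dict:
    """Send the request and decode the JSON response."""
    with urllib.request.urlopen(build_request(base_url, path, payload), timeout=30) as resp:
        return json.loads(resp.read().decode("utf-8"))


def run_episode(
    base_url: str,
    task: str,
    policy: Callable[[dict], dict],
    transport: Callable[[str, str, Optional[dict]], dict] = call,
) -> list[float]:
    """Reset the environment, then step with `policy` until `done` is true."""
    result: dict[str, Any] = transport(base_url, "/reset", {"task": task})
    rewards = []
    done = False
    while not done:
        action = policy(result["observation"])
        result = transport(base_url, "/step", action)
        rewards.append(result["reward"])
        done = result["done"]
    return rewards
```

Here `policy` is any function that maps an observation dict to an action dict, for example a wrapper around an LLM call.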

	---

	#### 3. Get Current State
	GET `/state`

	Retrieve the current episode state without taking an action.

	Response (200 OK):
	```json
	{
	"observation": {...},
	"reward": 0.85,
	"done": false,
	"info": {
	"task": "text_spam",
	"items_completed": 2,
	"total_items": 10,
	"cumulative_reward": 1.60
	}
	}
	```

	---

	#### 4. Close Episode
	POST `/close`

	Explicitly close the episode and clean up resources.

	Response (200 OK):
	```json
	{
	"status": "closed",
	"final_reward": 7.20,
	"steps_completed": 10
	}
	```

	---

	#### 5. List Available Tasks
	GET `/tasks`

	Get metadata about all available tasks.

	Response (200 OK):
	```json
	{
	"text_spam": {
	"description": "Classify email/message content as spam or legitimate",
	"difficulty": "easy",
	"num_items": 50,
	"content_type": "text"
	},
	"content_moderation": {
	"description": "Detect policy violations in social media posts",
	"difficulty": "medium",
	"num_items": 40,
	"content_type": "text"
	},
	"deepfake_detection": {
	"description": "Identify AI-manipulated media",
	"difficulty": "hard",
	"num_items": 30,
	"content_type": "multimodal"
	}
	}
	```

	---

	#### 6. Health Check
	GET `/health`

	Check server health and status.

	Response (200 OK):
	```json
	{
	"status": "ok"
	}
	```

	---

	#### 7. Root Endpoint
	GET `/`

	Redirects to interactive Swagger UI documentation.

	---

	## Project Structure

	```
	content-moderation-env/
	│
	├── README.md # This file
	├── uv.lock # Dependency lock file (UV package manager)
	├── inference.py # Baseline agent script (235 lines)
	│ # Demonstrates LLM agent interaction
	│ # Supports HF API and local inference modes
	│
	├── server/ # FastAPI application (core)
	│ ├── __init__.py # Package marker (empty)
	│ │
	│ ├── main.py # FastAPI app & HTTP endpoints (57 lines)
	│ │ # Defines: /reset, /step, /state, /close
	│ │ # /tasks, /health, / endpoints
	│ │
	│ ├── env.py # OpenEnv environment implementation (122 lines)
	│ │ # Core logic: reset(), step(), state(), close()
	│ │ # Thread-safe with locks for concurrency
	│ │
	│ ├── models.py # Pydantic data models
	│ │ # Defines: ContentObservation, ModerationAction
	│ │ # StepResult, ResetResult, EnvState
	│ │
	│ ├── tasks.py # Task datasets & ground truth (193 lines)
	│ │ # Contains: text_spam, content_moderation,
	│ │ # deepfake_detection task definitions & items
	│ │
	│ ├── graders.py # Reward functions per task (95 lines)
	│ │ # Implements: label F1, calibration bonus,
	│ │ # decision accuracy scoring logic
	│ │
	│ ├── deepfake_model.py # HF deepfake detection pipeline (90 lines)
	│ │ # Lazy-loads: dima806/deepfake_vs_real...
	│ │ # Caches model in HF_HOME for reuse
	│ │
	│ ├── openenv.yaml # OpenEnv specification metadata
	│ │ # Declares task specs, observation/action space
	│ │
	│ ├── Dockerfile # Docker container definition
	│ │ # Base: python:3.11-slim (~300MB)
	│ │ # Installs system deps, pip packages,
	│ │ # pre-downloads deepfake model
	│ │
	│ └── requirements.txt # Python dependencies (12 packages)
	│ # Key: fastapi, uvicorn, transformers,
	│ # torch, openai, python-dotenv
	│
	├── test/ # Test suite
	│ └── test.py # pytest tests (20+ test cases)
	│ # Coverage: tasks, endpoints, rewards
	│
	└── .env # Environment variables (git-ignored)
	# Stores: HF_TOKEN, API_BASE_URL, etc.
	```

	---

	## Environment Variables

	Configuration is controlled via environment variables. Create a `.env` file in the project root:

	```env
	# ============ API Configuration ============
	API_BASE_URL=https://router.huggingface.co/v1
	# URL of the LLM inference endpoint
	# Default: HuggingFace router (requires HF_TOKEN)

	MODEL_NAME=meta-llama/Llama-3.1-8B-Instruct
	# Which LLM to use for agent inference
	# Other options: gpt-3.5-turbo, claude-3-opus, mistral-large, etc.

	HF_TOKEN=hf_your_token_here
	# HuggingFace API token for authenticated requests
	# Get from: https://huggingface.co/settings/tokens

	# ============ Server Configuration ============
	SERVER_URL=http://localhost:7860
	# Where the OpenEnv API server runs
	# Used by inference.py to connect to environment

	# ============ Task & Inference Configuration ============
	TASK_NAME=text_spam
	# Which task to run: text_spam, content_moderation, deepfake_detection

	USE_LOCAL_MODEL=false
	# If true: Load Llama-3.1-8B locally via transformers
	# If false: Use remote API (requires HF_TOKEN)
	# Local mode requires 16GB+ RAM

	# ============ HuggingFace Model Caching ============
	HF_HOME=/app/.cache/huggingface
	# Directory for cached HF models and datasets
	# Mounted as volume in Docker for persistence

	TRANSFORMERS_CACHE=/app/.cache/huggingface
	# Legacy env var for transformers library caching (deprecated in favor of HF_HOME in recent transformers releases)

	# ============ Python Configuration ============
	PYTHONDONTWRITEBYTECODE=1
	# Don't create __pycache__ directories

	PYTHONUNBUFFERED=1
	# Stream logs immediately (useful in Docker)

	# ============ Logging ============
	LOG_LEVEL=INFO
	# Log level: DEBUG, INFO, WARNING, ERROR, CRITICAL
	```

	### Variable Precedence

	1. Environment variables (highest priority)
	2. `.env` file
	3. Hardcoded defaults in code (lowest priority)

	Example override:
	```bash
	export HF_TOKEN="hf_custom_token" && python inference.py
	# Uses custom token instead of .env value
	```
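
The precedence order can be illustrated with a small resolver. This is a standard-library sketch, not the project's loading code (the dependency list mentions `python-dotenv`, whose `load_dotenv()` leaves already-set environment variables untouched by default, giving the same order). The `DEFAULTS` values are taken from the example `.env` above and are an assumption about the in-code defaults.

```python
import os
from typing import Mapping, Optional

# Assumption: in-code defaults match the example .env values shown above.
DEFAULTS = {
    "API_BASE_URL": "https://router.huggingface.co/v1",
    "MODEL_NAME": "meta-llama/Llama-3.1-8B-Instruct",
    "SERVER_URL": "http://localhost:7860",
    "TASK_NAME": "text_spam",
    "USE_LOCAL_MODEL": "false",
}


def parse_dotenv(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def resolve(name: str, dotenv: Mapping[str, str], environ: Mapping[str, str] = os.environ) -> Optional[str]:
    """Environment variable first, then the .env file, then the hardcoded default."""
    if name in environ:
        return environ[name]
    if name in dotenv:
        return dotenv[name]
    return DEFAULTS.get(name)
```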

	---

	## Running Tests

	The project includes a comprehensive test suite using pytest.

	### Setup

	```bash
	pip install pytest pytest-cov
	```

	### Run All Tests

	```bash
	pytest test/test.py -v
	```

	### Run Specific Test Class

	```bash
	pytest test/test.py::TestTasks -v
	```

	### Run with Coverage Report

	```bash
	pytest test/test.py --cov=server --cov-report=html
	# Writes the report to htmlcov/; open htmlcov/index.html in a browser to view coverage
	```

	### Test Categories

	| Test | Coverage | Status |
	|---|---|---|
	| Task loading | All 3 tasks initialize correctly | ✓ |
	| API endpoints | /reset, /step, /state, /close, /tasks, /health | ✓ |
	| Reward grading | text_spam, content_moderation, deepfake_detection | ✓ |
	| Input validation | Action schema validation, label validation | ✓ |
	| Edge cases | Empty labels, out-of-range confidence, etc. | ✓ |

	---

	## Troubleshooting

	### Installation Issues

	Problem: `ImportError: No module named 'openai'`
	```bash
	# Solution: install the OpenAI client
	pip install "openai>=1.40.0"
	```

	Problem: `ImportError: No module named 'torch'`
	```bash
	# Solution: install PyTorch
	pip install torch torchvision
	# For GPU (CUDA 12.1 wheels): pip install torch torchvision --index-url https://download.pytorch.org/whl/cu121
	```

	Problem: `FileNotFoundError: requirements.txt`
	```bash
	# Solution: run the install from the project root
	cd content-moderation-env/
	pip install -r server/requirements.txt
	```

	### Docker Issues

	Problem: `Segmentation fault (core dumped)` during build
	```
	Solution: Allocate more memory to Docker build:
	docker build --memory=8g -f server/Dockerfile -t content-moderation-env .
	```

	Problem: `failed to solve: failed to compute cache key`
	```
	Solution: Ensure requirements.txt is in server/ directory:
	# Current: server/requirements.txt (correct)
	# Wrong: ./requirements.txt
	```

	Problem: Port 7860 already in use
	```bash
	# Solution: map a different host port
	docker run -p 8000:7860 content-moderation-env
	# Now access at http://localhost:8000
	```

	### Runtime Issues

	Problem: `Connection refused: localhost:7860`
	```bash
	# Solution: make sure the server is running
	uvicorn server.main:app --host 0.0.0.0 --port 7860

	# In Docker, inspect the container logs instead:
	docker logs <container_id>
	```

	Problem: `Client.__init__() got an unexpected keyword argument 'proxies'`
	```bash
	# Solution: upgrade the OpenAI client
	pip install --upgrade openai
	```

	Problem: HuggingFace models downloading very slowly
	```bash
	# Solution: check your internet connection and verify HF_TOKEN
	export HF_TOKEN="hf_your_token_here"
	# Or download models ahead of time
	python -c "from transformers import pipeline; pipeline('image-classification', model='dima806/deepfake_vs_real_image_detection')"
	```

	### API Issues

	Problem: Calling `/step` before `/reset`
	```
	Error: "Environment not initialized. Call /reset first."
	Solution: Always call POST /reset before any /step requests
	```

	Problem: Invalid label in action
	```
	Error: {"detail": "Invalid label: 'unknown_label'"}
	Solution: Use only valid labels from the specification
	```

	Problem: Confidence out of range
	```
	Solution: Ensure confidence is between 0.0 and 1.0
	```

	---

	## Citation

	If you use this environment in your research, please cite:

	```bibtex
	@software{content_moderation_openenv_2025,
	title={Content Moderation OpenEnv: A Real-World AI Triage Environment},
	author={Anidipta},
	year={2025},
	url={https://github.com/Anidipta/Content-Moderation-env},
	note={OpenEnv Specification Compliant}
	}
	```

	---

	## Acknowledgements

	🙏 Built for the OpenEnv Hackathon 2025.

	Special Thanks To:
	- OpenEnv community for the specification and framework
	- HuggingFace for model hosting and inference APIs
	- Meta for the Llama-3.1-8B-Instruct model
	- Contributors and testers who improved the environment

	Dataset & Content Note:
	The email and content corpus is entirely synthetic and does not represent any real individuals, companies, organizations, or actual events. All examples are generated for demonstration and testing purposes only.

	License: MIT License — See [LICENSE](LICENSE) file for details

	Questions? Open an issue on GitHub or contact the maintainers.

	---

	Last Updated: April 8, 2026 | OpenEnv Spec Version: 1.0