---
title: Content Moderation OpenEnv
emoji: 🛡️
colorFrom: blue
colorTo: indigo
sdk: docker
app_file: app.py
pinned: false
---

# Content Moderation OpenEnv

An AI content moderation environment built to the OpenEnv specification. Agents triage real-world content — spam emails, harmful social media posts, and AI-generated deepfakes — using a standard `step()` / `reset()` / `state()` API.

[![OpenEnv Spec](https://img.shields.io/badge/OpenEnv-Spec-blue)](https://github.com/openenv-core/spec)
[![Python 3.11+](https://img.shields.io/badge/Python-3.11+-blue.svg)](https://www.python.org/downloads/)
[![FastAPI](https://img.shields.io/badge/FastAPI-0.111.0-green.svg)](https://fastapi.tiangolo.com/)
[![Docker](https://img.shields.io/badge/Docker-Ready-blue.svg)](https://www.docker.com/)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](LICENSE)

---

## 📋 Table of Contents

- [Environment Description & Motivation](#environment-description--motivation)
- [Task Descriptions](#task-descriptions)
- [Observation Space](#observation-space)
- [Action Space](#action-space)
- [Reward Functions](#reward-functions)
- [Baseline Scores](#baseline-scores)
- [Setup & Usage](#setup--usage)
  - [Requirements](#requirements)
  - [Local Installation](#local-installation)
  - [Docker Deployment](#docker-deployment)
  - [HuggingFace Spaces Deployment](#huggingface-spaces-deployment)
- [Running the Inference Script](#running-the-inference-script)
- [API Reference](#api-reference)
- [Project Structure](#project-structure)
- [Environment Variables](#environment-variables)
- [Running Tests](#running-tests)
- [Troubleshooting](#troubleshooting)
- [Citation](#citation)
- [Acknowledgements](#acknowledgements)

---

## Environment Description & Motivation

Content moderation is a high-stakes, high-volume real-world task. Human moderators review millions of items daily across platforms and inboxes. This environment simulates a realistic moderation pipeline across three difficulty levels, enabling AI agents to learn decision-making strategies under resource constraints.

**Key Challenges:**
- Multi-label classification with imbalanced datasets
- Confidence calibration under uncertainty
- Real-world content variability (spam, deepfakes, policy violations)
- Escalation vs. immediate action tradeoffs

| Task | Difficulty | Content Type | Metrics | Description |
|---|---|---|---|---|
| `text_spam` | Easy | Email / SMS | Binary classification + confidence | Spam vs. legitimate email filtering with sender reputation signals |
| `content_moderation` | Medium | Social media posts | Multi-label (up to 11 labels) | Detect policy violations (hate speech, harassment, violence, etc.) |
| `deepfake_detection` | Hard | Multimodal (image + text) | Detection accuracy + labels | Identify AI-manipulated media using vision transformer + metadata |

**Models Used:**
- **Deepfake Detection**: `dima806/deepfake_vs_real_image_detection` (ViT, ~300MB)
- **Language Model**: `meta-llama/Llama-3.1-8B-Instruct` (8B params, via transformers or HF API)
- **Infrastructure**: FastAPI + Uvicorn, Docker containerization

---

## Task Descriptions

### Task 1: Text Spam Classification (Easy)
**Objective:** Binary classification of emails/SMS as spam or legitimate.

**Dataset:**
- 50 items (30 spam, 20 legitimate)
- Features: text content, sender reputation score, link count, source
- Ground truth: decision + labels

**Example:**
```json
{
  "content_id": "ts_001",
  "content_type": "text",
  "text": "CONGRATULATIONS! You've won $1,000,000! Click here NOW to claim your prize!!!",
  "metadata": {"source": "email", "sender_reputation": 0.05, "link_count": 3},
  "ground_truth": {
    "decision": "reject",
    "labels": ["spam", "scam"],
    "is_harmful": true
  }
}
```

---

### Task 2: Content Moderation (Medium)
**Objective:** Multi-label classification of social media posts for policy violations.

**Dataset:**
- 40 items across diverse platforms
- Labels: spam, scam, phishing, hate_speech, violence, harassment, misinformation, adult_content, deepfake, political_manipulation, fraud
- Features: post text, engagement metrics, user reputation, report count

**Violation Categories:**
| Category | Definition | Examples |
|---|---|---|
| Hate Speech | Dehumanizing content targeting identity | Slurs, discrimination, incitement |
| Violence | Threats or glorification of violence | Physical harm, weapon promotion |
| Harassment | Coordinated or severe personal attacks | Doxxing, targeted campaigns |
| Misinformation | False claims with societal impact | Election fraud claims, health hoaxes |

---

### Task 3: Deepfake Detection (Hard)
**Objective:** Detect AI-manipulated media and classify content appropriately.

**Dataset:**
- 30 items (multimodal: images + descriptions)
- Deepfake detection model outputs raw confidence scores (0-1)
- Features: image description, detector_score, metadata

**Detector Score Interpretation:**
- `0.0-0.3`: Likely real/authentic
- `0.3-0.7`: Uncertain, may require additional analysis
- `0.7-1.0`: Likely deepfake/manipulated
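
The bands above can be expressed as a small helper. This is only a sketch: the function name `interpret_detector_score` and the band names are hypothetical, and because the listed ranges share their endpoints, placing exactly 0.3 and 0.7 in the uncertain band is an assumption rather than something the environment specifies.

```python
def interpret_detector_score(score: float) -> str:
    """Map a raw detector score in [0, 1] to one of the three bands above."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"detector_score must be in [0, 1], got {score}")
    if score < 0.3:
        return "likely_real"
    if score <= 0.7:
        # Assumption: the shared endpoints 0.3 and 0.7 count as uncertain.
        return "uncertain"
    return "likely_deepfake"
```

An agent might use the returned band as one signal among several, for example escalating items that land in the uncertain band.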

**Example:**
```json
{
  "content_id": "df_001",
  "content_type": "multimodal",
  "image_description": "Portrait of person in business attire, lighting appears natural",
  "detector_score": 0.82,
  "metadata": {"platform": "social_media", "report_count": 3}
}
```

---

## Observation Space

Every step returns a `ContentObservation` with the following structure:

```json
{
  "content_id": "string",
  "content_type": "text | multimodal",
  "text": "string (optional, for text tasks)",
  "image_description": "string (optional, deepfake task only)",
  "detector_score": "float 0.0-1.0 (optional, deepfake task only)",
  "metadata": {
    "source": "email | social_media | platform",
    "sender_reputation": "float 0.0-1.0",
    "link_count": 0,
    "report_count": 0,
    "timestamp": "ISO8601"
  },
  "step_num": 1,
  "total_steps": 10
}
```

| Field | Type | Task | Description |
|---|---|---|---|
| `content_id` | string | All | Unique identifier for the content item |
| `content_type` | string | All | Type of content: `text` or `multimodal` |
| `text` | string | text_spam, content_moderation | The actual email/post body |
| `image_description` | string | deepfake_detection | AI-generated description of the image |
| `detector_score` | float | deepfake_detection | Raw output from deepfake model (0-1) |
| `metadata` | object | All | Platform-specific signals (reputation, reports, etc.) |
| `step_num` | int | All | Current step in episode |
| `total_steps` | int | All | Total steps in this episode |
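
On the client side, an observation payload can be loaded into a typed container. The sketch below uses a stdlib dataclass as a stand-in; it is not the project's Pydantic model from `server/models.py`, and the `from_dict` helper and its checks are assumptions made for illustration.

```python
from dataclasses import dataclass, fields
from typing import Any, Optional


@dataclass
class ContentObservation:
    """Stdlib stand-in for the observation schema in the table above."""
    content_id: str
    content_type: str
    metadata: dict[str, Any]
    step_num: int
    total_steps: int
    text: Optional[str] = None               # text_spam, content_moderation
    image_description: Optional[str] = None  # deepfake_detection only
    detector_score: Optional[float] = None   # deepfake_detection only

    @classmethod
    def from_dict(cls, payload: dict[str, Any]) -> "ContentObservation":
        # Ignore keys this sketch does not know about.
        known = {f.name for f in fields(cls)}
        obs = cls(**{k: v for k, v in payload.items() if k in known})
        if obs.content_type not in ("text", "multimodal"):
            raise ValueError(f"unexpected content_type: {obs.content_type!r}")
        if obs.detector_score is not None and not 0.0 <= obs.detector_score <= 1.0:
            raise ValueError("detector_score must be in [0, 1]")
        return obs
```

A missing required field surfaces as a `TypeError` from the dataclass constructor, which is usually enough for a quick client-side sanity check.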

---

## Action Space

Agents must respond with a `ModerationAction` object:

```json
{
  "decision": "approve | reject | escalate | flag",
  "reason": "string (max 200 chars, explanation)",
  "confidence": "float 0.0-1.0",
  "labels": ["spam", "scam", "phishing", "hate_speech", "violence",
             "harassment", "misinformation", "adult_content",
             "deepfake", "political_manipulation", "fraud"]
}
```

| Field | Type | Range | Description |
|---|---|---|---|
| `decision` | enum | {approve, reject, escalate, flag} | **approve**: Safe content; **reject**: Clear violation, remove immediately; **flag**: Borderline, needs review; **escalate**: Uncertain, route to specialist |
| `reason` | string | 0-200 chars | Brief explanation of the decision |
| `confidence` | float | [0.0, 1.0] | Agent's confidence in this decision. Calibration is rewarded. |
| `labels` | array | 0-11 labels | Detected violation categories. Must be valid from the allowed set. |

**Valid Labels:**
- `spam` — Unsolicited bulk messaging
- `scam` — Financial or identity theft attempts
- `phishing` — Social engineering attacks
- `hate_speech` — Dehumanizing content targeting identity groups
- `violence` — Threats, glorification, or instruction of violence
- `harassment` — Personal attacks, doxxing, coordinated abuse
- `misinformation` — False claims with societal impact
- `adult_content` — NSFW or sexually explicit material
- `deepfake` — AI-manipulated media
- `political_manipulation` — Coordinated inauthentic behavior
- `fraud` — Financial scams, false claims
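
Before submitting an action, an agent can check it against the constraints listed in this section. The following is a minimal client-side sketch; `validate_action` is a hypothetical helper and does not reproduce the server's own validation or error messages.

```python
VALID_DECISIONS = {"approve", "reject", "escalate", "flag"}
VALID_LABELS = {
    "spam", "scam", "phishing", "hate_speech", "violence", "harassment",
    "misinformation", "adult_content", "deepfake", "political_manipulation",
    "fraud",
}


def validate_action(action: dict) -> list[str]:
    """Return a list of problems; an empty list means the action looks valid."""
    problems = []
    if action.get("decision") not in VALID_DECISIONS:
        problems.append(f"invalid decision: {action.get('decision')!r}")
    reason = action.get("reason", "")
    if not isinstance(reason, str) or len(reason) > 200:
        problems.append("reason must be a string of at most 200 characters")
    confidence = action.get("confidence")
    is_number = isinstance(confidence, (int, float)) and not isinstance(confidence, bool)
    if not is_number or not 0.0 <= confidence <= 1.0:
        problems.append("confidence must be a number in [0.0, 1.0]")
    unknown = [label for label in action.get("labels", []) if label not in VALID_LABELS]
    if unknown:
        problems.append(f"unknown labels: {unknown}")
    return problems
```

Returning a list of problems rather than raising on the first one makes it easier to feed all issues back to an LLM agent in a single retry prompt.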

---

## Reward Functions

Rewards are computed per task based on decision accuracy, label coverage (F1), and confidence calibration.
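
The label-coverage term can be illustrated with a set-based F1 score. This sketch is not taken from `server/graders.py`; the function name `label_f1` and the treatment of two empty label sets as a perfect match are assumptions.

```python
def label_f1(predicted: list[str], truth: list[str]) -> float:
    """Set-based F1 between predicted and ground-truth labels."""
    pred, true = set(predicted), set(truth)
    if not pred and not true:
        # Assumption: predicting no labels for clean content is a perfect match.
        return 1.0
    overlap = len(pred & true)
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(true)
    return 2 * precision * recall / (precision + recall)
```

For instance, predicting only `spam` when the ground truth is `spam` and `scam` gives precision 1.0 and recall 0.5, so F1 is 2/3.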

### text_spam (Easy)

| Component | Reward | Condition |
|---|---|---|
| Correct decision | **+0.65** | `decision` matches ground truth |
| Escalate on harmful | **+0.30** | Harmful content + escalate/flag (partial credit) |
| Label F1 contribution | **+0.20** | F1 score of predicted vs. true labels |
| Confidence calibration | **±0.10** | Bonus if confident on correct, penalty if confident on wrong |
| **Max per step** | **1.00** | Sum of components (capped) |

### content_moderation (Medium)

| Component | Reward | Condition |
|---|---|---|
| Correct decision | **+0.50** | `decision` matches ground truth |
| Partial credit | **+0.25** | Harmful content + flag/escalate (conservative approach) |
| Label F1 contribution | **+0.35** | Multi-label F1 score (up to 11 labels) |
| Confidence calibration | **±0.10** | Brier score penalty for miscalibration |
| **Max per step** | **1.00** | Sum of components (capped) |

### deepfake_detection (Hard)

| Component | Reward | Condition |
|---|---|---|
| Correct decision | **+0.40** | `decision` matches ground truth |
| Deepfake detection | **+0.30** | Accuracy vs. detector_score threshold |
| Detector alignment | **+0.10** | Bonus for leveraging model signals |
| Label F1 contribution | **+0.20** | Multi-label F1 (fewer labels than medium task) |
| Confidence calibration | **±0.10** | Calibration error penalty |
| **Max per step** | **1.00** | Sum of components (capped) |

**Calibration Bonus Formula:**
```
bonus = 0.1 × (confidence if correct else -confidence)
```
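
To show how the components might combine, here is a sketch of a per-step reward for `text_spam` built from the table values and the formula above. It is not the code in `server/graders.py`: the function names are hypothetical, partial credit is assumed to apply only when the decision is wrong, and the floor at 0.0 is an assumption (the tables only state a cap of 1.00).

```python
def calibration_bonus(confidence: float, correct: bool) -> float:
    """0.1 x confidence when correct, -0.1 x confidence when wrong."""
    return 0.1 * (confidence if correct else -confidence)


def text_spam_reward(decision: str, truth_decision: str, is_harmful: bool,
                     label_f1: float, confidence: float) -> float:
    """Combine the text_spam components into a single step reward."""
    correct = decision == truth_decision
    if correct:
        base = 0.65
    elif is_harmful and decision in ("escalate", "flag"):
        base = 0.30  # partial credit for a cautious call on harmful content
    else:
        base = 0.0
    total = base + 0.20 * label_f1 + calibration_bonus(confidence, correct)
    # Cap at 1.0 as documented; the lower bound of 0.0 is an assumption.
    return max(0.0, min(1.0, total))
```

Under these assumptions, a correct `reject` with perfect labels and confidence 0.9 scores about 0.94, while a confidently wrong `approve` on harmful content with no matching labels is floored at 0.0.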

---

## Baseline Scores

Scores reported for **Llama-3.1-8B-Instruct** with `temperature=0.2` and `top_p=0.95`:

| Task | Score | Steps | Notes |
|---|---|---|---|
| `text_spam` | **0.72** | 5 | Strong on obvious spam; struggles with phishing disguised as legitimate |
| `content_moderation` | **0.58** | 8 | Good binary decisions; incomplete label coverage (F1 ≈0.52) |
| `deepfake_detection` | **0.44** | 10 | Relies on image descriptions; independent detector signals underutilized |

---

## Setup & Usage

### Requirements

- **Python**: 3.11 or higher
- **Docker** (optional, for containerized deployment)
- **GPU** (optional, recommended for deepfake models): CUDA 12.1+
- **Memory**: 8GB+ RAM (16GB recommended for local LLM inference)
- **Disk**: 10GB+ (models cached in `~/.cache/huggingface/`)

### Local Installation

1. **Clone and navigate:**
   ```bash
   git clone https://github.com/Anidipta/Content-Moderation-env.git
   cd Content-Moderation-env
   ```

2. **Create virtual environment:**
   ```bash
   python -m venv venv
   source venv/bin/activate  # On Windows: venv\Scripts\activate
   ```

3. **Install dependencies:**
   ```bash
   pip install -r server/requirements.txt
   ```

4. **Start the server:**
   ```bash
   uvicorn server.main:app --host 0.0.0.0 --port 7860
   ```

   Server runs at `http://localhost:7860`

5. **Access API documentation:**
   - Swagger UI: `http://localhost:7860/docs`
   - ReDoc: `http://localhost:7860/redoc`

### Docker Deployment

#### Build the Image

```bash
# Basic build
docker build -f server/Dockerfile -t content-moderation-env .

# Build with memory allocation (recommended)
docker build --memory=4g -f server/Dockerfile -t content-moderation-env .

# Build with progress output
docker build --progress=plain -f server/Dockerfile -t content-moderation-env .
```

#### Run the Container

```bash
# Basic run
docker run -p 7860:7860 content-moderation-env

# Run with environment variables
docker run -p 7860:7860 \
  -e API_BASE_URL="https://router.huggingface.co/v1" \
  -e MODEL_NAME="meta-llama/Llama-3.1-8B-Instruct" \
  -e HF_TOKEN="hf_your_token_here" \
  content-moderation-env

# Run with GPU support
docker run --gpus all -p 7860:7860 content-moderation-env

# Run with volume mounts (cache models locally)
docker run -p 7860:7860 \
  -v ~/.cache/huggingface:/app/.cache/huggingface \
  content-moderation-env

# Run in background
docker run -d -p 7860:7860 --name moderation-env content-moderation-env

# Check logs
docker logs moderation-env

# Stop container
docker stop moderation-env
```

#### Dockerfile Details

The [server/Dockerfile](server/Dockerfile) uses:
- **Base Image**: `python:3.11-slim` (~300MB) — minimal footprint with Python runtime
- **System Dependencies**: `libgl1 libglib2.0-0 curl` — required for vision models and health checks
- **Dependencies Installation**: Multi-stage approach with pip cache optimization
- **Model Preloading**: Deepfake detection model downloaded during build for faster startup
- **Environment Setup**: HuggingFace cache directories and Python settings pre-configured
- **Entry Point**: FastAPI app via Uvicorn on port 7860

**Key optimizations:**
- `--no-cache-dir`: Keeps pip's download cache out of the image, reducing image size by roughly 50%
- `--no-build-isolation`: Prevents memory spikes during pip install
- Pre-downloaded models: Eliminates first-run delays
- Minimal dependencies: Only libraries needed for the environment

#### Deployment to Production

**Docker Compose:**
```yaml
version: '3.8'
services:
  moderation-api:
    build:
      context: .
      dockerfile: server/Dockerfile
    ports:
      - "7860:7860"
    environment:
      - API_BASE_URL=https://router.huggingface.co/v1
      - MODEL_NAME=meta-llama/Llama-3.1-8B-Instruct
      - HF_TOKEN=${HF_TOKEN}
    volumes:
      - ~/.cache/huggingface:/app/.cache/huggingface
    restart: unless-stopped
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:7860/health"]
      interval: 30s
      timeout: 10s
      retries: 3
```

Run with: `docker-compose up -d`

### HuggingFace Spaces Deployment

1. Create a new Space with Docker SDK
2. Add Secrets (Settings → Repository secrets):
   - `HF_TOKEN`: Your HuggingFace API token
3. Add Variables (Settings → Repository variables):
   - `API_BASE_URL`: `https://router.huggingface.co/v1`
   - `MODEL_NAME`: `meta-llama/Llama-3.1-8B-Instruct`
4. Push this repository to the Space
5. Space URL becomes your `PING_URL` for validation scripts

---

## Running the Inference Script

```bash
# API mode (HF inference endpoint)
export API_BASE_URL="https://router.huggingface.co/v1"
export MODEL_NAME="meta-llama/Llama-3.1-8B-Instruct"
export HF_TOKEN="hf_your_token_here"
export SERVER_URL="http://localhost:7860"
export TASK_NAME="text_spam"

python inference.py

# Local transformers pipeline mode
export USE_LOCAL_MODEL="true"
python inference.py
```

### Output Format

```
[START] task=text_spam env=content_moderation_env model=meta-llama/Llama-3.1-8B-Instruct
[STEP] step=1 action={"decision":"reject","confidence":0.9,"labels":["spam"]} reward=0.85 done=false error=null
[STEP] step=2 action={"decision":"approve","confidence":0.8,"labels":[]} reward=0.75 done=false error=null
[STEP] step=3 action={"decision":"escalate","confidence":0.5,"labels":["scam"]} reward=0.30 done=false error=null
[STEP] step=4 action={"decision":"reject","confidence":0.85,"labels":["phishing"]} reward=0.70 done=false error=null
[STEP] step=5 action={"decision":"approve","confidence":0.88,"labels":[]} reward=0.75 done=true error=null
[END] success=true steps=5 score=0.720 rewards=0.85,0.75,0.30,0.70,0.75
```

| Field | Type | Description |
|---|---|---|
| `task` | string | The task being evaluated |
| `step` | int | Current step number in episode |
| `decision` | string | Agent's moderation decision |
| `confidence` | float | Agent's confidence (0-1) |
| `labels` | array | Detected violation labels |
| `reward` | float | Reward received for this step |
| `done` | boolean | Episode completion flag |
| `error` | string/null | Error message if applicable |
| `score` | float | Final episode score |
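
Log lines in this format are straightforward to parse for later analysis. The sketch below handles `[STEP]` lines as they appear in the sample output; `parse_step_line` is a hypothetical helper, and it assumes the `action` value is compact JSON immediately followed by ` reward=`.

```python
import json
import re

STEP_PATTERN = re.compile(
    r"^\[STEP\] step=(?P<step>\d+) action=(?P<action>\{.*\}) "
    r"reward=(?P<reward>-?\d+(?:\.\d+)?) done=(?P<done>true|false) "
    r"error=(?P<error>.*)$"
)


def parse_step_line(line: str) -> dict:
    """Parse one [STEP] log line into a dictionary of typed fields."""
    match = STEP_PATTERN.match(line.strip())
    if match is None:
        raise ValueError(f"not a [STEP] line: {line!r}")
    error = match["error"]
    return {
        "step": int(match["step"]),
        "action": json.loads(match["action"]),
        "reward": float(match["reward"]),
        "done": match["done"] == "true",
        "error": None if error == "null" else error,
    }
```

Collecting the parsed `reward` values across a run makes it easy to recompute per-episode statistics outside the inference script.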

---

## API Reference

### Server Endpoints

All endpoints are JSON-based with FastAPI's automatic validation.

#### 1. Reset Episode
**POST** `/reset`

Start a new moderation episode.

**Request Body:**
```json
{
  "task": "text_spam"
}
```

**Response (200 OK):**
```json
{
  "observation": {
    "content_id": "ts_001",
    "content_type": "text",
    "text": "CONGRATULATIONS! You've won $1,000,000!...",
    "metadata": {"source": "email", "sender_reputation": 0.05, "link_count": 3},
    "step_num": 1,
    "total_steps": 10
  },
  "info": {}
}
```

**Error (400):**
```json
{
  "detail": "Unknown task 'invalid_task'. Valid: ['text_spam', 'content_moderation', 'deepfake_detection']"
}
```

---

#### 2. Submit Action
**POST** `/step`

Submit a moderation action for the current content.

**Request Body:**
```json
{
  "decision": "reject",
  "reason": "Email contains typical spam patterns and suspicious links",
  "confidence": 0.92,
  "labels": ["spam", "scam"]
}
```

**Response (200 OK):**
```json
{
  "observation": {
    "content_id": "ts_002",
    "content_type": "text",
    "text": "Hi Sarah, confirming our meeting tomorrow...",
    "metadata": {"source": "email", "sender_reputation": 0.92, "link_count": 0},
    "step_num": 2,
    "total_steps": 10
  },
  "reward": 0.85,
  "done": false,
  "info": {}
}
```
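
The reset and step calls above can be chained into a simple episode loop. This is a sketch using only the standard library, not the project's `inference.py`; the helper names (`post_json`, `run_episode`, `always_escalate`) are hypothetical, and it assumes every `/step` response carries `reward` and `done` as shown.

```python
import json
from typing import Any, Callable
from urllib import request


def post_json(url: str, payload: dict[str, Any]) -> dict[str, Any]:
    """POST a JSON body and decode the JSON response."""
    req = request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with request.urlopen(req, timeout=30) as resp:
        return json.loads(resp.read().decode("utf-8"))


def run_episode(base_url: str, task: str,
                policy: Callable[[Any], dict[str, Any]],
                post: Callable[[str, dict], dict] = post_json) -> list[float]:
    """Reset the environment, then step until the server reports done."""
    observation = post(f"{base_url}/reset", {"task": task})["observation"]
    rewards: list[float] = []
    done = False
    while not done:
        result = post(f"{base_url}/step", policy(observation))
        rewards.append(result["reward"])
        done = result["done"]
        observation = result.get("observation")
    return rewards


def always_escalate(observation: Any) -> dict[str, Any]:
    """Placeholder policy: escalate everything with neutral confidence."""
    return {"decision": "escalate", "reason": "placeholder policy",
            "confidence": 0.5, "labels": []}
```

With the server running locally, `run_episode("http://localhost:7860", "text_spam", always_escalate)` would return the list of per-step rewards. The `post` parameter exists so the loop can be exercised with a fake transport in tests.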

---

#### 3. Get Current State
**GET** `/state`

Retrieve the current episode state without taking an action.

**Response (200 OK):**
```json
{
  "observation": {...},
  "reward": 0.85,
  "done": false,
  "info": {
    "task": "text_spam",
    "items_completed": 2,
    "total_items": 10,
    "cumulative_reward": 1.60
  }
}
```

---

#### 4. Close Episode
**POST** `/close`

Explicitly close the episode and clean up resources.

**Response (200 OK):**
```json
{
  "status": "closed",
  "final_reward": 7.20,
  "steps_completed": 10
}
```

---

#### 5. List Available Tasks
**GET** `/tasks`

Get metadata about all available tasks.

**Response (200 OK):**
```json
{
  "text_spam": {
    "description": "Classify email/message content as spam or legitimate",
    "difficulty": "easy",
    "num_items": 50,
    "content_type": "text"
  },
  "content_moderation": {
    "description": "Detect policy violations in social media posts",
    "difficulty": "medium",
    "num_items": 40,
    "content_type": "text"
  },
  "deepfake_detection": {
    "description": "Identify AI-manipulated media",
    "difficulty": "hard",
    "num_items": 30,
    "content_type": "multimodal"
  }
}
```

---

#### 6. Health Check
**GET** `/health`

Check server health and status.

**Response (200 OK):**
```json
{
  "status": "ok"
}
```

---

#### 7. Root Endpoint
**GET** `/`

Redirects to interactive Swagger UI documentation.

---

## Project Structure

```
content-moderation-env/
│
├── README.md                          # This file
├── uv.lock                            # Dependency lock file (UV package manager)
├── inference.py                       # Baseline agent script (235 lines)
│                                      # Demonstrates LLM agent interaction
│                                      # Supports HF API and local inference modes
│
├── server/                            # FastAPI application (core)
│   ├── __init__.py                    # Package marker (empty)
│   │
│   ├── main.py                        # FastAPI app & HTTP endpoints (57 lines)
│   │                                  # Defines: /reset, /step, /state, /close
│   │                                  # /tasks, /health, / endpoints
│   │
│   ├── env.py                         # OpenEnv environment implementation (122 lines)
│   │                                  # Core logic: reset(), step(), state(), close()
│   │                                  # Thread-safe with locks for concurrency
│   │
│   ├── models.py                      # Pydantic data models
│   │                                  # Defines: ContentObservation, ModerationAction
│   │                                  # StepResult, ResetResult, EnvState
│   │
│   ├── tasks.py                       # Task datasets & ground truth (193 lines)
│   │                                  # Contains: text_spam, content_moderation,
│   │                                  # deepfake_detection task definitions & items
│   │
│   ├── graders.py                     # Reward functions per task (95 lines)
│   │                                  # Implements: label F1, calibration bonus,
│   │                                  # decision accuracy scoring logic
│   │
│   ├── deepfake_model.py              # HF deepfake detection pipeline (90 lines)
│   │                                  # Lazy-loads: dima806/deepfake_vs_real...
│   │                                  # Caches model in HF_HOME for reuse
│   │
│   ├── openenv.yaml                   # OpenEnv specification metadata
│   │                                  # Declares task specs, observation/action space
│   │
│   ├── Dockerfile                     # Docker container definition
│   │                                  # Base: python:3.11-slim (~300MB)
│   │                                  # Installs system deps, pip packages,
│   │                                  # pre-downloads deepfake model
│   │
│   └── requirements.txt                # Python dependencies (12 packages)
│                                      # Key: fastapi, uvicorn, transformers,
│                                      # torch, openai, python-dotenv
│
├── test/                              # Test suite
│   └── test.py                        # pytest tests (20+ test cases)
│                                      # Coverage: tasks, endpoints, rewards
│
└── .env                               # Environment variables (git-ignored)
                                       # Stores: HF_TOKEN, API_BASE_URL, etc.
```

---

## Environment Variables

Configuration is controlled via environment variables. Create a `.env` file in the project root:

```env
# ============ API Configuration ============
API_BASE_URL=https://router.huggingface.co/v1
# URL of the LLM inference endpoint
# Default: HuggingFace router (requires HF_TOKEN)

MODEL_NAME=meta-llama/Llama-3.1-8B-Instruct
# Which LLM to use for agent inference
# Other options: gpt-3.5-turbo, claude-3-opus, mistral-large, etc., provided API_BASE_URL points to an endpoint that serves them

HF_TOKEN=hf_your_token_here
# HuggingFace API token for authenticated requests
# Get from: https://huggingface.co/settings/tokens

# ============ Server Configuration ============
SERVER_URL=http://localhost:7860
# Where the OpenEnv API server runs
# Used by inference.py to connect to environment

# ============ Task & Inference Configuration ============
TASK_NAME=text_spam
# Which task to run: text_spam, content_moderation, deepfake_detection

USE_LOCAL_MODEL=false
# If true: Load Llama-3.1-8B locally via transformers
# If false: Use remote API (requires HF_TOKEN)
# Local mode requires 16GB+ RAM

# ============ HuggingFace Model Caching ============
HF_HOME=/app/.cache/huggingface
# Directory for cached HF models and datasets
# Mounted as volume in Docker for persistence

TRANSFORMERS_CACHE=/app/.cache/huggingface
# Legacy cache variable for the transformers library; recent releases deprecate it in favor of HF_HOME

# ============ Python Configuration ============
PYTHONDONTWRITEBYTECODE=1
# Don't create __pycache__ directories

PYTHONUNBUFFERED=1
# Stream logs immediately (useful in Docker)

# ============ Logging ============
LOG_LEVEL=INFO
# Log level: DEBUG, INFO, WARNING, ERROR, CRITICAL
```

### Variable Precedence

1. Environment variables (highest priority)
2. `.env` file
3. Hardcoded defaults in code (lowest priority)

Example override:
```bash
export HF_TOKEN="hf_custom_token" && python inference.py
# Uses custom token instead of .env value
```
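
The precedence order can be sketched as a lookup across three sources. This stdlib-only example illustrates the ordering and is not the project's loading code; `parse_dotenv` and `resolve` are hypothetical helpers, and the parser handles only simple `KEY=value` lines.

```python
from typing import Mapping


def parse_dotenv(text: str) -> dict[str, str]:
    """Parse simple KEY=value lines, skipping blanks and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def resolve(name: str, environ: Mapping[str, str],
            dotenv: Mapping[str, str], defaults: Mapping[str, str]) -> str:
    """Return the first value found: environment, then .env, then defaults."""
    for source in (environ, dotenv, defaults):
        if name in source:
            return source[name]
    raise KeyError(name)
```

In practice `environ` would be `os.environ` and `dotenv` the parsed contents of the `.env` file, so an exported `HF_TOKEN` wins over the one stored on disk.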

---

## Running Tests

The project includes a comprehensive test suite using pytest.

### Setup

```bash
pip install pytest pytest-cov
```

### Run All Tests

```bash
pytest test/test.py -v
```

### Run Specific Test Class

```bash
pytest test/test.py::TestTasks -v
```

### Run with Coverage Report

```bash
pytest test/test.py --cov=server --cov-report=html
# Writes htmlcov/index.html; open it in a browser to view the coverage report
```

### Test Categories

| Test | Coverage | Status |
|---|---|---|
| Task loading | All 3 tasks initialize correctly | ✓ |
| API endpoints | /reset, /step, /state, /close, /tasks, /health | ✓ |
| Reward grading | text_spam, content_moderation, deepfake_detection | ✓ |
| Input validation | Action schema validation, label validation | ✓ |
| Edge cases | Empty labels, out-of-range confidence, etc. | ✓ |

---

## Troubleshooting

### Installation Issues

**Problem:** `ImportError: No module named 'openai'`

**Solution:** Install the OpenAI client:
```bash
pip install "openai>=1.40.0"
```

**Problem:** `ImportError: No module named 'torch'`

**Solution:** Install PyTorch:
```bash
pip install torch torchvision
# For GPU (CUDA 12.1 wheels):
# pip install torch torchvision --index-url https://download.pytorch.org/whl/cu121
```

**Problem:** `FileNotFoundError: requirements.txt`

**Solution:** Run the install from the project root; the requirements file lives in `server/`:
```bash
cd Content-Moderation-env/
pip install -r server/requirements.txt
```

### Docker Issues

**Problem:** `Segmentation fault (core dumped)` during build

**Solution:** Allocate more memory to the Docker build:
```bash
docker build --memory=8g -f server/Dockerfile -t content-moderation-env .
```

**Problem:** `failed to solve: failed to compute cache key`

**Solution:** Ensure `requirements.txt` is in the `server/` directory:
```
# Correct: server/requirements.txt
# Wrong:   ./requirements.txt
```

**Problem:** Port 7860 already in use

**Solution:** Map a different host port:
```bash
docker run -p 8000:7860 content-moderation-env
# Now access at http://localhost:8000
```

### Runtime Issues

**Problem:** `Connection refused: localhost:7860`

**Solution:** Ensure the server is running. In Docker, check the container logs:
```bash
uvicorn server.main:app --host 0.0.0.0 --port 7860

# In Docker:
docker logs <container_id>
```

**Problem:** `Client.__init__() got an unexpected keyword argument 'proxies'`

**Solution:** Update the OpenAI client:
```bash
pip install --upgrade openai
```

**Problem:** HuggingFace models downloading very slowly

**Solution:** Check your internet connection and verify `HF_TOKEN`, or download the models ahead of time:
```bash
export HF_TOKEN="hf_your_token_here"
# Or download models ahead of time
python -c "from transformers import pipeline; pipeline('image-classification', model='dima806/deepfake_vs_real_image_detection')"
```

### API Issues

**Problem:** Invalid request to `/step` without `/reset`

**Solution:** Always call `POST /reset` before any `/step` request. Otherwise the server returns this error:
```json
"Environment not initialized. Call /reset first."
```

**Problem:** Invalid label in action

**Solution:** Use only valid labels from the specification. An unknown label returns this error:
```json
{"detail": "Invalid label: 'unknown_label'"}
```

**Problem:** Confidence out of range

**Solution:** Ensure `confidence` is between 0.0 and 1.0.

---

## Citation

If you use this environment in your research, please cite:

```bibtex
@software{content_moderation_openenv_2025,
  title={Content Moderation OpenEnv: A Real-World AI Triage Environment},
  author={Anidipta},
  year={2025},
  url={https://github.com/Anidipta/Content-Moderation-env},
  note={OpenEnv Specification Compliant}
}
```

---

## Acknowledgements

🙏 Built for the **OpenEnv Hackathon 2025**.

**Special Thanks To:**
- OpenEnv community for the specification and framework
- HuggingFace for model hosting and inference APIs
- Meta for the Llama-3.1-8B-Instruct model
- Contributors and testers who improved the environment

**Dataset & Content Note:**
The email and content corpus is entirely **synthetic** and does not represent any real individuals, companies, organizations, or actual events. All examples are generated for demonstration and testing purposes only.

**License:** MIT License — See [LICENSE](LICENSE) file for details

**Questions?** Open an issue on GitHub or contact the maintainers.

---

**Last Updated:** April 8, 2026 | **OpenEnv Spec Version:** 1.0