---
title: Content Moderation OpenEnv
emoji: 🛡️
colorFrom: blue
colorTo: indigo
sdk: docker
app_file: app.py
pinned: false
license: mit
---

# Content Moderation OpenEnv

An AI content moderation environment built to the OpenEnv specification. Agents triage realistic (synthetic) content through the standard `reset()` / `step()` / `state()` API. The content covers spam emails, harmful social media posts, and AI-generated deepfakes.

---

## 📋 Table of Contents

- [Environment Description & Motivation](#environment-description--motivation)
- [Task Descriptions](#task-descriptions)
- [Observation Space](#observation-space)
- [Action Space](#action-space)
- [Reward Functions](#reward-functions)
- [Baseline Scores](#baseline-scores)
- [Setup & Usage](#setup--usage)
  - [Requirements](#requirements)
  - [Local Installation](#local-installation)
  - [Docker Deployment](#docker-deployment)
  - [HuggingFace Spaces Deployment](#huggingface-spaces-deployment)
- [Running the Inference Script](#running-the-inference-script)
- [API Reference](#api-reference)
- [Project Structure](#project-structure)
- [Environment Variables](#environment-variables)
- [Running Tests](#running-tests)
- [Troubleshooting](#troubleshooting)
- [Citation](#citation)
- [Acknowledgements](#acknowledgements)

---

## Environment Description & Motivation

Content moderation is a high-stakes, high-volume real-world task: human moderators review millions of items daily across platforms and inboxes. This environment simulates a moderation pipeline at three difficulty levels, so that AI agents can learn decision-making strategies under resource constraints.

**Key Challenges:**

- Multi-label classification with imbalanced datasets
- Confidence calibration under uncertainty
- Real-world content variability (spam, deepfakes, policy violations)
- Trade-offs between escalating and acting immediately

| Task | Difficulty | Content Type | Metrics | Description |
|---|---|---|---|---|
| `text_spam` | Easy | Email / SMS | Binary classification + confidence | Spam vs. legitimate email filtering with sender reputation signals |
| `content_moderation` | Medium | Social media posts | Multi-label (up to 11 labels) | Detect policy violations (hate speech, harassment, violence, etc.) |
| `deepfake_detection` | Hard | Multimodal (image + text) | Detection accuracy + labels | Identify AI-manipulated media using a vision transformer plus metadata |

**Models Used:**

- **Deepfake Detection**: `dima806/deepfake_vs_real_image_detection` (ViT, ~300MB)
- **Language Model**: `meta-llama/Llama-3.1-8B-Instruct` (8B parameters, run locally via `transformers` or remotely via the HF API)
- **Infrastructure**: FastAPI + Uvicorn, Docker containerization

---

## Task Descriptions

### Task 1: Text Spam Classification (Easy)

**Objective:** Classify each email or SMS message as spam or legitimate (binary classification).

**Dataset:**

- 50 items (30 spam, 20 legitimate)
- Features: text content, sender reputation score, link count, source
- Ground truth: decision, labels, and an `is_harmful` flag

**Example:**

```json
{
  "content_id": "ts_001",
  "content_type": "text",
  "text": "CONGRATULATIONS! You've won $1,000,000! Click here NOW to claim your prize!!!",
  "metadata": {"source": "email", "sender_reputation": 0.05, "link_count": 3},
  "ground_truth": {
    "decision": "reject",
    "labels": ["spam", "scam"],
    "is_harmful": true
  }
}
```

---

### Task 2: Content Moderation (Medium)

**Objective:** Multi-label classification of social media posts for policy violations.

**Dataset:**

- 40 items across diverse platforms
- Labels: spam, scam, phishing, hate_speech, violence, harassment, misinformation, adult_content, deepfake, political_manipulation, fraud
- Features: post text, engagement metrics, user reputation, report count

**Violation Categories (selected):**

| Category | Definition | Examples |
|---|---|---|
| Hate Speech | Dehumanizing content targeting identity groups | Slurs, discrimination, incitement |
| Violence | Threats or glorification of violence | Physical harm, weapon promotion |
| Harassment | Coordinated or severe personal attacks | Doxxing, targeted campaigns |
| Misinformation | False claims with societal impact | Election fraud claims, health hoaxes |

---

### Task 3: Deepfake Detection (Hard)

**Objective:** Detect AI-manipulated media and classify the content appropriately.

**Dataset:**

- 30 items (multimodal: images + descriptions)
- The deepfake detection model outputs a raw confidence score between 0 and 1
- Features: image description, `detector_score`, metadata

**Detector Score Interpretation:**

- `0.0-0.3`: Likely real/authentic
- `0.3-0.7`: Uncertain; may require additional analysis
- `0.7-1.0`: Likely deepfake/manipulated
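
The banding above can be written as a small helper. This is only a sketch: `detector_band` is a hypothetical name and is not part of the server code. The documented ranges share their endpoints, so the helper assumes 0.3 and 0.7 belong to the higher band.

```python
def detector_band(score: float) -> str:
    """Map a raw detector_score onto the interpretation bands in this README.

    Assumption: shared boundaries go to the higher band, i.e.
    [0.0, 0.3) -> likely_real, [0.3, 0.7) -> uncertain, [0.7, 1.0] -> likely_deepfake.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"detector_score must be in [0, 1], got {score}")
    if score < 0.3:
        return "likely_real"
    if score < 0.7:
        return "uncertain"
    return "likely_deepfake"
```

For the `df_001` example below, `detector_band(0.82)` returns `"likely_deepfake"`.
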

**Example:**

```json
{
  "content_id": "df_001",
  "content_type": "multimodal",
  "image_description": "Portrait of person in business attire, lighting appears natural",
  "detector_score": 0.82,
  "metadata": {"platform": "social_media", "report_count": 3}
}
```

---

## Observation Space

Every step returns a `ContentObservation` with the following structure. Type annotations are shown in place of values:

```json
{
  "content_id": "string",
  "content_type": "text | multimodal",
  "text": "string (optional, for text tasks)",
  "image_description": "string (optional, deepfake task only)",
  "detector_score": 0.0-1.0 (optional, deepfake task only),
  "metadata": {
    "source": "email | social_media | platform",
    "sender_reputation": 0.0-1.0,
    "link_count": 0,
    "report_count": 0,
    "timestamp": "ISO8601"
  },
  "step_num": 1,
  "total_steps": 10
}
```

| Field | Type | Present in | Description |
|---|---|---|---|
| `content_id` | string | All tasks | Unique identifier for the content item |
| `content_type` | string | All tasks | Type of content: `text` or `multimodal` |
| `text` | string | `text_spam`, `content_moderation` | The email or post body |
| `image_description` | string | `deepfake_detection` | AI-generated description of the image |
| `detector_score` | float | `deepfake_detection` | Raw output of the deepfake model (0-1) |
| `metadata` | object | All tasks | Platform-specific signals (reputation, reports, etc.) |
| `step_num` | int | All tasks | Current step in the episode |
| `total_steps` | int | All tasks | Total steps in this episode |

---

## Action Space

Agents must respond with a `ModerationAction` object. Type annotations are shown in place of values, and `labels` may be any subset of the allowed set:

```json
{
  "decision": "approve | reject | escalate | flag",
  "reason": "string (max 200 chars, explanation)",
  "confidence": 0.0-1.0,
  "labels": ["spam", "scam", "phishing", "hate_speech", "violence",
             "harassment", "misinformation", "adult_content",
             "deepfake", "political_manipulation", "fraud"]
}
```

| Field | Type | Range | Description |
|---|---|---|---|
| `decision` | enum | {approve, reject, escalate, flag} | **approve**: safe content; **reject**: clear violation, remove immediately; **flag**: borderline, needs review; **escalate**: uncertain, route to a specialist |
| `reason` | string | 0-200 chars | Brief explanation of the decision |
| `confidence` | float | [0.0, 1.0] | Agent's confidence in this decision; calibration is rewarded |
| `labels` | array | 0-11 labels | Detected violation categories; each label must come from the allowed set below |

**Valid Labels:**

- `spam`: Unsolicited bulk messaging
- `scam`: Financial or identity theft attempts
- `phishing`: Social engineering attacks
- `hate_speech`: Dehumanizing content targeting identity groups
- `violence`: Threats, glorification, or instruction of violence
- `harassment`: Personal attacks, doxxing, coordinated abuse
- `misinformation`: False claims with societal impact
- `adult_content`: NSFW or sexually explicit material
- `deepfake`: AI-manipulated media
- `political_manipulation`: Coordinated inauthentic behavior
- `fraud`: Financial scams, false claims
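
The constraints in the tables above can be checked on the client side before calling `/step`. The server's own `ModerationAction` is a Pydantic model in `server/models.py`. The stdlib dataclass below, `ModerationActionSketch` (a hypothetical name), only mirrors the documented rules and is not that class.

```python
from dataclasses import dataclass

VALID_DECISIONS = frozenset({"approve", "reject", "escalate", "flag"})
VALID_LABELS = frozenset({
    "spam", "scam", "phishing", "hate_speech", "violence", "harassment",
    "misinformation", "adult_content", "deepfake", "political_manipulation", "fraud",
})


@dataclass(frozen=True)
class ModerationActionSketch:
    """Hypothetical client-side mirror of the documented action constraints."""
    decision: str
    reason: str
    confidence: float
    labels: tuple[str, ...] = ()

    def __post_init__(self) -> None:
        # Reject anything the Action Space tables rule out.
        if self.decision not in VALID_DECISIONS:
            raise ValueError(f"Invalid decision: {self.decision!r}")
        if len(self.reason) > 200:
            raise ValueError("reason must be at most 200 characters")
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must be between 0.0 and 1.0")
        unknown = [label for label in self.labels if label not in VALID_LABELS]
        if unknown:
            raise ValueError(f"Invalid label: {unknown[0]!r}")

    def to_json_dict(self) -> dict:
        """Shape the action like the /step request body."""
        return {
            "decision": self.decision,
            "reason": self.reason,
            "confidence": self.confidence,
            "labels": list(self.labels),
        }
```

Validating locally catches an out-of-range confidence or an unknown label before the request is sent.
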

---

## Reward Functions

Rewards are computed per task from decision accuracy, label coverage (F1), and confidence calibration.

### text_spam (Easy)

| Component | Reward | Condition |
|---|---|---|
| Correct decision | **+0.65** | `decision` matches ground truth |
| Escalate on harmful | **+0.30** | Harmful content + escalate/flag (partial credit) |
| Label F1 contribution | **+0.20** | F1 score of predicted vs. true labels |
| Confidence calibration | **±0.10** | Bonus if confident and correct; penalty if confident and wrong |
| **Max per step** | **1.00** | Sum of components (capped) |

### content_moderation (Medium)

| Component | Reward | Condition |
|---|---|---|
| Correct decision | **+0.50** | `decision` matches ground truth |
| Partial credit | **+0.25** | Harmful content + flag/escalate (conservative approach) |
| Label F1 contribution | **+0.35** | Multi-label F1 score (up to 11 labels) |
| Confidence calibration | **±0.10** | Brier score penalty for miscalibration |
| **Max per step** | **1.00** | Sum of components (capped) |

### deepfake_detection (Hard)

| Component | Reward | Condition |
|---|---|---|
| Correct decision | **+0.40** | `decision` matches ground truth |
| Deepfake detection | **+0.30** | Accuracy vs. detector_score threshold |
| Detector alignment | **+0.10** | Bonus for leveraging model signals |
| Label F1 contribution | **+0.20** | Multi-label F1 (fewer labels than the medium task) |
| Confidence calibration | **±0.10** | Calibration error penalty |
| **Max per step** | **1.00** | Sum of components (capped) |

**Calibration Bonus Formula:**

```
bonus = 0.1 × (confidence if correct else -confidence)
```
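
To make the text_spam table concrete, here is one way its components could combine for a single step. This sketch rests on several assumptions and is not the code in `server/graders.py`. It assumes set-based label F1, treats two empty label sets as a perfect match, grants the partial credit only when the decision is wrong, and clamps the total to [0, 1]. The content_moderation table mentions a Brier-score penalty, so that task's calibration term may differ from the formula above.

```python
def label_f1(predicted: list[str], truth: list[str]) -> float:
    """Set-based F1 between predicted and ground-truth labels."""
    pred, true = set(predicted), set(truth)
    if not pred and not true:
        return 1.0  # assumption: correctly predicting "no violations" earns full F1
    tp = len(pred & true)
    if tp == 0:
        return 0.0
    precision = tp / len(pred)
    recall = tp / len(true)
    return 2 * precision * recall / (precision + recall)


def calibration_bonus(confidence: float, correct: bool) -> float:
    """The documented formula: 0.1 × (confidence if correct else -confidence)."""
    return 0.1 * (confidence if correct else -confidence)


def text_spam_reward(action: dict, truth: dict) -> float:
    """Hypothetical recombination of the text_spam table, not the project's grader."""
    correct = action["decision"] == truth["decision"]
    reward = 0.0
    if correct:
        reward += 0.65
    elif truth["is_harmful"] and action["decision"] in ("escalate", "flag"):
        reward += 0.30  # partial credit for routing harmful content to review
    reward += 0.20 * label_f1(action["labels"], truth["labels"])
    reward += calibration_bonus(action["confidence"], correct)
    return max(0.0, min(1.0, reward))  # assumption: clamp to [0, 1]
```

Example: against the `ts_001` ground truth, a `reject` with labels `["spam", "scam"]` and confidence 0.9 scores 0.65 + 0.20 + 0.09 = 0.94 under these assumptions.
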

---

## Baseline Scores

Scores reported for **Llama-3.1-8B-Instruct** with `temperature=0.2` and `top_p=0.95`:

| Task | Score | Steps | Notes |
|---|---|---|---|
| `text_spam` | **0.72** | 5 | Strong on obvious spam; struggles with phishing disguised as legitimate mail |
| `content_moderation` | **0.58** | 8 | Good binary decisions; incomplete label coverage (F1 ≈ 0.52) |
| `deepfake_detection` | **0.44** | 10 | Relies on image descriptions; independent detector signals are underused |

---

## Setup & Usage

### Requirements

- **Python**: 3.11 or higher
- **Docker** (optional, for containerized deployment)
- **GPU** (optional, recommended for the deepfake model): CUDA 12.1+
- **Memory**: 8GB+ RAM (16GB recommended for local LLM inference)
- **Disk**: 10GB+ (models are cached in `~/.cache/huggingface/`)

### Local Installation

1. **Clone and navigate:**

   ```bash
   git clone https://github.com/Anidipta/Content-Moderation-env.git
   cd Content-Moderation-env
   ```

2. **Create a virtual environment:**

   ```bash
   python -m venv venv
   source venv/bin/activate  # On Windows: venv\Scripts\activate
   ```

3. **Install dependencies:**

   ```bash
   pip install -r server/requirements.txt
   ```

4. **Start the server:**

   ```bash
   uvicorn server.main:app --host 0.0.0.0 --port 7860
   ```

   The server listens on `http://localhost:7860`.

5. **Open the API documentation:**

   - Swagger UI: `http://localhost:7860/docs`
   - ReDoc: `http://localhost:7860/redoc`

### Docker Deployment

#### Build the Image

```bash
# Basic build
docker build -f server/Dockerfile -t content-moderation-env .

# Build with memory allocation (recommended)
docker build --memory=4g -f server/Dockerfile -t content-moderation-env .

# Build with progress output
docker build --progress=plain -f server/Dockerfile -t content-moderation-env .
```

#### Run the Container

```bash
# Basic run
docker run -p 7860:7860 content-moderation-env

# Run with environment variables
docker run -p 7860:7860 \
  -e API_BASE_URL="https://router.huggingface.co/v1" \
  -e MODEL_NAME="meta-llama/Llama-3.1-8B-Instruct" \
  -e HF_TOKEN="hf_your_token_here" \
  content-moderation-env

# Run with GPU support (requires the NVIDIA Container Toolkit on the host)
docker run --gpus all -p 7860:7860 content-moderation-env

# Run with a volume mount (reuse locally cached models)
docker run -p 7860:7860 \
  -v ~/.cache/huggingface:/app/.cache/huggingface \
  content-moderation-env

# Run in the background
docker run -d -p 7860:7860 --name moderation-env content-moderation-env

# Check logs
docker logs moderation-env

# Stop the container
docker stop moderation-env
```

#### Dockerfile Details

The [server/Dockerfile](server/Dockerfile) uses:

- **Base Image**: `python:3.11-slim` (~300MB), a minimal Python runtime
- **System Dependencies**: `libgl1 libglib2.0-0 curl`, required by the vision model stack and the health check
- **Dependency Installation**: multi-stage approach with pip cache optimization
- **Model Preloading**: the deepfake detection model is downloaded during the build for faster startup
- **Environment Setup**: HuggingFace cache directories and Python settings are pre-configured
- **Entry Point**: FastAPI app served by Uvicorn on port 7860

Key optimizations:

- `--no-cache-dir`: reduces image size by ~50%
- `--no-build-isolation`: prevents memory spikes during `pip install`
- Pre-downloaded models: eliminate first-run delays
- Minimal dependencies: only the libraries the environment needs

#### Deployment to Production

**Docker Compose:**

```yaml
version: '3.8'
services:
  moderation-api:
    build:
      context: .
      dockerfile: server/Dockerfile
    ports:
      - "7860:7860"
    environment:
      - API_BASE_URL=https://router.huggingface.co/v1
      - MODEL_NAME=meta-llama/Llama-3.1-8B-Instruct
      - HF_TOKEN=${HF_TOKEN}
    volumes:
      - ~/.cache/huggingface:/app/.cache/huggingface
    restart: unless-stopped
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:7860/health"]
      interval: 30s
      timeout: 10s
      retries: 3
```

Run with `docker compose up -d`, or `docker-compose up -d` on older installations.

### HuggingFace Spaces Deployment

1. Create a new Space with the Docker SDK.
2. Add secrets (Settings → Repository secrets):
   - `HF_TOKEN`: your HuggingFace API token
3. Add variables (Settings → Repository variables):
   - `API_BASE_URL`: `https://router.huggingface.co/v1`
   - `MODEL_NAME`: `meta-llama/Llama-3.1-8B-Instruct`
4. Push this repository to the Space.
5. The Space URL becomes your `PING_URL` for validation scripts.

---

## Running the Inference Script

```bash
# API mode (HF inference endpoint)
export API_BASE_URL="https://router.huggingface.co/v1"
export MODEL_NAME="meta-llama/Llama-3.1-8B-Instruct"
export HF_TOKEN="hf_your_token_here"
export SERVER_URL="http://localhost:7860"
export TASK_NAME="text_spam"
python inference.py

# Local transformers pipeline mode
export USE_LOCAL_MODEL="true"
python inference.py
```

### Output Format

```
[START] task=text_spam env=content_moderation_env model=meta-llama/Llama-3.1-8B-Instruct
[STEP] step=1 action={"decision":"reject","confidence":0.9,"labels":["spam"]} reward=0.85 done=false error=null
[STEP] step=2 action={"decision":"approve","confidence":0.8,"labels":[]} reward=0.75 done=false error=null
[STEP] step=3 action={"decision":"escalate","confidence":0.5,"labels":["scam"]} reward=0.30 done=false error=null
[STEP] step=4 action={"decision":"reject","confidence":0.85,"labels":["phishing"]} reward=0.70 done=false error=null
[STEP] step=5 action={"decision":"approve","confidence":0.88,"labels":[]} reward=0.75 done=true error=null
[END] success=true steps=5 score=0.720 rewards=0.85,0.75,0.30,0.70,0.75
```

| Field | Type | Description |
|---|---|---|
| `task` | string | The task being evaluated |
| `env` | string | Environment name |
| `model` | string | Model used by the agent |
| `step` | int | Current step number in the episode |
| `action` | JSON object | The submitted action (`decision`, `confidence`, `labels`) |
| `decision` | string | Agent's moderation decision |
| `confidence` | float | Agent's confidence (0-1) |
| `labels` | array | Detected violation labels |
| `reward` | float | Reward received for this step |
| `done` | boolean | Episode completion flag |
| `error` | string/null | Error message, if any |
| `success` | boolean | Episode success flag |
| `steps` | int | Number of steps taken |
| `score` | float | Final episode score |
| `rewards` | string | Comma-separated per-step rewards |
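
To post-process runs, you can parse the log lines above with two regular expressions. The sketch assumes the exact `key=value` layout shown: single spaces, `true`/`false`, and `null`. `parse_run_log` is a hypothetical helper and is not part of `inference.py`.

```python
import json
import re

# The greedy action group runs up to the last "} reward=", so spaces inside the JSON are tolerated.
STEP_RE = re.compile(
    r"^\[STEP\] step=(?P<step>\d+) action=(?P<action>\{.*\}) "
    r"reward=(?P<reward>\S+) done=(?P<done>true|false) error=(?P<error>.*)$"
)
END_RE = re.compile(
    r"^\[END\] success=(?P<success>true|false) steps=(?P<steps>\d+) "
    r"score=(?P<score>\S+) rewards=(?P<rewards>\S*)$"
)


def parse_run_log(lines):
    """Return (per-step records, end summary or None) from inference.py output lines."""
    steps, end = [], None
    for line in lines:
        line = line.strip()
        if m := STEP_RE.match(line):
            steps.append({
                "step": int(m["step"]),
                "action": json.loads(m["action"]),
                "reward": float(m["reward"]),
                "done": m["done"] == "true",
                "error": None if m["error"] == "null" else m["error"],
            })
        elif m := END_RE.match(line):
            end = {
                "success": m["success"] == "true",
                "steps": int(m["steps"]),
                "score": float(m["score"]),
                "rewards": [float(r) for r in m["rewards"].split(",") if r],
            }
    return steps, end
```

`[START]` lines are skipped. Feed the function `sys.stdin` or a saved log file to collect per-step actions and rewards.
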

---

## API Reference

### Server Endpoints

All endpoints exchange JSON, and FastAPI validates request bodies automatically.

#### 1. Reset Episode

**POST** `/reset`

Start a new moderation episode.

**Request Body:**

```json
{
  "task": "text_spam"
}
```

**Response (200 OK):**

```json
{
  "observation": {
    "content_id": "ts_001",
    "content_type": "text",
    "text": "CONGRATULATIONS! You've won $1,000,000!...",
    "metadata": {"source": "email", "sender_reputation": 0.05, "link_count": 3},
    "step_num": 1,
    "total_steps": 10
  },
  "info": {}
}
```

**Error (400):**

```json
{
  "detail": "Unknown task 'invalid_task'. Valid: ['text_spam', 'content_moderation', 'deepfake_detection']"
}
```

---

#### 2. Submit Action

**POST** `/step`

Submit a moderation action for the current content item.

**Request Body:**

```json
{
  "decision": "reject",
  "reason": "Email contains typical spam patterns and suspicious links",
  "confidence": 0.92,
  "labels": ["spam", "scam"]
}
```

**Response (200 OK):**

```json
{
  "observation": {
    "content_id": "ts_002",
    "content_type": "text",
    "text": "Hi Sarah, confirming our meeting tomorrow...",
    "metadata": {"source": "email", "sender_reputation": 0.92, "link_count": 0},
    "step_num": 2,
    "total_steps": 10
  },
  "reward": 0.85,
  "done": false,
  "info": {}
}
```
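
You can drive the `/reset` and `/step` cycle from Python with only the standard library. In this sketch, `make_http_post`, `heuristic_policy`, and `run_episode` are hypothetical names. The policy is a toy rule on `sender_reputation` and `link_count`, not the LLM baseline. The loop assumes the response shapes documented in this section.

```python
import json
import urllib.request
from typing import Callable

PostFn = Callable[[str, dict], dict]


def make_http_post(base_url: str, timeout: float = 30.0) -> PostFn:
    """Return a function that POSTs a JSON body to base_url + path and decodes the JSON reply."""
    def post(path: str, body: dict) -> dict:
        request = urllib.request.Request(
            base_url.rstrip("/") + path,
            data=json.dumps(body).encode("utf-8"),
            headers={"Content-Type": "application/json"},
            method="POST",
        )
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return json.load(response)
    return post


def heuristic_policy(observation: dict) -> dict:
    """Hypothetical rule-of-thumb policy for text_spam; not the LLM baseline agent."""
    meta = observation.get("metadata", {})
    if meta.get("sender_reputation", 1.0) < 0.2 or meta.get("link_count", 0) >= 3:
        return {"decision": "reject", "reason": "Low-reputation sender or link-heavy message",
                "confidence": 0.8, "labels": ["spam"]}
    return {"decision": "approve", "reason": "No strong spam signals",
            "confidence": 0.6, "labels": []}


def run_episode(post: PostFn, task: str, policy=heuristic_policy, max_steps: int = 100) -> list[float]:
    """Call /reset once, then /step until the server reports done=true."""
    result = post("/reset", {"task": task})
    rewards: list[float] = []
    for _ in range(max_steps):  # guard against an episode that never finishes
        result = post("/step", policy(result["observation"]))
        rewards.append(result["reward"])
        if result["done"]:
            break
    return rewards
```

Against a running server, call `run_episode(make_http_post("http://localhost:7860"), "text_spam")`. Because `post` is passed in as a function, the loop can be tested without a live server.
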

---

#### 3. Get Current State

**GET** `/state`

Retrieve the current episode state without taking an action.

**Response (200 OK):**

```json
{
  "observation": {...},
  "reward": 0.85,
  "done": false,
  "info": {
    "task": "text_spam",
    "items_completed": 2,
    "total_items": 10,
    "cumulative_reward": 1.60
  }
}
```

---

#### 4. Close Episode

**POST** `/close`

Explicitly close the episode and release its resources.

**Response (200 OK):**

```json
{
  "status": "closed",
  "final_reward": 7.20,
  "steps_completed": 10
}
```

---

#### 5. List Available Tasks

**GET** `/tasks`

Get metadata about all available tasks.

**Response (200 OK):**

```json
{
  "text_spam": {
    "description": "Classify email/message content as spam or legitimate",
    "difficulty": "easy",
    "num_items": 50,
    "content_type": "text"
  },
  "content_moderation": {
    "description": "Detect policy violations in social media posts",
    "difficulty": "medium",
    "num_items": 40,
    "content_type": "text"
  },
  "deepfake_detection": {
    "description": "Identify AI-manipulated media",
    "difficulty": "hard",
    "num_items": 30,
    "content_type": "multimodal"
  }
}
```

---

#### 6. Health Check

**GET** `/health`

Check server health and status.

**Response (200 OK):**

```json
{
  "status": "ok"
}
```
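
After launching a container or Space, it can help to poll `/health` until it returns `{"status": "ok"}` before starting `inference.py`. The helper below is a sketch with hypothetical names (`fetch_health`, `wait_until_healthy`). Its attempt count and delay are arbitrary defaults.

```python
import json
import time
import urllib.request
from typing import Callable


def fetch_health(base_url: str, timeout: float = 5.0) -> dict:
    """GET {base_url}/health and decode the JSON body."""
    with urllib.request.urlopen(base_url.rstrip("/") + "/health", timeout=timeout) as response:
        return json.load(response)


def wait_until_healthy(
    base_url: str,
    fetch: Callable[[str], dict] = fetch_health,
    attempts: int = 30,
    delay: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll /health until it reports {"status": "ok"}; return False if it never does."""
    for attempt in range(attempts):
        try:
            if fetch(base_url).get("status") == "ok":
                return True
        except (OSError, ValueError):
            # OSError covers refused connections and timeouts (urllib's URLError subclasses it);
            # ValueError covers a response body that is not valid JSON.
            pass
        if attempt < attempts - 1:
            sleep(delay)
    return False
```

Typical use: `if not wait_until_healthy("http://localhost:7860"): raise SystemExit("server did not become healthy")`.
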

---

#### 7. Root Endpoint

**GET** `/`

Redirects to the interactive Swagger UI documentation.

---

## Project Structure

```
content-moderation-env/
│
├── README.md                 # This file
├── uv.lock                   # Dependency lock file (uv package manager)
├── inference.py              # Baseline agent script (235 lines)
│                             # Demonstrates LLM agent interaction
│                             # Supports HF API and local inference modes
│
├── server/                   # FastAPI application (core)
│   ├── __init__.py           # Package marker (empty)
│   │
│   ├── main.py               # FastAPI app & HTTP endpoints (57 lines)
│   │                         # Defines: /reset, /step, /state, /close,
│   │                         #          /tasks, /health, / endpoints
│   │
│   ├── env.py                # OpenEnv environment implementation (122 lines)
│   │                         # Core logic: reset(), step(), state(), close()
│   │                         # Thread-safe with locks for concurrency
│   │
│   ├── models.py             # Pydantic data models
│   │                         # Defines: ContentObservation, ModerationAction,
│   │                         #          StepResult, ResetResult, EnvState
│   │
│   ├── tasks.py              # Task datasets & ground truth (193 lines)
│   │                         # Contains: text_spam, content_moderation,
│   │                         #           deepfake_detection task definitions & items
│   │
│   ├── graders.py            # Reward functions per task (95 lines)
│   │                         # Implements: label F1, calibration bonus,
│   │                         #             decision accuracy scoring logic
│   │
│   ├── deepfake_model.py     # HF deepfake detection pipeline (90 lines)
│   │                         # Lazy-loads: dima806/deepfake_vs_real...
│   │                         # Caches model in HF_HOME for reuse
│   │
│   ├── openenv.yaml          # OpenEnv specification metadata
│   │                         # Declares task specs, observation/action space
│   │
│   ├── Dockerfile            # Docker container definition
│   │                         # Base: python:3.11-slim (~300MB)
│   │                         # Installs system deps, pip packages,
│   │                         # pre-downloads deepfake model
│   │
│   └── requirements.txt      # Python dependencies (12 packages)
│                             # Key: fastapi, uvicorn, transformers,
│                             #      torch, openai, python-dotenv
│
├── test/                     # Test suite
│   └── test.py               # pytest tests (20+ test cases)
│                             # Coverage: tasks, endpoints, rewards
│
└── .env                      # Environment variables (git-ignored)
                              # Stores: HF_TOKEN, API_BASE_URL, etc.
```

---

## Environment Variables

Configuration is controlled through environment variables. Create a `.env` file in the project root:

```env
# ============ API Configuration ============
API_BASE_URL=https://router.huggingface.co/v1
# URL of the LLM inference endpoint
# Default: HuggingFace router (requires HF_TOKEN)

MODEL_NAME=meta-llama/Llama-3.1-8B-Instruct
# Which LLM to use for agent inference
# Other options (e.g. gpt-3.5-turbo, claude-3-opus, mistral-large) need an
# API_BASE_URL whose endpoint actually serves that model

HF_TOKEN=hf_your_token_here
# HuggingFace API token for authenticated requests
# Get one from: https://huggingface.co/settings/tokens

# ============ Server Configuration ============
SERVER_URL=http://localhost:7860
# Where the OpenEnv API server runs
# Used by inference.py to connect to the environment

# ============ Task & Inference Configuration ============
TASK_NAME=text_spam
# Which task to run: text_spam, content_moderation, deepfake_detection

USE_LOCAL_MODEL=false
# If true: load Llama-3.1-8B locally via transformers (requires 16GB+ RAM)
# If false: use the remote API (requires HF_TOKEN)

# ============ HuggingFace Model Caching ============
HF_HOME=/app/.cache/huggingface
# Directory for cached HF models and datasets
# Mounted as a volume in Docker for persistence

TRANSFORMERS_CACHE=/app/.cache/huggingface
# Legacy cache variable for the transformers library
# (recent transformers releases deprecate it in favor of HF_HOME)

# ============ Python Configuration ============
PYTHONDONTWRITEBYTECODE=1
# Don't write .pyc files or __pycache__ directories

PYTHONUNBUFFERED=1
# Stream logs immediately (useful in Docker)

# ============ Logging ============
LOG_LEVEL=INFO
# Log level: DEBUG, INFO, WARNING, ERROR, CRITICAL
```

### Variable Precedence

1. Environment variables (highest priority)
2. `.env` file
3. Hardcoded defaults in code (lowest priority)

Example override:

```bash
export HF_TOKEN="hf_custom_token" && python inference.py
# Uses the custom token instead of the .env value
```
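
The precedence rules can be sketched with the standard library. `parse_dotenv` and `resolve_settings` are hypothetical helpers that only handle simple `KEY=VALUE` lines, and they do not necessarily match how the project loads its configuration. `python-dotenv` is listed in the requirements, and its `load_dotenv()` gives the same ordering by default because it does not override variables already set in the environment.

```python
import os
from typing import Mapping


def parse_dotenv(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines; blank lines and '#' comment lines are skipped."""
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def resolve_settings(
    defaults: Mapping[str, str],
    dotenv: Mapping[str, str],
    environ: Mapping[str, str] = os.environ,
) -> dict[str, str]:
    """Resolve each known key with precedence: defaults < .env file < process environment."""
    settings = dict(defaults)
    for key in settings:
        if key in dotenv:
            settings[key] = dotenv[key]
        if key in environ:
            settings[key] = environ[key]  # the process environment wins last
    return settings
```

Only keys listed in `defaults` are resolved. This keeps unrelated environment variables out of the result.
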

---

## Running Tests

The project includes a pytest test suite covering task loading, endpoints, reward grading, and input validation.

### Setup

```bash
pip install pytest pytest-cov
```

### Run All Tests

```bash
pytest test/test.py -v
```

### Run a Specific Test Class

```bash
pytest test/test.py::TestTasks -v
```

### Run with Coverage Report

```bash
pytest test/test.py --cov=server --cov-report=html
# Writes an HTML report to htmlcov/; open htmlcov/index.html in a browser to view it
```

### Test Categories

| Test | Coverage | Status |
|---|---|---|
| Task loading | All 3 tasks initialize correctly | ✓ |
| API endpoints | /reset, /step, /state, /close, /tasks, /health | ✓ |
| Reward grading | text_spam, content_moderation, deepfake_detection | ✓ |
| Input validation | Action schema validation, label validation | ✓ |
| Edge cases | Empty labels, out-of-range confidence, etc. | ✓ |

---

## Troubleshooting

### Installation Issues

**Problem:** `ModuleNotFoundError: No module named 'openai'`

Solution: install the OpenAI client.

```bash
pip install "openai>=1.40.0"
```

**Problem:** `ModuleNotFoundError: No module named 'torch'`

Solution: install PyTorch.

```bash
pip install torch torchvision
# For GPU (CUDA 12.1 wheels):
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu121
```

**Problem:** `FileNotFoundError: requirements.txt`

Solution: run the install from the project root.

```bash
cd Content-Moderation-env/
pip install -r server/requirements.txt
```

### Docker Issues

**Problem:** `Segmentation fault (core dumped)` during build

Solution: allocate more memory to the Docker build.

```bash
docker build --memory=8g -f server/Dockerfile -t content-moderation-env .
```

**Problem:** `failed to solve: failed to compute cache key`

Solution: make sure `requirements.txt` is in the `server/` directory. `server/requirements.txt` is correct; `./requirements.txt` is not. Also run `docker build` from the repository root so that `.` is the build context.

**Problem:** Port 7860 already in use

Solution: map a different host port.

```bash
docker run -p 8000:7860 content-moderation-env
# Now access the API at http://localhost:8000
```

### Runtime Issues

**Problem:** `Connection refused: localhost:7860`

Solution: make sure the server is running.

```bash
uvicorn server.main:app --host 0.0.0.0 --port 7860
# In Docker, check the container logs instead:
docker logs <container_id>
```

**Problem:** `Client.__init__() got an unexpected keyword argument 'proxies'`

Solution: update the OpenAI client.

```bash
pip install --upgrade openai
```

**Problem:** HuggingFace models download very slowly

Solution: check your internet connection and set `HF_TOKEN`, or download the models ahead of time.

```bash
export HF_TOKEN="hf_your_token_here"
# Pre-download the deepfake detection model into the HF cache
python -c "from transformers import pipeline; pipeline('image-classification', model='dima806/deepfake_vs_real_image_detection')"
```

### API Issues

**Problem:** Calling `/step` before `/reset`

- Error: `"Environment not initialized. Call /reset first."`
- Solution: always call `POST /reset` before any `/step` request.

**Problem:** Invalid label in action

- Error: `{"detail": "Invalid label: 'unknown_label'"}`
- Solution: use only the labels listed under [Valid Labels](#action-space).

**Problem:** Confidence out of range

- Solution: make sure `confidence` is between 0.0 and 1.0.

---

## Citation

If you use this environment in your research, please cite:

```bibtex
@software{content_moderation_openenv_2025,
  title  = {Content Moderation OpenEnv: A Real-World AI Triage Environment},
  author = {Anidipta},
  year   = {2025},
  url    = {https://github.com/Anidipta/Content-Moderation-env},
  note   = {OpenEnv Specification Compliant}
}
```

---

## Acknowledgements

🙏 Built for the **OpenEnv Hackathon 2025**.

**Special Thanks To:**

- OpenEnv community for the specification and framework
- HuggingFace for model hosting and inference APIs
- Meta for the Llama-3.1-8B-Instruct model
- Contributors and testers who improved the environment

**Dataset & Content Note:**

The email and content corpus is entirely **synthetic** and does not represent any real individuals, companies, organizations, or events. All examples were generated for demonstration and testing purposes only.

**License:** MIT License. See the [LICENSE](LICENSE) file for details.

**Questions?** Open an issue on GitHub or contact the maintainers.

---

**Last Updated:** April 8, 2026 | **OpenEnv Spec Version:** 1.0