---
title: Email Triage OpenEnv
emoji: 📧
colorFrom: blue
colorTo: green
sdk: docker
app_port: 7860
---

# Email Triage OpenEnv

A complete, production-ready OpenEnv environment for training AI agents to classify and route emails in real-world triage scenarios.

## Overview

Email triage is a genuine operational bottleneck for support teams, content moderators, and business users. This environment challenges agents to:

1. **Classify emails** into categories (spam, normal, urgent, billing)
2. **Route to teams** based on content and context (support, sales, billing)
3. **Prioritize** based on urgency and SLA requirements
4. **Handle complexity** across difficulty levels (easy → hard)

The environment provides realistic synthetic email data with varying complexity and meaningful reward signals for partial progress.
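
A single action bundles the three decisions above: a category, a team, and a priority. The sketch below models that action as a validated record. It is a minimal, hypothetical stand-in: it uses a stdlib dataclass rather than the Pydantic model in `environment/types.py`, the `TriageAction` name and `to_json` helper are illustrative, and the allowed values are inferred from this README (the four categories, the three teams plus the `"none"` used in the `/step` example, and priorities 0-3).

```python
from dataclasses import dataclass

# Assumed value sets, inferred from this README rather than read from environment/types.py:
# four categories, three teams plus "none" (as in the /step example), priority 0-3.
CATEGORIES = {"spam", "normal", "urgent", "billing"}
TEAMS = {"support", "sales", "billing", "none"}


@dataclass(frozen=True)
class TriageAction:
    """Hypothetical stdlib stand-in for the Pydantic Action model; fields follow the /step body."""

    classification: str
    team: str
    priority: int

    def __post_init__(self) -> None:
        # Reject values outside the action space described in this README.
        if self.classification not in CATEGORIES:
            raise ValueError(f"unknown classification: {self.classification!r}")
        if self.team not in TEAMS:
            raise ValueError(f"unknown team: {self.team!r}")
        if not isinstance(self.priority, int) or not 0 <= self.priority <= 3:
            raise ValueError(f"priority must be an integer 0-3, got {self.priority!r}")

    def to_json(self) -> dict:
        # Plain dict with the same shape as the JSON body of POST /step.
        return {"classification": self.classification, "team": self.team, "priority": self.priority}


# Usage: the /step example action from this README.
action = TriageAction(classification="spam", team="none", priority=0)
print(action.to_json())  # {'classification': 'spam', 'team': 'none', 'priority': 0}
```

Validating at construction time means a malformed action fails locally with a clear message instead of reaching the server.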

## Features

- ✅ **Full OpenEnv Spec Compliance**: Typed Pydantic models, standard step/reset/state API
- ✅ **3 Graded Tasks**: Easy (spam detection) → Medium (multi-class routing) → Hard (context-aware triage)
- ✅ **Meaningful Reward Function**: Partial credit for classification, routing, and priority decisions
- ✅ **Flask REST API**: HTTP endpoints for interacting with the environment
- ✅ **Baseline Inference**: GPT-4o mini baseline with structured logging
- ✅ **Docker Ready**: Single-command deployment to Hugging Face Spaces
- ✅ **Synthetic Data**: Realistic email generation with metadata and ground-truth labels

## Quick Start

### API Endpoints

The Space provides these endpoints on port 7860:

```http
# Health check
GET /health

# Get available tasks
GET /tasks

# Reset environment for a task
POST /reset?task=spam_detection

# Step the environment with an action
POST /step?task=spam_detection
Content-Type: application/json

{
  "classification": "spam",
  "team": "none",
  "priority": 0
}

# Get current state
GET /state?task=spam_detection

# Describe action/observation spaces
GET /state-describe?task=spam_detection
```
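
For scripted access, the endpoints can be called from Python with the standard library alone. This sketch assumes the server listens at `http://localhost:7860` (swap in your Space URL) and that responses are JSON; `build_request` and `send` are hypothetical helper names, and only the paths, the `task` query parameter, and the request body come from the listing above.

```python
import json
from typing import Any, Optional
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Assumption: where the Flask app is reachable (a local run, or your Space's URL).
BASE_URL = "http://localhost:7860"


def build_request(method: str, path: str, task: Optional[str] = None,
                  body: Optional[dict] = None, base_url: str = BASE_URL) -> Request:
    """Build a request for one of the endpoints above, e.g. ("POST", "/step")."""
    url = base_url.rstrip("/") + path
    if task is not None:
        # The task travels in the query string, as in /reset?task=spam_detection.
        url += "?" + urlencode({"task": task})
    data = None
    headers = {}
    if body is not None:
        data = json.dumps(body).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return Request(url, data=data, headers=headers, method=method)


def send(req: Request, timeout: float = 10.0) -> Any:
    """Send the request and decode the body, assuming the API answers with JSON."""
    with urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Example requests; passing them to send() needs a running server.
reset_req = build_request("POST", "/reset", task="spam_detection")
step_req = build_request("POST", "/step", task="spam_detection",
                         body={"classification": "spam", "team": "none", "priority": 0})
print(step_req.get_method(), step_req.full_url)
```

Separating request construction from sending keeps the URL and body logic testable without a live server.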

## Tasks

### Task 1: Spam Detection (Easy)

- **Goal**: Correctly classify 10 emails as spam or legitimate
- **Expected Score**: ~0.80-0.85
- **Difficulty**: Easy - clear spam patterns

### Task 2: Multi-Class Routing (Medium)

- **Goal**: Classify 12 emails into 4 categories and route to correct teams
- **Expected Score**: ~0.70-0.75
- **Difficulty**: Medium - requires multi-class classification and routing

### Task 3: Context-Aware Triage (Hard)

- **Goal**: Handle 20 emails with VIP customers, SLAs, and escalations
- **Expected Score**: ~0.60-0.70
- **Difficulty**: Hard - complex context with weighted scoring

## Environment Structure

```
├── environment/
│   ├── env.py              # Main EmailTriageEnv class
│   ├── types.py            # Pydantic models (Observation, Action, Reward)
│   ├── data_generator.py   # Synthetic email dataset
│   ├── graders.py          # Task-specific graders
│   └── __init__.py
├── app.py                  # Flask REST API
├── inference.py            # Baseline inference script
├── openenv.yaml            # OpenEnv specification
├── Dockerfile              # Docker configuration
├── requirements.txt        # Python dependencies
└── README.md               # This file
```

## Running Locally

```bash
# Install dependencies
pip install -r requirements.txt

# Start Flask app
python app.py

# In another terminal, run inference baseline
OPENAI_API_KEY=your_key python inference.py
```
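
An agent presumably interacts with the environment through repeated `reset` and `step` calls. The loop below is a sketch of one episode, not the code of `inference.py`. It takes `reset`, `step`, and `policy` as plain callables so they can wrap the HTTP endpoints or an in-process `EmailTriageEnv`, and it assumes each step result is a dict with `observation`, `reward` (a scalar), and `done` keys, which may not match the real response schema.

```python
from typing import Callable, Dict, List

Observation = Dict[str, object]
Action = Dict[str, object]


def run_episode(reset: Callable[[], Observation],
                step: Callable[[Action], Dict[str, object]],
                policy: Callable[[Observation], Action],
                max_steps: int = 100) -> List[float]:
    """Roll out one episode and return the per-step rewards.

    Assumes each step result is a dict with "observation", "reward" (a number) and
    "done" keys; adapt the lookups if the real response schema differs.
    """
    observation = reset()
    rewards: List[float] = []
    for _ in range(max_steps):  # safety cap in case "done" never arrives
        result = step(policy(observation))
        rewards.append(float(result["reward"]))
        if result["done"]:
            break
        observation = result["observation"]
    return rewards
```

A policy here is any function from an observation to an action dict such as `{"classification": "spam", "team": "none", "priority": 0}`; the GPT-4o mini baseline would fill that role with a model call. Averaging the returned rewards is one simple episode summary, though the task graders may aggregate differently.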

## Deployment

This Space is already deployed on Hugging Face! The Docker image builds automatically from the Dockerfile and serves the Flask API on port 7860.

## OpenEnv Specification

This environment fully implements the OpenEnv specification:

- **Observation Space**: Email content, sender info, inbox state
- **Action Space**: Classification (4 categories), Team routing (4 options), Priority (0-3)
- **Reward Space**: Continuous [0.0, 1.0] with breakdown of classification/routing/priority scores
- **API**: `reset()`, `step(action)`, `state()` endpoints
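
To make the reward breakdown concrete, here is a hypothetical grader that combines the three components with a weighted sum. The weights (0.5 classification, 0.3 routing, 0.2 priority) and the linear partial credit for priority are illustrative assumptions, not values from `environment/graders.py`; the hard task's weighted scoring in particular may use a different scheme.

```python
from dataclasses import dataclass
from typing import Dict

# Hypothetical weights; they sum to 1.0 so the total stays in [0.0, 1.0].
WEIGHTS: Dict[str, float] = {"classification": 0.5, "routing": 0.3, "priority": 0.2}


@dataclass
class RewardBreakdown:
    classification: float
    routing: float
    priority: float
    total: float


def score_action(pred: dict, truth: dict, weights: Dict[str, float] = WEIGHTS) -> RewardBreakdown:
    """Score one predicted action against the ground-truth labels for an email."""
    cls = 1.0 if pred["classification"] == truth["classification"] else 0.0
    route = 1.0 if pred["team"] == truth["team"] else 0.0
    # Partial credit: 1.0 for an exact match, minus 1/3 per step away on the 0-3 scale.
    prio = max(0.0, 1.0 - abs(pred["priority"] - truth["priority"]) / 3.0)
    total = (weights["classification"] * cls
             + weights["routing"] * route
             + weights["priority"] * prio)
    # Clamp to guard against custom weights that do not sum to 1.
    return RewardBreakdown(cls, route, prio, min(1.0, max(0.0, total)))


truth = {"classification": "billing", "team": "billing", "priority": 2}
guess = {"classification": "billing", "team": "support", "priority": 1}
print(score_action(guess, truth))
```

Here the correct category earns full classification credit, the wrong team earns none, and a priority one step off still earns two thirds of the priority credit.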

## Documentation

For more details, see:

- `START_HERE.md` - Getting started guide
- `DEPLOYMENT_CHECKLIST.md` - Pre-submission checklist
- `VALIDATION_GUIDE.md` - Testing and validation
- `FINAL_VALIDATION_REPORT.md` - Full validation results

---

- **Status**: ✅ Production Ready
- **OpenEnv Compliance**: ✅ 100%
- **All Tests**: ✅ Passing
- **Ready for Submission**: ✅ Yes