Email Triage OpenEnv - Deployment Checklist
Pre-Submission Verification
Project Structure
- environment/__init__.py - Package exports
- environment/types.py - Pydantic models (Observation, Action, Reward, State, Email, GroundTruth)
- environment/data_generator.py - Synthetic email generation (3 tasks)
- environment/graders.py - Task graders with reward computation
- environment/env.py - EmailTriageEnv with step/reset/state API
- app.py - Flask REST API server
- inference.py - Baseline inference with GPT-4o mini
- openenv.yaml - OpenEnv specification
- Dockerfile - Container configuration
- requirements.txt - Dependencies
- README.md - Documentation
OpenEnv Spec Compliance
- Typed Pydantic models for Observation, Action, Reward
- step(action) -> (observation, reward, done, info)
- reset() -> initial observation
- state() -> full system state
- openenv.yaml with metadata, tasks, spaces
- JSON serialization support (model_dump(mode="json"))
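The interface listed above can be illustrated with a small stand-in. This is a sketch only, not the project's EmailTriageEnv: it uses stdlib dataclasses in place of the Pydantic models to stay dependency-free, and the class name ToyTriageEnv, the field names, and the label values are all assumptions. Here asdict() plays the role that model_dump(mode="json") plays for the real models.

```python
from dataclasses import asdict, dataclass, field
from typing import Any


@dataclass
class Observation:
    email_id: str   # hypothetical field
    subject: str    # hypothetical field
    remaining: int  # emails left in the episode


@dataclass
class Action:
    email_id: str
    label: str      # assumed label names, e.g. "spam" / "not_spam"


@dataclass
class Reward:
    value: float
    breakdown: dict[str, float] = field(default_factory=dict)


class ToyTriageEnv:
    """Stand-in that mirrors the step/reset/state shape from the checklist."""

    def __init__(self, emails: list[dict[str, str]]):
        self._emails = emails
        self._cursor = 0

    def _observe(self) -> Observation:
        # After the last email, keep returning it with remaining == 0.
        email = self._emails[min(self._cursor, len(self._emails) - 1)]
        return Observation(email["id"], email["subject"],
                           len(self._emails) - self._cursor)

    def reset(self) -> Observation:
        self._cursor = 0
        return self._observe()

    def step(self, action: Action) -> tuple[Observation, Reward, bool, dict[str, Any]]:
        truth = self._emails[self._cursor]["label"]
        score = 1.0 if action.label == truth else 0.0
        self._cursor += 1
        done = self._cursor >= len(self._emails)
        return self._observe(), Reward(score, {"classification": score}), done, {}

    def state(self) -> dict[str, Any]:
        return {"cursor": self._cursor, "total": len(self._emails)}


env = ToyTriageEnv([
    {"id": "e1", "subject": "Win a prize", "label": "spam"},
    {"id": "e2", "subject": "Meeting minutes", "label": "not_spam"},
])
obs = env.reset()
obs, reward, done, info = env.step(Action(obs.email_id, "spam"))
print(asdict(reward))  # plain dict, ready for json.dumps
```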
Three Tasks with Graders
Task 1: Spam Detection (Easy)
- 10 emails, binary classification
- Grader: accuracy-based scoring
- Expected score: 0.80-0.85
Task 2: Multi-Class Routing (Medium)
- 12 emails, 4 categories + 3 teams
- Grader: 50% classification + 50% routing
- Expected score: 0.70-0.75
Task 3: Context-Aware Triage (Hard)
- 20 emails, VIP handling, SLA awareness
- Grader: 50% classification + 30% priority + 20% routing
- Expected score: 0.60-0.70
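The grader weightings above amount to a weighted sum of per-component accuracies. The following is a minimal sketch under that assumption; the task keys, component names, and the weighted_score helper are hypothetical and are not taken from environment/graders.py.

```python
# Weights as stated in the task list; key names are illustrative.
TASK_WEIGHTS = {
    "spam_detection": {"classification": 1.0},
    "multi_class_routing": {"classification": 0.5, "routing": 0.5},
    "context_aware_triage": {"classification": 0.5, "priority": 0.3, "routing": 0.2},
}


def weighted_score(components: dict[str, float], weights: dict[str, float]) -> float:
    """Combine per-component accuracies (each in [0, 1]) into one score."""
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1.0")
    return sum(weights[name] * components[name] for name in weights)


# Task 2 with 9 of 12 categories and 6 of 12 teams correct.
task2 = weighted_score(
    {"classification": 9 / 12, "routing": 6 / 12},
    TASK_WEIGHTS["multi_class_routing"],
)
print(task2)  # 0.625
```

Because the weights for each task sum to 1.0 and every component is an accuracy in [0, 1], the combined score also stays in [0, 1].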
Reward Function
- Returns float in [0.0, 1.0] range
- Per-step reward: classification (40%) + routing (30%) + priority (30%)
- Partial progress signals throughout episode
- Breakdown dictionary in Reward model
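A per-step reward with a breakdown dictionary might look like the sketch below. It assumes exact-match scoring for each component, which may be stricter than what the actual grader does (it could award partial credit); the step_reward function and the dictionary keys are illustrative names.

```python
# Per-step weights from the checklist.
STEP_WEIGHTS = {"classification": 0.4, "routing": 0.3, "priority": 0.3}


def step_reward(predicted: dict[str, str],
                truth: dict[str, str]) -> tuple[float, dict[str, float]]:
    """Return (total, breakdown) for one email, with total clamped to [0, 1]."""
    breakdown = {
        name: weight * (1.0 if predicted.get(name) == truth[name] else 0.0)
        for name, weight in STEP_WEIGHTS.items()
    }
    total = min(1.0, max(0.0, sum(breakdown.values())))
    return total, breakdown


total, breakdown = step_reward(
    {"classification": "billing", "routing": "finance", "priority": "low"},
    {"classification": "billing", "routing": "finance", "priority": "high"},
)
print(round(total, 2), breakdown)
```

In this example the classification and routing components match and the priority does not, so the step earns partial credit rather than zero.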
Baseline Inference Script
- Named: inference.py in project root
- Uses OpenAI client (gpt-4o-mini)
- Reads env vars: OPENAI_API_KEY, MODEL_NAME, API_BASE_URL
- Outputs [START], [STEP], [END] structured logs
- Runs all 3 tasks sequentially
- Produces reproducible scores
- Runtime < 20 minutes
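The environment variables and log tags above could be handled roughly as follows. The checklist does not specify the exact log format or any default values, so the JSON payload after each tag, the fallback model name and base URL, and the helper names load_settings and log_line are all assumptions. The OpenAI client call itself is omitted.

```python
import json
import os
from collections.abc import Mapping


def load_settings(env: Mapping[str, str] = os.environ) -> dict[str, str]:
    """Read the three variables named in the checklist (defaults are assumed)."""
    return {
        "api_key": env.get("OPENAI_API_KEY", ""),
        "model": env.get("MODEL_NAME", "gpt-4o-mini"),
        "base_url": env.get("API_BASE_URL", "https://api.openai.com/v1"),
    }


def log_line(tag: str, **fields) -> str:
    """Print one structured line such as: [STEP] {"reward": 1.0, "task": "spam"}"""
    line = f"[{tag}] {json.dumps(fields, sort_keys=True)}"
    print(line)
    return line


settings = load_settings()
log_line("START", model=settings["model"], task="spam_detection")
log_line("STEP", task="spam_detection", step=1, reward=1.0)
log_line("END", task="spam_detection", score=0.8)
```

Sorting the keys keeps each line byte-for-byte stable across runs, which makes the logs easier to diff when checking reproducibility.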
API Deployment
- Flask server on port 7860
- /health endpoint
- /reset endpoint
- /step endpoint (POST with JSON action)
- /state endpoint
- /state-describe endpoint
- /tasks endpoint listing all tasks
- JSON request/response format
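A client for these endpoints can be sketched with the standard library alone. The base URL assumes a local server on port 7860, and the action payload fields are hypothetical since the checklist does not list the Action schema. Only /health (GET) and /step (POST) are shown because those are the methods the checklist confirms.

```python
import json
import urllib.request
from typing import Any, Optional

BASE_URL = "http://localhost:7860"  # assumed local server


def build_request(path: str, payload: Optional[dict] = None) -> urllib.request.Request:
    """GET when there is no payload, otherwise POST the payload as JSON."""
    if payload is None:
        return urllib.request.Request(f"{BASE_URL}{path}", method="GET")
    return urllib.request.Request(
        f"{BASE_URL}{path}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def call(path: str, payload: Optional[dict] = None, timeout: float = 10.0) -> Any:
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(build_request(path, payload), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Building requests needs no running server; call() does.
health = build_request("/health")
step = build_request("/step", {"email_id": "e1", "label": "spam"})  # hypothetical fields
print(health.get_method(), health.full_url)
print(step.get_method(), step.full_url)
```

With the Flask app running, call("/health") would return the decoded response body.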
Containerization
- Dockerfile present and valid
- Base: python:3.11-slim
- Installs requirements.txt
- Copies all necessary files
- Exposes port 7860
- Healthcheck configured
- CMD runs Flask app
Documentation
- README.md with:
  - Overview and motivation
  - Task descriptions
  - Observation space definition
  - Action space definition
  - Setup instructions
  - Usage examples (Python + HTTP)
  - Baseline script examples
  - Expected scores
  - Deployment to HF Spaces
  - Project structure
  - License and support
Local Verification
- Environment imports work
- All 3 tasks initialize successfully
- step() API functional
- Reward computation works (values in [0, 1])
- Graders score correctly
- JSON serialization works
- Flask API responds to requests
Submission Steps
Create Hugging Face Space:
- Create repo at: https://huggingface.co/spaces/{username}/email-triage
- Clone: git clone https://huggingface.co/spaces/{username}/email-triage
Push code:
    git add .
    git commit -m "Initial Email Triage OpenEnv"
    git push origin main
Verify deployment:
- HF Spaces builds Docker image
- API responds at https://{username}-email-triage.hf.space
- Test: curl https://{username}-email-triage.hf.space/health
Run pre-submission validations:
    # Local tests
    python -c "from environment import EmailTriageEnv; env = EmailTriageEnv(); obs = env.reset(); print('OK')"
    # Flask API test
    python app.py &
    curl http://localhost:7860/health
    curl http://localhost:7860/tasks
Test baseline inference locally:
    export OPENAI_API_KEY="sk-..."
    export MODEL_NAME="gpt-4o-mini"
    python inference.py
Expected Validation Results
Environment Tests
- Reset returns Observation
- Step returns (Observation, Reward, done, info)
- All rewards in [0.0, 1.0]
- Tasks complete successfully
Inference Tests
- Completes without error
- Produces [START]/[STEP]/[END] logs
- Each task processes all emails
- Final scores reported for all 3 tasks
- Average score around 0.70-0.77
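The 0.70-0.77 average is consistent with an unweighted mean of the three per-task expected ranges. The short check below assumes each task counts equally; weighting by email count would give a different figure.

```python
# Expected score ranges from the task list.
EXPECTED_RANGES = {
    "Task 1: Spam Detection": (0.80, 0.85),
    "Task 2: Multi-Class Routing": (0.70, 0.75),
    "Task 3: Context-Aware Triage": (0.60, 0.70),
}


def average_range(ranges: dict[str, tuple[float, float]]) -> tuple[float, float]:
    """Average the lower bounds and the upper bounds separately."""
    lows = [low for low, _ in ranges.values()]
    highs = [high for _, high in ranges.values()]
    return sum(lows) / len(lows), sum(highs) / len(highs)


low, high = average_range(EXPECTED_RANGES)
print(f"{low:.2f}-{high:.2f}")  # 0.70-0.77
```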
Docker Test
- Build succeeds
- Container runs on port 7860
- Health check passes
- API endpoints responsive
Final Checklist
- Code pushed to HF Spaces
- HF Space builds and deploys successfully
- API responsive at live URL
- Baseline inference runs locally with OPENAI_API_KEY set
- All validation checks pass
- Ready for submission