Email Gym
An OpenEnv environment where AI agents learn to triage, route, and respond to operational messages through adversarial curricula and GRPO fine-tuning. Built for the Meta × OpenEnv × Hugging Face × PyTorch Hackathon.
Why This Matters
Operational message overload (routing alerts to the wrong team, missing critical VP requests, responding to vendor spam) costs engineering teams hours every week. This environment trains RL agents to act as automated message triage specialists, work that people in DevOps, executive-assistant, and operations roles do by hand every day.
Real-world utility: Operations teams, executive assistants, and DevOps engineers manually triage hundreds of messages daily across Slack, email, and ticketing systems. This environment provides a standardised benchmark for training and evaluating agents that automate this process with verifiable, graded outcomes.
Architecture
┌──────────────────────────────────────────────────────────┐
│                      Client / Agent                      │
│  inference.py → OpenAI API → LLM → parse action → HTTP   │
└─────────────────────────┬────────────────────────────────┘
                          │ HTTP POST /reset, /step, /state
                          ▼
┌──────────────────────────────────────────────────────────┐
│               Docker Container (HF Space)                │
│                                                          │
│  ┌──────────────────────────────────────────────────┐    │
│  │             FastAPI (server/app.py)              │    │
│  │   /reset  /step  /state  /health  /schema  /ws   │    │
│  └────────────────────┬─────────────────────────────┘    │
│                       │                                  │
│  ┌────────────────────▼─────────────────────────────┐    │
│  │            MessageRoutingEnvironment             │    │
│  │         (OpenEnv Environment base class)         │    │
│  │                                                  │    │
│  │  ┌──────────┐ ┌──────────────┐ ┌─────────────┐   │    │
│  │  │  Tasks   │ │ RewardEngine │ │   Graders   │   │    │
│  │  │ Registry │ │  (per-step)  │ │  (0.0–1.0)  │   │    │
│  │  └──────────┘ └──────────────┘ └─────────────┘   │    │
│  └──────────────────────────────────────────────────┘    │
└──────────────────────────────────────────────────────────┘
Component Diagram
| Component | Responsibility |
|---|---|
| `message_routing_gym/constants.py` | All enums, config values, reward weights |
| `message_routing_gym/models.py` | Typed Pydantic models (Action, Observation, State) |
| `message_routing_gym/tasks.py` | Task definitions with ground-truth routing rules |
| `message_routing_gym/rewards.py` | Dense reward computation with partial progress |
| `message_routing_gym/graders.py` | Deterministic graders scoring 0.0–1.0 |
| `server/message_routing_environment.py` | OpenEnv Environment with `step()`/`reset()`/`state()` |
| `server/app.py` | FastAPI application wiring + custom Gradio mount |
| `server/gradio_builder.py` | Custom Gradio UI with rich observation display |
| `inference.py` | Baseline agent using OpenAI API |
Data Flow
Agent                        Environment
│                            │
├──── POST /reset ──────────►│  Load task, init RewardEngine
│◄──── observation ──────────┤  Queue + directive + curriculum tier
│                            │
├──── POST /step ───────────►│  Parse action
│     {route_directory}      │  Compute reward via RewardEngine
│◄──── observation ──────────┤  Feedback + reward + done
│                            │
├──── POST /step ───────────►│  Respond action
│     {respond, payload}     │  Semantic grader evaluates response
│◄──── observation ──────────┤  Feedback + reward
│                            │
├──── POST /step ───────────►│  Dismiss action
│     {dismiss}              │  Route to vault, compute grade
│◄──── observation ──────────┤  Feedback + reward
│                            │
├──── POST /step ───────────►│  Final action
│                            │  Compute final grader score
│◄──── observation ──────────┤  done=True + grader_score
│                            │
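The reset/step exchange above can be sketched as a small driver loop. This is a minimal illustration, not the project's `inference.py`: `run_episode` and the `Transport` callable are hypothetical names, and the assumption that each response carries a top-level `done` flag is taken from the diagram rather than from the server code.

```python
from typing import Callable, Dict, List

# Hypothetical transport signature: (path, json_body) -> decoded json response.
Transport = Callable[[str, dict], dict]


def run_episode(post: Transport, task_id: str, actions: List[dict]) -> Dict[str, object]:
    """Reset the environment, then send actions until it reports done."""
    result = post("/reset", {"task_id": task_id})
    steps = 0
    for action in actions:
        # Each action is wrapped in an {"action": ...} envelope, as in the curl examples.
        result = post("/step", {"action": action})
        steps += 1
        # ASSUMPTION: the response exposes a top-level "done" flag.
        if result.get("done"):
            break
    return {"steps": steps, "last": result}
```

Passing the transport in as a function keeps the loop testable without a running server; a real client would supply a function that issues the HTTP POST.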
Action & Observation Spaces
Action Space (MessageRoutingAction)
| Field | Type | UI Widget | Required | Description |
|---|---|---|---|---|
| `action_type` | `"route_directory" \| "respond" \| "dismiss"` | Dropdown | Yes | Action to perform |
| `message_id` | `string` | Textbox | Yes | Exact ID from the queue |
| `target_directory` | `"promotions" \| "operations" \| "management" \| "vault"` | Dropdown | For `route_directory` | Destination folder |
| `response_payload` | `string` | Textarea | For `respond` | Reply text to dispatch |
| `reasoning` | `string` | Textarea | Optional | Chain-of-thought explanation |
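To make the conditional requirements in the table concrete, here is a minimal sketch of an action object with the same fields. It uses a standard-library dataclass rather than the project's Pydantic model; `ActionSketch`, `validate`, and `to_payload` are hypothetical names introduced only for illustration.

```python
from dataclasses import asdict, dataclass
from typing import Optional

VALID_ACTIONS = ("route_directory", "respond", "dismiss")
VALID_DIRECTORIES = ("promotions", "operations", "management", "vault")


@dataclass
class ActionSketch:
    """Hypothetical stand-in for MessageRoutingAction (not the real Pydantic model)."""

    action_type: str
    message_id: str
    target_directory: Optional[str] = None
    response_payload: Optional[str] = None
    reasoning: Optional[str] = None

    def validate(self) -> None:
        """Check the per-action requirements listed in the table."""
        if self.action_type not in VALID_ACTIONS:
            raise ValueError(f"unknown action_type: {self.action_type!r}")
        if not self.message_id:
            raise ValueError("message_id is required")
        if self.action_type == "route_directory" and self.target_directory not in VALID_DIRECTORIES:
            raise ValueError("route_directory needs a valid target_directory")
        if self.action_type == "respond" and not self.response_payload:
            raise ValueError("respond needs a non-empty response_payload")

    def to_payload(self) -> dict:
        """Build the {"action": {...}} request body, dropping unset optional fields."""
        self.validate()
        return {"action": {k: v for k, v in asdict(self).items() if v is not None}}
```

For example, `ActionSketch("route_directory", "1", target_directory="promotions").to_payload()` yields a body with only the three populated fields.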
Observation Space (MessageRoutingObservation)
| Field | Type | Description |
|---|---|---|
| `task_id` | `string` | Current task identifier |
| `difficulty` | `"warmup" \| "intermediate" \| "advanced"` | Curriculum tier |
| `queue` | `list[Message]` | Messages awaiting triage (id, source, topic, content, alert_level) |
| `directories` | `dict[str, int]` | Count of messages in each folder |
| `active_directive` | `string` | Current task goal the agent must resolve |
| `step_feedback` | `string` | Feedback from last action |
| `steps_remaining` | `int` | Steps left in episode |
| `cumulative_reward` | `float` | Running reward total |
| `action_history` | `list[str]` | Summary of actions taken |
| `last_execution_error` | `string` | Error from last invalid action |
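As an illustration of how an agent might consume these fields, the sketch below flattens an observation into a short text summary suitable for an LLM prompt. The function name `render_observation` and the output layout are assumptions; how `inference.py` actually builds its prompt is not described here.

```python
def render_observation(obs: dict) -> str:
    """Summarise an observation dict (shaped like the table above) as plain text."""
    lines = [
        f"Directive: {obs['active_directive']}",
        f"Steps remaining: {obs['steps_remaining']}",
        "Queue:",
    ]
    for msg in obs["queue"]:
        # One line per queued message: id, alert level, source, topic.
        lines.append(f"- [{msg['id']}] ({msg['alert_level']}) {msg['source']}: {msg['topic']}")
    # Surface the last error only when there is one, so the agent can correct course.
    if obs.get("last_execution_error"):
        lines.append(f"Last error: {obs['last_execution_error']}")
    return "\n".join(lines)
```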
Tasks
Task 1 - Warmup: Noise Filter (task_warmup_noise)
1 decision type. Sort low-signal promotional broadcasts from legitimate operational mail.
- 4 messages: build alert, discount offer, CTO review, trial nag
- Hint provided: `active_directive` explicitly names target directories
- Expected difficulty: Straightforward for any capable LLM
Task 2 - Intermediate: Stakeholder Acknowledgment (task_intermediate_ack)
2 decision types. Identify the high-priority management request and generate a professional acknowledgment response.
- 2 messages: automated metric digest + VP Engineering escalation
- Semantic grader evaluates response quality (polite, conceptually correct)
- Expected difficulty: Requires understanding of urgency and professional tone
Task 3 - Advanced: Conflict Scheduling (task_advanced_conflict)
3 conflicting signals. Triage a deployment conflict while routing a mis-labelled red-herring invite.
- 3 messages: DevOps request, DB maintenance cron alert, vendor invite marked HIGH
- Agent must reason across all messages, respond with correct time (15:00 not 14:00)
- Expected difficulty: Challenging without multi-step reasoning; most models fail without GRPO
Reward Design
Rewards are dense and credit partial progress at every step, rather than being a single binary signal at the end of the episode:
| Action | Correct | Incorrect |
|---|---|---|
| `route_directory` | +0.05 base + grade delta × 0.50 | -0.10 (bad directory) |
| `respond` | +0.10 base + grade delta × 0.50 | n/a |
| `dismiss` | +0.05 base + grade delta × 0.50 | n/a |
| Bad message ID (hallucinated) | n/a | -0.20 |
| Episode resolution (grade ≥ 0.99) | +1.5 × (1.0 + speed_ratio) | n/a |
| Timeout floor | n/a | net reward wiped to -2.0 |
Max score per episode: ~5.0 (fast, perfect resolution)
Grader normalisation: score = clamp(cumulative_reward / max_reward, 0, 1)
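The arithmetic in this section can be sketched as follows. This is an illustrative reading of the reward table, not the code in `rewards.py`: the function names are hypothetical, and the definition of `speed_ratio` as `steps_remaining / max_steps` is an assumption, since the term is not defined above.

```python
BASE_REWARD = {"route_directory": 0.05, "respond": 0.10, "dismiss": 0.05}
GRADE_DELTA_WEIGHT = 0.50


def clamp(x: float, lo: float = 0.0, hi: float = 1.0) -> float:
    """Restrict x to the closed interval [lo, hi]."""
    return max(lo, min(hi, x))


def step_reward(action_type: str, grade_before: float, grade_after: float) -> float:
    """Reward for a correct action: base + grade delta × 0.50."""
    return BASE_REWARD[action_type] + (grade_after - grade_before) * GRADE_DELTA_WEIGHT


def resolution_bonus(steps_remaining: int, max_steps: int) -> float:
    """Bonus paid when the grade reaches 0.99 or higher: 1.5 × (1.0 + speed_ratio)."""
    # ASSUMPTION: speed_ratio is the fraction of the step budget left unused.
    speed_ratio = steps_remaining / max_steps
    return 1.5 * (1.0 + speed_ratio)


def normalised_score(cumulative_reward: float, max_reward: float) -> float:
    """Grader normalisation: clamp(cumulative_reward / max_reward, 0, 1)."""
    return clamp(cumulative_reward / max_reward, 0.0, 1.0)
```

Under these assumptions, a `respond` action that lifts the grade from 0.5 to 1.0 earns 0.10 + 0.5 × 0.50 = 0.35, and a timed-out episode with a net reward of -2.0 normalises to a score of 0.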
Setup & Usage
Prerequisites
- Python 3.10+
- `pip` or `uv`
- Docker (for containerised deployment)
Environment Variables
# Copy the example and fill in your secrets
cp .env.example .env
# Edit .env; at minimum set:
# HF_TOKEN=hf_your_token_here
# OPENENV_URL=http://localhost:8000
Web Interface (Gradio UI)
When deployed to Hugging Face Spaces (or run locally), the environment provides a custom Gradio web UI at /ui with:
- Dropdowns for `action_type` and `target_directory`
- Textbox for `message_id` with queue display
- Multi-line textarea for `response_payload` and `reasoning`
- Live metric cards: reward, grade, curriculum tier, step count
- Terminal-style action log with colour-coded rewards
- Rich message queue cards with alert-level badges
To enable locally:
uvicorn server.app:app --reload --host 0.0.0.0 --port 8000
# Then open http://localhost:8000/ui
Local Development
# Clone the repository
git clone https://github.com/elizabeth07-m/email_gym.git
cd email_gym
# Install dependencies
pip install -e ".[dev]"
# Run the server
uvicorn server.app:app --reload --host 0.0.0.0 --port 8000
# Run tests
pytest tests/ -v
Docker
# Build and run
docker compose up --build
# Or manually
docker build -t email-gym .
docker run -p 8000:8000 email-gym
API Usage Examples
# Health check
curl http://localhost:8000/health
# Reset (warmup task)
curl -X POST http://localhost:8000/reset \
-H "Content-Type: application/json" \
-d '{"task_id": "task_warmup_noise"}'
# Step (route a message)
curl -X POST http://localhost:8000/step \
-H "Content-Type: application/json" \
-d '{"action": {"action_type": "route_directory", "message_id": "1", "target_directory": "promotions"}}'
# Step (respond to stakeholder)
curl -X POST http://localhost:8000/step \
-H "Content-Type: application/json" \
-d '{"action": {"action_type": "respond", "message_id": "2", "response_payload": "Acknowledged. The deployment window is confirmed for 15:00."}}'
# Step (dismiss to vault)
curl -X POST http://localhost:8000/step \
-H "Content-Type: application/json" \
-d '{"action": {"action_type": "dismiss", "message_id": "3"}}'
# Get state
curl http://localhost:8000/state
# Get schemas
curl http://localhost:8000/schema
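The same POST calls can be issued from Python. The sketch below uses only the standard library; `build_request` and `post_json` are hypothetical helper names, and the response is assumed to be a JSON object.

```python
import json
import urllib.request


def build_request(base_url: str, path: str, body: dict) -> urllib.request.Request:
    """Prepare a JSON POST request equivalent to the curl calls above."""
    data = json.dumps(body).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + path,
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def post_json(base_url: str, path: str, body: dict, timeout: float = 30.0) -> dict:
    """Send the request and decode the JSON response (requires a running server)."""
    with urllib.request.urlopen(build_request(base_url, path, body), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With the server running locally, `post_json("http://localhost:8000", "/reset", {"task_id": "task_warmup_noise"})` mirrors the reset example.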
Running Inference
# Export environment variables
export API_BASE_URL="https://router.huggingface.co/v1"
export HF_TOKEN="your-token-here"
export MODEL_NAME="elizabeth07-m/email_gym"
export OPENENV_URL="http://localhost:8000"
# Run baseline inference
python inference.py
Deployment (OpenEnv Push)
This environment is designed for one-command deployment to Hugging Face Spaces via the OpenEnv CLI.
Step 1: Validate
openenv validate
# [OK] email-gym: Ready for multi-mode deployment
Step 2: Test locally
uvicorn server.app:app --host 0.0.0.0 --port 8000
# Server starts at http://localhost:8000
# Verify: curl http://localhost:8000/health
Step 3: Deploy to Hugging Face Spaces
# Login to Hugging Face (if not already)
huggingface-cli login
# Push to your HF Space
openenv push --repo-id elizabeth07-m/email_gym
This will:
- Create the `elizabeth07-m/email_gym` Space on Hugging Face (if it doesn't exist)
- Upload all environment files, Dockerfile, and `openenv.yaml`
- Build and deploy the Docker container automatically on HF infrastructure
Step 4: Verify deployment
# Health check (replace with your Space URL)
curl https://elizabeth07-m-email-gym.hf.space/health
# Run inference against the deployed Space
OPENENV_URL="https://elizabeth07-m-email-gym.hf.space" python inference.py
Deployment Options
# Deploy as a private Space
openenv push --repo-id elizabeth07-m/email_gym --private
# Create a PR instead of pushing directly
openenv push --repo-id elizabeth07-m/email_gym --create-pr
Baseline Scores
Scores are from the baseline inference agent using Qwen/Qwen2.5-72B-Instruct:
| Task | Difficulty | Score | Steps |
|---|---|---|---|
| task_warmup_noise | Warmup | ~0.82 | 4 |
| task_intermediate_ack | Intermediate | ~0.51 | 6 |
| task_advanced_conflict | Advanced | ~0.28 | 8 |
| Average | | ~0.54 | |
Scores are approximate and may vary based on model temperature and API availability.
Project Structure
email-gym/
├── openenv.yaml                    # OpenEnv manifest
├── pyproject.toml                  # Python package config
├── Dockerfile                      # OpenEnv-compatible build
├── .env.example                    # Environment variable template
├── .gitignore
├── inference.py                    # Baseline inference script
├── client.py                       # OpenEnv EnvClient wrapper
├── README.md                       # This file
│
├── message_routing_gym/            # Core library
│   ├── __init__.py                 # Package exports
│   ├── constants.py                # Enums, config, reward weights
│   ├── models.py                   # Pydantic Action/Observation/State
│   ├── tasks.py                    # Task definitions + routing rules
│   ├── rewards.py                  # Dense reward engine
│   └── graders.py                  # Deterministic graders (0.0–1.0)
│
├── server/                         # OpenEnv server
│   ├── __init__.py
│   ├── app.py                      # FastAPI application
│   ├── gradio_builder.py           # Custom Gradio web UI
│   └── message_routing_environment.py  # Environment implementation
│
└── tests/                          # Test suite
    ├── __init__.py
    └── test_env.py                 # Unit + integration tests
Links
| Resource | URL |
|---|---|
| HF Model / Space | https://huggingface.co/elizabeth07-m/email_gym |
| GitHub Repository | https://github.com/elizabeth07-m/email_gym |
| OpenEnv Hackathon | https://huggingface.co/openenv |