Create TRAINING_RUN.md
TRAINING_RUN.md (added, +65 -0)
@@ -0,0 +1,65 @@
# Training Run Documentation

## Project Overview
**Email Triage OpenEnv** is a production-ready OpenEnv environment developed for the Meta x OpenEnv Hackathon. It simulates real-world email triage workflows in which AI agents classify, prioritize, and route emails across operational categories such as spam, billing, support, and urgent communications.

This project targets a genuine business bottleneck: inbox triage for support teams, moderators, and enterprise workflows, which the environment is meant to help automate.

---

## Framework Used
- OpenEnv (latest release)
- Python 3.11
- Hugging Face Space (Docker deployment)
- Flask REST API
- Pydantic typed models
- GPT-4o mini baseline inference pipeline
- Custom task graders and synthetic data generation

---

## Objective
The environment was designed to train and evaluate agents on progressively harder email triage tasks:

### Task 1: Spam Detection (Easy)
- Binary spam vs normal classification
- 10 synthetic emails
- Expected score: 0.80–0.85
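
As a rough illustration of how a binary grader for this task might score an agent, the sketch below computes plain accuracy over a batch of labelled emails. The function name `grade_spam_task` and the label strings are assumptions made for this example; the repository's actual grader may work differently.

```python
from typing import Sequence


def grade_spam_task(predictions: Sequence[str], labels: Sequence[str]) -> float:
    """Return the fraction of emails labelled correctly (hypothetical grader)."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if not labels:
        return 0.0
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


# Ten emails, as in Task 1; the last two predictions are deliberately wrong.
labels = ["spam", "normal"] * 5
predictions = labels[:8] + ["normal", "spam"]
score = grade_spam_task(predictions, labels)
```

With eight of ten emails correct, `score` lands at the lower end of the expected range listed above.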

### Task 2: Multi-Class Routing (Medium)
- Classify emails into spam / normal / urgent / billing
- Route to support / sales / billing / none
- 12 synthetic emails
- Expected score: 0.70–0.75
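
The two label sets above lend themselves to a typed action. The sketch below uses standard-library enums and a dataclass as a stand-in for the project's Pydantic models; `Category`, `Route`, `TriageAction`, and `parse_action` are hypothetical names, and only the label values come from the list above.

```python
from dataclasses import dataclass
from enum import Enum


class Category(str, Enum):
    SPAM = "spam"
    NORMAL = "normal"
    URGENT = "urgent"
    BILLING = "billing"


class Route(str, Enum):
    SUPPORT = "support"
    SALES = "sales"
    BILLING = "billing"
    NONE = "none"


@dataclass(frozen=True)
class TriageAction:
    """One agent decision for one email (hypothetical shape)."""
    category: Category
    route: Route


def parse_action(category: str, route: str) -> TriageAction:
    """Validate raw strings; the Enum lookup raises ValueError on unknown values."""
    return TriageAction(Category(category), Route(route))


action = parse_action("billing", "billing")
```

Rejecting unknown labels at parse time keeps malformed agent output from reaching the grader.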

### Task 3: Context-Aware Triage (Hard)
- VIP customers
- SLA urgency
- Escalation handling
- 20 synthetic emails
- Expected score: 0.60–0.70
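
To make the context signals concrete, here is a minimal sketch of how VIP status, SLA urgency, and escalation history could feed a reference priority. Everything in it is an assumption for illustration: the field names, the priority levels, and the 1-hour and 8-hour thresholds do not come from the repository.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EmailContext:
    """Hypothetical context fields for a Task 3 email."""
    is_vip: bool
    sla_hours_remaining: float
    prior_escalations: int


def reference_priority(ctx: EmailContext) -> str:
    """Illustrative rule only: the strongest signal present decides the level."""
    # Assumed thresholds: an earlier escalation or under 1 hour of SLA left is "high".
    if ctx.prior_escalations > 0 or ctx.sla_hours_remaining <= 1:
        return "high"
    # Assumed: VIP senders or under 8 hours of SLA left are at least "medium".
    if ctx.is_vip or ctx.sla_hours_remaining <= 8:
        return "medium"
    return "low"


vip_priority = reference_priority(EmailContext(True, 24.0, 0))
```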

---

## Development & Training Process
Although this project was implemented as a full production environment rather than a Colab notebook, the complete training, evaluation, and baseline workflow is included in the repository.

### Process:
1. Designed synthetic email datasets with realistic metadata
2. Built OpenEnv-compliant environment with typed observation/action/reward spaces
3. Implemented graded task progression (easy → medium → hard)
4. Developed reward functions with partial progress scoring:
   - Classification: 40%
   - Routing: 30%
   - Priority: 30%
5. Created GPT-4o mini inference baseline
6. Validated all components with comprehensive automated testing
7. Deployed to Hugging Face Space
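
The partial-progress weighting in step 4 can be sketched as a small function. This is a minimal sketch, assuming each component is an accuracy in [0, 1]; `WEIGHTS` and `weighted_reward` are illustrative names, not the repository's API. A prediction with the correct class and priority but the wrong route earns about 0.7 rather than zero.

```python
# Weights taken from step 4 above; the names are hypothetical.
WEIGHTS = {"classification": 0.4, "routing": 0.3, "priority": 0.3}


def weighted_reward(classification: float, routing: float, priority: float) -> float:
    """Combine per-component accuracies in [0, 1] into one reward in [0, 1]."""
    scores = {
        "classification": classification,
        "routing": routing,
        "priority": priority,
    }
    for name, value in scores.items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {value}")
    return sum(WEIGHTS[name] * value for name, value in scores.items())


# Correct class, wrong route, correct priority: partial credit.
partial = weighted_reward(1.0, 0.0, 1.0)
```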

---

## Reward Function
```text
reward = (0.4 * classification_accuracy)
       + (0.3 * routing_accuracy)
       + (0.3 * priority_accuracy)
```