# OpenEnv: Email Triage & Scheduling Assistant (EmailEnv-v1) 📧🚀

**EmailTriage-v1** simulates a real-world professional task, email triage and scheduling, to evaluate the decision-making and logical reasoning of agentic workflows. It bridges the gap between toy grid-worlds and genuine productivity work.

## 🌟 Motivation & Real-World Utility (30% Weight)
Manual email management is a labor-intensive professional task. This environment models the **Email Triage Assistant** role, a critical function in modern digital workflows. Agents are evaluated on their ability to:
- **Prioritize**: Distinguish between high-stakes meeting requests and low-value noise.
- **Categorize**: Maintain a structured workspace by sorting multi-topic communications.
- **Coordinate**: Resolve scheduling conflicts by cross-referencing the agent's calendar, which requires logical deduction rather than pattern matching.

## 🏗️ Environment Design (20% Weight)

### Observation Space (Pydantic Typed)
The agent receives a rich state snapshot including:
- **`inbox_count`**: Real-time counter of unprocessed items.
- **`current_email`**: A structured object containing the sender, subject, body, and priority.
- **`calendar`**: A list of events representing the agent's current "busy" times.
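The observation fields above can be sketched as typed records. This is a minimal sketch, not the environment's actual schema: it uses standard-library `dataclasses` as a dependency-free stand-in for the Pydantic models, and the class names (`Email`, `CalendarEvent`, `Observation`), the `id` field, the priority values, and the hour-based event times are all assumptions.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Email:
    """One email as the agent sees it. Fields follow the README; `id` is assumed."""
    id: str
    sender: str
    subject: str
    body: str
    priority: str  # hypothetical values, e.g. "low", "normal", "high"


@dataclass
class CalendarEvent:
    """A busy slot on the agent's calendar (hypothetical shape)."""
    title: str
    start_hour: int  # 24-hour clock, e.g. 10 means 10 AM
    end_hour: int


@dataclass
class Observation:
    """State snapshot handed to the agent (hypothetical)."""
    inbox_count: int                # unprocessed emails left in the inbox
    current_email: Optional[Email]  # assumed None once the inbox is empty
    calendar: List[CalendarEvent] = field(default_factory=list)


# Example snapshot: one spam email waiting, one 10 AM meeting on the calendar.
example = Observation(
    inbox_count=1,
    current_email=Email(
        id="email-1",
        sender="prizes@example.com",
        subject="You have won $1M!",
        body="Click here to claim your prize.",
        priority="low",
    ),
    calendar=[CalendarEvent(title="Team sync", start_hour=10, end_hour=11)],
)
print(example.inbox_count, example.current_email.subject)
```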

### Action Space (Pydantic Typed)
The agent interacts with the environment through four high-level actions:
- **`MOVE`**: Move an email to a folder (Archive, Work, Social, or Spam).
- **`DELETE`**: Permanently remove an email; intended for high-risk items such as spam.
- **`REPLY`** / **`SCHEDULE`**: Contextual actions that require generating appropriate reply text (e.g., confirming a 2 PM slot).
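The action space can be sketched the same way. The snippet below is hypothetical: `ActionType`, `Action`, and the `email_id`, `folder`, and `reply_text` fields are illustrative names rather than the environment's actual API, and the checks in `__post_init__` only approximate what a Pydantic validator might enforce.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class ActionType(str, Enum):
    MOVE = "MOVE"
    DELETE = "DELETE"
    REPLY = "REPLY"
    SCHEDULE = "SCHEDULE"


# Folders listed in the README as MOVE targets.
FOLDERS = {"Archive", "Work", "Social", "Spam"}


@dataclass
class Action:
    """A single agent action (hypothetical field names)."""
    action_type: ActionType
    email_id: str
    folder: Optional[str] = None      # required for MOVE
    reply_text: Optional[str] = None  # required for REPLY and SCHEDULE

    def __post_init__(self) -> None:
        # Accept plain strings such as "MOVE" and normalize them to the enum;
        # unknown names raise ValueError.
        self.action_type = ActionType(self.action_type)
        # Reject malformed actions up front, roughly as a Pydantic validator would.
        if self.action_type is ActionType.MOVE and self.folder not in FOLDERS:
            raise ValueError(f"MOVE needs a folder in {sorted(FOLDERS)}, got {self.folder!r}")
        if self.action_type in (ActionType.REPLY, ActionType.SCHEDULE) and not self.reply_text:
            raise ValueError(f"{self.action_type.value} needs non-empty reply_text")


# Usage: file a spam email, then confirm a meeting time.
file_spam = Action(ActionType.MOVE, email_id="email-1", folder="Spam")
confirm = Action("SCHEDULE", email_id="email-3", reply_text="Confirmed for 2 PM.")
```

Validating at construction time means an agent's malformed output fails loudly before it reaches the environment's state.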

## 📊 Task Difficulty Progression (25% Weight)

| Task ID | Level | Objective | Grader / Success Criteria |
|---------|-------|-----------|---------------------------|
| **1: Spam Guard** | Easy | Identify an obvious spam email (e.g., one making $1M claims) and file it as spam. | The email with the spam ID is moved to the "Spam" folder. |
| **2: Inbox Zero** | Medium | Sort a mix of work and social emails into the correct folders. | All items are sorted correctly, with no priority email misplaced. |
| **3: Coordinator** | Hard | Schedule a new meeting at 2 PM while avoiding the conflicting 10 AM slot. | The generated reply correctly confirms the non-conflicting time. |
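The success criteria in the table can be expressed as small scoring functions. This is a sketch under stated assumptions, not the environment's actual graders: the function names, the `{email_id: folder}` mapping of final placements, the all-or-nothing scoring, the `(start_hour, end_hour)` interval representation, and the regex check for a 2 PM confirmation are all illustrative.

```python
import re
from typing import Dict, List, Tuple

# A busy or proposed slot as (start_hour, end_hour) on a 24-hour clock (assumed).
Interval = Tuple[int, int]


def grade_spam_guard(final_folders: Dict[str, str], spam_id: str) -> float:
    """Task 1: full credit only if the spam email ended up in the Spam folder."""
    return 1.0 if final_folders.get(spam_id) == "Spam" else 0.0


def grade_inbox_zero(final_folders: Dict[str, str], expected: Dict[str, str]) -> float:
    """Task 2: all-or-nothing; every email must land in its expected folder."""
    all_correct = all(final_folders.get(eid) == want for eid, want in expected.items())
    return 1.0 if all_correct else 0.0


def overlaps(a: Interval, b: Interval) -> bool:
    """Half-open slots [start, end) overlap iff each starts before the other ends."""
    return a[0] < b[1] and b[0] < a[1]


def grade_coordinator(reply_text: str, proposed: Interval, busy: List[Interval]) -> float:
    """Task 3: the proposed slot must be free and the reply must confirm 2 PM."""
    if any(overlaps(proposed, slot) for slot in busy):
        return 0.0
    # Word boundaries keep "12 PM" from matching "2 PM".
    confirms_2pm = re.search(r"\b(2\s*pm|14:00)\b", reply_text, re.IGNORECASE)
    return 1.0 if confirms_2pm else 0.0
```

Treating slots as half-open intervals means a meeting that ends at 11 does not clash with one that starts at 11; whether the real grader uses that convention is an assumption here.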

## 🚀 Setup & Usage
1. **API Key**: Set the `OPENAI_API_KEY` environment variable.
2. **Launch Server**: `python main.py`
3. **Run Baseline**: `python inference.py`

**License**: MIT