---
title: ExecAssist
emoji: 📧
colorFrom: indigo
colorTo: blue
sdk: docker
app_port: 7860
pinned: false
license: mit
tags:
  - openenv
  - rl
  - executive-assistant
---

# ExecAssist: Executive Assistant Environment

An OpenEnv environment where AI agents learn to manage email and calendar for busy executives.

## Problem Statement

Every executive assistant juggles email, calendars, and scheduling conflicts daily. This environment simulates that challenge: the agent reads incoming requests, drafts professional replies, books meetings, and resolves conflicts.

**Theme:** #3.2 - World Modeling (Personalized Tasks)

## Tasks

### Task 1: Easy - Simple Meeting Request
- **Challenge:** Single email with clear calendar availability
- **Agent must:** Draft a polite reply + book the meeting in an open slot
- **Score:** 50% email quality + 50% scheduling correctness

### Task 2: Medium - Scheduling Conflict
- **Challenge:** Requested time is already booked
- **Agent must:** Identify the conflict + propose 2-3 alternatives + explain professionally
- **Score:** 30% email quality + 40% conflict resolution + 30% scheduling

### Task 3: Hard - Multi-Party Coordination
- **Challenge:** 3 emails requesting meetings, some overlapping, with priority conflicts
- **Agent must:** Prioritize + reschedule + notify all parties
- **Score:** 34% email quality + 33% scheduling + 33% conflict resolution
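
The per-task weights above combine the component scores into a single number. A minimal sketch of that weighted sum, assuming each component is already a float in [0, 1]; the names `TASK_WEIGHTS` and `combine_scores` are illustrative and not taken from the environment's code:

```python
# Hypothetical helper: the weights are copied from the task descriptions above.
TASK_WEIGHTS = {
    "easy": {"email": 0.50, "scheduling": 0.50},
    "medium": {"email": 0.30, "conflict": 0.40, "scheduling": 0.30},
    "hard": {"email": 0.34, "scheduling": 0.33, "conflict": 0.33},
}


def combine_scores(task: str, components: dict) -> float:
    """Return the weighted sum of the component scores for one task."""
    weights = TASK_WEIGHTS[task]
    missing = set(weights) - set(components)
    if missing:
        raise ValueError(f"missing component scores: {sorted(missing)}")
    return sum(weight * components[name] for name, weight in weights.items())
```

For example, `combine_scores("medium", {"email": 0.8, "conflict": 0.5, "scheduling": 1.0})` weighs conflict resolution most heavily, as the Task 2 description states.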

## Environment Design

### Observation Space
- **Emails:** Sender, subject, body, priority
- **Calendar:** Existing meetings, working hours, blocked times
- **Contacts:** Names, emails, timezones

### Action Space
```json
{
  "email_reply": "Professional response text",
  "calendar_action": "book | propose_alternatives | reschedule | decline",
  "meeting_details": {
    "participants": ["email@company.com"],
    "start_time": "2026-04-28T14:00:00",
    "end_time": "2026-04-28T15:00:00",
    "subject": "Meeting topic",
    "proposed_alternatives": [...]
  }
}
```
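
For illustration, an action of this shape can be assembled and sanity-checked in Python before it is submitted. This is a sketch under assumptions: `build_action` and `ALLOWED_CALENDAR_ACTIONS` are hypothetical names, and the checks (known `calendar_action`, ISO 8601 timestamps, end after start) are inferred from the JSON above rather than from the server's actual schema. The element format of `proposed_alternatives` is not specified here, so the list is passed through unchanged.

```python
from datetime import datetime

# The four values listed under "calendar_action" above.
ALLOWED_CALENDAR_ACTIONS = {"book", "propose_alternatives", "reschedule", "decline"}


def build_action(email_reply, calendar_action, participants, start_time, end_time,
                 subject, proposed_alternatives=None):
    """Build an action dict in the documented shape (hypothetical helper)."""
    if calendar_action not in ALLOWED_CALENDAR_ACTIONS:
        raise ValueError(f"unknown calendar_action: {calendar_action!r}")
    # datetime.fromisoformat raises ValueError on malformed timestamps.
    start = datetime.fromisoformat(start_time)
    end = datetime.fromisoformat(end_time)
    if end <= start:
        raise ValueError("end_time must be after start_time")
    return {
        "email_reply": email_reply,
        "calendar_action": calendar_action,
        "meeting_details": {
            "participants": list(participants),
            "start_time": start_time,
            "end_time": end_time,
            "subject": subject,
            "proposed_alternatives": list(proposed_alternatives or []),
        },
    }
```

The returned dict can be serialized with `json.dumps` and used as a request body.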

### Reward Functions (Multiple Independent Checks)

**1. Email Quality (0-1)**
- Politeness markers (thank you, regards)
- Proper greeting/closing
- Sufficient detail (20+ words)
- Professional tone (no negative framing)
- LLM-as-judge for nuance
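
The rule-based checks in this list could look roughly like the sketch below. It is not the environment's scorer: the marker lists, the equal weighting of the four checks, and the name `email_quality_heuristic` are assumptions, and the LLM-as-judge component is left out.

```python
import re

# Illustrative marker lists; the real environment may use different ones.
POLITE_MARKERS = ("thank you", "thanks", "regards", "please")
GREETINGS = ("dear", "hello", "hi")
CLOSINGS = ("regards", "sincerely", "best")
NEGATIVE_PHRASES = ("your fault", "impossible", "waste of time")


def email_quality_heuristic(text: str) -> float:
    """Score an email in [0, 1] as the fraction of four simple checks passed."""
    lowered = text.lower()
    words = re.findall(r"[a-z']+", lowered)
    has_greeting = bool(words) and words[0] in GREETINGS
    has_closing = any(closing in lowered for closing in CLOSINGS)
    checks = [
        any(marker in lowered for marker in POLITE_MARKERS),        # politeness markers
        has_greeting and has_closing,                               # greeting and closing
        len(words) >= 20,                                           # sufficient detail
        not any(phrase in lowered for phrase in NEGATIVE_PHRASES),  # no negative framing
    ]
    return sum(checks) / len(checks)
```

Substring matching like this is easy to game, which is one reason to pair it with separate penalties and a judge model.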

**2. Scheduling Correctness (0-1)**
- No double-booking
- Within working hours
- Appropriate duration (15 minutes to 2 hours)
- All participants included
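
As a sketch of how these four checks might be computed, the function below returns the fraction that pass. The function name, the equal weighting, and the 09:00-17:00 default working hours are assumptions; the environment's own calendar data defines the real working hours.

```python
from datetime import datetime, time, timedelta


def scheduling_score(start, end, existing, participants, required,
                     work_start=time(9, 0), work_end=time(17, 0)):
    """Return the fraction of the four scheduling checks that pass.

    existing is a list of (start, end) datetime pairs already on the calendar.
    The default working hours are an assumption made for this sketch.
    """
    duration = end - start
    checks = [
        # Two intervals overlap when each one starts before the other ends.
        not any(start < busy_end and busy_start < end for busy_start, busy_end in existing),
        # Same-day meeting that falls inside working hours.
        start.date() == end.date() and work_start <= start.time() and end.time() <= work_end,
        # Duration between 15 minutes and 2 hours, inclusive.
        timedelta(minutes=15) <= duration <= timedelta(hours=2),
        # Every required participant is on the invite.
        set(required) <= set(participants),
    ]
    return sum(checks) / len(checks)
```

Back-to-back meetings are not treated as a conflict here: a booking that starts exactly when another ends passes the overlap check.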

**3. Conflict Resolution (0-1)**
- Recognizes conflicts
- Proposes 2-3 alternatives
- Explains professionally
- Prioritizes correctly (for hard task)

**4. Anti-Reward Hacking Penalties**
- Too short email: -0.3
- Missing meeting details: -0.4
- Generic/templated: -0.1
- Overly long: -0.15
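
One way the penalties above could be applied is to subtract every triggered penalty from the combined score and clamp the result. The sketch below assumes exactly that; the flag names, the additive combination, and the clamp to [0, 1] are not confirmed by the environment's code.

```python
# Penalty magnitudes as listed above; the flag names are hypothetical.
PENALTIES = {
    "too_short_email": 0.30,
    "missing_meeting_details": 0.40,
    "generic_or_templated": 0.10,
    "overly_long": 0.15,
}


def apply_penalties(base_score: float, flags: set) -> float:
    """Subtract each triggered penalty, then clamp the result to [0, 1]."""
    unknown = set(flags) - set(PENALTIES)
    if unknown:
        raise ValueError(f"unknown penalty flags: {sorted(unknown)}")
    penalised = base_score - sum(PENALTIES[flag] for flag in flags)
    return max(0.0, min(1.0, penalised))
```

Clamping keeps the final reward on the same 0-1 scale as the three component scores.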

## Baseline Scores

### AI Baseline (Nemotron 3 Super 120B) - Untrained
| Task | Score |
|------|-------|
| Easy | 0.315 |
| Medium | 0.349 |
| Hard | 0.346 |
| **Average** | **0.337** |

*Note: These are pre-training scores. The model struggles with JSON formatting, conflict detection, and professional email composition. Training target: 0.60-0.80*

## Setup & Usage

### Local Development

```bash
# Clone the repository
git clone https://huggingface.co/spaces/YourUsername/exec-assist
cd exec-assist

# Install dependencies
pip install -r requirements.txt

# Run the server
uvicorn server.app:app --reload

# Open API docs
# http://127.0.0.1:8000/docs
```

### Run Baseline Inference

```bash
# Set environment variables
export APIBASEURL=https://openrouter.ai/api/v1
export MODELNAME=nvidia/nemotron-3-super-120b-a12b:free
export HFTOKEN=your-api-key

# Run inference
python inference.py
```

### Docker

```bash
docker build -t exec-assist .
docker run -p 7860:7860 exec-assist
```

## Training (In Progress - Apr 26)

We will train using TRL + Unsloth:
1. GRPO trainer setup
2. Reward shaping
3. Baseline comparison
4. Before/after examples
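
As a rough idea of what a shaped reward for step 2 might look like, the sketch below scores only whether a completion parses as an action object. It is written in the style of a custom reward function for TRL's GRPO trainer, which is called with a `completions` keyword argument and returns one float per completion. The function name, the 1.0 / 0.5 / 0.0 values, and the handling of chat-style completions are assumptions, not the project's training code.

```python
import json

# Top-level keys from the Action Space section.
REQUIRED_KEYS = {"email_reply", "calendar_action", "meeting_details"}


def json_format_reward(completions, **kwargs):
    """Return one reward per completion based on JSON well-formedness.

    1.0: a JSON object with all required keys
    0.5: a JSON object missing some keys
    0.0: anything else (reward values are illustrative)
    """
    rewards = []
    for completion in completions:
        # Chat-style completions are assumed to be lists of message dicts.
        text = completion if isinstance(completion, str) else completion[0]["content"]
        try:
            action = json.loads(text)
        except (TypeError, ValueError):
            rewards.append(0.0)
            continue
        if not isinstance(action, dict):
            rewards.append(0.0)
        elif REQUIRED_KEYS <= action.keys():
            rewards.append(1.0)
        else:
            rewards.append(0.5)
    return rewards
```

A format-only reward like this targets the JSON formatting problem directly and would normally be combined with the quality, scheduling, and conflict scores.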

## API Endpoints

| Endpoint | Method | Description |
|----------|--------|-------------|
| `/reset?task=easy\|medium\|hard` | POST | Start new episode |
| `/step` | POST | Submit action, get reward |
| `/state` | GET | Current state |
| `/tasks` | GET | List all tasks |
| `/health` | GET | Health check |
| `/metadata` | GET | Environment info |
| `/schema` | GET | Action/observation/state schemas |
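
A minimal client sketch for these endpoints, using only the Python standard library. It assumes the server from the Local Development section is running on `http://127.0.0.1:8000`, that `/step` takes the action JSON as its request body, and that responses are JSON; none of these details are confirmed here, and `build_request`, `call`, and `run_episode` are hypothetical names.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "http://127.0.0.1:8000"  # local uvicorn default; the Docker image exposes 7860


def build_request(path, method="GET", query=None, payload=None):
    """Create a urllib Request for one of the endpoints in the table."""
    url = BASE_URL + path
    if query:
        url += "?" + urllib.parse.urlencode(query)
    data = json.dumps(payload).encode("utf-8") if payload is not None else None
    headers = {"Content-Type": "application/json"} if data is not None else {}
    return urllib.request.Request(url, data=data, headers=headers, method=method)


def call(request):
    """Send the request and decode the response, which is assumed to be JSON."""
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.loads(response.read().decode("utf-8"))


def run_episode(task, action):
    """Reset the environment for one task, then submit a single action."""
    observation = call(build_request("/reset", method="POST", query={"task": task}))
    result = call(build_request("/step", method="POST", payload=action))
    return observation, result
```

With the server running, `run_episode("easy", action)` would start an episode and submit one action shaped like the Action Space example.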

## Author

**DevanshuDon** - Built for OpenEnv Hackathon 2026