---
title: Exec Assistant Arena Environment Server
emoji: π
colorFrom: gray
colorTo: green
sdk: docker
pinned: false
app_port: 8000
base_path: /web
tags:
  - openenv
---
# Executive Assistant Arena
An OpenEnv environment that simulates a personal assistant's morning inbox. The LLM agent must resolve calendar conflicts, draft email replies, infer hidden user preferences, and handle late-breaking schedule changes.
A Qwen2.5-7B model was trained on this environment via GRPO, showing measurable improvement across all 6 decomposed reward components.
## The Problem
Your AI assistant double-books you, ignores your "no mornings" preference, and can't handle it when your boss reschedules a meeting at the last minute. This environment trains LLMs to actually handle real-world scheduling chaos.
## Architecture
- Environment: Procedurally generated scenarios with calendar conflicts, emails, user preferences, and late-breaking changes
- 3 difficulty tiers: Easy (2 conflicts), Medium (4 conflicts + late changes), Hard (6 conflicts + 2 late changes)
- 6 reward components: conflict resolution, preference inference, email quality, deadline adherence, efficiency, late-change recovery
- All rewards are rule-based: no LLM judges, fully deterministic and verifiable
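To make "rule-based and verifiable" concrete, here is a minimal sketch of how overlapping-event conflicts can be detected deterministically. This is an illustration only; the environment's actual logic lives in `server/scenario_generator.py` and `server/reward.py`, and the event schema below (dicts with `id`, `start`, `end` in fractional hours) is a hypothetical stand-in.

```python
# Hypothetical sketch of a rule-based conflict check; the real logic
# lives in server/scenario_generator.py and server/reward.py.
def overlaps(a, b):
    """Two events conflict iff their [start, end) time ranges intersect."""
    return a["start"] < b["end"] and b["start"] < a["end"]

def find_conflicts(events):
    """Return pairs of event ids whose time ranges overlap."""
    return [
        (a["id"], b["id"])
        for i, a in enumerate(events)
        for b in events[i + 1:]
        if overlaps(a, b)
    ]

events = [
    {"id": "mtg_1", "start": 9.0, "end": 10.0},   # 9:00-10:00
    {"id": "mtg_2", "start": 9.5, "end": 10.5},   # 9:30-10:30, clashes with mtg_1
    {"id": "mtg_3", "start": 14.0, "end": 15.0},  # 2:00-3:00, no clash
]
print(find_conflicts(events))  # [('mtg_1', 'mtg_2')]
```

Because the check is a pure function of the schedule, the same agent trajectory always earns the same reward, which is what makes the environment's scores verifiable.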
## Quick Start
```python
from exec_assistant_arena import ExecAssistantArenaEnv, AssistantAction

with ExecAssistantArenaEnv(base_url="https://SidraMiconi-exec-assistant-arena.hf.space") as env:
    result = env.reset(seed=42, difficulty="medium")
    print(result.observation.tool_result)  # scenario description
    print(result.observation.conflicts)    # scheduling conflicts

    # Resolve a conflict
    result = env.step(AssistantAction(
        tool="reschedule",
        arguments={"event_id": "mtg_2", "new_time": "2:00pm"},
    ))
    print(f"Reward: {result.reward}")  # +1.0 for resolved conflict

    # Draft an email reply
    result = env.step(AssistantAction(
        tool="draft_reply",
        arguments={
            "email_id": "email_1",
            "body": "Hey! Sure thing, I'll get the budget review to you by tomorrow.",
        },
    ))

    # Finish
    result = env.step(AssistantAction(tool="done"))
```
## Available Tools
| Tool | Arguments | Reward |
|---|---|---|
| `check_calendar` | none | 0 (free) |
| `check_inbox` | none | 0 (free) |
| `reschedule` | `event_id`, `new_time` | +1.0 resolve, -0.5 new conflict |
| `draft_reply` | `email_id`, `body` | 0.0 to +1.0 (quality scored) |
| `delegate_task` | `task`, `to` | +0.5 if handles late change |
| `done` | none | terminal rewards |
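The per-step rewards above roll up into the six decomposed components from the Architecture section. A sketch of how component scores might be combined into a single episode return is below; the component names follow this README, but the equal weights and the `episode_return` helper are illustrative assumptions, not the environment's actual implementation.

```python
# Illustrative only: component names match the Architecture section,
# but the weights and this helper are hypothetical, not the env's code.
COMPONENTS = [
    "conflict_resolution",
    "preference_inference",
    "email_quality",
    "deadline_adherence",
    "efficiency",
    "late_change_recovery",
]

def episode_return(scores, weights=None):
    """Weighted sum of the six decomposed reward components."""
    weights = weights or {c: 1.0 for c in COMPONENTS}
    return sum(weights[c] * scores.get(c, 0.0) for c in COMPONENTS)

# An episode that resolved its conflicts and wrote a decent email:
scores = {"conflict_resolution": 1.0, "email_quality": 0.8}
print(episode_return(scores))  # 1.8
```

Keeping the components separate until this final sum is what lets training runs report per-component improvement rather than a single opaque scalar.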
## Training
Trained with GRPO (Group Relative Policy Optimization) using Unsloth + TRL:
```bash
# On an H100
cd exec_assistant_arena
PYTHONPATH=. uvicorn server.app:app --host 0.0.0.0 --port 8000 &
python training/train_grpo.py
```
A Colab notebook is available at `training/train_colab.ipynb` for reproducing the run on a free T4 GPU.
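TRL's `GRPOTrainer` accepts plain Python reward functions that map a batch of completions to floats. The sketch below shows the shape of one such function: a hypothetical format bonus that checks whether the model emitted a well-formed tool call. The `+0.2` bonus, the JSON action format, and `format_reward` itself are illustrative assumptions; the actual reward wiring is in `training/train_grpo.py`, which adds the environment's rule-based rewards on top.

```python
import json
import re

# Hypothetical TRL-style reward function: scores only whether a completion
# parses to a valid tool call. The real rewards come from the environment.
VALID_TOOLS = {"check_calendar", "check_inbox", "reschedule",
               "draft_reply", "delegate_task", "done"}

def parse_tool_call(completion: str):
    """Extract a {"tool": ..., "arguments": ...} JSON object, if any."""
    match = re.search(r"\{.*\}", completion, re.DOTALL)
    if not match:
        return None
    try:
        call = json.loads(match.group(0))
    except json.JSONDecodeError:
        return None
    return call if call.get("tool") in VALID_TOOLS else None

def format_reward(completions, **kwargs):
    """+0.2 format bonus per completion that parses to a valid tool call."""
    return [0.2 if parse_tool_call(c) else 0.0 for c in completions]

print(format_reward(['{"tool": "done", "arguments": {}}', "not a call"]))
# [0.2, 0.0]
```

A small shaping term like this rewards syntactically valid actions early in training, before the policy earns much signal from the environment itself.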
## Project Structure
```
exec_assistant_arena/
├── models.py   # Action, Observation, State
├── client.py   # WebSocket client
├── server/
│   ├── app.py                                # FastAPI server
│   ├── exec_assistant_arena_environment.py   # Core env logic
│   ├── scenario_generator.py                 # Procedural generation
│   └── reward.py                             # 6 decomposed reward components
├── training/
│   ├── train_grpo.py       # H100 training script
│   ├── train_colab.ipynb   # Colab version
│   └── eval.py             # Before/after evaluation
└── scenarios/
    ├── train_scenarios.json   # 80 training scenarios
    └── eval_scenarios.json    # 20 held-out scenarios
```
## Links
- HF Space: https://huggingface.co/spaces/SidraMiconi/exec-assistant-arena
- GitHub: https://github.com/Sidra/chief-of-staff/tree/main/exec_assistant_arena
- W&B: https://wandb.ai/code-happy-sf/exec-assistant-arena
- Trained Model: https://huggingface.co/SidraMiconi/exec-assistant-arena-lora
Built for the OpenEnv Hackathon SF, March 7-8, 2026.