---
title: Exec Assistant Arena Environment Server
emoji: 📋
colorFrom: gray
colorTo: green
sdk: docker
pinned: false
app_port: 8000
base_path: /web
tags:
  - openenv
---

# Executive Assistant Arena

An OpenEnv environment that simulates a personal assistant's morning inbox. The LLM agent must resolve calendar conflicts, draft email replies, infer hidden user preferences, and handle late-breaking schedule changes.

A Qwen2.5-7B model was trained via GRPO on this environment, showing measurable improvement across 6 decomposed reward components.

## The Problem

Your AI assistant double-books you, ignores your "no mornings" preference, and can't handle it when your boss reschedules a meeting at the last minute. This environment trains LLMs to actually handle real-world scheduling chaos.

## Architecture

- **Environment**: procedurally generated scenarios with calendar conflicts, emails, user preferences, and late-breaking changes
- **3 difficulty tiers**: Easy (2 conflicts), Medium (4 conflicts + late changes), Hard (6 conflicts + 2 late changes)
- **6 reward components**: conflict resolution, preference inference, email quality, deadline adherence, efficiency, late-change recovery
- **All rewards are rule-based**: no LLM judges, fully deterministic and verifiable
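The six components above can be sketched as a sum of normalized rule-based scores. This is an illustrative sketch only; the actual logic lives in `server/reward.py`, and the field names and equal weighting used here are assumptions:

```python
# Illustrative sketch of 6 decomposed, rule-based reward components.
# Field names and equal weighting are assumptions; see server/reward.py
# for the real implementation.
from dataclasses import dataclass


@dataclass
class EpisodeStats:
    conflicts_resolved: int
    conflicts_total: int
    preferences_respected: int
    preferences_total: int
    email_quality: float          # 0.0-1.0, rule-scored draft quality
    deadlines_met: int
    deadlines_total: int
    steps_used: int
    step_budget: int
    late_changes_recovered: int
    late_changes_total: int


def ratio(num: int, den: int) -> float:
    # Vacuously perfect when a component does not apply (e.g. no late changes)
    return num / den if den else 1.0


def total_reward(s: EpisodeStats) -> float:
    """Sum six rule-based components, each normalized to [0, 1]."""
    components = {
        "conflict_resolution": ratio(s.conflicts_resolved, s.conflicts_total),
        "preference_inference": ratio(s.preferences_respected, s.preferences_total),
        "email_quality": s.email_quality,
        "deadline_adherence": ratio(s.deadlines_met, s.deadlines_total),
        "efficiency": max(0.0, 1.0 - s.steps_used / s.step_budget),
        "late_change_recovery": ratio(s.late_changes_recovered, s.late_changes_total),
    }
    return sum(components.values())
```

Because every component is a deterministic function of the episode log, the same transcript always yields the same reward, which is what makes the environment usable for verifiable RL.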

## Quick Start

```python
from exec_assistant_arena import ExecAssistantArenaEnv, AssistantAction

with ExecAssistantArenaEnv(base_url="https://SidraMiconi-exec-assistant-arena.hf.space") as env:
    result = env.reset(seed=42, difficulty="medium")
    print(result.observation.tool_result)  # scenario description
    print(result.observation.conflicts)    # scheduling conflicts

    # Resolve a conflict
    result = env.step(AssistantAction(
        tool="reschedule",
        arguments={"event_id": "mtg_2", "new_time": "2:00pm"}
    ))
    print(f"Reward: {result.reward}")  # +1.0 for a resolved conflict

    # Draft an email reply
    result = env.step(AssistantAction(
        tool="draft_reply",
        arguments={
            "email_id": "email_1",
            "body": "Hey! Sure thing, I'll get the budget review to you by tomorrow."
        }
    ))

    # End the episode
    result = env.step(AssistantAction(tool="done"))
```

## Available Tools

| Tool | Arguments | Reward |
|------|-----------|--------|
| `check_calendar` | none | 0 (free) |
| `check_inbox` | none | 0 (free) |
| `reschedule` | `event_id`, `new_time` | +1.0 on resolve, -0.5 on new conflict |
| `draft_reply` | `email_id`, `body` | 0.0 to +1.0 (quality scored) |
| `delegate_task` | `task`, `to` | +0.5 if it handles a late change |
| `done` | none | terminal rewards |
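As a sanity check, the per-step values in the table can be tallied over an action trace. The outcome-suffixed keys (`_ok`/`_bad`) and the trace below are illustrative, not part of the real API:

```python
# Back-of-envelope tally of per-step rewards from the table above.
# Keys encode the outcome (e.g. reschedule_ok vs reschedule_bad) purely
# for illustration; the real environment scores outcomes server-side.
STEP_REWARDS = {
    "check_calendar": 0.0,        # free information-gathering
    "check_inbox": 0.0,           # free information-gathering
    "reschedule_ok": 1.0,         # resolved a conflict
    "reschedule_bad": -0.5,       # created a new conflict
    "delegate_late_change": 0.5,  # delegation covers a late change
}


def episode_reward(trace: list[str], reply_scores: list[float]) -> float:
    """Sum tool rewards plus rule-scored draft_reply quality (0.0-1.0 each)."""
    return sum(STEP_REWARDS[t] for t in trace) + sum(reply_scores)


trace = ["check_calendar", "reschedule_ok", "reschedule_ok", "check_inbox"]
print(episode_reward(trace, reply_scores=[0.7]))  # 2.7
```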

## Training

Trained with GRPO (Group Relative Policy Optimization) using Unsloth + TRL:

```bash
# On an H100
cd exec_assistant_arena
PYTHONPATH=. uvicorn server.app:app --host 0.0.0.0 --port 8000 &
python training/train_grpo.py
```

A Colab notebook is available at `training/train_colab.ipynb` for reproducing the run on a free T4 GPU.
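The group-relative part of GRPO is easy to sketch: for each scenario, sample a group of rollouts and normalize each rollout's total reward against the group's mean and standard deviation. A minimal illustration (not the training script; the epsilon value is an assumption):

```python
# GRPO advantage sketch: rewards from a group of rollouts of the SAME
# scenario are normalized against the group statistics, so above-average
# rollouts get positive advantages and below-average ones negative.
import statistics


def group_relative_advantages(rewards: list[float], eps: float = 1e-8) -> list[float]:
    """Advantage of rollout i: (r_i - mean(group)) / (std(group) + eps)."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


# Four rollouts of one scenario with different total rewards:
print(group_relative_advantages([4.8, 3.2, 4.8, 3.2]))  # ~[1.0, -1.0, 1.0, -1.0]
```

Because advantages are computed within a group, no separate value network is needed, which is what makes GRPO a good fit for environments with decomposed scalar rewards like this one.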

## Project Structure

```
exec_assistant_arena/
├── models.py                    # Action, Observation, State
├── client.py                    # WebSocket client
├── server/
│   ├── app.py                   # FastAPI server
│   ├── exec_assistant_arena_environment.py  # Core env logic
│   ├── scenario_generator.py    # Procedural generation
│   └── reward.py                # 6 decomposed reward components
└── training/
    ├── train_grpo.py            # H100 training script
    ├── train_colab.ipynb        # Colab version
    ├── eval.py                  # Before/after evaluation
    └── scenarios/
        ├── train_scenarios.json # 80 training scenarios
        └── eval_scenarios.json  # 20 held-out scenarios
```

## Links

Built for the OpenEnv Hackathon SF, March 7-8, 2026.