---
title: Exec Assistant Arena Environment Server
emoji: 📋
colorFrom: gray
colorTo: green
sdk: docker
pinned: false
app_port: 8000
base_path: /web
tags:
  - openenv
---

# Executive Assistant Arena

An OpenEnv environment that simulates a personal assistant's morning inbox. The LLM agent must resolve calendar conflicts, draft email replies, infer hidden user preferences, and handle late-breaking schedule changes.

A Qwen2.5-7B model was trained via GRPO on this environment, showing measurable improvement across 6 decomposed reward components.

## The Problem

Your AI assistant double-books you, ignores your "no mornings" preference, and can't handle it when your boss reschedules a meeting at the last minute. This environment trains LLMs to actually handle real-world scheduling chaos.

## Architecture

- **Environment**: procedurally generated scenarios with calendar conflicts, emails, user preferences, and late-breaking changes
- **3 difficulty tiers**: Easy (2 conflicts), Medium (4 conflicts + late changes), Hard (6 conflicts + 2 late changes)
- **6 reward components**: conflict resolution, preference inference, email quality, deadline adherence, efficiency, late-change recovery
- **All rewards are rule-based**: no LLM judges, fully deterministic and verifiable
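The six components above can be sketched as a sum of normalized rule-based scores. This is an illustrative sketch only; the actual logic lives in `server/reward.py`, and the field names and equal weighting used here are assumptions:

```python
# Illustrative sketch of 6 decomposed, rule-based reward components.
# Field names and equal weighting are assumptions; see server/reward.py
# for the real implementation.
from dataclasses import dataclass


@dataclass
class EpisodeStats:
    conflicts_resolved: int
    conflicts_total: int
    preferences_respected: int
    preferences_total: int
    email_quality: float          # 0.0-1.0, rule-scored draft quality
    deadlines_met: int
    deadlines_total: int
    steps_used: int
    step_budget: int
    late_changes_recovered: int
    late_changes_total: int


def ratio(num: int, den: int) -> float:
    # Vacuously perfect when a component does not apply (e.g. no late changes)
    return num / den if den else 1.0


def total_reward(s: EpisodeStats) -> float:
    """Sum six rule-based components, each normalized to [0, 1]."""
    components = {
        "conflict_resolution": ratio(s.conflicts_resolved, s.conflicts_total),
        "preference_inference": ratio(s.preferences_respected, s.preferences_total),
        "email_quality": s.email_quality,
        "deadline_adherence": ratio(s.deadlines_met, s.deadlines_total),
        "efficiency": max(0.0, 1.0 - s.steps_used / s.step_budget),
        "late_change_recovery": ratio(s.late_changes_recovered, s.late_changes_total),
    }
    return sum(components.values())
```

Because every component is a deterministic function of the episode log, the same transcript always yields the same reward, which is what makes the environment usable for verifiable RL.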

## Quick Start

```python
from exec_assistant_arena import ExecAssistantArenaEnv, AssistantAction

with ExecAssistantArenaEnv(base_url="https://SidraMiconi-exec-assistant-arena.hf.space") as env:
    result = env.reset(seed=42, difficulty="medium")
    print(result.observation.tool_result)  # scenario description
    print(result.observation.conflicts)    # scheduling conflicts

    # Resolve a conflict
    result = env.step(AssistantAction(
        tool="reschedule",
        arguments={"event_id": "mtg_2", "new_time": "2:00pm"}
    ))
    print(f"Reward: {result.reward}")  # +1.0 for a resolved conflict

    # Draft an email reply
    result = env.step(AssistantAction(
        tool="draft_reply",
        arguments={
            "email_id": "email_1",
            "body": "Hey! Sure thing, I'll get the budget review to you by tomorrow."
        }
    ))

    # End the episode
    result = env.step(AssistantAction(tool="done"))
```

## Available Tools

| Tool | Arguments | Reward |
|------|-----------|--------|
| `check_calendar` | none | 0 (free) |
| `check_inbox` | none | 0 (free) |
| `reschedule` | `event_id`, `new_time` | +1.0 on resolve, -0.5 on new conflict |
| `draft_reply` | `email_id`, `body` | 0.0 to +1.0 (quality scored) |
| `delegate_task` | `task`, `to` | +0.5 if it handles a late change |
| `done` | none | terminal rewards |
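As a sanity check, the per-step values in the table can be tallied over an action trace. The outcome-suffixed keys (`_ok`/`_bad`) and the trace below are illustrative, not part of the real API:

```python
# Back-of-envelope tally of per-step rewards from the table above.
# Keys encode the outcome (e.g. reschedule_ok vs reschedule_bad) purely
# for illustration; the real environment scores outcomes server-side.
STEP_REWARDS = {
    "check_calendar": 0.0,        # free information-gathering
    "check_inbox": 0.0,           # free information-gathering
    "reschedule_ok": 1.0,         # resolved a conflict
    "reschedule_bad": -0.5,       # created a new conflict
    "delegate_late_change": 0.5,  # delegation covers a late change
}


def episode_reward(trace: list[str], reply_scores: list[float]) -> float:
    """Sum tool rewards plus rule-scored draft_reply quality (0.0-1.0 each)."""
    return sum(STEP_REWARDS[t] for t in trace) + sum(reply_scores)


trace = ["check_calendar", "reschedule_ok", "reschedule_ok", "check_inbox"]
print(episode_reward(trace, reply_scores=[0.7]))  # 2.7
```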

## Training

Trained with GRPO (Group Relative Policy Optimization) using Unsloth + TRL:

```bash
# On an H100
cd exec_assistant_arena
PYTHONPATH=. uvicorn server.app:app --host 0.0.0.0 --port 8000 &
python training/train_grpo.py
```

A Colab notebook is available at `training/train_colab.ipynb` for reproducing the run on a free T4 GPU.
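The group-relative part of GRPO is easy to sketch: for each scenario, sample a group of rollouts and normalize each rollout's total reward against the group's mean and standard deviation. A minimal illustration (not the training script; the epsilon value is an assumption):

```python
# GRPO advantage sketch: rewards from a group of rollouts of the SAME
# scenario are normalized against the group statistics, so above-average
# rollouts get positive advantages and below-average ones negative.
import statistics


def group_relative_advantages(rewards: list[float], eps: float = 1e-8) -> list[float]:
    """Advantage of rollout i: (r_i - mean(group)) / (std(group) + eps)."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


# Four rollouts of one scenario with different total rewards:
print(group_relative_advantages([4.8, 3.2, 4.8, 3.2]))  # ~[1.0, -1.0, 1.0, -1.0]
```

Because advantages are computed within a group, no separate value network is needed, which is what makes GRPO a good fit for environments with decomposed scalar rewards like this one.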

## Project Structure

```
exec_assistant_arena/
├── models.py                    # Action, Observation, State
├── client.py                    # WebSocket client
├── server/
│   ├── app.py                   # FastAPI server
│   ├── exec_assistant_arena_environment.py  # Core env logic
│   ├── scenario_generator.py    # Procedural generation
│   └── reward.py                # 6 decomposed reward components
└── training/
    ├── train_grpo.py            # H100 training script
    ├── train_colab.ipynb        # Colab version
    ├── eval.py                  # Before/after evaluation
    └── scenarios/
        ├── train_scenarios.json # 80 training scenarios
        └── eval_scenarios.json  # 20 held-out scenarios
```

## Links

Built for the OpenEnv Hackathon SF, March 7-8, 2026.