| title: Scheduling Env Environment Server | |
| emoji: π | |
| colorFrom: blue | |
| colorTo: pink | |
| sdk: docker | |
| pinned: false | |
| app_port: 8000 | |
| base_path: /web | |
| tags: | |
| - openenv | |
| # Meeting Scheduling RL Environment | |
| An OpenEnv reinforcement-learning environment where AI agents learn to schedule meetings optimally across multiple attendees. The agent must propose time slots, resolve calendar conflicts by rescheduling lower-priority meetings, and satisfy each participant's scheduling preferences, all within a limited number of steps. | |
| ## Overview | |
| The environment simulates a realistic corporate scheduling assistant. Given a meeting request, the agent iteratively: | |
| 1. **Proposes** a time slot for all required attendees. | |
| 2. **Reschedules** any lower-priority conflicting meetings to free up the slot. | |
| 3. **Finalizes** the booking once the slot is conflict-free. | |
| Each episode is scored on scheduling quality (0.0–1.0), penalizing preference violations, unnecessary rescheduling, and excessive steps. | |
| ## Quick Start | |
| ### Running the Heuristic Baseline (no LLM needed) | |
| ```bash | |
| python inference.py | |
| ``` | |
| This runs a greedy baseline policy across all three tasks and prints step-by-step output in the required `[START]`/`[STEP]`/`[END]` format. | |
| ### Using the Environment Directly (Python) | |
| ```python | |
| from server.scheduling_env_environment import SchedulingEnvironment | |
| from models import SchedulingAction | |
| env = SchedulingEnvironment() | |
| # Reset to a specific task | |
| obs = env.reset(task_id="task1_easy") | |
| print(f"Attendees: {obs.attendee_ids}") | |
| print(f"Duration: {obs.requested_duration} min") | |
| print(f"Priority: {obs.requested_priority}") | |
| # Propose a time slot | |
| result = env.step(SchedulingAction( | |
|     action_type="propose_slot", | |
|     proposed_start="2025-04-07T10:00:00+00:00", | |
|     proposed_duration=30, | |
| )) | |
| print(f"Conflicts: {result.conflicts}") | |
| print(f"Reward: {result.reward}") | |
| # Finalize when conflict-free | |
| result = env.step(SchedulingAction(action_type="finalize")) | |
| print(f"Success: {result.success} Final score: {result.reward:.2f}") | |
| ``` | |
| ### Using the HTTP Client | |
| ```python | |
| from client import SchedulingEnv | |
| from models import SchedulingAction | |
| with SchedulingEnv(base_url="http://localhost:8000") as env: | |
|     result = env.reset(task_id="task2_medium") | |
|     obs = result.observation | |
|     # Propose a slot | |
|     result = env.step(SchedulingAction( | |
|         action_type="propose_slot", | |
|         proposed_start="2025-04-07T11:00:00+00:00", | |
|         proposed_duration=60, | |
|     )) | |
|     # Reschedule a conflicting lower-priority meeting | |
|     if result.observation.conflicts: | |
|         conflict = result.observation.conflicts[0] | |
|         result = env.step(SchedulingAction( | |
|             action_type="reschedule_meeting", | |
|             meeting_id_to_move=conflict["meeting_id"], | |
|             new_start_time="2025-04-07T07:00:00+00:00", | |
|         )) | |
|     # Finalize | |
|     result = env.step(SchedulingAction(action_type="finalize")) | |
|     print(f"Score: {result.reward:.2f}") | |
| ``` | |
| ## Environment Details | |
| ### Actions (`SchedulingAction`) | |
| | `action_type` | Required fields | Description | | |
| |----------------------|----------------------------------------------|-----------------------------------------------------------| | |
| | `propose_slot` | `proposed_start`, `proposed_duration` | Propose a meeting start time (ISO 8601) and duration (min)| | |
| | `reschedule_meeting` | `meeting_id_to_move`, `new_start_time` | Move a lower-priority conflict to a new time | | |
| | `finalize` | _(none)_ | Confirm the proposed slot; ends the episode | | |
| | `reject` | _(none)_ | Give up on scheduling; ends the episode with 0 reward | | |
| **Meeting ID format:** `{attendee}_{start_iso}`, e.g. `user1_2025-04-07T09:00:00+00:00` | |
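
The meeting ID format above can be built and split with a few lines of standard-library Python. This is a minimal sketch: the helper names `make_meeting_id` and `parse_meeting_id` are hypothetical and are not part of the environment's API. It assumes the start time is a timezone-aware `datetime` and that the timestamp itself contains no underscore, so splitting on the last `_` is safe even if an attendee ID contains one.

```python
from datetime import datetime, timezone


def make_meeting_id(attendee: str, start: datetime) -> str:
    """Build an ID of the form {attendee}_{start_iso} (hypothetical helper)."""
    return f"{attendee}_{start.isoformat()}"


def parse_meeting_id(meeting_id: str) -> tuple[str, datetime]:
    """Split an ID back into attendee and start time (hypothetical helper)."""
    # ISO 8601 timestamps contain no underscore, so split on the last one.
    attendee, start_iso = meeting_id.rsplit("_", 1)
    return attendee, datetime.fromisoformat(start_iso)


if __name__ == "__main__":
    start = datetime(2025, 4, 7, 9, 0, tzinfo=timezone.utc)
    print(make_meeting_id("user1", start))
    print(parse_meeting_id("user1_2025-04-07T09:00:00+00:00"))
```
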
| ### Observations (`SchedulingObservation`) | |
| | Field | Type | Description | | |
| |-------------------------|-------------------------|--------------------------------------------------------------| | |
| | `requested_duration` | `int` | Meeting duration in minutes | | |
| | `requested_priority` | `int` | Priority of the new meeting (1 = highest, 5 = lowest) | | |
| | `attendee_ids` | `List[str]` | Required attendees | | |
| | `busy_slots` | `List[dict]` | All existing calendar entries for attendees | | |
| | `collective_work_hours` | `dict` | Shared working-hours window `{min_start_hour, max_end_hour}` | | |
| | `preference_constraints`| `dict` | Aggregated constraints (max meetings/day, buffer, etc.) | | |
| | `current_proposal` | `dict \| None` | Currently proposed slot `{start, end}` | | |
| | `conflicts` | `List[dict]` | Conflicts for the current proposal | | |
| | `preference_penalty` | `float` | Accumulated preference-violation penalty | | |
| | `num_rescheduled` | `int` | Meetings rescheduled so far in this episode | | |
| | `steps_taken` | `int` | Steps used so far | | |
| | `max_steps` | `int` | Episode step limit (20) | | |
| | `success` | `bool` | `True` when the meeting is successfully booked | | |
| | `error_message` | `str \| None` | Reason if the last action was invalid | | |
| | `done` | `bool` | `True` when the episode has ended | | |
| | `reward` | `float` | Step or final reward | | |
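
To pre-screen a candidate slot on the agent side before spending a step, `busy_slots` can be checked for overlaps. The sketch below shows plain interval-overlap detection. It assumes each busy-slot dict carries ISO 8601 `start` and `end` keys, mirroring the `{start, end}` shape of `current_proposal`; the real entries may use different keys. The function name `overlapping_slots` is hypothetical, and the environment's own conflict detection may differ.

```python
from datetime import datetime, timedelta


def overlapping_slots(proposed_start: str, duration_min: int,
                      busy_slots: list[dict]) -> list[dict]:
    """Return busy slots that overlap the proposed half-open interval."""
    start = datetime.fromisoformat(proposed_start)
    end = start + timedelta(minutes=duration_min)
    hits = []
    for slot in busy_slots:
        # Assumed keys: "start" and "end" as ISO 8601 strings.
        slot_start = datetime.fromisoformat(slot["start"])
        slot_end = datetime.fromisoformat(slot["end"])
        # [start, end) and [slot_start, slot_end) overlap iff each starts
        # before the other ends; touching endpoints do not count.
        if slot_start < end and start < slot_end:
            hits.append(slot)
    return hits


if __name__ == "__main__":
    busy = [
        {"start": "2025-04-07T09:00:00+00:00", "end": "2025-04-07T10:00:00+00:00"},
        {"start": "2025-04-07T10:15:00+00:00", "end": "2025-04-07T10:45:00+00:00"},
    ]
    print(overlapping_slots("2025-04-07T10:00:00+00:00", 30, busy))
```
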
| ### Reward Design | |
| **Step-level rewards** (returned after each `propose_slot` or `reschedule_meeting`): | |
| | Outcome | Reward | | |
| |------------------------------------------|--------| | |
| | Conflict-free proposal (low penalty) | +0.5 | | |
| | Proposal has reschedulable conflicts | +0.2 | | |
| Proposal has non-reschedulable conflicts | -0.3 | |
| Invalid action | -0.1 | |
| Outside working hours | -0.2 | |
| **Final reward** (returned on `finalize`), deducted from 1.0: | |
| ``` | |
| preference_deduction = min(0.75, (penalty ** 1.2) / 200.0) | |
| reschedule_deduction = min(0.30, 0.05 * (1.8 ** num_rescheduled)) [if any rescheduled] | |
| time_deduction = steps_taken * 0.015 | |
| final_reward = clamp(1.0 - preference_deduction - reschedule_deduction - time_deduction, 0.0, 1.0) | |
| ``` | |
| On timeout (step 20 reached without `finalize`), the episode earns partial credit: 70% of the theoretical reward if the current proposal is conflict-free, or a progress-based fraction otherwise. | |
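
The final-reward formula can be transcribed directly into Python to estimate what a given episode would score. This is a sketch of the formula as written in this README, not the code in `scheduling_logic.py`. The function name `final_reward` is hypothetical, `penalty` is assumed to be non-negative, and the timeout case is not covered.

```python
def final_reward(penalty: float, num_rescheduled: int, steps_taken: int) -> float:
    """Final reward on `finalize`, following the README formula."""
    preference_deduction = min(0.75, (penalty ** 1.2) / 200.0)
    # The rescheduling deduction applies only if something was rescheduled.
    if num_rescheduled > 0:
        reschedule_deduction = min(0.30, 0.05 * (1.8 ** num_rescheduled))
    else:
        reschedule_deduction = 0.0
    time_deduction = steps_taken * 0.015
    raw = 1.0 - preference_deduction - reschedule_deduction - time_deduction
    # Clamp to the [0.0, 1.0] range.
    return max(0.0, min(1.0, raw))


if __name__ == "__main__":
    print(final_reward(penalty=0.0, num_rescheduled=0, steps_taken=2))
    print(final_reward(penalty=0.0, num_rescheduled=1, steps_taken=3))
```

For example, a two-step episode with no preference penalty and no rescheduling loses only the time deduction, 2 × 0.015 = 0.03, for a final reward of 0.97.
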
| ## Tasks | |
| Three tasks of increasing difficulty are provided as JSON scenarios in `server/scenarios/`: | |
| | Task ID | Difficulty | Attendees | Duration | Priority | Rescheduling needed | Expected score | | |
| |-----------------|------------|-----------|----------|----------|---------------------|----------------| | |
| `task1_easy` | Easy | 2 | 30 min | 3 | No | 0.8 – 1.0 | |
| `task2_medium` | Medium | 4 | 60 min | 2 | Yes (1 meeting) | 0.5 – 0.7 | |
| `task3_hard` | Hard | 6 | 45 min | 2 | Yes (3+ meetings) | 0.25 – 0.45 | |
| ### task1_easy: Team Sync (2 attendees) | |
| - Two attendees each have 2 existing meetings; a clear free slot exists at **10:00**. | |
| - The agent should find the free slot and finalize in 2 steps. | |
| - No rescheduling required. | |
| ### task2_medium: Cross-Team Planning (4 attendees) | |
| - Four attendees with densely packed schedules; the optimal slot at **11:00** has one low-priority conflict (`user3` Coffee chat, priority 4). | |
| - The agent needs to propose the slot, reschedule the conflict, then finalize. | |
| - User preferences include back-to-back avoidance and different preferred-hour windows. | |
| ### task3_hard: Executive Planning Session (6 attendees) | |
| - Six attendees with very dense calendars; the best window at **15:00** requires rescheduling three low-priority meetings (priority 4). | |
| - Multiple valid solutions exist; the agent must navigate cascading constraints. | |
| - All attendees have strict buffer requirements and narrow preferred-hour windows. | |
| ## Participant Preferences | |
| Each attendee can have the following preferences (stored in scenario JSON and observed via `preference_constraints`): | |
| | Preference | Description | Penalty for violation | | |
| |------------------------|-----------------------------------------------------|-----------------------| | |
| `preferred_hours` | `{start: H, end: H}`: preferred working hours | +50 per participant | |
| | `max_meetings_per_day` | Maximum meetings the participant wants in a day | +30 per participant | | |
| | `avoid_back_to_back` | Whether a buffer gap is required between meetings | +20 per participant | | |
| | `buffer_minutes` | Gap required before/after a meeting (if avoid_btb) | (part of above) | | |
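
The per-participant penalties in the table add up to the `preference_penalty` reported in the observation. The sketch below only illustrates that tally. It assumes violations have already been detected and are given as a mapping from preference name to the affected attendee IDs; the environment's own detection and aggregation may differ. `PENALTY_WEIGHTS` and `total_penalty` are hypothetical names.

```python
# Penalty per violating participant, taken from the table above.
PENALTY_WEIGHTS = {
    "preferred_hours": 50,
    "max_meetings_per_day": 30,
    "avoid_back_to_back": 20,  # buffer_minutes is folded into this one
}


def total_penalty(violations: dict[str, list[str]]) -> int:
    """Sum penalties, counting each participant once per preference."""
    return sum(
        PENALTY_WEIGHTS[preference] * len(set(attendees))
        for preference, attendees in violations.items()
    )


if __name__ == "__main__":
    example = {
        "preferred_hours": ["user1", "user2"],
        "avoid_back_to_back": ["user3"],
    }
    print(total_penalty(example))  # 2 * 50 + 1 * 20 = 120
```
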
| The **collective working hours** (the intersection of all attendees' preferred hours) define the hard constraint window within which proposals must fall. | |
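
The intersection described above amounts to taking the latest preferred start and the earliest preferred end. This is a minimal sketch assuming whole-hour `{start, end}` values as in the preference table. The function name `collective_work_hours` is hypothetical, and returning `None` for an empty intersection is a choice made for this sketch; how the environment handles that case is not documented here.

```python
from typing import Optional


def collective_work_hours(preferred_hours: list[dict]) -> Optional[dict]:
    """Intersect per-attendee {start, end} hour windows."""
    if not preferred_hours:
        return None
    # The shared window opens when the last attendee starts
    # and closes when the first attendee stops.
    start = max(window["start"] for window in preferred_hours)
    end = min(window["end"] for window in preferred_hours)
    if start >= end:
        return None  # no hour suits everyone
    return {"min_start_hour": start, "max_end_hour": end}


if __name__ == "__main__":
    windows = [{"start": 9, "end": 17}, {"start": 10, "end": 16}]
    print(collective_work_hours(windows))
```
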
| ## API Endpoints | |
| The server exposes the following HTTP endpoints (also available via the Web UI at `/web`): | |
| | Method | Path | Description | | |
| |--------|-----------|--------------------------------------------------------------------| | |
| | POST | `/reset` | Start a new episode. Body: `{"task_id": "task1_easy"}` | | |
| | POST | `/step` | Take an action. Body: `{"action_type": "...", ...action fields}` | | |
| | GET | `/state` | Return the full internal `SchedulingState` | | |
| GET | `/health` | Health check; returns `{"status": "healthy"}` | |
| | GET | `/docs` | Interactive OpenAPI / Swagger UI | | |
| ### Example: REST interaction | |
| ```bash | |
| # Start episode | |
| curl -X POST http://localhost:8000/reset \ | |
| -H "Content-Type: application/json" \ | |
| -d '{"task_id": "task1_easy"}' | |
| # Propose a slot | |
| curl -X POST http://localhost:8000/step \ | |
| -H "Content-Type: application/json" \ | |
| -d '{"action_type": "propose_slot", "proposed_start": "2025-04-07T10:00:00+00:00", "proposed_duration": 30}' | |
| # Finalize | |
| curl -X POST http://localhost:8000/step \ | |
| -H "Content-Type: application/json" \ | |
| -d '{"action_type": "finalize"}' | |
| ``` | |
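
The same requests can be issued from Python with only the standard library. This sketch mirrors the `curl` calls above. `build_post` and `send` are hypothetical helper names, and the shape of the JSON response is not documented here, so the result is returned as parsed JSON without further interpretation. Calling `send` requires a server running at the given base URL.

```python
import json
import urllib.request


def build_post(base_url: str, path: str, payload: dict) -> urllib.request.Request:
    """Build a JSON POST request for /reset or /step (hypothetical helper)."""
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}{path}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request):
    """Send the request and return the parsed JSON body."""
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    base = "http://localhost:8000"  # assumes the server is running locally
    send(build_post(base, "/reset", {"task_id": "task1_easy"}))
    send(build_post(base, "/step", {
        "action_type": "propose_slot",
        "proposed_start": "2025-04-07T10:00:00+00:00",
        "proposed_duration": 30,
    }))
    print(send(build_post(base, "/step", {"action_type": "finalize"})))
```
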
| ## Development & Testing | |
| ### Run the baseline inference script | |
| ```bash | |
| python inference.py | |
| ``` | |
| ### Start the server locally | |
| ```bash | |
| uvicorn server.app:app --reload | |
| ``` | |
| ### Validate the environment (required before submission) | |
| ```bash | |
| openenv validate | |
| ``` | |
| ### Generate / update the lock file | |
| ```bash | |
| uv lock | |
| ``` | |
| ### Build the Docker image | |
| ```bash | |
| docker build -t scheduling_env:latest . | |
| ``` | |
| ## Deploying to Hugging Face Spaces | |
| ```bash | |
| # From the project root (where openenv.yaml is located) | |
| openenv push | |
| # Push to a specific repository | |
| openenv push --repo-id my-org/my-scheduling-env | |
| # Push as a private space | |
| openenv push --private | |
| ``` | |
| The `openenv push` command validates the environment, builds a Hugging Face-compatible Docker image, and uploads it. After deployment your space is available at: | |
| ``` | |
| https://huggingface.co/spaces/<repo-id> | |
| ``` | |
| The deployed space includes: | |
| - **Web Interface** at `/web`: interactive UI for exploring the environment | |
| - **API Documentation** at `/docs`: full OpenAPI / Swagger interface | |
| - **Health Check** at `/health`: container health monitoring | |
| ### Options | |
| | Flag | Description | | |
| |------|-------------| | |
| | `--directory`, `-d` | Directory with `openenv.yaml` (default: current dir) | | |
| | `--repo-id`, `-r` | Repository ID `username/repo-name` | | |
| | `--base-image`, `-b` | Override Dockerfile `FROM` image | | |
| | `--private` | Deploy as a private space (default: public) | | |
| ## Environment Variables (for LLM-based inference) | |
| Create a `.env` file (never commit it): | |
| ``` | |
| API_BASE_URL=https://router.huggingface.co/v1 # HF Router endpoint | |
| MODEL_NAME=Qwen/Qwen2.5-72B-Instruct # Model identifier | |
| HF_TOKEN=hf_... # Hugging Face API key | |
| ``` | |
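
An LLM-based inference script could read these variables as sketched below. This is not taken from `inference.py`, and the function name `load_llm_config` is hypothetical. Falling back to the example values above when `API_BASE_URL` or `MODEL_NAME` is unset is an assumption of this sketch. The function reads the process environment, so a `.env` file must be loaded into it first, for example by the shell or a loader of your choice.

```python
import os
from typing import Mapping


def load_llm_config(env: Mapping[str, str] = os.environ) -> dict:
    """Collect LLM settings from environment variables (hypothetical helper)."""
    token = env.get("HF_TOKEN")
    if not token:
        # Fail early rather than sending unauthenticated requests.
        raise RuntimeError("HF_TOKEN is not set")
    return {
        # Defaults below are assumptions copied from the example .env file.
        "api_base_url": env.get("API_BASE_URL", "https://router.huggingface.co/v1"),
        "model_name": env.get("MODEL_NAME", "Qwen/Qwen2.5-72B-Instruct"),
        "hf_token": token,
    }


if __name__ == "__main__":
    config = load_llm_config()
    print(config["api_base_url"], config["model_name"])
```
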
| ## Project Structure | |
| ``` | |
| rl-scheduling-env/ | |
| ├── Dockerfile                        # Container image (root, required by openenv) | |
| ├── README.md                         # This file | |
| ├── openenv.yaml                      # OpenEnv manifest | |
| ├── pyproject.toml                    # Project metadata and dependencies | |
| ├── uv.lock                           # Locked dependencies (generated by `uv lock`) | |
| ├── __init__.py                       # Package exports | |
| ├── models.py                         # Pydantic models: SchedulingAction, | |
| │                                     # SchedulingObservation, SchedulingState | |
| ├── client.py                         # SchedulingEnv HTTP/WebSocket client | |
| ├── inference.py                      # Heuristic baseline (no LLM required) | |
| └── server/ | |
|     ├── __init__.py                   # Server package exports | |
|     ├── app.py                        # FastAPI app + SchedulingHTTPEnvServer | |
|     ├── scheduling_env_environment.py # Core RL environment (reset / step / state) | |
|     ├── scheduling_logic.py           # Pure utility functions (conflict detection, | |
|     │                                 # preference scoring, reward calculation) | |
|     ├── graders.py                    # SchedulingGrader (0.0–1.0 episode scorer) | |
|     ├── requirements.txt              # Server-side Python dependencies | |
|     └── scenarios/ | |
|         ├── task1_easy.json           # Easy: 2 attendees, free slot exists | |
|         ├── task2_medium.json         # Medium: 4 attendees, 1 rescheduling needed | |
|         └── task3_hard.json           # Hard: 6 attendees, 3+ reschedulings needed | |
| ``` | |