Spaces: Akshaykumarbm/scheduling_env

Akshaykumarbm committed on Apr 8

Commit 0f3c199 (verified) · 1 parent: 7bdbe90

Upload folder using huggingface_hub

Files changed (5)

README.md +244 -171
inference.py +260 -165
pyproject.toml +3 -12
sample_infrenae.py +205 -101
uv.lock +2 -2

README.md CHANGED

@@ -1,6 +1,6 @@
 ---
 title: Scheduling Env Environment Server
-emoji: 🏏
 colorFrom: blue
 colorTo: pink
 sdk: docker
@@ -11,245 +11,318 @@ tags:
   - openenv
 ---
-# Scheduling Env Environment
-A simple test environment that echoes back messages. Perfect for testing the env APIs as well as demonstrating environment usage patterns.
-## Quick Start
-The simplest way to use the Scheduling Env environment is through the `SchedulingEnv` class:
-```python
-from scheduling_env import SchedulingAction, SchedulingEnv
-try:
-    # Create environment from Docker image
-    scheduling_envenv = SchedulingEnv.from_docker_image("scheduling_env-env:latest")
-    # Reset
-    result = scheduling_envenv.reset()
-    print(f"Reset: {result.observation.echoed_message}")
-    # Send multiple messages
-    messages = ["Hello, World!", "Testing echo", "Final message"]
-    for msg in messages:
-        result = scheduling_envenv.step(SchedulingAction(message=msg))
-        print(f"Sent: '{msg}'")
-        print(f"  → Echoed: '{result.observation.echoed_message}'")
-        print(f"  → Length: {result.observation.message_length}")
-        print(f"  → Reward: {result.reward}")
-finally:
-    # Always clean up
-    scheduling_envenv.close()
 ```
-That's it! The `SchedulingEnv.from_docker_image()` method handles:
-- Starting the Docker container
-- Waiting for the server to be ready
-- Connecting to the environment
-- Container cleanup when you call `close()`
-## Building the Docker Image
-Before using the environment, you need to build the Docker image:
-```bash
-# From project root
-docker build -t scheduling_env-env:latest -f server/Dockerfile .
 ```
-## Deploying to Hugging Face Spaces
-You can easily deploy your OpenEnv environment to Hugging Face Spaces using the `openenv push` command:
-```bash
-# From the environment directory (where openenv.yaml is located)
-openenv push
-# Or specify options
-openenv push --namespace my-org --private
 ```
-The `openenv push` command will:
-1. Validate that the directory is an OpenEnv environment (checks for `openenv.yaml`)
-2. Prepare a custom build for Hugging Face Docker space (enables web interface)
-3. Upload to Hugging Face (ensuring you're logged in)
-### Prerequisites
-- Authenticate with Hugging Face: The command will prompt for login if not already authenticated
-### Options
-- `--directory`, `-d`: Directory containing the OpenEnv environment (defaults to current directory)
-- `--repo-id`, `-r`: Repository ID in format 'username/repo-name' (defaults to 'username/env-name' from openenv.yaml)
-- `--base-image`, `-b`: Base Docker image to use (overrides Dockerfile FROM)
-- `--private`: Deploy the space as private (default: public)
-### Examples
-```bash
-# Push to your personal namespace (defaults to username/env-name from openenv.yaml)
-openenv push
-# Push to a specific repository
-openenv push --repo-id my-org/my-env
-# Push with a custom base image
-openenv push --base-image ghcr.io/meta-pytorch/openenv-base:latest
-# Push as a private space
-openenv push --private
-# Combine options
-openenv push --repo-id my-org/my-env --base-image custom-base:latest --private
-```
-After deployment, your space will be available at:
-`https://huggingface.co/spaces/<repo-id>`
-The deployed space includes:
-- **Web Interface** at `/web` - Interactive UI for exploring the environment
-- **API Documentation** at `/docs` - Full OpenAPI/Swagger interface
-- **Health Check** at `/health` - Container health monitoring
-- **WebSocket** at `/ws` - Persistent session endpoint for low-latency interactions
-## Environment Details
-### Action
-**SchedulingAction**: Contains a single field
-- `message` (str) - The message to echo back
-### Observation
-**SchedulingObservation**: Contains the echo response and metadata
-- `echoed_message` (str) - The message echoed back
-- `message_length` (int) - Length of the message
-- `reward` (float) - Reward based on message length (length × 0.1)
-- `done` (bool) - Always False for echo environment
-- `metadata` (dict) - Additional info like step count
-### Reward
-The reward is calculated as: `message_length × 0.1`
-- "Hi" → reward: 0.2
-- "Hello, World!" → reward: 1.3
-- Empty message → reward: 0.0
-## Advanced Usage
-### Connecting to an Existing Server
-If you already have a Scheduling Env environment server running, you can connect directly:
-```python
-from scheduling_env import SchedulingEnv
-# Connect to existing server
-scheduling_envenv = SchedulingEnv(base_url="<ENV_HTTP_URL_HERE>")
-# Use as normal
-result = scheduling_envenv.reset()
-result = scheduling_envenv.step(SchedulingAction(message="Hello!"))
 ```
-Note: When connecting to an existing server, `scheduling_envenv.close()` will NOT stop the server.
-### Using the Context Manager
-The client supports context manager usage for automatic connection management:
-```python
-from scheduling_env import SchedulingAction, SchedulingEnv
-# Connect with context manager (auto-connects and closes)
-with SchedulingEnv(base_url="http://localhost:8000") as env:
-    result = env.reset()
-    print(f"Reset: {result.observation.echoed_message}")
-    # Multiple steps with low latency
-    for msg in ["Hello", "World", "!"]:
-        result = env.step(SchedulingAction(message=msg))
-        print(f"Echoed: {result.observation.echoed_message}")
 ```
-The client uses WebSocket connections for:
-- **Lower latency**: No HTTP connection overhead per request
-- **Persistent session**: Server maintains your environment state
-- **Efficient for episodes**: Better for many sequential steps
-### Concurrent WebSocket Sessions
-The server supports multiple concurrent WebSocket connections. To enable this,
-modify `server/app.py` to use factory mode:
-```python
-# In server/app.py - use factory mode for concurrent sessions
-app = create_app(
-    SchedulingEnvironment,  # Pass class, not instance
-    SchedulingAction,
-    SchedulingObservation,
-    max_concurrent_envs=4,  # Allow 4 concurrent sessions
-)
 ```
-Then multiple clients can connect simultaneously:
-```python
-from scheduling_env import SchedulingAction, SchedulingEnv
-from concurrent.futures import ThreadPoolExecutor
-def run_episode(client_id: int):
-    with SchedulingEnv(base_url="http://localhost:8000") as env:
-        result = env.reset()
-        for i in range(10):
-            result = env.step(SchedulingAction(message=f"Client {client_id}, step {i}"))
-        return client_id, result.observation.message_length
-# Run 4 episodes concurrently
-with ThreadPoolExecutor(max_workers=4) as executor:
-    results = list(executor.map(run_episode, range(4)))
 ```
-## Development & Testing
-### Direct Environment Testing
-Test the environment logic directly without starting the HTTP server:
-```bash
-# From the server directory
-python3 server/scheduling_env_environment.py
-```
-This verifies that:
-- Environment resets correctly
-- Step executes actions properly
-- State tracking works
-- Rewards are calculated correctly
-### Running Locally
-Run the server locally for development:
-```bash
-uvicorn server.app:app --reload
 ```
 ## Project Structure
 ```
-scheduling_env/
-├── .dockerignore         # Docker build exclusions
-├── __init__.py            # Module exports
-├── README.md              # This file
-├── openenv.yaml           # OpenEnv manifest
-├── pyproject.toml         # Project metadata and dependencies
-├── uv.lock                # Locked dependencies (generated)
-├── client.py              # SchedulingEnv client
-├── models.py              # Action and Observation models
 └── server/
-    ├── __init__.py        # Server module exports
-    ├── scheduling_env_environment.py  # Core environment logic
-    ├── app.py             # FastAPI application (HTTP + WebSocket endpoints)
-    └── Dockerfile         # Container image definition
 ```

 ---
 title: Scheduling Env Environment Server
+emoji: 📅
 colorFrom: blue
 colorTo: pink
 sdk: docker
   - openenv
 ---
+# Meeting Scheduling RL Environment
+An OpenEnv reinforcement-learning environment where AI agents learn to schedule meetings optimally across multiple attendees. The agent must propose time slots, resolve calendar conflicts by rescheduling lower-priority meetings, and satisfy each participant's scheduling preferences — all within a limited number of steps.
+## Overview
+The environment simulates a realistic corporate scheduling assistant. Given a meeting request, the agent iteratively:
+1. **Proposes** a time slot for all required attendees.
+2. **Reschedules** any lower-priority conflicting meetings to free up the slot.
+3. **Finalizes** the booking once the slot is conflict-free.
+Each episode is scored on scheduling quality (0.0–1.0), penalizing preference violations, unnecessary rescheduling, and excessive steps.
+## Quick Start
+### Running the Heuristic Baseline (no LLM needed)
+```bash
+python inference.py
 ```
+This runs a greedy baseline policy across all three tasks and prints step-by-step output in the required `[START]`/`[STEP]`/`[END]` format.
+### Using the Environment Directly (Python)
+```python
+from server.scheduling_env_environment import SchedulingEnvironment
+from models import SchedulingAction
+env = SchedulingEnvironment()
+# Reset to a specific task
+obs = env.reset(task_id="task1_easy")
+print(f"Attendees: {obs.attendee_ids}")
+print(f"Duration:  {obs.requested_duration} min")
+print(f"Priority:  {obs.requested_priority}")
+# Propose a time slot
+result = env.step(SchedulingAction(
+    action_type="propose_slot",
+    proposed_start="2025-04-07T10:00:00+00:00",
+    proposed_duration=30,
+))
+print(f"Conflicts: {result.conflicts}")
+print(f"Reward:    {result.reward}")
+# Finalize when conflict-free
+result = env.step(SchedulingAction(action_type="finalize"))
+print(f"Success: {result.success}  Final score: {result.reward:.2f}")
 ```
+### Using the HTTP Client
+```python
+from client import SchedulingEnv
+from models import SchedulingAction
+with SchedulingEnv(base_url="http://localhost:8000") as env:
+    result = env.reset(task_id="task2_medium")
+    obs = result.observation
+    # Propose a slot
+    result = env.step(SchedulingAction(
+        action_type="propose_slot",
+        proposed_start="2025-04-07T11:00:00+00:00",
+        proposed_duration=60,
+    ))
+    # Reschedule a conflicting lower-priority meeting
+    if result.observation.conflicts:
+        conflict = result.observation.conflicts[0]
+        result = env.step(SchedulingAction(
+            action_type="reschedule_meeting",
+            meeting_id_to_move=conflict["meeting_id"],
+            new_start_time="2025-04-07T07:00:00+00:00",
+        ))
+    # Finalize
+    result = env.step(SchedulingAction(action_type="finalize"))
+    print(f"Score: {result.reward:.2f}")
 ```
+## Environment Details
+### Actions (`SchedulingAction`)
+| `action_type`        | Required fields                              | Description                                               |
+|----------------------|----------------------------------------------|-----------------------------------------------------------|
+| `propose_slot`       | `proposed_start`, `proposed_duration`        | Propose a meeting start time (ISO 8601) and duration (min)|
+| `reschedule_meeting` | `meeting_id_to_move`, `new_start_time`       | Move a lower-priority conflict to a new time              |
+| `finalize`           | _(none)_                                     | Confirm the proposed slot; ends the episode               |
+| `reject`             | _(none)_                                     | Give up on scheduling; ends the episode with 0 reward     |
+**Meeting ID format:** `{attendee}_{start_iso}` — e.g. `user1_2025-04-07T09:00:00+00:00`
+### Observations (`SchedulingObservation`)
+| Field                   | Type                    | Description                                                  |
+|-------------------------|-------------------------|--------------------------------------------------------------|
+| `requested_duration`    | `int`                   | Meeting duration in minutes                                  |
+| `requested_priority`    | `int`                   | Priority of the new meeting (1 = highest, 5 = lowest)        |
+| `attendee_ids`          | `List[str]`             | Required attendees                                           |
+| `busy_slots`            | `List[dict]`            | All existing calendar entries for attendees                  |
+| `collective_work_hours` | `dict`                  | Shared working-hours window `{min_start_hour, max_end_hour}` |
+| `preference_constraints`| `dict`                  | Aggregated constraints (max meetings/day, buffer, etc.)      |
+| `current_proposal`      | `dict \| None`          | Currently proposed slot `{start, end}`                       |
+| `conflicts`             | `List[dict]`            | Conflicts for the current proposal                           |
+| `preference_penalty`    | `float`                 | Accumulated preference-violation penalty                     |
+| `num_rescheduled`       | `int`                   | Meetings rescheduled so far in this episode                  |
+| `steps_taken`           | `int`                   | Steps used so far                                            |
+| `max_steps`             | `int`                   | Episode step limit (20)                                      |
+| `success`               | `bool`                  | `True` when the meeting is successfully booked               |
+| `error_message`         | `str \| None`           | Reason if the last action was invalid                        |
+| `done`                  | `bool`                  | `True` when the episode has ended                            |
+| `reward`                | `float`                 | Step or final reward                                         |
+### Reward Design
+**Step-level rewards** (returned after each `propose_slot` or `reschedule_meeting`):
+| Outcome                                  | Reward |
+|------------------------------------------|--------|
+| Conflict-free proposal (low penalty)     | +0.5   |
+| Proposal has reschedulable conflicts     | +0.2   |
+| Proposal has non-reschedulable conflicts | −0.3   |
+| Invalid action                           | −0.1   |
+| Outside working hours                    | −0.2   |
+**Final reward** (returned on `finalize`) — deducted from 1.0:
+```
+preference_deduction  = min(0.75, (penalty ** 1.2) / 200.0)
+reschedule_deduction  = min(0.30, 0.05 * (1.8 ** num_rescheduled))   [if any rescheduled]
+time_deduction        = steps_taken * 0.015
+final_reward = clamp(1.0 - preference_deduction - reschedule_deduction - time_deduction, 0.0, 1.0)
+```
+Timeout (reaching step 20 without `finalize`) gives partial credit: 70% of the theoretical reward if conflict-free, or a progress-based fraction otherwise.
+## Tasks
+Three tasks of increasing difficulty are provided as JSON scenarios in `server/scenarios/`:
+| Task ID         | Difficulty | Attendees | Duration | Priority | Rescheduling needed | Expected score |
+|-----------------|------------|-----------|----------|----------|---------------------|----------------|
+| `task1_easy`    | Easy       | 2         | 30 min   | 3        | No                  | 0.8 – 1.0      |
+| `task2_medium`  | Medium     | 4         | 60 min   | 2        | Yes (1 meeting)     | 0.5 – 0.7      |
+| `task3_hard`    | Hard       | 6         | 45 min   | 2        | Yes (3+ meetings)   | 0.25 – 0.45    |
+### task1_easy — Team Sync (2 attendees)
+- Two attendees each have 2 existing meetings; a clear free slot exists at **10:00**.
+- Agent should find the free slot and finalize in 2 steps.
+- No rescheduling required.
+### task2_medium — Cross-Team Planning (4 attendees)
+- Four attendees with densely packed schedules; the optimal slot at **11:00** has one low-priority conflict (`user3` Coffee chat, priority 4).
+- Agent needs to propose the slot, reschedule the conflict, then finalize.
+- User preferences include back-to-back avoidance and different preferred-hour windows.
+### task3_hard — Executive Planning Session (6 attendees)
+- Six attendees with very dense calendars; the best window at **15:00** requires rescheduling three low-priority meetings (priority 4).
+- Multiple valid solutions exist; the agent must navigate cascading constraints.
+- All attendees have strict buffer requirements and narrow preferred-hour windows.
+## Participant Preferences
+Each attendee can have the following preferences (stored in scenario JSON and observed via `preference_constraints`):
+| Preference             | Description                                         | Penalty for violation |
+|------------------------|-----------------------------------------------------|-----------------------|
+| `preferred_hours`      | `{start: H, end: H}` — preferred working hours      | +50 per participant   |
+| `max_meetings_per_day` | Maximum meetings the participant wants in a day      | +30 per participant   |
+| `avoid_back_to_back`   | Whether a buffer gap is required between meetings    | +20 per participant   |
+| `buffer_minutes`       | Gap required before/after a meeting (if `avoid_back_to_back` is set) | (part of above) |
+The **collective working hours** (the intersection of all attendees' preferred hours) define the hard constraint window within which proposals must fall.
+## API Endpoints
+The server exposes the following HTTP endpoints (also available via the Web UI at `/web`):
+| Method | Path      | Description                                                        |
+|--------|-----------|--------------------------------------------------------------------|
+| POST   | `/reset`  | Start a new episode. Body: `{"task_id": "task1_easy"}`             |
+| POST   | `/step`   | Take an action. Body: `{"action_type": "...", ...action fields}`   |
+| GET    | `/state`  | Return the full internal `SchedulingState`                         |
+| GET    | `/health` | Health check — returns `{"status": "healthy"}`                     |
+| GET    | `/docs`   | Interactive OpenAPI / Swagger UI                                   |
+### Example: REST interaction
+```bash
+# Start episode
+curl -X POST http://localhost:8000/reset \
+  -H "Content-Type: application/json" \
+  -d '{"task_id": "task1_easy"}'
+# Propose a slot
+curl -X POST http://localhost:8000/step \
+  -H "Content-Type: application/json" \
+  -d '{"action_type": "propose_slot", "proposed_start": "2025-04-07T10:00:00+00:00", "proposed_duration": 30}'
+# Finalize
+curl -X POST http://localhost:8000/step \
+  -H "Content-Type: application/json" \
+  -d '{"action_type": "finalize"}'
 ```
+## Development & Testing
+### Run the baseline inference script
+```bash
+python inference.py
+```
+### Start the server locally
+```bash
+uvicorn server.app:app --reload
+```
+### Validate the environment (required before submission)
+```bash
+openenv validate
 ```
+### Generate / update the lock file
+```bash
+uv lock
+```
+### Build the Docker image
+```bash
+docker build -t scheduling_env:latest .
 ```
+## Deploying to Hugging Face Spaces
+```bash
+# From the project root (where openenv.yaml is located)
+openenv push
+# Push to a specific repository
+openenv push --repo-id my-org/my-scheduling-env
+# Push as a private space
+openenv push --private
 ```
+The `openenv push` command validates the environment, builds a Hugging Face-compatible Docker image, and uploads it. After deployment your space is available at:
+```
+https://huggingface.co/spaces/<repo-id>
+```
+The deployed space includes:
+- **Web Interface** at `/web` — interactive UI for exploring the environment
+- **API Documentation** at `/docs` — full OpenAPI / Swagger interface
+- **Health Check** at `/health` — container health monitoring
+### Options
+| Flag | Description |
+|------|-------------|
+| `--directory`, `-d` | Directory with `openenv.yaml` (default: current dir) |
+| `--repo-id`, `-r` | Repository ID `username/repo-name` |
+| `--base-image`, `-b` | Override Dockerfile `FROM` image |
+| `--private` | Deploy as a private space (default: public) |
+## Environment Variables (for LLM-based inference)
+Create a `.env` file (never commit it):
+```
+API_BASE_URL=https://router.huggingface.co/v1   # HF Router endpoint
+MODEL_NAME=Qwen/Qwen2.5-72B-Instruct            # Model identifier
+HF_TOKEN=hf_...                                  # Hugging Face API key
 ```
 ## Project Structure
 ```
+rl-scheduling-env/
+├── Dockerfile                          # Container image (root, required by openenv)
+├── README.md                           # This file
+├── openenv.yaml                        # OpenEnv manifest
+├── pyproject.toml                      # Project metadata and dependencies
+├── uv.lock                             # Locked dependencies (generated by `uv lock`)
+├── __init__.py                         # Package exports
+├── models.py                           # Pydantic models: SchedulingAction,
+│                                       #   SchedulingObservation, SchedulingState
+├── client.py                           # SchedulingEnv HTTP/WebSocket client
+├── inference.py                        # Heuristic baseline (no LLM required)
 └── server/
+    ├── __init__.py                     # Server package exports
+    ├── app.py                          # FastAPI app + SchedulingHTTPEnvServer
+    ├── scheduling_env_environment.py   # Core RL environment (reset / step / state)
+    ├── scheduling_logic.py             # Pure utility functions (conflict detection,
+    │                                   #   preference scoring, reward calculation)
+    ├── graders.py                      # SchedulingGrader (0.0–1.0 episode scorer)
+    ├── requirements.txt                # Server-side Python dependencies
+    └── scenarios/
+        ├── task1_easy.json             # Easy: 2 attendees, free slot exists
+        ├── task2_medium.json           # Medium: 4 attendees, 1 rescheduling needed
+        └── task3_hard.json             # Hard: 6 attendees, 3+ reschedulings needed
 ```
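Reviewer note: the final-reward formula in the README's "Reward Design" section can be checked by hand with a short sketch. This is not the code in `server/scheduling_logic.py`; the function name `final_reward` and its signature are assumptions made for illustration, and only the constants come from the README text.

```python
def final_reward(penalty: float, num_rescheduled: int, steps_taken: int) -> float:
    """Illustrative re-statement of the README formula (hypothetical helper)."""
    # Guard against negative penalties, which would make the fractional power complex.
    penalty = max(penalty, 0.0)
    preference_deduction = min(0.75, (penalty ** 1.2) / 200.0)
    # The reschedule deduction applies only if at least one meeting was moved.
    if num_rescheduled > 0:
        reschedule_deduction = min(0.30, 0.05 * (1.8 ** num_rescheduled))
    else:
        reschedule_deduction = 0.0
    time_deduction = steps_taken * 0.015
    raw = 1.0 - preference_deduction - reschedule_deduction - time_deduction
    # Clamp to the documented [0.0, 1.0] range.
    return max(0.0, min(1.0, raw))


if __name__ == "__main__":
    # Two steps, no penalty, no rescheduling: only the time deduction applies.
    print(f"{final_reward(0.0, 0, 2):.2f}")
    # One rescheduled meeting in three steps.
    print(f"{final_reward(0.0, 1, 3):.3f}")
```

Under these assumptions, a clean two-step episode loses only `2 * 0.015` to the time deduction, which is consistent with the 0.8 to 1.0 range the README lists for `task1_easy`.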

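Reviewer note: the README gives the meeting ID format as `{attendee}_{start_iso}`. A minimal sketch of building and splitting such IDs follows. The helper names `make_meeting_id` and `split_meeting_id` are hypothetical and do not appear in the commit; the sketch assumes the start time is a timezone-aware ISO 8601 string with no underscore in it.

```python
from datetime import datetime, timezone
from typing import Tuple


def make_meeting_id(attendee: str, start: datetime) -> str:
    """Build an ID in the README's `{attendee}_{start_iso}` format (hypothetical helper)."""
    return f"{attendee}_{start.isoformat()}"


def split_meeting_id(meeting_id: str) -> Tuple[str, datetime]:
    """Split on the last underscore, so attendee names may themselves contain underscores."""
    attendee, start_iso = meeting_id.rsplit("_", 1)
    return attendee, datetime.fromisoformat(start_iso)


if __name__ == "__main__":
    start = datetime(2025, 4, 7, 9, 0, tzinfo=timezone.utc)
    mid = make_meeting_id("user1", start)
    print(mid)
    print(split_meeting_id(mid))
```

Splitting from the right is a design choice in this sketch: it keeps the timestamp intact even if an attendee ID such as `team_lead` contains an underscore.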
inference.py CHANGED

@@ -1,198 +1,293 @@
-#!/usr/bin/env python3
 """
-Baseline inference script for the Meeting Scheduling RL Environment.
-Uses a HEURISTIC policy (BotBooked greedy algorithm) - NO LLM required.
-Deterministic, reproducible, fast (~seconds for all 3 tasks).
-Output format: [START]/[STEP]/[END] per hackathon spec.
 """
-from __future__ import annotations
-import sys
-from datetime import datetime, timedelta, timezone
-from server.scheduling_env_environment import SchedulingEnvironment
-from models import SchedulingAction
-from server.scheduling_logic import find_earliest_free_slot, parse_iso
-def baseline_policy(obs) -> SchedulingAction:
-    """Heuristic baseline using greedy slot search + lowest-priority rescheduling."""
-    # Step 1: No proposal yet -> find a free slot
-    if obs.current_proposal is None:
-        # Build calendars dict from busy_slots
-        calendars = {}
-        for slot in obs.busy_slots:
-            att = slot["attendee"]
-            if att not in calendars:
-                calendars[att] = []
-            calendars[att].append([slot["start"], slot["end"], slot["priority"], slot["summary"]])
-        # Try to find a completely free slot
-        free = find_earliest_free_slot(
-            calendars,
-            obs.attendee_ids,
-            obs.requested_duration,
-            obs.busy_slots[0]["start"] if obs.busy_slots else "2025-04-07T09:00:00+00:00",
-            obs.collective_work_hours,
-        )
-        if free:
             return SchedulingAction(
                 action_type="propose_slot",
-                proposed_start=free,
                 proposed_duration=obs.requested_duration,
             )
-        # No completely free slot found.
-        # Scan 15-min increments within collective hours for a slot with only
-        # reschedulable conflicts (priority > requested_priority).
-        min_h = obs.collective_work_hours.get("min_start_hour", 9)
-        max_h = obs.collective_work_hours.get("max_end_hour", 17)
-        duration = obs.requested_duration
-        tz = timezone.utc
-        candidate = datetime(2025, 4, 7, min_h, 0, 0, tzinfo=tz)
-        end_boundary = datetime(2025, 4, 7, max_h, 0, 0, tzinfo=tz)
-        step_delta = timedelta(minutes=15)
-        best_candidate = None
-        best_conflict_count = 999
-        while candidate + timedelta(minutes=duration) <= end_boundary:
-            c_start = candidate.isoformat()
-            c_end = (candidate + timedelta(minutes=duration)).isoformat()
-            # Count conflicts at this candidate
-            conflicts_here = []
-            for att in obs.attendee_ids:
-                for entry in calendars.get(att, []):
-                    e_start = parse_iso(entry[0])
-                    e_end = parse_iso(entry[1])
-                    if candidate < e_end and e_start < candidate + timedelta(minutes=duration):
-                        conflicts_here.append(entry)
-            # Check if all conflicts are reschedulable
-            all_reschedulable = all(
-                c[2] > obs.requested_priority for c in conflicts_here
-            )
-            if all_reschedulable and len(conflicts_here) < best_conflict_count:
-                best_candidate = c_start
-                best_conflict_count = len(conflicts_here)
-                if best_conflict_count == 0:
-                    break  # Perfect slot
-            candidate += step_delta
-        if best_candidate:
-            return SchedulingAction(
-                action_type="propose_slot",
-                proposed_start=best_candidate,
-                proposed_duration=duration,
-            )
-        # Last resort: propose at collective hours start (will likely conflict)
-        fallback = f"2025-04-07T{min_h:02d}:00:00+00:00"
-        return SchedulingAction(
-            action_type="propose_slot",
-            proposed_start=fallback,
-            proposed_duration=obs.requested_duration,
-        )
-    # Step 2: Has proposal with conflicts -> reschedule lowest-priority conflict
-    if obs.conflicts:
-        sorted_conflicts = sorted(obs.conflicts, key=lambda x: x["priority"], reverse=True)
-        target = sorted_conflicts[0]
-        # Can only reschedule lower priority
-        if target["priority"] <= obs.requested_priority:
-            return SchedulingAction(action_type="reject")
-        # Find a free slot for this attendee to move the meeting to.
-        # Search in early morning (06:00-08:00) and late evening (17:00-20:00).
-        attendee = target["attendee"]
-        meeting_dur = parse_iso(target["end"]) - parse_iso(target["start"])
-        dur_min = int(meeting_dur.total_seconds() // 60)
-        # Build this attendee's calendar
-        att_cal = [
-            s for s in obs.busy_slots if s["attendee"] == attendee
-        ]
-        att_entries = [[s["start"], s["end"], s["priority"], s["summary"]] for s in att_cal]
-        new_time = None
-        # Try slots at 06:00, 06:30, 07:00, 07:30, 17:00, 17:30, 18:00, 18:30, 19:00
-        for h, m in [(6,0),(6,30),(7,0),(7,30),(17,0),(17,30),(18,0),(18,30),(19,0),(19,30),(20,0)]:
-            cand = datetime(2025, 4, 7, h, m, 0, tzinfo=timezone.utc)
-            cand_end = cand + timedelta(minutes=dur_min)
-            cand_iso = cand.isoformat()
-            cand_end_iso = cand_end.isoformat()
-            # Check free for this attendee
-            conflict_found = False
-            for e in att_entries:
-                es = parse_iso(e[0])
-                ee = parse_iso(e[1])
-                if cand < ee and es < cand_end:
-                    conflict_found = True
-                    break
-            if not conflict_found:
-                new_time = cand_iso
                 break
-        if not new_time:
-            # Give up on this conflict, try rejecting
-            return SchedulingAction(action_type="reject")
-        return SchedulingAction(
-            action_type="reschedule_meeting",
-            meeting_id_to_move=target["meeting_id"],
-            new_start_time=new_time,
-        )
-    # Step 3: No conflicts -> finalize
-    return SchedulingAction(action_type="finalize")
-def main():
-    env = SchedulingEnvironment()
-    for task_id in ["task1_easy", "task2_medium", "task3_hard"]:
-        print(f"[START] task={task_id} env=scheduling_env model=heuristic_baseline")
-        obs = env.reset(task_id=task_id)
-        done = False
-        step = 0
-        rewards = []
-        while not done and step < 20:
-            action = baseline_policy(obs)
-            obs = env.step(action)
-            done = obs.done
-            reward = obs.reward if obs.reward is not None else 0.0
-            rewards.append(reward)
-            step += 1
-            error = obs.error_message if obs.error_message else "null"
-            print(
-                f"[STEP]  step={step} action={action.action_type} "
-                f"reward={reward:.2f} done={str(done).lower()} error={error}"
-            )
-        final_score = rewards[-1] if (done and rewards) else 0.0
-        success = obs.success if hasattr(obs, "success") else False
-        rewards_str = ",".join(f"{r:.2f}" for r in rewards)
-        print(
-            f"[END]   success={str(success).lower()} steps={step} "
-            f"score={final_score:.2f} rewards={rewards_str}"
-        )
-        print()
 if __name__ == "__main__":
-    main()
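Reviewer note: both the removed baseline and the new script print `[STEP]` lines of the form `step=<n> action=<action_str> reward=<0.00> done=<true|false> error=<msg|null>`. A small parser for that line can help when checking output against the format. This is a sketch only: the names `StepRecord` and `parse_step_line` are hypothetical, and it assumes the action string contains no spaces.

```python
import re
from typing import NamedTuple, Optional


class StepRecord(NamedTuple):
    step: int
    action: str
    reward: float
    done: bool
    error: Optional[str]


# Tolerates one or more spaces after the [STEP] tag, since the old script
# printed two spaces and the new log_step helper prints one.
_STEP_RE = re.compile(
    r"^\[STEP\]\s+step=(\d+) action=(\S+) reward=(-?\d+\.\d+) "
    r"done=(true|false) error=(.*)$"
)


def parse_step_line(line: str) -> Optional[StepRecord]:
    """Return a StepRecord for a well-formed [STEP] line, else None."""
    match = _STEP_RE.match(line.strip())
    if match is None:
        return None
    step, action, reward, done, error = match.groups()
    return StepRecord(
        step=int(step),
        action=action,
        reward=float(reward),
        done=(done == "true"),
        error=None if error == "null" else error,
    )


if __name__ == "__main__":
    print(parse_step_line(
        "[STEP]  step=1 action=propose_slot reward=0.50 done=false error=null"
    ))
```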

 """
+LLM-based Inference Script for Meeting Scheduling RL Environment.
+===================================
+Uses OpenAI-compatible LLM via HF Router to intelligently schedule meetings.
+MANDATORY environment variables:
+    API_BASE_URL   The API endpoint for the LLM.
+    MODEL_NAME     The model identifier to use for inference.
+    HF_TOKEN       Your Hugging Face / API key.
+STDOUT FORMAT:
+    [START] task=<task_name> env=scheduling_env model=<model_name>
+    [STEP]  step=<n> action=<action_str> reward=<0.00> done=<true|false> error=<msg|null>
+    [END]   success=<true|false> steps=<n> score=<score> rewards=<r1,r2,...,rn>
 """
+import asyncio
+import json
+import os
+import textwrap
+from typing import Dict, List, Optional
+from openai import OpenAI
+from scheduling_env.client import SchedulingEnv
+from scheduling_env.models import SchedulingAction
+# ---------------------------------------------------------------------------
+# Configuration
+# ---------------------------------------------------------------------------
+API_KEY = os.getenv("HF_TOKEN") or os.getenv("API_KEY")
+API_BASE_URL = os.getenv("API_BASE_URL", "https://router.huggingface.co/v1")
+MODEL_NAME = os.getenv("MODEL_NAME", "Qwen/Qwen2.5-72B-Instruct")
+ENV_REPO_ID = "Akshaykumarbm/scheduling_env"
+BENCHMARK = "scheduling_env"
+TASKS = ["task1_easy", "task2_medium", "task3_hard"]
+MAX_STEPS = 20
+TEMPERATURE = 0.3
+MAX_TOKENS = 512
+# ---------------------------------------------------------------------------
+# Logging helpers
+# ---------------------------------------------------------------------------
+def log_start(task: str, env: str, model: str) -> None:
+    print(f"[START] task={task} env={env} model={model}", flush=True)
+def log_step(step: int, action: str, reward: float, done: bool, error: Optional[str]) -> None:
+    error_val = error if error else "null"
+    done_val = str(done).lower()
+    print(
+        f"[STEP] step={step} action={action} reward={reward:.2f} done={done_val} error={error_val}",
+        flush=True,
+    )
+def log_end(success: bool, steps: int, score: float, rewards: List[float]) -> None:
+    rewards_str = ",".join(f"{r:.2f}" for r in rewards)
+    print(
+        f"[END] success={str(success).lower()} steps={steps} score={score:.2f} rewards={rewards_str}",
+        flush=True,
+    )
+# ---------------------------------------------------------------------------
+# LLM interaction
+# ---------------------------------------------------------------------------
+SYSTEM_PROMPT = textwrap.dedent("""\
+You are an AI meeting scheduling assistant. You must schedule a meeting by choosing actions.
+Available actions (respond with EXACTLY one JSON object):
+1. Propose a time slot:
+   {"action_type": "propose_slot", "proposed_start": "<ISO8601>", "proposed_duration": <minutes>}
+2. Reschedule a conflicting meeting (only if its priority number is higher, i.e. lower priority, than the requested meeting's):
+   {"action_type": "reschedule_meeting", "meeting_id_to_move": "<attendee>_<start_iso>", "new_start_time": "<ISO8601>"}
+3. Finalize the schedule (only when no conflicts remain):
+   {"action_type": "finalize"}
+4. Reject (give up):
+   {"action_type": "reject"}
+Rules:
+- Propose slots within collective working hours.
+- You can only reschedule meetings with LOWER priority (higher number) than the requested meeting.
+- meeting_id format is: <attendee>_<start_iso> (e.g., "user1_2025-04-07T09:00:00+00:00").
+- After rescheduling all conflicts, call finalize.
+- Minimize preference violations and rescheduling.
+- Respond with ONLY the JSON object, no other text.
+""")
+def format_observation(obs, step: int) -> str:
+    """Convert a SchedulingObservation into a user prompt for the LLM."""
+    parts = [
+        f"Step {step}/{obs.max_steps}",
+        f"Meeting to schedule: {obs.requested_duration} min, priority {obs.requested_priority}",
+        f"Attendees: {', '.join(obs.attendee_ids)}",
+        f"Collective working hours: {obs.collective_work_hours.get('min_start_hour', 9)}:00 - {obs.collective_work_hours.get('max_end_hour', 17)}:00",
+    ]
+    if obs.preference_constraints:
+        parts.append(f"Preferences: max {obs.preference_constraints.get('max_meetings_per_day', 'N/A')} meetings/day, "
+                      f"buffer required: {obs.preference_constraints.get('requires_buffer', False)}, "
+                      f"buffer mins: {obs.preference_constraints.get('buffer_minutes', 0)}")
+    # Busy slots grouped by attendee
+    busy_by_attendee: Dict[str, List] = {}
+    for slot in obs.busy_slots:
+        att = slot["attendee"]
+        busy_by_attendee.setdefault(att, []).append(slot)
+    parts.append("\nCalendars:")
+    for att in obs.attendee_ids:
+        slots = busy_by_attendee.get(att, [])
+        if slots:
+            slot_strs = [
+                f"  - {s['start']} to {s['end']} (priority {s['priority']}, {s['summary']})"
+                for s in sorted(slots, key=lambda x: x["start"])
+            ]
+            parts.append(f"  {att}:")
+            parts.extend(slot_strs)
+        else:
+            parts.append(f"  {att}: (no meetings)")
+    if obs.current_proposal:
+        parts.append(f"\nCurrent proposal: {obs.current_proposal['start']} to {obs.current_proposal['end']}")
+    if obs.conflicts:
+        parts.append(f"\nConflicts ({len(obs.conflicts)}):")
+        for c in obs.conflicts:
+            parts.append(
+                f"  - {c['attendee']}: {c['start']} to {c['end']} "
+                f"(priority {c['priority']}, {c['summary']}, id: {c['meeting_id']})"
+            )
+    if obs.error_message:
+        parts.append(f"\nLast error: {obs.error_message}")
+    parts.append(f"\nRescheduled so far: {obs.num_rescheduled}")
+    parts.append(f"Preference penalty: {obs.preference_penalty}")
+    if not obs.current_proposal and not obs.conflicts:
+        parts.append("\nAction needed: propose a time slot for the meeting.")
+    elif obs.conflicts:
+        parts.append("\nAction needed: reschedule a conflict (lower-priority only) or propose a different slot.")
+    else:
+        parts.append("\nAction needed: no conflicts remain - you should finalize.")
+    return "\n".join(parts)
+def parse_llm_response(text: str, obs) -> SchedulingAction:
+    """Parse LLM JSON response into a SchedulingAction, with fallback."""
+    # Extract JSON from response (handle markdown code blocks)
+    cleaned = text.strip()
+    if "```" in cleaned:
+        # Extract content between code fences
+        lines = cleaned.split("\n")
+        json_lines = []
+        in_block = False
+        for line in lines:
+            if line.strip().startswith("```"):
+                in_block = not in_block
+                continue
+            if in_block:
+                json_lines.append(line)
+        cleaned = "\n".join(json_lines).strip()
+    # Try to find JSON object in the response
+    start = cleaned.find("{")
+    end = cleaned.rfind("}") + 1
+    if start >= 0 and end > start:
+        cleaned = cleaned[start:end]
+    try:
+        data = json.loads(cleaned)
+        return SchedulingAction(**data)
+    except Exception as e:  # covers json.JSONDecodeError and SchedulingAction validation errors
+        print(f"[DEBUG] Failed to parse LLM response: {e}. Response: {text[:200]}", flush=True)
+        # Fallback: with no proposal yet, propose the start of working hours (the date below is hardcoded)
+        if obs.current_proposal is None:
+            min_h = obs.collective_work_hours.get("min_start_hour", 9)
             return SchedulingAction(
                 action_type="propose_slot",
+                proposed_start=f"2025-04-07T{min_h:02d}:00:00+00:00",
                 proposed_duration=obs.requested_duration,
             )
+        elif not obs.conflicts:
+            return SchedulingAction(action_type="finalize")
+        else:
+            return SchedulingAction(action_type="reject")
+def get_llm_action(client: OpenAI, obs, step: int) -> SchedulingAction:
+    """Query the LLM and return a SchedulingAction."""
+    user_prompt = format_observation(obs, step)
+    try:
+        completion = client.chat.completions.create(
+            model=MODEL_NAME,
+            messages=[
+                {"role": "system", "content": SYSTEM_PROMPT},
+                {"role": "user", "content": user_prompt},
+            ],
+            temperature=TEMPERATURE,
+            max_tokens=MAX_TOKENS,
+            stream=False,
+        )
+        text = (completion.choices[0].message.content or "").strip()
+        return parse_llm_response(text, obs)
+    except Exception as exc:
+        print(f"[DEBUG] LLM request failed: {exc}", flush=True)
+        return parse_llm_response("", obs)
+# ---------------------------------------------------------------------------
+# Main loop
+# ---------------------------------------------------------------------------
+async def run_task(env, client: OpenAI, task_id: str) -> None:
+    """Run a single scheduling task."""
+    rewards: List[float] = []
+    steps_taken = 0
+    score = 0.0
+    success = False
+    log_start(task=task_id, env=BENCHMARK, model=MODEL_NAME)
+    try:
+        result = await env.reset(task_id=task_id)
+        obs = result.observation
+        for step in range(1, MAX_STEPS + 1):
+            if result.done:
                 break
+            action = get_llm_action(client, obs, step)
+            result = await env.step(action)
+            obs = result.observation
+            reward = result.reward or 0.0
+            done = result.done
+            error = obs.error_message
+            rewards.append(reward)
+            steps_taken = step
+            action_str = action.action_type
+            if action.action_type == "propose_slot":
+                action_str = f"propose_slot({action.proposed_start},{action.proposed_duration}m)"
+            elif action.action_type == "reschedule_meeting":
+                action_str = f"reschedule({action.meeting_id_to_move}->{action.new_start_time})"
+            log_step(step=step, action=action_str, reward=reward, done=done, error=error)
+            if done:
+                break
+        # Score is the final reward (0.0-1.0 from calculate_final_reward)
+        score = rewards[-1] if rewards else 0.0
+        score = min(max(score, 0.0), 1.0)
+        success = getattr(obs, "success", score > 0.0)
+    except Exception as exc:
+        print(f"[DEBUG] Task {task_id} error: {exc}", flush=True)
+    finally:
+        log_end(success=success, steps=steps_taken, score=score, rewards=rewards)
+async def main() -> None:
+    llm_client = OpenAI(base_url=API_BASE_URL, api_key=API_KEY)
+    env = await SchedulingEnv.from_env(ENV_REPO_ID)
+    try:
+        for task_id in TASKS:
+            await run_task(env, llm_client, task_id)
+    finally:
+        try:
+            await env.close()
+        except Exception as e:
+            print(f"[DEBUG] env.close() error: {e}", flush=True)
 if __name__ == "__main__":
+    asyncio.run(main())
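
The fence-stripping and brace-matching step inside `parse_llm_response` above can be exercised on its own, without the environment or an LLM. The following is a minimal sketch of that step only; the helper name `extract_json_object` is hypothetical and not part of this commit, and it returns a plain dict rather than a `SchedulingAction`.

```python
import json
from typing import Any, Dict, Optional

# Build the fence marker programmatically so this example contains no literal fence.
FENCE = "`" * 3


def extract_json_object(text: str) -> Optional[Dict[str, Any]]:
    """Return the JSON object found in an LLM reply, or None if there is none."""
    cleaned = text.strip()
    if FENCE in cleaned:
        # Keep only the lines that sit between code fences.
        inside = False
        kept = []
        for line in cleaned.split("\n"):
            if line.strip().startswith(FENCE):
                inside = not inside
                continue
            if inside:
                kept.append(line)
        cleaned = "\n".join(kept).strip()
    # Slice from the first "{" to the last "}" to drop surrounding chatter.
    start = cleaned.find("{")
    end = cleaned.rfind("}") + 1
    if start < 0 or end <= start:
        return None
    try:
        data = json.loads(cleaned[start:end])
    except json.JSONDecodeError:
        return None
    return data if isinstance(data, dict) else None


fenced = f'Sure!\n{FENCE}json\n{{"action_type": "finalize"}}\n{FENCE}'
print(extract_json_object(fenced))
print(extract_json_object("no JSON here"))
```

Returning `None` instead of raising keeps the caller's fallback logic (propose, finalize, or reject) in one place.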

pyproject.toml CHANGED

@@ -14,19 +14,10 @@ version = "0.1.0"
 description = "Scheduling Env environment for OpenEnv"
 requires-python = ">=3.10"
 dependencies = [
-    "huggingface-hub>=1.9.1",
     # Core OpenEnv runtime (provides FastAPI server + HTTP client types)
-    # install from github
-    # "openenv-core[core] @ git+https://github.com/meta-pytorch/OpenEnv.git",
     "openenv-core[core]>=0.2.2",
-    # Environment-specific dependencies
-    # Add all dependencies needed for your environment here
-    # Examples:
-    # "numpy>=1.19.0",
-    # "torch>=2.0.0",
-    # "gymnasium>=0.29.0",
-    # "openspiel>=1.0.0",
-    # "smolagents>=1.22.0,<2",
 ]
 [project.optional-dependencies]
@@ -43,4 +34,4 @@ server = "scheduling_env.server.app:main"
 [tool.setuptools]
 include-package-data = true
 packages = ["scheduling_env", "scheduling_env.server"]
-package-dir = { "scheduling_env" = ".", "scheduling_env.server" = "server" }

 description = "Scheduling Env environment for OpenEnv"
 requires-python = ">=3.10"
 dependencies = [
     # Core OpenEnv runtime (provides FastAPI server + HTTP client types)
     "openenv-core[core]>=0.2.2",
+    # OpenAI client for LLM-based inference
+    "openai>=1.0.0",
 ]
 [project.optional-dependencies]
 [tool.setuptools]
 include-package-data = true
 packages = ["scheduling_env", "scheduling_env.server"]
+package-dir = { "scheduling_env" = ".", "scheduling_env.server" = "server" }

sample_infrenae.py CHANGED

@@ -1,82 +1,47 @@
 """
-Inference Script Example
 ===================================
-MANDATORY
-- Before submitting, ensure the following variables are defined in your environment configuration:
     API_BASE_URL   The API endpoint for the LLM.
     MODEL_NAME     The model identifier to use for inference.
     HF_TOKEN       Your Hugging Face / API key.
-    LOCAL_IMAGE_NAME The name of the local image to use for the environment if you are using from_docker_image()
-                     method
-- Defaults are set only for API_BASE_URL and MODEL_NAME
-    (and should reflect your active inference setup):
-    API_BASE_URL = os.getenv("API_BASE_URL", "<your-active-endpoint>")
-    MODEL_NAME = os.getenv("MODEL_NAME", "<your-active-model>")
-- The inference script must be named `inference.py` and placed in the root directory of the project
-- Participants must use OpenAI Client for all LLM calls using above variables
-STDOUT FORMAT
-- The script must emit exactly three line types to stdout, in this order:
-    [START] task=<task_name> env=<benchmark> model=<model_name>
     [STEP]  step=<n> action=<action_str> reward=<0.00> done=<true|false> error=<msg|null>
     [END]   success=<true|false> steps=<n> score=<score> rewards=<r1,r2,...,rn>
-  Rules:
-    - One [START] line at episode begin.
-    - One [STEP] line per step, immediately after env.step() returns.
-    - One [END] line after env.close(), always emitted (even on exception).
-    - reward and rewards are formatted to 2 decimal places.
-    - done and success are lowercase booleans: true or false.
-    - error is the raw last_action_error string, or null if none.
-    - All fields on a single line with no newlines within a line.
-    - Each tasks should return score in [0, 1]
-  Example:
-    [START] task=click-test env=miniwob model=Qwen3-VL-30B
-    [STEP] step=1 action=click('123') reward=0.00 done=false error=null
-    [STEP] step=2 action=fill('456','text') reward=0.00 done=false error=null
-    [STEP] step=3 action=click('789') reward=1.00 done=true error=null
-    [END] success=true steps=3 score=1.00 rewards=0.00,0.00,1.00
 """
 import asyncio
 import os
 import textwrap
-from typing import List, Optional
 from openai import OpenAI
-from my_env_v4 import MyEnvV4Action, MyEnvV4Env
-IMAGE_NAME = os.getenv("IMAGE_NAME") # If you are using docker image
 API_KEY = os.getenv("HF_TOKEN") or os.getenv("API_KEY")
-API_BASE_URL = os.getenv("API_BASE_URL") or "https://router.huggingface.co/v1"
-MODEL_NAME = os.getenv("MODEL_NAME") or "Qwen/Qwen2.5-72B-Instruct"
-TASK_NAME = os.getenv("MY_ENV_V4_TASK", "echo")
-BENCHMARK = os.getenv("MY_ENV_V4_BENCHMARK", "my_env_v4")
-MAX_STEPS = 8
-TEMPERATURE = 0.7
-MAX_TOKENS = 150
-SUCCESS_SCORE_THRESHOLD = 0.1  # normalized score in [0, 1]
-# Max possible reward: each token contributes 0.1, across all steps
-_MAX_REWARD_PER_STEP = MAX_TOKENS * 0.1
-MAX_TOTAL_REWARD = MAX_STEPS * _MAX_REWARD_PER_STEP
-SYSTEM_PROMPT = textwrap.dedent(
-    """
-    You are interacting with a simple echo environment.
-    Each turn you must send a message. The environment will echo it back.
-    Reward is proportional to message length: reward = len(message) * 0.1
-    Your goal is to maximize total reward by sending meaningful, substantive messages.
-    Reply with exactly one message string — no quotes, no prefixes, just the message text.
-    """
-).strip()
 def log_start(task: str, env: str, model: str) -> None:
     print(f"[START] task={task} env={env} model={model}", flush=True)
@@ -93,25 +58,148 @@ def log_step(step: int, action: str, reward: float, done: bool, error: Optional[
 def log_end(success: bool, steps: int, score: float, rewards: List[float]) -> None:
     rewards_str = ",".join(f"{r:.2f}" for r in rewards)
-    print(f"[END] success={str(success).lower()} steps={steps} score={score:.3f} rewards={rewards_str}", flush=True)
-def build_user_prompt(step: int, last_echoed: str, last_reward: float, history: List[str]) -> str:
-    history_block = "\n".join(history[-4:]) if history else "None"
-    return textwrap.dedent(
-        f"""
-        Step: {step}
-        Last echoed message: {last_echoed!r}
-        Last reward: {last_reward:.2f}
-        Previous steps:
-        {history_block}
-        Send your next message.
-        """
-    ).strip()
-def get_model_message(client: OpenAI, step: int, last_echoed: str, last_reward: float, history: List[str]) -> str:
-    user_prompt = build_user_prompt(step, last_echoed, last_reward, history)
     try:
         completion = client.chat.completions.create(
             model=MODEL_NAME,
@@ -124,66 +212,82 @@ def get_model_message(client: OpenAI, step: int, last_echoed: str, last_reward:
             stream=False,
         )
         text = (completion.choices[0].message.content or "").strip()
-        return text if text else "hello"
     except Exception as exc:
-        print(f"[DEBUG] Model request failed: {exc}", flush=True)
-        return "hello"
-async def main() -> None:
-    client = OpenAI(base_url=API_BASE_URL, api_key=API_KEY)
-    env = await MyEnvV4Env.from_docker_image(IMAGE_NAME)
-    history: List[str] = []
     rewards: List[float] = []
     steps_taken = 0
     score = 0.0
     success = False
-    log_start(task=TASK_NAME, env=BENCHMARK, model=MODEL_NAME)
     try:
-        result = await env.reset() # OpenENV.reset()
-        last_echoed = result.observation.echoed_message
-        last_reward = 0.0
         for step in range(1, MAX_STEPS + 1):
             if result.done:
                 break
-            message = get_model_message(client, step, last_echoed, last_reward, history)
-            result = await env.step(MyEnvV4Action(message=message))
             obs = result.observation
             reward = result.reward or 0.0
             done = result.done
-            error = None
             rewards.append(reward)
             steps_taken = step
-            last_echoed = obs.echoed_message
-            last_reward = reward
-            log_step(step=step, action=message, reward=reward, done=done, error=error)
-            history.append(f"Step {step}: {message!r} -> reward {reward:+.2f}")
             if done:
                 break
-        score = sum(rewards) / MAX_TOTAL_REWARD if MAX_TOTAL_REWARD > 0 else 0.0
-        score = min(max(score, 0.0), 1.0)  # clamp to [0, 1]
-        success = score >= SUCCESS_SCORE_THRESHOLD
     finally:
         try:
             await env.close()
         except Exception as e:
-            print(f"[DEBUG] env.close() error (container cleanup): {e}", flush=True)
-        log_end(success=success, steps=steps_taken, score=score, rewards=rewards)
 if __name__ == "__main__":
-    asyncio.run(main())

 """
+LLM-based Inference Script for Meeting Scheduling RL Environment.
 ===================================
+Uses an OpenAI-compatible LLM via the HF Router to schedule meetings.
+MANDATORY environment variables:
     API_BASE_URL   The API endpoint for the LLM.
     MODEL_NAME     The model identifier to use for inference.
     HF_TOKEN       Your Hugging Face / API key.
+STDOUT FORMAT:
+    [START] task=<task_name> env=scheduling_env model=<model_name>
     [STEP]  step=<n> action=<action_str> reward=<0.00> done=<true|false> error=<msg|null>
     [END]   success=<true|false> steps=<n> score=<score> rewards=<r1,r2,...,rn>
 """
 import asyncio
+import json
 import os
 import textwrap
+from typing import Dict, List, Optional
 from openai import OpenAI
+from scheduling_env.client import SchedulingEnv
+from scheduling_env.models import SchedulingAction
+# ---------------------------------------------------------------------------
+# Configuration
+# ---------------------------------------------------------------------------
 API_KEY = os.getenv("HF_TOKEN") or os.getenv("API_KEY")
+API_BASE_URL = os.getenv("API_BASE_URL", "https://router.huggingface.co/v1")
+MODEL_NAME = os.getenv("MODEL_NAME", "Qwen/Qwen2.5-72B-Instruct")
+ENV_REPO_ID = "Akshaykumarbm/scheduling_env"
+BENCHMARK = "scheduling_env"
+TASKS = ["task1_easy", "task2_medium", "task3_hard"]
+MAX_STEPS = 20
+TEMPERATURE = 0.3
+MAX_TOKENS = 512
+# ---------------------------------------------------------------------------
+# Logging helpers
+# ---------------------------------------------------------------------------
 def log_start(task: str, env: str, model: str) -> None:
     print(f"[START] task={task} env={env} model={model}", flush=True)
 def log_end(success: bool, steps: int, score: float, rewards: List[float]) -> None:
     rewards_str = ",".join(f"{r:.2f}" for r in rewards)
+    print(
+        f"[END] success={str(success).lower()} steps={steps} score={score:.2f} rewards={rewards_str}",
+        flush=True,
+    )
+# ---------------------------------------------------------------------------
+# LLM interaction
+# ---------------------------------------------------------------------------
+SYSTEM_PROMPT = textwrap.dedent("""\
+You are an AI meeting scheduling assistant. You must schedule a meeting by choosing actions.
+Available actions (respond with EXACTLY one JSON object):
+1. Propose a time slot:
+   {"action_type": "propose_slot", "proposed_start": "<ISO8601>", "proposed_duration": <minutes>}
+2. Reschedule a conflicting meeting (only if its priority number is higher, i.e. lower priority, than the requested meeting's):
+   {"action_type": "reschedule_meeting", "meeting_id_to_move": "<attendee>_<start_iso>", "new_start_time": "<ISO8601>"}
+3. Finalize the schedule (only when no conflicts remain):
+   {"action_type": "finalize"}
+4. Reject (give up):
+   {"action_type": "reject"}
+Rules:
+- Propose slots within collective working hours.
+- You can only reschedule meetings with LOWER priority (higher number) than the requested meeting.
+- meeting_id format is: <attendee>_<start_iso> (e.g., "user1_2025-04-07T09:00:00+00:00").
+- After rescheduling all conflicts, call finalize.
+- Minimize preference violations and rescheduling.
+- Respond with ONLY the JSON object, no other text.
+""")
+def format_observation(obs, step: int) -> str:
+    """Convert a SchedulingObservation into a user prompt for the LLM."""
+    parts = [
+        f"Step {step}/{obs.max_steps}",
+        f"Meeting to schedule: {obs.requested_duration} min, priority {obs.requested_priority}",
+        f"Attendees: {', '.join(obs.attendee_ids)}",
+        f"Collective working hours: {obs.collective_work_hours.get('min_start_hour', 9)}:00 - {obs.collective_work_hours.get('max_end_hour', 17)}:00",
+    ]
+    if obs.preference_constraints:
+        parts.append(f"Preferences: max {obs.preference_constraints.get('max_meetings_per_day', 'N/A')} meetings/day, "
+                      f"buffer required: {obs.preference_constraints.get('requires_buffer', False)}, "
+                      f"buffer mins: {obs.preference_constraints.get('buffer_minutes', 0)}")
+    # Busy slots grouped by attendee
+    busy_by_attendee: Dict[str, List] = {}
+    for slot in obs.busy_slots:
+        att = slot["attendee"]
+        busy_by_attendee.setdefault(att, []).append(slot)
+    parts.append("\nCalendars:")
+    for att in obs.attendee_ids:
+        slots = busy_by_attendee.get(att, [])
+        if slots:
+            slot_strs = [
+                f"  - {s['start']} to {s['end']} (priority {s['priority']}, {s['summary']})"
+                for s in sorted(slots, key=lambda x: x["start"])
+            ]
+            parts.append(f"  {att}:")
+            parts.extend(slot_strs)
+        else:
+            parts.append(f"  {att}: (no meetings)")
+    if obs.current_proposal:
+        parts.append(f"\nCurrent proposal: {obs.current_proposal['start']} to {obs.current_proposal['end']}")
+    if obs.conflicts:
+        parts.append(f"\nConflicts ({len(obs.conflicts)}):")
+        for c in obs.conflicts:
+            parts.append(
+                f"  - {c['attendee']}: {c['start']} to {c['end']} "
+                f"(priority {c['priority']}, {c['summary']}, id: {c['meeting_id']})"
+            )
+    if obs.error_message:
+        parts.append(f"\nLast error: {obs.error_message}")
+    parts.append(f"\nRescheduled so far: {obs.num_rescheduled}")
+    parts.append(f"Preference penalty: {obs.preference_penalty}")
+    if not obs.current_proposal and not obs.conflicts:
+        parts.append("\nAction needed: propose a time slot for the meeting.")
+    elif obs.conflicts:
+        parts.append("\nAction needed: reschedule a conflict (lower-priority only) or propose a different slot.")
+    else:
+        parts.append("\nAction needed: no conflicts remain - you should finalize.")
+    return "\n".join(parts)
+def parse_llm_response(text: str, obs) -> SchedulingAction:
+    """Parse LLM JSON response into a SchedulingAction, with fallback."""
+    # Extract JSON from response (handle markdown code blocks)
+    cleaned = text.strip()
+    if "```" in cleaned:
+        # Extract content between code fences
+        lines = cleaned.split("\n")
+        json_lines = []
+        in_block = False
+        for line in lines:
+            if line.strip().startswith("```"):
+                in_block = not in_block
+                continue
+            if in_block:
+                json_lines.append(line)
+        cleaned = "\n".join(json_lines).strip()
+    # Try to find JSON object in the response
+    start = cleaned.find("{")
+    end = cleaned.rfind("}") + 1
+    if start >= 0 and end > start:
+        cleaned = cleaned[start:end]
+    try:
+        data = json.loads(cleaned)
+        return SchedulingAction(**data)
+    except Exception as e:  # covers json.JSONDecodeError and SchedulingAction validation errors
+        print(f"[DEBUG] Failed to parse LLM response: {e}. Response: {text[:200]}", flush=True)
+        # Fallback: with no proposal yet, propose the start of working hours (the date below is hardcoded)
+        if obs.current_proposal is None:
+            min_h = obs.collective_work_hours.get("min_start_hour", 9)
+            return SchedulingAction(
+                action_type="propose_slot",
+                proposed_start=f"2025-04-07T{min_h:02d}:00:00+00:00",
+                proposed_duration=obs.requested_duration,
+            )
+        elif not obs.conflicts:
+            return SchedulingAction(action_type="finalize")
+        else:
+            return SchedulingAction(action_type="reject")
+def get_llm_action(client: OpenAI, obs, step: int) -> SchedulingAction:
+    """Query the LLM and return a SchedulingAction."""
+    user_prompt = format_observation(obs, step)
     try:
         completion = client.chat.completions.create(
             model=MODEL_NAME,
             stream=False,
         )
         text = (completion.choices[0].message.content or "").strip()
+        return parse_llm_response(text, obs)
     except Exception as exc:
+        print(f"[DEBUG] LLM request failed: {exc}", flush=True)
+        return parse_llm_response("", obs)
+# ---------------------------------------------------------------------------
+# Main loop
+# ---------------------------------------------------------------------------
+async def run_task(env, client: OpenAI, task_id: str) -> None:
+    """Run a single scheduling task."""
     rewards: List[float] = []
     steps_taken = 0
     score = 0.0
     success = False
+    log_start(task=task_id, env=BENCHMARK, model=MODEL_NAME)
     try:
+        result = await env.reset(task_id=task_id)
+        obs = result.observation
         for step in range(1, MAX_STEPS + 1):
             if result.done:
                 break
+            action = get_llm_action(client, obs, step)
+            result = await env.step(action)
             obs = result.observation
             reward = result.reward or 0.0
             done = result.done
+            error = obs.error_message
             rewards.append(reward)
             steps_taken = step
+            action_str = action.action_type
+            if action.action_type == "propose_slot":
+                action_str = f"propose_slot({action.proposed_start},{action.proposed_duration}m)"
+            elif action.action_type == "reschedule_meeting":
+                action_str = f"reschedule({action.meeting_id_to_move}->{action.new_start_time})"
+            log_step(step=step, action=action_str, reward=reward, done=done, error=error)
             if done:
                 break
+        # Score is the final reward (0.0-1.0 from calculate_final_reward)
+        score = rewards[-1] if rewards else 0.0
+        score = min(max(score, 0.0), 1.0)
+        success = getattr(obs, "success", score > 0.0)
+    except Exception as exc:
+        print(f"[DEBUG] Task {task_id} error: {exc}", flush=True)
+    finally:
+        log_end(success=success, steps=steps_taken, score=score, rewards=rewards)
+async def main() -> None:
+    llm_client = OpenAI(base_url=API_BASE_URL, api_key=API_KEY)
+    env = await SchedulingEnv.from_env(ENV_REPO_ID)
+    try:
+        for task_id in TASKS:
+            await run_task(env, llm_client, task_id)
     finally:
         try:
             await env.close()
         except Exception as e:
+            print(f"[DEBUG] env.close() error: {e}", flush=True)
 if __name__ == "__main__":
+    asyncio.run(main())
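
The `[END]` line emitted by `log_end` has a fixed shape, so a run's stdout can be checked mechanically. Below is a minimal sketch of such a check, assuming the two-decimal `score` format used in this version of the script; the regex and the `parse_end_line` helper are illustrative and not part of this commit.

```python
import re
from typing import Any, Dict

# Mirrors the f-string in log_end: lowercase booleans, two-decimal score.
END_RE = re.compile(
    r"^\[END\]\s+success=(true|false) steps=(\d+) score=(\d+\.\d{2}) rewards=(.*)$"
)


def parse_end_line(line: str) -> Dict[str, Any]:
    """Parse one [END] line into typed fields, raising ValueError on mismatch."""
    match = END_RE.match(line.strip())
    if match is None:
        raise ValueError(f"not a valid [END] line: {line!r}")
    success, steps, score, rewards = match.groups()
    return {
        "success": success == "true",
        "steps": int(steps),
        "score": float(score),
        # log_end prints an empty rewards field when no step was taken.
        "rewards": [float(r) for r in rewards.split(",")] if rewards else [],
    }


print(parse_end_line("[END] success=true steps=3 score=1.00 rewards=0.00,0.00,1.00"))
```

Because `run_task` calls `log_end` in a `finally` block, a task that fails before its first step still produces a parseable line with `steps=0` and an empty `rewards` field.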

uv.lock CHANGED

@@ -1603,7 +1603,7 @@ name = "openenv-scheduling-env"
 version = "0.1.0"
 source = { editable = "." }
 dependencies = [
-    { name = "huggingface-hub" },
     { name = "openenv-core", extra = ["core"] },
 ]
@@ -1615,7 +1615,7 @@ dev = [
 [package.metadata]
 requires-dist = [
-    { name = "huggingface-hub", specifier = ">=1.9.1" },
     { name = "openenv-core", extras = ["core"], specifier = ">=0.2.2" },
     { name = "pytest", marker = "extra == 'dev'", specifier = ">=8.0.0" },
     { name = "pytest-cov", marker = "extra == 'dev'", specifier = ">=4.0.0" },

 version = "0.1.0"
 source = { editable = "." }
 dependencies = [
+    { name = "openai" },
     { name = "openenv-core", extra = ["core"] },
 ]
 [package.metadata]
 requires-dist = [
+    { name = "openai", specifier = ">=1.0.0" },
     { name = "openenv-core", extras = ["core"], specifier = ">=0.2.2" },
     { name = "pytest", marker = "extra == 'dev'", specifier = ">=8.0.0" },
     { name = "pytest-cov", marker = "extra == 'dev'", specifier = ">=4.0.0" },