---
title: Scheduling Env Environment Server
emoji: 📅
colorFrom: blue
colorTo: pink
sdk: docker
pinned: false
app_port: 8000
base_path: /web
tags:
  - openenv
---

# Meeting Scheduling RL Environment

An OpenEnv reinforcement-learning environment where AI agents learn to schedule meetings optimally across multiple attendees. The agent must propose time slots, resolve calendar conflicts by rescheduling lower-priority meetings, and satisfy each participant's scheduling preferences — all within a limited number of steps.

## Overview

The environment simulates a realistic corporate scheduling assistant. Given a meeting request, the agent iteratively:

1. **Proposes** a time slot for all required attendees.
2. **Reschedules** any lower-priority conflicting meetings to free up the slot.
3. **Finalizes** the booking once the slot is conflict-free.

Each episode is scored on scheduling quality (0.0–1.0), penalizing preference violations, unnecessary rescheduling, and excessive steps.

## Quick Start

### Running the Heuristic Baseline (no LLM needed)

```bash
python inference.py
```

This runs a greedy baseline policy across all three tasks and prints step-by-step output in the required `[START]`/`[STEP]`/`[END]` format.

### Using the Environment Directly (Python)

```python
from server.scheduling_env_environment import SchedulingEnvironment
from models import SchedulingAction

env = SchedulingEnvironment()

# Reset to a specific task
obs = env.reset(task_id="task1_easy")
print(f"Attendees: {obs.attendee_ids}")
print(f"Duration:  {obs.requested_duration} min")
print(f"Priority:  {obs.requested_priority}")

# Propose a time slot
result = env.step(SchedulingAction(
    action_type="propose_slot",
    proposed_start="2025-04-07T10:00:00+00:00",
    proposed_duration=30,
))
print(f"Conflicts: {result.conflicts}")
print(f"Reward:    {result.reward}")

# Finalize when conflict-free
result = env.step(SchedulingAction(action_type="finalize"))
print(f"Success: {result.success}  Final score: {result.reward:.2f}")
```

### Using the HTTP Client

```python
from client import SchedulingEnv
from models import SchedulingAction

with SchedulingEnv(base_url="http://localhost:8000") as env:
    result = env.reset(task_id="task2_medium")
    obs = result.observation

    # Propose a slot
    result = env.step(SchedulingAction(
        action_type="propose_slot",
        proposed_start="2025-04-07T11:00:00+00:00",
        proposed_duration=60,
    ))

    # Reschedule a conflicting lower-priority meeting
    if result.observation.conflicts:
        conflict = result.observation.conflicts[0]
        result = env.step(SchedulingAction(
            action_type="reschedule_meeting",
            meeting_id_to_move=conflict["meeting_id"],
            new_start_time="2025-04-07T07:00:00+00:00",
        ))

    # Finalize
    result = env.step(SchedulingAction(action_type="finalize"))
    print(f"Score: {result.reward:.2f}")
```

## Environment Details

### Actions (`SchedulingAction`)

| `action_type`        | Required fields                              | Description                                               |
|----------------------|----------------------------------------------|-----------------------------------------------------------|
| `propose_slot`       | `proposed_start`, `proposed_duration`        | Propose a meeting start time (ISO 8601) and duration (min)|
| `reschedule_meeting` | `meeting_id_to_move`, `new_start_time`       | Move a lower-priority conflict to a new time              |
| `finalize`           | _(none)_                                     | Confirm the proposed slot; ends the episode               |
| `reject`             | _(none)_                                     | Give up on scheduling; ends the episode with 0 reward     |

**Meeting ID format:** `{attendee}_{start_iso}` — e.g. `user1_2025-04-07T09:00:00+00:00`

### Observations (`SchedulingObservation`)

| Field                   | Type                    | Description                                                  |
|-------------------------|-------------------------|--------------------------------------------------------------|
| `requested_duration`    | `int`                   | Meeting duration in minutes                                  |
| `requested_priority`    | `int`                   | Priority of the new meeting (1 = highest, 5 = lowest)        |
| `attendee_ids`          | `List[str]`             | Required attendees                                           |
| `busy_slots`            | `List[dict]`            | All existing calendar entries for attendees                  |
| `collective_work_hours` | `dict`                  | Shared working-hours window `{min_start_hour, max_end_hour}` |
| `preference_constraints`| `dict`                  | Aggregated constraints (max meetings/day, buffer, etc.)      |
| `current_proposal`      | `dict \| None`          | Currently proposed slot `{start, end}`                       |
| `conflicts`             | `List[dict]`            | Conflicts for the current proposal                           |
| `preference_penalty`    | `float`                 | Accumulated preference-violation penalty                     |
| `num_rescheduled`       | `int`                   | Meetings rescheduled so far in this episode                  |
| `steps_taken`           | `int`                   | Steps used so far                                            |
| `max_steps`             | `int`                   | Episode step limit (20)                                      |
| `success`               | `bool`                  | `True` when the meeting is successfully booked               |
| `error_message`         | `str \| None`           | Reason if the last action was invalid                        |
| `done`                  | `bool`                  | `True` when the episode has ended                            |
| `reward`                | `float`                 | Step or final reward                                         |

### Reward Design

**Step-level rewards** (returned after each `propose_slot` or `reschedule_meeting`):

| Outcome                                  | Reward |
|------------------------------------------|--------|
| Conflict-free proposal (low penalty)     | +0.5   |
| Proposal has reschedulable conflicts     | +0.2   |
| Proposal has non-reschedulable conflicts | −0.3   |
| Invalid action                           | −0.1   |
| Outside working hours                    | −0.2   |

**Final reward** (returned on `finalize`) — deducted from 1.0:

```
preference_deduction  = min(0.75, (penalty ** 1.2) / 200.0)
reschedule_deduction  = min(0.30, 0.05 * (1.8 ** num_rescheduled))   [if any rescheduled]
time_deduction        = steps_taken * 0.015

final_reward = clamp(1.0 - preference_deduction - reschedule_deduction - time_deduction, 0.0, 1.0)
```

Timeout (step 20 reached without `finalize`) gives partial credit: 70 % of the theoretical reward if conflict-free, or a progress-based fraction otherwise.

## Tasks

Three tasks of increasing difficulty are provided as JSON scenarios in `server/scenarios/`:

| Task ID         | Difficulty | Attendees | Duration | Priority | Rescheduling needed | Expected score |
|-----------------|------------|-----------|----------|----------|---------------------|----------------|
| `task1_easy`    | Easy       | 2         | 30 min   | 3        | No                  | 0.8 – 1.0      |
| `task2_medium`  | Medium     | 4         | 60 min   | 2        | Yes (1 meeting)     | 0.5 – 0.7      |
| `task3_hard`    | Hard       | 6         | 45 min   | 2        | Yes (3+ meetings)   | 0.25 – 0.45    |

### task1_easy — Team Sync (2 attendees)

- Two attendees each have 2 existing meetings; a clear free slot exists at **10:00**.
- Agent should find the free slot and finalize in 2 steps.
- No rescheduling required.

### task2_medium — Cross-Team Planning (4 attendees)

- Four attendees with densely packed schedules; the optimal slot at **11:00** has one low-priority conflict (`user3` Coffee chat, priority 4).
- Agent needs to propose the slot, reschedule the conflict, then finalize.
- User preferences include back-to-back avoidance and different preferred-hour windows.

### task3_hard — Executive Planning Session (6 attendees)

- Six attendees with very dense calendars; the best window at **15:00** requires rescheduling three low-priority meetings (priority 4).
- Multiple valid solutions exist; the agent must navigate cascading constraints.
- All attendees have strict buffer requirements and narrow preferred-hour windows.

## Participant Preferences

Each attendee can have the following preferences (stored in scenario JSON and observed via `preference_constraints`):

| Preference             | Description                                         | Penalty for violation |
|------------------------|-----------------------------------------------------|-----------------------|
| `preferred_hours`      | `{start: H, end: H}` — preferred working hours      | +50 per participant   |
| `max_meetings_per_day` | Maximum meetings the participant wants in a day      | +30 per participant   |
| `avoid_back_to_back`   | Whether a buffer gap is required between meetings    | +20 per participant   |
| `buffer_minutes`       | Gap required before/after a meeting (if avoid_btb)  | (part of above)       |

The **collective working hours** (the intersection of all attendees' preferred hours) define the hard constraint window within which proposals must fall.

## API Endpoints

The server exposes the following HTTP endpoints (also available via the Web UI at `/web`):

| Method | Path      | Description                                                        |
|--------|-----------|--------------------------------------------------------------------|
| POST   | `/reset`  | Start a new episode. Body: `{"task_id": "task1_easy"}`             |
| POST   | `/step`   | Take an action. Body: `{"action_type": "...", ...action fields}`   |
| GET    | `/state`  | Return the full internal `SchedulingState`                         |
| GET    | `/health` | Health check — returns `{"status": "healthy"}`                     |
| GET    | `/docs`   | Interactive OpenAPI / Swagger UI                                   |

### Example: REST interaction

```bash
# Start episode
curl -X POST http://localhost:8000/reset \
  -H "Content-Type: application/json" \
  -d '{"task_id": "task1_easy"}'

# Propose a slot
curl -X POST http://localhost:8000/step \
  -H "Content-Type: application/json" \
  -d '{"action_type": "propose_slot", "proposed_start": "2025-04-07T10:00:00+00:00", "proposed_duration": 30}'

# Finalize
curl -X POST http://localhost:8000/step \
  -H "Content-Type: application/json" \
  -d '{"action_type": "finalize"}'
```

## Development & Testing

### Run the baseline inference script

```bash
python inference.py
```

### Start the server locally

```bash
uvicorn server.app:app --reload
```

### Validate the environment (required before submission)

```bash
openenv validate
```

### Generate / update the lock file

```bash
uv lock
```

### Build the Docker image

```bash
docker build -t scheduling_env:latest .
```

## Deploying to Hugging Face Spaces

```bash
# From the project root (where openenv.yaml is located)
openenv push

# Push to a specific repository
openenv push --repo-id my-org/my-scheduling-env

# Push as a private space
openenv push --private
```

The `openenv push` command validates the environment, builds a Hugging Face-compatible Docker image, and uploads it. After deployment your space is available at:

```
https://huggingface.co/spaces/<repo-id>
```

The deployed space includes:
- **Web Interface** at `/web` — interactive UI for exploring the environment
- **API Documentation** at `/docs` — full OpenAPI / Swagger interface
- **Health Check** at `/health` — container health monitoring

### Options

| Flag | Description |
|------|-------------|
| `--directory`, `-d` | Directory with `openenv.yaml` (default: current dir) |
| `--repo-id`, `-r` | Repository ID `username/repo-name` |
| `--base-image`, `-b` | Override Dockerfile `FROM` image |
| `--private` | Deploy as a private space (default: public) |

## Environment Variables (for LLM-based inference)

Create a `.env` file (never commit it):

```
API_BASE_URL=https://router.huggingface.co/v1   # HF Router endpoint
MODEL_NAME=Qwen/Qwen2.5-72B-Instruct            # Model identifier
HF_TOKEN=hf_...                                  # Hugging Face API key
```

## Project Structure

```
rl-scheduling-env/
├── Dockerfile                          # Container image (root, required by openenv)
├── README.md                           # This file
├── openenv.yaml                        # OpenEnv manifest
├── pyproject.toml                      # Project metadata and dependencies
├── uv.lock                             # Locked dependencies (generated by `uv lock`)
├── __init__.py                         # Package exports
├── models.py                           # Pydantic models: SchedulingAction,
│                                       #   SchedulingObservation, SchedulingState
├── client.py                           # SchedulingEnv HTTP/WebSocket client
├── inference.py                        # Heuristic baseline (no LLM required)
└── server/
    ├── __init__.py                     # Server package exports
    ├── app.py                          # FastAPI app + SchedulingHTTPEnvServer
    ├── scheduling_env_environment.py   # Core RL environment (reset / step / state)
    ├── scheduling_logic.py             # Pure utility functions (conflict detection,
    │                                   #   preference scoring, reward calculation)
    ├── graders.py                      # SchedulingGrader (0.0–1.0 episode scorer)
    ├── requirements.txt                # Server-side Python dependencies
    └── scenarios/
        ├── task1_easy.json             # Easy: 2 attendees, free slot exists
        ├── task2_medium.json           # Medium: 4 attendees, 1 rescheduling needed
        └── task3_hard.json             # Hard: 6 attendees, 3+ reschedulings needed
```