Akshaykumarbm committed on
Commit 7bdbe90 · verified · 1 Parent(s): bf30e08

Upload folder using huggingface_hub

CLAUDE.md ADDED
@@ -0,0 +1,91 @@
1
+ # CLAUDE.md
2
+
3
+ This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
4
+
5
+ ## Repository Purpose
6
+
7
+ OpenEnv RL environment for the **Meta OpenEnv Hackathon**. Implements an intelligent meeting scheduling environment where AI agents learn to schedule meetings across multiple attendees by proposing time slots, rescheduling lower-priority conflicts, and balancing participant preferences.
8
+
9
+ ## Development Commands
10
+
11
+ ```bash
12
+ # Run baseline inference (heuristic, no LLM needed)
13
+ python inference.py
14
+
15
+ # Start server locally
16
+ uvicorn server.app:app --reload
17
+
18
+ # Validate environment for submission
19
+ openenv validate
20
+
21
+ # Generate/update lock file (required by validator)
22
+ uv lock
23
+
24
+ # Deploy to Hugging Face Spaces
25
+ openenv push
26
+
27
+ # Build Docker image (Dockerfile must be in root)
28
+ docker build -t scheduling_env:latest .
29
+ ```
30
+
31
+ ## Architecture
32
+
33
+ ### OpenEnv Interface (client-server pattern)
34
+
35
+ The environment follows OpenEnv's standard API:
36
+ - **`POST /reset`** — starts a new episode, accepts `{"task_id": "task1_easy"}`. Returns observation.
37
+ - **`POST /step`** — takes an action, returns observation with reward/done.
38
+ - **`GET /state`** — returns internal environment state.
39
+ - **`GET /health`** — health check.
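The four endpoints above can be exercised without the client library. The sketch below uses only the Python standard library. It assumes the server listens on `http://localhost:8000` (the port used by the Dockerfile). It also assumes that `/step` expects the action fields wrapped under an `action` key, as in OpenEnv's generic HTTP server. This repo registers its own `/step` route, so check `server/app.py` before relying on that body shape.

```python
import json
import urllib.request
from typing import Any, Dict, Optional

BASE_URL = "http://localhost:8000"  # assumption: local server on the Dockerfile's port


def build_request(path: str, payload: Optional[Dict[str, Any]] = None) -> urllib.request.Request:
    """Build (but do not send) a request for one of the OpenEnv endpoints."""
    url = BASE_URL + path
    if payload is None:
        # GET /state and GET /health carry no body
        return urllib.request.Request(url, method="GET")
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(req: urllib.request.Request) -> Dict[str, Any]:
    """Send a request and decode the JSON response (requires a running server)."""
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.loads(resp.read().decode("utf-8"))


reset_req = build_request("/reset", {"task_id": "task1_easy"})
# Assumption: action fields nested under "action"; verify against server/app.py
step_req = build_request("/step", {"action": {
    "action_type": "propose_slot",
    "proposed_start": "2025-04-07T10:00:00+00:00",
    "proposed_duration": 30,
}})
health_req = build_request("/health")
```

With `uvicorn server.app:app` running, `send(reset_req)` returns the first observation as a dict.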
40
+
41
+ ### Core Flow
42
+
43
+ `server/app.py` creates a `SchedulingHTTPEnvServer` (subclasses `HTTPEnvServer`) that wraps a persistent `SchedulingEnvironment` instance. The server registers custom `/reset`, `/step`, `/state` routes.
44
+
45
+ `server/scheduling_env_environment.py` — Main environment class implementing `Environment`. Loads JSON scenarios from `server/scenarios/`, processes 4 action types: `propose_slot`, `reschedule_meeting`, `finalize`, `reject`. Episode ends on `finalize`, `reject`, or timeout (20 steps).
46
+
47
+ `server/scheduling_logic.py` — Pure utility functions: conflict detection, preference scoring, reward calculation, free-slot search. All datetime handling uses timezone-aware ISO 8601 strings. Calendar format: `Dict[str, List[List]]` where each entry is `[start_iso, end_iso, priority_int, summary_str]`.
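The following is a minimal sketch of conflict detection over the calendar format described above. The function names are hypothetical and this is not the code in `scheduling_logic.py`. The sketch makes two assumptions. First, `proposed_duration` is in minutes. Second, intervals are half-open, so a meeting ending at 10:00 does not clash with one starting at 10:00.

```python
from datetime import datetime, timedelta
from typing import Dict, List, Tuple

# attendee -> [[start_iso, end_iso, priority_int, summary_str], ...]
Calendar = Dict[str, List[list]]


def overlaps(a_start: datetime, a_end: datetime, b_start: datetime, b_end: datetime) -> bool:
    # Half-open intervals: touching endpoints are not a conflict
    return a_start < b_end and b_start < a_end


def find_conflicts(calendar: Calendar, attendees: List[str], start_iso: str,
                   duration_min: int) -> List[Tuple[str, str, int, str]]:
    """Return (attendee, start_iso, priority, summary) for each overlapping entry."""
    start = datetime.fromisoformat(start_iso)  # timezone-aware, e.g. "...+00:00"
    end = start + timedelta(minutes=duration_min)  # assumption: duration in minutes
    conflicts = []
    for attendee in attendees:
        for entry_start, entry_end, priority, summary in calendar.get(attendee, []):
            if overlaps(start, end, datetime.fromisoformat(entry_start),
                        datetime.fromisoformat(entry_end)):
                conflicts.append((attendee, entry_start, priority, summary))
    return conflicts


calendar: Calendar = {
    "user1": [["2025-04-07T09:00:00+00:00", "2025-04-07T10:00:00+00:00", 2, "Standup"]],
    "user2": [["2025-04-07T10:00:00+00:00", "2025-04-07T11:00:00+00:00", 1, "1:1"]],
}
```

Returning the priority with each conflict lets the caller decide whether a reschedule is worth attempting. Because the ISO strings are timezone-aware, entries written in different offsets still compare correctly.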
48
+
49
+ `models.py` — Pydantic models (`SchedulingAction`, `SchedulingObservation`, `SchedulingState`) imported by both server and client.
50
+
51
+ `client.py` — `SchedulingEnv` extends `EnvClient` for WebSocket-based interaction.
52
+
53
+ `inference.py` — Heuristic baseline (no LLM). Greedy free-slot search + lowest-priority rescheduling. Must emit `[START]`/`[STEP]`/`[END]` stdout format.
54
+
55
+ ### Reward Design
56
+
57
+ The final reward starts at 1.0 and subtracts several components (see `calculate_final_reward` in `scheduling_logic.py`):
58
+ - Preference penalty: penalty points for violating preferred hours (+50), exceeding max meetings/day (+30), and back-to-back meetings (+20)
59
+ - Rescheduling deduction: exponential penalty per meeting moved
60
+ - Time deduction: 0.015 per step taken
61
+
62
+ Step-level rewards: +0.5 (conflict-free proposal), +0.2 (reschedulable conflicts), -0.3 (non-reschedulable conflicts), -0.1/-0.2 (invalid actions).
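The step-level values and the per-step time deduction above fit in a small lookup. This is a sketch only. The outcome labels are hypothetical names rather than identifiers from the repo. Which invalid action earns -0.1 and which earns -0.2 is not specified here, so the penalty is left as a parameter.

```python
# Hypothetical outcome labels; values are the step rewards listed above
STEP_REWARDS = {
    "conflict_free": 0.5,
    "reschedulable_conflicts": 0.2,
    "non_reschedulable_conflicts": -0.3,
}
TIME_COST_PER_STEP = 0.015
MAX_STEPS = 20  # episodes time out after 20 steps


def step_reward(outcome: str, invalid_penalty: float = -0.1) -> float:
    """Reward for a single step; invalid actions cost -0.1 or -0.2 depending on the case."""
    if outcome == "invalid":
        return invalid_penalty
    return STEP_REWARDS[outcome]


def time_deduction(steps: int) -> float:
    """Time component subtracted from the 1.0 final reward."""
    steps = min(steps, MAX_STEPS)
    return round(TIME_COST_PER_STEP * steps, 4)
```

For example, a 4-step episode loses 0.06 to time, while an episode that runs to the 20-step timeout loses 0.3 of the 1.0 budget before any preference or rescheduling deductions.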
63
+
64
+ ### Tasks (3 difficulty levels)
65
+
66
+ JSON scenarios in `server/scenarios/`:
67
+ - **task1_easy** — 2 attendees, free slot exists, no rescheduling needed. Expected score: 0.8–1.0
68
+ - **task2_medium** — 3 attendees, requires 1 rescheduling. Expected score: 0.5–0.8
69
+ - **task3_hard** — 4 attendees, multiple overlapping conflicts, cascading rescheduling. Expected score: 0.2–0.6
70
+
71
+ ### Key Constraint: Meeting IDs
72
+
73
+ Format is `{attendee}_{start_iso}` (e.g., `user1_2025-04-07T09:00:00+00:00`). Used by `_find_meeting()` to look up calendar entries for rescheduling.
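Here is a sketch of building and resolving meeting IDs in the `{attendee}_{start_iso}` format. The helpers are hypothetical stand-ins, not the repo's `_find_meeting()`. ISO 8601 timestamps contain no underscores, so splitting on the last `_` keeps attendee names that contain underscores intact. The lookup is an exact string match, so `+00:00` and `Z` spellings of the same instant will not match.

```python
from typing import List, Optional, Tuple


def make_meeting_id(attendee: str, start_iso: str) -> str:
    return f"{attendee}_{start_iso}"


def parse_meeting_id(meeting_id: str) -> Tuple[str, str]:
    # Split on the LAST underscore so names like "team_lead" survive
    attendee, start_iso = meeting_id.rsplit("_", 1)
    return attendee, start_iso


def find_meeting(calendar: dict, meeting_id: str) -> Optional[List]:
    """Hypothetical lookup: return the calendar entry whose start string matches exactly."""
    attendee, start_iso = parse_meeting_id(meeting_id)
    for entry in calendar.get(attendee, []):
        if entry[0] == start_iso:  # "+00:00" vs "Z" spellings will NOT match
            return entry
    return None
```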
74
+
75
+ ## Hackathon Submission Requirements
76
+
77
+ - `openenv validate` must pass
78
+ - Dockerfile in root directory (not `/server`)
79
+ - `inference.py` in root, uses `[START]`/`[STEP]`/`[END]` stdout format
80
+ - 3+ tasks with graders that return scores in 0.0–1.0 and produce varied scores across tasks
81
+ - Runtime < 20 minutes on vcpu=2, memory=8GB
82
+ - Deploy via `openenv push` to HF Spaces
83
+
84
+ ## Environment Variables (for LLM-based inference)
85
+
86
+ Defined in `.env` (never commit):
87
+ ```
88
+ API_BASE_URL # HF Router endpoint (default: https://router.huggingface.co/v1)
89
+ MODEL_NAME # Model identifier (default: Qwen/Qwen2.5-72B-Instruct)
90
+ HF_TOKEN # Hugging Face API key
91
+ ```
Dockerfile ADDED
@@ -0,0 +1,81 @@
1
+ # Copyright (c) Meta Platforms, Inc. and affiliates.
2
+ # All rights reserved.
3
+ #
4
+ # This source code is licensed under the BSD-style license found in the
5
+ # LICENSE file in the root directory of this source tree.
6
+
7
+ # Multi-stage build using openenv-base
8
+ # This Dockerfile is flexible and works for both:
9
+ # - In-repo environments (with local OpenEnv sources)
10
+ # - Standalone environments (with openenv from PyPI/Git)
11
+ # The build script (openenv build) handles context detection and sets appropriate build args.
12
+
13
+ ARG BASE_IMAGE=ghcr.io/meta-pytorch/openenv-base:latest
14
+ FROM ${BASE_IMAGE} AS builder
15
+
16
+ WORKDIR /app
17
+
18
+ # Ensure git is available (required for installing dependencies from VCS)
19
+ RUN apt-get update && \
20
+ apt-get install -y --no-install-recommends git && \
21
+ rm -rf /var/lib/apt/lists/*
22
+
23
+ # Build argument to control whether we're building standalone or in-repo
24
+ ARG BUILD_MODE=in-repo
25
+ ARG ENV_NAME=scheduling_env
26
+
27
+ # Copy environment code (always at root of build context)
28
+ COPY . /app/env
29
+
30
+ # For in-repo builds, openenv is already vendored in the build context
31
+ # For standalone builds, openenv will be installed via pyproject.toml
32
+ WORKDIR /app/env
33
+
34
+ # Ensure uv is available (for local builds where base image lacks it)
35
+ RUN if ! command -v uv >/dev/null 2>&1; then \
36
+ curl -LsSf https://astral.sh/uv/install.sh | sh && \
37
+ mv /root/.local/bin/uv /usr/local/bin/uv && \
38
+ mv /root/.local/bin/uvx /usr/local/bin/uvx; \
39
+ fi
40
+
41
+ # Install dependencies using uv sync
42
+ # If uv.lock exists, use it; otherwise resolve on the fly
43
+ RUN --mount=type=cache,target=/root/.cache/uv \
44
+ if [ -f uv.lock ]; then \
45
+ uv sync --frozen --no-install-project --no-editable; \
46
+ else \
47
+ uv sync --no-install-project --no-editable; \
48
+ fi
49
+
50
+ RUN --mount=type=cache,target=/root/.cache/uv \
51
+ if [ -f uv.lock ]; then \
52
+ uv sync --frozen --no-editable; \
53
+ else \
54
+ uv sync --no-editable; \
55
+ fi
56
+
57
+ # Final runtime stage
58
+ FROM ${BASE_IMAGE}
59
+
60
+ WORKDIR /app
61
+
62
+ # Copy the virtual environment from builder
63
+ COPY --from=builder /app/env/.venv /app/.venv
64
+
65
+ # Copy the environment code
66
+ COPY --from=builder /app/env /app/env
67
+
68
+ # Set PATH to use the virtual environment
69
+ ENV PATH="/app/.venv/bin:$PATH"
70
+
71
+ # Set PYTHONPATH so imports work correctly
72
+ ENV PYTHONPATH="/app/env:$PYTHONPATH"
73
+
74
+ # Health check
75
+ HEALTHCHECK --interval=30s --timeout=3s --start-period=5s --retries=3 \
76
+ CMD curl -f http://localhost:8000/health || exit 1
77
+
78
+ # Run the FastAPI server
79
+ # The module path is constructed to work with the /app/env structure
80
+ ENV ENABLE_WEB_INTERFACE=true
81
+ CMD ["sh", "-c", "cd /app/env && uvicorn server.app:app --host 0.0.0.0 --port 8000"]
README.md CHANGED
@@ -1,10 +1,255 @@
1
  ---
2
- title: Scheduling Env
3
- emoji: 📉
4
  colorFrom: blue
5
  colorTo: pink
6
  sdk: docker
7
  pinned: false
8
  ---
9
 
10
- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
1
  ---
2
+ title: Scheduling Env Environment Server
3
+ emoji: 🏏
4
  colorFrom: blue
5
  colorTo: pink
6
  sdk: docker
7
  pinned: false
8
+ app_port: 8000
9
+ base_path: /web
10
+ tags:
11
+ - openenv
12
  ---
13
 
14
+ # Scheduling Env Environment
15
+
16
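+ An OpenEnv RL environment in which an AI agent schedules a meeting across multiple attendees by proposing time slots, rescheduling lower-priority conflicts, and balancing participant preferences. Three tasks (`task1_easy`, `task2_medium`, `task3_hard`) increase in difficulty.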
17
+
18
+ ## Quick Start
19
+
20
+ The simplest way to use the Scheduling Env environment is through the `SchedulingEnv` class:
21
+
22
+ ```python
23
+ from scheduling_env import SchedulingAction, SchedulingEnv
24
+
25
+ # Create the environment from the Docker image before entering try, so that
26
+ # close() in finally always refers to a client that was actually created
27
+ env = SchedulingEnv.from_docker_image("scheduling_env-env:latest")
28
+ try:
29
+     # Reset: start an episode of the easy task
30
+     result = env.reset(task_id="task1_easy")
31
+     print(f"Reset: done={result.done}")
32
+
33
+     # Propose a few candidate start times
34
+     starts = ["2025-04-07T09:00:00+00:00", "2025-04-07T10:00:00+00:00"]
35
+
36
+     for start in starts:
37
+         result = env.step(SchedulingAction(
38
+             action_type="propose_slot", proposed_start=start, proposed_duration=30,
39
+         ))
40
+         print(f"Proposed: {start}")
41
+         print(f"  → Reward: {result.reward}")
42
+         print(f"  → Done: {result.done}")
43
+ finally:
44
+     # Always clean up
45
+     env.close()
46
+ ```
47
+
48
+ That's it! The `SchedulingEnv.from_docker_image()` method handles:
49
+ - Starting the Docker container
50
+ - Waiting for the server to be ready
51
+ - Connecting to the environment
52
+ - Container cleanup when you call `close()`
53
+
54
+ ## Building the Docker Image
55
+
56
+ Before using the environment, you need to build the Docker image:
57
+
58
+ ```bash
59
+ # From project root
60
+ docker build -t scheduling_env-env:latest .
61
+ ```
62
+
63
+ ## Deploying to Hugging Face Spaces
64
+
65
+ You can easily deploy your OpenEnv environment to Hugging Face Spaces using the `openenv push` command:
66
+
67
+ ```bash
68
+ # From the environment directory (where openenv.yaml is located)
69
+ openenv push
70
+
71
+ # Or specify options
72
+ openenv push --namespace my-org --private
73
+ ```
74
+
75
+ The `openenv push` command will:
76
+ 1. Validate that the directory is an OpenEnv environment (checks for `openenv.yaml`)
77
+ 2. Prepare a custom build for Hugging Face Docker space (enables web interface)
78
+ 3. Upload to Hugging Face (ensuring you're logged in)
79
+
80
+ ### Prerequisites
81
+
82
+ - Authenticate with Hugging Face: The command will prompt for login if not already authenticated
83
+
84
+ ### Options
85
+
86
+ - `--directory`, `-d`: Directory containing the OpenEnv environment (defaults to current directory)
87
+ - `--repo-id`, `-r`: Repository ID in format 'username/repo-name' (defaults to 'username/env-name' from openenv.yaml)
88
+ - `--base-image`, `-b`: Base Docker image to use (overrides Dockerfile FROM)
89
+ - `--private`: Deploy the space as private (default: public)
90
+
91
+ ### Examples
92
+
93
+ ```bash
94
+ # Push to your personal namespace (defaults to username/env-name from openenv.yaml)
95
+ openenv push
96
+
97
+ # Push to a specific repository
98
+ openenv push --repo-id my-org/my-env
99
+
100
+ # Push with a custom base image
101
+ openenv push --base-image ghcr.io/meta-pytorch/openenv-base:latest
102
+
103
+ # Push as a private space
104
+ openenv push --private
105
+
106
+ # Combine options
107
+ openenv push --repo-id my-org/my-env --base-image custom-base:latest --private
108
+ ```
109
+
110
+ After deployment, your space will be available at:
111
+ `https://huggingface.co/spaces/<repo-id>`
112
+
113
+ The deployed space includes:
114
+ - **Web Interface** at `/web` - Interactive UI for exploring the environment
115
+ - **API Documentation** at `/docs` - Full OpenAPI/Swagger interface
116
+ - **Health Check** at `/health` - Container health monitoring
117
+ - **WebSocket** at `/ws` - Persistent session endpoint for low-latency interactions
118
+
119
+ ## Environment Details
120
+
121
+ ### Action
122
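+ **SchedulingAction**: the `action_type` field selects one of four actions
123
+ - `propose_slot` (with `proposed_start`, a timezone-aware ISO 8601 timestamp, and `proposed_duration`), `reschedule_meeting`, `finalize`, or `reject`
124
+
125
+ ### Observation
126
+ **SchedulingObservation**: scheduling state plus step feedback (see `models.py` for all fields)
127
+ - `reward` (float) - Reward for the last action
128
+ - `done` (bool) - True after `finalize`, `reject`, or the 20-step timeout
129
+ - `metadata` (dict) - Additional info such as step count
130
+
131
+ ### Reward
132
+ Each step returns a shaped reward:
133
+ - +0.5 for a conflict-free proposal; +0.2 for a proposal whose conflicts can be rescheduled
134
+ - -0.3 for conflicts that cannot be rescheduled; -0.1 or -0.2 for invalid actions
135
+
136
+ The final score starts at 1.0 and subtracts preference penalties, an exponential penalty per rescheduled meeting,
137
+ and 0.015 per step taken (see `calculate_final_reward` in `server/scheduling_logic.py`).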
138
+
139
+ ## Advanced Usage
140
+
141
+ ### Connecting to an Existing Server
142
+
143
+ If you already have a Scheduling Env environment server running, you can connect directly:
144
+
145
+ ```python
146
+ from scheduling_env import SchedulingAction, SchedulingEnv
147
+
148
+ # Connect to existing server
149
+ env = SchedulingEnv(base_url="<ENV_HTTP_URL_HERE>")
150
+
151
+ # Use as normal
152
+ result = env.reset(task_id="task1_easy")
153
+ result = env.step(SchedulingAction(action_type="propose_slot", proposed_start="2025-04-07T10:00:00+00:00", proposed_duration=30))
154
+ ```
155
+
156
+ Note: When connecting to an existing server, `env.close()` will NOT stop the server.
157
+
158
+ ### Using the Context Manager
159
+
160
+ The client supports context manager usage for automatic connection management:
161
+
162
+ ```python
163
+ from scheduling_env import SchedulingAction, SchedulingEnv
164
+
165
+ # Connect with context manager (auto-connects and closes)
166
+ with SchedulingEnv(base_url="http://localhost:8000") as env:
167
+     result = env.reset(task_id="task1_easy")
168
+     print(f"Reset: done={result.done}")
169
+     # Multiple steps over the same low-latency session
170
+     for start in ["2025-04-07T10:00:00+00:00", "2025-04-07T14:00:00+00:00"]:
171
+         result = env.step(SchedulingAction(action_type="propose_slot", proposed_start=start, proposed_duration=30))
172
+         print(f"Proposed {start}: reward={result.reward}")
173
+ ```
174
+
175
+ The client uses WebSocket connections for:
176
+ - **Lower latency**: No HTTP connection overhead per request
177
+ - **Persistent session**: Server maintains your environment state
178
+ - **Efficient for episodes**: Better for many sequential steps
179
+
180
+ ### Concurrent WebSocket Sessions
181
+
182
+ By default, `server/app.py` wraps a single persistent `SchedulingEnvironment` instance. To serve multiple
183
+ concurrent WebSocket sessions, switch `server/app.py` to factory mode:
184
+
185
+ ```python
186
+ # In server/app.py - use factory mode for concurrent sessions
187
+ app = create_app(
188
+ SchedulingEnvironment, # Pass class, not instance
189
+ SchedulingAction,
190
+ SchedulingObservation,
191
+ max_concurrent_envs=4, # Allow 4 concurrent sessions
192
+ )
193
+ ```
194
+
195
+ Then multiple clients can connect simultaneously:
196
+
197
+ ```python
198
+ from concurrent.futures import ThreadPoolExecutor
199
+ from scheduling_env import SchedulingAction, SchedulingEnv
200
+
201
+ def run_episode(client_id: int):
202
+     with SchedulingEnv(base_url="http://localhost:8000") as env:
203
+         env.reset(task_id="task1_easy")
204
+         result = env.step(SchedulingAction(
205
+             action_type="propose_slot", proposed_start="2025-04-07T10:00:00+00:00", proposed_duration=30))
206
+         return client_id, result.reward
207
+
208
+ # Run 4 episodes concurrently (needs max_concurrent_envs >= 4 on the server)
209
+ with ThreadPoolExecutor(max_workers=4) as executor:
210
+     results = list(executor.map(run_episode, range(4)))
211
+ ```
212
+
213
+ ## Development & Testing
214
+
215
+ ### Direct Environment Testing
216
+
217
+ Test the environment logic directly without starting the HTTP server:
218
+
219
+ ```bash
220
+ # From the project root
221
+ python3 server/scheduling_env_environment.py
222
+ ```
223
+
224
+ This verifies that:
225
+ - Environment resets correctly
226
+ - Step executes actions properly
227
+ - State tracking works
228
+ - Rewards are calculated correctly
229
+
230
+ ### Running Locally
231
+
232
+ Run the server locally for development:
233
+
234
+ ```bash
235
+ uvicorn server.app:app --reload
236
+ ```
237
+
238
+ ## Project Structure
239
+
240
+ ```
241
+ scheduling_env/
242
+ ├── .dockerignore # Docker build exclusions
243
+ ├── Dockerfile # Container image definition (must be in root)
244
+ ├── __init__.py # Module exports
245
+ ├── README.md # This file
246
+ ├── openenv.yaml # OpenEnv manifest
247
+ ├── pyproject.toml # Project metadata and dependencies
248
+ ├── uv.lock # Locked dependencies (generated)
249
+ ├── client.py # SchedulingEnv client
250
+ ├── models.py # Action, Observation and State models
251
+ └── server/
252
+     ├── __init__.py # Server module exports
253
+     ├── scheduling_env_environment.py # Core environment logic
254
+     └── app.py # FastAPI application (HTTP + WebSocket endpoints)
255
+ ```
__init__.py ADDED
@@ -0,0 +1,17 @@
1
+ # Copyright (c) Meta Platforms, Inc. and affiliates.
2
+ # All rights reserved.
3
+ #
4
+ # This source code is licensed under the BSD-style license found in the
5
+ # LICENSE file in the root directory of this source tree.
6
+
7
+ """Scheduling Env Environment."""
8
+
9
+ from .client import SchedulingEnv
10
+ from .models import SchedulingAction, SchedulingObservation, SchedulingState
11
+
12
+ __all__ = [
13
+ "SchedulingAction",
14
+ "SchedulingObservation",
15
+ "SchedulingState",
16
+ "SchedulingEnv",
17
+ ]
client.py ADDED
@@ -0,0 +1,51 @@
1
+ # Copyright (c) Meta Platforms, Inc. and affiliates.
2
+ # All rights reserved.
3
+ #
4
+ # This source code is licensed under the BSD-style license found in the
5
+ # LICENSE file in the root directory of this source tree.
6
+
7
+ """Scheduling Environment Client."""
8
+
9
+ from __future__ import annotations
10
+
11
+ from typing import Dict
12
+
13
+ from openenv.core import EnvClient
14
+ from openenv.core.client_types import StepResult
15
+
16
+ from .models import SchedulingAction, SchedulingObservation, SchedulingState
17
+
18
+
19
+ class SchedulingEnv(
20
+     EnvClient[SchedulingAction, SchedulingObservation, SchedulingState]
21
+ ):
22
+     """Client for the Meeting Scheduling RL Environment.
23
+
24
+     Maintains a persistent WebSocket connection to the environment server.
25
+
26
+     Example::
27
+
28
+         with SchedulingEnv(base_url="http://localhost:8000") as client:
29
+             result = client.reset(task_id="task1_easy")
30
+             obs = result.observation
31
+             result = client.step(SchedulingAction(
32
+                 action_type="propose_slot",
33
+                 proposed_start="2025-04-07T10:00:00+00:00",
34
+                 proposed_duration=30,
35
+             ))
36
+     """
37
+
38
+     def _step_payload(self, action: SchedulingAction) -> Dict:
39
+         return action.model_dump(exclude_none=True)
40
+
41
+     def _parse_result(self, payload: Dict) -> StepResult[SchedulingObservation]:
42
+         obs_data = payload.get("observation", payload)
43
+         observation = SchedulingObservation(**obs_data)
44
+         return StepResult(
45
+             observation=observation,
46
+             reward=observation.reward,
47
+             done=observation.done,
48
+         )
49
+
50
+     def _parse_state(self, payload: Dict) -> SchedulingState:
51
+         return SchedulingState(**payload)
docs/2026-04-07-153534-local-command-caveatcaveat-the-messages-below.txt ADDED
The diff for this file is too large to render. See raw diff
 
docs/ENV_LEARNINGS.md ADDED
@@ -0,0 +1,368 @@
1
+ # OpenEnv Environment Research - Key Learnings
2
+
3
+ Research conducted on 5 top OpenEnv environments to inform hackathon project development.
4
+
5
+ ## Executive Summary
6
+
7
+ | Environment | Domain | Key Strength | Best For Learning |
8
+ |-------------|--------|--------------|-------------------|
9
+ | **calendar_env** | Calendar Management | Generic MCP wrapper architecture | Multi-tenant systems, database-backed tasks |
10
+ | **reasoning_gym_env** | Reasoning Tasks | Minimal, single-step episodes | Simple task structures, dataset integration |
11
+ | **tbench2_env** | Terminal/Tool Use | Dual execution modes (local/docker) | Tool benchmarking, session management |
12
+ | **carla_env** | Autonomous Driving | Scenario-based design | Complex simulations, ethical dilemmas |
13
+ | **repl_env** | Code Execution | Recursive LLM architecture | Interactive environments, reward shaping |
14
+
15
+ ---
16
+
17
+ ## 1. Calendar Environment (calendar_env)
18
+
19
+ ### Architecture Highlights
20
+ - **Generic MCP Wrapper**: Fully reusable `openenv_wrapper/` for any MCP server
21
+ - **Multi-Tenancy**: SQLite per agent via `x-database-id` header
22
+ - **Rich Database Schema**: Google Calendar API v3 compliant models
23
+
24
+ ### Action/Observation Pattern
25
+ ```python
26
+ # Action
27
+ class MCPAction(Action):
28
+     action_type: Literal["ListToolsAction", "ToolCallAction"]
29
+     tool_name: Optional[str]
30
+     arguments: Optional[Dict]
31
+
32
+ # Observation
33
+ class MCPObservation(Observation):
34
+     success: bool
35
+     error_message: Optional[str]
36
+     tool_result: Optional[Dict]
37
+     reward: Optional[float]
38
+     done: bool
39
+ ```
40
+
41
+ ### Task Definition Pattern
42
+ - **JSON Scenarios**: Version-controlled task definitions
43
+ - **SQL Verifiers**: Programmatic graders checking database state
44
+ - **3 Verifier Types**: database_state, response_check, tool_execution
45
+
46
+ ### Reward Design
47
+ - Sparse binary rewards: +1.0 (success), -0.5 (error)
48
+ - ListToolsAction: +0.1 (discovery reward)
49
+ - Status code based with metadata for flexibility
50
+
51
+ ### Worth Copying
52
+ 1. **Generic wrapper architecture** - Copy `openenv_wrapper/` for new MCPs
53
+ 2. **Session manager pattern** - Multi-tenant database isolation
54
+ 3. **Verifier-driven tasks** - No code changes for new tasks
55
+ 4. **Config-driven tool discovery** - Dynamic tool handlers via importlib
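The session-manager pattern (one isolated SQLite database per agent, keyed by the `x-database-id` header) can be sketched as follows. The class and the `events` table are hypothetical, not calendar_env's code. In-memory databases stand in for the per-agent files a real server would keep.

```python
import sqlite3
from typing import Dict, Mapping


class SessionManager:
    """Hypothetical sketch: one isolated SQLite database per x-database-id value."""

    def __init__(self) -> None:
        self._dbs: Dict[str, sqlite3.Connection] = {}

    def connection_for(self, headers: Mapping[str, str]) -> sqlite3.Connection:
        # HTTP header names are case-insensitive, so normalize before lookup
        lowered = {k.lower(): v for k, v in headers.items()}
        db_id = lowered.get("x-database-id")
        if not db_id:
            raise KeyError("missing x-database-id header")
        if db_id not in self._dbs:
            conn = sqlite3.connect(":memory:")  # a real server would use a file per tenant
            conn.execute("CREATE TABLE events (summary TEXT)")  # hypothetical schema
            self._dbs[db_id] = conn
        return self._dbs[db_id]
```

Writes made through one tenant's connection are invisible to every other tenant. That isolation is what allows several agents to be evaluated against the same server at once.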
56
+
57
+ ---
58
+
59
+ ## 2. Reasoning Gym Environment (reasoning_gym_env)
60
+
61
+ ### Architecture Highlights
62
+ - **Minimal footprint**: ~200 lines core logic
63
+ - **Single-step episodes**: reset() → step() → done
64
+ - **Dataset persistence**: Reuse datasets across resets
65
+
66
+ ### Action/Observation Pattern
67
+ ```python
68
+ # Action
69
+ class ReasoningGymAction(Action):
70
+     answer: str  # Agent's answer
71
+
72
+ # Observation
73
+ class ReasoningGymObservation(Observation):
74
+     question: Optional[str]  # Only in reset()
75
+     score: Optional[float]  # Only after step()
76
+     correct_answer: Optional[str]
77
+     done: bool
78
+ ```
79
+
80
+ ### Task Definition Pattern
81
+ - **External library**: `reasoning_gym` handles generation + scoring
82
+ - **Simple datasets**: Single task type (leg_counting, reverse_sort, etc.)
83
+ - **Composite datasets**: Mix multiple tasks with weights
84
+
85
+ ### Reward Design
86
+ - **Binary/partial**: Depends on dataset scoring function
87
+ - **Terminal only**: reward=0.0 on reset, actual score after step()
88
+ - **Single-step**: No trajectory rewards
89
+
90
+ ### Worth Copying
91
+ 1. **Iterator pattern** - Seamless dataset cycling with StopIteration handling
92
+ 2. **Parameter idempotency** - reset() continues, reset(seed=...) restarts
93
+ 3. **Dataset caching** - Compare config to avoid rebuilding
94
+ 4. **Minimal state** - Just episode_id and step_count
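The iterator, idempotency, and caching points above can be combined in one small class. This is a hypothetical sketch, not reasoning_gym_env's code. Its rules are: a changed config rebuilds the dataset; a `seed` with the same config only restarts iteration (it does not shuffle); a plain `reset()` continues; and exhaustion wraps around instead of raising `StopIteration`.

```python
from typing import Any, Dict, Iterator, List, Optional


class CyclingDataset:
    """Hypothetical helper: rebuild only when the config changes, cycle forever."""

    def __init__(self) -> None:
        self._config: Optional[Dict[str, Any]] = None
        self._items: List[str] = []
        self._it: Iterator[str] = iter(())
        self.builds = 0  # how many times the dataset was (re)built

    def _build(self, config: Dict[str, Any]) -> List[str]:
        # Stand-in for an expensive dataset generator
        self.builds += 1
        return [f"{config['task']}-{i}" for i in range(config["size"])]

    def reset(self, config: Dict[str, Any], seed: Optional[int] = None) -> str:
        if config != self._config:
            # New config: rebuild and start from the top
            self._config = dict(config)
            self._items = self._build(config)
            self._it = iter(self._items)
        elif seed is not None:
            # Same config plus explicit seed: restart without rebuilding
            self._it = iter(self._items)
        try:
            return next(self._it)
        except StopIteration:
            # Exhausted: wrap around instead of surfacing StopIteration
            self._it = iter(self._items)
            return next(self._it)
```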
95
+
96
+ ---
97
+
98
+ ## 3. TB2 Environment (tbench2_env)
99
+
100
+ ### Architecture Highlights
101
+ - **Dual execution modes**: Local (CAMEL toolkit) vs Docker (TB2 fidelity)
102
+ - **Session management**: Streaming process support via session_id
103
+ - **Task auto-discovery**: Download from GitHub + cache locally
104
+
105
+ ### Action/Observation Pattern
106
+ ```python
107
+ # Action
108
+ class Tbench2Action(Action):
109
+     action_type: str  # exec, write, view, wait, kill, evaluate, etc.
110
+     command: str
111
+     session_id: Optional[str]
112
+     block: bool = True
113
+
114
+ # Observation
115
+ class Tbench2Observation(Observation):
116
+     instruction: str
117
+     output: str
118
+     success: bool
119
+     error: str
120
+     reward: Optional[float]  # Only on evaluate
121
+     done: bool  # Only on evaluate
122
+ ```
123
+
124
+ ### Task Definition Pattern
125
+ - **TOML-based**: `task.toml` with environment + verifier config
126
+ - **Pytest graders**: Each task has tests/ directory
127
+ - **External benchmark**: Terminal-Bench 2 suite
128
+
129
+ ### Reward Design
130
+ - **Binary**: 1.0 if all pytest tests pass, 0.0 otherwise
131
+ - **Terminal only**: reward=None until evaluate action
132
+ - **Exit code parsing**: `__TB2_EXIT_CODE__:$?` marker pattern
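The exit-code marker pattern is simple to reproduce. In this sketch, a hypothetical helper appends `echo __TB2_EXIT_CODE__:$?` to a shell command. The trailing marker is then split off the captured output to recover the command's output and its exit status. The helper names are illustrative, not tbench2_env's implementation.

```python
import re
from typing import Optional, Tuple

EXIT_MARKER = "__TB2_EXIT_CODE__"
# Marker must be the last thing in the output (optionally followed by whitespace)
_MARKER_RE = re.compile(rf"{re.escape(EXIT_MARKER)}:(\d+)\s*$")


def wrap_command(cmd: str) -> str:
    # $? expands to the exit status of cmd when the shell runs the echo
    return f"{cmd}; echo {EXIT_MARKER}:$?"


def split_output(raw: str) -> Tuple[str, Optional[int]]:
    """Return (output without marker, exit code), or (raw, None) if no marker is found."""
    match = _MARKER_RE.search(raw)
    if match is None:
        return raw, None
    return raw[: match.start()].rstrip("\n"), int(match.group(1))
```

A missing marker usually means the command is still running or was killed, so returning `None` rather than guessing a code keeps that case visible to the caller.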
133
+
134
+ ### Worth Copying
135
+ 1. **Dual mode pattern** - Local + Docker execution with env var switching
136
+ 2. **Lazy dependency loading** - Import errors surface only when used
137
+ 3. **Docker-in-Docker safe** - Tar streaming instead of bind mounts
138
+ 4. **Session isolation** - Unique working directories per episode_id
139
+ 5. **Metadata-driven discovery** - Tasks self-describe requirements
140
+
141
+ ---
142
+
143
+ ## 4. CARLA Environment (carla_env)
144
+
145
+ ### Architecture Highlights
146
+ - **Scenario system**: BaseScenario ABC with composable tasks
147
+ - **Rubric factory**: Auto-select reward function by scenario type
148
+ - **Mock mode**: Test without GPU/CARLA
149
+ - **GPU-accelerated**: T4 16GB minimum for real mode
150
+
151
+ ### Action/Observation Pattern
152
+ ```python
153
+ # Action
154
+ class CarlaAction(Action):
155
+     action_type: str  # observe, control, navigate, capture_image, etc.
156
+     throttle: Optional[float]  # [0, 1] with Pydantic validation
157
+     steer: Optional[float]  # [-1, 1]
158
+     brake: Optional[float]  # [0, 1]
159
+
160
+ # Observation
161
+ class CarlaObservation(Observation):
162
+     scene_description: str
163
+     vehicle_state: Dict  # speed, location, rotation
164
+     collision_detected: bool
165
+     nearby_actors: List[Dict]
166
+     camera_images: Optional[Dict]
167
+     rubric_reward: float
168
+ ```
169
+
170
+ ### Task Definition Pattern
171
+ - **9 Trolley scenarios**: Ethical dilemmas with expected outcomes
172
+ - **Navigation tasks**: Maze (goal-directed), Free-roam (open-world)
173
+ - **JSON externalized**: Benchmark definitions separate from code
174
+
175
+ ### Reward Design
176
+ - **Trajectory-based (Trolley)**: r_t = 0.0 until terminal, then gamma-discounted final
177
+ - **Step-level (Navigation)**: Progress + arrival bonus - collision penalty - time cost
178
+ - **Scenario-specific**: compute_outcome() owns scoring logic
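One common reading of the Trolley reward ("0.0 until terminal, then gamma-discounted final") is the standard discounted return over a sparse terminal reward. That return gives each step the value G_t = gamma^(T-1-t) * R. The sketch below computes it. It illustrates that reading and is not carla_env's rubric.

```python
from typing import List


def sparse_terminal_returns(num_steps: int, final_reward: float,
                            gamma: float = 0.99) -> List[float]:
    """Discounted return G_t when the only nonzero reward arrives at the last step."""
    if num_steps < 1:
        raise ValueError("episode needs at least one step")
    rewards = [0.0] * (num_steps - 1) + [final_reward]  # r_t = 0.0 until terminal
    returns: List[float] = []
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g  # G_t = r_t + gamma * G_{t+1}
        returns.append(g)
    return returns[::-1]
```

With `gamma=0.5` and three steps ending in reward 1.0, the returns are `[0.25, 0.5, 1.0]`. Earlier decisions receive less credit for the same outcome.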
179
+
180
+ ### Worth Copying
181
+ 1. **Scenario ABC** - Each task owns physics + scoring independently
182
+ 2. **Rubric factory** - Auto-select reward function by task type
183
+ 3. **Dual mode** - Mock for testing, real for evaluation
184
+ 4. **Layered config** - Common + scenario-specific fields
185
+ 5. **JSON externalization** - Decouple task data from code
186
+
187
+ ---
188
+
189
+ ## 5. REPL Environment (repl_env)
190
+
191
+ ### Architecture Highlights
192
+ - **Layered design**: Environment → Runner → Backend separation
193
+ - **Recursive LLM**: Depth-limited child spawning with RLM pattern
194
+ - **Composable rubrics**: Outcome + process rewards
195
+ - **Thread-safe batching**: Multiple concurrent child queries
196
+
197
+ ### Action/Observation Pattern
198
+ ```python
199
+ # Action
200
+ class REPLAction(Action):
201
+     code: str
202
+     is_final: bool = False
203
+     final_answer: Optional[str] = None
204
+
205
+ # Observation
206
+ class REPLObservation(Observation):
207
+     result: CodeBlockResult  # stdout, stderr, locals_snapshot
208
+     available_variables: List[str]
209
+     iteration: int
210
+     done: bool
211
+     reward: float
212
+ ```
213
+
214
+ ### Task Definition Pattern
215
+ - **Rubric-driven**: Ground truth passed at reset()
216
+ - **Multiple finalization patterns**: FINAL(), FINAL_VAR(), dict with ready flag
217
+ - **External graders**: CustomMetricRubric for user-provided scoring
218
+
219
+ ### Reward Design
220
+ - **Composable**: REPLRubric = outcome + process
221
+ - **Outcome (terminal)**: ExactMatch, FuzzyMatch, or CustomMetric
222
+ - **Process (per-step)**: +success_reward, -error_penalty
223
+ - **Failure**: -failure_reward if max_iterations without answer
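The outcome-plus-process split can be expressed as one small class. This is a hypothetical sketch, not repl_env's `REPLRubric` API. It uses exact string match for the outcome and placeholder values for `success_reward`, `error_penalty`, and `failure_reward`.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ComposableRubric:
    success_reward: float = 0.1   # per successful step (placeholder value)
    error_penalty: float = 0.05   # per failed step (placeholder value)
    failure_reward: float = 1.0   # subtracted when no answer is given (placeholder)

    def process(self, step_ok: List[bool]) -> float:
        # Per-step shaping: +success_reward or -error_penalty for each executed block
        return sum(self.success_reward if ok else -self.error_penalty for ok in step_ok)

    def outcome(self, answer: Optional[str], truth: str) -> float:
        # Terminal scoring: exact match, or a failure penalty if max_iterations ran out
        if answer is None:
            return -self.failure_reward
        return 1.0 if answer.strip() == truth.strip() else 0.0

    def total(self, step_ok: List[bool], answer: Optional[str], truth: str) -> float:
        return self.outcome(answer, truth) + self.process(step_ok)
```

Keeping the two parts separate means the exact-match outcome can be swapped for a fuzzy or custom metric without touching the per-step shaping.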
224
+
225
+ ### Worth Copying
226
+ 1. **Composable rubrics** - outcome + process separation
227
+ 2. **Recursive backend** - Protocol-based with depth limits
228
+ 3. **Message-based loop** - Explicit iteration with timeout checks
229
+ 4. **Variable snapshots** - Serialize namespace state
230
+ 5. **Dual API** - Sync + async with same models
231
+ 6. **Cooperative timeout** - perf_counter() checks, not interrupts
232
+ 7. **Injected helpers** - llm_query, rlm_query available in namespace
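Cooperative timeouts check the clock between units of work instead of interrupting a running one. This is a minimal sketch of the idea with `time.perf_counter()`. The function is hypothetical. The step already in progress always finishes, so the budget can be overshot by at most one step's duration.

```python
import time
from typing import Callable, List


def run_with_budget(steps: List[Callable[[], object]], budget_s: float) -> List[object]:
    """Run steps in order, stopping before the next one once the time budget is spent."""
    start = time.perf_counter()
    results: List[object] = []
    for step in steps:
        # Checked only between steps; a running step is never interrupted
        if time.perf_counter() - start > budget_s:
            break
        results.append(step())
    return results
```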
233
+
234
+ ---
235
+
236
+ ## Cross-Cutting Patterns
237
+
238
+ ### 1. Pydantic Models Everywhere
239
+ All environments use Pydantic BaseModel for:
240
+ - Type safety + validation
241
+ - JSON serialization
242
+ - OpenAPI schema generation
243
+ - Field descriptions for documentation
244
+
245
+ ### 2. FastAPI App Factory
246
+ ```python
247
+ from openenv.core.env_server.http_server import create_app
248
+
249
+ app = create_app(
250
+ MyEnvironment,
251
+ MyAction,
252
+ MyObservation,
253
+ env_name="my_env",
254
+ max_concurrent_envs=1,
255
+ )
256
+ ```
257
+
258
+ ### 3. Client-Server Separation
259
+ - Server: Implements Environment[Action, Observation, State]
260
+ - Client: EnvClient[Action, Observation, State] wraps HTTP/WebSocket
261
+ - Local variants for in-process testing
262
+
263
+ ### 4. Episode State Management
264
+ ```python
265
+ class State(BaseModel):
266
+     episode_id: str  # UUID per episode
267
+     step_count: int  # Actions taken
268
+     # Environment-specific metrics
269
+ ```
270
+
271
+ ### 5. Metadata for Flexibility
272
+ - Actions have optional `metadata: Dict[str, Any]`
273
+ - Observations include `metadata` for extra context
274
+ - Enables custom reward signals without model changes
275
+
276
+ ### 6. Docker + openenv.yaml
277
+ ```yaml
278
+ spec_version: 1
279
+ name: my_env
280
+ type: space
281
+ runtime: fastapi
282
+ app: server.app:app
283
+ port: 8000
284
+ ```
285
+
286
+ ### 7. Concurrent Sessions Support
287
+ ```python
288
+ class MyEnvironment(Environment):
289
+     SUPPORTS_CONCURRENT_SESSIONS: bool = True
290
+ ```
291
+
292
+ ---
293
+
294
+ ## Recommendations for Hackathon Project
295
+
296
+ ### Use calendar_env approach if:
297
+ - Building database-backed environment (customer support, data cleaning)
298
+ - Need multi-agent evaluation isolation
299
+ - Want reusable wrapper for other MCPs
300
+
301
+ ### Use reasoning_gym_env approach if:
302
+ - Simple single-step tasks (email triage, classification)
303
+ - Dataset-based evaluation
304
+ - Minimal code complexity desired
305
+
306
+ ### Use tbench2_env approach if:
307
+ - Tool use benchmarking (API integration, CLI tools)
308
+ - Need Docker isolation
309
+ - Session-based interaction required
310
+
311
+ ### Use carla_env approach if:
312
+ - Complex simulation with physics
313
+ - Scenario-based curriculum learning
314
+ - Trajectory-based rewards
315
+
316
+ ### Use repl_env approach if:
317
+ - Code execution environment
318
+ - Recursive reasoning needed
319
+ - Composable reward functions
320
+
321
+ ---
322
+
323
+ ## Quick Start Checklist
324
+
325
+ For your hackathon environment, ensure:
326
+
327
+ - [ ] **3+ tasks with graders** returning scores 0.0-1.0
328
+ - [ ] **Pydantic models** for Action, Observation, State
329
+ - [ ] **openenv.yaml** with correct metadata
330
+ - [ ] **inference.py** in root (OpenAI client pointed at the HF Router, no OpenAI API key)
331
+ - [ ] **STDOUT logging** with [START], [STEP], [END] format
332
+ - [ ] **Dockerfile** in root directory (not /server)
333
+ - [ ] **Meaningful rewards** that distinguish performance levels
334
+ - [ ] **Real-world task** with genuine value
335
+ - [ ] **< 20 min runtime** on vcpu=2, memory=8GB
336
+ - [ ] **Passes `openenv validate`**
337
+
338
+ ---
339
+
340
+ ## Key Files to Reference
341
+
342
+ ### For Implementation Patterns:
343
+ - `calendar_env/server/openenv_wrapper/mcp_env_environment.py` - Generic wrapper
344
+ - `reasoning_gym_env/server/reasoning_gym_environment.py` - Minimal implementation
345
+ - `tbench2_env/server/tbench2_env_environment.py` - Session management
346
+ - `carla_env/server/benchmark_scenarios/base.py` - Scenario ABC
347
+ - `repl_env/rubrics.py` - Composable reward design
348
+
349
+ ### For Client Usage:
350
+ - `*/client.py` - All environments have reference implementations
351
+ - `repl_env/runner.py` - Message-based orchestration loop
352
+
353
+ ### For Server Setup:
354
+ - `*/server/app.py` - FastAPI app factory usage
355
+ - `*/openenv.yaml` - Configuration examples
356
+ - `*/Dockerfile` - Docker image patterns
357
+
358
+ ---
359
+
360
+ ## Next Steps
361
+
362
+ 1. **Choose architecture**: Pick closest reference environment to your task
363
+ 2. **Copy skeleton**: Use `openenv init` or copy from reference
364
+ 3. **Define models**: Start with Action/Observation Pydantic models
365
+ 4. **Implement graders**: 3 tasks with programmatic scoring
366
+ 5. **Test locally**: Use client.py pattern for rapid iteration
367
+ 6. **Validate**: Run `openenv validate` before deployment
368
+ 7. **Deploy**: `openenv push` to Hugging Face Spaces
docs/HACKATHON_META.md ADDED
@@ -0,0 +1,324 @@
1
+ # Meta OpenEnv Hackathon - Round 1
2
+
3
+ ## Overview
4
+
5
+ Build a complete, real-world OpenEnv environment that an AI agent can learn from through the standard `step()` / `reset()` / `state()` API.
6
+
7
+ ## Task Requirements
8
+
9
+ ### Must-Have Features
10
+
11
+ 1. **Real-world Task Simulation**
12
+ - Must simulate tasks humans actually do
13
+ - Not games or toys
14
+ - Examples: email triage, code review, data cleaning, scheduling, customer support, content moderation
15
+
16
+ 2. **OpenEnv Spec Compliance**
17
+ - Typed Observation, Action, and Reward Pydantic models
18
+ - `step(action)` → returns observation, reward, done, info
19
+ - `reset()` → returns initial observation
20
+ - `state()` → returns current state
21
+ - `openenv.yaml` with metadata
22
+ - Must pass `openenv validate`
23
+
24
+ 3. **Minimum 3 Tasks with Agent Graders**
25
+ - Each task defines a concrete objective
26
+ - Programmatic grader scoring (0.0–1.0)
27
+ - Difficulty range: easy → medium → hard
28
+ - Clear, deterministic success/failure criteria
29
+
30
+ 4. **Meaningful Reward Function**
31
+ - Provides signal over full trajectory (not just binary)
32
+ - Rewards partial progress toward completion
33
+ - Penalizes undesirable behavior (infinite loops, destructive actions)
34
+
35
+ 5. **Baseline Inference Script**
36
+ - Uses OpenAI API client
37
+ - Reads credentials from `OPENAI_API_KEY` environment variable
38
+ - Produces reproducible baseline scores on all 3 tasks
39
+
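One way to meet both the partial-progress and the penalty requirements is to reward the change in task progress at every step and subtract a penalty when the agent repeats an earlier action. This is a sketch under assumptions: the environment is assumed to already compute a progress fraction in [0, 1], an exact repeat is used as a cheap loop detector, and `shaped_reward` and the 0.1 penalty are hypothetical.

```python
from typing import List


def shaped_reward(prev_progress: float, progress: float,
                  action: str, history: List[str],
                  loop_penalty: float = 0.1) -> float:
    """Dense per-step reward: progress delta minus a penalty for repeats."""
    # Positive when the agent moves closer to completion, negative if it regresses.
    reward = progress - prev_progress
    # Penalise an exact repeat of an earlier action (likely a loop).
    if action in history:
        reward -= loop_penalty
    return round(reward, 4)
```

Because the per-step rewards telescope, their sum over an episode without repeats equals final progress, so the shaping does not change which trajectories are best.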
40
+ ## Non-Functional Requirements
41
+
42
+ ### Deployment
43
+ - **Hugging Face Space**: Environment must run as containerized HF Space tagged with `openenv`
44
+ - **Dockerfile**: Working containerization with clean `docker build + docker run`
45
+
46
+ ### Documentation
47
+ README must include:
48
+ - Environment description and motivation
49
+ - Action and observation space definitions
50
+ - Task descriptions with expected difficulty
51
+ - Setup and usage instructions
52
+ - Baseline scores
53
+
54
+ ## Evaluation Criteria & Scoring
55
+
56
+ ### Scoring Breakdown (100 points)
57
+
58
+ | Criterion | Weight | Description |
59
+ |-----------|--------|-------------|
60
+ | **Real-world utility** | 30% | Does the environment model a genuine task? Would someone use this for training/evaluating agents? |
61
+ | **Task & grader quality** | 25% | Well-defined tasks with clear objectives? Accurate graders? Meaningful difficulty progression? |
62
+ | **Environment design** | 20% | Clean state management, sensible action/observation spaces, good reward shaping, proper episode boundaries |
63
+ | **Code quality & spec compliance** | 15% | Follows OpenEnv spec, clean structure, typed models, documented, tested, working Dockerfile |
64
+ | **Creativity & novelty** | 10% | Novel problem domain, interesting mechanics, clever reward design, original approach |
65
+
66
+ ### Detailed Scoring Rubrics
67
+
68
+ #### Real-world Utility (30%)
69
+ - **0–5**: Toy/artificial problem with no practical application
70
+ - **6–15**: Valid domain but shallow modeling
71
+ - **16–25**: Good domain modeling, useful for agent evaluation
72
+ - **26–30**: Excellent — fills real gap, immediate value for RL/agent community
73
+
74
+ #### Task & Grader Quality (25%)
75
+ - 3+ tasks with difficulty range?
76
+ - Graders produce scores between 0.0–1.0?
77
+ - Graders deterministic and reproducible?
78
+ - Hard task genuinely challenges frontier models?
79
+
80
+ #### Environment Design (20%)
81
+ - `reset()` produces clean state?
82
+ - Action/observation types well-designed and documented?
83
+ - Reward function provides useful varying signal (not sparse)?
84
+ - Episode boundaries sensible?
85
+
86
+ #### Code Quality & Spec Compliance (15%)
87
+ - `openenv validate` passes?
88
+ - `docker build && docker run` works?
89
+ - HF Space deploys and responds?
90
+ - Baseline script runs and reproduces scores?
91
+
92
+ #### Creativity & Novelty (10%)
93
+ - Domain not seen in OpenEnv before?
94
+ - Reward design has interesting properties?
95
+ - Clever mechanics that make environment engaging?
96
+
97
+ ## Judging Process
98
+
99
+ ### Phase 1: Automated Validation (Pass/Fail Gate)
100
+ - HF Space deploys
101
+ - OpenEnv spec compliance
102
+ - Dockerfile builds
103
+ - Baseline reproduces
104
+ - 3+ tasks with graders
105
+
106
+ ### Phase 2: Agentic Evaluation (Scored)
107
+ - Baseline agent re-run
108
+ - Standard Open LLM agent (e.g., Nemotron 3 Super) run against all environments
109
+ - Score variance check
110
+
111
+ ### Phase 3: Human Review
112
+ Top submissions reviewed by Meta and Hugging Face engineers for:
113
+ - Real-world utility
114
+ - Creativity
115
+ - Exploit checks
116
+
117
+ ### Disqualification Criteria
118
+ - Environment does not deploy or respond
119
+ - Plagiarized or trivially modified existing environments
120
+ - Graders that always return the same score
121
+ - No baseline inference script
122
+
123
+ ## Pre-Submission Checklist
124
+
125
+ All must pass or you're disqualified:
126
+
127
+ - [ ] HF Space deploys (200 response to reset())
128
+ - [ ] OpenEnv spec compliance validated
129
+ - [ ] Dockerfile builds successfully
130
+ - [ ] Baseline script reproduces without error
131
+ - [ ] 3+ tasks with graders (scores in 0.0–1.0 range)
132
+
133
+ ## Mandatory Requirements
134
+
135
+ ### Environment Variables
136
+ Must be defined in your environment configuration:
137
+
138
+ ```bash
139
+ API_BASE_URL # The API endpoint for the LLM
140
+ MODEL_NAME # The model identifier to use for inference
141
+ HF_TOKEN # Your Hugging Face / API key
142
+ LOCAL_IMAGE_NAME # (Optional) Name of local image if using from_docker_image()
143
+ ```
144
+
145
+ ### Script Requirements
146
+ - **Filename**: `inference.py` (must be in root directory)
147
+ - **LLM Calls**: Must use OpenAI Client with above variables
148
+ - **Logging Format**: Must follow [START], [STEP], [END] format (see below)
149
+
150
+ ### Infrastructure Restrictions
151
+ - **Runtime**: Inference script must complete in < 20 minutes
152
+ - **Resources**: Must run on vcpu=2, memory=8GB
153
+
154
+ ## STDOUT Logging Format
155
+
156
+ ### Required Format
157
+ The script must emit exactly three line types to stdout, in this order:
158
+
159
+ ```
160
+ [START] task=<task_name> env=<benchmark> model=<model_name>
161
+ [STEP] step=<n> action=<action_str> reward=<0.00> done=<true|false> error=<msg|null>
162
+ [END] success=<true|false> steps=<n> score=<score> rewards=<r1,r2,...,rn>
163
+ ```
164
+
165
+ ### Format Rules
166
+ - One [START] line at episode begin
167
+ - One [STEP] line per step, immediately after `env.step()` returns
168
+ - One [END] line after `env.close()`, always emitted (even on exception)
169
+ - `reward` and `rewards` formatted to 2 decimal places
170
+ - `done` and `success` are lowercase booleans: `true` or `false`
171
+ - `error` is the raw `last_action_error` string, or `null` if none
172
+ - All fields on a single line with no newlines within a line
173
+ - Each task should return score in [0, 1]
174
+
175
+ ### Example Output
176
+ ```
177
+ [START] task=click-test env=miniwob model=Qwen3-VL-30B
178
+ [STEP] step=1 action=click('123') reward=0.00 done=false error=null
179
+ [STEP] step=2 action=fill('456','text') reward=0.00 done=false error=null
180
+ [STEP] step=3 action=click('789') reward=1.00 done=true error=null
181
+ [END] success=true steps=3 score=1.00 rewards=0.00,0.00,1.00
182
+ ```
183
+
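The format rules can be captured in three small helpers. This is a sketch, not the organizers' code: the function names are hypothetical, and formatting `score` to two decimals is an assumption taken from the example output, since the rules only fix the precision of `reward` and `rewards`.

```python
from typing import List, Optional


def _b(flag: bool) -> str:
    # Booleans must be lowercase: true / false.
    return "true" if flag else "false"


def log_start(task: str, env: str, model: str) -> str:
    line = f"[START] task={task} env={env} model={model}"
    print(line, flush=True)
    return line


def log_step(step: int, action: str, reward: float, done: bool,
             error: Optional[str]) -> str:
    # Collapse newlines so every record stays on a single line.
    action = action.replace("\n", " ")
    err = "null" if error is None else error.replace("\n", " ")
    line = (f"[STEP] step={step} action={action} reward={reward:.2f} "
            f"done={_b(done)} error={err}")
    print(line, flush=True)
    return line


def log_end(success: bool, steps: int, score: float,
            rewards: List[float]) -> str:
    joined = ",".join(f"{r:.2f}" for r in rewards)
    line = (f"[END] success={_b(success)} steps={steps} "
            f"score={score:.2f} rewards={joined}")
    print(line, flush=True)
    return line
```

Calling these with the values from the example reproduces its lines exactly.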
184
+ ## Sample Inference Script
185
+
186
+ ```python
187
+ """
188
+ Inference Script Example
189
+ ===================================
190
+ MANDATORY
191
+ - Before submitting, ensure the following variables are defined in your environment configuration:
192
+ API_BASE_URL The API endpoint for the LLM.
193
+ MODEL_NAME The model identifier to use for inference.
194
+ HF_TOKEN Your Hugging Face / API key.
195
+ LOCAL_IMAGE_NAME The name of the local image to use for the environment if you are using from_docker_image()
196
+ method
197
+
198
+ - Defaults are set only for API_BASE_URL and MODEL_NAME
199
+ (and should reflect your active inference setup):
200
+ API_BASE_URL = os.getenv("API_BASE_URL", "<your-active-endpoint>")
201
+ MODEL_NAME = os.getenv("MODEL_NAME", "<your-active-model>")
202
+
203
+ - The inference script must be named `inference.py` and placed in the root directory of the project
204
+ - Participants must use OpenAI Client for all LLM calls using above variables
205
+
206
+ STDOUT FORMAT
207
+ - The script must emit exactly three line types to stdout, in this order:
208
+
209
+ [START] task=<task_name> env=<benchmark> model=<model_name>
210
+ [STEP] step=<n> action=<action_str> reward=<0.00> done=<true|false> error=<msg|null>
211
+ [END] success=<true|false> steps=<n> score=<score> rewards=<r1,r2,...,rn>
212
+
213
+ Rules:
214
+ - One [START] line at episode begin.
215
+ - One [STEP] line per step, immediately after env.step() returns.
216
+ - One [END] line after env.close(), always emitted (even on exception).
217
+ - reward and rewards are formatted to 2 decimal places.
218
+ - done and success are lowercase booleans: true or false.
219
+ - error is the raw last_action_error string, or null if none.
220
+ - All fields on a single line with no newlines within a line.
221
+ - Each task should return a score in [0, 1].
222
+
223
+ Example:
224
+ [START] task=click-test env=miniwob model=Qwen3-VL-30B
225
+ [STEP] step=1 action=click('123') reward=0.00 done=false error=null
226
+ [STEP] step=2 action=fill('456','text') reward=0.00 done=false error=null
227
+ [STEP] step=3 action=click('789') reward=1.00 done=true error=null
228
+ [END] success=true steps=3 score=1.00 rewards=0.00,0.00,1.00
229
+ """
230
+
231
+ import asyncio
232
+ import os
233
+ import textwrap
234
+ from typing import List, Optional
235
+
236
+ from openai import OpenAI
237
+
238
+ from my_env_v4 import MyEnvV4Action, MyEnvV4Env
239
+
240
+ IMAGE_NAME = os.getenv("IMAGE_NAME") # If you are using docker image
241
+ API_KEY = os.getenv("HF_TOKEN") or os.getenv("API_KEY")
242
+ API_BASE_URL = os.getenv("API_BASE_URL") or "https://router.huggingface.co/v1"
243
+ MODEL_NAME = os.getenv("MODEL_NAME") or "Qwen/Qwen2.5-72B-Instruct"
244
+ TASK_NAME = os.getenv("MY_ENV_V4_TASK", "echo")
245
+ BENCHMARK = os.getenv("MY_ENV_V4_BENCHMARK", "my_env_v4")
246
+ MAX_STEPS = 8
247
+ TEMPERATURE = 0.7
248
+
249
+ # TODO: Implement the rest of your inference script here
250
+ ```
251
+
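A hedged sketch of how the TODO could be filled. The loop prints each `[STEP]` line right after `step()` returns and prints `[END]` from a `finally` block, so it still appears when a step raises. `StubEnv`, its `(observation, reward, done, error)` return tuple, `choose_action`, the clamped-sum score, and `SUCCESS_THRESHOLD` are hypothetical placeholders for the real `MyEnvV4Env` client, an OpenAI-client call, and your own scoring rule. The sync loop is used for brevity, although the sample imports `asyncio`.

```python
from typing import List, Optional, Tuple

SUCCESS_THRESHOLD = 0.5  # hypothetical cut-off for success=true


class StubEnv:
    """Hypothetical stand-in for the real environment client."""

    def __init__(self, rewards: List[float]) -> None:
        self._rewards = rewards
        self._t = 0

    def reset(self) -> str:
        self._t = 0
        return "initial observation"

    def step(self, action: str) -> Tuple[str, float, bool, Optional[str]]:
        # Returns (observation, reward, done, last_action_error).
        reward = self._rewards[self._t]
        self._t += 1
        return f"obs {self._t}", reward, self._t == len(self._rewards), None

    def close(self) -> None:
        pass


def choose_action(observation: str, step: int) -> str:
    # Placeholder policy; the real script would query the LLM here.
    return f"noop({step})"


def run_episode(env, task: str, benchmark: str, model: str,
                max_steps: int = 8) -> List[str]:
    lines: List[str] = []

    def emit(line: str) -> None:
        # Print immediately and keep a copy for inspection.
        print(line, flush=True)
        lines.append(line)

    rewards: List[float] = []
    steps = 0
    emit(f"[START] task={task} env={benchmark} model={model}")
    try:
        obs = env.reset()
        for step in range(1, max_steps + 1):
            action = choose_action(obs, step)
            obs, reward, done, error = env.step(action)
            rewards.append(reward)
            steps = step
            emit(f"[STEP] step={step} action={action} reward={reward:.2f} "
                 f"done={str(done).lower()} error={error or 'null'}")
            if done:
                break
    finally:
        env.close()
        # Assumed scoring rule: clamp the summed rewards into [0, 1].
        score = min(max(sum(rewards), 0.0), 1.0)
        joined = ",".join(f"{r:.2f}" for r in rewards)
        emit(f"[END] success={str(score >= SUCCESS_THRESHOLD).lower()} "
             f"steps={steps} score={score:.2f} rewards={joined}")
    return lines
```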
252
+ ## Pre-Validation Script
253
+
254
+ ```bash
255
+ #!/usr/bin/env bash
256
+ #
257
+ # validate-submission.sh — OpenEnv Submission Validator
258
+ #
259
+ # Checks that your HF Space is live, Docker image builds, and openenv validate passes.
260
+ #
261
+ # Prerequisites:
262
+ # - Docker: https://docs.docker.com/get-docker/
263
+ # - openenv-core: pip install openenv-core
264
+ # - curl (usually pre-installed)
265
+ #
266
+ # Run:
267
+ # curl -fsSL https://raw.githubusercontent.com/<owner>/<repo>/main/scripts/validate-submission.sh | bash -s -- <ping_url> [repo_dir]
268
+ #
269
+ # Or download and run locally:
270
+ # chmod +x validate-submission.sh
271
+ # ./validate-submission.sh <ping_url> [repo_dir]
272
+ #
273
+ # Arguments:
274
+ # ping_url Your HuggingFace Space URL (e.g. https://your-space.hf.space)
275
+ # repo_dir Path to your repo (default: current directory)
276
+ #
277
+ # Examples:
278
+ # ./validate-submission.sh https://my-team.hf.space
279
+ # ./validate-submission.sh https://my-team.hf.space ./my-repo
280
+ #
281
+
282
+ set -uo pipefail
283
+
284
+ DOCKER_BUILD_TIMEOUT=600
285
+
286
+ if [ -t 1 ]; then
287
+ RED='\033[0;31m'
288
+ GREEN='\033[0;32m'
289
+ YELLOW='\033[1;33m'
290
+ BOLD='\033[1m'
291
+ NC='\033[0m'
292
+ else
293
+ RED=''
294
+ GREEN=''
295
+ YELLOW=''
296
+ BOLD=''
297
+ NC=''
298
+ fi
299
+
300
+ # TODO: Add the rest of the validation script
301
+ ```
302
+
303
+ ## Tips for Success
304
+
305
+ 1. **Choose a Real Problem**: Pick a task that has genuine value for the AI/agent community
306
+ 2. **Design Good Rewards**: Provide meaningful signals throughout the episode, not just at the end
307
+ 3. **Test Thoroughly**: Ensure your environment works cleanly with `docker build && docker run`
308
+ 4. **Document Well**: Clear README helps reviewers understand your contribution
309
+ 5. **Start Simple**: Get the basic OpenEnv spec working first, then add complexity
310
+ 6. **Run Validator**: Use the pre-validation script before submitting
311
+
312
+ ## Resources
313
+
314
+ - OpenEnv Documentation: [Link to be added]
315
+ - Hugging Face Spaces: https://huggingface.co/spaces
316
+ - OpenAI API Client: https://platform.openai.com/docs/api-reference
317
+
318
+ ## Submission Deadline
319
+
320
+ [To be announced]
321
+
322
+ ---
323
+
324
+ **Good luck with your submission! 🚀**
docs/hackathon-guide-rl-environments.md ADDED
@@ -0,0 +1,228 @@
1
+ # Building RL Environments for Hackathon: Complete Guide
2
+
3
+ ## Overview
4
+ This guide summarizes how to build real-world Reinforcement Learning (RL) environments with the OpenEnv library for the hackathon.
5
+
6
+ ---
7
+
8
+ ## 1. Fundamentals of Reinforcement Learning
9
+
10
+ ### The Mechanism
11
+ - **How it Works:** Model generates candidate implementations (actions) → Environment verifies/tests → Environment provides reward signal (score) based on pre-defined rubrics
12
+ - **Purpose:** Tells the model what is good or bad through trial and error rather than long-context prompts
13
+
14
+ ### Position in Training Pipeline
15
+ - Typically follows **Supervised Fine-Tuning (SFT)**
16
+ - Used to "squeeze out" final performance gains on specific capabilities
17
+ - More efficient alternative to "in-context learning" (which degrades with longer prompts)
18
+
19
+ ### Key Challenges
20
+
21
+ #### Reward Hacking
22
+ - Models learn to "game" the verifier to get high scores without actually solving the task
23
+ - **Mitigation:** Inspect output trajectories or use multiple reward functions
24
+
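The "multiple reward functions" mitigation can be sketched as scoring one trajectory with several independent checks. A hard validity gate zeroes the reward outright, and the soft checks are averaged with weights. Maximizing any single check is then not enough to score well. The trajectory fields, the three checks, and the gate-plus-weighted-mean combination are illustrative assumptions.

```python
from typing import Callable, Dict, List, Tuple

Trajectory = Dict[str, object]
RewardFn = Callable[[Trajectory], float]


def combine_rewards(traj: Trajectory,
                    weighted: List[Tuple[RewardFn, float]],
                    gate: RewardFn) -> float:
    """Weighted mean of soft rewards, forced to 0.0 if the hard gate fails."""
    # A failed hard check (e.g. an invalid result) cannot be offset by soft points.
    if gate(traj) <= 0.0:
        return 0.0
    total_w = sum(w for _, w in weighted)
    score = sum(fn(traj) * w for fn, w in weighted) / total_w
    return max(0.0, min(1.0, score))


# Hypothetical checks for a scheduling-style trajectory.
def is_valid(traj: Trajectory) -> float:
    return 1.0 if traj.get("no_conflicts") else 0.0


def preference_fit(traj: Trajectory) -> float:
    return float(traj.get("pref_score", 0.0))


def brevity(traj: Trajectory) -> float:
    # Fewer steps is better; 20 is an assumed episode limit.
    steps = int(traj.get("steps", 0))
    return max(0.0, 1.0 - steps / 20)
```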
25
+ #### Curriculum Learning
26
+ - Start with easy tasks and build complexity progressively
27
+ - Ensures model receives consistent reward signal
28
+ - Prevents "wasting compute" on tasks that are too difficult initially
29
+
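A minimal curriculum scheduler can be sketched as follows. The agent stays on a task until its success rate over a full window of recent episodes clears a threshold, then moves to the next task. The window size, threshold, and task names are assumptions; `task1_easy` follows this repo's naming, and the other two ids are hypothetical.

```python
from collections import deque
from typing import Deque, List


class Curriculum:
    """Promote to the next task once the recent success rate clears a threshold."""

    def __init__(self, tasks: List[str], window: int = 10,
                 threshold: float = 0.8) -> None:
        self.tasks = tasks
        self.level = 0
        self.window = window
        self.threshold = threshold
        self.recent: Deque[bool] = deque(maxlen=window)

    @property
    def current(self) -> str:
        return self.tasks[self.level]

    def record(self, success: bool) -> None:
        self.recent.append(success)
        full = len(self.recent) == self.window  # do not promote on a few lucky runs
        rate = sum(self.recent) / len(self.recent)
        if full and rate >= self.threshold and self.level < len(self.tasks) - 1:
            self.level += 1
            self.recent.clear()  # judge the new level on fresh episodes only
```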
30
+ ---
31
+
32
+ ## 2. Introduction to OpenEnv
33
+
34
+ ### What is OpenEnv?
35
+ - Collaborative project between Meta, Hugging Face, and others
36
+ - Standardizes RL environments (like Hugging Face standardized language models)
37
+ - Single, consistent API for environments
38
+ - Interoperable with training frameworks (TRL, Unsloth, etc.)
39
+
40
+ ### Core Components
41
+ A standard OpenEnv environment requires defining:
42
+ - **Actions** (as Pydantic objects)
43
+ - **Observations** (as Pydantic objects)
44
+ - **States** (as Pydantic objects)
45
+
46
+ ---
47
+
48
+ ## 3. Technical Implementation
49
+
50
+ ### CLI Workflow
51
+ ```bash
52
+ # Initialize skeleton environment
53
+ openenv init
54
+
55
+ # Validate setup
56
+ openenv validate
57
+
58
+ # Deploy to Hugging Face Spaces
59
+ openenv push
60
+ ```
61
+
62
+ ### Agent Integration
63
+ - Use coding agents (such as Codex) with OpenEnv "skills"
64
+ - Automatically generate environment code from prompts
65
+
66
+ ### Deployment
67
+ - Environments deployed as Docker containers on Hugging Face
68
+ - Provides web interface for manual testing and debugging
69
+ - **Important:** Dockerfile must be moved outside `/server` folder to main project directory
70
+
71
+ ---
72
+
73
+ ## 4. Hackathon Requirements
74
+
75
+ ### Environment Quality
76
+
77
+ #### Real-World Focus (Critical)
78
+ - **Must build:** Real-world task environments (healthcare, email triage, code optimization)
79
+ - **Avoid:** "Toy" environments, games (Wordle, Connect 4, etc.)
80
+ - **Goal:** Environment that could realistically be used in model's post-training RL run
81
+
82
+ #### Complexity Requirements
83
+ - Map **long-running tasks** with multiple trajectories/routes
84
+ - Agent should have various possible approaches to solve the task
85
+
86
+ ### Technical Requirements
87
+
88
+ #### Mandatory Inference Script
89
+ - **Required for every submission**
90
+ - Used by organizers to evaluate environment effectiveness
91
+ - Measures how well environment provides rewards to model
92
+
93
+ #### API Configuration
94
+ - **No OpenAI API key required**
95
+ - Use **Hugging Face token** instead
96
+ - Use provided **HF Router** (API base URL) for model calls
97
+ - HF Router handles model calls through Hugging Face
98
+
99
+ #### Docker Setup
100
+ - Move Dockerfile outside `/server` folder to main project directory
101
+ - Run `openenv validate` before submission
102
+
103
+ ### Reward Signal Design
104
+
105
+ #### Requirements
106
+ - Score typically between 0 and 1
107
+ - Must deliver valid signal indicating "good" or "bad" performance
108
+ - **Grading Diversity:** Must not return same score every time
109
+ - Should distinguish between different performance levels
110
+
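The grading-diversity requirement can be checked before submission by scoring a batch of varied trajectories and flagging graders whose scores fall outside [0, 1] or never change. `grader_report` and its 4-decimal rounding for counting distinct scores are assumptions for illustration, not part of any hackathon tooling.

```python
import statistics
from typing import Callable, Iterable, List


def grader_report(grader: Callable[[object], float],
                  trajectories: Iterable[object]) -> dict:
    """Score a batch of trajectories and flag constant or out-of-range graders."""
    scores: List[float] = [grader(t) for t in trajectories]
    distinct = len(set(round(s, 4) for s in scores))
    return {
        "in_range": all(0.0 <= s <= 1.0 for s in scores),
        "distinct_scores": distinct,
        # A grader that returns one value for every trajectory carries no signal.
        "constant": distinct <= 1,
        "stdev": statistics.pstdev(scores) if scores else 0.0,
    }
```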
111
+ #### Best Practices
112
+ - Start with achievable tasks for the model
113
+ - Ensure task is feasible but challenging
114
+ - Avoid tasks too difficult or out-of-distribution for the model
115
+
116
+ ---
117
+
118
+ ## 5. Grading Criteria
119
+
120
+ Evaluation based on:
121
+
122
+ 1. **Utility of the Idea**
123
+ - How useful is the task for real-world AI?
124
+ - Does it represent authentic human tasks?
125
+
126
+ 2. **Quality of the Grader**
127
+ - Returns diverse scores (not same score every time)
128
+ - Value between 0 and 1
129
+ - Distinguishes performance levels
130
+
131
+ 3. **Technical Design**
132
+ - Environment architecture and implementation
133
+ - Successful execution of inference script
134
+
135
+ 4. **Novelty**
136
+ - Key criterion for high scores
137
+ - Create something not thought of yet
138
+ - Solve problems in unique domains
139
+ - **Plagiarism is strictly prohibited**
140
+
141
+ ---
142
+
143
+ ## 6. Submission Guidelines
144
+
145
+ ### Deadline
146
+ - **Round One:** April 8th
147
+
148
+ ### Submission Process
149
+ - Push environment to **Hugging Face Spaces** using `openenv push`
150
+ - Submit URL of Hugging Face Space
151
+ - Multiple submissions allowed (latest accurate submission used)
152
+
153
+ ### Collaboration
154
+ - Teams are **highly encouraged**
155
+ - Helps manage technical and creative requirements
156
+
157
+ ---
158
+
159
+ ## 7. High-Value Environment Ideas
160
+
161
+ ### Healthcare Domain
162
+ - Medical triage tools
163
+ - Navigating medical records
164
+ - Healthcare-specific software tool utilization
165
+
166
+ ### Productivity and Operations
167
+ - **Email Triage:** Prioritize, categorize, respond to complex inbox
168
+ - **Calendar Management:** Coordinate schedules, handle conflicts across multiple participants
169
+
170
+ ### Technical and Code Optimization
171
+ - **Kernel Optimization:** Benchmark and optimize PyTorch/GPU kernels for speed and efficiency
172
+ - **Repository Maintenance:** Navigate GitHub to identify/fix bugs, run test suites
173
+
174
+ ### Logistics and Travel
175
+ - **Complex Flight Booking:** Navigate changing availability, multi-leg transfers, request missing information from users
176
+
177
+ ### API and Tool Integration
178
+ - Wide set of real-world tools
179
+ - Interactive APIs that agents must learn to use correctly
180
+
181
+ ---
182
+
183
+ ## 8. Best Practices Summary
184
+
185
+ ### Do's
186
+ - Focus on real-world utility
187
+ - Design long-running, multi-trajectory tasks
188
+ - Implement diverse grading systems
189
+ - Start with curriculum learning approach
190
+ - Validate thoroughly before submission
191
+ - Work in teams for better results
192
+ - Aim for novelty and uniqueness
193
+
194
+ ### Don'ts
195
+ - Avoid toy environments or games
196
+ - Don't create tasks too difficult for models
197
+ - Don't implement single-score graders
198
+ - Avoid plagiarism
199
+ - Don't submit without testing inference script
200
+ - Don't use tasks without clear reward signals
201
+
202
+ ---
203
+
204
+ ## 9. Technical Checklist
205
+
206
+ - [ ] Initialize project with `openenv init`
207
+ - [ ] Define Actions, Observations, States as Pydantic objects
208
+ - [ ] Implement diverse reward function (0-1 range)
209
+ - [ ] Create mandatory inference script
210
+ - [ ] Configure HF token and router (not OpenAI key)
211
+ - [ ] Move Dockerfile to main directory (outside /server)
212
+ - [ ] Run `openenv validate` to verify setup
213
+ - [ ] Test environment locally
214
+ - [ ] Deploy with `openenv push` to Hugging Face Spaces
215
+ - [ ] Submit Hugging Face Space URL before April 8th
216
+
217
+ ---
218
+
219
+ ## Resources
220
+
221
+ - **OpenEnv Library:** Standardized RL environment framework
222
+ - **Hugging Face Spaces:** Deployment platform
223
+ - **HF Router:** API for model access
224
+ - **Training Frameworks:** TRL, Unsloth (compatible with OpenEnv)
225
+
226
+ ---
227
+
228
+ *This guide synthesizes best practices for building competitive RL environments for hackathons. Focus on real-world utility, technical excellence, and novel approaches for the best results.*
docs/superpowers/specs/2025-04-06-scheduling-env-design.md ADDED
@@ -0,0 +1,2068 @@
1
+ # Intelligent Meeting Scheduling Environment - Design Specification
2
+
3
+ **Date**: 2025-04-06
4
+ **Author**: Akshay Kumar
5
+ **Hackathon**: Meta OpenEnv Hackathon - Round 1
6
+ **Deadline**: April 8th, 2025
7
+
8
+ ---
9
+
10
+ ## Executive Summary
11
+
12
+ This document specifies an OpenEnv RL environment for intelligent meeting scheduling based on the BotBooked.ai production system. The environment teaches agents to optimize meeting time slot selection, handle cascading rescheduling, and learn multi-stakeholder preferences through reinforcement learning.
13
+
14
+ **Key Features:**
15
+ - Multi-component dense reward function with diverse scoring (0.0-1.0 range)
16
+ - 3 difficulty-graded tasks (Easy → Medium → Hard)
17
+ - Multi-step action space (propose → reschedule → finalize)
18
+ - Ported from proven 30KB BotBooked scheduling algorithm
19
+ - Real-world utility: Executive scheduling is a $10B+ industry problem
20
+
21
+ ---
22
+
23
+ ## 1. Problem Statement
24
+
25
+ ### 1.1 Real-World Context
26
+
27
+ Meeting scheduling involves:
28
+ - Finding time slots that work for multiple participants
29
+ - Balancing individual preferences (preferred hours, buffer times, meeting limits)
30
+ - Handling calendar conflicts through intelligent rescheduling
31
+ - Optimizing for efficiency (minimal disruptions, quick solutions)
32
+
33
+ Current solutions (Calendly, Google Calendar auto-scheduling) use heuristic algorithms. This environment enables RL agents to learn optimal scheduling strategies through trial and error.
34
+
35
+ ### 1.2 Environment Goals
36
+
37
+ The agent must learn to:
38
+ 1. **Propose valid time slots** that satisfy hard constraints (working hours, availability)
39
+ 2. **Minimize preference violations** (back-to-back meetings, outside preferred hours, daily limits)
40
+ 3. **Handle cascading rescheduling** when conflicts exist
41
+ 4. **Balance competing objectives** (speed vs. quality, individual vs. group preferences)
42
+
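Goals 1 and 2 separate hard constraints, which make a slot valid or invalid, from soft preferences, which are counted as violations. A sketch of that split follows, assuming times are minutes since midnight and using a hypothetical `Attendee` record. The field names and defaults are illustrative; the real scenario schema may differ.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) in minutes since midnight


@dataclass
class Attendee:
    # Illustrative fields only.
    work: Interval = (9 * 60, 17 * 60)
    preferred: Interval = (10 * 60, 16 * 60)
    busy: List[Interval] = field(default_factory=list)
    buffer_min: int = 15
    max_meetings: int = 4


def overlaps(a: Interval, b: Interval) -> bool:
    # Touching endpoints (one ends as the other starts) do not overlap.
    return a[0] < b[1] and b[0] < a[1]


def is_valid(slot: Interval, people: List[Attendee]) -> bool:
    """Hard constraints: inside working hours and free for every attendee."""
    return all(p.work[0] <= slot[0] and slot[1] <= p.work[1]
               and not any(overlaps(slot, b) for b in p.busy)
               for p in people)


def preference_violations(slot: Interval, people: List[Attendee]) -> int:
    """Soft constraints: one violation per rule broken per attendee."""
    count = 0
    for p in people:
        if slot[0] < p.preferred[0] or slot[1] > p.preferred[1]:
            count += 1  # outside preferred hours
        padded = (slot[0] - p.buffer_min, slot[1] + p.buffer_min)
        if any(overlaps(padded, b) for b in p.busy):
            count += 1  # back-to-back with an existing meeting (no buffer)
        if len(p.busy) + 1 > p.max_meetings:
            count += 1  # exceeds the daily meeting limit
    return count
```

For example, an 11:00 to 12:00 slot can be valid for everyone yet still cost violations. It starts right after a 10:00 to 11:00 meeting, and it pushes an attendee past a daily limit.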
43
+ ### 1.3 Hackathon Alignment
44
+
45
+ | Requirement | How We Meet It |
46
+ |-------------|----------------|
47
+ | Real-world task | Executive scheduling (genuine $10B+ industry value) |
48
+ | 3 tasks with graders | Easy/Medium/Hard scenarios with programmatic scoring (0.0-1.0) |
49
+ | Meaningful rewards | Dense multi-component signal with partial progress tracking |
50
+ | OpenEnv compliance | Pydantic models, step/reset/state API, openenv.yaml |
51
+ | Baseline inference | inference.py using HF Router with OpenAI client |
52
+ | Diverse scores | Multi-component formula yields varied scores across trajectories |
53
+
54
+ ---
55
+
56
+ ## 2. Architecture
57
+
58
+ ### 2.1 High-Level System Design
59
+
60
+ ```
61
+ ┌─────────────────────────────────────────────────────────┐
62
+ │ OpenEnv HTTP Server │
63
+ │ ┌───────────────────────────────────────────────────┐ │
64
+ │ │ FastAPI App (create_app factory) │ │
65
+ │ │ - POST /reset → Initialize episode │ │
66
+ │ │ - POST /step → Execute action │ │
67
+ │ │ - GET /state → Get current state │ │
68
+ │ └───────────────────────────────────────────────────┘ │
69
+ │ ↓ │
70
+ │ ┌───────────────────────────────────────────────────┐ │
71
+ │ │ SchedulingEnvironment │ │
72
+ │ │ - reset(task_id, scenario) → Observation │ │
73
+ │ │ - step(action) → Observation │ │
74
+ │ │ - state() → State │ │
75
+ │ └───────────────────────────────────────────────────┘ │
76
+ │ ↓ │
77
+ │ ┌───────────────────────────────────────────────────┐ │
78
+ │ │ BotBooked Core Logic (ported) │ │
79
+ │ │ - find_earliest_slot() │ │
80
+ │ │ - calculate_preference_score() │ │
81
+ │ │ - check_conflicts() │ │
82
+ │ │ - validate_constraints() │ │
83
+ │ └───────────────────────────────────────────────────┘ │
84
+ └─────────────────────────────────────────────────────────┘
85
+
86
+ ┌─────────────────────────────────────────────────────────┐
87
+ │ Python Client │
88
+ │ SchedulingEnv (EnvClient wrapper) │
89
+ │ - async/sync support │
90
+ │ - Type-safe action/observation handling │
91
+ └─────────────────────────────────────────────────────────┘
92
+ ```
93
+
94
+ ### 2.2 Directory Structure
95
+
96
+ ```
97
+ scheduling_env/
98
+ ├── __init__.py # Package exports
99
+ ├── models.py # Pydantic models (Action, Observation, State)
100
+ ├── client.py # HTTP client (EnvClient wrapper)
101
+ ├── openenv.yaml # OpenEnv metadata
102
+ ├── pyproject.toml # Dependencies
103
+ ├── README.md # Documentation
104
+ ├── inference.py # Baseline inference script (ROOT)
105
+ ├── Dockerfile # Docker image (ROOT)
106
+ ├── .env.example # Environment variables template
107
+ ├── server/
108
+ │ ├── __init__.py
109
+ │ ├── app.py # FastAPI app factory
110
+ │ ├── environment.py # SchedulingEnvironment class
111
+ │ ├── scheduling_logic.py # Ported BotBooked functions
112
+ │ ├── graders.py # Reward calculation
113
+ │ └── scenarios/
114
+ │ ├── task1_easy.json # Easy scenario definition
115
+ │ ├── task2_medium.json # Medium scenario definition
116
+ │ └── task3_hard.json # Hard scenario definition
117
+ └── tests/
118
+ ├── test_environment.py # Unit tests
119
+ └── test_graders.py # Grader validation
120
+ ```
121
+
122
+ ---
123
+
124
+ ## 3. Data Models
125
+
126
+ ### 3.1 Action Model
127
+
128
+ ```python
129
+ class SchedulingAction(Action):
130
+ """Agent's action in the scheduling environment"""
131
+
132
+ action_type: Literal["propose_slot", "reschedule_meeting", "finalize", "reject"]
133
+
134
+ # For propose_slot - agent suggests a time slot
135
+ proposed_start: Optional[str] = None # ISO8601 datetime string
136
+ proposed_duration: Optional[int] = None # minutes
137
+
138
+ # For reschedule_meeting - agent moves an existing meeting
139
+ meeting_id_to_move: Optional[str] = None
140
+ new_start_time: Optional[str] = None # ISO8601 datetime string
141
+
142
+ # Metadata (inherited from Action base class)
143
+ metadata: Dict[str, Any] = Field(default_factory=dict)
144
+ ```
145
+
146
+ **Action Types:**
147
+ - `propose_slot`: Agent proposes a time slot for the new meeting
148
+ - `reschedule_meeting`: Agent reschedules a conflicting lower-priority meeting to open up a slot
149
+ - `finalize`: Agent confirms current schedule is optimal and completes episode
150
+ - `reject`: Agent gives up (no valid slot found)
151
+
152
+ **Validation Rules:**
153
+ - `propose_slot` requires both `proposed_start` and `proposed_duration`
154
+ - `reschedule_meeting` requires both `meeting_id_to_move` and `new_start_time`
155
+ - `finalize` and `reject` require no additional parameters
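The validation rules above amount to a lookup table of required fields per action type. The following is a minimal sketch, assuming a plain dataclass (`ActionSketch`, hypothetical) in place of the pydantic/OpenEnv `SchedulingAction`; `validate_action` is likewise a hypothetical helper, not part of the environment's API.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional, Tuple

# Parameters each action type must carry, per the validation rules above.
REQUIRED_FIELDS: Dict[str, Tuple[str, ...]] = {
    "propose_slot": ("proposed_start", "proposed_duration"),
    "reschedule_meeting": ("meeting_id_to_move", "new_start_time"),
    "finalize": (),
    "reject": (),
}


@dataclass
class ActionSketch:
    """Hypothetical stand-in for SchedulingAction (no pydantic/OpenEnv base class)."""
    action_type: str
    proposed_start: Optional[str] = None
    proposed_duration: Optional[int] = None
    meeting_id_to_move: Optional[str] = None
    new_start_time: Optional[str] = None
    metadata: Dict[str, Any] = field(default_factory=dict)


def validate_action(action: ActionSketch) -> Optional[str]:
    """Return an error message for a malformed action, or None if it is well-formed."""
    if action.action_type not in REQUIRED_FIELDS:
        return f"Unknown action_type: {action.action_type!r}"
    missing = [name for name in REQUIRED_FIELDS[action.action_type]
               if getattr(action, name) is None]
    if missing:
        return f"{action.action_type} requires: {', '.join(missing)}"
    return None


print(validate_action(ActionSketch("propose_slot", proposed_start="2025-04-07T10:00:00+00:00")))
# propose_slot requires: proposed_duration
```

In the real pydantic model the same check could live in a model validator so malformed actions are rejected before they reach the environment.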
156
+
157
+ ### 3.2 Observation Model
158
+
159
+ ```python
160
+ class SchedulingObservation(Observation):
161
+ """What agent sees after each step"""
162
+
163
+ # Meeting request details
164
+ requested_duration: int # minutes
165
+ requested_priority: int # 1=highest, 4=lowest
166
+ attendee_ids: List[str] # e.g., ["user1", "user2"]
167
+
168
+ # Current calendar state (all attendees combined)
169
+ busy_slots: List[Dict[str, Any]]
170
+ # Format: [{start: ISO8601, end: ISO8601, priority: int, summary: str, attendee: str}]
171
+
172
+ # Working hours constraints (intersection of all attendees)
173
+ collective_work_hours: Dict[str, int] # {min_start_hour: int, max_end_hour: int}
174
+
175
+ # Preference summary (aggregated from all attendees)
176
+ preference_constraints: Dict[str, Any]
177
+ # {max_meetings_per_day: int, requires_buffer: bool, buffer_minutes: int}
178
+
179
+ # Current proposal state
180
+ current_proposal: Optional[Dict[str, str]] = None # {start: ISO8601, end: ISO8601}
181
+ conflicts: List[Dict[str, Any]] = [] # Meetings that conflict with current proposal
182
+
183
+ # Scoring metrics
184
+ preference_penalty: float = 0.0 # Current preference violation score
185
+ num_rescheduled: int = 0 # How many meetings moved so far
186
+
187
+ # Episode state
188
+ steps_taken: int
189
+ max_steps: int = 20 # Episode limit
190
+
191
+ # Status flags
192
+ success: bool = False # Slot found and validated
193
+ error_message: Optional[str] = None
194
+
195
+ # Standard OpenEnv fields (inherited)
196
+ done: bool = False
197
+ reward: float = 0.0
198
+ metadata: Dict[str, Any] = Field(default_factory=dict)
199
+ ```
200
+
201
+ **Design Rationale:**
202
+ - Agent sees full calendar state (all busy slots across attendees)
203
+ - Preferences are aggregated to collective constraints
204
+ - Current proposal and conflicts help agent track progress
205
+ - Error messages provide feedback on invalid actions
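`collective_work_hours` is described as the intersection of all attendees' hours: the latest start and the earliest end. A minimal sketch, assuming each attendee's window is already available as a `(start_hour, end_hour)` pair; where those windows come from is not specified here, and `collective_work_hours` as a function is hypothetical.

```python
from typing import Dict, Tuple


def collective_work_hours(windows: Dict[str, Tuple[int, int]]) -> Dict[str, int]:
    """Intersect per-attendee (start_hour, end_hour) windows.

    The result is empty (no legal slot) when min_start_hour >= max_end_hour.
    """
    if not windows:
        raise ValueError("need at least one attendee")
    return {
        "min_start_hour": max(start for start, _ in windows.values()),
        "max_end_hour": min(end for _, end in windows.values()),
    }


# Hypothetical working hours for three attendees
print(collective_work_hours({"user1": (8, 17), "user2": (9, 18), "user3": (10, 16)}))
# {'min_start_hour': 10, 'max_end_hour': 16}
```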
206
+
207
+ ### 3.3 State Model
208
+
209
+ ```python
210
+ class SchedulingState(State):
211
+ """Internal environment state"""
212
+
213
+ # Standard fields (inherited from OpenEnv State)
214
+ episode_id: str # Unique UUID per episode
215
+ step_count: int # Number of steps taken
216
+
217
+ # Task info
218
+ task_id: str # e.g., "task1_easy", "task2_medium", "task3_hard"
219
+ scenario_name: str # Human-readable name
220
+
221
+ # Meeting request
222
+ meeting_request: Dict[str, Any]
223
+ # {duration: int, priority: int, attendees: List[str], summary: str}
224
+
225
+ # Calendar storage (BotBooked format)
226
+ calendars: Dict[str, List[Tuple[datetime, datetime, int, str]]]
227
+ # {user_id: [(start, end, priority, summary), ...]}
228
+
229
+ # Preferences (BotBooked format)
230
+ participant_preferences: Dict[str, Dict[str, Any]]
231
+ # {user_id: {preferred_hours: {start: int, end: int}, max_meetings_per_day: int,
232
+ # avoid_back_to_back: bool, buffer_minutes: int}}
233
+
234
+ # Tracking
235
+ proposed_slot: Optional[Tuple[datetime, datetime]] = None
236
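+ rescheduled_meetings: List[Dict[str, Any]] = []
236
+ # [{meeting_id: str, old_start: str, new_start: str, attendee: str}] (ISO8601 strings)
+ num_rescheduled: int = 0 # len(rescheduled_meetings), reported in observations
+ conflicts: List[Dict[str, Any]] = [] # Conflicts with the current proposal
+ attendee_ids: List[str] = [] # From meeting_request['attendees']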
238
+
239
+ # Performance metrics
240
+ total_preference_penalty: float = 0.0
241
+ total_steps: int = 0
242
+ final_reward: float = 0.0
243
+ completed: bool = False
244
+ ```
245
+
246
+ **Design Rationale:**
247
+ - Maintains BotBooked data format (minimal translation layer)
248
+ - Tracks rescheduling history for reward calculation
249
+ - Stores proposed slot for validation across steps
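Scenario files store each meeting as a `[start, end, priority, summary]` row (see the task JSON in section 6), while the state keeps BotBooked-style `(start, end, priority, summary)` tuples. A minimal conversion sketch, assuming ISO8601 strings with explicit offsets; `load_calendars` is a hypothetical helper.

```python
from datetime import datetime
from typing import Any, Dict, List, Tuple

Meeting = Tuple[datetime, datetime, int, str]  # (start, end, priority, summary)


def load_calendars(raw: Dict[str, List[List[Any]]]) -> Dict[str, List[Meeting]]:
    """Convert scenario JSON rows into tuples, sorted by start time per user."""
    calendars: Dict[str, List[Meeting]] = {}
    for user_id, rows in raw.items():
        meetings = [
            (datetime.fromisoformat(start), datetime.fromisoformat(end), int(priority), summary)
            for start, end, priority, summary in rows
        ]
        calendars[user_id] = sorted(meetings)
    return calendars


rows = {"user1": [
    ["2025-04-07T14:00:00+00:00", "2025-04-07T15:00:00+00:00", 3, "Client call"],
    ["2025-04-07T09:00:00+00:00", "2025-04-07T10:00:00+00:00", 2, "Morning standup"],
]}
calendars = load_calendars(rows)
print([summary for _, _, _, summary in calendars["user1"]])
# ['Morning standup', 'Client call']
```

`datetime.fromisoformat` (Python 3.7+) keeps the `+00:00` offset, so the resulting datetimes are timezone-aware and compare correctly with other aware datetimes.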
250
+
251
+ ---
252
+
253
+ ## 4. Episode Flow
254
+
255
+ ### 4.1 Episode Lifecycle
256
+
257
+ ```
258
+ ┌─────────────────────────────────────────────────────────┐
259
+ │ RESET │
260
+ ├─────────────────────────────────────────────────────────┤
261
+ │ 1. Load scenario JSON (task1/task2/task3) │
262
+ │ 2. Initialize calendars with existing meetings │
263
+ │ 3. Load participant preferences │
264
+ │ 4. Generate meeting request │
265
+ │ 5. Calculate collective working hours │
266
+ │ 6. Return initial observation (done=False, reward=0.0) │
267
+ └─────────────────────────────────────────────────────────┘
268
+
269
+ ┌─────────────────────────────────────────────────────────┐
270
+ │ STEP LOOP (max 20 steps) │
271
+ ├─────────────────────────────────────────────────────────┤
272
+ │ Agent submits action → Environment processes → Returns │
273
+ │ │
274
+ │ Action Processing: │
275
+ │ • propose_slot: Validate time, check conflicts, score │
276
+ │ • reschedule_meeting: Move meeting, update calendars │
277
+ │ • finalize: Calculate final reward, end episode │
278
+ │ • reject: End episode with failure (reward=0.0) │
279
+ │ │
280
+ │ Returns: Observation(reward, done, success, conflicts) │
281
+ └─────────────────────────────────────────────────────────┘
282
+
283
+ ┌─────────────────────────────────────────────────────────┐
284
+ │ EPISODE END │
285
+ ├─────────────────────────────────────────────────────────┤
286
+ │ Termination Conditions: │
287
+ │ ✓ Agent calls "finalize" with valid schedule │
288
+ │ ✓ Agent calls "reject" (failure) │
289
+ │ ✓ Max steps reached (20 steps timeout) │
290
+ │ ✓ Hard constraint violated │
291
+ │ │
292
+ │ Final reward: calculate_final_reward() → [0.0, 1.0] │
293
+ └─────────────────────────────────────────────────────────┘
294
+ ```
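The lifecycle above can be sketched as a loop over scripted actions. This is a minimal illustration, assuming stub handlers that return `(step_reward, done)`; `run_episode` and the stubs are hypothetical, the hard-constraint exit is not modeled, and since the lifecycle does not say whether the timeout reward replaces or adds to the last step's reward, the sketch replaces it.

```python
from typing import Callable, Dict, List, Tuple

MAX_STEPS = 20
Handler = Callable[[], Tuple[float, bool]]  # returns (step_reward, done)


def run_episode(actions: List[str], handlers: Dict[str, Handler],
                on_timeout: Callable[[], float]) -> Tuple[List[float], str]:
    """Play scripted action types until one ends the episode or MAX_STEPS is reached.

    Returns the per-step rewards and the reason the loop stopped.
    """
    rewards: List[float] = []
    for step, action_type in enumerate(actions, start=1):
        reward, done = handlers[action_type]()
        if done:  # finalize or reject (successful or not)
            rewards.append(reward)
            return rewards, action_type
        if step == MAX_STEPS:
            # Assumption: the timeout's partial credit replaces this step's reward.
            rewards.append(on_timeout())
            return rewards, "timeout"
        rewards.append(reward)
    return rewards, "in_progress"


# Stub handlers standing in for the real action processors.
stub_handlers: Dict[str, Handler] = {
    "propose_slot": lambda: (0.5, False),
    "finalize": lambda: (0.97, True),
    "reject": lambda: (0.0, True),
}
print(run_episode(["propose_slot", "finalize"], stub_handlers, lambda: 0.0))
# ([0.5, 0.97], 'finalize')
```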
295
+
296
+ ### 4.2 Action Processing
297
+
298
+ #### 4.2.1 propose_slot
299
+
300
+ ```python
301
+ def _process_propose_slot(action: SchedulingAction) -> SchedulingObservation:
302
+ """
303
+ Agent proposes a time slot for the meeting
304
+
305
+ Steps:
306
+ 1. Parse proposed_start and calculate proposed_end
307
+ 2. Validate slot is within collective working hours
308
+ 3. Find conflicts with existing meetings
309
+ 4. Calculate preference penalty score
310
+ 5. Update state with proposal
311
+ 6. Return observation with step reward
312
+
313
+ Step Rewards:
314
+ - +0.5: No conflicts, low preference penalty (<100)
+ - 0.0: No conflicts, but high preference penalty (>=100)
315
+ - +0.2: Conflicts exist, but all are lower priority (reschedulable)
316
+ - -0.3: Conflicts with higher priority meetings (invalid)
317
+ - -0.2: Outside working hours (hard constraint violation)
318
+ """
319
+
320
+ start_time = parse_iso8601(action.proposed_start)
321
+ end_time = start_time + timedelta(minutes=action.proposed_duration)
322
+
323
+ # Validate working hours
324
+ if not within_collective_hours(start_time, end_time, collective_work_hours):
325
+ return SchedulingObservation(
326
+ error_message="Proposed slot outside working hours",
327
+ reward=-0.2,
328
+ done=False
329
+ )
330
+
331
+ # Find conflicts
332
+ conflicts = []
333
+ for attendee in attendee_ids:
334
+ for start, end, priority, summary in calendars[attendee]: # BotBooked (start, end, priority, summary) tuples
335
+ if overlaps(start_time, end_time, start, end):
336
+ conflicts.append({
337
+ 'attendee': attendee,
338
+ 'start': start,
339
+ 'end': end,
340
+ 'priority': priority,
341
+ 'summary': summary,
342
+ 'meeting_id': f"{attendee}_{start.isoformat()}"
343
+ })
344
+
345
+ # Calculate preference penalty
346
+ preference_penalty = calculate_preference_score(
347
+ start_time,
348
+ action.proposed_duration,
349
+ participant_preferences
350
+ )
351
+
352
+ # Update state
353
+ state.proposed_slot = (start_time, end_time)
354
+ state.total_preference_penalty = preference_penalty
+ state.conflicts = conflicts # Looked up by reschedule_meeting
355
+
356
+ # Calculate step reward
357
+ if len(conflicts) == 0 and preference_penalty < 100:
358
+ step_reward = 0.5 # Perfect slot
359
+ elif len(conflicts) > 0:
360
+ if all(c['priority'] > requested_priority for c in conflicts):
361
+ step_reward = 0.2 # Reschedulable conflicts
362
+ else:
363
+ step_reward = -0.3 # Cannot reschedule (priority violation)
364
+ else:
365
+ step_reward = 0.0 # Free slot but high preference penalty
366
+
367
+ return SchedulingObservation(
368
+ current_proposal={'start': start_time.isoformat(), 'end': end_time.isoformat()},
369
+ conflicts=conflicts,
370
+ preference_penalty=preference_penalty,
371
+ reward=step_reward,
372
+ done=False
373
+ )
374
+ ```
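The conflict search and step-reward branches of `_process_propose_slot` can be exercised in isolation. A minimal sketch, assuming tuple calendars and half-open time intervals; `propose_step_reward` and `at` are hypothetical names, and the working-hours check (-0.2) is left out.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, List, Tuple

Meeting = Tuple[datetime, datetime, int, str]  # (start, end, priority, summary)


def overlaps(a_start: datetime, a_end: datetime, b_start: datetime, b_end: datetime) -> bool:
    """Half-open intervals: a meeting ending at 10:00 does not clash with one starting at 10:00."""
    return a_start < b_end and b_start < a_end


def propose_step_reward(start: datetime, duration_min: int, attendees: List[str],
                        calendars: Dict[str, List[Meeting]], requested_priority: int,
                        preference_penalty: float) -> Tuple[float, List[Meeting]]:
    """Step reward for propose_slot (working-hours check omitted) plus the conflicts found."""
    end = start + timedelta(minutes=duration_min)
    conflicts = [m for user in attendees for m in calendars[user]
                 if overlaps(start, end, m[0], m[1])]
    if not conflicts:
        return (0.5 if preference_penalty < 100 else 0.0), conflicts
    # Priority 1 is highest, so a larger number means the conflict can be moved.
    if all(m[2] > requested_priority for m in conflicts):
        return 0.2, conflicts
    return -0.3, conflicts


def at(hour: int, minute: int = 0) -> datetime:
    return datetime(2025, 4, 7, hour, minute, tzinfo=timezone.utc)


# Task 1 calendars from section 6.1
TASK1: Dict[str, List[Meeting]] = {
    "user1": [(at(9), at(10), 2, "Morning standup"), (at(14), at(15), 3, "Client call")],
    "user2": [(at(11), at(12), 2, "Team meeting"), (at(15), at(16), 3, "1-on-1")],
}
print(propose_step_reward(at(10), 30, ["user1", "user2"], TASK1, 3, 0.0))
# (0.5, [])
```

Proposing 14:00 instead collides with user1's priority-3 "Client call": that earns +0.2 for a priority-2 request but -0.3 for the priority-3 Task 1 request, since equal priority is not reschedulable.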
375
+
376
+ #### 4.2.2 reschedule_meeting
377
+
378
+ ```python
379
+ def _process_reschedule_meeting(action: SchedulingAction) -> SchedulingObservation:
380
+ """
381
+ Agent reschedules a conflicting meeting to a new time
382
+
383
+ Steps:
384
+ 1. Validate meeting_id exists and is in conflict list
385
+ 2. Check priority (can only reschedule lower priority)
386
+ 3. Validate new time slot is free for that attendee
387
+ 4. Remove old meeting from calendar
388
+ 5. Add meeting at new time
389
+ 6. Update rescheduled_meetings list
390
+ 7. Recalculate conflicts for current proposal
391
+ 8. Return observation with step reward
392
+
393
+ Step Rewards:
394
+ - +0.5: Successful reschedule and all conflicts now resolved
395
+ - +0.3: Successful reschedule but conflicts remain
396
+ - -0.2: New slot not free or invalid
397
+ - -0.5: Attempted to reschedule higher priority meeting
398
+ """
399
+
400
+ # Find meeting
401
+ meeting = find_meeting_by_id(action.meeting_id_to_move, state.conflicts)
402
+ if not meeting:
403
+ return SchedulingObservation(
404
+ error_message="Invalid meeting_id or not in conflict list",
405
+ reward=-0.2,
406
+ done=False
407
+ )
408
+
409
+ # Check priority
410
+ if meeting['priority'] <= state.meeting_request['priority']:
411
+ return SchedulingObservation(
412
+ error_message="Cannot reschedule higher or equal priority meeting",
413
+ reward=-0.5,
414
+ done=False
415
+ )
416
+
417
+ # Validate new slot
418
+ new_start = parse_iso8601(action.new_start_time)
419
+ meeting_duration = int((meeting['end'] - meeting['start']).total_seconds() // 60) # .seconds would drop whole days
420
+ new_end = new_start + timedelta(minutes=meeting_duration)
421
+
422
+ if not is_slot_free(meeting['attendee'], new_start, new_end, calendars):
423
+ return SchedulingObservation(
424
+ error_message="New slot not free",
425
+ reward=-0.2,
426
+ done=False
427
+ )
428
+
429
+ # Update calendar
430
+ remove_meeting(calendars[meeting['attendee']], meeting['start'])
431
+ add_meeting(
432
+ calendars[meeting['attendee']],
433
+ new_start,
434
+ new_end,
435
+ meeting['priority'],
436
+ meeting['summary']
437
+ )
438
+
439
+ # Track rescheduling
440
+ state.rescheduled_meetings.append({
441
+ 'meeting_id': action.meeting_id_to_move,
442
+ 'old_start': meeting['start'].isoformat(),
443
+ 'new_start': new_start.isoformat(),
444
+ 'attendee': meeting['attendee']
445
+ })
446
+ state.num_rescheduled += 1
447
+
448
+ # Recalculate conflicts
449
+ new_conflicts = find_conflicts(
450
+ calendars,
451
+ state.proposed_slot[0],
452
+ state.proposed_slot[1],
453
+ attendee_ids
454
+ )
455
+
456
+ # Step reward
457
+ if len(new_conflicts) == 0:
458
+ step_reward = 0.5 # All conflicts resolved!
459
+ else:
460
+ step_reward = 0.3 # Progress made
461
+
462
+ return SchedulingObservation(
463
+ conflicts=new_conflicts,
464
+ num_rescheduled=state.num_rescheduled,
465
+ reward=step_reward,
466
+ done=False
467
+ )
468
+ ```
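Steps 3-5 (check the target slot, remove the old entry, add the moved one) can be sketched as one pure function. This assumes tuple calendars as in `SchedulingState`; `move_meeting` is hypothetical and returns a new list instead of mutating in place, and it keeps the duration as a `timedelta` so no minute conversion is needed.

```python
from datetime import datetime, timezone
from typing import List, Tuple

Meeting = Tuple[datetime, datetime, int, str]  # (start, end, priority, summary)


def move_meeting(calendar: List[Meeting], old_start: datetime, new_start: datetime) -> List[Meeting]:
    """Return a copy of calendar with the meeting at old_start moved to new_start.

    Duration, priority and summary are preserved. Raises KeyError if no meeting
    starts at old_start and ValueError if the new slot overlaps another meeting.
    """
    idx = next((i for i, m in enumerate(calendar) if m[0] == old_start), None)
    if idx is None:
        raise KeyError(f"no meeting starts at {old_start.isoformat()}")
    start, end, priority, summary = calendar[idx]
    new_end = new_start + (end - start)  # keep the duration as a timedelta
    others = calendar[:idx] + calendar[idx + 1:]
    if any(new_start < o_end and o_start < new_end for o_start, o_end, _, _ in others):
        raise ValueError("new slot not free")
    return sorted(others + [(new_start, new_end, priority, summary)])


def at(hour: int, minute: int = 0) -> datetime:
    return datetime(2025, 4, 7, hour, minute, tzinfo=timezone.utc)


# user3's calendar from Task 2 (section 6.2)
USER3: List[Meeting] = [
    (at(9, 30), at(10, 30), 3, "Design review"),
    (at(12), at(13), 3, "Team lunch"),
    (at(14), at(15, 30), 2, "Sprint planning"),
    (at(16), at(16, 30), 4, "Coffee chat"),
]
moved = move_meeting(USER3, at(12), at(11))
print([(m[0].strftime("%H:%M"), m[3]) for m in moved])
# [('09:30', 'Design review'), ('11:00', 'Team lunch'), ('14:00', 'Sprint planning'), ('16:00', 'Coffee chat')]
```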
469
+
470
+ #### 4.2.3 finalize
471
+
472
+ ```python
473
+ def _process_finalize(action: SchedulingAction) -> SchedulingObservation:
474
+ """
475
+ Agent confirms schedule is optimal and ends episode
476
+
477
+ Steps:
478
+ 1. Validate proposed_slot exists
479
+ 2. Validate no unresolved conflicts
480
+ 3. Calculate final reward
481
+ 4. Mark episode as completed
482
+ 5. Return observation with done=True
483
+
484
+ Final Reward: calculate_final_reward(preference_penalty, num_rescheduled, steps)
485
+ """
486
+
487
+ # Validate state
488
+ if state.proposed_slot is None:
489
+ return SchedulingObservation(
490
+ error_message="No slot proposed",
491
+ success=False,
492
+ reward=-0.5,
493
+ done=True
494
+ )
495
+
496
+ # Check for unresolved conflicts
497
+ current_conflicts = find_conflicts(
498
+ calendars,
499
+ state.proposed_slot[0],
500
+ state.proposed_slot[1],
501
+ attendee_ids
502
+ )
503
+
504
+ if len(current_conflicts) > 0:
505
+ return SchedulingObservation(
506
+ error_message=f"Unresolved conflicts: {len(current_conflicts)} meetings",
507
+ conflicts=current_conflicts,
508
+ success=False,
509
+ reward=-0.3,
510
+ done=True
511
+ )
512
+
513
+ # Calculate final reward
514
+ final_reward = calculate_final_reward(
515
+ preference_penalty=state.total_preference_penalty,
516
+ num_rescheduled=state.num_rescheduled,
517
+ steps_taken=state.step_count
518
+ )
519
+
520
+ # Update state
521
+ state.completed = True
522
+ state.final_reward = final_reward
523
+
524
+ return SchedulingObservation(
525
+ success=True,
526
+ reward=final_reward,
527
+ done=True,
528
+ metadata={'final_slot': [t.isoformat() for t in state.proposed_slot]} # JSON-serializable
529
+ )
530
+ ```
531
+
532
+ #### 4.2.4 reject
533
+
534
+ ```python
535
+ def _process_reject(action: SchedulingAction) -> SchedulingObservation:
536
+ """
537
+ Agent gives up on finding a valid schedule
538
+
539
+ Returns: Observation with done=True, reward=0.0, success=False
540
+ """
541
+
542
+ return SchedulingObservation(
543
+ success=False,
544
+ reward=0.0,
545
+ done=True,
546
+ error_message="Agent rejected scheduling task"
547
+ )
548
+ ```
549
+
550
+ ### 4.3 Episode Termination
551
+
552
+ | Condition | done=True | Final Reward | Success |
553
+ |-----------|-----------|--------------|---------|
554
+ | Agent calls `finalize` with valid schedule | ✅ | calculate_final_reward() | ✅ |
555
+ | Agent calls `finalize` with conflicts | ✅ | -0.3 | ❌ |
556
+ | Agent calls `reject` | ✅ | 0.0 | ❌ |
557
+ | Max steps reached (20) | ✅ | `_handle_timeout()` partial credit | ❌ |
558
+ | Agent calls `finalize` with no proposed slot | ✅ | -0.5 | ❌ |
559
+
560
+ **IMPORTANT - Partial Credit on Timeout**:
561
+ To meet hackathon requirement "reward partial progress," we give partial credit when max steps reached:
562
+
563
+ ```python
564
+ def _handle_timeout(state: SchedulingState) -> SchedulingObservation:
565
+ """Give partial credit if agent made progress before timeout"""
566
+
567
+ # No proposal at all - complete failure
568
+ if state.proposed_slot is None:
569
+ return SchedulingObservation(
570
+ success=False,
571
+ reward=0.0,
572
+ done=True,
573
+ error_message="Timeout: No slot proposed"
574
+ )
575
+
576
+ # Has proposal - check if it's valid
577
+ conflicts = find_conflicts(
578
+ state.calendars,
579
+ state.proposed_slot[0],
580
+ state.proposed_slot[1],
581
+ state.attendee_ids
582
+ )
583
+
584
+ if len(conflicts) == 0:
585
+ # Valid slot found, just didn't finalize in time
586
+ # Give 70% of what final score would have been
587
+ theoretical_score = calculate_final_reward(
588
+ state.total_preference_penalty,
589
+ state.num_rescheduled,
590
+ state.step_count
591
+ )
592
+ partial_reward = theoretical_score * 0.7
593
+ else:
594
+ # Made progress but still has conflicts
595
+ # Give credit based on how close to solution
596
+ progress = max(0.0, 1.0 - (len(conflicts) / max(1, len(state.attendee_ids)))) # conflicts can outnumber attendees
597
+ partial_reward = 0.2 * progress
598
+
599
+ return SchedulingObservation(
600
+ success=False, # Technically failed (timeout)
601
+ reward=partial_reward,
602
+ done=True,
603
+ error_message=f"Timeout after {state.step_count} steps (partial credit: {partial_reward:.2f})"
604
+ )
605
+ ```
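The branching in `_handle_timeout` reduces to a small pure function. This sketch (hypothetical name `timeout_partial_credit`) takes the counts directly and clamps `progress` at zero: one attendee can hold several conflicting meetings, so the number of conflicts can exceed the number of attendees and would otherwise produce a negative reward.

```python
def timeout_partial_credit(has_proposal: bool, num_conflicts: int,
                           num_attendees: int, theoretical_score: float) -> float:
    """Partial credit when the step limit is hit, following _handle_timeout's branches."""
    if not has_proposal:
        return 0.0                      # nothing proposed: no credit
    if num_conflicts == 0:
        return 0.7 * theoretical_score  # valid slot, just never finalized
    # One attendee can hold several conflicting meetings, so clamp at zero.
    progress = max(0.0, 1.0 - num_conflicts / max(1, num_attendees))
    return 0.2 * progress


print(timeout_partial_credit(True, 8, 6, 0.9))  # 0.0
```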
606
+
607
+ ---
608
+
609
+ ## 5. Reward Function
610
+
611
+ ### 5.1 Multi-Component Formula
612
+
613
+ ```python
614
+ def calculate_final_reward(
615
+ preference_penalty: float,
616
+ num_rescheduled: int,
617
+ steps_taken: int,
618
+ success: bool = True
619
+ ) -> float:
620
+ """
621
+ Calculate final episode reward (clamped to [0.0, 1.0])
622
+
623
+ Components (NON-LINEAR to prevent reward hacking):
624
+ 1. Base success: Start at 1.0
625
+ 2. Preference penalty: Non-linear scaling (BotBooked scoring: 0=perfect, 50=minor, 100+=severe)
626
+ 3. Efficiency penalty: EXPONENTIAL in meetings rescheduled, 0.05 * 1.8^n in total, capped at 0.30 (1 → 0.09, 2 → 0.16, 3 → 0.29)
627
+ 4. Time penalty: -0.015 per step taken
628
+
629
+ Returns: float in [0.0, 1.0] range
630
+
631
+ ANTI-REWARD-HACKING DESIGN:
632
+ - Preference penalty uses power scaling to make violations hurt more
633
+ - Rescheduling penalty is exponential (discourages cascading rescheduling)
634
+ - Time penalty increased from 0.01 to 0.015 (max penalty 0.30 at 20 steps)
635
+ """
636
+
637
+ if not success:
638
+ return 0.0
639
+
640
+ reward = 1.0
641
+
642
+ # Component 1: Preference penalty with power scaling
643
+ # 0-50 points → 0.00 to 0.25 deduction (exactly 0.25 at 50)
644
+ # 50-125 points → 0.25 to 0.75 deduction (grows faster than linear)
645
+ # 125+ points → capped at 0.75 (severe violations)
646
+ preference_deduction = min(0.75, 0.25 * (preference_penalty / 50.0) ** 1.2)
647
+ reward -= preference_deduction
648
+
649
+ # Component 2: EXPONENTIAL rescheduling penalty
650
+ # Prevents agents from over-rescheduling as a lazy strategy
651
+ if num_rescheduled > 0:
652
+ rescheduling_deduction = 0.05 * (1.8 ** num_rescheduled)
653
+ reward -= min(0.30, rescheduling_deduction)
654
+
655
+ # Component 3: Time penalty (encourage efficiency)
656
+ time_deduction = steps_taken * 0.015
657
+ reward -= time_deduction
658
+
659
+ # Clamp to valid range
660
+ return max(0.0, min(1.0, reward))
661
+ ```
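To check the worked examples in section 5.4, the three deductions can be computed in isolation. This is a standalone sketch (hypothetical name `final_reward`), assuming the preference deduction is normalized so that 50 penalty points cost exactly 0.25, as the scaling comments and the Task 2 example state; with that scaling it reproduces the Task 1 (0.97) and Task 2 (0.615) scores.

```python
def final_reward(preference_penalty: float, num_rescheduled: int, steps_taken: int) -> float:
    """Final reward for a successful finalize, clamped to [0.0, 1.0]."""
    # Assumption: deduction normalized so 50 points cost exactly 0.25, capped at 0.75.
    preference_deduction = min(0.75, 0.25 * (preference_penalty / 50.0) ** 1.2)
    # Total (not per-meeting) exponential rescheduling deduction, capped at 0.30.
    rescheduling_deduction = min(0.30, 0.05 * 1.8 ** num_rescheduled) if num_rescheduled > 0 else 0.0
    time_deduction = 0.015 * steps_taken
    return max(0.0, min(1.0, 1.0 - preference_deduction - rescheduling_deduction - time_deduction))


print(round(final_reward(0, 0, 2), 3))    # 0.97  (Task 1 example)
print(round(final_reward(50, 1, 3), 3))   # 0.615 (Task 2 example)
print(final_reward(120, 3, 5))            # 0.0   (clamped)
```

Because both the preference and rescheduling deductions are capped, a trajectory with no rescheduling and few steps can never score below roughly 0.25 minus the time penalty, which keeps hard-task scores from collapsing to zero purely on preferences.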
662
+
663
+ ### 5.2 Preference Penalty Calculation
664
+
665
+ ```python
666
+ def calculate_preference_score(
667
+ proposed_start: datetime,
668
+ duration: int,
669
+ participant_preferences: Dict[str, Dict]
670
+ ) -> float:
671
+ """
672
+ Calculate penalty points for preference violations (ported from BotBooked)
673
+
674
+ Violations per participant:
675
+ - Outside preferred hours: +50 points
676
+ - Exceeds max meetings per day: +30 points
677
+ - Back-to-back without buffer: +20 points
678
+
679
+ Returns: Sum of all penalties across participants
680
+ """
681
+
682
+ total_penalty = 0.0
683
+ proposed_end = proposed_start + timedelta(minutes=duration)
684
+
685
+ for user_id, prefs in participant_preferences.items():
686
+ user_penalty = 0.0
687
+
688
+ # Violation 1: Outside preferred hours
689
+ pref_start = prefs.get('preferred_hours', {}).get('start', 9)
690
+ pref_end = prefs.get('preferred_hours', {}).get('end', 17)
691
+
692
+ if proposed_start.hour < pref_start or (proposed_end.hour, proposed_end.minute) > (pref_end, 0): # catches e.g. 17:30 vs end=17
693
+ user_penalty += 50
694
+
695
+ # Violation 2: Exceeds max meetings per day
696
+ max_meetings = prefs.get('max_meetings_per_day', 999)
697
+ meetings_on_day = count_meetings_on_date(
698
+ calendars[user_id],
699
+ proposed_start.date()
700
+ )
701
+ if meetings_on_day >= max_meetings:
702
+ user_penalty += 30
703
+
704
+ # Violation 3: Back-to-back without buffer
705
+ avoid_btb = prefs.get('avoid_back_to_back', False)
706
+ buffer_min = prefs.get('buffer_minutes', 0)
707
+
708
+ if avoid_btb and buffer_min > 0:
709
+ has_violation = check_back_to_back(
710
+ calendars[user_id],
711
+ proposed_start,
712
+ proposed_end,
713
+ buffer_min
714
+ )
715
+ if has_violation:
716
+ user_penalty += 20
717
+
718
+ total_penalty += user_penalty
719
+
720
+ return total_penalty
721
+ ```
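`calculate_preference_score` calls `count_meetings_on_date` and `check_back_to_back` without defining them. Hypothetical implementations, assuming tuple calendars and reading "back-to-back" as a gap shorter than `buffer_minutes` on either side of the proposal; overlapping meetings are skipped because they are reported as conflicts instead.

```python
from datetime import date, datetime, timedelta, timezone
from typing import List, Tuple

Meeting = Tuple[datetime, datetime, int, str]  # (start, end, priority, summary)


def count_meetings_on_date(calendar: List[Meeting], day: date) -> int:
    """Number of existing meetings that start on the given calendar day."""
    return sum(1 for start, _, _, _ in calendar if start.date() == day)


def check_back_to_back(calendar: List[Meeting], start: datetime, end: datetime,
                       buffer_minutes: int) -> bool:
    """True if some meeting ends, or begins, within buffer_minutes of the proposal.

    Overlapping meetings are skipped here; they are reported as conflicts instead.
    """
    buffer = timedelta(minutes=buffer_minutes)
    for m_start, m_end, _, _ in calendar:
        if m_end <= start and start - m_end < buffer:  # too close before
            return True
        if m_start >= end and m_start - end < buffer:  # too close after
            return True
    return False


def at(hour: int, minute: int = 0) -> datetime:
    return datetime(2025, 4, 7, hour, minute, tzinfo=timezone.utc)


# user1's calendar from Task 2 (section 6.2); buffer_minutes = 15
USER1: List[Meeting] = [
    (at(9), at(10), 2, "Standup"),
    (at(10, 30), at(11, 30), 3, "Review"),
    (at(13), at(14), 3, "Lunch meeting"),
    (at(15), at(16), 4, "Optional workshop"),
    (at(16, 30), at(17), 3, "Sync"),
]
print(count_meetings_on_date(USER1, date(2025, 4, 7)))   # 5
print(check_back_to_back(USER1, at(12), at(13), 15))     # True: Lunch meeting starts at 13:00
```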
722
+
723
+ ### 5.3 Step Rewards (Dense Signal)
724
+
725
+ | Action Result | Step Reward | Reasoning |
726
+ |---------------|-------------|-----------|
727
+ | propose_slot: no conflicts, penalty < 100 | +0.5 | Perfect slot found |
+ | propose_slot: no conflicts, penalty ≥ 100 | 0.0 | Free slot, poor preference fit |
728
+ | propose_slot: conflicts but reschedulable | +0.2 | Valid proposal |
729
+ | propose_slot: conflicts with higher priority | -0.3 | Invalid choice |
730
+ | propose_slot: outside work hours | -0.2 | Hard constraint violation |
731
+ | reschedule_meeting: all conflicts resolved | +0.5 | Major progress |
732
+ | reschedule_meeting: success, conflicts remain | +0.3 | Incremental progress |
733
+ | reschedule_meeting: new slot not free | -0.2 | Failed attempt |
734
+ | reschedule_meeting: priority violation | -0.5 | Rule violation |
735
+ | finalize: valid schedule | final_reward | Success |
736
+ | finalize: unresolved conflicts | -0.3 | Premature |
737
+ | reject | 0.0 | Gave up |
738
+
739
+ ### 5.4 Score Examples
740
+
741
+ #### Task 1 (Easy) - Expected 0.90-0.98
742
+
743
+ ```
744
+ Scenario: 2 attendees, sparse calendars, loose preferences
745
+
746
+ Agent Trajectory:
747
+ Step 1: propose_slot(10:00 AM, 30 min)
748
+ → No conflicts, preference_penalty=0
749
+ → reward=+0.5
750
+
751
+ Step 2: finalize()
752
+ → final_reward = 1.0 - (0^1.2)/200 - 0 - 2*0.015
753
+ → final_reward = 1.0 - 0.0 - 0.0 - 0.03
754
+ → final_reward = 0.97
755
+
756
+ Score: 0.97 ✅
757
+ ```
758
+
759
+ #### Task 2 (Medium) - Expected 0.55-0.70
760
+
761
+ ```
762
+ Scenario: 4 attendees, moderate density, strict preferences
763
+
764
+ Agent Trajectory:
765
+ Step 1: propose_slot(2:00 PM, 60 min)
766
+ → 1 conflict (priority 4), preference_penalty=50
767
+ → reward=+0.2
768
+
769
+ Step 2: reschedule_meeting(conflict_id, 4:00 PM)
770
+ → Success, no more conflicts
771
+ → reward=+0.5
772
+
773
+ Step 3: finalize()
774
+ → final_reward = 1.0 - 0.25*(50/50)^1.2 - 0.05*(1.8^1) - 3*0.015
775
+ → final_reward = 1.0 - 0.25 - 0.09 - 0.045
776
+ → final_reward = 0.615
777
+
778
+ Score: 0.62 ✅ (Medium difficulty confirmed)
779
+ ```
780
+
781
+ #### Task 3 (Hard) - Expected 0.25-0.45
782
+
783
+ ```
784
+ Scenario: 6 attendees, dense calendars, conflicting preferences
785
+
786
+ Agent Trajectory:
787
+ Step 1: propose_slot(11:00 AM, 45 min)
788
+ → 3 conflicts, preference_penalty=120
789
+ → reward=+0.2
790
+
791
+ Step 2-4: reschedule 3 meetings
792
+ → rewards: +0.3, +0.3, +0.5
793
+
794
+ Step 5: finalize()
795
+ → final_reward = 1.0 - 0.25*(120/50)^1.2 - 0.05*(1.8^3) - 5*0.015
796
+ → final_reward ≈ 1.0 - 0.715 - 0.292 - 0.075
797
+ → final_reward = max(0.0, -0.082) = 0.0
798
+
799
+ Score: 0.0 ❌ (Too harsh! Adjust scenario or reduce penalties slightly)
800
+
801
+ CORRECTED Task 3 Trajectory (with preference_penalty=80):
802
+ → final_reward = 1.0 - 0.25*(80/50)^1.2 - 0.05*(1.8^3) - 5*0.015
803
+ → final_reward ≈ 1.0 - 0.439 - 0.292 - 0.075
804
+ → final_reward ≈ 0.194
805
+
806
+ Score: 0.19 ✅ (Hard but achievable)
807
+ ```
808
+
809
+ ---
810
+
811
+ ## 6. Task Scenarios
812
+
813
+ ### 6.1 Task 1: EASY - "Simple Team Sync"
814
+
815
+ **Description**: Schedule a 30-minute team sync with 2 attendees who have sparse calendars.
816
+
817
+ **Scenario JSON**: `server/scenarios/task1_easy.json`
818
+
819
+ ```json
820
+ {
821
+ "task_id": "task1_easy",
822
+ "description": "Schedule a 30-minute team sync with 2 attendees",
823
+ "difficulty": "easy",
824
+ "meeting_request": {
825
+ "duration": 30,
826
+ "priority": 3,
827
+ "attendees": ["user1", "user2"],
828
+ "summary": "Team Sync"
829
+ },
830
+ "calendars": {
831
+ "user1": [
832
+ ["2025-04-07T09:00:00+00:00", "2025-04-07T10:00:00+00:00", 2, "Morning standup"],
833
+ ["2025-04-07T14:00:00+00:00", "2025-04-07T15:00:00+00:00", 3, "Client call"]
834
+ ],
835
+ "user2": [
836
+ ["2025-04-07T11:00:00+00:00", "2025-04-07T12:00:00+00:00", 2, "Team meeting"],
837
+ ["2025-04-07T15:00:00+00:00", "2025-04-07T16:00:00+00:00", 3, "1-on-1"]
838
+ ]
839
+ },
840
+ "preferences": {
841
+ "user1": {
842
+ "preferred_hours": {"start": 9, "end": 17},
843
+ "max_meetings_per_day": 6,
844
+ "avoid_back_to_back": false,
845
+ "buffer_minutes": 0
846
+ },
847
+ "user2": {
848
+ "preferred_hours": {"start": 9, "end": 17},
849
+ "max_meetings_per_day": 6,
850
+ "avoid_back_to_back": false,
851
+ "buffer_minutes": 0
852
+ }
853
+ },
854
+ "expected_solution": {
855
+ "optimal_slot": "2025-04-07T10:00:00+00:00",
856
+ "expected_score_range": [0.8, 1.0],
857
+ "min_steps": 2,
858
+ "requires_rescheduling": false
859
+ }
860
+ }
861
+ ```
862
+
863
+ **Characteristics:**
864
+ - 2 attendees (low coordination complexity)
865
+ - Sparse calendars (2 meetings each)
866
+ - Loose preferences (no back-to-back rules, wide hours)
867
+ - Multiple free slots available
868
+ - No rescheduling required
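A brute-force scan confirms the "multiple free slots" claim and the `optimal_slot` above. This is a sketch in the spirit of `find_earliest_slot()` from the architecture diagram, not its actual implementation; `free_slots` is hypothetical, and it assumes a 09:00-17:00 day on a 30-minute grid.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, List, Tuple

Meeting = Tuple[datetime, datetime, int, str]  # (start, end, priority, summary)


def free_slots(calendars: Dict[str, List[Meeting]], attendees: List[str],
               day_start: datetime, day_end: datetime,
               duration_min: int, step_min: int = 30) -> List[datetime]:
    """Start times on a fixed grid where every attendee is free for duration_min."""
    duration = timedelta(minutes=duration_min)
    step = timedelta(minutes=step_min)
    slots: List[datetime] = []
    t = day_start
    while t + duration <= day_end:
        busy = any(t < m_end and m_start < t + duration
                   for user in attendees for m_start, m_end, _, _ in calendars[user])
        if not busy:
            slots.append(t)
        t += step
    return slots


def at(hour: int, minute: int = 0) -> datetime:
    return datetime(2025, 4, 7, hour, minute, tzinfo=timezone.utc)


TASK1: Dict[str, List[Meeting]] = {
    "user1": [(at(9), at(10), 2, "Morning standup"), (at(14), at(15), 3, "Client call")],
    "user2": [(at(11), at(12), 2, "Team meeting"), (at(15), at(16), 3, "1-on-1")],
}
slots = free_slots(TASK1, ["user1", "user2"], at(9), at(17), 30)
print([s.strftime("%H:%M") for s in slots])
# ['10:00', '10:30', '12:00', '12:30', '13:00', '13:30', '16:00', '16:30']
```

The earliest result, `2025-04-07T10:00:00+00:00`, matches `optimal_slot` in the scenario JSON.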
869
+
870
+ **Grading:**
871
+ - ✅ 0.8-1.0: Agent finds free slot in 2-4 steps
872
+ - ⚠️ 0.5-0.8: Agent finds slot but inefficient (many steps)
873
+ - ❌ 0.0-0.5: Agent fails or violates constraints
874
+
875
+ ### 6.2 Task 2: MEDIUM - "Cross-Team Planning"
876
+
877
+ **Description**: Schedule a 60-minute planning session with 4 attendees with moderate calendar density.
878
+
879
+ **Scenario JSON**: `server/scenarios/task2_medium.json`
880
+
881
+ ```json
882
+ {
883
+ "task_id": "task2_medium",
884
+ "description": "Schedule a 60-minute planning session with 4 attendees",
885
+ "difficulty": "medium",
886
+ "meeting_request": {
887
+ "duration": 60,
888
+ "priority": 2,
889
+ "attendees": ["user1", "user2", "user3", "user4"],
890
+ "summary": "Cross-Team Planning"
891
+ },
892
+ "calendars": {
893
+ "user1": [
894
+ ["2025-04-07T09:00:00+00:00", "2025-04-07T10:00:00+00:00", 2, "Standup"],
895
+ ["2025-04-07T10:30:00+00:00", "2025-04-07T11:30:00+00:00", 3, "Review"],
896
+ ["2025-04-07T13:00:00+00:00", "2025-04-07T14:00:00+00:00", 3, "Lunch meeting"],
897
+ ["2025-04-07T15:00:00+00:00", "2025-04-07T16:00:00+00:00", 4, "Optional workshop"],
898
+ ["2025-04-07T16:30:00+00:00", "2025-04-07T17:00:00+00:00", 3, "Sync"]
899
+ ],
900
+ "user2": [
901
+ ["2025-04-07T09:00:00+00:00", "2025-04-07T10:00:00+00:00", 2, "Standup"],
902
+ ["2025-04-07T11:00:00+00:00", "2025-04-07T12:00:00+00:00", 2, "Client demo"],
903
+ ["2025-04-07T14:00:00+00:00", "2025-04-07T15:00:00+00:00", 3, "Code review"],
904
+ ["2025-04-07T16:00:00+00:00", "2025-04-07T17:00:00+00:00", 3, "Office hours"]
905
+ ],
906
+ "user3": [
907
+ ["2025-04-07T09:30:00+00:00", "2025-04-07T10:30:00+00:00", 3, "Design review"],
908
+ ["2025-04-07T12:00:00+00:00", "2025-04-07T13:00:00+00:00", 3, "Team lunch"],
909
+ ["2025-04-07T14:00:00+00:00", "2025-04-07T15:30:00+00:00", 2, "Sprint planning"],
910
+ ["2025-04-07T16:00:00+00:00", "2025-04-07T16:30:00+00:00", 4, "Coffee chat"]
911
+ ],
912
+ "user4": [
913
+ ["2025-04-07T10:00:00+00:00", "2025-04-07T11:00:00+00:00", 2, "Strategy meeting"],
914
+ ["2025-04-07T13:00:00+00:00", "2025-04-07T14:00:00+00:00", 3, "1-on-1"],
915
+ ["2025-04-07T15:00:00+00:00", "2025-04-07T16:00:00+00:00", 3, "Team sync"]
916
+ ]
917
+ },
918
+ "preferences": {
919
+ "user1": {
920
+ "preferred_hours": {"start": 10, "end": 16},
921
+ "max_meetings_per_day": 5,
922
+ "avoid_back_to_back": true,
923
+ "buffer_minutes": 15
924
+ },
925
+ "user2": {
926
+ "preferred_hours": {"start": 9, "end": 17},
927
+ "max_meetings_per_day": 4,
928
+ "avoid_back_to_back": true,
929
+ "buffer_minutes": 10
930
+ },
931
+ "user3": {
932
+ "preferred_hours": {"start": 9, "end": 15},
933
+ "max_meetings_per_day": 5,
934
+ "avoid_back_to_back": false,
935
+ "buffer_minutes": 0
936
+ },
937
+ "user4": {
938
+ "preferred_hours": {"start": 10, "end": 17},
939
+ "max_meetings_per_day": 6,
940
+ "avoid_back_to_back": true,
941
+ "buffer_minutes": 15
942
+ }
943
+ },
944
+ "expected_solution": {
945
+ "optimal_slot": "2025-04-07T12:00:00+00:00",
946
+ "expected_score_range": [0.5, 0.7],
947
+ "min_steps": 3,
948
+ "requires_rescheduling": true,
949
+ "reschedulable_meetings": ["user3:Team lunch (priority 3)"]
950
+ }
951
+ }
952
+ ```
953
+
954
+ **Characteristics:**
955
+ - 4 attendees (moderate coordination)
956
+ - Moderate calendar density (3-5 meetings each)
957
+ - Conflicting preferences (narrow vs. wide hours)
958
+ - Back-to-back avoidance rules
959
+ - Requires 1 rescheduling
960
+
961
+ **Grading:**
962
+ - ✅ 0.6-0.7: Efficient rescheduling, respects preferences
963
+ - ⚠️ 0.5-0.6: Valid solution with preference violations
964
+ - ❌ 0.0-0.5: Excessive rescheduling or failure
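To see which Task 2 slots are reachable at all, the calendars can be scanned for 60-minute windows whose every conflict has lower priority than the request (priority number greater than 2). This sketch assumes a 09:00-17:00 day on a 30-minute grid and ignores soft preferences; `reschedulable_windows` is hypothetical. It finds no conflict-free window, and the cheapest candidate is 12:00-13:00, which only requires moving user3's "Team lunch".

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, List, Tuple

Meeting = Tuple[datetime, datetime, int, str]  # (start, end, priority, summary)
Conflict = Tuple[str, str, int]                # (user_id, summary, priority)


def reschedulable_windows(calendars: Dict[str, List[Meeting]], requested_priority: int,
                          day_start: datetime, day_end: datetime, duration_min: int,
                          step_min: int = 30) -> List[Tuple[datetime, List[Conflict]]]:
    """Windows whose conflicts are all lower priority than the request, fewest conflicts first."""
    duration, step = timedelta(minutes=duration_min), timedelta(minutes=step_min)
    results: List[Tuple[datetime, List[Conflict]]] = []
    t = day_start
    while t + duration <= day_end:
        conflicts = [(user, summary, prio)
                     for user, meetings in calendars.items()
                     for m_start, m_end, prio, summary in meetings
                     if t < m_end and m_start < t + duration]
        if all(prio > requested_priority for _, _, prio in conflicts):
            results.append((t, conflicts))
        t += step
    results.sort(key=lambda item: (len(item[1]), item[0]))
    return results


def at(hour: int, minute: int = 0) -> datetime:
    return datetime(2025, 4, 7, hour, minute, tzinfo=timezone.utc)


# Task 2 calendars from the scenario JSON above
TASK2: Dict[str, List[Meeting]] = {
    "user1": [(at(9), at(10), 2, "Standup"), (at(10, 30), at(11, 30), 3, "Review"),
              (at(13), at(14), 3, "Lunch meeting"), (at(15), at(16), 4, "Optional workshop"),
              (at(16, 30), at(17), 3, "Sync")],
    "user2": [(at(9), at(10), 2, "Standup"), (at(11), at(12), 2, "Client demo"),
              (at(14), at(15), 3, "Code review"), (at(16), at(17), 3, "Office hours")],
    "user3": [(at(9, 30), at(10, 30), 3, "Design review"), (at(12), at(13), 3, "Team lunch"),
              (at(14), at(15, 30), 2, "Sprint planning"), (at(16), at(16, 30), 4, "Coffee chat")],
    "user4": [(at(10), at(11), 2, "Strategy meeting"), (at(13), at(14), 3, "1-on-1"),
              (at(15), at(16), 3, "Team sync")],
}
for start, conflicts in reschedulable_windows(TASK2, 2, at(9), at(17), 60):
    print(start.strftime("%H:%M"), [f"{user}:{summary}" for user, summary, _ in conflicts])
```

Windows such as 11:00 are excluded because they collide with a priority-2 meeting (user2's "Client demo"), which the rules forbid moving for a priority-2 request.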
965
+
966
+ ### 6.3 Task 3: HARD - "Executive Scheduling"
967
+
968
+ **Description**: Schedule a 45-minute executive meeting with 6 attendees with very dense calendars.
969
+
970
+ **Scenario JSON**: `server/scenarios/task3_hard.json`
971
+
972
+ ```json
973
+ {
974
+ "task_id": "task3_hard",
975
+ "description": "Schedule a 45-minute executive meeting with 6 attendees",
976
+ "difficulty": "hard",
977
+ "meeting_request": {
978
+ "duration": 45,
979
+ "priority": 2,
980
+ "attendees": ["user1", "user2", "user3", "user4", "user5", "user6"],
981
+ "summary": "Executive Planning Session"
982
+ },
983
+ "calendars": {
984
+ "user1": [
985
+ ["2025-04-07T09:00:00+00:00", "2025-04-07T10:00:00+00:00", 2, "Strategy meeting"],
986
+ ["2025-04-07T10:30:00+00:00", "2025-04-07T11:30:00+00:00", 3, "Team standup"],
987
+ ["2025-04-07T12:00:00+00:00", "2025-04-07T13:00:00+00:00", 3, "Lunch meeting"],
988
+ ["2025-04-07T13:30:00+00:00", "2025-04-07T14:30:00+00:00", 2, "Client call"],
989
+ ["2025-04-07T15:00:00+00:00", "2025-04-07T15:45:00+00:00", 4, "Optional training"],
990
+ ["2025-04-07T16:00:00+00:00", "2025-04-07T17:00:00+00:00", 3, "Project sync"]
991
+ ],
992
+ "user2": [
993
+ ["2025-04-07T09:00:00+00:00", "2025-04-07T09:30:00+00:00", 2, "Morning sync"],
994
+ ["2025-04-07T10:00:00+00:00", "2025-04-07T11:00:00+00:00", 2, "Design review"],
995
+ ["2025-04-07T11:30:00+00:00", "2025-04-07T12:30:00+00:00", 3, "Code review"],
996
+ ["2025-04-07T13:00:00+00:00", "2025-04-07T14:00:00+00:00", 3, "1-on-1"],
997
+ ["2025-04-07T14:30:00+00:00", "2025-04-07T15:30:00+00:00", 2, "Planning meeting"],
998
+ ["2025-04-07T16:00:00+00:00", "2025-04-07T16:45:00+00:00", 4, "Coffee chat"]
999
+ ],
1000
+ "user3": [
1001
+ ["2025-04-07T09:30:00+00:00", "2025-04-07T10:30:00+00:00", 3, "Sprint planning"],
1002
+ ["2025-04-07T11:00:00+00:00", "2025-04-07T12:00:00+00:00", 2, "Architecture review"],
1003
+ ["2025-04-07T12:30:00+00:00", "2025-04-07T13:30:00+00:00", 3, "Team lunch"],
1004
+ ["2025-04-07T14:00:00+00:00", "2025-04-07T15:00:00+00:00", 2, "Client demo"],
1005
+ ["2025-04-07T15:30:00+00:00", "2025-04-07T16:15:00+00:00", 4, "Office hours"]
1006
+ ],
1007
+ "user4": [
1008
+ ["2025-04-07T10:00:00+00:00", "2025-04-07T11:00:00+00:00", 2, "Board meeting"],
1009
+ ["2025-04-07T11:30:00+00:00", "2025-04-07T12:30:00+00:00", 3, "Product review"],
1010
+ ["2025-04-07T13:00:00+00:00", "2025-04-07T14:00:00+00:00", 2, "Executive sync"],
1011
+ ["2025-04-07T14:30:00+00:00", "2025-04-07T15:30:00+00:00", 3, "Team meeting"],
1012
+ ["2025-04-07T16:00:00+00:00", "2025-04-07T17:00:00+00:00", 4, "Mentor session"]
1013
+ ],
1014
+ "user5": [
1015
+ ["2025-04-07T09:00:00+00:00", "2025-04-07T10:00:00+00:00", 3, "Daily standup"],
1016
+ ["2025-04-07T10:30:00+00:00", "2025-04-07T11:30:00+00:00", 2, "Strategic planning"],
1017
+ ["2025-04-07T12:00:00+00:00", "2025-04-07T13:00:00+00:00", 3, "Working lunch"],
1018
+ ["2025-04-07T13:30:00+00:00", "2025-04-07T14:30:00+00:00", 3, "Performance review"],
1019
+ ["2025-04-07T15:00:00+00:00", "2025-04-07T16:00:00+00:00", 2, "Budget meeting"],
1020
+ ["2025-04-07T16:30:00+00:00", "2025-04-07T17:00:00+00:00", 4, "Optional networking"]
1021
+ ],
1022
+ "user6": [
1023
+ ["2025-04-07T09:30:00+00:00", "2025-04-07T10:30:00+00:00", 2, "Leadership meeting"],
1024
+ ["2025-04-07T11:00:00+00:00", "2025-04-07T12:00:00+00:00", 3, "Project checkpoint"],
1025
+ ["2025-04-07T12:30:00+00:00", "2025-04-07T13:30:00+00:00", 3, "Team sync"],
1026
+ ["2025-04-07T14:00:00+00:00", "2025-04-07T15:00:00+00:00", 2, "Client meeting"],
1027
+ ["2025-04-07T15:30:00+00:00", "2025-04-07T16:30:00+00:00", 4, "Training session"]
1028
+ ]
1029
+ },
1030
+ "preferences": {
1031
+ "user1": {
1032
+ "preferred_hours": {"start": 10, "end": 16},
1033
+ "max_meetings_per_day": 5,
1034
+ "avoid_back_to_back": true,
1035
+ "buffer_minutes": 15
1036
+ },
1037
+ "user2": {
1038
+ "preferred_hours": {"start": 9, "end": 17},
1039
+ "max_meetings_per_day": 5,
1040
+ "avoid_back_to_back": true,
1041
+ "buffer_minutes": 15
1042
+ },
1043
+ "user3": {
1044
+ "preferred_hours": {"start": 9, "end": 15},
1045
+ "max_meetings_per_day": 4,
1046
+ "avoid_back_to_back": true,
1047
+ "buffer_minutes": 20
1048
+ },
1049
+ "user4": {
1050
+ "preferred_hours": {"start": 10, "end": 17},
1051
+ "max_meetings_per_day": 6,
1052
+ "avoid_back_to_back": true,
1053
+ "buffer_minutes": 10
1054
+ },
1055
+ "user5": {
1056
+ "preferred_hours": {"start": 9, "end": 16},
1057
+ "max_meetings_per_day": 5,
1058
+ "avoid_back_to_back": true,
1059
+ "buffer_minutes": 15
1060
+ },
1061
+ "user6": {
1062
+ "preferred_hours": {"start": 9, "end": 16},
1063
+ "max_meetings_per_day": 5,
1064
+ "avoid_back_to_back": true,
1065
+ "buffer_minutes": 10
1066
+ }
1067
+ },
1068
+ "expected_solution": {
1069
+ "optimal_slot": "2025-04-07T15:00:00+00:00",
1070
+ "expected_score_range": [0.25, 0.45],
1071
+ "min_steps": 5,
1072
+ "requires_rescheduling": true,
1073
+ "reschedulable_meetings": [
1074
+ "user1:Optional training (priority 4)",
1075
+ "user2:Coffee chat (priority 4)",
1076
+ "user5:Optional networking (priority 4)"
1077
+ ],
1078
+ "notes": "Multiple valid solutions exist. Agent must reschedule 3+ low-priority meetings."
1079
+ }
1080
+ }
1081
+ ```
1082
+
1083
+ **Characteristics:**
1084
+ - 6 attendees (high coordination complexity)
1085
+ - Dense calendars (5-6 meetings each)
1086
+ - Conflicting narrow preference windows (user3: 9-15, user1: 10-16)
1087
+ - Five of six users already meet or exceed max_meetings_per_day before the new meeting (user4 has 5 of 6)
1088
+ - Requires rescheduling 3+ meetings
1089
+ - Cascading rescheduling needed
1090
+
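One quick way to sanity-check an `expected_solution` is to list every meeting that overlaps the proposed slot. You can then apply the same `priority <= request priority` rule that the grader in Section 7 treats as a violation.

The sketch below is illustrative. `find_conflicts` is a hypothetical helper. `CALENDARS` is a hand-copied subset of task3_hard.json: it contains every meeting that overlaps 15:00-15:45, plus two that end exactly at 15:00.

On this data the sketch reports six overlapping meetings. Two of them have priority 2: user2's Planning meeting and user5's Budget meeting. So the listed `optimal_slot` would appear to require moving meetings that the grader forbids. Two of the three `reschedulable_meetings` (user2's Coffee chat and user5's Optional networking) also start after the slot ends.

```python
from datetime import datetime, timedelta
from typing import Dict, List, Tuple

# Hand-copied subset of task3_hard.json: every meeting overlapping 15:00-15:45,
# plus two meetings that end exactly at 15:00 (to exercise the boundary).
CALENDARS: Dict[str, List[Tuple[str, str, int, str]]] = {
    "user1": [("2025-04-07T15:00:00+00:00", "2025-04-07T15:45:00+00:00", 4, "Optional training")],
    "user2": [("2025-04-07T14:30:00+00:00", "2025-04-07T15:30:00+00:00", 2, "Planning meeting")],
    "user3": [("2025-04-07T14:00:00+00:00", "2025-04-07T15:00:00+00:00", 2, "Client demo"),
              ("2025-04-07T15:30:00+00:00", "2025-04-07T16:15:00+00:00", 4, "Office hours")],
    "user4": [("2025-04-07T14:30:00+00:00", "2025-04-07T15:30:00+00:00", 3, "Team meeting")],
    "user5": [("2025-04-07T15:00:00+00:00", "2025-04-07T16:00:00+00:00", 2, "Budget meeting")],
    "user6": [("2025-04-07T14:00:00+00:00", "2025-04-07T15:00:00+00:00", 2, "Client meeting"),
              ("2025-04-07T15:30:00+00:00", "2025-04-07T16:30:00+00:00", 4, "Training session")],
}


def find_conflicts(calendars, start: datetime, duration_minutes: int) -> List[Tuple[str, str, int]]:
    """Return (user, summary, priority) for each meeting overlapping [start, start + duration)."""
    end = start + timedelta(minutes=duration_minutes)
    conflicts = []
    for user, meetings in calendars.items():
        for m_start, m_end, priority, summary in meetings:
            # Half-open intervals: a meeting that ends exactly at `start` is not a conflict.
            if datetime.fromisoformat(m_start) < end and datetime.fromisoformat(m_end) > start:
                conflicts.append((user, summary, priority))
    return conflicts


REQUEST_PRIORITY = 2
slot = datetime.fromisoformat("2025-04-07T15:00:00+00:00")
conflicts = find_conflicts(CALENDARS, slot, 45)
# Same rule as the grader's Violation 1: only meetings with a larger priority number may be moved.
blocking = [c for c in conflicts if c[2] <= REQUEST_PRIORITY]
print(f"{len(conflicts)} conflicts, blocking: {blocking}")
```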
1091
+ **Grading:**
1092
+ - ✅ 0.3-0.45: Successfully reschedules 3+ meetings in 5-8 steps
1093
+ - ⚠️ 0.2-0.3: Valid solution but excessive steps/rescheduling
1094
+ - ❌ 0.0-0.2: Gives up or violates priority rules
1095
+
1096
+ ---
1097
+
1098
+ ## 7. Grader Implementation
1099
+
1100
+ ```python
1101
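+ import logging
+ from typing import List
+
+ # SchedulingState and SchedulingObservation are the environment's state/observation models.
+ # find_original_meeting(), calculate_collective_hours() and find_overlapping_meetings() are helpers not defined in this section.
+ logger = logging.getLogger(__name__)
+
+ class SchedulingGrader: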
1102
+ """Programmatic grader for scheduling tasks"""
1103
+
1104
+ def grade_episode(
1105
+ self,
1106
+ task_id: str,
1107
+ final_state: SchedulingState,
1108
+ final_observation: SchedulingObservation
1109
+ ) -> float:
1110
+ """
1111
+ Calculate episode score in [0.0, 1.0] range
1112
+
1113
+ Process:
1114
+ 1. Check if successfully scheduled (done=True, success=True)
1115
+ 2. Use final_reward from calculate_final_reward()
1116
+ 3. Apply penalty for constraint violations
1117
+ 4. Return score
1118
+ """
1119
+
1120
+ # Failed to schedule
1121
+ if not final_state.completed or not final_observation.success:
1122
+ return 0.0
1123
+
1124
+ # Get final reward (already in [0.0, 1.0] range)
1125
+ score = final_state.final_reward
1126
+
1127
+ # Check for hard constraint violations
1128
+ violations = self._check_violations(final_state)
1129
+ if violations:
1130
+ # Severe penalty for violations
1131
+ score *= 0.5 # Cut score in half
1132
+ logger.warning(f"Constraint violations: {violations}")
1133
+
1134
+ return score
1135
+
1136
+ def _check_violations(self, state: SchedulingState) -> List[str]:
1137
+ """Detect hard constraint violations"""
1138
+ violations = []
1139
+
1140
+ # Violation 1: Rescheduled a meeting whose priority is equal to or higher than the request's (lower number = higher priority)
1141
+ for rescheduled in state.rescheduled_meetings:
1142
+ original_meeting = find_original_meeting(
1143
+ state.calendars,
1144
+ rescheduled['attendee'],
1145
+ rescheduled['old_start']
1146
+ )
1147
+ if original_meeting and original_meeting[2] <= state.meeting_request['priority']: # entries are (start, end, priority, summary) tuples
1148
+ violations.append(
1149
+ f"Rescheduled meeting of equal or higher priority: "
1150
+ f"{rescheduled['attendee']} {rescheduled['old_start']}"
1151
+ )
1152
+
1153
+ # Violation 2: Proposed slot outside collective working hours
1154
+ if state.proposed_slot:
1155
+ start, end = state.proposed_slot
1156
+ collective_hours = calculate_collective_hours(state.participant_preferences)
1157
+ if start.hour < collective_hours['min_start'] or (end.hour, end.minute) > (collective_hours['max_end'], 0): # also catches ends such as 17:30 when max_end is 17
1158
+ violations.append(
1159
+ f"Proposed slot outside working hours: "
1160
+ f"{start.isoformat()} to {end.isoformat()}"
1161
+ )
1162
+
1163
+ # Violation 3: Overlapping meetings after rescheduling
1164
+ for user_id, calendar in state.calendars.items():
1165
+ overlaps = find_overlapping_meetings(calendar)
1166
+ if overlaps:
1167
+ violations.append(f"Overlapping meetings for {user_id}: {overlaps}")
1168
+
1169
+ return violations
1170
+ ```
1171
+
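`_check_violations()` calls `find_overlapping_meetings()` without defining it. Here is a minimal sketch, assuming calendars hold `(start, end, priority, summary)` tuples as in Section 8.1, and that back-to-back meetings (one ending exactly when the next starts) do not overlap.

`grade_score()` is a hypothetical pure-function restatement of the scoring rule in `grade_episode()`. It is handy for unit-testing the penalty without a live environment.

```python
from datetime import datetime
from typing import List, Sequence, Tuple

Meeting = Tuple[datetime, datetime, int, str]  # (start, end, priority, summary)


def find_overlapping_meetings(calendar: Sequence[Meeting]) -> List[Tuple[str, str]]:
    """Return summary pairs of meetings whose [start, end) ranges overlap."""
    ordered = sorted(calendar, key=lambda m: m[0])
    overlaps = []
    for i, (_, end_i, _, name_i) in enumerate(ordered):
        for start_j, _, _, name_j in ordered[i + 1:]:
            if start_j >= end_i:
                break  # sorted by start, so no later meeting can overlap meeting i
            overlaps.append((name_i, name_j))
    return overlaps


def grade_score(completed: bool, success: bool, final_reward: float, violations: List[str]) -> float:
    """Pure restatement of grade_episode(): 0.0 on failure, halved when any violation is found."""
    if not completed or not success:
        return 0.0
    return final_reward * 0.5 if violations else final_reward
```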
1172
+ ### 7.1 Score Diversity Validation
1173
+
1174
+ ```python
1175
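+ import numpy as np
+
+ # run_random_episode(task_id) is not defined in this document; it is expected to play one random-policy episode and return its graded score.
+ def validate_score_diversity():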
1176
+ """
1177
+ Verify graders return diverse scores (not same score every time)
1178
+
1179
+ Runs 100 random episodes per task and checks:
1180
+ - Variance > 0.01 (scores are diverse)
1181
+ - Unique scores >= 20 (not clustering)
1182
+ """
1183
+
1184
+ for task_id in ["task1_easy", "task2_medium", "task3_hard"]:
1185
+ scores = []
1186
+
1187
+ for _ in range(100):
1188
+ # Random agent policy
1189
+ score = run_random_episode(task_id)
1190
+ scores.append(score)
1191
+
1192
+ # Statistical checks
1193
+ variance = np.var(scores)
1194
+ unique_scores = len(set(scores))
1195
+ score_range = (min(scores), max(scores))
1196
+
1197
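+ # Assertions (fail early if grader is broken)
+ # NOTE: scores confined to [0.0, 0.1] (the Task 3 random-agent range in Section 12.3) have variance of at most 0.0025, so a single 0.01 threshold may need to be set per task.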
1198
+ assert variance > 0.01, f"{task_id}: Scores too uniform (var={variance:.4f})"
1199
+ assert unique_scores >= 20, f"{task_id}: Only {unique_scores} unique scores"
1200
+
1201
+ print(f"{task_id}: ✅ Pass")
1202
+ print(f" Variance: {variance:.4f}")
1203
+ print(f" Unique scores: {unique_scores}")
1204
+ print(f" Range: {score_range}")
1205
+ ```
1206
+
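The two statistical checks can be exercised on synthetic score lists before wiring in real episodes. This sketch uses the standard library's `statistics.pvariance`, which computes population variance and so matches `np.var` with its default `ddof=0`. `diversity_check` is a hypothetical helper.

The sketch also shows why a single threshold is fragile. Any scores confined to an interval of width 0.1 have variance of at most 0.1²/4 = 0.0025. They therefore fail `variance > 0.01` however varied they are.

```python
import random
import statistics
from typing import List, Tuple


def diversity_check(scores: List[float], min_variance: float = 0.01,
                    min_unique: int = 20) -> Tuple[bool, float, int]:
    """Apply the two checks from validate_score_diversity() to an existing list of scores."""
    variance = statistics.pvariance(scores)  # population variance, like np.var's default ddof=0
    unique = len(set(scores))
    return variance > min_variance and unique >= min_unique, variance, unique


rng = random.Random(0)
wide = [round(rng.uniform(0.0, 1.0), 3) for _ in range(100)]    # spread over the full [0, 1] range
narrow = [round(rng.uniform(0.0, 0.1), 3) for _ in range(100)]  # e.g. a random agent on Task 3

print(diversity_check(wide))
print(diversity_check(narrow))
```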
1207
+ ---
1208
+
1209
+ ## 8. BotBooked Integration
1210
+
1211
+ ### 8.1 Porting Strategy
1212
+
1213
+ The environment reuses proven logic from BotBooked (30KB `app.py`):
1214
+
1215
+ **Functions to Port:**
1216
+
1217
+ 1. **find_earliest_slot()** → Used in environment to validate agent proposals
1218
+ 2. **calculate_preference_score()** → Direct port for reward calculation
1219
+ 3. **handle_rescheduling()** → Not used directly (the agent performs rescheduling); kept as a reference for validation
1220
+ 4. **check_back_to_back()** → Helper for preference scoring
1221
+ 5. **parse_calendars()** → Calendar format conversion
1222
+
1223
+ **Translation Layer:**
1224
+
1225
+ ```python
1226
+ from datetime import datetime
+ from typing import Dict, Tuple
+
+ # BotBooked format
1227
+ calendars = {
1228
+ 'user1': [
1229
+ (datetime(2025, 4, 7, 9, 0), datetime(2025, 4, 7, 10, 0), 2, "Standup"),
1230
+ ...
1231
+ ]
1232
+ }
1233
+
1234
+ # Environment state format (same)
1235
+ state.calendars = calendars # No translation needed!
1236
+
1237
+ # Scenario JSON format → BotBooked format
1238
+ def load_scenario(scenario_json: Dict) -> Tuple[Dict, Dict]:
1239
+ """
1240
+ Convert JSON scenario to BotBooked calendar format
1241
+
1242
+ Input: JSON with ISO8601 strings
1243
+ Output: (calendars, preferences), where calendars maps user_id to (start, end, priority, summary) tuples
1244
+ """
1245
+ calendars = {}
1246
+ for user_id, meetings in scenario_json['calendars'].items():
1247
+ calendars[user_id] = [
1248
+ (
1249
+ datetime.fromisoformat(start), # ISO 8601 with offset -> timezone-aware datetime
1250
+ datetime.fromisoformat(end),
1251
+ priority,
1252
+ summary
1253
+ )
1254
+ for start, end, priority, summary in meetings
1255
+ ]
1256
+ return calendars, scenario_json['preferences']
1257
+ ```
1258
+
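Here is a self-contained version of `load_scenario()`. It assumes Python 3.7+ and uses `datetime.fromisoformat()` in place of the undefined `parse_iso8601()`.

The scenario strings carry a `+00:00` offset, so the parsed datetimes are timezone-aware. The BotBooked example above builds naive ones instead. Ordering comparisons between aware and naive datetimes raise `TypeError`, so it seems safest to use one convention throughout.

```python
from datetime import datetime
from typing import Dict, List, Tuple

Meeting = Tuple[datetime, datetime, int, str]  # (start, end, priority, summary)


def load_scenario(scenario_json: Dict) -> Tuple[Dict[str, List[Meeting]], Dict]:
    """Convert scenario JSON (ISO 8601 strings) into BotBooked-style datetime tuples."""
    calendars = {
        user_id: [
            (datetime.fromisoformat(start), datetime.fromisoformat(end), priority, summary)
            for start, end, priority, summary in meetings
        ]
        for user_id, meetings in scenario_json["calendars"].items()
    }
    return calendars, scenario_json["preferences"]


scenario = {
    "calendars": {
        "user1": [["2025-04-07T09:00:00+00:00", "2025-04-07T10:00:00+00:00", 2, "Strategy meeting"]],
    },
    "preferences": {"user1": {"preferred_hours": {"start": 10, "end": 16}}},
}
calendars, preferences = load_scenario(scenario)
start = calendars["user1"][0][0]
print(start.utcoffset())  # aware datetime: the offset comes from the "+00:00" suffix

# Ordering an aware datetime against a naive one (as in the BotBooked example) raises TypeError.
try:
    start < datetime(2025, 4, 7, 9, 0)
    mixed_ok = True
except TypeError:
    mixed_ok = False
```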
1259
+ ### 8.2 Key Differences from BotBooked
1260
+
1261
+ | Aspect | BotBooked | SchedulingEnv |
1262
+ |--------|-----------|---------------|
1263
+ | **Input** | Natural language email | Structured JSON scenario |
1264
+ | **LLM Usage** | Qwen-3 for parsing | No LLM (agent learns policy) |
1265
+ | **Algorithm** | Two-pass search (free → reschedulable) | Agent explores action space |
1266
+ | **Rescheduling** | Automatic recursion | Agent decides step-by-step |
1267
+ | **Output** | Scheduled meeting JSON | Reward signal for RL training |
1268
+ | **Fallbacks** | 3 fallback strategies | Episode terminates on failure |
1269
+
1270
+ **Design Principle**: Environment provides state and validates actions; agent learns the scheduling strategy.
1271
+
1272
+ ### 8.3 BotBooked Integration Scope
1273
+
1274
+ **What We Port from BotBooked**:
1275
+ 1. **Validation functions**: `check_conflicts()`, `validate_constraints()` - ensures realistic constraints
1276
+ 2. **Reward calculation**: `calculate_preference_score()` - proven penalty scoring (50/30/20 points)
1277
+ 3. **Reference baseline**: `find_earliest_slot()` - used as heuristic baseline in `inference.py`
1278
+
1279
+ **What We DON'T Port**:
1280
+ - BotBooked's automatic two-pass algorithm is NOT the agent's policy
1281
+ - Agent must learn its own scheduling strategy through RL
1282
+ - BotBooked provides ground truth for validation, not the agent's decision-making
1283
+
1284
+ **Baseline Policy**:
1285
+ The heuristic baseline in `inference.py` uses BotBooked's `find_earliest_slot()` as a greedy policy for comparison. This establishes a performance floor: RL agents should learn to exceed it by exploring better scheduling strategies.
1286
+
1287
+ ---
1288
+
1289
+ ## 9. Implementation Details
1290
+
1291
+ ### 9.1 Dependencies
1292
+
1293
+ ```toml
1294
+ # pyproject.toml
1295
+ [project]
1296
+ name = "scheduling-env"
1297
+ version = "0.1.0"
1298
+ dependencies = [
1299
+ "openenv-core>=0.2.0",
1300
+ "pydantic>=2.0.0",
1301
+ "fastapi>=0.100.0",
1302
+ "uvicorn>=0.23.0",
1303
+ "python-dateutil>=2.8.0",
1304
+ "openai>=1.0.0", # Optional: only for LLM-based agents; the heuristic baseline does not use it
1305
+ ]
1306
+
1307
+ [project.optional-dependencies]
1308
+ dev = [
1309
+ "pytest>=7.4.0",
1310
+ "pytest-asyncio>=0.21.0",
1311
+ "black>=23.0.0",
1312
+ "mypy>=1.5.0",
1313
+ ]
1314
+ ```
1315
+
1316
+ ### 9.2 Environment Variables
1317
+
1318
+ ```bash
1319
+ # .env.example
1320
+ # NOTE: API keys NOT required for baseline inference (uses heuristic policy)
1321
+ # These are only needed if you want to test LLM-based agents later
1322
+
1323
+ # API_BASE_URL=https://router.huggingface.co/v1 # Optional
1324
+ # MODEL_NAME=Qwen/Qwen2.5-72B-Instruct # Optional
1325
+ # HF_TOKEN=your_hf_token_here # Optional
1326
+ LOCAL_IMAGE_NAME=scheduling-env:latest # Optional for Docker
1327
+ ```
1328
+
1329
+ ### 9.3 OpenEnv Configuration
1330
+
1331
+ ```yaml
1332
+ # openenv.yaml
1333
+ spec_version: 1
1334
+ name: scheduling_env
1335
+ type: space
1336
+ runtime: fastapi
1337
+ app: server.app:app
1338
+ port: 8000
1339
+ description: "Intelligent Meeting Scheduling Environment - Learn optimal scheduling through multi-stakeholder preference optimization"
1340
+ tags:
1341
+ - scheduling
1342
+ - calendar
1343
+ - optimization
1344
+ - multi-agent
1345
+ - real-world
1346
+ ```
1347
+
1348
+ ### 9.4 Dockerfile
1349
+
1350
+ ```dockerfile
1351
+ # Dockerfile (ROOT directory)
1352
+ FROM python:3.11-slim
1353
+
1354
+ WORKDIR /app
1355
+
1356
+ # Install dependencies
1357
+ COPY pyproject.toml .
1358
+ RUN pip install --no-cache-dir -e .
1359
+
1360
+ # Copy code
1361
+ COPY scheduling_env/ ./scheduling_env/
1362
+ COPY server/ ./server/
1363
+ COPY inference.py .
1364
+
1365
+ # Expose port
1366
+ EXPOSE 8000
1367
+
1368
+ # Run server
1369
+ CMD ["uvicorn", "server.app:app", "--host", "0.0.0.0", "--port", "8000"]
1370
+ ```
1371
+
1372
+ ### 9.5 Inference Script Structure
1373
+
1374
+ ```python
1375
+ # inference.py (ROOT directory)
1376
+ """
1377
+ Baseline inference script for scheduling environment
1378
+
1379
+ CRITICAL DESIGN DECISION: NO LLM USED
1380
+ - Uses HEURISTIC baseline policy (BotBooked greedy algorithm)
1381
+ - Deterministic and reproducible
1382
+ - Fast execution (~30 seconds for all 3 tasks)
1383
+ - No API keys required (pure algorithmic baseline)
1384
+
1385
+ Requirements:
1386
+ - Outputs [START], [STEP], [END] format to stdout
1387
+ - Completes in < 20 minutes on vcpu=2, memory=8GB (actual: ~30 seconds)
1388
+ """
1389
+
1390
+ import os
1391
+ from datetime import datetime, timedelta
1392
+ from scheduling_env import SchedulingEnv, SchedulingAction
1393
+ from server.scheduling_logic import find_earliest_slot
1394
+
1395
+ def baseline_policy(obs) -> SchedulingAction:
1396
+ """
1397
+ Heuristic baseline using BotBooked two-pass greedy algorithm
1398
+
1399
+ Strategy:
1400
+ 1. If no proposal yet: Use find_earliest_slot() to propose
1401
+ 2. If conflicts exist: Reschedule lowest-priority conflict
1402
+ 3. If no conflicts: Finalize
1403
+
1404
+ NO LLM - Pure algorithmic baseline for reproducibility
1405
+ """
1406
+
1407
+ # Step 1: No proposal yet - find earliest slot
1408
+ if obs.current_proposal is None:
1409
+ # Convert observation to BotBooked calendar format
1410
+ calendars = {}
1411
+ for slot in obs.busy_slots:
1412
+ attendee = slot['attendee']
1413
+ if attendee not in calendars:
1414
+ calendars[attendee] = []
1415
+ calendars[attendee].append((
1416
+ datetime.fromisoformat(slot['start']),
1417
+ datetime.fromisoformat(slot['end']),
1418
+ slot['priority'],
1419
+ slot['summary']
1420
+ ))
1421
+
1422
+ # Use BotBooked's find_earliest_slot (already implemented!)
1423
+ result = find_earliest_slot(
1424
+ calendars=calendars,
1425
+ attendees=obs.attendee_ids,
1426
+ duration_minutes=obs.requested_duration,
1427
+ new_meeting_priority=obs.requested_priority,
1428
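+ # Search from midnight of the scenario day (taken from the earliest busy slot) instead of datetime.now():
+ # scenario dates are fixed (2025-04-07) and the parsed datetimes are timezone-aware
+ search_start_time=min(m[0] for meetings in calendars.values() for m in meetings).replace(hour=0, minute=0, second=0),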
1429
+ max_preference_score=100
1430
+ )
1431
+
1432
+ if result:
1433
+ (start_time, end_time), conflicts = result
1434
+ return SchedulingAction(
1435
+ action_type="propose_slot",
1436
+ proposed_start=start_time.isoformat(),
1437
+ proposed_duration=obs.requested_duration
1438
+ )
1439
+ else:
1440
+ # No slot found - reject
1441
+ return SchedulingAction(action_type="reject")
1442
+
1443
+ # Step 2: Has proposal with conflicts - reschedule lowest priority
1444
+ elif len(obs.conflicts) > 0:
1445
+ # Sort conflicts by priority (highest number = lowest priority)
1446
+ sorted_conflicts = sorted(obs.conflicts, key=lambda x: x['priority'], reverse=True)
1447
+ lowest_priority_conflict = sorted_conflicts[0]
1448
+
1449
+ # Duration of the conflicting meeting in minutes (total_seconds() also counts whole days, unlike .seconds)
1450
+ conflict_duration = int((
1451
+ datetime.fromisoformat(lowest_priority_conflict['end']) -
1452
+ datetime.fromisoformat(lowest_priority_conflict['start'])
1453
+ ).total_seconds() // 60)
1454
+
1455
+ # Move it to 15 minutes after the proposed meeting ends (availability is not checked here)
1456
+ new_slot_start = datetime.fromisoformat(obs.current_proposal['end']) + timedelta(minutes=15)
1457
+
1458
+ return SchedulingAction(
1459
+ action_type="reschedule_meeting",
1460
+ meeting_id_to_move=lowest_priority_conflict['meeting_id'],
1461
+ new_start_time=new_slot_start.isoformat()
1462
+ )
1463
+
1464
+ # Step 3: No conflicts - finalize!
1465
+ else:
1466
+ return SchedulingAction(action_type="finalize")
1467
+
1468
+
1469
+ def main():
1470
+ # Initialize environment (no API keys needed)
1471
+ env = SchedulingEnv(base_url="http://localhost:8000").sync()
1472
+
1473
+ for task_id in ["task1_easy", "task2_medium", "task3_hard"]:
1474
+ print(f"[START] task={task_id} env=scheduling_env model=heuristic_baseline")
1475
+
1476
+ obs = env.reset(task_id=task_id)
1477
+ done = False
1478
+ step = 0
1479
+ rewards = []
1480
+
1481
+ while not done and step < 20:
1482
+ # Heuristic baseline policy (NO LLM)
1483
+ action = baseline_policy(obs)
1484
+
1485
+ result = env.step(action)
1486
+ obs = result.observation
1487
+ done = obs.done
1488
+ reward = obs.reward
1489
+ rewards.append(reward)
1490
+ step += 1
1491
+
1492
+ # Log step
1493
+ error = obs.error_message if obs.error_message else "null"
1494
+ print(f"[STEP] step={step} action={action.action_type} reward={reward:.2f} done={str(done).lower()} error={error}")
1495
+
1496
+ # CRITICAL FIX: Final score is the LAST reward (when done=True)
1497
+ # NOT the average of step rewards!
1498
+ final_score = rewards[-1] if (done and rewards) else 0.0
1499
+ success = obs.success
1500
+ rewards_str = ",".join([f"{r:.2f}" for r in rewards])
1501
+
1502
+ print(f"[END] success={str(success).lower()} steps={step} score={final_score:.2f} rewards={rewards_str}")
1503
+
1504
+ env.reset()
1505
+
1506
+ env.close()
1507
+
1508
+ if __name__ == "__main__":
1509
+ main()
1510
+ ```
1511
+
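The decision logic of Steps 2 and 3, and the `[END]` line, can be checked without a running server. Below is a minimal sketch over plain dicts.

`next_action` and `end_line` are hypothetical helpers. The conflict and proposal keys (`priority`, `meeting_id`, `end`) are the ones `baseline_policy()` reads above. Step 1 is stubbed because it depends on `find_earliest_slot()`.

```python
from datetime import datetime, timedelta
from typing import Dict, List, Optional


def next_action(proposal: Optional[Dict], conflicts: List[Dict]) -> Dict:
    """Steps 2-3 of baseline_policy() over plain dicts; Step 1 is stubbed."""
    if proposal is None:
        return {"action_type": "propose_slot"}  # real policy fills proposed_start via find_earliest_slot()
    if conflicts:
        # Largest priority number = least important meeting; on ties the first listed wins,
        # matching sorted(..., reverse=True)[0] in baseline_policy().
        target = max(conflicts, key=lambda c: c["priority"])
        new_start = datetime.fromisoformat(proposal["end"]) + timedelta(minutes=15)
        return {"action_type": "reschedule_meeting",
                "meeting_id_to_move": target["meeting_id"],
                "new_start_time": new_start.isoformat()}
    return {"action_type": "finalize"}


def end_line(done: bool, success: bool, rewards: List[float]) -> str:
    """[END] log line; the score is the last reward of a finished episode, not the mean."""
    final_score = rewards[-1] if (done and rewards) else 0.0
    rewards_str = ",".join(f"{r:.2f}" for r in rewards)
    return (f"[END] success={str(success).lower()} steps={len(rewards)} "
            f"score={final_score:.2f} rewards={rewards_str}")
```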
1512
+ ---
1513
+
1514
+ ## 10. Testing & Validation
1515
+
1516
+ ### 10.1 Unit Tests
1517
+
1518
+ ```python
1519
+ # tests/test_environment.py
1520
+ import pytest
1521
+ from server.environment import SchedulingEnvironment
1522
+ from scheduling_env.models import SchedulingAction
1523
+
1524
+ def test_reset_loads_scenario():
1525
+ """Test that reset() loads scenario correctly"""
1526
+ env = SchedulingEnvironment()
1527
+ obs = env.reset(task_id="task1_easy")
1528
+
1529
+ assert obs.requested_duration == 30
1530
+ assert len(obs.attendee_ids) == 2
1531
+ assert not obs.done
1532
+
1533
+ def test_propose_slot_no_conflicts():
1534
+ """Test proposing a free slot"""
1535
+ env = SchedulingEnvironment()
1536
+ env.reset(task_id="task1_easy")
1537
+
1538
+ action = SchedulingAction(
1539
+ action_type="propose_slot",
1540
+ proposed_start="2025-04-07T10:00:00+00:00",
1541
+ proposed_duration=30
1542
+ )
1543
+
1544
+ obs = env.step(action)
1545
+
1546
+ assert obs.reward > 0 # Should be positive (good proposal)
1547
+ assert len(obs.conflicts) == 0
1548
+ assert not obs.done
1549
+
1550
+ def test_reschedule_meeting():
1551
+ """Test rescheduling a conflicting meeting"""
1552
+ env = SchedulingEnvironment()
1553
+ env.reset(task_id="task2_medium")
1554
+
1555
+ # Propose slot with conflict
1556
+ action1 = SchedulingAction(
1557
+ action_type="propose_slot",
1558
+ proposed_start="2025-04-07T15:00:00+00:00",
1559
+ proposed_duration=60
1560
+ )
1561
+ obs1 = env.step(action1)
1562
+ assert len(obs1.conflicts) > 0
1563
+
1564
+ # Reschedule conflict
1565
+ conflict_id = obs1.conflicts[0]['meeting_id']
1566
+ action2 = SchedulingAction(
1567
+ action_type="reschedule_meeting",
1568
+ meeting_id_to_move=conflict_id,
1569
+ new_start_time="2025-04-07T17:00:00+00:00"
1570
+ )
1571
+ obs2 = env.step(action2)
1572
+
1573
+ assert obs2.num_rescheduled == 1
1574
+ assert obs2.reward > 0
1575
+
1576
+ def test_finalize_success():
1577
+ """Test finalizing a valid schedule"""
1578
+ env = SchedulingEnvironment()
1579
+ env.reset(task_id="task1_easy")
1580
+
1581
+ # Propose free slot
1582
+ env.step(SchedulingAction(
1583
+ action_type="propose_slot",
1584
+ proposed_start="2025-04-07T10:00:00+00:00",
1585
+ proposed_duration=30
1586
+ ))
1587
+
1588
+ # Finalize
1589
+ obs = env.step(SchedulingAction(action_type="finalize"))
1590
+
1591
+ assert obs.done
1592
+ assert obs.success
1593
+ assert obs.reward > 0.5 # Should be high reward
1594
+ ```
1595
+
1596
+ ### 10.2 Integration Tests
1597
+
1598
+ ```python
1599
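+ # tests/test_graders.py
+ import numpy as np
+ from server.environment import SchedulingEnvironment
+ # random_action() is a test helper that this document does not define.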
1600
+ def test_score_diversity():
1601
+ """Test that graders return diverse scores"""
1602
+ from server.graders import SchedulingGrader
1603
+
1604
+ grader = SchedulingGrader()
1605
+ scores = []
1606
+
1607
+ # Run 50 random episodes
1608
+ for _ in range(50):
1609
+ env = SchedulingEnvironment()
1610
+ env.reset(task_id="task2_medium")
1611
+
1612
+ # Random policy
1613
+ while not env.state().completed:
1614
+ action = random_action()
1615
+ env.step(action)
1616
+
1617
+ score = grader.grade_episode(
1618
+ "task2_medium",
1619
+ env.state(),
1620
+ env._last_observation
1621
+ )
1622
+ scores.append(score)
1623
+
1624
+ # Check diversity
1625
+ variance = np.var(scores)
1626
+ unique = len(set(scores))
1627
+
1628
+ assert variance > 0.01, f"Scores too uniform: var={variance}"
1629
+ assert unique >= 15, f"Only {unique} unique scores"
1630
+
1631
+ def test_reward_range():
1632
+ """Test that all rewards are in [0.0, 1.0] range"""
1633
+ env = SchedulingEnvironment()
1634
+
1635
+ for task_id in ["task1_easy", "task2_medium", "task3_hard"]:
1636
+ env.reset(task_id=task_id)
1637
+
1638
+ for _ in range(10):
1639
+ action = random_action()
1640
+ obs = env.step(action)
1641
+
1642
+ assert 0.0 <= obs.reward <= 1.0, f"Reward out of range: {obs.reward}"
1643
+
1644
+ if obs.done:
1645
+ break
1646
+ ```
1647
+
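`test_score_diversity()` and `test_reward_range()` call a `random_action()` helper that this document does not define. One possible sketch follows. It assumes the action field names shown in Appendix B, and that a caller wraps the result as `SchedulingAction(**kwargs)`. `random_action_kwargs` and `SCENARIO_DAY` are hypothetical.

`finalize` and `reject` are drawn as often as the other action types. Many random episodes will therefore end after one step, which may itself limit score diversity.

```python
import random
from datetime import datetime, timedelta, timezone
from typing import Dict, List

ACTION_TYPES = ["propose_slot", "reschedule_meeting", "finalize", "reject"]
SCENARIO_DAY = datetime(2025, 4, 7, tzinfo=timezone.utc)  # date used by the scenarios in this document


def random_action_kwargs(rng: random.Random, meeting_ids: List[str]) -> Dict:
    """Uniformly random action, as keyword arguments for SchedulingAction (field names from Appendix B)."""
    # Only offer rescheduling when there is something to move (e.g. ids taken from obs.conflicts).
    choices = ACTION_TYPES if meeting_ids else [a for a in ACTION_TYPES if a != "reschedule_meeting"]
    action_type = rng.choice(choices)
    # Random 15-minute-aligned start between 08:00 and 17:45 on the scenario day.
    start = SCENARIO_DAY + timedelta(minutes=15 * rng.randrange(32, 72))
    if action_type == "propose_slot":
        return {"action_type": action_type,
                "proposed_start": start.isoformat(),
                "proposed_duration": rng.choice([30, 45, 60])}
    if action_type == "reschedule_meeting":
        return {"action_type": action_type,
                "meeting_id_to_move": rng.choice(meeting_ids),
                "new_start_time": start.isoformat()}
    return {"action_type": action_type}  # finalize / reject carry no extra fields
```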
1648
+ ### 10.3 Pre-Submission Checklist
1649
+
1650
+ ```bash
1651
+ #!/bin/bash
1652
+ # validate-submission.sh
1653
+
1654
+ echo "=== OpenEnv Scheduling Environment Validation ==="
1655
+
1656
+ # 1. OpenEnv validate
1657
+ echo "1. Running openenv validate..."
1658
+ openenv validate || exit 1
1659
+
1660
+ # 2. Docker build
1661
+ echo "2. Building Docker image..."
1662
+ docker build -t scheduling-env:latest . || exit 1
1663
+
1664
+ # 3. Run tests
1665
+ echo "3. Running tests..."
1666
+ pytest tests/ || exit 1
1667
+
1668
+ # 4. Score diversity check
1669
+ echo "4. Checking score diversity..."
1670
+ python -m tests.validate_diversity || exit 1
1671
+
1672
+ # 5. Inference script
1673
+ echo "5. Testing inference script..."
1674
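+ # The command below overrides the image's CMD, so start the server in the background first
+ docker run --rm scheduling-env:latest sh -c "uvicorn server.app:app --port 8000 & sleep 5 && python inference.py" || exit 1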
1675
+
1676
+ # 6. HF Space deployment test
1677
+ echo "6. Deploying to HF Space..."
1678
+ openenv push || exit 1
1679
+
1680
+ echo "✅ All validations passed!"
1681
+ ```
1682
+
1683
+ ---
1684
+
1685
+ ## 11. Deployment
1686
+
1687
+ ### 11.1 Local Development
1688
+
1689
+ ```bash
1690
+ # Install dependencies
1691
+ pip install -e .
1692
+
1693
+ # Run server
1694
+ uvicorn server.app:app --reload --port 8000
1695
+
1696
+ # In another terminal, test client
1697
+ python -c "
1698
+ from scheduling_env import SchedulingEnv
1699
+ env = SchedulingEnv(base_url='http://localhost:8000').sync()
1700
+ obs = env.reset(task_id='task1_easy')
1701
+ print(obs)
1702
+ "
1703
+ ```
1704
+
1705
+ ### 11.2 Docker Deployment
1706
+
1707
+ ```bash
1708
+ # Build image
1709
+ docker build -t scheduling-env:latest .
1710
+
1711
+ # Run container
1712
+ docker run -p 8000:8000 -e HF_TOKEN=$HF_TOKEN scheduling-env:latest
1713
+
1714
+ # Test inference
1715
+ docker exec -it <container_id> python inference.py
1716
+ ```
1717
+
1718
+ ### 11.3 Hugging Face Spaces
1719
+
1720
+ ```bash
1721
+ # Initialize openenv
1722
+ openenv init
1723
+
1724
+ # Validate
1725
+ openenv validate
1726
+
1727
+ # Push to HF Spaces
1728
+ openenv push
1729
+
1730
+ # Test deployed space
1731
+ curl https://your-space.hf.space/reset -X POST \
1732
+ -H "Content-Type: application/json" \
1733
+ -d '{"task_id": "task1_easy"}'
1734
+ ```
1735
+
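The same `/reset` call can be made from Python using only the standard library. This sketch assumes the endpoints listed in CLAUDE.md (`POST /reset` with `{"task_id": ...}`). `make_request` and `post_json` are hypothetical helpers.

The exact request body that `/step` expects is not specified here. Check `server/app.py` before reusing these helpers for that endpoint.

```python
import json
import urllib.request


def make_request(base_url: str, path: str, payload: dict) -> urllib.request.Request:
    """Build a JSON POST request such as POST {base_url}/reset."""
    return urllib.request.Request(
        base_url.rstrip("/") + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def post_json(base_url: str, path: str, payload: dict, timeout: float = 30.0) -> dict:
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(make_request(base_url, path, payload), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Usage (needs a running server or deployed Space):
# obs = post_json("https://your-space.hf.space", "/reset", {"task_id": "task1_easy"})
```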
1736
+ ---
1737
+
1738
+ ## 12. Success Metrics
1739
+
1740
+ ### 12.1 Hackathon Criteria Checklist
1741
+
1742
+ - [x] **Real-world utility (30%)**: Executive scheduling ($10B+ industry)
1743
+ - [x] **3 tasks with graders (25%)**: Easy/Medium/Hard with programmatic scoring
1744
+ - [x] **Environment design (20%)**: Multi-step actions, dense rewards, clean state
1745
+ - [x] **Code quality (15%)**: OpenEnv spec, Pydantic models, working Dockerfile
1746
+ - [x] **Creativity (10%)**: Novel domain (first scheduling RL env), multi-stakeholder optimization
1747
+
1748
+ ### 12.2 Technical Requirements
1749
+
1750
+ - [x] Typed Action/Observation/State Pydantic models
1751
+ - [x] `step()`, `reset()`, `state()` API
1752
+ - [x] `openenv.yaml` with metadata
1753
+ - [x] Passes `openenv validate`
1754
+ - [x] `inference.py` in root with [START]/[STEP]/[END] logging
1755
+ - [x] Dockerfile in root (not /server)
1756
+ - [x] Scores in [0.0, 1.0] range
1757
+ - [x] Graders return diverse scores (multi-component formula)
1758
+ - [x] < 20 min runtime on vcpu=2, memory=8GB
1759
+
1760
+ ### 12.3 Expected Scores
1761
+
1762
+ | Task | Difficulty | Random Agent | Heuristic Baseline | RL Agent Target | Expected Range |
1763
+ |------|------------|--------------|-------------------|-----------------|----------------|
1764
+ | Task 1 | Easy | 0.3-0.5 | 0.90-0.98 | 0.95-1.0 | 0.7-1.0 |
1765
+ | Task 2 | Medium | 0.1-0.3 | 0.55-0.70 | 0.75-0.85 | 0.4-0.8 |
1766
+ | Task 3 | Hard | 0.0-0.1 | 0.25-0.45 | 0.50-0.70 | 0.1-0.6 |
1767
+
1768
+ **Notes**:
1769
+ - **Random Agent**: Takes random valid actions (for diversity validation)
1770
+ - **Heuristic Baseline**: BotBooked greedy algorithm (no LLM, deterministic)
1771
+ - **RL Agent Target**: What a trained RL agent should achieve
1772
+ - **Expected Range**: Full score distribution across all agent types
1773
+
1774
+ ---
1775
+
1776
+ ## 13. Future Enhancements
1777
+
1778
+ (Out of scope for the initial hackathon submission; documented for post-hackathon work.)
1779
+
1780
+ 1. **Recurring meetings**: Add support for weekly/bi-weekly scheduling
1781
+ 2. **Time zone handling**: Multi-timezone scheduling
1782
+ 3. **Preference learning**: Agent learns user preferences from feedback
1783
+ 4. **Calendar integration**: Real Google Calendar API integration
1784
+ 5. **Multi-day scheduling**: Schedule across multiple days
1785
+ 6. **Attendee prioritization**: Weight attendees by importance
1786
+ 7. **Meeting splitting**: Divide long meetings into multiple shorter slots
1787
+ 8. **Travel time**: Account for physical meeting location travel time
1788
+
1789
+ ---
1790
+
1791
+ ## Appendix A: BotBooked Algorithm Reference
1792
+
1793
+ ### Original Two-Pass Algorithm
1794
+
1795
+ ```python
1796
+ def find_earliest_slot(calendars, attendees, duration, priority):
1797
+ """
1798
+ BotBooked's proven scheduling algorithm (reference only)
1799
+
1800
+ Pass 1: Find completely free slot
1801
+ Pass 2: Find reschedulable slot (if Pass 1 fails)
1802
+ """
1803
+
1804
+ # Pass 1: Free slot search
1805
+ busy_slots = aggregate_busy_slots(calendars, attendees)
1806
+ for gap_start, gap_end in find_gaps(busy_slots):
1807
+ if gap_end - gap_start >= duration:
1808
+ if within_work_hours(gap_start) and preference_score(gap_start) < 100:
1809
+ return (gap_start, gap_start + duration), []
1810
+
1811
+ # Pass 2: Reschedulable slot search
1812
+ for potential_start in iterate_time_slots():
1813
+ conflicts = find_conflicts(calendars, potential_start, duration)
1814
+ if all(c.priority > priority for c in conflicts):
1815
+ if within_work_hours(potential_start) and preference_score(potential_start) < 150:
1816
+ return (potential_start, potential_start + duration), conflicts
1817
+
1818
+ return None
1819
+ ```
1820
+
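For experimentation, the two-pass structure can be made runnable on a fixed time grid. This is a simplified sketch, not BotBooked's implementation. `two_pass_slot` is hypothetical, and candidate start times are spaced 15 minutes apart. The work-hour and preference-score thresholds from the reference (`< 100`, `< 150`) are left out.

```python
from datetime import datetime, timedelta
from typing import Dict, List, Optional, Tuple

Meeting = Tuple[datetime, datetime, int, str]  # (start, end, priority, summary)


def two_pass_slot(
    calendars: Dict[str, List[Meeting]],
    attendees: List[str],
    duration: timedelta,
    priority: int,
    day_start: datetime,
    day_end: datetime,
    step: timedelta = timedelta(minutes=15),
) -> Optional[Tuple[Tuple[datetime, datetime], List[Tuple[str, Meeting]]]]:
    """Simplified two-pass search on a fixed grid; work-hour and preference checks are omitted."""

    def conflicts_at(start: datetime) -> List[Tuple[str, Meeting]]:
        end = start + duration
        return [(user, m) for user in attendees for m in calendars.get(user, [])
                if m[0] < end and m[1] > start]

    candidates = []
    t = day_start
    while t + duration <= day_end:
        candidates.append(t)
        t += step

    # Pass 1: earliest slot with no conflicts at all.
    for start in candidates:
        if not conflicts_at(start):
            return (start, start + duration), []

    # Pass 2: earliest slot whose conflicts are all strictly lower priority (larger number).
    for start in candidates:
        found = conflicts_at(start)
        if all(m[2] > priority for _, m in found):
            return (start, start + duration), found
    return None
```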
1821
+ ---
1822
+
1823
+ ## Appendix B: Action Space Examples
1824
+
1825
+ ### Example 1: Perfect Slot (Task 1)
1826
+
1827
+ ```json
1828
+ {
1829
+ "action_type": "propose_slot",
1830
+ "proposed_start": "2025-04-07T10:00:00+00:00",
1831
+ "proposed_duration": 30
1832
+ }
1833
+ ```
1834
+
1835
+ **Expected Response:**
1836
+ ```json
1837
+ {
1838
+ "reward": 0.5,
1839
+ "done": false,
1840
+ "conflicts": [],
1841
+ "preference_penalty": 0.0,
1842
+ "current_proposal": {
1843
+ "start": "2025-04-07T10:00:00+00:00",
1844
+ "end": "2025-04-07T10:30:00+00:00"
1845
+ }
1846
+ }
1847
+ ```
1848
+
1849
+ ### Example 2: Rescheduling (Task 2)
1850
+
1851
+ ```json
1852
+ {
1853
+ "action_type": "reschedule_meeting",
1854
+ "meeting_id_to_move": "user3_2025-04-07T16:00:00",
1855
+ "new_start_time": "2025-04-07T17:00:00+00:00"
1856
+ }
1857
+ ```
1858
+
1859
+ **Expected Response:**
1860
+ ```json
1861
+ {
1862
+ "reward": 0.5,
1863
+ "done": false,
1864
+ "conflicts": [],
1865
+ "num_rescheduled": 1,
1866
+ "rescheduled_meetings": [
1867
+ {
1868
+ "meeting_id": "user3_2025-04-07T16:00:00",
1869
+ "old_start": "2025-04-07T16:00:00+00:00",
1870
+ "new_start": "2025-04-07T17:00:00+00:00"
1871
+ }
1872
+ ]
1873
+ }
1874
+ ```
1875
+
1876
+ ### Example 3: Finalize
1877
+
1878
+ ```json
1879
+ {
1880
+ "action_type": "finalize"
1881
+ }
1882
+ ```
1883
+
1884
+ **Expected Response:**
1885
+ ```json
1886
+ {
1887
+ "reward": 0.62,
1888
+ "done": true,
1889
+ "success": true,
1890
+ "metadata": {
1891
+ "final_slot": ["2025-04-07T14:00:00+00:00", "2025-04-07T15:00:00+00:00"]
1892
+ }
1893
+ }
1894
+ ```
1895
+
1896
+ ---
1897
+
1898
+ ## Document Control
1899
+
1900
+ **Version**: 1.1
1901
+ **Date**: 2025-04-06 (Updated 2026-04-07)
1902
+ **Status**: Approved for Implementation - CRITICAL FIXES APPLIED
1903
+ **Next Steps**: Begin implementation immediately (deadline: April 8th)
1904
+
1905
+ ---
1906
+
1907
+ ## APPENDIX C: Implementation Action Plan (8 Hours to Deadline)
1908
+
1909
+ ### Phase 1: Core Implementation (4 hours)
1910
+
1911
+ #### 1.1 Port BotBooked Functions (1.5 hours)
1912
+ Create `server/scheduling_logic.py`:
1913
+ - Copy `calculate_preference_score()` from BotBooked (lines 251-279)
1914
+ - Copy `find_earliest_slot()` from BotBooked (lines 398-454)
1915
+ - Copy `get_user_preferences()` from BotBooked (lines 239-249)
1916
+ - Add helper functions for calendar manipulation
1917
+
1918
+ #### 1.2 Implement Environment (1.5 hours)
1919
+ Create `server/environment.py`:
1920
+ - `SchedulingEnvironment` class with OpenEnv interface
1921
+ - `reset()` - Load scenario JSON, initialize state
1922
+ - `step()` - Process actions and return observations
1923
+ - `_process_propose_slot()` - Validate proposals using BotBooked logic
1924
+ - `_process_reschedule_meeting()` - Update calendars
1925
+ - `_process_finalize()` - Calculate final reward with NEW formula
1926
+ - `_handle_timeout()` - Partial credit implementation
1927
+
1928
+ #### 1.3 Create Graders (30 min)
1929
+ Create `server/graders.py`:
1930
+ - `calculate_final_reward()` with NON-LINEAR penalties
1931
+ - `SchedulingGrader` class with validation checks
1932
+ - Score diversity validation functions
1933
+
1934
+ #### 1.4 Write Task Scenarios (30 min)
1935
+ Create JSON files in `server/scenarios/`:
1936
+ - `task1_easy.json` - 2 attendees, sparse calendars
1937
+ - `task2_medium.json` - 4 attendees, moderate density
1938
+ - `task3_hard.json` - 6 attendees, dense calendars (COMPLETE SPEC ABOVE)
1939
+
1940
+ ### Phase 2: Baseline & Testing (2 hours)
1941
+
1942
+ #### 2.1 Implement Baseline Policy (45 min)
1943
+ Create `inference.py` (ROOT):
1944
+ - `baseline_policy()` using BotBooked greedy algorithm
1945
+ - `convert_obs_to_calendar_format()` helper
1946
+ - Main loop with CORRECT score calculation (final reward, not average)
1947
+ - [START]/[STEP]/[END] logging format
1948
+
1949
+ #### 2.2 Local Testing (1 hour)
1950
+ ```bash
1951
+ # Terminal 1: Start server
1952
+ uvicorn server.app:app --port 8000 --reload
1953
+
1954
+ # Terminal 2: Run inference
1955
+ python inference.py
1956
+
1957
+ # Verify:
1958
+ # - All 3 tasks complete successfully
1959
+ # - Scores in expected ranges
1960
+ # - Runtime < 1 minute total
1961
+ # - Output format matches requirements
1962
+ ```
1963
+
1964
+ #### 2.3 Fix Bugs (15 min)
1965
+ - Debug any environment errors
1966
+ - Verify reward calculations match spec
1967
+ - Test edge cases (timeout, no solution, priority violations)
1968
+
1969
+ ### Phase 3: Docker & Deployment (2 hours)
1970
+
1971
+ #### 3.1 Docker Setup (30 min)
1972
+ Create `Dockerfile` (ROOT):
1973
+ ```dockerfile
1974
+ FROM python:3.11-slim
1975
+ WORKDIR /app
1976
+ COPY . .
1977
+ RUN pip install --no-cache-dir -e .
1978
+ EXPOSE 8000
1979
+ CMD ["uvicorn", "server.app:app", "--host", "0.0.0.0", "--port", "8000"]
1980
+ ```
1981
+
1982
+ Create `pyproject.toml`:
1983
+ ```toml
1984
+ [project]
1985
+ name = "scheduling-env"
1986
+ version = "0.1.0"
1987
+ dependencies = [
1988
+ "openenv-core>=0.2.0",
1989
+ "pydantic>=2.0.0",
1990
+ "fastapi>=0.100.0",
1991
+ "uvicorn>=0.23.0",
1992
+ "python-dateutil>=2.8.0",
1993
+ ]
1994
+ ```
1995
+
1996
+ #### 3.2 Validation (30 min)
1997
+ ```bash
1998
+ # Build Docker
1999
+ docker build -t scheduling-env:latest .
2000
+
2001
+ # Test Docker
2002
+ docker run -p 8000:8000 scheduling-env:latest
2003
+
2004
+ # OpenEnv validation
2005
+ openenv validate
2006
+
2007
+ # Must pass ALL checks:
2008
+ # ✓ Pydantic models valid
2009
+ # ✓ openenv.yaml correct
2010
+ # ✓ Server responds to /reset
2011
+ ```
2012
+
2013
+ #### 3.3 Deploy to HF Spaces (1 hour)
2014
+ ```bash
2015
+ # Push to Hugging Face
2016
+ openenv push
2017
+
2018
+ # Test deployed space
2019
+ curl https://your-space.hf.space/reset \
2020
+ -X POST \
2021
+ -H "Content-Type: application/json" \
2022
+ -d '{"task_id": "task1_easy"}'
2023
+
2024
+ # Should return 200 OK with observation
2025
+
2026
+ # Run inference on deployed space
2027
+ # Verify scores match local testing
2028
+ ```
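The curl check in 3.3 can also be scripted with only the standard library, which is useful when re-testing after each `openenv push`. The sketch below uses `urllib.request`. The base URL is the same placeholder used in the curl command, and the only assumption about the response is that it is JSON.

```python
import json
import urllib.request


def build_reset_request(base_url: str, task_id: str) -> urllib.request.Request:
    """Build a POST /reset request equivalent to the curl command."""
    body = json.dumps({"task_id": task_id}).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + "/reset",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def smoke_test(base_url: str, task_id: str = "task1_easy", timeout: float = 10.0) -> dict:
    """Send the request and return the decoded observation.

    urlopen raises urllib.error.HTTPError for 4xx/5xx responses; the explicit
    status check additionally rejects any other non-200 success code.
    """
    with urllib.request.urlopen(build_reset_request(base_url, task_id), timeout=timeout) as resp:
        if resp.status != 200:
            raise RuntimeError(f"/reset returned HTTP {resp.status}")
        return json.loads(resp.read().decode("utf-8"))
```

For example, `smoke_test("https://your-space.hf.space")` should return the task1 observation as a dict when the Space is healthy.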
2029
+
2030
+ ### Critical Success Checklist
2031
+
2032
+ Before submission, verify:
2033
+ - [ ] HF Space deploys (200 response to /reset)
2034
+ - [ ] `openenv validate` passes
2035
+ - [ ] Dockerfile builds without errors
2036
+ - [ ] `inference.py` runs in < 1 minute
2037
+ - [ ] All 3 tasks complete successfully
2038
+ - [ ] Scores in expected ranges (Task1: 0.9+, Task2: 0.6+, Task3: 0.3+)
2039
+ - [ ] [START]/[STEP]/[END] format correct
2040
+ - [ ] Reward function uses NON-LINEAR penalties
2041
+ - [ ] Partial credit on timeout implemented
2042
+ - [ ] Score diversity validation passes
2043
+
2044
+ ### Expected Timeline
2045
+
2046
+ | Phase | Duration | Completion Time |
2047
+ |-------|----------|-----------------|
2048
+ | Phase 1: Core Implementation | 4 hours | +4 hours |
2049
+ | Phase 2: Baseline & Testing | 2 hours | +6 hours |
2050
+ | Phase 3: Docker & Deployment | 2 hours | +8 hours |
2051
+ | **Total** | **8 hours** | **Ready for submission** |
2052
+
2053
+ ### Estimated Score (After Fixes)
2054
+
2055
+ | Criterion | Weight | Score | Points |
2056
+ |-----------|--------|-------|--------|
2057
+ | Real-world utility | 30% | Excellent | 27/30 |
2058
+ | Task & grader quality | 25% | Strong | 22/25 |
2059
+ | Environment design | 20% | Strong | 17/20 |
2060
+ | Code quality & spec | 15% | Excellent | 14/15 |
2061
+ | Creativity & novelty | 10% | Good | 7/10 |
2062
+ | **TOTAL** | **100%** | - | **87/100 (A-)** |
2063
+
2064
+ **Projected Rank**: Top 15-20% if executed correctly
2065
+
2066
+ ---
2067
+
2068
+ **End of Design Specification**
inference.py ADDED
@@ -0,0 +1,198 @@
1
+ #!/usr/bin/env python3
2
+ """
3
+ Baseline inference script for the Meeting Scheduling RL Environment.
4
+
5
+ Uses a HEURISTIC policy (BotBooked greedy algorithm) - NO LLM required.
6
+ Deterministic, reproducible, fast (~seconds for all 3 tasks).
7
+
8
+ Output format: [START]/[STEP]/[END] per hackathon spec.
9
+ """
10
+
11
+ from __future__ import annotations
12
+
13
+ import sys
14
+ from datetime import datetime, timedelta, timezone
15
+
16
+ from server.scheduling_env_environment import SchedulingEnvironment
17
+ from models import SchedulingAction
18
+ from server.scheduling_logic import find_earliest_free_slot, parse_iso
19
+
20
+
21
+ def baseline_policy(obs) -> SchedulingAction:
22
+ """Heuristic baseline using greedy slot search + lowest-priority rescheduling."""
23
+
24
+ # Step 1: No proposal yet -> find a free slot
25
+ if obs.current_proposal is None:
26
+ # Build calendars dict from busy_slots
27
+ calendars = {}
28
+ for slot in obs.busy_slots:
29
+ att = slot["attendee"]
30
+ if att not in calendars:
31
+ calendars[att] = []
32
+ calendars[att].append([slot["start"], slot["end"], slot["priority"], slot["summary"]])
33
+
34
+ # Try to find a completely free slot
35
+ free = find_earliest_free_slot(
36
+ calendars,
37
+ obs.attendee_ids,
38
+ obs.requested_duration,
39
+ obs.busy_slots[0]["start"] if obs.busy_slots else "2025-04-07T09:00:00+00:00",
40
+ obs.collective_work_hours,
41
+ )
42
+
43
+ if free:
44
+ return SchedulingAction(
45
+ action_type="propose_slot",
46
+ proposed_start=free,
47
+ proposed_duration=obs.requested_duration,
48
+ )
49
+
50
+ # No completely free slot found.
51
+ # Scan 15-min increments within collective hours for a slot with only
52
+ # reschedulable conflicts (priority > requested_priority).
53
+ min_h = obs.collective_work_hours.get("min_start_hour", 9)
54
+ max_h = obs.collective_work_hours.get("max_end_hour", 17)
55
+ duration = obs.requested_duration
56
+ tz = timezone.utc
57
+
58
+ candidate = datetime(2025, 4, 7, min_h, 0, 0, tzinfo=tz)
59
+ end_boundary = datetime(2025, 4, 7, max_h, 0, 0, tzinfo=tz)
60
+ step_delta = timedelta(minutes=15)
61
+
62
+ best_candidate = None
63
+ best_conflict_count = 999
64
+
65
+ while candidate + timedelta(minutes=duration) <= end_boundary:
66
+ c_start = candidate.isoformat()
67
+ c_end = (candidate + timedelta(minutes=duration)).isoformat()
68
+
69
+ # Count conflicts at this candidate
70
+ conflicts_here = []
71
+ for att in obs.attendee_ids:
72
+ for entry in calendars.get(att, []):
73
+ e_start = parse_iso(entry[0])
74
+ e_end = parse_iso(entry[1])
75
+ if candidate < e_end and e_start < candidate + timedelta(minutes=duration):
76
+ conflicts_here.append(entry)
77
+
78
+ # Check if all conflicts are reschedulable
79
+ all_reschedulable = all(
80
+ c[2] > obs.requested_priority for c in conflicts_here
81
+ )
82
+
83
+ if all_reschedulable and len(conflicts_here) < best_conflict_count:
84
+ best_candidate = c_start
85
+ best_conflict_count = len(conflicts_here)
86
+ if best_conflict_count == 0:
87
+ break # Perfect slot
88
+
89
+ candidate += step_delta
90
+
91
+ if best_candidate:
92
+ return SchedulingAction(
93
+ action_type="propose_slot",
94
+ proposed_start=best_candidate,
95
+ proposed_duration=duration,
96
+ )
97
+
98
+ # Last resort: propose at collective hours start (will likely conflict)
99
+ fallback = f"2025-04-07T{min_h:02d}:00:00+00:00"
100
+ return SchedulingAction(
101
+ action_type="propose_slot",
102
+ proposed_start=fallback,
103
+ proposed_duration=obs.requested_duration,
104
+ )
105
+
106
+ # Step 2: Has proposal with conflicts -> reschedule lowest-priority conflict
107
+ if obs.conflicts:
108
+ sorted_conflicts = sorted(obs.conflicts, key=lambda x: x["priority"], reverse=True)
109
+ target = sorted_conflicts[0]
110
+
111
+ # Can only reschedule lower priority
112
+ if target["priority"] <= obs.requested_priority:
113
+ return SchedulingAction(action_type="reject")
114
+
115
+ # Find a free slot for this attendee to move the meeting to.
116
+ # Search in early morning (06:00-08:00) and late evening (17:00-20:00).
117
+ attendee = target["attendee"]
118
+ meeting_dur = parse_iso(target["end"]) - parse_iso(target["start"])
119
+ dur_min = int(meeting_dur.total_seconds() // 60)
120
+
121
+ # Build this attendee's calendar
122
+ att_cal = [
123
+ s for s in obs.busy_slots if s["attendee"] == attendee
124
+ ]
125
+ att_entries = [[s["start"], s["end"], s["priority"], s["summary"]] for s in att_cal]
126
+
127
+ new_time = None
128
+ # Try slots at 06:00, 06:30, 07:00, 07:30, 17:00, 17:30, 18:00, 18:30, 19:00
129
+ for h, m in [(6,0),(6,30),(7,0),(7,30),(17,0),(17,30),(18,0),(18,30),(19,0),(19,30),(20,0)]:
130
+ cand = datetime(2025, 4, 7, h, m, 0, tzinfo=timezone.utc)
131
+ cand_end = cand + timedelta(minutes=dur_min)
132
+ cand_iso = cand.isoformat()
133
+ cand_end_iso = cand_end.isoformat()
134
+ # Check free for this attendee
135
+ conflict_found = False
136
+ for e in att_entries:
137
+ es = parse_iso(e[0])
138
+ ee = parse_iso(e[1])
139
+ if cand < ee and es < cand_end:
140
+ conflict_found = True
141
+ break
142
+ if not conflict_found:
143
+ new_time = cand_iso
144
+ break
145
+
146
+ if not new_time:
147
+ # Give up on this conflict, try rejecting
148
+ return SchedulingAction(action_type="reject")
149
+
150
+
151
+ return SchedulingAction(
152
+ action_type="reschedule_meeting",
153
+ meeting_id_to_move=target["meeting_id"],
154
+ new_start_time=new_time,
155
+ )
156
+
157
+ # Step 3: No conflicts -> finalize
158
+ return SchedulingAction(action_type="finalize")
159
+
160
+
161
+ def main():
162
+ env = SchedulingEnvironment()
163
+
164
+ for task_id in ["task1_easy", "task2_medium", "task3_hard"]:
165
+ print(f"[START] task={task_id} env=scheduling_env model=heuristic_baseline")
166
+
167
+ obs = env.reset(task_id=task_id)
168
+ done = False
169
+ step = 0
170
+ rewards = []
171
+
172
+ while not done and step < 20:
173
+ action = baseline_policy(obs)
174
+ obs = env.step(action)
175
+ done = obs.done
176
+ reward = obs.reward if obs.reward is not None else 0.0
177
+ rewards.append(reward)
178
+ step += 1
179
+
180
+ error = obs.error_message if obs.error_message else "null"
181
+ print(
182
+ f"[STEP] step={step} action={action.action_type} "
183
+ f"reward={reward:.2f} done={str(done).lower()} error={error}"
184
+ )
185
+
186
+ final_score = rewards[-1] if (done and rewards) else 0.0
187
+ success = obs.success if hasattr(obs, "success") else False
188
+ rewards_str = ",".join(f"{r:.2f}" for r in rewards)
189
+
190
+ print(
191
+ f"[END] success={str(success).lower()} steps={step} "
192
+ f"score={final_score:.2f} rewards={rewards_str}"
193
+ )
194
+ print()
195
+
196
+
197
+ if __name__ == "__main__":
198
+ main()
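When no free slot exists, the fallback in `baseline_policy()` scans 15-minute candidates and keeps the one with the fewest conflicts, provided every conflict is reschedulable. The same idea could be pulled out as a pure function so it can be unit-tested without the environment. In the sketch below, the function names and the `(start, end, priority)` tuple layout are illustrative and not part of the repository.

```python
from datetime import datetime, timedelta
from typing import List, Optional, Tuple

# Illustrative busy entry: (start, end, priority), where 1 = highest and 5 = lowest priority.
Busy = Tuple[datetime, datetime, int]


def overlaps(a_start: datetime, a_end: datetime, b_start: datetime, b_end: datetime) -> bool:
    # Half-open intervals [start, end): back-to-back meetings do not conflict.
    return a_start < b_end and b_start < a_end


def best_reschedulable_slot(
    busy: List[Busy],
    day_start: datetime,
    day_end: datetime,
    duration: timedelta,
    requested_priority: int,
    step: timedelta = timedelta(minutes=15),
) -> Optional[Tuple[datetime, int]]:
    """Return (start, conflict_count) for the candidate with the fewest movable conflicts."""
    best: Optional[Tuple[datetime, int]] = None
    candidate = day_start
    while candidate + duration <= day_end:
        end = candidate + duration
        hits = [b for b in busy if overlaps(candidate, end, b[0], b[1])]
        # Only meetings with a numerically larger (i.e. lower) priority may be moved.
        if all(p > requested_priority for _, _, p in hits):
            if best is None or len(hits) < best[1]:
                best = (candidate, len(hits))
                if not hits:
                    break  # a completely free slot cannot be beaten
        candidate += step
    return best
```

Returning the conflict count alongside the start time lets a caller decide whether a slot that needs several reschedules is worth proposing.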
models.py ADDED
@@ -0,0 +1,156 @@
1
+ # Copyright (c) Meta Platforms, Inc. and affiliates.
2
+ # All rights reserved.
3
+ #
4
+ # This source code is licensed under the BSD-style license found in the
5
+ # LICENSE file in the root directory of this source tree.
6
+
7
+ """
8
+ Data models for the Meeting Scheduling RL Environment.
9
+
10
+ Defines the Action, Observation, and State Pydantic models used by the
11
+ scheduling environment to coordinate meeting proposals, rescheduling,
12
+ and conflict resolution across multiple attendees.
13
+ """
14
+
15
+ from __future__ import annotations
16
+
17
+ from typing import Any, Dict, List, Literal, Optional
18
+
19
+ from pydantic import Field
20
+
21
+ from openenv.core.env_server.types import Action, Observation, State
22
+
23
+
24
+ class SchedulingAction(Action):
25
+ """Action the agent can take in the scheduling environment."""
26
+
27
+ action_type: Literal["propose_slot", "reschedule_meeting", "finalize", "reject"] = Field(
28
+ default="propose_slot",
29
+ description="Type of scheduling action to perform.",
30
+ )
31
+ proposed_start: Optional[str] = Field(
32
+ default=None,
33
+ description="ISO8601 datetime string for the proposed meeting start (used with propose_slot).",
34
+ )
35
+ proposed_duration: Optional[int] = Field(
36
+ default=None,
37
+ description="Duration in minutes for the proposed meeting (used with propose_slot).",
38
+ )
39
+ meeting_id_to_move: Optional[str] = Field(
40
+ default=None,
41
+ description="Identifier of an existing meeting to reschedule (used with reschedule_meeting).",
42
+ )
43
+ new_start_time: Optional[str] = Field(
44
+ default=None,
45
+ description="ISO8601 datetime string for the new start time of a rescheduled meeting.",
46
+ )
47
+
48
+
49
+ class SchedulingObservation(Observation):
50
+ """Observation returned to the agent after each step."""
51
+
52
+ requested_duration: int = Field(
53
+ default=0,
54
+ description="Requested meeting duration in minutes.",
55
+ )
56
+ requested_priority: int = Field(
57
+ default=3,
58
+ description="Priority of the meeting request (1=highest, 5=lowest).",
59
+ )
60
+ attendee_ids: List[str] = Field(
61
+ default_factory=list,
62
+ description="List of attendee user IDs required for the meeting.",
63
+ )
64
+ busy_slots: List[Dict[str, Any]] = Field(
65
+ default_factory=list,
66
+ description="Busy time slots: [{start, end, priority, summary, attendee}].",
67
+ )
68
+ collective_work_hours: Dict[str, int] = Field(
69
+ default_factory=dict,
70
+ description="Shared working hours window: {min_start_hour, max_end_hour}.",
71
+ )
72
+ preference_constraints: Dict[str, Any] = Field(
73
+ default_factory=dict,
74
+ description="Attendee preference constraints (e.g. preferred times, avoid windows).",
75
+ )
76
+ current_proposal: Optional[Dict[str, str]] = Field(
77
+ default=None,
78
+ description="Currently proposed slot: {start, end} as ISO8601 strings.",
79
+ )
80
+ conflicts: List[Dict[str, Any]] = Field(
81
+ default_factory=list,
82
+ description="List of conflicts with the current proposal.",
83
+ )
84
+ preference_penalty: float = Field(
85
+ default=0.0,
86
+ description="Accumulated penalty from violating attendee preferences.",
87
+ )
88
+ num_rescheduled: int = Field(
89
+ default=0,
90
+ description="Number of existing meetings rescheduled so far.",
91
+ )
92
+ steps_taken: int = Field(
93
+ default=0,
94
+ description="Number of steps taken in the current episode.",
95
+ )
96
+ max_steps: int = Field(
97
+ default=20,
98
+ description="Maximum number of steps allowed in the episode.",
99
+ )
100
+ success: bool = Field(
101
+ default=False,
102
+ description="Whether the meeting was successfully scheduled.",
103
+ )
104
+ error_message: Optional[str] = Field(
105
+ default=None,
106
+ description="Error message if the last action was invalid.",
107
+ )
108
+
109
+
110
+ class SchedulingState(State):
111
+ """Internal environment state tracking the full scheduling episode."""
112
+
113
+ task_id: str = Field(
114
+ default="",
115
+ description="Unique identifier for the current task.",
116
+ )
117
+ scenario_name: str = Field(
118
+ default="",
119
+ description="Human-readable name of the scheduling scenario.",
120
+ )
121
+ meeting_request: Dict[str, Any] = Field(
122
+ default_factory=dict,
123
+ description="The incoming meeting request details.",
124
+ )
125
+ calendars: Dict[str, List[Any]] = Field(
126
+ default_factory=dict,
127
+ description="Per-user calendars: {user_id: [[start, end, priority, summary], ...]}.",
128
+ )
129
+ participant_preferences: Dict[str, Dict[str, Any]] = Field(
130
+ default_factory=dict,
131
+ description="Per-participant scheduling preferences.",
132
+ )
133
+ proposed_slot: Optional[List[str]] = Field(
134
+ default=None,
135
+ description="Currently proposed slot as [start_iso, end_iso].",
136
+ )
137
+ rescheduled_meetings: List[Dict[str, Any]] = Field(
138
+ default_factory=list,
139
+ description="List of meetings that have been rescheduled during this episode.",
140
+ )
141
+ total_preference_penalty: float = Field(
142
+ default=0.0,
143
+ description="Cumulative penalty from preference violations.",
144
+ )
145
+ total_steps: int = Field(
146
+ default=0,
147
+ description="Total steps taken so far in the episode.",
148
+ )
149
+ final_reward: float = Field(
150
+ default=0.0,
151
+ description="Final computed reward for the episode.",
152
+ )
153
+ completed: bool = Field(
154
+ default=False,
155
+ description="Whether the episode has ended.",
156
+ )
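`SchedulingAction` declares every payload field as `Optional`. As a result, a `propose_slot` action without `proposed_start` still passes Pydantic validation and must be caught elsewhere; this diff does not show whether the environment does that. A client-side pre-check is one option. This sketch works on plain dicts, and its `REQUIRED_FIELDS` table is an assumption read off the field descriptions above.

```python
from typing import Any, Dict, List

# Assumed mapping, derived from the "(used with ...)" notes in the SchedulingAction field descriptions.
REQUIRED_FIELDS: Dict[str, List[str]] = {
    "propose_slot": ["proposed_start", "proposed_duration"],
    "reschedule_meeting": ["meeting_id_to_move", "new_start_time"],
    "finalize": [],
    "reject": [],
}


def missing_fields(action: Dict[str, Any]) -> List[str]:
    """List the fields an action still needs; raise on an unknown action_type."""
    action_type = action.get("action_type", "propose_slot")  # same default as the model
    if action_type not in REQUIRED_FIELDS:
        raise ValueError(f"unknown action_type: {action_type!r}")
    return [name for name in REQUIRED_FIELDS[action_type] if action.get(name) is None]
```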
openenv.yaml ADDED
@@ -0,0 +1,7 @@
1
+ spec_version: 1
2
+ name: scheduling_env
3
+ type: space
4
+ runtime: fastapi
5
+ app: server.app:app
6
+ port: 8000
7
+ description: "Intelligent Meeting Scheduling - Learn optimal scheduling through multi-stakeholder preference optimization"
openenv_scheduling_env.egg-info/PKG-INFO ADDED
@@ -0,0 +1,10 @@
1
+ Metadata-Version: 2.4
2
+ Name: openenv-scheduling_env
3
+ Version: 0.1.0
4
+ Summary: Scheduling Env environment for OpenEnv
5
+ Requires-Python: >=3.10
6
+ Requires-Dist: huggingface-hub>=1.9.1
7
+ Requires-Dist: openenv-core[core]>=0.2.2
8
+ Provides-Extra: dev
9
+ Requires-Dist: pytest>=8.0.0; extra == "dev"
10
+ Requires-Dist: pytest-cov>=4.0.0; extra == "dev"
openenv_scheduling_env.egg-info/SOURCES.txt ADDED
@@ -0,0 +1,19 @@
1
+ README.md
2
+ pyproject.toml
3
+ ./__init__.py
4
+ ./client.py
5
+ ./inference.py
6
+ ./models.py
7
+ ./sample_infrenae.py
8
+ openenv_scheduling_env.egg-info/PKG-INFO
9
+ openenv_scheduling_env.egg-info/SOURCES.txt
10
+ openenv_scheduling_env.egg-info/dependency_links.txt
11
+ openenv_scheduling_env.egg-info/entry_points.txt
12
+ openenv_scheduling_env.egg-info/requires.txt
13
+ openenv_scheduling_env.egg-info/top_level.txt
14
+ server/__init__.py
15
+ server/app.py
16
+ server/graders.py
17
+ server/scenario_generator.py
18
+ server/scheduling_env_environment.py
19
+ server/scheduling_logic.py
openenv_scheduling_env.egg-info/dependency_links.txt ADDED
@@ -0,0 +1 @@
1
+
openenv_scheduling_env.egg-info/entry_points.txt ADDED
@@ -0,0 +1,2 @@
1
+ [console_scripts]
2
+ server = scheduling_env.server.app:main
openenv_scheduling_env.egg-info/requires.txt ADDED
@@ -0,0 +1,6 @@
1
+ huggingface-hub>=1.9.1
2
+ openenv-core[core]>=0.2.2
3
+
4
+ [dev]
5
+ pytest>=8.0.0
6
+ pytest-cov>=4.0.0
openenv_scheduling_env.egg-info/top_level.txt ADDED
@@ -0,0 +1 @@
1
+ scheduling_env
pyproject.toml ADDED
@@ -0,0 +1,46 @@
1
+ # Copyright (c) Meta Platforms, Inc. and affiliates.
2
+ # All rights reserved.
3
+ #
4
+ # This source code is licensed under the BSD-style license found in the
5
+ # LICENSE file in the root directory of this source tree.
6
+
7
+ [build-system]
8
+ requires = ["setuptools>=45", "wheel"]
9
+ build-backend = "setuptools.build_meta"
10
+
11
+ [project]
12
+ name = "openenv-scheduling_env"
13
+ version = "0.1.0"
14
+ description = "Scheduling Env environment for OpenEnv"
15
+ requires-python = ">=3.10"
16
+ dependencies = [
17
+ "huggingface-hub>=1.9.1",
18
+ # Core OpenEnv runtime (provides FastAPI server + HTTP client types)
19
+ # install from GitHub
20
+ # "openenv-core[core] @ git+https://github.com/meta-pytorch/OpenEnv.git",
21
+ "openenv-core[core]>=0.2.2",
22
+ # Environment-specific dependencies
23
+ # Add all dependencies needed for your environment here
24
+ # Examples:
25
+ # "numpy>=1.19.0",
26
+ # "torch>=2.0.0",
27
+ # "gymnasium>=0.29.0",
28
+ # "openspiel>=1.0.0",
29
+ # "smolagents>=1.22.0,<2",
30
+ ]
31
+
32
+ [project.optional-dependencies]
33
+ dev = [
34
+ "pytest>=8.0.0",
35
+ "pytest-cov>=4.0.0",
36
+ ]
37
+
38
+ [project.scripts]
39
+ # Server entry point - enables running via: uv run --project . server
40
+ # or: python -m scheduling_env.server.app
41
+ server = "scheduling_env.server.app:main"
42
+
43
+ [tool.setuptools]
44
+ include-package-data = true
45
+ packages = ["scheduling_env", "scheduling_env.server"]
46
+ package-dir = { "scheduling_env" = ".", "scheduling_env.server" = "server" }
sample_infrenae.py ADDED
@@ -0,0 +1,189 @@
1
+
2
+ """
3
+ Inference Script Example
4
+ ===================================
5
+ MANDATORY
6
+ - Before submitting, ensure the following variables are defined in your environment configuration:
7
+ API_BASE_URL The API endpoint for the LLM.
8
+ MODEL_NAME The model identifier to use for inference.
9
+ HF_TOKEN Your Hugging Face / API key.
10
+ LOCAL_IMAGE_NAME The name of the local image to use for the environment if you are using from_docker_image()
11
+ method
12
+
13
+ - Defaults are set only for API_BASE_URL and MODEL_NAME
14
+ (and should reflect your active inference setup):
15
+ API_BASE_URL = os.getenv("API_BASE_URL", "<your-active-endpoint>")
16
+ MODEL_NAME = os.getenv("MODEL_NAME", "<your-active-model>")
17
+
18
+ - The inference script must be named `inference.py` and placed in the root directory of the project
19
+ - Participants must use OpenAI Client for all LLM calls using above variables
20
+
21
+ STDOUT FORMAT
22
+ - The script must emit exactly three line types to stdout, in this order:
23
+
24
+ [START] task=<task_name> env=<benchmark> model=<model_name>
25
+ [STEP] step=<n> action=<action_str> reward=<0.00> done=<true|false> error=<msg|null>
26
+ [END] success=<true|false> steps=<n> score=<score> rewards=<r1,r2,...,rn>
27
+
28
+ Rules:
29
+ - One [START] line at episode begin.
30
+ - One [STEP] line per step, immediately after env.step() returns.
31
+ - One [END] line after env.close(), always emitted (even on exception).
32
+ - reward and rewards are formatted to 2 decimal places.
33
+ - done and success are lowercase booleans: true or false.
34
+ - error is the raw last_action_error string, or null if none.
35
+ - All fields on a single line with no newlines within a line.
36
+ - Each task should return a score in [0, 1].
37
+
38
+ Example:
39
+ [START] task=click-test env=miniwob model=Qwen3-VL-30B
40
+ [STEP] step=1 action=click('123') reward=0.00 done=false error=null
41
+ [STEP] step=2 action=fill('456','text') reward=0.00 done=false error=null
42
+ [STEP] step=3 action=click('789') reward=1.00 done=true error=null
43
+ [END] success=true steps=3 score=1.00 rewards=0.00,0.00,1.00
44
+ """
45
+
46
+ import asyncio
47
+ import os
48
+ import textwrap
49
+ from typing import List, Optional
50
+
51
+ from openai import OpenAI
52
+
53
+ from my_env_v4 import MyEnvV4Action, MyEnvV4Env
54
+ IMAGE_NAME = os.getenv("LOCAL_IMAGE_NAME") or os.getenv("IMAGE_NAME")  # Docker image for from_docker_image()
55
+ API_KEY = os.getenv("HF_TOKEN") or os.getenv("API_KEY")
56
+
57
+ API_BASE_URL = os.getenv("API_BASE_URL") or "https://router.huggingface.co/v1"
58
+ MODEL_NAME = os.getenv("MODEL_NAME") or "Qwen/Qwen2.5-72B-Instruct"
59
+ TASK_NAME = os.getenv("MY_ENV_V4_TASK", "echo")
60
+ BENCHMARK = os.getenv("MY_ENV_V4_BENCHMARK", "my_env_v4")
61
+ MAX_STEPS = 8
62
+ TEMPERATURE = 0.7
63
+ MAX_TOKENS = 150
64
+ SUCCESS_SCORE_THRESHOLD = 0.1 # normalized score in [0, 1]
65
+
66
+ # Approximate max reward: reward is 0.1 per message character; MAX_TOKENS is used as a rough cap on message length, summed over all steps
67
+ _MAX_REWARD_PER_STEP = MAX_TOKENS * 0.1
68
+ MAX_TOTAL_REWARD = MAX_STEPS * _MAX_REWARD_PER_STEP
69
+
70
+ SYSTEM_PROMPT = textwrap.dedent(
71
+ """
72
+ You are interacting with a simple echo environment.
73
+ Each turn you must send a message. The environment will echo it back.
74
+ Reward is proportional to message length: reward = len(message) * 0.1
75
+ Your goal is to maximize total reward by sending meaningful, substantive messages.
76
+ Reply with exactly one message string — no quotes, no prefixes, just the message text.
77
+ """
78
+ ).strip()
79
+
80
+
81
+ def log_start(task: str, env: str, model: str) -> None:
82
+ print(f"[START] task={task} env={env} model={model}", flush=True)
83
+
84
+
85
+ def log_step(step: int, action: str, reward: float, done: bool, error: Optional[str]) -> None:
86
+ error_val = error if error else "null"
87
+ done_val = str(done).lower()
88
+ print(
89
+ f"[STEP] step={step} action={action} reward={reward:.2f} done={done_val} error={error_val}",
90
+ flush=True,
91
+ )
92
+
93
+
94
+ def log_end(success: bool, steps: int, score: float, rewards: List[float]) -> None:
95
+ rewards_str = ",".join(f"{r:.2f}" for r in rewards)
96
+ print(f"[END] success={str(success).lower()} steps={steps} score={score:.3f} rewards={rewards_str}", flush=True)
97
+
98
+
99
+ def build_user_prompt(step: int, last_echoed: str, last_reward: float, history: List[str]) -> str:
100
+ history_block = "\n".join(history[-4:]) if history else "None"
101
+ return textwrap.dedent(
102
+ f"""
103
+ Step: {step}
104
+ Last echoed message: {last_echoed!r}
105
+ Last reward: {last_reward:.2f}
106
+ Previous steps:
107
+ {history_block}
108
+ Send your next message.
109
+ """
110
+ ).strip()
111
+
112
+
113
+ def get_model_message(client: OpenAI, step: int, last_echoed: str, last_reward: float, history: List[str]) -> str:
114
+ user_prompt = build_user_prompt(step, last_echoed, last_reward, history)
115
+ try:
116
+ completion = client.chat.completions.create(
117
+ model=MODEL_NAME,
118
+ messages=[
119
+ {"role": "system", "content": SYSTEM_PROMPT},
120
+ {"role": "user", "content": user_prompt},
121
+ ],
122
+ temperature=TEMPERATURE,
123
+ max_tokens=MAX_TOKENS,
124
+ stream=False,
125
+ )
126
+ text = (completion.choices[0].message.content or "").strip()
127
+ return text if text else "hello"
128
+ except Exception as exc:
129
+ print(f"[DEBUG] Model request failed: {exc}", flush=True)
130
+ return "hello"
131
+
132
+
133
+ async def main() -> None:
134
+ client = OpenAI(base_url=API_BASE_URL, api_key=API_KEY)
135
+
136
+ env = await MyEnvV4Env.from_docker_image(IMAGE_NAME)
137
+
138
+ history: List[str] = []
139
+ rewards: List[float] = []
140
+ steps_taken = 0
141
+ score = 0.0
142
+ success = False
143
+
144
+ log_start(task=TASK_NAME, env=BENCHMARK, model=MODEL_NAME)
145
+
146
+ try:
147
+ result = await env.reset() # OpenENV.reset()
148
+ last_echoed = result.observation.echoed_message
149
+ last_reward = 0.0
150
+
151
+ for step in range(1, MAX_STEPS + 1):
152
+ if result.done:
153
+ break
154
+
155
+ message = get_model_message(client, step, last_echoed, last_reward, history)
156
+
157
+ result = await env.step(MyEnvV4Action(message=message))
158
+ obs = result.observation
159
+
160
+ reward = result.reward or 0.0
161
+ done = result.done
162
+ error = None
163
+
164
+ rewards.append(reward)
165
+ steps_taken = step
166
+ last_echoed = obs.echoed_message
167
+ last_reward = reward
168
+
169
+ log_step(step=step, action=message, reward=reward, done=done, error=error)
170
+
171
+ history.append(f"Step {step}: {message!r} -> reward {reward:+.2f}")
172
+
173
+ if done:
174
+ break
175
+
176
+ score = sum(rewards) / MAX_TOTAL_REWARD if MAX_TOTAL_REWARD > 0 else 0.0
177
+ score = min(max(score, 0.0), 1.0) # clamp to [0, 1]
178
+ success = score >= SUCCESS_SCORE_THRESHOLD
179
+
180
+ finally:
181
+ try:
182
+ await env.close()
183
+ except Exception as e:
184
+ print(f"[DEBUG] env.close() error (container cleanup): {e}", flush=True)
185
+ log_end(success=success, steps=steps_taken, score=score, rewards=rewards)
186
+
187
+
188
+ if __name__ == "__main__":
189
+ asyncio.run(main())
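The STDOUT FORMAT rules are easy to break silently, for instance with a stray blank line or a reward printed with the wrong precision. A small checker can be run over the captured output of a single episode. The regular expressions below are a transcription of the docstring's rules, not an official grader. Because the spec leaves the precision of `score` open, any decimal value is accepted there.

```python
import re
from typing import List

# Patterns transcribed from the STDOUT FORMAT section of the sample docstring.
START_RE = re.compile(r"^\[START\] task=\S+ env=\S+ model=\S+$")
STEP_RE = re.compile(
    r"^\[STEP\] step=\d+ action=.+ reward=-?\d+\.\d{2} done=(true|false) error=.+$"
)
END_RE = re.compile(
    r"^\[END\] success=(true|false) steps=\d+ score=-?\d+\.\d+ "
    r"rewards=(-?\d+\.\d{2}(,-?\d+\.\d{2})*)?$"
)


def check_episode(lines: List[str]) -> List[str]:
    """Return a list of problems; an empty list means the episode log looks valid."""
    problems = []
    if not lines or not START_RE.match(lines[0]):
        problems.append("first line is not a valid [START] line")
    if len(lines) < 2 or not END_RE.match(lines[-1]):
        problems.append("last line is not a valid [END] line")
    # Everything between [START] and [END] must be a [STEP] line (line numbers are 1-based).
    for i, line in enumerate(lines[1:-1], start=2):
        if not STEP_RE.match(line):
            problems.append(f"line {i} is not a valid [STEP] line")
    return problems
```

Running it on the docstring's own example should return an empty list.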
server/Dockerfile ADDED
@@ -0,0 +1,80 @@
1
+ # Copyright (c) Meta Platforms, Inc. and affiliates.
2
+ # All rights reserved.
3
+ #
4
+ # This source code is licensed under the BSD-style license found in the
5
+ # LICENSE file in the root directory of this source tree.
6
+
7
+ # Multi-stage build using openenv-base
8
+ # This Dockerfile is flexible and works for both:
9
+ # - In-repo environments (with local OpenEnv sources)
10
+ # - Standalone environments (with openenv from PyPI/Git)
11
+ # The build script (openenv build) handles context detection and sets appropriate build args.
12
+
13
+ ARG BASE_IMAGE=ghcr.io/meta-pytorch/openenv-base:latest
14
+ FROM ${BASE_IMAGE} AS builder
+
+ WORKDIR /app
+
+ # Ensure git is available (required for installing dependencies from VCS)
+ RUN apt-get update && \
+     apt-get install -y --no-install-recommends git && \
+     rm -rf /var/lib/apt/lists/*
+
+ # Build argument to control whether we're building standalone or in-repo
+ ARG BUILD_MODE=in-repo
+ ARG ENV_NAME=scheduling_env
+
+ # Copy environment code (always at root of build context)
+ COPY . /app/env
+
+ # For in-repo builds, openenv is already vendored in the build context
+ # For standalone builds, openenv will be installed via pyproject.toml
+ WORKDIR /app/env
+
+ # Ensure uv is available (for local builds where base image lacks it)
+ RUN if ! command -v uv >/dev/null 2>&1; then \
+         curl -LsSf https://astral.sh/uv/install.sh | sh && \
+         mv /root/.local/bin/uv /usr/local/bin/uv && \
+         mv /root/.local/bin/uvx /usr/local/bin/uvx; \
+     fi
+
+ # Install dependencies using uv sync
+ # If uv.lock exists, use it; otherwise resolve on the fly
+ RUN --mount=type=cache,target=/root/.cache/uv \
+     if [ -f uv.lock ]; then \
+         uv sync --frozen --no-install-project --no-editable; \
+     else \
+         uv sync --no-install-project --no-editable; \
+     fi
+
+ RUN --mount=type=cache,target=/root/.cache/uv \
+     if [ -f uv.lock ]; then \
+         uv sync --frozen --no-editable; \
+     else \
+         uv sync --no-editable; \
+     fi
+
+ # Final runtime stage
+ FROM ${BASE_IMAGE}
+
+ WORKDIR /app
+
+ # Copy the virtual environment from builder
+ COPY --from=builder /app/env/.venv /app/.venv
+
+ # Copy the environment code
+ COPY --from=builder /app/env /app/env
+
+ # Set PATH to use the virtual environment
+ ENV PATH="/app/.venv/bin:$PATH"
+
+ # Set PYTHONPATH so imports work correctly
+ ENV PYTHONPATH="/app/env:$PYTHONPATH"
+
+ # Health check
+ HEALTHCHECK --interval=30s --timeout=3s --start-period=5s --retries=3 \
+     CMD curl -f http://localhost:8000/health || exit 1
+
+ # Run the FastAPI server
+ # The module path is constructed to work with the /app/env structure
+ CMD ["sh", "-c", "cd /app/env && uvicorn server.app:app --host 0.0.0.0 --port 8000"]
server/__init__.py ADDED
@@ -0,0 +1,11 @@
+ # Copyright (c) Meta Platforms, Inc. and affiliates.
+ # All rights reserved.
+ #
+ # This source code is licensed under the BSD-style license found in the
+ # LICENSE file in the root directory of this source tree.
+
+ """Scheduling Env environment server components."""
+
+ from .scheduling_env_environment import SchedulingEnvironment
+
+ __all__ = ["SchedulingEnvironment"]
server/app.py ADDED
@@ -0,0 +1,159 @@
+ # Copyright (c) Meta Platforms, Inc. and affiliates.
+ # All rights reserved.
+ #
+ # This source code is licensed under the BSD-style license found in the
+ # LICENSE file in the root directory of this source tree.
+
+ """
+ FastAPI application for the Meeting Scheduling RL Environment.
+
+ Uses a custom HTTP server pattern (based on calendar_env reference)
+ to maintain a persistent environment instance across HTTP calls,
+ enabling stateful multi-step episodes via /reset → /step → /state.
+ """
+
+ from __future__ import annotations
+
+ import asyncio
+ import logging
+ from typing import Any, Dict, Optional
+
+ from fastapi import Body, FastAPI
+ from openenv.core.env_server.http_server import HTTPEnvServer
+
+ try:
+     from ..models import SchedulingAction, SchedulingObservation, SchedulingState
+     from .scheduling_env_environment import SchedulingEnvironment
+ except (ModuleNotFoundError, ImportError):
+     from models import SchedulingAction, SchedulingObservation, SchedulingState
+     from server.scheduling_env_environment import SchedulingEnvironment
+
+ logger = logging.getLogger(__name__)
+
+
+ class SchedulingHTTPEnvServer(HTTPEnvServer):
+     """Custom HTTP server that maintains a persistent env instance.
+
+     Follows the pattern from OpenEnv's calendar_env: subclass HTTPEnvServer,
+     create one persistent environment, and register custom routes that
+     use it for all HTTP requests.
+     """
+
+     def __init__(self, env, action_cls, observation_cls):
+         self.action_cls = action_cls
+         self.observation_cls = observation_cls
+         super().__init__(env=env, action_cls=action_cls, observation_cls=observation_cls)
+
+         # Persistent environment for HTTP endpoints
+         if callable(self._env_factory):
+             self.env = self._env_factory()
+         else:
+             self.env = self._env_factory
+
+     def register_routes(self, app: FastAPI) -> None:  # type: ignore[override]
+         """Register custom /reset, /step, /state endpoints."""
+
+         @app.post("/reset")
+         async def reset_handler(
+             body: Optional[Dict[str, Any]] = Body(default=None),
+         ) -> Dict[str, Any]:
+             body = body or {}
+             task_id = body.get("task_id", "task1_easy")
+
+             loop = asyncio.get_running_loop()
+             observation = await loop.run_in_executor(
+                 None, lambda: self.env.reset(task_id=task_id)
+             )
+
+             obs_dict = (
+                 observation.model_dump()
+                 if hasattr(observation, "model_dump")
+                 else observation.__dict__
+             )
+             return {
+                 "observation": obs_dict,
+                 "done": getattr(observation, "done", False),
+                 "reward": getattr(observation, "reward", 0.0),
+                 **obs_dict,
+             }
+
+         @app.post("/step")
+         async def step_handler(
+             body: Dict[str, Any] = Body(...),
+         ) -> Dict[str, Any]:
+             # Support both {"action": {...}} and direct action fields
+             action_data = body.get("action", body)
+
+             try:
+                 action = self.action_cls(**action_data)
+             except Exception as e:
+                 logger.error("Failed to deserialize action: %s", e)
+                 return {
+                     "observation": {
+                         "success": False,
+                         "error_message": f"Invalid action: {e}",
+                         "done": False,
+                         "reward": -1.0,
+                     },
+                     "done": False,
+                     "reward": -1.0,
+                 }
+
+             loop = asyncio.get_running_loop()
+             observation = await loop.run_in_executor(
+                 None, self.env.step, action
+             )
+
+             obs_dict = (
+                 observation.model_dump()
+                 if hasattr(observation, "model_dump")
+                 else observation.__dict__
+             )
+             return {
+                 "observation": obs_dict,
+                 "done": getattr(observation, "done", False),
+                 "reward": getattr(observation, "reward", 0.0),
+                 **obs_dict,
+             }
+
+         @app.get("/state")
+         async def state_handler() -> Dict[str, Any]:
+             state = self.env.state
+             return (
+                 state.model_dump()
+                 if hasattr(state, "model_dump")
+                 else state.__dict__
+             )
+
+         @app.get("/health")
+         async def health_handler() -> Dict[str, str]:
+             return {"status": "healthy", "environment": "scheduling_env"}
+
+
+ def create_scheduling_environment():
+     """Factory function for the scheduling environment."""
+     return SchedulingEnvironment()
+
+
+ # Build the FastAPI app with custom stateful HTTP server
+ app = FastAPI(
+     title="Scheduling RL Environment",
+     description="Intelligent Meeting Scheduling Environment for OpenEnv",
+     version="1.0.0",
+ )
+
+ _server = SchedulingHTTPEnvServer(
+     env=create_scheduling_environment,
+     action_cls=SchedulingAction,
+     observation_cls=SchedulingObservation,
+ )
+ _server.register_routes(app)
+
+
+ def main():
+     import uvicorn
+     uvicorn.run(app, host="0.0.0.0", port=8000)
+
+
+ if __name__ == "__main__":
+     main()
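The server above exposes `/reset`, `/step`, `/state` and `/health` on port 8000. A minimal client sketch using only the standard library is shown below, assuming the server is reachable at `http://localhost:8000`. The action field name `action_type` used in the usage note is hypothetical: the fields of `SchedulingAction` are not part of this diff.

```python
import json
import urllib.request
from typing import Any, Dict, Optional

# Assumption: the uvicorn server from server/app.py is running locally on port 8000.
BASE_URL = "http://localhost:8000"


def build_request(path: str, payload: Optional[Dict[str, Any]] = None,
                  base_url: str = BASE_URL) -> urllib.request.Request:
    """Build a JSON POST request, or a plain GET when no payload is given."""
    if payload is None:
        return urllib.request.Request(base_url + path, method="GET")
    return urllib.request.Request(
        base_url + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def call(req: urllib.request.Request) -> Dict[str, Any]:
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


def reset_request(task_id: str = "task1_easy") -> urllib.request.Request:
    # /reset reads "task_id" from the body and defaults to "task1_easy".
    return build_request("/reset", {"task_id": task_id})


def step_request(action: Dict[str, Any]) -> urllib.request.Request:
    # /step accepts {"action": {...}} or bare action fields; this uses the wrapped form.
    return build_request("/step", {"action": action})
```

With the server running, `call(reset_request("task2_medium"))` starts an episode and `call(step_request({"action_type": "finalize"}))` (hypothetical field name) advances it. Both handlers return the observation nested under `"observation"` and also flattened at the top level, so `resp["done"]` and `resp["reward"]` can be read directly.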
server/graders.py ADDED
@@ -0,0 +1,87 @@
+ """Graders for the meeting-scheduling RL environment.
+
+ Provides programmatic scoring (0.0–1.0) per episode and validation
+ that graders produce diverse scores across different agent trajectories.
+ """
+
+ from __future__ import annotations
+
+ import logging
+ from typing import List
+
+ from .scheduling_logic import (
+     calculate_collective_hours,
+     calculate_final_reward,
+     find_conflicts,
+     parse_iso,
+ )
+
+ logger = logging.getLogger(__name__)
+
+
+ class SchedulingGrader:
+     """Programmatic grader for scheduling tasks."""
+
+     def grade_episode(self, final_state, final_observation) -> float:
+         """Score an episode in [0.0, 1.0].
+
+         Returns ``final_state.final_reward`` when the episode completed
+         successfully, with a 50% penalty applied if any hard constraint
+         violations are detected.
+         """
+         if not final_state.completed or not final_observation.success:
+             return 0.0
+
+         score = final_state.final_reward
+
+         violations = self._check_violations(final_state)
+         if violations:
+             score *= 0.5
+             logger.warning("Constraint violations: %s", violations)
+
+         return max(0.0, min(1.0, score))
+
+     def _check_violations(self, state) -> List[str]:
+         """Detect hard constraint violations in the final state."""
+         violations: List[str] = []
+         req_priority = state.meeting_request.get("priority", 99)
+
+         # Violation 1: Rescheduled equal-or-higher priority meeting
+         for rm in state.rescheduled_meetings:
+             attendee = rm["attendee"]
+             old_start = rm["old_start"]
+             for entry in state.calendars.get(attendee, []):
+                 if entry[0] == old_start and entry[2] <= req_priority:
+                     violations.append(
+                         f"Rescheduled higher priority meeting: "
+                         f"{attendee} {old_start}"
+                     )
+
+         # Violation 2: Proposed slot outside collective working hours
+         if state.proposed_slot:
+             collective = calculate_collective_hours(state.participant_preferences)
+             start = parse_iso(state.proposed_slot[0])
+             end = parse_iso(state.proposed_slot[1])
+             if start.hour < collective["min_start_hour"]:
+                 violations.append(
+                     f"Slot starts before working hours: {state.proposed_slot[0]}"
+                 )
+             if end.hour > collective["max_end_hour"] or (
+                 end.hour == collective["max_end_hour"] and end.minute > 0
+             ):
+                 violations.append(
+                     f"Slot ends after working hours: {state.proposed_slot[1]}"
+                 )
+
+         # Violation 3: Overlapping meetings after rescheduling
+         for user_id, calendar in state.calendars.items():
+             sorted_cal = sorted(calendar, key=lambda e: e[0])
+             for i in range(len(sorted_cal) - 1):
+                 end_i = parse_iso(sorted_cal[i][1])
+                 start_next = parse_iso(sorted_cal[i + 1][0])
+                 if end_i > start_next:
+                     violations.append(
+                         f"Overlap for {user_id}: {sorted_cal[i][3]} / {sorted_cal[i+1][3]}"
+                     )
+
+         return violations
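The working-hours rule (Violation 2), the overlap rule (Violation 3) and the 50% penalty can be exercised without a full environment state. The sketch below re-implements them on plain calendar entries in the scenario JSON layout `[start_iso, end_iso, priority, summary]`, using `datetime.fromisoformat` in place of the repository's `parse_iso` (assumed to be equivalent for these `+00:00` timestamps). The helper names are hypothetical.

```python
from datetime import datetime
from typing import List, Sequence


def outside_working_hours(start_iso: str, end_iso: str,
                          min_start_hour: int, max_end_hour: int) -> List[str]:
    """Re-implementation of the grader's Violation 2 check."""
    start = datetime.fromisoformat(start_iso)
    end = datetime.fromisoformat(end_iso)
    problems = []
    if start.hour < min_start_hour:
        problems.append("starts before working hours")
    # Ending exactly on the boundary hour (e.g. 15:00 with max_end_hour=15) is allowed.
    if end.hour > max_end_hour or (end.hour == max_end_hour and end.minute > 0):
        problems.append("ends after working hours")
    return problems


def overlapping_pairs(calendar: Sequence[Sequence]) -> List[str]:
    """Re-implementation of Violation 3: adjacent entries (sorted by start) that overlap."""
    ordered = sorted(calendar, key=lambda e: e[0])
    overlaps = []
    for cur, nxt in zip(ordered, ordered[1:]):
        if datetime.fromisoformat(cur[1]) > datetime.fromisoformat(nxt[0]):
            overlaps.append(f"{cur[3]} / {nxt[3]}")
    return overlaps


def apply_penalty(final_reward: float, violations: List[str]) -> float:
    """Halve the score when any violation is present, then clamp to [0, 1]."""
    score = final_reward * 0.5 if violations else final_reward
    return max(0.0, min(1.0, score))
```

For example, a 14:00 to 15:00 slot inside a 10 to 15 collective window yields no violations, while a completed episode with `final_reward=0.8` and one violation is graded 0.4.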
server/requirements.txt ADDED
@@ -0,0 +1,6 @@
+ openenv[core]>=0.2.0
+ fastapi>=0.115.0
+ uvicorn>=0.24.0
+
+
+
server/scenario_generator.py ADDED
@@ -0,0 +1,403 @@
+ """Random scenario generator for the meeting-scheduling RL environment.
+
+ Produces solvable scenarios with controlled difficulty so agents cannot
+ memorise fixed answers. Difficulty is governed by a parameter dict that
+ controls attendee count, calendar density, and preference strictness.
+
+ Usage:
+     from server.scenario_generator import generate_scenario
+     scenario = generate_scenario("medium")
+     scenario = generate_scenario("easy", seed=42)  # reproducible
+ """
+
+ from __future__ import annotations
+
+ import random
+ from datetime import date, datetime, timedelta, timezone
+ from typing import Any, Dict, List, Optional, Tuple
+
+ from .scheduling_logic import (
+     find_conflicts,
+     is_slot_free,
+     within_collective_hours,
+     calculate_collective_hours,
+ )
+
+ # ── Difficulty presets ────────────────────────────────────────────────
+
+ DIFFICULTY_PARAMS: Dict[str, Dict[str, Any]] = {
+     "easy": {
+         "num_attendees": (2, 2),  # (min, max)
+         "meetings_per_user": (1, 3),
+         "meeting_duration": (30, 30),  # requested meeting duration in min
+         "request_priority": (3, 4),  # lower number = higher priority
+         "existing_priority_range": (2, 5),
+         "pref_start_range": (8, 10),
+         "pref_end_range": (16, 18),
+         "max_meetings_day_range": (5, 8),
+         "avoid_btb_prob": 0.1,
+         "buffer_range": (0, 10),
+         "calendar_slot_min": 30,
+         "calendar_slot_max": 60,
+         "guarantee_free_slot": True,  # easy always has a free slot
+     },
+     "medium": {
+         "num_attendees": (3, 4),
+         "meetings_per_user": (3, 5),
+         "meeting_duration": (30, 60),
+         "request_priority": (2, 3),
+         "existing_priority_range": (2, 5),
+         "pref_start_range": (9, 10),
+         "pref_end_range": (15, 17),
+         "max_meetings_day_range": (4, 6),
+         "avoid_btb_prob": 0.5,
+         "buffer_range": (10, 20),
+         "calendar_slot_min": 30,
+         "calendar_slot_max": 90,
+         "guarantee_free_slot": False,
+     },
+     "hard": {
+         "num_attendees": (4, 6),
+         "meetings_per_user": (4, 7),
+         "meeting_duration": (30, 60),
+         "request_priority": (1, 2),
+         "existing_priority_range": (2, 5),
+         "pref_start_range": (9, 11),
+         "pref_end_range": (15, 16),
+         "max_meetings_day_range": (3, 5),
+         "avoid_btb_prob": 0.7,
+         "buffer_range": (10, 25),
+         "calendar_slot_min": 30,
+         "calendar_slot_max": 90,
+         "guarantee_free_slot": False,
+     },
+ }
+
+ MEETING_SUMMARIES = [
+     "Standup", "Sprint planning", "Design review", "Code review",
+     "Client call", "1-on-1", "Team sync", "Project checkpoint",
+     "Lunch meeting", "Strategy session", "Architecture review",
+     "Product demo", "Budget meeting", "Office hours", "Workshop",
+     "Training session", "Coffee chat", "Retrospective",
+     "Performance review", "Brainstorming", "Board meeting",
+     "Vendor call", "Onboarding session", "Knowledge sharing",
+ ]
+
+ # ── Helpers ───────────────────────────────────────────────────────────
+
+ def _rand_int(lo: int, hi: int, rng: random.Random) -> int:
+     return rng.randint(lo, hi)
+
+
+ def _rand_range(r: Tuple[int, int], rng: random.Random) -> int:
+     return rng.randint(r[0], r[1])
+
+
+ def _random_weekday(rng: random.Random) -> date:
+     """Pick a random weekday within about 30 days of a fixed base date."""
+     base = date(2025, 4, 7)  # fixed base so TZ stays consistent
+     offset = rng.randint(0, 29)
+     d = base + timedelta(days=offset)
+     # roll weekend dates forward to the following Monday
+     while d.weekday() >= 5:
+         d += timedelta(days=1)
+     return d
+
+
+ def _generate_calendar(
+     user_id: str,
+     target_date: date,
+     num_meetings: int,
+     params: Dict[str, Any],
+     rng: random.Random,
+     reserved_slots: Optional[List[Tuple[datetime, datetime]]] = None,
+ ) -> List[List]:
+     """Generate random non-overlapping calendar entries for one user."""
+     tz = timezone.utc
+     day_start_hour = _rand_range(params["pref_start_range"], rng)
+     day_end_hour = _rand_range(params["pref_end_range"], rng)
+     if day_end_hour <= day_start_hour + 2:
+         day_end_hour = day_start_hour + 6
+
+     entries: List[List] = []
+     occupied: List[Tuple[datetime, datetime]] = []
+     if reserved_slots:
+         occupied.extend(reserved_slots)
+
+     attempts = 0
+     while len(entries) < num_meetings and attempts < 80:
+         attempts += 1
+         dur = _rand_range(
+             (params["calendar_slot_min"], params["calendar_slot_max"]),
+             rng,
+         )
+         # round to 15-min
+         dur = max(15, (dur // 15) * 15)
+
+         hour = rng.randint(day_start_hour, max(day_start_hour, day_end_hour - 1))
+         minute = rng.choice([0, 15, 30, 45])
+         start = datetime(target_date.year, target_date.month, target_date.day,
+                          hour, minute, 0, tzinfo=tz)
+         end = start + timedelta(minutes=dur)
+
+         boundary = datetime(target_date.year, target_date.month, target_date.day,
+                             day_end_hour, 0, 0, tzinfo=tz)
+         if end > boundary:
+             continue
+
+         # check overlap with already placed meetings
+         overlap = False
+         for occ_s, occ_e in occupied:
+             if start < occ_e and occ_s < end:
+                 overlap = True
+                 break
+         if overlap:
+             continue
+
+         priority = _rand_range(params["existing_priority_range"], rng)
+         summary = rng.choice(MEETING_SUMMARIES)
+         entries.append([start.isoformat(), end.isoformat(), priority, summary])
+         occupied.append((start, end))
+
+     # sort by start time
+     entries.sort(key=lambda e: e[0])
+     return entries
+
+
+ def _generate_preferences(
+     user_id: str,
+     params: Dict[str, Any],
+     rng: random.Random,
+ ) -> Dict[str, Any]:
+     """Generate random but realistic preferences for one user."""
+     start_h = _rand_range(params["pref_start_range"], rng)
+     end_h = _rand_range(params["pref_end_range"], rng)
+     if end_h <= start_h + 4:
+         end_h = start_h + 6
+     if end_h > 18:
+         end_h = 18
+
+     avoid_btb = rng.random() < params["avoid_btb_prob"]
+     buffer = _rand_range(params["buffer_range"], rng) if avoid_btb else 0
+     # round buffer to 5
+     buffer = (buffer // 5) * 5
+
+     return {
+         "preferred_hours": {"start": start_h, "end": end_h},
+         "max_meetings_per_day": _rand_range(params["max_meetings_day_range"], rng),
+         "avoid_back_to_back": avoid_btb,
+         "buffer_minutes": buffer,
+     }
+
+
+ # ── Solvability check ────────────────────────────────────────────────
+
+ def _find_solvable_slot(
+     calendars: Dict[str, List[List]],
+     attendees: List[str],
+     duration: int,
+     request_priority: int,
+     collective_hours: Dict[str, int],
+     target_date: date,
+     allow_rescheduling: bool = True,
+ ) -> Optional[str]:
+     """Check if at least one slot exists (possibly after rescheduling).
+
+     Returns the ISO start time of a viable slot, or None.
+     """
+     tz = timezone.utc
+     min_h = collective_hours["min_start_hour"]
+     max_h = collective_hours["max_end_hour"]
+
+     candidate = datetime(target_date.year, target_date.month, target_date.day,
+                          min_h, 0, 0, tzinfo=tz)
+     end_boundary = datetime(target_date.year, target_date.month, target_date.day,
+                             max_h, 0, 0, tzinfo=tz)
+     step = timedelta(minutes=15)
+
+     while candidate + timedelta(minutes=duration) <= end_boundary:
+         c_start = candidate.isoformat()
+         c_end = (candidate + timedelta(minutes=duration)).isoformat()
+
+         conflicts = find_conflicts(calendars, c_start, c_end, attendees)
+
+         if len(conflicts) == 0:
+             return c_start
+
+         if allow_rescheduling:
+             # solvable if ALL conflicts have strictly lower priority (higher number)
+             all_reschedulable = all(c["priority"] > request_priority for c in conflicts)
+             if all_reschedulable:
+                 return c_start
+
+         candidate += step
+
+     return None
+
+
+ # ── Plant a guaranteed free slot (easy mode) ──────────────────────────
+
+ def _plant_free_slot(
+     calendars: Dict[str, List[List]],
+     attendees: List[str],
+     duration: int,
+     collective_hours: Dict[str, int],
+     target_date: date,
+     rng: random.Random,
+ ) -> Optional[str]:
+     """Remove conflicts from a random viable slot to guarantee a free one.
+
+     Returns the ISO start of the planted slot.
+     """
+     tz = timezone.utc
+     min_h = collective_hours["min_start_hour"]
+     max_h = collective_hours["max_end_hour"]
+
+     # collect all possible starts
+     candidates = []
+     t = datetime(target_date.year, target_date.month, target_date.day,
+                  min_h, 0, 0, tzinfo=tz)
+     end_boundary = datetime(target_date.year, target_date.month, target_date.day,
+                             max_h, 0, 0, tzinfo=tz)
+     step = timedelta(minutes=15)
+     while t + timedelta(minutes=duration) <= end_boundary:
+         candidates.append(t)
+         t += step
+
+     rng.shuffle(candidates)
+
+     for candidate in candidates:
+         c_start = candidate.isoformat()
+         c_end = (candidate + timedelta(minutes=duration)).isoformat()
+
+         # remove any overlapping entries for all attendees
+         for att in attendees:
+             calendars[att] = [
+                 e for e in calendars[att]
+                 if not (candidate < datetime.fromisoformat(e[1])
+                         and datetime.fromisoformat(e[0]) < candidate + timedelta(minutes=duration))
+             ]
+
+         # verify it's now free
+         conflicts = find_conflicts(calendars, c_start, c_end, attendees)
+         if len(conflicts) == 0:
+             return c_start
+
+     return None
+
+
+ # ── Main generator ────────────────────────────────────────────────────
+
+ def generate_scenario(
+     difficulty: str = "medium",
+     seed: Optional[int] = None,
+ ) -> Dict[str, Any]:
+     """Generate a random solvable scheduling scenario.
+
+     Args:
+         difficulty: One of "easy", "medium", "hard".
+         seed: Optional RNG seed for reproducibility.
+
+     Returns:
+         A scenario dict with the same structure as the static JSON files.
+     """
+     if difficulty not in DIFFICULTY_PARAMS:
+         raise ValueError(f"Unknown difficulty: {difficulty}. Use easy/medium/hard.")
+
+     params = DIFFICULTY_PARAMS[difficulty]
+     rng = random.Random(seed)
+
+     target_date = _random_weekday(rng)
+     num_attendees = _rand_range(params["num_attendees"], rng)
+     attendees = [f"user{i+1}" for i in range(num_attendees)]
+
+     duration = _rand_range(params["meeting_duration"], rng)
+     # round to 15
+     duration = max(15, (duration // 15) * 15)
+
+     request_priority = _rand_range(params["request_priority"], rng)
+
+     # generate preferences first (needed for collective hours)
+     preferences: Dict[str, Dict] = {}
+     for att in attendees:
+         preferences[att] = _generate_preferences(att, params, rng)
+
+     collective_hours = calculate_collective_hours(preferences)
+     # safety: ensure window is wide enough for the meeting
+     window = collective_hours["max_end_hour"] - collective_hours["min_start_hour"]
+     if window * 60 < duration:
+         collective_hours["max_end_hour"] = collective_hours["min_start_hour"] + (duration // 60) + 2
+         # also widen preferences
+         for att in attendees:
+             preferences[att]["preferred_hours"]["end"] = max(
+                 preferences[att]["preferred_hours"]["end"],
+                 collective_hours["max_end_hour"],
+             )
+
+     # generate calendars
+     calendars: Dict[str, List[List]] = {}
+     for att in attendees:
+         n_meetings = _rand_range(params["meetings_per_user"], rng)
+         calendars[att] = _generate_calendar(att, target_date, n_meetings, params, rng)
+
+     # ensure solvability
+     if params["guarantee_free_slot"]:
+         _plant_free_slot(calendars, attendees, duration, collective_hours, target_date, rng)
+
+     # verify at least one solution exists (with rescheduling allowed)
+     max_retries = 10
+     for attempt in range(max_retries):
+         viable = _find_solvable_slot(
+             calendars, attendees, duration, request_priority,
+             collective_hours, target_date, allow_rescheduling=True,
+         )
+         if viable is not None:
+             break
+
+         # regenerate calendars with fewer meetings to open up space
+         for att in attendees:
+             reduced = max(1, _rand_range(params["meetings_per_user"], rng) - 1)
+             calendars[att] = _generate_calendar(att, target_date, reduced, params, rng)
+     else:
+         # last resort: plant a free slot
+         _plant_free_slot(calendars, attendees, duration, collective_hours, target_date, rng)
+
+     # find solution info for metadata
+     free_slot = _find_solvable_slot(
+         calendars, attendees, duration, request_priority,
+         collective_hours, target_date, allow_rescheduling=False,
+     )
+     needs_rescheduling = free_slot is None
+     best_slot = free_slot or _find_solvable_slot(
+         calendars, attendees, duration, request_priority,
+         collective_hours, target_date, allow_rescheduling=True,
+     )
+
+     task_id = f"random_{difficulty}"
+     description_map = {
+         "easy": f"Schedule a {duration}-min meeting with {num_attendees} attendees (random easy)",
+         "medium": f"Schedule a {duration}-min meeting with {num_attendees} attendees (random medium)",
+         "hard": f"Schedule a {duration}-min meeting with {num_attendees} attendees (random hard)",
+     }
+
+     return {
+         "task_id": task_id,
+         "description": description_map[difficulty],
+         "difficulty": difficulty,
+         "meeting_request": {
+             "duration": duration,
+             "priority": request_priority,
+             "attendees": attendees,
+             "summary": rng.choice([
+                 "Team Sync", "Planning Session", "Design Review",
+                 "Sprint Review", "Cross-Team Standup", "Strategy Meeting",
+             ]),
+         },
+         "calendars": calendars,
+         "preferences": preferences,
+         "expected_solution": {
+             "optimal_slot": best_slot,
+             "requires_rescheduling": needs_rescheduling,
+             "generated": True,
+         },
+     }
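`_find_solvable_slot` slides a window in 15-minute steps across the collective hours and accepts a start either when it has no conflicts or, if rescheduling is allowed, when every conflict has a strictly lower priority (a higher number) than the request. `scheduling_logic.find_conflicts` is not part of this diff, so the sketch below assumes it is a plain half-open interval overlap test and re-implements the scan in a self-contained form. The function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, List, Optional


def conflicts_at(calendars: Dict[str, List[list]], attendees: List[str],
                 start: datetime, end: datetime) -> List[dict]:
    """Entries of any attendee overlapping [start, end) (assumed find_conflicts semantics)."""
    found = []
    for att in attendees:
        for s, e, prio, summary in calendars.get(att, []):
            if start < datetime.fromisoformat(e) and datetime.fromisoformat(s) < end:
                found.append({"attendee": att, "priority": prio, "summary": summary})
    return found


def first_viable_slot(calendars: Dict[str, List[list]], attendees: List[str],
                      duration_min: int, request_priority: int,
                      day_start: datetime, day_end: datetime,
                      allow_rescheduling: bool = True) -> Optional[str]:
    """Return the first 15-minute-aligned start that is free or fully reschedulable."""
    step = timedelta(minutes=15)
    dur = timedelta(minutes=duration_min)
    t = day_start
    while t + dur <= day_end:
        hits = conflicts_at(calendars, attendees, t, t + dur)
        if not hits:
            return t.isoformat()
        # Reschedulable only if every conflict is strictly lower priority (higher number).
        if allow_rescheduling and all(h["priority"] > request_priority for h in hits):
            return t.isoformat()
        t += step
    return None
```

With a priority-2 blocker at 10:00 to 11:00 and a priority-4 meeting at 11:00 to 12:00, a 60-minute, priority-3 request in a 10:00 to 12:00 window lands at 11:00 only when rescheduling is allowed; without it, no slot fits.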
server/scenarios/task1_easy.json ADDED
@@ -0,0 +1,41 @@
+ {
+   "task_id": "task1_easy",
+   "description": "Schedule a 30-minute team sync with 2 attendees",
+   "difficulty": "easy",
+   "meeting_request": {
+     "duration": 30,
+     "priority": 3,
+     "attendees": ["user1", "user2"],
+     "summary": "Team Sync"
+   },
+   "calendars": {
+     "user1": [
+       ["2025-04-07T09:00:00+00:00", "2025-04-07T10:00:00+00:00", 2, "Morning standup"],
+       ["2025-04-07T14:00:00+00:00", "2025-04-07T15:00:00+00:00", 3, "Client call"]
+     ],
+     "user2": [
+       ["2025-04-07T11:00:00+00:00", "2025-04-07T12:00:00+00:00", 2, "Team meeting"],
+       ["2025-04-07T15:00:00+00:00", "2025-04-07T16:00:00+00:00", 3, "1-on-1"]
+     ]
+   },
+   "preferences": {
+     "user1": {
+       "preferred_hours": {"start": 9, "end": 17},
+       "max_meetings_per_day": 6,
+       "avoid_back_to_back": false,
+       "buffer_minutes": 0
+     },
+     "user2": {
+       "preferred_hours": {"start": 9, "end": 17},
+       "max_meetings_per_day": 6,
+       "avoid_back_to_back": false,
+       "buffer_minutes": 0
+     }
+   },
+   "expected_solution": {
+     "optimal_slot": "2025-04-07T10:00:00+00:00",
+     "expected_score_range": [0.8, 1.0],
+     "min_steps": 2,
+     "requires_rescheduling": false
+   }
+ }
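A quick way to sanity-check a scenario's `optimal_slot` is to list which attendees are busy during the proposed interval. The sketch below copies the task1 calendars and treats meetings as half-open intervals, so a meeting that ends exactly when the slot starts (the 09:00 to 10:00 standup versus a 10:00 start) does not clash. That boundary rule is an assumption about how the repository's conflict check behaves; `busy_attendees` is a hypothetical helper.

```python
from datetime import datetime, timedelta
from typing import Dict, List, Tuple

# Calendars copied from task1_easy.json: [start_iso, end_iso, priority, summary]
TASK1_CALENDARS: Dict[str, List[list]] = {
    "user1": [
        ["2025-04-07T09:00:00+00:00", "2025-04-07T10:00:00+00:00", 2, "Morning standup"],
        ["2025-04-07T14:00:00+00:00", "2025-04-07T15:00:00+00:00", 3, "Client call"],
    ],
    "user2": [
        ["2025-04-07T11:00:00+00:00", "2025-04-07T12:00:00+00:00", 2, "Team meeting"],
        ["2025-04-07T15:00:00+00:00", "2025-04-07T16:00:00+00:00", 3, "1-on-1"],
    ],
}


def busy_attendees(calendars: Dict[str, List[list]], slot_start_iso: str,
                   duration_min: int) -> List[Tuple[str, str]]:
    """Return (attendee, meeting) pairs overlapping [start, start + duration)."""
    start = datetime.fromisoformat(slot_start_iso)
    end = start + timedelta(minutes=duration_min)
    busy = []
    for user, entries in calendars.items():
        for s, e, _priority, summary in entries:
            # Half-open intervals: a meeting ending exactly at `start` does not clash.
            if start < datetime.fromisoformat(e) and datetime.fromisoformat(s) < end:
                busy.append((user, summary))
    return busy
```

For the expected 10:00 slot the result is empty, whereas starting 15 minutes earlier collides with user1's standup.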
server/scenarios/task2_medium.json ADDED
@@ -0,0 +1,70 @@
+ {
+   "task_id": "task2_medium",
+   "description": "Schedule a 60-minute planning session with 4 attendees",
+   "difficulty": "medium",
+   "meeting_request": {
+     "duration": 60,
+     "priority": 2,
+     "attendees": ["user1", "user2", "user3", "user4"],
+     "summary": "Cross-Team Planning"
+   },
+   "calendars": {
+     "user1": [
+       ["2025-04-07T09:00:00+00:00", "2025-04-07T10:00:00+00:00", 2, "Standup"],
+       ["2025-04-07T10:30:00+00:00", "2025-04-07T11:30:00+00:00", 3, "Review"],
+       ["2025-04-07T13:00:00+00:00", "2025-04-07T14:00:00+00:00", 3, "Lunch meeting"],
+       ["2025-04-07T15:00:00+00:00", "2025-04-07T16:00:00+00:00", 4, "Optional workshop"],
+       ["2025-04-07T16:30:00+00:00", "2025-04-07T17:00:00+00:00", 3, "Sync"]
+     ],
+     "user2": [
+       ["2025-04-07T09:00:00+00:00", "2025-04-07T10:00:00+00:00", 2, "Standup"],
+       ["2025-04-07T11:00:00+00:00", "2025-04-07T12:00:00+00:00", 2, "Client demo"],
+       ["2025-04-07T14:00:00+00:00", "2025-04-07T15:00:00+00:00", 3, "Code review"],
+       ["2025-04-07T16:00:00+00:00", "2025-04-07T17:00:00+00:00", 3, "Office hours"]
+     ],
+     "user3": [
+       ["2025-04-07T09:30:00+00:00", "2025-04-07T10:30:00+00:00", 3, "Design review"],
+       ["2025-04-07T12:00:00+00:00", "2025-04-07T13:00:00+00:00", 3, "Team lunch"],
+       ["2025-04-07T14:00:00+00:00", "2025-04-07T15:30:00+00:00", 2, "Sprint planning"],
+       ["2025-04-07T16:00:00+00:00", "2025-04-07T16:30:00+00:00", 4, "Coffee chat"]
+     ],
+     "user4": [
+       ["2025-04-07T10:00:00+00:00", "2025-04-07T11:00:00+00:00", 2, "Strategy meeting"],
+       ["2025-04-07T13:00:00+00:00", "2025-04-07T14:00:00+00:00", 3, "1-on-1"],
+       ["2025-04-07T15:00:00+00:00", "2025-04-07T16:00:00+00:00", 3, "Team sync"]
+     ]
+   },
+   "preferences": {
+     "user1": {
+       "preferred_hours": {"start": 10, "end": 16},
+       "max_meetings_per_day": 5,
+       "avoid_back_to_back": true,
+       "buffer_minutes": 15
+     },
+     "user2": {
+       "preferred_hours": {"start": 9, "end": 17},
+       "max_meetings_per_day": 4,
+       "avoid_back_to_back": true,
+       "buffer_minutes": 10
+     },
+     "user3": {
+       "preferred_hours": {"start": 9, "end": 15},
+       "max_meetings_per_day": 5,
+       "avoid_back_to_back": false,
+       "buffer_minutes": 0
+     },
+     "user4": {
+       "preferred_hours": {"start": 10, "end": 17},
+       "max_meetings_per_day": 6,
+       "avoid_back_to_back": true,
+       "buffer_minutes": 15
+     }
+   },
+   "expected_solution": {
+     "optimal_slot": "2025-04-07T11:00:00+00:00",
+     "expected_score_range": [0.5, 0.7],
+     "min_steps": 3,
+     "requires_rescheduling": true,
+     "reschedulable_meetings": ["user3:Coffee chat (priority 4)"]
+   }
+ }
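The grader and generator both read `min_start_hour` and `max_end_hour` from `calculate_collective_hours`, whose body is not in this diff. Assuming it intersects everyone's preferred hours (latest start, earliest end), the window for this scenario can be worked out directly from the preferences above. `collective_hours` and `latest_start_minute` are hypothetical stand-ins.

```python
from typing import Any, Dict

# Preferred hours copied from task2_medium.json (other preference fields omitted).
TASK2_PREFERENCES: Dict[str, Dict[str, Any]] = {
    "user1": {"preferred_hours": {"start": 10, "end": 16}},
    "user2": {"preferred_hours": {"start": 9, "end": 17}},
    "user3": {"preferred_hours": {"start": 9, "end": 15}},
    "user4": {"preferred_hours": {"start": 10, "end": 17}},
}


def collective_hours(preferences: Dict[str, Dict[str, Any]]) -> Dict[str, int]:
    """Intersect preferred hours: latest start and earliest end across attendees."""
    starts = [p["preferred_hours"]["start"] for p in preferences.values()]
    ends = [p["preferred_hours"]["end"] for p in preferences.values()]
    return {"min_start_hour": max(starts), "max_end_hour": min(ends)}


def latest_start_minute(hours: Dict[str, int], duration_min: int) -> int:
    """Latest start (minutes after midnight) that still ends inside the window."""
    return hours["max_end_hour"] * 60 - duration_min
```

Under that assumption the window is 10:00 to 15:00 (user1 and user4 start at 10, user3 ends at 15), so the 60-minute request must start no later than minute 840, i.e. 14:00.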
server/scenarios/task3_hard.json ADDED
@@ -0,0 +1,108 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ {
2
+ "task_id": "task3_hard",
3
+ "description": "Schedule a 45-minute executive meeting with 6 attendees",
4
+ "difficulty": "hard",
5
+ "meeting_request": {
6
+ "duration": 45,
7
+ "priority": 2,
8
+ "attendees": ["user1", "user2", "user3", "user4", "user5", "user6"],
9
+ "summary": "Executive Planning Session"
10
+ },
11
+ "calendars": {
12
+ "user1": [
13
+ ["2025-04-07T09:00:00+00:00", "2025-04-07T10:00:00+00:00", 2, "Strategy meeting"],
14
+ ["2025-04-07T10:30:00+00:00", "2025-04-07T11:30:00+00:00", 3, "Team standup"],
15
+ ["2025-04-07T12:00:00+00:00", "2025-04-07T13:00:00+00:00", 3, "Lunch meeting"],
16
+ ["2025-04-07T13:30:00+00:00", "2025-04-07T14:30:00+00:00", 2, "Client call"],
17
+ ["2025-04-07T15:00:00+00:00", "2025-04-07T15:45:00+00:00", 4, "Optional training"],
18
+ ["2025-04-07T16:00:00+00:00", "2025-04-07T17:00:00+00:00", 3, "Project sync"]
19
+ ],
20
+ "user2": [
21
+ ["2025-04-07T09:00:00+00:00", "2025-04-07T09:30:00+00:00", 2, "Morning sync"],
22
+ ["2025-04-07T10:00:00+00:00", "2025-04-07T11:00:00+00:00", 2, "Design review"],
23
+ ["2025-04-07T11:30:00+00:00", "2025-04-07T12:30:00+00:00", 3, "Code review"],
24
+       ["2025-04-07T13:00:00+00:00", "2025-04-07T14:00:00+00:00", 3, "1-on-1"],
+       ["2025-04-07T14:30:00+00:00", "2025-04-07T15:30:00+00:00", 2, "Planning meeting"],
+       ["2025-04-07T16:00:00+00:00", "2025-04-07T16:45:00+00:00", 4, "Coffee chat"]
+     ],
+     "user3": [
+       ["2025-04-07T09:30:00+00:00", "2025-04-07T10:30:00+00:00", 3, "Sprint planning"],
+       ["2025-04-07T11:00:00+00:00", "2025-04-07T12:00:00+00:00", 2, "Architecture review"],
+       ["2025-04-07T12:30:00+00:00", "2025-04-07T13:30:00+00:00", 3, "Team lunch"],
+       ["2025-04-07T14:00:00+00:00", "2025-04-07T15:00:00+00:00", 2, "Client demo"],
+       ["2025-04-07T15:30:00+00:00", "2025-04-07T16:15:00+00:00", 4, "Office hours"]
+     ],
+     "user4": [
+       ["2025-04-07T10:00:00+00:00", "2025-04-07T11:00:00+00:00", 2, "Board meeting"],
+       ["2025-04-07T11:30:00+00:00", "2025-04-07T12:30:00+00:00", 3, "Product review"],
+       ["2025-04-07T13:00:00+00:00", "2025-04-07T14:00:00+00:00", 2, "Executive sync"],
+       ["2025-04-07T14:30:00+00:00", "2025-04-07T15:30:00+00:00", 3, "Team meeting"],
+       ["2025-04-07T16:00:00+00:00", "2025-04-07T17:00:00+00:00", 4, "Mentor session"]
+     ],
+     "user5": [
+       ["2025-04-07T09:00:00+00:00", "2025-04-07T10:00:00+00:00", 3, "Daily standup"],
+       ["2025-04-07T10:30:00+00:00", "2025-04-07T11:30:00+00:00", 2, "Strategic planning"],
+       ["2025-04-07T12:00:00+00:00", "2025-04-07T13:00:00+00:00", 3, "Working lunch"],
+       ["2025-04-07T13:30:00+00:00", "2025-04-07T14:30:00+00:00", 3, "Performance review"],
+       ["2025-04-07T15:00:00+00:00", "2025-04-07T16:00:00+00:00", 2, "Budget meeting"],
+       ["2025-04-07T16:30:00+00:00", "2025-04-07T17:00:00+00:00", 4, "Optional networking"]
+     ],
+     "user6": [
+       ["2025-04-07T09:30:00+00:00", "2025-04-07T10:30:00+00:00", 2, "Leadership meeting"],
+       ["2025-04-07T11:00:00+00:00", "2025-04-07T12:00:00+00:00", 3, "Project checkpoint"],
+       ["2025-04-07T12:30:00+00:00", "2025-04-07T13:30:00+00:00", 3, "Team sync"],
+       ["2025-04-07T14:00:00+00:00", "2025-04-07T15:00:00+00:00", 2, "Client meeting"],
+       ["2025-04-07T15:30:00+00:00", "2025-04-07T16:30:00+00:00", 4, "Training session"]
+     ]
+   },
+   "preferences": {
+     "user1": {
+       "preferred_hours": {"start": 10, "end": 16},
+       "max_meetings_per_day": 5,
+       "avoid_back_to_back": true,
+       "buffer_minutes": 15
+     },
+     "user2": {
+       "preferred_hours": {"start": 9, "end": 17},
+       "max_meetings_per_day": 5,
+       "avoid_back_to_back": true,
+       "buffer_minutes": 15
+     },
+     "user3": {
+       "preferred_hours": {"start": 9, "end": 15},
+       "max_meetings_per_day": 4,
+       "avoid_back_to_back": true,
+       "buffer_minutes": 20
+     },
+     "user4": {
+       "preferred_hours": {"start": 10, "end": 17},
+       "max_meetings_per_day": 6,
+       "avoid_back_to_back": true,
+       "buffer_minutes": 10
+     },
+     "user5": {
+       "preferred_hours": {"start": 9, "end": 16},
+       "max_meetings_per_day": 5,
+       "avoid_back_to_back": true,
+       "buffer_minutes": 15
+     },
+     "user6": {
+       "preferred_hours": {"start": 9, "end": 16},
+       "max_meetings_per_day": 5,
+       "avoid_back_to_back": true,
+       "buffer_minutes": 10
+     }
+   },
+   "expected_solution": {
+     "optimal_slot": "2025-04-07T15:00:00+00:00",
+     "expected_score_range": [0.25, 0.45],
+     "min_steps": 5,
+     "requires_rescheduling": true,
+     "reschedulable_meetings": [
+       "user1:Optional training (priority 4)",
+       "user2:Coffee chat (priority 4)",
+       "user5:Optional networking (priority 4)"
+     ],
+     "notes": "Multiple valid solutions exist. Agent must reschedule 3+ low-priority meetings."
+   }
+ }
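The `preferences` block above fixes the window that every proposal in this scenario must fit. The sketch below recomputes that window, plus the preference summary an observation carries, from the six users' settings. `collective_hours` and `aggregate_preferences` are standalone restatements written for illustration (they mirror `calculate_collective_hours` in `server/scheduling_logic.py` and `SchedulingEnvironment._aggregate_preferences`); they are not imported from the repository.

```python
from typing import Dict

# Values copied from the "preferences" block of the scenario above.
PREFERENCES: Dict[str, dict] = {
    "user1": {"preferred_hours": {"start": 10, "end": 16}, "max_meetings_per_day": 5,
              "avoid_back_to_back": True, "buffer_minutes": 15},
    "user2": {"preferred_hours": {"start": 9, "end": 17}, "max_meetings_per_day": 5,
              "avoid_back_to_back": True, "buffer_minutes": 15},
    "user3": {"preferred_hours": {"start": 9, "end": 15}, "max_meetings_per_day": 4,
              "avoid_back_to_back": True, "buffer_minutes": 20},
    "user4": {"preferred_hours": {"start": 10, "end": 17}, "max_meetings_per_day": 6,
              "avoid_back_to_back": True, "buffer_minutes": 10},
    "user5": {"preferred_hours": {"start": 9, "end": 16}, "max_meetings_per_day": 5,
              "avoid_back_to_back": True, "buffer_minutes": 15},
    "user6": {"preferred_hours": {"start": 9, "end": 16}, "max_meetings_per_day": 5,
              "avoid_back_to_back": True, "buffer_minutes": 10},
}


def collective_hours(prefs: Dict[str, dict]) -> Dict[str, int]:
    """Intersect preferred hours: latest start and earliest end (defaults 9 and 17)."""
    starts = [p.get("preferred_hours", {}).get("start", 9) for p in prefs.values()]
    ends = [p.get("preferred_hours", {}).get("end", 17) for p in prefs.values()]
    return {"min_start_hour": max(starts), "max_end_hour": min(ends)}


def aggregate_preferences(prefs: Dict[str, dict]) -> dict:
    """Strictest meeting cap, whether anyone wants a buffer, and the largest buffer."""
    if not prefs:
        return {}
    return {
        "max_meetings_per_day": min(p.get("max_meetings_per_day", 99) for p in prefs.values()),
        "requires_buffer": any(p.get("avoid_back_to_back", False) for p in prefs.values()),
        "buffer_minutes": max(
            (p.get("buffer_minutes", 0) for p in prefs.values() if p.get("avoid_back_to_back")),
            default=0,
        ),
    }


print(collective_hours(PREFERENCES))       # {'min_start_hour': 10, 'max_end_hour': 15}
print(aggregate_preferences(PREFERENCES))  # {'max_meetings_per_day': 4, 'requires_buffer': True, 'buffer_minutes': 20}
```

Because user1 and user4 start at 10 and user3 stops at 15, `within_collective_hours` accepts only slots that start at or after 10:00 and end by 15:00 for this scenario.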
server/scheduling_env_environment.py ADDED
@@ -0,0 +1,476 @@
+ # Copyright (c) Meta Platforms, Inc. and affiliates.
+ # All rights reserved.
+ #
+ # This source code is licensed under the BSD-style license found in the
+ # LICENSE file in the root directory of this source tree.
+
+ """
+ Meeting Scheduling RL Environment.
+
+ Teaches agents to optimally schedule meetings across multiple attendees
+ by proposing time slots, rescheduling lower-priority conflicts, and
+ balancing participant preferences.
+ """
+
+ from __future__ import annotations
+
+ import copy
+ import json
+ import logging
+ from datetime import timedelta
+ from pathlib import Path
+ from uuid import uuid4
+
+ from openenv.core.env_server.interfaces import Environment
+
+ try:
+     from ..models import SchedulingAction, SchedulingObservation, SchedulingState
+ except ImportError:
+     from models import SchedulingAction, SchedulingObservation, SchedulingState
+
+ from .scheduling_logic import (
+     build_busy_slots,
+     calculate_collective_hours,
+     calculate_final_reward,
+     calculate_preference_score,
+     find_conflicts,
+     is_slot_free,
+     parse_iso,
+     within_collective_hours,
+ )
+ from .scenario_generator import generate_scenario
+
+ logger = logging.getLogger(__name__)
+
+ SCENARIOS_DIR = Path(__file__).parent / "scenarios"
+ MAX_STEPS = 20
+
+
+ class SchedulingEnvironment(Environment):
+     """RL environment for intelligent meeting scheduling.
+
+     The agent must learn to:
+     1. Propose valid time slots satisfying hard constraints
+     2. Minimize preference violations
+     3. Handle cascading rescheduling when conflicts exist
+     4. Balance speed vs. quality
+     """
+
+     SUPPORTS_CONCURRENT_SESSIONS: bool = True
+
+     def __init__(self):
+         self._state = SchedulingState(episode_id=str(uuid4()), step_count=0)
+         self._scenario: dict = {}
+         self._collective_hours: dict = {}
+
+     # ------------------------------------------------------------------
+     # OpenEnv interface
+     # ------------------------------------------------------------------
+
+     def reset(self, **kwargs) -> SchedulingObservation:
+         """Reset environment for a new episode.
+
+         Accepts ``task_id`` kwarg. Static tasks (``"task1_easy"`` etc.) load
+         from JSON. Random tasks (``"random_easy"``, ``"random_medium"``,
+         ``"random_hard"``) generate a fresh scenario every call. An optional
+         ``seed`` kwarg makes random generation reproducible.
+         """
+         task_id = kwargs.get("task_id", "task1_easy")
+
+         # ── random scenario generation ──
+         if task_id.startswith("random_"):
+             difficulty = task_id.split("_", 1)[1]
+             seed = kwargs.get("seed", None)
+             try:
+                 self._scenario = generate_scenario(difficulty, seed=seed)
+             except ValueError:
+                 return SchedulingObservation(
+                     error_message=f"Unknown difficulty in task_id: {task_id}",
+                     done=True,
+                     reward=0.0,
+                 )
+         else:
+             # ── static JSON scenario ──
+             scenario_path = SCENARIOS_DIR / f"{task_id}.json"
+             if not scenario_path.exists():
+                 return SchedulingObservation(
+                     error_message=f"Unknown task_id: {task_id}",
+                     done=True,
+                     reward=0.0,
+                 )
+             with open(scenario_path) as f:
+                 self._scenario = json.load(f)
+
+         req = self._scenario["meeting_request"]
+         prefs = self._scenario["preferences"]
+         self._collective_hours = calculate_collective_hours(prefs)
+
+         self._state = SchedulingState(
+             episode_id=str(uuid4()),
+             step_count=0,
+             task_id=task_id,
+             scenario_name=self._scenario.get("description", task_id),
+             meeting_request=req,
+             calendars=copy.deepcopy(self._scenario["calendars"]),
+             participant_preferences=prefs,
+             proposed_slot=None,
+             rescheduled_meetings=[],
+             total_preference_penalty=0.0,
+             total_steps=0,
+             final_reward=0.0,
+             completed=False,
+         )
+
+         attendees = req["attendees"]
+         return SchedulingObservation(
+             requested_duration=req["duration"],
+             requested_priority=req["priority"],
+             attendee_ids=attendees,
+             busy_slots=build_busy_slots(self._state.calendars, attendees),
+             collective_work_hours=self._collective_hours,
+             preference_constraints=self._aggregate_preferences(prefs),
+             current_proposal=None,
+             conflicts=[],
+             preference_penalty=0.0,
+             num_rescheduled=0,
+             steps_taken=0,
+             max_steps=MAX_STEPS,
+             success=False,
+             error_message=None,
+             done=False,
+             reward=0.0,
+         )
+
+     def step(self, action: SchedulingAction) -> SchedulingObservation:  # type: ignore[override]
+         """Process one agent action and return an observation."""
+         if self._state.completed:
+             return self._obs(error_message="Episode already completed", done=True, reward=0.0)
+
+         self._state.step_count += 1
+         self._state.total_steps += 1
+
+         # Timeout check
+         if self._state.step_count >= MAX_STEPS:
+             return self._handle_timeout()
+
+         action_type = action.action_type
+
+         if action_type == "propose_slot":
+             return self._process_propose_slot(action)
+         elif action_type == "reschedule_meeting":
+             return self._process_reschedule_meeting(action)
+         elif action_type == "finalize":
+             return self._process_finalize()
+         elif action_type == "reject":
+             return self._process_reject()
+         else:
+             return self._obs(error_message=f"Unknown action_type: {action_type}", reward=-0.1)
+
+     @property
+     def state(self) -> SchedulingState:
+         return self._state
+
+     # ------------------------------------------------------------------
+     # Action handlers
+     # ------------------------------------------------------------------
+
+     def _process_propose_slot(self, action: SchedulingAction) -> SchedulingObservation:
+         if not action.proposed_start or not action.proposed_duration:
+             return self._obs(
+                 error_message="propose_slot requires proposed_start and proposed_duration",
+                 reward=-0.1,
+             )
+
+         try:
+             start = parse_iso(action.proposed_start)
+         except (ValueError, TypeError):
+             return self._obs(error_message="Invalid proposed_start format", reward=-0.1)
+
+         end = start + timedelta(minutes=action.proposed_duration)
+         start_iso = start.isoformat()
+         end_iso = end.isoformat()
+         attendees = self._state.meeting_request["attendees"]
+         req_priority = self._state.meeting_request["priority"]
+
+         # Validate working hours
+         if not within_collective_hours(start_iso, end_iso, self._collective_hours):
+             return self._obs(
+                 error_message="Proposed slot outside working hours",
+                 reward=-0.2,
+             )
+
+         # Find conflicts
+         conflicts = find_conflicts(
+             self._state.calendars, start_iso, end_iso, attendees
+         )
+
+         # Calculate preference penalty
+         pref_penalty = calculate_preference_score(
+             start_iso,
+             action.proposed_duration,
+             self._state.participant_preferences,
+             self._state.calendars,
+         )
+
+         # Update state
+         self._state.proposed_slot = [start_iso, end_iso]
+         self._state.total_preference_penalty = pref_penalty
+
+         # Step reward
+         if len(conflicts) == 0 and pref_penalty < 100:
+             step_reward = 0.5
+         elif len(conflicts) > 0:
+             if all(c["priority"] > req_priority for c in conflicts):
+                 step_reward = 0.2
+             else:
+                 step_reward = -0.3
+         else:
+             step_reward = 0.0
+
+         return self._obs(
+             current_proposal={"start": start_iso, "end": end_iso},
+             conflicts=conflicts,
+             preference_penalty=pref_penalty,
+             reward=step_reward,
+         )
+
+     def _process_reschedule_meeting(self, action: SchedulingAction) -> SchedulingObservation:
+         if not action.meeting_id_to_move or not action.new_start_time:
+             return self._obs(
+                 error_message="reschedule_meeting requires meeting_id_to_move and new_start_time",
+                 reward=-0.1,
+             )
+
+         if self._state.proposed_slot is None:
+             return self._obs(
+                 error_message="Must propose a slot before rescheduling",
+                 reward=-0.2,
+             )
+
+         # Find the meeting to move
+         meeting = self._find_meeting(action.meeting_id_to_move)
+         if meeting is None:
+             return self._obs(
+                 error_message=f"Meeting not found: {action.meeting_id_to_move}",
+                 reward=-0.2,
+             )
+
+         req_priority = self._state.meeting_request["priority"]
+         if meeting["priority"] <= req_priority:
+             return self._obs(
+                 error_message="Cannot reschedule equal or higher priority meeting",
+                 reward=-0.5,
+             )
+
+         # Validate new slot
+         try:
+             new_start = parse_iso(action.new_start_time)
+         except (ValueError, TypeError):
+             return self._obs(error_message="Invalid new_start_time format", reward=-0.1)
+
+         old_start = parse_iso(meeting["start"])
+         old_end = parse_iso(meeting["end"])
+         duration = old_end - old_start
+         new_end = new_start + duration
+         new_start_iso = new_start.isoformat()
+         new_end_iso = new_end.isoformat()
+         attendee = meeting["attendee"]
+
+         # Check the new slot against the attendee's *other* meetings only, so a
+         # meeting can be shifted into a window that overlaps its own old slot.
+         remaining = [
+             e for e in self._state.calendars[attendee] if e[0] != meeting["start"]
+         ]
+         if not is_slot_free(attendee, new_start_iso, new_end_iso, {attendee: remaining}):
+             return self._obs(error_message="New slot not free for attendee", reward=-0.2)
+
+         # Update calendar: drop the old entry, add the moved one
+         self._state.calendars[attendee] = remaining + [
+             [new_start_iso, new_end_iso, meeting["priority"], meeting["summary"]]
+         ]
+
+         self._state.rescheduled_meetings.append({
+             "meeting_id": action.meeting_id_to_move,
+             "old_start": meeting["start"],
+             "new_start": new_start_iso,
+             "attendee": attendee,
+         })
+
+         # Recalculate conflicts for current proposal
+         attendees = self._state.meeting_request["attendees"]
+         new_conflicts = find_conflicts(
+             self._state.calendars,
+             self._state.proposed_slot[0],
+             self._state.proposed_slot[1],
+             attendees,
+         )
+
+         num_rescheduled = len(self._state.rescheduled_meetings)
+         step_reward = 0.5 if len(new_conflicts) == 0 else 0.3
+
+         return self._obs(
+             conflicts=new_conflicts,
+             num_rescheduled=num_rescheduled,
+             reward=step_reward,
+         )
+
+     def _process_finalize(self) -> SchedulingObservation:
+         if self._state.proposed_slot is None:
+             self._state.completed = True
+             return self._obs(
+                 error_message="No slot proposed",
+                 success=False,
+                 reward=0.0,
+                 done=True,
+             )
+
+         attendees = self._state.meeting_request["attendees"]
+         conflicts = find_conflicts(
+             self._state.calendars,
+             self._state.proposed_slot[0],
+             self._state.proposed_slot[1],
+             attendees,
+         )
+
+         if len(conflicts) > 0:
+             self._state.completed = True
+             return self._obs(
+                 error_message=f"Unresolved conflicts: {len(conflicts)} meetings",
+                 conflicts=conflicts,
+                 success=False,
+                 reward=0.0,
+                 done=True,
+             )
+
+         final_reward = calculate_final_reward(
+             preference_penalty=self._state.total_preference_penalty,
+             num_rescheduled=len(self._state.rescheduled_meetings),
+             steps_taken=self._state.step_count,
+             success=True,
+         )
+
+         self._state.completed = True
+         self._state.final_reward = final_reward
+
+         return self._obs(
+             success=True,
+             reward=final_reward,
+             done=True,
+         )
+
+     def _process_reject(self) -> SchedulingObservation:
+         self._state.completed = True
+         return self._obs(
+             success=False,
+             reward=0.0,
+             done=True,
+             error_message="Agent rejected scheduling task",
+         )
+
+     def _handle_timeout(self) -> SchedulingObservation:
+         """Give partial credit when max steps reached."""
+         self._state.completed = True
+
+         if self._state.proposed_slot is None:
+             return self._obs(
+                 success=False,
+                 reward=0.0,
+                 done=True,
+                 error_message="Timeout: No slot proposed",
+             )
+
+         attendees = self._state.meeting_request["attendees"]
+         conflicts = find_conflicts(
+             self._state.calendars,
+             self._state.proposed_slot[0],
+             self._state.proposed_slot[1],
+             attendees,
+         )
+
+         if len(conflicts) == 0:
+             theoretical = calculate_final_reward(
+                 self._state.total_preference_penalty,
+                 len(self._state.rescheduled_meetings),
+                 self._state.step_count,
+             )
+             partial = theoretical * 0.7
+         else:
+             # One attendee can have several conflicts, so clamp at 0.0 to keep
+             # the partial credit from going negative.
+             progress = max(0.0, 1.0 - (len(conflicts) / max(1, len(attendees))))
+             partial = 0.2 * progress
+
+         self._state.final_reward = partial
+         return self._obs(
+             success=False,
+             reward=partial,
+             done=True,
+             error_message=f"Timeout after {self._state.step_count} steps (partial credit: {partial:.2f})",
+         )
+
+     # ------------------------------------------------------------------
+     # Helpers
+     # ------------------------------------------------------------------
+
+     def _obs(self, **overrides) -> SchedulingObservation:
+         """Build an observation from current state, applying overrides."""
+         req = self._state.meeting_request
+         attendees = req.get("attendees", [])
+
+         defaults = dict(
+             requested_duration=req.get("duration", 0),
+             requested_priority=req.get("priority", 3),
+             attendee_ids=attendees,
+             busy_slots=build_busy_slots(self._state.calendars, attendees),
+             collective_work_hours=self._collective_hours,
+             preference_constraints=self._aggregate_preferences(
+                 self._state.participant_preferences
+             ),
+             current_proposal=(
+                 {"start": self._state.proposed_slot[0], "end": self._state.proposed_slot[1]}
+                 if self._state.proposed_slot
+                 else None
+             ),
+             conflicts=[],
+             preference_penalty=self._state.total_preference_penalty,
+             num_rescheduled=len(self._state.rescheduled_meetings),
+             steps_taken=self._state.step_count,
+             max_steps=MAX_STEPS,
+             success=False,
+             error_message=None,
+             done=False,
+             reward=0.0,
+         )
+         defaults.update(overrides)
+         return SchedulingObservation(**defaults)
+
+     def _find_meeting(self, meeting_id: str) -> dict | None:
+         """Look up a meeting by its id (format: attendee_startiso)."""
+         # Split on the last underscore: ISO timestamps never contain one,
+         # but attendee ids might.
+         parts = meeting_id.rsplit("_", 1)
+         if len(parts) != 2:
+             return None
+         attendee, start_iso = parts
+         for entry in self._state.calendars.get(attendee, []):
+             if entry[0] == start_iso:
+                 return {
+                     "attendee": attendee,
+                     "start": entry[0],
+                     "end": entry[1],
+                     "priority": entry[2],
+                     "summary": entry[3],
+                 }
+         return None
+
+     @staticmethod
+     def _aggregate_preferences(prefs: dict) -> dict:
+         """Summarize preferences for the observation."""
+         if not prefs:
+             return {}
+         max_meetings = min(p.get("max_meetings_per_day", 99) for p in prefs.values())
+         any_buffer = any(p.get("avoid_back_to_back", False) for p in prefs.values())
+         buffer_mins = max(
+             (p.get("buffer_minutes", 0) for p in prefs.values() if p.get("avoid_back_to_back")),
+             default=0,
+         )
+         return {
+             "max_meetings_per_day": max_meetings,
+             "requires_buffer": any_buffer,
+             "buffer_minutes": buffer_mins,
+         }
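The per-step reward for `propose_slot` depends only on the conflicts found and the preference penalty. The function below is a hypothetical, dependency-free restatement of that branch in `_process_propose_slot`, handy for reasoning about reward shaping; conflict dicts only need a `priority` key here, and the example priorities are made up. A larger priority number means a less important meeting, as the "Cannot reschedule equal or higher priority meeting" check in `_process_reschedule_meeting` implies.

```python
from typing import Dict, List


def propose_step_reward(
    conflicts: List[Dict],
    requested_priority: int,
    preference_penalty: float,
) -> float:
    """Hypothetical helper restating the step-reward branch of _process_propose_slot."""
    if not conflicts and preference_penalty < 100:
        return 0.5  # conflict-free and an acceptable preference cost
    if conflicts:
        # Every conflicting meeting must be strictly less important
        # (larger priority number) than the request to earn 0.2.
        if all(c["priority"] > requested_priority for c in conflicts):
            return 0.2
        return -0.3  # at least one conflict cannot be moved
    return 0.0  # conflict-free, but preference penalty of 100 or more


# Hypothetical request with priority 2.
print(propose_step_reward([], 2, 35.0))                                  # 0.5
print(propose_step_reward([{"priority": 4}, {"priority": 3}], 2, 35.0))  # 0.2
print(propose_step_reward([{"priority": 4}, {"priority": 2}], 2, 35.0))  # -0.3
print(propose_step_reward([], 2, 120.0))                                 # 0.0
```

Note that a conflicting proposal with only reschedulable conflicts still earns less (0.2) than a clean one (0.5), which nudges the agent toward free slots before it starts moving meetings.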
server/scheduling_logic.py ADDED
@@ -0,0 +1,324 @@
+ """Pure utility functions for the meeting-scheduling RL environment.
+
+ Calendar format: Dict[str, List[List]]
+ Each entry is [start_iso, end_iso, priority_int, summary_str].
+
+ All datetimes are timezone-aware ISO 8601 strings.
+ """
+ from __future__ import annotations
+
+ import json
+ from datetime import datetime, date, timedelta
+ from pathlib import Path
+ from typing import Dict, List, Optional
+
+
+ def parse_iso(s: str) -> datetime:
+     """Parse an ISO 8601 string into a datetime object."""
+     return datetime.fromisoformat(s)
+
+
+ def load_scenario(scenario_path: str) -> dict:
+     """Load a scenario JSON file and return the parsed dict."""
+     with open(scenario_path, "r") as f:
+         return json.load(f)
+
+
+ def find_conflicts(
+     calendars: Dict[str, List[List]],
+     proposed_start_iso: str,
+     proposed_end_iso: str,
+     attendee_ids: List[str],
+ ) -> List[Dict]:
+     """Find calendar conflicts between a proposed slot and existing meetings.
+
+     Two intervals overlap when start1 < end2 and start2 < end1.
+
+     Returns:
+         List of conflict dicts with keys: attendee, start, end, priority,
+         summary, meeting_id.
+     """
+     proposed_start = parse_iso(proposed_start_iso)
+     proposed_end = parse_iso(proposed_end_iso)
+     conflicts: List[Dict] = []
+
+     for attendee in attendee_ids:
+         entries = calendars.get(attendee, [])
+         for entry in entries:
+             entry_start_iso, entry_end_iso, priority, summary = entry
+             entry_start = parse_iso(entry_start_iso)
+             entry_end = parse_iso(entry_end_iso)
+
+             if proposed_start < entry_end and entry_start < proposed_end:
+                 conflicts.append({
+                     "attendee": attendee,
+                     "start": entry_start_iso,
+                     "end": entry_end_iso,
+                     "priority": priority,
+                     "summary": summary,
+                     "meeting_id": f"{attendee}_{entry_start_iso}",
+                 })
+
+     return conflicts
+
+
+ def calculate_collective_hours(preferences: Dict) -> Dict[str, int]:
+     """Find the intersection of all users' preferred working hours.
+
+     Each user preference has 'preferred_hours': {'start': int, 'end': int}.
+
+     Returns:
+         {"min_start_hour": <max of all starts>, "max_end_hour": <min of all ends>}
+     """
+     start_hours = [p.get("preferred_hours", {}).get("start", 9) for p in preferences.values()]
+     end_hours = [p.get("preferred_hours", {}).get("end", 17) for p in preferences.values()]
+
+     return {
+         "min_start_hour": max(start_hours),
+         "max_end_hour": min(end_hours),
+     }
+
+
+ def within_collective_hours(
+     start_iso: str,
+     end_iso: str,
+     collective_hours: Dict[str, int],
+ ) -> bool:
+     """Check if a proposed slot falls within collective working hours.
+
+     The slot must start no earlier than ``min_start_hour``:00 and end no
+     later than ``max_end_hour``:00. Ending exactly on ``max_end_hour`` is
+     allowed; any later end (e.g. 15:30 when ``max_end_hour`` is 15) is not.
+     """
+     start = parse_iso(start_iso)
+     end = parse_iso(end_iso)
+     min_start = collective_hours["min_start_hour"]
+     max_end = collective_hours["max_end_hour"]
+
+     if start.hour < min_start:
+         return False
+
+     # Handle end at exact hour boundary (minute == 0) vs. mid-hour
+     if end.minute == 0 and end.second == 0:
+         if end.hour > max_end:
+             return False
+     else:
+         if end.hour >= max_end:
+             return False
+
+     return True
+
+
+ def count_meetings_on_date(calendar_entries: List[List], target_date: date) -> int:
+     """Count how many meetings a user has on a given date."""
+     count = 0
+     for entry in calendar_entries:
+         entry_start = parse_iso(entry[0])
+         if entry_start.date() == target_date:
+             count += 1
+     return count
+
+
+ def check_back_to_back(
+     calendar_entries: List[List],
+     proposed_start_iso: str,
+     proposed_end_iso: str,
+     buffer_minutes: int,
+ ) -> bool:
+     """Check if a proposed meeting would be back-to-back with an existing one.
+
+     Returns True if any existing meeting ends within buffer_minutes before
+     the proposed start, or starts within buffer_minutes after the proposed end.
+     """
+     proposed_start = parse_iso(proposed_start_iso)
+     proposed_end = parse_iso(proposed_end_iso)
+     buffer = timedelta(minutes=buffer_minutes)
+
+     for entry in calendar_entries:
+         entry_start = parse_iso(entry[0])
+         entry_end = parse_iso(entry[1])
+
+         # Existing meeting ends close before proposed start
+         gap_before = proposed_start - entry_end
+         if timedelta(0) <= gap_before < buffer:
+             return True
+
+         # Existing meeting starts close after proposed end
+         gap_after = entry_start - proposed_end
+         if timedelta(0) <= gap_after < buffer:
+             return True
+
+     return False
+
+
+ def calculate_preference_score(
+     proposed_start_iso: str,
+     duration_minutes: int,
+     participant_preferences: Dict,
+     calendars: Dict[str, List[List]],
+ ) -> float:
+     """Calculate penalty points for scheduling preference violations.
+
+     Penalty rules:
+         - Outside preferred hours: +50 per participant
+         - Exceeds max meetings per day: +30 per participant
+         - Back-to-back without buffer: +20 per participant
+
+     Returns:
+         Total penalty sum (float).
+     """
+     proposed_start = parse_iso(proposed_start_iso)
+     proposed_end = proposed_start + timedelta(minutes=duration_minutes)
+     proposed_end_iso = proposed_end.isoformat()
+     proposed_date = proposed_start.date()
+
+     total_penalty = 0.0
+
+     for participant, prefs in participant_preferences.items():
+         pref_hours = prefs.get("preferred_hours", {})
+         pref_start = pref_hours.get("start", 9)
+         pref_end = pref_hours.get("end", 17)
+         max_meetings = prefs.get("max_meetings_per_day", 8)
+         avoid_btb = prefs.get("avoid_back_to_back", False)
+         buffer_mins = prefs.get("buffer_minutes", 0)
+
+         # Outside preferred hours
+         collective = {"min_start_hour": pref_start, "max_end_hour": pref_end}
+         if not within_collective_hours(proposed_start_iso, proposed_end_iso, collective):
+             total_penalty += 50
+
+         # Exceeds max meetings per day
+         entries = calendars.get(participant, [])
+         existing_count = count_meetings_on_date(entries, proposed_date)
+         if existing_count + 1 > max_meetings:
+             total_penalty += 30
+
+         # Back-to-back without buffer (only if user cares about it)
+         if avoid_btb and buffer_mins > 0:
+             if check_back_to_back(entries, proposed_start_iso, proposed_end_iso, buffer_mins):
+                 total_penalty += 20
+
+     return total_penalty
+
+
+ def is_slot_free(
+     attendee: str,
+     start_iso: str,
+     end_iso: str,
+     calendars: Dict[str, List[List]],
+ ) -> bool:
+     """Check if a time slot is free for a specific attendee (no overlaps)."""
+     start = parse_iso(start_iso)
+     end = parse_iso(end_iso)
+
+     for entry in calendars.get(attendee, []):
+         entry_start = parse_iso(entry[0])
+         entry_end = parse_iso(entry[1])
+         if start < entry_end and entry_start < end:
+             return False
+
+     return True
+
+
+ def calculate_final_reward(
+     preference_penalty: float,
+     num_rescheduled: int,
+     steps_taken: int,
+     success: bool = True,
+ ) -> float:
+     """Compute the multi-component reward for an episode, clamped to [0.0, 1.0].
+
+     Components (deducted from 1.0):
+         - Preference deduction: min(0.75, (preference_penalty ** 1.2) / 200.0)
+         - Rescheduling deduction: min(0.30, 0.05 * (1.8 ** num_rescheduled))
+           (only applied when num_rescheduled > 0)
+         - Time deduction: steps_taken * 0.015
+
+     Returns 0.0 if the episode was not successful.
+     """
+     if not success:
+         return 0.0
+
+     reward = 1.0
+
+     # Preference deduction
+     pref_deduction = min(0.75, (preference_penalty ** 1.2) / 200.0)
+     reward -= pref_deduction
+
+     # Rescheduling deduction (exponential)
+     if num_rescheduled > 0:
+         reschedule_deduction = min(0.30, 0.05 * (1.8 ** num_rescheduled))
+         reward -= reschedule_deduction
+
+     # Time deduction
+     time_deduction = steps_taken * 0.015
+     reward -= time_deduction
+
+     return max(0.0, min(1.0, reward))
+
+
+ def build_busy_slots(
+     calendars: Dict[str, List[List]],
+     attendee_ids: List[str],
+ ) -> List[Dict]:
+     """Convert calendar data to observation-friendly busy_slots format.
+
+     Returns:
+         List of dicts with keys: start, end, priority, summary, attendee.
+     """
+     busy_slots: List[Dict] = []
+
+     for attendee in attendee_ids:
+         for entry in calendars.get(attendee, []):
+             start_iso, end_iso, priority, summary = entry
+             busy_slots.append({
+                 "start": start_iso,
+                 "end": end_iso,
+                 "priority": priority,
+                 "summary": summary,
+                 "attendee": attendee,
+             })
+
+     return busy_slots
+
+
+ def find_earliest_free_slot(
+     calendars: Dict[str, List[List]],
+     attendees: List[str],
+     duration_minutes: int,
+     search_date_iso: str,
+     collective_hours: Dict[str, int],
+ ) -> Optional[str]:
+     """Find the earliest free slot on a given date for all attendees.
+
+     Iterates from min_start_hour to max_end_hour in 15-minute increments.
+     Returns the ISO 8601 string of the first conflict-free slot, or None.
+     """
+     search_date = parse_iso(search_date_iso)
+     base_date = search_date.date()
+     tz = search_date.tzinfo
+
+     min_start = collective_hours["min_start_hour"]
+     max_end = collective_hours["max_end_hour"]
+
+     candidate = datetime(base_date.year, base_date.month, base_date.day,
+                          min_start, 0, 0, tzinfo=tz)
+     end_boundary = datetime(base_date.year, base_date.month, base_date.day,
+                             max_end, 0, 0, tzinfo=tz)
+     step = timedelta(minutes=15)
+
+     while candidate + timedelta(minutes=duration_minutes) <= end_boundary:
+         candidate_iso = candidate.isoformat()
+         candidate_end_iso = (candidate + timedelta(minutes=duration_minutes)).isoformat()
+
+         all_free = True
+         for attendee in attendees:
+             if not is_slot_free(attendee, candidate_iso, candidate_end_iso, calendars):
+                 all_free = False
+                 break
+
+         if all_free:
+             return candidate_iso
+
+         candidate += step
+
+     return None
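Two rules in `scheduling_logic.py` interact in a way that is easy to miss: `find_conflicts` and `is_slot_free` treat meetings as half-open intervals, so a slot that ends exactly when another meeting starts is not a conflict, yet `check_back_to_back` still flags that same slot when the participant wants a buffer, and `calculate_preference_score` turns that into a +20 penalty. The sketch below restates both checks on `datetime` objects, without the ISO-string plumbing, using user3's "Client demo" entry (14:00 to 15:00, 20-minute buffer) from the scenario JSON in this commit. `overlaps` and `too_close` are illustrative names, not functions in the repository, and the two 60-minute slots are hypothetical and are checked against that single entry only.

```python
from datetime import datetime, timedelta


def overlaps(a_start: datetime, a_end: datetime, b_start: datetime, b_end: datetime) -> bool:
    """Half-open overlap test: intervals that only touch at an endpoint do not overlap."""
    return a_start < b_end and b_start < a_end


def too_close(
    entry_start: datetime,
    entry_end: datetime,
    slot_start: datetime,
    slot_end: datetime,
    buffer_minutes: int,
) -> bool:
    """Single-entry version of the back-to-back rule: a gap in [0, buffer) is too close."""
    buffer = timedelta(minutes=buffer_minutes)
    gap_before = slot_start - entry_end  # entry ends shortly before the slot starts
    gap_after = entry_start - slot_end   # entry starts shortly after the slot ends
    return timedelta(0) <= gap_before < buffer or timedelta(0) <= gap_after < buffer


# user3's "Client demo" entry.
demo_start = datetime.fromisoformat("2025-04-07T14:00:00+00:00")
demo_end = datetime.fromisoformat("2025-04-07T15:00:00+00:00")

for start_iso in ("2025-04-07T13:00:00+00:00", "2025-04-07T12:40:00+00:00"):
    slot_start = datetime.fromisoformat(start_iso)
    slot_end = slot_start + timedelta(minutes=60)
    print(
        start_iso,
        overlaps(slot_start, slot_end, demo_start, demo_end),
        too_close(demo_start, demo_end, slot_start, slot_end, buffer_minutes=20),
    )
# 13:00 slot: no overlap, but a 0-minute gap is inside the 20-minute buffer.
# 12:40 slot: no overlap, and the 20-minute gap is not strictly less than the buffer.
```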
uv.lock ADDED
The diff for this file is too large to render.
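To make the reward arithmetic of `calculate_final_reward` in `server/scheduling_logic.py` concrete, the sketch below restates it as a standalone function (`final_reward` is an illustrative name, not an import from the repository) and sweeps the number of rescheduled meetings with no preference penalty and five steps.

```python
def final_reward(
    preference_penalty: float,
    num_rescheduled: int,
    steps_taken: int,
    success: bool = True,
) -> float:
    """Standalone restatement of calculate_final_reward for what-if checks."""
    if not success:
        return 0.0
    reward = 1.0
    # Preference deduction, capped at 0.75.
    reward -= min(0.75, (preference_penalty ** 1.2) / 200.0)
    # Exponential rescheduling deduction: 0.09, 0.162, 0.2916, then capped at 0.30.
    if num_rescheduled > 0:
        reward -= min(0.30, 0.05 * (1.8 ** num_rescheduled))
    # Time deduction of 0.015 per step.
    reward -= steps_taken * 0.015
    return max(0.0, min(1.0, reward))


# No preference penalty, five steps, varying numbers of moved meetings.
for moves in range(6):
    print(moves, round(final_reward(0.0, moves, 5), 4))
```

With these inputs the reward falls from 0.925 with no moves to 0.6334 after three, and stays at 0.625 from four moves onward because 0.05 * 1.8**4 is about 0.525, above the 0.30 cap. Step cost alone reaches 0.30 at the 20-step limit.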