Akshaykumarbm committed
Commit 0f3c199 · verified · 1 Parent(s): 7bdbe90

Upload folder using huggingface_hub

Files changed (5)
  1. README.md +244 -171
  2. inference.py +260 -165
  3. pyproject.toml +3 -12
  4. sample_infrenae.py +205 -101
  5. uv.lock +2 -2
README.md CHANGED
@@ -1,6 +1,6 @@
1
  ---
2
  title: Scheduling Env Environment Server
3
- emoji: 🏏
4
  colorFrom: blue
5
  colorTo: pink
6
  sdk: docker
@@ -11,245 +11,318 @@ tags:
11
  - openenv
12
  ---
13
 
14
- # Scheduling Env Environment
15
 
16
- A simple test environment that echoes back messages. Perfect for testing the env APIs as well as demonstrating environment usage patterns.
17
 
18
- ## Quick Start
19
-
20
- The simplest way to use the Scheduling Env environment is through the `SchedulingEnv` class:
21
 
22
- ```python
23
- from scheduling_env import SchedulingAction, SchedulingEnv
24
 
25
- try:
26
- # Create environment from Docker image
27
- scheduling_envenv = SchedulingEnv.from_docker_image("scheduling_env-env:latest")
28
 
29
- # Reset
30
- result = scheduling_envenv.reset()
31
- print(f"Reset: {result.observation.echoed_message}")
32
 
33
- # Send multiple messages
34
- messages = ["Hello, World!", "Testing echo", "Final message"]
35
 
36
- for msg in messages:
37
- result = scheduling_envenv.step(SchedulingAction(message=msg))
38
- print(f"Sent: '{msg}'")
39
- print(f" → Echoed: '{result.observation.echoed_message}'")
40
- print(f" → Length: {result.observation.message_length}")
41
- print(f" → Reward: {result.reward}")
42
 
43
- finally:
44
- # Always clean up
45
- scheduling_envenv.close()
46
  ```
47
 
48
- That's it! The `SchedulingEnv.from_docker_image()` method handles:
49
- - Starting the Docker container
50
- - Waiting for the server to be ready
51
- - Connecting to the environment
52
- - Container cleanup when you call `close()`
53
-
54
- ## Building the Docker Image
55
 
56
- Before using the environment, you need to build the Docker image:
57
 
58
- ```bash
59
- # From project root
60
- docker build -t scheduling_env-env:latest -f server/Dockerfile .
61
  ```
62
 
63
- ## Deploying to Hugging Face Spaces
64
-
65
- You can easily deploy your OpenEnv environment to Hugging Face Spaces using the `openenv push` command:
66
 
67
- ```bash
68
- # From the environment directory (where openenv.yaml is located)
69
- openenv push
70
 
71
- # Or specify options
72
- openenv push --namespace my-org --private
73
  ```
74
 
75
- The `openenv push` command will:
76
- 1. Validate that the directory is an OpenEnv environment (checks for `openenv.yaml`)
77
- 2. Prepare a custom build for Hugging Face Docker space (enables web interface)
78
- 3. Upload to Hugging Face (ensuring you're logged in)
79
 
80
- ### Prerequisites
81
 
82
- - Authenticate with Hugging Face: The command will prompt for login if not already authenticated
 
 
 
83
 
84
- ### Options
 
85
 
86
- - `--directory`, `-d`: Directory containing the OpenEnv environment (defaults to current directory)
87
- - `--repo-id`, `-r`: Repository ID in format 'username/repo-name' (defaults to 'username/env-name' from openenv.yaml)
88
- - `--base-image`, `-b`: Base Docker image to use (overrides Dockerfile FROM)
89
- - `--private`: Deploy the space as private (default: public)
90
 
91
- ### Examples
92
 
93
- ```bash
94
- # Push to your personal namespace (defaults to username/env-name from openenv.yaml)
95
- openenv push
96
 
97
- # Push to a specific repository
98
- openenv push --repo-id my-org/my-env
 
 
 
99
 
100
- # Push with a custom base image
101
- openenv push --base-image ghcr.io/meta-pytorch/openenv-base:latest
102
 
103
- # Push as a private space
104
- openenv push --private
 
105
 
106
- # Combine options
107
- openenv push --repo-id my-org/my-env --base-image custom-base:latest --private
108
- ```
109
 
110
- After deployment, your space will be available at:
111
- `https://huggingface.co/spaces/<repo-id>`
 
112
 
113
- The deployed space includes:
114
- - **Web Interface** at `/web` - Interactive UI for exploring the environment
115
- - **API Documentation** at `/docs` - Full OpenAPI/Swagger interface
116
- - **Health Check** at `/health` - Container health monitoring
117
- - **WebSocket** at `/ws` - Persistent session endpoint for low-latency interactions
118
 
119
- ## Environment Details
 
 
120
 
121
- ### Action
122
- **SchedulingAction**: Contains a single field
123
- - `message` (str) - The message to echo back
124
 
125
- ### Observation
126
- **SchedulingObservation**: Contains the echo response and metadata
127
- - `echoed_message` (str) - The message echoed back
128
- - `message_length` (int) - Length of the message
129
- - `reward` (float) - Reward based on message length (length × 0.1)
130
- - `done` (bool) - Always False for echo environment
131
- - `metadata` (dict) - Additional info like step count
132
 
133
- ### Reward
134
- The reward is calculated as: `message_length × 0.1`
135
- - "Hi" reward: 0.2
136
- - "Hello, World!" reward: 1.3
137
- - Empty message reward: 0.0
 
138
 
139
- ## Advanced Usage
140
 
141
- ### Connecting to an Existing Server
142
 
143
- If you already have a Scheduling Env environment server running, you can connect directly:
144
 
145
- ```python
146
- from scheduling_env import SchedulingEnv
 
 
 
 
 
147
 
148
- # Connect to existing server
149
- scheduling_envenv = SchedulingEnv(base_url="<ENV_HTTP_URL_HERE>")
150
 
151
- # Use as normal
152
- result = scheduling_envenv.reset()
153
- result = scheduling_envenv.step(SchedulingAction(message="Hello!"))
 
 
 
 
 
 
 
 
 
 
 
 
154
  ```
155
 
156
- Note: When connecting to an existing server, `scheduling_envenv.close()` will NOT stop the server.
 
 
157
 
158
- ### Using the Context Manager
 
 
159
 
160
- The client supports context manager usage for automatic connection management:
161
 
162
- ```python
163
- from scheduling_env import SchedulingAction, SchedulingEnv
 
164
 
165
- # Connect with context manager (auto-connects and closes)
166
- with SchedulingEnv(base_url="http://localhost:8000") as env:
167
- result = env.reset()
168
- print(f"Reset: {result.observation.echoed_message}")
169
- # Multiple steps with low latency
170
- for msg in ["Hello", "World", "!"]:
171
- result = env.step(SchedulingAction(message=msg))
172
- print(f"Echoed: {result.observation.echoed_message}")
173
  ```
174
 
175
- The client uses WebSocket connections for:
176
- - **Lower latency**: No HTTP connection overhead per request
177
- - **Persistent session**: Server maintains your environment state
178
- - **Efficient for episodes**: Better for many sequential steps
179
 
180
- ### Concurrent WebSocket Sessions
 
 
181
 
182
- The server supports multiple concurrent WebSocket connections. To enable this,
183
- modify `server/app.py` to use factory mode:
184
 
185
- ```python
186
- # In server/app.py - use factory mode for concurrent sessions
187
- app = create_app(
188
- SchedulingEnvironment, # Pass class, not instance
189
- SchedulingAction,
190
- SchedulingObservation,
191
- max_concurrent_envs=4, # Allow 4 concurrent sessions
192
- )
193
  ```
194
 
195
- Then multiple clients can connect simultaneously:
196
 
197
- ```python
198
- from scheduling_env import SchedulingAction, SchedulingEnv
199
- from concurrent.futures import ThreadPoolExecutor
200
 
201
- def run_episode(client_id: int):
202
- with SchedulingEnv(base_url="http://localhost:8000") as env:
203
- result = env.reset()
204
- for i in range(10):
205
- result = env.step(SchedulingAction(message=f"Client {client_id}, step {i}"))
206
- return client_id, result.observation.message_length
207
 
208
- # Run 4 episodes concurrently
209
- with ThreadPoolExecutor(max_workers=4) as executor:
210
- results = list(executor.map(run_episode, range(4)))
211
  ```
212
 
213
- ## Development & Testing
214
 
215
- ### Direct Environment Testing
 
 
216
 
217
- Test the environment logic directly without starting the HTTP server:
 
 
 
218
 
219
- ```bash
220
- # From the server directory
221
- python3 server/scheduling_env_environment.py
222
- ```
223
 
224
- This verifies that:
225
- - Environment resets correctly
226
- - Step executes actions properly
227
- - State tracking works
228
- - Rewards are calculated correctly
 
229
 
230
- ### Running Locally
231
 
232
- Run the server locally for development:
233
 
234
- ```bash
235
- uvicorn server.app:app --reload
 
 
236
  ```
237
 
238
  ## Project Structure
239
 
240
  ```
241
- scheduling_env/
242
- ├── .dockerignore # Docker build exclusions
243
- ├── __init__.py # Module exports
244
- ├── README.md # This file
245
- ├── openenv.yaml # OpenEnv manifest
246
- ├── pyproject.toml # Project metadata and dependencies
247
- ├── uv.lock # Locked dependencies (generated)
248
- ├── client.py # SchedulingEnv client
249
- ├── models.py # Action and Observation models
 
 
250
  └── server/
251
- ├── __init__.py # Server module exports
252
- ├── scheduling_env_environment.py # Core environment logic
253
- ├── app.py # FastAPI application (HTTP + WebSocket endpoints)
254
- └── Dockerfile # Container image definition
 
 
 
 
 
 
 
255
  ```
 
1
  ---
2
  title: Scheduling Env Environment Server
3
+ emoji: 📅
4
  colorFrom: blue
5
  colorTo: pink
6
  sdk: docker
 
11
  - openenv
12
  ---
13
 
14
+ # Meeting Scheduling RL Environment
15
 
16
+ An OpenEnv reinforcement-learning environment where AI agents learn to schedule meetings optimally across multiple attendees. The agent must propose time slots, resolve calendar conflicts by rescheduling lower-priority meetings, and satisfy each participant's scheduling preferences — all within a limited number of steps.
17
 
18
+ ## Overview
 
 
19
 
20
+ The environment simulates a realistic corporate scheduling assistant. Given a meeting request, the agent iteratively:
 
21
 
22
+ 1. **Proposes** a time slot for all required attendees.
23
+ 2. **Reschedules** any lower-priority conflicting meetings to free up the slot.
24
+ 3. **Finalizes** the booking once the slot is conflict-free.
25
 
26
+ Each episode is scored on scheduling quality (0.0–1.0), penalizing preference violations, unnecessary rescheduling, and excessive steps.
 
 
27
 
28
+ ## Quick Start
 
29
 
30
+ ### Running the Heuristic Baseline (no LLM needed)
 
 
 
 
 
31
 
32
+ ```bash
33
+ python inference.py
 
34
  ```
35
 
36
+ This runs a greedy baseline policy across all three tasks and prints step-by-step output in the required `[START]`/`[STEP]`/`[END]` format.
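A harness that consumes this output usually has to read the lines back. The sketch below parses a single `[STEP]` line, assuming the field layout given in the `inference.py` docstring (`step`, `action`, `reward`, `done`, `error`). The `parse_step_line` helper and the `StepRecord` type are hypothetical and are not part of the repository.

```python
import re
from typing import Optional, TypedDict


class StepRecord(TypedDict):
    step: int
    action: str
    reward: float
    done: bool
    error: Optional[str]


# Assumed layout (taken from the inference.py docstring):
# [STEP] step=<n> action=<action_str> reward=<0.00> done=<true|false> error=<msg|null>
_STEP_RE = re.compile(
    r"^\[STEP\] step=(\d+) action=(.+?) reward=(-?\d+(?:\.\d+)?) "
    r"done=(true|false) error=(.*)$"
)


def parse_step_line(line: str) -> Optional[StepRecord]:
    """Return the parsed fields of a [STEP] line, or None if the line does not match."""
    match = _STEP_RE.match(line.strip())
    if match is None:
        return None
    step, action, reward, done, error = match.groups()
    return {
        "step": int(step),
        "action": action,
        "reward": float(reward),
        "done": done == "true",
        # The literal string "null" marks the absence of an error message.
        "error": None if error == "null" else error,
    }
```

The error field is matched last and greedily because error messages may contain spaces, while the other fields are single tokens.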
 
 
 
 
 
 
37
 
38
+ ### Using the Environment Directly (Python)
39
 
+ ```python
+ from server.scheduling_env_environment import SchedulingEnvironment
+ from models import SchedulingAction
+
+ env = SchedulingEnvironment()
+
+ # Reset to a specific task
+ obs = env.reset(task_id="task1_easy")
+ print(f"Attendees: {obs.attendee_ids}")
+ print(f"Duration: {obs.requested_duration} min")
+ print(f"Priority: {obs.requested_priority}")
+
+ # Propose a time slot
+ result = env.step(SchedulingAction(
+     action_type="propose_slot",
+     proposed_start="2025-04-07T10:00:00+00:00",
+     proposed_duration=30,
+ ))
+ print(f"Conflicts: {result.conflicts}")
+ print(f"Reward: {result.reward}")
+
+ # Finalize when conflict-free
+ result = env.step(SchedulingAction(action_type="finalize"))
+ print(f"Success: {result.success} Final score: {result.reward:.2f}")
  ```
65
 
66
+ ### Using the HTTP Client
 
 
67
 
+ ```python
+ from client import SchedulingEnv
+ from models import SchedulingAction
 
+ with SchedulingEnv(base_url="http://localhost:8000") as env:
+     result = env.reset(task_id="task2_medium")
+     obs = result.observation
+
+     # Propose a slot
+     result = env.step(SchedulingAction(
+         action_type="propose_slot",
+         proposed_start="2025-04-07T11:00:00+00:00",
+         proposed_duration=60,
+     ))
+
+     # Reschedule a conflicting lower-priority meeting
+     if result.observation.conflicts:
+         conflict = result.observation.conflicts[0]
+         result = env.step(SchedulingAction(
+             action_type="reschedule_meeting",
+             meeting_id_to_move=conflict["meeting_id"],
+             new_start_time="2025-04-07T07:00:00+00:00",
+         ))
+
+     # Finalize
+     result = env.step(SchedulingAction(action_type="finalize"))
+     print(f"Score: {result.reward:.2f}")
  ```
96
 
97
+ ## Environment Details
 
 
 
98
 
99
+ ### Actions (`SchedulingAction`)
100
+
101
+ | `action_type` | Required fields | Description |
102
+ |----------------------|----------------------------------------------|-----------------------------------------------------------|
103
+ | `propose_slot` | `proposed_start`, `proposed_duration` | Propose a meeting start time (ISO 8601) and duration (min)|
104
+ | `reschedule_meeting` | `meeting_id_to_move`, `new_start_time` | Move a lower-priority conflict to a new time |
105
+ | `finalize` | _(none)_ | Confirm the proposed slot; ends the episode |
106
+ | `reject` | _(none)_ | Give up on scheduling; ends the episode with 0 reward |
107
+
108
+ **Meeting ID format:** `{attendee}_{start_iso}` — e.g. `user1_2025-04-07T09:00:00+00:00`
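As an illustration of the ID format, the sketch below builds and splits a meeting ID. Both helpers (`make_meeting_id`, `split_meeting_id`) are hypothetical names introduced here and may not match how the repository handles IDs internally.

```python
from datetime import datetime, timezone
from typing import Tuple


def make_meeting_id(attendee: str, start_iso: str) -> str:
    """Join attendee and ISO 8601 start time as `{attendee}_{start_iso}`."""
    return f"{attendee}_{start_iso}"


def split_meeting_id(meeting_id: str) -> Tuple[str, datetime]:
    """Split a meeting ID back into (attendee, start datetime)."""
    # An ISO 8601 timestamp contains no underscore, so splitting on the LAST
    # underscore stays correct even if the attendee name itself contains one.
    attendee, start_iso = meeting_id.rsplit("_", 1)
    return attendee, datetime.fromisoformat(start_iso)


example_id = make_meeting_id("user1", "2025-04-07T09:00:00+00:00")
example_attendee, example_start = split_meeting_id(example_id)
```

Splitting from the right is a defensive choice: it assumes nothing about attendee names beyond what the format string states.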
109
+
110
+ ### Observations (`SchedulingObservation`)
111
+
112
+ | Field | Type | Description |
113
+ |-------------------------|-------------------------|--------------------------------------------------------------|
114
+ | `requested_duration` | `int` | Meeting duration in minutes |
115
+ | `requested_priority` | `int` | Priority of the new meeting (1 = highest, 5 = lowest) |
116
+ | `attendee_ids` | `List[str]` | Required attendees |
117
+ | `busy_slots` | `List[dict]` | All existing calendar entries for attendees |
118
+ | `collective_work_hours` | `dict` | Shared working-hours window `{min_start_hour, max_end_hour}` |
119
+ | `preference_constraints`| `dict` | Aggregated constraints (max meetings/day, buffer, etc.) |
120
+ | `current_proposal` | `dict \| None` | Currently proposed slot `{start, end}` |
121
+ | `conflicts` | `List[dict]` | Conflicts for the current proposal |
122
+ | `preference_penalty` | `float` | Accumulated preference-violation penalty |
123
+ | `num_rescheduled` | `int` | Meetings rescheduled so far in this episode |
124
+ | `steps_taken` | `int` | Steps used so far |
125
+ | `max_steps` | `int` | Episode step limit (20) |
126
+ | `success` | `bool` | `True` when the meeting is successfully booked |
127
+ | `error_message` | `str \| None` | Reason if the last action was invalid |
128
+ | `done` | `bool` | `True` when the episode has ended |
129
+ | `reward` | `float` | Step or final reward |
130
+
131
+ ### Reward Design
132
+
133
+ **Step-level rewards** (returned after each `propose_slot` or `reschedule_meeting`):
134
+
135
+ | Outcome | Reward |
136
+ |------------------------------------------|--------|
137
+ | Conflict-free proposal (low penalty) | +0.5 |
138
+ | Proposal has reschedulable conflicts | +0.2 |
139
+ | Proposal has non-reschedulable conflicts | −0.3 |
140
+ | Invalid action | −0.1 |
141
+ | Outside working hours | −0.2 |
142
+
143
+ **Final reward** (returned on `finalize`) starts at 1.0 and subtracts three deductions:
144
 
145
+ ```
146
+ preference_deduction = min(0.75, (penalty ** 1.2) / 200.0)
147
+ reschedule_deduction = min(0.30, 0.05 * (1.8 ** num_rescheduled)) [if any rescheduled]
148
+ time_deduction = steps_taken * 0.015
149
 
150
+ final_reward = clamp(1.0 - preference_deduction - reschedule_deduction - time_deduction, 0.0, 1.0)
151
+ ```
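The formula can be checked numerically with a short sketch. It assumes that the bracketed note `[if any rescheduled]` means the reschedule deduction is zero when no meeting was moved, and that `penalty` is non-negative. The function name `final_reward` is illustrative; the repository's own implementation may differ.

```python
def final_reward(penalty: float, num_rescheduled: int, steps_taken: int) -> float:
    """Final episode reward, following the README formula (sketch, not the repo code)."""
    # Preference deduction grows slightly faster than linearly and is capped at 0.75.
    preference_deduction = min(0.75, (penalty ** 1.2) / 200.0)

    # Assumption: the deduction applies only when at least one meeting was rescheduled.
    if num_rescheduled > 0:
        reschedule_deduction = min(0.30, 0.05 * (1.8 ** num_rescheduled))
    else:
        reschedule_deduction = 0.0

    # Every step taken costs a flat 0.015.
    time_deduction = steps_taken * 0.015

    raw = 1.0 - preference_deduction - reschedule_deduction - time_deduction
    return max(0.0, min(1.0, raw))  # clamp to [0.0, 1.0]
```

Under these assumptions, a two-step episode with no preference penalty and no rescheduling scores 1.0 - 2 × 0.015 = 0.97, which falls inside the 0.8 to 1.0 range that the Tasks table lists for `task1_easy`.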
152
 
153
+ Timeout (step 20 reached without `finalize`) gives partial credit: 70% of the theoretical reward if conflict-free, or a progress-based fraction otherwise.
 
 
 
154
 
155
+ ## Tasks
156
 
157
+ Three tasks of increasing difficulty are provided as JSON scenarios in `server/scenarios/`:
 
 
158
 
159
+ | Task ID | Difficulty | Attendees | Duration | Priority | Rescheduling needed | Expected score |
160
+ |-----------------|------------|-----------|----------|----------|---------------------|----------------|
161
+ | `task1_easy` | Easy | 2 | 30 min | 3 | No | 0.8 – 1.0 |
162
+ | `task2_medium` | Medium | 4 | 60 min | 2 | Yes (1 meeting) | 0.5 – 0.7 |
163
+ | `task3_hard` | Hard | 6 | 45 min | 2 | Yes (3+ meetings) | 0.25 – 0.45 |
164
 
165
+ ### task1_easy Team Sync (2 attendees)
 
166
 
167
+ - Two attendees each have 2 existing meetings; a clear free slot exists at **10:00**.
168
+ - Agent should find the free slot and finalize in 2 steps.
169
+ - No rescheduling required.
170
 
171
+ ### task2_medium — Cross-Team Planning (4 attendees)
 
 
172
 
173
+ - Four attendees with densely packed schedules; the optimal slot at **11:00** has one low-priority conflict (`user3` Coffee chat, priority 4).
174
+ - Agent needs to propose the slot, reschedule the conflict, then finalize.
175
+ - User preferences include back-to-back avoidance and different preferred-hour windows.
176
 
177
+ ### task3_hard Executive Planning Session (6 attendees)
 
 
 
 
178
 
179
+ - Six attendees with very dense calendars; the best window at **15:00** requires rescheduling three low-priority meetings (priority 4).
180
+ - Multiple valid solutions exist; the agent must navigate cascading constraints.
181
+ - All attendees have strict buffer requirements and narrow preferred-hour windows.
182
 
183
+ ## Participant Preferences
 
 
184
 
185
+ Each attendee can have the following preferences (stored in scenario JSON and observed via `preference_constraints`):
 
 
 
 
 
 
186
 
187
+ | Preference | Description | Penalty for violation |
188
+ |------------------------|-----------------------------------------------------|-----------------------|
189
+ | `preferred_hours` | `{start: H, end: H}` — preferred working hours | +50 per participant |
190
+ | `max_meetings_per_day` | Maximum meetings the participant wants in a day | +30 per participant |
191
+ | `avoid_back_to_back` | Whether a buffer gap is required between meetings | +20 per participant |
192
+ | `buffer_minutes` | Gap required before/after a meeting (if `avoid_back_to_back`) | (part of above) |
193
 
194
+ The **collective working hours** (the intersection of all attendees' preferred hours) define the hard constraint window within which proposals must fall.
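The intersection can be sketched as the latest preferred start paired with the earliest preferred end. The input keys `start` and `end` follow the `preferred_hours` row above, and the output keys follow the `collective_work_hours` observation field. The function itself is a hypothetical illustration, and returning `None` for an empty intersection is an assumption about how that case could be signalled.

```python
from typing import Dict, Iterable, Optional


def collective_work_hours(
    preferred: Iterable[Dict[str, int]],
) -> Optional[Dict[str, int]]:
    """Intersect per-attendee {start, end} hour windows into one shared window."""
    windows = list(preferred)
    if not windows:
        return None
    # The shared window opens when the last attendee starts
    # and closes when the first attendee stops.
    start = max(w["start"] for w in windows)
    end = min(w["end"] for w in windows)
    if start >= end:
        return None  # the windows do not overlap
    return {"min_start_hour": start, "max_end_hour": end}


shared = collective_work_hours(
    [{"start": 9, "end": 17}, {"start": 10, "end": 18}, {"start": 8, "end": 16}]
)
```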
195
 
196
+ ## API Endpoints
197
 
198
+ The server exposes the following HTTP endpoints (also available via the Web UI at `/web`):
199
 
200
+ | Method | Path | Description |
201
+ |--------|-----------|--------------------------------------------------------------------|
202
+ | POST | `/reset` | Start a new episode. Body: `{"task_id": "task1_easy"}` |
203
+ | POST | `/step` | Take an action. Body: `{"action_type": "...", ...action fields}` |
204
+ | GET | `/state` | Return the full internal `SchedulingState` |
205
+ | GET | `/health` | Health check — returns `{"status": "healthy"}` |
206
+ | GET | `/docs` | Interactive OpenAPI / Swagger UI |
207
 
208
+ ### Example: REST interaction
 
209
 
210
+ ```bash
211
+ # Start episode
212
+ curl -X POST http://localhost:8000/reset \
213
+ -H "Content-Type: application/json" \
214
+ -d '{"task_id": "task1_easy"}'
215
+
216
+ # Propose a slot
217
+ curl -X POST http://localhost:8000/step \
218
+ -H "Content-Type: application/json" \
219
+ -d '{"action_type": "propose_slot", "proposed_start": "2025-04-07T10:00:00+00:00", "proposed_duration": 30}'
220
+
221
+ # Finalize
222
+ curl -X POST http://localhost:8000/step \
223
+ -H "Content-Type: application/json" \
224
+ -d '{"action_type": "finalize"}'
225
  ```
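The same three requests can be issued from Python using only the standard library. This is a minimal sketch: it assumes the server from the curl examples is listening on `http://localhost:8000` and that the endpoints answer with JSON, which the README states only for `/health`. `build_post` and `send` are helper names introduced for illustration.

```python
import json
import urllib.request
from typing import Any, Dict

BASE_URL = "http://localhost:8000"  # same host and port as the curl examples


def build_post(path: str, payload: Dict[str, Any]) -> urllib.request.Request:
    """Build a JSON POST request equivalent to the curl calls."""
    return urllib.request.Request(
        url=f"{BASE_URL}{path}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request) -> Dict[str, Any]:
    """Send the request and decode the body, assuming a JSON response."""
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))


reset_req = build_post("/reset", {"task_id": "task1_easy"})
propose_req = build_post(
    "/step",
    {
        "action_type": "propose_slot",
        "proposed_start": "2025-04-07T10:00:00+00:00",
        "proposed_duration": 30,
    },
)
finalize_req = build_post("/step", {"action_type": "finalize"})
```

With the server running, calling `send(reset_req)`, `send(propose_req)`, and `send(finalize_req)` in that order would mirror the curl sequence. Building the requests separately from sending them keeps the payloads inspectable without a live server.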
226
 
227
+ ## Development & Testing
228
+
229
+ ### Run the baseline inference script
230
 
231
+ ```bash
232
+ python inference.py
233
+ ```
234
 
235
+ ### Start the server locally
236
 
237
+ ```bash
238
+ uvicorn server.app:app --reload
239
+ ```
240
 
241
+ ### Validate the environment (required before submission)
242
+
243
+ ```bash
244
+ openenv validate
 
 
 
 
245
  ```
246
 
247
+ ### Generate / update the lock file
 
 
 
248
 
249
+ ```bash
250
+ uv lock
251
+ ```
252
 
253
+ ### Build the Docker image
 
254
 
255
+ ```bash
256
+ docker build -t scheduling_env:latest .
 
 
 
 
 
 
257
  ```
258
 
259
+ ## Deploying to Hugging Face Spaces
260
 
261
+ ```bash
262
+ # From the project root (where openenv.yaml is located)
263
+ openenv push
264
 
265
+ # Push to a specific repository
266
+ openenv push --repo-id my-org/my-scheduling-env
 
 
 
 
267
 
268
+ # Push as a private space
269
+ openenv push --private
 
270
  ```
271
 
272
+ The `openenv push` command validates the environment, builds a Hugging Face-compatible Docker image, and uploads it. After deployment your space is available at:
273
 
274
+ ```
275
+ https://huggingface.co/spaces/<repo-id>
276
+ ```
277
 
278
+ The deployed space includes:
279
+ - **Web Interface** at `/web` — interactive UI for exploring the environment
280
+ - **API Documentation** at `/docs` — full OpenAPI / Swagger interface
281
+ - **Health Check** at `/health` — container health monitoring
282
 
283
+ ### Options
 
 
 
284
 
285
+ | Flag | Description |
286
+ |------|-------------|
287
+ | `--directory`, `-d` | Directory with `openenv.yaml` (default: current dir) |
288
+ | `--repo-id`, `-r` | Repository ID `username/repo-name` |
289
+ | `--base-image`, `-b` | Override Dockerfile `FROM` image |
290
+ | `--private` | Deploy as a private space (default: public) |
291
 
292
+ ## Environment Variables (for LLM-based inference)
293
 
294
+ Create a `.env` file (never commit it):
295
 
296
+ ```
297
+ API_BASE_URL=https://router.huggingface.co/v1 # HF Router endpoint
298
+ MODEL_NAME=Qwen/Qwen2.5-72B-Instruct # Model identifier
299
+ HF_TOKEN=hf_... # Hugging Face API key
300
  ```
301
 
302
  ## Project Structure
303
 
  ```
+ rl-scheduling-env/
+ ├── Dockerfile                         # Container image (root, required by openenv)
+ ├── README.md                          # This file
+ ├── openenv.yaml                       # OpenEnv manifest
+ ├── pyproject.toml                     # Project metadata and dependencies
+ ├── uv.lock                            # Locked dependencies (generated by `uv lock`)
+ ├── __init__.py                        # Package exports
+ ├── models.py                          # Pydantic models: SchedulingAction,
+ │                                      #   SchedulingObservation, SchedulingState
+ ├── client.py                          # SchedulingEnv HTTP/WebSocket client
+ ├── inference.py                       # Heuristic baseline (no LLM required)
  └── server/
+     ├── __init__.py                    # Server package exports
+     ├── app.py                         # FastAPI app + SchedulingHTTPEnvServer
+     ├── scheduling_env_environment.py  # Core RL environment (reset / step / state)
+     ├── scheduling_logic.py            # Pure utility functions (conflict detection,
+     │                                  #   preference scoring, reward calculation)
+     ├── graders.py                     # SchedulingGrader (0.0–1.0 episode scorer)
+     ├── requirements.txt               # Server-side Python dependencies
+     └── scenarios/
+         ├── task1_easy.json            # Easy: 2 attendees, free slot exists
+         ├── task2_medium.json          # Medium: 4 attendees, 1 rescheduling needed
+         └── task3_hard.json            # Hard: 6 attendees, 3+ reschedulings needed
  ```
inference.py CHANGED
@@ -1,198 +1,293 @@
1
- #!/usr/bin/env python3
2
  """
3
- Baseline inference script for the Meeting Scheduling RL Environment.
4
-
5
- Uses a HEURISTIC policy (BotBooked greedy algorithm) - NO LLM required.
6
- Deterministic, reproducible, fast (~seconds for all 3 tasks).
7
-
8
- Output format: [START]/[STEP]/[END] per hackathon spec.
 
 
 
 
 
 
 
9
  """
10
 
11
- from __future__ import annotations
12
-
13
- import sys
14
- from datetime import datetime, timedelta, timezone
15
-
16
- from server.scheduling_env_environment import SchedulingEnvironment
17
- from models import SchedulingAction
18
- from server.scheduling_logic import find_earliest_free_slot, parse_iso
19
-
20
-
21
- def baseline_policy(obs) -> SchedulingAction:
22
- """Heuristic baseline using greedy slot search + lowest-priority rescheduling."""
23
-
24
- # Step 1: No proposal yet -> find a free slot
25
- if obs.current_proposal is None:
26
- # Build calendars dict from busy_slots
27
- calendars = {}
28
- for slot in obs.busy_slots:
29
- att = slot["attendee"]
30
- if att not in calendars:
31
- calendars[att] = []
32
- calendars[att].append([slot["start"], slot["end"], slot["priority"], slot["summary"]])
33
 
34
- # Try to find a completely free slot
35
- free = find_earliest_free_slot(
36
- calendars,
37
- obs.attendee_ids,
38
- obs.requested_duration,
39
- obs.busy_slots[0]["start"] if obs.busy_slots else "2025-04-07T09:00:00+00:00",
40
- obs.collective_work_hours,
41
- )
42
 
43
- if free:
44
  return SchedulingAction(
45
  action_type="propose_slot",
46
- proposed_start=free,
47
  proposed_duration=obs.requested_duration,
48
  )
 
 
 
 
49
 
50
- # No completely free slot found.
51
- # Scan 15-min increments within collective hours for a slot with only
52
- # reschedulable conflicts (priority > requested_priority).
53
- min_h = obs.collective_work_hours.get("min_start_hour", 9)
54
- max_h = obs.collective_work_hours.get("max_end_hour", 17)
55
- duration = obs.requested_duration
56
- tz = timezone.utc
57
-
58
- candidate = datetime(2025, 4, 7, min_h, 0, 0, tzinfo=tz)
59
- end_boundary = datetime(2025, 4, 7, max_h, 0, 0, tzinfo=tz)
60
- step_delta = timedelta(minutes=15)
61
-
62
- best_candidate = None
63
- best_conflict_count = 999
64
-
65
- while candidate + timedelta(minutes=duration) <= end_boundary:
66
- c_start = candidate.isoformat()
67
- c_end = (candidate + timedelta(minutes=duration)).isoformat()
68
-
69
- # Count conflicts at this candidate
70
- conflicts_here = []
71
- for att in obs.attendee_ids:
72
- for entry in calendars.get(att, []):
73
- e_start = parse_iso(entry[0])
74
- e_end = parse_iso(entry[1])
75
- if candidate < e_end and e_start < candidate + timedelta(minutes=duration):
76
- conflicts_here.append(entry)
77
-
78
- # Check if all conflicts are reschedulable
79
- all_reschedulable = all(
80
- c[2] > obs.requested_priority for c in conflicts_here
81
- )
82
 
83
- if all_reschedulable and len(conflicts_here) < best_conflict_count:
84
- best_candidate = c_start
85
- best_conflict_count = len(conflicts_here)
86
- if best_conflict_count == 0:
87
- break # Perfect slot
88
 
89
- candidate += step_delta
90
 
91
- if best_candidate:
92
- return SchedulingAction(
93
- action_type="propose_slot",
94
- proposed_start=best_candidate,
95
- proposed_duration=duration,
96
- )
97
 
98
- # Last resort: propose at collective hours start (will likely conflict)
99
- fallback = f"2025-04-07T{min_h:02d}:00:00+00:00"
100
- return SchedulingAction(
101
- action_type="propose_slot",
102
- proposed_start=fallback,
103
- proposed_duration=obs.requested_duration,
104
- )
105
 
106
- # Step 2: Has proposal with conflicts -> reschedule lowest-priority conflict
107
- if obs.conflicts:
108
- sorted_conflicts = sorted(obs.conflicts, key=lambda x: x["priority"], reverse=True)
109
- target = sorted_conflicts[0]
110
 
111
- # Can only reschedule lower priority
112
- if target["priority"] <= obs.requested_priority:
113
- return SchedulingAction(action_type="reject")
114
 
115
- # Find a free slot for this attendee to move the meeting to.
116
- # Search in early morning (06:00-08:00) and late evening (17:00-20:00).
117
- attendee = target["attendee"]
118
- meeting_dur = parse_iso(target["end"]) - parse_iso(target["start"])
119
- dur_min = int(meeting_dur.total_seconds() // 60)
120
-
121
- # Build this attendee's calendar
122
- att_cal = [
123
- s for s in obs.busy_slots if s["attendee"] == attendee
124
- ]
125
- att_entries = [[s["start"], s["end"], s["priority"], s["summary"]] for s in att_cal]
126
-
127
- new_time = None
128
- # Try 30-minute steps from 06:00 to 07:30, then from 17:00 to 20:00
129
- for h, m in [(6,0),(6,30),(7,0),(7,30),(17,0),(17,30),(18,0),(18,30),(19,0),(19,30),(20,0)]:
130
- cand = datetime(2025, 4, 7, h, m, 0, tzinfo=timezone.utc)
131
- cand_end = cand + timedelta(minutes=dur_min)
132
- cand_iso = cand.isoformat()
133
- cand_end_iso = cand_end.isoformat()
134
- # Check free for this attendee
135
- conflict_found = False
136
- for e in att_entries:
137
- es = parse_iso(e[0])
138
- ee = parse_iso(e[1])
139
- if cand < ee and es < cand_end:
140
- conflict_found = True
141
- break
142
- if not conflict_found:
143
- new_time = cand_iso
144
  break
145
 
146
- if not new_time:
147
- # Give up on this conflict, try rejecting
148
- return SchedulingAction(action_type="reject")
149
 
 
 
150
 
151
- return SchedulingAction(
152
- action_type="reschedule_meeting",
153
- meeting_id_to_move=target["meeting_id"],
154
- new_start_time=new_time,
155
- )
156
 
157
- # Step 3: No conflicts -> finalize
158
- return SchedulingAction(action_type="finalize")
159
 
 
 
 
 
 
160
 
161
- def main():
162
- env = SchedulingEnvironment()
163
 
164
- for task_id in ["task1_easy", "task2_medium", "task3_hard"]:
165
- print(f"[START] task={task_id} env=scheduling_env model=heuristic_baseline")
166
 
167
- obs = env.reset(task_id=task_id)
168
- done = False
169
- step = 0
170
- rewards = []
171
 
172
- while not done and step < 20:
173
- action = baseline_policy(obs)
174
- obs = env.step(action)
175
- done = obs.done
176
- reward = obs.reward if obs.reward is not None else 0.0
177
- rewards.append(reward)
178
- step += 1
179
 
180
- error = obs.error_message if obs.error_message else "null"
181
- print(
182
- f"[STEP] step={step} action={action.action_type} "
183
- f"reward={reward:.2f} done={str(done).lower()} error={error}"
184
- )
185
 
186
- final_score = rewards[-1] if (done and rewards) else 0.0
187
- success = obs.success if hasattr(obs, "success") else False
188
- rewards_str = ",".join(f"{r:.2f}" for r in rewards)
189
 
190
- print(
191
- f"[END] success={str(success).lower()} steps={step} "
192
- f"score={final_score:.2f} rewards={rewards_str}"
193
- )
194
- print()
 
 
 
 
 
 
 
 
195
 
196
 
197
  if __name__ == "__main__":
198
- main()
 
 
1
  """
2
+ LLM-based Inference Script for Meeting Scheduling RL Environment.
3
+ ===================================
4
+ Uses OpenAI-compatible LLM via HF Router to intelligently schedule meetings.
5
+
6
+ MANDATORY environment variables:
7
+ API_BASE_URL The API endpoint for the LLM.
8
+ MODEL_NAME The model identifier to use for inference.
9
+ HF_TOKEN Your Hugging Face / API key.
10
+
11
+ STDOUT FORMAT:
12
+ [START] task=<task_name> env=scheduling_env model=<model_name>
13
+ [STEP] step=<n> action=<action_str> reward=<0.00> done=<true|false> error=<msg|null>
14
+ [END] success=<true|false> steps=<n> score=<score> rewards=<r1,r2,...,rn>
15
  """
16
 
17
+ import asyncio
18
+ import json
19
+ import os
20
+ import textwrap
21
+ from typing import Dict, List, Optional
22
+
23
+ from openai import OpenAI
24
+
25
+ from scheduling_env.client import SchedulingEnv
26
+ from scheduling_env.models import SchedulingAction
27
+
28
+ # ---------------------------------------------------------------------------
29
+ # Configuration
30
+ # ---------------------------------------------------------------------------
31
+ API_KEY = os.getenv("HF_TOKEN") or os.getenv("API_KEY")
32
+ API_BASE_URL = os.getenv("API_BASE_URL", "https://router.huggingface.co/v1")
33
+ MODEL_NAME = os.getenv("MODEL_NAME", "Qwen/Qwen2.5-72B-Instruct")
34
+
35
+ ENV_REPO_ID = "Akshaykumarbm/scheduling_env"
36
+ BENCHMARK = "scheduling_env"
37
+ TASKS = ["task1_easy", "task2_medium", "task3_hard"]
38
+ MAX_STEPS = 20
39
+ TEMPERATURE = 0.3
40
+ MAX_TOKENS = 512
41
+
42
+ # ---------------------------------------------------------------------------
43
+ # Logging helpers
44
+ # ---------------------------------------------------------------------------
45
+
46
+ def log_start(task: str, env: str, model: str) -> None:
47
+ print(f"[START] task={task} env={env} model={model}", flush=True)
48
+
49
+
50
+ def log_step(step: int, action: str, reward: float, done: bool, error: Optional[str]) -> None:
51
+ error_val = error if error else "null"
52
+ done_val = str(done).lower()
53
+ print(
54
+ f"[STEP] step={step} action={action} reward={reward:.2f} done={done_val} error={error_val}",
55
+ flush=True,
56
+ )
57
+
58
+
59
+ def log_end(success: bool, steps: int, score: float, rewards: List[float]) -> None:
60
+ rewards_str = ",".join(f"{r:.2f}" for r in rewards)
61
+ print(
62
+ f"[END] success={str(success).lower()} steps={steps} score={score:.2f} rewards={rewards_str}",
63
+ flush=True,
64
+ )
65
+
66
+
67
+ # ---------------------------------------------------------------------------
68
+ # LLM interaction
69
+ # ---------------------------------------------------------------------------
70
+
71
+ SYSTEM_PROMPT = textwrap.dedent("""\
72
+ You are an AI meeting scheduling assistant. You must schedule a meeting by choosing actions.
73
+
74
+ Available actions (respond with EXACTLY one JSON object):
75
+
76
+ 1. Propose a time slot:
77
+ {"action_type": "propose_slot", "proposed_start": "<ISO8601>", "proposed_duration": <minutes>}
78
+
79
+ 2. Reschedule a conflicting meeting (only if priority > requested priority):
80
+ {"action_type": "reschedule_meeting", "meeting_id_to_move": "<attendee>_<start_iso>", "new_start_time": "<ISO8601>"}
81
+
82
+ 3. Finalize the schedule (only when no conflicts remain):
83
+ {"action_type": "finalize"}
84
+
85
+ 4. Reject (give up):
86
+ {"action_type": "reject"}
87
+
88
+ Rules:
89
+ - Propose slots within collective working hours.
90
+ - You can only reschedule meetings with LOWER priority (higher number) than the requested meeting.
91
+ - meeting_id format is: <attendee>_<start_iso> (e.g., "user1_2025-04-07T09:00:00+00:00").
92
+ - After rescheduling all conflicts, call finalize.
93
+ - Minimize preference violations and rescheduling.
94
+ - Respond with ONLY the JSON object, no other text.
95
+ """)
96
+
97
+
98
+ def format_observation(obs, step: int) -> str:
99
+ """Convert a SchedulingObservation into a user prompt for the LLM."""
100
+ parts = [
101
+ f"Step {step}/{obs.max_steps}",
102
+ f"Meeting to schedule: {obs.requested_duration} min, priority {obs.requested_priority}",
103
+ f"Attendees: {', '.join(obs.attendee_ids)}",
104
+ f"Collective working hours: {obs.collective_work_hours.get('min_start_hour', 9)}:00 - {obs.collective_work_hours.get('max_end_hour', 17)}:00",
105
+ ]
106
+
107
+ if obs.preference_constraints:
108
+ parts.append(f"Preferences: max {obs.preference_constraints.get('max_meetings_per_day', 'N/A')} meetings/day, "
109
+ f"buffer required: {obs.preference_constraints.get('requires_buffer', False)}, "
110
+ f"buffer mins: {obs.preference_constraints.get('buffer_minutes', 0)}")
111
+
112
+ # Busy slots grouped by attendee
113
+ busy_by_attendee: Dict[str, List] = {}
114
+ for slot in obs.busy_slots:
115
+ att = slot["attendee"]
116
+ busy_by_attendee.setdefault(att, []).append(slot)
117
+
118
+ parts.append("\nCalendars:")
119
+ for att in obs.attendee_ids:
120
+ slots = busy_by_attendee.get(att, [])
121
+ if slots:
122
+ slot_strs = [
123
+ f" - {s['start']} to {s['end']} (priority {s['priority']}, {s['summary']})"
124
+ for s in sorted(slots, key=lambda x: x["start"])
125
+ ]
126
+ parts.append(f" {att}:")
127
+ parts.extend(slot_strs)
128
+ else:
129
+ parts.append(f" {att}: (no meetings)")
130
+
131
+ if obs.current_proposal:
132
+ parts.append(f"\nCurrent proposal: {obs.current_proposal['start']} to {obs.current_proposal['end']}")
133
 
134
+ if obs.conflicts:
135
+ parts.append(f"\nConflicts ({len(obs.conflicts)}):")
136
+ for c in obs.conflicts:
137
+ parts.append(
138
+ f" - {c['attendee']}: {c['start']} to {c['end']} "
139
+ f"(priority {c['priority']}, {c['summary']}, id: {c['meeting_id']})"
140
+ )
 
141
 
142
+ if obs.error_message:
143
+ parts.append(f"\nLast error: {obs.error_message}")
144
+
145
+ parts.append(f"\nRescheduled so far: {obs.num_rescheduled}")
146
+ parts.append(f"Preference penalty: {obs.preference_penalty}")
147
+
148
+ if not obs.current_proposal and not obs.conflicts:
149
+ parts.append("\nAction needed: propose a time slot for the meeting.")
150
+ elif obs.conflicts:
151
+ parts.append("\nAction needed: reschedule a conflict (lower-priority only) or propose a different slot.")
152
+ else:
153
+ parts.append("\nAction needed: no conflicts remain - you should finalize.")
154
+
155
+ return "\n".join(parts)
156
+
157
+
158
+ def parse_llm_response(text: str, obs) -> SchedulingAction:
159
+ """Parse LLM JSON response into a SchedulingAction, with fallback."""
160
+ # Extract JSON from response (handle markdown code blocks)
161
+ cleaned = text.strip()
162
+ if "```" in cleaned:
163
+ # Extract content between code fences
164
+ lines = cleaned.split("\n")
165
+ json_lines = []
166
+ in_block = False
167
+ for line in lines:
168
+ if line.strip().startswith("```"):
169
+ in_block = not in_block
170
+ continue
171
+ if in_block:
172
+ json_lines.append(line)
173
+ cleaned = "\n".join(json_lines).strip()
174
+
175
+ # Try to find JSON object in the response
176
+ start = cleaned.find("{")
177
+ end = cleaned.rfind("}") + 1
178
+ if start >= 0 and end > start:
179
+ cleaned = cleaned[start:end]
180
+
181
+ try:
182
+ data = json.loads(cleaned)
183
+ return SchedulingAction(**data)
184
+ except Exception as e:
185
+ print(f"[DEBUG] Failed to parse LLM response: {e}. Response: {text[:200]}", flush=True)
186
+ # Fallback: if we have no proposal yet, propose at first available hour
187
+ if obs.current_proposal is None:
188
+ min_h = obs.collective_work_hours.get("min_start_hour", 9)
189
  return SchedulingAction(
190
  action_type="propose_slot",
191
+ proposed_start=f"2025-04-07T{min_h:02d}:00:00+00:00",
192
  proposed_duration=obs.requested_duration,
193
  )
194
+ elif not obs.conflicts:
195
+ return SchedulingAction(action_type="finalize")
196
+ else:
197
+ return SchedulingAction(action_type="reject")
198
 
199
 
200
+ def get_llm_action(client: OpenAI, obs, step: int) -> SchedulingAction:
201
+ """Query the LLM and return a SchedulingAction."""
202
+ user_prompt = format_observation(obs, step)
203
+ try:
204
+ completion = client.chat.completions.create(
205
+ model=MODEL_NAME,
206
+ messages=[
207
+ {"role": "system", "content": SYSTEM_PROMPT},
208
+ {"role": "user", "content": user_prompt},
209
+ ],
210
+ temperature=TEMPERATURE,
211
+ max_tokens=MAX_TOKENS,
212
+ stream=False,
213
+ )
214
+ text = (completion.choices[0].message.content or "").strip()
215
+ return parse_llm_response(text, obs)
216
+ except Exception as exc:
217
+ print(f"[DEBUG] LLM request failed: {exc}", flush=True)
218
+ return parse_llm_response("", obs)
219
 
 
220
 
221
+ # ---------------------------------------------------------------------------
222
+ # Main loop
223
+ # ---------------------------------------------------------------------------
 
 
 
224
 
225
+ async def run_task(env, client: OpenAI, task_id: str) -> None:
226
+ """Run a single scheduling task."""
227
+ rewards: List[float] = []
228
+ steps_taken = 0
229
+ score = 0.0
230
+ success = False
 
231
 
232
+ log_start(task=task_id, env=BENCHMARK, model=MODEL_NAME)
 
 
 
233
 
234
+ try:
235
+ result = await env.reset(task_id=task_id)
236
+ obs = result.observation
237
 
238
+ for step in range(1, MAX_STEPS + 1):
239
+ if result.done:
240
  break
241
 
242
+ action = get_llm_action(client, obs, step)
 
 
243
 
244
+ result = await env.step(action)
245
+ obs = result.observation
246
 
247
+ reward = result.reward or 0.0
248
+ done = result.done
249
+ error = obs.error_message
 
 
250
 
251
+ rewards.append(reward)
252
+ steps_taken = step
253
 
254
+ action_str = action.action_type
255
+ if action.action_type == "propose_slot":
256
+ action_str = f"propose_slot({action.proposed_start},{action.proposed_duration}m)"
257
+ elif action.action_type == "reschedule_meeting":
258
+ action_str = f"reschedule({action.meeting_id_to_move}->{action.new_start_time})"
259
 
260
+ log_step(step=step, action=action_str, reward=reward, done=done, error=error)
 
261
 
262
+ if done:
263
+ break
264
 
265
+ # Score is the final reward (0.0-1.0 from calculate_final_reward)
266
+ score = rewards[-1] if rewards else 0.0
267
+ score = min(max(score, 0.0), 1.0)
268
+ success = obs.success if hasattr(obs, "success") else (score > 0.0)
269
 
270
+ except Exception as exc:
271
+ print(f"[DEBUG] Task {task_id} error: {exc}", flush=True)
 
 
 
 
 
272
 
273
+ finally:
274
+ log_end(success=success, steps=steps_taken, score=score, rewards=rewards)
 
 
 
275
 
 
 
 
276
 
277
+ async def main() -> None:
278
+ llm_client = OpenAI(base_url=API_BASE_URL, api_key=API_KEY)
279
+
280
+ env = await SchedulingEnv.from_env(ENV_REPO_ID)
281
+
282
+ try:
283
+ for task_id in TASKS:
284
+ await run_task(env, llm_client, task_id)
285
+ finally:
286
+ try:
287
+ await env.close()
288
+ except Exception as e:
289
+ print(f"[DEBUG] env.close() error: {e}", flush=True)
290
 
291
 
292
  if __name__ == "__main__":
293
+ asyncio.run(main())
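
The fence-stripping and brace-slicing steps in `parse_llm_response` can be exercised in isolation. The following is a minimal sketch, not the committed code: `extract_json_object` is a hypothetical helper name, and it returns a plain dict instead of a `SchedulingAction` so that it runs without the `scheduling_env` package installed.

```python
import json
from typing import Any, Dict, Optional


def extract_json_object(text: str) -> Optional[Dict[str, Any]]:
    """Parse the first-'{' to last-'}' span of text as a JSON object, or return None.

    Hypothetical helper mirroring the cleanup steps in parse_llm_response.
    """
    cleaned = text.strip()
    if "```" in cleaned:
        # Keep only the lines that sit between Markdown code fences.
        inside, kept = False, []
        for line in cleaned.splitlines():
            if line.strip().startswith("```"):
                inside = not inside
                continue
            if inside:
                kept.append(line)
        cleaned = "\n".join(kept).strip()

    # Slice from the first opening brace to the last closing brace.
    start, end = cleaned.find("{"), cleaned.rfind("}") + 1
    if start < 0 or end <= start:
        return None
    try:
        data = json.loads(cleaned[start:end])
    except json.JSONDecodeError:
        return None
    return data if isinstance(data, dict) else None


if __name__ == "__main__":
    print(extract_json_object('```json\n{"action_type": "finalize"}\n```'))
    print(extract_json_object('Sure: {"action_type": "reject"} done'))
    print(extract_json_object("no json here"))
```

Returning `None` on failure keeps the choice of fallback action (propose, finalize, or reject) in the caller, which is where the observation is available.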
pyproject.toml CHANGED
@@ -14,19 +14,10 @@ version = "0.1.0"
14
  description = "Scheduling Env environment for OpenEnv"
15
  requires-python = ">=3.10"
16
  dependencies = [
17
- "huggingface-hub>=1.9.1",
18
  # Core OpenEnv runtime (provides FastAPI server + HTTP client types)
19
- # install from github
20
- # "openenv-core[core] @ git+https://github.com/meta-pytorch/OpenEnv.git",
21
  "openenv-core[core]>=0.2.2",
22
- # Environment-specific dependencies
23
- # Add all dependencies needed for your environment here
24
- # Examples:
25
- # "numpy>=1.19.0",
26
- # "torch>=2.0.0",
27
- # "gymnasium>=0.29.0",
28
- # "openspiel>=1.0.0",
29
- # "smolagents>=1.22.0,<2",
30
  ]
31
 
32
  [project.optional-dependencies]
@@ -43,4 +34,4 @@ server = "scheduling_env.server.app:main"
43
  [tool.setuptools]
44
  include-package-data = true
45
  packages = ["scheduling_env", "scheduling_env.server"]
46
- package-dir = { "scheduling_env" = ".", "scheduling_env.server" = "server" }
 
14
  description = "Scheduling Env environment for OpenEnv"
15
  requires-python = ">=3.10"
16
  dependencies = [
 
17
  # Core OpenEnv runtime (provides FastAPI server + HTTP client types)
 
 
18
  "openenv-core[core]>=0.2.2",
19
+ # OpenAI client for LLM-based inference
20
+ "openai>=1.0.0",
 
 
 
 
 
 
21
  ]
22
 
23
  [project.optional-dependencies]
 
34
  [tool.setuptools]
35
  include-package-data = true
36
  packages = ["scheduling_env", "scheduling_env.server"]
37
+ package-dir = { "scheduling_env" = ".", "scheduling_env.server" = "server" }
sample_infrenae.py CHANGED
@@ -1,82 +1,47 @@
1
-
2
  """
3
- Inference Script Example
4
  ===================================
5
- MANDATORY
6
- - Before submitting, ensure the following variables are defined in your environment configuration:
 
7
  API_BASE_URL The API endpoint for the LLM.
8
  MODEL_NAME The model identifier to use for inference.
9
  HF_TOKEN Your Hugging Face / API key.
10
- LOCAL_IMAGE_NAME The name of the local image to use for the environment if you are using from_docker_image()
11
- method
12
-
13
- - Defaults are set only for API_BASE_URL and MODEL_NAME
14
- (and should reflect your active inference setup):
15
- API_BASE_URL = os.getenv("API_BASE_URL", "<your-active-endpoint>")
16
- MODEL_NAME = os.getenv("MODEL_NAME", "<your-active-model>")
17
-
18
- - The inference script must be named `inference.py` and placed in the root directory of the project
19
- - Participants must use OpenAI Client for all LLM calls using above variables
20
 
21
- STDOUT FORMAT
22
- - The script must emit exactly three line types to stdout, in this order:
23
-
24
- [START] task=<task_name> env=<benchmark> model=<model_name>
25
  [STEP] step=<n> action=<action_str> reward=<0.00> done=<true|false> error=<msg|null>
26
  [END] success=<true|false> steps=<n> score=<score> rewards=<r1,r2,...,rn>
27
-
28
- Rules:
29
- - One [START] line at episode begin.
30
- - One [STEP] line per step, immediately after env.step() returns.
31
- - One [END] line after env.close(), always emitted (even on exception).
32
- - reward and rewards are formatted to 2 decimal places.
33
- - done and success are lowercase booleans: true or false.
34
- - error is the raw last_action_error string, or null if none.
35
- - All fields on a single line with no newlines within a line.
36
- - Each task should return a score in [0, 1]
37
-
38
- Example:
39
- [START] task=click-test env=miniwob model=Qwen3-VL-30B
40
- [STEP] step=1 action=click('123') reward=0.00 done=false error=null
41
- [STEP] step=2 action=fill('456','text') reward=0.00 done=false error=null
42
- [STEP] step=3 action=click('789') reward=1.00 done=true error=null
43
- [END] success=true steps=3 score=1.00 rewards=0.00,0.00,1.00
44
  """
45
 
46
  import asyncio
 
47
  import os
48
  import textwrap
49
- from typing import List, Optional
50
 
51
  from openai import OpenAI
52
 
53
- from my_env_v4 import MyEnvV4Action, MyEnvV4Env
54
- IMAGE_NAME = os.getenv("IMAGE_NAME") # If you are using docker image
 
 
 
 
55
  API_KEY = os.getenv("HF_TOKEN") or os.getenv("API_KEY")
 
 
56
 
57
- API_BASE_URL = os.getenv("API_BASE_URL") or "https://router.huggingface.co/v1"
58
- MODEL_NAME = os.getenv("MODEL_NAME") or "Qwen/Qwen2.5-72B-Instruct"
59
- TASK_NAME = os.getenv("MY_ENV_V4_TASK", "echo")
60
- BENCHMARK = os.getenv("MY_ENV_V4_BENCHMARK", "my_env_v4")
61
- MAX_STEPS = 8
62
- TEMPERATURE = 0.7
63
- MAX_TOKENS = 150
64
- SUCCESS_SCORE_THRESHOLD = 0.1 # normalized score in [0, 1]
65
-
66
- # Max possible reward: each token contributes 0.1, across all steps
67
- _MAX_REWARD_PER_STEP = MAX_TOKENS * 0.1
68
- MAX_TOTAL_REWARD = MAX_STEPS * _MAX_REWARD_PER_STEP
69
-
70
- SYSTEM_PROMPT = textwrap.dedent(
71
- """
72
- You are interacting with a simple echo environment.
73
- Each turn you must send a message. The environment will echo it back.
74
- Reward is proportional to message length: reward = len(message) * 0.1
75
- Your goal is to maximize total reward by sending meaningful, substantive messages.
76
- Reply with exactly one message string — no quotes, no prefixes, just the message text.
77
- """
78
- ).strip()
79
 
 
 
 
80
 
81
  def log_start(task: str, env: str, model: str) -> None:
82
  print(f"[START] task={task} env={env} model={model}", flush=True)
@@ -93,25 +58,148 @@ def log_step(step: int, action: str, reward: float, done: bool, error: Optional[
93
 
94
  def log_end(success: bool, steps: int, score: float, rewards: List[float]) -> None:
95
  rewards_str = ",".join(f"{r:.2f}" for r in rewards)
96
- print(f"[END] success={str(success).lower()} steps={steps} score={score:.3f} rewards={rewards_str}", flush=True)
97
-
 
 
98
 
99
- def build_user_prompt(step: int, last_echoed: str, last_reward: float, history: List[str]) -> str:
100
- history_block = "\n".join(history[-4:]) if history else "None"
101
- return textwrap.dedent(
102
- f"""
103
- Step: {step}
104
- Last echoed message: {last_echoed!r}
105
- Last reward: {last_reward:.2f}
106
- Previous steps:
107
- {history_block}
108
- Send your next message.
109
- """
110
- ).strip()
111
 
112
 
113
- def get_model_message(client: OpenAI, step: int, last_echoed: str, last_reward: float, history: List[str]) -> str:
114
- user_prompt = build_user_prompt(step, last_echoed, last_reward, history)
115
  try:
116
  completion = client.chat.completions.create(
117
  model=MODEL_NAME,
@@ -124,66 +212,82 @@ def get_model_message(client: OpenAI, step: int, last_echoed: str, last_reward:
124
  stream=False,
125
  )
126
  text = (completion.choices[0].message.content or "").strip()
127
- return text if text else "hello"
128
  except Exception as exc:
129
- print(f"[DEBUG] Model request failed: {exc}", flush=True)
130
- return "hello"
131
-
132
 
133
- async def main() -> None:
134
- client = OpenAI(base_url=API_BASE_URL, api_key=API_KEY)
135
 
136
- env = await MyEnvV4Env.from_docker_image(IMAGE_NAME)
 
 
137
 
138
- history: List[str] = []
 
139
  rewards: List[float] = []
140
  steps_taken = 0
141
  score = 0.0
142
  success = False
143
 
144
- log_start(task=TASK_NAME, env=BENCHMARK, model=MODEL_NAME)
145
 
146
  try:
147
- result = await env.reset() # OpenENV.reset()
148
- last_echoed = result.observation.echoed_message
149
- last_reward = 0.0
150
 
151
  for step in range(1, MAX_STEPS + 1):
152
  if result.done:
153
  break
154
 
155
- message = get_model_message(client, step, last_echoed, last_reward, history)
156
 
157
- result = await env.step(MyEnvV4Action(message=message))
158
  obs = result.observation
159
 
160
  reward = result.reward or 0.0
161
  done = result.done
162
- error = None
163
 
164
  rewards.append(reward)
165
  steps_taken = step
166
- last_echoed = obs.echoed_message
167
- last_reward = reward
168
 
169
- log_step(step=step, action=message, reward=reward, done=done, error=error)
 
 
 
 
170
 
171
- history.append(f"Step {step}: {message!r} -> reward {reward:+.2f}")
172
 
173
  if done:
174
  break
175
 
176
- score = sum(rewards) / MAX_TOTAL_REWARD if MAX_TOTAL_REWARD > 0 else 0.0
177
- score = min(max(score, 0.0), 1.0) # clamp to [0, 1]
178
- success = score >= SUCCESS_SCORE_THRESHOLD
179
 
 
 
 
180
  finally:
181
  try:
182
  await env.close()
183
  except Exception as e:
184
- print(f"[DEBUG] env.close() error (container cleanup): {e}", flush=True)
185
- log_end(success=success, steps=steps_taken, score=score, rewards=rewards)
186
 
187
 
188
  if __name__ == "__main__":
189
- asyncio.run(main())
 
 
1
  """
2
+ LLM-based Inference Script for Meeting Scheduling RL Environment.
3
  ===================================
4
+ Uses OpenAI-compatible LLM via HF Router to intelligently schedule meetings.
5
+
6
+ MANDATORY environment variables:
7
  API_BASE_URL The API endpoint for the LLM.
8
  MODEL_NAME The model identifier to use for inference.
9
  HF_TOKEN Your Hugging Face / API key.
10
 
11
+ STDOUT FORMAT:
12
+ [START] task=<task_name> env=scheduling_env model=<model_name>
 
 
13
  [STEP] step=<n> action=<action_str> reward=<0.00> done=<true|false> error=<msg|null>
14
  [END] success=<true|false> steps=<n> score=<score> rewards=<r1,r2,...,rn>
15
  """
16
 
17
  import asyncio
18
+ import json
19
  import os
20
  import textwrap
21
+ from typing import Dict, List, Optional
22
 
23
  from openai import OpenAI
24
 
25
+ from scheduling_env.client import SchedulingEnv
26
+ from scheduling_env.models import SchedulingAction
27
+
28
+ # ---------------------------------------------------------------------------
29
+ # Configuration
30
+ # ---------------------------------------------------------------------------
31
  API_KEY = os.getenv("HF_TOKEN") or os.getenv("API_KEY")
32
+ API_BASE_URL = os.getenv("API_BASE_URL", "https://router.huggingface.co/v1")
33
+ MODEL_NAME = os.getenv("MODEL_NAME", "Qwen/Qwen2.5-72B-Instruct")
34
 
35
+ ENV_REPO_ID = "Akshaykumarbm/scheduling_env"
36
+ BENCHMARK = "scheduling_env"
37
+ TASKS = ["task1_easy", "task2_medium", "task3_hard"]
38
+ MAX_STEPS = 20
39
+ TEMPERATURE = 0.3
40
+ MAX_TOKENS = 512
41
 
42
+ # ---------------------------------------------------------------------------
43
+ # Logging helpers
44
+ # ---------------------------------------------------------------------------
45
 
46
  def log_start(task: str, env: str, model: str) -> None:
47
  print(f"[START] task={task} env={env} model={model}", flush=True)
 
58
 
59
  def log_end(success: bool, steps: int, score: float, rewards: List[float]) -> None:
60
  rewards_str = ",".join(f"{r:.2f}" for r in rewards)
61
+ print(
62
+ f"[END] success={str(success).lower()} steps={steps} score={score:.2f} rewards={rewards_str}",
63
+ flush=True,
64
+ )
65
 
66
 
67
+ # ---------------------------------------------------------------------------
68
+ # LLM interaction
69
+ # ---------------------------------------------------------------------------
70
+
71
+ SYSTEM_PROMPT = textwrap.dedent("""\
72
+ You are an AI meeting scheduling assistant. You must schedule a meeting by choosing actions.
73
+
74
+ Available actions (respond with EXACTLY one JSON object):
75
+
76
+ 1. Propose a time slot:
77
+ {"action_type": "propose_slot", "proposed_start": "<ISO8601>", "proposed_duration": <minutes>}
78
+
79
+ 2. Reschedule a conflicting meeting (only if priority > requested priority):
80
+ {"action_type": "reschedule_meeting", "meeting_id_to_move": "<attendee>_<start_iso>", "new_start_time": "<ISO8601>"}
81
+
82
+ 3. Finalize the schedule (only when no conflicts remain):
83
+ {"action_type": "finalize"}
84
+
85
+ 4. Reject (give up):
86
+ {"action_type": "reject"}
87
+
88
+ Rules:
89
+ - Propose slots within collective working hours.
90
+ - You can only reschedule meetings with LOWER priority (higher number) than the requested meeting.
91
+ - meeting_id format is: <attendee>_<start_iso> (e.g., "user1_2025-04-07T09:00:00+00:00").
92
+ - After rescheduling all conflicts, call finalize.
93
+ - Minimize preference violations and rescheduling.
94
+ - Respond with ONLY the JSON object, no other text.
95
+ """)
96
+
97
+
98
+ def format_observation(obs, step: int) -> str:
99
+ """Convert a SchedulingObservation into a user prompt for the LLM."""
100
+ parts = [
101
+ f"Step {step}/{obs.max_steps}",
102
+ f"Meeting to schedule: {obs.requested_duration} min, priority {obs.requested_priority}",
103
+ f"Attendees: {', '.join(obs.attendee_ids)}",
104
+ f"Collective working hours: {obs.collective_work_hours.get('min_start_hour', 9)}:00 - {obs.collective_work_hours.get('max_end_hour', 17)}:00",
105
+ ]
106
+
107
+ if obs.preference_constraints:
108
+ parts.append(f"Preferences: max {obs.preference_constraints.get('max_meetings_per_day', 'N/A')} meetings/day, "
109
+ f"buffer required: {obs.preference_constraints.get('requires_buffer', False)}, "
110
+ f"buffer mins: {obs.preference_constraints.get('buffer_minutes', 0)}")
111
+
112
+ # Busy slots grouped by attendee
113
+ busy_by_attendee: Dict[str, List] = {}
114
+ for slot in obs.busy_slots:
115
+ att = slot["attendee"]
116
+ busy_by_attendee.setdefault(att, []).append(slot)
117
+
118
+ parts.append("\nCalendars:")
119
+ for att in obs.attendee_ids:
120
+ slots = busy_by_attendee.get(att, [])
121
+ if slots:
122
+ slot_strs = [
123
+ f" - {s['start']} to {s['end']} (priority {s['priority']}, {s['summary']})"
124
+ for s in sorted(slots, key=lambda x: x["start"])
125
+ ]
126
+ parts.append(f" {att}:")
127
+ parts.extend(slot_strs)
128
+ else:
129
+ parts.append(f" {att}: (no meetings)")
130
+
131
+ if obs.current_proposal:
132
+ parts.append(f"\nCurrent proposal: {obs.current_proposal['start']} to {obs.current_proposal['end']}")
133
+
134
+ if obs.conflicts:
135
+ parts.append(f"\nConflicts ({len(obs.conflicts)}):")
136
+ for c in obs.conflicts:
137
+ parts.append(
138
+ f" - {c['attendee']}: {c['start']} to {c['end']} "
139
+ f"(priority {c['priority']}, {c['summary']}, id: {c['meeting_id']})"
140
+ )
141
+
142
+ if obs.error_message:
143
+ parts.append(f"\nLast error: {obs.error_message}")
144
+
145
+ parts.append(f"\nRescheduled so far: {obs.num_rescheduled}")
146
+ parts.append(f"Preference penalty: {obs.preference_penalty}")
147
+
148
+ if not obs.current_proposal and not obs.conflicts:
149
+ parts.append("\nAction needed: propose a time slot for the meeting.")
150
+ elif obs.conflicts:
151
+ parts.append("\nAction needed: reschedule a conflict (lower-priority only) or propose a different slot.")
152
+ else:
153
+ parts.append("\nAction needed: no conflicts remain - you should finalize.")
154
+
155
+ return "\n".join(parts)
156
+
157
+
158
+ def parse_llm_response(text: str, obs) -> SchedulingAction:
159
+ """Parse LLM JSON response into a SchedulingAction, with fallback."""
160
+ # Extract JSON from response (handle markdown code blocks)
161
+ cleaned = text.strip()
162
+ if "```" in cleaned:
163
+ # Extract content between code fences
164
+ lines = cleaned.split("\n")
165
+ json_lines = []
166
+ in_block = False
167
+ for line in lines:
168
+ if line.strip().startswith("```"):
169
+ in_block = not in_block
170
+ continue
171
+ if in_block:
172
+ json_lines.append(line)
173
+ cleaned = "\n".join(json_lines).strip()
174
+
175
+ # Try to find JSON object in the response
176
+ start = cleaned.find("{")
177
+ end = cleaned.rfind("}") + 1
178
+ if start >= 0 and end > start:
179
+ cleaned = cleaned[start:end]
180
 
181
+ try:
182
+ data = json.loads(cleaned)
183
+ return SchedulingAction(**data)
184
+ except Exception as e:
185
+ print(f"[DEBUG] Failed to parse LLM response: {e}. Response: {text[:200]}", flush=True)
186
+ # Fallback: if we have no proposal yet, propose at first available hour
187
+ if obs.current_proposal is None:
188
+ min_h = obs.collective_work_hours.get("min_start_hour", 9)
189
+ return SchedulingAction(
190
+ action_type="propose_slot",
191
+ proposed_start=f"2025-04-07T{min_h:02d}:00:00+00:00",
192
+ proposed_duration=obs.requested_duration,
193
+ )
194
+ elif not obs.conflicts:
195
+ return SchedulingAction(action_type="finalize")
196
+ else:
197
+ return SchedulingAction(action_type="reject")
198
+
199
+
200
+ def get_llm_action(client: OpenAI, obs, step: int) -> SchedulingAction:
201
+ """Query the LLM and return a SchedulingAction."""
202
+ user_prompt = format_observation(obs, step)
203
  try:
204
  completion = client.chat.completions.create(
205
  model=MODEL_NAME,
 
212
  stream=False,
213
  )
214
  text = (completion.choices[0].message.content or "").strip()
215
+ return parse_llm_response(text, obs)
216
  except Exception as exc:
217
+ print(f"[DEBUG] LLM request failed: {exc}", flush=True)
218
+ return parse_llm_response("", obs)
 
219
 
 
 
220
 
221
+ # ---------------------------------------------------------------------------
222
+ # Main loop
223
+ # ---------------------------------------------------------------------------
224
 
225
+ async def run_task(env, client: OpenAI, task_id: str) -> None:
226
+ """Run a single scheduling task."""
227
  rewards: List[float] = []
228
  steps_taken = 0
229
  score = 0.0
230
  success = False
231
 
232
+ log_start(task=task_id, env=BENCHMARK, model=MODEL_NAME)
233
 
234
  try:
235
+ result = await env.reset(task_id=task_id)
236
+ obs = result.observation
 
237
 
238
  for step in range(1, MAX_STEPS + 1):
239
  if result.done:
240
  break
241
 
242
+ action = get_llm_action(client, obs, step)
243
 
244
+ result = await env.step(action)
245
  obs = result.observation
246
 
247
  reward = result.reward or 0.0
248
  done = result.done
249
+ error = obs.error_message
250
 
251
  rewards.append(reward)
252
  steps_taken = step
 
 
253
 
254
+ action_str = action.action_type
255
+ if action.action_type == "propose_slot":
256
+ action_str = f"propose_slot({action.proposed_start},{action.proposed_duration}m)"
257
+ elif action.action_type == "reschedule_meeting":
258
+ action_str = f"reschedule({action.meeting_id_to_move}->{action.new_start_time})"
259
 
260
+ log_step(step=step, action=action_str, reward=reward, done=done, error=error)
261
 
262
  if done:
263
  break
264
 
265
+ # Score is the final reward (0.0-1.0 from calculate_final_reward)
266
+ score = rewards[-1] if rewards else 0.0
267
+ score = min(max(score, 0.0), 1.0)
268
+ success = obs.success if hasattr(obs, "success") else (score > 0.0)
269
+
270
+ except Exception as exc:
271
+ print(f"[DEBUG] Task {task_id} error: {exc}", flush=True)
272
+
273
+ finally:
274
+ log_end(success=success, steps=steps_taken, score=score, rewards=rewards)
275
+
276
+
277
+ async def main() -> None:
278
+ llm_client = OpenAI(base_url=API_BASE_URL, api_key=API_KEY)
279
+
280
+ env = await SchedulingEnv.from_env(ENV_REPO_ID)
281
 
282
+ try:
283
+ for task_id in TASKS:
284
+ await run_task(env, llm_client, task_id)
285
  finally:
286
  try:
287
  await env.close()
288
  except Exception as e:
289
+ print(f"[DEBUG] env.close() error: {e}", flush=True)
 
290
 
291
 
292
  if __name__ == "__main__":
293
+ asyncio.run(main())
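
`run_task` is a coroutine, but `get_llm_action` calls the synchronous OpenAI client, so each request holds the event loop until it returns. One possible adjustment, sketched here on the assumption that the blocking call is safe to run in a worker thread, is to hand it to `asyncio.to_thread`. `blocking_llm_call` and `get_action_async` are hypothetical stand-ins for illustration and are not part of this commit.

```python
import asyncio
import time
from typing import List


def blocking_llm_call(prompt: str) -> str:
    """Hypothetical stand-in for a synchronous chat-completion request."""
    time.sleep(0.05)  # simulate network latency
    return f"reply to: {prompt}"


async def get_action_async(prompt: str) -> str:
    # Run the blocking call in a worker thread so the event loop stays free.
    return await asyncio.to_thread(blocking_llm_call, prompt)


async def demo() -> List[str]:
    # The two calls overlap instead of running back to back;
    # gather returns results in the order the awaitables were passed.
    return list(await asyncio.gather(get_action_async("a"), get_action_async("b")))


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

With a single environment stepped sequentially the gain is small; the pattern matters more if several tasks or environments are ever driven from the same loop.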
uv.lock CHANGED
@@ -1603,7 +1603,7 @@ name = "openenv-scheduling-env"
1603
  version = "0.1.0"
1604
  source = { editable = "." }
1605
  dependencies = [
1606
- { name = "huggingface-hub" },
1607
  { name = "openenv-core", extra = ["core"] },
1608
  ]
1609
 
@@ -1615,7 +1615,7 @@ dev = [
1615
 
1616
  [package.metadata]
1617
  requires-dist = [
1618
- { name = "huggingface-hub", specifier = ">=1.9.1" },
1619
  { name = "openenv-core", extras = ["core"], specifier = ">=0.2.2" },
1620
  { name = "pytest", marker = "extra == 'dev'", specifier = ">=8.0.0" },
1621
  { name = "pytest-cov", marker = "extra == 'dev'", specifier = ">=4.0.0" },
 
1603
  version = "0.1.0"
1604
  source = { editable = "." }
1605
  dependencies = [
1606
+ { name = "openai" },
1607
  { name = "openenv-core", extra = ["core"] },
1608
  ]
1609
 
 
1615
 
1616
  [package.metadata]
1617
  requires-dist = [
1618
+ { name = "openai", specifier = ">=1.0.0" },
1619
  { name = "openenv-core", extras = ["core"], specifier = ">=0.2.2" },
1620
  { name = "pytest", marker = "extra == 'dev'", specifier = ">=8.0.0" },
1621
  { name = "pytest-cov", marker = "extra == 'dev'", specifier = ">=4.0.0" },