	---
	title: Scheduling Env Environment Server
	emoji: 📅
	colorFrom: blue
	colorTo: pink
	sdk: docker
	pinned: false
	app_port: 8000
	base_path: /web
	tags:
	- openenv
	---

	# Meeting Scheduling RL Environment

	An OpenEnv reinforcement-learning environment where AI agents learn to schedule meetings optimally across multiple attendees. The agent must propose time slots, resolve calendar conflicts by rescheduling lower-priority meetings, and satisfy each participant's scheduling preferences — all within a limited number of steps.

	## Overview

	The environment simulates a realistic corporate scheduling assistant. Given a meeting request, the agent iteratively:

	1. Proposes a time slot for all required attendees.
	2. Reschedules any lower-priority conflicting meetings to free up the slot.
	3. Finalizes the booking once the slot is conflict-free.

	Each episode is scored on scheduling quality (0.0–1.0), penalizing preference violations, unnecessary rescheduling, and excessive steps.

	## Quick Start

	### Running the Heuristic Baseline (no LLM needed)

	```bash
	python inference.py
	```

	This runs a greedy baseline policy across all three tasks and prints step-by-step output in the required `[START]`/`[STEP]`/`[END]` format.

	### Using the Environment Directly (Python)

	```python
	from server.scheduling_env_environment import SchedulingEnvironment
	from models import SchedulingAction

	env = SchedulingEnvironment()

	# Reset to a specific task
	obs = env.reset(task_id="task1_easy")
	print(f"Attendees: {obs.attendee_ids}")
	print(f"Duration: {obs.requested_duration} min")
	print(f"Priority: {obs.requested_priority}")

	# Propose a time slot
	result = env.step(SchedulingAction(
	    action_type="propose_slot",
	    proposed_start="2025-04-07T10:00:00+00:00",
	    proposed_duration=30,
	))
	print(f"Conflicts: {result.conflicts}")
	print(f"Reward: {result.reward}")

	# Finalize when conflict-free
	result = env.step(SchedulingAction(action_type="finalize"))
	print(f"Success: {result.success} Final score: {result.reward:.2f}")
	```

	### Using the HTTP Client

	```python
	from client import SchedulingEnv
	from models import SchedulingAction

	with SchedulingEnv(base_url="http://localhost:8000") as env:
	    result = env.reset(task_id="task2_medium")
	    obs = result.observation

	    # Propose a slot
	    result = env.step(SchedulingAction(
	        action_type="propose_slot",
	        proposed_start="2025-04-07T11:00:00+00:00",
	        proposed_duration=60,
	    ))

	    # Reschedule a conflicting lower-priority meeting
	    if result.observation.conflicts:
	        conflict = result.observation.conflicts[0]
	        result = env.step(SchedulingAction(
	            action_type="reschedule_meeting",
	            meeting_id_to_move=conflict["meeting_id"],
	            new_start_time="2025-04-07T07:00:00+00:00",
	        ))

	    # Finalize
	    result = env.step(SchedulingAction(action_type="finalize"))
	    print(f"Score: {result.reward:.2f}")
	```

	## Environment Details

	### Actions (`SchedulingAction`)

	| `action_type` | Required fields | Description |
	|---|---|---|
	| `propose_slot` | `proposed_start`, `proposed_duration` | Propose a meeting start time (ISO 8601) and duration (minutes) |
	| `reschedule_meeting` | `meeting_id_to_move`, `new_start_time` | Move a lower-priority conflict to a new time |
	| `finalize` | _(none)_ | Confirm the proposed slot; ends the episode |
	| `reject` | _(none)_ | Give up on scheduling; ends the episode with 0 reward |

	Meeting ID format: `{attendee}_{start_iso}` — e.g. `user1_2025-04-07T09:00:00+00:00`
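As a minimal sketch of that format, the helpers below build and parse a meeting ID with the standard library. The function names are hypothetical and not part of the package; splitting from the right is an assumption made so that attendee IDs containing underscores would still parse.

```python
from datetime import datetime, timezone


def build_meeting_id(attendee: str, start: datetime) -> str:
    """Join an attendee ID and an ISO 8601 start time with an underscore."""
    return f"{attendee}_{start.isoformat()}"


def parse_meeting_id(meeting_id: str) -> tuple[str, datetime]:
    """Split a meeting ID back into (attendee, start).

    rsplit is used because an ISO 8601 timestamp never contains an
    underscore, while an attendee ID might (an assumption, not confirmed).
    """
    attendee, start_iso = meeting_id.rsplit("_", 1)
    return attendee, datetime.fromisoformat(start_iso)


meeting_id = build_meeting_id("user1", datetime(2025, 4, 7, 9, 0, tzinfo=timezone.utc))
print(meeting_id)  # user1_2025-04-07T09:00:00+00:00
```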

	### Observations (`SchedulingObservation`)

	| Field | Type | Description |
	|---|---|---|
	| `requested_duration` | `int` | Meeting duration in minutes |
	| `requested_priority` | `int` | Priority of the new meeting (1 = highest, 5 = lowest) |
	| `attendee_ids` | `List[str]` | Required attendees |
	| `busy_slots` | `List[dict]` | All existing calendar entries for attendees |
	| `collective_work_hours` | `dict` | Shared working-hours window `{min_start_hour, max_end_hour}` |
	| `preference_constraints` | `dict` | Aggregated constraints (max meetings/day, buffer, etc.) |
	| `current_proposal` | `dict \| None` | Currently proposed slot `{start, end}` |
	| `conflicts` | `List[dict]` | Conflicts for the current proposal |
	| `preference_penalty` | `float` | Accumulated preference-violation penalty |
	| `num_rescheduled` | `int` | Meetings rescheduled so far in this episode |
	| `steps_taken` | `int` | Steps used so far |
	| `max_steps` | `int` | Episode step limit (20) |
	| `success` | `bool` | `True` when the meeting is successfully booked |
	| `error_message` | `str \| None` | Reason if the last action was invalid |
	| `done` | `bool` | `True` when the episode has ended |
	| `reward` | `float` | Step or final reward |

	### Reward Design

	Step-level rewards (returned after each `propose_slot` or `reschedule_meeting`):

	| Outcome | Reward |
	|---|---|
	| Conflict-free proposal (low penalty) | +0.5 |
	| Proposal has reschedulable conflicts | +0.2 |
	| Proposal has non-reschedulable conflicts | −0.3 |
	| Invalid action | −0.1 |
	| Outside working hours | −0.2 |
	Final reward (returned on `finalize`), computed by subtracting three deductions from 1.0:

	```
	preference_deduction = min(0.75, (penalty ** 1.2) / 200.0)
	reschedule_deduction = min(0.30, 0.05 * (1.8 ** num_rescheduled))  # 0 if nothing was rescheduled
	time_deduction = steps_taken * 0.015

	final_reward = clamp(1.0 - preference_deduction - reschedule_deduction - time_deduction, 0.0, 1.0)
	```

	On timeout (step 20 reached without `finalize`), the agent receives partial credit: 70% of the theoretical final reward if the current proposal is conflict-free, or a progress-based fraction otherwise.
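The final-reward formula translates directly into Python. This is a sketch based only on the formula as documented here, not the code in `graders.py`. The function names are hypothetical, and "theoretical reward" is assumed to mean the value `finalize` would have returned. The progress-based timeout case is left out because its formula is not given.

```python
def final_reward(penalty: float, num_rescheduled: int, steps_taken: int) -> float:
    """Apply the three documented deductions to 1.0 and clamp to [0.0, 1.0]."""
    preference_deduction = min(0.75, (penalty ** 1.2) / 200.0)
    # The reschedule deduction applies only if at least one meeting was moved.
    reschedule_deduction = (
        min(0.30, 0.05 * (1.8 ** num_rescheduled)) if num_rescheduled > 0 else 0.0
    )
    time_deduction = steps_taken * 0.015
    raw = 1.0 - preference_deduction - reschedule_deduction - time_deduction
    return max(0.0, min(1.0, raw))


def timeout_reward_conflict_free(penalty: float, num_rescheduled: int, steps_taken: int) -> float:
    """70% of the reward finalize would have given (conflict-free timeout only)."""
    return 0.7 * final_reward(penalty, num_rescheduled, steps_taken)


# No penalty, nothing rescheduled, finalized in 2 steps.
print(round(final_reward(0.0, 0, 2), 3))  # 0.97
# No penalty, one meeting rescheduled, finalized in 3 steps.
print(round(final_reward(0.0, 1, 3), 3))  # 0.865
```

Without any preference penalty, one reschedule in three steps still scores 0.865, so the lower expected ranges for the medium and hard tasks presumably come from preference deductions.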

	## Tasks

	Three tasks of increasing difficulty are provided as JSON scenarios in `server/scenarios/`:

	| Task ID | Difficulty | Attendees | Duration | Priority | Rescheduling needed | Expected score |
	|---|---|---|---|---|---|---|
	| `task1_easy` | Easy | 2 | 30 min | 3 | No | 0.8 – 1.0 |
	| `task2_medium` | Medium | 4 | 60 min | 2 | Yes (1 meeting) | 0.5 – 0.7 |
	| `task3_hard` | Hard | 6 | 45 min | 2 | Yes (3+ meetings) | 0.25 – 0.45 |

	### task1_easy — Team Sync (2 attendees)

	- Two attendees each have 2 existing meetings; a clear free slot exists at 10:00.
	- Agent should find the free slot and finalize in 2 steps.
	- No rescheduling required.

	### task2_medium — Cross-Team Planning (4 attendees)

	- Four attendees with densely packed schedules; the optimal slot at 11:00 has one low-priority conflict (`user3` Coffee chat, priority 4).
	- Agent needs to propose the slot, reschedule the conflict, then finalize.
	- User preferences include back-to-back avoidance and different preferred-hour windows.

	### task3_hard — Executive Planning Session (6 attendees)

	- Six attendees with very dense calendars; the best window at 15:00 requires rescheduling three low-priority meetings (priority 4).
	- Multiple valid solutions exist; the agent must navigate cascading constraints.
	- All attendees have strict buffer requirements and narrow preferred-hour windows.

	## Participant Preferences

	Each attendee can have the following preferences (stored in scenario JSON and observed via `preference_constraints`):

	| Preference | Description | Penalty for violation |
	|---|---|---|
	| `preferred_hours` | `{start: H, end: H}`, the participant's preferred working hours | +50 per participant |
	| `max_meetings_per_day` | Maximum meetings the participant wants in a day | +30 per participant |
	| `avoid_back_to_back` | Whether a buffer gap is required between meetings | +20 per participant |
	| `buffer_minutes` | Gap required before/after a meeting (if `avoid_back_to_back` is set) | (part of above) |
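To show how the per-participant penalties add up, the sketch below sums the weights from the table for a set of violations. How the environment detects each violation is not shown here; the input mapping (attendee ID to violated preference names) and the function name are assumptions for this example.

```python
# Penalty weights taken from the table above.
PENALTY_WEIGHTS = {
    "preferred_hours": 50,
    "max_meetings_per_day": 30,
    "avoid_back_to_back": 20,
}


def total_preference_penalty(violations: dict[str, set[str]]) -> int:
    """Sum the per-participant penalties for every violated preference."""
    return sum(
        PENALTY_WEIGHTS[name]
        for violated in violations.values()
        for name in violated
    )


# Hypothetical outcome of a proposal: user1 breaks two preferences, user2 one.
violations = {
    "user1": {"preferred_hours", "avoid_back_to_back"},
    "user2": {"max_meetings_per_day"},
    "user3": set(),
}
print(total_preference_penalty(violations))  # 100
```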

	The collective working hours (the intersection of all attendees' preferred hours) define the hard constraint window within which proposals must fall.
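Taking the intersection literally, the window starts at the latest preferred start and ends at the earliest preferred end. The sketch below assumes whole-hour preferences keyed by attendee ID and returns `None` when the windows do not overlap. Both choices are assumptions, not confirmed behavior of the environment.

```python
from typing import Optional


def collective_work_hours(preferred_hours: dict[str, dict[str, int]]) -> Optional[dict[str, int]]:
    """Intersect per-attendee {start, end} hour windows.

    Returns None if there is no hour range that every attendee shares.
    """
    latest_start = max(window["start"] for window in preferred_hours.values())
    earliest_end = min(window["end"] for window in preferred_hours.values())
    if latest_start >= earliest_end:
        return None
    return {"min_start_hour": latest_start, "max_end_hour": earliest_end}


# Hypothetical preferences for three attendees.
prefs = {
    "user1": {"start": 9, "end": 17},
    "user2": {"start": 10, "end": 18},
    "user3": {"start": 8, "end": 16},
}
print(collective_work_hours(prefs))  # {'min_start_hour': 10, 'max_end_hour': 16}
```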

	## API Endpoints

	The server exposes the following HTTP endpoints (also available via the Web UI at `/web`):

	| Method | Path | Description |
	|---|---|---|
	| POST | `/reset` | Start a new episode. Body: `{"task_id": "task1_easy"}` |
	| POST | `/step` | Take an action. Body: `{"action_type": "...", ...action fields}` |
	| GET | `/state` | Return the full internal `SchedulingState` |
	| GET | `/health` | Health check; returns `{"status": "healthy"}` |
	| GET | `/docs` | Interactive OpenAPI / Swagger UI |

	### Example: REST interaction

	```bash
	# Start episode
	curl -X POST http://localhost:8000/reset \
	  -H "Content-Type: application/json" \
	  -d '{"task_id": "task1_easy"}'

	# Propose a slot
	curl -X POST http://localhost:8000/step \
	  -H "Content-Type: application/json" \
	  -d '{"action_type": "propose_slot", "proposed_start": "2025-04-07T10:00:00+00:00", "proposed_duration": 30}'

	# Finalize
	curl -X POST http://localhost:8000/step \
	  -H "Content-Type: application/json" \
	  -d '{"action_type": "finalize"}'
	```
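The same calls can be made from Python with only the standard library. In the sketch below, `step_payload`, `post_json`, and `run_easy_episode` are hypothetical helpers, not part of `client.py`. The required-field check mirrors the Actions table, and the response is returned as a plain dict because the response schema is not documented here.

```python
import json
import urllib.request

# Required fields per action type, copied from the Actions table.
REQUIRED_FIELDS = {
    "propose_slot": ("proposed_start", "proposed_duration"),
    "reschedule_meeting": ("meeting_id_to_move", "new_start_time"),
    "finalize": (),
    "reject": (),
}


def step_payload(action_type: str, **fields) -> dict:
    """Build a /step request body, rejecting unknown or incomplete actions."""
    if action_type not in REQUIRED_FIELDS:
        raise ValueError(f"unknown action_type: {action_type!r}")
    missing = [name for name in REQUIRED_FIELDS[action_type] if name not in fields]
    if missing:
        raise ValueError(f"{action_type} is missing fields: {missing}")
    return {"action_type": action_type, **fields}


def post_json(base_url: str, path: str, payload: dict) -> dict:
    """POST a JSON body and decode the JSON response."""
    request = urllib.request.Request(
        base_url + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def run_easy_episode(base_url: str = "http://localhost:8000") -> dict:
    """Replay the curl example above; requires a running server."""
    post_json(base_url, "/reset", {"task_id": "task1_easy"})
    post_json(base_url, "/step", step_payload(
        "propose_slot",
        proposed_start="2025-04-07T10:00:00+00:00",
        proposed_duration=30,
    ))
    return post_json(base_url, "/step", step_payload("finalize"))
```

Call `run_easy_episode()` once the server is listening on port 8000. Validating the payload on the client side catches a missing field before it costs an invalid-action step.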

	## Development & Testing

	### Run the baseline inference script

	```bash
	python inference.py
	```

	### Start the server locally

	```bash
	uvicorn server.app:app --reload
	```

	### Validate the environment (required before submission)

	```bash
	openenv validate
	```

	### Generate / update the lock file

	```bash
	uv lock
	```

	### Build the Docker image

	```bash
	docker build -t scheduling_env:latest .
	```

	## Deploying to Hugging Face Spaces

	```bash
	# From the project root (where openenv.yaml is located)
	openenv push

	# Push to a specific repository
	openenv push --repo-id my-org/my-scheduling-env

	# Push as a private space
	openenv push --private
	```

	The `openenv push` command validates the environment, builds a Hugging Face-compatible Docker image, and uploads it. After deployment your space is available at:

	```
	https://huggingface.co/spaces/<repo-id>
	```

	The deployed space includes:
	- Web Interface at `/web` — interactive UI for exploring the environment
	- API Documentation at `/docs` — full OpenAPI / Swagger interface
	- Health Check at `/health` — container health monitoring

	### Options

	| Flag | Description |
	|---|---|
	| `--directory`, `-d` | Directory with `openenv.yaml` (default: current dir) |
	| `--repo-id`, `-r` | Repository ID `username/repo-name` |
	| `--base-image`, `-b` | Override Dockerfile `FROM` image |
	| `--private` | Deploy as a private space (default: public) |

	## Environment Variables (for LLM-based inference)

	Create a `.env` file (never commit it):

	```
	API_BASE_URL=https://router.huggingface.co/v1 # HF Router endpoint
	MODEL_NAME=Qwen/Qwen2.5-72B-Instruct # Model identifier
	HF_TOKEN=hf_... # Hugging Face API key
	```
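How the inference code reads these variables is not shown in this README. As a sketch, the hypothetical loader below pulls them from a mapping (by default `os.environ`) and fails early if one is missing. It assumes the `.env` file has already been loaded into the process environment, for example by exporting it in the shell.

```python
import os
from dataclasses import dataclass
from typing import Mapping

REQUIRED_VARS = ("API_BASE_URL", "MODEL_NAME", "HF_TOKEN")


@dataclass(frozen=True)
class LLMConfig:
    api_base_url: str
    model_name: str
    hf_token: str


def load_llm_config(env: Mapping[str, str] = os.environ) -> LLMConfig:
    """Read the three variables, raising if any is unset or empty."""
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError(f"missing environment variables: {', '.join(missing)}")
    return LLMConfig(
        api_base_url=env["API_BASE_URL"],
        model_name=env["MODEL_NAME"],
        hf_token=env["HF_TOKEN"],
    )


# Example with an explicit mapping; the token is a placeholder, not a real key.
sample_env = {
    "API_BASE_URL": "https://router.huggingface.co/v1",
    "MODEL_NAME": "Qwen/Qwen2.5-72B-Instruct",
    "HF_TOKEN": "hf_placeholder",
}
config = load_llm_config(sample_env)
print(config.model_name)  # Qwen/Qwen2.5-72B-Instruct
```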

	## Project Structure

	```
	rl-scheduling-env/
	├── Dockerfile            # Container image (root, required by openenv)
	├── README.md             # This file
	├── openenv.yaml          # OpenEnv manifest
	├── pyproject.toml        # Project metadata and dependencies
	├── uv.lock               # Locked dependencies (generated by `uv lock`)
	├── __init__.py           # Package exports
	├── models.py             # Pydantic models: SchedulingAction,
	│                         # SchedulingObservation, SchedulingState
	├── client.py             # SchedulingEnv HTTP/WebSocket client
	├── inference.py          # Heuristic baseline (no LLM required)
	└── server/
	    ├── __init__.py                    # Server package exports
	    ├── app.py                         # FastAPI app + SchedulingHTTPEnvServer
	    ├── scheduling_env_environment.py  # Core RL environment (reset / step / state)
	    ├── scheduling_logic.py            # Pure utility functions (conflict detection,
	    │                                  # preference scoring, reward calculation)
	    ├── graders.py                     # SchedulingGrader (0.0–1.0 episode scorer)
	    ├── requirements.txt               # Server-side Python dependencies
	    └── scenarios/
	        ├── task1_easy.json            # Easy: 2 attendees, free slot exists
	        ├── task2_medium.json          # Medium: 4 attendees, 1 rescheduling needed
	        └── task3_hard.json            # Hard: 6 attendees, 3+ reschedulings needed
	```