Spaces: Akshaykumarbm / scheduling_env

Akshaykumarbm committed on Apr 8

Commit 7bdbe90 (verified) · 1 Parent(s): bf30e08

Upload folder using huggingface_hub

Files changed (33)

CLAUDE.md +91 -0
Dockerfile +81 -0
README.md +248 -3
__init__.py +17 -0
client.py +51 -0
docs/2026-04-07-153534-local-command-caveatcaveat-the-messages-below.txt +0 -0
docs/ENV_LEARNINGS.md +368 -0
docs/HACKATHON_META.md +324 -0
docs/hackathon-guide-rl-environments.md +228 -0
docs/superpowers/specs/2025-04-06-scheduling-env-design.md +2068 -0
inference.py +198 -0
models.py +156 -0
openenv.yaml +7 -0
openenv_scheduling_env.egg-info/PKG-INFO +10 -0
openenv_scheduling_env.egg-info/SOURCES.txt +19 -0
openenv_scheduling_env.egg-info/dependency_links.txt +1 -0
openenv_scheduling_env.egg-info/entry_points.txt +2 -0
openenv_scheduling_env.egg-info/requires.txt +6 -0
openenv_scheduling_env.egg-info/top_level.txt +1 -0
pyproject.toml +46 -0
sample_infrenae.py +189 -0
server/Dockerfile +80 -0
server/__init__.py +11 -0
server/app.py +159 -0
server/graders.py +87 -0
server/requirements.txt +6 -0
server/scenario_generator.py +403 -0
server/scenarios/task1_easy.json +41 -0
server/scenarios/task2_medium.json +70 -0
server/scenarios/task3_hard.json +108 -0
server/scheduling_env_environment.py +476 -0
server/scheduling_logic.py +324 -0
uv.lock +0 -0

CLAUDE.md ADDED

	@@ -0,0 +1,91 @@

+# CLAUDE.md
+This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
+## Repository Purpose
+OpenEnv RL environment for the **Meta OpenEnv Hackathon**. Implements an intelligent meeting scheduling environment where AI agents learn to schedule meetings across multiple attendees by proposing time slots, rescheduling lower-priority conflicts, and balancing participant preferences.
+## Development Commands
+```bash
+# Run baseline inference (heuristic, no LLM needed)
+python inference.py
+# Start server locally
+uvicorn server.app:app --reload
+# Validate environment for submission
+openenv validate
+# Generate/update lock file (required by validator)
+uv lock
+# Deploy to Hugging Face Spaces
+openenv push
+# Build Docker image (Dockerfile must be in root)
+docker build -t scheduling_env:latest .
+```
+## Architecture
+### OpenEnv Interface (client-server pattern)
+The environment follows OpenEnv's standard API:
+- **`POST /reset`** — starts a new episode, accepts `{"task_id": "task1_easy"}`. Returns observation.
+- **`POST /step`** — takes an action, returns observation with reward/done.
+- **`GET /state`** — returns internal environment state.
+- **`GET /health`** — health check.
+### Core Flow
+`server/app.py` creates a `SchedulingHTTPEnvServer` (subclasses `HTTPEnvServer`) that wraps a persistent `SchedulingEnvironment` instance. The server registers custom `/reset`, `/step`, `/state` routes.
+`server/scheduling_env_environment.py` — Main environment class implementing `Environment`. Loads JSON scenarios from `server/scenarios/`, processes 4 action types: `propose_slot`, `reschedule_meeting`, `finalize`, `reject`. Episode ends on `finalize`, `reject`, or timeout (20 steps).
+`server/scheduling_logic.py` — Pure utility functions: conflict detection, preference scoring, reward calculation, free-slot search. All datetime handling uses timezone-aware ISO 8601 strings. Calendar format: `Dict[str, List[List]]` where each entry is `[start_iso, end_iso, priority_int, summary_str]`.
+`models.py` — Pydantic models (`SchedulingAction`, `SchedulingObservation`, `SchedulingState`) imported by both server and client.
+`client.py` — `SchedulingEnv` extends `EnvClient` for WebSocket-based interaction.
+`inference.py` — Heuristic baseline (no LLM). Greedy free-slot search + lowest-priority rescheduling. Must emit `[START]`/`[STEP]`/`[END]` stdout format.
+### Reward Design
+Reward is multi-component, deducted from 1.0 (see `calculate_final_reward` in `scheduling_logic.py`):
+- Preference penalty: violations of preferred hours (+50), max meetings/day (+30), back-to-back (+20)
+- Rescheduling deduction: exponential penalty per meeting moved
+- Time deduction: 0.015 per step taken
+Step-level rewards: +0.5 (conflict-free proposal), +0.2 (reschedulable conflicts), -0.3 (non-reschedulable conflicts), -0.1/-0.2 (invalid actions).
+### Tasks (3 difficulty levels)
+JSON scenarios in `server/scenarios/`:
+- **task1_easy** — 2 attendees, free slot exists, no rescheduling needed. Expected score: 0.8–1.0
+- **task2_medium** — 3 attendees, requires 1 rescheduling. Expected score: 0.5–0.8
+- **task3_hard** — 4 attendees, multiple overlapping conflicts, cascading rescheduling. Expected score: 0.2–0.6
+### Key Constraint: Meeting IDs
+Format is `{attendee}_{start_iso}` (e.g., `user1_2025-04-07T09:00:00+00:00`). Used by `_find_meeting()` to look up calendar entries for rescheduling.
+## Hackathon Submission Requirements
+- `openenv validate` must pass
+- Dockerfile in root directory (not `/server`)
+- `inference.py` in root, uses `[START]`/`[STEP]`/`[END]` stdout format
+- 3+ tasks with graders scoring 0.0–1.0 with diverse scores
+- Runtime < 20 minutes on vcpu=2, memory=8GB
+- Deploy via `openenv push` to HF Spaces
+## Environment Variables (for LLM-based inference)
+Defined in `.env` (never commit):
+```
+API_BASE_URL    # HF Router endpoint (default: https://router.huggingface.co/v1)
+MODEL_NAME      # Model identifier (default: Qwen/Qwen2.5-72B-Instruct)
+HF_TOKEN        # Hugging Face API key
+```
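
The calendar layout and meeting-ID format described in `CLAUDE.md` can be illustrated with a minimal sketch. It assumes half-open intervals (a meeting ending at 10:00 does not clash with one starting at 10:00). The helper names `meeting_id` and `find_conflicts` are hypothetical and are not taken from `server/scheduling_logic.py`, whose actual conflict rules may differ.

```python
from datetime import datetime
from typing import Dict, List

# Calendar shape quoted from CLAUDE.md:
# attendee -> [[start_iso, end_iso, priority_int, summary_str], ...]
Calendar = Dict[str, List[list]]


def meeting_id(attendee: str, start_iso: str) -> str:
    """Build an ID in the documented `{attendee}_{start_iso}` form."""
    return f"{attendee}_{start_iso}"


def find_conflicts(
    calendar: Calendar, attendees: List[str], start_iso: str, end_iso: str
) -> List[str]:
    """Return IDs of existing meetings that overlap [start, end).

    Hypothetical helper: the overlap rule here is an assumption.
    """
    start = datetime.fromisoformat(start_iso)
    end = datetime.fromisoformat(end_iso)
    hits = []
    for attendee in attendees:
        for entry_start, entry_end, _priority, _summary in calendar.get(attendee, []):
            s = datetime.fromisoformat(entry_start)
            e = datetime.fromisoformat(entry_end)
            # Two half-open intervals overlap when each starts before the other ends.
            if s < end and start < e:
                hits.append(meeting_id(attendee, entry_start))
    return hits


calendar = {
    "user1": [["2025-04-07T09:00:00+00:00", "2025-04-07T10:00:00+00:00", 2, "Standup"]],
    "user2": [["2025-04-07T11:00:00+00:00", "2025-04-07T12:00:00+00:00", 1, "Review"]],
}
conflicts = find_conflicts(
    calendar,
    ["user1", "user2"],
    "2025-04-07T09:30:00+00:00",
    "2025-04-07T10:30:00+00:00",
)
print(conflicts)
```

Because the ID embeds the full ISO start string, a lookup such as `_find_meeting()` presumably depends on the agent echoing that string back exactly, including the timezone offset.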

Dockerfile ADDED

	@@ -0,0 +1,81 @@

+# Copyright (c) Meta Platforms, Inc. and affiliates.
+# All rights reserved.
+#
+# This source code is licensed under the BSD-style license found in the
+# LICENSE file in the root directory of this source tree.
+# Multi-stage build using openenv-base
+# This Dockerfile is flexible and works for both:
+# - In-repo environments (with local OpenEnv sources)
+# - Standalone environments (with openenv from PyPI/Git)
+# The build script (openenv build) handles context detection and sets appropriate build args.
+ARG BASE_IMAGE=ghcr.io/meta-pytorch/openenv-base:latest
+FROM ${BASE_IMAGE} AS builder
+WORKDIR /app
+# Ensure git is available (required for installing dependencies from VCS)
+RUN apt-get update && \
+    apt-get install -y --no-install-recommends git && \
+    rm -rf /var/lib/apt/lists/*
+# Build argument to control whether we're building standalone or in-repo
+ARG BUILD_MODE=in-repo
+ARG ENV_NAME=scheduling_env
+# Copy environment code (always at root of build context)
+COPY . /app/env
+# For in-repo builds, openenv is already vendored in the build context
+# For standalone builds, openenv will be installed via pyproject.toml
+WORKDIR /app/env
+# Ensure uv is available (for local builds where base image lacks it)
+RUN if ! command -v uv >/dev/null 2>&1; then \
+        curl -LsSf https://astral.sh/uv/install.sh | sh && \
+        mv /root/.local/bin/uv /usr/local/bin/uv && \
+        mv /root/.local/bin/uvx /usr/local/bin/uvx; \
+    fi
+# Install dependencies using uv sync
+# If uv.lock exists, use it; otherwise resolve on the fly
+RUN --mount=type=cache,target=/root/.cache/uv \
+    if [ -f uv.lock ]; then \
+        uv sync --frozen --no-install-project --no-editable; \
+    else \
+        uv sync --no-install-project --no-editable; \
+    fi
+RUN --mount=type=cache,target=/root/.cache/uv \
+    if [ -f uv.lock ]; then \
+        uv sync --frozen --no-editable; \
+    else \
+        uv sync --no-editable; \
+    fi
+# Final runtime stage
+FROM ${BASE_IMAGE}
+WORKDIR /app
+# Copy the virtual environment from builder
+COPY --from=builder /app/env/.venv /app/.venv
+# Copy the environment code
+COPY --from=builder /app/env /app/env
+# Set PATH to use the virtual environment
+ENV PATH="/app/.venv/bin:$PATH"
+# Set PYTHONPATH so imports work correctly
+ENV PYTHONPATH="/app/env:$PYTHONPATH"
+# Health check
+HEALTHCHECK --interval=30s --timeout=3s --start-period=5s --retries=3 \
+    CMD curl -f http://localhost:8000/health || exit 1
+# Run the FastAPI server
+# The module path is constructed to work with the /app/env structure
+ENV ENABLE_WEB_INTERFACE=true
+CMD ["sh", "-c", "cd /app/env && uvicorn server.app:app --host 0.0.0.0 --port 8000"]
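
The `HEALTHCHECK` above probes `/health` every 30 seconds with a 3-second timeout and 3 retries. A client-side equivalent, useful when waiting for a freshly started container, might look like the sketch below. The function names and the injectable `probe`/`sleep` parameters are illustrative assumptions and are not part of OpenEnv or this repository.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def http_ok(url: str, timeout: float = 3.0) -> bool:
    """Return True if a GET on `url` answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or a non-2xx status raised by urlopen.
        return False


def wait_until_healthy(
    url: str,
    retries: int = 3,
    interval: float = 30.0,
    probe: Callable[[str], bool] = http_ok,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll `url` up to `retries` times, sleeping `interval` seconds between tries."""
    for attempt in range(retries):
        if probe(url):
            return True
        if attempt < retries - 1:
            sleep(interval)
    return False
```

A call such as `wait_until_healthy("http://localhost:8000/health", interval=1.0)` would block until the server answers or the retries run out. Passing `probe` and `sleep` as arguments keeps the loop testable without a running container.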

README.md CHANGED

@@ -1,10 +1,255 @@
 ---
-title: Scheduling Env
-emoji: 📉
 colorFrom: blue
 colorTo: pink
 sdk: docker
 pinned: false
 ---
-Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

 ---
+title: Scheduling Env Environment Server
+emoji: 🏏
 colorFrom: blue
 colorTo: pink
 sdk: docker
 pinned: false
+app_port: 8000
+base_path: /web
+tags:
+  - openenv
 ---
+# Scheduling Env Environment
+A simple test environment that echoes back messages. Perfect for testing the env APIs as well as demonstrating environment usage patterns.
+## Quick Start
+The simplest way to use the Scheduling Env environment is through the `SchedulingEnv` class:
+```python
+from scheduling_env import SchedulingAction, SchedulingEnv
+scheduling_envenv = None
+try:
+    # Create environment from Docker image
+    scheduling_envenv = SchedulingEnv.from_docker_image("scheduling_env-env:latest")
+    # Reset
+    result = scheduling_envenv.reset()
+    print(f"Reset: {result.observation.echoed_message}")
+    # Send multiple messages
+    messages = ["Hello, World!", "Testing echo", "Final message"]
+    for msg in messages:
+        result = scheduling_envenv.step(SchedulingAction(message=msg))
+        print(f"Sent: '{msg}'")
+        print(f"  → Echoed: '{result.observation.echoed_message}'")
+        print(f"  → Length: {result.observation.message_length}")
+        print(f"  → Reward: {result.reward}")
+finally:
+    # Always clean up (skip if the container never started)
+    if scheduling_envenv is not None:
+        scheduling_envenv.close()
+```
+That's it! The `SchedulingEnv.from_docker_image()` method handles:
+- Starting the Docker container
+- Waiting for the server to be ready
+- Connecting to the environment
+- Container cleanup when you call `close()`
+## Building the Docker Image
+Before using the environment, you need to build the Docker image:
+```bash
+# From project root
+docker build -t scheduling_env-env:latest -f server/Dockerfile .
+```
+## Deploying to Hugging Face Spaces
+You can easily deploy your OpenEnv environment to Hugging Face Spaces using the `openenv push` command:
+```bash
+# From the environment directory (where openenv.yaml is located)
+openenv push
+# Or specify options
+openenv push --namespace my-org --private
+```
+The `openenv push` command will:
+1. Validate that the directory is an OpenEnv environment (checks for `openenv.yaml`)
+2. Prepare a custom build for Hugging Face Docker space (enables web interface)
+3. Upload to Hugging Face (ensuring you're logged in)
+### Prerequisites
+- Authenticate with Hugging Face: The command will prompt for login if not already authenticated
+### Options
+- `--directory`, `-d`: Directory containing the OpenEnv environment (defaults to current directory)
+- `--repo-id`, `-r`: Repository ID in format 'username/repo-name' (defaults to 'username/env-name' from openenv.yaml)
+- `--base-image`, `-b`: Base Docker image to use (overrides Dockerfile FROM)
+- `--private`: Deploy the space as private (default: public)
+### Examples
+```bash
+# Push to your personal namespace (defaults to username/env-name from openenv.yaml)
+openenv push
+# Push to a specific repository
+openenv push --repo-id my-org/my-env
+# Push with a custom base image
+openenv push --base-image ghcr.io/meta-pytorch/openenv-base:latest
+# Push as a private space
+openenv push --private
+# Combine options
+openenv push --repo-id my-org/my-env --base-image custom-base:latest --private
+```
+After deployment, your space will be available at:
+`https://huggingface.co/spaces/<repo-id>`
+The deployed space includes:
+- **Web Interface** at `/web` - Interactive UI for exploring the environment
+- **API Documentation** at `/docs` - Full OpenAPI/Swagger interface
+- **Health Check** at `/health` - Container health monitoring
+- **WebSocket** at `/ws` - Persistent session endpoint for low-latency interactions
+## Environment Details
+### Action
+**SchedulingAction**: Contains a single field
+- `message` (str) - The message to echo back
+### Observation
+**SchedulingObservation**: Contains the echo response and metadata
+- `echoed_message` (str) - The message echoed back
+- `message_length` (int) - Length of the message
+- `reward` (float) - Reward based on message length (length × 0.1)
+- `done` (bool) - Always False for echo environment
+- `metadata` (dict) - Additional info like step count
+### Reward
+The reward is calculated as: `message_length × 0.1`
+- "Hi" → reward: 0.2
+- "Hello, World!" → reward: 1.3
+- Empty message → reward: 0.0
+## Advanced Usage
+### Connecting to an Existing Server
+If you already have a Scheduling Env environment server running, you can connect directly:
+```python
+from scheduling_env import SchedulingAction, SchedulingEnv
+# Connect to existing server
+scheduling_envenv = SchedulingEnv(base_url="<ENV_HTTP_URL_HERE>")
+# Use as normal
+result = scheduling_envenv.reset()
+result = scheduling_envenv.step(SchedulingAction(message="Hello!"))
+```
+Note: When connecting to an existing server, `scheduling_envenv.close()` will NOT stop the server.
+### Using the Context Manager
+The client supports context manager usage for automatic connection management:
+```python
+from scheduling_env import SchedulingAction, SchedulingEnv
+# Connect with context manager (auto-connects and closes)
+with SchedulingEnv(base_url="http://localhost:8000") as env:
+    result = env.reset()
+    print(f"Reset: {result.observation.echoed_message}")
+    # Multiple steps with low latency
+    for msg in ["Hello", "World", "!"]:
+        result = env.step(SchedulingAction(message=msg))
+        print(f"Echoed: {result.observation.echoed_message}")
+```
+The client uses WebSocket connections for:
+- **Lower latency**: No HTTP connection overhead per request
+- **Persistent session**: Server maintains your environment state
+- **Efficient for episodes**: Better for many sequential steps
+### Concurrent WebSocket Sessions
+The server supports multiple concurrent WebSocket connections. To enable this,
+modify `server/app.py` to use factory mode:
+```python
+# In server/app.py - use factory mode for concurrent sessions
+app = create_app(
+    SchedulingEnvironment,  # Pass class, not instance
+    SchedulingAction,
+    SchedulingObservation,
+    max_concurrent_envs=4,  # Allow 4 concurrent sessions
+)
+```
+Then multiple clients can connect simultaneously:
+```python
+from scheduling_env import SchedulingAction, SchedulingEnv
+from concurrent.futures import ThreadPoolExecutor
+def run_episode(client_id: int):
+    with SchedulingEnv(base_url="http://localhost:8000") as env:
+        result = env.reset()
+        for i in range(10):
+            result = env.step(SchedulingAction(message=f"Client {client_id}, step {i}"))
+        return client_id, result.observation.message_length
+# Run 4 episodes concurrently
+with ThreadPoolExecutor(max_workers=4) as executor:
+    results = list(executor.map(run_episode, range(4)))
+```
+## Development & Testing
+### Direct Environment Testing
+Test the environment logic directly without starting the HTTP server:
+```bash
+# From the project root
+python3 server/scheduling_env_environment.py
+```
+This verifies that:
+- Environment resets correctly
+- Step executes actions properly
+- State tracking works
+- Rewards are calculated correctly
+### Running Locally
+Run the server locally for development:
+```bash
+uvicorn server.app:app --reload
+```
+## Project Structure
+```
+scheduling_env/
+├── .dockerignore         # Docker build exclusions
+├── __init__.py            # Module exports
+├── README.md              # This file
+├── openenv.yaml           # OpenEnv manifest
+├── pyproject.toml         # Project metadata and dependencies
+├── uv.lock                # Locked dependencies (generated)
+├── client.py              # SchedulingEnv client
+├── models.py              # Action and Observation models
+└── server/
+    ├── __init__.py        # Server module exports
+    ├── scheduling_env_environment.py  # Core environment logic
+    ├── app.py             # FastAPI application (HTTP + WebSocket endpoints)
+    └── Dockerfile         # Container image definition
+```

__init__.py ADDED

	@@ -0,0 +1,17 @@

+# Copyright (c) Meta Platforms, Inc. and affiliates.
+# All rights reserved.
+#
+# This source code is licensed under the BSD-style license found in the
+# LICENSE file in the root directory of this source tree.
+"""Scheduling Env Environment."""
+from .client import SchedulingEnv
+from .models import SchedulingAction, SchedulingObservation, SchedulingState
+__all__ = [
+    "SchedulingAction",
+    "SchedulingObservation",
+    "SchedulingState",
+    "SchedulingEnv",
+]

client.py ADDED

	@@ -0,0 +1,51 @@

+# Copyright (c) Meta Platforms, Inc. and affiliates.
+# All rights reserved.
+#
+# This source code is licensed under the BSD-style license found in the
+# LICENSE file in the root directory of this source tree.
+"""Scheduling Environment Client."""
+from __future__ import annotations
+from typing import Dict
+from openenv.core import EnvClient
+from openenv.core.client_types import StepResult
+from .models import SchedulingAction, SchedulingObservation, SchedulingState
+class SchedulingEnv(
+    EnvClient[SchedulingAction, SchedulingObservation, SchedulingState]
+):
+    """Client for the Meeting Scheduling RL Environment.
+    Maintains a persistent WebSocket connection to the environment server.
+    Example::
+        with SchedulingEnv(base_url="http://localhost:8000") as client:
+            result = client.reset(task_id="task1_easy")
+            obs = result.observation
+            result = client.step(SchedulingAction(
+                action_type="propose_slot",
+                proposed_start="2025-04-07T10:00:00+00:00",
+                proposed_duration=30,
+            ))
+    """
+    def _step_payload(self, action: SchedulingAction) -> Dict:
+        return action.model_dump(exclude_none=True)
+    def _parse_result(self, payload: Dict) -> StepResult[SchedulingObservation]:
+        obs_data = payload.get("observation", payload)
+        observation = SchedulingObservation(**obs_data)
+        return StepResult(
+            observation=observation,
+            reward=observation.reward,
+            done=observation.done,
+        )
+    def _parse_state(self, payload: Dict) -> SchedulingState:
+        return SchedulingState(**payload)
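
`_parse_result` reads the observation from `payload["observation"]` when that key is present and otherwise treats the whole payload as the observation. The sketch below reproduces only that lookup, using a stand-in dataclass. `FakeObservation` and `parse_observation` are hypothetical names, and the real `SchedulingObservation` in `models.py` has fields not shown here.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class FakeObservation:
    """Stand-in for SchedulingObservation; only reward/done are modelled."""

    reward: Optional[float] = None
    done: bool = False


def parse_observation(payload: Dict[str, Any]) -> FakeObservation:
    # Same lookup as client.py: prefer the nested dict, else use the payload itself.
    obs_data = payload.get("observation", payload)
    return FakeObservation(**obs_data)


wrapped = parse_observation({"observation": {"reward": 0.5, "done": False}})
flat = parse_observation({"reward": 0.5, "done": False})
print(wrapped == flat)
```

Both payload shapes produce the same observation, which is why the client can sit behind either a wrapped or a flat server response.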

docs/2026-04-07-153534-local-command-caveatcaveat-the-messages-below.txt ADDED

The diff for this file is too large to render. See raw diff

docs/ENV_LEARNINGS.md ADDED

	@@ -0,0 +1,368 @@

+# OpenEnv Environment Research - Key Learnings
+Research conducted on 5 top OpenEnv environments to inform hackathon project development.
+## Executive Summary
+| Environment | Domain | Key Strength | Best For Learning |
+|-------------|--------|--------------|-------------------|
+| **calendar_env** | Calendar Management | Generic MCP wrapper architecture | Multi-tenant systems, database-backed tasks |
+| **reasoning_gym_env** | Reasoning Tasks | Minimal, single-step episodes | Simple task structures, dataset integration |
+| **tbench2_env** | Terminal/Tool Use | Dual execution modes (local/docker) | Tool benchmarking, session management |
+| **carla_env** | Autonomous Driving | Scenario-based design | Complex simulations, ethical dilemmas |
+| **repl_env** | Code Execution | Recursive LLM architecture | Interactive environments, reward shaping |
+---
+## 1. Calendar Environment (calendar_env)
+### Architecture Highlights
+- **Generic MCP Wrapper**: Fully reusable `openenv_wrapper/` for any MCP server
+- **Multi-Tenancy**: SQLite per agent via `x-database-id` header
+- **Rich Database Schema**: Google Calendar API v3 compliant models
+### Action/Observation Pattern
+```python
+# Action
+class MCPAction(Action):
+    action_type: Literal["ListToolsAction", "ToolCallAction"]
+    tool_name: Optional[str]
+    arguments: Optional[Dict]
+# Observation
+class MCPObservation(Observation):
+    success: bool
+    error_message: Optional[str]
+    tool_result: Optional[Dict]
+    reward: Optional[float]
+    done: bool
+```
+### Task Definition Pattern
+- **JSON Scenarios**: Version-controlled task definitions
+- **SQL Verifiers**: Programmatic graders checking database state
+- **3 Verifier Types**: database_state, response_check, tool_execution
+### Reward Design
+- Sparse binary rewards: +1.0 (success), -0.5 (error)
+- ListToolsAction: +0.1 (discovery reward)
+- Status code based with metadata for flexibility
+### Worth Copying
+1. **Generic wrapper architecture** - Copy `openenv_wrapper/` for new MCPs
+2. **Session manager pattern** - Multi-tenant database isolation
+3. **Verifier-driven tasks** - No code changes for new tasks
+4. **Config-driven tool discovery** - Dynamic tool handlers via importlib
+---
+## 2. Reasoning Gym Environment (reasoning_gym_env)
+### Architecture Highlights
+- **Minimal footprint**: ~200 lines core logic
+- **Single-step episodes**: reset() → step() → done
+- **Dataset persistence**: Reuse datasets across resets
+### Action/Observation Pattern
+```python
+# Action
+class ReasoningGymAction(Action):
+    answer: str  # Agent's answer
+# Observation
+class ReasoningGymObservation(Observation):
+    question: Optional[str]      # Only in reset()
+    score: Optional[float]       # Only after step()
+    correct_answer: Optional[str]
+    done: bool
+```
+### Task Definition Pattern
+- **External library**: `reasoning_gym` handles generation + scoring
+- **Simple datasets**: Single task type (leg_counting, reverse_sort, etc.)
+- **Composite datasets**: Mix multiple tasks with weights
+### Reward Design
+- **Binary/partial**: Depends on dataset scoring function
+- **Terminal only**: reward=0.0 on reset, actual score after step()
+- **Single-step**: No trajectory rewards
+### Worth Copying
+1. **Iterator pattern** - Seamless dataset cycling with StopIteration handling
+2. **Parameter idempotency** - reset() continues, reset(seed=...) restarts
+3. **Dataset caching** - Compare config to avoid rebuilding
+4. **Minimal state** - Just episode_id and step_count
+---
+## 3. TB2 Environment (tbench2_env)
+### Architecture Highlights
+- **Dual execution modes**: Local (CAMEL toolkit) vs Docker (TB2 fidelity)
+- **Session management**: Streaming process support via session_id
+- **Task auto-discovery**: Download from GitHub + cache locally
+### Action/Observation Pattern
+```python
+# Action
+class Tbench2Action(Action):
+    action_type: str  # exec, write, view, wait, kill, evaluate, etc.
+    command: str
+    session_id: Optional[str]
+    block: bool = True
+# Observation
+class Tbench2Observation(Observation):
+    instruction: str
+    output: str
+    success: bool
+    error: str
+    reward: Optional[float]  # Only on evaluate
+    done: bool              # Only on evaluate
+```
+### Task Definition Pattern
+- **TOML-based**: `task.toml` with environment + verifier config
+- **Pytest graders**: Each task has tests/ directory
+- **External benchmark**: Terminal-Bench 2 suite
+### Reward Design
+- **Binary**: 1.0 if all pytest tests pass, 0.0 otherwise
+- **Terminal only**: reward=None until evaluate action
+- **Exit code parsing**: `__TB2_EXIT_CODE__:$?` marker pattern
+### Worth Copying
+1. **Dual mode pattern** - Local + Docker execution with env var switching
+2. **Lazy dependency loading** - Import errors surface only when used
+3. **Docker-in-Docker safe** - Tar streaming instead of bind mounts
+4. **Session isolation** - Unique working directories per episode_id
+5. **Metadata-driven discovery** - Tasks self-describe requirements
+---
+## 4. CARLA Environment (carla_env)
+### Architecture Highlights
+- **Scenario system**: BaseScenario ABC with composable tasks
+- **Rubric factory**: Auto-select reward function by scenario type
+- **Mock mode**: Test without GPU/CARLA
+- **GPU-accelerated**: T4 16GB minimum for real mode
+### Action/Observation Pattern
+```python
+# Action
+class CarlaAction(Action):
+    action_type: str  # observe, control, navigate, capture_image, etc.
+    throttle: Optional[float]  # [0, 1] with Pydantic validation
+    steer: Optional[float]     # [-1, 1]
+    brake: Optional[float]     # [0, 1]
+# Observation
+class CarlaObservation(Observation):
+    scene_description: str
+    vehicle_state: Dict  # speed, location, rotation
+    collision_detected: bool
+    nearby_actors: List[Dict]
+    camera_images: Optional[Dict]
+    rubric_reward: float
+```
+### Task Definition Pattern
+- **9 Trolley scenarios**: Ethical dilemmas with expected outcomes
+- **Navigation tasks**: Maze (goal-directed), Free-roam (open-world)
+- **JSON externalized**: Benchmark definitions separate from code
+### Reward Design
+- **Trajectory-based (Trolley)**: r_t = 0.0 until terminal, then gamma-discounted final
+- **Step-level (Navigation)**: Progress + arrival bonus - collision penalty - time cost
+- **Scenario-specific**: compute_outcome() owns scoring logic
+### Worth Copying
+1. **Scenario ABC** - Each task owns physics + scoring independently
+2. **Rubric factory** - Auto-select reward function by task type
+3. **Dual mode** - Mock for testing, real for evaluation
+4. **Layered config** - Common + scenario-specific fields
+5. **JSON externalization** - Decouple task data from code
+---
+## 5. REPL Environment (repl_env)
+### Architecture Highlights
+- **Layered design**: Environment → Runner → Backend separation
+- **Recursive LLM**: Depth-limited child spawning with RLM pattern
+- **Composable rubrics**: Outcome + process rewards
+- **Thread-safe batching**: Multiple concurrent child queries
+### Action/Observation Pattern
+```python
+# Action
+class REPLAction(Action):
+    code: str
+    is_final: bool = False
+    final_answer: Optional[str] = None
+# Observation
+class REPLObservation(Observation):
+    result: CodeBlockResult  # stdout, stderr, locals_snapshot
+    available_variables: List[str]
+    iteration: int
+    done: bool
+    reward: float
+```
+### Task Definition Pattern
+- **Rubric-driven**: Ground truth passed at reset()
+- **Multiple finalization patterns**: FINAL(), FINAL_VAR(), dict with ready flag
+- **External graders**: CustomMetricRubric for user-provided scoring
+### Reward Design
+- **Composable**: REPLRubric = outcome + process
+- **Outcome (terminal)**: ExactMatch, FuzzyMatch, or CustomMetric
+- **Process (per-step)**: +success_reward, -error_penalty
+- **Failure**: -failure_reward if max_iterations without answer
+### Worth Copying
+1. **Composable rubrics** - outcome + process separation
+2. **Recursive backend** - Protocol-based with depth limits
+3. **Message-based loop** - Explicit iteration with timeout checks
+4. **Variable snapshots** - Serialize namespace state
+5. **Dual API** - Sync + async with same models
+6. **Cooperative timeout** - perf_counter() checks, not interrupts
+7. **Injected helpers** - llm_query, rlm_query available in namespace
+---
+## Cross-Cutting Patterns
+### 1. Pydantic Models Everywhere
+All environments use Pydantic BaseModel for:
+- Type safety + validation
+- JSON serialization
+- OpenAPI schema generation
+- Field descriptions for documentation
+### 2. FastAPI App Factory
+```python
+from openenv.core.env_server.http_server import create_app
+app = create_app(
+    MyEnvironment,
+    MyAction,
+    MyObservation,
+    env_name="my_env",
+    max_concurrent_envs=1,
+)
+```
+### 3. Client-Server Separation
+- Server: Implements Environment[Action, Observation, State]
+- Client: EnvClient[Action, Observation, State] wraps HTTP/WebSocket
+- Local variants for in-process testing
+### 4. Episode State Management
+```python
+class State(BaseModel):
+    episode_id: str        # UUID per episode
+    step_count: int        # Actions taken
+    # Environment-specific metrics
+```
+### 5. Metadata for Flexibility
+- Actions have optional `metadata: Dict[str, Any]`
+- Observations include `metadata` for extra context
+- Enables custom reward signals without model changes
+### 6. Docker + openenv.yaml
+```yaml
+spec_version: 1
+name: my_env
+type: space
+runtime: fastapi
+app: server.app:app
+port: 8000
+```
+### 7. Concurrent Sessions Support
+```python
+class MyEnvironment(Environment):
+    SUPPORTS_CONCURRENT_SESSIONS: bool = True
+```
+---
+## Recommendations for Hackathon Project
+### Use calendar_env approach if:
+- Building database-backed environment (customer support, data cleaning)
+- Need multi-agent evaluation isolation
+- Want reusable wrapper for other MCPs
+### Use reasoning_gym_env approach if:
+- Simple single-step tasks (email triage, classification)
+- Dataset-based evaluation
+- Minimal code complexity desired
+### Use tbench2_env approach if:
+- Tool use benchmarking (API integration, CLI tools)
+- Need Docker isolation
+- Session-based interaction required
+### Use carla_env approach if:
+- Complex simulation with physics
+- Scenario-based curriculum learning
+- Trajectory-based rewards
+### Use repl_env approach if:
+- Code execution environment
+- Recursive reasoning needed
+- Composable reward functions
+---
+## Quick Start Checklist
+For your hackathon environment, ensure:
+- [ ] **3+ tasks with graders** returning scores 0.0-1.0
+- [ ] **Pydantic models** for Action, Observation, State
+- [ ] **openenv.yaml** with correct metadata
+- [ ] **inference.py** in root (uses HF Router, not OpenAI)
+- [ ] **STDOUT logging** with [START], [STEP], [END] format
+- [ ] **Dockerfile** in root directory (not /server)
+- [ ] **Meaningful rewards** that distinguish performance levels
+- [ ] **Real-world task** with genuine value
+- [ ] **< 20 min runtime** on vcpu=2, memory=8GB
+- [ ] **Passes `openenv validate`**
+---
+## Key Files to Reference
+### For Implementation Patterns:
+- `calendar_env/server/openenv_wrapper/mcp_env_environment.py` - Generic wrapper
+- `reasoning_gym_env/server/reasoning_gym_environment.py` - Minimal implementation
+- `tbench2_env/server/tbench2_env_environment.py` - Session management
+- `carla_env/server/benchmark_scenarios/base.py` - Scenario ABC
+- `repl_env/rubrics.py` - Composable reward design
+### For Client Usage:
+- `*/client.py` - All environments have reference implementations
+- `repl_env/runner.py` - Message-based orchestration loop
+### For Server Setup:
+- `*/server/app.py` - FastAPI app factory usage
+- `*/openenv.yaml` - Configuration examples
+- `*/Dockerfile` - Docker image patterns
+---
+## Next Steps
+1. **Choose architecture**: Pick closest reference environment to your task
+2. **Copy skeleton**: Use `openenv init` or copy from reference
+3. **Define models**: Start with Action/Observation Pydantic models
+4. **Implement graders**: 3 tasks with programmatic scoring
+5. **Test locally**: Use client.py pattern for rapid iteration
+6. **Validate**: Run `openenv validate` before deployment
+7. **Deploy**: `openenv push` to Hugging Face Spaces
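
The "composable rubrics" idea noted for `repl_env` (a terminal outcome score plus per-step process rewards) can be sketched as below. All names and the numeric defaults are illustrative assumptions. This is not the `REPLRubric` API, and the exact-match rule is only one of the outcome checks the notes mention.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class StepRecord:
    """One executed step; `succeeded` is False when the step raised an error."""

    succeeded: bool


def process_reward(
    steps: List[StepRecord], success_reward: float = 0.1, error_penalty: float = 0.05
) -> float:
    """Per-step shaping: add a bonus for clean steps, subtract a penalty for errors."""
    return sum(success_reward if s.succeeded else -error_penalty for s in steps)


def outcome_reward(final_answer: Optional[str], expected: str) -> float:
    """Terminal score: 1.0 on an exact match (ignoring outer whitespace), else 0.0."""
    if final_answer is None:
        return 0.0
    return 1.0 if final_answer.strip() == expected.strip() else 0.0


def total_reward(
    steps: List[StepRecord], final_answer: Optional[str], expected: str
) -> float:
    """Compose the two parts so each can be swapped or re-weighted independently."""
    return outcome_reward(final_answer, expected) + process_reward(steps)


episode = [StepRecord(True), StepRecord(True), StepRecord(False)]
print(round(total_reward(episode, " 42 ", "42"), 2))
```

Keeping the two components in separate functions makes it straightforward to replace the outcome check (for example with a fuzzy match) without touching the step-level shaping.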

docs/HACKATHON_META.md ADDED

	@@ -0,0 +1,324 @@

+# Meta OpenEnv Hackathon - Round 1
+## Overview
+Build a complete, real-world OpenEnv environment that an AI agent can learn from through the standard `step()` / `reset()` / `state()` API.
+## Task Requirements
+### Must-Have Features
+1. **Real-world Task Simulation**
+   - Must simulate tasks humans actually do
+   - Not games or toys
+   - Examples: email triage, code review, data cleaning, scheduling, customer support, content moderation
+2. **OpenEnv Spec Compliance**
+   - Typed Observation, Action, and Reward Pydantic models
+   - `step(action)` → returns observation, reward, done, info
+   - `reset()` → returns initial observation
+   - `state()` → returns current state
+   - `openenv.yaml` with metadata
+   - Must pass `openenv validate`
+3. **Minimum 3 Tasks with Agent Graders**
+   - Each task defines a concrete objective
+   - Programmatic grader scoring (0.0–1.0)
+   - Difficulty range: easy → medium → hard
+   - Clear, deterministic success/failure criteria
+4. **Meaningful Reward Function**
+   - Provides signal over full trajectory (not just binary)
+   - Rewards partial progress toward completion
+   - Penalizes undesirable behavior (infinite loops, destructive actions)
+5. **Baseline Inference Script**
+   - Uses OpenAI API client
+   - Reads credentials from `OPENAI_API_KEY` environment variable
+   - Produces reproducible baseline scores on all 3 tasks
+## Non-Functional Requirements
+### Deployment
+- **Hugging Face Space**: Environment must run as containerized HF Space tagged with `openenv`
+- **Dockerfile**: Working containerization with clean `docker build + docker run`
+### Documentation
+README must include:
+- Environment description and motivation
+- Action and observation space definitions
+- Task descriptions with expected difficulty
+- Setup and usage instructions
+- Baseline scores
+## Evaluation Criteria & Scoring
+### Scoring Breakdown (100 points)
+| Criterion | Weight | Description |
+|-----------|--------|-------------|
+| **Real-world utility** | 30% | Does the environment model a genuine task? Would someone use this for training/evaluating agents? |
+| **Task & grader quality** | 25% | Well-defined tasks with clear objectives? Accurate graders? Meaningful difficulty progression? |
+| **Environment design** | 20% | Clean state management, sensible action/observation spaces, good reward shaping, proper episode boundaries |
+| **Code quality & spec compliance** | 15% | Follows OpenEnv spec, clean structure, typed models, documented, tested, working Dockerfile |
+| **Creativity & novelty** | 10% | Novel problem domain, interesting mechanics, clever reward design, original approach |
+### Detailed Scoring Rubrics
+#### Real-world Utility (30%)
+- **0–5**: Toy/artificial problem with no practical application
+- **6–15**: Valid domain but shallow modeling
+- **16–25**: Good domain modeling, useful for agent evaluation
+- **26–30**: Excellent — fills real gap, immediate value for RL/agent community
+#### Task & Grader Quality (25%)
+- 3+ tasks with difficulty range?
+- Graders produce scores between 0.0–1.0?
+- Graders deterministic and reproducible?
+- Hard task genuinely challenges frontier models?
+#### Environment Design (20%)
+- `reset()` produces clean state?
+- Action/observation types well-designed and documented?
+- Reward function provides useful varying signal (not sparse)?
+- Episode boundaries sensible?
+#### Code Quality & Spec Compliance (15%)
+- `openenv validate` passes?
+- `docker build && docker run` works?
+- HF Space deploys and responds?
+- Baseline script runs and reproduces scores?
+#### Creativity & Novelty (10%)
+- Domain not seen in OpenEnv before?
+- Reward design has interesting properties?
+- Clever mechanics that make environment engaging?
+## Judging Process
+### Phase 1: Automated Validation (Pass/Fail Gate)
+- HF Space deploys
+- OpenEnv spec compliance
+- Dockerfile builds
+- Baseline reproduces
+- 3+ tasks with graders
+### Phase 2: Agentic Evaluation (Scored)
+- Baseline agent re-run
+- Standard Open LLM agent (e.g., Nemotron 3 Super) run against all environments
+- Score variance check
+### Phase 3: Human Review
+Top submissions reviewed by Meta and Hugging Face engineers for:
+- Real-world utility
+- Creativity
+- Exploit checks
+### Disqualification Criteria
+- Environment does not deploy or respond
+- Plagiarized or trivially modified existing environments
+- Graders that always return the same score
+- No baseline inference script
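The rule against graders that always return the same score can be checked locally before submitting. The sketch below is a hypothetical sanity check, not part of the official validator: `grader_scores_are_valid` and its `min_spread` threshold are illustrative names and values chosen here.

```python
from typing import Iterable, List


def grader_scores_are_valid(scores: Iterable[float], min_spread: float = 1e-6) -> bool:
    """Return True if every score lies in [0, 1] and the scores are not all identical."""
    values: List[float] = list(scores)
    if not values:
        return False  # nothing to judge
    in_range = all(0.0 <= s <= 1.0 for s in values)
    # A constant grader has zero spread between its best and worst score
    varied = (max(values) - min(values)) > min_spread
    return in_range and varied
```

One possible use is to collect the grader's output over a handful of deliberately good and bad trajectories per task and assert that this check passes.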
+## Pre-Submission Checklist
+All must pass or you're disqualified:
+- [ ] HF Space deploys (200 response to reset())
+- [ ] OpenEnv spec compliance validated
+- [ ] Dockerfile builds successfully
+- [ ] Baseline script reproduces without error
+- [ ] 3+ tasks with graders (scores in 0.0–1.0 range)
+## Mandatory Requirements
+### Environment Variables
+Must be defined in your environment configuration:
+```bash
+API_BASE_URL    # The API endpoint for the LLM
+MODEL_NAME      # The model identifier to use for inference
+HF_TOKEN        # Your Hugging Face / API key
+LOCAL_IMAGE_NAME # (Optional) Name of local image if using from_docker_image()
+```
+### Script Requirements
+- **Filename**: `inference.py` (must be in root directory)
+- **LLM Calls**: Must use OpenAI Client with above variables
+- **Logging Format**: Must follow [START], [STEP], [END] format (see below)
+### Infrastructure Restrictions
+- **Runtime**: Inference script must complete in < 20 minutes
+- **Resources**: Must run on vcpu=2, memory=8GB
+## STDOUT Logging Format
+### Required Format
+The script must emit exactly three line types to stdout, in this order:
+```
+[START] task=<task_name> env=<benchmark> model=<model_name>
+[STEP]  step=<n> action=<action_str> reward=<0.00> done=<true|false> error=<msg|null>
+[END]   success=<true|false> steps=<n> score=<score> rewards=<r1,r2,...,rn>
+```
+### Format Rules
+- One [START] line at episode begin
+- One [STEP] line per step, immediately after `env.step()` returns
+- One [END] line after `env.close()`, always emitted (even on exception)
+- `reward` and `rewards` formatted to 2 decimal places
+- `done` and `success` are lowercase booleans: `true` or `false`
+- `error` is the raw `last_action_error` string, or `null` if none
+- All fields on a single line with no newlines within a line
+- Each task should return a score in [0, 1]
+### Example Output
+```
+[START] task=click-test env=miniwob model=Qwen3-VL-30B
+[STEP] step=1 action=click('123') reward=0.00 done=false error=null
+[STEP] step=2 action=fill('456','text') reward=0.00 done=false error=null
+[STEP] step=3 action=click('789') reward=1.00 done=true error=null
+[END] success=true steps=3 score=1.00 rewards=0.00,0.00,1.00
+```
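The format rules can be captured in three small formatting helpers. This is a minimal sketch, not the organizers' reference code: the function names are hypothetical, the single space after each tag follows the example output rather than the template, formatting `score` to two decimals is inferred from the example, and collapsing newlines in `error` is an assumption made to satisfy the single-line rule.

```python
from typing import List, Optional


def _bool(value: bool) -> str:
    """Render a Python bool as the lowercase literal the log format expects."""
    return "true" if value else "false"


def format_start(task: str, env: str, model: str) -> str:
    return f"[START] task={task} env={env} model={model}"


def format_step(step: int, action: str, reward: float, done: bool,
                error: Optional[str]) -> str:
    # Assumption: newlines in the error are replaced so the record stays on one line
    err = "null" if error is None else error.replace("\n", " ")
    return (f"[STEP] step={step} action={action} reward={reward:.2f} "
            f"done={_bool(done)} error={err}")


def format_end(success: bool, steps: int, score: float, rewards: List[float]) -> str:
    joined = ",".join(f"{r:.2f}" for r in rewards)
    return (f"[END] success={_bool(success)} steps={steps} "
            f"score={score:.2f} rewards={joined}")
```

Calling `print(format_end(...))` from a `finally` block is one way to make sure the `[END]` line is emitted even when the episode raises.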
+## Sample Inference Script
+```python
+"""
+Inference Script Example
+===================================
+MANDATORY
+- Before submitting, ensure the following variables are defined in your environment configuration:
+    API_BASE_URL   The API endpoint for the LLM.
+    MODEL_NAME     The model identifier to use for inference.
+    HF_TOKEN       Your Hugging Face / API key.
+    LOCAL_IMAGE_NAME The name of the local image to use for the environment if you are using from_docker_image()
+                     method
+- Defaults are set only for API_BASE_URL and MODEL_NAME
+    (and should reflect your active inference setup):
+    API_BASE_URL = os.getenv("API_BASE_URL", "<your-active-endpoint>")
+    MODEL_NAME = os.getenv("MODEL_NAME", "<your-active-model>")
+- The inference script must be named `inference.py` and placed in the root directory of the project
+- Participants must use OpenAI Client for all LLM calls using above variables
+STDOUT FORMAT
+- The script must emit exactly three line types to stdout, in this order:
+    [START] task=<task_name> env=<benchmark> model=<model_name>
+    [STEP]  step=<n> action=<action_str> reward=<0.00> done=<true|false> error=<msg|null>
+    [END]   success=<true|false> steps=<n> score=<score> rewards=<r1,r2,...,rn>
+  Rules:
+    - One [START] line at episode begin.
+    - One [STEP] line per step, immediately after env.step() returns.
+    - One [END] line after env.close(), always emitted (even on exception).
+    - reward and rewards are formatted to 2 decimal places.
+    - done and success are lowercase booleans: true or false.
+    - error is the raw last_action_error string, or null if none.
+    - All fields on a single line with no newlines within a line.
+    - Each task should return a score in [0, 1]
+  Example:
+    [START] task=click-test env=miniwob model=Qwen3-VL-30B
+    [STEP] step=1 action=click('123') reward=0.00 done=false error=null
+    [STEP] step=2 action=fill('456','text') reward=0.00 done=false error=null
+    [STEP] step=3 action=click('789') reward=1.00 done=true error=null
+    [END] success=true steps=3 score=1.00 rewards=0.00,0.00,1.00
+"""
+import asyncio
+import os
+import textwrap
+from typing import List, Optional
+from openai import OpenAI
+from my_env_v4 import MyEnvV4Action, MyEnvV4Env
+IMAGE_NAME = os.getenv("IMAGE_NAME")  # If you are using docker image
+API_KEY = os.getenv("HF_TOKEN") or os.getenv("API_KEY")
+API_BASE_URL = os.getenv("API_BASE_URL") or "https://router.huggingface.co/v1"
+MODEL_NAME = os.getenv("MODEL_NAME") or "Qwen/Qwen2.5-72B-Instruct"
+TASK_NAME = os.getenv("MY_ENV_V4_TASK", "echo")
+BENCHMARK = os.getenv("MY_ENV_V4_BENCHMARK", "my_env_v4")
+MAX_STEPS = 8
+TEMPERATURE = 0.7
+# TODO: Implement the rest of your inference script here
+```
+## Pre-Validation Script
+```bash
+#!/usr/bin/env bash
+#
+# validate-submission.sh — OpenEnv Submission Validator
+#
+# Checks that your HF Space is live, Docker image builds, and openenv validate passes.
+#
+# Prerequisites:
+#   - Docker:       https://docs.docker.com/get-docker/
+#   - openenv-core: pip install openenv-core
+#   - curl (usually pre-installed)
+#
+# Run:
+#   curl -fsSL https://raw.githubusercontent.com/<owner>/<repo>/main/scripts/validate-submission.sh | bash -s -- <ping_url> [repo_dir]
+#
+#   Or download and run locally:
+#     chmod +x validate-submission.sh
+#     ./validate-submission.sh <ping_url> [repo_dir]
+#
+# Arguments:
+#   ping_url   Your HuggingFace Space URL (e.g. https://your-space.hf.space)
+#   repo_dir   Path to your repo (default: current directory)
+#
+# Examples:
+#   ./validate-submission.sh https://my-team.hf.space
+#   ./validate-submission.sh https://my-team.hf.space ./my-repo
+#
+set -uo pipefail
+DOCKER_BUILD_TIMEOUT=600
+if [ -t 1 ]; then
+  RED='\033[0;31m'
+  GREEN='\033[0;32m'
+  YELLOW='\033[1;33m'
+  BOLD='\033[1m'
+  NC='\033[0m'
+else
+  RED=''
+  GREEN=''
+  YELLOW=''
+  BOLD=''
+  NC=''
+fi
+# TODO: Add the rest of the validation script
+```
+## Tips for Success
+1. **Choose a Real Problem**: Pick a task that has genuine value for the AI/agent community
+2. **Design Good Rewards**: Provide meaningful signals throughout the episode, not just at the end
+3. **Test Thoroughly**: Ensure your environment works cleanly with `docker build && docker run`
+4. **Document Well**: Clear README helps reviewers understand your contribution
+5. **Start Simple**: Get the basic OpenEnv spec working first, then add complexity
+6. **Run Validator**: Use the pre-validation script before submitting
+## Resources
+- OpenEnv Documentation: [Link to be added]
+- Hugging Face Spaces: https://huggingface.co/spaces
+- OpenAI API Client: https://platform.openai.com/docs/api-reference
+## Submission Deadline
+[To be announced]
+---
+**Good luck with your submission! 🚀**

docs/hackathon-guide-rl-environments.md ADDED

	@@ -0,0 +1,228 @@

+# Building RL Environments for Hackathon: Complete Guide
+## Overview
+This guide explains how to build real-world Reinforcement Learning (RL) environments with the OpenEnv (Open Environment) library for the hackathon.
+---
+## 1. Fundamentals of Reinforcement Learning
+### The Mechanism
+- **How it Works:** Model generates candidate implementations (actions) → Environment verifies/tests → Environment provides reward signal (score) based on pre-defined rubrics
+- **Purpose:** Tells the model what is good or bad through trial and error rather than long-context prompts
+### Position in Training Pipeline
+- Typically follows **Supervised Fine-Tuning (SFT)**
+- Used to "squeeze out" final performance gains on specific capabilities
+- More efficient alternative to "in-context learning" (which degrades with longer prompts)
+### Key Challenges
+#### Reward Hacking
+- Models learn to "game" the verifier to get high scores without actually solving the task
+- **Mitigation:** Inspect output trajectories or use multiple reward functions
+#### Curriculum Learning
+- Start with easy tasks and build complexity progressively
+- Ensures model receives consistent reward signal
+- Prevents "wasting compute" on tasks that are too difficult initially
+---
+## 2. Introduction to OpenEnv
+### What is OpenEnv?
+- Collaborative project between Meta, Hugging Face, and others
+- Standardizes RL environments (like Hugging Face standardized language models)
+- Single, consistent API for environments
+- Interoperable with training frameworks (TRL, Unsloth, etc.)
+### Core Components
+A standard OpenEnv environment requires defining:
+- **Actions** (as Pydantic objects)
+- **Observations** (as Pydantic objects)
+- **States** (as Pydantic objects)
+---
+## 3. Technical Implementation
+### CLI Workflow
+```bash
+# Initialize skeleton environment
+openenv init
+# Validate setup
+openenv validate
+# Deploy to Hugging Face Spaces
+openenv push
+```
+### Agent Integration
+- Use coding agents (like Codex) with OpenEnv "skills"
+- Automatically generate environment code from prompts
+### Deployment
+- Environments deployed as Docker containers on Hugging Face
+- Provides web interface for manual testing and debugging
+- **Important:** Dockerfile must be moved outside `/server` folder to main project directory
+---
+## 4. Hackathon Requirements
+### Environment Quality
+#### Real-World Focus (Critical)
+- **Must build:** Real-world task environments (healthcare, email triage, code optimization)
+- **Avoid:** "Toy" environments, games (Wordle, Connect 4, etc.)
+- **Goal:** Environment that could realistically be used in model's post-training RL run
+#### Complexity Requirements
+- Map **long-running tasks** with multiple trajectories/routes
+- Agent should have various possible approaches to solve the task
+### Technical Requirements
+#### Mandatory Inference Script
+- **Required for every submission**
+- Used by organizers to evaluate environment effectiveness
+- Measures how well environment provides rewards to model
+#### API Configuration
+- **No OpenAI API key required**
+- Use **Hugging Face token** instead
+- Use provided **HF Router** (API base URL) for model calls
+- HF Router handles model calls through Hugging Face
+#### Docker Setup
+- Move Dockerfile outside `/server` folder to main project directory
+- Run `openenv validate` before submission
+### Reward Signal Design
+#### Requirements
+- Score typically between 0 and 1
+- Must deliver valid signal indicating "good" or "bad" performance
+- **Grading Diversity:** Must not return same score every time
+- Should distinguish between different performance levels
+#### Best Practices
+- Start with achievable tasks for the model
+- Ensure task is feasible but challenging
+- Avoid tasks too difficult or out-of-distribution for the model
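One way to get a score that stays in the 0 to 1 range while still separating performance levels is to combine several per-criterion scores with weights. The sketch below is illustrative only: `weighted_score` and the criterion names in the usage note are hypothetical and are not taken from any particular grader.

```python
from typing import Dict


def weighted_score(components: Dict[str, float], weights: Dict[str, float]) -> float:
    """Combine per-criterion scores (each clamped to [0, 1]) into one score in [0, 1]."""
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    raw = 0.0
    for name, weight in weights.items():
        # Missing criteria count as 0; out-of-range values are clamped
        value = min(1.0, max(0.0, components.get(name, 0.0)))
        raw += weight * value
    return raw / total_weight
```

For example, with weights `{"slot_valid": 0.5, "preferences": 0.3, "efficiency": 0.2}`, a trajectory that finds a valid slot but only half-satisfies preferences scores differently from one that satisfies both, which is the kind of grading diversity the requirements ask for.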
+---
+## 5. Grading Criteria
+Evaluation based on:
+1. **Utility of the Idea**
+   - How useful is the task for real-world AI?
+   - Does it represent authentic human tasks?
+2. **Quality of the Grader**
+   - Returns diverse scores (not same score every time)
+   - Value between 0 and 1
+   - Distinguishes performance levels
+3. **Technical Design**
+   - Environment architecture and implementation
+   - Successful execution of inference script
+4. **Novelty**
+   - Key criterion for high scores
+   - Create something not thought of yet
+   - Solve problems in unique domains
+   - **Plagiarism is strictly prohibited**
+---
+## 6. Submission Guidelines
+### Deadline
+- **Round One:** April 8th
+### Submission Process
+- Push environment to **Hugging Face Spaces** using `openenv push`
+- Submit URL of Hugging Face Space
+- Multiple submissions allowed (latest accurate submission used)
+### Collaboration
+- Teams are **highly encouraged**
+- Helps manage technical and creative requirements
+---
+## 7. High-Value Environment Ideas
+### Healthcare Domain
+- Medical triage tools
+- Navigating medical records
+- Healthcare-specific software tool utilization
+### Productivity and Operations
+- **Email Triage:** Prioritize, categorize, respond to complex inbox
+- **Calendar Management:** Coordinate schedules, handle conflicts across multiple participants
+### Technical and Code Optimization
+- **Kernel Optimization:** Benchmark and optimize PyTorch/GPU kernels for speed and efficiency
+- **Repository Maintenance:** Navigate GitHub to identify/fix bugs, run test suites
+### Logistics and Travel
+- **Complex Flight Booking:** Navigate changing availability, multi-leg transfers, request missing information from users
+### API and Tool Integration
+- Wide set of real-world tools
+- Interactive APIs that agents must learn to use correctly
+---
+## 8. Best Practices Summary
+### Do's
+- Focus on real-world utility
+- Design long-running, multi-trajectory tasks
+- Implement diverse grading systems
+- Start with curriculum learning approach
+- Validate thoroughly before submission
+- Work in teams for better results
+- Aim for novelty and uniqueness
+### Don'ts
+- Avoid toy environments or games
+- Don't create tasks too difficult for models
+- Don't implement single-score graders
+- Avoid plagiarism
+- Don't submit without testing inference script
+- Don't use tasks without clear reward signals
+---
+## 9. Technical Checklist
+- [ ] Initialize project with `openenv init`
+- [ ] Define Actions, Observations, States as Pydantic objects
+- [ ] Implement diverse reward function (0-1 range)
+- [ ] Create mandatory inference script
+- [ ] Configure HF token and router (not OpenAI key)
+- [ ] Move Dockerfile to main directory (outside /server)
+- [ ] Run `openenv validate` to verify setup
+- [ ] Test environment locally
+- [ ] Deploy with `openenv push` to Hugging Face Spaces
+- [ ] Submit Hugging Face Space URL before April 8th
+---
+## Resources
+- **OpenEnv Library:** Standardized RL environment framework
+- **Hugging Face Spaces:** Deployment platform
+- **HF Router:** API for model access
+- **Training Frameworks:** TRL, Unsloth (compatible with OpenEnv)
+---
+*This guide synthesizes best practices for building competitive RL environments for hackathons. Focus on real-world utility, technical excellence, and novel approaches for the best results.*

docs/superpowers/specs/2025-04-06-scheduling-env-design.md ADDED

	@@ -0,0 +1,2068 @@

+# Intelligent Meeting Scheduling Environment - Design Specification
+**Date**: 2025-04-06
+**Author**: Akshay Kumar
+**Hackathon**: Meta OpenEnv Hackathon - Round 1
+**Deadline**: April 8th, 2025
+---
+## Executive Summary
+This document specifies an OpenEnv RL environment for intelligent meeting scheduling based on the BotBooked.ai production system. The environment teaches agents to optimize meeting time slot selection, handle cascading rescheduling, and learn multi-stakeholder preferences through reinforcement learning.
+**Key Features:**
+- Multi-component dense reward function with diverse scoring (0.0-1.0 range)
+- 3 difficulty-graded tasks (Easy → Medium → Hard)
+- Multi-step action space (propose → reschedule → finalize)
+- Ported from proven 30KB BotBooked scheduling algorithm
+- Real-world utility: Executive scheduling is a $10B+ industry problem
+---
+## 1. Problem Statement
+### 1.1 Real-World Context
+Meeting scheduling involves:
+- Finding time slots that work for multiple participants
+- Balancing individual preferences (preferred hours, buffer times, meeting limits)
+- Handling calendar conflicts through intelligent rescheduling
+- Optimizing for efficiency (minimal disruptions, quick solutions)
+Current solutions (Calendly, Google Calendar auto-scheduling) use heuristic algorithms. This environment enables RL agents to learn optimal scheduling strategies through trial and error.
+### 1.2 Environment Goals
+The agent must learn to:
+1. **Propose valid time slots** that satisfy hard constraints (working hours, availability)
+2. **Minimize preference violations** (back-to-back meetings, outside preferred hours, daily limits)
+3. **Handle cascading rescheduling** when conflicts exist
+4. **Balance competing objectives** (speed vs. quality, individual vs. group preferences)
+### 1.3 Hackathon Alignment
+| Requirement | How We Meet It |
+|-------------|----------------|
+| Real-world task | Executive scheduling (genuine $10B+ industry value) |
+| 3 tasks with graders | Easy/Medium/Hard scenarios with programmatic scoring (0.0-1.0) |
+| Meaningful rewards | Dense multi-component signal with partial progress tracking |
+| OpenEnv compliance | Pydantic models, step/reset/state API, openenv.yaml |
+| Baseline inference | inference.py using HF Router with OpenAI client |
+| Diverse scores | Multi-component formula produces scores that vary across trajectories |
+---
+## 2. Architecture
+### 2.1 High-Level System Design
+```
+┌─────────────────────────────────────────────────────────┐
+│                   OpenEnv HTTP Server                    │
+│  ┌───────────────────────────────────────────────────┐  │
+│  │  FastAPI App (create_app factory)                 │  │
+│  │    - POST /reset → Initialize episode             │  │
+│  │    - POST /step  → Execute action                 │  │
+│  │    - GET  /state → Get current state              │  │
+│  └───────────────────────────────────────────────────┘  │
+│                          ↓                              │
+│  ┌───────────────────────────────────────────────────┐  │
+│  │  SchedulingEnvironment                            │  │
+│  │    - reset(task_id, scenario) → Observation       │  │
+│  │    - step(action) → Observation                   │  │
+│  │    - state() → State                              │  │
+│  └───────────────────────────────────────────────────┘  │
+│                          ↓                              │
+│  ┌───────────────────────────────────────────────────┐  │
+│  │  BotBooked Core Logic (ported)                    │  │
+│  │    - find_earliest_slot()                         │  │
+│  │    - calculate_preference_score()                 │  │
+│  │    - check_conflicts()                            │  │
+│  │    - validate_constraints()                       │  │
+│  └───────────────────────────────────────────────────┘  │
+└─────────────────────────────────────────────────────────┘
+                          ↓
+┌─────────────────────────────────────────────────────────┐
+│                   Python Client                          │
+│  SchedulingEnv (EnvClient wrapper)                      │
+│    - async/sync support                                 │
+│    - Type-safe action/observation handling             │
+└─────────────────────────────────────────────────────────┘
+```
+### 2.2 Directory Structure
+```
+scheduling_env/
+├── __init__.py                  # Package exports
+├── models.py                    # Pydantic models (Action, Observation, State)
+├── client.py                    # HTTP client (EnvClient wrapper)
+├── openenv.yaml                 # OpenEnv metadata
+├── pyproject.toml               # Dependencies
+├── README.md                    # Documentation
+├── inference.py                 # Baseline inference script (ROOT)
+├── Dockerfile                   # Docker image (ROOT)
+├── .env.example                 # Environment variables template
+├── server/
+│   ├── __init__.py
+│   ├── app.py                   # FastAPI app factory
+│   ├── environment.py           # SchedulingEnvironment class
+│   ├── scheduling_logic.py      # Ported BotBooked functions
+│   ├── graders.py               # Reward calculation
+│   └── scenarios/
+│       ├── task1_easy.json      # Easy scenario definition
+│       ├── task2_medium.json    # Medium scenario definition
+│       └── task3_hard.json      # Hard scenario definition
+└── tests/
+    ├── test_environment.py      # Unit tests
+    └── test_graders.py          # Grader validation
+```
+---
+## 3. Data Models
+### 3.1 Action Model
+```python
+class SchedulingAction(Action):
+    """Agent's action in the scheduling environment"""
+    action_type: Literal["propose_slot", "reschedule_meeting", "finalize", "reject"]
+    # For propose_slot - agent suggests a time slot
+    proposed_start: Optional[str] = None  # ISO8601 datetime string
+    proposed_duration: Optional[int] = None  # minutes
+    # For reschedule_meeting - agent moves an existing meeting
+    meeting_id_to_move: Optional[str] = None
+    new_start_time: Optional[str] = None  # ISO8601 datetime string
+    # Metadata (inherited from Action base class)
+    metadata: Dict[str, Any] = Field(default_factory=dict)
+```
+**Action Types:**
+- `propose_slot`: Agent proposes a time slot for the new meeting
+- `reschedule_meeting`: Agent reschedules a conflicting lower-priority meeting to open up a slot
+- `finalize`: Agent confirms current schedule is optimal and completes episode
+- `reject`: Agent gives up (no valid slot found)
+**Validation Rules:**
+- `propose_slot` requires both `proposed_start` and `proposed_duration`
+- `reschedule_meeting` requires both `meeting_id_to_move` and `new_start_time`
+- `finalize` and `reject` require no additional parameters
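The validation rules above can be expressed as a small lookup table. The sketch below is dependency-free and works on a plain dict so it runs without Pydantic; `REQUIRED_FIELDS` and `validate_action` are hypothetical names, and in the actual model the same check could live in a Pydantic validator instead.

```python
from typing import Any, Dict, Optional, Tuple

# Fields each action type must provide (taken from the validation rules)
REQUIRED_FIELDS: Dict[str, Tuple[str, ...]] = {
    "propose_slot": ("proposed_start", "proposed_duration"),
    "reschedule_meeting": ("meeting_id_to_move", "new_start_time"),
    "finalize": (),
    "reject": (),
}


def validate_action(action: Dict[str, Any]) -> Optional[str]:
    """Return an error message if the action is malformed, else None."""
    action_type = action.get("action_type")
    if action_type not in REQUIRED_FIELDS:
        return f"Unknown action_type: {action_type!r}"
    missing = [name for name in REQUIRED_FIELDS[action_type] if action.get(name) is None]
    if missing:
        return f"{action_type} requires: {', '.join(missing)}"
    return None
```

Returning a message rather than raising fits the observation's `error_message` field, so an invalid action can be reported back to the agent instead of crashing the step.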
+### 3.2 Observation Model
+```python
+class SchedulingObservation(Observation):
+    """What agent sees after each step"""
+    # Meeting request details
+    requested_duration: int  # minutes
+    requested_priority: int  # 1=highest, 4=lowest
+    attendee_ids: List[str]  # e.g., ["user1", "user2"]
+    # Current calendar state (all attendees combined)
+    busy_slots: List[Dict[str, Any]]
+    # Format: [{start: ISO8601, end: ISO8601, priority: int, summary: str, attendee: str}]
+    # Working hours constraints (intersection of all attendees)
+    collective_work_hours: Dict[str, int]  # {min_start_hour: int, max_end_hour: int}
+    # Preference summary (aggregated from all attendees)
+    preference_constraints: Dict[str, Any]
+    # {max_meetings_per_day: int, requires_buffer: bool, buffer_minutes: int}
+    # Current proposal state
+    current_proposal: Optional[Dict[str, str]] = None  # {start: ISO8601, end: ISO8601}
+    conflicts: List[Dict[str, Any]] = []  # Meetings that conflict with current proposal
+    # Scoring metrics
+    preference_penalty: float = 0.0  # Current preference violation score
+    num_rescheduled: int = 0  # How many meetings moved so far
+    # Episode state
+    steps_taken: int
+    max_steps: int = 20  # Episode limit
+    # Status flags
+    success: bool = False  # Slot found and validated
+    error_message: Optional[str] = None
+    # Standard OpenEnv fields (inherited)
+    done: bool = False
+    reward: float = 0.0
+    metadata: Dict[str, Any] = Field(default_factory=dict)
+```
+**Design Rationale:**
+- Agent sees full calendar state (all busy slots across attendees)
+- Preferences are aggregated to collective constraints
+- Current proposal and conflicts help agent track progress
+- Error messages provide feedback on invalid actions
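The `collective_work_hours` field is described as the intersection of all attendees' working hours. A minimal sketch of that intersection follows, assuming each attendee's hours arrive as a `{"start": int, "end": int}` mapping in whole hours; that input shape and the function name are assumptions made for illustration.

```python
from typing import Dict, Optional


def collective_work_hours(work_hours: Dict[str, Dict[str, int]]) -> Optional[Dict[str, int]]:
    """Intersect per-attendee working hours; return None if there is no common window."""
    if not work_hours:
        return None
    # The shared window opens at the latest start and closes at the earliest end
    latest_start = max(hours["start"] for hours in work_hours.values())
    earliest_end = min(hours["end"] for hours in work_hours.values())
    if latest_start >= earliest_end:
        return None
    return {"min_start_hour": latest_start, "max_end_hour": earliest_end}
```

With one attendee working 9 to 17 and another 10 to 18, the collective window is 10 to 17.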
+### 3.3 State Model
+```python
+class SchedulingState(State):
+    """Internal environment state"""
+    # Standard fields (inherited from OpenEnv State)
+    episode_id: str  # Unique UUID per episode
+    step_count: int  # Number of steps taken
+    # Task info
+    task_id: str  # e.g., "task1_easy", "task2_medium", "task3_hard"
+    scenario_name: str  # Human-readable name
+    # Meeting request
+    meeting_request: Dict[str, Any]
+    # {duration: int, priority: int, attendees: List[str], summary: str}
+    # Calendar storage (BotBooked format)
+    calendars: Dict[str, List[Tuple[datetime, datetime, int, str]]]
+    # {user_id: [(start, end, priority, summary), ...]}
+    # Preferences (BotBooked format)
+    participant_preferences: Dict[str, Dict[str, Any]]
+    # {user_id: {preferred_hours: {start: int, end: int}, max_meetings_per_day: int,
+    #             avoid_back_to_back: bool, buffer_minutes: int}}
+    # Tracking
+    proposed_slot: Optional[Tuple[datetime, datetime]] = None
+    rescheduled_meetings: List[Dict[str, Any]] = []
+    # [{meeting_id: str, old_start: datetime, new_start: datetime, attendee: str}]
+    # Performance metrics
+    total_preference_penalty: float = 0.0
+    total_steps: int = 0
+    final_reward: float = 0.0
+    completed: bool = False
+```
+**Design Rationale:**
+- Maintains BotBooked data format (minimal translation layer)
+- Tracks rescheduling history for reward calculation
+- Stores proposed slot for validation across steps
+---
+## 4. Episode Flow
+### 4.1 Episode Lifecycle
+```
+┌─────────────────────────────────────────────────────────┐
+│ RESET                                                    │
+├─────────────────────────────────────────────────────────┤
+│ 1. Load scenario JSON (task1/task2/task3)               │
+│ 2. Initialize calendars with existing meetings          │
+│ 3. Load participant preferences                         │
+│ 4. Generate meeting request                             │
+│ 5. Calculate collective working hours                   │
+│ 6. Return initial observation (done=False, reward=0.0)  │
+└─────────────────────────────────────────────────────────┘
+                        ↓
+┌─────────────────────────────────────────────────────────┐
+│ STEP LOOP (max 20 steps)                                │
+├─────────────────────────────────────────────────────────┤
+│ Agent submits action → Environment processes → Returns  │
+│                                                          │
+│ Action Processing:                                       │
+│  • propose_slot: Validate time, check conflicts, score  │
+│  • reschedule_meeting: Move meeting, update calendars   │
+│  • finalize: Calculate final reward, end episode        │
+│  • reject: End episode with failure (reward=0.0)        │
+│                                                          │
+│ Returns: Observation(reward, done, success, conflicts)  │
+└─────────────────────────────────────────────────────────┘
+                        ↓
+┌─────────────────────────────────────────────────────────┐
+│ EPISODE END                                              │
+├─────────────────────────────────────────────────────────┤
+│ Termination Conditions:                                  │
+│  ✓ Agent calls "finalize" with valid schedule           │
+│  ✓ Agent calls "reject" (failure)                       │
+│  ✓ Max steps reached (20 steps timeout)                 │
+│  ✓ Hard constraint violated                             │
+│                                                          │
+│ Final reward: calculate_final_reward() → [0.0, 1.0]    │
+└─────────────────────────────────────────────────────────┘
+```
+### 4.2 Action Processing
+#### 4.2.1 propose_slot
+```python
+def _process_propose_slot(action: SchedulingAction) -> SchedulingObservation:
+    """
+    Agent proposes a time slot for the meeting
+    Steps:
+    1. Parse proposed_start and calculate proposed_end
+    2. Validate slot is within collective working hours
+    3. Find conflicts with existing meetings
+    4. Calculate preference penalty score
+    5. Update state with proposal
+    6. Return observation with step reward
+    Step Rewards:
+    - +0.5: No conflicts, low preference penalty (<100)
+    - +0.2: Conflicts exist, but all are lower priority (reschedulable)
+    -  0.0: No conflicts, but high preference penalty (>=100)
+    - -0.3: Conflicts with an equal or higher priority meeting (cannot reschedule)
+    - -0.2: Outside working hours (hard constraint violation)
+    """
+    start_time = parse_iso8601(action.proposed_start)
+    end_time = start_time + timedelta(minutes=action.proposed_duration)
+    # Validate working hours
+    if not within_collective_hours(start_time, end_time, collective_work_hours):
+        return SchedulingObservation(
+            error_message="Proposed slot outside working hours",
+            reward=-0.2,
+            done=False
+        )
+    # Find conflicts
+    conflicts = []
+    for attendee in attendee_ids:
+        # Calendar entries are (start, end, priority, summary) tuples (see SchedulingState)
+        for m_start, m_end, m_priority, m_summary in calendars[attendee]:
+            if overlaps(start_time, end_time, m_start, m_end):
+                conflicts.append({
+                    'attendee': attendee,
+                    'start': m_start,
+                    'end': m_end,
+                    'priority': m_priority,
+                    'summary': m_summary,
+                    'meeting_id': f"{attendee}_{m_start.isoformat()}"
+                })
+    # Calculate preference penalty
+    preference_penalty = calculate_preference_score(
+        start_time,
+        action.proposed_duration,
+        participant_preferences
+    )
+    # Update state
+    state.proposed_slot = (start_time, end_time)
+    state.total_preference_penalty = preference_penalty
+    state.conflicts = conflicts  # reschedule_meeting looks meetings up in this list
+    # Calculate step reward
+    if len(conflicts) == 0 and preference_penalty < 100:
+        step_reward = 0.5  # Perfect slot
+    elif len(conflicts) > 0:
+        if all(c['priority'] > requested_priority for c in conflicts):
+            step_reward = 0.2  # Reschedulable conflicts
+        else:
+            step_reward = -0.3  # Cannot reschedule (priority violation)
+    else:
+        step_reward = 0.0  # Free slot but high preference penalty
+    return SchedulingObservation(
+        current_proposal={'start': start_time.isoformat(), 'end': end_time.isoformat()},
+        conflicts=conflicts,
+        preference_penalty=preference_penalty,
+        reward=step_reward,
+        done=False
+    )
+```
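The conflict scan above relies on an `overlaps` helper that this spec does not define. A minimal sketch, assuming half-open intervals `[start, end)` so that a meeting ending at 10:00 does not clash with one starting at 10:00:

```python
from datetime import datetime


def overlaps(a_start: datetime, a_end: datetime,
             b_start: datetime, b_end: datetime) -> bool:
    """Return True when [a_start, a_end) and [b_start, b_end) intersect."""
    # Two half-open intervals intersect iff each one starts before the other ends.
    return a_start < b_end and b_start < a_end
```

Treating touching intervals as non-overlapping keeps back-to-back meetings legal at this layer; the buffer preference is then handled separately by the preference score.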
+#### 4.2.2 reschedule_meeting
+```python
+def _process_reschedule_meeting(action: SchedulingAction) -> SchedulingObservation:
+    """
+    Agent reschedules a conflicting meeting to a new time
+    Steps:
+    1. Validate meeting_id exists and is in conflict list
+    2. Check priority (can only reschedule lower priority)
+    3. Validate new time slot is free for that attendee
+    4. Remove old meeting from calendar
+    5. Add meeting at new time
+    6. Update rescheduled_meetings list
+    7. Recalculate conflicts for current proposal
+    8. Return observation with step reward
+    Step Rewards:
+    - +0.5: Successful reschedule and all conflicts now resolved
+    - +0.3: Successful reschedule but conflicts remain
+    - -0.2: New slot not free or invalid
+    - -0.5: Attempted to reschedule higher priority meeting
+    """
+    # Find meeting
+    meeting = find_meeting_by_id(action.meeting_id_to_move, state.conflicts)
+    if not meeting:
+        return SchedulingObservation(
+            error_message="Invalid meeting_id or not in conflict list",
+            reward=-0.2,
+            done=False
+        )
+    # Check priority
+    if meeting['priority'] <= state.meeting_request['priority']:
+        return SchedulingObservation(
+            error_message="Cannot reschedule higher or equal priority meeting",
+            reward=-0.5,
+            done=False
+        )
+    # Validate new slot
+    new_start = parse_iso8601(action.new_start_time)
+    # total_seconds() covers the whole span; .seconds would drop any full days
+    meeting_duration = int((meeting['end'] - meeting['start']).total_seconds() // 60)
+    new_end = new_start + timedelta(minutes=meeting_duration)
+    if not is_slot_free(meeting['attendee'], new_start, new_end, calendars):
+        return SchedulingObservation(
+            error_message="New slot not free",
+            reward=-0.2,
+            done=False
+        )
+    # Update calendar
+    remove_meeting(calendars[meeting['attendee']], meeting['start'])
+    add_meeting(
+        calendars[meeting['attendee']],
+        new_start,
+        new_end,
+        meeting['priority'],
+        meeting['summary']
+    )
+    # Track rescheduling
+    state.rescheduled_meetings.append({
+        'meeting_id': action.meeting_id_to_move,
+        'old_start': meeting['start'].isoformat(),
+        'new_start': new_start.isoformat(),
+        'attendee': meeting['attendee']
+    })
+    state.num_rescheduled += 1
+    # Recalculate conflicts
+    new_conflicts = find_conflicts(
+        calendars,
+        state.proposed_slot[0],
+        state.proposed_slot[1],
+        attendee_ids
+    )
+    # Step reward
+    if len(new_conflicts) == 0:
+        step_reward = 0.5  # All conflicts resolved!
+    else:
+        step_reward = 0.3  # Progress made
+    return SchedulingObservation(
+        conflicts=new_conflicts,
+        num_rescheduled=state.num_rescheduled,
+        reward=step_reward,
+        done=False
+    )
+```
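`is_slot_free` is called above but not defined in this spec. One possible sketch, assuming each calendar entry is a `(start, end, priority, summary)` tuple (the BotBooked format shown in section 8.1) and that touching intervals do not count as a clash:

```python
from datetime import datetime
from typing import Dict, List, Tuple

# Assumed calendar entry layout: (start, end, priority, summary)
Meeting = Tuple[datetime, datetime, int, str]


def is_slot_free(attendee: str, start: datetime, end: datetime,
                 calendars: Dict[str, List[Meeting]]) -> bool:
    """Return True if the attendee has no meeting intersecting [start, end)."""
    for m_start, m_end, _priority, _summary in calendars.get(attendee, []):
        # Half-open overlap test: each interval starts before the other ends.
        if start < m_end and m_start < end:
            return False
    return True
```

An attendee with no calendar entry is treated as fully free here; a stricter environment might prefer to raise an error for unknown attendees.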
+#### 4.2.3 finalize
+```python
+def _process_finalize(action: SchedulingAction) -> SchedulingObservation:
+    """
+    Agent confirms schedule is optimal and ends episode
+    Steps:
+    1. Validate proposed_slot exists
+    2. Validate no unresolved conflicts
+    3. Calculate final reward
+    4. Mark episode as completed
+    5. Return observation with done=True
+    Final Reward: calculate_final_reward(preference_penalty, num_rescheduled, steps)
+    """
+    # Validate state
+    if state.proposed_slot is None:
+        return SchedulingObservation(
+            error_message="No slot proposed",
+            success=False,
+            reward=-0.5,
+            done=True
+        )
+    # Check for unresolved conflicts
+    current_conflicts = find_conflicts(
+        calendars,
+        state.proposed_slot[0],
+        state.proposed_slot[1],
+        attendee_ids
+    )
+    if len(current_conflicts) > 0:
+        return SchedulingObservation(
+            error_message=f"Unresolved conflicts: {len(current_conflicts)} meetings",
+            conflicts=current_conflicts,
+            success=False,
+            reward=-0.3,
+            done=True
+        )
+    # Calculate final reward
+    final_reward = calculate_final_reward(
+        preference_penalty=state.total_preference_penalty,
+        num_rescheduled=state.num_rescheduled,
+        steps_taken=state.step_count
+    )
+    # Update state
+    state.completed = True
+    state.final_reward = final_reward
+    return SchedulingObservation(
+        success=True,
+        reward=final_reward,
+        done=True,
+        metadata={'final_slot': state.proposed_slot}
+    )
+```
+#### 4.2.4 reject
+```python
+def _process_reject(action: SchedulingAction) -> SchedulingObservation:
+    """
+    Agent gives up on finding a valid schedule
+    Returns: Observation with done=True, reward=0.0, success=False
+    """
+    return SchedulingObservation(
+        success=False,
+        reward=0.0,
+        done=True,
+        error_message="Agent rejected scheduling task"
+    )
+```
+### 4.3 Episode Termination
+| Condition | done=True | Final Reward | Success |
+|-----------|-----------|--------------|---------|
+| Agent calls `finalize` with valid schedule | ✅ | calculate_final_reward() | ✅ |
+| Agent calls `finalize` with conflicts | ✅ | -0.3 | ❌ |
+| Agent calls `finalize` with no proposed slot | ✅ | -0.5 | ❌ |
+| Agent calls `reject` | ✅ | 0.0 | ❌ |
+| Max steps reached (20) | ✅ | partial_credit() | ❌ |
+| Priority violation (reschedule higher priority) | ✅ | -0.5 | ❌ |
+**IMPORTANT - Partial Credit on Timeout**:
+To meet the hackathon requirement to "reward partial progress," the environment gives partial credit when the maximum step count is reached:
+```python
+def _handle_timeout(state: SchedulingState) -> SchedulingObservation:
+    """Give partial credit if agent made progress before timeout"""
+    # No proposal at all - complete failure
+    if state.proposed_slot is None:
+        return SchedulingObservation(
+            success=False,
+            reward=0.0,
+            done=True,
+            error_message="Timeout: No slot proposed"
+        )
+    # Has proposal - check if it's valid
+    conflicts = find_conflicts(
+        state.calendars,
+        state.proposed_slot[0],
+        state.proposed_slot[1],
+        state.attendee_ids
+    )
+    if len(conflicts) == 0:
+        # Valid slot found, just didn't finalize in time
+        # Give 70% of what final score would have been
+        theoretical_score = calculate_final_reward(
+            state.total_preference_penalty,
+            state.num_rescheduled,
+            state.step_count
+        )
+        partial_reward = theoretical_score * 0.7
+    else:
+        # Made progress but still has conflicts
+        # Give credit based on how close to solution
+        # Clamp at 0.0: one attendee can have several conflicts, so the ratio may exceed 1
+        progress = max(0.0, 1.0 - (len(conflicts) / max(1, len(state.attendee_ids))))
+        partial_reward = 0.2 * progress
+    return SchedulingObservation(
+        success=False,  # Technically failed (timeout)
+        reward=partial_reward,
+        done=True,
+        error_message=f"Timeout after {state.step_count} steps (partial credit: {partial_reward:.2f})"
+    )
+```
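The two timeout branches can be reduced to a small pure function for a quick sanity check. This is a sketch only: `timeout_partial_credit` is a hypothetical name, and the clamp at zero is an added safeguard for the case where conflicts outnumber attendees.

```python
def timeout_partial_credit(num_conflicts: int, num_attendees: int,
                           theoretical_score: float) -> float:
    """Partial credit awarded when the step limit is reached."""
    if num_conflicts == 0:
        # Valid slot that was never finalized: 70% of the would-be final score.
        return 0.7 * theoretical_score
    # Unresolved conflicts: at most 0.2, scaled by how few conflicts remain.
    progress = max(0.0, 1.0 - num_conflicts / max(1, num_attendees))
    return 0.2 * progress
```

For example, a conflict-free proposal with a theoretical score of 0.8 earns 0.56 on timeout, while one remaining conflict among four attendees earns 0.15.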
+---
+## 5. Reward Function
+### 5.1 Multi-Component Formula
+```python
+def calculate_final_reward(
+    preference_penalty: float,
+    num_rescheduled: int,
+    steps_taken: int,
+    success: bool = True
+) -> float:
+    """
+    Calculate final episode reward (clamped to [0.0, 1.0])
+    Components (NON-LINEAR to prevent reward hacking):
+    1. Base success: Start at 1.0
+    2. Preference penalty: Non-linear scaling (BotBooked scoring: 0=perfect, 50=minor, 100+=severe)
+    3. Efficiency penalty: EXPONENTIAL in meetings rescheduled (1 → -0.09, 2 → -0.16, 3 → -0.29; capped at -0.30)
+    4. Time penalty: -0.015 per step taken
+    Returns: float in [0.0, 1.0] range
+    ANTI-REWARD-HACKING DESIGN:
+    - Preference penalty uses power scaling to make violations hurt more
+    - Rescheduling penalty is exponential (discourages cascading rescheduling)
+    - Time penalty increased from 0.01 to 0.015 (max penalty 0.30 at 20 steps)
+    """
+    if not success:
+        return 0.0
+    reward = 1.0
+    # Component 1: Preference penalty with power scaling
+    # 0-50 points → 0.00 to ≈0.55 deduction
+    # 50-65 points → ≈0.55 to 0.75 deduction
+    # above ≈65 points → capped at 0.75 (severe violations)
+    preference_deduction = min(0.75, (preference_penalty ** 1.2) / 200.0)
+    reward -= preference_deduction
+    # Component 2: EXPONENTIAL rescheduling penalty
+    # Prevents agents from over-rescheduling as a lazy strategy
+    if num_rescheduled > 0:
+        rescheduling_deduction = 0.05 * (1.8 ** num_rescheduled)
+        reward -= min(0.30, rescheduling_deduction)
+    # Component 3: Time penalty (encourage efficiency)
+    time_deduction = steps_taken * 0.015
+    reward -= time_deduction
+    # Clamp to valid range
+    return max(0.0, min(1.0, reward))
+```
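To see how the three components trade off, it can help to expose each deduction separately. The sketch below re-implements the same arithmetic as `calculate_final_reward`; the name `reward_breakdown` and the returned dictionary keys are illustrative, not part of the spec.

```python
def reward_breakdown(preference_penalty: float, num_rescheduled: int,
                     steps_taken: int) -> dict:
    """Return each deduction and the clamped total for a successful episode."""
    # Power-scaled preference deduction, capped at 0.75.
    preference = min(0.75, (preference_penalty ** 1.2) / 200.0)
    # Exponential rescheduling deduction, capped at 0.30; zero if nothing moved.
    rescheduling = 0.0
    if num_rescheduled > 0:
        rescheduling = min(0.30, 0.05 * (1.8 ** num_rescheduled))
    # Linear time cost per step.
    time_cost = steps_taken * 0.015
    total = max(0.0, min(1.0, 1.0 - preference - rescheduling - time_cost))
    return {
        "preference": preference,
        "rescheduling": rescheduling,
        "time": time_cost,
        "total": total,
    }
```

Because of the 1.2 exponent, the preference deduction reaches its 0.75 cap a little above 65 penalty points, so a single "outside preferred hours" violation (50 points) already costs about 0.55.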
+### 5.2 Preference Penalty Calculation
+```python
+def calculate_preference_score(
+    proposed_start: datetime,
+    duration: int,
+    participant_preferences: Dict[str, Dict]
+) -> float:
+    """
+    Calculate penalty points for preference violations (ported from BotBooked)
+    Violations per participant:
+    - Outside preferred hours: +50 points
+    - Exceeds max meetings per day: +30 points
+    - Back-to-back without buffer: +20 points
+    Returns: Sum of all penalties across participants
+    """
+    total_penalty = 0.0
+    proposed_end = proposed_start + timedelta(minutes=duration)
+    for user_id, prefs in participant_preferences.items():
+        user_penalty = 0.0
+        # Violation 1: Outside preferred hours
+        pref_start = prefs.get('preferred_hours', {}).get('start', 9)
+        pref_end = prefs.get('preferred_hours', {}).get('end', 17)
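+        # Compare fractional hours so a meeting ending at 17:30 counts as past a 17:00 limit
+        end_hour = proposed_end.hour + proposed_end.minute / 60
+        if proposed_start.hour < pref_start or end_hour > pref_end: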
+            user_penalty += 50
+        # Violation 2: Exceeds max meetings per day
+        max_meetings = prefs.get('max_meetings_per_day', 999)
+        meetings_on_day = count_meetings_on_date(
+            calendars[user_id],
+            proposed_start.date()
+        )
+        if meetings_on_day >= max_meetings:
+            user_penalty += 30
+        # Violation 3: Back-to-back without buffer
+        avoid_btb = prefs.get('avoid_back_to_back', False)
+        buffer_min = prefs.get('buffer_minutes', 0)
+        if avoid_btb and buffer_min > 0:
+            has_violation = check_back_to_back(
+                calendars[user_id],
+                proposed_start,
+                proposed_end,
+                buffer_min
+            )
+            if has_violation:
+                user_penalty += 20
+        total_penalty += user_penalty
+    return total_penalty
+```
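`check_back_to_back` is listed as a BotBooked helper, but its body is not shown here. A minimal sketch of what such a check could look like, assuming `(start, end, priority, summary)` calendar tuples and treating any gap shorter than the buffer as a violation:

```python
from datetime import datetime, timedelta
from typing import List, Tuple

# Assumed calendar entry layout: (start, end, priority, summary)
Meeting = Tuple[datetime, datetime, int, str]


def check_back_to_back(calendar: List[Meeting], start: datetime,
                       end: datetime, buffer_minutes: int) -> bool:
    """Return True if a meeting sits closer than the buffer to [start, end)."""
    buffer = timedelta(minutes=buffer_minutes)
    zero = timedelta(0)
    for m_start, m_end, _priority, _summary in calendar:
        gap_before = start - m_end  # existing meeting ends before the slot
        gap_after = m_start - end   # existing meeting starts after the slot
        if zero <= gap_before < buffer or zero <= gap_after < buffer:
            return True
    return False
```

Overlapping meetings produce negative gaps and are deliberately ignored here, since conflicts are detected separately by the proposal logic.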
+### 5.3 Step Rewards (Dense Signal)
+| Action Result | Step Reward | Reasoning |
+|---------------|-------------|-----------|
+| propose_slot: no conflicts, penalty < 100 | +0.5 | Perfect slot found |
+| propose_slot: no conflicts, penalty ≥ 100 | 0.0 | Free slot, poor preference fit |
+| propose_slot: conflicts but reschedulable | +0.2 | Valid proposal |
+| propose_slot: conflicts with higher or equal priority | -0.3 | Invalid choice |
+| propose_slot: outside work hours | -0.2 | Hard constraint violation |
+| reschedule_meeting: all conflicts resolved | +0.5 | Major progress |
+| reschedule_meeting: success, conflicts remain | +0.3 | Incremental progress |
+| reschedule_meeting: new slot not free | -0.2 | Failed attempt |
+| reschedule_meeting: priority violation | -0.5 | Rule violation |
+| finalize: valid schedule | final_reward | Success |
+| finalize: unresolved conflicts | -0.3 | Premature |
+| reject | 0.0 | Gave up |
+### 5.4 Score Examples
+#### Task 1 (Easy) - Expected 0.90-0.98
+```
+Scenario: 2 attendees, sparse calendars, loose preferences
+Agent Trajectory:
+  Step 1: propose_slot(10:00 AM, 30 min)
+    → No conflicts, preference_penalty=0
+    → reward=+0.5
+  Step 2: finalize()
+    → final_reward = 1.0 - (0^1.2)/200 - 0 - 2*0.015
+    → final_reward = 1.0 - 0.0 - 0.0 - 0.03
+    → final_reward = 0.97
+Score: 0.97 ✅
+```
+#### Task 2 (Medium) - Expected 0.55-0.70
+```
+Scenario: 4 attendees, moderate density, strict preferences
+Agent Trajectory:
+  Step 1: propose_slot(2:00 PM, 60 min)
+    → 1 conflict (priority 4), preference_penalty=50
+    → reward=+0.2
+  Step 2: reschedule_meeting(conflict_id, 4:00 PM)
+    → Success, no more conflicts
+    → reward=+0.5
+  Step 3: finalize()
+    → final_reward = 1.0 - (50^1.2)/200 - 0.05*(1.8^1) - 3*0.015
+    → final_reward = 1.0 - 0.547 - 0.09 - 0.045
+    → final_reward = 0.318
+Score: 0.32 ❌ (Below the expected 0.55-0.70; lower the scenario's preference penalty or soften the power scaling)
+```
+#### Task 3 (Hard) - Expected 0.25-0.45
+```
+Scenario: 6 attendees, dense calendars, conflicting preferences
+Agent Trajectory:
+  Step 1: propose_slot(11:00 AM, 45 min)
+    → 3 conflicts, preference_penalty=120
+    → reward=+0.2
+  Step 2-4: reschedule 3 meetings
+    → rewards: +0.3, +0.3, +0.5
+  Step 5: finalize()
+    → final_reward = 1.0 - min(0.75, (120^1.2)/200) - 0.05*(1.8^3) - 5*0.015
+    → final_reward = 1.0 - 0.75 - 0.29 - 0.075
+    → final_reward = max(0.0, -0.115) = 0.0
+Score: 0.0 ❌ (Too harsh! Adjust scenario or reduce penalties slightly)
+CORRECTED Task 3 Trajectory (with preference_penalty=80):
+  → final_reward = 1.0 - min(0.75, (80^1.2)/200) - 0.05*(1.8^3) - 5*0.015
+  → final_reward = 1.0 - 0.75 - 0.29 - 0.075
+  → final_reward = max(0.0, -0.115) = 0.0
+Score: 0.0 ❌ (Still clamped to zero: the preference deduction hits its 0.75 cap above about 65 points, so penalties must drop further to reach 0.25-0.45)
+```
+---
+## 6. Task Scenarios
+### 6.1 Task 1: EASY - "Simple Team Sync"
+**Description**: Schedule a 30-minute team sync with 2 attendees who have sparse calendars.
+**Scenario JSON**: `server/scenarios/task1_easy.json`
+```json
+{
+  "task_id": "task1_easy",
+  "description": "Schedule a 30-minute team sync with 2 attendees",
+  "difficulty": "easy",
+  "meeting_request": {
+    "duration": 30,
+    "priority": 3,
+    "attendees": ["user1", "user2"],
+    "summary": "Team Sync"
+  },
+  "calendars": {
+    "user1": [
+      ["2025-04-07T09:00:00+00:00", "2025-04-07T10:00:00+00:00", 2, "Morning standup"],
+      ["2025-04-07T14:00:00+00:00", "2025-04-07T15:00:00+00:00", 3, "Client call"]
+    ],
+    "user2": [
+      ["2025-04-07T11:00:00+00:00", "2025-04-07T12:00:00+00:00", 2, "Team meeting"],
+      ["2025-04-07T15:00:00+00:00", "2025-04-07T16:00:00+00:00", 3, "1-on-1"]
+    ]
+  },
+  "preferences": {
+    "user1": {
+      "preferred_hours": {"start": 9, "end": 17},
+      "max_meetings_per_day": 6,
+      "avoid_back_to_back": false,
+      "buffer_minutes": 0
+    },
+    "user2": {
+      "preferred_hours": {"start": 9, "end": 17},
+      "max_meetings_per_day": 6,
+      "avoid_back_to_back": false,
+      "buffer_minutes": 0
+    }
+  },
+  "expected_solution": {
+    "optimal_slot": "2025-04-07T10:00:00+00:00",
+    "expected_score_range": [0.8, 1.0],
+    "min_steps": 2,
+    "requires_rescheduling": false
+  }
+}
+```
+**Characteristics:**
+- 2 attendees (low coordination complexity)
+- Sparse calendars (2 meetings each)
+- Loose preferences (no back-to-back rules, wide hours)
+- Multiple free slots available
+- No rescheduling required
+**Grading:**
+- ✅ 0.8-1.0: Agent finds free slot in 2-4 steps
+- ⚠️ 0.5-0.8: Agent finds slot but inefficient (many steps)
+- ❌ 0.0-0.5: Agent fails or violates constraints
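The claim that Task 1 has a free slot without any rescheduling can be checked with a brute-force scan. The sketch below uses a hypothetical helper (`earliest_free_slot` is not part of the spec), assumes a 09:00-17:00 working window and 30-minute candidate steps, and hard-codes the Task 1 busy intervals from the scenario JSON:

```python
from datetime import datetime, timedelta

# Busy intervals copied from the Task 1 scenario (priority and summary omitted).
TASK1_BUSY = {
    "user1": [("2025-04-07T09:00:00+00:00", "2025-04-07T10:00:00+00:00"),
              ("2025-04-07T14:00:00+00:00", "2025-04-07T15:00:00+00:00")],
    "user2": [("2025-04-07T11:00:00+00:00", "2025-04-07T12:00:00+00:00"),
              ("2025-04-07T15:00:00+00:00", "2025-04-07T16:00:00+00:00")],
}


def earliest_free_slot(busy_by_user, day_start, day_end,
                       duration_min, step_min=30):
    """Return the first start time at which every attendee is free, or None."""
    busy = [(datetime.fromisoformat(s), datetime.fromisoformat(e))
            for meetings in busy_by_user.values() for s, e in meetings]
    duration = timedelta(minutes=duration_min)
    candidate = day_start
    while candidate + duration <= day_end:
        slot_end = candidate + duration
        # Half-open overlap test against every attendee's meetings.
        if not any(candidate < e and s < slot_end for s, e in busy):
            return candidate
        candidate += timedelta(minutes=step_min)
    return None


slot = earliest_free_slot(
    TASK1_BUSY,
    datetime.fromisoformat("2025-04-07T09:00:00+00:00"),
    datetime.fromisoformat("2025-04-07T17:00:00+00:00"),
    duration_min=30,
)
print(slot.isoformat())
```

On this data the scan lands on 10:00, which matches the `optimal_slot` recorded in the scenario's `expected_solution`.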
+### 6.2 Task 2: MEDIUM - "Cross-Team Planning"
+**Description**: Schedule a 60-minute planning session with 4 attendees with moderate calendar density.
+**Scenario JSON**: `server/scenarios/task2_medium.json`
+```json
+{
+  "task_id": "task2_medium",
+  "description": "Schedule a 60-minute planning session with 4 attendees",
+  "difficulty": "medium",
+  "meeting_request": {
+    "duration": 60,
+    "priority": 2,
+    "attendees": ["user1", "user2", "user3", "user4"],
+    "summary": "Cross-Team Planning"
+  },
+  "calendars": {
+    "user1": [
+      ["2025-04-07T09:00:00+00:00", "2025-04-07T10:00:00+00:00", 2, "Standup"],
+      ["2025-04-07T10:30:00+00:00", "2025-04-07T11:30:00+00:00", 3, "Review"],
+      ["2025-04-07T13:00:00+00:00", "2025-04-07T14:00:00+00:00", 3, "Lunch meeting"],
+      ["2025-04-07T15:00:00+00:00", "2025-04-07T16:00:00+00:00", 4, "Optional workshop"],
+      ["2025-04-07T16:30:00+00:00", "2025-04-07T17:00:00+00:00", 3, "Sync"]
+    ],
+    "user2": [
+      ["2025-04-07T09:00:00+00:00", "2025-04-07T10:00:00+00:00", 2, "Standup"],
+      ["2025-04-07T11:00:00+00:00", "2025-04-07T12:00:00+00:00", 2, "Client demo"],
+      ["2025-04-07T14:00:00+00:00", "2025-04-07T15:00:00+00:00", 3, "Code review"],
+      ["2025-04-07T16:00:00+00:00", "2025-04-07T17:00:00+00:00", 3, "Office hours"]
+    ],
+    "user3": [
+      ["2025-04-07T09:30:00+00:00", "2025-04-07T10:30:00+00:00", 3, "Design review"],
+      ["2025-04-07T12:00:00+00:00", "2025-04-07T13:00:00+00:00", 3, "Team lunch"],
+      ["2025-04-07T14:00:00+00:00", "2025-04-07T15:30:00+00:00", 2, "Sprint planning"],
+      ["2025-04-07T16:00:00+00:00", "2025-04-07T16:30:00+00:00", 4, "Coffee chat"]
+    ],
+    "user4": [
+      ["2025-04-07T10:00:00+00:00", "2025-04-07T11:00:00+00:00", 2, "Strategy meeting"],
+      ["2025-04-07T13:00:00+00:00", "2025-04-07T14:00:00+00:00", 3, "1-on-1"],
+      ["2025-04-07T15:00:00+00:00", "2025-04-07T16:00:00+00:00", 3, "Team sync"]
+    ]
+  },
+  "preferences": {
+    "user1": {
+      "preferred_hours": {"start": 10, "end": 16},
+      "max_meetings_per_day": 5,
+      "avoid_back_to_back": true,
+      "buffer_minutes": 15
+    },
+    "user2": {
+      "preferred_hours": {"start": 9, "end": 17},
+      "max_meetings_per_day": 4,
+      "avoid_back_to_back": true,
+      "buffer_minutes": 10
+    },
+    "user3": {
+      "preferred_hours": {"start": 9, "end": 15},
+      "max_meetings_per_day": 5,
+      "avoid_back_to_back": false,
+      "buffer_minutes": 0
+    },
+    "user4": {
+      "preferred_hours": {"start": 10, "end": 17},
+      "max_meetings_per_day": 6,
+      "avoid_back_to_back": true,
+      "buffer_minutes": 15
+    }
+  },
+  "expected_solution": {
+    "optimal_slot": "2025-04-07T11:00:00+00:00",
+    "expected_score_range": [0.5, 0.7],
+    "min_steps": 3,
+    "requires_rescheduling": true,
+    "reschedulable_meetings": ["user3:Coffee chat (priority 4)"]
+  }
+}
+```
+**Characteristics:**
+- 4 attendees (moderate coordination)
+- Moderate calendar density (3-5 meetings each)
+- Conflicting preferences (narrow vs. wide hours)
+- Back-to-back avoidance rules
+- Requires 1 rescheduling
+**Grading:**
+- ✅ 0.6-0.7: Efficient rescheduling, respects preferences
+- ⚠️ 0.5-0.6: Valid solution with preference violations
+- ❌ 0.0-0.5: Excessive rescheduling or failure
+### 6.3 Task 3: HARD - "Executive Scheduling"
+**Description**: Schedule a 45-minute executive meeting with 6 attendees with very dense calendars.
+**Scenario JSON**: `server/scenarios/task3_hard.json`
+```json
+{
+  "task_id": "task3_hard",
+  "description": "Schedule a 45-minute executive meeting with 6 attendees",
+  "difficulty": "hard",
+  "meeting_request": {
+    "duration": 45,
+    "priority": 2,
+    "attendees": ["user1", "user2", "user3", "user4", "user5", "user6"],
+    "summary": "Executive Planning Session"
+  },
+  "calendars": {
+    "user1": [
+      ["2025-04-07T09:00:00+00:00", "2025-04-07T10:00:00+00:00", 2, "Strategy meeting"],
+      ["2025-04-07T10:30:00+00:00", "2025-04-07T11:30:00+00:00", 3, "Team standup"],
+      ["2025-04-07T12:00:00+00:00", "2025-04-07T13:00:00+00:00", 3, "Lunch meeting"],
+      ["2025-04-07T13:30:00+00:00", "2025-04-07T14:30:00+00:00", 2, "Client call"],
+      ["2025-04-07T15:00:00+00:00", "2025-04-07T15:45:00+00:00", 4, "Optional training"],
+      ["2025-04-07T16:00:00+00:00", "2025-04-07T17:00:00+00:00", 3, "Project sync"]
+    ],
+    "user2": [
+      ["2025-04-07T09:00:00+00:00", "2025-04-07T09:30:00+00:00", 2, "Morning sync"],
+      ["2025-04-07T10:00:00+00:00", "2025-04-07T11:00:00+00:00", 2, "Design review"],
+      ["2025-04-07T11:30:00+00:00", "2025-04-07T12:30:00+00:00", 3, "Code review"],
+      ["2025-04-07T13:00:00+00:00", "2025-04-07T14:00:00+00:00", 3, "1-on-1"],
+      ["2025-04-07T14:30:00+00:00", "2025-04-07T15:30:00+00:00", 2, "Planning meeting"],
+      ["2025-04-07T16:00:00+00:00", "2025-04-07T16:45:00+00:00", 4, "Coffee chat"]
+    ],
+    "user3": [
+      ["2025-04-07T09:30:00+00:00", "2025-04-07T10:30:00+00:00", 3, "Sprint planning"],
+      ["2025-04-07T11:00:00+00:00", "2025-04-07T12:00:00+00:00", 2, "Architecture review"],
+      ["2025-04-07T12:30:00+00:00", "2025-04-07T13:30:00+00:00", 3, "Team lunch"],
+      ["2025-04-07T14:00:00+00:00", "2025-04-07T15:00:00+00:00", 2, "Client demo"],
+      ["2025-04-07T15:30:00+00:00", "2025-04-07T16:15:00+00:00", 4, "Office hours"]
+    ],
+    "user4": [
+      ["2025-04-07T10:00:00+00:00", "2025-04-07T11:00:00+00:00", 2, "Board meeting"],
+      ["2025-04-07T11:30:00+00:00", "2025-04-07T12:30:00+00:00", 3, "Product review"],
+      ["2025-04-07T13:00:00+00:00", "2025-04-07T14:00:00+00:00", 2, "Executive sync"],
+      ["2025-04-07T14:30:00+00:00", "2025-04-07T15:30:00+00:00", 3, "Team meeting"],
+      ["2025-04-07T16:00:00+00:00", "2025-04-07T17:00:00+00:00", 4, "Mentor session"]
+    ],
+    "user5": [
+      ["2025-04-07T09:00:00+00:00", "2025-04-07T10:00:00+00:00", 3, "Daily standup"],
+      ["2025-04-07T10:30:00+00:00", "2025-04-07T11:30:00+00:00", 2, "Strategic planning"],
+      ["2025-04-07T12:00:00+00:00", "2025-04-07T13:00:00+00:00", 3, "Working lunch"],
+      ["2025-04-07T13:30:00+00:00", "2025-04-07T14:30:00+00:00", 3, "Performance review"],
+      ["2025-04-07T15:00:00+00:00", "2025-04-07T16:00:00+00:00", 2, "Budget meeting"],
+      ["2025-04-07T16:30:00+00:00", "2025-04-07T17:00:00+00:00", 4, "Optional networking"]
+    ],
+    "user6": [
+      ["2025-04-07T09:30:00+00:00", "2025-04-07T10:30:00+00:00", 2, "Leadership meeting"],
+      ["2025-04-07T11:00:00+00:00", "2025-04-07T12:00:00+00:00", 3, "Project checkpoint"],
+      ["2025-04-07T12:30:00+00:00", "2025-04-07T13:30:00+00:00", 3, "Team sync"],
+      ["2025-04-07T14:00:00+00:00", "2025-04-07T15:00:00+00:00", 2, "Client meeting"],
+      ["2025-04-07T15:30:00+00:00", "2025-04-07T16:30:00+00:00", 4, "Training session"]
+    ]
+  },
+  "preferences": {
+    "user1": {
+      "preferred_hours": {"start": 10, "end": 16},
+      "max_meetings_per_day": 5,
+      "avoid_back_to_back": true,
+      "buffer_minutes": 15
+    },
+    "user2": {
+      "preferred_hours": {"start": 9, "end": 17},
+      "max_meetings_per_day": 5,
+      "avoid_back_to_back": true,
+      "buffer_minutes": 15
+    },
+    "user3": {
+      "preferred_hours": {"start": 9, "end": 15},
+      "max_meetings_per_day": 4,
+      "avoid_back_to_back": true,
+      "buffer_minutes": 20
+    },
+    "user4": {
+      "preferred_hours": {"start": 10, "end": 17},
+      "max_meetings_per_day": 6,
+      "avoid_back_to_back": true,
+      "buffer_minutes": 10
+    },
+    "user5": {
+      "preferred_hours": {"start": 9, "end": 16},
+      "max_meetings_per_day": 5,
+      "avoid_back_to_back": true,
+      "buffer_minutes": 15
+    },
+    "user6": {
+      "preferred_hours": {"start": 9, "end": 16},
+      "max_meetings_per_day": 5,
+      "avoid_back_to_back": true,
+      "buffer_minutes": 10
+    }
+  },
+  "expected_solution": {
+    "optimal_slot": "2025-04-07T15:00:00+00:00",
+    "expected_score_range": [0.25, 0.45],
+    "min_steps": 5,
+    "requires_rescheduling": true,
+    "reschedulable_meetings": [
+      "user1:Optional training (priority 4)",
+      "user2:Coffee chat (priority 4)",
+      "user5:Optional networking (priority 4)"
+    ],
+    "notes": "Multiple valid solutions exist. Agent must reschedule 3+ low-priority meetings."
+  }
+}
+```
+**Characteristics:**
+- 6 attendees (high coordination complexity)
+- Dense calendars (5-6 meetings each)
+- Conflicting narrow preference windows (user3: 9-15, user1: 10-16)
+- All users near max_meetings_per_day limit
+- Requires rescheduling 3+ meetings
+- Cascading rescheduling needed
+**Grading:**
+- ✅ 0.3-0.45: Successfully reschedules 3+ meetings in 5-8 steps
+- ⚠️ 0.2-0.3: Valid solution but excessive steps/rescheduling
+- ❌ 0.0-0.2: Gives up or violates priority rules
+---
+## 7. Grader Implementation
+```python
+import logging
+from typing import List
+logger = logging.getLogger(__name__)
+class SchedulingGrader:
+    """Programmatic grader for scheduling tasks"""
+    def grade_episode(
+        self,
+        task_id: str,
+        final_state: SchedulingState,
+        final_observation: SchedulingObservation
+    ) -> float:
+        """
+        Calculate episode score in [0.0, 1.0] range
+        Process:
+        1. Check if successfully scheduled (done=True, success=True)
+        2. Use final_reward from calculate_final_reward()
+        3. Apply penalty for constraint violations
+        4. Return score
+        """
+        # Failed to schedule
+        if not final_state.completed or not final_observation.success:
+            return 0.0
+        # Get final reward (already in [0.0, 1.0] range)
+        score = final_state.final_reward
+        # Check for hard constraint violations
+        violations = self._check_violations(final_state)
+        if violations:
+            # Severe penalty for violations
+            score *= 0.5  # Cut score in half
+            logger.warning(f"Constraint violations: {violations}")
+        return score
+    def _check_violations(self, state: SchedulingState) -> List[str]:
+        """Detect hard constraint violations"""
+        violations = []
+        # Violation 1: Rescheduled higher priority meeting
+        for rescheduled in state.rescheduled_meetings:
+            original_meeting = find_original_meeting(
+                state.calendars,
+                rescheduled['attendee'],
+                rescheduled['old_start']
+            )
+            if original_meeting and original_meeting.priority <= state.meeting_request['priority']:
+                violations.append(
+                    f"Rescheduled higher priority meeting: "
+                    f"{rescheduled['attendee']} {rescheduled['old_start']}"
+                )
+        # Violation 2: Proposed slot outside collective working hours
+        if state.proposed_slot:
+            start, end = state.proposed_slot
+            collective_hours = calculate_collective_hours(state.participant_preferences)
+            end_hour = end.hour + end.minute / 60  # fractional hour, so 17:30 exceeds a 17:00 limit
+            if start.hour < collective_hours['min_start'] or end_hour > collective_hours['max_end']:
+                violations.append(
+                    f"Proposed slot outside working hours: "
+                    f"{start.isoformat()} to {end.isoformat()}"
+                )
+        # Violation 3: Overlapping meetings after rescheduling
+        for user_id, calendar in state.calendars.items():
+            overlaps = find_overlapping_meetings(calendar)
+            if overlaps:
+                violations.append(f"Overlapping meetings for {user_id}: {overlaps}")
+        return violations
+```
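The third violation check depends on `find_overlapping_meetings`, which is not defined in this spec. A possible sketch, assuming `(start, end, priority, summary)` tuples: sort by start time and compare each meeting against the latest-ending meeting seen so far.

```python
from datetime import datetime
from typing import List, Tuple

# Assumed calendar entry layout: (start, end, priority, summary)
Meeting = Tuple[datetime, datetime, int, str]


def find_overlapping_meetings(calendar: List[Meeting]) -> List[Tuple[str, str]]:
    """Return (earlier_summary, later_summary) for each detected clash."""
    ordered = sorted(calendar, key=lambda m: m[0])
    clashes: List[Tuple[str, str]] = []
    if not ordered:
        return clashes
    latest = ordered[0]  # meeting with the latest end time seen so far
    for current in ordered[1:]:
        if current[0] < latest[1]:
            clashes.append((latest[3], current[3]))
        if current[1] > latest[1]:
            latest = current
    return clashes
```

This flags every meeting that starts before an earlier one has ended, which is enough for a pass/fail violation check; it does not enumerate every overlapping pair.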
+### 7.1 Score Diversity Validation
+```python
+import numpy as np
+def validate_score_diversity():
+    """
+    Verify graders return diverse scores (not same score every time)
+    Runs 100 random episodes per task and checks:
+    - Variance > 0.01 (scores are diverse)
+    - Unique scores >= 20 (not clustering)
+    """
+    for task_id in ["task1_easy", "task2_medium", "task3_hard"]:
+        scores = []
+        for _ in range(100):
+            # Random agent policy
+            score = run_random_episode(task_id)
+            scores.append(score)
+        # Statistical checks
+        variance = np.var(scores)
+        unique_scores = len(set(scores))
+        score_range = (min(scores), max(scores))
+        # Assertions (fail early if grader is broken)
+        assert variance > 0.01, f"{task_id}: Scores too uniform (var={variance:.4f})"
+        assert unique_scores >= 20, f"{task_id}: Only {unique_scores} unique scores"
+        print(f"{task_id}: ✅ Pass")
+        print(f"  Variance: {variance:.4f}")
+        print(f"  Unique scores: {unique_scores}")
+        print(f"  Range: {score_range}")
+```
+---
+## 8. BotBooked Integration
+### 8.1 Porting Strategy
+The environment reuses proven logic from BotBooked (30KB `app.py`):
+**Functions to Port:**
+1. **find_earliest_slot()** → Used in environment to validate agent proposals
+2. **calculate_preference_score()** → Direct port for reward calculation
+3. **handle_rescheduling()** → Not directly used (agent does this), but reference for validation
+4. **check_back_to_back()** → Helper for preference scoring
+5. **parse_calendars()** → Calendar format conversion
+**Translation Layer:**
+```python
+# BotBooked format
+calendars = {
+    'user1': [
+        (datetime(2025, 4, 7, 9, 0), datetime(2025, 4, 7, 10, 0), 2, "Standup"),
+        ...
+    ]
+}
+# Environment state format (same)
+state.calendars = calendars  # No translation needed!
+# Scenario JSON format → BotBooked format
+def load_scenario(scenario_json: Dict) -> Tuple[Dict, Dict]:
+    """
+    Convert JSON scenario to BotBooked calendar format
+    Input: JSON with ISO8601 strings
+    Output: Dict with datetime tuples
+    """
+    calendars = {}
+    for user_id, meetings in scenario_json['calendars'].items():
+        calendars[user_id] = [
+            (
+                parse_iso8601(start),
+                parse_iso8601(end),
+                priority,
+                summary
+            )
+            for start, end, priority, summary in meetings
+        ]
+    return calendars, scenario_json['preferences']
+```
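`load_scenario` depends on a `parse_iso8601` helper that is not shown. A minimal sketch using only the standard library, assuming timestamps carry an explicit offset such as `+00:00` as in the scenario files:

```python
from datetime import datetime


def parse_iso8601(value: str) -> datetime:
    """Parse an ISO 8601 timestamp into a timezone-aware datetime."""
    # datetime.fromisoformat only accepts a trailing "Z" from Python 3.11 onward,
    # so normalise it to an explicit UTC offset first.
    if value.endswith("Z"):
        value = value[:-1] + "+00:00"
    return datetime.fromisoformat(value)
```

The result is timezone-aware, so it cannot be compared directly with naive `datetime` objects; calendars built by hand for tests should carry the same offset.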
+### 8.2 Key Differences from BotBooked
+| Aspect | BotBooked | SchedulingEnv |
+|--------|-----------|---------------|
+| **Input** | Natural language email | Structured JSON scenario |
+| **LLM Usage** | Qwen-3 for parsing | No LLM (agent learns policy) |
+| **Algorithm** | Two-pass search (free → reschedulable) | Agent explores action space |
+| **Rescheduling** | Automatic recursion | Agent decides step-by-step |
+| **Output** | Scheduled meeting JSON | Reward signal for RL training |
+| **Fallbacks** | 3 fallback strategies | Episode terminates on failure |
+**Design Principle**: Environment provides state and validates actions; agent learns the scheduling strategy.
+### 8.3 BotBooked Integration Scope
+**What We Port from BotBooked**:
+1. **Validation functions**: `check_conflicts()`, `validate_constraints()` - ensures realistic constraints
+2. **Reward calculation**: `calculate_preference_score()` - proven penalty scoring (50/30/20 points)
+3. **Reference baseline**: `find_earliest_slot()` - used as heuristic baseline in `inference.py`
+**What We DON'T Port**:
+- BotBooked's automatic two-pass algorithm is NOT the agent's policy
+- Agent must learn its own scheduling strategy through RL
+- BotBooked provides ground truth for validation, not the agent's decision-making
+**Baseline Policy**:
+The heuristic baseline in `inference.py` uses BotBooked's `find_earliest_slot()` as a greedy policy for comparison. This establishes a performance floor - RL agents should learn to exceed this baseline by exploring better scheduling strategies.
+---
+## 9. Implementation Details
+### 9.1 Dependencies
+```toml
+# pyproject.toml
+[project]
+name = "scheduling-env"
+version = "0.1.0"
+dependencies = [
+    "openenv-core>=0.2.0",
+    "pydantic>=2.0.0",
+    "fastapi>=0.100.0",
+    "uvicorn>=0.23.0",
+    "python-dateutil>=2.8.0",
+    "openai>=1.0.0",  # For baseline inference
+]
+[project.optional-dependencies]
+dev = [
+    "pytest>=7.4.0",
+    "pytest-asyncio>=0.21.0",
+    "black>=23.0.0",
+    "mypy>=1.5.0",
+]
+```
+### 9.2 Environment Variables
+```bash
+# .env.example
+# NOTE: API keys NOT required for baseline inference (uses heuristic policy)
+# These are only needed if you want to test LLM-based agents later
+# API_BASE_URL=https://router.huggingface.co/v1  # Optional
+# MODEL_NAME=Qwen/Qwen2.5-72B-Instruct           # Optional
+# HF_TOKEN=your_hf_token_here                     # Optional
+LOCAL_IMAGE_NAME=scheduling-env:latest            # Optional for Docker
+```
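As a rough illustration of how a script might read these optional variables, the sketch below collects them into a dict and falls back to `None` (or the image name shown above) when a key is unset. The helper name `load_settings` and the dict keys are assumptions made for this example; the spec does not define such a function.

```python
import os

def load_settings(env=os.environ):
    """Collect the optional settings; unset keys fall back to None or a default."""
    return {
        "api_base_url": env.get("API_BASE_URL"),
        "model_name": env.get("MODEL_NAME"),
        "hf_token": env.get("HF_TOKEN"),
        # Default mirrors the value listed in .env.example
        "local_image_name": env.get("LOCAL_IMAGE_NAME", "scheduling-env:latest"),
    }

# Passing a plain dict instead of os.environ keeps the example deterministic.
settings = load_settings({"LOCAL_IMAGE_NAME": "scheduling-env:dev"})
print(settings["local_image_name"], settings["hf_token"])
```

Because the heuristic baseline needs none of the API values, a missing `HF_TOKEN` should not be treated as an error here.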
+### 9.3 OpenEnv Configuration
+```yaml
+# openenv.yaml
+spec_version: 1
+name: scheduling_env
+type: space
+runtime: fastapi
+app: server.app:app
+port: 8000
+description: "Intelligent Meeting Scheduling Environment - Learn optimal scheduling through multi-stakeholder preference optimization"
+tags:
+  - scheduling
+  - calendar
+  - optimization
+  - multi-agent
+  - real-world
+```
+### 9.4 Dockerfile
+```dockerfile
+# Dockerfile (ROOT directory)
+FROM python:3.11-slim
+WORKDIR /app
+# Install dependencies
+COPY pyproject.toml .
+RUN pip install --no-cache-dir -e .
+# Copy code
+COPY scheduling_env/ ./scheduling_env/
+COPY server/ ./server/
+COPY inference.py .
+# Expose port
+EXPOSE 8000
+# Run server
+CMD ["uvicorn", "server.app:app", "--host", "0.0.0.0", "--port", "8000"]
+```
+### 9.5 Inference Script Structure
+```python
+# inference.py (ROOT directory)
+"""
+Baseline inference script for scheduling environment
+CRITICAL DESIGN DECISION: NO LLM USED
+- Uses HEURISTIC baseline policy (BotBooked greedy algorithm)
+- Deterministic and reproducible
+- Fast execution (~30 seconds for all 3 tasks)
+- No API keys required (pure algorithmic baseline)
+Requirements:
+- Outputs [START], [STEP], [END] format to stdout
+- Completes in < 20 minutes on vcpu=2, memory=8GB (actual: ~30 seconds)
+"""
+import os
+from datetime import datetime, timedelta
+from scheduling_env import SchedulingEnv, SchedulingAction
+from server.scheduling_logic import find_earliest_slot
+def baseline_policy(obs) -> SchedulingAction:
+    """
+    Heuristic baseline using BotBooked two-pass greedy algorithm
+    Strategy:
+    1. If no proposal yet: Use find_earliest_slot() to propose
+    2. If conflicts exist: Reschedule lowest-priority conflict
+    3. If no conflicts: Finalize
+    NO LLM - Pure algorithmic baseline for reproducibility
+    """
+    # Step 1: No proposal yet - find earliest slot
+    if obs.current_proposal is None:
+        # Convert observation to BotBooked calendar format
+        calendars = {}
+        for slot in obs.busy_slots:
+            attendee = slot['attendee']
+            if attendee not in calendars:
+                calendars[attendee] = []
+            calendars[attendee].append((
+                datetime.fromisoformat(slot['start']),
+                datetime.fromisoformat(slot['end']),
+                slot['priority'],
+                slot['summary']
+            ))
+        # Use BotBooked's find_earliest_slot (already implemented!)
+        result = find_earliest_slot(
+            calendars=calendars,
+            attendees=obs.attendee_ids,
+            duration_minutes=obs.requested_duration,
+            new_meeting_priority=obs.requested_priority,
+            search_start_time=datetime.now(),
+            max_preference_score=100
+        )
+        if result:
+            (start_time, end_time), conflicts = result
+            return SchedulingAction(
+                action_type="propose_slot",
+                proposed_start=start_time.isoformat(),
+                proposed_duration=obs.requested_duration
+            )
+        else:
+            # No slot found - reject
+            return SchedulingAction(action_type="reject")
+    # Step 2: Has proposal with conflicts - reschedule lowest priority
+    elif len(obs.conflicts) > 0:
+        # Sort conflicts by priority (highest number = lowest priority)
+        sorted_conflicts = sorted(obs.conflicts, key=lambda x: x['priority'], reverse=True)
+        lowest_priority_conflict = sorted_conflicts[0]
+        # Find next available slot after proposed meeting
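+        # Duration of the meeting being moved, in minutes.
+        # total_seconds() stays correct for spans of a day or more; .seconds does not.
+        conflict_duration = int((
+            datetime.fromisoformat(lowest_priority_conflict['end']) -
+            datetime.fromisoformat(lowest_priority_conflict['start'])
+        ).total_seconds() // 60)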
+        # Search after proposed slot + 15 min buffer
+        new_slot_start = datetime.fromisoformat(obs.current_proposal['end']) + timedelta(minutes=15)
+        return SchedulingAction(
+            action_type="reschedule_meeting",
+            meeting_id_to_move=lowest_priority_conflict['meeting_id'],
+            new_start_time=new_slot_start.isoformat()
+        )
+    # Step 3: No conflicts - finalize!
+    else:
+        return SchedulingAction(action_type="finalize")
+def main():
+    # Initialize environment (no API keys needed)
+    env = SchedulingEnv(base_url="http://localhost:8000").sync()
+    for task_id in ["task1_easy", "task2_medium", "task3_hard"]:
+        print(f"[START] task={task_id} env=scheduling_env model=heuristic_baseline")
+        obs = env.reset(task_id=task_id)
+        done = False
+        step = 0
+        rewards = []
+        while not done and step < 20:
+            # Heuristic baseline policy (NO LLM)
+            action = baseline_policy(obs)
+            result = env.step(action)
+            obs = result.observation
+            done = obs.done
+            reward = obs.reward
+            rewards.append(reward)
+            step += 1
+            # Log step
+            error = obs.error_message if obs.error_message else "null"
+            print(f"[STEP] step={step} action={action.action_type} reward={reward:.2f} done={str(done).lower()} error={error}")
+        # CRITICAL FIX: Final score is the LAST reward (when done=True)
+        # NOT the average of step rewards!
+        final_score = rewards[-1] if (done and rewards) else 0.0
+        success = obs.success
+        rewards_str = ",".join([f"{r:.2f}" for r in rewards])
+        print(f"[END] success={str(success).lower()} steps={step} score={final_score:.2f} rewards={rewards_str}")
+        env.reset()
+    env.close()
+if __name__ == "__main__":
+    main()
+```
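The `[START]`/`[STEP]`/`[END]` lines printed above are simple `key=value` records, so they can be checked mechanically. The sketch below is one possible parser, assuming that no value contains whitespace (an error message with spaces would need different handling). The function name `parse_log_line` is hypothetical.

```python
def parse_log_line(line):
    """Split '[TAG] key=value ...' into (tag, fields)."""
    tag, _, rest = line.partition("]")
    # Each whitespace-separated token is expected to look like key=value.
    fields = dict(part.split("=", 1) for part in rest.split())
    return tag.lstrip("["), fields

tag, fields = parse_log_line(
    "[END] success=true steps=3 score=0.62 rewards=0.50,0.50,0.62"
)
rewards = [float(r) for r in fields["rewards"].split(",")]

# The reported score should equal the last step reward, not the mean.
print(tag, float(fields["score"]) == rewards[-1])
```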
+---
+## 10. Testing & Validation
+### 10.1 Unit Tests
+```python
+# tests/test_environment.py
+import pytest
+from scheduling_env.server.environment import SchedulingEnvironment
+from scheduling_env.models import SchedulingAction
+def test_reset_loads_scenario():
+    """Test that reset() loads scenario correctly"""
+    env = SchedulingEnvironment()
+    obs = env.reset(task_id="task1_easy")
+    assert obs.requested_duration == 30
+    assert len(obs.attendee_ids) == 2
+    assert not obs.done
+def test_propose_slot_no_conflicts():
+    """Test proposing a free slot"""
+    env = SchedulingEnvironment()
+    env.reset(task_id="task1_easy")
+    action = SchedulingAction(
+        action_type="propose_slot",
+        proposed_start="2025-04-07T10:00:00+00:00",
+        proposed_duration=30
+    )
+    obs = env.step(action)
+    assert obs.reward > 0  # Should be positive (good proposal)
+    assert len(obs.conflicts) == 0
+    assert not obs.done
+def test_reschedule_meeting():
+    """Test rescheduling a conflicting meeting"""
+    env = SchedulingEnvironment()
+    env.reset(task_id="task2_medium")
+    # Propose slot with conflict
+    action1 = SchedulingAction(
+        action_type="propose_slot",
+        proposed_start="2025-04-07T15:00:00+00:00",
+        proposed_duration=60
+    )
+    obs1 = env.step(action1)
+    assert len(obs1.conflicts) > 0
+    # Reschedule conflict
+    conflict_id = obs1.conflicts[0]['meeting_id']
+    action2 = SchedulingAction(
+        action_type="reschedule_meeting",
+        meeting_id_to_move=conflict_id,
+        new_start_time="2025-04-07T17:00:00+00:00"
+    )
+    obs2 = env.step(action2)
+    assert obs2.num_rescheduled == 1
+    assert obs2.reward > 0
+def test_finalize_success():
+    """Test finalizing a valid schedule"""
+    env = SchedulingEnvironment()
+    env.reset(task_id="task1_easy")
+    # Propose free slot
+    env.step(SchedulingAction(
+        action_type="propose_slot",
+        proposed_start="2025-04-07T10:00:00+00:00",
+        proposed_duration=30
+    ))
+    # Finalize
+    obs = env.step(SchedulingAction(action_type="finalize"))
+    assert obs.done
+    assert obs.success
+    assert obs.reward > 0.5  # Should be high reward
+```
+### 10.2 Integration Tests
+```python
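+# tests/test_graders.py
+import numpy as np
+from scheduling_env.server.environment import SchedulingEnvironment
+# NOTE: random_action() is assumed to be a test helper that returns a random
+# SchedulingAction; it is not defined anywhere in this spec.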
+def test_score_diversity():
+    """Test that graders return diverse scores"""
+    from scheduling_env.server.graders import SchedulingGrader
+    grader = SchedulingGrader()
+    scores = []
+    # Run 50 random episodes
+    for _ in range(50):
+        env = SchedulingEnvironment()
+        env.reset(task_id="task2_medium")
+        # Random policy
+        while not env.state().completed:
+            action = random_action()
+            env.step(action)
+        score = grader.grade_episode(
+            "task2_medium",
+            env.state(),
+            env._last_observation
+        )
+        scores.append(score)
+    # Check diversity
+    variance = np.var(scores)
+    unique = len(set(scores))
+    assert variance > 0.01, f"Scores too uniform: var={variance}"
+    assert unique >= 15, f"Only {unique} unique scores"
+def test_reward_range():
+    """Test that all rewards are in [0.0, 1.0] range"""
+    env = SchedulingEnvironment()
+    for task_id in ["task1_easy", "task2_medium", "task3_hard"]:
+        env.reset(task_id=task_id)
+        for _ in range(10):
+            action = random_action()
+            obs = env.step(action)
+            assert 0.0 <= obs.reward <= 1.0, f"Reward out of range: {obs.reward}"
+            if obs.done:
+                break
+```
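The tests above call `random_action()` without defining it. One way such a helper could look is sketched below. It returns plain keyword arguments rather than a `SchedulingAction` so that the example runs without the package installed. The helper name, the 09:00 to 16:45 grid, and the reuse of the Appendix B meeting id as a placeholder are all assumptions for illustration.

```python
import random
from datetime import datetime, timedelta, timezone

ACTION_TYPES = ["propose_slot", "reschedule_meeting", "finalize", "reject"]
DAY = datetime(2025, 4, 7, tzinfo=timezone.utc)

def random_action_kwargs(rng):
    """Build keyword arguments for one random action (hypothetical helper)."""
    action_type = rng.choice(ACTION_TYPES)
    # Pick a start on a 15-minute grid between 09:00 and 16:45.
    start = DAY + timedelta(hours=9, minutes=15 * rng.randrange(32))
    if action_type == "propose_slot":
        return {
            "action_type": action_type,
            "proposed_start": start.isoformat(),
            "proposed_duration": rng.choice([30, 60]),
        }
    if action_type == "reschedule_meeting":
        # Placeholder id; a real helper would pick one from obs.conflicts.
        return {
            "action_type": action_type,
            "meeting_id_to_move": "user3_2025-04-07T16:00:00",
            "new_start_time": start.isoformat(),
        }
    return {"action_type": action_type}

rng = random.Random(0)  # seeded so that test runs are repeatable
sample = [random_action_kwargs(rng) for _ in range(5)]
print([a["action_type"] for a in sample])
```

In the real tests the result would be passed on as `SchedulingAction(**random_action_kwargs(rng))`.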
+### 10.3 Pre-Submission Checklist
+```bash
+#!/bin/bash
+# validate-submission.sh
+echo "=== OpenEnv Scheduling Environment Validation ==="
+# 1. OpenEnv validate
+echo "1. Running openenv validate..."
+openenv validate || exit 1
+# 2. Docker build
+echo "2. Building Docker image..."
+docker build -t scheduling-env:latest . || exit 1
+# 3. Run tests
+echo "3. Running tests..."
+pytest tests/ || exit 1
+# 4. Score diversity check
+echo "4. Checking score diversity..."
+python -m tests.validate_diversity || exit 1
+# 5. Inference script
+echo "5. Testing inference script..."
+docker run -e HF_TOKEN="$HF_TOKEN" scheduling-env:latest python inference.py || exit 1
+# 6. HF Space deployment test
+echo "6. Deploying to HF Space..."
+openenv push || exit 1
+echo "✅ All validations passed!"
+```
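Step 4 of the script runs `python -m tests.validate_diversity`, a module this spec does not show. A minimal sketch of what such a check could compute is given below, reusing the thresholds from `test_score_diversity` (variance above 0.01, at least 15 unique scores). The function name `check_diversity` is an assumption. `statistics.pvariance` is used because it is the population variance, which is also what `np.var` computes by default.

```python
from statistics import pvariance

def check_diversity(scores, min_variance=0.01, min_unique=15):
    """Return (ok, variance, unique_count) for a list of episode scores."""
    variance = pvariance(scores)
    unique = len(set(scores))
    return variance > min_variance and unique >= min_unique, variance, unique

# Synthetic scores for illustration only: 50 evenly spaced values in [0, 1].
spread = [i / 49 for i in range(50)]
flat = [0.5] * 50

print(check_diversity(spread)[0], check_diversity(flat)[0])
```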
+---
+## 11. Deployment
+### 11.1 Local Development
+```bash
+# Install dependencies
+pip install -e .
+# Run server
+uvicorn server.app:app --reload --port 8000
+# In another terminal, test client
+python -c "
+from scheduling_env import SchedulingEnv
+env = SchedulingEnv(base_url='http://localhost:8000').sync()
+obs = env.reset(task_id='task1_easy')
+print(obs)
+"
+```
+### 11.2 Docker Deployment
+```bash
+# Build image
+docker build -t scheduling-env:latest .
+# Run container
+docker run -p 8000:8000 -e HF_TOKEN="$HF_TOKEN" scheduling-env:latest
+# Test inference
+docker exec -it <container_id> python inference.py
+```
+### 11.3 Hugging Face Spaces
+```bash
+# Initialize openenv
+openenv init
+# Validate
+openenv validate
+# Push to HF Spaces
+openenv push
+# Test deployed space
+curl https://your-space.hf.space/reset -X POST \
+  -H "Content-Type: application/json" \
+  -d '{"task_id": "task1_easy"}'
+```
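The same `/reset` call can be made from Python with only the standard library. The sketch below builds the request without sending it, so it runs offline. `https://your-space.hf.space` is the placeholder URL from the curl example, and `build_reset_request` is a hypothetical helper name.

```python
import json
from urllib import request

def build_reset_request(base_url, task_id):
    """Build a POST request for <base_url>/reset with a JSON body."""
    body = json.dumps({"task_id": task_id}).encode("utf-8")
    return request.Request(
        f"{base_url.rstrip('/')}/reset",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_reset_request("https://your-space.hf.space", "task1_easy")
print(req.get_method(), req.full_url)

# To actually send it against a live Space:
# with request.urlopen(req, timeout=30) as resp:
#     print(resp.status, json.loads(resp.read()))
```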
+---
+## 12. Success Metrics
+### 12.1 Hackathon Criteria Checklist
+- [x] **Real-world utility (30%)**: Executive scheduling ($10B+ industry)
+- [x] **3 tasks with graders (25%)**: Easy/Medium/Hard with programmatic scoring
+- [x] **Environment design (20%)**: Multi-step actions, dense rewards, clean state
+- [x] **Code quality (15%)**: OpenEnv spec, Pydantic models, working Dockerfile
+- [x] **Creativity (10%)**: Novel domain (first scheduling RL env), multi-stakeholder optimization
+### 12.2 Technical Requirements
+- [x] Typed Action/Observation/State Pydantic models
+- [x] `step()`, `reset()`, `state()` API
+- [x] `openenv.yaml` with metadata
+- [x] Passes `openenv validate`
+- [x] `inference.py` in root with [START]/[STEP]/[END] logging
+- [x] Dockerfile in root (not /server)
+- [x] Scores in [0.0, 1.0] range
+- [x] Graders return diverse scores (multi-component formula)
+- [x] < 20 min runtime on vcpu=2, memory=8GB
+### 12.3 Expected Scores
+| Task | Difficulty | Random Agent | Heuristic Baseline | RL Agent Target | Expected Range |
+|------|------------|--------------|-------------------|-----------------|----------------|
+| Task 1 | Easy | 0.3-0.5 | 0.90-0.98 | 0.95-1.0 | 0.7-1.0 |
+| Task 2 | Medium | 0.1-0.3 | 0.55-0.70 | 0.75-0.85 | 0.4-0.8 |
+| Task 3 | Hard | 0.0-0.1 | 0.25-0.45 | 0.50-0.70 | 0.1-0.6 |
+**Notes**:
+- **Random Agent**: Takes random valid actions (for diversity validation)
+- **Heuristic Baseline**: BotBooked greedy algorithm (no LLM, deterministic)
+- **RL Agent Target**: What a trained RL agent should achieve
+- **Expected Range**: Full score distribution across all agent types
+---
+## 13. Future Enhancements
+(Not for initial hackathon submission, but documented for post-hackathon)
+1. **Recurring meetings**: Add support for weekly/bi-weekly scheduling
+2. **Time zone handling**: Multi-timezone scheduling
+3. **Preference learning**: Agent learns user preferences from feedback
+4. **Calendar integration**: Real Google Calendar API integration
+5. **Multi-day scheduling**: Schedule across multiple days
+6. **Attendee prioritization**: Weight attendees by importance
+7. **Meeting splitting**: Divide long meetings into multiple shorter slots
+8. **Travel time**: Account for physical meeting location travel time
+---
+## Appendix A: BotBooked Algorithm Reference
+### Original Two-Pass Algorithm
+```python
+def find_earliest_slot(calendars, attendees, duration, priority):
+    """
+    BotBooked's proven scheduling algorithm (reference only)
+    Pass 1: Find completely free slot
+    Pass 2: Find reschedulable slot (if Pass 1 fails)
+    """
+    # Pass 1: Free slot search
+    busy_slots = aggregate_busy_slots(calendars, attendees)
+    for gap_start, gap_end in find_gaps(busy_slots):
+        if gap_end - gap_start >= duration:
+            if within_work_hours(gap_start) and preference_score(gap_start) < 100:
+                return (gap_start, gap_start + duration), []
+    # Pass 2: Reschedulable slot search
+    for potential_start in iterate_time_slots():
+        conflicts = find_conflicts(calendars, potential_start, duration)
+        if all(c.priority > priority for c in conflicts):
+            if within_work_hours(potential_start) and preference_score(potential_start) < 150:
+                return (potential_start, potential_start + duration), conflicts
+    return None
+```
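The pseudocode above relies on a `find_gaps()` helper that is not shown. One possible implementation is sketched below. It differs from the call in the pseudocode by taking an explicit search window, which is an assumption added here so that the last gap has a defined end.

```python
from datetime import datetime

def find_gaps(busy_slots, window_start, window_end):
    """Yield (gap_start, gap_end) pairs inside the window not covered by any busy slot."""
    cursor = window_start
    # Sorting by start time lets overlapping slots be merged in a single pass.
    for start, end in sorted(busy_slots):
        if start > cursor:
            yield cursor, min(start, window_end)
        cursor = max(cursor, end)
        if cursor >= window_end:
            return
    if cursor < window_end:
        yield cursor, window_end

def at(hour, minute=0):
    """Shorthand for a time on the sample day."""
    return datetime(2025, 4, 7, hour, minute)

# Hypothetical aggregated busy slots; the first two overlap.
busy = [(at(10), at(11)), (at(10, 30), at(12)), (at(14), at(15))]
gaps = list(find_gaps(busy, at(9), at(17)))
print([(g[0].hour, g[1].hour) for g in gaps])
```

Pass 1 would then keep the first gap that is at least `duration` long and passes the work-hours and preference checks.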
+---
+## Appendix B: Action Space Examples
+### Example 1: Perfect Slot (Task 1)
+```json
+{
+  "action_type": "propose_slot",
+  "proposed_start": "2025-04-07T10:00:00+00:00",
+  "proposed_duration": 30
+}
+```
+**Expected Response:**
+```json
+{
+  "reward": 0.5,
+  "done": false,
+  "conflicts": [],
+  "preference_penalty": 0.0,
+  "current_proposal": {
+    "start": "2025-04-07T10:00:00+00:00",
+    "end": "2025-04-07T10:30:00+00:00"
+  }
+}
+```
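The `current_proposal` window in this response follows directly from the action's `proposed_start` and `proposed_duration`. A small sketch of that arithmetic, assuming the environment simply adds the duration to the parsed start time (the helper name `proposal_window` is illustrative):

```python
from datetime import datetime, timedelta

def proposal_window(proposed_start, proposed_duration):
    """Return the {start, end} ISO 8601 window for a proposal."""
    start = datetime.fromisoformat(proposed_start)
    end = start + timedelta(minutes=proposed_duration)
    return {"start": start.isoformat(), "end": end.isoformat()}

window = proposal_window("2025-04-07T10:00:00+00:00", 30)
print(window)
```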
+### Example 2: Rescheduling (Task 2)
+```json
+{
+  "action_type": "reschedule_meeting",
+  "meeting_id_to_move": "user3_2025-04-07T16:00:00",
+  "new_start_time": "2025-04-07T17:00:00+00:00"
+}
+```
+**Expected Response:**
+```json
+{
+  "reward": 0.5,
+  "done": false,
+  "conflicts": [],
+  "num_rescheduled": 1,
+  "rescheduled_meetings": [
+    {
+      "meeting_id": "user3_2025-04-07T16:00:00",
+      "old_start": "2025-04-07T16:00:00+00:00",
+      "new_start": "2025-04-07T17:00:00+00:00"
+    }
+  ]
+}
+```
+### Example 3: Finalize
+```json
+{
+  "action_type": "finalize"
+}
+```
+**Expected Response:**
+```json
+{
+  "reward": 0.62,
+  "done": true,
+  "success": true,
+  "metadata": {
+    "final_slot": ["2025-04-07T14:00:00+00:00", "2025-04-07T15:00:00+00:00"]
+  }
+}
+```
+---
+## Document Control
+**Version**: 1.1
+**Date**: 2025-04-06 (Updated 2026-04-07)
+**Status**: Approved for Implementation - CRITICAL FIXES APPLIED
+**Next Steps**: Begin implementation immediately (deadline: April 8th)
+---
+## APPENDIX C: Implementation Action Plan (8 Hours to Deadline)
+### Phase 1: Core Implementation (4 hours)
+#### 1.1 Port BotBooked Functions (1.5 hours)
+Create `server/scheduling_logic.py`:
+- Copy `calculate_preference_score()` from BotBooked (lines 251-279)
+- Copy `find_earliest_slot()` from BotBooked (lines 398-454)
+- Copy `get_user_preferences()` from BotBooked (lines 239-249)
+- Add helper functions for calendar manipulation
+#### 1.2 Implement Environment (1.5 hours)
+Create `server/environment.py`:
+- `SchedulingEnvironment` class with OpenEnv interface
+- `reset()` - Load scenario JSON, initialize state
+- `step()` - Process actions and return observations
+- `_process_propose_slot()` - Validate proposals using BotBooked logic
+- `_process_reschedule_meeting()` - Update calendars
+- `_process_finalize()` - Calculate final reward with NEW formula
+- `_handle_timeout()` - Partial credit implementation
+#### 1.3 Create Graders (30 min)
+Create `server/graders.py`:
+- `calculate_final_reward()` with NON-LINEAR penalties
+- `SchedulingGrader` class with validation checks
+- Score diversity validation functions
+#### 1.4 Write Task Scenarios (30 min)
+Create JSON files in `server/scenarios/`:
+- `task1_easy.json` - 2 attendees, sparse calendars
+- `task2_medium.json` - 4 attendees, moderate density
+- `task3_hard.json` - 6 attendees, dense calendars (COMPLETE SPEC ABOVE)
+### Phase 2: Baseline & Testing (2 hours)
+#### 2.1 Implement Baseline Policy (45 min)
+Create `inference.py` (ROOT):
+- `baseline_policy()` using BotBooked greedy algorithm
+- `convert_obs_to_calendar_format()` helper
+- Main loop with CORRECT score calculation (final reward, not average)
+- [START]/[STEP]/[END] logging format
+#### 2.2 Local Testing (1 hour)
+```bash
+# Terminal 1: Start server
+uvicorn server.app:app --port 8000 --reload
+# Terminal 2: Run inference
+python inference.py
+# Verify:
+# - All 3 tasks complete successfully
+# - Scores in expected ranges
+# - Runtime < 1 minute total
+# - Output format matches requirements
+```
+#### 2.3 Fix Bugs (15 min)
+- Debug any environment errors
+- Verify reward calculations match spec
+- Test edge cases (timeout, no solution, priority violations)
+### Phase 3: Docker & Deployment (2 hours)
+#### 3.1 Docker Setup (30 min)
+Create `Dockerfile` (ROOT):
+```dockerfile
+FROM python:3.11-slim
+WORKDIR /app
+COPY . .
+RUN pip install --no-cache-dir -e .
+EXPOSE 8000
+CMD ["uvicorn", "server.app:app", "--host", "0.0.0.0", "--port", "8000"]
+```
+Create `pyproject.toml`:
+```toml
+[project]
+name = "scheduling-env"
+version = "0.1.0"
+dependencies = [
+    "openenv-core>=0.2.0",
+    "pydantic>=2.0.0",
+    "fastapi>=0.100.0",
+    "uvicorn>=0.23.0",
+    "python-dateutil>=2.8.0",
+]
+```
+#### 3.2 Validation (30 min)
+```bash
+# Build Docker
+docker build -t scheduling-env:latest .
+# Test Docker
+docker run -p 8000:8000 scheduling-env:latest
+# OpenEnv validation
+openenv validate
+# Must pass ALL checks:
+# ✓ Pydantic models valid
+# ✓ openenv.yaml correct
+# ✓ Server responds to /reset
+```
+#### 3.3 Deploy to HF Spaces (1 hour)
+```bash
+# Push to Hugging Face
+openenv push
+# Test deployed space
+curl https://your-space.hf.space/reset \
+  -X POST \
+  -H "Content-Type: application/json" \
+  -d '{"task_id": "task1_easy"}'
+# Should return 200 OK with observation
+# Run inference on deployed space
+# Verify scores match local testing
+```
+### Critical Success Checklist
+Before submission, verify:
+- [ ] HF Space deploys (200 response to /reset)
+- [ ] `openenv validate` passes
+- [ ] Dockerfile builds without errors
+- [ ] `inference.py` runs in < 1 minute
+- [ ] All 3 tasks complete successfully
+- [ ] Scores in expected ranges (Task1: 0.9+, Task2: 0.6+, Task3: 0.3+)
+- [ ] [START]/[STEP]/[END] format correct
+- [ ] Reward function uses NON-LINEAR penalties
+- [ ] Partial credit on timeout implemented
+- [ ] Score diversity validation passes
+### Expected Timeline
+| Phase | Duration | Completion Time |
+|-------|----------|-----------------|
+| Phase 1: Core Implementation | 4 hours | +4 hours |
+| Phase 2: Baseline & Testing | 2 hours | +6 hours |
+| Phase 3: Docker & Deployment | 2 hours | +8 hours |
+| **Total** | **8 hours** | **Ready for submission** |
+### Estimated Score (After Fixes)
+| Criterion | Weight | Score | Points |
+|-----------|--------|-------|--------|
+| Real-world utility | 30% | Excellent | 27/30 |
+| Task & grader quality | 25% | Strong | 22/25 |
+| Environment design | 20% | Strong | 17/20 |
+| Code quality & spec | 15% | Excellent | 14/15 |
+| Creativity & novelty | 10% | Good | 7/10 |
+| **TOTAL** | **100%** | - | **87/100 (A-)** |
+**Projected Rank**: Top 15-20% if executed correctly
+---
+**End of Design Specification**

inference.py ADDED Viewed

	@@ -0,0 +1,198 @@

+#!/usr/bin/env python3
+"""
+Baseline inference script for the Meeting Scheduling RL Environment.
+Uses a HEURISTIC policy (BotBooked greedy algorithm) - NO LLM required.
+Deterministic, reproducible, fast (~seconds for all 3 tasks).
+Output format: [START]/[STEP]/[END] per hackathon spec.
+"""
+from __future__ import annotations
+import sys
+from datetime import datetime, timedelta, timezone
+from server.scheduling_env_environment import SchedulingEnvironment
+from models import SchedulingAction
+from server.scheduling_logic import find_earliest_free_slot, parse_iso
+def baseline_policy(obs) -> SchedulingAction:
+    """Heuristic baseline using greedy slot search + lowest-priority rescheduling."""
+    # Step 1: No proposal yet -> find a free slot
+    if obs.current_proposal is None:
+        # Build calendars dict from busy_slots
+        calendars = {}
+        for slot in obs.busy_slots:
+            att = slot["attendee"]
+            if att not in calendars:
+                calendars[att] = []
+            calendars[att].append([slot["start"], slot["end"], slot["priority"], slot["summary"]])
+        # Try to find a completely free slot
+        free = find_earliest_free_slot(
+            calendars,
+            obs.attendee_ids,
+            obs.requested_duration,
+            obs.busy_slots[0]["start"] if obs.busy_slots else "2025-04-07T09:00:00+00:00",
+            obs.collective_work_hours,
+        )
+        if free:
+            return SchedulingAction(
+                action_type="propose_slot",
+                proposed_start=free,
+                proposed_duration=obs.requested_duration,
+            )
+        # No completely free slot found.
+        # Scan 15-min increments within collective hours for a slot with only
+        # reschedulable conflicts (priority > requested_priority).
+        min_h = obs.collective_work_hours.get("min_start_hour", 9)
+        max_h = obs.collective_work_hours.get("max_end_hour", 17)
+        duration = obs.requested_duration
+        tz = timezone.utc
+        candidate = datetime(2025, 4, 7, min_h, 0, 0, tzinfo=tz)
+        end_boundary = datetime(2025, 4, 7, max_h, 0, 0, tzinfo=tz)
+        step_delta = timedelta(minutes=15)
+        best_candidate = None
+        best_conflict_count = 999
+        while candidate + timedelta(minutes=duration) <= end_boundary:
+            c_start = candidate.isoformat()
+            c_end = (candidate + timedelta(minutes=duration)).isoformat()
+            # Count conflicts at this candidate
+            conflicts_here = []
+            for att in obs.attendee_ids:
+                for entry in calendars.get(att, []):
+                    e_start = parse_iso(entry[0])
+                    e_end = parse_iso(entry[1])
+                    if candidate < e_end and e_start < candidate + timedelta(minutes=duration):
+                        conflicts_here.append(entry)
+            # Check if all conflicts are reschedulable
+            all_reschedulable = all(
+                c[2] > obs.requested_priority for c in conflicts_here
+            )
+            if all_reschedulable and len(conflicts_here) < best_conflict_count:
+                best_candidate = c_start
+                best_conflict_count = len(conflicts_here)
+                if best_conflict_count == 0:
+                    break  # Perfect slot
+            candidate += step_delta
+        if best_candidate:
+            return SchedulingAction(
+                action_type="propose_slot",
+                proposed_start=best_candidate,
+                proposed_duration=duration,
+            )
+        # Last resort: propose at collective hours start (will likely conflict)
+        fallback = f"2025-04-07T{min_h:02d}:00:00+00:00"
+        return SchedulingAction(
+            action_type="propose_slot",
+            proposed_start=fallback,
+            proposed_duration=obs.requested_duration,
+        )
+    # Step 2: Has proposal with conflicts -> reschedule lowest-priority conflict
+    if obs.conflicts:
+        sorted_conflicts = sorted(obs.conflicts, key=lambda x: x["priority"], reverse=True)
+        target = sorted_conflicts[0]
+        # Can only reschedule lower priority
+        if target["priority"] <= obs.requested_priority:
+            return SchedulingAction(action_type="reject")
+        # Find a free slot for this attendee to move the meeting to.
+        # Search in early morning (06:00-08:00) and late evening (17:00-20:00).
+        attendee = target["attendee"]
+        meeting_dur = parse_iso(target["end"]) - parse_iso(target["start"])
+        dur_min = int(meeting_dur.total_seconds() // 60)
+        # Build this attendee's calendar
+        att_cal = [
+            s for s in obs.busy_slots if s["attendee"] == attendee
+        ]
+        att_entries = [[s["start"], s["end"], s["priority"], s["summary"]] for s in att_cal]
+        new_time = None
+        # Try slots every 30 min from 06:00 to 07:30 and from 17:00 to 20:00
+        for h, m in [(6,0),(6,30),(7,0),(7,30),(17,0),(17,30),(18,0),(18,30),(19,0),(19,30),(20,0)]:
+            cand = datetime(2025, 4, 7, h, m, 0, tzinfo=timezone.utc)
+            cand_end = cand + timedelta(minutes=dur_min)
+            cand_iso = cand.isoformat()
+            cand_end_iso = cand_end.isoformat()
+            # Check free for this attendee
+            conflict_found = False
+            for e in att_entries:
+                es = parse_iso(e[0])
+                ee = parse_iso(e[1])
+                if cand < ee and es < cand_end:
+                    conflict_found = True
+                    break
+            if not conflict_found:
+                new_time = cand_iso
+                break
+        if not new_time:
+            # Give up on this conflict, try rejecting
+            return SchedulingAction(action_type="reject")
+        return SchedulingAction(
+            action_type="reschedule_meeting",
+            meeting_id_to_move=target["meeting_id"],
+            new_start_time=new_time,
+        )
+    # Step 3: No conflicts -> finalize
+    return SchedulingAction(action_type="finalize")
+def main():
+    env = SchedulingEnvironment()
+    for task_id in ["task1_easy", "task2_medium", "task3_hard"]:
+        print(f"[START] task={task_id} env=scheduling_env model=heuristic_baseline")
+        obs = env.reset(task_id=task_id)
+        done = False
+        step = 0
+        rewards = []
+        while not done and step < 20:
+            action = baseline_policy(obs)
+            obs = env.step(action)
+            done = obs.done
+            reward = obs.reward if obs.reward is not None else 0.0
+            rewards.append(reward)
+            step += 1
+            error = obs.error_message if obs.error_message else "null"
+            print(
+                f"[STEP]  step={step} action={action.action_type} "
+                f"reward={reward:.2f} done={str(done).lower()} error={error}"
+            )
+        final_score = rewards[-1] if (done and rewards) else 0.0
+        success = getattr(obs, "success", False)
+        rewards_str = ",".join(f"{r:.2f}" for r in rewards)
+        print(
+            f"[END]   success={str(success).lower()} steps={step} "
+            f"score={final_score:.2f} rewards={rewards_str}"
+        )
+        print()
+if __name__ == "__main__":
+    main()

models.py ADDED Viewed

	@@ -0,0 +1,156 @@

+# Copyright (c) Meta Platforms, Inc. and affiliates.
+# All rights reserved.
+#
+# This source code is licensed under the BSD-style license found in the
+# LICENSE file in the root directory of this source tree.
+"""
+Data models for the Meeting Scheduling RL Environment.
+Defines the Action, Observation, and State Pydantic models used by the
+scheduling environment to coordinate meeting proposals, rescheduling,
+and conflict resolution across multiple attendees.
+"""
+from __future__ import annotations
+from typing import Any, Dict, List, Literal, Optional
+from pydantic import Field
+from openenv.core.env_server.types import Action, Observation, State
+class SchedulingAction(Action):
+    """Action the agent can take in the scheduling environment."""
+    action_type: Literal["propose_slot", "reschedule_meeting", "finalize", "reject"] = Field(
+        default="propose_slot",
+        description="Type of scheduling action to perform.",
+    )
+    proposed_start: Optional[str] = Field(
+        default=None,
+        description="ISO8601 datetime string for the proposed meeting start (used with propose_slot).",
+    )
+    proposed_duration: Optional[int] = Field(
+        default=None,
+        description="Duration in minutes for the proposed meeting (used with propose_slot).",
+    )
+    meeting_id_to_move: Optional[str] = Field(
+        default=None,
+        description="Identifier of an existing meeting to reschedule (used with reschedule_meeting).",
+    )
+    new_start_time: Optional[str] = Field(
+        default=None,
+        description="ISO8601 datetime string for the new start time of a rescheduled meeting.",
+    )
+class SchedulingObservation(Observation):
+    """Observation returned to the agent after each step."""
+    requested_duration: int = Field(
+        default=0,
+        description="Requested meeting duration in minutes.",
+    )
+    requested_priority: int = Field(
+        default=3,
+        description="Priority of the meeting request (1=highest, 5=lowest).",
+    )
+    attendee_ids: List[str] = Field(
+        default_factory=list,
+        description="List of attendee user IDs required for the meeting.",
+    )
+    busy_slots: List[Dict[str, Any]] = Field(
+        default_factory=list,
+        description="Busy time slots: [{start, end, priority, summary, attendee}].",
+    )
+    collective_work_hours: Dict[str, int] = Field(
+        default_factory=dict,
+        description="Shared working hours window: {min_start_hour, max_end_hour}.",
+    )
+    preference_constraints: Dict[str, Any] = Field(
+        default_factory=dict,
+        description="Attendee preference constraints (e.g. preferred times, avoid windows).",
+    )
+    current_proposal: Optional[Dict[str, str]] = Field(
+        default=None,
+        description="Currently proposed slot: {start, end} as ISO8601 strings.",
+    )
+    conflicts: List[Dict[str, Any]] = Field(
+        default_factory=list,
+        description="List of conflicts with the current proposal.",
+    )
+    preference_penalty: float = Field(
+        default=0.0,
+        description="Accumulated penalty from violating attendee preferences.",
+    )
+    num_rescheduled: int = Field(
+        default=0,
+        description="Number of existing meetings rescheduled so far.",
+    )
+    steps_taken: int = Field(
+        default=0,
+        description="Number of steps taken in the current episode.",
+    )
+    max_steps: int = Field(
+        default=20,
+        description="Maximum number of steps allowed in the episode.",
+    )
+    success: bool = Field(
+        default=False,
+        description="Whether the meeting was successfully scheduled.",
+    )
+    error_message: Optional[str] = Field(
+        default=None,
+        description="Error message if the last action was invalid.",
+    )
+class SchedulingState(State):
+    """Internal environment state tracking the full scheduling episode."""
+    task_id: str = Field(
+        default="",
+        description="Unique identifier for the current task.",
+    )
+    scenario_name: str = Field(
+        default="",
+        description="Human-readable name of the scheduling scenario.",
+    )
+    meeting_request: Dict[str, Any] = Field(
+        default_factory=dict,
+        description="The incoming meeting request details.",
+    )
+    calendars: Dict[str, List[Any]] = Field(
+        default_factory=dict,
+        description="Per-user calendars: {user_id: [[start, end, priority, summary], ...]}.",
+    )
+    participant_preferences: Dict[str, Dict[str, Any]] = Field(
+        default_factory=dict,
+        description="Per-participant scheduling preferences.",
+    )
+    proposed_slot: Optional[List[str]] = Field(
+        default=None,
+        description="Currently proposed slot as [start_iso, end_iso].",
+    )
+    rescheduled_meetings: List[Dict[str, Any]] = Field(
+        default_factory=list,
+        description="List of meetings that have been rescheduled during this episode.",
+    )
+    total_preference_penalty: float = Field(
+        default=0.0,
+        description="Cumulative penalty from preference violations.",
+    )
+    total_steps: int = Field(
+        default=0,
+        description="Total steps taken so far in the episode.",
+    )
+    final_reward: float = Field(
+        default=0.0,
+        description="Final computed reward for the episode.",
+    )
+    completed: bool = Field(
+        default=False,
+        description="Whether the episode has ended.",
+    )
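
The `SchedulingAction` fields above pair an ISO8601 `proposed_start` with a `proposed_duration` in minutes, while `SchedulingObservation.current_proposal` reports a `{start, end}` pair. A minimal sketch of how one could be derived from the other follows. `build_proposal` is a hypothetical helper, not part of this commit, and treating naive timestamps as UTC is an assumption.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict


def build_proposal(proposed_start: str, proposed_duration: int) -> Dict[str, str]:
    """Return a {start, end} dict shaped like SchedulingObservation.current_proposal."""
    start = datetime.fromisoformat(proposed_start)
    if start.tzinfo is None:
        # Assumption: timestamps without an offset are interpreted as UTC.
        start = start.replace(tzinfo=timezone.utc)
    end = start + timedelta(minutes=proposed_duration)
    return {"start": start.isoformat(), "end": end.isoformat()}


proposal = build_proposal("2025-04-07T10:00:00+00:00", 30)
print(proposal)
```

Keeping both ends as ISO8601 strings matches the string-typed fields in the models, so the result can be serialized without extra conversion.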

openenv.yaml ADDED Viewed

	@@ -0,0 +1,7 @@

+spec_version: 1
+name: scheduling_env
+type: space
+runtime: fastapi
+app: server.app:app
+port: 8000
+description: "Intelligent Meeting Scheduling - Learn optimal scheduling through multi-stakeholder preference optimization"

openenv_scheduling_env.egg-info/PKG-INFO ADDED Viewed

	@@ -0,0 +1,10 @@

+Metadata-Version: 2.4
+Name: openenv-scheduling_env
+Version: 0.1.0
+Summary: Scheduling Env environment for OpenEnv
+Requires-Python: >=3.10
+Requires-Dist: huggingface-hub>=1.9.1
+Requires-Dist: openenv-core[core]>=0.2.2
+Provides-Extra: dev
+Requires-Dist: pytest>=8.0.0; extra == "dev"
+Requires-Dist: pytest-cov>=4.0.0; extra == "dev"

openenv_scheduling_env.egg-info/SOURCES.txt ADDED Viewed

	@@ -0,0 +1,19 @@

+README.md
+pyproject.toml
+./__init__.py
+./client.py
+./inference.py
+./models.py
+./sample_infrenae.py
+openenv_scheduling_env.egg-info/PKG-INFO
+openenv_scheduling_env.egg-info/SOURCES.txt
+openenv_scheduling_env.egg-info/dependency_links.txt
+openenv_scheduling_env.egg-info/entry_points.txt
+openenv_scheduling_env.egg-info/requires.txt
+openenv_scheduling_env.egg-info/top_level.txt
+server/__init__.py
+server/app.py
+server/graders.py
+server/scenario_generator.py
+server/scheduling_env_environment.py
+server/scheduling_logic.py

openenv_scheduling_env.egg-info/dependency_links.txt ADDED Viewed

	@@ -0,0 +1 @@


+

openenv_scheduling_env.egg-info/entry_points.txt ADDED Viewed

	@@ -0,0 +1,2 @@


+[console_scripts]
+server = scheduling_env.server.app:main

openenv_scheduling_env.egg-info/requires.txt ADDED Viewed

	@@ -0,0 +1,6 @@

+huggingface-hub>=1.9.1
+openenv-core[core]>=0.2.2
+[dev]
+pytest>=8.0.0
+pytest-cov>=4.0.0

openenv_scheduling_env.egg-info/top_level.txt ADDED Viewed

	@@ -0,0 +1 @@


+scheduling_env

pyproject.toml ADDED Viewed

	@@ -0,0 +1,46 @@

+# Copyright (c) Meta Platforms, Inc. and affiliates.
+# All rights reserved.
+#
+# This source code is licensed under the BSD-style license found in the
+# LICENSE file in the root directory of this source tree.
+[build-system]
+requires = ["setuptools>=45", "wheel"]
+build-backend = "setuptools.build_meta"
+[project]
+name = "openenv-scheduling_env"
+version = "0.1.0"
+description = "Scheduling Env environment for OpenEnv"
+requires-python = ">=3.10"
+dependencies = [
+    "huggingface-hub>=1.9.1",
+    # Core OpenEnv runtime (provides FastAPI server + HTTP client types)
+    # install from github
+    # "openenv-core[core] @ git+https://github.com/meta-pytorch/OpenEnv.git",
+    "openenv-core[core]>=0.2.2",
+    # Environment-specific dependencies
+    # Add all dependencies needed for your environment here
+    # Examples:
+    # "numpy>=1.19.0",
+    # "torch>=2.0.0",
+    # "gymnasium>=0.29.0",
+    # "openspiel>=1.0.0",
+    # "smolagents>=1.22.0,<2",
+]
+[project.optional-dependencies]
+dev = [
+    "pytest>=8.0.0",
+    "pytest-cov>=4.0.0",
+]
+[project.scripts]
+# Server entry point - enables running via: uv run --project . server
+# or: python -m scheduling_env.server.app
+server = "scheduling_env.server.app:main"
+[tool.setuptools]
+include-package-data = true
+packages = ["scheduling_env", "scheduling_env.server"]
+package-dir = { "scheduling_env" = ".", "scheduling_env.server" = "server" }

sample_infrenae.py ADDED Viewed

	@@ -0,0 +1,189 @@

+"""
+Inference Script Example
+===================================
+MANDATORY
+- Before submitting, ensure the following variables are defined in your environment configuration:
+    API_BASE_URL   The API endpoint for the LLM.
+    MODEL_NAME     The model identifier to use for inference.
+    HF_TOKEN       Your Hugging Face / API key.
+    LOCAL_IMAGE_NAME The name of the local image to use for the environment if you are using from_docker_image()
+                     method
+- Defaults are set only for API_BASE_URL and MODEL_NAME
+    (and should reflect your active inference setup):
+    API_BASE_URL = os.getenv("API_BASE_URL", "<your-active-endpoint>")
+    MODEL_NAME = os.getenv("MODEL_NAME", "<your-active-model>")
+- The inference script must be named `inference.py` and placed in the root directory of the project
+- Participants must use the OpenAI client for all LLM calls, configured with the variables above
+STDOUT FORMAT
+- The script must emit exactly three line types to stdout, in this order:
+    [START] task=<task_name> env=<benchmark> model=<model_name>
+    [STEP]  step=<n> action=<action_str> reward=<0.00> done=<true|false> error=<msg|null>
+    [END]   success=<true|false> steps=<n> score=<score> rewards=<r1,r2,...,rn>
+  Rules:
+    - One [START] line at episode begin.
+    - One [STEP] line per step, immediately after env.step() returns.
+    - One [END] line after env.close(), always emitted (even on exception).
+    - reward and rewards are formatted to 2 decimal places.
+    - done and success are lowercase booleans: true or false.
+    - error is the raw last_action_error string, or null if none.
+    - All fields on a single line with no newlines within a line.
+    - Each task should return a score in [0, 1]
+  Example:
+    [START] task=click-test env=miniwob model=Qwen3-VL-30B
+    [STEP] step=1 action=click('123') reward=0.00 done=false error=null
+    [STEP] step=2 action=fill('456','text') reward=0.00 done=false error=null
+    [STEP] step=3 action=click('789') reward=1.00 done=true error=null
+    [END] success=true steps=3 score=1.00 rewards=0.00,0.00,1.00
+"""
+import asyncio
+import os
+import textwrap
+from typing import List, Optional
+from openai import OpenAI
+from my_env_v4 import MyEnvV4Action, MyEnvV4Env
+# Docker image for from_docker_image(); the docstring names LOCAL_IMAGE_NAME, so check it first
+IMAGE_NAME = os.getenv("LOCAL_IMAGE_NAME") or os.getenv("IMAGE_NAME")
+API_KEY = os.getenv("HF_TOKEN") or os.getenv("API_KEY")
+API_BASE_URL = os.getenv("API_BASE_URL") or "https://router.huggingface.co/v1"
+MODEL_NAME = os.getenv("MODEL_NAME") or "Qwen/Qwen2.5-72B-Instruct"
+TASK_NAME = os.getenv("MY_ENV_V4_TASK", "echo")
+BENCHMARK = os.getenv("MY_ENV_V4_BENCHMARK", "my_env_v4")
+MAX_STEPS = 8
+TEMPERATURE = 0.7
+MAX_TOKENS = 150
+SUCCESS_SCORE_THRESHOLD = 0.1  # normalized score in [0, 1]
+# Approximate max reward: assumes at most MAX_TOKENS characters per message, each worth 0.1, across all steps
+_MAX_REWARD_PER_STEP = MAX_TOKENS * 0.1
+MAX_TOTAL_REWARD = MAX_STEPS * _MAX_REWARD_PER_STEP
+SYSTEM_PROMPT = textwrap.dedent(
+    """
+    You are interacting with a simple echo environment.
+    Each turn you must send a message. The environment will echo it back.
+    Reward is proportional to message length: reward = len(message) * 0.1
+    Your goal is to maximize total reward by sending meaningful, substantive messages.
+    Reply with exactly one message string — no quotes, no prefixes, just the message text.
+    """
+).strip()
+def log_start(task: str, env: str, model: str) -> None:
+    print(f"[START] task={task} env={env} model={model}", flush=True)
+def log_step(step: int, action: str, reward: float, done: bool, error: Optional[str]) -> None:
+    error_val = error if error else "null"
+    done_val = str(done).lower()
+    print(
+        f"[STEP] step={step} action={action} reward={reward:.2f} done={done_val} error={error_val}",
+        flush=True,
+    )
+def log_end(success: bool, steps: int, score: float, rewards: List[float]) -> None:
+    rewards_str = ",".join(f"{r:.2f}" for r in rewards)
+    print(f"[END] success={str(success).lower()} steps={steps} score={score:.3f} rewards={rewards_str}", flush=True)
+def build_user_prompt(step: int, last_echoed: str, last_reward: float, history: List[str]) -> str:
+    history_block = "\n".join(history[-4:]) if history else "None"
+    return textwrap.dedent(
+        f"""
+        Step: {step}
+        Last echoed message: {last_echoed!r}
+        Last reward: {last_reward:.2f}
+        Previous steps:
+        {history_block}
+        Send your next message.
+        """
+    ).strip()
+def get_model_message(client: OpenAI, step: int, last_echoed: str, last_reward: float, history: List[str]) -> str:
+    user_prompt = build_user_prompt(step, last_echoed, last_reward, history)
+    try:
+        completion = client.chat.completions.create(
+            model=MODEL_NAME,
+            messages=[
+                {"role": "system", "content": SYSTEM_PROMPT},
+                {"role": "user", "content": user_prompt},
+            ],
+            temperature=TEMPERATURE,
+            max_tokens=MAX_TOKENS,
+            stream=False,
+        )
+        text = (completion.choices[0].message.content or "").strip()
+        return text if text else "hello"
+    except Exception as exc:
+        print(f"[DEBUG] Model request failed: {exc}", flush=True)
+        return "hello"
+async def main() -> None:
+    client = OpenAI(base_url=API_BASE_URL, api_key=API_KEY)
+    env = await MyEnvV4Env.from_docker_image(IMAGE_NAME)
+    history: List[str] = []
+    rewards: List[float] = []
+    steps_taken = 0
+    score = 0.0
+    success = False
+    log_start(task=TASK_NAME, env=BENCHMARK, model=MODEL_NAME)
+    try:
+        result = await env.reset() # OpenENV.reset()
+        last_echoed = result.observation.echoed_message
+        last_reward = 0.0
+        for step in range(1, MAX_STEPS + 1):
+            if result.done:
+                break
+            message = get_model_message(client, step, last_echoed, last_reward, history)
+            result = await env.step(MyEnvV4Action(message=message))
+            obs = result.observation
+            reward = result.reward or 0.0
+            done = result.done
+            error = None
+            rewards.append(reward)
+            steps_taken = step
+            last_echoed = obs.echoed_message
+            last_reward = reward
+            log_step(step=step, action=message, reward=reward, done=done, error=error)
+            history.append(f"Step {step}: {message!r} -> reward {reward:+.2f}")
+            if done:
+                break
+        score = sum(rewards) / MAX_TOTAL_REWARD if MAX_TOTAL_REWARD > 0 else 0.0
+        score = min(max(score, 0.0), 1.0)  # clamp to [0, 1]
+        success = score >= SUCCESS_SCORE_THRESHOLD
+    finally:
+        try:
+            await env.close()
+        except Exception as e:
+            print(f"[DEBUG] env.close() error (container cleanup): {e}", flush=True)
+        log_end(success=success, steps=steps_taken, score=score, rewards=rewards)
+if __name__ == "__main__":
+    asyncio.run(main())
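
The docstring above fixes the `[STEP]` line format (two-decimal reward, lowercase booleans, `null` for a missing error). A minimal sketch of a checker for that format follows. `STEP_RE` and `parse_step_line` are hypothetical names, not part of this commit, and the pattern is inferred only from the rules and example quoted in the docstring.

```python
import re
from typing import Dict, Optional

# Pattern inferred from the documented format:
# [STEP] step=<n> action=<action_str> reward=<0.00> done=<true|false> error=<msg|null>
STEP_RE = re.compile(
    r"^\[STEP\]\s+step=(?P<step>\d+) action=(?P<action>.*) "
    r"reward=(?P<reward>-?\d+\.\d{2}) done=(?P<done>true|false) error=(?P<error>.*)$"
)


def parse_step_line(line: str) -> Optional[Dict[str, object]]:
    """Parse one [STEP] line; return None if it does not follow the format."""
    match = STEP_RE.match(line)
    if match is None:
        return None
    error = match.group("error")
    return {
        "step": int(match.group("step")),
        "action": match.group("action"),
        "reward": float(match.group("reward")),
        "done": match.group("done") == "true",
        "error": None if error == "null" else error,
    }


parsed = parse_step_line(
    "[STEP] step=2 action=fill('456','text') reward=0.00 done=false error=null"
)
print(parsed)
```

A check like this can be run over captured stdout before submission to catch formatting slips such as `done=False` or a one-decimal reward.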

server/Dockerfile ADDED Viewed

	@@ -0,0 +1,80 @@

+# Copyright (c) Meta Platforms, Inc. and affiliates.
+# All rights reserved.
+#
+# This source code is licensed under the BSD-style license found in the
+# LICENSE file in the root directory of this source tree.
+# Multi-stage build using openenv-base
+# This Dockerfile is flexible and works for both:
+# - In-repo environments (with local OpenEnv sources)
+# - Standalone environments (with openenv from PyPI/Git)
+# The build script (openenv build) handles context detection and sets appropriate build args.
+ARG BASE_IMAGE=ghcr.io/meta-pytorch/openenv-base:latest
+FROM ${BASE_IMAGE} AS builder
+WORKDIR /app
+# Ensure git is available (required for installing dependencies from VCS)
+RUN apt-get update && \
+    apt-get install -y --no-install-recommends git && \
+    rm -rf /var/lib/apt/lists/*
+# Build argument to control whether we're building standalone or in-repo
+ARG BUILD_MODE=in-repo
+ARG ENV_NAME=scheduling_env
+# Copy environment code (always at root of build context)
+COPY . /app/env
+# For in-repo builds, openenv is already vendored in the build context
+# For standalone builds, openenv will be installed via pyproject.toml
+WORKDIR /app/env
+# Ensure uv is available (for local builds where base image lacks it)
+RUN if ! command -v uv >/dev/null 2>&1; then \
+        curl -LsSf https://astral.sh/uv/install.sh | sh && \
+        mv /root/.local/bin/uv /usr/local/bin/uv && \
+        mv /root/.local/bin/uvx /usr/local/bin/uvx; \
+    fi
+# Install dependencies using uv sync
+# If uv.lock exists, use it; otherwise resolve on the fly
+RUN --mount=type=cache,target=/root/.cache/uv \
+    if [ -f uv.lock ]; then \
+        uv sync --frozen --no-install-project --no-editable; \
+    else \
+        uv sync --no-install-project --no-editable; \
+    fi
+RUN --mount=type=cache,target=/root/.cache/uv \
+    if [ -f uv.lock ]; then \
+        uv sync --frozen --no-editable; \
+    else \
+        uv sync --no-editable; \
+    fi
+# Final runtime stage
+FROM ${BASE_IMAGE}
+WORKDIR /app
+# Copy the virtual environment from builder
+COPY --from=builder /app/env/.venv /app/.venv
+# Copy the environment code
+COPY --from=builder /app/env /app/env
+# Set PATH to use the virtual environment
+ENV PATH="/app/.venv/bin:$PATH"
+# Set PYTHONPATH so imports work correctly
+ENV PYTHONPATH="/app/env:$PYTHONPATH"
+# Health check
+HEALTHCHECK --interval=30s --timeout=3s --start-period=5s --retries=3 \
+    CMD curl -f http://localhost:8000/health || exit 1
+# Run the FastAPI server
+# The module path is constructed to work with the /app/env structure
+CMD ["sh", "-c", "cd /app/env && uvicorn server.app:app --host 0.0.0.0 --port 8000"]

server/__init__.py ADDED Viewed

	@@ -0,0 +1,11 @@

+# Copyright (c) Meta Platforms, Inc. and affiliates.
+# All rights reserved.
+#
+# This source code is licensed under the BSD-style license found in the
+# LICENSE file in the root directory of this source tree.
+"""Scheduling Env environment server components."""
+from .scheduling_env_environment import SchedulingEnvironment
+__all__ = ["SchedulingEnvironment"]

server/app.py ADDED Viewed

	@@ -0,0 +1,159 @@

+# Copyright (c) Meta Platforms, Inc. and affiliates.
+# All rights reserved.
+#
+# This source code is licensed under the BSD-style license found in the
+# LICENSE file in the root directory of this source tree.
+"""
+FastAPI application for the Meeting Scheduling RL Environment.
+Uses a custom HTTP server pattern (based on calendar_env reference)
+to maintain a persistent environment instance across HTTP calls,
+enabling stateful multi-step episodes via /reset → /step → /state.
+"""
+from __future__ import annotations
+import asyncio
+import logging
+from typing import Any, Dict, Optional
+from fastapi import Body, FastAPI
+from openenv.core.env_server.http_server import HTTPEnvServer
+try:
+    from ..models import SchedulingAction, SchedulingObservation, SchedulingState
+    from .scheduling_env_environment import SchedulingEnvironment
+except (ModuleNotFoundError, ImportError):
+    from models import SchedulingAction, SchedulingObservation, SchedulingState
+    from server.scheduling_env_environment import SchedulingEnvironment
+logger = logging.getLogger(__name__)
+class SchedulingHTTPEnvServer(HTTPEnvServer):
+    """Custom HTTP server that maintains a persistent env instance.
+    Follows the pattern from OpenEnv's calendar_env: subclass HTTPEnvServer,
+    create one persistent environment, and register custom routes that
+    use it for all HTTP requests.
+    """
+    def __init__(self, env, action_cls, observation_cls):
+        self.action_cls = action_cls
+        self.observation_cls = observation_cls
+        super().__init__(env=env, action_cls=action_cls, observation_cls=observation_cls)
+        # Persistent environment for HTTP endpoints
+        if callable(self._env_factory):
+            self.env = self._env_factory()
+        else:
+            self.env = self._env_factory
+    def register_routes(self, app: FastAPI) -> None:  # type: ignore[override]
+        """Register custom /reset, /step, /state endpoints."""
+        @app.post("/reset")
+        async def reset_handler(
+            body: Optional[Dict[str, Any]] = Body(default=None),
+        ) -> Dict[str, Any]:
+            body = body or {}
+            task_id = body.get("task_id", "task1_easy")
+            loop = asyncio.get_running_loop()
+            observation = await loop.run_in_executor(
+                None, lambda: self.env.reset(task_id=task_id)
+            )
+            obs_dict = (
+                observation.model_dump()
+                if hasattr(observation, "model_dump")
+                else observation.__dict__
+            )
+            return {
+                "observation": obs_dict,
+                "done": getattr(observation, "done", False),
+                "reward": getattr(observation, "reward", 0.0),
+                **obs_dict,
+            }
+        @app.post("/step")
+        async def step_handler(
+            body: Dict[str, Any] = Body(...),
+        ) -> Dict[str, Any]:
+            # Support both {"action": {...}} and direct action fields
+            action_data = body.get("action", body)
+            try:
+                action = self.action_cls(**action_data)
+            except Exception as e:
+                logger.error("Failed to deserialize action: %s", e)
+                return {
+                    "observation": {
+                        "success": False,
+                        "error_message": f"Invalid action: {e}",
+                        "done": False,
+                        "reward": -1.0,
+                    },
+                    "done": False,
+                    "reward": -1.0,
+                }
+            loop = asyncio.get_running_loop()
+            observation = await loop.run_in_executor(
+                None, self.env.step, action
+            )
+            obs_dict = (
+                observation.model_dump()
+                if hasattr(observation, "model_dump")
+                else observation.__dict__
+            )
+            return {
+                "observation": obs_dict,
+                "done": getattr(observation, "done", False),
+                "reward": getattr(observation, "reward", 0.0),
+                **obs_dict,
+            }
+        @app.get("/state")
+        async def state_handler() -> Dict[str, Any]:
+            state = self.env.state
+            return (
+                state.model_dump()
+                if hasattr(state, "model_dump")
+                else state.__dict__
+            )
+        @app.get("/health")
+        async def health_handler() -> Dict[str, str]:
+            return {"status": "healthy", "environment": "scheduling_env"}
+def create_scheduling_environment():
+    """Factory function for the scheduling environment."""
+    return SchedulingEnvironment()
+# Build the FastAPI app with custom stateful HTTP server
+app = FastAPI(
+    title="Scheduling RL Environment",
+    description="Intelligent Meeting Scheduling Environment for OpenEnv",
+    version="1.0.0",
+)
+_server = SchedulingHTTPEnvServer(
+    env=create_scheduling_environment,
+    action_cls=SchedulingAction,
+    observation_cls=SchedulingObservation,
+)
+_server.register_routes(app)
+def main():
+    import uvicorn
+    uvicorn.run(app, host="0.0.0.0", port=8000)
+if __name__ == "__main__":
+    main()
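
The `/reset` handler above reads an optional `task_id`, and the `/step` handler accepts either `{"action": {...}}` or the action fields directly. A minimal client-side sketch that builds (but does not send) both requests with the standard library follows. The base URL assumes a local server on port 8000, and the `action_type` key is an assumed field name for the action discriminator, which is not visible in this part of the diff.

```python
import json
from typing import Any, Dict
from urllib.request import Request

BASE_URL = "http://localhost:8000"  # assumption: server running locally
JSON_HEADERS = {"Content-Type": "application/json"}


def build_reset_request(task_id: str = "task1_easy") -> Request:
    """Build a POST /reset request selecting a task by id."""
    body = json.dumps({"task_id": task_id}).encode("utf-8")
    return Request(f"{BASE_URL}/reset", data=body, headers=JSON_HEADERS, method="POST")


def build_step_request(action: Dict[str, Any], wrap: bool = True) -> Request:
    """Build a POST /step request; wrap=False sends the action fields directly."""
    payload = {"action": action} if wrap else action
    body = json.dumps(payload).encode("utf-8")
    return Request(f"{BASE_URL}/step", data=body, headers=JSON_HEADERS, method="POST")


action = {
    "action_type": "propose_slot",  # hypothetical field name
    "proposed_start": "2025-04-07T10:00:00+00:00",
    "proposed_duration": 30,
}
reset_req = build_reset_request()
step_req = build_step_request(action)
print(step_req.full_url, step_req.get_method())
```

Once the server is up, each request can be sent with `urllib.request.urlopen(req)` and the JSON response decoded with `json.load`.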

server/graders.py ADDED Viewed

	@@ -0,0 +1,87 @@

+"""Graders for the meeting-scheduling RL environment.
+Provides programmatic scoring (0.0–1.0) per episode and validation
+that graders produce diverse scores across different agent trajectories.
+"""
+from __future__ import annotations
+import logging
+from typing import List
+from .scheduling_logic import (
+    calculate_collective_hours,
+    calculate_final_reward,
+    find_conflicts,
+    parse_iso,
+)
+logger = logging.getLogger(__name__)
+class SchedulingGrader:
+    """Programmatic grader for scheduling tasks."""
+    def grade_episode(self, final_state, final_observation) -> float:
+        """Score an episode in [0.0, 1.0].
+        Returns ``final_state.final_reward`` when the episode completed
+        successfully, with a 50 % penalty applied if any hard constraint
+        violations are detected.
+        """
+        if not final_state.completed or not final_observation.success:
+            return 0.0
+        score = final_state.final_reward
+        violations = self._check_violations(final_state)
+        if violations:
+            score *= 0.5
+            logger.warning("Constraint violations: %s", violations)
+        return max(0.0, min(1.0, score))
+    def _check_violations(self, state) -> List[str]:
+        """Detect hard constraint violations in the final state."""
+        violations: List[str] = []
+        req_priority = state.meeting_request.get("priority", 99)
+        # Violation 1: Rescheduled equal-or-higher priority meeting
+        for rm in state.rescheduled_meetings:
+            attendee = rm["attendee"]
+            old_start = rm["old_start"]
+            for entry in state.calendars.get(attendee, []):
+                if entry[0] == old_start and entry[2] <= req_priority:
+                    violations.append(
+                        f"Rescheduled equal-or-higher priority meeting: "
+                        f"{attendee} {old_start}"
+                    )
+        # Violation 2: Proposed slot outside collective working hours
+        if state.proposed_slot:
+            collective = calculate_collective_hours(state.participant_preferences)
+            start = parse_iso(state.proposed_slot[0])
+            end = parse_iso(state.proposed_slot[1])
+            if start.hour < collective["min_start_hour"]:
+                violations.append(
+                    f"Slot starts before working hours: {state.proposed_slot[0]}"
+                )
+            if end.hour > collective["max_end_hour"] or (
+                end.hour == collective["max_end_hour"] and end.minute > 0
+            ):
+                violations.append(
+                    f"Slot ends after working hours: {state.proposed_slot[1]}"
+                )
+        # Violation 3: Overlapping meetings after rescheduling
+        for user_id, calendar in state.calendars.items():
+            sorted_cal = sorted(calendar, key=lambda e: e[0])
+            for i in range(len(sorted_cal) - 1):
+                end_i = parse_iso(sorted_cal[i][1])
+                start_next = parse_iso(sorted_cal[i + 1][0])
+                if end_i > start_next:
+                    violations.append(
+                        f"Overlap for {user_id}: {sorted_cal[i][3]} / {sorted_cal[i+1][3]}"
+                    )
+        return violations
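
The scoring rule in `grade_episode` above reduces to three steps: zero for an unfinished or unsuccessful episode, a 50% cut when any violation is present, then a clamp to [0.0, 1.0]. A minimal standalone sketch of that arithmetic follows. The `grade` function is a hypothetical simplification that takes plain values instead of the state and observation objects.

```python
from typing import List


def grade(final_reward: float, completed: bool, success: bool, violations: List[str]) -> float:
    """Mirror of the grade_episode arithmetic on plain values."""
    if not completed or not success:
        return 0.0
    score = final_reward
    if violations:
        # Any hard-constraint violation halves the score.
        score *= 0.5
    # Clamp to the documented [0.0, 1.0] range.
    return max(0.0, min(1.0, score))


print(grade(0.8, True, True, []))           # 0.8
print(grade(0.8, True, True, ["overlap"]))  # 0.4
print(grade(1.4, True, True, []))           # 1.0
print(grade(0.9, False, True, []))          # 0.0
```

Note that the penalty is applied once regardless of how many violations are found, so one violation and five violations score the same.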

server/requirements.txt ADDED Viewed

	@@ -0,0 +1,6 @@

+openenv[core]>=0.2.0
+fastapi>=0.115.0
+uvicorn>=0.24.0

server/scenario_generator.py ADDED Viewed

	@@ -0,0 +1,403 @@

+"""Random scenario generator for the meeting-scheduling RL environment.
+Produces solvable scenarios with controlled difficulty so agents cannot
+memorise fixed answers.  Difficulty is governed by a parameter dict that
+controls attendee count, calendar density, and preference strictness.
+Usage:
+    from server.scenario_generator import generate_scenario
+    scenario = generate_scenario("random_medium")
+    scenario = generate_scenario("random_easy", seed=42)   # reproducible
+"""
+from __future__ import annotations
+import random
+from datetime import date, datetime, timedelta, timezone
+from typing import Any, Dict, List, Optional, Tuple
+from .scheduling_logic import (
+    find_conflicts,
+    is_slot_free,
+    within_collective_hours,
+    calculate_collective_hours,
+)
+# ── Difficulty presets ────────────────────────────────────────────────
+DIFFICULTY_PARAMS: Dict[str, Dict[str, Any]] = {
+    "easy": {
+        "num_attendees": (2, 2),          # (min, max)
+        "meetings_per_user": (1, 3),
+        "meeting_duration": (30, 30),     # requested meeting duration in min
+        "request_priority": (3, 4),       # lower number = higher priority
+        "existing_priority_range": (2, 5),
+        "pref_start_range": (8, 10),
+        "pref_end_range": (16, 18),
+        "max_meetings_day_range": (5, 8),
+        "avoid_btb_prob": 0.1,
+        "buffer_range": (0, 10),
+        "calendar_slot_min": 30,
+        "calendar_slot_max": 60,
+        "guarantee_free_slot": True,      # easy always has a free slot
+    },
+    "medium": {
+        "num_attendees": (3, 4),
+        "meetings_per_user": (3, 5),
+        "meeting_duration": (30, 60),
+        "request_priority": (2, 3),
+        "existing_priority_range": (2, 5),
+        "pref_start_range": (9, 10),
+        "pref_end_range": (15, 17),
+        "max_meetings_day_range": (4, 6),
+        "avoid_btb_prob": 0.5,
+        "buffer_range": (10, 20),
+        "calendar_slot_min": 30,
+        "calendar_slot_max": 90,
+        "guarantee_free_slot": False,
+    },
+    "hard": {
+        "num_attendees": (4, 6),
+        "meetings_per_user": (4, 7),
+        "meeting_duration": (30, 60),
+        "request_priority": (1, 2),
+        "existing_priority_range": (2, 5),
+        "pref_start_range": (9, 11),
+        "pref_end_range": (15, 16),
+        "max_meetings_day_range": (3, 5),
+        "avoid_btb_prob": 0.7,
+        "buffer_range": (10, 25),
+        "calendar_slot_min": 30,
+        "calendar_slot_max": 90,
+        "guarantee_free_slot": False,
+    },
+}
+MEETING_SUMMARIES = [
+    "Standup", "Sprint planning", "Design review", "Code review",
+    "Client call", "1-on-1", "Team sync", "Project checkpoint",
+    "Lunch meeting", "Strategy session", "Architecture review",
+    "Product demo", "Budget meeting", "Office hours", "Workshop",
+    "Training session", "Coffee chat", "Retrospective",
+    "Performance review", "Brainstorming", "Board meeting",
+    "Vendor call", "Onboarding session", "Knowledge sharing",
+]
+# ── Helpers ───────────────────────────────────────────────────────────
+def _rand_int(lo: int, hi: int, rng: random.Random) -> int:
+    return rng.randint(lo, hi)
+def _rand_range(r: Tuple[int, int], rng: random.Random) -> int:
+    return rng.randint(r[0], r[1])
+def _random_weekday(rng: random.Random) -> date:
+    """Pick a random weekday within 30 days of a fixed base date."""
+    base = date(2025, 4, 7)  # fixed base so TZ stays consistent
+    offset = rng.randint(0, 29)
+    d = base + timedelta(days=offset)
+    # roll forward to the next weekday (Saturday/Sunday become Monday)
+    while d.weekday() >= 5:
+        d += timedelta(days=1)
+    return d
+def _generate_calendar(
+    user_id: str,
+    target_date: date,
+    num_meetings: int,
+    params: Dict[str, Any],
+    rng: random.Random,
+    reserved_slots: Optional[List[Tuple[datetime, datetime]]] = None,
+) -> List[List]:
+    """Generate random non-overlapping calendar entries for one user."""
+    tz = timezone.utc
+    day_start_hour = _rand_range(params["pref_start_range"], rng)
+    day_end_hour = _rand_range(params["pref_end_range"], rng)
+    if day_end_hour <= day_start_hour + 2:
+        day_end_hour = day_start_hour + 6
+    entries: List[List] = []
+    occupied: List[Tuple[datetime, datetime]] = []
+    if reserved_slots:
+        occupied.extend(reserved_slots)
+    attempts = 0
+    while len(entries) < num_meetings and attempts < 80:
+        attempts += 1
+        dur = _rand_range(
+            (params["calendar_slot_min"], params["calendar_slot_max"]),
+            rng,
+        )
+        # round down to a multiple of 15 minutes (minimum 15)
+        dur = max(15, (dur // 15) * 15)
+        hour = rng.randint(day_start_hour, max(day_start_hour, day_end_hour - 1))
+        minute = rng.choice([0, 15, 30, 45])
+        start = datetime(target_date.year, target_date.month, target_date.day,
+                         hour, minute, 0, tzinfo=tz)
+        end = start + timedelta(minutes=dur)
+        boundary = datetime(target_date.year, target_date.month, target_date.day,
+                            day_end_hour, 0, 0, tzinfo=tz)
+        if end > boundary:
+            continue
+        # check overlap with already placed meetings
+        overlap = False
+        for occ_s, occ_e in occupied:
+            if start < occ_e and occ_s < end:
+                overlap = True
+                break
+        if overlap:
+            continue
+        priority = _rand_range(params["existing_priority_range"], rng)
+        summary = rng.choice(MEETING_SUMMARIES)
+        entries.append([start.isoformat(), end.isoformat(), priority, summary])
+        occupied.append((start, end))
+    # sort by start time
+    entries.sort(key=lambda e: e[0])
+    return entries
+def _generate_preferences(
+    user_id: str,
+    params: Dict[str, Any],
+    rng: random.Random,
+) -> Dict[str, Any]:
+    """Generate random but realistic preferences for one user."""
+    start_h = _rand_range(params["pref_start_range"], rng)
+    end_h = _rand_range(params["pref_end_range"], rng)
+    if end_h <= start_h + 4:
+        end_h = start_h + 6
+    if end_h > 18:
+        end_h = 18
+    avoid_btb = rng.random() < params["avoid_btb_prob"]
+    buffer = _rand_range(params["buffer_range"], rng) if avoid_btb else 0
+    # round buffer down to a multiple of 5 minutes
+    buffer = (buffer // 5) * 5
+    return {
+        "preferred_hours": {"start": start_h, "end": end_h},
+        "max_meetings_per_day": _rand_range(params["max_meetings_day_range"], rng),
+        "avoid_back_to_back": avoid_btb,
+        "buffer_minutes": buffer,
+    }
+# ── Solvability check ────────────────────────────────────────────────
+def _find_solvable_slot(
+    calendars: Dict[str, List[List]],
+    attendees: List[str],
+    duration: int,
+    request_priority: int,
+    collective_hours: Dict[str, int],
+    target_date: date,
+    allow_rescheduling: bool = True,
+) -> Optional[str]:
+    """Check if at least one slot exists (possibly after rescheduling).
+    Returns the ISO start time of a viable slot, or None.
+    """
+    tz = timezone.utc
+    min_h = collective_hours["min_start_hour"]
+    max_h = collective_hours["max_end_hour"]
+    candidate = datetime(target_date.year, target_date.month, target_date.day,
+                         min_h, 0, 0, tzinfo=tz)
+    end_boundary = datetime(target_date.year, target_date.month, target_date.day,
+                            max_h, 0, 0, tzinfo=tz)
+    step = timedelta(minutes=15)
+    while candidate + timedelta(minutes=duration) <= end_boundary:
+        c_start = candidate.isoformat()
+        c_end = (candidate + timedelta(minutes=duration)).isoformat()
+        conflicts = find_conflicts(calendars, c_start, c_end, attendees)
+        if len(conflicts) == 0:
+            return c_start
+        if allow_rescheduling:
+            # solvable if ALL conflicts have strictly lower priority (higher number)
+            all_reschedulable = all(c["priority"] > request_priority for c in conflicts)
+            if all_reschedulable:
+                return c_start
+        candidate += step
+    return None
+# ── Plant a guaranteed free slot (easy mode) ──────────────────────────
+def _plant_free_slot(
+    calendars: Dict[str, List[List]],
+    attendees: List[str],
+    duration: int,
+    collective_hours: Dict[str, int],
+    target_date: date,
+    rng: random.Random,
+) -> Optional[str]:
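+    """Clear a randomly chosen slot so that every attendee is free in it.
+    Mutates ``calendars`` in place by deleting overlapping entries.
+    Returns the ISO start of the planted slot, or None if no slot fits
+    within the collective hours.
+    """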
+    tz = timezone.utc
+    min_h = collective_hours["min_start_hour"]
+    max_h = collective_hours["max_end_hour"]
+    # collect all possible starts
+    candidates = []
+    t = datetime(target_date.year, target_date.month, target_date.day,
+                 min_h, 0, 0, tzinfo=tz)
+    end_boundary = datetime(target_date.year, target_date.month, target_date.day,
+                            max_h, 0, 0, tzinfo=tz)
+    step = timedelta(minutes=15)
+    while t + timedelta(minutes=duration) <= end_boundary:
+        candidates.append(t)
+        t += step
+    rng.shuffle(candidates)
+    for candidate in candidates:
+        c_end_dt = candidate + timedelta(minutes=duration)
+        c_start = candidate.isoformat()
+        c_end = c_end_dt.isoformat()
+        # remove any overlapping entries for all attendees
+        for att in attendees:
+            calendars[att] = [
+                e for e in calendars[att]
+                if not (candidate < datetime.fromisoformat(e[1])
+                        and datetime.fromisoformat(e[0]) < c_end_dt)
+            ]
+        # verify it's now free
+        conflicts = find_conflicts(calendars, c_start, c_end, attendees)
+        if len(conflicts) == 0:
+            return c_start
+    return None
+# ── Main generator ────────────────────────────────────────────────────
+def generate_scenario(
+    difficulty: str = "medium",
+    seed: Optional[int] = None,
+) -> Dict[str, Any]:
+    """Generate a random solvable scheduling scenario.
+    Args:
+        difficulty: One of "easy", "medium", "hard".
+        seed: Optional RNG seed for reproducibility.
+    Returns:
+        A scenario dict with the same structure as the static JSON files.
+    """
+    if difficulty not in DIFFICULTY_PARAMS:
+        raise ValueError(f"Unknown difficulty: {difficulty}. Use easy/medium/hard.")
+    params = DIFFICULTY_PARAMS[difficulty]
+    rng = random.Random(seed)
+    target_date = _random_weekday(rng)
+    num_attendees = _rand_range(params["num_attendees"], rng)
+    attendees = [f"user{i+1}" for i in range(num_attendees)]
+    duration = _rand_range(params["meeting_duration"], rng)
+    # round down to a multiple of 15 minutes, with a 15-minute minimum
+    duration = max(15, (duration // 15) * 15)
+    request_priority = _rand_range(params["request_priority"], rng)
+    # generate preferences first (needed for collective hours)
+    preferences: Dict[str, Dict] = {}
+    for att in attendees:
+        preferences[att] = _generate_preferences(att, params, rng)
+    collective_hours = calculate_collective_hours(preferences)
+    # safety: ensure window is wide enough for the meeting
+    window = collective_hours["max_end_hour"] - collective_hours["min_start_hour"]
+    if window * 60 < duration:
+        collective_hours["max_end_hour"] = collective_hours["min_start_hour"] + (duration // 60) + 2
+        # also widen preferences
+        for att in attendees:
+            preferences[att]["preferred_hours"]["end"] = max(
+                preferences[att]["preferred_hours"]["end"],
+                collective_hours["max_end_hour"],
+            )
+    # generate calendars
+    calendars: Dict[str, List[List]] = {}
+    for att in attendees:
+        n_meetings = _rand_range(params["meetings_per_user"], rng)
+        calendars[att] = _generate_calendar(att, target_date, n_meetings, params, rng)
+    # ensure solvability
+    if params["guarantee_free_slot"]:
+        _plant_free_slot(calendars, attendees, duration, collective_hours, target_date, rng)
+    # verify at least one solution exists (with rescheduling allowed)
+    max_retries = 10
+    for attempt in range(max_retries):
+        viable = _find_solvable_slot(
+            calendars, attendees, duration, request_priority,
+            collective_hours, target_date, allow_rescheduling=True,
+        )
+        if viable is not None:
+            break
+        # regenerate calendars with fewer meetings to open up space
+        for att in attendees:
+            reduced = max(1, _rand_range(params["meetings_per_user"], rng) - 1)
+            calendars[att] = _generate_calendar(att, target_date, reduced, params, rng)
+    else:
+        # last resort: plant a free slot
+        _plant_free_slot(calendars, attendees, duration, collective_hours, target_date, rng)
+    # find solution info for metadata
+    free_slot = _find_solvable_slot(
+        calendars, attendees, duration, request_priority,
+        collective_hours, target_date, allow_rescheduling=False,
+    )
+    needs_rescheduling = free_slot is None
+    best_slot = free_slot or _find_solvable_slot(
+        calendars, attendees, duration, request_priority,
+        collective_hours, target_date, allow_rescheduling=True,
+    )
+    task_id = f"random_{difficulty}"
+    description = (
+        f"Schedule a {duration}-min meeting with {num_attendees} attendees "
+        f"(random {difficulty})"
+    )
+    return {
+        "task_id": task_id,
+        "description": description,
+        "difficulty": difficulty,
+        "meeting_request": {
+            "duration": duration,
+            "priority": request_priority,
+            "attendees": attendees,
+            "summary": rng.choice([
+                "Team Sync", "Planning Session", "Design Review",
+                "Sprint Review", "Cross-Team Standup", "Strategy Meeting",
+            ]),
+        },
+        "calendars": calendars,
+        "preferences": preferences,
+        "expected_solution": {
+            "optimal_slot": best_slot,
+            "requires_rescheduling": needs_rescheduling,
+            "generated": True,
+        },
+    }
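
The solvability rule used by `_find_solvable_slot` can be illustrated without the repository's helpers. The sketch below is a simplified approximation, not the project's implementation: `overlaps` and `first_viable_slot` are hypothetical names, and the inline overlap test stands in for `find_conflicts`, whose body is not shown in this part of the diff. It assumes calendar entries of the form `[start_iso, end_iso, priority, summary]` and that a larger priority number means a less important meeting.

```python
from datetime import datetime, timedelta, timezone


def overlaps(a_start, a_end, b_start, b_end):
    """Half-open interval test: meetings that merely touch do not collide."""
    return a_start < b_end and b_start < a_end


def first_viable_slot(calendars, attendees, duration_min, request_priority,
                      day_start, day_end, step_min=15):
    """Return the first start whose conflicts are all reschedulable, else None."""
    duration = timedelta(minutes=duration_min)
    candidate = day_start
    while candidate + duration <= day_end:
        blocking = [
            entry
            for att in attendees
            for entry in calendars.get(att, [])
            if overlaps(candidate, candidate + duration,
                        datetime.fromisoformat(entry[0]),
                        datetime.fromisoformat(entry[1]))
        ]
        # An empty list passes too, so a completely free slot is also viable.
        if all(entry[2] > request_priority for entry in blocking):
            return candidate
        candidate += timedelta(minutes=step_min)
    return None


# Illustrative data only (not taken from the scenario files).
example_calendars = {
    "user1": [["2025-04-07T09:00:00+00:00", "2025-04-07T10:00:00+00:00", 2, "Standup"]],
    "user2": [["2025-04-07T10:00:00+00:00", "2025-04-07T11:00:00+00:00", 4, "Coffee"]],
}
day_start = datetime(2025, 4, 7, 9, tzinfo=timezone.utc)
day_end = datetime(2025, 4, 7, 17, tzinfo=timezone.utc)
print(first_viable_slot(example_calendars, ["user1", "user2"], 30, 3,
                        day_start, day_end))
```

With a request priority of 3, the priority-2 standup blocks every start before 10:00, while the priority-4 coffee meeting counts as movable, so the scan stops at the first start after the standup ends.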

server/scenarios/task1_easy.json ADDED Viewed

	@@ -0,0 +1,41 @@

+{
+  "task_id": "task1_easy",
+  "description": "Schedule a 30-minute team sync with 2 attendees",
+  "difficulty": "easy",
+  "meeting_request": {
+    "duration": 30,
+    "priority": 3,
+    "attendees": ["user1", "user2"],
+    "summary": "Team Sync"
+  },
+  "calendars": {
+    "user1": [
+      ["2025-04-07T09:00:00+00:00", "2025-04-07T10:00:00+00:00", 2, "Morning standup"],
+      ["2025-04-07T14:00:00+00:00", "2025-04-07T15:00:00+00:00", 3, "Client call"]
+    ],
+    "user2": [
+      ["2025-04-07T11:00:00+00:00", "2025-04-07T12:00:00+00:00", 2, "Team meeting"],
+      ["2025-04-07T15:00:00+00:00", "2025-04-07T16:00:00+00:00", 3, "1-on-1"]
+    ]
+  },
+  "preferences": {
+    "user1": {
+      "preferred_hours": {"start": 9, "end": 17},
+      "max_meetings_per_day": 6,
+      "avoid_back_to_back": false,
+      "buffer_minutes": 0
+    },
+    "user2": {
+      "preferred_hours": {"start": 9, "end": 17},
+      "max_meetings_per_day": 6,
+      "avoid_back_to_back": false,
+      "buffer_minutes": 0
+    }
+  },
+  "expected_solution": {
+    "optimal_slot": "2025-04-07T10:00:00+00:00",
+    "expected_score_range": [0.8, 1.0],
+    "min_steps": 2,
+    "requires_rescheduling": false
+  }
+}
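
A static scenario's `expected_solution.optimal_slot` can be sanity-checked against its calendars before the file is used for grading. The following is a minimal sketch under stated assumptions: `conflicts_for` is a hypothetical helper (the repository's own `find_conflicts` may behave differently), and the scenario is reproduced inline, trimmed to the fields the check needs.

```python
from datetime import datetime, timedelta

# Trimmed copy of task1_easy.json: only the fields used by the check.
SCENARIO = {
    "meeting_request": {"duration": 30, "priority": 3,
                        "attendees": ["user1", "user2"]},
    "calendars": {
        "user1": [
            ["2025-04-07T09:00:00+00:00", "2025-04-07T10:00:00+00:00", 2, "Morning standup"],
            ["2025-04-07T14:00:00+00:00", "2025-04-07T15:00:00+00:00", 3, "Client call"],
        ],
        "user2": [
            ["2025-04-07T11:00:00+00:00", "2025-04-07T12:00:00+00:00", 2, "Team meeting"],
            ["2025-04-07T15:00:00+00:00", "2025-04-07T16:00:00+00:00", 3, "1-on-1"],
        ],
    },
    "expected_solution": {"optimal_slot": "2025-04-07T10:00:00+00:00"},
}


def conflicts_for(scenario, start_iso):
    """List (attendee, summary, priority) for meetings overlapping the slot."""
    request = scenario["meeting_request"]
    start = datetime.fromisoformat(start_iso)
    end = start + timedelta(minutes=request["duration"])
    found = []
    for attendee in request["attendees"]:
        for s, e, priority, summary in scenario["calendars"][attendee]:
            # Half-open overlap: a meeting ending exactly at `start` is fine.
            if start < datetime.fromisoformat(e) and datetime.fromisoformat(s) < end:
                found.append((attendee, summary, priority))
    return found


declared = SCENARIO["expected_solution"]["optimal_slot"]
print(declared, "->", conflicts_for(SCENARIO, declared))
```

Running the same check over every scenario file is a cheap way to catch hand-edited calendars that no longer agree with their declared solution.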

server/scenarios/task2_medium.json ADDED Viewed

	@@ -0,0 +1,70 @@

+{
+  "task_id": "task2_medium",
+  "description": "Schedule a 60-minute planning session with 4 attendees",
+  "difficulty": "medium",
+  "meeting_request": {
+    "duration": 60,
+    "priority": 2,
+    "attendees": ["user1", "user2", "user3", "user4"],
+    "summary": "Cross-Team Planning"
+  },
+  "calendars": {
+    "user1": [
+      ["2025-04-07T09:00:00+00:00", "2025-04-07T10:00:00+00:00", 2, "Standup"],
+      ["2025-04-07T10:30:00+00:00", "2025-04-07T11:30:00+00:00", 3, "Review"],
+      ["2025-04-07T13:00:00+00:00", "2025-04-07T14:00:00+00:00", 3, "Lunch meeting"],
+      ["2025-04-07T15:00:00+00:00", "2025-04-07T16:00:00+00:00", 4, "Optional workshop"],
+      ["2025-04-07T16:30:00+00:00", "2025-04-07T17:00:00+00:00", 3, "Sync"]
+    ],
+    "user2": [
+      ["2025-04-07T09:00:00+00:00", "2025-04-07T10:00:00+00:00", 2, "Standup"],
+      ["2025-04-07T11:00:00+00:00", "2025-04-07T12:00:00+00:00", 2, "Client demo"],
+      ["2025-04-07T14:00:00+00:00", "2025-04-07T15:00:00+00:00", 3, "Code review"],
+      ["2025-04-07T16:00:00+00:00", "2025-04-07T17:00:00+00:00", 3, "Office hours"]
+    ],
+    "user3": [
+      ["2025-04-07T09:30:00+00:00", "2025-04-07T10:30:00+00:00", 3, "Design review"],
+      ["2025-04-07T12:00:00+00:00", "2025-04-07T13:00:00+00:00", 3, "Team lunch"],
+      ["2025-04-07T14:00:00+00:00", "2025-04-07T15:30:00+00:00", 2, "Sprint planning"],
+      ["2025-04-07T16:00:00+00:00", "2025-04-07T16:30:00+00:00", 4, "Coffee chat"]
+    ],
+    "user4": [
+      ["2025-04-07T10:00:00+00:00", "2025-04-07T11:00:00+00:00", 2, "Strategy meeting"],
+      ["2025-04-07T13:00:00+00:00", "2025-04-07T14:00:00+00:00", 3, "1-on-1"],
+      ["2025-04-07T15:00:00+00:00", "2025-04-07T16:00:00+00:00", 3, "Team sync"]
+    ]
+  },
+  "preferences": {
+    "user1": {
+      "preferred_hours": {"start": 10, "end": 16},
+      "max_meetings_per_day": 5,
+      "avoid_back_to_back": true,
+      "buffer_minutes": 15
+    },
+    "user2": {
+      "preferred_hours": {"start": 9, "end": 17},
+      "max_meetings_per_day": 4,
+      "avoid_back_to_back": true,
+      "buffer_minutes": 10
+    },
+    "user3": {
+      "preferred_hours": {"start": 9, "end": 15},
+      "max_meetings_per_day": 5,
+      "avoid_back_to_back": false,
+      "buffer_minutes": 0
+    },
+    "user4": {
+      "preferred_hours": {"start": 10, "end": 17},
+      "max_meetings_per_day": 6,
+      "avoid_back_to_back": true,
+      "buffer_minutes": 15
+    }
+  },
+  "expected_solution": {
+    "optimal_slot": "2025-04-07T11:00:00+00:00",
+    "expected_score_range": [0.5, 0.7],
+    "min_steps": 3,
+    "requires_rescheduling": true,
+    "reschedulable_meetings": ["user3:Coffee chat (priority 4)"]
+  }
+}

server/scenarios/task3_hard.json ADDED Viewed

	@@ -0,0 +1,108 @@

+{
+  "task_id": "task3_hard",
+  "description": "Schedule a 45-minute executive meeting with 6 attendees",
+  "difficulty": "hard",
+  "meeting_request": {
+    "duration": 45,
+    "priority": 2,
+    "attendees": ["user1", "user2", "user3", "user4", "user5", "user6"],
+    "summary": "Executive Planning Session"
+  },
+  "calendars": {
+    "user1": [
+      ["2025-04-07T09:00:00+00:00", "2025-04-07T10:00:00+00:00", 2, "Strategy meeting"],
+      ["2025-04-07T10:30:00+00:00", "2025-04-07T11:30:00+00:00", 3, "Team standup"],
+      ["2025-04-07T12:00:00+00:00", "2025-04-07T13:00:00+00:00", 3, "Lunch meeting"],
+      ["2025-04-07T13:30:00+00:00", "2025-04-07T14:30:00+00:00", 2, "Client call"],
+      ["2025-04-07T15:00:00+00:00", "2025-04-07T15:45:00+00:00", 4, "Optional training"],
+      ["2025-04-07T16:00:00+00:00", "2025-04-07T17:00:00+00:00", 3, "Project sync"]
+    ],
+    "user2": [
+      ["2025-04-07T09:00:00+00:00", "2025-04-07T09:30:00+00:00", 2, "Morning sync"],
+      ["2025-04-07T10:00:00+00:00", "2025-04-07T11:00:00+00:00", 2, "Design review"],
+      ["2025-04-07T11:30:00+00:00", "2025-04-07T12:30:00+00:00", 3, "Code review"],
+      ["2025-04-07T13:00:00+00:00", "2025-04-07T14:00:00+00:00", 3, "1-on-1"],
+      ["2025-04-07T14:30:00+00:00", "2025-04-07T15:30:00+00:00", 2, "Planning meeting"],
+      ["2025-04-07T16:00:00+00:00", "2025-04-07T16:45:00+00:00", 4, "Coffee chat"]
+    ],
+    "user3": [
+      ["2025-04-07T09:30:00+00:00", "2025-04-07T10:30:00+00:00", 3, "Sprint planning"],
+      ["2025-04-07T11:00:00+00:00", "2025-04-07T12:00:00+00:00", 2, "Architecture review"],
+      ["2025-04-07T12:30:00+00:00", "2025-04-07T13:30:00+00:00", 3, "Team lunch"],
+      ["2025-04-07T14:00:00+00:00", "2025-04-07T15:00:00+00:00", 2, "Client demo"],
+      ["2025-04-07T15:30:00+00:00", "2025-04-07T16:15:00+00:00", 4, "Office hours"]
+    ],
+    "user4": [
+      ["2025-04-07T10:00:00+00:00", "2025-04-07T11:00:00+00:00", 2, "Board meeting"],
+      ["2025-04-07T11:30:00+00:00", "2025-04-07T12:30:00+00:00", 3, "Product review"],
+      ["2025-04-07T13:00:00+00:00", "2025-04-07T14:00:00+00:00", 2, "Executive sync"],
+      ["2025-04-07T14:30:00+00:00", "2025-04-07T15:30:00+00:00", 3, "Team meeting"],
+      ["2025-04-07T16:00:00+00:00", "2025-04-07T17:00:00+00:00", 4, "Mentor session"]
+    ],
+    "user5": [
+      ["2025-04-07T09:00:00+00:00", "2025-04-07T10:00:00+00:00", 3, "Daily standup"],
+      ["2025-04-07T10:30:00+00:00", "2025-04-07T11:30:00+00:00", 2, "Strategic planning"],
+      ["2025-04-07T12:00:00+00:00", "2025-04-07T13:00:00+00:00", 3, "Working lunch"],
+      ["2025-04-07T13:30:00+00:00", "2025-04-07T14:30:00+00:00", 3, "Performance review"],
+      ["2025-04-07T15:00:00+00:00", "2025-04-07T16:00:00+00:00", 2, "Budget meeting"],
+      ["2025-04-07T16:30:00+00:00", "2025-04-07T17:00:00+00:00", 4, "Optional networking"]
+    ],
+    "user6": [
+      ["2025-04-07T09:30:00+00:00", "2025-04-07T10:30:00+00:00", 2, "Leadership meeting"],
+      ["2025-04-07T11:00:00+00:00", "2025-04-07T12:00:00+00:00", 3, "Project checkpoint"],
+      ["2025-04-07T12:30:00+00:00", "2025-04-07T13:30:00+00:00", 3, "Team sync"],
+      ["2025-04-07T14:00:00+00:00", "2025-04-07T15:00:00+00:00", 2, "Client meeting"],
+      ["2025-04-07T15:30:00+00:00", "2025-04-07T16:30:00+00:00", 4, "Training session"]
+    ]
+  },
+  "preferences": {
+    "user1": {
+      "preferred_hours": {"start": 10, "end": 16},
+      "max_meetings_per_day": 5,
+      "avoid_back_to_back": true,
+      "buffer_minutes": 15
+    },
+    "user2": {
+      "preferred_hours": {"start": 9, "end": 17},
+      "max_meetings_per_day": 5,
+      "avoid_back_to_back": true,
+      "buffer_minutes": 15
+    },
+    "user3": {
+      "preferred_hours": {"start": 9, "end": 15},
+      "max_meetings_per_day": 4,
+      "avoid_back_to_back": true,
+      "buffer_minutes": 20
+    },
+    "user4": {
+      "preferred_hours": {"start": 10, "end": 17},
+      "max_meetings_per_day": 6,
+      "avoid_back_to_back": true,
+      "buffer_minutes": 10
+    },
+    "user5": {
+      "preferred_hours": {"start": 9, "end": 16},
+      "max_meetings_per_day": 5,
+      "avoid_back_to_back": true,
+      "buffer_minutes": 15
+    },
+    "user6": {
+      "preferred_hours": {"start": 9, "end": 16},
+      "max_meetings_per_day": 5,
+      "avoid_back_to_back": true,
+      "buffer_minutes": 10
+    }
+  },
+  "expected_solution": {
+    "optimal_slot": "2025-04-07T15:00:00+00:00",
+    "expected_score_range": [0.25, 0.45],
+    "min_steps": 5,
+    "requires_rescheduling": true,
+    "reschedulable_meetings": [
+      "user1:Optional training (priority 4)",
+      "user2:Coffee chat (priority 4)",
+      "user5:Optional networking (priority 4)"
+    ],
+    "notes": "Multiple valid solutions exist. Agent must reschedule 3+ low-priority meetings."
+  }
+}

server/scheduling_env_environment.py ADDED Viewed

	@@ -0,0 +1,476 @@

+# Copyright (c) Meta Platforms, Inc. and affiliates.
+# All rights reserved.
+#
+# This source code is licensed under the BSD-style license found in the
+# LICENSE file in the root directory of this source tree.
+"""
+Meeting Scheduling RL Environment.
+Teaches agents to optimally schedule meetings across multiple attendees
+by proposing time slots, rescheduling lower-priority conflicts, and
+balancing participant preferences.
+"""
+from __future__ import annotations
+import copy
+import json
+import logging
+from datetime import timedelta
+from pathlib import Path
+from uuid import uuid4
+from openenv.core.env_server.interfaces import Environment
+try:
+    from ..models import SchedulingAction, SchedulingObservation, SchedulingState
+except ImportError:
+    from models import SchedulingAction, SchedulingObservation, SchedulingState
+from .scheduling_logic import (
+    build_busy_slots,
+    calculate_collective_hours,
+    calculate_final_reward,
+    calculate_preference_score,
+    find_conflicts,
+    is_slot_free,
+    parse_iso,
+    within_collective_hours,
+)
+from .scenario_generator import generate_scenario
+logger = logging.getLogger(__name__)
+SCENARIOS_DIR = Path(__file__).parent / "scenarios"
+MAX_STEPS = 20
+class SchedulingEnvironment(Environment):
+    """RL environment for intelligent meeting scheduling.
+    The agent must learn to:
+    1. Propose valid time slots satisfying hard constraints
+    2. Minimize preference violations
+    3. Handle cascading rescheduling when conflicts exist
+    4. Balance speed vs. quality
+    """
+    SUPPORTS_CONCURRENT_SESSIONS: bool = True
+    def __init__(self):
+        self._state = SchedulingState(episode_id=str(uuid4()), step_count=0)
+        self._scenario: dict = {}
+        self._collective_hours: dict = {}
+    # ------------------------------------------------------------------
+    # OpenEnv interface
+    # ------------------------------------------------------------------
+    def reset(self, **kwargs) -> SchedulingObservation:
+        """Reset environment for a new episode.
+        Accepts ``task_id`` kwarg.  Static tasks (``"task1_easy"`` etc.) load
+        from JSON.  Random tasks (``"random_easy"``, ``"random_medium"``,
+        ``"random_hard"``) generate a fresh scenario every call.  An optional
+        ``seed`` kwarg makes random generation reproducible.
+        """
+        task_id = kwargs.get("task_id", "task1_easy")
+        # ── random scenario generation ──
+        if task_id.startswith("random_"):
+            difficulty = task_id.split("_", 1)[1]
+            seed = kwargs.get("seed", None)
+            try:
+                self._scenario = generate_scenario(difficulty, seed=seed)
+            except ValueError:
+                return SchedulingObservation(
+                    error_message=f"Unknown difficulty in task_id: {task_id}",
+                    done=True,
+                    reward=0.0,
+                )
+        else:
+            # ── static JSON scenario ──
+            scenario_path = SCENARIOS_DIR / f"{task_id}.json"
+            if not scenario_path.exists():
+                return SchedulingObservation(
+                    error_message=f"Unknown task_id: {task_id}",
+                    done=True,
+                    reward=0.0,
+                )
+            with open(scenario_path) as f:
+                self._scenario = json.load(f)
+        req = self._scenario["meeting_request"]
+        prefs = self._scenario["preferences"]
+        self._collective_hours = calculate_collective_hours(prefs)
+        self._state = SchedulingState(
+            episode_id=str(uuid4()),
+            step_count=0,
+            task_id=task_id,
+            scenario_name=self._scenario.get("description", task_id),
+            meeting_request=req,
+            calendars=copy.deepcopy(self._scenario["calendars"]),
+            participant_preferences=prefs,
+            proposed_slot=None,
+            rescheduled_meetings=[],
+            total_preference_penalty=0.0,
+            total_steps=0,
+            final_reward=0.0,
+            completed=False,
+        )
+        attendees = req["attendees"]
+        return SchedulingObservation(
+            requested_duration=req["duration"],
+            requested_priority=req["priority"],
+            attendee_ids=attendees,
+            busy_slots=build_busy_slots(self._state.calendars, attendees),
+            collective_work_hours=self._collective_hours,
+            preference_constraints=self._aggregate_preferences(prefs),
+            current_proposal=None,
+            conflicts=[],
+            preference_penalty=0.0,
+            num_rescheduled=0,
+            steps_taken=0,
+            max_steps=MAX_STEPS,
+            success=False,
+            error_message=None,
+            done=False,
+            reward=0.0,
+        )
+    def step(self, action: SchedulingAction) -> SchedulingObservation:  # type: ignore[override]
+        """Process one agent action and return an observation."""
+        if self._state.completed:
+            return self._obs(error_message="Episode already completed", done=True, reward=0.0)
+        self._state.step_count += 1
+        self._state.total_steps += 1
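+        # Timeout check: the action that reaches MAX_STEPS is not executed;
+        # the episode ends with partial credit instead.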
+        if self._state.step_count >= MAX_STEPS:
+            return self._handle_timeout()
+        action_type = action.action_type
+        if action_type == "propose_slot":
+            return self._process_propose_slot(action)
+        elif action_type == "reschedule_meeting":
+            return self._process_reschedule_meeting(action)
+        elif action_type == "finalize":
+            return self._process_finalize()
+        elif action_type == "reject":
+            return self._process_reject()
+        else:
+            return self._obs(error_message=f"Unknown action_type: {action_type}", reward=-0.1)
+    @property
+    def state(self) -> SchedulingState:
+        return self._state
+    # ------------------------------------------------------------------
+    # Action handlers
+    # ------------------------------------------------------------------
+    def _process_propose_slot(self, action: SchedulingAction) -> SchedulingObservation:
+        if not action.proposed_start or not action.proposed_duration:
+            return self._obs(
+                error_message="propose_slot requires proposed_start and proposed_duration",
+                reward=-0.1,
+            )
+        try:
+            start = parse_iso(action.proposed_start)
+        except (ValueError, TypeError):
+            return self._obs(error_message="Invalid proposed_start format", reward=-0.1)
+        end = start + timedelta(minutes=action.proposed_duration)
+        start_iso = start.isoformat()
+        end_iso = end.isoformat()
+        attendees = self._state.meeting_request["attendees"]
+        req_priority = self._state.meeting_request["priority"]
+        # Validate working hours
+        if not within_collective_hours(start_iso, end_iso, self._collective_hours):
+            return self._obs(
+                error_message="Proposed slot outside working hours",
+                reward=-0.2,
+            )
+        # Find conflicts
+        conflicts = find_conflicts(
+            self._state.calendars, start_iso, end_iso, attendees
+        )
+        # Calculate preference penalty
+        pref_penalty = calculate_preference_score(
+            start_iso,
+            action.proposed_duration,
+            self._state.participant_preferences,
+            self._state.calendars,
+        )
+        # Update state
+        self._state.proposed_slot = [start_iso, end_iso]
+        self._state.total_preference_penalty = pref_penalty
+        # Step reward
+        if len(conflicts) == 0 and pref_penalty < 100:
+            step_reward = 0.5
+        elif len(conflicts) > 0:
+            if all(c["priority"] > req_priority for c in conflicts):
+                step_reward = 0.2
+            else:
+                step_reward = -0.3
+        else:
+            step_reward = 0.0
+        return self._obs(
+            current_proposal={"start": start_iso, "end": end_iso},
+            conflicts=conflicts,
+            preference_penalty=pref_penalty,
+            reward=step_reward,
+        )
+    def _process_reschedule_meeting(self, action: SchedulingAction) -> SchedulingObservation:
+        if not action.meeting_id_to_move or not action.new_start_time:
+            return self._obs(
+                error_message="reschedule_meeting requires meeting_id_to_move and new_start_time",
+                reward=-0.1,
+            )
+        if self._state.proposed_slot is None:
+            return self._obs(
+                error_message="Must propose a slot before rescheduling",
+                reward=-0.2,
+            )
+        # Find the meeting to move
+        meeting = self._find_meeting(action.meeting_id_to_move)
+        if meeting is None:
+            return self._obs(
+                error_message=f"Meeting not found: {action.meeting_id_to_move}",
+                reward=-0.2,
+            )
+        req_priority = self._state.meeting_request["priority"]
+        if meeting["priority"] <= req_priority:
+            return self._obs(
+                error_message="Cannot reschedule equal or higher priority meeting",
+                reward=-0.5,
+            )
+        # Validate new slot
+        try:
+            new_start = parse_iso(action.new_start_time)
+        except (ValueError, TypeError):
+            return self._obs(error_message="Invalid new_start_time format", reward=-0.1)
+        old_start = parse_iso(meeting["start"])
+        old_end = parse_iso(meeting["end"])
+        duration = old_end - old_start
+        new_end = new_start + duration
+        new_start_iso = new_start.isoformat()
+        new_end_iso = new_end.isoformat()
+        attendee = meeting["attendee"]
+        if not is_slot_free(attendee, new_start_iso, new_end_iso, self._state.calendars):
+            return self._obs(error_message="New slot not free for attendee", reward=-0.2)
+        # Update calendar: remove old, add new
+        cal = self._state.calendars[attendee]
+        self._state.calendars[attendee] = [
+            e for e in cal if e[0] != meeting["start"]
+        ]
+        self._state.calendars[attendee].append(
+            [new_start_iso, new_end_iso, meeting["priority"], meeting["summary"]]
+        )
+        self._state.rescheduled_meetings.append({
+            "meeting_id": action.meeting_id_to_move,
+            "old_start": meeting["start"],
+            "new_start": new_start_iso,
+            "attendee": attendee,
+        })
+        # Recalculate conflicts for current proposal
+        attendees = self._state.meeting_request["attendees"]
+        new_conflicts = find_conflicts(
+            self._state.calendars,
+            self._state.proposed_slot[0],
+            self._state.proposed_slot[1],
+            attendees,
+        )
+        num_rescheduled = len(self._state.rescheduled_meetings)
+        step_reward = 0.5 if len(new_conflicts) == 0 else 0.3
+        return self._obs(
+            conflicts=new_conflicts,
+            num_rescheduled=num_rescheduled,
+            reward=step_reward,
+        )
+    def _process_finalize(self) -> SchedulingObservation:
+        if self._state.proposed_slot is None:
+            self._state.completed = True
+            return self._obs(
+                error_message="No slot proposed",
+                success=False,
+                reward=0.0,
+                done=True,
+            )
+        attendees = self._state.meeting_request["attendees"]
+        conflicts = find_conflicts(
+            self._state.calendars,
+            self._state.proposed_slot[0],
+            self._state.proposed_slot[1],
+            attendees,
+        )
+        if len(conflicts) > 0:
+            self._state.completed = True
+            return self._obs(
+                error_message=f"Unresolved conflicts: {len(conflicts)} meetings",
+                conflicts=conflicts,
+                success=False,
+                reward=0.0,
+                done=True,
+            )
+        final_reward = calculate_final_reward(
+            preference_penalty=self._state.total_preference_penalty,
+            num_rescheduled=len(self._state.rescheduled_meetings),
+            steps_taken=self._state.step_count,
+            success=True,
+        )
+        self._state.completed = True
+        self._state.final_reward = final_reward
+        return self._obs(
+            success=True,
+            reward=final_reward,
+            done=True,
+        )
+    def _process_reject(self) -> SchedulingObservation:
+        self._state.completed = True
+        return self._obs(
+            success=False,
+            reward=0.0,
+            done=True,
+            error_message="Agent rejected scheduling task",
+        )
+    def _handle_timeout(self) -> SchedulingObservation:
+        """Give partial credit when max steps reached."""
+        self._state.completed = True
+        if self._state.proposed_slot is None:
+            return self._obs(
+                success=False,
+                reward=0.0,
+                done=True,
+                error_message="Timeout: No slot proposed",
+            )
+        attendees = self._state.meeting_request["attendees"]
+        conflicts = find_conflicts(
+            self._state.calendars,
+            self._state.proposed_slot[0],
+            self._state.proposed_slot[1],
+            attendees,
+        )
+        if len(conflicts) == 0:
+            theoretical = calculate_final_reward(
+                preference_penalty=self._state.total_preference_penalty,
+                num_rescheduled=len(self._state.rescheduled_meetings),
+                steps_taken=self._state.step_count,
+                success=True,
+            )
+            partial = theoretical * 0.7
+        else:
+            progress = 1.0 - (len(conflicts) / max(1, len(attendees)))
+            partial = 0.2 * progress
+        self._state.final_reward = partial
+        return self._obs(
+            success=False,
+            reward=partial,
+            done=True,
+            error_message=f"Timeout after {self._state.step_count} steps (partial credit: {partial:.2f})",
+        )
+    # ------------------------------------------------------------------
+    # Helpers
+    # ------------------------------------------------------------------
+    def _obs(self, **overrides) -> SchedulingObservation:
+        """Build an observation from current state, applying overrides."""
+        req = self._state.meeting_request
+        attendees = req.get("attendees", [])
+        defaults = dict(
+            requested_duration=req.get("duration", 0),
+            requested_priority=req.get("priority", 3),
+            attendee_ids=attendees,
+            busy_slots=build_busy_slots(self._state.calendars, attendees),
+            collective_work_hours=self._collective_hours,
+            preference_constraints=self._aggregate_preferences(
+                self._state.participant_preferences
+            ),
+            current_proposal=(
+                {"start": self._state.proposed_slot[0], "end": self._state.proposed_slot[1]}
+                if self._state.proposed_slot
+                else None
+            ),
+            conflicts=[],
+            preference_penalty=self._state.total_preference_penalty,
+            num_rescheduled=len(self._state.rescheduled_meetings),
+            steps_taken=self._state.step_count,
+            max_steps=MAX_STEPS,
+            success=False,
+            error_message=None,
+            done=False,
+            reward=0.0,
+        )
+        defaults.update(overrides)
+        return SchedulingObservation(**defaults)
+    def _find_meeting(self, meeting_id: str) -> dict | None:
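+        """Look up a meeting by its id (format: "<attendee>_<start_iso>")."""
+        # Split at the last underscore: ISO 8601 strings contain none,
+        # while attendee ids may.
+        parts = meeting_id.rsplit("_", 1)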
+        if len(parts) != 2:
+            return None
+        attendee, start_iso = parts
+        for entry in self._state.calendars.get(attendee, []):
+            if entry[0] == start_iso:
+                return {
+                    "attendee": attendee,
+                    "start": entry[0],
+                    "end": entry[1],
+                    "priority": entry[2],
+                    "summary": entry[3],
+                }
+        return None
+    @staticmethod
+    def _aggregate_preferences(prefs: dict) -> dict:
+        """Summarize preferences for the observation."""
+        if not prefs:
+            return {}
+        max_meetings = min(p.get("max_meetings_per_day", 99) for p in prefs.values())
+        any_buffer = any(p.get("avoid_back_to_back", False) for p in prefs.values())
+        buffer_mins = max(
+            (p.get("buffer_minutes", 0) for p in prefs.values() if p.get("avoid_back_to_back")),
+            default=0,
+        )
+        return {
+            "max_meetings_per_day": max_meetings,
+            "requires_buffer": any_buffer,
+            "buffer_minutes": buffer_mins,
+        }
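
The `meeting_id` that `_find_meeting` parses is built in `find_conflicts` as `f"{attendee}_{entry_start_iso}"`. The following is a minimal sketch of that round trip, assuming attendee ids may themselves contain underscores (the scenario files are not shown here, so this is a guess). `make_meeting_id` and `split_meeting_id` are hypothetical helper names, not part of the repository.

```python
from typing import Optional, Tuple


def make_meeting_id(attendee: str, start_iso: str) -> str:
    """Build an id in the same shape find_conflicts uses: '<attendee>_<start_iso>'."""
    return f"{attendee}_{start_iso}"


def split_meeting_id(meeting_id: str) -> Optional[Tuple[str, str]]:
    """Recover (attendee, start_iso) by splitting at the LAST underscore."""
    parts = meeting_id.rsplit("_", 1)
    if len(parts) != 2:
        return None
    return parts[0], parts[1]


# Hypothetical attendee id that contains an underscore.
example_id = make_meeting_id("user_1", "2025-04-07T09:00:00+00:00")
print(split_meeting_id(example_id))
```

Splitting from the right works because ISO 8601 timestamps contain no underscore, so the last underscore is always the separator. Splitting from the left would cut an id such as `user_1` in half.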

server/scheduling_logic.py ADDED

	@@ -0,0 +1,324 @@

+"""Pure utility functions for the meeting-scheduling RL environment.
+Calendar format: Dict[str, List[List]]
+  Each entry is [start_iso, end_iso, priority_int, summary_str].
+All datetimes are timezone-aware ISO 8601 strings.
+"""
+from __future__ import annotations
+import json
+from datetime import datetime, date, timedelta
+from pathlib import Path
+from typing import Dict, List, Optional
+def parse_iso(s: str) -> datetime:
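+    """Parse an ISO 8601 string into a datetime object.
+    Note: before Python 3.11, datetime.fromisoformat rejects a trailing "Z";
+    use an explicit offset such as "+00:00" there.
+    """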
+    return datetime.fromisoformat(s)
+def load_scenario(scenario_path: str) -> dict:
+    """Load a scenario JSON file and return the parsed dict."""
+    with open(scenario_path, "r", encoding="utf-8") as f:
+        return json.load(f)
+def find_conflicts(
+    calendars: Dict[str, List[List]],
+    proposed_start_iso: str,
+    proposed_end_iso: str,
+    attendee_ids: List[str],
+) -> List[Dict]:
+    """Find calendar conflicts between a proposed slot and existing meetings.
+    Two intervals overlap when start1 < end2 and start2 < end1.
+    Returns:
+        List of conflict dicts with keys: attendee, start, end, priority,
+        summary, meeting_id.
+    """
+    proposed_start = parse_iso(proposed_start_iso)
+    proposed_end = parse_iso(proposed_end_iso)
+    conflicts: List[Dict] = []
+    for attendee in attendee_ids:
+        entries = calendars.get(attendee, [])
+        for entry in entries:
+            entry_start_iso, entry_end_iso, priority, summary = entry
+            entry_start = parse_iso(entry_start_iso)
+            entry_end = parse_iso(entry_end_iso)
+            if proposed_start < entry_end and entry_start < proposed_end:
+                conflicts.append({
+                    "attendee": attendee,
+                    "start": entry_start_iso,
+                    "end": entry_end_iso,
+                    "priority": priority,
+                    "summary": summary,
+                    "meeting_id": f"{attendee}_{entry_start_iso}",
+                })
+    return conflicts
+def calculate_collective_hours(preferences: Dict) -> Dict[str, int]:
+    """Find the intersection of all users' preferred working hours.
+    Each user preference has 'preferred_hours': {'start': int, 'end': int};
+    missing values default to 9 and 17.
+    Returns:
+        {"min_start_hour": <max of all starts>, "max_end_hour": <min of all ends>}
+        The window is empty when min_start_hour >= max_end_hour.
+    """
+    start_hours = [p.get("preferred_hours", {}).get("start", 9) for p in preferences.values()]
+    end_hours = [p.get("preferred_hours", {}).get("end", 17) for p in preferences.values()]
+    return {
+        # default= keeps an empty preferences dict from raising ValueError
+        "min_start_hour": max(start_hours, default=9),
+        "max_end_hour": min(end_hours, default=17),
+    }
+def within_collective_hours(
+    start_iso: str,
+    end_iso: str,
+    collective_hours: Dict[str, int],
+) -> bool:
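+    """Check if a proposed slot falls within collective working hours.
+    The start hour must be >= min_start_hour. An end exactly on the hour may
+    equal max_end_hour; an end past the hour must fall in an hour strictly
+    before max_end_hour. Only clock hours are compared, so start and end are
+    assumed to fall on the same day.
+    """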
+    start = parse_iso(start_iso)
+    end = parse_iso(end_iso)
+    min_start = collective_hours["min_start_hour"]
+    max_end = collective_hours["max_end_hour"]
+    if start.hour < min_start:
+        return False
+    # Handle end at exact hour boundary (minute == 0) vs. mid-hour
+    if end.minute == 0 and end.second == 0:
+        if end.hour > max_end:
+            return False
+    else:
+        if end.hour >= max_end:
+            return False
+    return True
+def count_meetings_on_date(calendar_entries: List[List], target_date: date) -> int:
+    """Count the meetings in a user's calendar that start on target_date."""
+    count = 0
+    for entry in calendar_entries:
+        entry_start = parse_iso(entry[0])
+        if entry_start.date() == target_date:
+            count += 1
+    return count
+def check_back_to_back(
+    calendar_entries: List[List],
+    proposed_start_iso: str,
+    proposed_end_iso: str,
+    buffer_minutes: int,
+) -> bool:
+    """Check if a proposed meeting would be back-to-back with an existing one.
+    Returns True if the gap between the proposed slot and any existing meeting,
+    on either side, is at least zero and shorter than buffer_minutes.
+    Overlapping meetings have a negative gap and are not reported here.
+    """
+    proposed_start = parse_iso(proposed_start_iso)
+    proposed_end = parse_iso(proposed_end_iso)
+    buffer = timedelta(minutes=buffer_minutes)
+    for entry in calendar_entries:
+        entry_start = parse_iso(entry[0])
+        entry_end = parse_iso(entry[1])
+        # Existing meeting ends close before proposed start
+        gap_before = proposed_start - entry_end
+        if timedelta(0) <= gap_before < buffer:
+            return True
+        # Existing meeting starts close after proposed end
+        gap_after = entry_start - proposed_end
+        if timedelta(0) <= gap_after < buffer:
+            return True
+    return False
+def calculate_preference_score(
+    proposed_start_iso: str,
+    duration_minutes: int,
+    participant_preferences: Dict,
+    calendars: Dict[str, List[List]],
+) -> float:
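+    """Calculate penalty points for scheduling preference violations.
+    Despite the name, the result is a penalty: 0 means no violations and
+    higher is worse.
+    Penalty rules:
+        - Outside preferred hours: +50 per participant
+        - Proposed meeting would exceed max meetings per day: +30 per participant
+        - Back-to-back without buffer: +20 per participant
+          (only when avoid_back_to_back is set and buffer_minutes > 0)
+    Returns:
+        Total penalty sum (float).
+    """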
+    proposed_start = parse_iso(proposed_start_iso)
+    proposed_end = proposed_start + timedelta(minutes=duration_minutes)
+    proposed_end_iso = proposed_end.isoformat()
+    proposed_date = proposed_start.date()
+    total_penalty = 0.0
+    for participant, prefs in participant_preferences.items():
+        pref_hours = prefs.get("preferred_hours", {})
+        pref_start = pref_hours.get("start", 9)
+        pref_end = pref_hours.get("end", 17)
+        max_meetings = prefs.get("max_meetings_per_day", 8)
+        avoid_btb = prefs.get("avoid_back_to_back", False)
+        buffer_mins = prefs.get("buffer_minutes", 0)
+        # Outside preferred hours
+        collective = {"min_start_hour": pref_start, "max_end_hour": pref_end}
+        if not within_collective_hours(proposed_start_iso, proposed_end_iso, collective):
+            total_penalty += 50
+        # Exceeds max meetings per day
+        entries = calendars.get(participant, [])
+        existing_count = count_meetings_on_date(entries, proposed_date)
+        if existing_count + 1 > max_meetings:
+            total_penalty += 30
+        # Back-to-back without buffer (only if user cares about it)
+        if avoid_btb and buffer_mins > 0:
+            if check_back_to_back(entries, proposed_start_iso, proposed_end_iso, buffer_mins):
+                total_penalty += 20
+    return total_penalty
+def is_slot_free(
+    attendee: str,
+    start_iso: str,
+    end_iso: str,
+    calendars: Dict[str, List[List]],
+) -> bool:
+    """Check if a time slot is free for a specific attendee (no overlaps)."""
+    start = parse_iso(start_iso)
+    end = parse_iso(end_iso)
+    for entry in calendars.get(attendee, []):
+        entry_start = parse_iso(entry[0])
+        entry_end = parse_iso(entry[1])
+        if start < entry_end and entry_start < end:
+            return False
+    return True
+def calculate_final_reward(
+    preference_penalty: float,
+    num_rescheduled: int,
+    steps_taken: int,
+    success: bool = True,
+) -> float:
+    """Compute the multi-component reward for an episode, clamped to [0.0, 1.0].
+    Components (deducted from 1.0):
+        - Preference deduction: min(0.75, (preference_penalty ** 1.2) / 200.0)
+        - Rescheduling deduction: min(0.30, 0.05 * (1.8 ** num_rescheduled))
+          (only applied when num_rescheduled > 0)
+        - Time deduction: steps_taken * 0.015
+    Returns 0.0 if the episode was not successful.
+    """
+    if not success:
+        return 0.0
+    reward = 1.0
+    # Preference deduction
+    pref_deduction = min(0.75, (preference_penalty ** 1.2) / 200.0)
+    reward -= pref_deduction
+    # Rescheduling deduction (exponential)
+    if num_rescheduled > 0:
+        reschedule_deduction = min(0.30, 0.05 * (1.8 ** num_rescheduled))
+        reward -= reschedule_deduction
+    # Time deduction
+    time_deduction = steps_taken * 0.015
+    reward -= time_deduction
+    return max(0.0, min(1.0, reward))
+def build_busy_slots(
+    calendars: Dict[str, List[List]],
+    attendee_ids: List[str],
+) -> List[Dict]:
+    """Convert calendar data to observation-friendly busy_slots format.
+    Returns:
+        List of dicts with keys: start, end, priority, summary, attendee.
+    """
+    busy_slots: List[Dict] = []
+    for attendee in attendee_ids:
+        for entry in calendars.get(attendee, []):
+            start_iso, end_iso, priority, summary = entry
+            busy_slots.append({
+                "start": start_iso,
+                "end": end_iso,
+                "priority": priority,
+                "summary": summary,
+                "attendee": attendee,
+            })
+    return busy_slots
+def find_earliest_free_slot(
+    calendars: Dict[str, List[List]],
+    attendees: List[str],
+    duration_minutes: int,
+    search_date_iso: str,
+    collective_hours: Dict[str, int],
+) -> Optional[str]:
+    """Find the earliest free slot on a given date for all attendees.
+    Iterates from min_start_hour to max_end_hour in 15-minute increments.
+    Returns the ISO 8601 string of the first conflict-free slot, or None.
+    """
+    search_date = parse_iso(search_date_iso)
+    base_date = search_date.date()
+    tz = search_date.tzinfo
+    min_start = collective_hours["min_start_hour"]
+    max_end = collective_hours["max_end_hour"]
+    candidate = datetime(base_date.year, base_date.month, base_date.day,
+                         min_start, 0, 0, tzinfo=tz)
+    end_boundary = datetime(base_date.year, base_date.month, base_date.day,
+                            max_end, 0, 0, tzinfo=tz)
+    step = timedelta(minutes=15)
+    while candidate + timedelta(minutes=duration_minutes) <= end_boundary:
+        candidate_iso = candidate.isoformat()
+        candidate_end_iso = (candidate + timedelta(minutes=duration_minutes)).isoformat()
+        # all() stops at the first busy attendee, like the explicit break did
+        if all(
+            is_slot_free(attendee, candidate_iso, candidate_end_iso, calendars)
+            for attendee in attendees
+        ):
+            return candidate_iso
+        candidate += step
+    return None
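
To make the reward arithmetic in `calculate_final_reward` concrete, here is a standalone sketch that restates the formula from its docstring and evaluates a few cases. `reward_sketch` is a hypothetical name, and the inputs are made-up illustrations rather than values from the bundled scenarios.

```python
def reward_sketch(preference_penalty: float, num_rescheduled: int,
                  steps_taken: int, success: bool = True) -> float:
    """Restate the docstring formula: start at 1.0 and subtract three deductions."""
    if not success:
        return 0.0
    # Preference deduction grows slightly faster than linearly, capped at 0.75.
    pref = min(0.75, (preference_penalty ** 1.2) / 200.0)
    # Rescheduling deduction is exponential, capped at 0.30, and skipped at zero.
    resched = min(0.30, 0.05 * (1.8 ** num_rescheduled)) if num_rescheduled > 0 else 0.0
    # Each step costs a flat 0.015.
    time_cost = steps_taken * 0.015
    return max(0.0, min(1.0, 1.0 - pref - resched - time_cost))


# Clean solve in two steps: only the time deduction applies (2 * 0.015).
clean = reward_sketch(0.0, 0, 2)
# One participant outside preferred hours (+50), one reschedule, four steps.
mixed = reward_sketch(50.0, 1, 4)
# Both caps together exceed 1.0 (0.75 + 0.30), so the clamp returns 0.0.
floored = reward_sketch(1000.0, 5, 0)
print(f"{clean:.3f} {mixed:.3f} {floored:.3f}")
```

Under these assumed inputs, a single out-of-hours participant already removes more than half of the reward (a deduction of about 0.55), so the preference term dominates the rescheduling and time terms in most episodes.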

uv.lock ADDED

The diff for this file is too large to render. See raw diff