# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Repository Purpose

OpenEnv RL environment for the **Meta OpenEnv Hackathon**. Implements an intelligent meeting scheduling environment where AI agents learn to schedule meetings across multiple attendees by proposing time slots, rescheduling lower-priority conflicts, and balancing participant preferences.

## Development Commands

```bash
# Run baseline inference (heuristic, no LLM needed)
python inference.py

# Start server locally
uvicorn server.app:app --reload

# Validate environment for submission
openenv validate

# Generate/update lock file (required by validator)
uv lock

# Deploy to Hugging Face Spaces
openenv push

# Build Docker image (Dockerfile must be in root)
docker build -t scheduling_env:latest .
```

## Architecture

### OpenEnv Interface (client-server pattern)

The environment follows OpenEnv's standard API:
- **`POST /reset`** — starts a new episode, accepts `{"task_id": "task1_easy"}`. Returns observation.
- **`POST /step`** — takes an action, returns observation with reward/done.
- **`GET /state`** — returns internal environment state.
- **`GET /health`** — health check.

### Core Flow

`server/app.py` creates a `SchedulingHTTPEnvServer` (subclasses `HTTPEnvServer`) that wraps a persistent `SchedulingEnvironment` instance. The server registers custom `/reset`, `/step`, `/state` routes.

`server/scheduling_env_environment.py` — Main environment class implementing `Environment`. Loads JSON scenarios from `server/scenarios/`, processes 4 action types: `propose_slot`, `reschedule_meeting`, `finalize`, `reject`. Episode ends on `finalize`, `reject`, or timeout (20 steps).

`server/scheduling_logic.py` — Pure utility functions: conflict detection, preference scoring, reward calculation, free-slot search. All datetime handling uses timezone-aware ISO 8601 strings. Calendar format: `Dict[str, List[List]]` where each entry is `[start_iso, end_iso, priority_int, summary_str]`.

`models.py` — Pydantic models (`SchedulingAction`, `SchedulingObservation`, `SchedulingState`) imported by both server and client.

`client.py` — `SchedulingEnv` extends `EnvClient` for WebSocket-based interaction.

`inference.py` — Heuristic baseline (no LLM). Greedy free-slot search + lowest-priority rescheduling. Must emit `[START]`/`[STEP]`/`[END]` stdout format.

### Reward Design

Reward is multi-component, deducted from 1.0 (see `calculate_final_reward` in `scheduling_logic.py`):
- Preference penalty: violations of preferred hours (+50), max meetings/day (+30), back-to-back (+20)
- Rescheduling deduction: exponential penalty per meeting moved
- Time deduction: 0.015 per step taken

Step-level rewards: +0.5 (conflict-free proposal), +0.2 (reschedulable conflicts), -0.3 (non-reschedulable conflicts), -0.1/-0.2 (invalid actions).

### Tasks (3 difficulty levels)

JSON scenarios in `server/scenarios/`:
- **task1_easy** — 2 attendees, free slot exists, no rescheduling needed. Expected score: 0.8–1.0
- **task2_medium** — 3 attendees, requires 1 rescheduling. Expected score: 0.5–0.8
- **task3_hard** — 4 attendees, multiple overlapping conflicts, cascading rescheduling. Expected score: 0.2–0.6

### Key Constraint: Meeting IDs

Format is `{attendee}_{start_iso}` (e.g., `user1_2025-04-07T09:00:00+00:00`). Used by `_find_meeting()` to look up calendar entries for rescheduling.

## Hackathon Submission Requirements

- `openenv validate` must pass
- Dockerfile in root directory (not `/server`)
- `inference.py` in root, uses `[START]`/`[STEP]`/`[END]` stdout format
- 3+ tasks with graders scoring 0.0–1.0 with diverse scores
- Runtime < 20 minutes on vcpu=2, memory=8GB
- Deploy via `openenv push` to HF Spaces

## Environment Variables (for LLM-based inference)

Defined in `.env` (never commit):
```
API_BASE_URL    # HF Router endpoint (default: https://router.huggingface.co/v1)
MODEL_NAME      # Model identifier (default: Qwen/Qwen2.5-72B-Instruct)
HF_TOKEN        # Hugging Face API key
```