title: SchedulingOptEnv
emoji: 🗓️
colorFrom: blue
colorTo: green
sdk: docker
app_port: 7860
tags:
  - openenv
  - reinforcement-learning
  - scheduling
  - agent
license: mit

SchedulingOptEnv

A Markov Decision Environment for Training Autonomous
Scheduling Optimisation Agents

Meta × Scaler — OpenEnv Hackathon Submission

Abstract

We present SchedulingOptEnv, a real-world training environment for autonomous AI agents built on the OpenEnv framework. The environment formalises combinatorial scheduling optimisation as a sequential decision problem and exposes agents to three progressively harder sub-tasks: binary feasibility determination, multi-class constraint-violation classification, and full schedule repair. Each task is paired with a structured, piecewise-graded reward function that provides dense, partial-progress signals rather than sparse binary outcomes. The release includes a 12-instance scheduling corpus (ten infeasible instances covering five constraint-violation classes, plus two feasible schedules), a FastAPI inference server, and a GPT-4o-mini baseline. The environment is deployable as a Docker container on Hugging Face Spaces with a single command.

1. Introduction

Combinatorial scheduling — the assignment of jobs to machines subject to resource, temporal, and precedence constraints — is a foundational problem in operations research, manufacturing, cloud computing, and logistics. Despite its industrial importance, existing benchmarks for evaluating AI agents on scheduling tasks are either purely offline (single-pass solution quality) or narrowly scoped to continuous optimisation rather than the constraint-satisfaction and repair workflow practised by human planners.

OpenEnv [1] provides an abstraction layer for building interactive environments where agents act, receive graded feedback, and improve across episodes. SchedulingOptEnv fills a gap by framing schedule analysis and repair as a Markov Decision Process (MDP) with:

A well-defined observation space (JSON-encoded scheduling instance, task context, step counter)
A structured action space (categorical labels or JSON repair schedules)
A multi-component reward function that awards partial credit for structurally valid but suboptimal repairs
Three difficulty tiers mirroring the cognitive complexity gradient faced by human schedulers

2. Environment Design

2.1 MDP Formulation

Component	Definition
State S	Current scheduling instance, task type, step count, episode history
Observation O	`{schedule_instance: str (JSON), task_id, context, step_number}`
Action A	`{response: str, task_id: str}`
Reward R	Float ∈ [0.0, 1.0] from task-specific grader
Horizon T	Task-dependent: 3 / 5 / 8 steps
Terminal	done = True when T reached or R ≥ 0.95
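
The Horizon and Terminal rows above amount to a simple predicate. The following is a minimal sketch, not code taken from environment.py: the names HORIZONS, SUCCESS_THRESHOLD, and is_done are hypothetical, and it assumes step_number counts the steps already taken in the episode. The task identifiers are the ones accepted by /reset, and the horizons are those listed in Section 3.

```python
# Illustrative sketch only: HORIZONS, SUCCESS_THRESHOLD and is_done are
# hypothetical names, not identifiers from the repository.
HORIZONS = {
    "feasibility_check": 3,
    "conflict_classification": 5,
    "schedule_repair": 8,
}
SUCCESS_THRESHOLD = 0.95


def is_done(task_id: str, step_number: int, reward: float) -> bool:
    """Return True when the horizon T is reached or the reward is at least 0.95.

    Assumes step_number is the number of steps already taken.
    """
    return step_number >= HORIZONS[task_id] or reward >= SUCCESS_THRESHOLD
```

Under these assumptions, a repair episode that scores 0.6 on its second step continues, while any step scoring 0.95 or more ends the episode early.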

2.2 Scheduling Instance Corpus

The environment ships with 12 curated scheduling instances spanning five constraint-violation classes plus two fully feasible baselines. Instances are drawn from a task-aware pool: feasibility-check episodes see all 12, while classification and repair episodes see only the 10 infeasible instances.

#	Feasible	Violation Class	Description
0	No	`resource_overload`	J1 and J2 overlap on single-capacity machine M1
1	No	`deadline_violation`	J1 starts late and finishes after hard deadline
2	No	`precedence_violation`	J2 starts before its predecessor J1 finishes
3	No	`availability_conflict`	J1 scheduled outside machine operating hours
4	No	`capacity_exceeded`	3 concurrent jobs on capacity-2 machine
5	No	`resource_overload`	Pairwise overlap of J1 and J2 on capacity-1 machine
6	No	`deadline_violation`	Precedence chain forces J3 past hard deadline
7	No	`precedence_violation`	J3 starts before both predecessors complete
8	No	`availability_conflict`	J1 extends into machine maintenance window
9	No	`capacity_exceeded`	4 concurrent jobs on capacity-3 machine
10	Yes	—	Fully feasible 3-job, 2-machine schedule
11	Yes	—	Fully feasible 5-job, 3-machine schedule with precedence
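
The task-aware pool described at the start of this section can be sketched as follows. This is an illustration under stated assumptions rather than the repository's implementation: the function names are hypothetical, instance ids follow the table above (0 to 9 infeasible, 10 and 11 feasible), and uniform sampling is assumed because the README does not specify how an instance is drawn.

```python
import random

# Ids follow the corpus table: 0-9 are infeasible, 10-11 are feasible.
INFEASIBLE_IDS = list(range(10))
FEASIBLE_IDS = [10, 11]


def instance_pool(task_id: str) -> list[int]:
    """Return the instance ids an episode of the given task may draw from."""
    if task_id == "feasibility_check":
        # Feasibility episodes see the whole corpus.
        return INFEASIBLE_IDS + FEASIBLE_IDS
    # Classification and repair need a violation to work on.
    return list(INFEASIBLE_IDS)


def sample_instance(task_id: str, rng: random.Random) -> int:
    """ASSUMPTION: instances are drawn uniformly at random from the pool."""
    return rng.choice(instance_pool(task_id))
```

Restricting the pool this way keeps the feasible instances out of tasks where no ground-truth violation or repair target exists.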

3. Tasks

Task 1 — Feasibility Check (Easy)

Objective: Given a JSON-encoded scheduling instance (jobs, machines, proposed assignments), determine whether the schedule satisfies all constraints.

Action space: {"feasible", "infeasible"}

Grading function:

R(a, g) = 1.0   if normalise(a) == g
          0.1   if a is non-empty but incorrect
          0.0   if a is empty

Episode horizon: 3 steps. Target agent accuracy: ~90%.
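
The grading rule above can be sketched in a few lines. This is a hedged illustration, not the contents of grader_detection.py: the project structure describes that grader as synonym-aware, but the synonym table below is invented for the example, as are the function names.

```python
# Hypothetical synonym table; the real grader's vocabulary may differ.
_SYNONYMS = {
    "feasible": "feasible",
    "valid": "feasible",
    "infeasible": "infeasible",
    "invalid": "infeasible",
    "not feasible": "infeasible",
}


def normalise(answer: str) -> str:
    """Lower-case, trim, drop trailing full stops, then map known synonyms."""
    cleaned = answer.strip().lower().rstrip(".")
    return _SYNONYMS.get(cleaned, cleaned)


def grade_feasibility(answer: str, ground_truth: str) -> float:
    """1.0 for a correct label, 0.1 for any other non-empty answer, 0.0 if empty."""
    if not answer.strip():
        return 0.0
    return 1.0 if normalise(answer) == ground_truth else 0.1
```

The 0.1 floor for non-empty answers means an agent that responds at all is distinguished from one that returns nothing.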

Task 2 — Conflict Classification (Medium)

Objective: Identify the constraint violation present in an infeasible schedule from the closed vocabulary: {resource_overload, deadline_violation, precedence_violation, availability_conflict, capacity_exceeded}

Grading function:

R(a, g) = 1.0   if a == g                                        (exact)
          0.5   if a ≠ g and a ∈ related_group(g)                (partial)
          0.1   if a ∈ valid_categories \ related_group(g)       (wrong family)
          0.0   if a ∉ valid_categories                          (unparseable)

where g is the ground-truth label and related_groups = [{resource_overload, capacity_exceeded}, {deadline_violation, precedence_violation}]. availability_conflict belongs to neither group, so any incorrect label on such an instance scores at most 0.1.

Episode horizon: 5 steps. Target agent accuracy: ~60%.
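
A minimal sketch of the family-aware grading rule, assuming answers are trimmed and lower-cased before comparison (the README does not say how the real grader normalises input). The function names are hypothetical; the vocabulary and groups are copied from the definition above.

```python
VALID_CATEGORIES = {
    "resource_overload",
    "deadline_violation",
    "precedence_violation",
    "availability_conflict",
    "capacity_exceeded",
}
RELATED_GROUPS = [
    {"resource_overload", "capacity_exceeded"},
    {"deadline_violation", "precedence_violation"},
]


def related_group(label: str) -> set[str]:
    """Return the family containing label, or just {label} if it has none."""
    for group in RELATED_GROUPS:
        if label in group:
            return group
    return {label}


def grade_classification(answer: str, ground_truth: str) -> float:
    # ASSUMPTION: whitespace and case are ignored when matching labels.
    label = answer.strip().lower()
    if label not in VALID_CATEGORIES:
        return 0.0  # unparseable
    if label == ground_truth:
        return 1.0  # exact
    if label in related_group(ground_truth):
        return 0.5  # same family
    return 0.1      # valid label, wrong family
```

For example, answering capacity_exceeded when the truth is resource_overload earns 0.5, because both labels describe too much concurrent load on a machine.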

Task 3 — Schedule Repair (Hard)

Objective: Return a corrected schedule as a JSON object that resolves all constraint violations and minimises total makespan.

Required JSON format:

{
  "assignments": [
    {"job_id": "J1", "machine_id": "M1", "start_time": 0},
    {"job_id": "J2", "machine_id": "M1", "start_time": 4}
  ]
}

Grading function (additive, max 1.0):

R(a, g) = 0.2 × parseable_json(a)
        + 0.2 × valid_schema(a, g)
        + 0.4 × constraint_satisfaction_ratio(a, g)
        + 0.2 × optimality_score(makespan(a), makespan*(g))

where:

parseable_json(a): 1 if the response parses as valid JSON, else 0
valid_schema(a, g): 1 if all required fields are present and every job is assigned, else 0
constraint_satisfaction_ratio(a, g): fraction of the four constraint categories (capacity, deadlines, precedence, availability) that the submitted schedule satisfies; each category adds 0.25 to the ratio and therefore 0.1 to the total reward
optimality_score(m, m*): 1.0 if m ≤ 1.30·m*; 0.5 if 1.30·m* < m ≤ 1.60·m*; 0 otherwise, with m = makespan(a) and m* = makespan*(g)

Episode horizon: 8 steps. Target agent accuracy: ~30%.
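
The additive reward can be checked by hand with a short sketch. It takes the four component outcomes as inputs instead of computing them, because the instance schema (job durations, deadlines, machine windows) is not documented here; the function names and the worked numbers are illustrative and do not come from grader_fix.py.

```python
def optimality_score(makespan: float, best_makespan: float) -> float:
    """Tiered score: within 30% of the reference, within 60%, or neither."""
    if makespan <= 1.30 * best_makespan:
        return 1.0
    if makespan <= 1.60 * best_makespan:
        return 0.5
    return 0.0


def repair_reward(
    parseable: bool,
    schema_ok: bool,
    satisfied_categories: int,  # how many of the 4 constraint categories hold
    makespan: float,
    best_makespan: float,
) -> float:
    """Weighted sum of the four components described above (max 1.0)."""
    total = (
        0.2 * float(parseable)
        + 0.2 * float(schema_ok)
        + 0.4 * (satisfied_categories / 4)
        + 0.2 * optimality_score(makespan, best_makespan)
    )
    return round(total, 4)  # avoid floating-point noise in the printed score


# Worked example: valid JSON, valid schema, 3 of 4 categories satisfied,
# makespan 14 against a reference of 10 (1.4x, so the 0.5 tier applies).
print(repair_reward(True, True, 3, makespan=14, best_makespan=10))  # 0.8
```

The worked example breaks down as 0.2 + 0.2 + 0.4 × 0.75 + 0.2 × 0.5 = 0.8, which shows how a structurally valid but imperfect repair still earns most of the reward.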

4. Server API

The environment is exposed over HTTP via a FastAPI server on port 7860 (Hugging Face Spaces default).

Method	Endpoint	Description
`GET`	`/health`	Liveness probe — returns `{"status": "ok"}`
`POST`	`/reset`	Begin new episode: `{"task_id": "feasibility_check"}`
`POST`	`/step`	Submit action: `{"response": "infeasible", "task_id": "feasibility_check"}`
`GET`	`/state`	Full internal state snapshot
`GET`	`/tasks`	Task catalogue with action schemas
`POST`	`/grader`	Direct grader invocation for offline evaluation
`GET`	`/baseline`	Trigger baseline inference; returns per-task scores

5. Baseline

A standalone inference script (baseline.py) evaluates GPT-4o-mini on all three tasks. When OPENAI_API_KEY is not set, the script falls back to oracle mock responses, enabling offline verification of the grading pipeline without API access.
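
The fallback behaviour can be pictured as a single check on the environment. This is only a sketch of the decision described above: select_backend is a hypothetical helper, not a function from baseline.py, and it assumes an empty OPENAI_API_KEY is treated the same as an unset one.

```python
import os
from typing import Mapping


def select_backend(environ: Mapping[str, str] = os.environ) -> str:
    """Return "openai" when an API key is configured, otherwise "oracle".

    ASSUMPTION: an empty string counts as "not set".
    """
    return "openai" if environ.get("OPENAI_API_KEY") else "oracle"
```

Passing the environment as an argument keeps the choice testable without touching the real process environment.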

5.1 Baseline Scores (Mock / Oracle)

Task	Instances	Average Score
Feasibility Check	12	1.000
Conflict Classification	10	1.000
Schedule Repair	10	1.000
Overall		1.000

6. Setup and Deployment

6.1 Prerequisites

Requirement	Version
Python	≥ 3.11
pip	≥ 22.0
Docker (optional)	≥ 20.10
Git	≥ 2.30

6.2 Local Installation

# 1. Clone the repository
git clone https://github.com/Vittal-Mukunda/OpenEnv-Hackathon-Meta-x-Scaler.git
cd OpenEnv-Hackathon-Meta-x-Scaler

# 2. Create and activate a virtual environment (recommended)
python -m venv .venv
source .venv/bin/activate        # Linux / macOS
# .venv\Scripts\activate         # Windows

# 3. Install dependencies
pip install -r requirements.txt

# 4. Launch the server
uvicorn server:app --host 0.0.0.0 --port 7860

# 5. Verify the server is running
curl http://localhost:7860/health
# Expected: {"status":"ok"}

6.3 Docker Deployment

# Build the image
docker build -t scheduling-opt-env .

# Run the container
docker run -p 7860:7860 scheduling-opt-env

# Verify
curl http://localhost:7860/health

6.4 Hugging Face Spaces

Push this repository to a Hugging Face Space configured with the Docker SDK. The server listens on port 7860, which Spaces exposes automatically. No additional configuration is required.

6.5 Running the Baseline

# Without API key (uses oracle mock responses — scores 1.0 on all tasks)
python baseline.py

# With OpenAI API key (evaluates GPT-4o-mini)
export OPENAI_API_KEY=sk-...
python baseline.py

7. Example Interaction

# 1. Health check
curl http://localhost:7860/health

# 2. Start a feasibility-check episode
curl -X POST http://localhost:7860/reset \
  -H "Content-Type: application/json" \
  -d '{"task_id": "feasibility_check"}'

# 3. Submit a feasibility answer
curl -X POST http://localhost:7860/step \
  -H "Content-Type: application/json" \
  -d '{"response": "infeasible", "task_id": "feasibility_check"}'

# 4. Start a conflict-classification episode
curl -X POST http://localhost:7860/reset \
  -H "Content-Type: application/json" \
  -d '{"task_id": "conflict_classification"}'

# 5. Classify the violation
curl -X POST http://localhost:7860/step \
  -H "Content-Type: application/json" \
  -d '{"response": "resource_overload", "task_id": "conflict_classification"}'

# 6. Start a schedule-repair episode
curl -X POST http://localhost:7860/reset \
  -H "Content-Type: application/json" \
  -d '{"task_id": "schedule_repair"}'

# 7. Submit a repaired schedule
curl -X POST http://localhost:7860/step \
  -H "Content-Type: application/json" \
  -d '{
    "response": "{\"assignments\": [{\"job_id\": \"J1\", \"machine_id\": \"M1\", \"start_time\": 0}]}",
    "task_id": "schedule_repair"
  }'

# 8. Inspect environment state
curl http://localhost:7860/state

# 9. Invoke a grader directly
curl -X POST http://localhost:7860/grader \
  -H "Content-Type: application/json" \
  -d '{
    "action": {"response": "deadline_violation", "task_id": "conflict_classification"},
    "ground_truth": {"violation_type": "deadline_violation"}
  }'
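
The same reset-then-step flow can be driven from Python using only the standard library. The sketch below is a hypothetical client, not part of the repository. The /reset and /step payloads match the curl calls above, but the shape of the replies is an assumption: the code expects the step reply to carry a boolean "done" field and, optionally, an "observation" field.

```python
import json
import urllib.request
from typing import Callable

BASE_URL = "http://localhost:7860"


def post_json(path: str, payload: dict) -> dict:
    """POST a JSON body to the server and decode the JSON reply."""
    request = urllib.request.Request(
        BASE_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as reply:
        return json.load(reply)


def run_episode(
    task_id: str,
    policy: Callable[[dict], str],
    post: Callable[[str, dict], dict] = post_json,
    max_steps: int = 8,
) -> list[dict]:
    """Reset the environment, then step until the server reports done."""
    observation = post("/reset", {"task_id": task_id})
    replies = []
    for _ in range(max_steps):
        action = {"response": policy(observation), "task_id": task_id}
        reply = post("/step", action)
        replies.append(reply)
        # ASSUMPTION: the step reply exposes "done" and "observation" keys.
        if reply.get("done"):
            break
        observation = reply.get("observation", observation)
    return replies
```

With the server running locally, run_episode("feasibility_check", lambda obs: "infeasible") would play one episode with a constant policy. The post parameter exists so that the loop can be exercised against a stub without a live server.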

8. Project Structure

.
├── openenv.yaml                  # OpenEnv metadata manifest
├── models.py                     # Pydantic v2 data models (Observation, Action, Reward)
├── environment.py                # SchedulingOptEnv core (reset / step / state + instance bank)
├── server.py                     # FastAPI HTTP server (7 endpoints)
├── baseline.py                   # GPT-4o-mini baseline with oracle fallback
├── Dockerfile                    # Container definition (python:3.11-slim, port 7860)
├── requirements.txt              # Python dependencies
├── tasks/
│   ├── __init__.py               # Task module exports
│   ├── task1_easy.py             # Feasibility check — episode runner + instance accessor
│   ├── task2_medium.py           # Conflict classification — episode runner + instance accessor
│   └── task3_hard.py             # Schedule repair — episode runner + instance accessor
└── graders/
    ├── __init__.py               # Grader exports (FeasibilityGrader, ConflictGrader, RepairGrader)
    ├── grader_detection.py       # Grader: feasibility (binary, synonym-aware)
    ├── grader_classification.py  # Grader: conflict classification (family-aware partial credit)
    └── grader_fix.py             # Grader: schedule repair (4-component additive reward)

9. Dependencies

Package	Version	Purpose
`fastapi`	≥ 0.104	HTTP server framework
`uvicorn`	≥ 0.24	ASGI server
`pydantic`	≥ 2.5	Data validation and serialisation
`openai`	≥ 1.6	LLM baseline inference
`pyyaml`	≥ 6.0	YAML manifest parsing
`httpx`	≥ 0.25	Async HTTP client

10. References

[1] OpenEnv Framework. Building Real-World AI Agent Training Environments. Meta × Scaler Hackathon, 2026.

[2] Pinedo, M. L. Scheduling: Theory, Algorithms, and Systems (5th ed.). Springer, 2016.

[3] Garey, M. R., & Johnson, D. S. Computers and Intractability: A Guide to the Theory of NP-Completeness. W. H. Freeman, 1979.

[4] Zhang, C. et al. Learning to Dispatch for Job Shop Scheduling via Deep Reinforcement Learning. NeurIPS 2020.

[5] Kwon, Y.-D. et al. POMO: Policy Optimization with Multiple Optima for Reinforcement Learning. NeurIPS 2020.