title: SchedulingOptEnv
emoji: 🗓️
colorFrom: blue
colorTo: green
sdk: docker
app_port: 7860
tags:
  - openenv
  - reinforcement-learning
  - scheduling
  - agent
license: mit

SchedulingOptEnv

A Markov Decision Environment for Training Autonomous
Scheduling Optimisation Agents

Meta × Scaler: OpenEnv Hackathon Submission

Python 3.11+ · FastAPI · Pydantic v2 · Docker | HF Spaces · MIT License


Abstract

We present SchedulingOptEnv, a real-world training environment for autonomous AI agents built upon the OpenEnv framework. The environment formalises combinatorial scheduling optimisation as a sequential decision problem, exposing agents to three progressively challenging sub-tasks: binary feasibility determination, multi-class constraint-violation classification, and full schedule repair. Each task is paired with a structured, graded reward function that provides dense, partial-progress signals rather than sparse binary outcomes. A 12-instance scheduling corpus covering five distinct constraint-violation classes, a FastAPI inference server, and a GPT-4o-mini baseline are included. The environment is deployable as a Docker container on Hugging Face Spaces with a single command.


1. Introduction

Combinatorial scheduling, the assignment of jobs to machines subject to resource, temporal, and precedence constraints, is a foundational problem in operations research, manufacturing, cloud computing, and logistics. Despite its industrial importance, existing benchmarks for evaluating AI agents on scheduling tasks are either purely offline (single-pass solution quality) or narrowly scoped to continuous optimisation rather than the constraint-satisfaction and repair workflow practised by human planners.

OpenEnv [1] provides an abstraction layer for building interactive environments where agents act, receive graded feedback, and improve across episodes. SchedulingOptEnv fills a gap by framing schedule analysis and repair as a Markov Decision Process (MDP) with:

  • A well-defined observation space (JSON-encoded scheduling instance, task context, step counter)
  • A structured action space (categorical labels or JSON repair schedules)
  • A multi-component reward function that awards partial credit for structurally valid but suboptimal repairs
  • Three difficulty tiers mirroring the cognitive complexity gradient faced by human schedulers

2. Environment Design

2.1 MDP Formulation

| Component | Definition |
| --- | --- |
| State S | Current scheduling instance, task type, step count, episode history |
| Observation O | {schedule_instance: str (JSON), task_id, context, step_number} |
| Action A | {response: str, task_id: str} |
| Reward R | Float ∈ [0.0, 1.0] from task-specific grader |
| Horizon T | Task-dependent: 3 / 5 / 8 steps |
| Terminal | done = True when T reached or R ≥ 0.95 |
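
The table above can be read as the following data shapes. This is a minimal sketch using standard-library dataclasses rather than the repository's Pydantic models. The class layout, the `HORIZONS` mapping, and the `is_terminal` helper are illustrative assumptions; only the field names, horizons, and the 0.95 threshold come from the table, and the task IDs from the API examples in Section 4.

```python
from dataclasses import dataclass

# Horizons from the MDP table (3 / 5 / 8 steps).
# Keying them by task_id is an assumption made for this sketch.
HORIZONS = {
    "feasibility_check": 3,
    "conflict_classification": 5,
    "schedule_repair": 8,
}
SUCCESS_THRESHOLD = 0.95  # the episode ends early once the reward reaches this


@dataclass(frozen=True)
class Observation:
    schedule_instance: str  # JSON-encoded scheduling instance
    task_id: str
    context: str
    step_number: int


@dataclass(frozen=True)
class Action:
    response: str
    task_id: str


def is_terminal(task_id: str, steps_taken: int, reward: float) -> bool:
    """Return True when the horizon T is reached or R >= 0.95.

    steps_taken counts completed steps starting from 1 (assumption).
    """
    return steps_taken >= HORIZONS[task_id] or reward >= SUCCESS_THRESHOLD
```

For example, `is_terminal("schedule_repair", 2, 0.95)` is true because the reward threshold is met before the 8-step horizon.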

2.2 Scheduling Instance Corpus

The environment ships with 12 curated scheduling instances spanning five constraint-violation classes plus two fully feasible baselines. Instances are drawn from a task-aware pool: feasibility-check episodes see all 12, while classification and repair episodes see only the 10 infeasible instances.

| # | Feasible | Violation Class | Description |
| --- | --- | --- | --- |
| 0 | No | resource_overload | J1 and J2 overlap on single-capacity machine M1 |
| 1 | No | deadline_violation | J1 starts late and finishes after hard deadline |
| 2 | No | precedence_violation | J2 starts before its predecessor J1 finishes |
| 3 | No | availability_conflict | J1 scheduled outside machine operating hours |
| 4 | No | capacity_exceeded | 3 concurrent jobs on capacity-2 machine |
| 5 | No | resource_overload | Pairwise overlap of J1 and J2 on capacity-1 machine |
| 6 | No | deadline_violation | Precedence chain forces J3 past hard deadline |
| 7 | No | precedence_violation | J3 starts before both predecessors complete |
| 8 | No | availability_conflict | J1 extends into machine maintenance window |
| 9 | No | capacity_exceeded | 4 concurrent jobs on capacity-3 machine |
| 10 | Yes | - | Fully feasible 3-job, 2-machine schedule |
| 11 | Yes | - | Fully feasible 5-job, 3-machine schedule with precedence |
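
The task-aware pool described above can be sketched as a simple filter over the corpus. The `CORPUS` triples are transcribed from the table; the `instance_pool` helper and its return type are hypothetical and may not match how `environment.py` stores or samples instances.

```python
# (index, feasible, violation_class) transcribed from the corpus table.
CORPUS = [
    (0, False, "resource_overload"),
    (1, False, "deadline_violation"),
    (2, False, "precedence_violation"),
    (3, False, "availability_conflict"),
    (4, False, "capacity_exceeded"),
    (5, False, "resource_overload"),
    (6, False, "deadline_violation"),
    (7, False, "precedence_violation"),
    (8, False, "availability_conflict"),
    (9, False, "capacity_exceeded"),
    (10, True, None),
    (11, True, None),
]


def instance_pool(task_id: str) -> list[int]:
    """Return the instance indices an episode of the given task may draw from."""
    if task_id == "feasibility_check":
        # Feasibility episodes see every instance, feasible or not.
        return [idx for idx, _, _ in CORPUS]
    # Classification and repair only make sense on infeasible schedules.
    return [idx for idx, feasible, _ in CORPUS if not feasible]
```

Each of the five violation classes appears exactly twice among the ten infeasible instances, so the classification pool is balanced across classes.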

3. Tasks

Task 1: Feasibility Check (Easy)

Objective: Given a JSON-encoded scheduling instance (jobs, machines, proposed assignments), determine whether the schedule satisfies all constraints.

Action space: {"feasible", "infeasible"}

Grading function:

R(a, g) = 1.0   if normalise(a) == ground_truth
          0.1   if a is non-empty but incorrect
          0.0   if a is empty

Episode horizon: 3 steps. Target agent accuracy: ~90%.
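
The three-way rule above can be sketched in a few lines. The synonym table and the exact normalisation steps are assumptions made for illustration; the repository's `grader_detection.py` is described as synonym-aware, but its vocabulary is not documented here.

```python
# Hypothetical synonym table; the real grader's vocabulary may differ.
_SYNONYMS = {
    "feasible": "feasible",
    "valid": "feasible",
    "yes": "feasible",
    "infeasible": "infeasible",
    "invalid": "infeasible",
    "no": "infeasible",
}


def normalise(answer: str) -> str:
    """Lower-case, trim whitespace and trailing punctuation, then map synonyms."""
    cleaned = answer.strip().lower().strip(".!\"'")
    return _SYNONYMS.get(cleaned, cleaned)


def grade_feasibility(answer: str, ground_truth: str) -> float:
    """Score a Task 1 answer: 1.0 correct, 0.1 non-empty but wrong, 0.0 empty."""
    if not answer.strip():
        return 0.0
    return 1.0 if normalise(answer) == ground_truth else 0.1
```

The 0.1 floor for a wrong but non-empty answer distinguishes an agent that responds in the expected form from one that returns nothing.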


Task 2: Conflict Classification (Medium)

Objective: Identify the constraint violation present in an infeasible schedule from the closed vocabulary: {resource_overload, deadline_violation, precedence_violation, availability_conflict, capacity_exceeded}

Grading function:

R(a, g) = 1.0   if a == ground_truth                                    (exact)
          0.5   if a ∈ related_group(ground_truth)                      (partial)
          0.1   if a ∈ valid_categories \ related_group(ground_truth)   (wrong family)
          0.0   if a ∉ valid_categories                                 (unparseable)

where related_groups = [{resource_overload, capacity_exceeded}, {deadline_violation, precedence_violation}].

Episode horizon: 5 steps. Target agent accuracy: ~60%.
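
A minimal sketch of this family-aware grading follows. The category names and related groups are taken from the text above; the function name and the lower-casing of the answer are assumptions, not the actual contents of `grader_classification.py`.

```python
VALID_CATEGORIES = {
    "resource_overload",
    "deadline_violation",
    "precedence_violation",
    "availability_conflict",
    "capacity_exceeded",
}
RELATED_GROUPS = [
    {"resource_overload", "capacity_exceeded"},
    {"deadline_violation", "precedence_violation"},
]


def grade_classification(answer: str, ground_truth: str) -> float:
    """Score a Task 2 answer with partial credit for same-family labels."""
    label = answer.strip().lower()  # normalisation is an assumption
    if label not in VALID_CATEGORIES:
        return 0.0  # unparseable
    if label == ground_truth:
        return 1.0  # exact
    for group in RELATED_GROUPS:
        if label in group and ground_truth in group:
            return 0.5  # partial: same family
    return 0.1  # valid category, wrong family
```

Because availability_conflict belongs to neither related group, it can only score 1.0 (exact) or 0.1 (wrong family) under this rule.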


Task 3: Schedule Repair (Hard)

Objective: Return a corrected schedule as a JSON object that resolves all constraint violations and minimises total makespan.

Required JSON format:

{
  "assignments": [
    {"job_id": "J1", "machine_id": "M1", "start_time": 0},
    {"job_id": "J2", "machine_id": "M1", "start_time": 4}
  ]
}
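
One way to check a raw response against this format is sketched below. It returns the two flags used by the first two terms of the grading function (parseable JSON, valid schema). The function name, the `expected_jobs` parameter, and the strictness of the checks are assumptions; the actual grader may accept or reject different edge cases.

```python
import json

REQUIRED_FIELDS = {"job_id", "machine_id", "start_time"}


def parse_repair(response: str, expected_jobs: set[str]) -> tuple[bool, bool]:
    """Return (parseable_json, valid_schema) for a raw repair response."""
    try:
        payload = json.loads(response)
    except (json.JSONDecodeError, TypeError):
        return False, False
    assignments = payload.get("assignments") if isinstance(payload, dict) else None
    if not isinstance(assignments, list):
        return True, False
    for item in assignments:
        # Every assignment must be an object carrying all required fields.
        if not isinstance(item, dict) or not REQUIRED_FIELDS.issubset(item):
            return True, False
        if not isinstance(item["job_id"], str):
            return True, False
    # The schema is valid only if exactly the expected jobs are assigned.
    assigned = {item["job_id"] for item in assignments}
    return True, assigned == expected_jobs
```

For instance, a response that assigns only J1 when the instance contains J1 and J2 parses successfully but fails the schema check.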

Grading function (additive, max 1.0):

R(a, g) = 0.2 × parseable_json(a)
        + 0.2 × valid_schema(a, g)
        + 0.4 × constraint_satisfaction_ratio(a, g)
        + 0.2 × optimality_score(makespan(a), makespan*(g))

where:

  • parseable_json(a): 1 if the response parses as valid JSON, else 0
  • valid_schema(a, g): 1 if all required fields are present and all jobs are assigned, else 0
  • constraint_satisfaction_ratio(a, g): fraction of the four constraint categories satisfied (capacity, deadlines, precedence, availability); each category contributes 0.25 to the ratio
  • optimality_score(m, m*): 1.0 if m ≤ 1.30·m*; 0.5 if m ≤ 1.60·m*; 0 otherwise

Episode horizon: 8 steps. Target agent accuracy: ~30%.
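
The additive reward can be sketched as follows, with the component checks supplied by the caller. The function signatures are hypothetical, and returning 0.0 outright for unparseable output is an assumption (nothing else can be evaluated in that case); the weights and thresholds are those listed above.

```python
def optimality_score(makespan: float, best_makespan: float) -> float:
    """1.0 within 30% of the reference makespan, 0.5 within 60%, else 0."""
    if makespan <= 1.30 * best_makespan:
        return 1.0
    if makespan <= 1.60 * best_makespan:
        return 0.5
    return 0.0


def repair_reward(
    parseable: bool,
    schema_ok: bool,
    satisfied_categories: int,
    makespan: float,
    best_makespan: float,
) -> float:
    """Combine the four components into a reward in [0.0, 1.0]."""
    if not parseable:
        # Assumption: no other term can be computed on unparseable output.
        return 0.0
    ratio = satisfied_categories / 4  # capacity, deadlines, precedence, availability
    return (
        0.2 * parseable
        + 0.2 * schema_ok
        + 0.4 * ratio
        + 0.2 * optimality_score(makespan, best_makespan)
    )
```

As a worked example, a parseable, schema-valid repair that satisfies three of the four categories with makespan 14 against a reference of 10 scores 0.2 + 0.2 + 0.4 × 0.75 + 0.2 × 0.5 = 0.8, which is below the 0.95 threshold, so the episode would continue.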


4. Server API

The environment is exposed over HTTP via a FastAPI server on port 7860 (Hugging Face Spaces default).

| Method | Endpoint | Description |
| --- | --- | --- |
| GET | /health | Liveness probe; returns {"status": "ok"} |
| POST | /reset | Begin new episode: {"task_id": "feasibility_check"} |
| POST | /step | Submit action: {"response": "infeasible", "task_id": "feasibility_check"} |
| GET | /state | Full internal state snapshot |
| GET | /tasks | Task catalogue with action schemas |
| POST | /grader | Direct grader invocation for offline evaluation |
| GET | /baseline | Trigger baseline inference; returns per-task scores |

5. Baseline

A standalone inference script (baseline.py) evaluates GPT-4o-mini on all three tasks. When OPENAI_API_KEY is not set, the script falls back to oracle mock responses, enabling offline verification of the grading pipeline without API access.
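
The fallback decision can be illustrated with a small helper. This is a sketch only: `select_backend` and the backend names are hypothetical, and treating a blank key as unset is an assumption about how `baseline.py` behaves.

```python
import os
from typing import Mapping, Optional


def select_backend(environ: Optional[Mapping[str, str]] = None) -> str:
    """Return "openai" when an API key is configured, otherwise "oracle"."""
    env = os.environ if environ is None else environ
    # An empty or whitespace-only key is treated as unset (assumption).
    return "openai" if env.get("OPENAI_API_KEY", "").strip() else "oracle"
```

Passing the environment as a parameter keeps the decision testable without modifying the real process environment.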

5.1 Baseline Scores (Mock / Oracle)

| Task | Instances | Average Score |
| --- | --- | --- |
| Feasibility Check | 12 | 1.000 |
| Conflict Classification | 10 | 1.000 |
| Schedule Repair | 10 | 1.000 |
| Overall | | 1.000 |

6. Setup and Deployment

6.1 Prerequisites

| Requirement | Version |
| --- | --- |
| Python | ≥ 3.11 |
| pip | ≥ 22.0 |
| Docker (optional) | ≥ 20.10 |
| Git | ≥ 2.30 |

6.2 Local Installation

# 1. Clone the repository
git clone https://github.com/Vittal-Mukunda/OpenEnv-Hackathon-Meta-x-Scaler.git
cd OpenEnv-Hackathon-Meta-x-Scaler

# 2. Create and activate a virtual environment (recommended)
python -m venv .venv
source .venv/bin/activate        # Linux / macOS
# .venv\Scripts\activate         # Windows

# 3. Install dependencies
pip install -r requirements.txt

# 4. Launch the server
uvicorn server:app --host 0.0.0.0 --port 7860

# 5. Verify the server is running
curl http://localhost:7860/health
# Expected: {"status":"ok"}

6.3 Docker Deployment

# Build the image
docker build -t scheduling-opt-env .

# Run the container
docker run -p 7860:7860 scheduling-opt-env

# Verify
curl http://localhost:7860/health

6.4 Hugging Face Spaces

Push this repository to a Hugging Face Space configured with the Docker SDK. The server listens on port 7860, which Spaces exposes automatically. No additional configuration is required.

6.5 Running the Baseline

# Without API key (uses oracle mock responses; scores 1.0 on all tasks)
python baseline.py

# With OpenAI API key (evaluates GPT-4o-mini)
export OPENAI_API_KEY=sk-...
python baseline.py

7. Example Interaction

# 1. Health check
curl http://localhost:7860/health

# 2. Start a feasibility-check episode
curl -X POST http://localhost:7860/reset \
  -H "Content-Type: application/json" \
  -d '{"task_id": "feasibility_check"}'

# 3. Submit a feasibility answer
curl -X POST http://localhost:7860/step \
  -H "Content-Type: application/json" \
  -d '{"response": "infeasible", "task_id": "feasibility_check"}'

# 4. Start a conflict-classification episode
curl -X POST http://localhost:7860/reset \
  -H "Content-Type: application/json" \
  -d '{"task_id": "conflict_classification"}'

# 5. Classify the violation
curl -X POST http://localhost:7860/step \
  -H "Content-Type: application/json" \
  -d '{"response": "resource_overload", "task_id": "conflict_classification"}'

# 6. Start a schedule-repair episode
curl -X POST http://localhost:7860/reset \
  -H "Content-Type: application/json" \
  -d '{"task_id": "schedule_repair"}'

# 7. Submit a repaired schedule
curl -X POST http://localhost:7860/step \
  -H "Content-Type: application/json" \
  -d '{
    "response": "{\"assignments\": [{\"job_id\": \"J1\", \"machine_id\": \"M1\", \"start_time\": 0}]}",
    "task_id": "schedule_repair"
  }'

# 8. Inspect environment state
curl http://localhost:7860/state

# 9. Invoke a grader directly
curl -X POST http://localhost:7860/grader \
  -H "Content-Type: application/json" \
  -d '{
    "action": {"response": "deadline_violation", "task_id": "conflict_classification"},
    "ground_truth": {"violation_type": "deadline_violation"}
  }'
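
The same interaction can be driven from Python using only the standard library. This is a minimal sketch that assumes the server is running locally on port 7860; the helper names are hypothetical, and because the response schema is not specified in this README, replies are returned as plain dictionaries.

```python
import json
import urllib.request

BASE_URL = "http://localhost:7860"  # assumes a locally running server


def build_post(path: str, payload: dict) -> urllib.request.Request:
    """Build a JSON POST request for one of the environment endpoints."""
    return urllib.request.Request(
        BASE_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def call(request: urllib.request.Request) -> dict:
    """Send a request and decode the JSON reply."""
    with urllib.request.urlopen(request, timeout=10) as resp:
        return json.load(resp)


def repair_payload(assignments: list[dict]) -> dict:
    """Wrap a repaired schedule for /step.

    The schedule travels as a JSON string inside the "response" field,
    mirroring the escaped payload in the curl example.
    """
    return {
        "response": json.dumps({"assignments": assignments}),
        "task_id": "schedule_repair",
    }


def run_feasibility_step(answer: str = "infeasible") -> dict:
    """Reset a feasibility-check episode and submit one answer."""
    call(build_post("/reset", {"task_id": "feasibility_check"}))
    return call(
        build_post("/step", {"response": answer, "task_id": "feasibility_check"})
    )
```

Building the repair payload with `json.dumps` avoids the manual quote escaping needed in the shell version.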

8. Project Structure

.
├── openenv.yaml                  # OpenEnv metadata manifest
├── models.py                     # Pydantic v2 data models (Observation, Action, Reward)
├── environment.py                # SchedulingOptEnv core (reset / step / state + instance bank)
├── server.py                     # FastAPI HTTP server (7 endpoints)
├── baseline.py                   # GPT-4o-mini baseline with oracle fallback
├── Dockerfile                    # Container definition (python:3.11-slim, port 7860)
├── requirements.txt              # Python dependencies
├── tasks/
│   ├── __init__.py               # Task module exports
│   ├── task1_easy.py             # Feasibility check: episode runner + instance accessor
│   ├── task2_medium.py           # Conflict classification: episode runner + instance accessor
│   └── task3_hard.py             # Schedule repair: episode runner + instance accessor
└── graders/
    ├── __init__.py               # Grader exports (FeasibilityGrader, ConflictGrader, RepairGrader)
    ├── grader_detection.py       # Grader: feasibility (binary, synonym-aware)
    ├── grader_classification.py  # Grader: conflict classification (family-aware partial credit)
    └── grader_fix.py             # Grader: schedule repair (4-component additive reward)

9. Dependencies

| Package | Version | Purpose |
| --- | --- | --- |
| fastapi | ≥ 0.104 | HTTP server framework |
| uvicorn | ≥ 0.24 | ASGI server |
| pydantic | ≥ 2.5 | Data validation and serialisation |
| openai | ≥ 1.6 | LLM baseline inference |
| pyyaml | ≥ 6.0 | YAML manifest parsing |
| httpx | ≥ 0.25 | Async HTTP client |

10. References

[1] OpenEnv Framework. Building Real-World AI Agent Training Environments. Meta × Scaler Hackathon, 2026.

[2] Pinedo, M. L. Scheduling: Theory, Algorithms, and Systems (5th ed.). Springer, 2016.

[3] Garey, M. R., & Johnson, D. S. Computers and Intractability: A Guide to the Theory of NP-Completeness. W. H. Freeman, 1979.

[4] Zhang, C. et al. Learning to Dispatch for Job Shop Scheduling via Deep Reinforcement Learning. NeurIPS 2020.

[5] Kwon, Y.-D. et al. POMO: Policy Optimization with Multiple Optima for Reinforcement Learning. NeurIPS 2020.