---

title: SchedulingOptEnv
emoji: 🗓️
colorFrom: blue
colorTo: green
sdk: docker
app_port: 7860
tags:
  - openenv
  - reinforcement-learning
  - scheduling
  - agent
license: mit
---


<h1 align="center">SchedulingOptEnv</h1>
<h3 align="center">A Markov Decision Environment for Training Autonomous<br>Scheduling Optimisation Agents</h3>

<p align="center"><em>Meta × Scaler OpenEnv Hackathon Submission</em></p>

<p align="center">
  <img src="https://img.shields.io/badge/python-3.11+-blue" alt="Python 3.11+">
  <img src="https://img.shields.io/badge/framework-FastAPI-009688" alt="FastAPI">
  <img src="https://img.shields.io/badge/models-Pydantic%20v2-e92063" alt="Pydantic v2">
  <img src="https://img.shields.io/badge/deploy-Docker%20%7C%20HF%20Spaces-yellow" alt="Docker | HF Spaces">
  <img src="https://img.shields.io/badge/license-MIT-green" alt="MIT License">
</p>

---

## Abstract

We present **SchedulingOptEnv**, a real-world training environment for autonomous AI agents built on the OpenEnv framework. The environment formalises combinatorial scheduling optimisation as a sequential decision problem and exposes agents to three progressively harder sub-tasks: binary feasibility determination, multi-class constraint-violation classification, and full schedule repair. Each task is paired with a structured reward function that gives graded, partial-progress signals rather than sparse binary outcomes. The release includes a 12-instance scheduling corpus covering five distinct constraint-violation classes, a FastAPI inference server, and a GPT-4o-mini baseline. The environment is deployable as a Docker container on Hugging Face Spaces with a single command.

---

## 1. Introduction

Combinatorial scheduling (the assignment of jobs to machines subject to resource, temporal, and precedence constraints) is a foundational problem in operations research, manufacturing, cloud computing, and logistics. Despite its industrial importance, existing benchmarks for evaluating AI agents on scheduling tasks are either purely offline (single-pass solution quality) or narrowly scoped to continuous optimisation rather than the constraint-satisfaction and repair workflow practised by human planners.

OpenEnv [1] provides an abstraction layer for building *interactive* environments where agents act, receive graded feedback, and improve across episodes. SchedulingOptEnv fills a gap by framing schedule analysis and repair as a Markov Decision Process (MDP) with:

- A well-defined **observation space** (JSON-encoded scheduling instance, task context, step counter)
- A structured **action space** (categorical labels or JSON repair schedules)
- A **multi-component reward function** that awards partial credit for structurally valid but suboptimal repairs
- Three **difficulty tiers** mirroring the cognitive complexity gradient faced by human schedulers

---

## 2. Environment Design

### 2.1 MDP Formulation

| Component | Definition |
|-----------|-----------|
| State *S* | Current scheduling instance, task type, step count, episode history |
| Observation *O* | `{schedule_instance: str (JSON), task_id, context, step_number}` |
| Action *A* | `{response: str, task_id: str}` |
| Reward *R* | Float ∈ [0.0, 1.0] from task-specific grader |
| Horizon *T* | Task-dependent: 3 / 5 / 8 steps |
| Terminal | *done* = True when *T* is reached or *R* ≥ 0.95 |
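
The termination rule in the last row of the table can be sketched as a small helper. This is a minimal illustration, not the repository's code: `HORIZONS`, `SUCCESS_THRESHOLD`, and `is_done` are hypothetical names, the task identifiers are those used in the API examples later in this README, and the sketch assumes `step_number` counts the steps already taken in the episode.

```python
# Hypothetical sketch of the episode-termination rule from the MDP table.
# Horizons follow the 3 / 5 / 8 steps listed for the three tasks.
HORIZONS = {
    "feasibility_check": 3,
    "conflict_classification": 5,
    "schedule_repair": 8,
}
SUCCESS_THRESHOLD = 0.95


def is_done(task_id: str, step_number: int, reward: float) -> bool:
    """Return True when the horizon T is reached or the reward R is at least 0.95.

    Assumes step_number is the number of steps already taken (1 after the first step).
    """
    return step_number >= HORIZONS[task_id] or reward >= SUCCESS_THRESHOLD
```

Under these assumptions, a repair episode ends early as soon as one step earns a reward of 0.95 or more, and otherwise runs for all 8 steps.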

### 2.2 Scheduling Instance Corpus

The environment ships with **12 curated scheduling instances** spanning five constraint-violation classes plus two fully feasible baselines. Instances are drawn from a task-aware pool: feasibility-check episodes see all 12, while classification and repair episodes see only the 10 infeasible instances.
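
The task-aware pool described above could be implemented along the following lines. This is a sketch under stated assumptions: `INSTANCES` is a hypothetical stand-in for the real instance bank (only the index and feasibility flag are modelled), and uniform sampling is an assumption, since this README does not say how instances are drawn.

```python
import random

# Hypothetical stand-in for the instance bank: indices 0-9 are infeasible,
# 10 and 11 are feasible, matching the corpus table in this section.
INSTANCES = [{"id": i, "feasible": i >= 10} for i in range(12)]


def instance_pool(task_id: str) -> list[dict]:
    """Feasibility checks see every instance; other tasks see infeasible ones only."""
    if task_id == "feasibility_check":
        return list(INSTANCES)
    return [inst for inst in INSTANCES if not inst["feasible"]]


def sample_instance(task_id: str, rng: random.Random) -> dict:
    """Draw one instance from the task's pool (uniform sampling is an assumption)."""
    return rng.choice(instance_pool(task_id))
```

Passing an explicit `random.Random` instance keeps episode sampling reproducible when a seed is fixed.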

| # | Feasible | Violation Class | Description |
|---|----------|----------------|-------------|
| 0 | No | `resource_overload` | J1 and J2 overlap on single-capacity machine M1 |
| 1 | No | `deadline_violation` | J1 starts late and finishes after hard deadline |
| 2 | No | `precedence_violation` | J2 starts before its predecessor J1 finishes |
| 3 | No | `availability_conflict` | J1 scheduled outside machine operating hours |
| 4 | No | `capacity_exceeded` | 3 concurrent jobs on capacity-2 machine |
| 5 | No | `resource_overload` | Pairwise overlap of J1 and J2 on capacity-1 machine |
| 6 | No | `deadline_violation` | Precedence chain forces J3 past hard deadline |
| 7 | No | `precedence_violation` | J3 starts before both predecessors complete |
| 8 | No | `availability_conflict` | J1 extends into machine maintenance window |
| 9 | No | `capacity_exceeded` | 4 concurrent jobs on capacity-3 machine |
| 10 | Yes | none | Fully feasible 3-job, 2-machine schedule |
| 11 | Yes | none | Fully feasible 5-job, 3-machine schedule with precedence |

---

## 3. Tasks

### Task 1: Feasibility Check *(Easy)*

**Objective:** Given a JSON-encoded scheduling instance (jobs, machines, proposed assignments), determine whether the schedule satisfies all constraints.

**Action space:** `{"feasible", "infeasible"}`

**Grading function:**

```
R(a, g) = 1.0   if normalise(a) == ground_truth
          0.1   if a is non-empty but incorrect
          0.0   if a is empty
```

**Episode horizon:** 3 steps. **Target agent accuracy:** ~90%.
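
The grading rule above translates almost directly into code. The sketch below is illustrative rather than the repository's grader: `normalise` and `grade_feasibility` are hypothetical names, the normaliser only trims and lower-cases (the real one may also map synonyms), and treating whitespace-only input as empty is an assumption.

```python
def normalise(answer: str) -> str:
    """Trim and lower-case the answer (the real normaliser may also map synonyms)."""
    return answer.strip().lower()


def grade_feasibility(answer: str, ground_truth: str) -> float:
    """Return 1.0 for a correct label, 0.1 for a non-empty wrong one, 0.0 for empty."""
    cleaned = normalise(answer)
    if not cleaned:
        # Assumption: whitespace-only responses count as empty.
        return 0.0
    return 1.0 if cleaned == ground_truth else 0.1
```

The 0.1 floor for a non-empty wrong answer separates a well-formed attempt from no attempt at all.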

---

### Task 2: Conflict Classification *(Medium)*

**Objective:** Identify the constraint violation present in an infeasible schedule from the closed vocabulary:
`{resource_overload, deadline_violation, precedence_violation, availability_conflict, capacity_exceeded}`

**Grading function:**

```
R(a, g) = 1.0   if a == ground_truth                             (exact)
          0.5   if a ∈ related_group(ground_truth)               (partial)
          0.1   if a ∈ valid_categories \ related_group(g)       (wrong family)
          0.0   if a ∉ valid_categories                          (unparseable)
```

where `related_groups = [{resource_overload, capacity_exceeded}, {deadline_violation, precedence_violation}]`.

**Episode horizon:** 5 steps. **Target agent accuracy:** ~60%.
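
A minimal sketch of the family-aware partial credit follows. The function name `grade_classification` is hypothetical and the trimming and lower-casing of the answer is an assumption; the categories and related groups are those listed above. Note that `availability_conflict` belongs to no group, so any wrong but valid answer involving it scores 0.1.

```python
VALID_CATEGORIES = {
    "resource_overload",
    "deadline_violation",
    "precedence_violation",
    "availability_conflict",
    "capacity_exceeded",
}
RELATED_GROUPS = [
    {"resource_overload", "capacity_exceeded"},
    {"deadline_violation", "precedence_violation"},
]


def grade_classification(answer: str, ground_truth: str) -> float:
    """Score a predicted violation class against the ground-truth class."""
    label = answer.strip().lower()  # assumption: answers are trimmed and lower-cased
    if label not in VALID_CATEGORIES:
        return 0.0  # unparseable
    if label == ground_truth:
        return 1.0  # exact match
    for group in RELATED_GROUPS:
        if label in group and ground_truth in group:
            return 0.5  # same family as the ground truth
    return 0.1  # valid category, wrong family
```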

---

### Task 3: Schedule Repair *(Hard)*

**Objective:** Return a corrected schedule as a JSON object that resolves all constraint violations and minimises total makespan.

**Required JSON format:**
```json
{
  "assignments": [
    {"job_id": "J1", "machine_id": "M1", "start_time": 0},
    {"job_id": "J2", "machine_id": "M1", "start_time": 4}
  ]
}
```

**Grading function (additive, max 1.0):**

```
R(a, g) = 0.2 × parseable_json(a)
        + 0.2 × valid_schema(a, g)
        + 0.4 × constraint_satisfaction_ratio(a, g)
        + 0.2 × optimality_score(makespan(a), makespan*(g))
```

where:
- `parseable_json(a)`: 1 if the response parses as valid JSON, else 0
- `valid_schema(a, g)`: 1 if all required fields are present and all jobs are assigned, else 0
- `constraint_satisfaction_ratio(a, g)`: fraction of the four constraint categories satisfied
  (capacity, deadlines, precedence, availability; each worth 0.25)
- `optimality_score(m, m*)`: 1.0 if `m ≤ 1.30·m*`; 0.5 if `m ≤ 1.60·m*`; 0 otherwise

**Episode horizon:** 8 steps. **Target agent accuracy:** ~30%.
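
The additive reward can be sketched as below. This is an illustration under explicit assumptions, not the repository's `grader_fix.py`: the function names are hypothetical, the number of satisfied constraint categories and both makespans are passed in as arguments (computing them needs the instance format, which this README does not show), and stopping after a failed JSON or schema check is an assumed design choice.

```python
import json

REQUIRED_FIELDS = {"job_id", "machine_id", "start_time"}


def valid_schema(data: object, job_ids: set[str]) -> bool:
    """Check that every assignment has the required fields and every job is assigned."""
    assignments = data.get("assignments") if isinstance(data, dict) else None
    if not isinstance(assignments, list):
        return False
    for item in assignments:
        if not isinstance(item, dict) or not REQUIRED_FIELDS.issubset(item):
            return False
        if not isinstance(item["job_id"], str):
            return False
    return {item["job_id"] for item in assignments} == set(job_ids)


def optimality_score(makespan: float, best_makespan: float) -> float:
    """1.0 within 30% of the reference makespan, 0.5 within 60%, else 0."""
    if makespan <= 1.30 * best_makespan:
        return 1.0
    if makespan <= 1.60 * best_makespan:
        return 0.5
    return 0.0


def repair_reward(response: str, job_ids: set[str], satisfied_categories: int,
                  makespan: float, best_makespan: float) -> float:
    """Combine the four components; satisfied_categories is a count from 0 to 4."""
    try:
        data = json.loads(response)
    except json.JSONDecodeError:
        return 0.0
    reward = 0.2  # parseable_json
    if not valid_schema(data, job_ids):
        return reward  # assumption: later components need a valid schema
    reward += 0.2  # valid_schema
    reward += 0.4 * (satisfied_categories / 4)
    reward += 0.2 * optimality_score(makespan, best_makespan)
    return round(reward, 3)
```

For example, a well-formed repair that satisfies three of the four constraint categories with a makespan of 14 against a reference of 10 earns 0.2 + 0.2 + 0.3 + 0.1 = 0.8.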

---

## 4. Server API

The environment is exposed over HTTP via a FastAPI server on port **7860** (Hugging Face Spaces default).

| Method | Endpoint | Description |
|--------|----------|-------------|
| `GET` | `/health` | Liveness probe; returns `{"status": "ok"}` |
| `POST` | `/reset` | Begin new episode: `{"task_id": "feasibility_check"}` |
| `POST` | `/step` | Submit action: `{"response": "infeasible", "task_id": "feasibility_check"}` |
| `GET` | `/state` | Full internal state snapshot |
| `GET` | `/tasks` | Task catalogue with action schemas |
| `POST` | `/grader` | Direct grader invocation for offline evaluation |
| `GET` | `/baseline` | Trigger baseline inference; returns per-task scores |

---

## 5. Baseline

A standalone inference script (`baseline.py`) evaluates GPT-4o-mini on all three tasks. When `OPENAI_API_KEY` is not set, the script falls back to oracle mock responses, enabling offline verification of the grading pipeline without API access.

### 5.1 Baseline Scores (Mock / Oracle)

| Task | Instances | Average Score |
|------|-----------|--------------|
| Feasibility Check | 12 | 1.000 |
| Conflict Classification | 10 | 1.000 |
| Schedule Repair | 10 | 1.000 |
| **Overall** | | **1.000** |

---

## 6. Setup and Deployment

### 6.1 Prerequisites

| Requirement | Version |
|-------------|---------|
| Python | ≥ 3.11 |
| pip | ≥ 22.0 |
| Docker *(optional)* | ≥ 20.10 |
| Git | ≥ 2.30 |

### 6.2 Local Installation

```bash
# 1. Clone the repository
git clone https://github.com/Vittal-Mukunda/OpenEnv-Hackathon-Meta-x-Scaler.git
cd OpenEnv-Hackathon-Meta-x-Scaler

# 2. Create and activate a virtual environment (recommended)
python -m venv .venv
source .venv/bin/activate        # Linux / macOS
# .venv\Scripts\activate         # Windows

# 3. Install dependencies
pip install -r requirements.txt

# 4. Launch the server
uvicorn server:app --host 0.0.0.0 --port 7860

# 5. Verify the server is running
curl http://localhost:7860/health
# Expected: {"status":"ok"}
```

### 6.3 Docker Deployment

```bash
# Build the image
docker build -t scheduling-opt-env .

# Run the container
docker run -p 7860:7860 scheduling-opt-env

# Verify
curl http://localhost:7860/health
```

### 6.4 Hugging Face Spaces

Push this repository to a Hugging Face Space configured with the **Docker** SDK. The server listens on port 7860, which Spaces exposes automatically. No additional configuration is required.

### 6.5 Running the Baseline

```bash
# Without API key (uses oracle mock responses; scores 1.0 on all tasks)
python baseline.py

# With OpenAI API key (evaluates GPT-4o-mini)
export OPENAI_API_KEY=sk-...
python baseline.py
```

---

## 7. Example Interaction

```bash
# 1. Health check
curl http://localhost:7860/health

# 2. Start a feasibility-check episode
curl -X POST http://localhost:7860/reset \
  -H "Content-Type: application/json" \
  -d '{"task_id": "feasibility_check"}'

# 3. Submit a feasibility answer
curl -X POST http://localhost:7860/step \
  -H "Content-Type: application/json" \
  -d '{"response": "infeasible", "task_id": "feasibility_check"}'

# 4. Start a conflict-classification episode
curl -X POST http://localhost:7860/reset \
  -H "Content-Type: application/json" \
  -d '{"task_id": "conflict_classification"}'

# 5. Classify the violation
curl -X POST http://localhost:7860/step \
  -H "Content-Type: application/json" \
  -d '{"response": "resource_overload", "task_id": "conflict_classification"}'

# 6. Start a schedule-repair episode
curl -X POST http://localhost:7860/reset \
  -H "Content-Type: application/json" \
  -d '{"task_id": "schedule_repair"}'

# 7. Submit a repaired schedule
curl -X POST http://localhost:7860/step \
  -H "Content-Type: application/json" \
  -d '{
    "response": "{\"assignments\": [{\"job_id\": \"J1\", \"machine_id\": \"M1\", \"start_time\": 0}]}",
    "task_id": "schedule_repair"
  }'

# 8. Inspect environment state
curl http://localhost:7860/state

# 9. Invoke a grader directly
curl -X POST http://localhost:7860/grader \
  -H "Content-Type: application/json" \
  -d '{
    "action": {"response": "deadline_violation", "task_id": "conflict_classification"},
    "ground_truth": {"violation_type": "deadline_violation"}
  }'
```

---

## 8. Project Structure

```
.
├── openenv.yaml                  # OpenEnv metadata manifest
├── models.py                     # Pydantic v2 data models (Observation, Action, Reward)
├── environment.py                # SchedulingOptEnv core (reset / step / state + instance bank)
├── server.py                     # FastAPI HTTP server (7 endpoints)
├── baseline.py                   # GPT-4o-mini baseline with oracle fallback
├── Dockerfile                    # Container definition (python:3.11-slim, port 7860)
├── requirements.txt              # Python dependencies
├── tasks/
│   ├── __init__.py               # Task module exports
│   ├── task1_easy.py             # Feasibility check: episode runner + instance accessor
│   ├── task2_medium.py           # Conflict classification: episode runner + instance accessor
│   └── task3_hard.py             # Schedule repair: episode runner + instance accessor
└── graders/
    ├── __init__.py               # Grader exports (FeasibilityGrader, ConflictGrader, RepairGrader)
    ├── grader_detection.py       # Grader: feasibility (binary, synonym-aware)
    ├── grader_classification.py  # Grader: conflict classification (family-aware partial credit)
    └── grader_fix.py             # Grader: schedule repair (4-component additive reward)
```

---

## 9. Dependencies

| Package | Version | Purpose |
|---------|---------|---------|
| `fastapi` | ≥ 0.104 | HTTP server framework |
| `uvicorn` | ≥ 0.24 | ASGI server |
| `pydantic` | ≥ 2.5 | Data validation and serialisation |
| `openai` | ≥ 1.6 | LLM baseline inference |
| `pyyaml` | ≥ 6.0 | YAML manifest parsing |
| `httpx` | ≥ 0.25 | Async HTTP client |

---

## 10. References

[1] OpenEnv Framework. *Building Real-World AI Agent Training Environments*. Meta × Scaler Hackathon, 2026.

[2] Pinedo, M. L. *Scheduling: Theory, Algorithms, and Systems* (5th ed.). Springer, 2016.

[3] Garey, M. R., & Johnson, D. S. *Computers and Intractability: A Guide to the Theory of NP-Completeness*. W. H. Freeman, 1979.

[4] Zhang, C. et al. *Learning to Dispatch for Job Shop Scheduling via Deep Reinforcement Learning*. NeurIPS 2020.

[5] Kwon, Y.-D. et al. *POMO: Policy Optimization with Multiple Optima for Reinforcement Learning*. NeurIPS 2020.