Spaces:

Vittal-M
/

openenv-hackathon

Sleeping

App Files Files Community

Vittal-M commited on Apr 4

Commit

e3d838a

verified ·

1 Parent(s): 730b455

Upload README.md with huggingface_hub

Browse files

Files changed (1) hide show

README.md +354 -10

README.md CHANGED Viewed

@@ -1,10 +1,354 @@
----
-title: Openenv Hackathon
-emoji: 📈
-colorFrom: green
-colorTo: green
-sdk: docker
-pinned: false
----
-Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

+---
+title: SchedulingOptEnv
+emoji: 🗓️
+colorFrom: blue
+colorTo: green
+sdk: docker
+app_port: 7860
+tags:
+  - openenv
+  - reinforcement-learning
+  - scheduling
+  - agent
+license: mit
+---
+<h1 align="center">SchedulingOptEnv</h1>
+<h3 align="center">A Markov Decision Environment for Training Autonomous<br>Scheduling Optimisation Agents</h3>
+<p align="center"><em>Meta × Scaler — OpenEnv Hackathon Submission</em></p>
+<p align="center">
+  <img src="https://img.shields.io/badge/python-3.11+-blue" alt="Python 3.11+">
+  <img src="https://img.shields.io/badge/framework-FastAPI-009688" alt="FastAPI">
+  <img src="https://img.shields.io/badge/models-Pydantic%20v2-e92063" alt="Pydantic v2">
+  <img src="https://img.shields.io/badge/deploy-Docker%20%7C%20HF%20Spaces-yellow" alt="Docker | HF Spaces">
+  <img src="https://img.shields.io/badge/license-MIT-green" alt="MIT License">
+</p>
+---
+## Abstract
+We present **SchedulingOptEnv**, a real-world training environment for autonomous AI agents built upon the OpenEnv framework. The environment formalises combinatorial scheduling optimisation as a sequential decision problem, exposing agents to three progressively challenging sub-tasks: binary feasibility determination, multi-class constraint-violation classification, and full schedule repair. Each task is paired with a structured, differentiable reward function that provides dense, partial-progress signals rather than sparse binary outcomes. A 12-instance scheduling corpus covering five distinct constraint-violation classes, a FastAPI inference server, and a GPT-4o-mini baseline are included. The environment is deployable as a Docker container on Hugging Face Spaces with a single command.
+---
+## 1. Introduction
+Combinatorial scheduling — the assignment of jobs to machines subject to resource, temporal, and precedence constraints — is a foundational problem in operations research, manufacturing, cloud computing, and logistics. Despite its industrial importance, existing benchmarks for evaluating AI agents on scheduling tasks are either purely offline (single-pass solution quality) or narrowly scoped to continuous optimisation rather than the constraint-satisfaction and repair workflow practised by human planners.
+OpenEnv [1] provides an abstraction layer for building *interactive* environments where agents act, receive graded feedback, and improve across episodes. SchedulingOptEnv fills a gap by framing schedule analysis and repair as a Markov Decision Process (MDP) with:
+- A well-defined **observation space** (JSON-encoded scheduling instance, task context, step counter)
+- A structured **action space** (categorical labels or JSON repair schedules)
+- A **multi-component reward function** that awards partial credit for structurally valid but suboptimal repairs
+- Three **difficulty tiers** mirroring the cognitive complexity gradient faced by human schedulers
+---
+## 2. Environment Design
+### 2.1 MDP Formulation
+| Component | Definition |
+|-----------|-----------|
+| State *S* | Current scheduling instance, task type, step count, episode history |
+| Observation *O* | `{schedule_instance: str (JSON), task_id, context, step_number}` |
+| Action *A* | `{response: str, task_id: str}` |
+| Reward *R* | Float ∈ [0.0, 1.0] from task-specific grader |
+| Horizon *T* | Task-dependent: 3 / 5 / 8 steps |
+| Terminal | *done* = True when *T* reached or *R* ≥ 0.95 |
+### 2.2 Scheduling Instance Corpus
+The environment ships with **12 curated scheduling instances** spanning five constraint-violation classes plus two fully feasible baselines. Instances are drawn from a task-aware pool: feasibility-check episodes see all 12, while classification and repair episodes see only the 10 infeasible instances.
+| # | Feasible | Violation Class | Description |
+|---|----------|----------------|-------------|
+| 0 | No | `resource_overload` | J1 and J2 overlap on single-capacity machine M1 |
+| 1 | No | `deadline_violation` | J1 starts late and finishes after hard deadline |
+| 2 | No | `precedence_violation` | J2 starts before its predecessor J1 finishes |
+| 3 | No | `availability_conflict` | J1 scheduled outside machine operating hours |
+| 4 | No | `capacity_exceeded` | 3 concurrent jobs on capacity-2 machine |
+| 5 | No | `resource_overload` | Pairwise overlap of J1 and J2 on capacity-1 machine |
+| 6 | No | `deadline_violation` | Precedence chain forces J3 past hard deadline |
+| 7 | No | `precedence_violation` | J3 starts before both predecessors complete |
+| 8 | No | `availability_conflict` | J1 extends into machine maintenance window |
+| 9 | No | `capacity_exceeded` | 4 concurrent jobs on capacity-3 machine |
+| 10 | Yes | — | Fully feasible 3-job, 2-machine schedule |
+| 11 | Yes | — | Fully feasible 5-job, 3-machine schedule with precedence |
+---
+## 3. Tasks
+### Task 1 — Feasibility Check *(Easy)*
+**Objective:** Given a JSON-encoded scheduling instance (jobs, machines, proposed assignments), determine whether the schedule satisfies all constraints.
+**Action space:** `{"feasible", "infeasible"}`
+**Grading function:**
+```
+R(a, g) = 1.0   if normalise(a) == ground_truth
+          0.1   if a is non-empty but incorrect
+          0.0   if a is empty
+```
+**Episode horizon:** 3 steps. **Target agent accuracy:** ~90%.
+---
+### Task 2 — Conflict Classification *(Medium)*
+**Objective:** Identify the constraint violation present in an infeasible schedule from the closed vocabulary:
+`{resource_overload, deadline_violation, precedence_violation, availability_conflict, capacity_exceeded}`
+**Grading function:**
+```
+R(a, g) = 1.0   if a == ground_truth                             (exact)
+          0.5   if a ∈ related_group(ground_truth)               (partial)
+          0.1   if a ∈ valid_categories \ related_group(g)       (wrong family)
+          0.0   if a ∉ valid_categories                          (unparseable)
+```
+where `related_groups = [{resource_overload, capacity_exceeded}, {deadline_violation, precedence_violation}]`.
+**Episode horizon:** 5 steps. **Target agent accuracy:** ~60%.
+---
+### Task 3 — Schedule Repair *(Hard)*
+**Objective:** Return a corrected schedule as a JSON object that resolves all constraint violations and minimises total makespan.
+**Required JSON format:**
+```json
+{
+  "assignments": [
+    {"job_id": "J1", "machine_id": "M1", "start_time": 0},
+    {"job_id": "J2", "machine_id": "M1", "start_time": 4}
+  ]
+}
+```
+**Grading function (additive, max 1.0):**
+```
+R(a, g) = 0.2 × parseable_json(a)
+        + 0.2 × valid_schema(a, g)
+        + 0.4 × constraint_satisfaction_ratio(a, g)
+        + 0.2 × optimality_score(makespan(a), makespan*(g))
+```
+where:
+- `parseable_json(a)` — 1 if the response parses as valid JSON, else 0
+- `valid_schema(a, g)` — 1 if all required fields are present and all jobs are assigned, else 0
+- `constraint_satisfaction_ratio(a, g)` — fraction of four constraint categories satisfied:
+  capacity, deadlines, precedence, availability (each worth 0.25)
+- `optimality_score(m, m*)` — 1.0 if *m* ≤ 1.30·*m** ; 0.5 if *m* ≤ 1.60·*m** ; 0 otherwise
+**Episode horizon:** 8 steps. **Target agent accuracy:** ~30%.
+---
+## 4. Server API
+The environment is exposed over HTTP via a FastAPI server on port **7860** (Hugging Face Spaces default).
+| Method | Endpoint | Description |
+|--------|----------|-------------|
+| `GET` | `/health` | Liveness probe — returns `{"status": "ok"}` |
+| `POST` | `/reset` | Begin new episode: `{"task_id": "feasibility_check"}` |
+| `POST` | `/step` | Submit action: `{"response": "infeasible", "task_id": "feasibility_check"}` |
+| `GET` | `/state` | Full internal state snapshot |
+| `GET` | `/tasks` | Task catalogue with action schemas |
+| `POST` | `/grader` | Direct grader invocation for offline evaluation |
+| `GET` | `/baseline` | Trigger baseline inference; returns per-task scores |
+---
+## 5. Baseline
+A standalone inference script (`baseline.py`) evaluates GPT-4o-mini on all three tasks. When `OPENAI_API_KEY` is not set, the script falls back to oracle mock responses, enabling offline verification of the grading pipeline without API access.
+### 5.1 Baseline Scores (Mock / Oracle)
+| Task | Instances | Average Score |
+|------|-----------|--------------|
+| Feasibility Check | 12 | 1.000 |
+| Conflict Classification | 10 | 1.000 |
+| Schedule Repair | 10 | 1.000 |
+| **Overall** | | **1.000** |
+---
+## 6. Setup and Deployment
+### 6.1 Prerequisites
+| Requirement | Version |
+|-------------|---------|
+| Python | ≥ 3.11 |
+| pip | ≥ 22.0 |
+| Docker *(optional)* | ≥ 20.10 |
+| Git | ≥ 2.30 |
+### 6.2 Local Installation
+```bash
+# 1. Clone the repository
+git clone https://github.com/Vittal-Mukunda/OpenEnv-Hackathon-Meta-x-Scaler.git
+cd OpenEnv-Hackathon-Meta-x-Scaler
+# 2. Create and activate a virtual environment (recommended)
+python -m venv .venv
+source .venv/bin/activate        # Linux / macOS
+# .venv\Scripts\activate         # Windows
+# 3. Install dependencies
+pip install -r requirements.txt
+# 4. Launch the server
+uvicorn server:app --host 0.0.0.0 --port 7860
+# 5. Verify the server is running
+curl http://localhost:7860/health
+# Expected: {"status":"ok"}
+```
+### 6.3 Docker Deployment
+```bash
+# Build the image
+docker build -t scheduling-opt-env .
+# Run the container
+docker run -p 7860:7860 scheduling-opt-env
+# Verify
+curl http://localhost:7860/health
+```
+### 6.4 Hugging Face Spaces
+Push this repository to a Hugging Face Space configured with the **Docker** SDK. The server listens on port 7860, which Spaces exposes automatically. No additional configuration is required.
+### 6.5 Running the Baseline
+```bash
+# Without API key (uses oracle mock responses — scores 1.0 on all tasks)
+python baseline.py
+# With OpenAI API key (evaluates GPT-4o-mini)
+export OPENAI_API_KEY=sk-...
+python baseline.py
+```
+---
+## 7. Example Interaction
+```bash
+# 1. Health check
+curl http://localhost:7860/health
+# 2. Start a feasibility-check episode
+curl -X POST http://localhost:7860/reset \
+  -H "Content-Type: application/json" \
+  -d '{"task_id": "feasibility_check"}'
+# 3. Submit a feasibility answer
+curl -X POST http://localhost:7860/step \
+  -H "Content-Type: application/json" \
+  -d '{"response": "infeasible", "task_id": "feasibility_check"}'
+# 4. Start a conflict-classification episode
+curl -X POST http://localhost:7860/reset \
+  -H "Content-Type: application/json" \
+  -d '{"task_id": "conflict_classification"}'
+# 5. Classify the violation
+curl -X POST http://localhost:7860/step \
+  -H "Content-Type: application/json" \
+  -d '{"response": "resource_overload", "task_id": "conflict_classification"}'
+# 6. Start a schedule-repair episode
+curl -X POST http://localhost:7860/reset \
+  -H "Content-Type: application/json" \
+  -d '{"task_id": "schedule_repair"}'
+# 7. Submit a repaired schedule
+curl -X POST http://localhost:7860/step \
+  -H "Content-Type: application/json" \
+  -d '{
+    "response": "{\"assignments\": [{\"job_id\": \"J1\", \"machine_id\": \"M1\", \"start_time\": 0}]}",
+    "task_id": "schedule_repair"
+  }'
+# 8. Inspect environment state
+curl http://localhost:7860/state
+# 9. Invoke a grader directly
+curl -X POST http://localhost:7860/grader \
+  -H "Content-Type: application/json" \
+  -d '{
+    "action": {"response": "deadline_violation", "task_id": "conflict_classification"},
+    "ground_truth": {"violation_type": "deadline_violation"}
+  }'
+```
+---
+## 8. Project Structure
+```
+.
+├── openenv.yaml                  # OpenEnv metadata manifest
+├── models.py                     # Pydantic v2 data models (Observation, Action, Reward)
+├── environment.py                # SchedulingOptEnv core (reset / step / state + instance bank)
+├── server.py                     # FastAPI HTTP server (7 endpoints)
+├── baseline.py                   # GPT-4o-mini baseline with oracle fallback
+├── Dockerfile                    # Container definition (python:3.11-slim, port 7860)
+├── requirements.txt              # Python dependencies
+├── tasks/
+│   ├── __init__.py               # Task module exports
+│   ├── task1_easy.py             # Feasibility check — episode runner + instance accessor
+│   ├── task2_medium.py           # Conflict classification — episode runner + instance accessor
+│   └── task3_hard.py             # Schedule repair — episode runner + instance accessor
+└── graders/
+    ├── __init__.py               # Grader exports (FeasibilityGrader, ConflictGrader, RepairGrader)
+    ├── grader_detection.py       # Grader: feasibility (binary, synonym-aware)
+    ├── grader_classification.py  # Grader: conflict classification (family-aware partial credit)
+    └── grader_fix.py             # Grader: schedule repair (4-component additive reward)
+```
+---
+## 9. Dependencies
+| Package | Version | Purpose |
+|---------|---------|---------|
+| `fastapi` | ≥ 0.104 | HTTP server framework |
+| `uvicorn` | ≥ 0.24 | ASGI server |
+| `pydantic` | ≥ 2.5 | Data validation and serialisation |
+| `openai` | ≥ 1.6 | LLM baseline inference |
+| `pyyaml` | ≥ 6.0 | YAML manifest parsing |
+| `httpx` | ≥ 0.25 | Async HTTP client |
+---
+## 10. References
+[1] OpenEnv Framework. *Building Real-World AI Agent Training Environments*. Meta × Scaler Hackathon, 2026.
+[2] Pinedo, M. L. *Scheduling: Theory, Algorithms, and Systems* (5th ed.). Springer, 2016.
+[3] Garey, M. R., & Johnson, D. S. *Computers and Intractability: A Guide to the Theory of NP-Completeness*. W. H. Freeman, 1979.
+[4] Zhang, C. et al. *Learning to Dispatch for Job Shop Scheduling via Deep Reinforcement Learning*. NeurIPS 2020.
+[5] Kwon, Y.-D. et al. *POMO: Policy Optimization with Multiple Optima for Reinforcement Learning*. NeurIPS 2020.