Aldrimore committed on
Commit f302e97 · 1 Parent(s): 7f61e7c

fix: add graders to openenv.yaml and clamp scores to (0, 1) exclusive

Phase 2 validator requires each task to declare a grader in openenv.yaml
and each grader to produce scores strictly between 0.0 and 1.0.

- openenv.yaml: add grader: factory_env.grader:score_episode to all 3 tasks
- grader.py: clamp compute_score output to [0.001, 0.999]; easy task was
returning exactly 1.0 (perfect baseline), which the validator rejected
- README: update easy baseline score from 1.000 to ~0.999
- Remove stale root server.py (entry point is server/app.py per pyproject.toml)
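
The two requirements described in this message can be illustrated with a small check. This is only a sketch under stated assumptions: the Phase 2 validator's code is not part of this commit, so `find_task_problems` and `is_strictly_inside_unit_interval` are hypothetical names, and the task entries are plain dicts standing in for the parsed `openenv.yaml`.

```python
# Hypothetical sketch of the two checks named in the commit message.
# The real Phase 2 validator is not shown in this commit.

def is_strictly_inside_unit_interval(score: float) -> bool:
    """True only for scores in the open interval (0, 1)."""
    return 0.0 < score < 1.0


def find_task_problems(tasks, scores):
    """Return human-readable problems for a list of task dicts.

    tasks:  dicts shaped like the entries under `tasks:` in openenv.yaml
    scores: mapping of task name to the score its grader produced
    """
    problems = []
    for task in tasks:
        name = task.get("name", "<unnamed>")
        if "grader" not in task:
            problems.append(f"{name}: no grader declared")
        score = scores.get(name)
        if score is not None and not is_strictly_inside_unit_interval(score):
            problems.append(f"{name}: score {score} is not strictly between 0 and 1")
    return problems


# Before this commit: no grader key, and the easy baseline scored exactly 1.0.
tasks_before = [{"name": "easy"}]
problems_before = find_task_problems(tasks_before, {"easy": 1.0})

# After this commit: grader declared, score clamped to 0.999.
tasks_after = [{"name": "easy", "grader": "factory_env.grader:score_episode"}]
problems_after = find_task_problems(tasks_after, {"easy": 0.999})

print(problems_before)
print(problems_after)
```

Under these assumptions the pre-commit configuration reports two problems for the easy task and the post-commit configuration reports none.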

Files changed (4)
  1. README.md +2 -2
  2. factory_env/grader.py +6 -2
  3. openenv.yaml +3 -0
  4. server.py +0 -31
README.md CHANGED
@@ -51,7 +51,7 @@ An [OpenEnv](https://github.com/openenv/openenv)-compliant RL environment simula
 
 | Task | Machines | Jobs | Failure Rate | Max Steps | Baseline Score |
 |------|----------|------|-------------|-----------|----------------|
-| easy | 2 | 3 | 0% | 20 | 1.000 |
+| easy | 2 | 3 | 0% | 20 | ~0.999 |
 | medium | 4 | 7 | 8% | 30 | ~0.557 |
 | hard | 6 | 12 | 15% | 40 | ~0.457 |
 
@@ -97,7 +97,7 @@ docker run -e OPENAI_API_KEY=<key> -e FACTORY_TASK=easy -p 7860:7860 factory-env
 
 | Task | Score | Steps |
 |------|-------|-------|
-| easy | 1.000 | 4 |
+| easy | ~0.999 | 4 |
 | medium | ~0.529 | 12 |
 | hard | ~0.533 | 34 |
 
factory_env/grader.py CHANGED
@@ -1,11 +1,15 @@
+_SCORE_MIN = 0.001
+_SCORE_MAX = 0.999
+
+
 def compute_score(completed, on_time, total_jobs, late_jobs, task="easy"):
     if total_jobs == 0:
-        return 0.0
+        return _SCORE_MIN
     completion_rate = completed / total_jobs
     on_time_rate = on_time / max(completed, 1)
     utilization_bonus = max(0.0, 1.0 - late_jobs / max(completed, 1))
     score = 0.5 * completion_rate + 0.3 * on_time_rate + 0.2 * utilization_bonus
-    return round(max(0.0, min(1.0, score)), 4)
+    return round(max(_SCORE_MIN, min(_SCORE_MAX, score)), 4)
 
 
 def score_episode(env) -> float:
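
To see why the easy task used to hit exactly 1.0, the new `compute_score` from this hunk can be run on a few inputs. The function body below is copied from the diff; the input values are illustrative assumptions (a perfect 3-job episode matching the easy task's job count, an empty episode, and an episode where nothing completes), not numbers taken from the repository's runs.

```python
_SCORE_MIN = 0.001
_SCORE_MAX = 0.999


def compute_score(completed, on_time, total_jobs, late_jobs, task="easy"):
    # Copied from the post-commit grader.py hunk.
    if total_jobs == 0:
        return _SCORE_MIN
    completion_rate = completed / total_jobs
    on_time_rate = on_time / max(completed, 1)
    utilization_bonus = max(0.0, 1.0 - late_jobs / max(completed, 1))
    score = 0.5 * completion_rate + 0.3 * on_time_rate + 0.2 * utilization_bonus
    return round(max(_SCORE_MIN, min(_SCORE_MAX, score)), 4)


# Perfect episode: all 3 jobs completed on time, none late.
# Raw score is 0.5 + 0.3 + 0.2 = 1.0, which the upper clamp reduces to 0.999.
perfect = compute_score(completed=3, on_time=3, total_jobs=3, late_jobs=0)

# No jobs at all: the early return now yields the lower bound instead of 0.0.
empty = compute_score(completed=0, on_time=0, total_jobs=0, late_jobs=0)

# Nothing completed out of 3 jobs: only the 0.2-weighted term contributes,
# because late_jobs / max(completed, 1) is 0 and the bonus stays at 1.0.
nothing_done = compute_score(completed=0, on_time=0, total_jobs=3, late_jobs=0)

print(perfect, empty, nothing_done)  # 0.999 0.001 0.2
```

With these inputs, the lower clamp only comes into play through the `total_jobs == 0` branch; an episode that completes nothing still scores 0.2 from the utilization term.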
openenv.yaml CHANGED
@@ -7,10 +7,13 @@ entry_point: factory_env.env:FactoryEnv
 tasks:
   - name: easy
     description: 2 machines, 3 jobs, no failures, 20 steps
+    grader: factory_env.grader:score_episode
   - name: medium
     description: 4 machines, 7 jobs, 8% failure rate, 30 steps
+    grader: factory_env.grader:score_episode
   - name: hard
     description: 6 machines, 12 jobs, 15% failure rate, 40 steps
+    grader: factory_env.grader:score_episode
 
 action_space:
   type: text
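
The `grader:` values use a `module.path:attribute` form, the same shape as the `entry_point` in the hunk header. How OpenEnv resolves such strings is not shown in this commit; the sketch below is one common way to do it with the standard library, and `resolve_entry_point` is a hypothetical helper name. The demo resolves `math:sqrt` so that it runs without the `factory_env` package installed.

```python
import importlib
from typing import Any


def resolve_entry_point(spec: str) -> Any:
    """Resolve a 'package.module:attribute' string to the object it names.

    Hypothetical helper; not OpenEnv's actual loader.
    """
    module_name, sep, attr_name = spec.partition(":")
    if not sep or not module_name or not attr_name:
        raise ValueError(f"expected 'module:attribute', got {spec!r}")
    module = importlib.import_module(module_name)
    return getattr(module, attr_name)


# Stand-in target from the standard library; with factory_env installed,
# the same call would take "factory_env.grader:score_episode".
sqrt = resolve_entry_point("math:sqrt")
print(sqrt(9.0))  # 3.0
```

A string without a colon raises `ValueError` here, which makes a malformed `grader:` value fail early rather than at scoring time.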
server.py DELETED
@@ -1,31 +0,0 @@
-"""
-Smart Factory Scheduling - OpenEnv Server
-==========================================
-Routes (HTTP + WebSocket):
-    GET  /        → Gradio UI (set ENABLE_WEB_INTERFACE=1) or redirect to /web
-    GET  /health  → {"status": "healthy"}
-    POST /reset   → reset episode, returns observation
-    POST /step    → execute action, returns observation + reward + done
-    GET  /state   → current environment state
-    GET  /schema  → action / observation JSON schemas
-    WS   /ws      → persistent WebSocket session (used by FactoryEnvClient)
-
-Set ENABLE_WEB_INTERFACE=1 to enable the built-in Gradio UI at /web.
-"""
-import os
-from openenv.core import create_app
-from factory_env.env import FactoryEnv
-from factory_env.models import FactoryAction, FactoryObservation
-
-TASK = os.getenv("FACTORY_TASK", "easy")
-
-app = create_app(
-    env=lambda: FactoryEnv(task=TASK, seed=42),
-    action_cls=FactoryAction,
-    observation_cls=FactoryObservation,
-    env_name="factory_env",
-)
-
-if __name__ == "__main__":
-    import uvicorn
-    uvicorn.run(app, host="0.0.0.0", port=int(os.getenv("PORT", 7860)))