Cyber-Machine committed 2dfa6e3 (verified) · 1 parent: fab9447

docs: add README.md

fix: update color scheme in README.md

Files changed (1): README.md (added, +232 -0)

---
title: WorkflowArena
emoji: 🏗️
colorFrom: blue
colorTo: indigo
sdk: docker
pinned: false
app_port: 8000
base_path: /web
tags:
- openenv
- workflow-orchestration
- reinforcement-learning
---

# WorkflowArena

WorkflowArena is an OpenEnv benchmark for scheduling dependency-constrained work on limited workers.
Each episode is a seeded workflow DAG. The agent must decide when to dispatch ready tasks, when to wait,
and how to trade off deadline pressure, worker utilization, critical-path protection, and unfinished work.

## Problem

This environment models a common orchestration problem:

- tasks have dependencies, so not everything can start immediately
- workers are limited, so not every ready task can run at once
- deadlines and priorities are uneven, so the obvious greedy move is not always best
- higher difficulties add time pressure and failure dynamics

The action space is intentionally small:

1. `dispatch(task_ids=[...])`
2. `wait()`

That keeps the challenge focused on decision quality rather than action syntax.

## Episode Loop

1. `reset()` generates a deterministic episode from `preset`, `seed`, and `worker_count`.
2. The observation exposes ready, running, blocked, and completed tasks plus planner hints.
3. The agent either dispatches a legal batch of ready tasks or waits for the next completion event.
4. Time advances only on `wait()`.
5. The episode ends when:
   - all tasks complete, or
   - the preset time budget is exhausted, or
   - the safety step limit is hit
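
The loop above can be sketched with a toy simulator. This is a minimal illustration under stated assumptions, not the environment's implementation: `run_toy_episode`, `policy`, and the `durations`/`deps` dictionaries are hypothetical names, and the sketch ignores deadlines, the time budget, and failures. It shows only the core rule that dispatching takes no time and the clock moves on a wait.

```python
import heapq


def run_toy_episode(durations, deps, workers, policy, max_steps=1000):
    """Toy loop: dispatch is instantaneous, time advances only on wait."""
    time = 0
    done, started = set(), set()
    running = []  # min-heap of (finish_time, task_id)
    steps = 0
    while len(done) < len(durations) and steps < max_steps:  # safety step limit
        steps += 1
        # A task is ready when it has not started and all dependencies are done.
        ready = sorted(
            t for t in durations
            if t not in started and all(d in done for d in deps.get(t, []))
        )
        free = workers - len(running)
        batch = policy(ready, free)
        if batch:
            # Dispatch: tasks start now, the clock does not move.
            for t in batch:
                started.add(t)
                heapq.heappush(running, (time + durations[t], t))
        elif running:
            # Wait: jump to the next completion event (one per wait here).
            time, finished = heapq.heappop(running)
            done.add(finished)
        else:
            break  # nothing running and nothing dispatched: stop
    return time, done


def greedy(ready, free):
    """Hypothetical baseline policy: fill every free worker."""
    return ready[:max(0, free)]


makespan, finished = run_toy_episode(
    durations={"a": 2, "b": 3, "c": 1},
    deps={"c": ["a", "b"]},
    workers=2,
    policy=greedy,
)
print(makespan, sorted(finished))
```

With two workers, `a` and `b` run in parallel and `c` starts once both finish; with one worker the same DAG runs serially.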

## Difficulty Presets

### `easy`

- smaller DAGs
- softer deadlines
- no fixed time budget
- no failure events

This is the baseline teaching mode. Good play mostly means keeping workers busy and avoiding obviously bad waits.

### `medium`

- larger DAGs
- tighter deadlines
- fixed episode time budget
- terminal penalty for unfinished work

This is where the environment becomes a real tradeoff problem. The agent may not be able to finish everything,
so it must decide what is worth finishing before time runs out.

### `hard`

- denser DAGs
- tighter deadlines
- tighter time budget than `medium`
- temporary worker outages
- task retry failures

In hard mode, usable capacity can shrink temporarily and a task may fail at completion and return to the ready queue.

## Rewards

WorkflowArena uses shaped rewards so local decisions have immediate feedback, while terminal scoring still matters.

### Per-step reward channels

The observation exposes `last_reward_breakdown` with these channels:

- `completion_reward`: reward for tasks that finished on the latest `wait()`
- `utilization_reward`: reward for keeping workers occupied
- `deadline_reward`: positive for on-time completion, negative for lateness
- `criticality_reward`: reward for progress on high-impact work
- `idle_penalty`: penalty for avoidable waiting or leaving useful capacity idle
- `invalid_action_penalty`: penalty for malformed or infeasible actions
- `terminal_makespan_score`: terminal efficiency score at episode end
- `unfinished_task_penalty`: terminal penalty for incomplete work when the episode ends before all tasks finish
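
When logging or debugging an agent, it can help to separate the shaping channels from the two terminal ones. The sketch below assumes that `last_reward_breakdown` can be read as a mapping from channel name to a number, that penalties are stored as negative values, and that the step reward is the plain sum of the channels; none of these is confirmed here. `summarize_breakdown` is a hypothetical helper.

```python
STEP_CHANNELS = (
    "completion_reward",
    "utilization_reward",
    "deadline_reward",
    "criticality_reward",
    "idle_penalty",
    "invalid_action_penalty",
)
TERMINAL_CHANNELS = ("terminal_makespan_score", "unfinished_task_penalty")


def summarize_breakdown(breakdown):
    """Split a reward breakdown into shaping and terminal parts.

    Assumption: missing channels count as 0 and penalties are negative.
    """
    step = sum(breakdown.get(name, 0.0) for name in STEP_CHANNELS)
    terminal = sum(breakdown.get(name, 0.0) for name in TERMINAL_CHANNELS)
    return {"step": step, "terminal": terminal, "total": step + terminal}


example = {
    "completion_reward": 1.0,
    "utilization_reward": 0.25,
    "idle_penalty": -0.5,
    "unfinished_task_penalty": -2.0,
}
print(summarize_breakdown(example))
```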

### Reward design intent

The reward is set up to encourage:

- filling worker capacity when good work is available
- respecting deadlines
- protecting high-priority and critical-path tasks
- avoiding pointless waits
- finishing as much important work as possible before the time budget expires

The terminal score is bounded and deterministic. Higher values correspond to stronger schedules.

## Failures and Constraints

The environment keeps the action space fixed, but higher presets change the transition dynamics.

### Capacity constraint

- `dispatch(task_ids=[...])` cannot exceed current free capacity
- only tasks in `ready_tasks` are legal to dispatch

### Hard-mode worker outages

- a temporary outage can reduce usable workers
- `total_workers` stays constant
- `effective_workers` reflects usable workers after degradation
- `free_workers` is computed from `effective_workers`, not from the original total
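
The relationship between the worker counts can be illustrated with a small calculation. The exact formula is an assumption: this sketch treats `effective_workers` as `total_workers` minus `degraded_workers`, and `free_workers` as `effective_workers` minus the number of running tasks, both clamped at zero. `capacity` is a hypothetical helper, not part of the environment.

```python
def capacity(total_workers, degraded_workers, running_count):
    """Return (effective_workers, free_workers) under the assumed formula."""
    # Usable workers after the outage; the total itself never changes.
    effective = max(0, total_workers - degraded_workers)
    # Free capacity is measured against usable workers, not the total.
    free = max(0, effective - running_count)
    return effective, free


# 4 workers, 1 degraded, 2 tasks running: only 1 slot is left, not 2.
print(capacity(total_workers=4, degraded_workers=1, running_count=2))
```

An agent that plans batches from `total_workers` during an outage would overshoot capacity, so reading `free_workers` from the observation is the safer input.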

### Hard-mode retry failures

- a running task may fail at completion
- it consumes time but does not complete
- it returns to `ready_tasks`
- `attempt_count` shows how many retry failures that task has already consumed
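
The retry behaviour described above can be sketched as follows. How the environment decides that a task fails is not stated here, so the failure probability `fail_prob`, the seeded `random.Random` instance, and the `resolve_completion` helper are all illustrative assumptions.

```python
import random


def resolve_completion(task, ready, completed, rng, fail_prob=0.2):
    """Resolve a task whose run time has elapsed.

    Returns True if the task completed, False if it failed and was requeued.
    The elapsed time is already spent in both cases.
    """
    if rng.random() < fail_prob:
        task["attempt_count"] += 1  # one more retry failure consumed
        ready.append(task)          # back to the ready queue
        return False
    completed.append(task)
    return True


rng = random.Random(0)  # seeded, so the toy run is reproducible
ready, completed = [], []
task = {"task_id": "task_01", "attempt_count": 0}
resolve_completion(task, ready, completed, rng, fail_prob=1.0)  # forced failure
print(task["attempt_count"], len(ready), len(completed))
```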

## Observation Contract

The main observation type is [`WorkflowArenaObservation`](workflow_arena/models.py).
Important fields include:

- `current_time`
- `total_workers`
- `effective_workers`
- `degraded_workers`
- `free_workers`
- `time_budget`
- `time_remaining`
- `progress`
- `ready_tasks`
- `running_tasks`
- `completed_tasks`
- `blocked_tasks`
- `recent_failure_events`
- `last_reward_breakdown`
- `success_metrics`
- `validation_error`

Each task view includes:

- `task_id`
- `duration`
- `priority`
- `deadline`
- `criticality`
- `slack`
- `downstream_count`
- `dependencies`
- `attempt_count`
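
As an illustration of how an agent might use the task view, here is a simple ranking heuristic. The `TaskView` dataclass below is a stand-in with the field names listed above; the field types, the reading of `slack` (smaller means more urgent), and the reading of `priority` (larger means more important) are assumptions, and `pick_batch` is a hypothetical helper rather than anything the environment provides.

```python
from dataclasses import dataclass, field


@dataclass
class TaskView:
    """Stand-in for a task view; types are assumed, names follow the list above."""
    task_id: str
    duration: int
    priority: int
    deadline: int
    criticality: float
    slack: int
    downstream_count: int
    dependencies: list = field(default_factory=list)
    attempt_count: int = 0


def pick_batch(ready_tasks, free_workers):
    """Least slack first, then highest priority, then task_id for determinism."""
    ranked = sorted(ready_tasks, key=lambda t: (t.slack, -t.priority, t.task_id))
    return [t.task_id for t in ranked[:max(0, free_workers)]]


ready = [
    TaskView("task_a", 3, 1, 10, 0.2, 5, 0),
    TaskView("task_b", 2, 1, 4, 0.9, 0, 2),
    TaskView("task_c", 2, 3, 4, 0.9, 0, 1),
]
print(pick_batch(ready, free_workers=2))
```

The heuristic never returns more ids than `free_workers`, so its output stays within the capacity constraint.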

## Expected Agent Output

Agents are expected to return compact JSON actions in one of these exact forms:

```json
{ "action_type": "wait", "task_ids": [] }
```

```json
{ "action_type": "dispatch", "task_ids": ["task_01", "task_02"] }
```

Rules:

- dispatch only task ids that appear in `ready_tasks`
- do not exceed `free_workers`
- do not send duplicate ids
- `wait()` must use an empty `task_ids` list
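
An agent wrapper can check these rules on its own side before sending an action. The sketch below is a client-side pre-check under stated assumptions, not the environment's validator: `validate_action` is a hypothetical helper, `ready_ids` is assumed to be the set of `task_id` values taken from `ready_tasks`, and the error strings are made up for illustration.

```python
import json


def validate_action(raw, ready_ids, free_workers):
    """Return an error string, or None if the action passes the rules above."""
    try:
        action = json.loads(raw)
    except json.JSONDecodeError:
        return "action is not valid JSON"
    if not isinstance(action, dict):
        return "action must be a JSON object"
    kind = action.get("action_type")
    ids = action.get("task_ids")
    if not isinstance(ids, list) or not all(isinstance(i, str) for i in ids):
        return "task_ids must be a list of strings"
    if kind == "wait":
        return None if not ids else "wait must use an empty task_ids list"
    if kind != "dispatch":
        return "action_type must be 'wait' or 'dispatch'"
    if len(set(ids)) != len(ids):
        return "duplicate task ids"
    unknown = [i for i in ids if i not in ready_ids]
    if unknown:
        return f"not in ready_tasks: {unknown}"
    if len(ids) > free_workers:
        return "batch exceeds free_workers"
    return None


raw = '{ "action_type": "dispatch", "task_ids": ["task_01", "task_02"] }'
print(validate_action(raw, ready_ids={"task_01", "task_02"}, free_workers=1))
```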

## Success Metrics

The environment reports schedule quality through `success_metrics`:

- `makespan`
- `worker_utilization`
- `deadline_miss_count`
- `unfinished_task_count`
- `weighted_priority_completion`
- `benchmark_score`

Interpretation:

- higher `benchmark_score` is better
- lower `deadline_miss_count` is better
- lower `unfinished_task_count` is better
- `makespan` is only populated when everything completed

## Expected Outputs for Evaluation

For benchmark use, an agent should produce:

1. a legal JSON action at every step
2. a full episode rollout until termination
3. a final observation containing the terminal score and success metrics

Typical downstream evaluation reads:

- cumulative reward
- final `benchmark_score`
- whether the agent completed all tasks
- how many deadlines were missed
- how much important work remained unfinished
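
A downstream evaluation script might collapse one rollout into a small record like the one below. This is a sketch under assumptions: `summarize_rollout` is a hypothetical helper, `step_rewards` is assumed to be the list of per-step scalar rewards collected by the caller, and `final_metrics` is assumed to be the final `success_metrics` read as a plain dictionary.

```python
def summarize_rollout(step_rewards, final_metrics):
    """Collapse one finished episode into the quantities listed above."""
    unfinished = final_metrics["unfinished_task_count"]
    return {
        "cumulative_reward": sum(step_rewards),
        "benchmark_score": final_metrics["benchmark_score"],
        "completed_all": unfinished == 0,
        "deadline_miss_count": final_metrics["deadline_miss_count"],
        "unfinished_task_count": unfinished,
    }


# Illustrative numbers only; they are not results from the benchmark.
summary = summarize_rollout(
    step_rewards=[0.5, 0.25, 1.0],
    final_metrics={
        "benchmark_score": 0.8,
        "deadline_miss_count": 1,
        "unfinished_task_count": 0,
    },
)
print(summary)
```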

## Local Development

Validate the environment:

```bash
.venv/bin/python -m openenv.cli.__main__ validate workflow_arena
```

Run the server locally:

```bash
cd workflow_arena
uv run --project . server
```