Spaces:
Sleeping
Simplify openenv.yaml to match passing submission schema
Browse filesPhase 2 Task Validation keeps failing with "Not enough tasks with graders"
even though our /tasks endpoint, task_definitions.TASKS dict, and
openenv.yaml all declare 3 graded tasks. After comparing against the three
known-passing reference submissions (Calendar Scheduling, SQL Repair,
Warehouse Logistics), the most likely cause is that the Scaler grader
parses openenv.yaml with a strict schema and rejects extra fields like
has_grader: true and difficulty.
Rewrite openenv.yaml to match the Warehouse Logistics format exactly:
- Top level: name, version, description, entrypoint, type, spec_version
- type: http (not space), spec_version: "1.0" (string, not int 1)
- Tasks with only id, name, description (no has_grader, no difficulty)
- Single-line descriptions (no YAML folded blocks)
Warehouse passes Phase 2 with exactly this schema and only tasks declared
in yaml (no custom /tasks endpoint). Conforming to their pattern is the
safest bet for clearing the task count check.
- openenv.yaml +11 -30
|
@@ -1,35 +1,16 @@
|
|
| 1 |
-
spec_version: 1
|
| 2 |
name: dispatchpulse
|
| 3 |
-
|
| 4 |
-
|
| 5 |
-
|
| 6 |
-
|
| 7 |
-
|
| 8 |
-
# Graded tasks — each has a deterministic grader returning a score in [0.0, 1.0]
|
| 9 |
tasks:
|
| 10 |
- id: easy
|
| 11 |
-
name:
|
| 12 |
-
|
| 13 |
-
description: >
|
| 14 |
-
Routine urban shift. Five calls over 30 minutes, four units (ALS, BLS,
|
| 15 |
-
fire engine, police), one well-equipped hospital. Callers report
|
| 16 |
-
accurately. Optimal play scores ~0.85+.
|
| 17 |
-
has_grader: true
|
| 18 |
-
|
| 19 |
- id: medium
|
| 20 |
-
name:
|
| 21 |
-
|
| 22 |
-
description: >
|
| 23 |
-
Urban scenario. 15 calls in 45 minutes, 6 units, 2 hospitals. Mass
|
| 24 |
-
casualty bus accident at minute 12 and 20% caller inaccuracy.
|
| 25 |
-
Reasonable play scores ~0.55-0.70.
|
| 26 |
-
has_grader: true
|
| 27 |
-
|
| 28 |
- id: hard
|
| 29 |
-
name:
|
| 30 |
-
|
| 31 |
-
description: >
|
| 32 |
-
Earthquake response scenario. 30 calls in 60 minutes, 8 units,
|
| 33 |
-
3 hospitals (one on diversion). 35% caller misreporting due to panic.
|
| 34 |
-
Strong play scores ~0.40-0.55.
|
| 35 |
-
has_grader: true
|
|
|
|
|
|
|
| 1 |
name: dispatchpulse
|
| 2 |
+
version: 1.0.0
|
| 3 |
+
description: Emergency dispatch coordinator environment. The agent triages incoming 911 calls and dispatches limited units under time pressure. Rewards use real clinical survival curves from EMS literature.
|
| 4 |
+
entrypoint: server.app:app
|
| 5 |
+
type: http
|
| 6 |
+
spec_version: "1.0"
|
|
|
|
| 7 |
tasks:
|
| 8 |
- id: easy
|
| 9 |
+
name: "Routine Urban Shift"
|
| 10 |
+
description: "Five emergency calls arrive over 30 minutes. Four units (ALS, BLS, fire engine, police) and one well-equipped hospital. Accurate callers. Optimal play scores 0.85 or higher."
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 11 |
- id: medium
|
| 12 |
+
name: "Urban Mass Casualty"
|
| 13 |
+
description: "Fifteen calls over 45 minutes including a bus accident at minute 12 that spawns multiple severity-1 trauma calls. Six units, two hospitals, 20% caller inaccuracy. Core challenge is ALS conservation."
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 14 |
- id: hard
|
| 15 |
+
name: "Earthquake Response"
|
| 16 |
+
description: "Thirty calls over 60 minutes. Eight units, three hospitals (one on diversion). 35% caller misreporting due to panic. Deliberately resource-scarce; strong play scores 0.40 to 0.55."
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|