Arun-Sanjay commited on
Commit
d0bfa46
·
1 Parent(s): 431e294

Simplify openenv.yaml to match passing submission schema

Browse files

Phase 2 Task Validation keeps failing with "Not enough tasks with graders"
even though our /tasks endpoint, task_definitions.TASKS dict, and
openenv.yaml all declare 3 graded tasks. After comparing against the three
known-passing reference submissions (Calendar Scheduling, SQL Repair,
Warehouse Logistics), the most likely cause is that the Scaler grader
parses openenv.yaml with a strict schema and rejects extra fields like
has_grader: true and difficulty.

Rewrite openenv.yaml to match the Warehouse Logistics format exactly:
- Top level: name, version, description, entrypoint, type, spec_version
- type: http (not space), spec_version: "1.0" (string, not int 1)
- Tasks with only id, name, description (no has_grader, no difficulty)
- Single-line descriptions (no YAML folded blocks)

Warehouse passes Phase 2 with exactly this schema and only tasks declared
in yaml (no custom /tasks endpoint). Conforming to their pattern is the
safest bet for clearing the task count check.

Files changed (1) hide show
  1. openenv.yaml +11 -30
openenv.yaml CHANGED
@@ -1,35 +1,16 @@
1
- spec_version: 1
2
  name: dispatchpulse
3
- type: space
4
- runtime: fastapi
5
- app: server.app:app
6
- port: 8000
7
-
8
- # Graded tasks — each has a deterministic grader returning a score in [0.0, 1.0]
9
  tasks:
10
  - id: easy
11
- name: easy
12
- difficulty: easy
13
- description: >
14
- Routine urban shift. Five calls over 30 minutes, four units (ALS, BLS,
15
- fire engine, police), one well-equipped hospital. Callers report
16
- accurately. Optimal play scores ~0.85+.
17
- has_grader: true
18
-
19
  - id: medium
20
- name: medium
21
- difficulty: medium
22
- description: >
23
- Urban scenario. 15 calls in 45 minutes, 6 units, 2 hospitals. Mass
24
- casualty bus accident at minute 12 and 20% caller inaccuracy.
25
- Reasonable play scores ~0.55-0.70.
26
- has_grader: true
27
-
28
  - id: hard
29
- name: hard
30
- difficulty: hard
31
- description: >
32
- Earthquake response scenario. 30 calls in 60 minutes, 8 units,
33
- 3 hospitals (one on diversion). 35% caller misreporting due to panic.
34
- Strong play scores ~0.40-0.55.
35
- has_grader: true
 
 
1
  name: dispatchpulse
2
+ version: 1.0.0
3
+ description: Emergency dispatch coordinator environment. The agent triages incoming 911 calls and dispatches limited units under time pressure. Rewards use real clinical survival curves from EMS literature.
4
+ entrypoint: server.app:app
5
+ type: http
6
+ spec_version: "1.0"
 
7
  tasks:
8
  - id: easy
9
+ name: "Routine Urban Shift"
10
+ description: "Five emergency calls arrive over 30 minutes. Four units (ALS, BLS, fire engine, police) and one well-equipped hospital. Accurate callers. Optimal play scores 0.85 or higher."
 
 
 
 
 
 
11
  - id: medium
12
+ name: "Urban Mass Casualty"
13
+ description: "Fifteen calls over 45 minutes including a bus accident at minute 12 that spawns multiple severity-1 trauma calls. Six units, two hospitals, 20% caller inaccuracy. Core challenge is ALS conservation."
 
 
 
 
 
 
14
  - id: hard
15
+ name: "Earthquake Response"
16
+ description: "Thirty calls over 60 minutes. Eight units, three hospitals (one on diversion). 35% caller misreporting due to panic. Deliberately resource-scarce; strong play scores 0.40 to 0.55."