Commit bb1b296 (verified), committed by pandelis
1 Parent(s): a699987

Add Zerolang editing environment

.gitignore ADDED
@@ -0,0 +1,8 @@
+ .venv/
+ .zero/
+ __pycache__/
+ *.py[cod]
+ dist/
+ outputs/
+ .pytest_cache/
+ .ruff_cache/
README.md ADDED
@@ -0,0 +1,241 @@
+ ---
+ tags:
+ - zerolang
+ - reinforcement-learning
+ - verifiers
+ - code-editing
+ - tool-use
+ - graph-editing
+ - laguna-xs2
+ license: apache-2.0
+ ---
+
+ # Zerolang Editing
+
+ `zerolang-editing` is a Verifiers/Prime RL environment for training coding agents
+ to edit [Zerolang](https://github.com/vercel-labs/zerolang) programs through
+ checked graph edits instead of loose text replacement.
+
+ The core task is intentionally narrow: each rollout starts with a `.0` source
+ file already written to disk, asks the model for a semantic code edit, and
+ scores the edited file after the model uses Zerolang tooling. The intended
+ successful behavior is:
+
+ 1. Inspect the file with Zerolang graph/check tools.
+ 2. Identify the relevant graph hash and semantic node.
+ 3. Apply a checked `zero graph patch` operation to the on-disk file.
+ 4. Finish with a compact JSON response pointing at the edited path.
+
+ This repository contains the environment source package, synthetic task
+ builders, tool wrappers, and documentation. The trained checkpoint from hosted
+ RL runs is published separately by the training service when a run is finalized.
+
+ ## Why This Exists
+
+ Most code-editing agents learn to patch source through line-oriented text
+ operations. Zerolang exposes a graph-level editing surface where a patch is
+ guarded by the expected graph hash and the expected field value. That makes
+ edits auditable and harder to apply to stale or mismatched code.
+
+ This environment is designed to train that behavior directly. It rewards
+ successful checked graph patches, while still checking that the resulting file
+ compiles and matches the hidden target source.
+
+ ## Environment Summary
+
+ - **Package name:** `zerolang-editing`
+ - **Prime environment ID:** `pandelis/zerolang-editing`
+ - **Version in this repo:** `0.1.8`
+ - **Task type:** multi-turn tool-use code editing
+ - **Language under edit:** Zerolang `.0`
+ - **Train split:** 209 deterministic synthetic tasks
+ - **Eval split:** 67 held-out deterministic synthetic tasks
+ - **Primary reward target:** successful `zero_graph_patch` on the rollout file
+
+ ## Rollout Contract
+
+ Each task row includes an initial Zerolang source program and a hidden target
+ program. At rollout setup time, the environment writes the initial source to:
+
+ ```text
+ <temporary rollout workspace>/program.0
+ ```
+
+ The model receives that path in the user prompt. Tools must operate on `path`
+ arguments that point to this `.0` file. Pasting the full source into tool calls
+ is rejected because the training target is disk-backed graph editing, not
+ source-string rewriting.
+
+ The environment canonicalizes recoverable path mistakes, such as missing paths
+ or paths outside the rollout workspace, back to the rollout file and records
+ those corrections. The `path_argument_valid` metric rewards clean tool calls
+ that did not require correction.
+
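The path handling described in the Rollout Contract can be sketched roughly as follows. This is a minimal illustration, not the environment's actual code: `canonicalize_path` and `path_argument_valid` as Python functions are hypothetical names, and the rule that relative paths resolve inside the rollout workspace is an assumption.

```python
from pathlib import Path


def canonicalize_path(raw_path, rollout_file):
    """Return (path_to_use, was_corrected) for one tool call.

    Hypothetical helper: the real harness may apply different rules.
    """
    rollout_file = Path(rollout_file).resolve()
    if not raw_path:
        # Missing `path` argument: recoverable, so fall back to the rollout file.
        return rollout_file, True
    candidate = Path(raw_path)
    if not candidate.is_absolute():
        # Assumption: relative paths are interpreted inside the rollout workspace.
        candidate = rollout_file.parent / candidate
    if candidate.resolve() != rollout_file:
        # Any other target (outside the workspace, or a different file) is redirected.
        return rollout_file, True
    return rollout_file, False


def path_argument_valid(corrections):
    """Score 1.0 only when no tool call in the rollout needed correction."""
    return 0.0 if any(corrections) else 1.0


rollout = Path("/tmp/rollout-workspace/program.0")
calls = [str(rollout), None, "/etc/passwd"]
corrections = [canonicalize_path(raw, rollout)[1] for raw in calls]
print(corrections, path_argument_valid(corrections))
```

In this sketch every call still runs against the rollout file, so a path mistake costs only the small clean-path term rather than failing the whole rollout.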
+ ## Tools
+
+ The environment exposes only Zerolang-specific tools:
+
+ | Tool | Purpose |
+ | --- | --- |
+ | `zero_check(path)` | Run `zero check --json` against a `.0` file. |
+ | `zero_graph_summary(path)` | Return compact graph hash and patchable node facts. |
+ | `zero_graph_dump(path)` | Run `zero graph dump` for detailed graph inspection. |
+ | `zero_graph_json(path)` | Run `zero graph --json`. |
+ | `zero_fix_plan(path)` | Run `zero fix --plan --json`. |
+ | `zero_graph_patch(path, expect_graph_hash, op)` | Apply one checked graph patch operation to the file. |
+ | `zero_skills_get(skill)` | Load version-matched Zerolang guidance such as `language`, `diagnostics`, or `stdlib`. |
+
+ Example checked patch shape:
+
+ ```bash
+ zero graph patch program.0 \
+   --expect-graph-hash graph:49dd208f8361c221 \
+   --op 'set node="#78ac4364" field="value" expect="66" value="65"'
+ ```
+
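A tool wrapper could assemble the same command from structured arguments. The sketch below only builds the argument list and does not invoke `zero`; `format_set_op` and `build_patch_argv` are hypothetical helper names, and the only flags used are the two shown in the example above. Escaping of values that themselves contain double quotes is not handled, since the op grammar's quoting rules are not documented here.

```python
import shlex


def format_set_op(node, field, expect, value):
    """Format a `set` operation in the shape shown in the README example."""
    return f'set node="{node}" field="{field}" expect="{expect}" value="{value}"'


def build_patch_argv(path, expect_graph_hash, op):
    """Build an argv list for `zero graph patch` (hypothetical wrapper)."""
    return [
        "zero", "graph", "patch", path,
        "--expect-graph-hash", expect_graph_hash,
        "--op", op,
    ]


argv = build_patch_argv(
    "program.0",
    "graph:49dd208f8361c221",
    format_set_op("#78ac4364", "value", "66", "65"),
)
# shlex.join renders the list as a copy-pasteable shell command.
print(shlex.join(argv))
```

Passing an argv list to `subprocess.run` rather than a shell string would avoid quoting problems in the op payload.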
+ ## Reward Metrics
+
+ The main rubric is weighted toward actually patching the graph and producing
+ the hidden target program.
+
+ | Metric | Weight | Meaning |
+ | --- | ---: | --- |
+ | `graph_patch_success` | 0.50 | A successful `zero_graph_patch` call edited the file to the hidden target. |
+ | `target_source_match` | 0.20 | The final on-disk source matches the target after whitespace normalization. |
+ | `zero_check_pass` | 0.15 | The edited file passes `zero check --json`. |
+ | `zerolang_surface_used` | 0.10 | The rollout used graph hashes, node IDs, `expect`, or graph-patch semantics. |
+ | `path_argument_valid` | 0.05 | Tool calls used the rollout `.0` path without harness-side correction. |
+
+ The reward is intentionally not fully binary. A model can get partial credit for
+ producing compilable code and using the right interface, but the highest reward
+ requires the checked graph patch to land correctly.
+
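To make the partial-credit behavior concrete, here is a small worked example using the weights from the table. It assumes each metric is scored in [0, 1] and that the total is a plain weighted sum; the README does not spell out the aggregation, so `weighted_reward` is an illustrative function, not the environment's rubric code.

```python
import math

# Weights copied from the Reward Metrics table.
WEIGHTS = {
    "graph_patch_success": 0.50,
    "target_source_match": 0.20,
    "zero_check_pass": 0.15,
    "zerolang_surface_used": 0.10,
    "path_argument_valid": 0.05,
}


def weighted_reward(scores):
    """Weighted sum of per-metric scores; missing metrics count as 0."""
    return sum(weight * float(scores.get(name, 0.0)) for name, weight in WEIGHTS.items())


# Every metric satisfied: the checked patch landed on the hidden target.
full = weighted_reward({name: 1.0 for name in WEIGHTS})

# The file still compiles and the right interface was used, but no patch landed.
partial = weighted_reward({
    "zero_check_pass": 1.0,
    "zerolang_surface_used": 1.0,
    "path_argument_valid": 1.0,
})

print(f"full={full:.2f} partial={partial:.2f}")
assert math.isclose(sum(WEIGHTS.values()), 1.0)
```

Under these assumptions, a rollout that never lands the patch is capped at 0.30, which is why the two patch-related metrics dominate the signal.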
+ ## Dataset Construction
+
+ The synthetic tasks are generated from canonical Zerolang snippets:
+
+ 1. Build an initial `.0` program.
+ 2. Select a patchable semantic node, usually a literal, function value, call
+    target, or printed diagnostic string.
+ 3. Mutate the semantic value to produce the target program.
+ 4. Store the target source and task metadata.
+ 5. During rollout, require the model to recover the target through graph tools.
+
+ The environment currently focuses on deterministic editing families where
+ `zero graph patch` support is reliable. The task builders live in:
+
+ - `zerolang_editing/tasks.py`
+ - `zerolang_editing/train_tasks.py`
+ - `zerolang_editing/task_builders.py`
+
+ ## Installation
+
+ Install from Prime Hub:
+
+ ```bash
+ prime env install pandelis/zerolang-editing@0.1.8
+ ```
+
+ Install from this repository:
+
+ ```bash
+ uv sync
+ uv run python -m compileall zerolang_editing
+ ```
+
+ Zerolang is required at runtime. If `zero` is not already on `PATH`, the tool
+ wrapper checks `$HOME/.zero/bin/zero` and can download a release binary into a
+ temporary install directory.
+
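The lookup order described above (first `PATH`, then `$HOME/.zero/bin/zero`) might look like the sketch below. `find_zero_binary` is a hypothetical name, the download fallback is deliberately left out, and the real logic in `zero_tools.py` may differ.

```python
import os
import shutil
from pathlib import Path


def find_zero_binary(env=None):
    """Return a path to the `zero` executable, or None if it is not installed.

    Hypothetical sketch of the lookup order; it does not download anything.
    """
    env = os.environ if env is None else env
    # 1. Prefer whatever `zero` is already on PATH.
    found = shutil.which("zero", path=env.get("PATH"))
    if found:
        return found
    # 2. Fall back to the per-user install location.
    home = env.get("HOME")
    if home:
        fallback = Path(home) / ".zero" / "bin" / "zero"
        if fallback.is_file() and os.access(fallback, os.X_OK):
            return str(fallback)
    return None


print(find_zero_binary())
```

Accepting an explicit `env` mapping keeps the lookup easy to test without touching the real environment.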
+ ## Local Eval
+
+ ```bash
+ prime eval run ./environments/zerolang_editing \
+   -m poolside/laguna-xs.2 \
+   -n 3 -r 1 -t 2048 -T 0.4 \
+   -a '{"split":"eval","max_turns":10}' \
+   -s -d -A
+ ```
+
+ For quick package-level validation:
+
+ ```bash
+ cd environments/zerolang_editing
+ uv run python -m compileall zerolang_editing
+ uv run python - <<'PY'
+ from zerolang_editing.zerolang_editing import load_environment
+ env = load_environment(split="eval", max_examples=1, max_turns=2)
+ print(type(env).__name__, len(env.dataset))
+ PY
+ ```
+
+ ## Hosted RL Configuration
+
+ The overnight Laguna XS.2 run uses:
+
+ ```toml
+ model = "poolside/Laguna-XS.2"
+ max_steps = 200
+ batch_size = 64
+ rollouts_per_example = 8
+ learning_rate = 1e-4
+
+ [sampling]
+ max_tokens = 2048
+ temperature = 0.4
+ enable_thinking = true
+ ```
+
+ The config is stored in:
+
+ ```text
+ configs/rl/zerolang-editing-laguna-xs2-overnight.toml
+ ```
+
+ ## Previous Training Signal
+
+ A 20-step stress run on `poolside/Laguna-XS.2` completed successfully before
+ the overnight scale-up:
+
+ - Baseline eval Avg@1: `0.1500`
+ - Step 15 eval Avg@1: `0.2357`
+ - Final eval Avg@1: `0.2250`
+ - First 10 train-step reward average: `0.1606`
+ - Last 10 train-step reward average: `0.2056`
+ - No fatal orchestrator errors, no eval truncation, and no no-response rollouts.
+
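For scale, the reported numbers work out as follows. This is plain arithmetic on the figures listed above; with only 8 eval examples per checkpoint in the 20-step config, the deltas should be read as a rough signal rather than a precise estimate.

```python
def change(before, after):
    """Return (absolute change, relative change) between two averages."""
    absolute = after - before
    return absolute, absolute / before


# Held-out Avg@1: baseline vs. final checkpoint.
eval_abs, eval_rel = change(0.1500, 0.2250)
# Train reward: first 10 steps vs. last 10 steps.
train_abs, train_rel = change(0.1606, 0.2056)
# The step-15 checkpoint scored slightly higher than the final one.
peak_gap = 0.2357 - 0.2250

print(f"eval:  {eval_abs:+.4f} ({eval_rel:+.0%})")
print(f"train: {train_abs:+.4f} ({train_rel:+.0%})")
print(f"step 15 minus final: {peak_gap:+.4f}")
```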
+ The main failure signatures were invalid tool paths: missing `path` arguments
+ and paths outside the rollout workspace. Version `0.1.8` keeps the path sandbox
+ but converts recoverable path mistakes into canonicalized calls against the
+ rollout file and adds a small clean-path reward term.
+
+ ## Repository Contents
+
+ ```text
+ README.md
+ pyproject.toml
+ uv.lock
+ configs/
+   rl/
+     zerolang-editing-laguna-xs2-20step.toml
+     zerolang-editing-laguna-xs2-overnight.toml
+ zerolang_editing/
+   __init__.py
+   task_builders.py
+   tasks.py
+   train_tasks.py
+   zero_tools.py
+   zerolang_editing.py
+ ```
+
+ Build artifacts, local virtualenvs, Zerolang caches, rollout outputs, and
+ compiled Python caches are intentionally excluded from the Hugging Face repo.
+
+ ## Limitations
+
+ - The task distribution is synthetic and should be expanded before treating the
+   trained behavior as general Zerolang editing competence.
+ - Current graph-edit families focus on reliable literal/value style patches.
+ - The environment is designed for RL tool-use behavior, not as a standalone
+   benchmark of general coding ability.
+ - This repo contains the environment source, not final model weights.
configs/rl/zerolang-editing-laguna-xs2-20step.toml ADDED
@@ -0,0 +1,34 @@
+ # Conservative scale-up from zerolang-editing-stress.toml.
+
+ model = "poolside/Laguna-XS.2"
+ max_steps = 20
+ batch_size = 32
+ rollouts_per_example = 4
+ learning_rate = 1e-4
+
+ [sampling]
+ max_tokens = 2048
+ temperature = 0.4
+ enable_thinking = true
+
+ [[env]]
+ id = "pandelis/zerolang-editing"
+ version = "0.1.8"
+
+ [env.args]
+ split = "train"
+ max_turns = 10
+
+ [eval]
+ interval = 5
+ num_examples = 8
+ rollouts_per_example = 1
+ eval_base_model = true
+
+ [[eval.env]]
+ id = "pandelis/zerolang-editing"
+ version = "0.1.8"
+
+ [eval.env.args]
+ split = "eval"
+ max_turns = 10
configs/rl/zerolang-editing-laguna-xs2-overnight.toml ADDED
@@ -0,0 +1,35 @@
+ # Overnight scale-up from zerolang-editing-laguna-xs2-20step.toml.
+ # Previous 20-step run was stable and improved held-out Avg@1 from 0.1500 to 0.2250.
+
+ model = "poolside/Laguna-XS.2"
+ max_steps = 200
+ batch_size = 64
+ rollouts_per_example = 8
+ learning_rate = 1e-4
+
+ [sampling]
+ max_tokens = 2048
+ temperature = 0.4
+ enable_thinking = true
+
+ [[env]]
+ id = "pandelis/zerolang-editing"
+ version = "0.1.8"
+
+ [env.args]
+ split = "train"
+ max_turns = 10
+
+ [eval]
+ interval = 10
+ num_examples = 16
+ rollouts_per_example = 1
+ eval_base_model = true
+
+ [[eval.env]]
+ id = "pandelis/zerolang-editing"
+ version = "0.1.8"
+
+ [eval.env.args]
+ split = "eval"
+ max_turns = 10
pyproject.toml ADDED
@@ -0,0 +1,22 @@
+ [project]
+ name = "zerolang-editing"
+ description = "Tool-backed Zerolang editing tasks for graph-first code repair and refactoring"
+ tags = ["zerolang", "graph-editing", "code-editing", "train", "eval"]
+ version = "0.1.8"
+ requires-python = ">=3.10"
+ dependencies = [
+     "datasets>=2.19.0",
+     "verifiers>=0.1.14",
+ ]
+
+ [build-system]
+ requires = ["hatchling"]
+ build-backend = "hatchling.build"
+
+ [tool.hatch.build]
+ include = ["zerolang_editing", "pyproject.toml"]
+
+ [tool.verifiers.eval]
+ num_examples = 3
+ rollouts_per_example = 1
+ max_turns = 6
uv.lock ADDED
The diff for this file is too large to render. See raw diff
 
zerolang_editing/__init__.py ADDED
@@ -0,0 +1,3 @@
+ from .zerolang_editing import load_environment
+
+ __all__ = ["load_environment"]
zerolang_editing/task_builders.py ADDED
@@ -0,0 +1,270 @@
+ """Task construction helpers for synthetic Zerolang editing rows."""
+
+ from __future__ import annotations
+
+ from typing import Any
+
+
+ def _source(text: str) -> str:
+     return text.strip() + "\n"
+
+
+ def _write_program(message: str, *, raises: bool = True) -> str:
+     raises_suffix = " raises" if raises else ""
+     return _source(
+         f"""
+ pub fn main(world: World) -> Void{raises_suffix} {{
+     check world.out.write("{message}\\n")
+ }}
+ """
+     )
+
+
+ def _literal_task(
+     task_id: str, old: str, new: str, goal: str | None = None, *, split: str = "eval"
+ ) -> dict[str, Any]:
+     return {
+         "id": task_id,
+         "split": split,
+         "category": "graph_patch_literal",
+         "goal": goal or f'Replace the string literal "{old}\\n" with "{new}\\n".',
+         "source": _write_program(old),
+         "target_source": _write_program(new),
+     }
+
+
+ def _branch_literal_task(
+     task_id: str, helper: str, old: str, new: str, *, split: str = "eval"
+ ) -> dict[str, Any]:
+     return {
+         "id": task_id,
+         "split": split,
+         "category": "graph_patch_literal",
+         "goal": (
+             "Keep the helper-controlled branch intact and update only the string "
+             f'literal from "{old}\\n" to "{new}\\n".'
+         ),
+         "source": _source(
+             f"""
+ fn {helper}() -> i32 {{
+     return 1
+ }}
+
+ pub fn main(world: World) -> Void raises {{
+     if {helper}() == 1 {{
+         check world.out.write("{old}\\n")
+     }}
+ }}
+ """
+         ),
+         "target_source": _source(
+             f"""
+ fn {helper}() -> i32 {{
+     return 1
+ }}
+
+ pub fn main(world: World) -> Void raises {{
+     if {helper}() == 1 {{
+         check world.out.write("{new}\\n")
+     }}
+ }}
+ """
+         ),
+     }
+
+
+ def _helper_task(
+     task_id: str,
+     helper: str,
+     source_expr: str,
+     target_expr: str,
+     expected: int,
+     output: str,
+     *,
+     split: str = "eval",
+ ) -> dict[str, Any]:
+     return {
+         "id": task_id,
+         "split": split,
+         "category": "semantic_update",
+         "goal": (
+             f"Update {helper}() so it returns {expected} and the existing main "
+             f"branch prints {output}."
+         ),
+         "source": _source(
+             f"""
+ fn {helper}() -> i32 {{
+     return {source_expr}
+ }}
+
+ pub fn main(world: World) -> Void raises {{
+     if {helper}() == {expected} {{
+         check world.out.write("{output}\\n")
+     }}
+ }}
+ """
+         ),
+         "target_source": _source(
+             f"""
+ fn {helper}() -> i32 {{
+     return {target_expr}
+ }}
+
+ pub fn main(world: World) -> Void raises {{
+     if {helper}() == {expected} {{
+         check world.out.write("{output}\\n")
+     }}
+ }}
+ """
+         ),
+     }
+
+
+ def _two_helper_task(
+     task_id: str,
+     helper: str,
+     other: str,
+     source_expr: str,
+     target_expr: str,
+     other_expr: str,
+     expected: int,
+     *,
+     split: str = "eval",
+ ) -> dict[str, Any]:
+     return {
+         "id": task_id,
+         "split": split,
+         "category": "semantic_update",
+         "goal": (
+             f"Update only {helper}() so main writes ok when the comparison succeeds; "
+             f"leave {other}() unchanged."
+         ),
+         "source": _source(
+             f"""
+ fn {helper}() -> i32 {{
+     return {source_expr}
+ }}
+
+ fn {other}() -> i32 {{
+     return {other_expr}
+ }}
+
+ pub fn main(world: World) -> Void raises {{
+     if {helper}() == {expected} {{
+         check world.out.write("ok\\n")
+     }}
+ }}
+ """
+         ),
+         "target_source": _source(
+             f"""
+ fn {helper}() -> i32 {{
+     return {target_expr}
+ }}
+
+ fn {other}() -> i32 {{
+     return {other_expr}
+ }}
+
+ pub fn main(world: World) -> Void raises {{
+     if {helper}() == {expected} {{
+         check world.out.write("ok\\n")
+     }}
+ }}
+ """
+         ),
+     }
+
+
+ def _call_task(
+     task_id: str, source_args: str, target_args: str, expected: int, *, split: str = "eval"
+ ) -> dict[str, Any]:
+     return {
+         "id": task_id,
+         "split": split,
+         "category": "call_update",
+         "goal": "Keep add unchanged, but edit one call argument so the comparison is true.",
+         "source": _source(
+             f"""
+ fn add(a: i32, b: i32) -> i32 {{
+     return a + b
+ }}
+
+ pub fn main(world: World) -> Void raises {{
+     if add({source_args}) == {expected} {{
+         check world.out.write("ok\\n")
+     }}
+ }}
+ """
+         ),
+         "target_source": _source(
+             f"""
+ fn add(a: i32, b: i32) -> i32 {{
+     return a + b
+ }}
+
+ pub fn main(world: World) -> Void raises {{
+     if add({target_args}) == {expected} {{
+         check world.out.write("ok\\n")
+     }}
+ }}
+ """
+         ),
+     }
+
+
+ def _condition_task(
+     task_id: str,
+     helper: str,
+     returned: int,
+     source_compare: int,
+     output: str,
+     *,
+     split: str = "eval",
+ ) -> dict[str, Any]:
+     return {
+         "id": task_id,
+         "split": split,
+         "category": "condition_update",
+         "goal": (
+             "Edit the comparison literal so the branch is true without changing "
+             f"{helper}() or the output string."
+         ),
+         "source": _source(
+             f"""
+ fn {helper}() -> i32 {{
+     return {returned}
+ }}
+
+ pub fn main(world: World) -> Void raises {{
+     if {helper}() == {source_compare} {{
+         check world.out.write("{output}\\n")
+     }}
+ }}
+ """
+         ),
+         "target_source": _source(
+             f"""
+ fn {helper}() -> i32 {{
+     return {returned}
+ }}
+
+ pub fn main(world: World) -> Void raises {{
+     if {helper}() == {returned} {{
+         check world.out.write("{output}\\n")
+     }}
+ }}
+ """
+         ),
+     }
+
+
+ def _diagnostic_task(task_id: str, message: str, *, split: str = "eval") -> dict[str, Any]:
+     return {
+         "id": task_id,
+         "split": split,
+         "category": "diagnostic_repair",
+         "goal": "Repair the main signature so the existing world.out.write check is valid.",
+         "source": _write_program(message, raises=False),
+         "target_source": _write_program(message, raises=True),
+     }
zerolang_editing/tasks.py ADDED
@@ -0,0 +1,195 @@
+ """Synthetic task corpus for the Zerolang editing environment."""
+
+ from __future__ import annotations
+
+ from typing import Any
+
+ from .task_builders import (
+     _branch_literal_task,
+     _call_task,
+     _condition_task,
+     _diagnostic_task,
+     _helper_task,
+     _literal_task,
+     _two_helper_task,
+ )
+ from .train_tasks import TRAIN_TASKS
+
+
+ EVAL_TASKS: list[dict[str, Any]] = [
+     _literal_task(
+         "literal-string-graph-patch",
+         "hello from zero",
+         "hello graph",
+         "Change the printed string from hello from zero to hello graph.",
+     ),
+     _literal_task(
+         "repair-unknown-message",
+         "draft",
+         "fixed by zero",
+         'Replace the string literal "draft\\n" with "fixed by zero\\n".',
+     ),
+     _literal_task(
+         "literal-status-ready",
+         "status: draft",
+         "status: ready",
+         'Replace the string literal "status: draft\\n" with "status: ready\\n".',
+     ),
+     _literal_task(
+         "literal-counter-pass",
+         "counter failed",
+         "counter passed",
+         'Replace the string literal "counter failed\\n" with "counter passed\\n".',
+     ),
+     _literal_task(
+         "literal-agent-graph",
+         "agent used text",
+         "agent used graph",
+         'Replace the string literal "agent used text\\n" with "agent used graph\\n".',
+     ),
+     *[
+         _literal_task(task_id, old, new)
+         for task_id, old, new in [
+             ("literal-alpha-beta", "alpha", "beta"),
+             ("literal-start-finish", "start", "finish"),
+             ("literal-left-right", "left", "right"),
+             ("literal-plan-done", "plan pending", "plan done"),
+             ("literal-state-green", "state: red", "state: green"),
+             ("literal-cache-hot", "cache cold", "cache hot"),
+         ]
+     ],
+     *[
+         _literal_task(task_id, old, new, goal)
+         for task_id, old, new, goal in [
+             (
+                 "literal-colon-version",
+                 "status: init [v1]",
+                 "status: init [v2]",
+                 "Update the status bracket code from [v1] to [v2].",
+             ),
+             (
+                 "literal-api-version",
+                 "load path /api/v1/health",
+                 "load path /api/v2/health",
+                 "Switch the printed endpoint from v1 to v2 while keeping the same path.",
+             ),
+             (
+                 "literal-score-number",
+                 "score: 42/100",
+                 "score: 99/100",
+                 "Change the score text from 42 to 99.",
+             ),
+             (
+                 "literal-status-code",
+                 "error: [404] failed",
+                 "error: [200] resolved",
+                 "Edit the status code label from 404 to 200 in brackets.",
+             ),
+             (
+                 "literal-progress-percent",
+                 "progress: 50% complete",
+                 "progress: 75% complete",
+                 "Update the progress percentage from 50 to 75.",
+             ),
+             (
+                 "literal-time-stamp",
+                 "time stamp 12:34",
+                 "time stamp 13:00",
+                 "Change the time from 12:34 to 13:00 in the output string.",
+             ),
+             (
+                 "literal-list-separator",
+                 "list [a/b/c]",
+                 "list [a-b-c]",
+                 "Adjust the list label to use dashes instead of slashes.",
+             ),
+             (
+                 "literal-coordinate-label",
+                 "coords (x:1,y:2)",
+                 "coords (x:3,y:4)",
+                 "Update the coordinate label from (x:1,y:2) to (x:3,y:4).",
+             ),
+         ]
+     ],
+     *[
+         _branch_literal_task(task_id, helper, old, new)
+         for task_id, helper, old, new in [
+             ("branch-literal-ready-version", "ready", "ready v1", "ready v2"),
+             ("branch-literal-mode-active", "can_send", "mode: standby", "mode: active"),
+             ("branch-literal-status-ok", "enabled", "status: ok [404]", "status: ok [200]"),
+             ("branch-literal-step-count", "feature_flag", "steps: 1/3 complete", "steps: 2/3 complete"),
+             ("branch-literal-coordinate", "should_emit", "coords (x:1,y:1)", "coords (x:2,y:2)"),
+             ("branch-literal-check-pass", "allow_output", "check: fail", "check: pass"),
+             ("branch-literal-health", "should_log", "health: warn", "health: ok"),
+             ("branch-literal-phase", "gate_open", "phase: draft", "phase: final"),
+         ]
+     ],
+     *[
+         _helper_task(task_id, helper, source_expr, target_expr, expected, output)
+         for task_id, helper, source_expr, target_expr, expected, output in [
+             ("helper-score-42", "score", "40 + 1", "40 + 2", 42, "ready"),
+             ("helper-answer-41", "answer", "20 + 20", "20 + 21", 41, "green light"),
+             ("helper-total-18", "total", "30 - 13", "30 - 12", 18, "total ok"),
+             ("helper-count-9", "count", "12 - 4", "12 - 3", 9, "count passed"),
+             ("helper-value-15", "value", "7 + 7", "7 + 8", 15, "value good"),
+             ("helper-limit-24", "limit", "50 - 27", "50 - 26", 24, "limit open"),
+             ("helper-score-31", "score", "15 + 15", "15 + 16", 31, "score matched"),
+             ("helper-answer-8", "answer", "10 - 3", "10 - 2", 8, "done"),
+         ]
+     ],
+     *[
+         _two_helper_task(task_id, helper, other, source_expr, target_expr, other_expr, expected)
+         for task_id, helper, other, source_expr, target_expr, other_expr, expected in [
+             ("two-helper-score", "score", "spare", "12 + 9", "13 + 9", "30 - 8", 22),
+             ("two-helper-total", "total", "backup", "40 - 19", "41 - 19", "5 + 7", 22),
+             ("two-helper-count", "count", "idle", "18 + 3", "19 + 3", "14 - 2", 22),
+             ("two-helper-value", "value", "other", "55 - 35", "56 - 35", "6 + 2", 21),
+             ("two-helper-answer", "answer", "spare", "27 - 7", "28 - 7", "8 + 1", 21),
+             ("two-helper-level", "level", "helper", "9 + 10", "10 + 10", "40 - 3", 20),
+             ("two-helper-points", "points", "extra", "64 - 46", "65 - 46", "11 + 4", 19),
+             ("two-helper-result", "result", "unused", "33 - 12", "34 - 12", "2 + 2", 22),
+         ]
+     ],
+     *[
+         _call_task(task_id, source_args, target_args, expected)
+         for task_id, source_args, target_args, expected in [
+             ("call-update-five", "1, 1", "4, 1", 5),
+             ("call-update-ten", "7, 1", "7, 3", 10),
+             ("call-update-eleven", "0, 6", "5, 6", 11),
+             ("call-update-twenty", "9, 9", "11, 9", 20),
+             ("call-update-seven", "2, 2", "5, 2", 7),
+             ("call-update-twelve", "10, 0", "10, 2", 12),
+             ("call-update-sixteen", "8, 4", "8, 8", 16),
+             ("call-update-thirteen", "6, 6", "6, 7", 13),
+         ]
+     ],
+     *[
+         _condition_task(task_id, helper, returned, source_compare, "match found")
+         for task_id, helper, returned, source_compare in [
+             ("condition-count-four", "count", 4, 1),
+             ("condition-level-nine", "level", 9, 2),
+             ("condition-token-twelve", "token", 12, 8),
+             ("condition-value-fifteen", "value", 15, 10),
+             ("condition-flag-six", "flag", 6, 3),
+             ("condition-score-eleven", "score", 11, 7),
+             ("condition-count-eight", "count", 8, 4),
+             ("condition-marker-fourteen", "marker", 14, 0),
+         ]
+     ],
+     *[
+         _diagnostic_task(task_id, message)
+         for task_id, message in [
+             ("diagnostic-starting-up", "starting up"),
+             ("diagnostic-hello-main", "hello from main"),
+             ("diagnostic-message", "diagnostic message"),
+             ("diagnostic-payload-logged", "payload logged"),
+             ("diagnostic-attempt-write", "attempt write"),
+             ("diagnostic-retrying-output", "retrying output"),
+             ("diagnostic-done", "done"),
+             ("diagnostic-needs-raises", "needs raises"),
+         ]
+     ],
+ ]
+
+
+ SYNTHETIC_TASKS: list[dict[str, Any]] = [*EVAL_TASKS, *TRAIN_TASKS]
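As a sanity check on the eval corpus above, the group sizes can be tallied against the 67 held-out tasks stated in the README, and the helper rows can be checked so that each target expression really evaluates to its `expected` value. The sketch below copies the counts and rows by hand from `EVAL_TASKS` instead of importing the package; `evaluate` is a hypothetical helper that only handles the `a + b` and `a - b` forms used here.

```python
# Group sizes counted from the EVAL_TASKS literal.
EVAL_GROUP_SIZES = {
    "literal (written out)": 5,
    "literal (default goal)": 6,
    "literal (custom goal)": 8,
    "branch literal": 8,
    "helper": 8,
    "two helper": 8,
    "call": 8,
    "condition": 8,
    "diagnostic": 8,
}

# (target_expr, expected) pairs copied from the helper rows.
HELPER_ROWS = [
    ("40 + 2", 42), ("20 + 21", 41), ("30 - 12", 18), ("12 - 3", 9),
    ("7 + 8", 15), ("50 - 26", 24), ("15 + 16", 31), ("10 - 2", 8),
]


def evaluate(expr):
    """Evaluate 'a + b' or 'a - b' without using eval()."""
    left, op, right = expr.split()
    return int(left) + int(right) if op == "+" else int(left) - int(right)


total = sum(EVAL_GROUP_SIZES.values())
mismatches = [(expr, want) for expr, want in HELPER_ROWS if evaluate(expr) != want]
print(total, mismatches)
```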
zerolang_editing/train_tasks.py ADDED
@@ -0,0 +1,255 @@
+ """Synthetic training rows for the Zerolang editing environment."""
+
+ from __future__ import annotations
+
+ from typing import Any
+
+ from .task_builders import (
+     _branch_literal_task,
+     _call_task,
+     _condition_task,
+     _diagnostic_task,
+     _helper_task,
+     _literal_task,
+     _two_helper_task,
+ )
+
+
+ LEGACY_TRAIN_TASKS: list[dict[str, Any]] = [
+     _helper_task(
+         "helper-return-update",
+         "answer",
+         "40 + 1",
+         "40 + 2",
+         42,
+         "math works",
+         split="train",
+     ),
+     _call_task("callee-argument-update", "2, 2", "2, 3", 5, split="train"),
+     _condition_task("comparison-target-update", "score", 7, 8, "ready", split="train"),
+     _diagnostic_task("fallible-main-repair", "needs raises", split="train"),
+ ]
+
+
+ def _literal_train_tasks() -> list[dict[str, Any]]:
+     pairs = [
+         ("queue pending", "queue ready"),
+         ("job queued", "job running"),
+         ("job running", "job complete"),
+         ("build red", "build green"),
+         ("node cold", "node warm"),
+         ("cache miss", "cache hit"),
+         ("retry later", "retry now"),
+         ("draft note", "final note"),
+         ("plan open", "plan closed"),
+         ("graph stale", "graph fresh"),
+         ("route /v1/run", "route /v2/run"),
+         ("status [100]", "status [200]"),
+         ("phase: alpha", "phase: beta"),
+         ("phase: beta", "phase: gamma"),
+         ("step 1/4", "step 2/4"),
+         ("step 2/4", "step 3/4"),
+         ("score 10/20", "score 18/20"),
+         ("level: low", "level: high"),
+         ("mode manual", "mode auto"),
+         ("window closed", "window open"),
+         ("target west", "target east"),
+         ("port 3000", "port 8080"),
+         ("run id a1", "run id b2"),
+         ("batch small", "batch large"),
+         ("token old", "token new"),
+         ("edge loose", "edge locked"),
+         ("module local", "module remote"),
+         ("worker idle", "worker busy"),
+         ("agent paused", "agent active"),
+         ("output empty", "output full"),
+         ("index 0", "index 1"),
+         ("flag off", "flag on"),
+         ("signal weak", "signal strong"),
+         ("health warn", "health pass"),
+         ("check skipped", "check passed"),
+         ("ticket open", "ticket merged"),
+         ("snapshot old", "snapshot new"),
+         ("profile dev", "profile prod"),
+         ("version 0.1", "version 0.2"),
+         ("result unknown", "result known"),
+     ]
+     return [
+         _literal_task(f"train-literal-{index:03d}", old, new, split="train")
+         for index, (old, new) in enumerate(pairs, start=1)
+     ]
+
+
+ def _branch_literal_train_tasks() -> list[dict[str, Any]]:
+     specs = [
+         ("ready_gate", "gate draft", "gate ready"),
+         ("emit_gate", "emit old", "emit new"),
+         ("mode_gate", "mode test", "mode live"),
+         ("route_gate", "route blue", "route green"),
+         ("status_gate", "status low", "status high"),
+         ("phase_gate", "phase one", "phase two"),
+         ("counter_gate", "count fail", "count pass"),
+         ("worker_gate", "worker wait", "worker run"),
+         ("deploy_gate", "deploy hold", "deploy ship"),
+         ("review_gate", "review open", "review done"),
+         ("graph_gate", "graph dirty", "graph clean"),
+         ("patch_gate", "patch text", "patch graph"),
+         ("score_gate", "score bad", "score good"),
+         ("plan_gate", "plan rough", "plan exact"),
+         ("test_gate", "test flaky", "test stable"),
+         ("queue_gate", "queue blocked", "queue clear"),
+         ("cache_gate", "cache cold", "cache hot"),
+         ("trace_gate", "trace off", "trace on"),
+         ("run_gate", "run dry", "run real"),
+         ("sync_gate", "sync stale", "sync current"),
+     ]
+     return [
+         _branch_literal_task(f"train-branch-literal-{index:03d}", helper, old, new, split="train")
+         for index, (helper, old, new) in enumerate(specs, start=1)
+     ]
+
+
+ def _helper_train_tasks() -> list[dict[str, Any]]:
+     helpers = ["answer", "score", "total", "count", "value", "limit", "level", "points"]
+     outputs = ["ok", "ready", "matched", "accepted", "passed", "open", "done", "green"]
+     tasks: list[dict[str, Any]] = []
+
+     for index in range(1, 26):
+         left = 10 + index
+         target_right = 3 + (index % 9)
+         source_right = target_right - 1
+         expected = left + target_right
+         tasks.append(
+             _helper_task(
+                 f"train-helper-add-{index:03d}",
+                 helpers[index % len(helpers)],
+                 f"{left} + {source_right}",
+                 f"{left} + {target_right}",
+                 expected,
+                 outputs[index % len(outputs)],
+                 split="train",
+             )
+         )
+
+     for index in range(1, 26):
+         left = 60 + index
+         target_right = 5 + (index % 11)
+         source_right = target_right + 1
+         expected = left - target_right
+         tasks.append(
+             _helper_task(
+                 f"train-helper-sub-{index:03d}",
+                 helpers[(index + 3) % len(helpers)],
+                 f"{left} - {source_right}",
+                 f"{left} - {target_right}",
+                 expected,
+                 outputs[(index + 2) % len(outputs)],
+                 split="train",
+             )
+         )
+
+     return tasks
+
+
154
+ def _two_helper_train_tasks() -> list[dict[str, Any]]:
155
+ primary_helpers = ["score", "total", "count", "value", "answer", "level", "points", "result"]
156
+ other_helpers = ["spare", "backup", "idle", "other", "side", "helper", "extra", "unused"]
157
+ tasks: list[dict[str, Any]] = []
158
+ for index in range(1, 21):
159
+ left = 20 + index
160
+ target_right = 2 + (index % 7)
161
+ source_right = target_right - 1
162
+ expected = left + target_right
163
+ other_expr = f"{4 + index % 6} + {8 + index % 5}"
164
+ tasks.append(
165
+ _two_helper_task(
166
+ f"train-two-helper-{index:03d}",
167
+ primary_helpers[index % len(primary_helpers)],
168
+ other_helpers[index % len(other_helpers)],
169
+ f"{left} + {source_right}",
170
+ f"{left} + {target_right}",
171
+ other_expr,
172
+ expected,
173
+ split="train",
174
+ )
175
+ )
176
+ return tasks
177
+
178
+
179
+ def _call_train_tasks() -> list[dict[str, Any]]:
180
+ tasks: list[dict[str, Any]] = []
181
+ for index in range(1, 31):
182
+ left = 1 + (index % 17)
183
+ target_right = 2 + (index % 13)
184
+ source_right = target_right - 1
185
+ expected = left + target_right
186
+ tasks.append(
187
+ _call_task(
188
+ f"train-call-update-{index:03d}",
189
+ f"{left}, {source_right}",
190
+ f"{left}, {target_right}",
191
+ expected,
192
+ split="train",
193
+ )
194
+ )
195
+ return tasks
196
+
197
+
198
+ def _condition_train_tasks() -> list[dict[str, Any]]:
199
+ helpers = ["score", "count", "level", "token", "value", "flag", "marker", "limit"]
200
+ tasks: list[dict[str, Any]] = []
201
+ for index in range(1, 26):
202
+ returned = 5 + (index * 3)
203
+ source_compare = returned + 1
204
+ tasks.append(
205
+ _condition_task(
206
+ f"train-condition-update-{index:03d}",
207
+ helpers[index % len(helpers)],
208
+ returned,
209
+ source_compare,
210
+ "matched",
211
+ split="train",
212
+ )
213
+ )
214
+ return tasks
215
+
216
+
217
+ def _diagnostic_train_tasks() -> list[dict[str, Any]]:
218
+ messages = [
219
+ "train starting",
220
+ "train ready",
221
+ "diagnostic pass",
222
+ "writer needs raises",
223
+ "output accepted",
224
+ "payload saved",
225
+ "attempt complete",
226
+ "retry complete",
227
+ "batch emitted",
228
+ "sample logged",
229
+ "graph checked",
230
+ "patch validated",
231
+ "route verified",
232
+ "state stored",
233
+ "run complete",
234
+ "score written",
235
+ "marker emitted",
236
+ "world write",
237
+ "tool output",
238
+ "final line",
239
+ ]
240
+ return [
241
+ _diagnostic_task(f"train-diagnostic-{index:03d}", message, split="train")
242
+ for index, message in enumerate(messages, start=1)
243
+ ]
244
+
245
+
246
+ TRAIN_TASKS: list[dict[str, Any]] = [
247
+ *_literal_train_tasks(),
248
+ *_branch_literal_train_tasks(),
249
+ *LEGACY_TRAIN_TASKS,
250
+ *_helper_train_tasks(),
251
+ *_two_helper_train_tasks(),
252
+ *_call_train_tasks(),
253
+ *_condition_train_tasks(),
254
+ *_diagnostic_train_tasks(),
255
+ ]
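The generators above derive each task's source and target expression from a loop index. A minimal sketch of that arithmetic for the addition loop in `_helper_train_tasks` follows; `helper_add_params` is a hypothetical helper written for illustration and is not part of the package.

```python
def helper_add_params(index: int) -> dict:
    """Mirror the arithmetic of the addition loop in `_helper_train_tasks`.

    Hypothetical illustration only: the real code passes these values to
    `_helper_task`, whose body is not shown here.
    """
    left = 10 + index
    target_right = 3 + (index % 9)
    source_right = target_right - 1  # the source operand is off by one
    return {
        "id": f"train-helper-add-{index:03d}",
        "source_expr": f"{left} + {source_right}",
        "target_expr": f"{left} + {target_right}",
        "expected": left + target_right,
    }


# Same index range as the original loop: 1 through 25 inclusive.
params = [helper_add_params(index) for index in range(1, 26)]
first = params[0]
print(first["id"], first["source_expr"], "->", first["target_expr"])
```

Every generated pair differs in exactly one operand, so the intended edit stays a single-node change.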
zerolang_editing/zero_tools.py ADDED
@@ -0,0 +1,310 @@
1
+ """Path-based Zerolang compiler tools for the editing environment."""
2
+
3
+ from __future__ import annotations
4
+
5
+ import hashlib
6
+ import json
7
+ import os
8
+ import platform
9
+ import re
10
+ import shutil
11
+ import subprocess
12
+ import tempfile
13
+ import threading
14
+ import urllib.request
15
+ from pathlib import Path
16
+ from typing import Any
17
+
18
+
19
+ _ZERO_INSTALL_LOCK = threading.Lock()
20
+
21
+
22
+ def _download(url: str, timeout: int = 60) -> bytes:
23
+ with urllib.request.urlopen(url, timeout=timeout) as response:
24
+ return response.read()
25
+
26
+
27
+ def _zero_asset_candidates() -> list[str]:
28
+ system = platform.system()
29
+ machine = platform.machine().lower()
30
+ if machine in {"arm64", "aarch64"}:
31
+ cpu = "arm64"
32
+ elif machine in {"x86_64", "amd64"}:
33
+ cpu = "x64"
34
+ else:
35
+ return []
36
+
37
+ if system == "Darwin":
38
+ return [f"zero-darwin-{cpu}"]
39
+ if system == "Linux":
40
+ return [f"zero-linux-musl-{cpu}", f"zero-linux-{cpu}"]
41
+ return []
42
+
43
+
44
+ def _install_zero_binary() -> str | None:
+     install_dir = Path(
+         os.environ.get("ZERO_INSTALL_DIR")
+         or Path(tempfile.gettempdir()) / "zerolang-editing-zero" / "bin"
+     ).expanduser()
+     binary = install_dir / "zero"
+     if binary.exists():
+         return str(binary)
+
+     with _ZERO_INSTALL_LOCK:
+         # Re-check under the lock: another thread may have finished the install.
+         if binary.exists():
+             return str(binary)
+
+         base_url = os.environ.get(
+             "ZERO_DOWNLOAD_BASE_URL",
+             "https://github.com/vercel-labs/zero/releases/latest/download",
+         ).rstrip("/")
+         checksums_text = _download(f"{base_url}/CHECKSUMS.txt").decode()
+         # Each usable line is "<sha256> <asset name>"; map asset name -> digest.
+         checksums = {}
+         for line in checksums_text.splitlines():
+             parts = line.split()
+             if len(parts) >= 2:
+                 checksums[parts[1]] = parts[0]
+
+         install_dir.mkdir(parents=True, exist_ok=True)
+         last_error: Exception | None = None
+         for asset in _zero_asset_candidates():
+             try:
+                 data = _download(f"{base_url}/{asset}")
+                 expected = checksums.get(asset)
+                 actual = hashlib.sha256(data).hexdigest()
+                 if expected and actual != expected:
+                     raise RuntimeError(f"checksum mismatch for {asset}")
+                 binary.write_bytes(data)
+                 os.chmod(binary, 0o755)
+                 check = subprocess.run(
+                     [str(binary), "--version"],
+                     text=True,
+                     capture_output=True,
+                     timeout=10,
+                 )
+                 if check.returncode == 0:
+                     return str(binary)
+                 # Remove a binary that fails `--version`; otherwise the
+                 # `binary.exists()` fast path would return it on the next call.
+                 binary.unlink()
+             except Exception as exc:
+                 last_error = exc
+                 if binary.exists():
+                     binary.unlink()
+                 continue
+         if last_error is not None:
+             raise RuntimeError(f"failed to install zero binary: {last_error}") from last_error
+         return None
95
+
96
+
97
+ def _zero_binary(zero_path: str | None = None) -> str | None:
98
+ candidates = [
99
+ zero_path,
100
+ shutil.which("zero"),
101
+ str(Path.home() / ".zero" / "bin" / "zero"),
102
+ ]
103
+ for candidate in candidates:
104
+ if candidate and Path(candidate).exists():
105
+ return candidate
106
+ return _install_zero_binary()
107
+
108
+
109
+ def _json_tool_result(result: dict[str, Any]) -> str:
110
+ return json.dumps(result, indent=2, sort_keys=True)
111
+
112
+
113
+ def read_source(path: str | Path) -> str:
114
+ return Path(path).read_text()
115
+
116
+
117
+ def _source_fingerprint(path: Path) -> dict[str, Any]:
118
+ if not path.exists():
119
+ return {"path": str(path), "exists": False}
120
+ data = path.read_bytes()
121
+ return {
122
+ "path": str(path),
123
+ "exists": True,
124
+ "bytes": len(data),
125
+ "source_sha256": hashlib.sha256(data).hexdigest(),
126
+ }
127
+
128
+
129
+ def _summarize_graph_dump(graph_dump: str) -> dict[str, Any]:
+     summary: dict[str, Any] = {
+         "hash": None,
+         "literals": [],
+         "functions": [],
+         "calls": [],
+         "identifiers": [],
+     }
+     for line in graph_dump.splitlines():
+         if line.startswith("hash "):
+             # Tolerate a hash line without quotes instead of raising IndexError.
+             parts = line.split('"', 2)
+             if len(parts) >= 2:
+                 summary["hash"] = parts[1]
+         elif " Literal " in line:
+             match = re.match(r'node (#[0-9a-f]+) Literal type:"([^"]+)" value:"(.*)"', line)
+             if match:
+                 summary["literals"].append(
+                     {"node": match.group(1), "type": match.group(2), "value": match.group(3)}
+                 )
+         elif " Function " in line:
+             match = re.match(r'node (#[0-9a-f]+) Function name:"([^"]+)" type:"([^"]+)"', line)
+             if match:
+                 summary["functions"].append(
+                     {"node": match.group(1), "name": match.group(2), "type": match.group(3)}
+                 )
+         elif " MethodCall " in line:
+             match = re.match(r'node (#[0-9a-f]+) MethodCall name:"([^"]+)" type:"([^"]+)"', line)
+             if match:
+                 summary["calls"].append(
+                     {"node": match.group(1), "name": match.group(2), "type": match.group(3)}
+                 )
+         elif " Identifier " in line:
+             match = re.match(r'node (#[0-9a-f]+) Identifier name:"([^"]+)"', line)
+             if match:
+                 summary["identifiers"].append({"node": match.group(1), "name": match.group(2)})
+     return summary
163
+
164
+
165
+ def run_zero_path(args: list[str], path: str | Path, zero_path: str | None = None) -> dict[str, Any]:
+     binary = _zero_binary(zero_path)
+     source_path = Path(path)
+     if binary is None:
+         return {
+             "ok": False,
+             "tool_error": "zero binary not found; install with https://zerolang.ai/install.sh",
+             **_source_fingerprint(source_path),
+         }
+     if not source_path.exists():
+         return {
+             "ok": False,
+             "tool_error": f"source file does not exist: {source_path}",
+             **_source_fingerprint(source_path),
+         }
+
+     try:
+         proc = subprocess.run(
+             [binary, *args, str(source_path)],
+             text=True,
+             capture_output=True,
+             timeout=10,
+             # Put the binary's directory first on PATH so helper lookups resolve to it.
+             env={**os.environ, "PATH": f"{Path(binary).parent}{os.pathsep}{os.environ.get('PATH', '')}"},
+         )
+     except subprocess.TimeoutExpired:
+         # Report a hung compiler as a tool error instead of raising into the caller.
+         return {
+             "ok": False,
+             "tool_error": "zero timed out after 10 seconds",
+             **_source_fingerprint(source_path),
+         }
+     return {
+         "ok": proc.returncode == 0,
+         "returncode": proc.returncode,
+         # Keep only the tail of each stream to bound the tool result size.
+         "stdout": proc.stdout[-12000:],
+         "stderr": proc.stderr[-4000:],
+         **_source_fingerprint(source_path),
+     }
195
+
196
+
197
+ def run_zero_source(
198
+ args: list[str], source: str, zero_path: str | None = None
199
+ ) -> dict[str, Any]:
200
+ with tempfile.TemporaryDirectory(prefix="zerolang-editing-score-") as tmp:
201
+ source_path = Path(tmp) / "program.0"
202
+ source_path.write_text(source)
203
+ return run_zero_path(args, source_path, zero_path)
204
+
205
+
206
+ def make_zero_tools(zero_path: str | None = None) -> list[Any]:
207
+ def zero_check(path: str) -> str:
208
+ """Run `zero check --json` on a `.0` file path on disk."""
209
+ return _json_tool_result(run_zero_path(["check", "--json"], path, zero_path))
210
+
211
+ def zero_graph_summary(path: str) -> str:
212
+ """Return compact graph hash and patchable node facts for a `.0` file path."""
213
+ result = run_zero_path(["graph", "dump"], path, zero_path)
214
+ if result.get("ok"):
215
+ result["summary"] = _summarize_graph_dump(result.get("stdout", ""))
216
+ return _json_tool_result(result)
217
+
218
+ def zero_graph_dump(path: str) -> str:
219
+ """Run `zero graph dump` on a `.0` file path on disk."""
220
+ return _json_tool_result(run_zero_path(["graph", "dump"], path, zero_path))
221
+
222
+ def zero_graph_json(path: str) -> str:
223
+ """Run `zero graph --json` on a `.0` file path on disk."""
224
+ return _json_tool_result(run_zero_path(["graph", "--json"], path, zero_path))
225
+
226
+ def zero_fix_plan(path: str) -> str:
227
+ """Run `zero fix --plan --json` on a `.0` file path on disk."""
228
+ return _json_tool_result(run_zero_path(["fix", "--plan", "--json"], path, zero_path))
229
+
230
+     def zero_graph_patch(path: str, expect_graph_hash: str, op: str) -> str:
+         """Apply one checked `zero graph patch` operation to a `.0` file path on disk."""
+         binary = _zero_binary(zero_path)
+         source_path = Path(path)
+         if binary is None:
+             return _json_tool_result(
+                 {
+                     "ok": False,
+                     "tool_error": "zero binary not found; install with https://zerolang.ai/install.sh",
+                     **_source_fingerprint(source_path),
+                 }
+             )
+         if not source_path.exists():
+             return _json_tool_result(
+                 {
+                     "ok": False,
+                     "tool_error": f"source file does not exist: {source_path}",
+                     **_source_fingerprint(source_path),
+                 }
+             )
+         try:
+             proc = subprocess.run(
+                 [
+                     binary,
+                     "graph",
+                     "patch",
+                     str(source_path),
+                     "--expect-graph-hash",
+                     expect_graph_hash,
+                     "--op",
+                     op,
+                 ],
+                 text=True,
+                 capture_output=True,
+                 timeout=10,
+                 env={**os.environ, "PATH": f"{Path(binary).parent}{os.pathsep}{os.environ.get('PATH', '')}"},
+             )
+         except subprocess.TimeoutExpired:
+             # Report a hung patch as a tool error instead of raising into the rollout.
+             return _json_tool_result(
+                 {
+                     "ok": False,
+                     "tool_error": "zero graph patch timed out after 10 seconds",
+                     **_source_fingerprint(source_path),
+                 }
+             )
+         return _json_tool_result(
+             {
+                 "ok": proc.returncode == 0,
+                 "returncode": proc.returncode,
+                 "stdout": proc.stdout[-12000:],
+                 "stderr": proc.stderr[-4000:],
+                 **_source_fingerprint(source_path),
+             }
+         )
+
+     def zero_skills_get(skill: str) -> str:
+         """Return version-matched Zerolang guidance for `language`, `diagnostics`, `stdlib`, or `zero`."""
+         binary = _zero_binary(zero_path)
+         if binary is None:
+             return _json_tool_result(
+                 {
+                     "ok": False,
+                     "tool_error": "zero binary not found; install with https://zerolang.ai/install.sh",
+                 }
+             )
+         try:
+             proc = subprocess.run(
+                 [binary, "skills", "get", skill],
+                 text=True,
+                 capture_output=True,
+                 timeout=10,
+                 env={**os.environ, "PATH": f"{Path(binary).parent}{os.pathsep}{os.environ.get('PATH', '')}"},
+             )
+         except subprocess.TimeoutExpired:
+             return _json_tool_result(
+                 {"ok": False, "tool_error": "zero skills get timed out after 10 seconds"}
+             )
+         return _json_tool_result(
+             {
+                 "ok": proc.returncode == 0,
+                 "returncode": proc.returncode,
+                 "stdout": proc.stdout[-12000:],
+                 "stderr": proc.stderr[-4000:],
+             }
+         )
301
+
302
+ return [
303
+ zero_check,
304
+ zero_graph_summary,
305
+ zero_graph_dump,
306
+ zero_graph_json,
307
+ zero_fix_plan,
308
+ zero_graph_patch,
309
+ zero_skills_get,
310
+ ]
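To make the summary step concrete, here is a minimal sketch of pulling the graph hash and literal nodes out of a dump. The sample dump text is hypothetical: its line shapes are inferred from the regular expressions in `_summarize_graph_dump`, not taken from Zerolang documentation, and `graph_hash` and `literal_nodes` are illustrative names.

```python
import re

# Hypothetical dump text, shaped to match the regexes in `_summarize_graph_dump`.
SAMPLE_DUMP = """\
hash "3f9a1c"
node #a1 Function name:"main" type:"fn"
node #b2 Literal type:"str" value:"gate draft"
node #c3 Identifier name:"ready_gate"
"""

# Same pattern the module uses for Literal nodes.
LITERAL_RE = re.compile(r'node (#[0-9a-f]+) Literal type:"([^"]+)" value:"(.*)"')


def graph_hash(dump: str):
    """Return the quoted value on the first `hash` line, or None if absent."""
    for line in dump.splitlines():
        if line.startswith("hash "):
            parts = line.split('"')
            return parts[1] if len(parts) >= 3 else None
    return None


def literal_nodes(dump: str) -> list:
    """Collect node id, type, and value for every Literal line."""
    found = []
    for line in dump.splitlines():
        match = LITERAL_RE.match(line)
        if match:
            found.append(
                {"node": match.group(1), "type": match.group(2), "value": match.group(3)}
            )
    return found


print(graph_hash(SAMPLE_DUMP), literal_nodes(SAMPLE_DUMP))
```

An agent would pass the hash as `expect_graph_hash` and use the literal's node id when composing the `op` argument for `zero_graph_patch`; the exact `op` syntax is not shown in this file, so it is left out here.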
zerolang_editing/zerolang_editing.py ADDED
@@ -0,0 +1,418 @@
1
+ """Prime Verifiers environment for Zerolang graph-first editing."""
2
+
3
+ from __future__ import annotations
4
+
5
+ import json
6
+ import os
7
+ import re
8
+ import tempfile
9
+ from collections.abc import Mapping
10
+ from pathlib import Path
11
+ from typing import Any
12
+
13
+ from datasets import Dataset
14
+ import verifiers as vf
15
+
16
+ from .tasks import SYNTHETIC_TASKS
17
+ from .zero_tools import make_zero_tools, read_source, run_zero_path, run_zero_source
18
+
19
+
20
+ SYSTEM_PROMPT = """\
21
+ You are Roder, a coding agent running in an evaluation harness.
22
+
23
+ Complete the requested code edit, use the available tools when they are useful,
24
+ and return a concise final answer. The task source is already written to disk;
25
+ operate on the provided `.0` file path.
26
+ """
27
+
28
+ ZERO_FILE_PLACEHOLDER = "{{ZERO_FILE_PATH}}"
29
+
30
+
31
+ def _normalize_source(source: str) -> str:
32
+ return "\n".join(line.rstrip() for line in source.strip().splitlines()).strip()
33
+
34
+
35
+ def _message_role(message: Any) -> str | None:
36
+ if isinstance(message, dict):
37
+ return message.get("role")
38
+ return getattr(message, "role", None)
39
+
40
+
41
+ def _message_content(message: Any) -> str:
42
+ if isinstance(message, dict):
43
+ content = message.get("content", "")
44
+ else:
45
+ content = getattr(message, "content", "")
46
+ return content if isinstance(content, str) else str(content)
47
+
48
+
49
+ def _completion_text(completion: Any) -> str:
50
+ if isinstance(completion, str):
51
+ return completion
52
+ for message in reversed(completion or []):
53
+ if _message_role(message) == "assistant":
54
+ content = _message_content(message)
55
+ if content:
56
+ return content
57
+ return _message_content((completion or [{}])[-1]) if completion else ""
58
+
59
+
60
+ def _extract_json_payload(completion: Any) -> dict[str, Any] | None:
61
+ text = _completion_text(completion).strip()
62
+ fenced_json = re.search(r"```json\s*(.*?)```", text, re.DOTALL | re.IGNORECASE)
63
+ if fenced_json:
64
+ text = fenced_json.group(1).strip()
65
+
66
+ for candidate in (text, None):
67
+ if candidate is None:
68
+ object_match = re.search(r"\{.*\}", text, re.DOTALL)
69
+ if object_match is None:
70
+ continue
71
+ candidate = object_match.group(0)
72
+ try:
73
+ payload = json.loads(candidate)
74
+ except json.JSONDecodeError:
75
+ continue
76
+ if isinstance(payload, dict):
77
+ return payload
78
+ return None
79
+
80
+
81
+ def _extract_final_source(completion: Any) -> str:
82
+ payload = _extract_json_payload(completion)
83
+ if payload is not None and isinstance(payload.get("final_source"), str):
84
+ return payload["final_source"]
85
+
86
+ text = _completion_text(completion).strip()
87
+ fenced_zero = re.search(r"```(?:zero|0)?\s*(.*?)```", text, re.DOTALL | re.IGNORECASE)
88
+ if fenced_zero:
89
+ return fenced_zero.group(1).strip()
90
+ return text
91
+
92
+
93
+ def _state_file_path(state: Any) -> str | None:
94
+ if isinstance(state, dict):
95
+ path = state.get("zero_file_path")
96
+ return path if isinstance(path, str) else None
97
+ return None
98
+
99
+
100
+ def _state_file_source(state: Any) -> str:
101
+ path = _state_file_path(state)
102
+ if not path:
103
+ return ""
104
+ try:
105
+ return read_source(path)
106
+ except OSError:
107
+ return ""
108
+
109
+
110
+ def _scored_source(completion: Any, state: Any = None) -> str:
111
+ disk_source = _state_file_source(state)
112
+ if disk_source:
113
+ return disk_source
114
+ return _extract_final_source(completion)
115
+
116
+
117
+ def _tool_was_called(state: Any) -> bool:
118
+ for turn in (state or {}).get("trajectory", []):
119
+ for message in turn.get("completion", []):
120
+ tool_calls = getattr(message, "tool_calls", None)
121
+ if tool_calls:
122
+ return True
123
+ if isinstance(message, dict) and message.get("tool_calls"):
124
+ return True
125
+ return False
126
+
127
+
128
+ def _make_prompt(row: dict[str, Any]) -> list[dict[str, str]]:
129
+ return [
130
+ {
131
+ "role": "user",
132
+ "content": (
133
+ f"Task id: {row['id']}\n"
134
+ f"Edit goal: {row['goal']}\n\n"
135
+ "The Zerolang source has been written to this file:\n"
136
+ f"{ZERO_FILE_PLACEHOLDER}\n\n"
137
+ "Use tool arguments with `path` set to that `.0` file. "
138
+ "The grader will read the edited file from disk and run `zero check` on it. "
139
+ "Return a JSON object with `path` when finished."
140
+ ),
141
+ }
142
+ ]
143
+
144
+
145
+ def _build_dataset(split: str, max_examples: int | None) -> Dataset:
146
+ rows: list[dict[str, Any]] = []
147
+ for task in SYNTHETIC_TASKS:
148
+ if split != "all" and task["split"] != split:
149
+ continue
150
+ rows.append(
151
+ {
152
+ "prompt": _make_prompt(task),
153
+ "answer": task["target_source"],
154
+ "info": json.dumps(
155
+ {
156
+ "id": task["id"],
157
+ "category": task["category"],
158
+ "split": task["split"],
159
+ "goal": task["goal"],
160
+ "source": task["source"],
161
+ "target_source": task["target_source"],
162
+ }
163
+ ),
164
+ }
165
+ )
166
+ if max_examples is not None:
167
+ rows = rows[: int(max_examples)]
168
+ return Dataset.from_list(rows)
169
+
170
+
171
+ def _workspace_root() -> Path:
172
+ configured = os.environ.get("ZEROLANG_EDITING_WORKDIR")
173
+ if configured:
174
+ return Path(configured).expanduser()
175
+ return Path(tempfile.gettempdir()) / "zerolang-editing-rollouts"
176
+
177
+
178
+ def _safe_task_id(task_id: str) -> str:
179
+ return re.sub(r"[^A-Za-z0-9_.-]+", "-", task_id).strip("-") or "task"
180
+
181
+
182
+ def _replace_prompt_path(messages: Any, path: str) -> None:
183
+ for message in messages or []:
184
+ if isinstance(message, dict):
185
+ content = message.get("content")
186
+ if isinstance(content, str):
187
+ message["content"] = content.replace(ZERO_FILE_PLACEHOLDER, path)
188
+ continue
189
+ content = getattr(message, "content", None)
190
+ if isinstance(content, str):
191
+ setattr(message, "content", content.replace(ZERO_FILE_PLACEHOLDER, path))
192
+
193
+
194
+ def _is_relative_to(child: Path, parent: Path) -> bool:
195
+ try:
196
+ child.relative_to(parent)
197
+ return True
198
+ except ValueError:
199
+ return False
200
+
201
+
202
+ class ZerolangPathToolEnv(vf.StatefulToolEnv):
203
+ """Tool environment that creates one on-disk `.0` file per rollout."""
204
+
205
+ def __init__(self, *args: Any, workspace_root: Path | None = None, **kwargs: Any):
206
+ super().__init__(*args, **kwargs)
207
+ self.workspace_root = workspace_root or _workspace_root()
208
+
209
+ async def setup_state(self, state: vf.State) -> None:
210
+ info = state.get("info") or {}
211
+ source = info.get("source") if isinstance(info, dict) else None
212
+ if not isinstance(source, str) or not source.strip():
213
+ raise ValueError("zerolang-editing rows must include info.source")
214
+
215
+ task_id = info.get("id", "task") if isinstance(info, dict) else "task"
216
+ workspace = self.workspace_root / f"{_safe_task_id(str(task_id))}-{state['trajectory_id']}"
217
+ workspace.mkdir(parents=True, exist_ok=True)
218
+ file_path = workspace / "program.0"
219
+ file_path.write_text(source)
220
+
221
+ state["zero_workspace"] = str(workspace.resolve())
222
+ state["zero_file_path"] = str(file_path.resolve())
223
+ _replace_prompt_path(state.get("prompt"), state["zero_file_path"])
224
+
225
+     def update_tool_args(
+         self,
+         tool_name: str,
+         tool_args: dict,
+         messages: vf.Messages,
+         state: vf.State,
+         **kwargs: Any,
+     ) -> dict:
+         if "source" in tool_args:
+             raise ValueError("Zerolang tools operate on `path`; do not pass source text.")
+         if tool_name == "zero_skills_get":
+             return tool_args
+
+         workspace = Path(str(state["zero_workspace"])).resolve()
+         fallback = Path(str(state["zero_file_path"])).resolve()
+         raw_value = tool_args.get("path")
+         correction_reason: str | None = None
+
+         # Compare explicitly: a set-membership test raises TypeError when the
+         # model passes an unhashable value such as a list or dict.
+         if raw_value is None or raw_value == "":
+             resolved = fallback
+             correction_reason = "missing_path"
+         else:
+             raw_path = Path(str(raw_value)).expanduser()
+             resolved = (
+                 (workspace / raw_path).resolve()
+                 if not raw_path.is_absolute()
+                 else raw_path.resolve()
+             )
+             # Anything outside the rollout workspace, or not a `.0` file, falls
+             # back to the rollout's own program file.
+             if not _is_relative_to(resolved, workspace):
+                 resolved = fallback
+                 correction_reason = "outside_workspace"
+             elif resolved.suffix != ".0":
+                 resolved = fallback
+                 correction_reason = "non_zero_path"
+
+         if correction_reason is not None:
+             state.setdefault("zero_path_arg_corrections", []).append(
+                 {
+                     "tool_name": tool_name,
+                     "reason": correction_reason,
+                     "raw_path": "" if raw_value is None else str(raw_value),
+                 }
+             )
+         tool_args["path"] = str(resolved)
+         return tool_args
270
+
271
+
272
+ async def target_source_match(completion: Any, answer: str, state: Any = None, **_: Any) -> float:
273
+ scored_source = _scored_source(completion, state)
274
+ return 1.0 if _normalize_source(scored_source) == _normalize_source(answer) else 0.0
275
+
276
+
277
+ async def zero_check_pass(completion: Any, state: Any = None, **_: Any) -> float:
278
+ path = _state_file_path(state)
279
+ if path and Path(path).exists():
280
+ result = run_zero_path(["check", "--json"], path)
281
+ else:
282
+ final_source = _extract_final_source(completion)
283
+ if not final_source.strip():
284
+ return 0.0
285
+ result = run_zero_source(["check", "--json"], final_source)
286
+ if not result.get("ok"):
287
+ return 0.0
288
+ try:
289
+ parsed = json.loads(result.get("stdout") or "{}")
290
+ except json.JSONDecodeError:
291
+ return 0.0
292
+ return 1.0 if parsed.get("ok") is True else 0.0
293
+
294
+
295
+ def _walk_graph_patch_payloads(value: Any, seen: set[int] | None = None):
296
+ if seen is None:
297
+ seen = set()
298
+ value_id = id(value)
299
+ if value_id in seen:
300
+ return
301
+ seen.add(value_id)
302
+
303
+ if isinstance(value, Mapping):
304
+ stdout = value.get("stdout")
305
+ if isinstance(stdout, str) and "program graph patch ok" in stdout:
306
+ yield value
307
+ for item in value.values():
308
+ yield from _walk_graph_patch_payloads(item, seen)
309
+ return
310
+
311
+ if isinstance(value, (list, tuple)):
312
+ for item in value:
313
+ yield from _walk_graph_patch_payloads(item, seen)
314
+ return
315
+
316
+ if isinstance(value, str) and "program graph patch ok" in value:
317
+ try:
318
+ parsed = json.loads(value)
319
+ except json.JSONDecodeError:
320
+ return
321
+ yield from _walk_graph_patch_payloads(parsed, seen)
322
+ return
323
+
324
+ model_dump = getattr(value, "model_dump", None)
325
+ if callable(model_dump):
326
+ try:
327
+ yield from _walk_graph_patch_payloads(model_dump(), seen)
328
+ except Exception:
329
+ pass
330
+ return
331
+
332
+ attrs = getattr(value, "__dict__", None)
333
+ if isinstance(attrs, dict):
334
+ yield from _walk_graph_patch_payloads(attrs, seen)
335
+
336
+
337
+ async def graph_patch_success(
338
+ completion: Any = None, state: Any = None, answer: str = "", **_: Any
339
+ ) -> float:
340
+ search_root = (state or {}).get("trajectory", state or {}) if isinstance(state, Mapping) else state
341
+ for payload in _walk_graph_patch_payloads(search_root or {}):
342
+ path = payload.get("path")
343
+ if isinstance(path, str):
344
+ try:
345
+ patched_source = read_source(path)
346
+ except OSError:
347
+ continue
348
+ if _normalize_source(patched_source) == _normalize_source(answer):
349
+ return 1.0
350
+
351
+ text = _completion_text(completion)
352
+ if _tool_was_called(state) and "graph_patch" in text:
353
+ scored_source = _scored_source(completion, state)
354
+ if _normalize_source(scored_source) == _normalize_source(answer):
355
+ return 1.0
356
+ return 0.0
357
+
358
+
359
+ async def zerolang_surface_used(completion: Any, state: Any = None, **_: Any) -> float:
+     # The marker list already contains "zero_graph_patch" and "graph_patch", so a
+     # separate tool-call branch checking the same strings cannot change the score.
+     text = _completion_text(completion).lower()
+     markers = [
+         "zero_graph_patch",
+         "graph_patch",
+         "expect_graph_hash",
+         "set node=",
+         "field=\"value\"",
+         "expect=",
+         "graph hash",
+         "node #",
+     ]
+     return 1.0 if any(marker in text for marker in markers) else 0.0
376
+
377
+
378
+ async def path_argument_valid(completion: Any, state: Any = None, **_: Any) -> float:
379
+ if not _tool_was_called(state):
380
+ return 0.0
381
+ corrections = (state or {}).get("zero_path_arg_corrections", [])
382
+ return 0.0 if corrections else 1.0
383
+
384
+
385
+ def load_environment(
386
+ split: str = "eval",
387
+ max_examples: int | None = None,
388
+ max_turns: int = 6,
389
+ zero_path: str | None = None,
390
+ enable_tools: bool = True,
391
+ **_: Any,
392
+ ) -> vf.Environment:
393
+ """Load the Zerolang editing environment."""
394
+ if split not in {"train", "eval", "all"}:
395
+ raise ValueError("split must be one of: train, eval, all")
396
+
397
+ dataset = _build_dataset(split=split, max_examples=max_examples)
398
+ rubric = vf.Rubric(
399
+ funcs=[
400
+ graph_patch_success,
401
+ target_source_match,
402
+ zero_check_pass,
403
+ zerolang_surface_used,
404
+ path_argument_valid,
405
+ ],
406
+ weights=[0.50, 0.20, 0.15, 0.10, 0.05],
407
+ )
408
+
409
+ if enable_tools:
410
+ return ZerolangPathToolEnv(
411
+ dataset=dataset,
412
+ rubric=rubric,
413
+ tools=make_zero_tools(zero_path),
414
+ max_turns=max_turns,
415
+ system_prompt=SYSTEM_PROMPT,
416
+ )
417
+
418
+ return vf.SingleTurnEnv(dataset=dataset, rubric=rubric, system_prompt=SYSTEM_PROMPT)
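The five reward functions and their weights can be read as a scoring budget. The sketch below assumes the rubric combines the scores as a plain weighted sum, which this file does not itself confirm; `weighted_reward` is a hypothetical helper, not a Verifiers API.

```python
# Weights copied from the `vf.Rubric(...)` call in `load_environment`.
WEIGHTS = {
    "graph_patch_success": 0.50,
    "target_source_match": 0.20,
    "zero_check_pass": 0.15,
    "zerolang_surface_used": 0.10,
    "path_argument_valid": 0.05,
}


def weighted_reward(scores: dict) -> float:
    """Weighted sum over the five reward names; missing names count as 0.0.

    Assumption: the rubric aggregates this way. Illustration only.
    """
    return sum(weight * scores.get(name, 0.0) for name, weight in WEIGHTS.items())


# A rollout that reaches the target source and passes `zero check`, but never
# produces a successful graph patch, earns well under half of the total.
text_only = weighted_reward({"target_source_match": 1.0, "zero_check_pass": 1.0})

# A rollout that satisfies every reward function.
full = weighted_reward({name: 1.0 for name in WEIGHTS})

print(round(text_only, 2), round(full, 2))
```

Under that assumption, half of the achievable reward depends on `graph_patch_success`, which matches the stated goal of preferring checked graph edits over loose text replacement.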