Add Zerolang editing environment

Files changed (12)

.gitignore +8 -0
README.md +241 -0
configs/rl/zerolang-editing-laguna-xs2-20step.toml +34 -0
configs/rl/zerolang-editing-laguna-xs2-overnight.toml +35 -0
pyproject.toml +22 -0
uv.lock +0 -0
zerolang_editing/__init__.py +3 -0
zerolang_editing/task_builders.py +270 -0
zerolang_editing/tasks.py +193 -0
zerolang_editing/train_tasks.py +255 -0
zerolang_editing/zero_tools.py +310 -0
zerolang_editing/zerolang_editing.py +418 -0

.gitignore ADDED

	@@ -0,0 +1,8 @@

+.venv/
+.zero/
+__pycache__/
+*.py[cod]
+dist/
+outputs/
+.pytest_cache/
+.ruff_cache/

README.md ADDED

	@@ -0,0 +1,241 @@

+---
+tags:
+  - zerolang
+  - reinforcement-learning
+  - verifiers
+  - code-editing
+  - tool-use
+  - graph-editing
+  - laguna-xs2
+license: apache-2.0
+---
+# Zerolang Editing
+`zerolang-editing` is a Verifiers/Prime RL environment for training coding agents
+to edit [Zerolang](https://github.com/vercel-labs/zerolang) programs through
+checked graph edits instead of loose text replacement.
+The core task is intentionally narrow: each rollout starts with a `.0` source
+file already written to disk, asks the model for a semantic code edit, and
+scores the edited file after the model uses Zerolang tooling. The intended
+successful behavior is:
+1. Inspect the file with Zerolang graph/check tools.
+2. Identify the relevant graph hash and semantic node.
+3. Apply a checked `zero graph patch` operation to the on-disk file.
+4. Finish with a compact JSON response pointing at the edited path.
+
+This repository contains the environment source package, synthetic task
+builders, tool wrappers, and documentation. The trained checkpoint from hosted
+RL runs is published separately by the training service when a run is finalized.
+## Why This Exists
+Most code-editing agents learn to patch source through line-oriented text
+operations. Zerolang exposes a graph-level editing surface where a patch is
+guarded by the expected graph hash and the expected field value. That makes
+edits auditable and harder to apply to stale or mismatched code.
+This environment is designed to train that behavior directly. It rewards
+successful checked graph patches, while still checking that the resulting file
+compiles and matches the hidden target source.
+## Environment Summary
+- **Package name:** `zerolang-editing`
+- **Prime environment ID:** `pandelis/zerolang-editing`
+- **Version in this repo:** `0.1.8`
+- **Task type:** multi-turn tool-use code editing
+- **Language under edit:** Zerolang `.0`
+- **Train split:** 209 deterministic synthetic tasks
+- **Eval split:** 67 held-out deterministic synthetic tasks
+- **Primary reward target:** successful `zero_graph_patch` on the rollout file
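
The split sizes listed above can be cross-checked against the task builders in this diff. The sketch below is an arithmetic check only. The per-family counts are read off the literal lists and `range(...)` loops in `train_tasks.py` and `tasks.py`, and it assumes every train generator contributes to `TRAIN_TASKS` exactly once (the listing of that list is cut off before its end). The dictionary names are hypothetical.

```python
# Row counts per task family, read off the builders in this diff.
# Assumption: each train generator feeds TRAIN_TASKS exactly once.
TRAIN_COUNTS = {
    "literal": 40,         # len(pairs) in _literal_train_tasks
    "branch_literal": 20,  # len(specs) in _branch_literal_train_tasks
    "legacy": 4,           # LEGACY_TRAIN_TASKS
    "helper": 25 + 25,     # add loop + sub loop, range(1, 26) each
    "two_helper": 20,      # range(1, 21)
    "call": 30,            # range(1, 31)
    "condition": 25,       # range(1, 26)
    "diagnostic": 20,      # len(messages)
}

EVAL_COUNTS = {
    "literal": 5 + 6 + 8,  # explicit goals + default goals + custom goals
    "branch_literal": 8,
    "helper": 8,
    "two_helper": 8,
    "call": 8,
    "condition": 8,
    "diagnostic": 8,
}

train_total = sum(TRAIN_COUNTS.values())
eval_total = sum(EVAL_COUNTS.values())
print(train_total, eval_total)  # prints: 209 67
```

Under those assumptions the totals agree with the 209 train and 67 eval rows stated in the summary.
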
+## Rollout Contract
+Each task row includes an initial Zerolang source program and a hidden target
+program. At rollout setup time, the environment writes the initial source to:
+```text
+<temporary rollout workspace>/program.0
+```
+The model receives that path in the user prompt. Tools must operate on `path`
+arguments that point to this `.0` file. Pasting the full source into tool calls
+is rejected because the training target is disk-backed graph editing, not
+source-string rewriting.
+The environment canonicalizes recoverable path mistakes, such as missing paths
+or paths outside the rollout workspace, back to the rollout file and records
+those corrections. The `path_argument_valid` metric rewards clean tool calls
+that did not require correction.
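
The path-correction behaviour described above could look roughly like the following. This is a minimal sketch, not the code in `zero_tools.py`: `canonicalize_path` is a hypothetical helper, and it simplifies the rule to "anything that does not resolve to the rollout file is corrected".

```python
from __future__ import annotations

from pathlib import Path


def canonicalize_path(raw: str | None, workspace: Path, rollout_file: Path) -> tuple[Path, bool]:
    """Return (path_to_use, was_corrected) for a tool call's `path` argument.

    Hypothetical helper: a missing path, or one that does not resolve to the
    rollout file, is redirected to the rollout file and flagged as corrected.
    """
    if not raw:
        return rollout_file, True
    candidate = Path(raw)
    if not candidate.is_absolute():
        # Interpret relative paths against the rollout workspace.
        candidate = workspace / candidate
    if candidate.resolve() != rollout_file.resolve():
        return rollout_file, True
    return rollout_file, False
```

A `path_argument_valid`-style metric could then be derived from whether any call in the rollout came back with `was_corrected` set to `True`.
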
+## Tools
+The environment exposes only Zerolang-specific tools:
+| Tool | Purpose |
+| --- | --- |
+| `zero_check(path)` | Run `zero check --json` against a `.0` file. |
+| `zero_graph_summary(path)` | Return compact graph hash and patchable node facts. |
+| `zero_graph_dump(path)` | Run `zero graph dump` for detailed graph inspection. |
+| `zero_graph_json(path)` | Run `zero graph --json`. |
+| `zero_fix_plan(path)` | Run `zero fix --plan --json`. |
+| `zero_graph_patch(path, expect_graph_hash, op)` | Apply one checked graph patch operation to the file. |
+| `zero_skills_get(skill)` | Load version-matched Zerolang guidance such as `language`, `diagnostics`, or `stdlib`. |
+
+Example checked patch shape:
+```bash
+zero graph patch program.0 \
+  --expect-graph-hash graph:49dd208f8361c221 \
+  --op 'set node="#78ac4364" field="value" expect="66" value="65"'
+```
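
For readers wiring this up from Python, the command shape above can be assembled as an argument list. The sketch uses only the flags that appear in the example (`--expect-graph-hash`, `--op`). The helper names `build_patch_op` and `build_patch_argv` are hypothetical, and no quoting or escaping of values is attempted.

```python
from __future__ import annotations


def build_patch_op(node: str, field: str, expect: str, value: str) -> str:
    """Format a single `set` operation in the shape shown in the README example."""
    # Assumption: values contain no double quotes; escaping is not handled here.
    return f'set node="{node}" field="{field}" expect="{expect}" value="{value}"'


def build_patch_argv(path: str, graph_hash: str, op: str) -> list[str]:
    """Build an argv list suitable for subprocess.run (no shell quoting needed)."""
    return ["zero", "graph", "patch", path, "--expect-graph-hash", graph_hash, "--op", op]


op = build_patch_op("#78ac4364", "value", "66", "65")
argv = build_patch_argv("program.0", "graph:49dd208f8361c221", op)
print(argv)
```

Passing the list straight to `subprocess.run(argv, ...)` avoids the shell-level single quoting that the bash form needs around `--op`.
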
+## Reward Metrics
+The main rubric is weighted toward actually patching the graph and producing
+the hidden target program.
+| Metric | Weight | Meaning |
+| --- | ---: | --- |
+| `graph_patch_success` | 0.50 | A successful `zero_graph_patch` call edited the file to the hidden target. |
+| `target_source_match` | 0.20 | The final on-disk source matches the target after whitespace normalization. |
+| `zero_check_pass` | 0.15 | The edited file passes `zero check --json`. |
+| `zerolang_surface_used` | 0.10 | The rollout used graph hashes, node IDs, `expect`, or graph-patch semantics. |
+| `path_argument_valid` | 0.05 | Tool calls used the rollout `.0` path without harness-side correction. |
+
+The reward is intentionally not fully binary. A model can get partial credit for
+producing compilable code and using the right interface, but the highest reward
+requires the checked graph patch to land correctly.
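
To make the partial-credit behaviour concrete, the sketch below combines the weights from the table as a plain weighted sum. It assumes each metric is a score in [0, 1] and that the rubric adds them linearly; `weighted_reward` is a hypothetical helper, not the rubric implementation.

```python
# Weights copied from the Reward Metrics table.
WEIGHTS = {
    "graph_patch_success": 0.50,
    "target_source_match": 0.20,
    "zero_check_pass": 0.15,
    "zerolang_surface_used": 0.10,
    "path_argument_valid": 0.05,
}


def weighted_reward(metrics: dict[str, float]) -> float:
    """Weighted sum of metric scores; metrics that are absent count as 0."""
    return sum(weight * metrics.get(name, 0.0) for name, weight in WEIGHTS.items())


# A rollout that used the graph tools with clean paths and left the file
# compiling, but whose patch did not reach the hidden target.
partial = weighted_reward(
    {"zero_check_pass": 1.0, "zerolang_surface_used": 1.0, "path_argument_valid": 1.0}
)
full = weighted_reward({name: 1.0 for name in WEIGHTS})
print(round(partial, 2), round(full, 2))
```

Under this reading, a rollout that does everything except land the correct patch tops out at roughly 0.30, while a fully successful one reaches 1.00.
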
+## Dataset Construction
+The synthetic tasks are generated from canonical Zerolang snippets:
+1. Build an initial `.0` program.
+2. Select a patchable semantic node, usually a literal, function value, call
+   target, or printed diagnostic string.
+3. Mutate the semantic value to produce the target program.
+4. Store the target source and task metadata.
+5. During rollout, require the model to recover the target through graph tools.
+
+The environment currently focuses on deterministic editing families where
+`zero graph patch` support is reliable. The task builders live in:
+- `zerolang_editing/tasks.py`
+- `zerolang_editing/train_tasks.py`
+- `zerolang_editing/task_builders.py`
+## Installation
+Install from Prime Hub:
+```bash
+prime env install pandelis/zerolang-editing@0.1.8
+```
+Install from this repository:
+```bash
+uv sync
+uv run python -m compileall zerolang_editing
+```
+Zerolang is required at runtime. If `zero` is not already on `PATH`, the tool
+wrapper checks `$HOME/.zero/bin/zero` and can download a release binary into a
+temporary install directory.
+## Local Eval
+```bash
+prime eval run ./environments/zerolang_editing \
+  -m poolside/laguna-xs.2 \
+  -n 3 -r 1 -t 2048 -T 0.4 \
+  -a '{"split":"eval","max_turns":10}' \
+  -s -d -A
+```
+For quick package-level validation:
+```bash
+cd environments/zerolang_editing
+uv run python -m compileall zerolang_editing
+uv run python - <<'PY'
+from zerolang_editing.zerolang_editing import load_environment
+env = load_environment(split="eval", max_examples=1, max_turns=2)
+print(type(env).__name__, len(env.dataset))
+PY
+```
+## Hosted RL Configuration
+The overnight Laguna XS.2 run uses:
+```toml
+model = "poolside/Laguna-XS.2"
+max_steps = 200
+batch_size = 64
+rollouts_per_example = 8
+learning_rate = 1e-4
+[sampling]
+max_tokens = 2048
+temperature = 0.4
+enable_thinking = true
+```
+The config is stored in:
+```text
+configs/rl/zerolang-editing-laguna-xs2-overnight.toml
+```
+## Previous Training Signal
+A 20-step stress run on `poolside/Laguna-XS.2` completed successfully before
+the overnight scale-up:
+- Baseline eval Avg@1: `0.1500`
+- Step 15 eval Avg@1: `0.2357`
+- Final eval Avg@1: `0.2250`
+- First 10 train-step reward average: `0.1606`
+- Last 10 train-step reward average: `0.2056`
+- No fatal orchestrator errors, no eval truncation, and no no-response rollouts.
+
+The main failure signatures were invalid tool paths: missing `path` arguments
+and paths outside the rollout workspace. Version `0.1.8` keeps the path sandbox
+but converts recoverable path mistakes into canonicalized calls against the
+rollout file and adds a small clean-path reward term.
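
The size of the reported improvement follows directly from the numbers above. The sketch below only restates that arithmetic. Note that the 20-step config evaluates 8 examples with one rollout each, so these deltas rest on a small sample.

```python
# Figures copied from the "Previous Training Signal" list.
baseline_eval = 0.1500
final_eval = 0.2250
first10_train = 0.1606
last10_train = 0.2056

eval_abs_gain = final_eval - baseline_eval      # absolute change in Avg@1
eval_rel_gain = eval_abs_gain / baseline_eval   # change relative to baseline
train_abs_gain = last10_train - first10_train   # change in mean train reward

print(f"eval:  +{eval_abs_gain:.4f} absolute, +{eval_rel_gain:.0%} relative")
print(f"train: +{train_abs_gain:.4f} absolute")
```
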
+## Repository Contents
+```text
+README.md
+pyproject.toml
+uv.lock
+configs/
+  rl/
+    zerolang-editing-laguna-xs2-20step.toml
+    zerolang-editing-laguna-xs2-overnight.toml
+zerolang_editing/
+  __init__.py
+  task_builders.py
+  tasks.py
+  train_tasks.py
+  zero_tools.py
+  zerolang_editing.py
+```
+Build artifacts, local virtualenvs, Zerolang caches, rollout outputs, and
+compiled Python caches are intentionally excluded from the Hugging Face repo.
+## Limitations
+- The task distribution is synthetic and should be expanded before treating the
+  trained behavior as general Zerolang editing competence.
+- Current graph-edit families focus on reliable literal/value style patches.
+- The environment is designed for RL tool-use behavior, not as a standalone
+  benchmark of general coding ability.
+- This repo contains the environment source, not final model weights.

configs/rl/zerolang-editing-laguna-xs2-20step.toml ADDED

	@@ -0,0 +1,34 @@

+# Conservative scale-up from zerolang-editing-stress.toml.
+model = "poolside/Laguna-XS.2"
+max_steps = 20
+batch_size = 32
+rollouts_per_example = 4
+learning_rate = 1e-4
+[sampling]
+max_tokens = 2048
+temperature = 0.4
+enable_thinking = true
+[[env]]
+id = "pandelis/zerolang-editing"
+version = "0.1.8"
+[env.args]
+split = "train"
+max_turns = 10
+[eval]
+interval = 5
+num_examples = 8
+rollouts_per_example = 1
+eval_base_model = true
+[[eval.env]]
+id = "pandelis/zerolang-editing"
+version = "0.1.8"
+[eval.env.args]
+split = "eval"
+max_turns = 10

configs/rl/zerolang-editing-laguna-xs2-overnight.toml ADDED

	@@ -0,0 +1,35 @@

+# Overnight scale-up from zerolang-editing-laguna-xs2-20step.toml.
+# Previous 20-step run was stable and improved held-out Avg@1 from 0.1500 to 0.2250.
+model = "poolside/Laguna-XS.2"
+max_steps = 200
+batch_size = 64
+rollouts_per_example = 8
+learning_rate = 1e-4
+[sampling]
+max_tokens = 2048
+temperature = 0.4
+enable_thinking = true
+[[env]]
+id = "pandelis/zerolang-editing"
+version = "0.1.8"
+[env.args]
+split = "train"
+max_turns = 10
+[eval]
+interval = 10
+num_examples = 16
+rollouts_per_example = 1
+eval_base_model = true
+[[eval.env]]
+id = "pandelis/zerolang-editing"
+version = "0.1.8"
+[eval.env.args]
+split = "eval"
+max_turns = 10

pyproject.toml ADDED

	@@ -0,0 +1,22 @@

+[project]
+name = "zerolang-editing"
+description = "Tool-backed Zerolang editing tasks for graph-first code repair and refactoring"
+tags = ["zerolang", "graph-editing", "code-editing", "train", "eval"]
+version = "0.1.8"
+requires-python = ">=3.10"
+dependencies = [
+    "datasets>=2.19.0",
+    "verifiers>=0.1.14",
+]
+[build-system]
+requires = ["hatchling"]
+build-backend = "hatchling.build"
+[tool.hatch.build]
+include = ["zerolang_editing", "pyproject.toml"]
+[tool.verifiers.eval]
+num_examples = 3
+rollouts_per_example = 1
+max_turns = 6

uv.lock ADDED

The diff for this file is too large to render. See raw diff

zerolang_editing/__init__.py ADDED

	@@ -0,0 +1,3 @@


+from .zerolang_editing import load_environment
+
+__all__ = ["load_environment"]

zerolang_editing/task_builders.py ADDED

	@@ -0,0 +1,270 @@

+"""Task construction helpers for synthetic Zerolang editing rows."""
+from __future__ import annotations
+from typing import Any
+def _source(text: str) -> str:
+    return text.strip() + "\n"
+def _write_program(message: str, *, raises: bool = True) -> str:
+    raises_suffix = " raises" if raises else ""
+    return _source(
+        f"""
+pub fn main(world: World) -> Void{raises_suffix} {{
+    check world.out.write("{message}\\n")
+}}
+"""
+    )
+def _literal_task(
+    task_id: str, old: str, new: str, goal: str | None = None, *, split: str = "eval"
+) -> dict[str, Any]:
+    return {
+        "id": task_id,
+        "split": split,
+        "category": "graph_patch_literal",
+        "goal": goal or f'Replace the string literal "{old}\\n" with "{new}\\n".',
+        "source": _write_program(old),
+        "target_source": _write_program(new),
+    }
+def _branch_literal_task(
+    task_id: str, helper: str, old: str, new: str, *, split: str = "eval"
+) -> dict[str, Any]:
+    return {
+        "id": task_id,
+        "split": split,
+        "category": "graph_patch_literal",
+        "goal": (
+            "Keep the helper-controlled branch intact and update only the string "
+            f'literal from "{old}\\n" to "{new}\\n".'
+        ),
+        "source": _source(
+            f"""
+fn {helper}() -> i32 {{
+    return 1
+}}
+pub fn main(world: World) -> Void raises {{
+    if {helper}() == 1 {{
+        check world.out.write("{old}\\n")
+    }}
+}}
+"""
+        ),
+        "target_source": _source(
+            f"""
+fn {helper}() -> i32 {{
+    return 1
+}}
+pub fn main(world: World) -> Void raises {{
+    if {helper}() == 1 {{
+        check world.out.write("{new}\\n")
+    }}
+}}
+"""
+        ),
+    }
+def _helper_task(
+    task_id: str,
+    helper: str,
+    source_expr: str,
+    target_expr: str,
+    expected: int,
+    output: str,
+    *,
+    split: str = "eval",
+) -> dict[str, Any]:
+    return {
+        "id": task_id,
+        "split": split,
+        "category": "semantic_update",
+        "goal": (
+            f"Update {helper}() so it returns {expected} and the existing main "
+            f"branch prints {output}."
+        ),
+        "source": _source(
+            f"""
+fn {helper}() -> i32 {{
+    return {source_expr}
+}}
+pub fn main(world: World) -> Void raises {{
+    if {helper}() == {expected} {{
+        check world.out.write("{output}\\n")
+    }}
+}}
+"""
+        ),
+        "target_source": _source(
+            f"""
+fn {helper}() -> i32 {{
+    return {target_expr}
+}}
+pub fn main(world: World) -> Void raises {{
+    if {helper}() == {expected} {{
+        check world.out.write("{output}\\n")
+    }}
+}}
+"""
+        ),
+    }
+def _two_helper_task(
+    task_id: str,
+    helper: str,
+    other: str,
+    source_expr: str,
+    target_expr: str,
+    other_expr: str,
+    expected: int,
+    *,
+    split: str = "eval",
+) -> dict[str, Any]:
+    return {
+        "id": task_id,
+        "split": split,
+        "category": "semantic_update",
+        "goal": (
+            f"Update only {helper}() so main writes ok when the comparison succeeds; "
+            f"leave {other}() unchanged."
+        ),
+        "source": _source(
+            f"""
+fn {helper}() -> i32 {{
+    return {source_expr}
+}}
+fn {other}() -> i32 {{
+    return {other_expr}
+}}
+pub fn main(world: World) -> Void raises {{
+    if {helper}() == {expected} {{
+        check world.out.write("ok\\n")
+    }}
+}}
+"""
+        ),
+        "target_source": _source(
+            f"""
+fn {helper}() -> i32 {{
+    return {target_expr}
+}}
+fn {other}() -> i32 {{
+    return {other_expr}
+}}
+pub fn main(world: World) -> Void raises {{
+    if {helper}() == {expected} {{
+        check world.out.write("ok\\n")
+    }}
+}}
+"""
+        ),
+    }
+def _call_task(
+    task_id: str, source_args: str, target_args: str, expected: int, *, split: str = "eval"
+) -> dict[str, Any]:
+    return {
+        "id": task_id,
+        "split": split,
+        "category": "call_update",
+        "goal": "Keep add unchanged, but edit one call argument so the comparison is true.",
+        "source": _source(
+            f"""
+fn add(a: i32, b: i32) -> i32 {{
+    return a + b
+}}
+pub fn main(world: World) -> Void raises {{
+    if add({source_args}) == {expected} {{
+        check world.out.write("ok\\n")
+    }}
+}}
+"""
+        ),
+        "target_source": _source(
+            f"""
+fn add(a: i32, b: i32) -> i32 {{
+    return a + b
+}}
+pub fn main(world: World) -> Void raises {{
+    if add({target_args}) == {expected} {{
+        check world.out.write("ok\\n")
+    }}
+}}
+"""
+        ),
+    }
+def _condition_task(
+    task_id: str,
+    helper: str,
+    returned: int,
+    source_compare: int,
+    output: str,
+    *,
+    split: str = "eval",
+) -> dict[str, Any]:
+    return {
+        "id": task_id,
+        "split": split,
+        "category": "condition_update",
+        "goal": (
+            "Edit the comparison literal so the branch is true without changing "
+            f"{helper}() or the output string."
+        ),
+        "source": _source(
+            f"""
+fn {helper}() -> i32 {{
+    return {returned}
+}}
+pub fn main(world: World) -> Void raises {{
+    if {helper}() == {source_compare} {{
+        check world.out.write("{output}\\n")
+    }}
+}}
+"""
+        ),
+        "target_source": _source(
+            f"""
+fn {helper}() -> i32 {{
+    return {returned}
+}}
+pub fn main(world: World) -> Void raises {{
+    if {helper}() == {returned} {{
+        check world.out.write("{output}\\n")
+    }}
+}}
+"""
+        ),
+    }
+def _diagnostic_task(task_id: str, message: str, *, split: str = "eval") -> dict[str, Any]:
+    return {
+        "id": task_id,
+        "split": split,
+        "category": "diagnostic_repair",
+        "goal": "Repair the main signature so the existing world.out.write check is valid.",
+        "source": _write_program(message, raises=False),
+        "target_source": _write_program(message, raises=True),
+    }
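
For a concrete picture of what a `diagnostic_repair` row contains, the sketch below copies `_source` and `_write_program` from the file above so that it runs on its own, then renders the source and target programs for one message. Only the two trailing variables (`broken`, `fixed`) are new names.

```python
def _source(text: str) -> str:
    # Normalize a template to a stripped body with one trailing newline.
    return text.strip() + "\n"


def _write_program(message: str, *, raises: bool = True) -> str:
    raises_suffix = " raises" if raises else ""
    return _source(
        f"""
pub fn main(world: World) -> Void{raises_suffix} {{
    check world.out.write("{message}\\n")
}}
"""
    )


# Same inputs as _diagnostic_task: the source omits `raises`, the target has it.
broken = _write_program("needs raises", raises=False)
fixed = _write_program("needs raises", raises=True)
print(broken)
print(fixed)
```

The two programs differ only in the ` raises` marker on the `main` signature, which is the single edit a diagnostic task asks the model to recover.
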

zerolang_editing/tasks.py ADDED

	@@ -0,0 +1,193 @@

+"""Synthetic task corpus for the Zerolang editing environment."""
+from __future__ import annotations
+from typing import Any
+from .task_builders import (
+    _branch_literal_task,
+    _call_task,
+    _condition_task,
+    _diagnostic_task,
+    _helper_task,
+    _literal_task,
+    _two_helper_task,
+)
+from .train_tasks import TRAIN_TASKS
+EVAL_TASKS: list[dict[str, Any]] = [
+    _literal_task(
+        "literal-string-graph-patch",
+        "hello from zero",
+        "hello graph",
+        "Change the printed string from hello from zero to hello graph.",
+    ),
+    _literal_task(
+        "repair-unknown-message",
+        "draft",
+        "fixed by zero",
+        'Replace the string literal "draft\\n" with "fixed by zero\\n".',
+    ),
+    _literal_task(
+        "literal-status-ready",
+        "status: draft",
+        "status: ready",
+        'Replace the string literal "status: draft\\n" with "status: ready\\n".',
+    ),
+    _literal_task(
+        "literal-counter-pass",
+        "counter failed",
+        "counter passed",
+        'Replace the string literal "counter failed\\n" with "counter passed\\n".',
+    ),
+    _literal_task(
+        "literal-agent-graph",
+        "agent used text",
+        "agent used graph",
+        'Replace the string literal "agent used text\\n" with "agent used graph\\n".',
+    ),
+    *[
+        _literal_task(task_id, old, new)
+        for task_id, old, new in [
+            ("literal-alpha-beta", "alpha", "beta"),
+            ("literal-start-finish", "start", "finish"),
+            ("literal-left-right", "left", "right"),
+            ("literal-plan-done", "plan pending", "plan done"),
+            ("literal-state-green", "state: red", "state: green"),
+            ("literal-cache-hot", "cache cold", "cache hot"),
+        ]
+    ],
+    *[
+        _literal_task(task_id, old, new, goal)
+        for task_id, old, new, goal in [
+            (
+                "literal-colon-version",
+                "status: init [v1]",
+                "status: init [v2]",
+                "Update the status bracket code from [v1] to [v2].",
+            ),
+            (
+                "literal-api-version",
+                "load path /api/v1/health",
+                "load path /api/v2/health",
+                "Switch the printed endpoint from v1 to v2 while keeping the same path.",
+            ),
+            (
+                "literal-score-number",
+                "score: 42/100",
+                "score: 99/100",
+                "Change the score text from 42 to 99.",
+            ),
+            (
+                "literal-status-code",
+                "error: [404] failed",
+                "error: [200] resolved",
+                "Change the bracketed status code from 404 to 200 and the trailing word from failed to resolved.",
+            ),
+            (
+                "literal-progress-percent",
+                "progress: 50% complete",
+                "progress: 75% complete",
+                "Update the progress percentage from 50 to 75.",
+            ),
+            (
+                "literal-time-stamp",
+                "time stamp 12:34",
+                "time stamp 13:00",
+                "Change the time from 12:34 to 13:00 in the output string.",
+            ),
+            (
+                "literal-list-separator",
+                "list [a/b/c]",
+                "list [a-b-c]",
+                "Adjust the list label to use dashes instead of slashes.",
+            ),
+            (
+                "literal-coordinate-label",
+                "coords (x:1,y:2)",
+                "coords (x:3,y:4)",
+                "Update the coordinate label from (x:1,y:2) to (x:3,y:4).",
+            ),
+        ]
+    ],
+    *[
+        _branch_literal_task(task_id, helper, old, new)
+        for task_id, helper, old, new in [
+            ("branch-literal-ready-version", "ready", "ready v1", "ready v2"),
+            ("branch-literal-mode-active", "can_send", "mode: standby", "mode: active"),
+            ("branch-literal-status-ok", "enabled", "status: ok [404]", "status: ok [200]"),
+            ("branch-literal-step-count", "feature_flag", "steps: 1/3 complete", "steps: 2/3 complete"),
+            ("branch-literal-coordinate", "should_emit", "coords (x:1,y:1)", "coords (x:2,y:2)"),
+            ("branch-literal-check-pass", "allow_output", "check: fail", "check: pass"),
+            ("branch-literal-health", "should_log", "health: warn", "health: ok"),
+            ("branch-literal-phase", "gate_open", "phase: draft", "phase: final"),
+        ]
+    ],
+    *[
+        _helper_task(task_id, helper, source_expr, target_expr, expected, output)
+        for task_id, helper, source_expr, target_expr, expected, output in [
+            ("helper-score-42", "score", "40 + 1", "40 + 2", 42, "ready"),
+            ("helper-answer-41", "answer", "20 + 20", "20 + 21", 41, "green light"),
+            ("helper-total-18", "total", "30 - 13", "30 - 12", 18, "total ok"),
+            ("helper-count-9", "count", "12 - 4", "12 - 3", 9, "count passed"),
+            ("helper-value-15", "value", "7 + 7", "7 + 8", 15, "value good"),
+            ("helper-limit-24", "limit", "50 - 27", "50 - 26", 24, "limit open"),
+            ("helper-score-31", "score", "15 + 15", "15 + 16", 31, "score matched"),
+            ("helper-answer-8", "answer", "10 - 3", "10 - 2", 8, "done"),
+        ]
+    ],
+    *[
+        _two_helper_task(task_id, helper, other, source_expr, target_expr, other_expr, expected)
+        for task_id, helper, other, source_expr, target_expr, other_expr, expected in [
+            ("two-helper-score", "score", "spare", "12 + 9", "13 + 9", "30 - 8", 22),
+            ("two-helper-total", "total", "backup", "40 - 19", "41 - 19", "5 + 7", 22),
+            ("two-helper-count", "count", "idle", "18 + 3", "19 + 3", "14 - 2", 22),
+            ("two-helper-value", "value", "other", "55 - 35", "56 - 35", "6 + 2", 21),
+            ("two-helper-answer", "answer", "spare", "27 - 7", "28 - 7", "8 + 1", 21),
+            ("two-helper-level", "level", "helper", "9 + 10", "10 + 10", "40 - 3", 20),
+            ("two-helper-points", "points", "extra", "64 - 46", "65 - 46", "11 + 4", 19),
+            ("two-helper-result", "result", "unused", "33 - 12", "34 - 12", "2 + 2", 22),
+        ]
+    ],
+    *[
+        _call_task(task_id, source_args, target_args, expected)
+        for task_id, source_args, target_args, expected in [
+            ("call-update-five", "1, 1", "4, 1", 5),
+            ("call-update-ten", "7, 1", "7, 3", 10),
+            ("call-update-eleven", "0, 6", "5, 6", 11),
+            ("call-update-twenty", "9, 9", "11, 9", 20),
+            ("call-update-seven", "2, 2", "5, 2", 7),
+            ("call-update-twelve", "10, 0", "10, 2", 12),
+            ("call-update-sixteen", "8, 4", "8, 8", 16),
+            ("call-update-thirteen", "6, 6", "6, 7", 13),
+        ]
+    ],
+    *[
+        _condition_task(task_id, helper, returned, source_compare, "match found")
+        for task_id, helper, returned, source_compare in [
+            ("condition-count-four", "count", 4, 1),
+            ("condition-level-nine", "level", 9, 2),
+            ("condition-token-twelve", "token", 12, 8),
+            ("condition-value-fifteen", "value", 15, 10),
+            ("condition-flag-six", "flag", 6, 3),
+            ("condition-score-eleven", "score", 11, 7),
+            ("condition-count-eight", "count", 8, 4),
+            ("condition-marker-fourteen", "marker", 14, 0),
+        ]
+    ],
+    *[
+        _diagnostic_task(task_id, message)
+        for task_id, message in [
+            ("diagnostic-starting-up", "starting up"),
+            ("diagnostic-hello-main", "hello from main"),
+            ("diagnostic-message", "diagnostic message"),
+            ("diagnostic-payload-logged", "payload logged"),
+            ("diagnostic-attempt-write", "attempt write"),
+            ("diagnostic-retrying-output", "retrying output"),
+            ("diagnostic-done", "done"),
+            ("diagnostic-needs-raises", "needs raises"),
+        ]
+    ],
+]
+SYNTHETIC_TASKS: list[dict[str, Any]] = [*EVAL_TASKS, *TRAIN_TASKS]
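
Because the helper and two-helper rows hard-code both an expression and its expected value, a small consistency check can catch arithmetic slips in the tuples. The sketch below is illustrative: `eval_simple` and `SAMPLE_ROWS` are hypothetical names, the rows are a subset copied from `EVAL_TASKS`, and it assumes every expression has the form `int op int` with `+` or `-`.

```python
import operator

OPS = {"+": operator.add, "-": operator.sub}


def eval_simple(expr: str) -> int:
    """Evaluate expressions of the form '<int> <+|-> <int>' without using eval()."""
    left, op, right = expr.split()
    return OPS[op](int(left), int(right))


# (task_id, source_expr, target_expr, expected), copied from EVAL_TASKS.
SAMPLE_ROWS = [
    ("helper-score-42", "40 + 1", "40 + 2", 42),
    ("helper-total-18", "30 - 13", "30 - 12", 18),
    ("helper-limit-24", "50 - 27", "50 - 26", 24),
    ("two-helper-value", "55 - 35", "56 - 35", 21),
    ("two-helper-points", "64 - 46", "65 - 46", 19),
]

for task_id, source_expr, target_expr, expected in SAMPLE_ROWS:
    # The target must satisfy the comparison; the source must not.
    assert eval_simple(target_expr) == expected, task_id
    assert eval_simple(source_expr) != expected, task_id

print("all sampled rows are consistent")
```

The same loop could be pointed at the full tuple lists if the task builders later expose them, for example as part of a unit test.
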

zerolang_editing/train_tasks.py ADDED

	@@ -0,0 +1,255 @@

+"""Synthetic training rows for the Zerolang editing environment."""
+from __future__ import annotations
+from typing import Any
+from .task_builders import (
+    _branch_literal_task,
+    _call_task,
+    _condition_task,
+    _diagnostic_task,
+    _helper_task,
+    _literal_task,
+    _two_helper_task,
+)
+LEGACY_TRAIN_TASKS: list[dict[str, Any]] = [
+    _helper_task(
+        "helper-return-update",
+        "answer",
+        "40 + 1",
+        "40 + 2",
+        42,
+        "math works",
+        split="train",
+    ),
+    _call_task("callee-argument-update", "2, 2", "2, 3", 5, split="train"),
+    _condition_task("comparison-target-update", "score", 7, 8, "ready", split="train"),
+    _diagnostic_task("fallible-main-repair", "needs raises", split="train"),
+]
+def _literal_train_tasks() -> list[dict[str, Any]]:
+    pairs = [
+        ("queue pending", "queue ready"),
+        ("job queued", "job running"),
+        ("job running", "job complete"),
+        ("build red", "build green"),
+        ("node cold", "node warm"),
+        ("cache miss", "cache hit"),
+        ("retry later", "retry now"),
+        ("draft note", "final note"),
+        ("plan open", "plan closed"),
+        ("graph stale", "graph fresh"),
+        ("route /v1/run", "route /v2/run"),
+        ("status [100]", "status [200]"),
+        ("phase: alpha", "phase: beta"),
+        ("phase: beta", "phase: gamma"),
+        ("step 1/4", "step 2/4"),
+        ("step 2/4", "step 3/4"),
+        ("score 10/20", "score 18/20"),
+        ("level: low", "level: high"),
+        ("mode manual", "mode auto"),
+        ("window closed", "window open"),
+        ("target west", "target east"),
+        ("port 3000", "port 8080"),
+        ("run id a1", "run id b2"),
+        ("batch small", "batch large"),
+        ("token old", "token new"),
+        ("edge loose", "edge locked"),
+        ("module local", "module remote"),
+        ("worker idle", "worker busy"),
+        ("agent paused", "agent active"),
+        ("output empty", "output full"),
+        ("index 0", "index 1"),
+        ("flag off", "flag on"),
+        ("signal weak", "signal strong"),
+        ("health warn", "health pass"),
+        ("check skipped", "check passed"),
+        ("ticket open", "ticket merged"),
+        ("snapshot old", "snapshot new"),
+        ("profile dev", "profile prod"),
+        ("version 0.1", "version 0.2"),
+        ("result unknown", "result known"),
+    ]
+    return [
+        _literal_task(f"train-literal-{index:03d}", old, new, split="train")
+        for index, (old, new) in enumerate(pairs, start=1)
+    ]
+def _branch_literal_train_tasks() -> list[dict[str, Any]]:
+    specs = [
+        ("ready_gate", "gate draft", "gate ready"),
+        ("emit_gate", "emit old", "emit new"),
+        ("mode_gate", "mode test", "mode live"),
+        ("route_gate", "route blue", "route green"),
+        ("status_gate", "status low", "status high"),
+        ("phase_gate", "phase one", "phase two"),
+        ("counter_gate", "count fail", "count pass"),
+        ("worker_gate", "worker wait", "worker run"),
+        ("deploy_gate", "deploy hold", "deploy ship"),
+        ("review_gate", "review open", "review done"),
+        ("graph_gate", "graph dirty", "graph clean"),
+        ("patch_gate", "patch text", "patch graph"),
+        ("score_gate", "score bad", "score good"),
+        ("plan_gate", "plan rough", "plan exact"),
+        ("test_gate", "test flaky", "test stable"),
+        ("queue_gate", "queue blocked", "queue clear"),
+        ("cache_gate", "cache cold", "cache hot"),
+        ("trace_gate", "trace off", "trace on"),
+        ("run_gate", "run dry", "run real"),
+        ("sync_gate", "sync stale", "sync current"),
+    ]
+    return [
+        _branch_literal_task(f"train-branch-literal-{index:03d}", helper, old, new, split="train")
+        for index, (helper, old, new) in enumerate(specs, start=1)
+    ]
+def _helper_train_tasks() -> list[dict[str, Any]]:
+    helpers = ["answer", "score", "total", "count", "value", "limit", "level", "points"]
+    outputs = ["ok", "ready", "matched", "accepted", "passed", "open", "done", "green"]
+    tasks: list[dict[str, Any]] = []
+    for index in range(1, 26):
+        left = 10 + index
+        target_right = 3 + (index % 9)
+        source_right = target_right - 1
+        expected = left + target_right
+        tasks.append(
+            _helper_task(
+                f"train-helper-add-{index:03d}",
+                helpers[index % len(helpers)],
+                f"{left} + {source_right}",
+                f"{left} + {target_right}",
+                expected,
+                outputs[index % len(outputs)],
+                split="train",
+            )
+        )
+    for index in range(1, 26):
+        left = 60 + index
+        target_right = 5 + (index % 11)
+        source_right = target_right + 1
+        expected = left - target_right
+        tasks.append(
+            _helper_task(
+                f"train-helper-sub-{index:03d}",
+                helpers[(index + 3) % len(helpers)],
+                f"{left} - {source_right}",
+                f"{left} - {target_right}",
+                expected,
+                outputs[(index + 2) % len(outputs)],
+                split="train",
+            )
+        )
+    return tasks
+def _two_helper_train_tasks() -> list[dict[str, Any]]:
+    primary_helpers = ["score", "total", "count", "value", "answer", "level", "points", "result"]
+    other_helpers = ["spare", "backup", "idle", "other", "side", "helper", "extra", "unused"]
+    tasks: list[dict[str, Any]] = []
+    for index in range(1, 21):
+        left = 20 + index
+        target_right = 2 + (index % 7)
+        source_right = target_right - 1
+        expected = left + target_right
+        other_expr = f"{4 + index % 6} + {8 + index % 5}"
+        tasks.append(
+            _two_helper_task(
+                f"train-two-helper-{index:03d}",
+                primary_helpers[index % len(primary_helpers)],
+                other_helpers[index % len(other_helpers)],
+                f"{left} + {source_right}",
+                f"{left} + {target_right}",
+                other_expr,
+                expected,
+                split="train",
+            )
+        )
+    return tasks
+def _call_train_tasks() -> list[dict[str, Any]]:
+    tasks: list[dict[str, Any]] = []
+    for index in range(1, 31):
+        left = 1 + (index % 17)
+        target_right = 2 + (index % 13)
+        source_right = target_right - 1
+        expected = left + target_right
+        tasks.append(
+            _call_task(
+                f"train-call-update-{index:03d}",
+                f"{left}, {source_right}",
+                f"{left}, {target_right}",
+                expected,
+                split="train",
+            )
+        )
+    return tasks
+def _condition_train_tasks() -> list[dict[str, Any]]:
+    helpers = ["score", "count", "level", "token", "value", "flag", "marker", "limit"]
+    tasks: list[dict[str, Any]] = []
+    for index in range(1, 26):
+        returned = 5 + (index * 3)
+        source_compare = returned + 1
+        tasks.append(
+            _condition_task(
+                f"train-condition-update-{index:03d}",
+                helpers[index % len(helpers)],
+                returned,
+                source_compare,
+                "matched",
+                split="train",
+            )
+        )
+    return tasks
+def _diagnostic_train_tasks() -> list[dict[str, Any]]:
+    messages = [
+        "train starting",
+        "train ready",
+        "diagnostic pass",
+        "writer needs raises",
+        "output accepted",
+        "payload saved",
+        "attempt complete",
+        "retry complete",
+        "batch emitted",
+        "sample logged",
+        "graph checked",
+        "patch validated",
+        "route verified",
+        "state stored",
+        "run complete",
+        "score written",
+        "marker emitted",
+        "world write",
+        "tool output",
+        "final line",
+    ]
+    return [
+        _diagnostic_task(f"train-diagnostic-{index:03d}", message, split="train")
+        for index, message in enumerate(messages, start=1)
+    ]
+TRAIN_TASKS: list[dict[str, Any]] = [
+    *_literal_train_tasks(),
+    *_branch_literal_train_tasks(),
+    *LEGACY_TRAIN_TASKS,
+    *_helper_train_tasks(),
+    *_two_helper_train_tasks(),
+    *_call_train_tasks(),
+    *_condition_train_tasks(),
+    *_diagnostic_train_tasks(),
+]
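
Review note: a minimal sketch of the arithmetic behind `_call_train_tasks`, handy for checking the generated tasks by hand. The helper name `call_task_numbers` is hypothetical and is not part of the package. Only the formulas are taken from the loop above; the `_call_task` builder itself is defined elsewhere and is not reproduced here.

```python
def call_task_numbers(index: int) -> dict[str, int]:
    """Mirror the per-index arithmetic of `_call_train_tasks` (illustrative only)."""
    left = 1 + (index % 17)
    target_right = 2 + (index % 13)
    source_right = target_right - 1  # the source program starts one below the target
    return {
        "left": left,
        "source_right": source_right,
        "target_right": target_right,
        "expected": left + target_right,
    }


if __name__ == "__main__":
    for index in (1, 13, 30):
        print(index, call_task_numbers(index))
```

Because 17 and 13 are coprime, the `(left, target_right)` pair is distinct for every index in `range(1, 31)`, so the thirty call tasks do not repeat an argument pair.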

zerolang_editing/zero_tools.py ADDED Viewed

	@@ -0,0 +1,310 @@

+"""Path-based Zerolang compiler tools for the editing environment."""
+from __future__ import annotations
+import hashlib
+import json
+import os
+import platform
+import re
+import shutil
+import subprocess
+import tempfile
+import threading
+import urllib.request
+from pathlib import Path
+from typing import Any
+_ZERO_INSTALL_LOCK = threading.Lock()
+def _download(url: str, timeout: int = 60) -> bytes:
+    with urllib.request.urlopen(url, timeout=timeout) as response:
+        return response.read()
+def _zero_asset_candidates() -> list[str]:
+    system = platform.system()
+    machine = platform.machine().lower()
+    if machine in {"arm64", "aarch64"}:
+        cpu = "arm64"
+    elif machine in {"x86_64", "amd64"}:
+        cpu = "x64"
+    else:
+        return []
+    if system == "Darwin":
+        return [f"zero-darwin-{cpu}"]
+    if system == "Linux":
+        return [f"zero-linux-musl-{cpu}", f"zero-linux-{cpu}"]
+    return []
+def _install_zero_binary() -> str | None:
+    install_dir = Path(
+        os.environ.get("ZERO_INSTALL_DIR")
+        or Path(tempfile.gettempdir()) / "zerolang-editing-zero" / "bin"
+    ).expanduser()
+    binary = install_dir / "zero"
+    if binary.exists():
+        return str(binary)
+    with _ZERO_INSTALL_LOCK:
+        if binary.exists():
+            return str(binary)
+        base_url = os.environ.get(
+            "ZERO_DOWNLOAD_BASE_URL",
+            "https://github.com/vercel-labs/zero/releases/latest/download",
+        ).rstrip("/")
+        checksums_text = _download(f"{base_url}/CHECKSUMS.txt").decode()
+        checksums = {}
+        for line in checksums_text.splitlines():
+            parts = line.split()
+            if len(parts) >= 2:
+                checksums[parts[1]] = parts[0]
+        install_dir.mkdir(parents=True, exist_ok=True)
+        last_error: Exception | None = None
+        for asset in _zero_asset_candidates():
+            try:
+                data = _download(f"{base_url}/{asset}")
+                expected = checksums.get(asset)
+                actual = hashlib.sha256(data).hexdigest()
+                if expected and actual != expected:
+                    raise RuntimeError(f"checksum mismatch for {asset}")
+                binary.write_bytes(data)
+                os.chmod(binary, 0o755)
+                check = subprocess.run(
+                    [str(binary), "--version"],
+                    text=True,
+                    capture_output=True,
+                    timeout=10,
+                )
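+                if check.returncode == 0:
+                    return str(binary)
+                # Treat a failing `--version` probe like any other install error so the
+                # except branch removes the unusable binary instead of leaving it on disk.
+                raise RuntimeError(f"{asset} failed its --version check")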
+            except Exception as exc:
+                last_error = exc
+                if binary.exists():
+                    binary.unlink()
+                continue
+        if last_error is not None:
+            raise RuntimeError(f"failed to install zero binary: {last_error}") from last_error
+    return None
+def _zero_binary(zero_path: str | None = None) -> str | None:
+    candidates = [
+        zero_path,
+        shutil.which("zero"),
+        str(Path.home() / ".zero" / "bin" / "zero"),
+    ]
+    for candidate in candidates:
+        if candidate and Path(candidate).exists():
+            return candidate
+    return _install_zero_binary()
+def _json_tool_result(result: dict[str, Any]) -> str:
+    return json.dumps(result, indent=2, sort_keys=True)
+def read_source(path: str | Path) -> str:
+    return Path(path).read_text()
+def _source_fingerprint(path: Path) -> dict[str, Any]:
+    if not path.exists():
+        return {"path": str(path), "exists": False}
+    data = path.read_bytes()
+    return {
+        "path": str(path),
+        "exists": True,
+        "bytes": len(data),
+        "source_sha256": hashlib.sha256(data).hexdigest(),
+    }
+def _summarize_graph_dump(graph_dump: str) -> dict[str, Any]:
+    summary: dict[str, Any] = {
+        "hash": None,
+        "literals": [],
+        "functions": [],
+        "calls": [],
+        "identifiers": [],
+    }
+    for line in graph_dump.splitlines():
+        if line.startswith("hash "):
+            summary["hash"] = line.split('"', 2)[1]
+        elif " Literal " in line:
+            match = re.match(r'node (#[0-9a-f]+) Literal type:"([^"]+)" value:"(.*)"', line)
+            if match:
+                summary["literals"].append(
+                    {"node": match.group(1), "type": match.group(2), "value": match.group(3)}
+                )
+        elif " Function " in line:
+            match = re.match(r'node (#[0-9a-f]+) Function name:"([^"]+)" type:"([^"]+)"', line)
+            if match:
+                summary["functions"].append(
+                    {"node": match.group(1), "name": match.group(2), "type": match.group(3)}
+                )
+        elif " MethodCall " in line:
+            match = re.match(r'node (#[0-9a-f]+) MethodCall name:"([^"]+)" type:"([^"]+)"', line)
+            if match:
+                summary["calls"].append(
+                    {"node": match.group(1), "name": match.group(2), "type": match.group(3)}
+                )
+        elif " Identifier " in line:
+            match = re.match(r'node (#[0-9a-f]+) Identifier name:"([^"]+)"', line)
+            if match:
+                summary["identifiers"].append({"node": match.group(1), "name": match.group(2)})
+    return summary
+def run_zero_path(args: list[str], path: str | Path, zero_path: str | None = None) -> dict[str, Any]:
+    binary = _zero_binary(zero_path)
+    source_path = Path(path)
+    if binary is None:
+        return {
+            "ok": False,
+            "tool_error": "zero binary not found; install with https://zerolang.ai/install.sh",
+            **_source_fingerprint(source_path),
+        }
+    if not source_path.exists():
+        return {
+            "ok": False,
+            "tool_error": f"source file does not exist: {source_path}",
+            **_source_fingerprint(source_path),
+        }
+    proc = subprocess.run(
+        [binary, *args, str(source_path)],
+        text=True,
+        capture_output=True,
+        timeout=10,
+        env={**os.environ, "PATH": f"{Path(binary).parent}{os.pathsep}{os.environ.get('PATH', '')}"},
+    )
+    return {
+        "ok": proc.returncode == 0,
+        "returncode": proc.returncode,
+        "stdout": proc.stdout[-12000:],
+        "stderr": proc.stderr[-4000:],
+        **_source_fingerprint(source_path),
+    }
+def run_zero_source(
+    args: list[str], source: str, zero_path: str | None = None
+) -> dict[str, Any]:
+    with tempfile.TemporaryDirectory(prefix="zerolang-editing-score-") as tmp:
+        source_path = Path(tmp) / "program.0"
+        source_path.write_text(source)
+        return run_zero_path(args, source_path, zero_path)
+def make_zero_tools(zero_path: str | None = None) -> list[Any]:
+    def zero_check(path: str) -> str:
+        """Run `zero check --json` on a `.0` file path on disk."""
+        return _json_tool_result(run_zero_path(["check", "--json"], path, zero_path))
+    def zero_graph_summary(path: str) -> str:
+        """Return compact graph hash and patchable node facts for a `.0` file path."""
+        result = run_zero_path(["graph", "dump"], path, zero_path)
+        if result.get("ok"):
+            result["summary"] = _summarize_graph_dump(result.get("stdout", ""))
+        return _json_tool_result(result)
+    def zero_graph_dump(path: str) -> str:
+        """Run `zero graph dump` on a `.0` file path on disk."""
+        return _json_tool_result(run_zero_path(["graph", "dump"], path, zero_path))
+    def zero_graph_json(path: str) -> str:
+        """Run `zero graph --json` on a `.0` file path on disk."""
+        return _json_tool_result(run_zero_path(["graph", "--json"], path, zero_path))
+    def zero_fix_plan(path: str) -> str:
+        """Run `zero fix --plan --json` on a `.0` file path on disk."""
+        return _json_tool_result(run_zero_path(["fix", "--plan", "--json"], path, zero_path))
+    def zero_graph_patch(path: str, expect_graph_hash: str, op: str) -> str:
+        """Apply one checked `zero graph patch` operation to a `.0` file path on disk."""
+        binary = _zero_binary(zero_path)
+        source_path = Path(path)
+        if binary is None:
+            return _json_tool_result(
+                {
+                    "ok": False,
+                    "tool_error": "zero binary not found; install with https://zerolang.ai/install.sh",
+                    **_source_fingerprint(source_path),
+                }
+            )
+        if not source_path.exists():
+            return _json_tool_result(
+                {
+                    "ok": False,
+                    "tool_error": f"source file does not exist: {source_path}",
+                    **_source_fingerprint(source_path),
+                }
+            )
+        proc = subprocess.run(
+            [
+                binary,
+                "graph",
+                "patch",
+                str(source_path),
+                "--expect-graph-hash",
+                expect_graph_hash,
+                "--op",
+                op,
+            ],
+            text=True,
+            capture_output=True,
+            timeout=10,
+            env={**os.environ, "PATH": f"{Path(binary).parent}{os.pathsep}{os.environ.get('PATH', '')}"},
+        )
+        return _json_tool_result(
+            {
+                "ok": proc.returncode == 0,
+                "returncode": proc.returncode,
+                "stdout": proc.stdout[-12000:],
+                "stderr": proc.stderr[-4000:],
+                **_source_fingerprint(source_path),
+            }
+        )
+    def zero_skills_get(skill: str) -> str:
+        """Return version-matched Zerolang guidance for `language`, `diagnostics`, `stdlib`, or `zero`."""
+        binary = _zero_binary(zero_path)
+        if binary is None:
+            return _json_tool_result(
+                {
+                    "ok": False,
+                    "tool_error": "zero binary not found; install with https://zerolang.ai/install.sh",
+                }
+            )
+        proc = subprocess.run(
+            [binary, "skills", "get", skill],
+            text=True,
+            capture_output=True,
+            timeout=10,
+            env={**os.environ, "PATH": f"{Path(binary).parent}{os.pathsep}{os.environ.get('PATH', '')}"},
+        )
+        return _json_tool_result(
+            {
+                "ok": proc.returncode == 0,
+                "returncode": proc.returncode,
+                "stdout": proc.stdout[-12000:],
+                "stderr": proc.stderr[-4000:],
+            }
+        )
+    return [
+        zero_check,
+        zero_graph_summary,
+        zero_graph_dump,
+        zero_graph_json,
+        zero_fix_plan,
+        zero_graph_patch,
+        zero_skills_get,
+    ]
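
Review note: a small sketch of how the literal-extraction branch of `_summarize_graph_dump` behaves. The regex is copied from the function above. The dump text is hypothetical: its shape is inferred from the regexes, not taken from real `zero graph dump` output, and `literal_facts` is an illustrative name rather than a function in the package.

```python
import re

# Same pattern as the Literal branch in `_summarize_graph_dump`.
LITERAL_RE = re.compile(r'node (#[0-9a-f]+) Literal type:"([^"]+)" value:"(.*)"')

# Hypothetical dump text; the real CLI output may differ.
SAMPLE_DUMP = '''hash "9f2c41"
node #1a Function name:"main" type:"fn"
node #2b Literal type:"i32" value:"41"
'''


def literal_facts(graph_dump: str) -> dict:
    """Collect the graph hash and literal nodes from a dump-like string."""
    facts = {"hash": None, "literals": []}
    for line in graph_dump.splitlines():
        if line.startswith("hash "):
            parts = line.split('"', 2)
            # Guard against a hash line without quotes instead of raising IndexError.
            facts["hash"] = parts[1] if len(parts) > 1 else None
        elif " Literal " in line:
            match = LITERAL_RE.match(line)
            if match:
                facts["literals"].append(
                    {"node": match.group(1), "type": match.group(2), "value": match.group(3)}
                )
    return facts


if __name__ == "__main__":
    print(literal_facts(SAMPLE_DUMP))
```

The hash and the literal node id are the two facts a caller needs before building a `zero_graph_patch` call, which is why the summary tool surfaces them separately from the raw dump.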

zerolang_editing/zerolang_editing.py ADDED Viewed

	@@ -0,0 +1,418 @@

+"""Prime Verifiers environment for Zerolang graph-first editing."""
+from __future__ import annotations
+import json
+import os
+import re
+import tempfile
+from collections.abc import Mapping
+from pathlib import Path
+from typing import Any
+from datasets import Dataset
+import verifiers as vf
+from .tasks import SYNTHETIC_TASKS
+from .zero_tools import make_zero_tools, read_source, run_zero_path, run_zero_source
+SYSTEM_PROMPT = """\
+You are Roder, a coding agent running in an evaluation harness.
+Complete the requested code edit, use the available tools when they are useful,
+and return a concise final answer. The task source is already written to disk;
+operate on the provided `.0` file path.
+"""
+ZERO_FILE_PLACEHOLDER = "{{ZERO_FILE_PATH}}"
+def _normalize_source(source: str) -> str:
+    return "\n".join(line.rstrip() for line in source.strip().splitlines()).strip()
+def _message_role(message: Any) -> str | None:
+    if isinstance(message, dict):
+        return message.get("role")
+    return getattr(message, "role", None)
+def _message_content(message: Any) -> str:
+    if isinstance(message, dict):
+        content = message.get("content", "")
+    else:
+        content = getattr(message, "content", "")
+    return content if isinstance(content, str) else str(content)
+def _completion_text(completion: Any) -> str:
+    if isinstance(completion, str):
+        return completion
+    for message in reversed(completion or []):
+        if _message_role(message) == "assistant":
+            content = _message_content(message)
+            if content:
+                return content
+    return _message_content((completion or [{}])[-1]) if completion else ""
+def _extract_json_payload(completion: Any) -> dict[str, Any] | None:
+    text = _completion_text(completion).strip()
+    fenced_json = re.search(r"```json\s*(.*?)```", text, re.DOTALL | re.IGNORECASE)
+    if fenced_json:
+        text = fenced_json.group(1).strip()
+    for candidate in (text, None):
+        if candidate is None:
+            object_match = re.search(r"\{.*\}", text, re.DOTALL)
+            if object_match is None:
+                continue
+            candidate = object_match.group(0)
+        try:
+            payload = json.loads(candidate)
+        except json.JSONDecodeError:
+            continue
+        if isinstance(payload, dict):
+            return payload
+    return None
+def _extract_final_source(completion: Any) -> str:
+    payload = _extract_json_payload(completion)
+    if payload is not None and isinstance(payload.get("final_source"), str):
+        return payload["final_source"]
+    text = _completion_text(completion).strip()
+    fenced_zero = re.search(r"```(?:zero|0)?\s*(.*?)```", text, re.DOTALL | re.IGNORECASE)
+    if fenced_zero:
+        return fenced_zero.group(1).strip()
+    return text
+def _state_file_path(state: Any) -> str | None:
+    if isinstance(state, dict):
+        path = state.get("zero_file_path")
+        return path if isinstance(path, str) else None
+    return None
+def _state_file_source(state: Any) -> str:
+    path = _state_file_path(state)
+    if not path:
+        return ""
+    try:
+        return read_source(path)
+    except OSError:
+        return ""
+def _scored_source(completion: Any, state: Any = None) -> str:
+    disk_source = _state_file_source(state)
+    if disk_source:
+        return disk_source
+    return _extract_final_source(completion)
+def _tool_was_called(state: Any) -> bool:
+    for turn in (state or {}).get("trajectory", []):
+        for message in turn.get("completion", []):
+            tool_calls = getattr(message, "tool_calls", None)
+            if tool_calls:
+                return True
+            if isinstance(message, dict) and message.get("tool_calls"):
+                return True
+    return False
+def _make_prompt(row: dict[str, Any]) -> list[dict[str, str]]:
+    return [
+        {
+            "role": "user",
+            "content": (
+                f"Task id: {row['id']}\n"
+                f"Edit goal: {row['goal']}\n\n"
+                "The Zerolang source has been written to this file:\n"
+                f"{ZERO_FILE_PLACEHOLDER}\n\n"
+                "Use tool arguments with `path` set to that `.0` file. "
+                "The grader will read the edited file from disk and run `zero check` on it. "
+                "Return a JSON object with `path` when finished."
+            ),
+        }
+    ]
+def _build_dataset(split: str, max_examples: int | None) -> Dataset:
+    rows: list[dict[str, Any]] = []
+    for task in SYNTHETIC_TASKS:
+        if split != "all" and task["split"] != split:
+            continue
+        rows.append(
+            {
+                "prompt": _make_prompt(task),
+                "answer": task["target_source"],
+                "info": json.dumps(
+                    {
+                        "id": task["id"],
+                        "category": task["category"],
+                        "split": task["split"],
+                        "goal": task["goal"],
+                        "source": task["source"],
+                        "target_source": task["target_source"],
+                    }
+                ),
+            }
+        )
+    if max_examples is not None:
+        rows = rows[: int(max_examples)]
+    return Dataset.from_list(rows)
+def _workspace_root() -> Path:
+    configured = os.environ.get("ZEROLANG_EDITING_WORKDIR")
+    if configured:
+        return Path(configured).expanduser()
+    return Path(tempfile.gettempdir()) / "zerolang-editing-rollouts"
+def _safe_task_id(task_id: str) -> str:
+    return re.sub(r"[^A-Za-z0-9_.-]+", "-", task_id).strip("-") or "task"
+def _replace_prompt_path(messages: Any, path: str) -> None:
+    for message in messages or []:
+        if isinstance(message, dict):
+            content = message.get("content")
+            if isinstance(content, str):
+                message["content"] = content.replace(ZERO_FILE_PLACEHOLDER, path)
+            continue
+        content = getattr(message, "content", None)
+        if isinstance(content, str):
+            setattr(message, "content", content.replace(ZERO_FILE_PLACEHOLDER, path))
+def _is_relative_to(child: Path, parent: Path) -> bool:
+    try:
+        child.relative_to(parent)
+        return True
+    except ValueError:
+        return False
+class ZerolangPathToolEnv(vf.StatefulToolEnv):
+    """Tool environment that creates one on-disk `.0` file per rollout."""
+    def __init__(self, *args: Any, workspace_root: Path | None = None, **kwargs: Any):
+        super().__init__(*args, **kwargs)
+        self.workspace_root = workspace_root or _workspace_root()
+    async def setup_state(self, state: vf.State) -> None:
+        info = state.get("info") or {}
+        source = info.get("source") if isinstance(info, dict) else None
+        if not isinstance(source, str) or not source.strip():
+            raise ValueError("zerolang-editing rows must include info.source")
+        task_id = info.get("id", "task") if isinstance(info, dict) else "task"
+        workspace = self.workspace_root / f"{_safe_task_id(str(task_id))}-{state['trajectory_id']}"
+        workspace.mkdir(parents=True, exist_ok=True)
+        file_path = workspace / "program.0"
+        file_path.write_text(source)
+        state["zero_workspace"] = str(workspace.resolve())
+        state["zero_file_path"] = str(file_path.resolve())
+        _replace_prompt_path(state.get("prompt"), state["zero_file_path"])
+    def update_tool_args(
+        self,
+        tool_name: str,
+        tool_args: dict,
+        messages: vf.Messages,
+        state: vf.State,
+        **kwargs: Any,
+    ) -> dict:
+        if "source" in tool_args:
+            raise ValueError("Zerolang tools operate on `path`; do not pass source text.")
+        if tool_name == "zero_skills_get":
+            return tool_args
+        workspace = Path(str(state["zero_workspace"])).resolve()
+        fallback = Path(str(state["zero_file_path"])).resolve()
+        raw_value = tool_args.get("path")
+        correction_reason: str | None = None
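+        # Compare explicitly: a set-membership test raises TypeError when the model
+        # passes an unhashable value such as a list or dict for `path`.
+        if raw_value is None or raw_value == "":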
+            resolved = fallback
+            correction_reason = "missing_path"
+        else:
+            raw_path = Path(str(raw_value)).expanduser()
+            resolved = (
+                (workspace / raw_path).resolve()
+                if not raw_path.is_absolute()
+                else raw_path.resolve()
+            )
+            if not _is_relative_to(resolved, workspace):
+                resolved = fallback
+                correction_reason = "outside_workspace"
+            elif resolved.suffix != ".0":
+                resolved = fallback
+                correction_reason = "non_zero_path"
+        if correction_reason is not None:
+            state.setdefault("zero_path_arg_corrections", []).append(
+                {
+                    "tool_name": tool_name,
+                    "reason": correction_reason,
+                    "raw_path": "" if raw_value is None else str(raw_value),
+                }
+            )
+        tool_args["path"] = str(resolved)
+        return tool_args
+async def target_source_match(completion: Any, answer: str, state: Any = None, **_: Any) -> float:
+    scored_source = _scored_source(completion, state)
+    return 1.0 if _normalize_source(scored_source) == _normalize_source(answer) else 0.0
+async def zero_check_pass(completion: Any, state: Any = None, **_: Any) -> float:
+    path = _state_file_path(state)
+    if path and Path(path).exists():
+        result = run_zero_path(["check", "--json"], path)
+    else:
+        final_source = _extract_final_source(completion)
+        if not final_source.strip():
+            return 0.0
+        result = run_zero_source(["check", "--json"], final_source)
+    if not result.get("ok"):
+        return 0.0
+    try:
+        parsed = json.loads(result.get("stdout") or "{}")
+    except json.JSONDecodeError:
+        return 0.0
+    return 1.0 if parsed.get("ok") is True else 0.0
+def _walk_graph_patch_payloads(value: Any, seen: set[int] | None = None):
+    if seen is None:
+        seen = set()
+    value_id = id(value)
+    if value_id in seen:
+        return
+    seen.add(value_id)
+    if isinstance(value, Mapping):
+        stdout = value.get("stdout")
+        if isinstance(stdout, str) and "program graph patch ok" in stdout:
+            yield value
+        for item in value.values():
+            yield from _walk_graph_patch_payloads(item, seen)
+        return
+    if isinstance(value, (list, tuple)):
+        for item in value:
+            yield from _walk_graph_patch_payloads(item, seen)
+        return
+    if isinstance(value, str) and "program graph patch ok" in value:
+        try:
+            parsed = json.loads(value)
+        except json.JSONDecodeError:
+            return
+        yield from _walk_graph_patch_payloads(parsed, seen)
+        return
+    model_dump = getattr(value, "model_dump", None)
+    if callable(model_dump):
+        try:
+            yield from _walk_graph_patch_payloads(model_dump(), seen)
+        except Exception:
+            pass
+        return
+    attrs = getattr(value, "__dict__", None)
+    if isinstance(attrs, dict):
+        yield from _walk_graph_patch_payloads(attrs, seen)
+async def graph_patch_success(
+    completion: Any = None, state: Any = None, answer: str = "", **_: Any
+) -> float:
+    search_root = (state or {}).get("trajectory", state or {}) if isinstance(state, Mapping) else state
+    for payload in _walk_graph_patch_payloads(search_root or {}):
+        path = payload.get("path")
+        if isinstance(path, str):
+            try:
+                patched_source = read_source(path)
+            except OSError:
+                continue
+            if _normalize_source(patched_source) == _normalize_source(answer):
+                return 1.0
+    text = _completion_text(completion)
+    if _tool_was_called(state) and "graph_patch" in text:
+        scored_source = _scored_source(completion, state)
+        if _normalize_source(scored_source) == _normalize_source(answer):
+            return 1.0
+    return 0.0
+async def zerolang_surface_used(completion: Any, state: Any = None, **_: Any) -> float:
+    # The marker list below already contains "zero_graph_patch" and "graph_patch",
+    # so a single scan of the final text covers the tool-call case as well.
+    text = _completion_text(completion).lower()
+    markers = [
+        "zero_graph_patch",
+        "graph_patch",
+        "expect_graph_hash",
+        "set node=",
+        "field=\"value\"",
+        "expect=",
+        "graph hash",
+        "node #",
+    ]
+    return 1.0 if any(marker in text for marker in markers) else 0.0
+async def path_argument_valid(completion: Any, state: Any = None, **_: Any) -> float:
+    if not _tool_was_called(state):
+        return 0.0
+    corrections = (state or {}).get("zero_path_arg_corrections", [])
+    return 0.0 if corrections else 1.0
+def load_environment(
+    split: str = "eval",
+    max_examples: int | None = None,
+    max_turns: int = 6,
+    zero_path: str | None = None,
+    enable_tools: bool = True,
+    **_: Any,
+) -> vf.Environment:
+    """Load the Zerolang editing environment."""
+    if split not in {"train", "eval", "all"}:
+        raise ValueError("split must be one of: train, eval, all")
+    dataset = _build_dataset(split=split, max_examples=max_examples)
+    rubric = vf.Rubric(
+        funcs=[
+            graph_patch_success,
+            target_source_match,
+            zero_check_pass,
+            zerolang_surface_used,
+            path_argument_valid,
+        ],
+        weights=[0.50, 0.20, 0.15, 0.10, 0.05],
+    )
+    if enable_tools:
+        return ZerolangPathToolEnv(
+            dataset=dataset,
+            rubric=rubric,
+            tools=make_zero_tools(zero_path),
+            max_turns=max_turns,
+            system_prompt=SYSTEM_PROMPT,
+        )
+    return vf.SingleTurnEnv(dataset=dataset, rubric=rubric, system_prompt=SYSTEM_PROMPT)
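
Review note: the path handling in `ZerolangPathToolEnv.update_tool_args` can be exercised without the Verifiers runtime. The sketch below restates that logic as a standalone function under the assumption that only the workspace directory and the fallback file path matter. `resolve_tool_path` is a hypothetical name, not an API of this package or of `verifiers`.

```python
import tempfile
from pathlib import Path


def resolve_tool_path(raw_value, workspace: Path, fallback: Path):
    """Return (path, correction_reason); the reason is None when the argument was usable."""
    workspace = workspace.resolve()
    if raw_value is None or raw_value == "":
        return fallback, "missing_path"
    raw_path = Path(str(raw_value)).expanduser()
    # Relative paths are interpreted inside the rollout workspace.
    resolved = raw_path.resolve() if raw_path.is_absolute() else (workspace / raw_path).resolve()
    try:
        resolved.relative_to(workspace)
    except ValueError:
        return fallback, "outside_workspace"
    if resolved.suffix != ".0":
        return fallback, "non_zero_path"
    return resolved, None


if __name__ == "__main__":
    workspace = Path(tempfile.mkdtemp()).resolve()
    fallback = workspace / "program.0"
    for candidate in ("program.0", "../escape.0", "notes.txt", None):
        print(repr(candidate), resolve_tool_path(candidate, workspace, fallback))
```

Every rejected argument falls back to the rollout's own file and records a reason, which is what lets `path_argument_valid` score 0.0 for a rollout that needed any correction.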