| tags: | |
| - zerolang | |
| - reinforcement-learning | |
| - verifiers | |
| - code-editing | |
| - tool-use | |
| - graph-editing | |
| - laguna-xs2 | |
| - lora | |
| - fine-tune | |
| - peft | |
| license: apache-2.0 | |
| base_model: poolside/Laguna-XS.2 | |
| base_model_relation: finetune | |
| library_name: peft | |
| pipeline_tag: text-generation | |
| # Zerolang Editing | |
| `zerolang-editing` is a Verifiers/Prime RL environment for training coding agents | |
| to edit [Zerolang](https://github.com/vercel-labs/zerolang) programs through | |
| checked graph edits instead of loose text replacement. | |
| The RL harness is built around [Roder](https://roder.sh), using a custom | |
| zero-coder plugin/distribution that exposes a Zerolang-only graph toolset to the | |
| model. In training, generic source editing tools are disabled; the agent is | |
| expected to use the `zero_*` tools below, especially `zero_graph_summary` and | |
| `zero_graph_patch`, against the rollout file on disk. | |
| The core task is intentionally narrow: each rollout starts with a `.0` source | |
| file already written to disk, asks the model for a semantic code edit, and | |
| scores the edited file after the model uses Zerolang tooling. The intended | |
| successful behavior is: | |
| 1. Inspect the file with Zerolang graph/check tools. | |
| 2. Identify the relevant graph hash and semantic node. | |
| 3. Apply a checked `zero graph patch` operation to the on-disk file. | |
| 4. Finish with a compact JSON response pointing at the edited path. | |
| This repository contains the environment source package, synthetic task | |
| builders, tool wrappers, and documentation. The trained checkpoint from hosted | |
| RL runs is published separately by the training service when a run is finalized. | |
| <!-- zerolang-training-report:start --> | |
| ## Training Report: Laguna XS.2 Zerolang Editing LoRA | |
| This report covers the hosted Prime RL runs used to select the published LoRA adapter at `loras/laguna-xs2-zerolang-editing-step75-hdznmf/`. | |
| **Selected checkpoint:** run `hdznmfje3xv0clwhu9sx4b0n`, checkpoint `qyxg7ya6x53ntmfjerp11gah`, step 75. It reached the best held-out eval score, Avg@1 `0.6604`, on `pandelis/zerolang-editing@0.1.11` with `poolside/Laguna-XS.2`. | |
| ### Run Summary | |
| | Run | Env | LR | Max tokens | Eval examples | Peak Avg@1 | Final Avg@1 | Finding | | |
| | --- | --- | ---: | ---: | ---: | ---: | ---: | --- | | |
| | `r0pbp4` | `0.1.9` | `1e-04` | 2048 | 16 | 0.5844 @ step 20 | 0.1500 | Early lift, then collapsed to no-tool behavior. | | |
| | `impksa` | `0.1.10` | `1e-04` | 2048 | 16 | 0.5094 @ step 20 | 0.1500 | File-path task format worked, but LR/run length still collapsed. | | |
| | `hdznmf` | `0.1.11` | `2e-05` | 4096 | 24 | 0.6604 @ step 75 | 0.5542 | Lower LR and stricter 0.1.11 rewards produced the selected LoRA. | | |
| ### Curves | |
|  | |
|  | |
|  | |
|  | |
| ### Findings | |
| - The two `1e-4` overnight runs showed early learning, then collapsed. `r0pbp4` peaked at `0.5844` on step 20 and finished at `0.1500`; `impksa` peaked at `0.5094` on step 20 and also finished at `0.1500`. | |
| - The collapse correlated with no-tool behavior: both 200-step runs ended with `stop_condition/all/no_tools_called = 1.0`, `zero_graph_patch_calls = 0.0`, and `graph_patch_success = 0.0`. | |
| - The selected `0.1.11` run changed the training shape: lower learning rate `2e-5`, shorter 80-step run, larger decode budget `4096`, stricter rewards, and evals every 5 steps. It held up much better, peaking at step 75 and finishing at `0.5542` instead of collapsing. | |
| - The best run still has room to improve: at the last training metric step `79`, no-tool stop rate was `0.6719`, average graph patch calls were `1.4531`, and graph patch success was only `0.0312`. Supporting signals were stronger: path argument validity `0.8750`, target source match `0.7500`, and zero-check pass `0.8281`. | |
| - Prime hosted metrics for these runs do not expose a training-loss series, so the plots use held-out eval score, reward, filtering, stop conditions, and tool telemetry instead of loss. | |
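One way to make the collapse in the findings above concrete is to compute how much of each run's peak Avg@1 survived to the final eval. The sketch below only uses the peak and final values from the Run Summary table. The `retention` helper and its name are illustrative and are not part of the environment package.

```python
# Peak and final held-out Avg@1 values copied from the Run Summary table.
RUNS = {
    "r0pbp4": {"peak": 0.5844, "final": 0.1500},
    "impksa": {"peak": 0.5094, "final": 0.1500},
    "hdznmf": {"peak": 0.6604, "final": 0.5542},
}


def retention(peak: float, final: float) -> float:
    """Fraction of the peak eval score still present at the end of the run."""
    return final / peak


def summarize(runs: dict) -> dict:
    """Map each run name to its peak-to-final retention, rounded to 3 places."""
    return {name: round(retention(r["peak"], r["final"]), 3) for name, r in runs.items()}


if __name__ == "__main__":
    for name, kept in summarize(RUNS).items():
        print(f"{name}: kept {kept:.1%} of peak Avg@1")
```

Under this measure the two `1e-4` runs keep roughly a quarter to a third of their peak, while the selected run keeps most of it.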
| ### Recommendation | |
| Use the step-75 LoRA as the current best artifact. For the next full run, keep the `0.1.11` reward direction and the lower LR, but add stronger pressure against no-tool endings and make successful checked `zero_graph_patch` application a larger part of the reward. That way, high eval scores come from the intended graph-edit behavior rather than from partial credit for checking and summarization. | |
| <!-- zerolang-training-report:end --> | |
| <!-- zerolang-trajectories:start --> | |
| ## Trajectory Samples | |
| Selected Prime RL rollout trajectories from the best run are published under [`trajectories/hdznmfje3xv0clwhu9sx4b0n/`](trajectories/hdznmfje3xv0clwhu9sx4b0n/). The bundle includes normalized JSONL rows and the exact raw `prime train rollouts` pages for retained steps 0, 20, and 70. | |
| <!-- zerolang-trajectories:end --> | |
| ## Why This Exists | |
| Most code-editing agents learn to patch source through line-oriented text | |
| operations. Zerolang exposes a graph-level editing surface where a patch is | |
| guarded by the expected graph hash and the expected field value. That makes | |
| edits auditable and harder to apply to stale or mismatched code. | |
| This environment is designed to train that behavior directly. It rewards | |
| successful checked graph patches, while still checking that the resulting file | |
| compiles and matches the hidden target source. | |
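The guard described above behaves like a compare-and-set: the edit lands only if the graph hash and the current field value both match what the caller expected. The following is a toy, in-memory illustration of that idea. `ToyGraph`, `checked_set`, and `StalePatchError` are hypothetical names, and this is not how Zerolang itself stores or patches graphs.

```python
from dataclasses import dataclass, field


class StalePatchError(Exception):
    """Raised when a guard (graph hash or expected value) does not match."""


@dataclass
class ToyGraph:
    # Hypothetical stand-in for a Zerolang graph; not the real format.
    graph_hash: str
    nodes: dict = field(default_factory=dict)


def checked_set(graph, expect_graph_hash, node, field_name, expect, value):
    """Update one field only if both guards match the current state."""
    if graph.graph_hash != expect_graph_hash:
        raise StalePatchError("graph hash mismatch: file changed since inspection")
    current = graph.nodes[node][field_name]
    if current != expect:
        raise StalePatchError(f"expected {expect!r} but found {current!r}")
    # The toy leaves graph_hash untouched; a real tool would presumably
    # report a new hash after the edit.
    graph.nodes[node][field_name] = value


if __name__ == "__main__":
    g = ToyGraph("graph:49dd208f8361c221", {"#78ac4364": {"value": "66"}})
    checked_set(g, "graph:49dd208f8361c221", "#78ac4364", "value", "66", "65")
    print(g.nodes["#78ac4364"])
```

Replaying the same patch a second time fails in this sketch because the field no longer holds the expected `"66"`, which is the property that makes stale edits hard to apply.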
| ## Environment Summary | |
| - **Package name:** `zerolang-editing` | |
| - **Prime environment ID:** `pandelis/zerolang-editing` | |
| - **Version in this repo:** `0.1.8` | |
| - **Task type:** multi-turn tool-use code editing | |
| - **Agent harness:** Roder with a custom Zero graph-only plugin/tool allowlist | |
| - **Language under edit:** Zerolang `.0` | |
| - **Train split:** 209 deterministic synthetic tasks | |
| - **Eval split:** 67 held-out deterministic synthetic tasks | |
| - **Primary reward target:** successful `zero_graph_patch` on the rollout file | |
| ## Roder Harness | |
| The intended RL setup runs the model inside Roder rather than a generic chat | |
| loop. Roder provides the coding-agent harness, while a custom zero-coder plugin | |
| configures the available tool surface for this environment. | |
| That plugin is deliberately restrictive: | |
| - It exposes only Zerolang graph/check/fix/skills tools. | |
| - It removes generic text edit tools from the training harness. | |
| - It routes tool calls to on-disk `.0` files using `path` arguments. | |
| - It keeps checked graph edits as the primary affordance for code changes. | |
| This matters because the behavior we want to train is not "rewrite this source | |
| string". The target behavior is "inspect the Zerolang graph and apply a checked | |
| semantic graph patch to the file Roder is managing". The Verifiers environment | |
| then grades the resulting file from disk. | |
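The restriction described above amounts to an allowlist over tool names. A minimal sketch follows, using the seven `zero_*` tools this README lists. The `filter_tools` helper and the generic tool names in the example (`edit_file`, `write_file`) are hypothetical, and the sketch does not reflect Roder's actual plugin API.

```python
# Tool names taken from the Tools table in this README.
ALLOWED_TOOLS = frozenset({
    "zero_check",
    "zero_graph_summary",
    "zero_graph_dump",
    "zero_graph_json",
    "zero_fix_plan",
    "zero_graph_patch",
    "zero_skills_get",
})


def filter_tools(requested):
    """Split requested tool names into (exposed, removed) by the allowlist."""
    exposed = [name for name in requested if name in ALLOWED_TOOLS]
    removed = [name for name in requested if name not in ALLOWED_TOOLS]
    return exposed, removed


if __name__ == "__main__":
    # "edit_file" and "write_file" are made-up examples of generic edit tools.
    exposed, removed = filter_tools(["zero_check", "edit_file", "zero_graph_patch", "write_file"])
    print("exposed:", exposed)
    print("removed:", removed)
```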
| ## Rollout Contract | |
| Each task row includes an initial Zerolang source program and a hidden target | |
| program. At rollout setup time, the environment writes the initial source to: | |
| ```text | |
| <temporary rollout workspace>/program.0 | |
| ``` | |
| The model receives that path in the user prompt. Tools must operate on `path` | |
| arguments that point to this `.0` file. Pasting the full source into tool calls | |
| is rejected because the training target is disk-backed graph editing, not | |
| source-string rewriting. | |
| The environment canonicalizes recoverable path mistakes, such as missing paths | |
| or paths outside the rollout workspace, back to the rollout file and records | |
| those corrections. The `path_argument_valid` metric rewards clean tool calls | |
| that did not require correction. | |
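The path handling above can be sketched as a small helper that always resolves to the rollout file and reports whether a correction was needed. This is an assumption-laden illustration: the function names are hypothetical, any path other than the rollout file is treated as a recoverable mistake, and the per-call fraction is only one plausible way a clean-path score could be computed.

```python
from pathlib import Path
from typing import Optional, Tuple


def canonicalize_path(raw: Optional[str], rollout_file: Path) -> Tuple[Path, bool]:
    """Return (path_to_use, was_corrected) for one tool call's `path` argument."""
    rollout_file = rollout_file.resolve()
    if not raw:
        # Missing path: fall back to the rollout file and record a correction.
        return rollout_file, True
    candidate = Path(raw)
    if not candidate.is_absolute():
        # Assumption: relative paths are interpreted inside the rollout workspace.
        candidate = rollout_file.parent / candidate
    if candidate.resolve() != rollout_file:
        # Outside the workspace, or not the rollout file: redirect and record it.
        return rollout_file, True
    return rollout_file, False


def clean_call_fraction(corrections: list) -> float:
    """Share of tool calls that needed no correction (hypothetical scoring rule)."""
    if not corrections:
        return 0.0
    return sum(1 for corrected in corrections if not corrected) / len(corrections)
```

Returning the correction flag alongside the path keeps the sandbox strict while still letting the rollout continue, which matches the intent of rewarding clean calls rather than failing on recoverable ones.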
| ## Tools | |
| The environment exposes only Zerolang-specific tools: | |
| | Tool | Purpose | | |
| | --- | --- | | |
| | `zero_check(path)` | Run `zero check --json` against a `.0` file. | | |
| | `zero_graph_summary(path)` | Return compact graph hash and patchable node facts. | | |
| | `zero_graph_dump(path)` | Run `zero graph dump` for detailed graph inspection. | | |
| | `zero_graph_json(path)` | Run `zero graph --json`. | | |
| | `zero_fix_plan(path)` | Run `zero fix --plan --json`. | | |
| | `zero_graph_patch(path, expect_graph_hash, op)` | Apply one checked graph patch operation to the file. | | |
| | `zero_skills_get(skill)` | Load version-matched Zerolang guidance such as `language`, `diagnostics`, or `stdlib`. | | |
| Example checked patch shape: | |
| ```bash | |
| zero graph patch program.0 \ | |
|   --expect-graph-hash graph:49dd208f8361c221 \ | |
|   --op 'set node="#78ac4364" field="value" expect="66" value="65"' | |
| ``` | |
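A tool wrapper would typically pass this command as an argument list rather than a shell string. The sketch below assembles the same invocation from its parts, using only the flags and `set` operation syntax shown in the example above. `build_patch_command` is a hypothetical helper, not the function in `zero_tools.py`, and it does no escaping of quotes inside values.

```python
import shlex


def build_patch_command(path, expect_graph_hash, node, field, expect, value):
    """Build the argv for one checked `zero graph patch` set operation."""
    # Operation string follows the shape of the README example; values that
    # contain double quotes are not handled by this sketch.
    op = f'set node="{node}" field="{field}" expect="{expect}" value="{value}"'
    return [
        "zero", "graph", "patch", path,
        "--expect-graph-hash", expect_graph_hash,
        "--op", op,
    ]


if __name__ == "__main__":
    argv = build_patch_command(
        "program.0", "graph:49dd208f8361c221", "#78ac4364", "value", "66", "65"
    )
    # shlex.join renders the list as a copy-pasteable shell command.
    print(shlex.join(argv))
```

The resulting list can be handed to `subprocess.run(argv, ...)` without a shell, which avoids quoting problems around the `--op` string.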
| ## Reward Metrics | |
| The main rubric is weighted toward actually patching the graph and producing | |
| the hidden target program. | |
| | Metric | Weight | Meaning | | |
| | --- | ---: | --- | | |
| | `graph_patch_success` | 0.50 | A successful `zero_graph_patch` call edited the file to the hidden target. | | |
| | `target_source_match` | 0.20 | The final on-disk source matches the target after whitespace normalization. | | |
| | `zero_check_pass` | 0.15 | The edited file passes `zero check --json`. | | |
| | `zerolang_surface_used` | 0.10 | The rollout used graph hashes, node IDs, `expect`, or graph-patch semantics. | | |
| | `path_argument_valid` | 0.05 | Tool calls used the rollout `.0` path without harness-side correction. | | |
| The reward is intentionally not fully binary. A model can get partial credit for | |
| producing compilable code and using the right interface, but the highest reward | |
| requires the checked graph patch to land correctly. | |
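To see how the partial credit adds up, the rubric can be read as a weighted sum of per-metric scores in `[0, 1]`. The sketch below uses the weights from the table above. Treating the total as a plain weighted sum, and normalizing whitespace by collapsing every run of whitespace, are both assumptions, and the function names are illustrative.

```python
# Weights copied from the Reward Metrics table.
WEIGHTS = {
    "graph_patch_success": 0.50,
    "target_source_match": 0.20,
    "zero_check_pass": 0.15,
    "zerolang_surface_used": 0.10,
    "path_argument_valid": 0.05,
}


def normalize_whitespace(source: str) -> str:
    """Collapse all whitespace runs to single spaces (assumed normalization)."""
    return " ".join(source.split())


def target_source_match(final_source: str, target_source: str) -> float:
    """1.0 when the sources agree after whitespace normalization, else 0.0."""
    return 1.0 if normalize_whitespace(final_source) == normalize_whitespace(target_source) else 0.0


def total_reward(metrics: dict) -> float:
    """Weighted sum of metric scores; missing metrics count as 0.0."""
    return sum(WEIGHTS[name] * metrics.get(name, 0.0) for name in WEIGHTS)


if __name__ == "__main__":
    # A rollout that checks and inspects cleanly but never lands the patch.
    partial = {"zero_check_pass": 1.0, "zerolang_surface_used": 1.0, "path_argument_valid": 1.0}
    print(f"partial credit: {total_reward(partial):.2f}")
```

Under these assumptions a rollout that never patches the file tops out at 0.30, so most of the reward is only reachable through a successful checked patch that reproduces the target.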
| ## Dataset Construction | |
| The synthetic tasks are generated from canonical Zerolang snippets: | |
| 1. Build an initial `.0` program. | |
| 2. Select a patchable semantic node, usually a literal, function value, call | |
| target, or printed diagnostic string. | |
| 3. Mutate the semantic value to produce the target program. | |
| 4. Store the target source and task metadata. | |
| 5. During rollout, require the model to recover the target through graph tools. | |
| The environment currently focuses on deterministic editing families where | |
| `zero graph patch` support is reliable. The task builders live in: | |
| - `zerolang_editing/tasks.py` | |
| - `zerolang_editing/train_tasks.py` | |
| - `zerolang_editing/task_builders.py` | |
| ## Installation | |
| Install from Prime Hub: | |
| ```bash | |
| prime env install pandelis/zerolang-editing@0.1.8 | |
| ``` | |
| Install from this repository: | |
| ```bash | |
| uv sync | |
| uv run python -m compileall zerolang_editing | |
| ``` | |
| Zerolang is required at runtime. If `zero` is not already on `PATH`, the tool | |
| wrapper checks `$HOME/.zero/bin/zero` and can download a release binary into a | |
| temporary install directory. | |
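The lookup order described above can be sketched as follows. `find_zero_binary` is a hypothetical helper, not the wrapper's actual code. It covers only the `PATH` check and the `$HOME/.zero/bin/zero` fallback and leaves the download step to the caller.

```python
import os
import shutil
from pathlib import Path
from typing import Callable, Optional


def find_zero_binary(
    home: Optional[Path] = None,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> Optional[Path]:
    """Locate `zero` on PATH, then under $HOME/.zero/bin; None if not found."""
    on_path = which("zero")
    if on_path:
        return Path(on_path)
    fallback = (home or Path.home()) / ".zero" / "bin" / "zero"
    if fallback.is_file() and os.access(fallback, os.X_OK):
        return fallback
    # Not found: the real wrapper can download a release binary at this point.
    return None


if __name__ == "__main__":
    print(find_zero_binary() or "zero not found; a download would be needed")
```

The `home` and `which` parameters exist only so the lookup can be exercised without touching the real environment.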
| ## Local Eval | |
| ```bash | |
| prime eval run ./environments/zerolang_editing \ | |
|   -m poolside/laguna-xs.2 \ | |
|   -n 3 -r 1 -t 2048 -T 0.4 \ | |
|   -a '{"split":"eval","max_turns":10}' \ | |
|   -s -d -A | |
| ``` | |
| For quick package-level validation: | |
| ```bash | |
| cd environments/zerolang_editing | |
| uv run python -m compileall zerolang_editing | |
| uv run python - <<'PY' | |
| from zerolang_editing.zerolang_editing import load_environment | |
| env = load_environment(split="eval", max_examples=1, max_turns=2) | |
| print(type(env).__name__, len(env.dataset)) | |
| PY | |
| ``` | |
| ## Hosted RL Configuration | |
| The overnight Laguna XS.2 configuration, which matches the `1e-4`, 200-step runs in the training report, is: | |
| ```toml | |
| model = "poolside/Laguna-XS.2" | |
| max_steps = 200 | |
| batch_size = 64 | |
| rollouts_per_example = 8 | |
| learning_rate = 1e-4 | |
| [sampling] | |
| max_tokens = 2048 | |
| temperature = 0.4 | |
| enable_thinking = true | |
| ``` | |
| The config is stored in: | |
| ```text | |
| configs/rl/zerolang-editing-laguna-xs2-overnight.toml | |
| ``` | |
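For a rough sense of scale, the overnight settings can be turned into a rollout budget. The sketch below assumes that `batch_size` counts rollouts per training step and that it is split evenly across examples by `rollouts_per_example`. That reading is an assumption about the trainer, not something this README states, and `rollout_budget` is a hypothetical helper.

```python
# Values copied from the overnight config shown above.
OVERNIGHT = {
    "max_steps": 200,
    "batch_size": 64,
    "rollouts_per_example": 8,
}


def rollout_budget(cfg: dict) -> dict:
    """Derive per-step and total rollout counts under the stated assumption."""
    if cfg["batch_size"] % cfg["rollouts_per_example"] != 0:
        raise ValueError("batch_size must be a multiple of rollouts_per_example")
    return {
        # Assumption: batch_size is the number of rollouts in one step.
        "examples_per_step": cfg["batch_size"] // cfg["rollouts_per_example"],
        "rollouts_per_step": cfg["batch_size"],
        "total_rollouts": cfg["max_steps"] * cfg["batch_size"],
    }


if __name__ == "__main__":
    for key, count in rollout_budget(OVERNIGHT).items():
        print(f"{key}: {count}")
```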
| ## Previous Training Signal | |
| A 20-step stress run on `poolside/Laguna-XS.2` completed successfully before | |
| the overnight scale-up: | |
| - Baseline eval Avg@1: `0.1500` | |
| - Step 15 eval Avg@1: `0.2357` | |
| - Final eval Avg@1: `0.2250` | |
| - First 10 train-step reward average: `0.1606` | |
| - Last 10 train-step reward average: `0.2056` | |
| - No fatal orchestrator errors, no eval truncation, and no no-response rollouts. | |
| The main failure signatures were invalid tool paths: missing `path` arguments | |
| and paths outside the rollout workspace. Version `0.1.8` keeps the path sandbox | |
| but converts recoverable path mistakes into canonicalized calls against the | |
| rollout file and adds a small clean-path reward term. | |
| ## Repository Contents | |
| ```text | |
| README.md | |
| pyproject.toml | |
| uv.lock | |
| configs/ | |
|   rl/ | |
|     zerolang-editing-laguna-xs2-20step.toml | |
|     zerolang-editing-laguna-xs2-overnight.toml | |
| zerolang_editing/ | |
|   __init__.py | |
|   task_builders.py | |
|   tasks.py | |
|   train_tasks.py | |
|   zero_tools.py | |
|   zerolang_editing.py | |
| ``` | |
| Build artifacts, local virtualenvs, Zerolang caches, rollout outputs, and | |
| compiled Python caches are intentionally excluded from the Hugging Face repo. | |
| ## Limitations | |
| - The task distribution is synthetic and should be expanded before treating the | |
| trained behavior as general Zerolang editing competence. | |
| - Current graph-edit families focus on reliable literal/value style patches. | |
| - The environment is designed for RL tool-use behavior, not as a standalone | |
| benchmark of general coding ability. | |
| - Apart from the LoRA adapter path cited in the training report, this repo contains the environment source, not final model weights. | |