---
tags:
- zerolang
- reinforcement-learning
- verifiers
- code-editing
- tool-use
- graph-editing
- laguna-xs2
- lora
- fine-tune
- peft
license: apache-2.0
base_model: poolside/Laguna-XS.2
base_model_relation: finetune
library_name: peft
pipeline_tag: text-generation
---
# Zerolang Editing
`zerolang-editing` is a Verifiers/Prime RL environment for training coding agents
to edit [Zerolang](https://github.com/vercel-labs/zerolang) programs through
checked graph edits instead of loose text replacement.
The RL harness is built around [Roder](https://roder.sh), using a custom
zero-coder plugin/distribution that exposes a Zerolang-only graph toolset to the
model. In training, generic source editing tools are disabled; the agent is
expected to use the `zero_*` tools below, especially `zero_graph_summary` and
`zero_graph_patch`, against the rollout file on disk.
The core task is intentionally narrow: each rollout starts with a `.0` source
file already written to disk, asks the model for a semantic code edit, and
scores the edited file after the model uses Zerolang tooling. The intended
successful behavior is:
1. Inspect the file with Zerolang graph/check tools.
2. Identify the relevant graph hash and semantic node.
3. Apply a checked `zero graph patch` operation to the on-disk file.
4. Finish with a compact JSON response pointing at the edited path.
This repository contains the environment source package, synthetic task
builders, tool wrappers, and documentation. The trained checkpoint from hosted
RL runs is published separately by the training service when a run is finalized.
<!-- zerolang-training-report:start -->
## Training Report: Laguna XS.2 Zerolang Editing LoRA
This report covers the hosted Prime RL runs used to select the published LoRA adapter at `loras/laguna-xs2-zerolang-editing-step75-hdznmf/`.
**Selected checkpoint:** run `hdznmfje3xv0clwhu9sx4b0n`, checkpoint `qyxg7ya6x53ntmfjerp11gah`, step 75. It reached the best held-out eval score, Avg@1 `0.6604`, on `pandelis/zerolang-editing@0.1.11` with `poolside/Laguna-XS.2`.
### Run Summary
| Run | Env | LR | Max tokens | Eval examples | Peak Avg@1 | Final Avg@1 | Finding |
| --- | --- | ---: | ---: | ---: | ---: | ---: | --- |
| `r0pbp4` | `0.1.9` | `1e-04` | 2048 | 16 | 0.5844 @ step 20 | 0.1500 | Early lift, then collapsed to no-tool behavior. |
| `impksa` | `0.1.10` | `1e-04` | 2048 | 16 | 0.5094 @ step 20 | 0.1500 | File-path task format worked, but LR/run length still collapsed. |
| `hdznmf` | `0.1.11` | `2e-05` | 4096 | 24 | 0.6604 @ step 75 | 0.5542 | Lower LR and stricter 0.1.11 rewards produced the selected LoRA. |
### Curves




### Findings
- The two `1e-4` overnight runs showed early learning, then collapsed. `r0pbp4` peaked at `0.5844` on step 20 and finished at `0.1500`; `impksa` peaked at `0.5094` on step 20 and also finished at `0.1500`.
- The collapse correlated with no-tool behavior: both 200-step runs ended with `stop_condition/all/no_tools_called = 1.0`, `zero_graph_patch_calls = 0.0`, and `graph_patch_success = 0.0`.
- The selected `0.1.11` run changed the training shape: lower learning rate `2e-5`, shorter 80-step run, larger decode budget `4096`, stricter rewards, and evals every 5 steps. It held up much better, peaking at step 75 and finishing at `0.5542` instead of collapsing.
- The best run still has room to improve: at the last training metric step `79`, no-tool stop rate was `0.6719`, average graph patch calls were `1.4531`, and graph patch success was only `0.0312`. Supporting signals were stronger: path argument validity `0.8750`, target source match `0.7500`, and zero-check pass `0.8281`.
- Prime hosted metrics for these runs do not expose a training-loss series, so the plots use held-out eval score, reward, filtering, stop conditions, and tool telemetry instead of loss.
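The gap between peak and final score is the clearest collapse signal in the table above. As a small worked example, the snippet below computes how much of each run's peak Avg@1 survived to the final eval. The numbers are copied from the Run Summary table; the `retained_fraction` helper is illustrative and not part of the environment package.

```python
# Peak and final held-out Avg@1 values copied from the Run Summary table.
runs = {
    "r0pbp4": {"peak": 0.5844, "final": 0.1500},
    "impksa": {"peak": 0.5094, "final": 0.1500},
    "hdznmf": {"peak": 0.6604, "final": 0.5542},
}


def retained_fraction(peak: float, final: float) -> float:
    """Share of the peak eval score that survived to the end of the run."""
    return final / peak


summary = {name: round(retained_fraction(**scores), 3) for name, scores in runs.items()}
for name, kept in summary.items():
    print(f"{name}: kept {kept:.1%} of peak Avg@1")
```

The two `1e-4` runs keep well under a third of their peak score, while the `2e-5` run keeps most of it.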
### Recommendation
Use the step-75 LoRA as the current best artifact. For the next full run, keep the `0.1.11` reward direction and the lower LR, but add stronger pressure against no-tool endings. Also make successful checked `zero_graph_patch` application a larger part of the reward, so that high eval scores come from the intended graph-edit behavior rather than from partial credit for checking and summarization.
<!-- zerolang-training-report:end -->
<!-- zerolang-trajectories:start -->
## Trajectory Samples
Selected Prime RL rollout trajectories from the best run are published under [`trajectories/hdznmfje3xv0clwhu9sx4b0n/`](trajectories/hdznmfje3xv0clwhu9sx4b0n/). The bundle includes normalized JSONL rows and the exact raw `prime train rollouts` pages for retained steps 0, 20, and 70.
<!-- zerolang-trajectories:end -->
## Why This Exists
Most code-editing agents learn to patch source through line-oriented text
operations. Zerolang exposes a graph-level editing surface where a patch is
guarded by the expected graph hash and the expected field value. That makes
edits auditable and harder to apply to stale or mismatched code.
This environment is designed to train that behavior directly. It rewards
successful checked graph patches, while still checking that the resulting file
compiles and matches the hidden target source.
## Environment Summary
- **Package name:** `zerolang-editing`
- **Prime environment ID:** `pandelis/zerolang-editing`
- **Version in this repo:** `0.1.8`
- **Task type:** multi-turn tool-use code editing
- **Agent harness:** Roder with a custom Zero graph-only plugin/tool allowlist
- **Language under edit:** Zerolang `.0`
- **Train split:** 209 deterministic synthetic tasks
- **Eval split:** 67 held-out deterministic synthetic tasks
- **Primary reward target:** successful `zero_graph_patch` on the rollout file
## Roder Harness
The intended RL setup runs the model inside Roder rather than a generic chat
loop. Roder provides the coding-agent harness, while a custom zero-coder plugin
configures the available tool surface for this environment.
That plugin is deliberately restrictive:
- It exposes only Zerolang graph/check/fix/skills tools.
- It removes generic text edit tools from the training harness.
- It routes tool calls to on-disk `.0` files using `path` arguments.
- It keeps checked graph edits as the primary affordance for code changes.
This matters because the behavior we want to train is not "rewrite this source
string". The target behavior is "inspect the Zerolang graph and apply a checked
semantic graph patch to the file Roder is managing". The Verifiers environment
then grades the resulting file from disk.
## Rollout Contract
Each task row includes an initial Zerolang source program and a hidden target
program. At rollout setup time, the environment writes the initial source to:
```text
<temporary rollout workspace>/program.0
```
The model receives that path in the user prompt. Tools must operate on `path`
arguments that point to this `.0` file. Pasting the full source into tool calls
is rejected because the training target is disk-backed graph editing, not
source-string rewriting.
The environment canonicalizes recoverable path mistakes, such as missing paths
or paths outside the rollout workspace, back to the rollout file and records
those corrections. The `path_argument_valid` metric rewards clean tool calls
that did not require correction.
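The correction rule described above can be sketched roughly as follows. This is a minimal illustration, not the environment's actual code: `canonicalize_tool_path` is a hypothetical helper, and it assumes that any path other than the rollout file itself is redirected, which is stricter than the "outside the workspace" wording.

```python
from pathlib import Path
from typing import Optional, Tuple


def canonicalize_tool_path(raw: Optional[str], rollout_file: Path) -> Tuple[Path, bool]:
    """Return (path_to_use, was_corrected) for a tool call's `path` argument.

    Hypothetical helper: missing paths and paths that do not resolve to the
    rollout file are redirected to the rollout file and flagged as corrected.
    """
    rollout_file = rollout_file.resolve()
    workspace = rollout_file.parent
    if not raw:
        # Missing or empty `path` argument.
        return rollout_file, True
    candidate = Path(raw)
    if not candidate.is_absolute():
        # Interpret relative paths against the rollout workspace.
        candidate = workspace / candidate
    candidate = candidate.resolve()
    if candidate != rollout_file:
        # Outside the workspace, or a different file: redirect and record it.
        return rollout_file, True
    return candidate, False
```

A call that comes back with `was_corrected` set to `False` is the kind of clean call that `path_argument_valid` is meant to reward.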
## Tools
The environment exposes only Zerolang-specific tools:
| Tool | Purpose |
| --- | --- |
| `zero_check(path)` | Run `zero check --json` against a `.0` file. |
| `zero_graph_summary(path)` | Return compact graph hash and patchable node facts. |
| `zero_graph_dump(path)` | Run `zero graph dump` for detailed graph inspection. |
| `zero_graph_json(path)` | Run `zero graph --json`. |
| `zero_fix_plan(path)` | Run `zero fix --plan --json`. |
| `zero_graph_patch(path, expect_graph_hash, op)` | Apply one checked graph patch operation to the file. |
| `zero_skills_get(skill)` | Load version-matched Zerolang guidance such as `language`, `diagnostics`, or `stdlib`. |
Example checked patch shape:
```bash
zero graph patch program.0 \
  --expect-graph-hash graph:49dd208f8361c221 \
  --op 'set node="#78ac4364" field="value" expect="66" value="65"'
```
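When driving the CLI from Python, the same command can be assembled as an argument list so that no shell quoting is involved. The sketch below only mirrors the example shape shown above. `build_graph_patch_argv` is a hypothetical helper, and it assumes that none of the values contain double quotes, since the escaping rules of the `--op` syntax are not documented here.

```python
import shlex
from typing import List


def build_graph_patch_argv(path: str, graph_hash: str, node: str,
                           field: str, expect: str, value: str) -> List[str]:
    """Build the argv for one checked `set` operation (hypothetical helper)."""
    # Assumption: values contain no double quotes, so no escaping is needed.
    op = f'set node="{node}" field="{field}" expect="{expect}" value="{value}"'
    return ["zero", "graph", "patch", path,
            "--expect-graph-hash", graph_hash,
            "--op", op]


argv = build_graph_patch_argv(
    "program.0", "graph:49dd208f8361c221", "#78ac4364", "value", "66", "65"
)
# shlex.join renders the list as a copy-pasteable shell command for logging.
print(shlex.join(argv))
```

Passing `argv` as a list to `subprocess.run` keeps the `--op` string intact as a single argument.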
## Reward Metrics
The main rubric is weighted toward actually patching the graph and producing
the hidden target program.
| Metric | Weight | Meaning |
| --- | ---: | --- |
| `graph_patch_success` | 0.50 | A successful `zero_graph_patch` call edited the file to the hidden target. |
| `target_source_match` | 0.20 | The final on-disk source matches the target after whitespace normalization. |
| `zero_check_pass` | 0.15 | The edited file passes `zero check --json`. |
| `zerolang_surface_used` | 0.10 | The rollout used graph hashes, node IDs, `expect`, or graph-patch semantics. |
| `path_argument_valid` | 0.05 | Tool calls used the rollout `.0` path without harness-side correction. |
The reward is intentionally not fully binary. A model can get partial credit for
producing compilable code and using the right interface, but the highest reward
requires the checked graph patch to land correctly.
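To make the partial-credit structure concrete, the sketch below combines the table's weights as a plain weighted sum. That combination rule is an assumption about how the rubric aggregates metrics, and the weights are the ones listed above, so later environment versions may differ. `weighted_reward` is an illustrative name.

```python
# Weights copied from the Reward Metrics table above.
WEIGHTS = {
    "graph_patch_success": 0.50,
    "target_source_match": 0.20,
    "zero_check_pass": 0.15,
    "zerolang_surface_used": 0.10,
    "path_argument_valid": 0.05,
}


def weighted_reward(metrics: dict) -> float:
    """Weighted sum of per-metric scores in [0, 1]; missing metrics count as 0."""
    return sum(WEIGHTS[name] * float(metrics.get(name, 0.0)) for name in WEIGHTS)


# A rollout that inspects the graph cleanly but never lands the patch.
no_patch = {
    "zero_check_pass": 1.0,
    "zerolang_surface_used": 1.0,
    "path_argument_valid": 1.0,
}
# A rollout where every metric is satisfied.
full = {name: 1.0 for name in WEIGHTS}

print(round(weighted_reward(no_patch), 2), round(weighted_reward(full), 2))
```

Under this assumption, a rollout without a landed patch tops out at less than a third of the full reward.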
## Dataset Construction
The synthetic tasks are generated from canonical Zerolang snippets:
1. Build an initial `.0` program.
2. Select a patchable semantic node, usually a literal, function value, call
target, or printed diagnostic string.
3. Mutate the semantic value to produce the target program.
4. Store the target source and task metadata.
5. During rollout, require the model to recover the target through graph tools.
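Steps 1 to 4 can be sketched for a single literal mutation as shown below. Everything here is hypothetical: the `EditTask` fields, the `build_literal_task` helper, and the template string are illustrative. The template is placeholder text, not real Zerolang syntax. The actual builders live in the modules listed in this section.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EditTask:
    """Hypothetical task row: initial program, hidden target, and patch facts."""
    initial_source: str
    target_source: str
    field: str
    expect: str
    value: str


def build_literal_task(template: str, old_literal: str, new_literal: str) -> EditTask:
    """Fill a source template twice: once with the original literal, once mutated."""
    if old_literal == new_literal:
        raise ValueError("mutation must change the literal")
    return EditTask(
        initial_source=template.format(literal=old_literal),
        target_source=template.format(literal=new_literal),
        field="value",
        expect=old_literal,
        value=new_literal,
    )


# Placeholder text, not real Zerolang syntax.
task = build_literal_task("<program with literal {literal}>", "66", "65")
print(task.expect, "->", task.value)
```

Because the builder has no randomness, the same inputs always give the same task row, in line with the deterministic splits described in the Environment Summary.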
The environment currently focuses on deterministic editing families where
`zero graph patch` support is reliable. The task builders live in:
- `zerolang_editing/tasks.py`
- `zerolang_editing/train_tasks.py`
- `zerolang_editing/task_builders.py`
## Installation
Install from Prime Hub:
```bash
prime env install pandelis/zerolang-editing@0.1.8
```
Install from this repository:
```bash
uv sync
uv run python -m compileall zerolang_editing
```
Zerolang is required at runtime. If `zero` is not already on `PATH`, the tool
wrapper checks `$HOME/.zero/bin/zero` and can download a release binary into a
temporary install directory.
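The lookup order described above can be sketched as follows. This is an illustration rather than the code in `zero_tools.py`: `find_zero_binary` and its parameters are hypothetical, and the download fallback is left out because the release URL is not given here.

```python
import os
import shutil
from pathlib import Path
from typing import Callable, Optional


def find_zero_binary(
    home: Optional[Path] = None,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> Optional[str]:
    """Return a path to `zero`, preferring PATH, then $HOME/.zero/bin/zero."""
    on_path = which("zero")
    if on_path:
        return on_path
    fallback = (home or Path.home()) / ".zero" / "bin" / "zero"
    if fallback.is_file() and os.access(fallback, os.X_OK):
        return str(fallback)
    # The real wrapper can download a release binary at this point; omitted here.
    return None


print(find_zero_binary() or "zero not found on PATH or under ~/.zero/bin")
```

The `home` and `which` parameters exist only so the lookup can be exercised without touching the real environment.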
## Local Eval
```bash
prime eval run ./environments/zerolang_editing \
  -m poolside/laguna-xs.2 \
  -n 3 -r 1 -t 2048 -T 0.4 \
  -a '{"split":"eval","max_turns":10}' \
  -s -d -A
```
For quick package-level validation:
```bash
cd environments/zerolang_editing
uv run python -m compileall zerolang_editing
uv run python - <<'PY'
from zerolang_editing.zerolang_editing import load_environment
env = load_environment(split="eval", max_examples=1, max_turns=2)
print(type(env).__name__, len(env.dataset))
PY
```
## Hosted RL Configuration
The overnight Laguna XS.2 run uses:
```toml
model = "poolside/Laguna-XS.2"
max_steps = 200
batch_size = 64
rollouts_per_example = 8
learning_rate = 1e-4
[sampling]
max_tokens = 2048
temperature = 0.4
enable_thinking = true
```
The config is stored in:
```text
configs/rl/zerolang-editing-laguna-xs2-overnight.toml
```
## Previous Training Signal
A 20-step stress run on `poolside/Laguna-XS.2` completed successfully before
the overnight scale-up:
- Baseline eval Avg@1: `0.1500`
- Step 15 eval Avg@1: `0.2357`
- Final eval Avg@1: `0.2250`
- First 10 train-step reward average: `0.1606`
- Last 10 train-step reward average: `0.2056`
- No fatal orchestrator errors, no eval truncation, and no no-response stops.
The main failure signatures were invalid tool paths: missing `path` arguments
and paths outside the rollout workspace. Version `0.1.8` keeps the path sandbox
but converts recoverable path mistakes into canonicalized calls against the
rollout file and adds a small clean-path reward term.
## Repository Contents
```text
README.md
pyproject.toml
uv.lock
configs/
  rl/
    zerolang-editing-laguna-xs2-20step.toml
    zerolang-editing-laguna-xs2-overnight.toml
zerolang_editing/
  __init__.py
  task_builders.py
  tasks.py
  train_tasks.py
  zero_tools.py
  zerolang_editing.py
```
Build artifacts, local virtualenvs, Zerolang caches, rollout outputs, and
compiled Python caches are intentionally excluded from the Hugging Face repo.
## Limitations
- The task distribution is synthetic and should be expanded before treating the
trained behavior as general Zerolang editing competence.
- Current graph-edit families focus on reliable literal/value style patches.
- The environment is designed for RL tool-use behavior, not as a standalone
benchmark of general coding ability.
- This repo contains the environment source and the LoRA adapter described in the training report, not full model weights.