Repository: poolside-laguna-hackathon/zerolang-editing
Add Zerolang editing environment
- .gitignore +8 -0
- README.md +241 -0
- configs/rl/zerolang-editing-laguna-xs2-20step.toml +34 -0
- configs/rl/zerolang-editing-laguna-xs2-overnight.toml +35 -0
- pyproject.toml +22 -0
- uv.lock +0 -0
- zerolang_editing/__init__.py +3 -0
- zerolang_editing/task_builders.py +270 -0
- zerolang_editing/tasks.py +193 -0
- zerolang_editing/train_tasks.py +255 -0
- zerolang_editing/zero_tools.py +310 -0
- zerolang_editing/zerolang_editing.py +418 -0
.gitignore
ADDED
@@ -0,0 +1,8 @@
.venv/
.zero/
__pycache__/
*.py[cod]
dist/
outputs/
.pytest_cache/
.ruff_cache/
README.md
ADDED
@@ -0,0 +1,241 @@
---
tags:
- zerolang
- reinforcement-learning
- verifiers
- code-editing
- tool-use
- graph-editing
- laguna-xs2
license: apache-2.0
---

# Zerolang Editing

`zerolang-editing` is a Verifiers/Prime RL environment for training coding agents
to edit [Zerolang](https://github.com/vercel-labs/zerolang) programs through
checked graph edits instead of loose text replacement.

The core task is intentionally narrow: each rollout starts with a `.0` source
file already written to disk, asks the model for a semantic code edit, and
scores the edited file after the model uses Zerolang tooling. The intended
successful behavior is:

1. Inspect the file with Zerolang graph/check tools.
2. Identify the relevant graph hash and semantic node.
3. Apply a checked `zero graph patch` operation to the on-disk file.
4. Finish with a compact JSON response pointing at the edited path.

This repository contains the environment source package, synthetic task
builders, tool wrappers, and documentation. The trained checkpoint from hosted
RL runs is published separately by the training service when a run is finalized.

## Why This Exists

Most code-editing agents learn to patch source through line-oriented text
operations. Zerolang exposes a graph-level editing surface where a patch is
guarded by the expected graph hash and the expected field value. That makes
edits auditable and harder to apply to stale or mismatched code.

This environment is designed to train that behavior directly. It rewards
successful checked graph patches, while still checking that the resulting file
compiles and matches the hidden target source.
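
The guard described above behaves like a compare-and-set. The sketch below is a toy model of that idea, not Zerolang's implementation: the `Graph` class, its `digest()` method, `apply_guarded_set`, and the SHA-256 based hash are all hypothetical names and choices made for illustration.

```python
import hashlib
import json
from dataclasses import dataclass, field


class StalePatchError(Exception):
    """Raised when a guard (graph hash or expected field value) does not match."""


@dataclass
class Graph:
    # Hypothetical toy model: node id -> {field name -> value}.
    nodes: dict = field(default_factory=dict)

    def digest(self) -> str:
        # Stable hash over the whole graph; any edit changes it.
        payload = json.dumps(self.nodes, sort_keys=True).encode()
        return "graph:" + hashlib.sha256(payload).hexdigest()[:16]


def apply_guarded_set(graph, expect_graph_hash, node, field_name, expect, value):
    """Set one field only if both guards match the current graph."""
    if graph.digest() != expect_graph_hash:
        raise StalePatchError("graph hash mismatch: the file changed since inspection")
    current = graph.nodes[node][field_name]
    if current != expect:
        raise StalePatchError(f"expected {expect!r} but found {current!r}")
    graph.nodes[node][field_name] = value
    return graph.digest()


graph = Graph({"#n1": {"value": "66"}})
seen = graph.digest()
new_hash = apply_guarded_set(graph, seen, "#n1", "value", expect="66", value="65")

# Replaying the same patch with the old hash is rejected as stale.
try:
    apply_guarded_set(graph, seen, "#n1", "value", expect="66", value="65")
    replay_rejected = False
except StalePatchError:
    replay_rejected = True
```

The second call fails on the hash guard before the field guard is even consulted, which is the property that makes a patch hard to apply to code that changed after it was inspected.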

## Environment Summary

- **Package name:** `zerolang-editing`
- **Prime environment ID:** `pandelis/zerolang-editing`
- **Version in this repo:** `0.1.8`
- **Task type:** multi-turn tool-use code editing
- **Language under edit:** Zerolang `.0`
- **Train split:** 209 deterministic synthetic tasks
- **Eval split:** 67 held-out deterministic synthetic tasks
- **Primary reward target:** successful `zero_graph_patch` on the rollout file

## Rollout Contract

Each task row includes an initial Zerolang source program and a hidden target
program. At rollout setup time, the environment writes the initial source to:

```text
<temporary rollout workspace>/program.0
```

The model receives that path in the user prompt. Tools must operate on `path`
arguments that point to this `.0` file. Pasting the full source into tool calls
is rejected because the training target is disk-backed graph editing, not
source-string rewriting.

The environment canonicalizes recoverable path mistakes, such as missing paths
or paths outside the rollout workspace, back to the rollout file and records
those corrections. The `path_argument_valid` metric rewards clean tool calls
that did not require correction.
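
One way such canonicalization could work is sketched below. This is an assumption-laden illustration, not the code in `zero_tools.py`: the function name `canonicalize_tool_path`, the returned `corrected` flag, and the demo workspace path are hypothetical.

```python
from pathlib import Path
from typing import Optional, Tuple


def canonicalize_tool_path(raw: Optional[str], rollout_file: Path) -> Tuple[Path, bool]:
    """Return (path to use, corrected flag) for a tool call's `path` argument."""
    workspace = rollout_file.parent.resolve()
    if not raw:
        # Missing path: fall back to the rollout file and record a correction.
        return rollout_file, True
    candidate = Path(raw)
    if not candidate.is_absolute():
        candidate = workspace / candidate
    candidate = candidate.resolve()
    inside = candidate == workspace or workspace in candidate.parents
    if not inside or candidate.suffix != ".0":
        # Outside the sandbox or not a Zerolang file: redirect and record it.
        return rollout_file, True
    return candidate, False


rollout = Path("/tmp/rollout-demo/program.0")
clean = canonicalize_tool_path("/tmp/rollout-demo/program.0", rollout)
missing = canonicalize_tool_path(None, rollout)
escaped = canonicalize_tool_path("../../etc/passwd", rollout)
```

Under this sketch, a metric like `path_argument_valid` would be computed from the `corrected` flags collected over a rollout: only calls that return `False` count as clean.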

## Tools

The environment exposes only Zerolang-specific tools:

| Tool | Purpose |
| --- | --- |
| `zero_check(path)` | Run `zero check --json` against a `.0` file. |
| `zero_graph_summary(path)` | Return compact graph hash and patchable node facts. |
| `zero_graph_dump(path)` | Run `zero graph dump` for detailed graph inspection. |
| `zero_graph_json(path)` | Run `zero graph --json`. |
| `zero_fix_plan(path)` | Run `zero fix --plan --json`. |
| `zero_graph_patch(path, expect_graph_hash, op)` | Apply one checked graph patch operation to the file. |
| `zero_skills_get(skill)` | Load version-matched Zerolang guidance such as `language`, `diagnostics`, or `stdlib`. |

Example checked patch shape:

```bash
zero graph patch program.0 \
  --expect-graph-hash graph:49dd208f8361c221 \
  --op 'set node="#78ac4364" field="value" expect="66" value="65"'
```
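
When driving the CLI from Python, building an argv list sidesteps the shell quoting around `--op`. The helpers below are a hypothetical sketch that only reproduces the command shape shown above; `build_set_op` and `build_patch_argv` are illustrative names, and the sketch assumes values contain no double quotes because the escaping rules are not documented here.

```python
import shlex


def build_set_op(node: str, field: str, expect: str, value: str) -> str:
    """Format a `set` op in the shape of the example above.

    Assumption: values contain no double quotes; escaping rules are not shown.
    """
    return f'set node="{node}" field="{field}" expect="{expect}" value="{value}"'


def build_patch_argv(path: str, graph_hash: str, op: str) -> list:
    # An argv list can be passed to subprocess.run without shell=True.
    return ["zero", "graph", "patch", path, "--expect-graph-hash", graph_hash, "--op", op]


op = build_set_op("#78ac4364", "value", expect="66", value="65")
argv = build_patch_argv("program.0", "graph:49dd208f8361c221", op)

# shlex.join renders the argv as a copy-pasteable shell command for logs.
print(shlex.join(argv))
```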

## Reward Metrics

The main rubric is weighted toward actually patching the graph and producing
the hidden target program.

| Metric | Weight | Meaning |
| --- | ---: | --- |
| `graph_patch_success` | 0.50 | A successful `zero_graph_patch` call edited the file to the hidden target. |
| `target_source_match` | 0.20 | The final on-disk source matches the target after whitespace normalization. |
| `zero_check_pass` | 0.15 | The edited file passes `zero check --json`. |
| `zerolang_surface_used` | 0.10 | The rollout used graph hashes, node IDs, `expect`, or graph-patch semantics. |
| `path_argument_valid` | 0.05 | Tool calls used the rollout `.0` path without harness-side correction. |

The reward is intentionally not fully binary. A model can get partial credit for
producing compilable code and using the right interface, but the highest reward
requires the checked graph patch to land correctly.
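
As a worked example of the partial credit, the sketch below combines the table's weights as a plain weighted sum. Two things are assumptions here rather than facts from the rubric code: that each metric scores in `[0, 1]`, and that the rubric aggregates by weighted sum. `total_reward` is a hypothetical helper.

```python
WEIGHTS = {
    "graph_patch_success": 0.50,
    "target_source_match": 0.20,
    "zero_check_pass": 0.15,
    "zerolang_surface_used": 0.10,
    "path_argument_valid": 0.05,
}


def total_reward(metrics: dict) -> float:
    """Weighted sum of per-metric scores; missing metrics count as 0."""
    return sum(weight * metrics.get(name, 0.0) for name, weight in WEIGHTS.items())


# A rollout that compiles and uses the graph surface but never lands the patch.
partial = total_reward(
    {"zero_check_pass": 1.0, "zerolang_surface_used": 1.0, "path_argument_valid": 1.0}
)
# A rollout that satisfies every metric.
perfect = total_reward({name: 1.0 for name in WEIGHTS})
```

Under these assumptions the partial rollout earns about 0.30 of a possible 1.00, so a policy cannot reach a high reward through interface use alone.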

## Dataset Construction

The synthetic tasks are generated from canonical Zerolang snippets:

1. Build an initial `.0` program.
2. Select a patchable semantic node, usually a literal, function value, call
   target, or printed diagnostic string.
3. Mutate the semantic value to produce the target program.
4. Store the target source and task metadata.
5. During rollout, require the model to recover the target through graph tools.

The environment currently focuses on deterministic editing families where
`zero graph patch` support is reliable. The task builders live in:

- `zerolang_editing/tasks.py`
- `zerolang_editing/train_tasks.py`
- `zerolang_editing/task_builders.py`

## Installation

Install from Prime Hub:

```bash
prime env install pandelis/zerolang-editing@0.1.8
```

Install from this repository:

```bash
uv sync
uv run python -m compileall zerolang_editing
```

Zerolang is required at runtime. If `zero` is not already on `PATH`, the tool
wrapper checks `$HOME/.zero/bin/zero` and can download a release binary into a
temporary install directory.

## Local Eval

```bash
prime eval run ./environments/zerolang_editing \
  -m poolside/laguna-xs.2 \
  -n 3 -r 1 -t 2048 -T 0.4 \
  -a '{"split":"eval","max_turns":10}' \
  -s -d -A
```

For quick package-level validation:

```bash
cd environments/zerolang_editing
uv run python -m compileall zerolang_editing
uv run python - <<'PY'
from zerolang_editing.zerolang_editing import load_environment
env = load_environment(split="eval", max_examples=1, max_turns=2)
print(type(env).__name__, len(env.dataset))
PY
```

## Hosted RL Configuration

The overnight Laguna XS.2 run uses:

```toml
model = "poolside/Laguna-XS.2"
max_steps = 200
batch_size = 64
rollouts_per_example = 8
learning_rate = 1e-4

[sampling]
max_tokens = 2048
temperature = 0.4
enable_thinking = true
```

The config is stored in:

```text
configs/rl/zerolang-editing-laguna-xs2-overnight.toml
```

## Previous Training Signal

A 20-step stress run on `poolside/Laguna-XS.2` completed successfully before
the overnight scale-up:

- Baseline eval Avg@1: `0.1500`
- Step 15 eval Avg@1: `0.2357`
- Final eval Avg@1: `0.2250`
- First 10 train-step reward average: `0.1606`
- Last 10 train-step reward average: `0.2056`
- No fatal orchestrator errors, no eval truncation, and no empty responses.
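
To put those numbers in proportion, the short calculation below derives absolute and relative gains from the figures listed above. The `gain` helper is illustrative only; no new measurements are introduced.

```python
baseline_eval = 0.1500
step15_eval = 0.2357
final_eval = 0.2250
first10_train = 0.1606
last10_train = 0.2056


def gain(before: float, after: float) -> tuple:
    """Return (absolute gain, relative gain) of `after` over `before`."""
    delta = after - before
    return delta, delta / before


eval_abs, eval_rel = gain(baseline_eval, final_eval)
train_abs, train_rel = gain(first10_train, last10_train)
peak_drop = step15_eval - final_eval

print(f"eval Avg@1: +{eval_abs:.4f} ({eval_rel:.0%} relative)")
print(f"train reward: +{train_abs:.4f} ({train_rel:.0%} relative)")
print(f"step 15 to final: -{peak_drop:.4f}")
```

The held-out gain is 0.0750 absolute, or 50% relative to the baseline. The 20-step config evaluates only 8 examples with 1 rollout each, so differences of this size, including the small drop after step 15, rest on a small sample.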

The main failure signatures were invalid tool paths: missing `path` arguments
and paths outside the rollout workspace. Version `0.1.8` keeps the path sandbox
but converts recoverable path mistakes into canonicalized calls against the
rollout file and adds a small clean-path reward term.

## Repository Contents

```text
README.md
pyproject.toml
uv.lock
configs/
  rl/
    zerolang-editing-laguna-xs2-20step.toml
    zerolang-editing-laguna-xs2-overnight.toml
zerolang_editing/
  __init__.py
  task_builders.py
  tasks.py
  train_tasks.py
  zero_tools.py
  zerolang_editing.py
```

Build artifacts, local virtualenvs, Zerolang caches, rollout outputs, and
compiled Python caches are intentionally excluded from the Hugging Face repo.

## Limitations

- The task distribution is synthetic and should be expanded before treating the
  trained behavior as general Zerolang editing competence.
- Current graph-edit families focus on reliable literal/value style patches.
- The environment is designed for RL tool-use behavior, not as a standalone
  benchmark of general coding ability.
- This repo contains the environment source, not final model weights.
configs/rl/zerolang-editing-laguna-xs2-20step.toml
ADDED
@@ -0,0 +1,34 @@
# Conservative scale-up from zerolang-editing-stress.toml.

model = "poolside/Laguna-XS.2"
max_steps = 20
batch_size = 32
rollouts_per_example = 4
learning_rate = 1e-4

[sampling]
max_tokens = 2048
temperature = 0.4
enable_thinking = true

[[env]]
id = "pandelis/zerolang-editing"
version = "0.1.8"

[env.args]
split = "train"
max_turns = 10

[eval]
interval = 5
num_examples = 8
rollouts_per_example = 1
eval_base_model = true

[[eval.env]]
id = "pandelis/zerolang-editing"
version = "0.1.8"

[eval.env.args]
split = "eval"
max_turns = 10
configs/rl/zerolang-editing-laguna-xs2-overnight.toml
ADDED
@@ -0,0 +1,35 @@
# Overnight scale-up from zerolang-editing-laguna-xs2-20step.toml.
# Previous 20-step run was stable and improved held-out Avg@1 from 0.1500 to 0.2250.

model = "poolside/Laguna-XS.2"
max_steps = 200
batch_size = 64
rollouts_per_example = 8
learning_rate = 1e-4

[sampling]
max_tokens = 2048
temperature = 0.4
enable_thinking = true

[[env]]
id = "pandelis/zerolang-editing"
version = "0.1.8"

[env.args]
split = "train"
max_turns = 10

[eval]
interval = 10
num_examples = 16
rollouts_per_example = 1
eval_base_model = true

[[eval.env]]
id = "pandelis/zerolang-editing"
version = "0.1.8"

[eval.env.args]
split = "eval"
max_turns = 10
pyproject.toml
ADDED
@@ -0,0 +1,22 @@
[project]
name = "zerolang-editing"
description = "Tool-backed Zerolang editing tasks for graph-first code repair and refactoring"
tags = ["zerolang", "graph-editing", "code-editing", "train", "eval"]
version = "0.1.8"
requires-python = ">=3.10"
dependencies = [
    "datasets>=2.19.0",
    "verifiers>=0.1.14",
]

[build-system]
requires = ["hatchling"]
build-backend = "hatchling.build"

[tool.hatch.build]
include = ["zerolang_editing", "pyproject.toml"]

[tool.verifiers.eval]
num_examples = 3
rollouts_per_example = 1
max_turns = 6
uv.lock
ADDED
The diff for this file is too large to render.
zerolang_editing/__init__.py
ADDED
@@ -0,0 +1,3 @@
from .zerolang_editing import load_environment

__all__ = ["load_environment"]
zerolang_editing/task_builders.py
ADDED
@@ -0,0 +1,270 @@
"""Task construction helpers for synthetic Zerolang editing rows."""

from __future__ import annotations

from typing import Any


def _source(text: str) -> str:
    return text.strip() + "\n"


def _write_program(message: str, *, raises: bool = True) -> str:
    raises_suffix = " raises" if raises else ""
    return _source(
        f"""
pub fn main(world: World) -> Void{raises_suffix} {{
    check world.out.write("{message}\\n")
}}
"""
    )


def _literal_task(
    task_id: str, old: str, new: str, goal: str | None = None, *, split: str = "eval"
) -> dict[str, Any]:
    return {
        "id": task_id,
        "split": split,
        "category": "graph_patch_literal",
        "goal": goal or f'Replace the string literal "{old}\\n" with "{new}\\n".',
        "source": _write_program(old),
        "target_source": _write_program(new),
    }


def _branch_literal_task(
    task_id: str, helper: str, old: str, new: str, *, split: str = "eval"
) -> dict[str, Any]:
    return {
        "id": task_id,
        "split": split,
        "category": "graph_patch_literal",
        "goal": (
            "Keep the helper-controlled branch intact and update only the string "
            f'literal from "{old}\\n" to "{new}\\n".'
        ),
        "source": _source(
            f"""
fn {helper}() -> i32 {{
    return 1
}}

pub fn main(world: World) -> Void raises {{
    if {helper}() == 1 {{
        check world.out.write("{old}\\n")
    }}
}}
"""
        ),
        "target_source": _source(
            f"""
fn {helper}() -> i32 {{
    return 1
}}

pub fn main(world: World) -> Void raises {{
    if {helper}() == 1 {{
        check world.out.write("{new}\\n")
    }}
}}
"""
        ),
    }


def _helper_task(
    task_id: str,
    helper: str,
    source_expr: str,
    target_expr: str,
    expected: int,
    output: str,
    *,
    split: str = "eval",
) -> dict[str, Any]:
    return {
        "id": task_id,
        "split": split,
        "category": "semantic_update",
        "goal": (
            f"Update {helper}() so it returns {expected} and the existing main "
            f"branch prints {output}."
        ),
        "source": _source(
            f"""
fn {helper}() -> i32 {{
    return {source_expr}
}}

pub fn main(world: World) -> Void raises {{
    if {helper}() == {expected} {{
        check world.out.write("{output}\\n")
    }}
}}
"""
        ),
        "target_source": _source(
            f"""
fn {helper}() -> i32 {{
    return {target_expr}
}}

pub fn main(world: World) -> Void raises {{
    if {helper}() == {expected} {{
        check world.out.write("{output}\\n")
    }}
}}
"""
        ),
    }


def _two_helper_task(
    task_id: str,
    helper: str,
    other: str,
    source_expr: str,
    target_expr: str,
    other_expr: str,
    expected: int,
    *,
    split: str = "eval",
) -> dict[str, Any]:
    return {
        "id": task_id,
        "split": split,
        "category": "semantic_update",
        "goal": (
            f"Update only {helper}() so main writes ok when the comparison succeeds; "
            f"leave {other}() unchanged."
        ),
        "source": _source(
            f"""
fn {helper}() -> i32 {{
    return {source_expr}
}}

fn {other}() -> i32 {{
    return {other_expr}
}}

pub fn main(world: World) -> Void raises {{
    if {helper}() == {expected} {{
        check world.out.write("ok\\n")
    }}
}}
"""
        ),
        "target_source": _source(
            f"""
fn {helper}() -> i32 {{
    return {target_expr}
}}

fn {other}() -> i32 {{
    return {other_expr}
}}

pub fn main(world: World) -> Void raises {{
    if {helper}() == {expected} {{
        check world.out.write("ok\\n")
    }}
}}
"""
        ),
    }


def _call_task(
    task_id: str, source_args: str, target_args: str, expected: int, *, split: str = "eval"
) -> dict[str, Any]:
    return {
        "id": task_id,
        "split": split,
        "category": "call_update",
        "goal": "Keep add unchanged, but edit one call argument so the comparison is true.",
        "source": _source(
            f"""
fn add(a: i32, b: i32) -> i32 {{
    return a + b
}}

pub fn main(world: World) -> Void raises {{
    if add({source_args}) == {expected} {{
        check world.out.write("ok\\n")
    }}
}}
"""
        ),
        "target_source": _source(
            f"""
fn add(a: i32, b: i32) -> i32 {{
    return a + b
}}

pub fn main(world: World) -> Void raises {{
    if add({target_args}) == {expected} {{
        check world.out.write("ok\\n")
    }}
}}
"""
        ),
    }


def _condition_task(
    task_id: str,
    helper: str,
    returned: int,
    source_compare: int,
    output: str,
    *,
    split: str = "eval",
) -> dict[str, Any]:
    return {
        "id": task_id,
        "split": split,
        "category": "condition_update",
        "goal": (
            "Edit the comparison literal so the branch is true without changing "
            f"{helper}() or the output string."
        ),
        "source": _source(
            f"""
fn {helper}() -> i32 {{
    return {returned}
}}

pub fn main(world: World) -> Void raises {{
    if {helper}() == {source_compare} {{
        check world.out.write("{output}\\n")
    }}
}}
"""
        ),
        "target_source": _source(
            f"""
fn {helper}() -> i32 {{
    return {returned}
}}

pub fn main(world: World) -> Void raises {{
    if {helper}() == {returned} {{
        check world.out.write("{output}\\n")
    }}
}}
"""
        ),
    }


def _diagnostic_task(task_id: str, message: str, *, split: str = "eval") -> dict[str, Any]:
    return {
        "id": task_id,
        "split": split,
        "category": "diagnostic_repair",
        "goal": "Repair the main signature so the existing world.out.write check is valid.",
        "source": _write_program(message, raises=False),
        "target_source": _write_program(message, raises=True),
    }
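
To see what one of these builders produces, the standalone sketch below re-creates `_source`, `_write_program`, and a simplified `_literal_task` under public names and renders a single row. It is an illustration rather than part of the commit: the `split` parameter is dropped, and the four-space indentation inside the Zerolang template is an assumption, since this view does not reliably preserve whitespace.

```python
from typing import Any, Dict, Optional


def source(text: str) -> str:
    # Trim the template's leading/trailing blank lines and end with one newline.
    return text.strip() + "\n"


def write_program(message: str, *, raises: bool = True) -> str:
    raises_suffix = " raises" if raises else ""
    return source(
        f"""
pub fn main(world: World) -> Void{raises_suffix} {{
    check world.out.write("{message}\\n")
}}
"""
    )


def literal_task(task_id: str, old: str, new: str, goal: Optional[str] = None) -> Dict[str, Any]:
    # Simplified: the real builder also accepts a keyword-only `split`.
    return {
        "id": task_id,
        "split": "eval",
        "category": "graph_patch_literal",
        "goal": goal or f'Replace the string literal "{old}\\n" with "{new}\\n".',
        "source": write_program(old),
        "target_source": write_program(new),
    }


row = literal_task("literal-alpha-beta", "alpha", "beta")
print(row["goal"])
print(row["source"], end="")
```

The initial and target programs differ only in the string literal, which is what makes the row solvable with a single checked `set` operation on one node.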
zerolang_editing/tasks.py
ADDED
|
@@ -0,0 +1,193 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
"""Synthetic task corpus for the Zerolang editing environment."""
|
| 2 |
+
|
| 3 |
+
from __future__ import annotations
|
| 4 |
+
|
| 5 |
+
from .task_builders import (
|
| 6 |
+
_branch_literal_task,
|
| 7 |
+
_call_task,
|
| 8 |
+
_condition_task,
|
| 9 |
+
_diagnostic_task,
|
| 10 |
+
_helper_task,
|
| 11 |
+
_literal_task,
|
| 12 |
+
_two_helper_task,
|
| 13 |
+
)
|
| 14 |
+
from .train_tasks import TRAIN_TASKS
|
| 15 |
+
|
| 16 |
+
|
| 17 |
+
EVAL_TASKS: list[dict[str, Any]] = [
|
| 18 |
+
_literal_task(
|
| 19 |
+
"literal-string-graph-patch",
|
| 20 |
+
"hello from zero",
|
| 21 |
+
"hello graph",
|
| 22 |
+
"Change the printed string from hello from zero to hello graph.",
|
| 23 |
+
),
|
| 24 |
+
_literal_task(
|
| 25 |
+
"repair-unknown-message",
|
| 26 |
+
"draft",
|
| 27 |
+
"fixed by zero",
|
| 28 |
+
'Replace the string literal "draft\\n" with "fixed by zero\\n".',
|
| 29 |
+
),
|
| 30 |
+
_literal_task(
|
| 31 |
+
"literal-status-ready",
|
| 32 |
+
"status: draft",
|
| 33 |
+
"status: ready",
|
| 34 |
+
'Replace the string literal "status: draft\\n" with "status: ready\\n".',
|
| 35 |
+
),
|
| 36 |
+
_literal_task(
|
| 37 |
+
"literal-counter-pass",
|
| 38 |
+
"counter failed",
|
| 39 |
+
"counter passed",
|
| 40 |
+
'Replace the string literal "counter failed\\n" with "counter passed\\n".',
|
| 41 |
+
),
|
| 42 |
+
_literal_task(
|
| 43 |
+
"literal-agent-graph",
|
| 44 |
+
"agent used text",
|
| 45 |
+
"agent used graph",
|
| 46 |
+
'Replace the string literal "agent used text\\n" with "agent used graph\\n".',
|
| 47 |
+
),
|
| 48 |
+
*[
|
| 49 |
+
_literal_task(task_id, old, new)
|
| 50 |
+
for task_id, old, new in [
|
| 51 |
+
("literal-alpha-beta", "alpha", "beta"),
|
| 52 |
+
("literal-start-finish", "start", "finish"),
|
| 53 |
+
("literal-left-right", "left", "right"),
|
| 54 |
+
("literal-plan-done", "plan pending", "plan done"),
|
| 55 |
+
("literal-state-green", "state: red", "state: green"),
|
| 56 |
+
("literal-cache-hot", "cache cold", "cache hot"),
|
| 57 |
+
]
|
| 58 |
+
],
|
| 59 |
+
*[
|
| 60 |
+
_literal_task(task_id, old, new, goal)
|
| 61 |
+
for task_id, old, new, goal in [
|
| 62 |
+
(
|
| 63 |
+
"literal-colon-version",
|
| 64 |
+
"status: init [v1]",
|
| 65 |
+
"status: init [v2]",
|
| 66 |
+
"Update the status bracket code from [v1] to [v2].",
|
| 67 |
+
),
|
| 68 |
+
(
|
| 69 |
+
"literal-api-version",
|
| 70 |
+
"load path /api/v1/health",
|
| 71 |
+
"load path /api/v2/health",
|
| 72 |
+
"Switch the printed endpoint from v1 to v2 while keeping the same path.",
|
| 73 |
+
),
|
| 74 |
+
(
|
| 75 |
+
"literal-score-number",
|
| 76 |
+
"score: 42/100",
|
| 77 |
+
"score: 99/100",
|
| 78 |
+
"Change the score text from 42 to 99.",
|
| 79 |
+
),
|
| 80 |
+
(
|
| 81 |
+
"literal-status-code",
|
| 82 |
+
"error: [404] failed",
|
| 83 |
+
"error: [200] resolved",
|
| 84 |
+
"Edit the status code label from 404 to 200 in brackets.",
|
| 85 |
+
),
|
| 86 |
+
(
|
| 87 |
+
"literal-progress-percent",
|
| 88 |
+
"progress: 50% complete",
|
| 89 |
+
"progress: 75% complete",
|
| 90 |
+
"Update the progress percentage from 50 to 75.",
|
| 91 |
+
),
|
| 92 |
+
(
|
| 93 |
+
"literal-time-stamp",
|
| 94 |
+
"time stamp 12:34",
|
| 95 |
+
"time stamp 13:00",
|
| 96 |
+
"Change the time from 12:34 to 13:00 in the output string.",
|
| 97 |
+
),
|
| 98 |
+
(
|
| 99 |
+
"literal-list-separator",
|
| 100 |
+
"list [a/b/c]",
|
| 101 |
+
"list [a-b-c]",
|
| 102 |
+
"Adjust the list label to use dashes instead of slashes.",
|
| 103 |
+
),
|
| 104 |
+
(
|
| 105 |
+
"literal-coordinate-label",
|
| 106 |
+
"coords (x:1,y:2)",
|
| 107 |
+
"coords (x:3,y:4)",
|
| 108 |
+
"Update the coordinate label from (x:1,y:2) to (x:3,y:4).",
|
| 109 |
+
),
|
| 110 |
+
]
|
| 111 |
+
],
    *[
        _branch_literal_task(task_id, helper, old, new)
        for task_id, helper, old, new in [
            ("branch-literal-ready-version", "ready", "ready v1", "ready v2"),
            ("branch-literal-mode-active", "can_send", "mode: standby", "mode: active"),
            ("branch-literal-status-ok", "enabled", "status: ok [404]", "status: ok [200]"),
            ("branch-literal-step-count", "feature_flag", "steps: 1/3 complete", "steps: 2/3 complete"),
            ("branch-literal-coordinate", "should_emit", "coords (x:1,y:1)", "coords (x:2,y:2)"),
            ("branch-literal-check-pass", "allow_output", "check: fail", "check: pass"),
            ("branch-literal-health", "should_log", "health: warn", "health: ok"),
            ("branch-literal-phase", "gate_open", "phase: draft", "phase: final"),
        ]
    ],
    *[
        _helper_task(task_id, helper, source_expr, target_expr, expected, output)
        for task_id, helper, source_expr, target_expr, expected, output in [
            ("helper-score-42", "score", "40 + 1", "40 + 2", 42, "ready"),
            ("helper-answer-41", "answer", "20 + 20", "20 + 21", 41, "green light"),
            ("helper-total-18", "total", "30 - 13", "30 - 12", 18, "total ok"),
            ("helper-count-9", "count", "12 - 4", "12 - 3", 9, "count passed"),
            ("helper-value-15", "value", "7 + 7", "7 + 8", 15, "value good"),
            ("helper-limit-24", "limit", "50 - 27", "50 - 26", 24, "limit open"),
            ("helper-score-31", "score", "15 + 15", "15 + 16", 31, "score matched"),
            ("helper-answer-8", "answer", "10 - 3", "10 - 2", 8, "done"),
        ]
    ],
    *[
        _two_helper_task(task_id, helper, other, source_expr, target_expr, other_expr, expected)
        for task_id, helper, other, source_expr, target_expr, other_expr, expected in [
            ("two-helper-score", "score", "spare", "12 + 9", "13 + 9", "30 - 8", 22),
            ("two-helper-total", "total", "backup", "40 - 19", "41 - 19", "5 + 7", 22),
            ("two-helper-count", "count", "idle", "18 + 3", "19 + 3", "14 - 2", 22),
            ("two-helper-value", "value", "other", "55 - 35", "56 - 35", "6 + 2", 21),
            ("two-helper-answer", "answer", "spare", "27 - 7", "28 - 7", "8 + 1", 21),
            ("two-helper-level", "level", "helper", "9 + 10", "10 + 10", "40 - 3", 20),
            ("two-helper-points", "points", "extra", "64 - 46", "65 - 46", "11 + 4", 19),
            ("two-helper-result", "result", "unused", "33 - 12", "34 - 12", "2 + 2", 22),
        ]
    ],
    *[
        _call_task(task_id, source_args, target_args, expected)
        for task_id, source_args, target_args, expected in [
            ("call-update-five", "1, 1", "4, 1", 5),
            ("call-update-ten", "7, 1", "7, 3", 10),
            ("call-update-eleven", "0, 6", "5, 6", 11),
            ("call-update-twenty", "9, 9", "11, 9", 20),
            ("call-update-seven", "2, 2", "5, 2", 7),
            ("call-update-twelve", "10, 0", "10, 2", 12),
            ("call-update-sixteen", "8, 4", "8, 8", 16),
            ("call-update-thirteen", "6, 6", "6, 7", 13),
        ]
    ],
    *[
        _condition_task(task_id, helper, returned, source_compare, "match found")
        for task_id, helper, returned, source_compare in [
            ("condition-count-four", "count", 4, 1),
            ("condition-level-nine", "level", 9, 2),
            ("condition-token-twelve", "token", 12, 8),
            ("condition-value-fifteen", "value", 15, 10),
            ("condition-flag-six", "flag", 6, 3),
            ("condition-score-eleven", "score", 11, 7),
            ("condition-count-eight", "count", 8, 4),
            ("condition-marker-fourteen", "marker", 14, 0),
        ]
    ],
    *[
        _diagnostic_task(task_id, message)
        for task_id, message in [
            ("diagnostic-starting-up", "starting up"),
            ("diagnostic-hello-main", "hello from main"),
            ("diagnostic-message", "diagnostic message"),
            ("diagnostic-payload-logged", "payload logged"),
            ("diagnostic-attempt-write", "attempt write"),
            ("diagnostic-retrying-output", "retrying output"),
            ("diagnostic-done", "done"),
            ("diagnostic-needs-raises", "needs raises"),
        ]
    ],
]


SYNTHETIC_TASKS: list[dict[str, Any]] = [*EVAL_TASKS, *TRAIN_TASKS]
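The task list above is assembled from several table-driven groups, each unpacked into the enclosing list with `*[...]`. The builders themselves (`_literal_task` and friends) live in `task_builders.py` and are not shown in this part of the diff, so the sketch below uses a hypothetical stand-in builder, `make_literal_task`, with an assumed row schema. It only illustrates the flattening pattern and a duplicate-id check that may be useful when adding rows.

```python
from typing import Any


def make_literal_task(task_id: str, old: str, new: str, goal: str) -> dict[str, Any]:
    """Hypothetical stand-in for a task builder; the real row schema may differ."""
    return {"id": task_id, "source_literal": old, "target_literal": new, "goal": goal}


# Each `*[...]` group unpacks a comprehension into the enclosing list,
# so several table-driven groups flatten into one list of rows.
TASKS: list[dict[str, Any]] = [
    *[
        make_literal_task(task_id, old, new, goal)
        for task_id, old, new, goal in [
            ("literal-score-number", "score: 42/100", "score: 99/100", "Change the score text from 42 to 99."),
            ("literal-list-separator", "list [a/b/c]", "list [a-b-c]", "Adjust the list label to use dashes instead of slashes."),
        ]
    ],
    *[
        # A second group with a different tuple shape; the goal is derived here.
        make_literal_task(task_id, old, new, f"Replace {old!r} with {new!r}.")
        for task_id, old, new in [
            ("branch-literal-phase", "phase: draft", "phase: final"),
        ]
    ],
]


def duplicate_ids(tasks: list[dict[str, Any]]) -> list[str]:
    """Return task ids that appear more than once, in first-seen order."""
    seen: set[str] = set()
    dupes: list[str] = []
    for task in tasks:
        if task["id"] in seen and task["id"] not in dupes:
            dupes.append(task["id"])
        seen.add(task["id"])
    return dupes
```

Running `duplicate_ids` over the combined eval and train rows would flag an id reused across groups, which matters if rows are later looked up by id.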
zerolang_editing/train_tasks.py
ADDED
@@ -0,0 +1,255 @@
"""Synthetic training rows for the Zerolang editing environment."""

from __future__ import annotations

from typing import Any

from .task_builders import (
    _branch_literal_task,
    _call_task,
    _condition_task,
    _diagnostic_task,
    _helper_task,
    _literal_task,
    _two_helper_task,
)


LEGACY_TRAIN_TASKS: list[dict[str, Any]] = [
    _helper_task(
        "helper-return-update",
        "answer",
        "40 + 1",
        "40 + 2",
        42,
        "math works",
        split="train",
    ),
    _call_task("callee-argument-update", "2, 2", "2, 3", 5, split="train"),
    _condition_task("comparison-target-update", "score", 7, 8, "ready", split="train"),
    _diagnostic_task("fallible-main-repair", "needs raises", split="train"),
]


def _literal_train_tasks() -> list[dict[str, Any]]:
    pairs = [
        ("queue pending", "queue ready"),
        ("job queued", "job running"),
        ("job running", "job complete"),
        ("build red", "build green"),
        ("node cold", "node warm"),
        ("cache miss", "cache hit"),
        ("retry later", "retry now"),
        ("draft note", "final note"),
        ("plan open", "plan closed"),
        ("graph stale", "graph fresh"),
        ("route /v1/run", "route /v2/run"),
        ("status [100]", "status [200]"),
        ("phase: alpha", "phase: beta"),
        ("phase: beta", "phase: gamma"),
        ("step 1/4", "step 2/4"),
        ("step 2/4", "step 3/4"),
        ("score 10/20", "score 18/20"),
        ("level: low", "level: high"),
        ("mode manual", "mode auto"),
        ("window closed", "window open"),
        ("target west", "target east"),
        ("port 3000", "port 8080"),
        ("run id a1", "run id b2"),
        ("batch small", "batch large"),
        ("token old", "token new"),
        ("edge loose", "edge locked"),
        ("module local", "module remote"),
        ("worker idle", "worker busy"),
        ("agent paused", "agent active"),
        ("output empty", "output full"),
        ("index 0", "index 1"),
        ("flag off", "flag on"),
        ("signal weak", "signal strong"),
        ("health warn", "health pass"),
        ("check skipped", "check passed"),
        ("ticket open", "ticket merged"),
        ("snapshot old", "snapshot new"),
        ("profile dev", "profile prod"),
        ("version 0.1", "version 0.2"),
        ("result unknown", "result known"),
    ]
    return [
        _literal_task(f"train-literal-{index:03d}", old, new, split="train")
        for index, (old, new) in enumerate(pairs, start=1)
    ]


def _branch_literal_train_tasks() -> list[dict[str, Any]]:
    specs = [
        ("ready_gate", "gate draft", "gate ready"),
        ("emit_gate", "emit old", "emit new"),
        ("mode_gate", "mode test", "mode live"),
        ("route_gate", "route blue", "route green"),
        ("status_gate", "status low", "status high"),
        ("phase_gate", "phase one", "phase two"),
        ("counter_gate", "count fail", "count pass"),
        ("worker_gate", "worker wait", "worker run"),
        ("deploy_gate", "deploy hold", "deploy ship"),
        ("review_gate", "review open", "review done"),
        ("graph_gate", "graph dirty", "graph clean"),
        ("patch_gate", "patch text", "patch graph"),
        ("score_gate", "score bad", "score good"),
        ("plan_gate", "plan rough", "plan exact"),
        ("test_gate", "test flaky", "test stable"),
        ("queue_gate", "queue blocked", "queue clear"),
        ("cache_gate", "cache cold", "cache hot"),
        ("trace_gate", "trace off", "trace on"),
        ("run_gate", "run dry", "run real"),
        ("sync_gate", "sync stale", "sync current"),
    ]
    return [
        _branch_literal_task(f"train-branch-literal-{index:03d}", helper, old, new, split="train")
        for index, (helper, old, new) in enumerate(specs, start=1)
    ]


def _helper_train_tasks() -> list[dict[str, Any]]:
    helpers = ["answer", "score", "total", "count", "value", "limit", "level", "points"]
    outputs = ["ok", "ready", "matched", "accepted", "passed", "open", "done", "green"]
    tasks: list[dict[str, Any]] = []

    for index in range(1, 26):
        left = 10 + index
        target_right = 3 + (index % 9)
        source_right = target_right - 1
        expected = left + target_right
        tasks.append(
            _helper_task(
                f"train-helper-add-{index:03d}",
                helpers[index % len(helpers)],
                f"{left} + {source_right}",
                f"{left} + {target_right}",
                expected,
                outputs[index % len(outputs)],
                split="train",
            )
        )

    for index in range(1, 26):
        left = 60 + index
        target_right = 5 + (index % 11)
        source_right = target_right + 1
        expected = left - target_right
        tasks.append(
            _helper_task(
                f"train-helper-sub-{index:03d}",
                helpers[(index + 3) % len(helpers)],
                f"{left} - {source_right}",
                f"{left} - {target_right}",
                expected,
                outputs[(index + 2) % len(outputs)],
                split="train",
            )
        )

    return tasks


def _two_helper_train_tasks() -> list[dict[str, Any]]:
    primary_helpers = ["score", "total", "count", "value", "answer", "level", "points", "result"]
    other_helpers = ["spare", "backup", "idle", "other", "side", "helper", "extra", "unused"]
    tasks: list[dict[str, Any]] = []
    for index in range(1, 21):
        left = 20 + index
        target_right = 2 + (index % 7)
        source_right = target_right - 1
        expected = left + target_right
        other_expr = f"{4 + index % 6} + {8 + index % 5}"
        tasks.append(
            _two_helper_task(
                f"train-two-helper-{index:03d}",
                primary_helpers[index % len(primary_helpers)],
                other_helpers[index % len(other_helpers)],
                f"{left} + {source_right}",
                f"{left} + {target_right}",
                other_expr,
                expected,
                split="train",
            )
        )
    return tasks


def _call_train_tasks() -> list[dict[str, Any]]:
    tasks: list[dict[str, Any]] = []
    for index in range(1, 31):
        left = 1 + (index % 17)
        target_right = 2 + (index % 13)
        source_right = target_right - 1
        expected = left + target_right
        tasks.append(
            _call_task(
                f"train-call-update-{index:03d}",
                f"{left}, {source_right}",
                f"{left}, {target_right}",
                expected,
                split="train",
            )
        )
    return tasks


def _condition_train_tasks() -> list[dict[str, Any]]:
    helpers = ["score", "count", "level", "token", "value", "flag", "marker", "limit"]
    tasks: list[dict[str, Any]] = []
    for index in range(1, 26):
        returned = 5 + (index * 3)
        source_compare = returned + 1
        tasks.append(
            _condition_task(
                f"train-condition-update-{index:03d}",
                helpers[index % len(helpers)],
                returned,
                source_compare,
                "matched",
                split="train",
            )
        )
    return tasks


def _diagnostic_train_tasks() -> list[dict[str, Any]]:
    messages = [
        "train starting",
        "train ready",
        "diagnostic pass",
        "writer needs raises",
        "output accepted",
        "payload saved",
        "attempt complete",
        "retry complete",
        "batch emitted",
        "sample logged",
        "graph checked",
        "patch validated",
        "route verified",
        "state stored",
        "run complete",
        "score written",
        "marker emitted",
        "world write",
        "tool output",
        "final line",
    ]
    return [
        _diagnostic_task(f"train-diagnostic-{index:03d}", message, split="train")
        for index, message in enumerate(messages, start=1)
    ]


TRAIN_TASKS: list[dict[str, Any]] = [
    *_literal_train_tasks(),
    *_branch_literal_train_tasks(),
    *LEGACY_TRAIN_TASKS,
    *_helper_train_tasks(),
    *_two_helper_train_tasks(),
    *_call_train_tasks(),
    *_condition_train_tasks(),
    *_diagnostic_train_tasks(),
]
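The generator loops in `_helper_train_tasks` derive each row from an index, so it may help to confirm the arithmetic independently of the task builders. The sketch below re-derives the `(source_expr, target_expr, expected)` triples with the same formulas and checks that the target expression evaluates to `expected` while the source expression is off by one. The names `evaluate`, `helper_rows`, and `GROUP_SIZES` are illustrative and not part of the repository; the group sizes are simply read off the tables and loop ranges in this file.

```python
def evaluate(expr: str) -> int:
    """Evaluate a two-operand 'a + b' or 'a - b' expression without eval()."""
    left, op, right = expr.split()
    return int(left) + int(right) if op == "+" else int(left) - int(right)


def helper_rows() -> list[tuple[str, str, int]]:
    """Rebuild (source_expr, target_expr, expected) using the listing's formulas."""
    rows: list[tuple[str, str, int]] = []
    for index in range(1, 26):  # addition rows
        left = 10 + index
        target_right = 3 + (index % 9)
        rows.append((f"{left} + {target_right - 1}", f"{left} + {target_right}", left + target_right))
    for index in range(1, 26):  # subtraction rows
        left = 60 + index
        target_right = 5 + (index % 11)
        rows.append((f"{left} - {target_right + 1}", f"{left} - {target_right}", left - target_right))
    return rows


# Group sizes read off the listing: table lengths and loop ranges.
GROUP_SIZES = {
    "literal": 40,
    "branch_literal": 20,
    "legacy": 4,
    "helper": 50,
    "two_helper": 20,
    "call": 30,
    "condition": 25,
    "diagnostic": 20,
}

if __name__ == "__main__":
    rows = helper_rows()
    assert all(evaluate(target) == expected for _, target, expected in rows)
    assert all(evaluate(source) == expected - 1 for source, _, expected in rows)
    print(len(rows), "helper rows;", sum(GROUP_SIZES.values()), "train rows in total")
```

In both loops the source expression evaluates to exactly one less than `expected`, so every helper row asks for the same size of change.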
zerolang_editing/zero_tools.py
ADDED
@@ -0,0 +1,310 @@
"""Path-based Zerolang compiler tools for the editing environment."""

from __future__ import annotations

import hashlib
import json
import os
import platform
import re
import shutil
import subprocess
import tempfile
import threading
import urllib.request
from pathlib import Path
from typing import Any


_ZERO_INSTALL_LOCK = threading.Lock()


def _download(url: str, timeout: int = 60) -> bytes:
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def _zero_asset_candidates() -> list[str]:
    system = platform.system()
    machine = platform.machine().lower()
    if machine in {"arm64", "aarch64"}:
        cpu = "arm64"
    elif machine in {"x86_64", "amd64"}:
        cpu = "x64"
    else:
        return []

    if system == "Darwin":
        return [f"zero-darwin-{cpu}"]
    if system == "Linux":
        return [f"zero-linux-musl-{cpu}", f"zero-linux-{cpu}"]
    return []


def _install_zero_binary() -> str | None:
    install_dir = Path(
        os.environ.get("ZERO_INSTALL_DIR")
        or Path(tempfile.gettempdir()) / "zerolang-editing-zero" / "bin"
    ).expanduser()
    binary = install_dir / "zero"
    if binary.exists():
        return str(binary)

    with _ZERO_INSTALL_LOCK:
        if binary.exists():
            return str(binary)

        base_url = os.environ.get(
            "ZERO_DOWNLOAD_BASE_URL",
            "https://github.com/vercel-labs/zero/releases/latest/download",
        ).rstrip("/")
        checksums_text = _download(f"{base_url}/CHECKSUMS.txt").decode()
        checksums = {}
        for line in checksums_text.splitlines():
            parts = line.split()
            if len(parts) >= 2:
                checksums[parts[1]] = parts[0]

        install_dir.mkdir(parents=True, exist_ok=True)
        last_error: Exception | None = None
        for asset in _zero_asset_candidates():
            try:
                data = _download(f"{base_url}/{asset}")
                expected = checksums.get(asset)
                actual = hashlib.sha256(data).hexdigest()
                if expected and actual != expected:
                    raise RuntimeError(f"checksum mismatch for {asset}")
                binary.write_bytes(data)
                os.chmod(binary, 0o755)
                check = subprocess.run(
                    [str(binary), "--version"],
                    text=True,
                    capture_output=True,
                    timeout=10,
                )
                if check.returncode == 0:
                    return str(binary)
            except Exception as exc:
                last_error = exc
                if binary.exists():
                    binary.unlink()
                continue
        if last_error is not None:
            raise RuntimeError(f"failed to install zero binary: {last_error}") from last_error
        return None


def _zero_binary(zero_path: str | None = None) -> str | None:
    candidates = [
        zero_path,
        shutil.which("zero"),
        str(Path.home() / ".zero" / "bin" / "zero"),
    ]
    for candidate in candidates:
        if candidate and Path(candidate).exists():
            return candidate
    return _install_zero_binary()


def _json_tool_result(result: dict[str, Any]) -> str:
    return json.dumps(result, indent=2, sort_keys=True)


def read_source(path: str | Path) -> str:
    return Path(path).read_text()


def _source_fingerprint(path: Path) -> dict[str, Any]:
    if not path.exists():
        return {"path": str(path), "exists": False}
    data = path.read_bytes()
    return {
        "path": str(path),
        "exists": True,
        "bytes": len(data),
        "source_sha256": hashlib.sha256(data).hexdigest(),
    }


def _summarize_graph_dump(graph_dump: str) -> dict[str, Any]:
    summary: dict[str, Any] = {
        "hash": None,
        "literals": [],
        "functions": [],
        "calls": [],
        "identifiers": [],
    }
    for line in graph_dump.splitlines():
        if line.startswith("hash "):
            summary["hash"] = line.split('"', 2)[1]
        elif " Literal " in line:
            match = re.match(r'node (#[0-9a-f]+) Literal type:"([^"]+)" value:"(.*)"', line)
            if match:
                summary["literals"].append(
                    {"node": match.group(1), "type": match.group(2), "value": match.group(3)}
                )
        elif " Function " in line:
            match = re.match(r'node (#[0-9a-f]+) Function name:"([^"]+)" type:"([^"]+)"', line)
            if match:
                summary["functions"].append(
                    {"node": match.group(1), "name": match.group(2), "type": match.group(3)}
                )
        elif " MethodCall " in line:
            match = re.match(r'node (#[0-9a-f]+) MethodCall name:"([^"]+)" type:"([^"]+)"', line)
            if match:
                summary["calls"].append(
                    {"node": match.group(1), "name": match.group(2), "type": match.group(3)}
                )
        elif " Identifier " in line:
            match = re.match(r'node (#[0-9a-f]+) Identifier name:"([^"]+)"', line)
            if match:
                summary["identifiers"].append({"node": match.group(1), "name": match.group(2)})
    return summary


def run_zero_path(args: list[str], path: str | Path, zero_path: str | None = None) -> dict[str, Any]:
    binary = _zero_binary(zero_path)
    source_path = Path(path)
    if binary is None:
        return {
            "ok": False,
            "tool_error": "zero binary not found; install with https://zerolang.ai/install.sh",
            **_source_fingerprint(source_path),
        }
    if not source_path.exists():
        return {
            "ok": False,
            "tool_error": f"source file does not exist: {source_path}",
            **_source_fingerprint(source_path),
        }

    proc = subprocess.run(
        [binary, *args, str(source_path)],
        text=True,
        capture_output=True,
        timeout=10,
        env={**os.environ, "PATH": f"{Path(binary).parent}:{os.environ.get('PATH', '')}"},
    )
    return {
        "ok": proc.returncode == 0,
        "returncode": proc.returncode,
        "stdout": proc.stdout[-12000:],
        "stderr": proc.stderr[-4000:],
        **_source_fingerprint(source_path),
    }


def run_zero_source(
    args: list[str], source: str, zero_path: str | None = None
) -> dict[str, Any]:
    with tempfile.TemporaryDirectory(prefix="zerolang-editing-score-") as tmp:
        source_path = Path(tmp) / "program.0"
        source_path.write_text(source)
        return run_zero_path(args, source_path, zero_path)


def make_zero_tools(zero_path: str | None = None) -> list[Any]:
    def zero_check(path: str) -> str:
        """Run `zero check --json` on a `.0` file path on disk."""
        return _json_tool_result(run_zero_path(["check", "--json"], path, zero_path))

    def zero_graph_summary(path: str) -> str:
        """Return compact graph hash and patchable node facts for a `.0` file path."""
        result = run_zero_path(["graph", "dump"], path, zero_path)
        if result.get("ok"):
            result["summary"] = _summarize_graph_dump(result.get("stdout", ""))
        return _json_tool_result(result)

    def zero_graph_dump(path: str) -> str:
        """Run `zero graph dump` on a `.0` file path on disk."""
        return _json_tool_result(run_zero_path(["graph", "dump"], path, zero_path))

    def zero_graph_json(path: str) -> str:
        """Run `zero graph --json` on a `.0` file path on disk."""
        return _json_tool_result(run_zero_path(["graph", "--json"], path, zero_path))

    def zero_fix_plan(path: str) -> str:
        """Run `zero fix --plan --json` on a `.0` file path on disk."""
        return _json_tool_result(run_zero_path(["fix", "--plan", "--json"], path, zero_path))

    def zero_graph_patch(path: str, expect_graph_hash: str, op: str) -> str:
        """Apply one checked `zero graph patch` operation to a `.0` file path on disk."""
        binary = _zero_binary(zero_path)
        source_path = Path(path)
        if binary is None:
            return _json_tool_result(
                {
                    "ok": False,
                    "tool_error": "zero binary not found; install with https://zerolang.ai/install.sh",
                    **_source_fingerprint(source_path),
                }
            )
        if not source_path.exists():
            return _json_tool_result(
                {
                    "ok": False,
                    "tool_error": f"source file does not exist: {source_path}",
                    **_source_fingerprint(source_path),
                }
            )
        proc = subprocess.run(
            [
                binary,
                "graph",
                "patch",
                str(source_path),
                "--expect-graph-hash",
                expect_graph_hash,
                "--op",
                op,
            ],
            text=True,
            capture_output=True,
            timeout=10,
            env={**os.environ, "PATH": f"{Path(binary).parent}:{os.environ.get('PATH', '')}"},
        )
        return _json_tool_result(
            {
                "ok": proc.returncode == 0,
                "returncode": proc.returncode,
                "stdout": proc.stdout[-12000:],
                "stderr": proc.stderr[-4000:],
                **_source_fingerprint(source_path),
            }
        )

    def zero_skills_get(skill: str) -> str:
        """Return version-matched Zerolang guidance for `language`, `diagnostics`, `stdlib`, or `zero`."""
        binary = _zero_binary(zero_path)
        if binary is None:
            return _json_tool_result(
                {
                    "ok": False,
                    "tool_error": "zero binary not found; install with https://zerolang.ai/install.sh",
                }
            )
        proc = subprocess.run(
            [binary, "skills", "get", skill],
            text=True,
            capture_output=True,
            timeout=10,
            env={**os.environ, "PATH": f"{Path(binary).parent}:{os.environ.get('PATH', '')}"},
        )
        return _json_tool_result(
            {
                "ok": proc.returncode == 0,
                "returncode": proc.returncode,
                "stdout": proc.stdout[-12000:],
                "stderr": proc.stderr[-4000:],
            }
        )

    return [
        zero_check,
        zero_graph_summary,
        zero_graph_dump,
        zero_graph_json,
        zero_fix_plan,
        zero_graph_patch,
        zero_skills_get,
    ]
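The installer in this file parses `CHECKSUMS.txt` as whitespace-separated `<digest> <asset>` lines and compares SHA-256 digests, but it only enforces the check when the asset has an entry (`if expected and actual != expected`). The sketch below isolates that parsing and comparison so it can be exercised without a network call. The `strict` flag is a hypothetical variation, not something the repository provides: with `strict=True` an asset missing from the checksum file is rejected instead of accepted.

```python
import hashlib


def parse_checksums(text: str) -> dict[str, str]:
    """Map asset name -> hex digest from lines shaped like '<digest> <name>'."""
    checksums: dict[str, str] = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) >= 2:
            checksums[parts[1]] = parts[0]
    return checksums


def verify_asset(name: str, data: bytes, checksums: dict[str, str], strict: bool = True) -> bool:
    """Return True when `data` matches the recorded SHA-256 digest for `name`.

    `strict` is an illustrative assumption: when the asset has no entry,
    strict mode rejects it, while non-strict mode accepts it unverified.
    """
    expected = checksums.get(name)
    if expected is None:
        return not strict
    return hashlib.sha256(data).hexdigest() == expected


if __name__ == "__main__":
    payload = b"stand-in bytes for a downloaded binary"
    listing = f"{hashlib.sha256(payload).hexdigest()}  zero-linux-x64\n"
    table = parse_checksums(listing)
    print(verify_asset("zero-linux-x64", payload, table))
    print(verify_asset("zero-linux-x64", b"tampered", table))
```

Whether a missing entry should be fatal depends on how much the download source is trusted; the listed code takes the lenient path.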
zerolang_editing/zerolang_editing.py
ADDED
@@ -0,0 +1,418 @@
"""Prime Verifiers environment for Zerolang graph-first editing."""

from __future__ import annotations

import json
import os
import re
import tempfile
from collections.abc import Mapping
from pathlib import Path
from typing import Any

from datasets import Dataset
import verifiers as vf

from .tasks import SYNTHETIC_TASKS
from .zero_tools import make_zero_tools, read_source, run_zero_path, run_zero_source


SYSTEM_PROMPT = """\
You are Roder, a coding agent running in an evaluation harness.

Complete the requested code edit, use the available tools when they are useful,
and return a concise final answer. The task source is already written to disk;
operate on the provided `.0` file path.
"""

ZERO_FILE_PLACEHOLDER = "{{ZERO_FILE_PATH}}"


def _normalize_source(source: str) -> str:
    return "\n".join(line.rstrip() for line in source.strip().splitlines()).strip()


def _message_role(message: Any) -> str | None:
    if isinstance(message, dict):
        return message.get("role")
    return getattr(message, "role", None)


def _message_content(message: Any) -> str:
    if isinstance(message, dict):
        content = message.get("content", "")
    else:
        content = getattr(message, "content", "")
    return content if isinstance(content, str) else str(content)


def _completion_text(completion: Any) -> str:
    if isinstance(completion, str):
        return completion
    for message in reversed(completion or []):
        if _message_role(message) == "assistant":
            content = _message_content(message)
            if content:
                return content
    return _message_content((completion or [{}])[-1]) if completion else ""


def _extract_json_payload(completion: Any) -> dict[str, Any] | None:
    text = _completion_text(completion).strip()
    fenced_json = re.search(r"```json\s*(.*?)```", text, re.DOTALL | re.IGNORECASE)
    if fenced_json:
        text = fenced_json.group(1).strip()

    for candidate in (text, None):
        if candidate is None:
            object_match = re.search(r"\{.*\}", text, re.DOTALL)
            if object_match is None:
                continue
            candidate = object_match.group(0)
        try:
            payload = json.loads(candidate)
        except json.JSONDecodeError:
            continue
        if isinstance(payload, dict):
            return payload
    return None


def _extract_final_source(completion: Any) -> str:
    payload = _extract_json_payload(completion)
    if payload is not None and isinstance(payload.get("final_source"), str):
        return payload["final_source"]

    text = _completion_text(completion).strip()
    fenced_zero = re.search(r"```(?:zero|0)?\s*(.*?)```", text, re.DOTALL | re.IGNORECASE)
    if fenced_zero:
        return fenced_zero.group(1).strip()
    return text


def _state_file_path(state: Any) -> str | None:
    if isinstance(state, dict):
        path = state.get("zero_file_path")
        return path if isinstance(path, str) else None
    return None


def _state_file_source(state: Any) -> str:
    path = _state_file_path(state)
    if not path:
        return ""
    try:
        return read_source(path)
    except OSError:
        return ""


def _scored_source(completion: Any, state: Any = None) -> str:
    disk_source = _state_file_source(state)
    if disk_source:
        return disk_source
    return _extract_final_source(completion)


def _tool_was_called(state: Any) -> bool:
    for turn in (state or {}).get("trajectory", []):
        for message in turn.get("completion", []):
            tool_calls = getattr(message, "tool_calls", None)
            if tool_calls:
                return True
            if isinstance(message, dict) and message.get("tool_calls"):
                return True
    return False


def _make_prompt(row: dict[str, Any]) -> list[dict[str, str]]:
    return [
        {
            "role": "user",
            "content": (
                f"Task id: {row['id']}\n"
                f"Edit goal: {row['goal']}\n\n"
                "The Zerolang source has been written to this file:\n"
                f"{ZERO_FILE_PLACEHOLDER}\n\n"
                "Use tool arguments with `path` set to that `.0` file. "
                "The grader will read the edited file from disk and run `zero check` on it. "
                "Return a JSON object with `path` when finished."
            ),
        }
    ]


def _build_dataset(split: str, max_examples: int | None) -> Dataset:
    rows: list[dict[str, Any]] = []
    for task in SYNTHETIC_TASKS:
        if split != "all" and task["split"] != split:
            continue
        rows.append(
            {
                "prompt": _make_prompt(task),
                "answer": task["target_source"],
                "info": json.dumps(
                    {
                        "id": task["id"],
                        "category": task["category"],
                        "split": task["split"],
                        "goal": task["goal"],
                        "source": task["source"],
                        "target_source": task["target_source"],
                    }
                ),
            }
        )
    if max_examples is not None:
        rows = rows[: int(max_examples)]
    return Dataset.from_list(rows)


def _workspace_root() -> Path:
    configured = os.environ.get("ZEROLANG_EDITING_WORKDIR")
    if configured:
        return Path(configured).expanduser()
    return Path(tempfile.gettempdir()) / "zerolang-editing-rollouts"


def _safe_task_id(task_id: str) -> str:
    return re.sub(r"[^A-Za-z0-9_.-]+", "-", task_id).strip("-") or "task"


def _replace_prompt_path(messages: Any, path: str) -> None:
    for message in messages or []:
        if isinstance(message, dict):
            content = message.get("content")
            if isinstance(content, str):
                message["content"] = content.replace(ZERO_FILE_PLACEHOLDER, path)
            continue
        content = getattr(message, "content", None)
        if isinstance(content, str):
            setattr(message, "content", content.replace(ZERO_FILE_PLACEHOLDER, path))


def _is_relative_to(child: Path, parent: Path) -> bool:
    try:
        child.relative_to(parent)
        return True
    except ValueError:
        return False


class ZerolangPathToolEnv(vf.StatefulToolEnv):
    """Tool environment that creates one on-disk `.0` file per rollout."""

    def __init__(self, *args: Any, workspace_root: Path | None = None, **kwargs: Any):
        super().__init__(*args, **kwargs)
        self.workspace_root = workspace_root or _workspace_root()

    async def setup_state(self, state: vf.State) -> None:
        info = state.get("info") or {}
        source = info.get("source") if isinstance(info, dict) else None
        if not isinstance(source, str) or not source.strip():
            raise ValueError("zerolang-editing rows must include info.source")

        task_id = info.get("id", "task") if isinstance(info, dict) else "task"
        workspace = self.workspace_root / f"{_safe_task_id(str(task_id))}-{state['trajectory_id']}"
        workspace.mkdir(parents=True, exist_ok=True)
        file_path = workspace / "program.0"
        file_path.write_text(source)

        state["zero_workspace"] = str(workspace.resolve())
        state["zero_file_path"] = str(file_path.resolve())
        _replace_prompt_path(state.get("prompt"), state["zero_file_path"])

    def update_tool_args(
        self,
        tool_name: str,
        tool_args: dict,
        messages: vf.Messages,
        state: vf.State,
        **kwargs: Any,
    ) -> dict:
        if "source" in tool_args:
            raise ValueError("Zerolang tools operate on `path`; do not pass source text.")
        if tool_name == "zero_skills_get":
            return tool_args

        workspace = Path(str(state["zero_workspace"])).resolve()
        fallback = Path(str(state["zero_file_path"])).resolve()
        raw_value = tool_args.get("path")
        correction_reason: str | None = None

        # Compare explicitly rather than with `in {None, ""}`: a set membership
        # test raises TypeError if the model passes an unhashable value (list, dict).
        if raw_value is None or raw_value == "":
            resolved = fallback
            correction_reason = "missing_path"
        else:
            raw_path = Path(str(raw_value)).expanduser()
            resolved = (
                (workspace / raw_path).resolve()
                if not raw_path.is_absolute()
                else raw_path.resolve()
            )
            if not _is_relative_to(resolved, workspace):
                resolved = fallback
                correction_reason = "outside_workspace"
            elif resolved.suffix != ".0":
                resolved = fallback
                correction_reason = "non_zero_path"

        if correction_reason is not None:
            state.setdefault("zero_path_arg_corrections", []).append(
                {
                    "tool_name": tool_name,
                    "reason": correction_reason,
                    "raw_path": "" if raw_value is None else str(raw_value),
                }
            )
        tool_args["path"] = str(resolved)
        return tool_args


async def target_source_match(completion: Any, answer: str, state: Any = None, **_: Any) -> float:
    scored_source = _scored_source(completion, state)
    return 1.0 if _normalize_source(scored_source) == _normalize_source(answer) else 0.0


async def zero_check_pass(completion: Any, state: Any = None, **_: Any) -> float:
    path = _state_file_path(state)
    if path and Path(path).exists():
        result = run_zero_path(["check", "--json"], path)
    else:
        final_source = _extract_final_source(completion)
        if not final_source.strip():
            return 0.0
        result = run_zero_source(["check", "--json"], final_source)
    if not result.get("ok"):
        return 0.0
    try:
        parsed = json.loads(result.get("stdout") or "{}")
    except json.JSONDecodeError:
        return 0.0
    return 1.0 if parsed.get("ok") is True else 0.0


def _walk_graph_patch_payloads(value: Any, seen: set[int] | None = None):
    if seen is None:
        seen = set()
    value_id = id(value)
    if value_id in seen:
        return
    seen.add(value_id)

    if isinstance(value, Mapping):
        stdout = value.get("stdout")
        if isinstance(stdout, str) and "program graph patch ok" in stdout:
            yield value
        for item in value.values():
            yield from _walk_graph_patch_payloads(item, seen)
        return

    if isinstance(value, (list, tuple)):
        for item in value:
            yield from _walk_graph_patch_payloads(item, seen)
        return

    if isinstance(value, str) and "program graph patch ok" in value:
        try:
            parsed = json.loads(value)
        except json.JSONDecodeError:
            return
        yield from _walk_graph_patch_payloads(parsed, seen)
        return

    model_dump = getattr(value, "model_dump", None)
    if callable(model_dump):
        try:
            yield from _walk_graph_patch_payloads(model_dump(), seen)
        except Exception:
            pass
        return

    attrs = getattr(value, "__dict__", None)
    if isinstance(attrs, dict):
        yield from _walk_graph_patch_payloads(attrs, seen)


async def graph_patch_success(
    completion: Any = None, state: Any = None, answer: str = "", **_: Any
) -> float:
    search_root = (state or {}).get("trajectory", state or {}) if isinstance(state, Mapping) else state
    for payload in _walk_graph_patch_payloads(search_root or {}):
        path = payload.get("path")
        if isinstance(path, str):
            try:
                patched_source = read_source(path)
            except OSError:
                continue
            if _normalize_source(patched_source) == _normalize_source(answer):
                return 1.0

    text = _completion_text(completion)
    if _tool_was_called(state) and "graph_patch" in text:
        scored_source = _scored_source(completion, state)
        if _normalize_source(scored_source) == _normalize_source(answer):
            return 1.0
    return 0.0


async def zerolang_surface_used(completion: Any, state: Any = None, **_: Any) -> float:
    if _tool_was_called(state):
        text = _completion_text(completion).lower()
        if "zero_graph_patch" in text or "graph_patch" in text:
            return 1.0
    text = _completion_text(completion).lower()
    markers = [
        "zero_graph_patch",
        "graph_patch",
        "expect_graph_hash",
        "set node=",
        "field=\"value\"",
        "expect=",
        "graph hash",
        "node #",
    ]
    return 1.0 if any(marker in text for marker in markers) else 0.0


async def path_argument_valid(completion: Any, state: Any = None, **_: Any) -> float:
    if not _tool_was_called(state):
        return 0.0
    corrections = (state or {}).get("zero_path_arg_corrections", [])
    return 0.0 if corrections else 1.0


def load_environment(
    split: str = "eval",
    max_examples: int | None = None,
    max_turns: int = 6,
    zero_path: str | None = None,
    enable_tools: bool = True,
    **_: Any,
) -> vf.Environment:
    """Load the Zerolang editing environment."""
    if split not in {"train", "eval", "all"}:
        raise ValueError("split must be one of: train, eval, all")

    dataset = _build_dataset(split=split, max_examples=max_examples)
    rubric = vf.Rubric(
        funcs=[
            graph_patch_success,
            target_source_match,
            zero_check_pass,
            zerolang_surface_used,
            path_argument_valid,
        ],
        weights=[0.50, 0.20, 0.15, 0.10, 0.05],
    )

    if enable_tools:
        return ZerolangPathToolEnv(
            dataset=dataset,
            rubric=rubric,
            tools=make_zero_tools(zero_path),
            max_turns=max_turns,
            system_prompt=SYSTEM_PROMPT,
        )

    return vf.SingleTurnEnv(dataset=dataset, rubric=rubric, system_prompt=SYSTEM_PROMPT)
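
The path-confinement rule in `ZerolangPathToolEnv.update_tool_args` can be tried out without the `verifiers` package. The sketch below is a standalone approximation, not part of the commit: `confine_path` and `demo` are hypothetical names introduced here, and the only behavior carried over from the module is the three correction reasons (`missing_path`, `outside_workspace`, `non_zero_path`) and the fallback to the rollout's `program.0` file.

```python
import tempfile
from pathlib import Path
from typing import List, Optional, Tuple


def confine_path(
    raw_value: object, workspace: Path, fallback: Path
) -> Tuple[Path, Optional[str]]:
    """Hypothetical helper: return (path to use, correction reason or None)."""
    workspace = workspace.resolve()
    fallback = fallback.resolve()

    # A missing or empty argument falls back to the rollout's own file.
    if raw_value is None or raw_value == "":
        return fallback, "missing_path"

    raw_path = Path(str(raw_value)).expanduser()
    # Relative paths are interpreted inside the workspace.
    if raw_path.is_absolute():
        resolved = raw_path.resolve()
    else:
        resolved = (workspace / raw_path).resolve()

    # Reject anything that escapes the workspace after resolving `..` parts.
    try:
        resolved.relative_to(workspace)
    except ValueError:
        return fallback, "outside_workspace"

    # Only `.0` files are valid tool targets.
    if resolved.suffix != ".0":
        return fallback, "non_zero_path"
    return resolved, None


def demo() -> List[Optional[str]]:
    """Run a few representative arguments and collect the correction reasons."""
    with tempfile.TemporaryDirectory() as tmp:
        workspace = Path(tmp) / "task-0"
        workspace.mkdir()
        fallback = workspace / "program.0"
        fallback.write_text("")
        cases = [None, "program.0", "../escape.0", "notes.txt"]
        return [confine_path(case, workspace, fallback)[1] for case in cases]


if __name__ == "__main__":
    print(demo())
```

Because every rejected argument is silently redirected to the fallback file, a tool call still succeeds after a bad path. The module records the correction in `state["zero_path_arg_corrections"]`, and `path_argument_valid` returns 0.0 when that list is non-empty.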