pandelis committed on
Commit 888969a · verified · 1 Parent(s): bb1b296

Document Roder zero graph RL harness

Files changed (1)
  1. README.md +25 -0
README.md CHANGED
@@ -16,6 +16,12 @@ license: apache-2.0
  to edit [Zerolang](https://github.com/vercel-labs/zerolang) programs through
  checked graph edits instead of loose text replacement.
 
+ The RL harness is built around [Roder](https://roder.sh), using a custom
+ zero-coder plugin/distribution that exposes a Zerolang-only graph toolset to the
+ model. In training, generic source editing tools are disabled; the agent is
+ expected to use the `zero_*` tools below, especially `zero_graph_summary` and
+ `zero_graph_patch`, against the rollout file on disk.
+
  The core task is intentionally narrow: each rollout starts with a `.0` source
  file already written to disk, asks the model for a semantic code edit, and
  scores the edited file after the model uses Zerolang tooling. The intended
@@ -47,11 +53,30 @@ compiles and matches the hidden target source.
  - **Prime environment ID:** `pandelis/zerolang-editing`
  - **Version in this repo:** `0.1.8`
  - **Task type:** multi-turn tool-use code editing
+ - **Agent harness:** Roder with a custom Zero graph-only plugin/tool allowlist
  - **Language under edit:** Zerolang `.0`
  - **Train split:** 209 deterministic synthetic tasks
  - **Eval split:** 67 held-out deterministic synthetic tasks
  - **Primary reward target:** successful `zero_graph_patch` on the rollout file
 
+ ## Roder Harness
+
+ The intended RL setup runs the model inside Roder rather than a generic chat
+ loop. Roder provides the coding-agent harness, while a custom zero-coder plugin
+ configures the available tool surface for this environment.
+
+ That plugin is deliberately restrictive:
+
+ - It exposes only Zerolang graph/check/fix/skills tools.
+ - It removes generic text edit tools from the training harness.
+ - It routes tool calls to on-disk `.0` files using `path` arguments.
+ - It keeps checked graph edits as the primary affordance for code changes.
+
+ This matters because the behavior we want to train is not "rewrite this source
+ string". The target behavior is "inspect the Zerolang graph and apply a checked
+ semantic graph patch to the file Roder is managing". The Verifiers environment
+ then grades the resulting file from disk.
+
  ## Rollout Contract
 
  Each task row includes an initial Zerolang source program and a hidden target
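The restrictive plugin behavior added in the Roder Harness section (allowlisted `zero_*` tools, generic text editors removed, calls routed to on-disk `.0` files through a `path` argument) can be illustrated with a small Python sketch. It is not Roder's plugin API and not the zero-coder implementation. `filter_tools`, `route_call`, `ALLOWED_TOOLS`, and the per-rollout `workdir` are all hypothetical. The allowlist names only the two tools the README spells out (`zero_graph_summary`, `zero_graph_patch`), and `read_file`/`edit_file` are made-up stand-ins for generic tools the harness would disable. The sketch also assumes that `path` is resolved relative to the rollout directory.

```python
import tempfile
from pathlib import Path

# Hypothetical allowlist: only the two tool names the README spells out.
# The real zero-coder plugin also exposes check/fix/skills tools whose
# names are not documented here, so they are deliberately left out.
ALLOWED_TOOLS = frozenset({"zero_graph_summary", "zero_graph_patch"})


def filter_tools(tool_names):
    """Return only the allowlisted tool names, preserving their order."""
    return [name for name in tool_names if name in ALLOWED_TOOLS]


def route_call(tool_name, args, workdir):
    """Check a tool call and resolve its `path` argument to an on-disk .0 file.

    Raises ValueError for a disallowed tool, a missing or non-.0 path,
    a path that resolves outside `workdir`, or a file that does not exist.
    """
    if tool_name not in ALLOWED_TOOLS:
        raise ValueError(f"tool not allowed in this harness: {tool_name}")
    raw = args.get("path")
    if not isinstance(raw, str):
        raise ValueError("missing 'path' argument")
    root = Path(workdir).resolve()
    target = (root / raw).resolve()  # assumption: paths are rollout-relative
    if target.suffix != ".0":
        raise ValueError(f"not a Zerolang source file: {raw}")
    if root not in target.parents:
        raise ValueError(f"path escapes the rollout directory: {raw}")
    if not target.is_file():
        raise ValueError(f"no such file: {raw}")
    return target


if __name__ == "__main__":
    # Demo: a throwaway rollout directory holding a single .0 file.
    with tempfile.TemporaryDirectory() as workdir:
        Path(workdir, "main.0").write_text("")
        tools = ["read_file", "zero_graph_summary", "edit_file", "zero_graph_patch"]
        print(filter_tools(tools))  # ['zero_graph_summary', 'zero_graph_patch']
        print(route_call("zero_graph_patch", {"path": "main.0"}, workdir).name)  # main.0
        try:
            route_call("edit_file", {"path": "main.0"}, workdir)
        except ValueError as err:
            print(err)  # tool not allowed in this harness: edit_file
```

Resolving the path before checking it against the rollout directory rejects `../` escapes and absolute paths. This mirrors the README's point that the Verifiers environment grades the specific file on disk, so tool calls should not be able to reach other files.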