poolside-laguna-hackathon/zerolang-editing
Tags: Text Generation, reinforcement-learning

Commit 888969a (verified) by pandelis, committed 9 days ago · parent bb1b296

Document Roder zero graph RL harness

Files changed (1): README.md +25 -0

@@ -16,6 +16,12 @@ license: apache-2.0
 to edit [Zerolang](https://github.com/vercel-labs/zerolang) programs through
 checked graph edits instead of loose text replacement.
+The RL harness is built around [Roder](https://roder.sh), using a custom
+zero-coder plugin/distribution that exposes a Zerolang-only graph toolset to the
+model. In training, generic source-editing tools are disabled; the agent is
+expected to use the `zero_*` tools below, especially `zero_graph_summary` and
+`zero_graph_patch`, against the rollout file on disk.
 The core task is intentionally narrow: each rollout starts with a `.0` source
 file already written to disk, asks the model for a semantic code edit, and
 scores the edited file after the model uses Zerolang tooling. The intended
@@ -47,11 +53,30 @@ compiles and matches the hidden target source.
 - **Prime environment ID:** `pandelis/zerolang-editing`
 - **Version in this repo:** `0.1.8`
 - **Task type:** multi-turn tool-use code editing
+- **Agent harness:** Roder with a custom Zero graph-only plugin/tool allowlist
 - **Language under edit:** Zerolang `.0`
 - **Train split:** 209 deterministic synthetic tasks
 - **Eval split:** 67 held-out deterministic synthetic tasks
 - **Primary reward target:** successful `zero_graph_patch` on the rollout file
+## Roder Harness
+The intended RL setup runs the model inside Roder rather than a generic chat
+loop. Roder provides the coding-agent harness, while a custom zero-coder plugin
+configures the available tool surface for this environment.
+That plugin is deliberately restrictive:
+- It exposes only Zerolang graph/check/fix/skills tools.
+- It removes generic text-editing tools from the training harness.
+- It routes tool calls to on-disk `.0` files using `path` arguments.
+- It keeps checked graph edits as the primary affordance for code changes.
+This matters because the behavior we want to train is not "rewrite this source
+string" but "inspect the Zerolang graph and apply a checked semantic graph patch
+to the file Roder is managing". The Verifiers environment then grades the
+resulting file as it exists on disk.
 ## Rollout Contract
 Each task row includes an initial Zerolang source program and a hidden target
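
The harness contract this commit documents can be sketched in a few lines. The plugin gates tool calls to `zero_*` tools that take a `path` argument pointing at an on-disk `.0` file, and the rollout is then graded from disk by checking that the edited file compiles and matches the hidden target. This is a minimal, hypothetical Python sketch. It is not Roder's plugin API or the Verifiers environment's actual reward code. `ToolCall`, `is_allowed`, `grade_rollout`, the `compiles` callback, and the trailing-whitespace normalization are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from pathlib import Path
from typing import Callable


@dataclass
class ToolCall:
    """Hypothetical shape of a tool call emitted by the agent."""
    name: str
    args: dict = field(default_factory=dict)


def is_allowed(call: ToolCall) -> bool:
    """Gate a tool call the way the zero-coder plugin is described to.

    Assumption: only `zero_*` tools are exposed, and each one must target
    an on-disk `.0` file through a `path` argument.
    """
    if not call.name.startswith("zero_"):
        return False  # generic text-editing tools are not exposed in training
    path = call.args.get("path")
    return isinstance(path, str) and path.endswith(".0")


def _normalize(source: str) -> str:
    # Assumption: compare sources while ignoring trailing whitespace on each
    # line and leading/trailing blank space around the whole file.
    return "\n".join(line.rstrip() for line in source.strip().splitlines())


def grade_rollout(
    rollout_file: Path,
    hidden_target: str,
    compiles: Callable[[Path], bool],
) -> float:
    """Score the edited file as it exists on disk after the rollout.

    `compiles` is a hypothetical hook standing in for a real Zerolang check.
    Returns 1.0 only if the file exists, compiles, and matches the target.
    """
    if not rollout_file.exists():
        return 0.0
    if not compiles(rollout_file):
        return 0.0
    edited = rollout_file.read_text()
    return 1.0 if _normalize(edited) == _normalize(hidden_target) else 0.0
```

In a real run, `compiles` would invoke the Zerolang tooling on the file. The exact command is not specified here, so the sketch takes it as a parameter rather than guessing at a CLI.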