Rename notebook to notebook/about-the-collab.txt
- notebook +0 -0
- notebook/about-the-collab.txt +1 -0
notebook
DELETED
File without changes
notebook/about-the-collab.txt
ADDED
@@ -0,0 +1 @@
+The training notebook is a self-contained, modular, reproducible pipeline that takes a base language model from raw instruction following to a capable code-review agent by combining expert-guided supervised fine-tuning with group-relative policy optimization. It uses Unsloth-accelerated 4-bit QLoRA to train on a single GPU, integrates with a custom OpenEnv-compliant environment, and automatically produces evidence of the agent’s learning at every stage: reward curves, loss traces, action distributions, and per-difficulty breakdowns. Every step is documented, from environment instantiation and prompt engineering to reward design and final evaluation, so that others can reproduce, extend, or adapt the pipeline to new verification-based RL tasks.
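The group-relative policy optimization step named in the description can be illustrated with a minimal sketch. This is not the notebook's code: the function name `group_relative_advantages`, the `eps` parameter, and the sample rewards are hypothetical, and the use of the population standard deviation is an assumption (implementations differ). The idea is that several completions are sampled for the same prompt, each is scored by the environment, and each reward is normalized against the group's own mean and spread.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one group of completions for the same prompt.

    Hypothetical helper: each advantage is the reward's distance from the
    group mean, scaled by the group's population standard deviation.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    # eps avoids division by zero when every completion gets the same reward
    return [(r - mu) / (sigma + eps) for r in rewards]


# Made-up rewards for four sampled reviews of the same code snippet
rewards = [1.0, 0.0, 0.5, 0.5]
advantages = group_relative_advantages(rewards)
```

Completions scored above the group mean get a positive advantage and those below get a negative one, so no separate value model is needed to form the baseline.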