Part of the dsl-debug collection: models trained to find and fix bugs in custom dataflow DSL programs using multi-turn tool use.
This model is Qwen2.5-7B-Instruct trained with GRPO (Group Relative Policy Optimization) on the DSL Debug benchmark, starting directly from the base model with no SFT stage.
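For readers unfamiliar with GRPO, the core idea is that each rollout's reward is scored relative to the other rollouts sampled for the same prompt, rather than against a learned value function. The sketch below illustrates that group-relative normalization only. The function name, the `eps` constant, and the binary "bug fixed" rewards are assumptions for illustration; the reward design and training code actually used for this model are not described here.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Score each rollout against the mean and spread of its own group.

    `rewards` holds one scalar reward per rollout sampled for the same prompt.
    `eps` (an assumed constant) avoids division by zero when all rewards match.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical group: four rollouts on one debugging task,
# reward 1.0 if the rollout fixed the bug and 0.0 otherwise.
group_rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(group_rewards)
print(advantages)
```

Rollouts that beat their group's average get a positive advantage and the rest get a negative one. When every rollout in a group earns the same reward, all advantages are zero and that group contributes no learning signal.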
DSL Debug benchmark results by split:

| Split | Base Model | This Model |
|---|---|---|
| Standard (481) | 50.5% | 78.8% |
| Nonlocal (200) | 12.0% | 54.0% |
| Intent-Mismatch (177) | 0.6% | 14.7% |

General benchmark results:

| Benchmark | Base Model | This Model |
|---|---|---|
| MMLU | 74.6% | 74.7% |
| GSM8K | 84.9% | 84.4% |
| HumanEval | 65.9% | 59.1% |
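To make the trade-off in the tables above easier to read, this small sketch computes the change in percentage points for each row. The numbers are copied from the tables; the dictionary layout and variable names are only illustrative.

```python
# (base model %, this model %) copied from the tables above.
results = {
    "Standard": (50.5, 78.8),
    "Nonlocal": (12.0, 54.0),
    "Intent-Mismatch": (0.6, 14.7),
    "MMLU": (74.6, 74.7),
    "GSM8K": (84.9, 84.4),
    "HumanEval": (65.9, 59.1),
}

# Change in percentage points, rounded to one decimal place.
deltas = {name: round(new - base, 1) for name, (base, new) in results.items()}

for name, delta in deltas.items():
    print(f"{name:16s} {delta:+.1f} pp")
```

The largest gain is on the Nonlocal split (+42.0 points). MMLU and GSM8K move by half a point or less, while HumanEval drops by 6.8 points.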
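Download the weights with `huggingface_hub`:

    from huggingface_hub import snapshot_download

    # Download the full model repository to a local directory.
    snapshot_download("andrewlngdn/dsl-debug-7b-rl-only-step30",
                      local_dir="/workspace/models/rl_only_step30")
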
See the collection for all models including the stronger SFT→RL variant.