omar81939/rl4rlm-sft
Text Generation
LoRA adapters for Qwen3-1.7B from training RLMs via RL: SFT, STaR, DPO, and GRPO-v4 variants. Code: github.com/pythonomar22/rl4rlm
Note SFT: trained on 87 self-bootstrapped trajectories (76.8% avg)
Note STaR: iterative SFT on 132 trajectories (76.3% avg)
Note Best model (84.5% avg); +29.5 pp on multi-needle over STaR
Note GRPO-v4: fixed log-probs + token-level KL (83.4% avg)
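The GRPO-v4 note mentions fixed (reference) log-probs plus a token-level KL penalty. A minimal sketch of how such a per-token KL term is commonly computed in GRPO-style trainers, assuming the unbiased k3 estimator; the function name and inputs are illustrative, not taken from this repo:

```python
import math

def token_kl(policy_logprobs, ref_logprobs):
    """Per-token KL penalty using the k3 estimator,
    exp(ref - pol) - (ref - pol) - 1, which is non-negative
    and estimates KL(policy || ref) at each token."""
    kls = []
    for lp, lr in zip(policy_logprobs, ref_logprobs):
        d = lr - lp  # log-prob gap at this token
        kls.append(math.exp(d) - d - 1.0)
    return kls

# Identical log-probs give zero penalty at every token.
print(token_kl([-1.2, -0.5], [-1.2, -0.5]))  # → [0.0, 0.0]
```

"Token-level" here means the penalty is added per token rather than summed over the sequence first, which keeps the KL gradient aligned with individual token log-probs.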