A coding model built by fine-tuning Qwen2.5-Coder-7B-Instruct with training recipes from published code LLM research, aiming for Claude Code-level coding quality.
Training configuration, derived from a review of state-of-the-art code LLM papers:

| Component | Details |
|---|---|
| Base Model | Qwen/Qwen2.5-Coder-7B-Instruct (88.4% HumanEval baseline) |
| Method | QLoRA (4-bit NF4 + LoRA r=64, all-linear layers) |
| Optimizer | Paged AdamW 8-bit, LR=2e-4, cosine schedule |
| Context | 4096 tokens with packing |
| Epochs | 2 |
| Effective Batch | 16 (per-device batch 1 × 16 gradient accumulation steps) |
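
QLoRA keeps the 4-bit base weights frozen and trains only low-rank adapters. In the standard LoRA formulation, a wrapped linear layer computes W x + (alpha / r) · B A x. B starts at zero, so training begins exactly at the base model. The pure-Python sketch below illustrates that forward pass only. It is not PEFT's implementation and it skips quantization, since W is plain floats here. `LoRALinear` and `matvec` are hypothetical names. The table lists only r=64. If the recipe follows the alpha = 2×r rule attributed to LoRA Without Regret, alpha would be 128 and the scaling factor 2, but that is an assumption.

```python
import random


def matvec(matrix, vector):
    """Dense matrix-vector product on nested lists."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


class LoRALinear:
    """Frozen weight W plus a trainable low-rank update: y = W x + (alpha / r) * B (A x)."""

    def __init__(self, weight, rank, alpha, seed=0):
        self.weight = weight  # frozen base weight, shape (out, in); not quantized in this sketch
        out_features, in_features = len(weight), len(weight[0])
        rng = random.Random(seed)
        # A: small Gaussian init (as in the original LoRA paper); B: zeros, so the
        # adapter contributes nothing until B is trained.
        self.A = [[rng.gauss(0.0, 0.02) for _ in range(in_features)] for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(out_features)]
        self.scaling = alpha / rank

    def __call__(self, x):
        base = matvec(self.weight, x)                # frozen path: W x
        delta = matvec(self.B, matvec(self.A, x))    # low-rank path: B (A x)
        return [b + self.scaling * d for b, d in zip(base, delta)]


if __name__ == "__main__":
    W = [[1.0, 2.0], [3.0, 4.0]]
    layer = LoRALinear(W, rank=4, alpha=8)  # alpha = 2 * r, so scaling = 2.0
    print(layer.scaling)       # 2.0
    print(layer([1.0, 1.0]))   # [3.0, 7.0]: B is zero, so the output equals W x
```

With alpha tied to 2×r, the multiplier stays at 2 for any rank (for example, alpha=128 at r=64).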

Training data mix:

| Dataset | Samples | Purpose | Reference |
|---|---|---|---|
| KodCode-V1-SFT-R1 | 100K+ (filtered to r1_correctness=True) | Verified competitive programming with R1-style chain-of-thought reasoning | arxiv:2503.02951 |
| Code-Feedback | 66K | Multi-turn code dialogue (ChatML) | m-a-p |
| Magicoder-OSS-Instruct-75K | 75K | Diverse code generation from real code seeds | arxiv:2312.02120 |
| Magicoder-Evol-Instruct-110K | 110K | Evolved code instructions (increasing complexity) | arxiv:2312.02120 |
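
To assemble the mix above, convert every source to one chat `messages` schema, drop KodCode rows whose R1 trace failed verification, and shuffle. The sketch below uses plain Python lists in place of Hugging Face `datasets` objects. The field names (`question`, `r1_solution`, `r1_correctness`, `instruction`, `response`) are hypothetical, so check each dataset card for the real schema. Code-Feedback, which is already multi-turn, is omitted for brevity.

```python
from __future__ import annotations

import random
from typing import Iterable

# Hypothetical field names throughout; check each dataset card for the real schema.


def kodcode_to_messages(row: dict) -> dict | None:
    """Keep only rows whose R1 solution was verified correct; emit chat messages."""
    if not row.get("r1_correctness", False):
        return None
    return {
        "messages": [
            {"role": "user", "content": row["question"]},
            {"role": "assistant", "content": row["r1_solution"]},
        ]
    }


def magicoder_to_messages(row: dict) -> dict:
    """Single-turn instruction/response rows (hypothetical keys)."""
    return {
        "messages": [
            {"role": "user", "content": row["instruction"]},
            {"role": "assistant", "content": row["response"]},
        ]
    }


def build_mix(kodcode: Iterable[dict], magicoder: Iterable[dict], seed: int = 42) -> list[dict]:
    """Convert, filter, concatenate, and shuffle with a fixed seed."""
    mixed = [m for row in kodcode if (m := kodcode_to_messages(row)) is not None]
    mixed += [magicoder_to_messages(row) for row in magicoder]
    random.Random(seed).shuffle(mixed)
    return mixed


if __name__ == "__main__":
    kod = [
        {"question": "Reverse a string.", "r1_solution": "def rev(s): return s[::-1]", "r1_correctness": True},
        {"question": "Sum a list.", "r1_solution": "def total(xs): return 0", "r1_correctness": False},
    ]
    mag = [{"instruction": "Write a greeting.", "response": "print('hello')"}]
    print(len(build_mix(kod, mag)))  # 2: the unverified KodCode row is dropped
```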

The recipe draws on findings from the following code LLM papers:

| Paper | Key Finding | Benchmark |
|---|---|---|
| rStar-Coder (2505.21297) | Qwen2.5-Coder-7B rises from 17.4% to 57.3% on LiveCodeBench using verified competitive programming data | LiveCodeBench |
| KodCode (2503.02951) | Verified R1-style reasoning traces improve coding by +15% on BigCodeBench | BigCodeBench |
| Qwen2.5-Coder (2409.12186) | 7:2:1 code:text:math ratio; coarse-to-fine SFT; 92.7% HumanEval at 32B | HumanEval |
| LoRA Without Regret | r=64+ on all linear layers matches full fine-tuning quality; alpha=2×r | LoRA theory |
| SWE-RL (2502.18449) | GRPO on 273K PRs; Llama3-SWE-RL-70B reaches 41.0% on SWE-bench Verified, comparable to GPT-4o | SWE-bench |
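
GRPO, used by SWE-RL and also listed among the next steps, replaces a learned value baseline with group statistics. For each prompt, sample several completions, score each one, and normalize each reward by the group's mean and standard deviation. The minimal sketch below covers only that advantage step. It has no policy-ratio clipping and no KL term. The unit-test pass-rate reward in `pass_rate_reward` is a hypothetical choice for a dataset like KodCode-Light-RL-10K. SWE-RL itself scores patches by their similarity to the ground-truth fix.

```python
import statistics
from typing import List, Sequence


def pass_rate_reward(passed: int, total: int) -> float:
    """Hypothetical reward: fraction of unit tests a completion passes."""
    return passed / total if total else 0.0


def group_relative_advantages(rewards: Sequence[float], eps: float = 1e-6) -> List[float]:
    """GRPO-style outcome advantages: standardize each reward within its group."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std; some implementations use the sample std
    return [(r - mean) / (std + eps) for r in rewards]


if __name__ == "__main__":
    # Four sampled completions for one prompt, each scored on 4 unit tests.
    rewards = [pass_rate_reward(p, 4) for p in (4, 0, 0, 4)]
    print([round(a, 3) for a in group_relative_advantages(rewards)])  # [1.0, -1.0, -1.0, 1.0]
```

If every completion in a group gets the same reward, all advantages are zero, so that group contributes no learning signal.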

Inference with the LoRA adapter on top of the base model:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load the base model, then attach the LoRA adapter weights
base_model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-Coder-7B-Instruct",
    torch_dtype="auto",
    device_map="auto",
)
model = PeftModel.from_pretrained(base_model, "ashhhhhh26/qwen25-coder-32b-mythos")
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-Coder-7B-Instruct")

# Build a chat prompt and generate
messages = [
    {"role": "system", "content": "You are an elite software engineer..."},
    {"role": "user", "content": "Implement a red-black tree in Python with insert, delete, and search operations."},
]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=4096, temperature=0.1, do_sample=True)

# Decode only the newly generated tokens, not the echoed prompt
new_tokens = outputs[0][inputs["input_ids"].shape[1]:]
print(tokenizer.decode(new_tokens, skip_special_tokens=True))
```
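
`apply_chat_template` renders the messages into Qwen2.5's ChatML-style layout. The sketch below builds that string by hand to show roughly what the model receives. It is simplified: the real template, stored in `tokenizer.chat_template`, also inserts a default system prompt when none is given and supports tool calls. `build_chatml_prompt` is a hypothetical helper, not a transformers API.

```python
def build_chatml_prompt(messages: list, add_generation_prompt: bool = True) -> str:
    """Approximate a Qwen2.5 ChatML prompt (simplified: no default system prompt, no tools)."""
    # Each turn is rendered as: <|im_start|>{role}\n{content}<|im_end|>\n
    parts = [f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages]
    if add_generation_prompt:
        # Open an assistant turn so the model continues with its reply.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


if __name__ == "__main__":
    prompt = build_chatml_prompt([
        {"role": "system", "content": "You are a helpful coding assistant."},
        {"role": "user", "content": "Write a binary search in Python."},
    ])
    print(prompt)
```

To check the template actually in use, compare this string with the output of `tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)`.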

Training:

```bash
pip install torch transformers trl peft datasets bitsandbytes accelerate trackio flash-attn
# Note: flash-attn (FlashAttention-2) requires an Ampere-or-newer GPU (e.g. A10G, L4, A100); it does not support T4

# Single GPU (T4/L4/A10G with 16-24GB VRAM)
python train.py

# Multi-GPU with DeepSpeed ZeRO-2 (also requires: pip install deepspeed)
accelerate launch --config_file deepspeed_zero2.yaml --num_processes 4 train.py

# Via HF Jobs API
huggingface-cli jobs run train.py \
  --hardware t4-small \
  --timeout 8h \
  --dependencies torch transformers trl peft datasets bitsandbytes accelerate trackio flash-attn
```
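
The effective batch of 16 in the configuration table assumes one GPU. With `accelerate launch --num_processes 4` and the same per-device batch and gradient accumulation, each data-parallel rank contributes its own batch, so the global batch becomes 64. To keep it at 16, accumulation would need to drop to 4. The arithmetic sketch below makes this explicit. The 100,000 packed-sequence count is hypothetical and is not a measured property of this dataset mix.

```python
import math


def effective_batch(per_device: int, grad_accum: int, num_processes: int = 1) -> int:
    """Global batch = per-device batch x gradient accumulation steps x number of processes."""
    return per_device * grad_accum * num_processes


def optimizer_steps(num_sequences: int, per_device: int, grad_accum: int,
                    num_processes: int, epochs: int) -> int:
    """Approximate optimizer updates; real trainers may round per-process batches differently."""
    batch = effective_batch(per_device, grad_accum, num_processes)
    return math.ceil(num_sequences / batch) * epochs


if __name__ == "__main__":
    n_packed = 100_000  # hypothetical number of packed 4096-token sequences
    for gpus in (1, 4):
        print(gpus, effective_batch(1, 16, gpus), optimizer_steps(n_packed, 1, 16, gpus, epochs=2))
```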

| Hardware | VRAM | Feasibility |
|---|---|---|
| T4 | 16GB | ✅ QLoRA 4-bit (max_length=4096) |
| L4 | 24GB | ✅ QLoRA 4-bit (max_length=8192) |
| A10G | 24GB | ✅ QLoRA 4-bit (max_length=8192) |
| A100 | 80GB | ✅ bf16 LoRA without quantization, or even full fine-tuning |
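
A rough static-memory estimate shows why QLoRA fits on a 16GB T4. The sketch rests on several assumptions, none of them measured:

- shapes from the public Qwen2.5-7B config (hidden 3584, MLP 18944, 28 layers, 4 KV heads of dim 128),
- about 7.61B base parameters,
- NF4 weights at 0.5 byte each,
- bf16 adapter weights and gradients,
- 8-bit AdamW at 2 bytes per trainable parameter.

It ignores activations, the CUDA context, and quantization constants. `lora_param_count` and `qlora_static_memory_gb` are hypothetical helpers.

```python
# Assumed Qwen2.5-7B-style shapes; verify against the model's config.json.
HIDDEN, INTERMEDIATE, LAYERS, KV_DIM = 3584, 18944, 28, 4 * 128
BASE_PARAMS = 7.61e9  # assumed total parameter count
GB = 1e9


def lora_param_count(rank: int) -> int:
    """Trainable LoRA params when targeting every linear layer in each decoder block."""
    # (in_features, out_features) for q, k, v, o, gate, up, down projections
    shapes = [
        (HIDDEN, HIDDEN), (HIDDEN, KV_DIM), (HIDDEN, KV_DIM), (HIDDEN, HIDDEN),
        (HIDDEN, INTERMEDIATE), (HIDDEN, INTERMEDIATE), (INTERMEDIATE, HIDDEN),
    ]
    # Each adapter adds A (rank x in) and B (out x rank): rank * (in + out) params.
    return LAYERS * sum(rank * (i + o) for i, o in shapes)


def qlora_static_memory_gb(rank: int = 64) -> dict:
    """Static memory in GB, excluding activations and framework overhead."""
    n_lora = lora_param_count(rank)
    parts = {
        "base_weights_nf4": BASE_PARAMS * 0.5 / GB,  # 4 bits per frozen weight
        "lora_weights_bf16": n_lora * 2 / GB,        # 2 bytes per adapter weight
        "lora_grads_bf16": n_lora * 2 / GB,          # 2 bytes per adapter gradient
        "adam_8bit_states": n_lora * 2 / GB,         # two 1-byte moment buffers
    }
    parts["total"] = sum(parts.values())
    return parts


if __name__ == "__main__":
    print(f"LoRA params at r=64: {lora_param_count(64):,}")
    for name, gb in qlora_static_memory_gb().items():
        print(f"{name:>18}: {gb:5.2f} GB")
```

Under these assumptions, the static footprint is roughly 4.8GB, with about 161.5M adapter parameters at r=64. Most of the remaining VRAM therefore goes to activations, which grow with max_length. That is consistent with the table pairing the 16GB T4 with a shorter max_length than the 24GB cards.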

Next steps, based on SWE-RL and DeepSeek-Coder-V2 research:

- GRPO training on KodCode/KodCode-Light-RL-10K
- Applying the same recipe to Qwen/Qwen2.5-Coder-32B-Instruct on an A100-80GB
- Adding the microsoft/rStar-Coder seed_sft split for even stronger competitive programming

If you use this model, please cite the foundational works:

```bibtex
@article{qwen2.5-coder,
  title={Qwen2.5-Coder Technical Report},
  author={Hui, Binyuan and Yang, Jian and others},
  journal={arXiv:2409.12186},
  year={2024}
}
@article{kodcode,
  title={KodCode: A Diverse, Challenging, and Verifiable Synthetic Dataset for Coding},
  author={Xu, Zhangchen and others},
  journal={arXiv:2503.02951},
  year={2025}
}
@article{rstar-coder,
  title={rStar-Coder: Scaling Competitive Code Reasoning with a Large-Scale Verified Dataset},
  author={Liu, Yifei and others},
  journal={arXiv:2505.21297},
  year={2025}
}
```