| base_model: | |
| - open-r1/Qwen2.5-Math-7B-RoPE-300k | |
| - Qwen/Qwen2.5-Math-7B | |
| datasets: | |
| - Elliott/Openr1-Math-46k-8192 | |
| license: mit | |
| pipeline_tag: text-generation | |
| library_name: transformers | |
| arxiv: 2506.19767 | |
| # 📄 Introduction | |
| Supervised Reinforcement Fine-Tuning (SRFT) is a single-stage method that unifies the two fine-tuning paradigms, supervised fine-tuning (SFT) and reinforcement learning (RL), through entropy-aware weighting mechanisms. | |
| Paper: [arXiv](https://arxiv.org/abs/2506.19767) | |
| Project Website: [SRFT](https://anonymous.4open.science/w/SRFT2025) | |
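
To give a feel for what "entropy-aware weighting" can mean in a single-stage objective, here is a minimal sketch in plain Python. It is not the SRFT implementation: the function names, the `exp(-mean entropy)` weight, and the way the two losses are summed are all assumptions chosen for illustration. The paper linked above defines the actual objective.

```python
import math


def token_entropy(probs):
    """Shannon entropy (in nats) of one next-token probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def entropy_weight(token_dists):
    """Hypothetical weight exp(-mean entropy); it always lies in (0, 1].

    Low entropy (a confident policy) gives a weight near 1,
    high entropy (an uncertain policy) gives a weight near 0.
    """
    if not token_dists:
        raise ValueError("token_dists must contain at least one distribution")
    mean_h = sum(token_entropy(d) for d in token_dists) / len(token_dists)
    return math.exp(-mean_h)


def combined_loss(sft_loss, rl_loss, token_dists):
    """Assumed single-stage objective: entropy-weighted SFT term plus RL term."""
    w = entropy_weight(token_dists)
    return w * sft_loss + rl_loss


# Toy next-token distributions over a 4-token vocabulary (made-up numbers).
confident = [[0.97, 0.01, 0.01, 0.01]] * 3
uncertain = [[0.25, 0.25, 0.25, 0.25]] * 3

print(entropy_weight(confident))
print(entropy_weight(uncertain))
print(combined_loss(2.0, 1.0, uncertain))
```

In this sketch the supervised term is scaled down when the policy's token distributions are close to uniform and kept near full strength when they are peaked. Whether SRFT applies its weights in this direction, or to which terms, should be checked against the paper.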