Yuqian-Fu/SRFT-Qwen2.5-Math-7B

---
base_model:
- open-r1/Qwen2.5-Math-7B-RoPE-300k
- Qwen/Qwen2.5-Math-7B
datasets:
- Elliott/Openr1-Math-46k-8192
license: mit
pipeline_tag: text-generation
library_name: transformers
arxiv: 2506.19767
---

# 📄 Introduction

Supervised Reinforcement Fine-Tuning (SRFT) is a single-stage method that unifies supervised fine-tuning (SFT) and reinforcement learning (RL) through entropy-aware weighting mechanisms.
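
To give a rough intuition for what "entropy-aware weighting" can mean, here is a minimal sketch. This card does not state the weighting formula, so the exponential form and the helper names `token_entropy` and `entropy_aware_weight` are assumptions made for illustration only; the paper linked below has the actual definitions.

```python
import math


def token_entropy(probs):
    """Shannon entropy (in nats) of one next-token probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def entropy_aware_weight(probs, scale=1.0):
    """Hypothetical loss weight that shrinks as the policy becomes less certain.

    ASSUMPTION: the exp(-entropy) form is illustrative and is not taken
    from this model card.
    """
    return scale * math.exp(-token_entropy(probs))


confident = [1.0, 0.0, 0.0, 0.0]      # zero entropy
uncertain = [0.25, 0.25, 0.25, 0.25]  # maximum entropy over 4 tokens

print(entropy_aware_weight(confident))
print(entropy_aware_weight(uncertain))
```

In this sketch a confident distribution keeps the full weight, while a uniform one is down-weighted, so a loss term scaled this way contributes less when the policy is uncertain.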

Paper: [arXiv](https://arxiv.org/abs/2506.19767)

Project Website: [SRFT](https://anonymous.4open.science/w/SRFT2025)
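
Since the metadata lists `transformers` as the library and `text-generation` as the pipeline tag, loading the checkpoint might look like the sketch below. The repository id comes from this card. The prompt wording in `build_prompt`, the `generate` helper, and the generation settings are assumptions, because the card does not specify a prompt template. `device_map="auto"` additionally requires the `accelerate` package.

```python
MODEL_ID = "Yuqian-Fu/SRFT-Qwen2.5-Math-7B"  # repository id from this model card


def build_prompt(question: str) -> str:
    """Hypothetical plain-text prompt; the card does not define a template."""
    return f"Question: {question.strip()}\nAnswer: Let's think step by step."


def generate(question: str, max_new_tokens: int = 512) -> str:
    """Load the model and return its continuation for a single question."""
    # Imported lazily so build_prompt can be used without the heavy dependency.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID,
        torch_dtype="auto",  # use the dtype stored in the checkpoint
        device_map="auto",   # needs `accelerate`; places weights on available devices
    )
    inputs = tokenizer(build_prompt(question), return_tensors="pt").to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Drop the prompt tokens and decode only the newly generated ones.
    new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```

Calling `generate("What is 12 * 13?")` downloads the 7B checkpoint on first use, so it needs enough disk space and memory for a model of that size.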