Yuqian-Fu/SRFT-Qwen2.5-Math-7B

Yuqian-Fu committed on Jun 25, 2025

Commit 30d6824 · verified · 1 Parent(s): cb40792

Update README.md

Files changed (1)

README.md +17 -3

@@ -1,3 +1,17 @@
----
-license: mit
----

+---
+license: mit
+datasets:
+- Elliott/Openr1-Math-46k-8192
+base_model:
+- open-r1/Qwen2.5-Math-7B-RoPE-300k
+- Qwen/Qwen2.5-Math-7B
+pipeline_tag: reinforcement-learning
+---
+# 📄 Introduction
+Supervised Reinforcement Fine-Tuning (SRFT) is a single-stage method that unifies both fine-tuning paradigms through entropy-aware weighting mechanisms.
+Paper: [arXiv](https://arxiv.org/abs/2506.19767)
+Project Website: [SRFT](https://anonymous.4open.science/w/SRFT2025)