Elliott/Qwen2.5-Math-7B-16k-think

Text Generation

text-generation-inference

Elliott committed on Apr 22, 2025

Commit a546ca4 · verified · 1 Parent(s): 40ba96d

Update README.md

Files changed (1)

README.md +16 -2

README.md CHANGED

@@ -1,6 +1,20 @@
 ---
-{}
+license: mit
 ---
 The base Qwen2.5-Math-7B model used by LUFFY.
 We change to rope_theta from 10000 to 40000 and extend the context window to 16k.
-Also, we modify the chat_template for the system prompt and add <think>.
+Also, we modify the chat_template for the system prompt and add <think>.
+# Citation
+If you find our model, data, or evaluation code useful, please kindly cite our paper:
+```bib
+@misc{luffy,
+      title={Learning to Reason under Off-Policy Guidance},
+      author={Jianhao Yan and Yafu Li and Zican Hu and Zhi Wang and Ganqu Cui and Xiaoye Qu and Yu Cheng and Yue Zhang},
+      year={2025},
+      eprint={2504.14945},
+      archivePrefix={arXiv},
+      primaryClass={cs.LG},
+      url={https://arxiv.org/abs/2504.14945},
+}
+```
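
The README describes changing `rope_theta` from 10000 to 40000 and extending the context window to 16k. A minimal sketch of what that override could look like on a `config.json`-style dictionary is shown here. It assumes that "16k" means 16384 positions and that the context length is stored under `max_position_embeddings`; the helper name `extend_context` and the starting context length of 4096 are hypothetical and are not taken from this commit.

```python
import json


def extend_context(config: dict, rope_theta: float = 40000.0,
                   max_positions: int = 16384) -> dict:
    """Return a copy of a config.json-style dict with the RoPE base and
    context length overridden (hypothetical helper, not the authors' script)."""
    updated = dict(config)  # shallow copy so the input dict is left untouched
    updated["rope_theta"] = rope_theta
    updated["max_position_embeddings"] = max_positions  # assumed key for the context window
    return updated


# Illustrative starting values only: the rope_theta of 10000 comes from the
# README text; the 4096 context length is an assumption for this example.
base = {"rope_theta": 10000.0, "max_position_embeddings": 4096}
patched = extend_context(base)
print(json.dumps(patched, indent=2))
```

The actual values shipped with the model are in the repository's `config.json`, which is the authoritative source; this sketch only mirrors the change described in the README.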