Improve model card: add pipeline tag, library name, language, license, paper, and code links

#1
Opened by nielsr (HF Staff)
Files changed (1)
  1. README.md +15 -9
README.md CHANGED
@@ -1,5 +1,11 @@
  ---
  base_model: ytu-ce-cosmos/Turkish-Llama-8b-DPO-v0.1
+ language:
+ - en
+ - tr
+ license: apache-2.0
+ pipeline_tag: text-generation
+ library_name: transformers
  tags:
  - text-generation-inference
  - transformers
@@ -8,9 +14,6 @@ tags:
  - trl
  - grpo
  - test-time-reinforcement-learning
- license: llama3
- language:
- - en
  ---

  <img src="https://huggingface.co/Metin/LLaMA-3-8B-Math-Majority-Vote-GRPO/resolve/main/llama_clones.png"
@@ -20,6 +23,9 @@ alt="A scene from a famous movie" width="800"/>

  Metin/LLaMA-3-8B-Math-Majority-Vote-GRPO is a [Test Time Reinforcement Learning (TTRL)](https://arxiv.org/abs/2504.16084) trained version of ytu-ce-cosmos/Turkish-Llama-8b-DPO-v0.1. It is trained on Turkish math word problems using GRPO method and a majority vote reward function.

+ **Paper:** [TTRL: Test-Time Reinforcement Learning](https://huggingface.co/papers/2504.16084)
+ **Code:** [https://github.com/PRIME-RL/TTRL](https://github.com/PRIME-RL/TTRL)
+
  ## Training Info

  - **Base Model**: [Turkish-Llama-8b-DPO-v0.1](https://huggingface.co/ytu-ce-cosmos/Turkish-Llama-8b-DPO-v0.1)
@@ -88,11 +94,11 @@ print(result)
  ```

  # Citation
- ```
- @article{Metin,
- title={Metin/LLaMA-3-8B-Math-Majority-Vote-GRPO},
- author={Metin Usta},
- year={2024},
- url={https://huggingface.co/Metin/LLaMA-3-8B-Math-Majority-Vote-GRPO}
+ ```bibtex
+ @article{zuo2025ttrl,
+ title={Ttrl: Test-time reinforcement learning},
+ author={Zuo, Yuxin and Zhang, Kaiyan and Qu, Shang and Sheng, Li and Zhu, Xuekai and Qi, Biqing and Sun, Youbang and Cui, Ganqu and Ding, Ning and Zhou, Bowen},
+ journal={arXiv preprint arXiv:2504.16084},
+ year={2025}
  }
  ```
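
Note for readers of this diff: the model card description mentions a "majority vote reward function" without showing one. The sketch below illustrates the general idea only, assuming several answers are sampled for the same problem and each answer is rewarded when it agrees with the most common one. The function name `majority_vote_rewards` and the whitespace-only normalization are hypothetical; this is not the code used to train the model, and it is not taken from the TTRL repository.

```python
from collections import Counter


def majority_vote_rewards(answers):
    """Give each sampled answer 1.0 if it matches the most common answer, else 0.0.

    Hypothetical helper for illustration; the actual training reward may
    parse and normalize answers differently.
    """
    if not answers:
        return []
    # Assumed normalization: strip surrounding whitespace only.
    normalized = [answer.strip() for answer in answers]
    # most_common(1) returns the answer with the highest count; ties go to
    # the answer that appeared first in the list.
    majority_answer, _ = Counter(normalized).most_common(1)[0]
    return [1.0 if answer == majority_answer else 0.0 for answer in normalized]


# Four sampled answers to one problem; three agree on "42".
print(majority_vote_rewards(["42", "42", "41", " 42 "]))
```

Because the reward depends only on agreement among samples, no ground-truth label is needed, which is what allows this kind of training on unlabeled problems.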