Update README.md
README.md
CHANGED
@@ -13,6 +13,9 @@ language:
 - en
 ---
 
+<img src="https://huggingface.co/Metin/LLaMA-3-8B-Math-Majority-Vote-GRPO/resolve/main/llama_clones.png"
+alt="A scene from a famous movie" width="800"/>
+
 # LLaMA-3-8B-Math-Majority-Vote-GRPO
 
 Metin/LLaMA-3-8B-Math-Majority-Vote-GRPO is a [Test Time Reinforcement Learning (TTRL)](https://arxiv.org/abs/2504.16084) trained version of ytu-ce-cosmos/Turkish-Llama-8b-DPO-v0.1. It is trained on Turkish math word problems using GRPO method and a majority vote reward function.
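
The README text above mentions a majority vote reward function without showing one. The following is a minimal sketch of the general idea, not the author's implementation: it assumes a final answer has already been extracted from each sampled completion for one prompt (as a string, or `None` when extraction failed), and the name `majority_vote_rewards` is hypothetical.

```python
from collections import Counter


def majority_vote_rewards(answers):
    """Score each sampled answer against the group's most common answer.

    `answers` holds one extracted final answer per sampled completion of a
    single prompt; None marks a completion with no extractable answer.
    (Hypothetical helper for illustration; not taken from the model card.)
    """
    # Completions without an extractable answer never take part in the vote.
    valid = [a for a in answers if a is not None]
    if not valid:
        return [0.0] * len(answers)

    # The most frequent answer serves as the pseudo-label for this prompt.
    majority, _ = Counter(valid).most_common(1)[0]

    # Reward 1.0 for agreeing with the pseudo-label, 0.0 otherwise.
    return [1.0 if a == majority else 0.0 for a in answers]


rewards = majority_vote_rewards(["42", "42", "7", None])
```

In this sketch no ground-truth label is needed: the reward for a group of samples comes only from their agreement with each other, which is what allows training on unlabeled problems.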