Metin committed on
Commit 5f99087 · verified · 1 Parent(s): 0a728d9

Update README.md

Files changed (1)
  1. README.md +3 -0
README.md CHANGED
@@ -13,6 +13,9 @@ language:
  - en
  ---
 
+ <img src="https://huggingface.co/Metin/LLaMA-3-8B-Math-Majority-Vote-GRPO/resolve/main/llama_clones.png"
+ alt="A scene from a famous movie" width="800"/>
+
  # LLaMA-3-8B-Math-Majority-Vote-GRPO
 
  Metin/LLaMA-3-8B-Math-Majority-Vote-GRPO is a [Test Time Reinforcement Learning (TTRL)](https://arxiv.org/abs/2504.16084) trained version of ytu-ce-cosmos/Turkish-Llama-8b-DPO-v0.1. It is trained on Turkish math word problems using GRPO method and a majority vote reward function.
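
For readers unfamiliar with the majority vote reward mentioned in the README text, the idea can be sketched roughly as follows. This is a minimal illustration and not the training code behind this model: the function name `majority_vote_rewards` is hypothetical, and it assumes that a final answer string has already been extracted from each sampled completion (with `None` standing for a completion where no answer could be parsed). Each sample that agrees with the most common answer in its group receives a reward of 1.0, and every other sample receives 0.0.

```python
from collections import Counter


def majority_vote_rewards(answers):
    """Score a group of sampled answers against their own majority answer.

    `answers` holds one extracted final answer per sampled completion for
    the same problem; `None` marks a completion with no parsable answer
    (an assumption made for this sketch).
    """
    # Only completions that produced an answer take part in the vote.
    voted = [a for a in answers if a is not None]
    if not voted:
        # No usable answers: there is no pseudo-label, so nobody is rewarded.
        return [0.0] * len(answers)

    # The most frequent answer serves as the pseudo-label for the group.
    majority, _ = Counter(voted).most_common(1)[0]

    # Reward agreement with the pseudo-label; unparsable answers get 0.0.
    return [1.0 if a == majority else 0.0 for a in answers]


if __name__ == "__main__":
    sampled = ["12", "15", "12", None, "12"]
    print(majority_vote_rewards(sampled))
```

In a GRPO setup, a list like this would typically be computed per group of completions for the same prompt, so that the group-relative advantage favors samples that agree with the consensus answer. No ground-truth label is needed, which is what makes the approach usable at test time.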