Metin/LLaMA-3-8B-Math-Majority-Vote-GRPO

Text Generation

text-generation-inference

test-time-reinforcement-learning


Metin committed on May 18, 2025

Commit 5f99087 · verified · 1 Parent(s): 0a728d9

Update README.md

Files changed (1)

README.md +3 -0


@@ -13,6 +13,9 @@ language:
 - en
 ---
+<img src="https://huggingface.co/Metin/LLaMA-3-8B-Math-Majority-Vote-GRPO/resolve/main/llama_clones.png"
+alt="A scene from a famous movie" width="800"/>
 # LLaMA-3-8B-Math-Majority-Vote-GRPO
 Metin/LLaMA-3-8B-Math-Majority-Vote-GRPO is a version of ytu-ce-cosmos/Turkish-Llama-8b-DPO-v0.1 trained with [Test Time Reinforcement Learning (TTRL)](https://arxiv.org/abs/2504.16084). It was trained on Turkish math word problems using the GRPO method and a majority-vote reward function.
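
The README mentions a majority vote reward function but does not show it. The following is a minimal sketch of how such a reward could look, not the code used to train this model. It assumes the final answer has already been extracted from each sampled completion as a string (or `None` when extraction failed); the function name `majority_vote_rewards` and the 1.0 / 0.0 reward values are illustrative assumptions.

```python
from collections import Counter
from typing import List, Optional


def majority_vote_rewards(answers: List[Optional[str]]) -> List[float]:
    """Score a group of sampled answers to the same problem.

    Hypothetical helper: the most frequent extracted answer is treated as a
    pseudo-label, and each sample gets 1.0 if it matches it, else 0.0.
    """
    # Samples with no extractable answer (None) never count as votes.
    valid = [a for a in answers if a is not None]
    if not valid:
        # Nothing to vote on, so no sample is rewarded.
        return [0.0] * len(answers)

    # most_common(1) returns the answer with the highest count; ties are
    # resolved in favor of the answer that was seen first.
    majority_answer, _ = Counter(valid).most_common(1)[0]
    return [1.0 if a == majority_answer else 0.0 for a in answers]
```

In a GRPO-style setup, a list like this would supply the per-completion rewards for one prompt's group of samples, so no ground-truth labels are needed.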