whysue/simple_GRPO

Tags: Question Answering, text-generation-inference, open-llm-leaderboard


license: mit
base_model: Qwen/Qwen2.5-7B

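A model trained on GSM8K, following the simple_GRPO project. Training ran for 200 steps, during which the word "however" appeared once in the model's output. Training used three A800 80G GPUs and took a little over 20 minutes.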
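For readers unfamiliar with GRPO, the core idea is that each sampled completion is scored relative to the other completions drawn for the same prompt. The following is a minimal sketch of that group-relative normalization, not the training code used for this model. The function name `group_relative_advantages`, the `eps` value, the use of the population standard deviation, and the 0/1 reward scheme are all assumptions made for illustration.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-4):
    """Normalize rewards within one group of completions for a single prompt.

    Assumption: population standard deviation plus a small eps for stability;
    real implementations may use the sample standard deviation instead.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four completions of one GSM8K question:
# 1.0 if the final answer matched the reference, 0.0 otherwise.
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
```

Completions that beat the group average receive a positive advantage and the rest a negative one. When every completion in a group gets the same reward, all advantages are zero and that group contributes no learning signal.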

Training results:

[Figures: loss, GPU, memory]

Test results:

demo_math_chat_gen(simple_GRPO_why)

demo_math_chat_gen(Qwen2.5-7B)

Notice:

When evaluated on GSM8K, Qwen2.5-7B scores 85.4. A possible cause is discussed in https://github.com/open-compass/opencompass/issues/1878
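To make the meaning of a GSM8K score concrete, here is a generic sketch of exact-match scoring. It is not OpenCompass's evaluator and does not reproduce the 85.4 figure. It relies on the fact that GSM8K reference solutions end with `#### <number>`; taking the last number in the model output as the prediction is an assumption, and the helper names `extract_reference`, `extract_prediction`, `is_correct`, and `accuracy` are hypothetical.

```python
import re


def extract_reference(answer_text):
    """Return the number that follows '####' in a GSM8K reference solution."""
    match = re.search(r"####\s*(-?[\d,]*\.?\d+)", answer_text)
    return match.group(1).replace(",", "") if match else None


def extract_prediction(completion):
    """Assumption: the last number in the completion is the final answer."""
    numbers = re.findall(r"-?\d[\d,]*(?:\.\d+)?", completion)
    return numbers[-1].replace(",", "") if numbers else None


def is_correct(completion, answer_text):
    """Compare prediction and reference numerically."""
    pred = extract_prediction(completion)
    ref = extract_reference(answer_text)
    if pred is None or ref is None:
        return False
    return abs(float(pred) - float(ref)) < 1e-6


def accuracy(completions, answers):
    """Percentage of completions whose final number matches the reference."""
    hits = sum(is_correct(c, a) for c, a in zip(completions, answers))
    return 100.0 * hits / len(answers)
```

Small differences in how the final answer is extracted, or in how the prompt is formatted, can shift the reported score, so numbers from different evaluation setups are not directly comparable.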


Safetensors

Model size: 8B params

Tensor type: BF16

Model tree for whysue/simple_GRPO: Qwen/Qwen2.5-7B (base model), Qwen/Qwen2.5-7B-Instruct (finetuned), this model (finetuned)
