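Fine-tuning details
- This model was first trained with supervised fine-tuning (SFT) on the openmath-mini dataset, starting from Qwen3-4B-Base.
- It was then trained further with GRPO (Group Relative Policy Optimization) on the same openmath-mini dataset.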
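For readers unfamiliar with GRPO, the core idea can be sketched in a few lines: several completions are sampled for the same prompt, each receives a scalar reward, and each completion's advantage is its reward normalized against the group's mean and standard deviation. The snippet below is only an illustration of that normalization step, not the training code used for this model. The function name `group_relative_advantages`, the `eps` value, the toy rewards, and the use of the population standard deviation are all assumptions made for the example.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize the rewards of completions sampled for ONE prompt.

    Hypothetical helper for illustration only. Population standard
    deviation is an assumption here; implementations may differ.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    # eps keeps the division finite when every completion gets the same reward.
    return [(r - mu) / (sigma + eps) for r in rewards]


# Toy group: four sampled answers to one math problem, reward 1.0 if correct.
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
print(advantages)
```

Correct answers end up with positive advantages and incorrect ones with negative advantages, so the policy update pushes probability toward the better completions within each group. When every completion in a group gets the same reward, all advantages are zero and that group contributes no learning signal.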
Evaluation
- arc: 0.91
- ifeval: 0.86
- race: 0.91
- amc: 0.70
- competition_math: 0.92
- gsm8k: 0.90
- math_500: 0.90
- aime25: 0.33
- humaneval: 0.94
Usage
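# Requires the transformers library; device_map="auto" also needs accelerate installed.
from transformers import AutoModelForCausalLM, AutoTokenizer, TextStreamer

model_name = "wesjos/GRPO-SFT-Qwen3-4B-Base-math"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="auto",  # place the weights on the available device(s)
)

# System prompt. English: "You are a friendly assistant who is good at solving
# problems. Please give professional answers to my questions."
system_prompt = "你是一个友善并且擅长解决问题的助手,请根据我的提问,给出专业的回答。"
# User prompt. English: "Help me write Python code for reinforcement learning that plays 2048."
prompt = "帮我写一个python的强化学习玩2048的代码"
messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": prompt},
]

# Print decoded tokens to stdout as they are generated.
streamer = TextStreamer(tokenizer)

# Render the conversation with the chat template, ending with the assistant generation prompt.
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
model_inputs = tokenizer(
    [text],
    return_tensors="pt",
    add_special_tokens=True,
).to(model.device)

outputs = model.generate(
    **model_inputs,
    max_new_tokens=4096,
    streamer=streamer,
)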
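For a decoder-only model like this one, the tensor returned by `generate` contains the prompt tokens followed by the newly generated ones, so decoding `outputs` directly repeats the prompt. A minimal sketch of stripping the prompt is shown below on plain Python lists so it runs without downloading the model. The helper name `strip_prompt_tokens` and the toy token ids are hypothetical and not part of the original snippet.

```python
def strip_prompt_tokens(output_ids, prompt_len):
    """Drop the first `prompt_len` ids from every generated sequence.

    Hypothetical helper: `output_ids` is a batch of sequences, each made of
    the prompt ids followed by the newly generated ids.
    """
    return [list(seq[prompt_len:]) for seq in output_ids]


# Toy example with made-up token ids standing in for real model output.
prompt_ids = [101, 102, 103]
generated = [prompt_ids + [7, 8, 9]]
new_tokens = strip_prompt_tokens(generated, len(prompt_ids))
print(new_tokens)  # [[7, 8, 9]]
```

With the real objects from the usage snippet, the same slicing would presumably be `outputs[:, model_inputs["input_ids"].shape[1]:]`, and the result can be passed to `tokenizer.batch_decode(..., skip_special_tokens=True)` to get only the model's answer as text.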
Base model: Qwen/Qwen3-4B-Base