Evaluation

```shell
!lm_eval --model hf \
    --model_args pretrained=jaeyong2/Qwen3-0.6B-DPO-Ja-Peft \
    --tasks kmmlu,mmlu,japanese_leaderboard,gsm8k \
    --device cuda:0 \
    --batch_size 1
```
| Benchmark | Qwen3-1.7B-DPO-peft | Qwen3-1.7B |
|---|---|---|
| MMLU | 0.55 | 0.55 |
| ja_leaderboard_jaqket_v2 | 0.35 | 0.34 |
| ja_leaderboard_jcommonsenseqa | 0.48 | 0.46 |
| ja_leaderboard_jnli | 0.28 | 0.23 |
| ja_leaderboard_jsquad | 0.25 | 0.22 |
| ja_leaderboard_marc_ja | 0.65 | 0.74 |
| ja_leaderboard_mgsm | 0.42 | 0.40 |
| ja_leaderboard_xlsum | 0.10 | 0.11 |
| ja_leaderboard_xwinograd | 0.58 | 0.58 |
| GSM8K | 0.70 | 0.69 |
| KMMLU | 0.36 | 0.36 |
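As a quick summary of the Japanese leaderboard rows, the per-task scores can be averaged in a few lines of Python. This is only a sketch for reading the table (the numbers are copied from it verbatim; the averaging is not part of the official harness output):

```python
# (DPO-peft score, base score) pairs copied from the ja_leaderboard rows above.
ja_scores = {
    "jaqket_v2":       (0.35, 0.34),
    "jcommonsenseqa":  (0.48, 0.46),
    "jnli":            (0.28, 0.23),
    "jsquad":          (0.25, 0.22),
    "marc_ja":         (0.65, 0.74),
    "mgsm":            (0.42, 0.40),
    "xlsum":           (0.10, 0.11),
    "xwinograd":       (0.58, 0.58),
}

# Unweighted mean over the eight Japanese tasks for each model.
dpo_mean = sum(dpo for dpo, _ in ja_scores.values()) / len(ja_scores)
base_mean = sum(base for _, base in ja_scores.values()) / len(ja_scores)

print(f"DPO-peft mean: {dpo_mean:.4f}")  # slightly above the base model
print(f"base mean:     {base_mean:.4f}")
```

On these numbers the DPO-peft model averages marginally higher overall, with marc_ja being the one task where the base model is clearly ahead.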

License

Acknowledgement

This research is supported by the TPU Research Cloud program.
