Training Data

  1. jaeyong2/Qwen3-06B-Ko-DPO
  2. jaeyong2/Qwen3-06B-Ko-DPO-2
  3. jaeyong2/Qwen3-06B-Ko-DPO-3
  4. jaeyong2/Qwen3-06B-En-DPO-2
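The datasets above are preference pairs used for Direct Preference Optimization (DPO). As a rough illustration only (not the actual training code), the DPO objective scores the policy's log-probability margin between the chosen and rejected responses against a frozen reference model:

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for a single preference pair (illustrative sketch).

    Each argument is the summed log-probability of the chosen/rejected
    response under the policy or the frozen reference model.
    beta controls how far the policy may drift from the reference.
    """
    margin = (policy_chosen_logp - policy_rejected_logp) \
           - (ref_chosen_logp - ref_rejected_logp)
    # -log(sigmoid(beta * margin)); minimized when the policy prefers
    # the chosen response more strongly than the reference does.
    return -math.log(1.0 / (1.0 + math.exp(-beta * margin)))

# Positive margin: policy favors the chosen answer more than the reference.
loss = dpo_loss(-10.0, -14.0, -11.0, -12.0)
```

The log-probability values here are made up for illustration; in practice a trainer such as TRL's `DPOTrainer` computes them per token from the model outputs.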

Evaluation

```shell
lm_eval --model hf \
    --model_args pretrained=jaeyong2/Qwen3-0.6B-DPO \
    --tasks kmmlu,mmlu,gsm8k \
    --device cuda:0 \
    --batch_size 1 \
    --num_fewshot 5
```
| Benchmark (5-shot) | Qwen3-0.6B-DPO | Qwen3-0.6B | naver-hyperclovax/HyperCLOVAX-SEED-Text-Instruct-0.5B |
|---|---|---|---|
| MMLU | 0.47 | 0.47 | 0.44 |
| KMMLU | 0.34 | 0.35 | 0.38 |
| GSM8K | 0.47 | 0.42 | 0.39 |
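Besides the printed table, lm-evaluation-harness can dump its results as JSON (via `--output_path`). A minimal sketch of pulling the headline numbers out of such a dump; the exact metric keys (`acc,none`, `exact_match,strict-match`) vary by harness version and are assumptions here:

```python
import json

# Simplified, assumed shape of an lm-eval JSON dump; real dumps carry
# more metadata and version-dependent metric keys.
raw = json.dumps({
    "results": {
        "mmlu":  {"acc,none": 0.47},
        "kmmlu": {"acc,none": 0.34},
        "gsm8k": {"exact_match,strict-match": 0.47},
    }
})

results = json.loads(raw)["results"]
# Take the first reported metric per task.
scores = {task: next(iter(metrics.values()))
          for task, metrics in results.items()}
```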

License

Acknowledgement

This research was supported by the TPU Research Cloud (TRC) program.

Model size: 0.6B params · Tensor type: BF16 (Safetensors)
Base model: Qwen/Qwen3-0.6B (finetuned; see also the adapter jaeyong2/Qwen3-0.6B-DPO-Peft)