qwen3-4b-dpo-merged-model

This model is a fine-tuned version of Qwen/Qwen3-4B-Instruct-2507.

Training Configuration

  • Method: DPO
  • Epochs: 1
  • Learning rate: 1e-6
  • Max length: 512
Downloads last month
29
Safetensors
Model size
4B params
Tensor type
BF16
·
Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for Tkawamura/test-lora-repo

Finetuned
(1413)
this model

Dataset used to train Tkawamura/test-lora-repo