Qwen3-0.6B • DPO fine-tuned

Base model: Qwen/Qwen3-0.6B
SFT checkpoint: GingerBled/qwen3-0.6B-FullFineTune
DPO dataset: GingerBled/MNLP_M2_dpo_dataset
Hardware: Colab
Epochs: 3
Method: Direct Preference Optimization (DPO)
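DPO skips reward-model training and optimizes the policy directly on preference pairs: it increases the policy's log-probability margin of the chosen answer over the rejected one, relative to a frozen reference model. A minimal sketch of the per-example loss (pure Python; the function name and scalar interface are illustrative, not the trainer's actual API):

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-example DPO loss: -log sigmoid(beta * margin).

    The margin is how much more the policy (vs. the frozen reference)
    prefers the chosen completion over the rejected one, in log-prob terms.
    beta controls how strongly the policy may deviate from the reference.
    """
    margin = ((policy_chosen_logp - ref_chosen_logp)
              - (policy_rejected_logp - ref_rejected_logp))
    # Logistic loss on the margin: zero margin gives log(2) ≈ 0.693,
    # a large positive margin drives the loss toward zero.
    return -math.log(1.0 / (1.0 + math.exp(-beta * margin)))
```

In practice the log-probabilities are summed over completion tokens and the loss is averaged over a batch; libraries such as TRL's `DPOTrainer` implement exactly this objective.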

```python
from transformers import AutoTokenizer, pipeline, AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("GingerBled/qwen-DPO")
tokenizer = AutoTokenizer.from_pretrained("GingerBled/qwen-DPO")

# device=0 selects the first GPU; use device=-1 (or omit) to run on CPU
pipe = pipeline("text-generation", model=model, tokenizer=tokenizer, device=0)
print(pipe("Explain the Pythagorean theorem in one sentence:")[0]["generated_text"])
```
Format: Safetensors · 0.6B params · F32