ryusuke009/qwen3-4b-structured-sft-dpo
Text Generation
Transformers
Safetensors
u-10bei/structured_data_with_cot_dataset_512_v2
u-10bei/dpo-dataset-qwen-cot
English
qwen3
sft
dpo
unsloth
qwen
conversational
text-generation-inference
License: apache-2.0
qwen3-4b-structured-sft-dpo
Two-stage fine-tuned model: supervised fine-tuning (SFT) followed by Direct Preference Optimization (DPO).
Training
Stage 1 (SFT): QLoRA on u-10bei/structured_data_with_cot_dataset_512_v2 (learning rate 2e-6, 2 epochs, LoRA rank r=64)
Stage 2 (DPO): DPO with LoRA on u-10bei/dpo-dataset-qwen-cot (learning rate 1e-7, 1 epoch, beta 0.1, LoRA rank r=8)
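To make the role of beta in Stage 2 concrete, here is a minimal sketch of the standard per-example DPO loss, using the beta of 0.1 listed above. It is not the training code used for this model, which the card does not show. The function names and the log-probability inputs are hypothetical, chosen for illustration only.

```python
import math

BETA = 0.1  # beta listed for Stage 2 on this card


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = BETA,
) -> float:
    """Standard DPO loss for one (chosen, rejected) pair.

    Inputs are summed sequence log-probabilities under the policy being
    trained and under the frozen reference model (hypothetical values here).
    """
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_logratio - rejected_logratio)
    return -log_sigmoid(margin)


# Policy identical to the reference: the margin is 0 and the loss is ln 2.
baseline = dpo_loss(-10.0, -12.0, -10.0, -12.0)

# Policy favors the chosen answer more than the reference does: lower loss.
improved = dpo_loss(-8.0, -14.0, -10.0, -12.0)
```

A small beta such as 0.1 scales the margin down, so the loss changes slowly as the policy moves away from the reference model.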
Safetensors
Model size: 4B params
Tensor type: BF16
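As a rough guide to what the listed size and tensor type imply, the sketch below estimates the memory needed for the weights alone. It assumes the nominal "4B" figure (the exact parameter count is not stated on this page) and 2 bytes per BF16 value. Activations, KV cache, and framework overhead are not included.

```python
def weight_bytes(num_params: int, bytes_per_param: int = 2) -> int:
    """Bytes needed to hold the weights; BF16 uses 2 bytes per parameter."""
    return num_params * bytes_per_param


# Assumption: "4B params" taken as exactly 4e9; the true count may differ.
NOMINAL_PARAMS = 4_000_000_000

size = weight_bytes(NOMINAL_PARAMS)
print(f"{size / 1e9:.1f} GB ({size / 2**30:.2f} GiB)")  # prints "8.0 GB (7.45 GiB)"
```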
Model tree for ryusuke009/qwen3-4b-structured-sft-dpo
Base model: Qwen/Qwen3-4B-Instruct-2507 (this model is a fine-tune of it)
Datasets used to train ryusuke009/qwen3-4b-structured-sft-dpo
u-10bei/structured_data_with_cot_dataset_512_v2
u-10bei/dpo-dataset-qwen-cot