Safetensors
qwen3
dpo
unsloth
trl
qwen
instruction-tuning
preference-modeling
mnlp

Commit History

Update README.md
ceb4140
verified

Tandogan committed on

Update README.md
fea5737
verified

Tandogan committed on

Upload tokenizer
97dffbe
verified

Tandogan committed on

Upload best checkpoint from DPO on SFT (Tandogan/MNLP_M2_SFT) model finetuning
ca13d5c
verified

Tandogan committed on

initial commit
e57bc69
verified

Tandogan committed on