This repository contains a Direct Preference Optimization (DPO) model built on top of a supervised fine-tuned version of Qwen/Qwen3-0.6B-Base, as part of the MNLP M2 project. The model is fine-tuned using a high-quality preference dataset to better align responses with human preferences.
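For readers new to DPO, the objective mentioned above can be sketched in a few lines. This is a minimal illustration of the standard DPO loss, not the training code used for this model: the function name `dpo_loss`, the example log-probabilities, and `beta=0.1` are assumptions chosen for illustration and do not come from this card.

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-example DPO loss from summed sequence log-probabilities.

    beta=0.1 is an illustrative default, not a value reported in this card.
    """
    # Implicit rewards: how much more the policy likes a response than the
    # frozen reference model does, scaled by beta.
    chosen_reward = beta * (policy_chosen_logp - ref_chosen_logp)
    rejected_reward = beta * (policy_rejected_logp - ref_rejected_logp)
    margin = chosen_reward - rejected_reward
    # -log(sigmoid(margin)), computed stably as softplus(-margin).
    loss = max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))
    return loss, chosen_reward, rejected_reward

# Hypothetical log-probabilities: the policy favors the chosen response
# slightly more than the reference does, so the loss drops below log(2).
loss, r_chosen, r_rejected = dpo_loss(-12.0, -15.0, -13.0, -14.0)
print(f"loss={loss:.4f} chosen_reward={r_chosen:.2f} rejected_reward={r_rejected:.2f}")
```

When the policy and the reference assign identical log-probabilities, the margin is zero and the loss equals log 2. Training pushes the margin up, which lowers the loss.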
- Base model: Qwen/Qwen3-0.6B-Base
- SFT model: Tandogan/MNLP_M2_SFT
- DPO dataset: Tandogan/MNLP_M2_dpo_dataset
- SFT dataset: Tandogan/sft_dataset_final_train
- SFT hyperparameters: (… 3e-5, weight decay = 0)

Two DPO fine-tuning experiments were run:
- … (Qwen3-0.6B-Base)
- … (Tandogan/MNLP_M2_SFT)

- Dataset: Tandogan/MNLP_M2_dpo_dataset
- Hyperparameters: (… 2e-6, weight decay = 0)

This model is intended for research and experimentation with preference-based alignment and reward modeling. It is not production-ready and may produce hallucinated, biased, or unsafe outputs. Please evaluate carefully for downstream tasks.
You can use the model with the transformers and trl libraries for inference or evaluation:
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# Use the GPU when available; fall back to CPU otherwise.
device = "cuda" if torch.cuda.is_available() else "cpu"
model = AutoModelForCausalLM.from_pretrained("Tandogan/MNLP_M2_dpo_model").to(device)
tokenizer = AutoTokenizer.from_pretrained("Tandogan/MNLP_M2_dpo_model")

prompt = "Explain recursion in simple terms."
inputs = tokenizer(prompt, return_tensors="pt").to(device)
# Generate up to 256 new tokens; the decoded text includes the prompt.
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
Base model
Qwen/Qwen3-0.6B-Base