# MNLP M2 DPO Model
This model is a fine-tuned version of Qwen3-0.6B-Base using Direct Preference Optimization (DPO) on mathematical reasoning tasks.
## Model Details

- Base Model: Qwen/Qwen3-0.6B-Base
- Training Method: Direct Preference Optimization (DPO)
- Training Dataset: abacusai/MetaMath_DPO_FewShot
- Model Size: 0.6B parameters
- Training Samples: ~9,398 preference pairs
## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load model and tokenizer
model = AutoModelForCausalLM.from_pretrained("albertfares/MNLP_M2_dpo_model_full")
tokenizer = AutoTokenizer.from_pretrained("albertfares/MNLP_M2_dpo_model_full")

# Generate a response (do_sample=True so the temperature setting takes effect)
prompt = "Solve this problem: What is 25% of 80?"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=256, do_sample=True, temperature=0.7)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
```
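Note that `temperature` only affects decoding when sampling is enabled, hence the `do_sample=True` above; with greedy decoding the setting is ignored and recent `transformers` versions emit a warning.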
## Training Configuration
- Epochs: 3
- Batch Size: 16 (per device)
- Gradient Accumulation: 8 steps
- Learning Rate: 5e-6
- Beta (DPO): 0.1
- Max Length: 1024 tokens
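These hyperparameters map directly onto TRL's `DPOTrainer`. Below is a minimal, hedged sketch of such a run, not the exact training script: the output path is hypothetical, the dataset is assumed to expose `prompt`/`chosen`/`rejected` columns, and argument names vary slightly across TRL versions.

```python
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

# Start from the base model; DPOTrainer creates a frozen reference copy
# automatically when ref_model is not passed.
model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-0.6B-Base")
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-0.6B-Base")

# Assumed column layout: prompt / chosen / rejected (rename if it differs).
dataset = load_dataset("abacusai/MetaMath_DPO_FewShot", split="train")

config = DPOConfig(
    output_dir="mnlp_m2_dpo",          # hypothetical output path
    num_train_epochs=3,
    per_device_train_batch_size=16,
    gradient_accumulation_steps=8,
    learning_rate=5e-6,
    beta=0.1,                          # DPO beta from the list above
    max_length=1024,
)

trainer = DPOTrainer(
    model=model,
    args=config,
    train_dataset=dataset,
    processing_class=tokenizer,        # named `tokenizer=` in older TRL releases
)
trainer.train()
```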
## Performance

Through DPO training on the preference pairs described above, this model is tuned to prefer high-quality mathematical explanations over lower-quality alternatives.
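For context, this is the standard DPO objective (Rafailov et al., 2023) that such training optimizes, with $\beta = 0.1$ as configured above:

$$
\mathcal{L}_{\text{DPO}}(\theta) = -\,\mathbb{E}_{(x,\, y_w,\, y_l) \sim \mathcal{D}} \left[ \log \sigma\!\left( \beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\text{ref}}(y_w \mid x)} - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\text{ref}}(y_l \mid x)} \right) \right]
$$

Here $y_w$ is the chosen (higher-quality) response, $y_l$ the rejected one, and $\pi_{\text{ref}}$ the frozen base model.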
## Citation

```bibtex
@misc{mnlp_m2_dpo_2025,
  title={MNLP M2 DPO Model},
  author={Albert Fares},
  year={2025},
  howpublished={\url{https://huggingface.co/albertfares/MNLP_M2_dpo_model_full}}
}
```