MNLP M2 DPO Model

This model is a fine-tuned version of Qwen3-0.6B-Base using Direct Preference Optimization (DPO) on mathematical reasoning tasks.

Model Details

  • Base Model: Qwen/Qwen3-0.6B-Base
  • Training Method: Direct Preference Optimization (DPO)
  • Training Dataset: abacusai/MetaMath_DPO_FewShot
  • Model Size: 0.6B parameters
  • Tensor Type: BF16
  • Training Samples: ~9,398 preference pairs
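Each training sample is a preference pair: a prompt plus a preferred and a dispreferred completion. As a rough illustration (the exact column names in abacusai/MetaMath_DPO_FewShot are an assumption here, following the common prompt/chosen/rejected convention):

```python
# Illustrative shape of one DPO preference pair. The field names follow
# the usual prompt/chosen/rejected convention; the actual column names
# in abacusai/MetaMath_DPO_FewShot may differ.
pair = {
    "prompt": "What is 25% of 80?",
    "chosen": "25% of 80 is 0.25 * 80 = 20. The answer is 20.",
    "rejected": "25% of 80 is 25.",
}
```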

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer

# Load model and tokenizer
model = AutoModelForCausalLM.from_pretrained("albertfares/MNLP_M2_dpo_model_full")
tokenizer = AutoTokenizer.from_pretrained("albertfares/MNLP_M2_dpo_model_full")

# Generate response
prompt = "Solve this problem: What is 25% of 80?"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=256, do_sample=True, temperature=0.7)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)

Training Configuration

  • Epochs: 3
  • Batch Size: 16 (per device)
  • Gradient Accumulation: 8 steps
  • Learning Rate: 5e-6
  • Beta (DPO): 0.1
  • Max Length: 1024 tokens
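With gradient accumulation, the effective optimization batch size is the per-device batch size times the accumulation steps (times the device count, which is not stated in the card and is assumed to be 1 here):

```python
# Effective batch size implied by the configuration above.
per_device_batch_size = 16
gradient_accumulation_steps = 8
num_devices = 1  # assumption: single device; scale by the actual count

effective_batch_size = (
    per_device_batch_size * gradient_accumulation_steps * num_devices
)
print(effective_batch_size)  # → 128 preference pairs per optimizer step
```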

Performance

During DPO training, the model learns to assign higher likelihood to the preferred (chosen) mathematical explanation in each pair than to the rejected one, measured relative to the frozen base model as a reference.
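The DPO objective behind this can be sketched for a single preference pair. This is a minimal illustration, not the training code; the log-probabilities are summed over response tokens under the policy and under the frozen reference model (here, the base Qwen3-0.6B model):

```python
import math

def dpo_loss(logp_chosen, logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """DPO loss for one preference pair.

    logp_* are summed log-probabilities of the chosen/rejected responses
    under the policy being trained; ref_logp_* are the same quantities
    under the frozen reference model. beta matches the card's setting.
    """
    # Implicit reward margin: how much more the policy prefers the chosen
    # response over the rejected one, relative to the reference model.
    margin = (logp_chosen - ref_logp_chosen) - (logp_rejected - ref_logp_rejected)
    # Negative log-sigmoid of the scaled margin.
    return -math.log(1.0 / (1.0 + math.exp(-beta * margin)))

# When the policy matches the reference model, the margin is 0 and the
# loss is -log(0.5) = log(2) ≈ 0.693.
loss = dpo_loss(-10.0, -12.0, -10.0, -12.0, beta=0.1)
```

Minimizing this loss pushes the margin positive, i.e. toward preferring the chosen explanation.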

Citation

@misc{mnlp_m2_dpo_2025,
  title={MNLP M2 DPO Model},
  author={Albert Fares},
  year={2025},
  howpublished={\url{https://huggingface.co/albertfares/MNLP_M2_dpo_model_full}}
}