---
license: apache-2.0
base_model: Qwen/Qwen3-0.6B-Base
tags:
- dpo
- fdpo
- math
- code
- qwen3
- reasoning
datasets:
- albertfares/MNLP_M3_dpo_dataset
language:
- en
pipeline_tag: text-generation
---
# MNLP M3 fDPO Model (187k samples)

This model is a fine-tuned version of Qwen/Qwen3-0.6B-Base, trained with filtered Direct Preference Optimization (fDPO) on the MNLP M3 DPO dataset.
## Model Details
- Base Model: Qwen/Qwen3-0.6B-Base
- Training Method: fDPO (filtered Direct Preference Optimization)
- Dataset: MNLP M3 mixed dataset (~69k samples)
- Format: SafeTensors (secure format)
## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("albertfares/MNLP_M3_dpo_model_69k")
tokenizer = AutoTokenizer.from_pretrained("albertfares/MNLP_M3_dpo_model_69k")
```
The model weights are stored in the SafeTensors format for enhanced security and faster loading.
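Once loaded, the model can be used for text generation in the usual way. A minimal sketch is shown below; the prompt and the generation settings (`max_new_tokens`, greedy decoding) are illustrative choices, not recommendations from the card.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Repo name as given in the card above.
model = AutoModelForCausalLM.from_pretrained("albertfares/MNLP_M3_dpo_model_69k")
tokenizer = AutoTokenizer.from_pretrained("albertfares/MNLP_M3_dpo_model_69k")

# Example prompt; any math/code/reasoning query works the same way.
prompt = "Question: What is 12 * 17? Answer:"
inputs = tokenizer(prompt, return_tensors="pt")

# Greedy decoding for reproducibility; sampling parameters can be tuned.
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```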