---
license: apache-2.0
base_model: Qwen/Qwen3-0.6B-Base
tags:
- dpo
- fdpo
- math
- code
- qwen3
- reasoning
datasets:
- albertfares/MNLP_M3_dpo_dataset
language:
- en
pipeline_tag: text-generation
---
# MNLP M3 fDPO Model (187k samples)
This model is a fine-tuned version of [Qwen/Qwen3-0.6B-Base](https://huggingface.co/Qwen/Qwen3-0.6B-Base) using **filtered Direct Preference Optimization (fDPO)** on the [MNLP M3 DPO dataset](https://huggingface.co/datasets/albertfares/MNLP_M3_dpo_dataset).
## Model Details
- **Base Model**: Qwen/Qwen3-0.6B-Base
- **Training Method**: fDPO (filtered Direct Preference Optimization)
- **Dataset**: MNLP M3 mixed dataset (~69k samples)
- **Format**: SafeTensors (secure format)
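For readers unfamiliar with the objective, the following is a minimal sketch of the standard DPO loss for a single preference pair. It is not this model's training code: the function name `dpo_loss`, its argument names, and the default `beta=0.1` are illustrative assumptions, and the filtering step that distinguishes fDPO is not described in this card, so it is not shown.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss for one (chosen, rejected) pair.

    Each argument is the summed log-probability of a full response under
    the policy or the frozen reference model. beta=0.1 is an assumed value.
    """
    # How much more the policy favors each response than the reference does.
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_logratio - rejected_logratio)
    # -log(sigmoid(margin)) == log(1 + exp(-margin)), computed stably.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))
```

The loss equals `log(2)` when the policy and reference agree, and it shrinks as the policy assigns relatively more probability to the chosen response than the reference does.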
## Usage
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hub repository id of this fine-tuned checkpoint
repo_id = "albertfares/MNLP_M3_dpo_model_69k"
model = AutoModelForCausalLM.from_pretrained(repo_id)
tokenizer = AutoTokenizer.from_pretrained(repo_id)
```
The weights are stored in SafeTensors format, which loads without executing pickled code and is typically faster to load than pickle-based checkpoints.
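Once loaded, the model can be used for text completion. The sketch below is one way to do that, assuming plain-text prompting because the base checkpoint is a base (non-chat) model. The helper names `load_model` and `complete`, the greedy decoding setting, and the 64-token limit are illustrative choices, not settings documented for this model.

```python
def load_model(repo_id="albertfares/MNLP_M3_dpo_model_69k"):
    """Load the model and tokenizer from the Hub (downloads on first call)."""
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model = AutoModelForCausalLM.from_pretrained(repo_id)
    tokenizer = AutoTokenizer.from_pretrained(repo_id)
    return model, tokenizer


def complete(model, tokenizer, prompt, max_new_tokens=64):
    """Return only the newly generated continuation of `prompt`."""
    inputs = tokenizer(prompt, return_tensors="pt")
    prompt_len = len(inputs["input_ids"][0])
    # Greedy decoding (assumed here) keeps the output deterministic.
    output_ids = model.generate(
        **inputs, max_new_tokens=max_new_tokens, do_sample=False
    )
    # Drop the prompt tokens so only the continuation is decoded.
    new_tokens = output_ids[0][prompt_len:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```

For example, `model, tokenizer = load_model()` followed by `print(complete(model, tokenizer, "Question: What is 12 * 7?\nAnswer:"))` prints the model's continuation of that prompt.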