---
license: apache-2.0
base_model: Qwen/Qwen3-0.6B-Base
tags:
- dpo
- fdpo
- math
- code
- qwen3
- reasoning
datasets:
- albertfares/MNLP_M3_dpo_dataset
language:
- en
pipeline_tag: text-generation
---

# MNLP M3 fDPO Model (187k samples)

This model is a fine-tuned version of [Qwen/Qwen3-0.6B-Base](https://huggingface.co/Qwen/Qwen3-0.6B-Base) using **filtered Direct Preference Optimization (fDPO)** on the [MNLP M3 DPO dataset](https://huggingface.co/datasets/albertfares/MNLP_M3_dpo_dataset).

## Model Details

- **Base Model**: Qwen/Qwen3-0.6B-Base
- **Training Method**: fDPO (filtered Direct Preference Optimization)
- **Dataset**: MNLP M3 mixed dataset (~69k samples)
- **Format**: SafeTensors (secure format)
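For reference, fDPO optimizes the standard DPO objective on a filtered subset of preference pairs. Below is a minimal sketch of the per-pair DPO loss; the `beta` value is illustrative, and the exact filtering criterion used for this model is not documented here:

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-pair DPO loss: -log sigmoid(beta * (chosen margin - rejected margin)).

    Each argument is a summed log-probability of the chosen/rejected response
    under the policy or the frozen reference model.
    """
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    logits = beta * (chosen_margin - rejected_margin)
    return -math.log(1.0 / (1.0 + math.exp(-logits)))

# When policy and reference agree exactly, the loss is -log(0.5) ≈ 0.693
print(dpo_loss(0.0, 0.0, 0.0, 0.0))
```

The loss shrinks as the policy raises the chosen response's likelihood relative to the rejected one, measured against the reference model.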

## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("albertfares/MNLP_M3_dpo_model_69k")
tokenizer = AutoTokenizer.from_pretrained("albertfares/MNLP_M3_dpo_model_69k")
```
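Once loaded, the model can be exercised with a quick greedy-decoding check; this is a sketch, and the prompt format and decoding settings here are illustrative assumptions, not the configuration used in training or evaluation:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "albertfares/MNLP_M3_dpo_model_69k"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Illustrative prompt; greedy decoding for a deterministic smoke test
prompt = "Question: What is the derivative of x**2?\nAnswer:"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=False)
text = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(text)
```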

This model is stored in the SafeTensors format for enhanced security and faster loading.