---
license: apache-2.0
base_model: Qwen/Qwen3-0.6B-Base
tags:
  - dpo
  - fdpo
  - math
  - code
  - qwen3
  - reasoning
datasets:
  - albertfares/MNLP_M3_dpo_dataset
language:
  - en
pipeline_tag: text-generation
---

# MNLP M3 fDPO Model (187k samples)

This model is a fine-tuned version of Qwen/Qwen3-0.6B-Base, trained with filtered Direct Preference Optimization (fDPO) on the MNLP M3 DPO dataset.

## Model Details

- **Base Model:** Qwen/Qwen3-0.6B-Base
- **Training Method:** fDPO (filtered Direct Preference Optimization)
- **Dataset:** MNLP M3 mixed dataset (~69k samples)
- **Format:** SafeTensors
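This card does not include the training script, but fDPO builds on the standard DPO objective: it optimizes the policy on preference pairs (after filtering out low-quality pairs) by maximizing the margin between the chosen and rejected responses, measured relative to a frozen reference model. A minimal per-pair sketch of that loss (illustrative only, not the actual training code) looks like this:

```python
import math

def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """Per-pair DPO loss: -log sigmoid(beta * reward margin).

    Inputs are summed log-probabilities of the chosen/rejected responses
    under the policy and the frozen reference model. beta=0.1 is a common
    default, not necessarily the value used for this model.
    """
    chosen_reward = beta * (policy_chosen_logp - ref_chosen_logp)
    rejected_reward = beta * (policy_rejected_logp - ref_rejected_logp)
    margin = chosen_reward - rejected_reward
    # Loss shrinks as the policy prefers the chosen response more strongly
    # than the reference model does.
    return -math.log(1.0 / (1.0 + math.exp(-margin)))
```

At initialization (policy equals reference) the margin is zero and the loss is log 2; it decreases as the policy learns to rank the chosen response above the rejected one.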

## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("albertfares/MNLP_M3_dpo_model_69k")
tokenizer = AutoTokenizer.from_pretrained("albertfares/MNLP_M3_dpo_model_69k")

# Generate a completion for a prompt
inputs = tokenizer("Question: What is 12 * 7?\nAnswer:", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

The weights are stored in the SafeTensors format, which avoids pickle-based deserialization and loads faster than standard PyTorch checkpoints.