---
base_model: Qwen/Qwen3-4B-Instruct-2507
datasets:
- u-10bei/dpo-dataset-qwen-cot
language:
- en
license: apache-2.0
library_name: transformers
pipeline_tag: text-generation
tags:
- dpo
- unsloth
- qwen
- alignment
---

# jm23d

This model is a fine-tuned version of **Qwen/Qwen3-4B-Instruct-2507**, trained with **Direct Preference Optimization (DPO)** via the **Unsloth** library.

This repository contains the **fully merged 16-bit weights**. No adapter loading is required.

## Training Objective
DPO was used to align the model's responses with the preferred outputs in the preference dataset, with a focus on improving reasoning (Chain-of-Thought) and structured response quality.
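
For readers new to DPO, the per-example objective can be sketched in plain Python. This is an illustration of the standard DPO loss, not the Unsloth training code used for this model. The helper name `dpo_loss` and the log-probability values are hypothetical; `beta=0.1` is the value listed in this card's training configuration.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard per-example DPO loss from summed sequence log-probabilities.

    Hypothetical helper for illustration only.
    """
    # How much more the policy favors each response than the frozen reference does
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_logratio - rejected_logratio)
    # -log(sigmoid(margin)), written as log(1 + exp(-margin))
    return math.log1p(math.exp(-margin))


# Made-up log-probabilities: a policy identical to the reference scores log(2)
start = dpo_loss(-12.0, -15.0, -12.0, -15.0)
# A policy that raised the chosen response and lowered the rejected one
improved = dpo_loss(-10.0, -18.0, -12.0, -15.0)
print(start, improved)
```

The loss falls as the policy widens the gap between the chosen and rejected responses relative to the reference model. `beta` controls how strongly that gap is weighted.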

## Training Configuration
- **Base model**: Qwen/Qwen3-4B-Instruct-2507
- **Method**: DPO (Direct Preference Optimization)
- **Epochs**: 1
- **Learning rate**: 1e-07
- **Beta**: 0.1
- **Max sequence length**: 1024
- **LoRA config**: r=8, alpha=16 (merged into base)
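
The "merged into base" note can be illustrated with a toy example: the low-rank update is scaled and added to the base weight, so no adapter is needed at inference time. The sketch below assumes the conventional LoRA scaling of `alpha / r`. This card does not state whether a variant scaling was used, and the matrices and the helper names `matmul` and `merge_lora` are hypothetical.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def merge_lora(weight, lora_b, lora_a, r=8, alpha=16):
    """Return weight + (alpha / r) * (lora_b @ lora_a).

    Assumes conventional LoRA scaling; hypothetical helper for illustration.
    """
    scale = alpha / r  # 16 / 8 = 2.0 for the configuration listed above
    delta = matmul(lora_b, lora_a)
    return [[w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(weight, delta)]


# Toy 2x2 base weight with rank-8 factors: lora_b is 2x8, lora_a is 8x2
weight = [[1.0, 0.0], [0.0, 1.0]]
lora_b = [[0.125] * 8, [0.0] * 8]
lora_a = [[1.0, 2.0] for _ in range(8)]

merged = merge_lora(weight, lora_b, lora_a)
print(merged)  # [[3.0, 4.0], [0.0, 1.0]]
```

After merging, only the single matrix `merged` needs to be stored. This is why the repository ships full weights rather than an adapter.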

## Usage
Because the weights are already merged, the model loads directly with `transformers`.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "your_id/your-repo-name"  # placeholder: replace with this repository's ID

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="auto",
)

# Test inference
prompt = "Your question here"
messages = [{"role": "user", "content": prompt}]
# return_dict=True yields input_ids and attention_mask, so **inputs can be unpacked
inputs = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512)
# Decode only the newly generated tokens, not the prompt
new_tokens = outputs[0][inputs["input_ids"].shape[-1]:]
print(tokenizer.decode(new_tokens, skip_special_tokens=True))
```

## Sources & License (IMPORTANT)

* **Training data**: u-10bei/dpo-dataset-qwen-cot
* **Dataset license**: MIT License, per the dataset terms.
* **Model license**: this card's metadata lists `apache-2.0`.
* **Compliance**: Users must also follow the license terms of the original base model, Qwen/Qwen3-4B-Instruct-2507.