dpo-qwen-cot-mergedv2

This model is a fine-tuned version of NobutaMN/qwen3-4b-structevalt-lora-nobuta-v2change using Direct Preference Optimization (DPO) via the Unsloth library.

This repository contains LoRA adapter weights only. The base model must be loaded separately.

Training Objective

This model was trained with DPO on the preference dataset listed under Sources & License to align its responses with the preferred outputs. The focus is on improving Chain-of-Thought reasoning and the quality of structured responses.
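For readers unfamiliar with the objective, the following is a minimal sketch of the standard DPO loss for a single preference pair. It is not the training code used for this model (that ran through Unsloth and is not shown here). The function name `dpo_loss` and its argument names are hypothetical; the inputs are assumed to be summed log-probabilities of the chosen and rejected responses under the policy and under the frozen reference model.

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one preference pair, given sequence log-probabilities."""
    # Implicit rewards: how much more the policy favors each response
    # than the reference model does, scaled by beta.
    chosen_reward = beta * (policy_chosen_logp - ref_chosen_logp)
    rejected_reward = beta * (policy_rejected_logp - ref_rejected_logp)
    margin = chosen_reward - rejected_reward
    # loss = -log(sigmoid(margin)), computed in a numerically stable way
    if margin > 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))

# When the policy equals the reference, the margin is 0 and the loss is log(2).
print(dpo_loss(-10.0, -12.0, -10.0, -12.0))
```

The loss falls below log(2) once the policy prefers the chosen response more strongly than the reference does, and rises above it in the opposite case. Beta controls how sharply the loss reacts to that margin.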

Training Configuration

Base model: NobutaMN/qwen3-4b-structevalt-lora-nobuta-v2change
Method: DPO (Direct Preference Optimization)
Epochs: 1
Learning rate: 1e-07
Beta: 0.1
Max sequence length: 1024
LoRA Config: r=8, alpha=16
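To make the LoRA settings more concrete, here is a small sketch of what r=8 and alpha=16 imply under the standard LoRA formulation, where the low-rank update is scaled by alpha / r. The helper names and the 1024 x 1024 layer shape are hypothetical and chosen only for illustration; they are not taken from this model's architecture or from the actual training script.

```python
def lora_scaling(r, alpha):
    """Scaling factor applied to the low-rank update B @ A in standard LoRA."""
    return alpha / r

def lora_param_count(d_in, d_out, r):
    """Trainable parameters for one adapted weight: A is (r x d_in), B is (d_out x r)."""
    return r * (d_in + d_out)

r, alpha = 8, 16          # values from the configuration above
d_in = d_out = 1024       # hypothetical layer shape, for illustration only

print(lora_scaling(r, alpha))             # 2.0
print(lora_param_count(d_in, d_out, r))   # 16384
print(d_in * d_out)                       # 1048576 weights in the full matrix
```

In this toy case the adapter trains 16,384 parameters for a matrix that holds 1,048,576 weights, which is why LoRA adapters stay small relative to the base model.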

Usage

This is a LoRA adapter. Load the base model, then apply the adapter.

import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "Qwen/Qwen3-4B-Instruct-2507"
adapter_id = "your_id/your-repo-name"  # replace with this adapter's repository ID

# Load the tokenizer and the base model in half precision
tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(
    base_id,
    torch_dtype=torch.float16,
    device_map="auto"
)

# Attach the LoRA adapter to the base model
model = PeftModel.from_pretrained(model, adapter_id)

# Test inference
prompt = "Your question here"
# return_dict=True yields input_ids and attention_mask, which generate() can unpack
inputs = tokenizer.apply_chat_template(
    [{"role": "user", "content": prompt}],
    tokenize=True,
    add_generation_prompt=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0]))
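For a decoder-only model, `generate` returns the prompt tokens followed by the newly generated ones, so decoding `outputs[0]` directly prints the prompt as well. A minimal sketch of separating the two is shown below. The helper name `completion_ids` is hypothetical, and the token ids are made up for the demonstration.

```python
def completion_ids(output_ids, prompt_length):
    """Drop the first prompt_length ids, keeping only the newly generated tokens."""
    return output_ids[prompt_length:]

# Toy demonstration: 3 prompt tokens followed by 2 generated tokens
print(completion_ids([101, 7592, 102, 2023, 2003], 3))  # prints [2023, 2003]
```

With the usage snippet, `prompt_length` would be the number of tokens in the tokenized prompt, and the sliced ids can then be passed to `tokenizer.decode(..., skip_special_tokens=True)` to print only the model's answer.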

Sources & License (IMPORTANT)

Training Data: u-10bei/dpo-dataset-qwen-cot-localchange
License: MIT License (as per the dataset terms).
Compliance: Users must follow the original base model's license terms.

Safetensors
Model size: 4B params
Tensor type: F32, F16

Model tree for NobutaMN/dpo-qwen-cot-mergedv2

Base model: Qwen/Qwen3-4B-Instruct-2507
Adapter: NobutaMN/qwen3-4b-structevalt-lora-nobuta-v2change
Quantized (1): this model