KS150 committed
Commit af039b0 · verified · 1 Parent(s): 1571c49

Unsloth Model Card

Files changed (1): README.md +13 -55
README.md CHANGED
````diff
@@ -1,63 +1,21 @@
 ---
-base_model: Qwen/Qwen3-4B-Instruct-2507
-datasets:
-- u-10bei/dpo-dataset-qwen-cot
-language:
-- en
-license: apache-2.0
-library_name: transformers
-pipeline_tag: text-generation
+base_model: unsloth/qwen3-4b-instruct-2507-unsloth-bnb-4bit
 tags:
-- dpo
+- text-generation-inference
+- transformers
 - unsloth
-- qwen
-- alignment
+- qwen3
+license: apache-2.0
+language:
+- en
 ---
 
-# qwen3-4b-dpo-qwen-cot-merged
-
-This model is a fine-tuned version of **Qwen/Qwen3-4B-Instruct-2507** using **Direct Preference Optimization (DPO)** via the **Unsloth** library.
-
-This repository contains the **full-merged 16-bit weights**. No adapter loading is required.
-
-## Training Objective
-This model has been optimized using DPO to align its responses with preferred outputs, focusing on improving reasoning (Chain-of-Thought) and structured response quality based on the provided preference dataset.
-
-## Training Configuration
-- **Base model**: Qwen/Qwen3-4B-Instruct-2507
-- **Method**: DPO (Direct Preference Optimization)
-- **Epochs**: 5
-- **Learning rate**: 7e-04
-- **Beta**: 0.1
-- **Max sequence length**: 1024
-- **LoRA Config**: r=8, alpha=16 (merged into base)
-
-## Usage
-Since this is a merged model, you can use it directly with `transformers`.
-
-```python
-from transformers import AutoModelForCausalLM, AutoTokenizer
-import torch
-
-model_id = "your_id/your-repo-name"
-
-tokenizer = AutoTokenizer.from_pretrained(model_id)
-model = AutoModelForCausalLM.from_pretrained(
-    model_id,
-    torch_dtype=torch.float16,
-    device_map="auto"
-)
-
-# Test inference
-prompt = "Your question here"
-inputs = tokenizer.apply_chat_template([{"role": "user", "content": prompt}], tokenize=True, add_generation_prompt=True, return_tensors="pt").to("cuda")
-outputs = model.generate(**inputs, max_new_tokens=512)
-print(tokenizer.decode(outputs[0]))
+# Uploaded finetuned model
 
-```
+- **Developed by:** KS150
+- **License:** apache-2.0
+- **Finetuned from model :** unsloth/qwen3-4b-instruct-2507-unsloth-bnb-4bit
 
-## Sources & License (IMPORTANT)
+This qwen3 model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Huggingface's TRL library.
 
-* **Training Data**: [u-10bei/dpo-dataset-qwen-cot]
-* **License**: MIT License. (As per dataset terms).
-* **Compliance**: Users must follow the original base model's license terms.
+[<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="200"/>](https://github.com/unslothai/unsloth)
````
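
For context on the training method named in the removed card: DPO fits the policy so that it prefers the chosen response over the rejected one *relative to a frozen reference model*, scaled by a temperature `beta` (the card used 0.1). Below is a minimal sketch of the standard per-pair DPO loss in plain Python; the log-probability values are made-up illustrative numbers, not taken from this training run.

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Direct Preference Optimization loss for one preference pair.

    Each argument is the summed log-probability of the chosen or
    rejected response under the trainable policy or the frozen
    reference model.
    """
    # Implicit reward margin: how much more the policy favors the
    # chosen response over the reference, minus the same quantity
    # for the rejected response.
    logits = beta * ((policy_chosen_logp - ref_chosen_logp)
                     - (policy_rejected_logp - ref_rejected_logp))
    # -log(sigmoid(logits)) == log(1 + exp(-logits))  (softplus)
    return math.log1p(math.exp(-logits))

# Illustrative numbers: the policy has drifted toward the chosen answer.
loss = dpo_loss(-10.0, -14.0, -12.0, -13.0, beta=0.1)
print(round(loss, 4))  # prints 0.5544
```

With a zero margin the loss is log 2 ≈ 0.693, and it decreases monotonically as the policy widens its relative preference for the chosen response over the rejected one.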