KS150 committed
Commit af039b0 · verified · 1 Parent(s): 1571c49

Unsloth Model Card

Files changed (1): README.md +13 -55
README.md CHANGED
````diff
@@ -1,63 +1,21 @@
 ---
-base_model: Qwen/Qwen3-4B-Instruct-2507
-datasets:
-- u-10bei/dpo-dataset-qwen-cot
-language:
-- en
-license: apache-2.0
-library_name: transformers
-pipeline_tag: text-generation
+base_model: unsloth/qwen3-4b-instruct-2507-unsloth-bnb-4bit
 tags:
-- dpo
+- text-generation-inference
+- transformers
 - unsloth
-- qwen
-- alignment
+- qwen3
+license: apache-2.0
+language:
+- en
 ---
 
-# qwen3-4b-dpo-qwen-cot-merged
-
-This model is a fine-tuned version of **Qwen/Qwen3-4B-Instruct-2507** using **Direct Preference Optimization (DPO)** via the **Unsloth** library.
-
-This repository contains the **full-merged 16-bit weights**. No adapter loading is required.
-
-## Training Objective
-This model has been optimized using DPO to align its responses with preferred outputs, focusing on improving reasoning (Chain-of-Thought) and structured response quality based on the provided preference dataset.
-
-## Training Configuration
-- **Base model**: Qwen/Qwen3-4B-Instruct-2507
-- **Method**: DPO (Direct Preference Optimization)
-- **Epochs**: 5
-- **Learning rate**: 7e-04
-- **Beta**: 0.1
-- **Max sequence length**: 1024
-- **LoRA Config**: r=8, alpha=16 (merged into base)
-
-## Usage
-Since this is a merged model, you can use it directly with `transformers`.
-
-```python
-from transformers import AutoModelForCausalLM, AutoTokenizer
-import torch
-
-model_id = "your_id/your-repo-name"
-
-tokenizer = AutoTokenizer.from_pretrained(model_id)
-model = AutoModelForCausalLM.from_pretrained(
-    model_id,
-    torch_dtype=torch.float16,
-    device_map="auto"
-)
-
-# Test inference
-prompt = "Your question here"
-inputs = tokenizer.apply_chat_template([{"role": "user", "content": prompt}], tokenize=True, add_generation_prompt=True, return_tensors="pt").to("cuda")
-outputs = model.generate(**inputs, max_new_tokens=512)
-print(tokenizer.decode(outputs[0]))
+# Uploaded finetuned model
 
-```
+- **Developed by:** KS150
+- **License:** apache-2.0
+- **Finetuned from model :** unsloth/qwen3-4b-instruct-2507-unsloth-bnb-4bit
 
-## Sources & License (IMPORTANT)
+This qwen3 model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Huggingface's TRL library.
 
-* **Training Data**: [u-10bei/dpo-dataset-qwen-cot]
-* **License**: MIT License. (As per dataset terms).
-* **Compliance**: Users must follow the original base model's license terms.
+[<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="200"/>](https://github.com/unslothai/unsloth)
````
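
For context on the training method named in the removed card: DPO fits the policy so that it prefers the chosen response over the rejected one *relative to a frozen reference model*, scaled by a temperature `beta` (the card used 0.1). Below is a minimal sketch of the standard per-pair DPO loss in plain Python; the log-probability values are made-up illustrative numbers, not taken from this training run.

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Direct Preference Optimization loss for one preference pair.

    Each argument is the summed log-probability of the chosen or
    rejected response under the trainable policy or the frozen
    reference model.
    """
    # Implicit reward margin: how much more the policy favors the
    # chosen response over the reference, minus the same quantity
    # for the rejected response.
    logits = beta * ((policy_chosen_logp - ref_chosen_logp)
                     - (policy_rejected_logp - ref_rejected_logp))
    # -log(sigmoid(logits)) == log(1 + exp(-logits))  (softplus)
    return math.log1p(math.exp(-logits))

# Illustrative numbers: the policy has drifted toward the chosen answer.
loss = dpo_loss(-10.0, -14.0, -12.0, -13.0, beta=0.1)
print(round(loss, 4))  # prints 0.5544
```

With a zero margin the loss is log 2 ≈ 0.693, and it decreases monotonically as the policy widens its relative preference for the chosen response over the rejected one.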