jmJapan committed (verified)
Commit 3488431 · Parent(s): c393c2b

Upload README.md with huggingface_hub

Files changed (1): README.md (+65 −13)
---
base_model: Qwen/Qwen3-4B-Instruct-2507
datasets:
- u-10bei/dpo-dataset-qwen-cot
language:
- en
license: apache-2.0
library_name: transformers
pipeline_tag: text-generation
tags:
- dpo
- unsloth
- qwen
- alignment
---

# jm03

This model is a fine-tuned version of **Qwen/Qwen3-4B-Instruct-2507**, trained with **Direct Preference Optimization (DPO)** via the **Unsloth** library.

This repository contains the **fully merged 16-bit weights**; no adapter loading is required.

## Training Objective

This model was optimized with DPO to align its responses with preferred outputs from the preference dataset, focusing on improving chain-of-thought reasoning and structured response quality.

## Training Configuration

- **Base model**: Qwen/Qwen3-4B-Instruct-2507
- **Method**: DPO (Direct Preference Optimization)
- **Epochs**: 1
- **Learning rate**: 1e-07
- **Beta (β)**: 0.1
- **Max sequence length**: 1024
- **LoRA config**: r=8, alpha=16 (merged into the base model)
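
For reference, here is how the Beta setting above enters the DPO objective: the per-pair loss is `-log σ(β · ((log π(y_w|x) − log π_ref(y_w|x)) − (log π(y_l|x) − log π_ref(y_l|x))))`. The sketch below is illustrative only; the log-probabilities are made-up numbers, not values from this training run:

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-pair DPO loss: -log sigmoid(beta * (chosen margin - rejected margin)),
    where each margin is the policy log-prob minus the reference log-prob."""
    logits = beta * ((policy_chosen_logp - ref_chosen_logp)
                     - (policy_rejected_logp - ref_rejected_logp))
    return -math.log(1.0 / (1.0 + math.exp(-logits)))  # -log sigmoid(logits)

# Illustrative values: the policy prefers the chosen answer slightly more
# strongly than the reference model does, so the loss dips below log(2).
loss = dpo_loss(-12.0, -15.0, -12.5, -14.8, beta=0.1)
print(round(loss, 4))  # → 0.6588
```

When the policy and reference assign identical log-probabilities the loss equals log(2); training pushes it lower by widening the chosen-vs-rejected margin relative to the reference model.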

## Usage

Since this is a merged model, you can use it directly with `transformers`. As this is an instruct-tuned model, format prompts with the tokenizer's chat template:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "your_id/your-repo-name"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="auto",
)

# Build the prompt with the model's chat template
messages = [{"role": "user", "content": "Your question here"}]
input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(input_ids, max_new_tokens=512)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
 
 

## Sources & License (IMPORTANT)

* **Training data**: [u-10bei/dpo-dataset-qwen-cot](https://huggingface.co/datasets/u-10bei/dpo-dataset-qwen-cot)
* **Dataset license**: MIT (per the dataset's terms).
* **Compliance**: The model is released under Apache-2.0; users must also follow the license terms of the original base model, Qwen/Qwen3-4B-Instruct-2507.