naru0411 committed on
Commit 50cc96e · verified · 1 Parent(s): 65381ed

Upload README.md with huggingface_hub

Files changed (1)
  1. README.md +56 -13

README.md CHANGED
@@ -1,21 +1,64 @@
  ---
- base_model: unsloth/qwen3-4b-instruct-2507-unsloth-bnb-4bit
- tags:
- - text-generation-inference
- - transformers
- - unsloth
- - qwen3
- license: apache-2.0
  language:
  - en
  ---

- # Uploaded finetuned model

- - **Developed by:** naru0411
- - **License:** apache-2.0
- - **Finetuned from model :** unsloth/qwen3-4b-instruct-2507-unsloth-bnb-4bit

- This qwen3 model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Huggingface's TRL library.

- [<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="200"/>](https://github.com/unslothai/unsloth)

  ---
+ base_model: Qwen/Qwen3-4B-Instruct-2507
+ datasets:
+ - u-10bei/dpo-dataset-qwen-cot
  language:
  - en
+ license: apache-2.0
+ library_name: transformers
+ pipeline_tag: text-generation
+ tags:
+ - dpo
+ - qwen
+ - alignment
+ - silent-cot
+ - structured-output
  ---

+ # Qwen3-4B-DPO-Silent-Format
+
+ This model is a fine-tuned version of **Qwen/Qwen3-4B-Instruct-2507** trained with **Direct Preference Optimization (DPO)**.
+
+ ## 🎯 Training Objective
+ Unlike typical CoT (Chain-of-Thought) tuning, this model is optimized to **suppress verbose reasoning** and enforce **strict structured-output compliance**.
+
+ The goal is to prevent parse errors by emitting data (JSON/TOML) directly, with no preamble such as "Approach:" or "Here is the code".
+
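The "no preamble" objective above can be checked mechanically when assembling preference pairs. Below is a minimal sketch (an editor's illustration, not part of the released training code) that treats a completion as compliant only if it parses as bare JSON, i.e. the very first non-whitespace character already starts the JSON document:

```python
import json

def is_format_compliant(completion: str) -> bool:
    """Return True if the completion is bare JSON with no preamble."""
    text = completion.strip()
    if not text:
        return False
    try:
        json.loads(text)  # must parse exactly as emitted, no leading prose
        return True
    except json.JSONDecodeError:
        return False

# A direct answer passes; a preamble-laden one fails.
chosen = '{"name": "Alice"}'
rejected = 'Approach: first I will build the object.\n{"name": "Alice"}'
assert is_format_compliant(chosen)
assert not is_format_compliant(rejected)
```

A check like this can label direct answers as `chosen` and verbose ones as `rejected` when building a DPO preference dataset; an analogous check for TOML would swap in a TOML parser.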
+ ## Training Configuration
+ - **Base model**: Qwen/Qwen3-4B-Instruct-2507
+ - **Method**: DPO (Direct Preference Optimization)
+ - **Epochs**: 1
+ - **Learning rate**: 1e-6
+ - **Beta**: 0.05 (strict penalty for deviating from the chosen responses)
+ - **Max sequence length**: 2048
+ - **LoRA config**: r=16, alpha=32 (merged into the base model)
+
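For reference, the hyperparameters above map onto TRL's `DPOTrainer` roughly as follows. This is a sketch under stated assumptions (the dataset already provides `prompt`/`chosen`/`rejected` columns, and recent `trl`/`peft` versions are installed), not the author's actual training script:

```python
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

model_id = "Qwen/Qwen3-4B-Instruct-2507"
model = AutoModelForCausalLM.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

train_dataset = load_dataset("u-10bei/dpo-dataset-qwen-cot", split="train")

# LoRA adapter matching the card: r=16, alpha=32
peft_config = LoraConfig(r=16, lora_alpha=32, task_type="CAUSAL_LM")

args = DPOConfig(
    output_dir="qwen3-4b-dpo-silent-format",
    beta=0.05,            # strict penalty for drifting from the chosen style
    learning_rate=1e-6,
    num_train_epochs=1,
    max_length=2048,
)

trainer = DPOTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    processing_class=tokenizer,
    peft_config=peft_config,
)
trainer.train()
# The card states the adapter was merged into the base model afterwards,
# e.g. via the PEFT model's merge_and_unload().
```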
+ ## Usage
+ Since this is a merged model, you can use it directly with `transformers`.
+
+ ```python
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+ import torch
+
+ model_id = "your_id/your-repo-name"
+
+ tokenizer = AutoTokenizer.from_pretrained(model_id)
+ model = AutoModelForCausalLM.from_pretrained(
+     model_id,
+     torch_dtype=torch.float16,
+     device_map="auto",
+ )
+
+ # Test inference: the model should respond directly, without an "Approach:" preamble
+ prompt = "Output a JSON for a user named Alice."
+ # return_dict=True yields input_ids plus attention_mask, which is
+ # what model.generate(**inputs, ...) expects
+ inputs = tokenizer.apply_chat_template(
+     [{"role": "user", "content": prompt}],
+     add_generation_prompt=True,
+     return_dict=True,
+     return_tensors="pt",
+ ).to(model.device)
+ outputs = model.generate(**inputs, max_new_tokens=512)
+ print(tokenizer.decode(outputs[0], skip_special_tokens=True))
+ ```
+
+ ## Sources & License (IMPORTANT)
+
+ * **Training Data**: [u-10bei/dpo-dataset-qwen-cot](https://huggingface.co/datasets/u-10bei/dpo-dataset-qwen-cot)
+ * **License**: MIT License (per the dataset's terms).
+ * **Compliance**: Users must also follow the base model's license terms.