Harsha901 committed on
Commit ed23669 · verified · 1 Parent(s): 93e4f62

Update README.md

Updated Model card

Files changed (1): README.md (+177 −5)
---
tags:
- unsloth
- qwen3
- trl
- math-reasoning
- instruction-tuned
- supervised-finetuning
- chain-of-thought
- reasoning
- mathematics
- causal-lm
license: apache-2.0
language:
- en
pipeline_tag: text-generation
---

# Uploaded Model

- **Developed by:** Harsha901
- **License:** apache-2.0
- **Finetuned from model:** unsloth/Qwen3-4B-Instruct-2507

This Qwen3 model was trained **~2× faster** using [Unsloth](https://github.com/unslothai/unsloth) and Hugging Face's **TRL** library.

[<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="200"/>](https://github.com/unslothai/unsloth)

---

## 📌 Model Overview

**Qwen3-4B-Inst-Math-Reasoning-SFT** is a **supervised fine-tuned (SFT)** variant of **Qwen3-4B-Instruct**, optimized for **mathematical reasoning and step-by-step problem solving**.

The model is trained to follow instructions precisely while producing **clear, logically structured reasoning chains**, making it suitable for:

- Math problem solving
- Educational assistants
- Reasoning benchmarks
- Downstream alignment (DPO / RLHF)

---

## 🧠 Key Capabilities

- Multi-step mathematical reasoning
- Algebra, arithmetic, and word problems
- Chain-of-thought style explanations
- Improved instruction adherence
- More stable reasoning compared to the base model

---

## 🏗️ Model Architecture

- **Architecture:** Decoder-only Transformer (Causal LM)
- **Parameters:** ~4B
- **Base Model:** Qwen3-4B-Instruct (Unsloth optimized)
- **Tokenization:** Qwen tokenizer
- **Context Length:** Same as base model

---

## 📚 Training Data

The model was fine-tuned on a curated dataset consisting of:

- Instruction-style math prompts
- Step-by-step mathematical solutions
- Reasoning-focused explanations

Data was filtered to emphasize:

- Logical consistency
- Clear intermediate steps
- Reduced ambiguity in solutions

> While care was taken to ensure quality, the dataset may still contain noise or biases present in public mathematical corpora.

---

## ⚙️ Training Details

- **Fine-tuning Method:** Supervised Fine-Tuning (SFT)
- **Frameworks:** Hugging Face Transformers + TRL
- **Acceleration:** Unsloth (memory-efficient & faster training)
- **Precision:** FP16 / BF16 (hardware dependent)
- **Optimizer:** AdamW
- **Loss Function:** Cross-entropy
- **Batching:** Gradient accumulation for memory efficiency
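
In Hugging Face terms, the bullets above map onto `TrainingArguments` / `trl.SFTConfig` fields. The sketch below is illustrative only; the exact hyperparameters of this run are not published in this card:

```python
# Illustrative SFT settings mirroring the Training Details bullets.
# NOTE: the concrete values are assumptions, not the published recipe.
training_args = {
    "optim": "adamw_torch",            # Optimizer: AdamW
    "bf16": True,                      # Precision: BF16 (use fp16=True on older GPUs)
    "per_device_train_batch_size": 2,  # small micro-batches...
    "gradient_accumulation_steps": 8,  # ...accumulated into a larger effective batch
    "learning_rate": 2e-4,
    "num_train_epochs": 1,
}

# Gradient accumulation trades extra steps for memory: the optimizer
# updates on an effective batch of micro_batch * accumulation_steps.
effective_batch = (training_args["per_device_train_batch_size"]
                   * training_args["gradient_accumulation_steps"])
print(effective_batch)  # 16
```

These keys should be accepted directly by `trl.SFTConfig(**training_args)`, since `SFTConfig` extends `transformers.TrainingArguments`.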

---

## 🚀 Usage

### Load the Model

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Harsha901/Qwen3-4B-Inst-Math-Reasoning-SFT"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    torch_dtype="auto"
)
```

### Example Inference

```python
prompt = "Solve step by step: If 5x − 10 = 15, find x."

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Greedy decoding for reproducible step-by-step answers;
# set do_sample=True if you want temperature to take effect.
outputs = model.generate(
    **inputs,
    max_new_tokens=256,
    do_sample=False
)

print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

---

## 📊 Evaluation

The model was evaluated qualitatively on:

* Math word problems
* Algebraic equations
* Multi-step reasoning tasks

**Observed improvements vs. base model:**

* Better-structured reasoning
* More consistent intermediate steps
* Fewer incomplete solutions

Formal benchmark results (e.g., GSM8K, MATH) are planned for future updates.
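
Exact-match scoring on GSM8K-style benchmarks usually keys on the final number in the model's reasoning chain. A minimal extraction helper, as a sketch (this is not part of the model's released code):

```python
import re

def extract_final_number(text):
    """Return the last number in a model response, for exact-match scoring."""
    # Strip thousands separators, then grab all signed ints/decimals.
    matches = re.findall(r"-?\d+(?:\.\d+)?", text.replace(",", ""))
    return matches[-1] if matches else None

response = "5x - 10 = 15, so 5x = 25 and x = 25 / 5 = 5."
print(extract_final_number(response))  # prints "5"
```

The predicted value can then be compared against the reference answer after the same normalization.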

---
149
+
150
+ ## โš ๏ธ Limitations
151
+
152
+ * Not guaranteed to be mathematically correct in all cases
153
+ * Can be verbose due to reasoning-style outputs
154
+ * Not optimized for creative or non-technical writing
155
+ * Performance may degrade on extremely long or ambiguous prompts
156
+
157
+ ---
158
+
159
+ ## ๐Ÿ” Ethical & Responsible Use
160
+
161
+ * Intended for **research and educational purposes**
162
+ * Outputs should be verified for correctness in critical applications
163
+ * Not suitable for high-stakes decision-making without human oversight
164
+
165
+ ---
166
+
167
+ ## ๐Ÿ“œ License
168
+
169
+ Released under the **Apache 2.0 License**, consistent with the base Qwen3 model.
170
+
171
+ ---
172
+
173
+ ## ๐Ÿ™Œ Acknowledgements
174
+
175
+ * **Qwen Team** for the base Qwen3 architecture
176
+ * **Unsloth** for efficient fine-tuning optimizations
177
+ * **Hugging Face** for Transformers and TRL
178
+
179
+ ---
180
+
181
+ ## โœ‰๏ธ Author
182
+
183
+ **Harsha Vardhan Mannem**
184
+ AI / ML Engineer
185
+ Hugging Face & GitHub: **Harsha901**
186
+
187
+ ---
188
+
189
+ ## ๐Ÿ”ฎ Future Work
190
+
191
+ * Preference tuning with DPO
192
+ * Quantized inference (4-bit / 8-bit)
193
+ * Benchmark-based evaluation
194
+ * Deployment-optimized variants
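
For the quantized-inference item, 4-bit loading in `transformers` typically goes through `BitsAndBytesConfig`. A sketch, assuming a CUDA GPU with `bitsandbytes` installed (this repo publishes no quantized weights yet):

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# NF4 4-bit quantization config; roughly quarters weight memory.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "Harsha901/Qwen3-4B-Inst-Math-Reasoning-SFT",
    quantization_config=bnb_config,
    device_map="auto",
)
```

Expect a small quality drop versus FP16/BF16 inference; quantized generations should be re-checked on math prompts before relying on them.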