---
language:
- en
- multilingual
license: apache-2.0
library_name: transformers
tags:
- qwen
- qwen3.5
- finetuned
- astrophysics
- science
- cot
- chain-of-thought
- unsloth
- lora
- llama.cpp
- gguf
base_model: Qwen/Qwen3.5-0.8B
---

# Qwen3.5-0.8B-Astro-Reasoning-v1

This is a finetuned version of [Qwen3.5-0.8B](https://huggingface.co/Qwen/Qwen3.5-0.8B) specialized for **astrophysics problem-solving** and **chain-of-thought reasoning**.

## Model Description

- **Base Model:** Qwen/Qwen3.5-0.8B
- **Model Size:** 0.8B parameters
- **Architecture:** Causal Language Model with Vision Encoder
- **Context Length:** 1,024 tokens (training), up to 262,144 tokens (inference)
- **Training Method:** LoRA (Low-Rank Adaptation)
- **Precision:** BF16 training, F16 inference (GGUF)

## Training Details

### Hardware
- **GPU:** NVIDIA GeForce RTX 3060 (12GB VRAM)
- **Training Framework:** Unsloth (4-bit quantization)
- **Training Time:** ~32 minutes
- **Effective Batch Size:** 8 (batch_size=1, gradient_accumulation=8)

### Hyperparameters
| Parameter | Value |
|-----------|-------|
| LoRA Rank (r) | 8 |
| LoRA Alpha | 8 |
| Learning Rate | 2e-4 |
| Max Steps | 300 |
| Warmup Steps | 10 |
| Sequence Length | 1,024 |
| Optimizer | adamw_8bit |
| Weight Decay | 0.01 |

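The table above can be folded into a training configuration. A minimal sketch, using the common peft/TRL argument names (an assumption; the actual training script is not included with this card):

```python
# Hyperparameters from the table, as plain keyword dictionaries.
# Argument names follow peft/TRL conventions and are assumptions,
# not the original training script.
lora_kwargs = dict(
    r=8,           # LoRA rank
    lora_alpha=8,  # LoRA scaling factor
)
training_kwargs = dict(
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,
    learning_rate=2e-4,
    max_steps=300,
    warmup_steps=10,
    weight_decay=0.01,
    optim="adamw_8bit",
    max_seq_length=1024,
)

# Effective batch size = per-device batch * gradient accumulation steps.
effective_batch = (training_kwargs["per_device_train_batch_size"]
                   * training_kwargs["gradient_accumulation_steps"])
print(effective_batch)  # 8, matching the effective batch size above
```
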
### Training Results
- **Final Loss:** 1.656
- **Loss Reduction:** 14% (from 1.924 to 1.656)
- **Epochs:** 0.22

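The loss-reduction figure follows directly from the two loss values:

```python
# Relative reduction from initial to final training loss.
initial_loss, final_loss = 1.924, 1.656
reduction = (initial_loss - final_loss) / initial_loss
print(f"{reduction:.1%}")  # 13.9%, reported above as ~14%
```
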
## Dataset

The model was finetuned on 12,357 high-quality examples from two sources:

### 1. Gemini-3 Pro Dataset (10,031 examples)
- **Domain:** Astrophysics
- **Difficulty:** Extreme-level problems
- **Content:** Complex astrophysical concepts including:
  - Eddington Luminosity in Porous Atmospheres
  - Electron Capture Supernovae (ECSN)
  - Beta Cephei Pulsations
  - Type Ia Supernova Progenitors
  - Neutrino Oscillations
  - CNO Cycle Branching
  - Gravitational Radiation Reaction
  - And more...

### 2. Distilled Corpus (2,326 examples)
- **Domains:** Mathematics, coding, natural language inference
- **Features:** Chain-of-thought reasoning with detailed solutions
- **Format:** Problem → Thinking → Solution

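The exact record schema is not published with this card; purely as an illustration, a Problem → Thinking → Solution record might be folded into chat messages like this (field names are hypothetical):

```python
# Hypothetical training record; the field names are illustrative,
# not the actual dataset schema.
record = {
    "problem": "A class of 12 students shares 108 oranges, but 36 are bad. "
               "How many fewer oranges per student?",
    "thinking": "Good oranges: 108 - 36 = 72. Before: 108 / 12 = 9 each. "
                "After: 72 / 12 = 6 each.",
    "solution": "3 fewer oranges per student.",
}

def to_messages(rec):
    """Fold one record into ChatML-style messages, reasoning inlined."""
    return [
        {"role": "user", "content": rec["problem"]},
        {"role": "assistant", "content": rec["thinking"] + "\n\n" + rec["solution"]},
    ]

messages = to_messages(record)
```
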
## Model Capabilities

This model excels at:
- ✅ **Astrophysics problem-solving** with step-by-step reasoning
- ✅ **Complex scientific calculations** and derivations
- ✅ **Chain-of-thought reasoning** for multi-step problems
- ✅ **Mathematical reasoning** with detailed explanations
- ✅ **Technical documentation** and analysis

## Usage

### With llama.cpp (Recommended)

```bash
llama-cli \
  -m qwen3.5-0.8b-astro-reasoning-v1.gguf \
  --chat-template chatml \
  -c 2048 \
  -n 512 \
  --temp 0.7 \
  -cnv
```

### With Python (Transformers)

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "firmanda/qwen3.5-0.8b-astro-reasoning-v1",
    torch_dtype="auto",
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("firmanda/qwen3.5-0.8b-astro-reasoning-v1")

prompt = "Solve this astrophysics problem: What is the Eddington luminosity?"
messages = [{"role": "user", "content": prompt}]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

inputs = tokenizer(text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

### With Unsloth (For Further Training)

```python
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="firmanda/qwen3.5-0.8b-astro-reasoning-v1",
    max_seq_length=2048,
    load_in_4bit=True,
)
```

## Example Outputs

### Example 1: Astrophysics Problem
```
User: Calculate the Eddington luminosity for a 10 solar mass star.

Assistant: The Eddington luminosity is given by:

L_Edd = (4πGMc) / κ

Where:
- G = 6.674 × 10^-11 m^3 kg^-1 s^-2
- M = 10 M_☉ = 1.989 × 10^31 kg
- c = 3 × 10^8 m/s
- κ = 0.04 m^2/kg (Thomson scattering for pure ionized hydrogen)

Calculation:
L_Edd = (4π × 6.674 × 10^-11 × 1.989 × 10^31 × 3 × 10^8) / 0.04
L_Edd ≈ 1.25 × 10^32 W ≈ 3.3 × 10^5 L_☉

This is the maximum luminosity at which radiation pressure does not overcome gravity.
```

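The arithmetic can be double-checked in a few lines of Python; note that reproducing the quoted ~10^32 W figure requires the pure-hydrogen electron-scattering opacity κ = 0.04 m²/kg:

```python
import math

# Constants as used in the Eddington-luminosity example.
G = 6.674e-11      # gravitational constant, m^3 kg^-1 s^-2
M = 10 * 1.989e30  # 10 solar masses, kg
c = 3.0e8          # speed of light, m/s
kappa = 0.04       # electron-scattering opacity, pure ionized H, m^2/kg
L_sun = 3.828e26   # solar luminosity, W

L_edd = 4 * math.pi * G * M * c / kappa
print(f"L_Edd = {L_edd:.2e} W = {L_edd / L_sun:.2e} L_sun")
# L_Edd = 1.25e+32 W = 3.27e+05 L_sun
```
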
### Example 2: Chain-of-Thought Reasoning
```
User: A class of 12 students shares 108 oranges, but 36 are bad.
How many fewer oranges per student?

Assistant: Let me solve this step by step:

1. Total oranges: 108
2. Bad oranges: 36
3. Good oranges: 108 - 36 = 72

Original distribution:
- 108 oranges / 12 students = 9 oranges each

New distribution:
- 72 oranges / 12 students = 6 oranges each

Difference: 9 - 6 = 3 oranges fewer per student.
```

## Model Limitations

- **Context Window:** Trained with 1,024-token sequences; inference supports up to 262,144 tokens
- **Domain Specificity:** Best performance on astrophysics and scientific reasoning; may underperform on general chat
- **Factual Accuracy:** While trained on scientific content, always verify critical calculations
- **Language:** Primarily trained on English content
- **Reasoning Mode:** Qwen3.5 0.8B operates in non-thinking mode by default

## Evaluation

Evaluation was based on training-time metrics:
- Training loss reduced by **14%** (1.924 → 1.656)
- Gradient norms remained stable throughout training
- No signs of overfitting observed

### Hardware Compatibility

**Minimum Requirements:**
- **Inference:** 2GB VRAM (F16 GGUF)
- **Training:** 8GB+ VRAM recommended

**Tested On:**
- NVIDIA RTX 3060 12GB (training & inference)

## Files Included

```
qwen3.5-0.8b-astro-reasoning-v1/
├── config.json                           # Model configuration
├── model.safetensors                     # Model weights (LoRA adapters)
├── README.md                             # This file
├── qwen3.5-0.8b-astro-reasoning-v1.gguf  # GGUF format for llama.cpp
└── training_info.md                      # Detailed training logs
```

## Citation

If you use this model, please cite:

```bibtex
@misc{qwen3.5-0.8b-astro-reasoning-v1,
  title={Qwen3.5-0.8B-Astro-Reasoning-v1: A Finetuned Model for Astrophysics Problem-Solving},
  author={Your Name},
  year={2026},
  howpublished={HuggingFace Model Hub}
}

@article{qwen3.5,
  title={Qwen3.5: Towards Native Multimodal Agents},
  author={Qwen Team},
  year={2026}
}
```

## Acknowledgments

- **Base Model:** [Qwen3.5-0.8B](https://huggingface.co/Qwen/Qwen3.5-0.8B) by the Alibaba Cloud Qwen Team
- **Training Framework:** [Unsloth](https://github.com/unslothai/unsloth) for efficient finetuning
- **GGUF Conversion:** [llama.cpp](https://github.com/ggerganov/llama.cpp) for optimized inference

## License

This model is licensed under the Apache 2.0 License, the same as the base Qwen3.5 model.

## Contact & Issues

For questions or issues:
- Open an issue on HuggingFace Hub
- Contact: [Your contact information]

---

**Last Updated:** March 2026  
**Model Version:** v1.0