---
language:
- en
- multilingual
license: apache-2.0
library_name: transformers
tags:
- qwen
- qwen3.5
- finetuned
- astrophysics
- science
- cot
- chain-of-thought
- unsloth
- lora
- llama.cpp
- gguf
base_model: Qwen/Qwen3.5-0.8B
---

# Qwen3.5-0.8B-Astro-Reasoning-v1

This is a finetuned version of [Qwen3.5-0.8B](https://huggingface.co/Qwen/Qwen3.5-0.8B) specialized for **astrophysics problem-solving** and **chain-of-thought reasoning**.

## Model Description

- **Base Model:** Qwen/Qwen3.5-0.8B
- **Model Size:** 0.8B parameters
- **Architecture:** Causal Language Model with Vision Encoder
- **Context Length:** 1,024 tokens (training), up to 262,144 tokens (inference)
- **Training Method:** LoRA (Low-Rank Adaptation)
- **Precision:** BF16 training, F16 inference (GGUF)

## Training Details

### Hardware
- **GPU:** NVIDIA GeForce RTX 3060 (12GB VRAM)
- **Training Framework:** Unsloth (4-bit quantization)
- **Training Time:** ~32 minutes
- **Effective Batch Size:** 8 (batch_size=1, gradient_accumulation=8)

### Hyperparameters
| Parameter | Value |
|-----------|-------|
| LoRA Rank (r) | 8 |
| LoRA Alpha | 8 |
| Learning Rate | 2e-4 |
| Max Steps | 300 |
| Warmup Steps | 10 |
| Sequence Length | 1,024 |
| Optimizer | adamw_8bit |
| Weight Decay | 0.01 |
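With LoRA, each adapted weight matrix W stays frozen and a low-rank update B·A is added, scaled by alpha / r. With r = 8 and alpha = 8 from the table above, that scale is 1.0. The sketch below is a minimal illustration in plain Python using tiny, made-up matrices (hypothetical values, not the model's weights). It shows the merged weight W + (alpha / r)·B·A and the trainable parameter count r·(d + k) for one adapted d × k matrix.

```python
def matmul(x, y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)] for row in x]


def lora_effective_weight(w, a, b, alpha, r):
    """Return W + (alpha / r) * (B @ A), the weight used after merging the adapter."""
    scale = alpha / r
    delta = matmul(b, a)  # (d x r) @ (r x k) -> d x k
    return [
        [w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
        for w_row, d_row in zip(w, delta)
    ]


def lora_param_count(d, k, r):
    """Trainable parameters for one adapted d x k matrix: B is d x r, A is r x k."""
    return r * (d + k)


# Toy example with r = 1 so the matrices stay readable (this card uses r = 8, alpha = 8).
W = [[1.0, 0.0], [0.0, 1.0]]
A = [[1.0, 2.0]]       # r x k = 1 x 2
B = [[0.5], [0.25]]    # d x r = 2 x 1
print(lora_effective_weight(W, A, B, alpha=1, r=1))  # [[1.5, 1.0], [0.25, 1.5]]
print(8 / 8)           # alpha / r scale used in this finetune
# Hypothetical 1024 x 1024 projection, purely to illustrate the formula
print(lora_param_count(1024, 1024, 8))  # 16384
```

Because alpha equals r, the adapter update is applied at full strength, with no extra scaling.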

### Training Results
- **Final Loss:** 1.656
- **Loss Reduction:** 14% (from 1.924 to 1.656)
- **Epochs:** 0.22
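The reported figures can be cross-checked with a few lines of arithmetic. This sketch assumes a single GPU and that all 12,357 dataset examples were used as training samples. Under those assumptions, 300 steps at an effective batch of 8 cover 2,400 examples, or about 0.19 epochs. The reported 0.22 would instead correspond to roughly 10,900 examples, which could happen if some examples were filtered during preprocessing. That explanation is an assumption; this card does not state it.

```python
def effective_batch_size(per_device_batch, grad_accum_steps, num_devices=1):
    """Samples contributing to each optimizer step."""
    return per_device_batch * grad_accum_steps * num_devices


def epochs_covered(max_steps, eff_batch, num_examples):
    """Fraction of the dataset seen after max_steps optimizer steps."""
    return max_steps * eff_batch / num_examples


def relative_reduction(start, end):
    """Relative drop from start to end, e.g. for training loss."""
    return (start - end) / start


eff = effective_batch_size(1, 8)
print(eff)                                        # 8
print(round(epochs_covered(300, eff, 12_357), 3)) # 0.194
print(f"{relative_reduction(1.924, 1.656):.1%}")  # 13.9%, i.e. the ~14% reported
```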

## Dataset

The model was finetuned on 12,357 high-quality examples from two sources:

### 1. Gemini-3 Pro Dataset (10,031 examples)
- **Domain:** Astrophysics
- **Difficulty:** Extreme-level problems
- **Content:** Complex astrophysical concepts including:
  - Eddington Luminosity in Porous Atmospheres
  - Electron Capture Supernovae (ECSN)
  - Beta Cephei Pulsations
  - Type Ia Supernova Progenitors
  - Neutrino Oscillations
  - CNO Cycle Branching
  - Gravitational Radiation Reaction
  - And more...

### 2. Distilled Corpus (2,326 examples)
- **Domains:** Mathematics, coding, natural language inference
- **Features:** Chain-of-thought reasoning with detailed solutions
- **Format:** Problem → Thinking → Solution
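The Problem → Thinking → Solution layout maps naturally onto a user turn followed by an assistant turn that places the reasoning before the answer. Below is a minimal sketch. It assumes hypothetical field names (`problem`, `thinking`, `solution`) and Qwen3-style `<think>...</think>` tags. The preprocessing actually used for this finetune is not documented here.

```python
def to_chat_messages(record):
    """Turn a hypothetical {'problem', 'thinking', 'solution'} record into chat messages.

    Wrapping the reasoning in <think>...</think> follows the Qwen3-style convention;
    whether this dataset used that exact wrapper is an assumption.
    """
    assistant = (
        f"<think>\n{record['thinking'].strip()}\n</think>\n\n"
        f"{record['solution'].strip()}"
    )
    return [
        {"role": "user", "content": record["problem"].strip()},
        {"role": "assistant", "content": assistant},
    ]


example = {
    "problem": "What is 108 - 36?",
    "thinking": "Subtract the bad oranges from the total.",
    "solution": "72",
}
for message in to_chat_messages(example):
    print(message["role"], "->", repr(message["content"]))
```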

## Model Capabilities

This model is tuned for:
- ✅ **Astrophysics problem-solving** with step-by-step reasoning
- ✅ **Complex scientific calculations** and derivations
- ✅ **Chain-of-thought reasoning** for multi-step problems
- ✅ **Mathematical reasoning** with detailed explanations
- ✅ **Technical documentation** and analysis

## Usage

### With llama.cpp (Recommended)

```bash
llama-cli \
    -m qwen3.5-0.8b-astro-reasoning-v1.gguf \
    --chat-template chatml \
    -c 2048 \
    -n 512 \
    --temp 0.7 \
    -cnv
```
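`--chat-template chatml` tells llama.cpp to wrap each turn in ChatML markers (`<|im_start|>role`, newline, content, `<|im_end|>`), then open an assistant turn for the reply. The sketch below builds that layout by hand so you can see what the model receives in conversation mode. It is simplified: there is no system prompt and no tool or thinking handling, and the template embedded in the GGUF may differ in details.

```python
def render_chatml(messages, add_generation_prompt=True):
    """Render messages as ChatML turns, optionally opening an assistant turn at the end."""
    parts = [f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages]
    if add_generation_prompt:
        # The model continues generating from here.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


prompt = render_chatml([{"role": "user", "content": "What is the Eddington luminosity?"}])
print(prompt)
```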

### With Python (Transformers)

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "firmanda/qwen3.5-0.8b-astro-reasoning-v1"

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # use the dtype stored in the checkpoint
    device_map="auto",    # place weights on GPU if available (requires accelerate)
)
tokenizer = AutoTokenizer.from_pretrained(model_id)

prompt = "Solve this astrophysics problem: What is the Eddington luminosity?"
messages = [{"role": "user", "content": prompt}]
# Render the conversation with the model's chat template and open an assistant turn
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

inputs = tokenizer(text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512)
# Decode only the newly generated tokens, not the echoed prompt
new_tokens = outputs[0][inputs["input_ids"].shape[-1]:]
print(tokenizer.decode(new_tokens, skip_special_tokens=True))
```

### With Unsloth (For Further Training)

```python
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="firmanda/qwen3.5-0.8b-astro-reasoning-v1",
    max_seq_length=2048,
    load_in_4bit=True,
)
```

## Example Outputs

### Example 1: Astrophysics Problem
```
User: Calculate the Eddington luminosity for a 10 solar mass star.

Assistant: The Eddington luminosity is given by:

L_Edd = (4πGMc) / κ

Where:
- G = 6.674 × 10^-11 m^3 kg^-1 s^-2
- M = 10 M_☉ = 1.989 × 10^31 kg
- c = 3 × 10^8 m/s
- κ = 0.34 m^2/kg (Thomson scattering for ionized hydrogen)

Calculation:
L_Edd = (4π × 6.674 × 10^-11 × 1.989 × 10^31 × 3 × 10^8) / 0.34
L_Edd ≈ 1.26 × 10^32 W ≈ 3.28 × 10^5 L_☉

This is the maximum luminosity where radiation pressure doesn't overcome gravity.
```
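Because model output should be verified, here is a quick check of the formula in Example 1. The Thomson-scattering opacity of fully ionized hydrogen is about 0.4 cm²/g, or 0.04 m²/kg. For roughly solar composition it is about 0.34 cm²/g, or 0.034 m²/kg. With κ = 0.04 m²/kg, the formula gives about 1.25 × 10^32 W, close to the quoted 1.26 × 10^32 W. Plugging in the quoted κ = 0.34 m²/kg literally gives about 1.5 × 10^31 W instead. The κ in the sample output therefore looks like a cm²/g value written with SI units. The constants below are standard rounded SI values.

```python
import math

G = 6.674e-11      # gravitational constant, m^3 kg^-1 s^-2
C = 2.998e8        # speed of light, m/s
M_SUN = 1.989e30   # solar mass, kg
L_SUN = 3.828e26   # nominal solar luminosity, W


def eddington_luminosity(mass_kg, kappa_m2_per_kg):
    """L_Edd = 4 pi G M c / kappa, in watts."""
    return 4 * math.pi * G * mass_kg * C / kappa_m2_per_kg


mass = 10 * M_SUN
for label, kappa in [("pure ionized H", 0.04), ("kappa as written in Example 1", 0.34)]:
    l_edd = eddington_luminosity(mass, kappa)
    print(f"{label}: kappa={kappa} m^2/kg -> {l_edd:.3g} W = {l_edd / L_SUN:.3g} L_sun")
```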

### Example 2: Chain-of-Thought Reasoning
```
User: A class of 12 students shares 108 oranges, but 36 are bad. 
How many fewer oranges per student?

Assistant: Let me solve this step by step:

1. Total oranges: 108
2. Bad oranges: 36
3. Good oranges: 108 - 36 = 72

Original distribution:
- 108 oranges / 12 students = 9 oranges each

New distribution:
- 72 oranges / 12 students = 6 oranges each

Difference: 9 - 6 = 3 oranges fewer per student.
```

## Model Limitations

- **Context Window:** Finetuned on sequences of at most 1,024 tokens. The architecture accepts up to 262,144 tokens at inference, but long-context quality was not a training target.
- **Domain Specificity:** Best performance on astrophysics and scientific reasoning; may underperform on general chat
- **Factual Accuracy:** Although the model was trained on scientific content, its calculations can still be wrong; always verify critical results
- **Language:** Primarily trained on English content
- **Reasoning Mode:** Qwen3.5 0.8B operates in non-thinking mode by default

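## Evaluation

This card reports only training-time observations, not a held-out benchmark evaluation:
- Training loss fell by about **14%** (1.924 → 1.656)
- Gradient norms remained stable throughout training
- No signs of overfitting were observed (training covered only 0.22 epochs)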

## Hardware Compatibility

**Minimum Requirements:**
- **Inference:** 2GB VRAM (F16 GGUF)
- **Training:** 8GB+ VRAM recommended
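As a rough rule, weight memory equals the parameter count times bytes per parameter. The KV cache and activations come on top and grow with the context size (`-c`). The sketch uses the card's nominal 0.8B figure, so treat the results as ballpark estimates only. About 1.5 GiB of F16 weights is consistent with the 2GB inference minimum above.

```python
def weight_memory_gib(num_params, bytes_per_param):
    """Memory for model weights alone, in GiB (excludes KV cache and activations)."""
    return num_params * bytes_per_param / 1024**3


params = 0.8e9  # nominal parameter count from this card
for name, nbytes in [("F16/BF16", 2), ("8-bit", 1), ("4-bit", 0.5)]:
    print(f"{name}: ~{weight_memory_gib(params, nbytes):.2f} GiB for weights alone")
```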

**Tested On:**
- NVIDIA RTX 3060 12GB (training & inference)

## Files Included

```
qwen3.5-0.8b-astro-reasoning-v1/
├── config.json                    # Model configuration
├── model.safetensors              # Model weights (LoRA adapters)
├── README.md                      # This file
├── qwen3.5-0.8b-astro-reasoning-v1.gguf  # GGUF format for llama.cpp
└── training_info.md               # Detailed training logs
```

## Acknowledgments

- **Base Model:** [Qwen3.5-0.8B](https://huggingface.co/Qwen/Qwen3.5-0.8B) by Alibaba Cloud Qwen Team
- **Training Framework:** [Unsloth](https://github.com/unslothai/unsloth) for efficient finetuning
- **GGUF Conversion:** [llama.cpp](https://github.com/ggerganov/llama.cpp) for optimized inference

## License

This model is licensed under the Apache 2.0 License, same as the base Qwen3.5 model.

---

**Last Updated:** March 2026  
**Model Version:** v1.0