Qwen3.5-0.8B-Astro-Reasoning-v1
This is a finetuned version of Qwen3.5-0.8B specialized for astrophysics problem-solving and chain-of-thought reasoning.
Model Description
- Base Model: Qwen/Qwen3.5-0.8B
- Model Size: 0.8B parameters
- Architecture: Causal Language Model with Vision Encoder
- Context Length: 1,024 tokens (training), up to 262,144 tokens (inference)
- Training Method: LoRA (Low-Rank Adaptation)
- Precision: BF16 training, F16 inference (GGUF)
Training Details
Hardware
- GPU: NVIDIA GeForce RTX 3060 (12GB VRAM)
- Training Framework: Unsloth (4-bit quantization)
- Training Time: ~32 minutes
- Effective Batch Size: 8 (batch_size=1, gradient_accumulation=8)
Hyperparameters
| Parameter | Value |
|---|---|
| LoRA Rank (r) | 8 |
| LoRA Alpha | 8 |
| Learning Rate | 2e-4 |
| Max Steps | 300 |
| Warmup Steps | 10 |
| Sequence Length | 1,024 |
| Optimizer | adamw_8bit |
| Weight Decay | 0.01 |
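For reference, the table above can be collected into a single configuration sketch. This is illustrative only, not the exact training script used for this run; the key names are assumed, and it also shows how the effective batch size of 8 arises from the per-device batch size and gradient accumulation.

```python
# Hypothetical configuration mirroring the hyperparameter table above.
# Key names are assumed for illustration; this is not the original script.
training_config = {
    "lora_r": 8,
    "lora_alpha": 8,
    "learning_rate": 2e-4,
    "max_steps": 300,
    "warmup_steps": 10,
    "max_seq_length": 1024,
    "optim": "adamw_8bit",
    "weight_decay": 0.01,
    "per_device_train_batch_size": 1,
    "gradient_accumulation_steps": 8,
}

# Effective batch size = per-device batch size x gradient accumulation steps.
effective_batch = (training_config["per_device_train_batch_size"]
                   * training_config["gradient_accumulation_steps"])
print(effective_batch)  # 8
```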
Training Results
- Final Loss: 1.656
- Loss Reduction: 14% (from 1.924 to 1.656)
- Epochs: 0.22
Dataset
The model was finetuned on 12,357 high-quality examples from two sources:
1. Gemini-3 Pro Dataset (10,031 examples)
- Domain: Astrophysics
- Difficulty: Extreme-level problems
- Content: Complex astrophysical concepts including:
- Eddington Luminosity in Porous Atmospheres
- Electron Capture Supernovae (ECSN)
- Beta Cephei Pulsations
- Type Ia Supernova Progenitors
- Neutrino Oscillations
- CNO Cycle Branching
- Gravitational Radiation Reaction
- And more...
2. Distilled Corpus (2,326 examples)
- Domains: Mathematics, coding, natural language inference
- Features: Chain-of-thought reasoning with detailed solutions
- Format: Problem → Thinking → Solution
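One plausible rendering of the Problem → Thinking → Solution format into a ChatML training string is sketched below. The record's field names and the use of `<think>` delimiters are assumptions for illustration, not the dataset's actual schema.

```python
# Hypothetical dataset record; field names are assumed for illustration.
example = {
    "problem": "What is the Eddington luminosity?",
    "thinking": "Radiation pressure balances gravity when ...",
    "solution": "L_Edd = 4 * pi * G * M * c / kappa",
}

# Render the record in a ChatML layout, with the reasoning trace wrapped
# in <think> tags ahead of the final solution.
text = (
    "<|im_start|>user\n" + example["problem"] + "<|im_end|>\n"
    "<|im_start|>assistant\n"
    "<think>\n" + example["thinking"] + "\n</think>\n"
    + example["solution"] + "<|im_end|>"
)
print(text)
```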
Model Capabilities
This model excels at:
- ✅ Astrophysics problem-solving with step-by-step reasoning
- ✅ Complex scientific calculations and derivations
- ✅ Chain-of-thought reasoning for multi-step problems
- ✅ Mathematical reasoning with detailed explanations
- ✅ Technical documentation and analysis
Usage
With llama.cpp (Recommended)
llama-cli \
-m qwen3.5-0.8b-astro-reasoning-v1.gguf \
--chat-template chatml \
-c 2048 \
-n 512 \
--temp 0.7 \
-cnv
With Python (Transformers)
from transformers import AutoModelForCausalLM, AutoTokenizer
model = AutoModelForCausalLM.from_pretrained(
    "firmanda/qwen3.5-0.8b-astro-reasoning-v1",
    torch_dtype="auto",
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("firmanda/qwen3.5-0.8b-astro-reasoning-v1")
prompt = "Solve this astrophysics problem: What is the Eddington luminosity?"
messages = [{"role": "user", "content": prompt}]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
With Unsloth (For Further Training)
from unsloth import FastLanguageModel
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="firmanda/qwen3.5-0.8b-astro-reasoning-v1",
    max_seq_length=2048,
    load_in_4bit=True,
)
Example Outputs
Example 1: Astrophysics Problem
User: Calculate the Eddington luminosity for a 10 solar mass star.
Assistant: The Eddington luminosity is given by:
L_Edd = (4πGMc) / κ
Where:
- G = 6.674 × 10^-11 m^3 kg^-1 s^-2
- M = 10 M_☉ = 1.989 × 10^31 kg
- c = 3 × 10^8 m/s
- κ = 0.04 m^2/kg (Thomson scattering opacity for pure ionized hydrogen)
Calculation:
L_Edd = (4π × 6.674 × 10^-11 × 1.989 × 10^31 × 3 × 10^8) / 0.04
L_Edd ≈ 1.26 × 10^32 W ≈ 3.3 × 10^5 L_☉
This is the maximum luminosity at which outward radiation pressure does not yet overcome gravity.
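The worked example can be checked numerically. Using the standard Thomson scattering opacity for pure ionized hydrogen, κ ≈ 0.04 m²/kg, reproduces the quoted result of about 1.26 × 10^32 W:

```python
import math

# Recompute the Eddington luminosity for a 10 solar-mass star.
G = 6.674e-11       # gravitational constant, m^3 kg^-1 s^-2
M = 10 * 1.989e30   # 10 solar masses, kg
c = 3.0e8           # speed of light, m/s
kappa = 0.04        # Thomson opacity, pure ionized hydrogen, m^2/kg
L_SUN = 3.828e26    # solar luminosity, W

L_edd = 4 * math.pi * G * M * c / kappa
print(f"{L_edd:.3e} W = {L_edd / L_SUN:.2e} L_sun")
# about 1.25e32 W, i.e. ~3.3e5 L_sun
```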
Example 2: Chain-of-Thought Reasoning
User: A class of 12 students shares 108 oranges, but 36 are bad.
How many fewer oranges does each student get?
Assistant: Let me solve this step by step:
1. Total oranges: 108
2. Bad oranges: 36
3. Good oranges: 108 - 36 = 72
Original distribution:
- 108 oranges / 12 students = 9 oranges each
New distribution:
- 72 oranges / 12 students = 6 oranges each
Difference: 9 - 6 = 3 oranges fewer per student.
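The arithmetic in the steps above can be verified in a few lines:

```python
total, bad, students = 108, 36, 12

before = total // students           # 108 / 12 = 9 oranges each
after = (total - bad) // students    # 72 / 12 = 6 oranges each
print(before - after)  # 3
```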
Model Limitations
- Context Window: Optimized for 1,024 tokens during training (can handle up to 262,144 for inference)
- Domain Specificity: Best performance on astrophysics and scientific reasoning; may underperform on general chat
- Factual Accuracy: Although trained on scientific content, the model can make mistakes; always verify critical calculations independently
- Language: Primarily trained on English content
- Reasoning Mode: Qwen3.5 0.8B operates in non-thinking mode by default
Evaluation
Evaluation was limited to training-time metrics:
- Training loss: 14% reduction (1.924 → 1.656)
- Gradient norms: stable throughout training
- Overfitting: no signs observed
Hardware Compatibility
Minimum Requirements:
- Inference: 2GB VRAM (F16 GGUF)
- Training: 8GB+ VRAM recommended
Tested On:
- NVIDIA RTX 3060 12GB (training & inference)
Files Included
qwen3.5-0.8b-astro-reasoning-v1/
├── config.json # Model configuration
├── model.safetensors # Model weights (LoRA adapters)
├── README.md # This file
├── qwen3.5-0.8b-astro-reasoning-v1.gguf # GGUF format for llama.cpp
└── training_info.md # Detailed training logs
Acknowledgments
- Base Model: Qwen3.5-0.8B by Alibaba Cloud Qwen Team
- Training Framework: Unsloth for efficient finetuning
- GGUF Conversion: llama.cpp for optimized inference
License
This model is licensed under the Apache 2.0 License, same as the base Qwen3.5 model.
Last Updated: March 2026
Model Version: v1.0