Qwen3.5-0.8B-Astro-Reasoning-v1

This is a finetuned version of Qwen3.5-0.8B specialized for astrophysics problem-solving and chain-of-thought reasoning.

Model Description

  • Base Model: Qwen/Qwen3.5-0.8B
  • Model Size: 0.8B parameters
  • Architecture: Causal Language Model with Vision Encoder
  • Context Length: 1,024 tokens (training), up to 262,144 tokens (inference)
  • Training Method: LoRA (Low-Rank Adaptation)
  • Precision: BF16 training, F16 inference (GGUF)

Training Details

Hardware

  • GPU: NVIDIA GeForce RTX 3060 (12GB VRAM)
  • Training Framework: Unsloth (4-bit quantization)
  • Training Time: ~32 minutes
  • Effective Batch Size: 8 (batch_size=1, gradient_accumulation=8)

Hyperparameters

Parameter          Value
LoRA Rank (r)      8
LoRA Alpha         8
Learning Rate      2e-4
Max Steps          300
Warmup Steps       10
Sequence Length    1,024
Optimizer          adamw_8bit
Weight Decay       0.01
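
For readers who want to reproduce the run, the hyperparameters can be collected roughly as follows. This is a minimal sketch, not the actual training script: the key names mirror parameter names used by `peft.LoraConfig` and `transformers.TrainingArguments`, but the dictionary layout and variable names are illustrative assumptions. The card does not list target modules or LoRA dropout, so they are left out.

```python
# Hyperparameters from this card, collected as plain dictionaries.
# Key names follow peft.LoraConfig / transformers.TrainingArguments;
# the layout itself is an assumption made for illustration.
lora_config = {
    "r": 8,            # LoRA rank
    "lora_alpha": 8,   # LoRA alpha
}

training_args = {
    "per_device_train_batch_size": 1,
    "gradient_accumulation_steps": 8,
    "learning_rate": 2e-4,
    "max_steps": 300,
    "warmup_steps": 10,
    "optim": "adamw_8bit",
    "weight_decay": 0.01,
    "bf16": True,      # BF16 training, as stated under Model Description
}

max_seq_length = 1024

# With standard LoRA scaling, the adapter update is multiplied by alpha / r.
lora_scaling = lora_config["lora_alpha"] / lora_config["r"]

# Examples contributing to each optimizer step.
effective_batch_size = (
    training_args["per_device_train_batch_size"]
    * training_args["gradient_accumulation_steps"]
)

print(f"LoRA scaling: {lora_scaling}, effective batch size: {effective_batch_size}")
```

With rank and alpha both set to 8, the scaling factor is 1.0, so the adapter output is added to the base weights without extra amplification.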

Training Results

  • Final Loss: 1.656
  • Loss Reduction: 14% (from 1.924 to 1.656)
  • Epochs: 0.22
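
The figures above can be cross-checked with a few lines of arithmetic. The sketch below assumes one epoch means one pass over all 12,357 examples described in the Dataset section, and that every optimizer step consumes exactly 8 examples; the variable names are illustrative.

```python
# Cross-check of the reported training results using numbers from this card.
initial_loss = 1.924
final_loss = 1.656
max_steps = 300
effective_batch_size = 8          # batch_size=1 x gradient_accumulation=8
dataset_size = 10_031 + 2_326     # the two sources listed under Dataset

loss_reduction = (initial_loss - final_loss) / initial_loss
examples_seen = max_steps * effective_batch_size
epochs = examples_seen / dataset_size

print(f"Loss reduction: {loss_reduction:.1%}")   # 13.9%
print(f"Examples seen:  {examples_seen}")        # 2400
print(f"Epochs:         {epochs:.2f}")           # 0.19
```

The loss reduction rounds to the stated 14%. The epoch estimate comes out at about 0.19, slightly below the reported 0.22. The card does not say how the reported figure was computed (for example, whether part of the data was held out), so the gap is unexplained here.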

Dataset

The model was finetuned on 12,357 high-quality examples from two sources:

1. Gemini-3 Pro Dataset (10,031 examples)

  • Domain: Astrophysics
  • Difficulty: Extreme-level problems
  • Content: Complex astrophysical concepts including:
    • Eddington Luminosity in Porous Atmospheres
    • Electron Capture Supernovae (ECSN)
    • Beta Cephei Pulsations
    • Type Ia Supernova Progenitors
    • Neutrino Oscillations
    • CNO Cycle Branching
    • Gravitational Radiation Reaction
    • And more...

2. Distilled Corpus (2,326 examples)

  • Domains: Mathematics, coding, natural language inference
  • Features: Chain-of-thought reasoning with detailed solutions
  • Format: Problem → Thinking → Solution
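
As an illustration of the Problem → Thinking → Solution format, the sketch below turns one record into chat messages. The field names `problem`, `thinking` and `solution`, the helper `to_chat_messages`, and the use of `<think>` tags around the reasoning are all assumptions; the card does not specify the dataset schema or the template used in training.

```python
def to_chat_messages(example: dict) -> list[dict]:
    """Convert one Problem/Thinking/Solution record into chat messages.

    Field names and the <think> wrapper are hypothetical, chosen only
    to illustrate the three-part format.
    """
    assistant_text = (
        f"<think>\n{example['thinking'].strip()}\n</think>\n\n"
        f"{example['solution'].strip()}"
    )
    return [
        {"role": "user", "content": example["problem"].strip()},
        {"role": "assistant", "content": assistant_text},
    ]


# Hypothetical record, for illustration only.
record = {
    "problem": "What is 108 divided by 12?",
    "thinking": "12 times 9 is 108, so the quotient is 9.",
    "solution": "9",
}

messages = to_chat_messages(record)
print(messages[1]["content"])
```

The resulting list can be passed to `tokenizer.apply_chat_template`, the same call used in the Transformers example under Usage.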

Model Capabilities

The model is finetuned for:

  • Astrophysics problem-solving with step-by-step reasoning
  • Complex scientific calculations and derivations
  • Chain-of-thought reasoning for multi-step problems
  • Mathematical reasoning with detailed explanations
  • Technical documentation and analysis

Usage

With llama.cpp (Recommended)

llama-cli \
    -m qwen3.5-0.8b-astro-reasoning-v1.gguf \
    --chat-template chatml \
    -c 2048 \
    -n 512 \
    --temp 0.7 \
    -cnv

With Python (Transformers)

from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "firmanda/qwen3.5-0.8b-astro-reasoning-v1",
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("firmanda/qwen3.5-0.8b-astro-reasoning-v1")

prompt = "Solve this astrophysics problem: What is the Eddington luminosity?"
messages = [{"role": "user", "content": prompt}]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

inputs = tokenizer(text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

With Unsloth (For Further Training)

from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="firmanda/qwen3.5-0.8b-astro-reasoning-v1",
    max_seq_length=2048,
    load_in_4bit=True,
)

Example Outputs

Example 1: Astrophysics Problem

User: Calculate the Eddington luminosity for a 10 solar mass star.

Assistant: The Eddington luminosity is given by:

L_Edd = (4πGMc) / κ

Where:
- G = 6.674 × 10^-11 m^3 kg^-1 s^-2
- M = 10 M_☉ = 1.989 × 10^31 kg
- c = 3 × 10^8 m/s
- κ = 0.34 m^2/kg (Thomson scattering for ionized hydrogen)

Calculation:
L_Edd = (4π × 6.674 × 10^-11 × 1.989 × 10^31 × 3 × 10^8) / 0.34
L_Edd ≈ 1.26 × 10^32 W ≈ 3.28 × 10^5 L_☉

This is the maximum luminosity where radiation pressure doesn't overcome gravity.
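
In line with the advice to verify critical calculations, the output above can be checked numerically. The electron-scattering opacity is usually quoted as 0.34 cm^2/g for a solar-like hydrogen fraction, which is 0.034 m^2/kg in SI rather than the 0.34 m^2/kg listed. Plugging the listed numbers in literally gives about 1.47 × 10^31 W, not the quoted 1.26 × 10^32 W. The quoted result is close to what a pure-hydrogen opacity of roughly 0.4 cm^2/g yields. The sketch below is an independent check, not model output; the function name and the solar luminosity value are assumptions that do not appear in the example.

```python
import math

G = 6.674e-11      # gravitational constant, m^3 kg^-1 s^-2
C = 3.0e8          # speed of light, m/s (rounded as in the example)
M_SUN = 1.989e30   # solar mass, kg
L_SUN = 3.828e26   # nominal solar luminosity, W (assumed; not in the example)


def eddington_luminosity(mass_kg: float, kappa_cm2_per_g: float) -> float:
    """Return L_Edd = 4*pi*G*M*c / kappa in watts."""
    kappa_si = kappa_cm2_per_g * 0.1  # 1 cm^2/g = 0.1 m^2/kg
    return 4 * math.pi * G * mass_kg * C / kappa_si


mass = 10 * M_SUN
results = {}
for kappa in (0.34, 0.40):
    lum = eddington_luminosity(mass, kappa)
    results[kappa] = lum
    print(f"kappa = {kappa:.2f} cm^2/g: {lum:.2e} W = {lum / L_SUN:.2e} L_sun")
```

For a 10 solar mass star this gives about 1.47 × 10^32 W with κ = 0.34 cm^2/g and about 1.25 × 10^32 W with κ = 0.40 cm^2/g, so the answer depends on the opacity adopted and on keeping its units consistent.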

Example 2: Chain-of-Thought Reasoning

User: A class of 12 students shares 108 oranges, but 36 are bad. 
How many fewer oranges per student?

Assistant: Let me solve this step by step:

1. Total oranges: 108
2. Bad oranges: 36
3. Good oranges: 108 - 36 = 72

Original distribution:
- 108 oranges / 12 students = 9 oranges each

New distribution:
- 72 oranges / 12 students = 6 oranges each

Difference: 9 - 6 = 3 oranges fewer per student.

Model Limitations

  • Context Window: Optimized for 1,024 tokens during training (can handle up to 262,144 for inference)
  • Domain Specificity: Best performance on astrophysics and scientific reasoning; may underperform on general chat
  • Factual Accuracy: While trained on scientific content, always verify critical calculations
  • Language: Primarily trained on English content
  • Reasoning Mode: Qwen3.5 0.8B operates in non-thinking mode by default

Evaluation

Evaluation so far covers training-time signals only:

  • Training loss fell by 14% (from 1.924 to 1.656)
  • Gradient norms remained stable throughout training
  • No signs of overfitting observed

Hardware Compatibility

Minimum Requirements:

  • Inference: 2GB VRAM (F16 GGUF)
  • Training: 8GB+ VRAM recommended
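
The inference figure can be sanity-checked with a back-of-the-envelope estimate of weight memory. This sketch counts weights only and treats the nominal 0.8B parameter count as exact; the KV cache, activations and runtime buffers add to it, and the helper name is illustrative.

```python
def weight_memory_gb(n_params: float, bits_per_param: int) -> float:
    """Approximate memory for model weights alone, in GB (10^9 bytes)."""
    return n_params * bits_per_param / 8 / 1e9


n_params = 0.8e9  # nominal parameter count from this card

f16_gb = weight_memory_gb(n_params, 16)      # F16 GGUF weights
four_bit_gb = weight_memory_gb(n_params, 4)  # 4-bit base weights during training

print(f"F16 weights:   {f16_gb:.1f} GB")
print(f"4-bit weights: {four_bit_gb:.1f} GB")
```

F16 weights take about 1.6 GB, which leaves some headroom for context within the 2GB minimum. The larger training recommendation reflects activations, gradients and optimizer state on top of the quantized base weights.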

Tested On:

  • NVIDIA RTX 3060 12GB (training & inference)

Files Included

qwen3.5-0.8b-astro-reasoning-v1/
├── config.json                    # Model configuration
├── model.safetensors              # Model weights (LoRA adapters)
├── README.md                      # This file
├── qwen3.5-0.8b-astro-reasoning-v1.gguf  # GGUF format for llama.cpp
└── training_info.md               # Detailed training logs

Acknowledgments

  • Base Model: Qwen3.5-0.8B by Alibaba Cloud Qwen Team
  • Training Framework: Unsloth for efficient finetuning
  • GGUF Conversion: llama.cpp for optimized inference

License

This model is licensed under the Apache 2.0 License, same as the base Qwen3.5 model.


Last Updated: March 2026
Model Version: v1.0
