---
language:
- en
- multilingual
license: apache-2.0
library_name: transformers
tags:
- qwen
- qwen3.5
- finetuned
- astrophysics
- science
- cot
- chain-of-thought
- unsloth
- lora
- llama.cpp
- gguf
base_model: Qwen/Qwen3.5-0.8B
---

# Qwen3.5-0.8B-Astro-Reasoning-v1

This is a finetuned version of [Qwen3.5-0.8B](https://huggingface.co/Qwen/Qwen3.5-0.8B) specialized for **astrophysics problem-solving** and **chain-of-thought reasoning**.

## Model Description

- **Base Model:** Qwen/Qwen3.5-0.8B
- **Model Size:** 0.8B parameters
- **Architecture:** Causal Language Model with Vision Encoder
- **Context Length:** 1,024 tokens (training), up to 262,144 tokens (inference)
- **Training Method:** LoRA (Low-Rank Adaptation)
- **Precision:** BF16 training, F16 inference (GGUF)

## Training Details

### Hardware

- **GPU:** NVIDIA GeForce RTX 3060 (12GB VRAM)
- **Training Framework:** Unsloth (4-bit quantization)
- **Training Time:** ~32 minutes
- **Effective Batch Size:** 8 (batch_size=1, gradient_accumulation=8)

### Hyperparameters

| Parameter | Value |
|-----------|-------|
| LoRA Rank (r) | 8 |
| LoRA Alpha | 8 |
| Learning Rate | 2e-4 |
| Max Steps | 300 |
| Warmup Steps | 10 |
| Sequence Length | 1,024 |
| Optimizer | adamw_8bit |
| Weight Decay | 0.01 |

### Training Results

- **Final Loss:** 1.656
- **Loss Reduction:** 14% (from 1.924 to 1.656)
- **Epochs:** 0.22

## Dataset

The model was finetuned on 12,357 high-quality examples from two sources:

### 1. Gemini-3 Pro Dataset (10,031 examples)

- **Domain:** Astrophysics
- **Difficulty:** Extreme-level problems
- **Content:** Complex astrophysical concepts including:
  - Eddington Luminosity in Porous Atmospheres
  - Electron Capture Supernovae (ECSN)
  - Beta Cephei Pulsations
  - Type Ia Supernova Progenitors
  - Neutrino Oscillations
  - CNO Cycle Branching
  - Gravitational Radiation Reaction
  - And more...

### 2. Distilled Corpus (2,326 examples)

- **Domains:** Mathematics, coding, natural language inference
- **Features:** Chain-of-thought reasoning with detailed solutions
- **Format:** Problem → Thinking → Solution

## Model Capabilities

This model excels at:

- ✅ **Astrophysics problem-solving** with step-by-step reasoning
- ✅ **Complex scientific calculations** and derivations
- ✅ **Chain-of-thought reasoning** for multi-step problems
- ✅ **Mathematical reasoning** with detailed explanations
- ✅ **Technical documentation** and analysis

## Usage

### With llama.cpp (Recommended)

```bash
llama-cli \
  -m qwen3.5-0.8b-astro-reasoning-v1.gguf \
  --chat-template chatml \
  -c 2048 \
  -n 512 \
  --temp 0.7 \
  -cnv
```

### With Python (Transformers)

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "firmanda/qwen3.5-0.8b-astro-reasoning-v1",
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("firmanda/qwen3.5-0.8b-astro-reasoning-v1")

prompt = "Solve this astrophysics problem: What is the Eddington luminosity?"
messages = [{"role": "user", "content": prompt}]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

inputs = tokenizer(text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

### With Unsloth (For Further Training)

```python
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="firmanda/qwen3.5-0.8b-astro-reasoning-v1",
    max_seq_length=2048,
    load_in_4bit=True,
)
```

## Example Outputs

### Example 1: Astrophysics Problem

```
User: Calculate the Eddington luminosity for a 10 solar mass star.

Assistant: The Eddington luminosity is given by:

L_Edd = (4πGMc) / κ

Where:
- G = 6.674 × 10^-11 m^3 kg^-1 s^-2
- M = 10 M_☉ = 1.989 × 10^31 kg
- c = 3 × 10^8 m/s
- κ = 0.0398 m^2/kg (Thomson scattering for pure ionized hydrogen)

Calculation:
L_Edd = (4π × 6.674 × 10^-11 × 1.989 × 10^31 × 3 × 10^8) / 0.0398
L_Edd ≈ 1.26 × 10^32 W ≈ 3.28 × 10^5 L_☉

This is the maximum luminosity at which radiation pressure does not overcome gravity.
```

### Example 2: Chain-of-Thought Reasoning

```
User: A class of 12 students shares 108 oranges, but 36 are bad. How many fewer oranges per student?

Assistant: Let me solve this step by step:

1. Total oranges: 108
2. Bad oranges: 36
3. Good oranges: 108 - 36 = 72

Original distribution:
- 108 oranges / 12 students = 9 oranges each

New distribution:
- 72 oranges / 12 students = 6 oranges each

Difference: 9 - 6 = 3 oranges fewer per student.
```

## Model Limitations

- **Context Window:** Optimized for 1,024 tokens during training (can handle up to 262,144 for inference)
- **Domain Specificity:** Best performance on astrophysics and scientific reasoning; may underperform on general chat
- **Factual Accuracy:** While trained on scientific content, always verify critical calculations
- **Language:** Primarily trained on English content
- **Reasoning Mode:** Qwen3.5 0.8B operates in non-thinking mode by default

## Evaluation

Evaluation during training showed:

- Training loss reduction: **14% improvement**
- Gradient norms remained stable throughout training
- No signs of overfitting observed

### Hardware Compatibility

**Minimum Requirements:**

- **Inference:** 2GB VRAM (F16 GGUF)
- **Training:** 8GB+ VRAM recommended

**Tested On:**

- NVIDIA RTX 3060 12GB (training & inference)

## Files Included

```
qwen3.5-0.8b-astro-reasoning-v1/
├── config.json                            # Model configuration
├── model.safetensors                      # Model weights (LoRA adapters)
├── README.md                              # This file
├── qwen3.5-0.8b-astro-reasoning-v1.gguf   # GGUF format for llama.cpp
└── training_info.md                       # Detailed training logs
```

## Acknowledgments

- **Base Model:** [Qwen3.5-0.8B](https://huggingface.co/Qwen/Qwen3.5-0.8B) by Alibaba Cloud Qwen Team
- **Training Framework:** [Unsloth](https://github.com/unslothai/unsloth) for efficient finetuning
- **GGUF Conversion:** [llama.cpp](https://github.com/ggerganov/llama.cpp) for optimized inference

## License

This model is licensed under the Apache 2.0 License, same as the base Qwen3.5 model.

---

**Last Updated:** March 2026
**Model Version:** v1.0
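## Appendix: Verifying the Eddington Example

The worked Eddington-luminosity example above can be checked independently in a few lines of Python. This is a minimal sketch, not part of the model's output: the constants and the electron-scattering opacity for pure ionized hydrogen (κ = σ_T / m_p ≈ 0.0398 m²/kg, i.e. 0.398 cm²/g) are standard reference values assumed here, not values supplied by the model card.

```python
import math

# Standard SI constants (assumed reference values)
G = 6.674e-11      # gravitational constant, m^3 kg^-1 s^-2
c = 2.998e8        # speed of light, m/s
M_sun = 1.989e30   # solar mass, kg
L_sun = 3.828e26   # solar luminosity, W
kappa = 0.0398     # electron-scattering opacity (sigma_T / m_p), m^2/kg

M = 10 * M_sun                            # a 10 solar-mass star
L_edd = 4 * math.pi * G * M * c / kappa   # L_Edd = 4*pi*G*M*c / kappa

print(f"L_Edd = {L_edd:.3e} W = {L_edd / L_sun:.3e} L_sun")
```

This reproduces roughly 1.26 × 10^32 W (about 3.3 × 10^5 L_☉), consistent with the model's answer to within rounding of the constants.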