| | --- |
| | language: |
| | - en |
| | - multilingual |
| | license: apache-2.0 |
| | library_name: transformers |
| | tags: |
| | - qwen |
| | - qwen3.5 |
| | - finetuned |
| | - astrophysics |
| | - science |
| | - cot |
| | - chain-of-thought |
| | - unsloth |
| | - lora |
| | - llama.cpp |
| | - gguf |
| | base_model: Qwen/Qwen3.5-0.8B |
| | --- |
| | |
| | # Qwen3.5-0.8B-Astro-Reasoning-v1 |
| | |
| | This is a finetuned version of [Qwen3.5-0.8B](https://huggingface.co/Qwen/Qwen3.5-0.8B) specialized for **astrophysics problem-solving** and **chain-of-thought reasoning**. |
| | |
| | ## Model Description |
| | |
| | - **Base Model:** Qwen/Qwen3.5-0.8B |
| | - **Model Size:** 0.8B parameters |
| | - **Architecture:** Causal Language Model with Vision Encoder |
| | - **Context Length:** 1,024 tokens (training), up to 262,144 tokens (inference) |
| | - **Training Method:** LoRA (Low-Rank Adaptation) |
| | - **Precision:** BF16 training, F16 inference (GGUF) |
| | |
| | ## Training Details |
| | |
| | ### Hardware |
| | - **GPU:** NVIDIA GeForce RTX 3060 (12GB VRAM) |
| | - **Training Framework:** Unsloth (4-bit quantization) |
| | - **Training Time:** ~32 minutes |
| | - **Effective Batch Size:** 8 (batch_size=1, gradient_accumulation=8) |
| | |
| | ### Hyperparameters |
| | | Parameter | Value | |
| | |-----------|-------| |
| | | LoRA Rank (r) | 8 | |
| | | LoRA Alpha | 8 | |
| | | Learning Rate | 2e-4 | |
| | | Max Steps | 300 | |
| | | Warmup Steps | 10 | |
| | | Sequence Length | 1,024 | |
| | | Optimizer | adamw_8bit | |
| | | Weight Decay | 0.01 | |
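
The quantities implied by these settings can be checked with a few lines of arithmetic. This is a minimal sketch, not taken from the training script: the variable names are illustrative, and the `alpha / r` scaling is the common LoRA convention, assumed here rather than confirmed by this card.

```python
# Values copied from the Hardware and Hyperparameters sections of this card.
per_device_batch_size = 1
gradient_accumulation_steps = 8
max_steps = 300
max_seq_length = 1024
lora_rank = 8
lora_alpha = 8

# One optimizer step consumes batch_size * accumulation sequences.
effective_batch_size = per_device_batch_size * gradient_accumulation_steps

# Total sequences processed over the whole run.
sequences_seen = max_steps * effective_batch_size

# Upper bound on tokens processed (as if every sequence filled the full length).
max_tokens_seen = sequences_seen * max_seq_length

# Assumed convention: the LoRA update is scaled by alpha / r.
lora_scale = lora_alpha / lora_rank

print(effective_batch_size, sequences_seen, max_tokens_seen, lora_scale)
```

With rank and alpha both set to 8, the assumed scaling factor is 1.0, so the adapter update is applied without extra amplification or damping.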
| | |
| | ### Training Results |
| | - **Final Loss:** 1.656 |
| | - **Loss Reduction:** 14% (from 1.924 to 1.656) |
| | - **Epochs:** 0.22 |
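
The reported reduction can be reproduced from the two loss values. The sketch below also converts the losses to perplexity, assuming the loss is mean token-level cross-entropy in nats; that interpretation is an assumption, not something this card states.

```python
import math

initial_loss = 1.924  # reported starting loss
final_loss = 1.656    # reported final loss

# Relative reduction in training loss.
relative_reduction = (initial_loss - final_loss) / initial_loss

# Perplexity is exp(loss) if the loss is mean cross-entropy in nats (assumed).
initial_ppl = math.exp(initial_loss)
final_ppl = math.exp(final_loss)

print(f"loss reduction: {relative_reduction:.1%}")
print(f"perplexity: {initial_ppl:.2f} -> {final_ppl:.2f}")
```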
| | |
| | ## Dataset |
| | |
| | The model was finetuned on 12,357 high-quality examples from two sources: |
| | |
| | ### 1. Gemini-3 Pro Dataset (10,031 examples) |
| | - **Domain:** Astrophysics |
| | - **Difficulty:** Extreme-level problems |
| | - **Content:** Complex astrophysical concepts including: |
| | - Eddington Luminosity in Porous Atmospheres |
| | - Electron Capture Supernovae (ECSN) |
| | - Beta Cephei Pulsations |
| | - Type Ia Supernova Progenitors |
| | - Neutrino Oscillations |
| | - CNO Cycle Branching |
| | - Gravitational Radiation Reaction |
| | - And more... |
| | |
| | ### 2. Distilled Corpus (2,326 examples) |
| | - **Domains:** Mathematics, coding, natural language inference |
| | - **Features:** Chain-of-thought reasoning with detailed solutions |
| | - **Format:** Problem → Thinking → Solution |
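
One way a Problem → Thinking → Solution record could be turned into chat messages for finetuning is sketched below. The field names `problem`, `thinking`, and `solution`, the helper `to_chat_messages`, and the `<think>` wrapper are all hypothetical; the card does not document the dataset schema or the exact template used.

```python
def to_chat_messages(example: dict) -> list[dict]:
    """Convert one Problem -> Thinking -> Solution record into chat messages.

    The keys "problem", "thinking", "solution" and the <think> wrapper are
    assumptions for illustration, not the dataset's documented schema.
    """
    assistant_text = (
        f"<think>\n{example['thinking'].strip()}\n</think>\n\n"
        f"{example['solution'].strip()}"
    )
    return [
        {"role": "user", "content": example["problem"].strip()},
        {"role": "assistant", "content": assistant_text},
    ]


# Hypothetical record, reusing the oranges problem from this card.
record = {
    "problem": "A class of 12 students shares 108 oranges, but 36 are bad. "
               "How many fewer oranges per student?",
    "thinking": "Good oranges: 108 - 36 = 72. Before: 108 / 12 = 9. After: 72 / 12 = 6.",
    "solution": "Each student gets 3 fewer oranges.",
}
messages = to_chat_messages(record)
print(messages[1]["content"])
```

The resulting list has the shape that `tokenizer.apply_chat_template` accepts, so it could be rendered with the model's own chat template before tokenization.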
| | |
| | ## Model Capabilities |
| | |
| | This model is finetuned for: |
| | - ✅ **Astrophysics problem-solving** with step-by-step reasoning |
| | - ✅ **Complex scientific calculations** and derivations |
| | - ✅ **Chain-of-thought reasoning** for multi-step problems |
| | - ✅ **Mathematical reasoning** with detailed explanations |
| | - ✅ **Technical documentation** and analysis |
| | |
| | ## Usage |
| | |
| | ### With llama.cpp (Recommended) |
| | |
| | ```bash |
| | llama-cli \ |
| | -m qwen3.5-0.8b-astro-reasoning-v1.gguf \ |
| | --chat-template chatml \ |
| | -c 2048 \ |
| | -n 512 \ |
| | --temp 0.7 \ |
| | -cnv |
| | ``` |
| | |
| | ### With Python (Transformers) |
| | |
| | ```python |
| | from transformers import AutoModelForCausalLM, AutoTokenizer |
| | |
| | model = AutoModelForCausalLM.from_pretrained( |
| |     "firmanda/qwen3.5-0.8b-astro-reasoning-v1", |
| |     torch_dtype="auto", |
| |     device_map="auto", |
| | ) |
| | tokenizer = AutoTokenizer.from_pretrained("firmanda/qwen3.5-0.8b-astro-reasoning-v1") |
| | |
| | # Render a single user turn with the model's chat template. |
| | prompt = "Solve this astrophysics problem: What is the Eddington luminosity?" |
| | messages = [{"role": "user", "content": prompt}] |
| | text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True) |
| | |
| | # Generate, then decode the full sequence (prompt plus completion). |
| | inputs = tokenizer(text, return_tensors="pt").to(model.device) |
| | outputs = model.generate(**inputs, max_new_tokens=512) |
| | print(tokenizer.decode(outputs[0], skip_special_tokens=True)) |
| | ``` |
| | |
| | ### With Unsloth (For Further Training) |
| | |
| | ```python |
| | from unsloth import FastLanguageModel |
| | |
| | model, tokenizer = FastLanguageModel.from_pretrained( |
| | model_name="firmanda/qwen3.5-0.8b-astro-reasoning-v1", |
| | max_seq_length=2048, |
| | load_in_4bit=True, |
| | ) |
| | ``` |
| | |
| | ## Example Outputs |
| | |
| | ### Example 1: Astrophysics Problem |
| | ``` |
| | User: Calculate the Eddington luminosity for a 10 solar mass star. |
| | |
| | Assistant: The Eddington luminosity is given by: |
| | |
| | L_Edd = (4πGMc) / κ |
| | |
| | Where: |
| | - G = 6.674 × 10^-11 m^3 kg^-1 s^-2 |
| | - M = 10 M_☉ = 1.989 × 10^31 kg |
| | - c = 3 × 10^8 m/s |
| | - κ = 0.0398 m^2/kg (Thomson scattering for ionized hydrogen) |
| | |
| | Calculation: |
| | L_Edd = (4π × 6.674 × 10^-11 × 1.989 × 10^31 × 3 × 10^8) / 0.0398 |
| | L_Edd ≈ 1.26 × 10^32 W ≈ 3.28 × 10^5 L_☉ |
| | |
| | This is the maximum luminosity where radiation pressure doesn't overcome gravity. |
| | ``` |
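
The calculation in Example 1 can be checked numerically. The sketch below takes the opacity as κ = σ_T / m_p (about 0.04 m²/kg, pure ionized hydrogen), which is the value that gives a result near 1.26 × 10^32 W. The constants are standard rounded SI values and the function name is illustrative.

```python
import math

# Physical constants in SI units (rounded).
G = 6.674e-11        # gravitational constant, m^3 kg^-1 s^-2
C = 2.998e8          # speed of light, m/s
M_SUN = 1.989e30     # solar mass, kg
L_SUN = 3.828e26     # nominal solar luminosity, W
SIGMA_T = 6.652e-29  # Thomson cross-section, m^2
M_P = 1.673e-27      # proton mass, kg


def eddington_luminosity(mass_kg: float, kappa: float) -> float:
    """Return L_Edd = 4 * pi * G * M * c / kappa in watts."""
    return 4.0 * math.pi * G * mass_kg * C / kappa


# Electron-scattering opacity for pure ionized hydrogen (assumed composition).
kappa_es = SIGMA_T / M_P

l_edd = eddington_luminosity(10 * M_SUN, kappa_es)
print(f"kappa = {kappa_es:.4f} m^2/kg")
print(f"L_Edd = {l_edd:.3e} W = {l_edd / L_SUN:.3e} L_sun")
```

A hydrogen-helium mixture has a lower electron-scattering opacity than pure hydrogen, so the same formula gives a proportionally higher Eddington luminosity for it.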
| | |
| | ### Example 2: Chain-of-Thought Reasoning |
| | ``` |
| | User: A class of 12 students shares 108 oranges, but 36 are bad. |
| | How many fewer oranges per student? |
| | |
| | Assistant: Let me solve this step by step: |
| | |
| | 1. Total oranges: 108 |
| | 2. Bad oranges: 36 |
| | 3. Good oranges: 108 - 36 = 72 |
| | |
| | Original distribution: |
| | - 108 oranges / 12 students = 9 oranges each |
| | |
| | New distribution: |
| | - 72 oranges / 12 students = 6 oranges each |
| | |
| | Difference: 9 - 6 = 3 oranges fewer per student. |
| | ``` |
| | |
| | ## Model Limitations |
| | |
| | - **Context Window:** Finetuned on sequences of up to 1,024 tokens; inference accepts up to 262,144 tokens |
| | - **Domain Specificity:** Best performance on astrophysics and scientific reasoning; may underperform on general chat |
| | - **Factual Accuracy:** Although the model was trained on scientific content, it can still make errors; always verify critical calculations |
| | - **Language:** Primarily trained on English content |
| | - **Reasoning Mode:** Qwen3.5 0.8B operates in non-thinking mode by default |
| | |
| | ## Evaluation |
| | |
| | Evaluation is based on training-time signals only: |
| | - Training loss fell by **14%** (from 1.924 to 1.656) |
| | - Gradient norms remained stable throughout training |
| | - No signs of overfitting observed |
| | |
| | ## Hardware Compatibility |
| | |
| | **Minimum Requirements:** |
| | - **Inference:** 2GB VRAM (F16 GGUF) |
| | - **Training:** 8GB+ VRAM recommended |
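
The 2GB inference figure is consistent with a back-of-the-envelope estimate of weight memory. This is a rough sketch that counts weights only, using the nominal 0.8B parameter count; KV cache, activations, and runtime overhead are ignored, and the helper name is illustrative.

```python
def weight_memory_gb(num_params: float, bytes_per_param: float) -> float:
    """Memory for the weights alone, in decimal gigabytes."""
    return num_params * bytes_per_param / 1e9


NUM_PARAMS = 0.8e9  # nominal parameter count from the model name

# F16 stores 2 bytes per parameter.
f16_gb = weight_memory_gb(NUM_PARAMS, 2.0)

# 4-bit storage is about 0.5 bytes per parameter, ignoring quantization overhead.
q4_gb = weight_memory_gb(NUM_PARAMS, 0.5)

print(f"F16 weights:   {f16_gb:.1f} GB")
print(f"4-bit weights: {q4_gb:.1f} GB")
```

Under these assumptions the F16 weights take about 1.6 GB, which leaves a few hundred megabytes of a 2GB budget for the context cache.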
| | |
| | **Tested On:** |
| | - NVIDIA RTX 3060 12GB (training & inference) |
| | |
| | ## Files Included |
| | |
| | ``` |
| | qwen3.5-0.8b-astro-reasoning-v1/ |
| | ├── config.json # Model configuration |
| | ├── model.safetensors # Model weights (LoRA adapters) |
| | ├── README.md # This file |
| | ├── qwen3.5-0.8b-astro-reasoning-v1.gguf # GGUF format for llama.cpp |
| | └── training_info.md # Detailed training logs |
| | ``` |
| | |
| | ## Acknowledgments |
| | |
| | - **Base Model:** [Qwen3.5-0.8B](https://huggingface.co/Qwen/Qwen3.5-0.8B) by Alibaba Cloud Qwen Team |
| | - **Training Framework:** [Unsloth](https://github.com/unslothai/unsloth) for efficient finetuning |
| | - **GGUF Conversion:** [llama.cpp](https://github.com/ggerganov/llama.cpp) for optimized inference |
| | |
| | ## License |
| | |
| | This model is licensed under the Apache 2.0 License, same as the base Qwen3.5 model. |
| | |
| | --- |
| | |
| | **Last Updated:** March 2026 |
| | **Model Version:** v1.0 |
| | |