Qwen3.5-0.8B-Astro-Reasoning-v1
This is a finetuned version of Qwen3.5-0.8B specialized for astrophysics problem-solving and chain-of-thought reasoning.
Model Description
- Base Model: Qwen/Qwen3.5-0.8B
- Model Size: 0.8B parameters
- Architecture: Causal Language Model with Vision Encoder
- Context Length: 1,024 tokens (training), up to 262,144 tokens (inference)
- Training Method: LoRA (Low-Rank Adaptation)
- Precision: BF16 training, F16 inference (GGUF)
Training Details
Hardware
- GPU: NVIDIA GeForce RTX 3060 (12GB VRAM)
- Training Framework: Unsloth (4-bit quantization)
- Training Time: ~32 minutes
- Effective Batch Size: 8 (batch_size=1, gradient_accumulation=8)
Hyperparameters
| Parameter | Value |
|---|---|
| LoRA Rank (r) | 8 |
| LoRA Alpha | 8 |
| Learning Rate | 2e-4 |
| Max Steps | 300 |
| Warmup Steps | 10 |
| Sequence Length | 1,024 |
| Optimizer | adamw_8bit |
| Weight Decay | 0.01 |
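For reference, the table above can be collected into a single configuration sketch. This is illustrative only, not the exact training script used for this run; the key names are assumed, and it also shows how the effective batch size of 8 arises from the per-device batch size and gradient accumulation.

```python
# Hypothetical configuration mirroring the hyperparameter table above.
# Key names are assumed for illustration; this is not the original script.
training_config = {
    "lora_r": 8,
    "lora_alpha": 8,
    "learning_rate": 2e-4,
    "max_steps": 300,
    "warmup_steps": 10,
    "max_seq_length": 1024,
    "optim": "adamw_8bit",
    "weight_decay": 0.01,
    "per_device_train_batch_size": 1,
    "gradient_accumulation_steps": 8,
}

# Effective batch size = per-device batch size x gradient accumulation steps.
effective_batch = (training_config["per_device_train_batch_size"]
                   * training_config["gradient_accumulation_steps"])
print(effective_batch)  # 8
```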
Training Results
- Final Loss: 1.656
- Loss Reduction: 14% (from 1.924 to 1.656)
- Epochs: 0.22
Dataset
The model was finetuned on 12,357 high-quality examples from two sources:
1. Gemini-3 Pro Dataset (10,031 examples)
- Domain: Astrophysics
- Difficulty: Extreme-level problems
- Content: Complex astrophysical concepts including:
- Eddington Luminosity in Porous Atmospheres
- Electron Capture Supernovae (ECSN)
- Beta Cephei Pulsations
- Type Ia Supernova Progenitors
- Neutrino Oscillations
- CNO Cycle Branching
- Gravitational Radiation Reaction
- And more...
2. Distilled Corpus (2,326 examples)
- Domains: Mathematics, coding, natural language inference
- Features: Chain-of-thought reasoning with detailed solutions
- Format: Problem → Thinking → Solution
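One plausible rendering of the Problem → Thinking → Solution format into a ChatML training string is sketched below. The record's field names and the use of `<think>` delimiters are assumptions for illustration, not the dataset's actual schema.

```python
# Hypothetical dataset record; field names are assumed for illustration.
example = {
    "problem": "What is the Eddington luminosity?",
    "thinking": "Radiation pressure balances gravity when ...",
    "solution": "L_Edd = 4 * pi * G * M * c / kappa",
}

# Render the record in a ChatML layout, with the reasoning trace wrapped
# in <think> tags ahead of the final solution.
text = (
    "<|im_start|>user\n" + example["problem"] + "<|im_end|>\n"
    "<|im_start|>assistant\n"
    "<think>\n" + example["thinking"] + "\n</think>\n"
    + example["solution"] + "<|im_end|>"
)
print(text)
```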
Model Capabilities
This model excels at:
- ✅ Astrophysics problem-solving with step-by-step reasoning
- ✅ Complex scientific calculations and derivations
- ✅ Chain-of-thought reasoning for multi-step problems
- ✅ Mathematical reasoning with detailed explanations
- ✅ Technical documentation and analysis
Usage
With llama.cpp (Recommended)
llama-cli \
-m qwen3.5-0.8b-astro-reasoning-v1.gguf \
--chat-template chatml \
-c 2048 \
-n 512 \
--temp 0.7 \
-cnv
With Python (Transformers)
from transformers import AutoModelForCausalLM, AutoTokenizer
model = AutoModelForCausalLM.from_pretrained(
    "firmanda/qwen3.5-0.8b-astro-reasoning-v1",
    torch_dtype="auto",
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("firmanda/qwen3.5-0.8b-astro-reasoning-v1")
prompt = "Solve this astrophysics problem: What is the Eddington luminosity?"
messages = [{"role": "user", "content": prompt}]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
With Unsloth (For Further Training)
from unsloth import FastLanguageModel
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="firmanda/qwen3.5-0.8b-astro-reasoning-v1",
    max_seq_length=2048,
    load_in_4bit=True,
)
Example Outputs
Example 1: Astrophysics Problem
User: Calculate the Eddington luminosity for a 10 solar mass star.
Assistant: The Eddington luminosity is given by:
L_Edd = (4πGMc) / κ
Where:
- G = 6.674 × 10^-11 m^3 kg^-1 s^-2
- M = 10 M_☉ = 1.989 × 10^31 kg
- c = 3 × 10^8 m/s
- κ = 0.04 m^2/kg (Thomson scattering opacity for pure ionized hydrogen)
Calculation:
L_Edd = (4π × 6.674 × 10^-11 × 1.989 × 10^31 × 3 × 10^8) / 0.04
L_Edd ≈ 1.26 × 10^32 W ≈ 3.3 × 10^5 L_☉
This is the maximum luminosity at which outward radiation pressure does not yet overcome gravity.
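The worked example can be checked numerically. Using the standard Thomson scattering opacity for pure ionized hydrogen, κ ≈ 0.04 m²/kg, reproduces the quoted result of about 1.26 × 10^32 W:

```python
import math

# Recompute the Eddington luminosity for a 10 solar-mass star.
G = 6.674e-11       # gravitational constant, m^3 kg^-1 s^-2
M = 10 * 1.989e30   # 10 solar masses, kg
c = 3.0e8           # speed of light, m/s
kappa = 0.04        # Thomson opacity, pure ionized hydrogen, m^2/kg
L_SUN = 3.828e26    # solar luminosity, W

L_edd = 4 * math.pi * G * M * c / kappa
print(f"{L_edd:.3e} W = {L_edd / L_SUN:.2e} L_sun")
# about 1.25e32 W, i.e. ~3.3e5 L_sun
```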
Example 2: Chain-of-Thought Reasoning
User: A class of 12 students shares 108 oranges, but 36 are bad.
How many fewer oranges does each student get?
Assistant: Let me solve this step by step:
1. Total oranges: 108
2. Bad oranges: 36
3. Good oranges: 108 - 36 = 72
Original distribution:
- 108 oranges / 12 students = 9 oranges each
New distribution:
- 72 oranges / 12 students = 6 oranges each
Difference: 9 - 6 = 3 oranges fewer per student.
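The arithmetic in the steps above can be verified in a few lines:

```python
total, bad, students = 108, 36, 12

before = total // students           # 108 / 12 = 9 oranges each
after = (total - bad) // students    # 72 / 12 = 6 oranges each
print(before - after)  # 3
```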
Model Limitations
- Context Window: Optimized for 1,024 tokens during training (can handle up to 262,144 for inference)
- Domain Specificity: Best performance on astrophysics and scientific reasoning; may underperform on general chat
- Factual Accuracy: Although trained on scientific content, the model can make mistakes; always verify critical calculations independently
- Language: Primarily trained on English content
- Reasoning Mode: Qwen3.5 0.8B operates in non-thinking mode by default
Evaluation
Evaluation was limited to training-time metrics:
- Training loss: 14% reduction (1.924 → 1.656)
- Gradient norms: stable throughout training
- Overfitting: no signs observed
Hardware Compatibility
Minimum Requirements:
- Inference: 2GB VRAM (F16 GGUF)
- Training: 8GB+ VRAM recommended
Tested On:
- NVIDIA RTX 3060 12GB (training & inference)
Files Included
qwen3.5-0.8b-astro-reasoning-v1/
├── config.json # Model configuration
├── model.safetensors # Model weights (LoRA adapters)
├── README.md # This file
├── qwen3.5-0.8b-astro-reasoning-v1.gguf # GGUF format for llama.cpp
└── training_info.md # Detailed training logs
Acknowledgments
- Base Model: Qwen3.5-0.8B by Alibaba Cloud Qwen Team
- Training Framework: Unsloth for efficient finetuning
- GGUF Conversion: llama.cpp for optimized inference
License
This model is licensed under the Apache 2.0 License, same as the base Qwen3.5 model.
Last Updated: March 2026
Model Version: v1.0