Instructions for using oracomputing/Qwen3.5-9B-GGUF with libraries and local apps.

Libraries

How to use oracomputing/Qwen3.5-9B-GGUF with llama-cpp-python:

# !pip install llama-cpp-python

from llama_cpp import Llama

llm = Llama.from_pretrained(
	repo_id="oracomputing/Qwen3.5-9B-GGUF",
	filename="Qwen3.5-9B-OQ-Q3_K_M.gguf",
)

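response = llm.create_chat_completion(
	messages = [
		{
			"role": "user",
			"content": "What is the capital of France?"
		}
	]
)
# The reply text is in the first choice of the OpenAI-style response dict.
print(response["choices"][0]["message"]["content"])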

Local Apps

llama.cpp

How to use oracomputing/Qwen3.5-9B-GGUF with llama.cpp:

Install (macOS, Linux)

curl -LsSf https://llama.app/install.sh | sh
# Start a local OpenAI-compatible server with a web UI:
llama serve -hf oracomputing/Qwen3.5-9B-GGUF:Q3_K_M
# Run inference directly in the terminal:
llama cli -hf oracomputing/Qwen3.5-9B-GGUF:Q3_K_M

Install from WinGet (Windows)

winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama serve -hf oracomputing/Qwen3.5-9B-GGUF:Q3_K_M
# Run inference directly in the terminal:
llama cli -hf oracomputing/Qwen3.5-9B-GGUF:Q3_K_M

Use pre-built binary

# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf oracomputing/Qwen3.5-9B-GGUF:Q3_K_M
# Run inference directly in the terminal:
./llama-cli -hf oracomputing/Qwen3.5-9B-GGUF:Q3_K_M

Build from source code

git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf oracomputing/Qwen3.5-9B-GGUF:Q3_K_M
# Run inference directly in the terminal:
./build/bin/llama-cli -hf oracomputing/Qwen3.5-9B-GGUF:Q3_K_M

Use Docker

docker model run hf.co/oracomputing/Qwen3.5-9B-GGUF:Q3_K_M

vLLM

How to use oracomputing/Qwen3.5-9B-GGUF with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "oracomputing/Qwen3.5-9B-GGUF"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "oracomputing/Qwen3.5-9B-GGUF",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Ollama
How to use oracomputing/Qwen3.5-9B-GGUF with Ollama:
```
ollama run hf.co/oracomputing/Qwen3.5-9B-GGUF:Q3_K_M
```

Unsloth Studio

How to use oracomputing/Qwen3.5-9B-GGUF with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for oracomputing/Qwen3.5-9B-GGUF to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for oracomputing/Qwen3.5-9B-GGUF to start chatting

Using HuggingFace Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for oracomputing/Qwen3.5-9B-GGUF to start chatting

Pi

How to use oracomputing/Qwen3.5-9B-GGUF with Pi:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama serve -hf oracomputing/Qwen3.5-9B-GGUF:Q3_K_M

Configure the model in Pi

# Install Pi:
npm install -g @mariozechner/pi-coding-agent
# Add to ~/.pi/agent/models.json:
{
  "providers": {
    "llama-cpp": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        {
          "id": "oracomputing/Qwen3.5-9B-GGUF:Q3_K_M"
        }
      ]
    }
  }
}

Run Pi

# Start Pi in your project directory:
pi

Hermes Agent

How to use oracomputing/Qwen3.5-9B-GGUF with Hermes Agent:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama serve -hf oracomputing/Qwen3.5-9B-GGUF:Q3_K_M

Configure Hermes

# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup
# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default oracomputing/Qwen3.5-9B-GGUF:Q3_K_M

Run Hermes

hermes

OpenClaw

How to use oracomputing/Qwen3.5-9B-GGUF with OpenClaw:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama serve -hf oracomputing/Qwen3.5-9B-GGUF:Q3_K_M

Configure OpenClaw

# Install OpenClaw:
npm install -g openclaw@latest
# Register the local server and set it as the default model:
openclaw onboard --non-interactive --mode local \
  --auth-choice custom-api-key \
  --custom-base-url http://127.0.0.1:8080/v1 \
  --custom-model-id "oracomputing/Qwen3.5-9B-GGUF:Q3_K_M" \
  --custom-provider-id llama-cpp \
  --custom-compatibility openai \
  --custom-text-input \
  --accept-risk \
  --skip-health

Run OpenClaw

openclaw agent --local --agent main --message "Hello from Hugging Face"

Docker Model Runner
How to use oracomputing/Qwen3.5-9B-GGUF with Docker Model Runner:
```
docker model run hf.co/oracomputing/Qwen3.5-9B-GGUF:Q3_K_M
```

Lemonade

How to use oracomputing/Qwen3.5-9B-GGUF with Lemonade:

Pull the model

# Download Lemonade from https://lemonade-server.ai/
lemonade pull oracomputing/Qwen3.5-9B-GGUF:Q3_K_M

Run and chat with the model

lemonade run user.Qwen3.5-9B-GGUF-Q3_K_M

List all available models

lemonade list

EVALUATION-ONLY ACCESS

This repository is publicly accessible, but you have to accept the conditions to access its files and content.

This is a private evaluation version of Qwen3.5-9B-GGUF (OraQuant).

By agreeing, you accept:

Internal testing only; no production use
No commercial use, redistribution, or reverse-engineering
Deletion of all files after evaluation
Full terms in LICENSE

Access is granted only to approved licensees.

Qwen3.5-9B-GGUF (OraQuant)

This repository contains GGUF builds of Qwen3.5-9B, quantized by Ora Computing with OraQuant (OQ) - Ora Computing's proprietary calibrated quantization. These are llama.cpp-compatible quantizations of Qwen/Qwen3.5-9B; the underlying weights are unchanged Qwen3.5-9B weights at reduced precision.

Text only. Qwen/Qwen3.5-9B is a multimodal model; these GGUFs contain only the language model (text input -> text output). The vision/video input encoders are not included.

Model Overview

Model name: Qwen3.5-9B-GGUF (OraQuant)
Base model: Qwen/Qwen3.5-9B (Apache-2.0, Alibaba Cloud); these files are GGUF quantizations of it.
Parameters: ~9 billion (unchanged from the base model)
Quantization: OraQuant (OQ) mixed-precision K-quant GGUFs produced by Ora Computing, provided in two footprints: OQ-Q4_K_M (higher quality) and OQ-Q3_K_M (smaller/faster).
Not fine-tuned, not parameter-reduced: the model architecture and parameter count are identical to the base model; only the weight precision is reduced.
Purpose: Evaluation/test-use only; optimized for local/offline inference and internal benchmarking.
License: See LICENSE (Custom Model License Agreement).

Files in this repo

File	What it is	Size
`Qwen3.5-9B-OQ-Q4_K_M.gguf`	Language model, OraQuant Q4_K_M (higher quality)	~5.7 GB
`Qwen3.5-9B-OQ-Q3_K_M.gguf`	Language model, OraQuant Q3_K_M (smaller/faster)	~4.7 GB
`LICENSE`	Custom Model License Agreement	-
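
To fetch a single file from the table rather than the whole repository, a minimal sketch using `huggingface_hub` could look like the following. It assumes the package is installed, that your account has already been granted access to this gated repository, and that a token is available either from a cached login or from the `HF_TOKEN` environment variable. The helper names `gguf_filename` and `download_gguf` are illustrative only.

```python
import os

REPO_ID = "oracomputing/Qwen3.5-9B-GGUF"

# File names as listed in the table above.
FILES = {
    "Q4_K_M": "Qwen3.5-9B-OQ-Q4_K_M.gguf",  # higher quality, ~5.7 GB
    "Q3_K_M": "Qwen3.5-9B-OQ-Q3_K_M.gguf",  # smaller/faster, ~4.7 GB
}


def gguf_filename(footprint: str) -> str:
    """Map a footprint label to the file name in this repository."""
    try:
        return FILES[footprint]
    except KeyError:
        raise ValueError(
            f"unknown footprint {footprint!r}; expected one of {sorted(FILES)}"
        ) from None


def download_gguf(footprint: str = "Q4_K_M") -> str:
    """Download one GGUF file and return its local cache path."""
    # Imported lazily so gguf_filename() works without huggingface_hub installed.
    from huggingface_hub import hf_hub_download

    return hf_hub_download(
        repo_id=REPO_ID,
        filename=gguf_filename(footprint),
        token=os.environ.get("HF_TOKEN"),  # None falls back to a cached login, if any
    )
```

Calling `download_gguf("Q3_K_M")` returns a local path that can be used as the `MODEL` value in the Usage section.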

Usage

These GGUFs load with stock upstream llama.cpp (no patch required); use a build with Qwen3.5 support.

export MODEL=/path/to/Qwen3.5-9B-OQ-Q4_K_M.gguf   # or the Q3_K_M file

Interactive chat:

./build/bin/llama-cli -m "$MODEL" -ngl 99

Single-shot completion (-st runs one turn then exits):

./build/bin/llama-cli -m "$MODEL" -ngl 99 -st -p "Explain the Chudnovsky algorithm in two sentences."

OpenAI-compatible server (Web UI at http://localhost:8080):

./build/bin/llama-server -m "$MODEL" -ngl 99 \
  --served-model-name qwen3.5-9b --host 0.0.0.0 --port 8080
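
Once the server above is running, it can be queried from Python. The sketch below uses only the standard library and assumes the host, port, and `--served-model-name` from the command above; the function names and the `max_tokens` default are illustrative choices, not part of this repository.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8080/v1"  # host/port from the llama-server command
MODEL_NAME = "qwen3.5-9b"              # matches --served-model-name


def build_chat_request(prompt: str, max_tokens: int = 256) -> urllib.request.Request:
    """Build a POST request for the OpenAI-compatible chat completions endpoint."""
    payload = {
        "model": MODEL_NAME,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(prompt: str, timeout: float = 300.0) -> str:
    """Send one user message and return the assistant's reply text."""
    with urllib.request.urlopen(build_chat_request(prompt), timeout=timeout) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

With the server up, `print(chat("What is the capital of France?"))` prints the model's reply. The generous timeout is a guess to leave room for long reasoning output on slower hardware.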

Qwen3.5 is a reasoning model; the chat template and thinking behaviour are carried in the GGUF.
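
When benchmarking, it can help to separate the reasoning trace from the final answer. The sketch below assumes the raw output wraps its reasoning in `<think>...</think>` tags, as earlier Qwen3 releases do; this card does not state the tag format, so treat that as an assumption to verify against real output. `split_reasoning` is a hypothetical helper name.

```python
import re

# Assumption: reasoning is wrapped in <think>...</think>, as in earlier Qwen3 releases.
_THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_reasoning(text: str) -> tuple[str, str]:
    """Return (reasoning, answer); reasoning is empty if no think block is found."""
    match = _THINK_RE.search(text)
    if match is None:
        return "", text.strip()
    reasoning = match.group(1).strip()
    # Everything outside the first think block is treated as the answer.
    answer = (text[: match.start()] + text[match.end():]).strip()
    return reasoning, answer
```

If the serving layer already strips or separates the reasoning before returning it, the helper simply returns the text unchanged with an empty reasoning string.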

Intended Use & Restrictions

Permitted use

Internal testing, benchmarking, and evaluation of the model by the named Licensee.
Exploration of model behaviours, prompt engineering, and non-production prototypes.

Prohibited use

Deployment in a production or commercial service, publicly-facing API, resale, or redistribution.
Fine-tuning or creating derivative models for production use without a separate agreement.
Reverse-engineering the quantization/calibration used to produce these files.
Disclosure or sharing of the model (or its weights) to third parties beyond the named Licensee.

Out-of-scope use

Use in regulated or safety-critical contexts (unless separately permitted).
Any use that violates the Apache License, Version 2.0 under which the upstream model is distributed.

Quantization

Method: OraQuant (OQ), Ora Computing's proprietary calibrated quantization. The released files are mixed-precision K-quant GGUFs.
No fine-tuning: the weights are the original Qwen/Qwen3.5-9B weights; no additional training was performed.
No parameter-count change: the architecture and ~9B parameter count are unchanged; only weight precision is reduced.
Footprints: OQ-Q4_K_M for higher quality, OQ-Q3_K_M for a smaller/faster footprint.
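
As a rough sanity check on the two footprints, the average bits per weight can be estimated from the file sizes listed in this card. The sketch assumes exactly 9 billion parameters and 1 GB = 10^9 bytes; both are approximations, and the file also holds metadata and tokenizer data, so the figures are only indicative.

```python
def bits_per_weight(file_size_gb: float, n_params: float = 9e9) -> float:
    """Average bits stored per parameter, treating 1 GB as 1e9 bytes."""
    return file_size_gb * 1e9 * 8 / n_params


# Approximate sizes taken from the "Files in this repo" table.
for name, size_gb in {"OQ-Q4_K_M": 5.7, "OQ-Q3_K_M": 4.7}.items():
    print(f"{name}: ~{bits_per_weight(size_gb):.1f} bits/weight")
# prints:
# OQ-Q4_K_M: ~5.1 bits/weight
# OQ-Q3_K_M: ~4.2 bits/weight
```

Both averages sit above the nominal 4-bit and 3-bit labels, which is consistent with mixed-precision files that keep some tensors at higher precision.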

Limitations & Risks

Quantized models may not replicate the full behaviour of the base model under all prompt categories, particularly domain-specific or rare inputs.
The model is provided as-is for testing only and is not certified for production use.
Users should validate outputs carefully and monitor for bias or unintended behaviours.
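
One simple way to start validating outputs is to compare a quantized build against a reference on a fixed prompt set. The sketch below is a hypothetical harness, not a method prescribed by this card: `reference` and `candidate` are any callables that map a prompt to a reply (for example, wrappers around two local servers), and exact match after normalization is a deliberately crude metric.

```python
from typing import Callable, Sequence


def _normalize(text: str) -> str:
    """Collapse whitespace and case so trivial differences are ignored."""
    return " ".join(text.lower().split())


def agreement_rate(
    prompts: Sequence[str],
    reference: Callable[[str], str],
    candidate: Callable[[str], str],
) -> float:
    """Fraction of prompts where both generators give the same normalized reply."""
    if not prompts:
        raise ValueError("prompts must not be empty")
    matches = sum(
        _normalize(reference(p)) == _normalize(candidate(p)) for p in prompts
    )
    return matches / len(prompts)
```

Exact match is only meaningful with deterministic decoding and short, closed-form answers; for open-ended prompts a task-specific scorer would be a better fit.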

Upstream Attribution

This model is derived from the Qwen3.5-9B model released by Alibaba Cloud under the Apache License, Version 2.0.

"Copyright 2025 Alibaba Cloud. Licensed under the Apache License, Version 2.0."

For full terms, see: https://huggingface.co/Qwen/Qwen3.5-9B/blob/main/LICENSE
Apache License, Version 2.0: https://www.apache.org/licenses/LICENSE-2.0

Contact & Support

For licensing inquiries or to request extended evaluation rights, please contact: info@oracomputing.com

Repository and model access are regulated. Do not redistribute or share without explicit written permission from Ora Computing.
