Qwen3.5-27B Opus v2 - Reasoning Model

Base Model: Qwen3.5-27B BF16
Architecture: Qwen3.5 dense (GGUF architecture: qwen35)
Purpose: Complex reasoning, chain-of-thought, math, science, analysis
Context: 262,144 tokens (262K)

Recommended Settings

Thinking Mode (Default - Recommended)

Use for: Math, science, logic, complex reasoning, multi-step problems

--temp 0.6 --top-p 0.95 --top-k 20 --min-p 0 --reasoning on

Non-Thinking Mode

Use for: Creative writing, casual chat, fast responses

--temp 0.7 --top-p 0.8 --top-k 20 --min-p 0
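
As a minimal sketch, the two presets above can be kept in one place and selected by name. The function name sampling_flags and the mode labels thinking and non-thinking are illustrative assumptions, not part of llama.cpp; the flag values are copied from the settings above.

```shell
#!/usr/bin/env bash
# Print the sampling flags recommended above for a given mode.
# sampling_flags and its mode labels are hypothetical helpers for illustration.
sampling_flags() {
  case "$1" in
    thinking)
      echo "--temp 0.6 --top-p 0.95 --top-k 20 --min-p 0 --reasoning on" ;;
    non-thinking)
      echo "--temp 0.7 --top-p 0.8 --top-k 20 --min-p 0" ;;
    *)
      echo "unknown mode: $1" >&2
      return 1 ;;
  esac
}

# Example (left unquoted on purpose so the flags split into separate arguments):
#   llama-server -m Qwen3.5-27B-Opus-v2-Q4_K_M.gguf $(sampling_flags thinking)
```

Keeping the presets in a function avoids retyping them and makes it harder to mix values from the two modes by accident.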

Quantization Guide

Quant  | Size  | Quality | Use Case
Q4_K_M | 16 GB | ⭐⭐⭐⭐⭐ | Daily use (recommended)
Q5_K_M | 18 GB | ⭐⭐⭐⭐⭐ | High quality tasks
Q6_K   | 22 GB | ⭐⭐⭐⭐⭐⭐ | Near-lossless
Q8_0   | 28 GB | ⭐⭐⭐⭐⭐⭐⭐ | Maximum quality
BF16   | 54 GB | ⭐⭐⭐⭐⭐⭐⭐ | Original quality
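
One way to read the table is to pick the largest quant whose file fits a memory budget. The sketch below assumes the sizes listed above and a budget given in whole gigabytes; pick_quant is a hypothetical helper. The listed size covers the weights only, so the budget should be whatever memory remains after reserving room for the KV cache and runtime overhead.

```shell
#!/usr/bin/env bash
# Print the largest quant from the table above whose file size fits the budget.
# pick_quant is a hypothetical helper; the budget is in whole GB, weights only.
pick_quant() {
  local budget_gb=$1 choice="" entry name size
  # name:size pairs in ascending size order, copied from the table
  for entry in Q4_K_M:16 Q5_K_M:18 Q6_K:22 Q8_0:28 BF16:54; do
    name=${entry%%:*}
    size=${entry##*:}
    if [ "$size" -le "$budget_gb" ]; then
      choice=$name
    fi
  done
  # Non-zero exit status when nothing fits
  [ -n "$choice" ] && echo "$choice"
}
```

For example, pick_quant 24 selects Q6_K, while a budget below 16 prints nothing and returns a non-zero status.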

Quick Start

llama-server -m Qwen3.5-27B-Opus-v2-Q4_K_M.gguf --ctx-size 262144 --temp 0.6 --top-p 0.95 --top-k 20 --reasoning on
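
The --ctx-size value above is 256 × 1024 = 262144 tokens. Because the KV cache grows with the context window, a smaller window is a common way to reduce memory use. The sketch below only prints the command rather than running it; ctx_tokens and quick_start_cmd are hypothetical helper names, and the other flags are copied from the command above.

```shell
#!/usr/bin/env bash
# Convert a context length given in units of 1024 tokens to a --ctx-size value.
ctx_tokens() {
  echo $(( $1 * 1024 ))
}

# Print (do not run) the quick-start command with a chosen context length.
# Second argument defaults to 256, i.e. the full 262144-token window.
quick_start_cmd() {
  local model=$1 ctx_k=${2:-256}
  echo "llama-server -m $model --ctx-size $(ctx_tokens "$ctx_k") --temp 0.6 --top-p 0.95 --top-k 20 --reasoning on"
}
```

Calling quick_start_cmd Qwen3.5-27B-Opus-v2-Q4_K_M.gguf 32 prints the same command with --ctx-size 32768.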

Vision Support

Use with mmproj for multimodal capabilities:

--mmproj mmproj-Qwen3.5-27B-Opus-v2-f16.gguf
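
A rough sketch of combining the projector with the Thinking Mode flags, assuming both GGUF files are already downloaded. vision_cmd is a hypothetical helper that checks the files exist and then prints the command; it assumes paths without spaces.

```shell
#!/usr/bin/env bash
# Print a llama-server command that loads the model together with its mmproj file.
# vision_cmd is a hypothetical helper; it refuses to print if a file is missing.
vision_cmd() {
  local model=$1 mmproj=$2 f
  for f in "$model" "$mmproj"; do
    [ -f "$f" ] || { echo "missing file: $f" >&2; return 1; }
  done
  echo "llama-server -m $model --mmproj $mmproj --temp 0.6 --top-p 0.95 --top-k 20 --reasoning on"
}

# Example:
#   vision_cmd Qwen3.5-27B-Opus-v2-Q4_K_M.gguf mmproj-Qwen3.5-27B-Opus-v2-f16.gguf
```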

Model Variants

  • Opus v2 (this repo) - Standard 262K context
  • Opus v2 YaRN 1M - Extended 1M context (separate repo)