Instructions for using NeuralNet-Hub/Qwen3.6-27B-NVFP4 with libraries and local apps.

Libraries

How to use NeuralNet-Hub/Qwen3.6-27B-NVFP4 with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="NeuralNet-Hub/Qwen3.6-27B-NVFP4")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
pipe(text=messages)

# Load model directly
from transformers import AutoProcessor, AutoModelForMultimodalLM

processor = AutoProcessor.from_pretrained("NeuralNet-Hub/Qwen3.6-27B-NVFP4")
model = AutoModelForMultimodalLM.from_pretrained("NeuralNet-Hub/Qwen3.6-27B-NVFP4", device_map="auto")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(processor.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Local Apps

vLLM

How to use NeuralNet-Hub/Qwen3.6-27B-NVFP4 with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "NeuralNet-Hub/Qwen3.6-27B-NVFP4"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "NeuralNet-Hub/Qwen3.6-27B-NVFP4",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'

Use Docker

docker model run hf.co/NeuralNet-Hub/Qwen3.6-27B-NVFP4

SGLang

How to use NeuralNet-Hub/Qwen3.6-27B-NVFP4 with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "NeuralNet-Hub/Qwen3.6-27B-NVFP4" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "NeuralNet-Hub/Qwen3.6-27B-NVFP4",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "NeuralNet-Hub/Qwen3.6-27B-NVFP4" \
        --host 0.0.0.0 \
        --port 30000

NeuralNet is a pioneering AI solutions provider that empowers businesses to harness the power of artificial intelligence.

🌟 Qwen3.6-27B NVFP4 Quantization by NeuralNet 🧠🤖

This is an NVFP4-quantized version of Qwen/Qwen3.6-27B, optimized for deployment on NVIDIA Blackwell architecture GPUs using vLLM.

NVFP4 quantization requires an NVIDIA Blackwell GPU (GB200, GeForce RTX 50 series, etc.). The format is not compatible with Ampere, Ada Lovelace, or Hopper GPUs. If you are running on an older GPU, use a different quantization format.

Original model: https://huggingface.co/Qwen/Qwen3.6-27B

⚡ Deployment with vLLM

This quantized model is intended to be served using vLLM (vllm>=0.9.0 recommended).

Quick Start

vllm serve NeuralNet-Hub/Qwen3.6-27B-NVFP4 \
  --quantization nvfp4 \
  --dtype bfloat16 \
  --kv-cache-dtype fp8 \
  --max-model-len 262144 \
  --reasoning-parser qwen3 \
  --enable-auto-tool-choice \
  --tool-call-parser qwen3_coder

Using a Config File

# Deploy with: vllm serve --config config.yaml

# Model config
model: NeuralNet-Hub/Qwen3.6-27B-NVFP4
dtype: bfloat16
kv-cache-dtype: fp8
gpu-memory-utilization: 0.95
max-model-len: 262144
max-num-batched-tokens: 4096
max-num-seqs: 200
max-cudagraph-capture-size: 209
enable-prefix-caching: true
trust-remote-code: true

# template parser
reasoning-parser: qwen3
enable-auto-tool-choice: true
tool-call-parser: qwen3_coder

# Optional
default-chat-template-kwargs: '{"enable_thinking": false}'
download-dir: /workspace/models
host: 0.0.0.0
port: 18000

vllm serve --config config.yaml
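
The config file above binds the server to 0.0.0.0 on port 18000, while the Quick Start command sets no port. A minimal sketch for deriving the client URL and waiting for the server to finish loading, assuming the OpenAI-compatible `/v1/models` route is reachable once the model is ready. The helper names `client_base_url` and `wait_until_ready` are illustrative and not part of vLLM:

```python
import time
import urllib.error
import urllib.request


def client_base_url(host: str = "0.0.0.0", port: int = 18000) -> str:
    """Build the base URL a client on the same machine would use.

    0.0.0.0 is a bind address, not a destination, so it is mapped to localhost.
    """
    target = "localhost" if host in ("0.0.0.0", "::") else host
    return f"http://{target}:{port}/v1"


def wait_until_ready(base_url: str, timeout_s: float = 600.0, poll_s: float = 5.0) -> bool:
    """Poll GET {base_url}/models until it answers 200 or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(f"{base_url}/models", timeout=poll_s) as resp:
                if resp.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            pass  # server is not accepting connections yet
        time.sleep(poll_s)
    return False
```

Calling `wait_until_ready(client_base_url())` before sending the first request avoids connection errors while the weights are still loading; the timeout values are placeholders to adjust for your hardware.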

💬 Chat API Usage

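Once the model is deployed with vLLM, you can chat with Qwen3.6 through the OpenAI-compatible chat API. Thinking mode is enabled by default, unless the server was started with default-chat-template-kwargs set to disable it, as in the optional line of the config file above. The examples below use port 18000 from that config file; the Quick Start command sets no port, so it listens on vLLM's default port 8000.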

Thinking Mode (Default)

from openai import OpenAI

client = OpenAI(base_url="http://localhost:18000/v1", api_key="EMPTY")

messages = [{"role": "user", "content": "Your message here"}]

response = client.chat.completions.create(
    model="NeuralNet-Hub/Qwen3.6-27B-NVFP4",
    messages=messages,
    max_tokens=32768,
    temperature=1.0,
    top_p=0.95,
    extra_body={"top_k": 20},
)
print(response.choices[0].message.content)
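
With `--reasoning-parser qwen3` set, the server separates the model's thinking from its final answer. A minimal sketch for reading both parts, assuming the reasoning text arrives as an extra message attribute named `reasoning_content` or `reasoning`. The attribute name depends on the vLLM release, so both names are assumptions to verify against your server's response; `split_reasoning` is an illustrative helper:

```python
from typing import Any, Optional, Tuple


def split_reasoning(message: Any) -> Tuple[Optional[str], Optional[str]]:
    """Return (reasoning, answer) from a chat completion message object.

    Tries the attribute names listed below in order; reasoning is None
    when the server did not return any (e.g. thinking disabled).
    """
    reasoning = None
    for field in ("reasoning_content", "reasoning"):  # assumed field names
        value = getattr(message, field, None)
        if value:
            reasoning = value
            break
    return reasoning, getattr(message, "content", None)
```

For example, `reasoning, answer = split_reasoning(response.choices[0].message)` lets you log the thinking trace separately from the text shown to the user.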

Non-Thinking (Instruct) Mode

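# Reuses `client` and `messages` from the thinking-mode example.
response = client.chat.completions.create(
    model="NeuralNet-Hub/Qwen3.6-27B-NVFP4",
    messages=messages,
    max_tokens=8192,
    temperature=0.7,
    top_p=0.8,
    presence_penalty=1.5,
    extra_body={
        "top_k": 20,
        # Disable thinking for this request only.
        "chat_template_kwargs": {"enable_thinking": False},
    },
)
print(response.choices[0].message.content)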

Image Input

messages = [
    {
        "role": "user",
        "content": [
            {"type": "image_url", "image_url": {"url": "https://example.com/image.jpg"}},
            {"type": "text", "text": "Describe this image in detail."}
        ]
    }
]

# Reuses `client` from the thinking-mode example.
response = client.chat.completions.create(
    model="NeuralNet-Hub/Qwen3.6-27B-NVFP4",
    messages=messages,
    max_tokens=32768,
    temperature=1.0,
    top_p=0.95,
    extra_body={"top_k": 20},
)
print(response.choices[0].message.content)
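
The image example above references a remote URL. For a local file, one common approach with OpenAI-format APIs is to embed the image as a base64 data URL in the same `image_url` field. The sketch below assumes the server accepts data URLs; `image_to_data_url` and `image_message` are illustrative helper names:

```python
import base64
import mimetypes
from pathlib import Path


def image_to_data_url(path: str) -> str:
    """Encode a local image file as a data URL (data:<mime>;base64,<payload>)."""
    mime, _ = mimetypes.guess_type(path)
    if mime is None or not mime.startswith("image/"):
        raise ValueError(f"cannot infer an image MIME type from {path!r}")
    encoded = base64.b64encode(Path(path).read_bytes()).decode("ascii")
    return f"data:{mime};base64,{encoded}"


def image_message(path: str, prompt: str) -> dict:
    """Build a user message with one local image followed by a text prompt."""
    return {
        "role": "user",
        "content": [
            {"type": "image_url", "image_url": {"url": image_to_data_url(path)}},
            {"type": "text", "text": prompt},
        ],
    }
```

Passing `messages=[image_message("photo.jpg", "Describe this image in detail.")]` to `client.chat.completions.create` then mirrors the remote-URL request; `photo.jpg` is a placeholder path.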

⚙️ Recommended Sampling Parameters

Mode	temperature	top_p	top_k	presence_penalty
Thinking — general tasks	1.0	0.95	20	0.0
Thinking — precise coding	0.6	0.95	20	0.0
Instruct (non-thinking)	0.7	0.80	20	1.5
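
The table above can be captured as presets so that each request picks a consistent set of values. A minimal sketch: the preset keys and the `sampling_kwargs` helper are illustrative names, and `top_k` is passed through `extra_body` as in the chat examples because it is not a standard OpenAI parameter:

```python
SAMPLING_PRESETS = {
    "thinking_general": {"temperature": 1.0, "top_p": 0.95, "top_k": 20, "presence_penalty": 0.0},
    "thinking_coding": {"temperature": 0.6, "top_p": 0.95, "top_k": 20, "presence_penalty": 0.0},
    "instruct": {"temperature": 0.7, "top_p": 0.80, "top_k": 20, "presence_penalty": 1.5},
}


def sampling_kwargs(mode: str) -> dict:
    """Return keyword arguments for client.chat.completions.create()."""
    preset = SAMPLING_PRESETS[mode]  # raises KeyError for an unknown mode
    extra_body = {"top_k": preset["top_k"]}
    if mode == "instruct":
        # Instruct mode also turns thinking off for the request.
        extra_body["chat_template_kwargs"] = {"enable_thinking": False}
    return {
        "temperature": preset["temperature"],
        "top_p": preset["top_p"],
        "presence_penalty": preset["presence_penalty"],
        "extra_body": extra_body,
    }
```

A request then becomes `client.chat.completions.create(model=..., messages=messages, max_tokens=8192, **sampling_kwargs("instruct"))`.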

🔧 Hardware Requirements

Component	Requirement
GPU Architecture	NVIDIA Blackwell (sm_100+)
VRAM	24 GB+ recommended
CUDA	12.8+
vLLM	0.9.0+

NVFP4 is supported only on NVIDIA Blackwell GPUs. Attempting to run this model on Ampere (A100), Ada Lovelace (GeForce RTX 40 series), or Hopper (H100) will fail. For those architectures, use the original BF16 model or an AWQ/GPTQ quantized variant.
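
Before downloading the weights, it can help to check the GPU's compute capability against the sm_100+ requirement in the table. A minimal sketch, assuming PyTorch is installed for the device query. The rule `major >= 10` is a simplification of "sm_100+", and the serving stack may impose further kernel or driver requirements; both function names are illustrative:

```python
from typing import Optional, Tuple


def supports_nvfp4(capability: Tuple[int, int]) -> bool:
    """True if a (major, minor) compute capability is sm_100 or newer."""
    major, _minor = capability
    return major >= 10


def local_gpu_capability() -> Optional[Tuple[int, int]]:
    """Return the first CUDA device's compute capability, or None if unavailable."""
    try:
        import torch  # imported lazily so the pure check works without PyTorch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    return torch.cuda.get_device_capability(0)
```

For instance, Hopper reports (9, 0) and Ada Lovelace reports (8, 9), so `supports_nvfp4` returns False for both.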

🌐 Contact Us

Website: https://neuralnet.solutions
Email: info[at]neuralnet.solutions
