Qwen3.5-4B-W8A8-Dynamic
W8A8 (INT8) quantization of Qwen/Qwen3.5-4B in the compressed-tensors format, ready to serve with vLLM.
- Weights: INT8, per-channel, symmetric (static)
- Activations: INT8, per-token, dynamic
lm_head and the vision tower are kept in BF16.
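The numerics of the scheme listed above can be sketched in plain NumPy. This is a minimal illustration, not the kernel vLLM runs. The function names, the `[-127, 127]` clipping range, and the assumption that activations are quantized symmetrically are choices made for this example and are not taken from the model card.

```python
import numpy as np


def quantize_weights_per_channel(w):
    """Symmetric INT8 quantization with one static scale per output channel (row)."""
    max_abs = np.abs(w).max(axis=1, keepdims=True)
    # Guard against all-zero rows so the division below is always defined.
    scale = np.where(max_abs > 0, max_abs / 127.0, 1.0)
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale


def quantize_activations_per_token(x):
    """Symmetric INT8 quantization with one scale per token (row), computed at run time."""
    max_abs = np.abs(x).max(axis=1, keepdims=True)
    scale = np.where(max_abs > 0, max_abs / 127.0, 1.0)
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale


def w8a8_linear(x, w_q, w_scale):
    """Quantize activations on the fly, multiply in integers, then rescale to float."""
    x_q, x_scale = quantize_activations_per_token(x)
    # Accumulate in int32 so the INT8 products cannot overflow.
    acc = x_q.astype(np.int32) @ w_q.astype(np.int32).T  # shape: (tokens, out_features)
    # x_scale is (tokens, 1) and w_scale.T is (1, out_features); they broadcast.
    return acc.astype(np.float32) * x_scale * w_scale.T


rng = np.random.default_rng(0)
w = rng.standard_normal((16, 64)).astype(np.float32)  # (out_features, in_features)
x = rng.standard_normal((4, 64)).astype(np.float32)   # (tokens, in_features)

w_q, w_scale = quantize_weights_per_channel(w)  # done once, offline ("static")
y_ref = x @ w.T
y_q = w8a8_linear(x, w_q, w_scale)              # activation scales computed per call ("dynamic")

rel_err = np.abs(y_q - y_ref).max() / np.abs(y_ref).max()
print(f"max relative error vs. float matmul: {rel_err:.4f}")
```

The weight scales are fixed when the checkpoint is produced, while the activation scales depend on each input, which is why no calibration of activation ranges is needed at serving time.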
vLLM auto-detects the compressed-tensors quantization_config and serves it through its
CompressedTensorsW8A8Int8 scheme.
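To see what that auto-detection would read, the `quantization_config` block of a checkpoint's `config.json` can be summarized with a few lines of Python. The dictionary below is hand-written for illustration and is not copied from this repository. Its field names follow the compressed-tensors layout as commonly produced, which is an assumption here, and the `ignore` list is left incomplete because the exact module names for the vision tower are not given in this card.

```python
# Illustrative config only: hand-written to match the scheme described in this card,
# NOT copied from this repository's config.json. Field names are an assumption.
example_quantization_config = {
    "quant_method": "compressed-tensors",
    "config_groups": {
        "group_0": {
            "targets": ["Linear"],
            "weights": {
                "num_bits": 8, "type": "int", "symmetric": True,
                "strategy": "channel", "dynamic": False,
            },
            "input_activations": {
                "num_bits": 8, "type": "int", "symmetric": True,
                "strategy": "token", "dynamic": True,
            },
        }
    },
    "ignore": ["lm_head"],  # vision-tower entries omitted: exact names unknown
}


def summarize(qcfg):
    """Return one human-readable line per quantized tensor kind in each config group."""
    lines = []
    for name, group in qcfg.get("config_groups", {}).items():
        for key in ("weights", "input_activations"):
            args = group.get(key)
            if args is None:
                continue
            mode = "dynamic" if args.get("dynamic") else "static"
            lines.append(
                f"{name}.{key}: {args['type']}{args['num_bits']}, {args['strategy']}, {mode}"
            )
    return lines


summary = summarize(example_quantization_config)
for line in summary:
    print(line)
```

For a real checkpoint, the same `summarize` helper (a hypothetical name) could be applied to the `quantization_config` entry loaded from the downloaded `config.json` with `json.load`.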
Serving with vLLM
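from vllm import LLM, SamplingParams

# The compressed-tensors quantization_config is detected automatically,
# so no explicit quantization argument is passed.
llm = LLM(
    model="NotaMG/Qwen3.5-4B-W8A8-Dynamic",
    trust_remote_code=True,
    dtype="bfloat16",  # matches the BF16 parts left unquantized (lm_head, vision tower)
)

# Greedy decoding (temperature=0.0), generating up to 32 new tokens.
out = llm.generate(["The capital of France is"], SamplingParams(temperature=0.0, max_tokens=32))
print(out[0].outputs[0].text)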
License
This model is derived from Qwen/Qwen3.5-4B; refer to the base model for license terms.