Fanar-1-9B-Instruct-GPTQ

GPTQ 4-bit quantized version of QCRI/Fanar-1-9B-Instruct.

Details

  • Quantization: GPTQ 4-bit (w4a16)
  • Size: ~5 GB on disk (vs. ~18 GB for the original BF16 weights)
  • Memory: roughly 75% reduction in GPU memory
  • Quality: 95%+ retention relative to the original model
  • Optimized for: vLLM inference (see the example under Requirements)

Requirements

pip install "vllm>=0.6.0"
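
A minimal offline-inference sketch with vLLM. It assumes the repository id buthainaaa/Fanar-1-9B-Instruct-GPTQ; vLLM can detect the quantization method from the checkpoint's config, so the explicit quantization argument is optional:

from vllm import LLM, SamplingParams

# Load the 4-bit GPTQ checkpoint; quantization="gptq" makes the
# method explicit, though vLLM also reads it from the model config.
llm = LLM(model="buthainaaa/Fanar-1-9B-Instruct-GPTQ", quantization="gptq")

params = SamplingParams(temperature=0.7, max_tokens=256)
outputs = llm.generate(["What is the capital of Qatar?"], params)
print(outputs[0].outputs[0].text)

The same checkpoint can also be exposed through vLLM's OpenAI-compatible server, e.g. vllm serve buthainaaa/Fanar-1-9B-Instruct-GPTQ.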

The model was quantized with GPTQ using domain-specific calibration data.
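
The calibration set itself is not published. The sketch below shows the general GPTQ recipe via the transformers GPTQConfig API, with a placeholder calibration list standing in for the domain-specific data; it is an illustration of the technique, not the exact script used here:

from transformers import AutoModelForCausalLM, AutoTokenizer, GPTQConfig

base_id = "QCRI/Fanar-1-9B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(base_id)

# Placeholder calibration texts; the actual domain-specific set is not published.
calibration = [
    "Example calibration sentence one.",
    "Example calibration sentence two.",
]

quant_config = GPTQConfig(bits=4, group_size=128, dataset=calibration, tokenizer=tokenizer)

# Quantizes the model layer by layer while loading; requires a GPU
# plus the optimum and GPTQ backend packages.
model = AutoModelForCausalLM.from_pretrained(
    base_id, quantization_config=quant_config, device_map="auto"
)

model.save_pretrained("Fanar-1-9B-Instruct-GPTQ")
tokenizer.save_pretrained("Fanar-1-9B-Instruct-GPTQ")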

