# Fanar-1-9B-Instruct-AWQ
AWQ 4-bit quantized version of QCRI/Fanar-1-9B-Instruct.
## Details
- Quantization: AWQ 4-bit (w4a16)
- Size: ~5GB (vs ~18GB original)
- Memory: 75% reduction
- Quality: 95%+ retention
- Optimized for: vLLM inference
## Requirements
pip install "vllm>=0.6.0"
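A minimal inference sketch with vLLM follows; the repo ID below is a placeholder for this card's actual path, and the prompt and sampling settings are purely illustrative:

```python
from vllm import LLM, SamplingParams

# Placeholder repo ID; substitute this card's actual model path.
llm = LLM(model="<namespace>/Fanar-1-9B-Instruct-AWQ", quantization="awq")

# Illustrative sampling settings; tune for your workload.
params = SamplingParams(temperature=0.7, top_p=0.9, max_tokens=256)

outputs = llm.generate(["Briefly introduce the Fanar model family."], params)
print(outputs[0].outputs[0].text)
```

For chat-style prompting, recent vLLM versions also expose `LLM.chat`, which applies the model's chat template automatically.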
The model was quantized with AutoAWQ using domain-specific calibration data.
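As a rough sketch of how such a model is typically produced with AutoAWQ (the 4-bit settings below are common defaults, not necessarily the exact configuration used for this card):

```python
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

model_path = "QCRI/Fanar-1-9B-Instruct"
quant_path = "Fanar-1-9B-Instruct-AWQ"

# Common AutoAWQ 4-bit settings; the exact config used here may differ.
quant_config = {"zero_point": True, "q_group_size": 128, "w_bit": 4, "version": "GEMM"}

model = AutoAWQForCausalLM.from_pretrained(model_path)
tokenizer = AutoTokenizer.from_pretrained(model_path)

# quantize() runs AWQ calibration; a calib_data argument can supply the
# domain-specific samples mentioned above.
model.quantize(tokenizer, quant_config=quant_config)

model.save_quantized(quant_path)
tokenizer.save_pretrained(quant_path)
```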