# sarvam-105b-AWQ

## Model Overview
- Model Architecture: sarvamai/sarvam-105b
- Input: Text
- Output: Text
- Model Optimizations:
  - Weight quantization: AWQ
- Out-of-scope: Use in any manner that violates applicable laws or regulations (including trade compliance laws).
- Version: 1.0
- Model Developers: QuantTrio
This model was quantized with llm-compressor, using sarvamai/indivibe as the calibration dataset.
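For intuition on what AWQ (activation-aware weight quantization) buys over plain round-to-nearest quantization, here is a toy NumPy sketch. It is not the llm-compressor implementation; the layer sizes, the saliency factor, and the `mean|x|**0.5` scaling rule are illustrative assumptions (the last loosely follows the AWQ paper's per-channel scale heuristic).

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy linear layer: activations x (batch, in) and weights W (in, out).
x = rng.normal(size=(256, 64))
x[:, :4] *= 50.0                      # a few "salient" input channels with large activations
W = rng.normal(size=(64, 32))

def quantize_rtn(w, bits=4):
    """Round-to-nearest quantization with one scale per output column."""
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(w).max(axis=0) / qmax
    return np.round(w / scale) * scale

y_ref = x @ W

# Plain 4-bit round-to-nearest.
err_rtn = np.abs(y_ref - x @ quantize_rtn(W)).mean()

# AWQ idea: scale up weight rows that see large activations, quantize the
# scaled weights, then fold the inverse scale back so x @ Wq stays equivalent.
s = np.abs(x).mean(axis=0) ** 0.5     # per-input-channel scale from activation stats
Wq = quantize_rtn(W * s[:, None]) / s[:, None]
err_awq = np.abs(y_ref - x @ Wq).mean()

print(f"RTN error: {err_rtn:.4f}  AWQ-style error: {err_awq:.4f}")
```

The activation-aware variant spends quantization precision on the channels that matter most to the output, so its mean output error comes out lower than plain RTN on this toy example.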
## Deployment

### Use with vLLM
This model can be deployed efficiently using the vLLM backend.
#### 1. Hot-patch (easy)

Run `hotpatch_vllm.py`, which will:

- install vllm==0.15.0
- add two model entries to `registry.py`
- download the model executors for sarvam-105b
#### 2. Run vLLM

```shell
export OMP_NUM_THREADS=4
vllm serve \
  __YOUR_PATH__/QuantTrio/sarvam-105b-AWQ \
  --served-model-name MY_MODEL \
  --swap-space 16 \
  --max-num-seqs 32 \
  --max-model-len 32768 \
  --gpu-memory-utilization 0.9 \
  --tensor-parallel-size 4 \
  --enable-auto-tool-choice \
  --trust-remote-code \
  --host 0.0.0.0 \
  --port 8000
```
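Once the server is up, it exposes an OpenAI-compatible API. The sketch below builds a request for the `/v1/chat/completions` endpoint using only the standard library; `build_chat_request` is a hypothetical helper (not part of vLLM), and the model name must match whatever you passed as `--served-model-name`.

```python
import json
import urllib.request

def build_chat_request(prompt, model="MY_MODEL", base_url="http://localhost:8000"):
    """Build a request against the OpenAI-compatible chat endpoint vLLM serves.

    Hypothetical helper for illustration; `model` must match --served-model-name.
    """
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 256,
    }
    req = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    return req, payload

req, payload = build_chat_request("Namaste! Who are you?")

# With the server running, send it like so:
#   with urllib.request.urlopen(req) as resp:
#       print(json.loads(resp.read())["choices"][0]["message"]["content"])
```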
## Model Files

| File Size | Last Updated |
|---|---|
| 74GiB | 2026-03-12 |
## Logs

**2026-03-12**

1. Initial commit
## Model tree for QuantTrio/sarvam-105b-AWQ

Base model: sarvamai/sarvam-105b