sarvam-30b-AWQ
Model Overview
- Model Architecture: sarvamai/sarvam-30b
- Input: Text
- Output: Text
- Model Optimizations:
- Weight quantization: AWQ
- Out-of-scope: Use in any manner that violates applicable laws or regulations (including trade compliance laws).
- Version: 1.0
- Model Developers: QuantTrio
This model is quantized using llm-compressor. Calibration dataset sarvamai/indivibe Benchmarks compared to bf16 are provided.
Deployment
Use with vLLM
This model can be deployed efficiently using the vLLM backend.
1: Hot-patch (easy)
Run hotpatch_vllm.py
This will do the following:
- install vllm=0.15.0
- add 2 model entries to registry.py
- download the model executors for sarvam-105b and sarvam-30b
2: Run vLLM
export OMP_NUM_THREADS=4
vllm serve
__YOUR_PATH__/QuantTrio/sarvam-30b-AWQ \
--served-model-name MY_MODEL \
--swap-space 16 \
--max-num-seqs 32 \
--max-model-len 32768 \
--gpu-memory-utilization 0.9 \
--tensor-parallel-size 2 \
--enable-auto-tool-choice \
--trust-remote-code \
--host 0.0.0.0 \
--port 8000
Model Files
| File Size | Last Updated |
|---|---|
26GiB |
2026-03-12 |
Logs
2026-03-12
1. Initial commit
Evaluation
| Benchmark | Metric | Config | BF16 (Original) | AWQ (4-bit) | Diff | Recovery |
|---|---|---|---|---|---|---|
| BBH | exact_match | 3-shot | 63.22% | 58.61% | 馃敾 -4.61% | 92.7% |
| GSM8K | strict-match | 5-shot (Direct) | 72.40% | 63.91% | 馃敾 -8.49% | 88.3% |
| GSM8K | flexible-extract | 5-shot (Direct) | 69.90% | 55.72% | 馃敾 -14.18% | 79.7% |
| GSM8K (CoT) | strict-match | 8-shot (CoT) | 72.71% | 76.80% | 馃敽 +4.09% | 105.6% |
| GSM8K (CoT) | flexible-extract | 8-shot (CoT) | 82.41% | 80.14% | 馃敾 -2.27% | 97.2% |
| MMLU | acc | 鈿狅笍 0-shot | 43.40% | 43.86% | 馃敽 +0.46% | 101.1% |
| ARC-Challenge | acc | 鈿狅笍 0-shot | 29.10% | 26.96% | 馃敾 -2.14% | 92.6% |
| HellaSwag | acc | 鈿狅笍 0-shot | 40.67% | 40.29% | 馃敾 -0.38% | 99.1% |
| HellaSwag | acc_norm | 鈿狅笍 0-shot | 51.75% | 50.27% | 馃敾 -1.48% | 97.1% |
| IFEval | inst_level_strict | 0-shot | 32.85% | 32.13% | 馃敾 -0.72% | 97.8% |
| TruthfulQA MC2 | acc | 0-shot | 49.71% | 50.75% | 馃敽 +1.04% | 102.1% |
| Winogrande | acc | 鈿狅笍 0-shot | 51.14% | 49.49% | 馃敾 -1.65% | 96.8% |
- Downloads last month
- 15
Model tree for QuantTrio/sarvam-30b-AWQ
Base model
sarvamai/sarvam-30b