sarvam-30b-AWQ

Model Overview

Model Architecture: sarvamai/sarvam-30b
- Input: Text
- Output: Text
Model Optimizations:
- Weight quantization: AWQ
Out-of-scope: Use in any manner that violates applicable laws or regulations (including trade compliance laws).
Version: 1.0
Model Developers: QuantTrio

This model is quantized using llm-compressor. Calibration dataset sarvamai/indivibe Benchmarks compared to bf16 are provided.

Deployment

Use with vLLM

This model can be deployed efficiently using the vLLM backend.

1: Hot-patch (easy)

Run hotpatch_vllm.py This will do the following:

install vllm=0.15.0
add 2 model entries to registry.py
download the model executors for sarvam-105b and sarvam-30b

2: Run vLLM

export OMP_NUM_THREADS=4

vllm serve 
    __YOUR_PATH__/QuantTrio/sarvam-30b-AWQ \
    --served-model-name MY_MODEL \
    --swap-space 16 \
    --max-num-seqs 32 \
    --max-model-len 32768  \
    --gpu-memory-utilization 0.9 \
    --tensor-parallel-size 2 \
    --enable-auto-tool-choice \
    --trust-remote-code \
    --host 0.0.0.0 \
    --port 8000

Model Files

File Size	Last Updated
`26GiB`	`2026-03-12`

Logs

2026-03-12
1. Initial commit

Evaluation

Benchmark	Metric	Config	BF16 (Original)	AWQ (4-bit)	Diff	Recovery
BBH	exact_match	3-shot	63.22%	58.61%	🔻 -4.61%	92.7%
GSM8K	strict-match	5-shot (Direct)	72.40%	63.91%	🔻 -8.49%	88.3%
GSM8K	flexible-extract	5-shot (Direct)	69.90%	55.72%	🔻 -14.18%	79.7%
GSM8K (CoT)	strict-match	8-shot (CoT)	72.71%	76.80%	🔺 +4.09%	105.6%
GSM8K (CoT)	flexible-extract	8-shot (CoT)	82.41%	80.14%	🔻 -2.27%	97.2%
MMLU	acc	⚠️ 0-shot	43.40%	43.86%	🔺 +0.46%	101.1%
ARC-Challenge	acc	⚠️ 0-shot	29.10%	26.96%	🔻 -2.14%	92.6%
HellaSwag	acc	⚠️ 0-shot	40.67%	40.29%	🔻 -0.38%	99.1%
HellaSwag	acc_norm	⚠️ 0-shot	51.75%	50.27%	🔻 -1.48%	97.1%
IFEval	inst_level_strict	0-shot	32.85%	32.13%	🔻 -0.72%	97.8%
TruthfulQA MC2	acc	0-shot	49.71%	50.75%	🔺 +1.04%	102.1%
Winogrande	acc	⚠️ 0-shot	51.14%	49.49%	🔻 -1.65%	96.8%

Downloads last month: 615

Safetensors

Model size

7B params

Tensor type

F32

I64

I32

Model tree for QuantTrio/sarvam-30b-AWQ

Base model

sarvamai/sarvam-30b

Quantized

(28)

this model