sarvam-30b-AWQ

Model Overview

  • Model Architecture: sarvamai/sarvam-30b
    • Input: Text
    • Output: Text
  • Model Optimizations:
    • Weight quantization: AWQ
  • Out-of-scope: Use in any manner that violates applicable laws or regulations (including trade compliance laws).
  • Version: 1.0
  • Model Developers: QuantTrio

This model is quantized using llm-compressor. Calibration dataset sarvamai/indivibe Benchmarks compared to bf16 are provided.

Deployment

Use with vLLM

This model can be deployed efficiently using the vLLM backend.

1: Hot-patch (easy)

Run hotpatch_vllm.py This will do the following:

  • install vllm=0.15.0
  • add 2 model entries to registry.py
  • download the model executors for sarvam-105b and sarvam-30b

2: Run vLLM

export OMP_NUM_THREADS=4

vllm serve 
    __YOUR_PATH__/QuantTrio/sarvam-30b-AWQ \
    --served-model-name MY_MODEL \
    --swap-space 16 \
    --max-num-seqs 32 \
    --max-model-len 32768  \
    --gpu-memory-utilization 0.9 \
    --tensor-parallel-size 2 \
    --enable-auto-tool-choice \
    --trust-remote-code \
    --host 0.0.0.0 \
    --port 8000

Model Files

File Size Last Updated
26GiB 2026-03-12

Logs

2026-03-12
1. Initial commit

Evaluation

Benchmark Metric Config BF16 (Original) AWQ (4-bit) Diff Recovery
BBH exact_match 3-shot 63.22% 58.61% 馃敾 -4.61% 92.7%
GSM8K strict-match 5-shot (Direct) 72.40% 63.91% 馃敾 -8.49% 88.3%
GSM8K flexible-extract 5-shot (Direct) 69.90% 55.72% 馃敾 -14.18% 79.7%
GSM8K (CoT) strict-match 8-shot (CoT) 72.71% 76.80% 馃敽 +4.09% 105.6%
GSM8K (CoT) flexible-extract 8-shot (CoT) 82.41% 80.14% 馃敾 -2.27% 97.2%
MMLU acc 鈿狅笍 0-shot 43.40% 43.86% 馃敽 +0.46% 101.1%
ARC-Challenge acc 鈿狅笍 0-shot 29.10% 26.96% 馃敾 -2.14% 92.6%
HellaSwag acc 鈿狅笍 0-shot 40.67% 40.29% 馃敾 -0.38% 99.1%
HellaSwag acc_norm 鈿狅笍 0-shot 51.75% 50.27% 馃敾 -1.48% 97.1%
IFEval inst_level_strict 0-shot 32.85% 32.13% 馃敾 -0.72% 97.8%
TruthfulQA MC2 acc 0-shot 49.71% 50.75% 馃敽 +1.04% 102.1%
Winogrande acc 鈿狅笍 0-shot 51.14% 49.49% 馃敾 -1.65% 96.8%
Downloads last month
15
Safetensors
Model size
7B params
Tensor type
F32
I64
I32
Inference Providers NEW
This model isn't deployed by any Inference Provider. 馃檵 Ask for provider support

Model tree for QuantTrio/sarvam-30b-AWQ

Quantized
(9)
this model