DeepSeek-V2-Lite-INT8 (block-wise GPTQ, modelopt)

INT8 post-training quantization of deepseek-ai/DeepSeek-V2-Lite using the Runaraai block-wise GPTQ pipeline.

This checkpoint is in the Stage 7 modelopt format: packed int8 weights plus per-channel float32 scales, with a quantization_config for vLLM / SGLang / TensorRT-LLM. (Stage 5 alone stores the GPTQ-tuned values as BF16 on disk.)

Quantization

| Setting | Value |
|---|---|
| Method | GPTQ, block-wise |
| Storage | INT8 symmetric + per-channel weight_scale (modelopt) |
| Calibration | C4, 512 samples × 4096 tokens |
| Parallel Hessian | ON |
| Finished pack | 2026-06-30 UTC |
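
To make the storage format concrete, here is a minimal sketch of symmetric per-channel INT8 quantization and dequantization. It uses plain round-to-nearest, not GPTQ (GPTQ additionally uses Hessian-based error compensation when choosing the integer values), and it assumes one scale per output channel (row). The function names and the toy weight matrix are hypothetical and are not taken from the Runaraai pipeline or modelopt.

```python
def quantize_per_channel(weight):
    """Symmetric INT8 round-to-nearest with one scale per row (output channel)."""
    q_rows, scales = [], []
    for row in weight:
        max_abs = max(abs(v) for v in row)
        # Map the largest magnitude in the row to 127; guard all-zero rows.
        scale = max_abs / 127.0 if max_abs > 0 else 1.0
        q_rows.append([max(-127, min(127, round(v / scale))) for v in row])
        scales.append(scale)
    return q_rows, scales


def dequantize_per_channel(q_rows, scales):
    """Recover approximate float weights: w ~= q * scale, row by row."""
    return [[q * s for q in row] for row, s in zip(q_rows, scales)]


# Toy 2x3 weight matrix (illustrative values only).
weight = [[0.5, -1.0, 0.25], [0.02, 0.01, -0.04]]
q, scales = quantize_per_channel(weight)
restored = dequantize_per_channel(q, scales)

# Round-to-nearest error is at most half a quantization step per channel.
max_err = max(
    abs(a - b)
    for w_row, r_row in zip(weight, restored)
    for a, b in zip(w_row, r_row)
)
print(q, scales, max_err)
```

Because each row gets its own scale, the second row (small magnitudes) keeps the same relative precision as the first, which is the usual motivation for per-channel rather than per-tensor scales.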

Quality (Δppl vs BF16 baseline, Stage 5/6 eval)

| Dataset | BF16 ppl | INT8 ppl | Δppl |
|---|---|---|---|
| WikiText-2 | 7.224 | 7.225 | +0.001 |
| C4 | 11.409 | 11.403 | −0.006 |
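
As a quick check of the arithmetic, the sketch below recomputes Δppl from the perplexities reported above and also expresses it as a relative change. The helper name is hypothetical; the input numbers are the ones in the table.

```python
def ppl_delta(bf16_ppl, int8_ppl):
    """Return (absolute delta, relative delta in percent) of INT8 vs BF16."""
    delta = int8_ppl - bf16_ppl
    return delta, 100.0 * delta / bf16_ppl


results = {
    "WikiText-2": (7.224, 7.225),
    "C4": (11.409, 11.403),
}

deltas = {name: ppl_delta(bf16, int8) for name, (bf16, int8) in results.items()}
for name, (delta, rel_pct) in deltas.items():
    print(f"{name}: delta={delta:+.3f} ({rel_pct:+.3f}%)")
```

Both changes are well under 0.1% of the baseline perplexity.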

Usage (vLLM)

from vllm import LLM
llm = LLM("abanerjee10/DeepSeek-V2-Lite-INT8", trust_remote_code=True)

Weights on disk are int8, not BF16, so expect roughly half the weight memory of the BF16 source.
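
A rough back-of-the-envelope estimate of that saving, assuming about 16B parameters, 2 bytes per BF16 weight and 1 byte per INT8 weight. It ignores the float32 scales and any tensors left unquantized, so treat the INT8 figure as a lower bound, not a measurement.

```python
def weight_memory_gib(n_params, bytes_per_param):
    """Approximate weight storage in GiB for a given parameter count."""
    return n_params * bytes_per_param / 2**30


N_PARAMS = 16e9  # assumption: rounded parameter count

bf16_gib = weight_memory_gib(N_PARAMS, 2)  # BF16: 2 bytes per weight
int8_gib = weight_memory_gib(N_PARAMS, 1)  # INT8: 1 byte per weight

print(f"BF16: {bf16_gib:.1f} GiB, INT8: {int8_gib:.1f} GiB")
```

KV cache and activations are not included here; they depend on the serving configuration, not on the weight format.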

Provenance

  • Quantized by: Aranya @ Runara
  • Pipeline: Stages 5 (GPTQ) + 7 (modelopt pack)
  • Safetensors: model size 16B params; tensor types BF16, I8, F32