# Chandra GPTQ 4-bit
Chandra-GPTQ-4bit is a 4-bit GPTQ-quantized version of datalab-to/chandra, optimized for efficient inference with vLLM on modern GPUs such as the NVIDIA A100.
This model significantly reduces VRAM usage while maintaining strong OCR and document understanding performance.
## Model Overview
- Base Model: datalab-to/chandra
- Quantization: GPTQ 4-bit
- Precision: W4A16
- Group Size: 128
- Symmetric Quantization: Yes
- Format: GPTQ
- Primary Use: OCR / Image-to-Text / Document Understanding
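The W4A16 / group-size-128 / symmetric settings above describe how the weights are stored. As an illustrative sketch (not the actual GPTQ algorithm, which additionally compensates quantization error layer by layer), symmetric 4-bit group quantization keeps one scale per 128 weights:

```python
import numpy as np

# Sketch of symmetric 4-bit quantization with group size 128, the storage
# scheme this checkpoint uses. Real GPTQ also applies per-layer error
# compensation; this shows only the per-group quantize/dequantize round trip.
rng = np.random.default_rng(0)
w = rng.standard_normal(1024).astype(np.float32)  # toy weight vector

group = 128
levels = 7  # symmetric int4 codes in [-7, 7] (one code left unused)
w_groups = w.reshape(-1, group)
scales = np.abs(w_groups).max(axis=1, keepdims=True) / levels  # fp16 in practice
q = np.clip(np.round(w_groups / scales), -levels, levels).astype(np.int8)
w_hat = (q * scales).reshape(-1)  # dequantized weights used in the W4A16 matmul

max_err = float(np.abs(w - w_hat).max())
print(f"max abs error: {max_err:.4f}")
```

Each weight thus costs 4 bits plus a small per-group scale overhead, which is where the roughly 4x memory saving over FP16 comes from.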
## Capabilities
Chandra is a multimodal OCR model capable of:
- Converting documents to Markdown, HTML, or JSON
- Preserving detailed layout information
- Extracting tables, forms, and structured data
- Handling handwriting
- Reconstructing checkboxes and form elements
- Extracting images and captions
- Supporting 40+ languages
- Understanding complex layouts (math, tables, diagrams)
## Usage with vLLM (Recommended)
Install vLLM:
```bash
pip install vllm
```
Start OpenAI-compatible API server:
```bash
python -m vllm.entrypoints.openai.api_server \
    --host 0.0.0.0 \
    --port $PORT \
    --model kishlay9890/chandra-gptq-4bit \
    --max-model-len 8192 \
    --tensor-parallel-size 1 \
    --gpu-memory-utilization $GPU_UTIL \
    --max-num-batched-tokens 8192 \
    --max-num-seqs 8 \
    --seed 1234
```
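Once the server is up, you can send a document image through the OpenAI-compatible chat endpoint. The sketch below builds such a request; the host, port, prompt wording, and placeholder image bytes are assumptions, so substitute your own server address and a real image:

```python
import base64
import json

# Placeholder bytes standing in for a real document scan; replace with
# open("page.png", "rb").read() in actual use.
image_bytes = b"\x89PNG placeholder"
image_b64 = base64.b64encode(image_bytes).decode("ascii")

payload = {
    "model": "kishlay9890/chandra-gptq-4bit",
    "messages": [
        {
            "role": "user",
            "content": [
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:image/png;base64,{image_b64}"},
                },
                {"type": "text", "text": "Convert this document to Markdown."},
            ],
        }
    ],
    "max_tokens": 2048,
}

# POST this to the running server (host/port assumed), e.g.:
# requests.post("http://localhost:8000/v1/chat/completions", json=payload)
print(json.dumps(payload)[:60])
```

The same payload shape works for HTML or JSON output by changing the text instruction.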
## Hardware Requirements
Recommended:
- NVIDIA A100 (40GB or higher)
- CUDA 12+
- 16–24 GB VRAM minimum for inference
Compared to the FP16 base model, this 4-bit version cuts weight memory to roughly a quarter.
## License
Please refer to the original base model (datalab-to/chandra) for license details.
## Acknowledgements
- Base model: datalab-to/chandra
- Quantization: GPTQ