Chandra FP8

FP8 quantized version of datalab-to/chandra for efficient inference with vLLM.

Quantization

  • Method: FP8 Dynamic (W8A8)
  • Tool: llmcompressor
  • Scheme: Static per-channel weights, dynamic per-token activations
  • Ignored layers: lm_head, visual.*
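For reference, the settings above correspond roughly to an llmcompressor recipe like the following. This is a sketch under assumptions, not the recipe actually used to produce this checkpoint: the stage and modifier layout follows llmcompressor's recipe conventions, and the `re:visual.*` pattern mirrors the ignored layers listed above.

quant_stage:
  quant_modifiers:
    QuantizationModifier:
      targets: ["Linear"]
      scheme: FP8_DYNAMIC
      ignore: ["lm_head", "re:visual.*"]

With FP8_DYNAMIC no calibration data is needed: weight scales are computed once per channel, and activation scales are computed per token at inference time.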

Usage with vLLM

from vllm import LLM

# Loads the FP8 checkpoint directly; vLLM reads the quantization
# config from the checkpoint, so no extra flags are needed.
llm = LLM("richarddavison/chandra-fp8")
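To make the W8A8 scheme from the Quantization section concrete, here is an illustrative pure-Python sketch (not part of the model or vLLM code): each weight row gets one static scale computed offline, while each activation token gets a dynamic scale computed at runtime. FP8 E4M3 represents magnitudes up to 448; real kernels also round values onto the FP8 grid, which is omitted here.

```python
# Illustrative sketch of FP8 Dynamic (W8A8) scaling; values and helper
# names are made up for this example.
F8_E4M3_MAX = 448.0  # largest magnitude representable in FP8 E4M3

def static_channel_scales(weight_rows):
    """One scale per output channel, fixed at quantization time."""
    return [max(abs(w) for w in row) / F8_E4M3_MAX for row in weight_rows]

def dynamic_token_scale(token_activations):
    """One scale per token, computed on the fly at inference."""
    return max(abs(a) for a in token_activations) / F8_E4M3_MAX

def to_fp8_range(values, scale):
    """Divide by the scale so values fit the FP8-representable range."""
    return [v / scale for v in values]

# Toy weight matrix: two output channels, three inputs each.
weights = [[0.02, -0.5, 0.125], [1.0, -2.0, 0.25]]
scales = static_channel_scales(weights)
q_rows = [to_fp8_range(row, s) for row, s in zip(weights, scales)]
```

After scaling, the largest magnitude in each channel lands exactly at the FP8 E4M3 limit, which is the point of per-channel scales: every channel uses the full representable range regardless of its original magnitude.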

Original Model

Chandra is an OCR model that outputs markdown, HTML, and JSON. It is highly accurate at extracting text from images and PDFs, while preserving layout information.

Features

  • Convert documents to markdown, HTML, or JSON with detailed layout information
  • Good handwriting support
  • Reconstructs forms accurately, including checkboxes
  • Good support for tables, math, and complex layouts
  • Extracts images and diagrams, with captions and structured data
  • Support for 40+ languages

Benchmarks

Model                    Overall
Datalab Chandra v0.1.0   83.1
olmOCR v0.3.0            78.5
dots.ocr                 79.1
Mistral OCR API          72.0
GPT-4o (Anchored)        69.9

See the original model card for full details.

Credits

Original model by Datalab.
