Chandra FP8

FP8 quantized version of datalab-to/chandra for efficient inference with vLLM.

Quantization

  • Method: FP8 Dynamic (W8A8)
  • Tool: llmcompressor
  • Scheme: Static per-channel weights, dynamic per-token activations
  • Ignored layers: lm_head, visual.*
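For reference, the settings above correspond roughly to an llmcompressor recipe like the following. This is a sketch under assumptions, not the recipe actually used to produce this checkpoint: the stage and modifier layout follows llmcompressor's recipe conventions, and the `re:visual.*` pattern mirrors the ignored layers listed above.

quant_stage:
  quant_modifiers:
    QuantizationModifier:
      targets: ["Linear"]
      scheme: FP8_DYNAMIC
      ignore: ["lm_head", "re:visual.*"]

With FP8_DYNAMIC no calibration data is needed: weight scales are computed once per channel, and activation scales are computed per token at inference time.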

Usage with vLLM

from vllm import LLM

# Loads the FP8 checkpoint directly; vLLM reads the quantization
# config from the checkpoint, so no extra flags are needed.
llm = LLM("richarddavison/chandra-fp8")
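To make the W8A8 scheme from the Quantization section concrete, here is an illustrative pure-Python sketch (not part of the model or vLLM code): each weight row gets one static scale computed offline, while each activation token gets a dynamic scale computed at runtime. FP8 E4M3 represents magnitudes up to 448; real kernels also round values onto the FP8 grid, which is omitted here.

```python
# Illustrative sketch of FP8 Dynamic (W8A8) scaling; values and helper
# names are made up for this example.
F8_E4M3_MAX = 448.0  # largest magnitude representable in FP8 E4M3

def static_channel_scales(weight_rows):
    """One scale per output channel, fixed at quantization time."""
    return [max(abs(w) for w in row) / F8_E4M3_MAX for row in weight_rows]

def dynamic_token_scale(token_activations):
    """One scale per token, computed on the fly at inference."""
    return max(abs(a) for a in token_activations) / F8_E4M3_MAX

def to_fp8_range(values, scale):
    """Divide by the scale so values fit the FP8-representable range."""
    return [v / scale for v in values]

# Toy weight matrix: two output channels, three inputs each.
weights = [[0.02, -0.5, 0.125], [1.0, -2.0, 0.25]]
scales = static_channel_scales(weights)
q_rows = [to_fp8_range(row, s) for row, s in zip(weights, scales)]
```

After scaling, the largest magnitude in each channel lands exactly at the FP8 E4M3 limit, which is the point of per-channel scales: every channel uses the full representable range regardless of its original magnitude.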

Original Model

Chandra is an OCR model that outputs markdown, HTML, and JSON. It is highly accurate at extracting text from images and PDFs, while preserving layout information.

Features

  • Convert documents to markdown, HTML, or JSON with detailed layout information
  • Good handwriting support
  • Reconstructs forms accurately, including checkboxes
  • Good support for tables, math, and complex layouts
  • Extracts images and diagrams, with captions and structured data
  • Support for 40+ languages

Benchmarks

Model                    Overall
Datalab Chandra v0.1.0   83.1
olmOCR v0.3.0            78.5
dots.ocr                 79.1
Mistral OCR API          72.0
GPT-4o (Anchored)        69.9

See the original model card for full details.

Credits

Original model by Datalab.
