Chandra FP8
FP8 quantized version of datalab-to/chandra for efficient inference with vLLM.
Quantization
- Method: FP8 Dynamic (W8A8)
- Tool: llmcompressor
- Scheme: Static per-channel weights, dynamic per-token activations
- Ignored layers: lm_head, visual.*
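As a rough illustration of what this scheme means, the sketch below computes one scale per weight output channel (done once, offline) and one scale per activation token (recomputed for each input at runtime). It is plain Python for clarity and is not the llmcompressor implementation. The helper names `row_absmax_scales` and `scale_rows` are hypothetical, and the constant 448 assumes the FP8 E4M3 format. The actual rounding to FP8 is omitted.

```python
# Illustrative sketch only: shows how the scales are derived, not how
# llmcompressor or vLLM implement FP8 quantization internally.

FP8_E4M3_MAX = 448.0  # largest finite magnitude in FP8 E4M3 (assumed format)


def row_absmax_scales(matrix):
    """Return one scale per row so that row / scale fits in the FP8 range."""
    scales = []
    for row in matrix:
        amax = max(abs(v) for v in row)
        # Guard against all-zero rows to avoid dividing by zero later.
        scales.append(max(amax, 1e-12) / FP8_E4M3_MAX)
    return scales


def scale_rows(matrix, scales):
    """Divide each row by its scale (the cast to FP8 would follow this step)."""
    return [[v / s for v in row] for row, s in zip(matrix, scales)]


# Weights: rows are output channels. Scales are computed once and stored
# with the checkpoint ("static per-channel").
weight = [[0.5, -2.0], [0.01, 0.04]]
w_scales = row_absmax_scales(weight)

# Activations: rows are tokens. Scales are recomputed for every batch at
# inference time ("dynamic per-token"), so no calibration data is needed.
activations = [[1.0, -3.0], [0.2, 0.1]]
a_scales = row_absmax_scales(activations)

scaled_weight = scale_rows(weight, w_scales)
scaled_activations = scale_rows(activations, a_scales)
```

Because each row gets its own scale, a channel or token with small values keeps its resolution even when another row contains large outliers.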
Usage with vLLM
from vllm import LLM
llm = LLM("richarddavison/chandra-fp8")
Original Model
Chandra is an OCR model that outputs markdown, HTML, and JSON. It is highly accurate at extracting text from images and PDFs, while preserving layout information.
Features
- Convert documents to markdown, HTML, or JSON with detailed layout information
- Good handwriting support
- Reconstructs forms accurately, including checkboxes
- Good support for tables, math, and complex layouts
- Extracts images and diagrams, with captions and structured data
- Support for 40+ languages
Benchmarks
| Model | Overall |
|---|---|
| Datalab Chandra v0.1.0 | 83.1 |
| olmOCR v0.3.0 | 78.5 |
| dots.ocr | 79.1 |
| Mistral OCR API | 72.0 |
| GPT-4o (Anchored) | 69.9 |
See the original model card for full details.
Credits
Original model by Datalab.