---
license: openrail
base_model:
- datalab-to/chandra
language:
- en
pipeline_tag: image-text-to-text
library_name: transformers
tags:
- text-generation-inference
- vllm
- fp8
- quantized
- llm-compressor
- ocr
- vlm
---

# **chandra-FP8-Latest**
> **chandra-FP8-Latest** is an FP8-compressed variant of **datalab-to/chandra**. It uses **BF16 · FP8 (F8_E4M3)** precision formats to significantly reduce the memory footprint and improve inference efficiency while preserving the high-precision OCR and layout-aware reasoning capabilities of the original architecture.
> The result is a highly efficient document intelligence vision-language model optimized for complex parsing, structured output generation, and production-scale deployment.

> [!IMPORTANT]
> FP8 (8-bit floating point) weight and activation quantization, hardware-accelerated on supported GPUs – [FP8 W8A8](https://docs.vllm.ai/en/stable/features/quantization/fp8/). Quantized with the W8A8 FP8-dynamic recipe – [examples](https://github.com/vllm-project/llm-compressor/tree/main/examples/quantization_w8a8_fp8).
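
As a rough sketch of how such a checkpoint is typically produced with `llm-compressor` (the model ID and output path here are illustrative, not the exact command used for this checkpoint):

```python
# Sketch: producing an FP8-dynamic (W8A8) checkpoint with llm-compressor.
# Illustrative recipe only — downloading and quantizing requires the full
# base model weights and a compatible GPU.
from llmcompressor import oneshot
from llmcompressor.modifiers.quantization import QuantizationModifier

recipe = QuantizationModifier(
    targets="Linear",       # quantize all Linear layers...
    scheme="FP8_DYNAMIC",   # ...to FP8 weights + dynamic FP8 activations
    ignore=["lm_head"],     # keep the output head in higher precision
)

oneshot(
    model="datalab-to/chandra",
    recipe=recipe,
    output_dir="chandra-FP8",
)
```

With `FP8_DYNAMIC`, activation scales are computed at runtime, so no calibration dataset is required for the weight-only pass.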
## About the Base Model
**Chandra** from datalab-to is a state-of-the-art open-source OCR vision-language model designed for complex document parsing and high-fidelity layout reconstruction.
It excels at:
* **Handwriting Recognition** across diverse styles
* **Table Structure Preservation**, including merged and nested cells
* **Mathematical Equation Rendering** into clean LaTeX
* **Form Reconstruction** with checkboxes and radio buttons
* **Multi-Column Layout Parsing**
* **40+ Language Support**
* **Precise Bounding Box Extraction** for every text block, table, and image
Chandra outputs structured **Markdown, HTML, or JSON** with layout-aware coordinates, enabling seamless integration into document intelligence pipelines.
It handles challenging real-world inputs such as:
* Doctor notes
* Financial filings
* Invoices
* Textbooks
* Government forms
* Low-quality or messy scanned documents
## What FP8 Adds
The **chandra-FP8-Latest** variant introduces:
* **BF16 · FP8 (F8_E4M3) Compression**: Transformer Engine–based quantization reduces VRAM usage while maintaining OCR precision and layout fidelity.
* **Higher Throughput**: Faster document parsing at scale.
* **Lower Memory Footprint**: Improved deployment feasibility on Hopper-class and compatible GPUs.
* **Production Optimization**: Ideal for high-volume PDF ingestion and enterprise document processing.
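
To make the trade-off concrete, the granularity that the F8_E4M3 format imposes can be illustrated with a toy rounding function (this is *not* how the quantizer is implemented — real FP8 kernels use per-tensor or per-channel scaling — it only shows the format's step sizes):

```python
import math

def quantize_e4m3(x: float) -> float:
    """Round x to the nearest FP8 E4M3 value (1 sign bit, 4 exponent bits,
    3 mantissa bits, bias 7; max normal value 448). Subnormals are ignored
    for simplicity — this is an illustration, not a reference implementation."""
    if x == 0.0:
        return 0.0
    sign = math.copysign(1.0, x)
    ax = abs(x)
    e = math.floor(math.log2(ax))
    e = max(min(e, 8), -6)        # clamp exponent to the normal range
    step = 2.0 ** (e - 3)         # 3 mantissa bits -> 8 steps per binade
    q = round(ax / step) * step
    return sign * min(q, 448.0)   # saturate at the E4M3 maximum

print(quantize_e4m3(1.05))    # -> 1.0   (spacing near 1.0 is 0.125)
print(quantize_e4m3(100.0))   # -> 96.0  (spacing near 100 is 8.0)
print(quantize_e4m3(1000.0))  # -> 448.0 (saturated)
```

The coarse steps are why FP8 is applied to weights and activations with per-tensor scaling, while accumulation stays in higher precision.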
## Deployment Support
Chandra supports:
* **Hugging Face Transformers** for local inference
* **vLLM server deployment** for high-throughput production environments
* Layout-aware prompts such as `"ocr_layout"`
* Configurable `max_output_tokens` up to **8192 per page**
* CLI workflows with environment-based configuration
* Page-range processing for PDFs
This makes it well-suited for enterprise-scale document AI systems.
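
A minimal vLLM launch for this checkpoint might look like the following (the flags shown are illustrative defaults, not a tuned production configuration; vLLM picks up the FP8 scheme from the checkpoint's quantization config):

```shell
# Serve the FP8 checkpoint behind an OpenAI-compatible API.
vllm serve prithivMLmods/chandra-FP8 \
  --max-model-len 16384 \
  --port 8000
```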
## Quick Start with Transformers
```python
from transformers import Qwen3VLForConditionalGeneration, AutoProcessor
from qwen_vl_utils import process_vision_info

# Load the FP8-compressed chandra model
model = Qwen3VLForConditionalGeneration.from_pretrained(
    "prithivMLmods/chandra-FP8",
    torch_dtype="auto",
    device_map="auto",
)
processor = AutoProcessor.from_pretrained("prithivMLmods/chandra-FP8")

messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "image",
                "image": "https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-VL/assets/demo.jpeg",
            },
            {"type": "text", "text": "Analyze the fine-grained details in this image."},
        ],
    }
]

# Build the chat prompt and collect the vision inputs
text = processor.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
image_inputs, video_inputs = process_vision_info(messages)
inputs = processor(
    text=[text],
    images=image_inputs,
    videos=video_inputs,
    padding=True,
    return_tensors="pt",
).to("cuda")

# Generate, then strip the prompt tokens from each output sequence
generated_ids = model.generate(**inputs, max_new_tokens=256)
generated_ids_trimmed = [
    out_ids[len(in_ids):] for in_ids, out_ids in zip(inputs.input_ids, generated_ids)
]
output_text = processor.batch_decode(
    generated_ids_trimmed,
    skip_special_tokens=True,
    clean_up_tokenization_spaces=False,
)
print(output_text)
```
## Intended Use
* High-precision OCR pipelines
* Financial and legal document processing
* Academic and textbook digitization
* Automated form parsing
* Enterprise document intelligence systems
* AI data ingestion pipelines
## License
Licensed under a modified **[OpenRAIL-M](https://huggingface.co/datalab-to/chandra/blob/main/LICENSE)** framework:
* Apache 2.0 for code
* Commercial restrictions for competitors exceeding $2M revenue
Please review the base model license terms before commercial deployment.
## Limitations & Considerations
* FP8 requires compatible GPU hardware for optimal acceleration.
* Extremely low-resolution or heavily degraded scans may still impact recognition quality.
* Users are responsible for ensuring lawful and compliant deployment in regulated environments.