---
license: openrail
base_model:
- datalab-to/chandra
language:
- en
pipeline_tag: image-text-to-text
library_name: transformers
tags:
- text-generation-inference
- vllm
- fp8
- quantized
- llm-compressor
- ocr
- vlm
---
![1](https://cdn-uploads.huggingface.co/production/uploads/65bb837dbfb878f46c77de4c/D6NaXVEc9diE1NThfi1RK.png)
# **chandra-FP8-Latest**
> **chandra-FP8-Latest** is an FP8-compressed variant of **datalab-to/chandra**. It uses **BF16 · FP8 (F8_E4M3)** precision formats to significantly reduce memory footprint and improve inference efficiency while preserving the high-precision OCR and layout-aware reasoning of the original architecture.
> The result is a highly efficient document intelligence vision-language model optimized for complex parsing, structured output generation, and production-scale deployment.
> [!IMPORTANT]
> This model applies FP8 (8-bit floating point) quantization to weights and activations, with hardware acceleration on supported GPUs: [FP8 W8A8](https://docs.vllm.ai/en/stable/features/quantization/fp8/). It was quantized with the llm-compressor W8A8 FP8-dynamic recipe: [examples](https://github.com/vllm-project/llm-compressor/tree/main/examples/quantization_w8a8_fp8).
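For reference, a W8A8 FP8-dynamic pass with llm-compressor follows the shape of the linked examples. This is a sketch, not the published recipe for this checkpoint: the `ignore` list and output directory are assumptions, and running it requires a GPU plus the base model weights.

```python
# Sketch of an FP8-dynamic (W8A8) quantization pass with llm-compressor.
# The exact recipe used to build this checkpoint is not published here,
# so treat the ignore list and output path as assumptions.
from llmcompressor import oneshot
from llmcompressor.modifiers.quantization import QuantizationModifier

recipe = QuantizationModifier(
    targets="Linear",       # quantize all Linear layers...
    scheme="FP8_DYNAMIC",   # ...to FP8 weights with dynamic FP8 activations
    ignore=["lm_head"],     # keep the output head in higher precision
)

oneshot(
    model="datalab-to/chandra",
    recipe=recipe,
    output_dir="chandra-FP8-dynamic",
)
```

FP8-dynamic needs no calibration dataset, since activation scales are computed at runtime rather than baked in during quantization.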
## About the Base Model
**Chandra** from datalab-to is a state-of-the-art open-source OCR vision-language model designed for complex document parsing and high-fidelity layout reconstruction.
It excels at:
* **Handwriting Recognition** across diverse styles
* **Table Structure Preservation**, including merged and nested cells
* **Mathematical Equation Rendering** into clean LaTeX
* **Form Reconstruction** with checkboxes and radio buttons
* **Multi-Column Layout Parsing**
* **40+ Language Support**
* **Precise Bounding Box Extraction** for every text block, table, and image
Chandra outputs structured **Markdown, HTML, or JSON** with layout-aware coordinates, enabling seamless integration into document intelligence pipelines.
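Structured JSON output with coordinates can be consumed with nothing but the standard library. The snippet below is a minimal sketch; the field names (`blocks`, `type`, `bbox`, `text`) are illustrative assumptions, not the model's documented schema, so verify against real output before relying on them.

```python
import json

# Illustrative layout-aware JSON in the spirit of Chandra's structured output.
# The schema here (blocks / type / bbox / text) is an assumption for demonstration.
sample_output = """
{
  "blocks": [
    {"type": "heading", "bbox": [40, 32, 560, 70], "text": "Invoice #1042"},
    {"type": "table", "bbox": [40, 90, 560, 300], "text": "| Item | Qty |"}
  ]
}
"""

doc = json.loads(sample_output)
for block in doc["blocks"]:
    x0, y0, x1, y1 = block["bbox"]
    # Each block carries its own coordinates, so downstream systems can
    # re-anchor extracted text onto the original page image.
    print(f"{block['type']}: ({x0},{y0})-({x1},{y1}) {block['text']!r}")
```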
It handles challenging real-world inputs such as:
* Doctor notes
* Financial filings
* Invoices
* Textbooks
* Government forms
* Low-quality or messy scanned documents
## What FP8 Adds
The **chandra-FP8-Latest** variant introduces:
* **BF16 · FP8 (F8_E4M3) Compression**: llm-compressor-based quantization reduces VRAM usage while maintaining OCR precision and layout fidelity.
* **Higher Throughput**: Faster document parsing at scale.
* **Lower Memory Footprint**: Improved deployment feasibility on Hopper-class and compatible GPUs.
* **Production Optimization**: Ideal for high-volume PDF ingestion and enterprise document processing.
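The memory claim above can be sanity-checked with back-of-the-envelope arithmetic: FP8 stores one byte per parameter versus two for BF16, roughly halving weight memory (activations and KV cache are extra). The 8B parameter count below is a hypothetical figure for illustration, not Chandra's stated size.

```python
def weight_memory_gib(n_params: float, bytes_per_param: float) -> float:
    """Rough weight-only memory footprint; ignores activations and KV cache."""
    return n_params * bytes_per_param / 1024**3

# Hypothetical 8B-parameter model (Chandra's exact size is not stated here).
n = 8e9
bf16 = weight_memory_gib(n, 2)  # BF16: 2 bytes per parameter
fp8 = weight_memory_gib(n, 1)   # FP8: 1 byte per parameter
print(f"BF16 ~ {bf16:.1f} GiB, FP8 ~ {fp8:.1f} GiB")
```

Weight memory halves exactly; real-world savings vary slightly because some layers (e.g. embeddings, `lm_head`) are typically kept in higher precision.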
## Deployment Support
Chandra supports:
* **Hugging Face Transformers** for local inference
* **vLLM server deployment** for high-throughput production environments
* Layout-aware prompts such as `"ocr_layout"`
* Configurable `max_output_tokens` up to **8192 per page**
* CLI workflows with environment-based configuration
* Page-range processing for PDFs
This makes it well-suited for enterprise-scale document AI systems.
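A vLLM deployment along the lines above might look as follows. This is a sketch: the model id mirrors this card, the flags are illustrative, and vLLM detects the FP8 (compressed-tensors) format from the checkpoint config itself.

```shell
# Serve the FP8 checkpoint with an OpenAI-compatible API
# (flags are illustrative; adjust max-model-len to your page budget).
vllm serve prithivMLmods/chandra-FP8-Latest --max-model-len 8192

# Query the running server with an image plus an OCR instruction.
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "prithivMLmods/chandra-FP8-Latest",
        "messages": [{"role": "user", "content": [
          {"type": "image_url", "image_url": {"url": "https://example.com/page.png"}},
          {"type": "text", "text": "OCR this page and preserve the layout."}
        ]}]
      }'
```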
## Quick Start with Transformers
```python
from transformers import Qwen3VLForConditionalGeneration, AutoProcessor
from qwen_vl_utils import process_vision_info

# Load the FP8-compressed chandra model
model = Qwen3VLForConditionalGeneration.from_pretrained(
    "prithivMLmods/chandra-FP8-Latest",
    torch_dtype="auto",
    device_map="auto",
)
processor = AutoProcessor.from_pretrained("prithivMLmods/chandra-FP8-Latest")

messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "image",
                "image": "https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-VL/assets/demo.jpeg",
            },
            {"type": "text", "text": "Analyze the fine-grained details in this image."},
        ],
    }
]

# Build the chat prompt and collect the vision inputs
text = processor.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
image_inputs, video_inputs = process_vision_info(messages)
inputs = processor(
    text=[text],
    images=image_inputs,
    videos=video_inputs,
    padding=True,
    return_tensors="pt",
).to(model.device)

# Generate, then strip the prompt tokens from each sequence before decoding
generated_ids = model.generate(**inputs, max_new_tokens=256)
generated_ids_trimmed = [
    out_ids[len(in_ids):] for in_ids, out_ids in zip(inputs.input_ids, generated_ids)
]
output_text = processor.batch_decode(
    generated_ids_trimmed,
    skip_special_tokens=True,
    clean_up_tokenization_spaces=False,
)
print(output_text)
```
## Intended Use
* High-precision OCR pipelines
* Financial and legal document processing
* Academic and textbook digitization
* Automated form parsing
* Enterprise document intelligence systems
* AI data ingestion pipelines
## License
Licensed under a modified **[OpenRAIL-M](https://huggingface.co/datalab-to/chandra/blob/main/LICENSE)** framework:
* Apache 2.0 for code
* Commercial restrictions for competitors exceeding $2M revenue
Please review the base model license terms before commercial deployment.
## Limitations & Considerations
* FP8 requires compatible GPU hardware for optimal acceleration.
* Extremely low-resolution or heavily degraded scans may still reduce recognition quality.
* Users are responsible for ensuring lawful and compliant deployment in regulated environments.