---
license: openrail
base_model:
- datalab-to/chandra
language:
- en
pipeline_tag: image-text-to-text
library_name: transformers
tags:
- text-generation-inference
- vllm
- fp8
- quantized
- llm-compressor
- ocr
- vlm
---

# **chandra-FP8-Latest**

> **chandra-FP8-Latest** is an FP8-compressed variant of **datalab-to/chandra**. It stores weights in **BF16 · FP8 (F8_E4M3)** precision, which significantly reduces memory footprint and improves inference efficiency while preserving the high-precision OCR and layout-aware reasoning of the original architecture.
> The result is an efficient document-intelligence vision-language model suited to complex parsing, structured output generation, and production-scale deployment.

> [!important]
> FP8 (8-bit floating point) weight and activation quantization with hardware acceleration on GPUs – [FP8 W8A8](https://docs.vllm.ai/en/stable/features/quantization/fp8/). Quantized with the W8A8 FP8-dynamic recipe – [examples](https://github.com/vllm-project/llm-compressor/tree/main/examples/quantization_w8a8_fp8).

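To make the F8_E4M3 format concrete, here is a minimal pure-Python sketch (an illustration only, not the actual quantization code): each value keeps 1 sign, 4 exponent, and 3 mantissa bits, and saturates at the format's maximum finite value of ±448.

```python
import math

def quantize_e4m3(x: float) -> float:
    """Round x to the nearest value representable in OCP FP8 E4M3
    (1 sign, 4 exponent, 3 mantissa bits; max finite value 448).

    Real W8A8 FP8-dynamic quantization also computes a per-tensor
    scale so that activations fit this narrow range; that scaling
    step is omitted here for clarity.
    """
    if x == 0.0:
        return 0.0
    sign = -1.0 if x < 0 else 1.0
    mag = min(abs(x), 448.0)            # saturate to the E4M3 maximum
    # Exponent of the binade containing mag, clamped to the minimum
    # normal exponent (-6); smaller values fall into the subnormal range.
    e = max(math.floor(math.log2(mag)), -6)
    step = 2.0 ** (e - 3)               # 3 mantissa bits -> 8 steps per binade
    return sign * round(mag / step) * step
```

For example, `quantize_e4m3(0.1)` snaps to `0.1015625` (the nearest representable value), and anything beyond ±448 saturates rather than overflowing, since E4M3 has no infinity encoding.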
## About the Base Model

**Chandra** from datalab-to is a state-of-the-art open-source OCR vision-language model designed for complex document parsing and high-fidelity layout reconstruction.

It excels at:

* **Handwriting Recognition** across diverse styles
* **Table Structure Preservation**, including merged and nested cells
* **Mathematical Equation Rendering** into clean LaTeX
* **Form Reconstruction** with checkboxes and radio buttons
* **Multi-Column Layout Parsing**
* **40+ Language Support**
* **Precise Bounding Box Extraction** for every text block, table, and image

Chandra outputs structured **Markdown, HTML, or JSON** with layout-aware coordinates, enabling seamless integration into document intelligence pipelines.

It handles challenging real-world inputs such as:

* Doctor notes
* Financial filings
* Invoices
* Textbooks
* Government forms
* Low-quality or messy scanned documents

## What FP8 Adds

The **chandra-FP8-Latest** variant introduces:

* **BF16 · FP8 (F8_E4M3) Compression**: W8A8 FP8-dynamic quantization (via llm-compressor) reduces VRAM usage while maintaining OCR precision and layout fidelity.
* **Higher Throughput**: Faster document parsing at scale.
* **Lower Memory Footprint**: Improved deployment feasibility on Hopper-class and compatible GPUs.
* **Production Optimization**: Ideal for high-volume PDF ingestion and enterprise document processing.

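A rough back-of-envelope calculation shows where the memory saving comes from: FP8 stores one byte per weight instead of BF16's two. The parameter count below is a hypothetical round number for illustration, not the model's actual size, and the estimate covers weights only.

```python
def weight_memory_gib(n_params: float, bytes_per_param: float) -> float:
    """Approximate weight-only memory in GiB (ignores KV cache,
    activations, and runtime overhead)."""
    return n_params * bytes_per_param / 2**30

# Hypothetical 9e9-parameter model:
# BF16 uses 2 bytes/param, FP8 uses 1 byte/param.
bf16 = weight_memory_gib(9e9, 2)   # about 16.8 GiB
fp8 = weight_memory_gib(9e9, 1)    # about 8.4 GiB
print(f"BF16 ~ {bf16:.1f} GiB, FP8 ~ {fp8:.1f} GiB ({1 - fp8/bf16:.0%} smaller)")
```

Weight memory halves; total VRAM savings in practice are somewhat smaller because activations, the KV cache, and framework overhead are unaffected by weight compression.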
## Deployment Support

Chandra supports:

* **Hugging Face Transformers** for local inference
* **vLLM server deployment** for high-throughput production environments
* Layout-aware prompts such as `"ocr_layout"`
* Configurable `max_output_tokens` up to **8192 per page**
* CLI workflows with environment-based configuration
* Page-range processing for PDFs

This makes it well-suited for enterprise-scale document AI systems.

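Once a vLLM server is running, requests follow the OpenAI-compatible chat schema. The sketch below only constructs the request body; the model name, image URL, and prompt are placeholders, and the exact prompt string your deployment expects may differ.

```python
def build_ocr_request(model: str, image_url: str, prompt: str,
                      max_tokens: int = 8192) -> dict:
    """Build an OpenAI-compatible chat-completions payload for one page."""
    return {
        "model": model,
        "max_tokens": max_tokens,   # up to 8192 output tokens per page
        "messages": [{
            "role": "user",
            "content": [
                {"type": "image_url", "image_url": {"url": image_url}},
                {"type": "text", "text": prompt},
            ],
        }],
    }

payload = build_ocr_request(
    "prithivMLmods/chandra-FP8",          # placeholder model id
    "https://example.com/page-1.png",     # placeholder page image
    "ocr_layout",                         # layout-aware prompt
)
```

The payload would typically be POSTed to the server's `/v1/chat/completions` endpoint with any OpenAI-compatible client.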
## Quick Start with Transformers

```python
from transformers import Qwen3VLForConditionalGeneration, AutoProcessor
from qwen_vl_utils import process_vision_info
import torch

# Load the FP8-compressed chandra model
model = Qwen3VLForConditionalGeneration.from_pretrained(
    "prithivMLmods/chandra-FP8",
    torch_dtype="auto",
    device_map="auto"
)

processor = AutoProcessor.from_pretrained(
    "prithivMLmods/chandra-FP8"
)

messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "image",
                "image": "https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-VL/assets/demo.jpeg",
            },
            {"type": "text", "text": "Analyze the fine-grained details in this image."},
        ],
    }
]

# Build the chat prompt and collect the image/video inputs
text = processor.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
image_inputs, video_inputs = process_vision_info(messages)

inputs = processor(
    text=[text],
    images=image_inputs,
    videos=video_inputs,
    padding=True,
    return_tensors="pt",
).to("cuda")

generated_ids = model.generate(**inputs, max_new_tokens=256)

# Strip the prompt tokens so only the generated answer is decoded
generated_ids_trimmed = [
    out_ids[len(in_ids):] for in_ids, out_ids in zip(inputs.input_ids, generated_ids)
]

output_text = processor.batch_decode(
    generated_ids_trimmed,
    skip_special_tokens=True,
    clean_up_tokenization_spaces=False
)

print(output_text)
```

## Intended Use

* High-precision OCR pipelines
* Financial and legal document processing
* Academic and textbook digitization
* Automated form parsing
* Enterprise document intelligence systems
* AI data ingestion pipelines

## License

Licensed under a modified **[OpenRAIL-M](https://huggingface.co/datalab-to/chandra/blob/main/LICENSE)** framework:

* Apache 2.0 for code
* Commercial restrictions for competitors exceeding $2M in revenue

Please review the base model license terms before commercial deployment.

## Limitations & Considerations

* FP8 requires compatible GPU hardware for optimal acceleration.
* Extremely low-resolution or heavily degraded scans may still impact recognition quality.
* Users are responsible for ensuring lawful and compliant deployment in regulated environments.