---
license: openrail
base_model:
- datalab-to/chandra
language:
- en
pipeline_tag: image-text-to-text
library_name: transformers
tags:
- text-generation-inference
- vllm
- fp8
- quantized
- llm-compressor
- ocr
- vlm
---

![1](https://cdn-uploads.huggingface.co/production/uploads/65bb837dbfb878f46c77de4c/D6NaXVEc9diE1NThfi1RK.png)

# **chandra-FP8-Latest**

> **chandra-FP8-Latest** is an FP8-compressed variant built on top of **datalab-to/chandra**. It uses **BF16 · FP8 (F8_E4M3)** precision formats to significantly reduce the memory footprint and improve inference efficiency while preserving the high-precision OCR and layout-aware reasoning capabilities of the original architecture.
> The result is a highly efficient document-intelligence vision-language model optimized for complex parsing, structured output generation, and production-scale deployment.

> [!important]
> FP8 (8-bit floating point) weight and activation quantization with hardware acceleration on GPUs – [FP8 W8A8](https://docs.vllm.ai/en/stable/features/quantization/fp8/). Quantized with the W8A8 FP8-dynamic recipe – [examples](https://github.com/vllm-project/llm-compressor/tree/main/examples/quantization_w8a8_fp8).

## About the Base Model

**Chandra** from datalab-to is a state-of-the-art open-source OCR vision-language model designed for complex document parsing and high-fidelity layout reconstruction. It excels at:

* **Handwriting Recognition** across diverse styles
* **Table Structure Preservation**, including merged and nested cells
* **Mathematical Equation Rendering** into clean LaTeX
* **Form Reconstruction** with checkboxes and radio buttons
* **Multi-Column Layout Parsing**
* **40+ Language Support**
* **Precise Bounding Box Extraction** for every text block, table, and image

Chandra outputs structured **Markdown, HTML, or JSON** with layout-aware coordinates, enabling seamless integration into document intelligence pipelines.
It handles challenging real-world inputs such as:

* Doctor notes
* Financial filings
* Invoices
* Textbooks
* Government forms
* Low-quality or messy scanned documents

## What FP8 Adds

The **chandra-FP8-Latest** variant introduces:

* **BF16 · FP8 (F8_E4M3) Compression**: Transformer Engine–based quantization reduces VRAM usage while maintaining OCR precision and layout fidelity.
* **Higher Throughput**: Faster document parsing at scale.
* **Lower Memory Footprint**: Improved deployment feasibility on Hopper-class and compatible GPUs.
* **Production Optimization**: Ideal for high-volume PDF ingestion and enterprise document processing.

## Deployment Support

Chandra supports:

* **Hugging Face Transformers** for local inference
* **vLLM server deployment** for high-throughput production environments
* Layout-aware prompts such as `"ocr_layout"`
* Configurable `max_output_tokens` up to **8192 per page**
* CLI workflows with environment-based configuration
* Page-range processing for PDFs

This makes it well-suited for enterprise-scale document AI systems.
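For the vLLM path, the FP8 checkpoint can be served behind an OpenAI-compatible endpoint. A minimal launch sketch (the repo id mirrors the Quick Start below; vLLM reads the quantization config from the checkpoint itself, and `--max-model-len` / `--port` are illustrative values to adjust for your setup):

```shell
# Serve the FP8-compressed checkpoint with an OpenAI-compatible API.
# On FP8-capable GPUs, the quantization scheme is picked up from the
# model's config, so no extra quantization flag is required.
vllm serve prithivMLmods/chandra-FP8 \
  --max-model-len 8192 \
  --port 8000
```

Once running, any OpenAI-compatible client can send page images to `http://localhost:8000/v1/chat/completions` for OCR.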
## Quick Start with Transformers

```python
from transformers import Qwen3VLForConditionalGeneration, AutoProcessor
from qwen_vl_utils import process_vision_info
import torch

# Load the FP8-compressed chandra model
model = Qwen3VLForConditionalGeneration.from_pretrained(
    "prithivMLmods/chandra-FP8",
    torch_dtype="auto",
    device_map="auto"
)
processor = AutoProcessor.from_pretrained("prithivMLmods/chandra-FP8")

messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "image",
                "image": "https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-VL/assets/demo.jpeg",
            },
            {"type": "text", "text": "Analyze the fine-grained details in this image."},
        ],
    }
]

# Build the chat prompt and collect the vision inputs
text = processor.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
image_inputs, video_inputs = process_vision_info(messages)
inputs = processor(
    text=[text],
    images=image_inputs,
    videos=video_inputs,
    padding=True,
    return_tensors="pt",
).to("cuda")

# Generate, then strip the prompt tokens before decoding
generated_ids = model.generate(**inputs, max_new_tokens=256)
generated_ids_trimmed = [
    out_ids[len(in_ids):] for in_ids, out_ids in zip(inputs.input_ids, generated_ids)
]
output_text = processor.batch_decode(
    generated_ids_trimmed, skip_special_tokens=True, clean_up_tokenization_spaces=False
)
print(output_text)
```

## Intended Use

* High-precision OCR pipelines
* Financial and legal document processing
* Academic and textbook digitization
* Automated form parsing
* Enterprise document intelligence systems
* AI data ingestion pipelines

## License

Licensed under a modified **[OpenRAIL-M](https://huggingface.co/datalab-to/chandra/blob/main/LICENSE)** framework:

* Apache 2.0 for code
* Commercial restrictions for competitors exceeding $2M in revenue

Please review the base model license terms before commercial deployment.

## Limitations & Considerations

* FP8 requires compatible GPU hardware for optimal acceleration.
* Extremely low-resolution or heavily degraded scans may still impact recognition quality.
* Users are responsible for ensuring lawful and compliant deployment in regulated environments.
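As a reference for the F8_E4M3 format named throughout this card: it stores each value in a single byte with 1 sign bit, 4 exponent bits (bias 7), and 3 mantissa bits, and the `fn` variant used by FP8 inference stacks drops infinities so the finite range extends to ±448. A stdlib-only sketch of how such a byte decodes (the helper name `decode_e4m3fn` is illustrative, not part of any library API):

```python
# Decode an FP8 E4M3 ("fn" variant) bit pattern:
# 1 sign bit, 4 exponent bits (bias 7), 3 mantissa bits.
# There are no infinities; only exponent=0b1111 with mantissa=0b111
# encodes NaN, so the largest finite encoding is exponent=0b1111,
# mantissa=0b110.

def decode_e4m3fn(bits: int) -> float:
    sign = -1.0 if (bits >> 7) & 1 else 1.0
    exp = (bits >> 3) & 0b1111
    man = bits & 0b111
    if exp == 0b1111 and man == 0b111:
        return float("nan")          # the only NaN pattern per sign
    if exp == 0:                     # subnormal: no implicit leading 1
        return sign * (man / 8) * 2.0 ** (1 - 7)
    return sign * (1 + man / 8) * 2.0 ** (exp - 7)

max_finite = decode_e4m3fn(0b0_1111_110)     # largest finite value
min_subnormal = decode_e4m3fn(0b0_0000_001)  # smallest positive value
print(max_finite)     # 448.0
print(min_subnormal)  # 0.001953125 (2**-9)
```

The narrow range and coarse 3-bit mantissa are why FP8 is paired with per-tensor or per-channel scaling at inference time, as in the W8A8 FP8-dynamic recipe linked above.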