---
license: openrail
base_model:
- datalab-to/chandra
language:
- en
pipeline_tag: image-text-to-text
library_name: transformers
tags:
- text-generation-inference
- vllm
- fp8
- quantized
- llm-compressor
- ocr
- vlm
---

![1](https://cdn-uploads.huggingface.co/production/uploads/65bb837dbfb878f46c77de4c/D6NaXVEc9diE1NThfi1RK.png)

# **chandra-FP8-Latest**

> **chandra-FP8-Latest** is an FP8-compressed evolution built on top of **datalab-to/chandra**. This variant leverages **BF16 · FP8 (F8_E4M3)** precision formats to significantly reduce memory footprint and improve inference efficiency while preserving the high-precision OCR and layout-aware reasoning capabilities of the original architecture.
> The result is a highly efficient document intelligence vision-language model optimized for complex parsing, structured output generation, and production-scale deployment.

> [!important]
> FP8 (8-bit floating point) quantization of both weights and activations, hardware-accelerated on supported GPUs – see [FP8 W8A8](https://docs.vllm.ai/en/stable/features/quantization/fp8/). The model was produced with the W8A8 FP8-dynamic recipe from llm-compressor – see the [examples](https://github.com/vllm-project/llm-compressor/tree/main/examples/quantization_w8a8_fp8).

## About the Base Model

**Chandra** from datalab-to is a state-of-the-art open-source OCR vision-language model designed for complex document parsing and high-fidelity layout reconstruction.

It excels at:

* **Handwriting Recognition** across diverse styles
* **Table Structure Preservation**, including merged and nested cells
* **Mathematical Equation Rendering** into clean LaTeX
* **Form Reconstruction** with checkboxes and radio buttons
* **Multi-Column Layout Parsing**
* **40+ Language Support**
* **Precise Bounding Box Extraction** for every text block, table, and image

Chandra outputs structured **Markdown, HTML, or JSON** with layout-aware coordinates, enabling seamless integration into document intelligence pipelines.
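To make "layout-aware coordinates" concrete, here is a small sketch of consuming such output. The JSON schema below is illustrative only (the field names `blocks`, `type`, `bbox`, and `text` are assumptions, not the documented Chandra schema):

```python
import json

# Hypothetical layout-aware JSON of the kind a document-OCR model emits;
# the real Chandra schema may use different field names.
raw = """
{
  "blocks": [
    {"type": "heading", "bbox": [72, 40, 540, 72], "text": "Invoice #1042"},
    {"type": "table",   "bbox": [72, 100, 540, 320], "text": "| Item | Qty |\\n|---|---|\\n| Widget | 3 |"}
  ]
}
"""

doc = json.loads(raw)

# Walk every block together with its bounding box (x0, y0, x1, y1 in page pixels)
for block in doc["blocks"]:
    x0, y0, x1, y1 = block["bbox"]
    print(f"{block['type']:>8} @ ({x0},{y0})-({x1},{y1}): {block['text'].splitlines()[0]}")
```

Because each block carries both content and coordinates, downstream pipelines can reconstruct reading order, crop regions for re-processing, or map extracted fields back onto the source page.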

It handles challenging real-world inputs such as:

* Doctor notes
* Financial filings
* Invoices
* Textbooks
* Government forms
* Low-quality or messy scanned documents

## What FP8 Adds

The **chandra-FP8-Latest** variant introduces:

* **BF16 · FP8 (F8_E4M3) Compression**: Transformer Engine–based quantization reduces VRAM usage while maintaining OCR precision and layout fidelity.
* **Higher Throughput**: Faster document parsing at scale.
* **Lower Memory Footprint**: Improved deployment feasibility on Hopper-class and compatible GPUs.
* **Production Optimization**: Ideal for high-volume PDF ingestion and enterprise document processing.
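The savings come from storing values in the 8-bit E4M3 format (1 sign bit, 4 exponent bits, 3 mantissa bits, max finite value 448). As a rough illustration of what that precision trade-off means — this is a toy rounding model, not the Transformer Engine kernel — the following sketch snaps a Python float onto the E4M3 grid:

```python
import math

def quantize_e4m3(x: float) -> float:
    """Round x to the nearest value representable in FP8 E4M3 (saturating).

    E4M3: 1 sign bit, 4 exponent bits (bias 7), 3 mantissa bits.
    The grid spacing doubles with each power-of-two interval, so
    relative error stays roughly constant across magnitudes.
    """
    if x == 0.0 or math.isnan(x):
        return x
    sign = math.copysign(1.0, x)
    a = abs(x)
    MAX_E4M3 = 448.0            # (1 + 6/8) * 2**8, the largest finite value
    if a >= MAX_E4M3:
        return sign * MAX_E4M3  # saturate instead of overflowing
    if a < 2**-6:
        # subnormal range: multiples of 2**-9
        return sign * round(a / 2**-9) * 2**-9
    e = math.floor(math.log2(a))
    step = 2**e / 8             # mantissa step within [2**e, 2**(e+1))
    return sign * min(round(a / step) * step, MAX_E4M3)

print(quantize_e4m3(0.3))     # 0.3125 – nearest E4M3 neighbour
print(quantize_e4m3(1000.0))  # 448.0  – saturated at the format maximum
```

In the actual model, per-tensor (or per-channel) scale factors map each tensor's dynamic range onto this grid before rounding, which is why accuracy loss stays small despite only 8 bits per value.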

## Deployment Support

Chandra supports:

* **Hugging Face Transformers** for local inference
* **vLLM server deployment** for high-throughput production environments
* Layout-aware prompts such as `"ocr_layout"`
* Configurable `max_output_tokens` up to **8192 per page**
* CLI workflows with environment-based configuration
* Page-range processing for PDFs

This makes it well-suited for enterprise-scale document AI systems.
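For the vLLM path, the checkpoint can typically be served through vLLM's standard OpenAI-compatible entry point. The flag values below are assumptions sized for a workload producing up to 8192 output tokens per page, not values taken from the Chandra documentation:

```shell
# Launch an OpenAI-compatible vLLM server for the FP8 checkpoint.
# --max-model-len is an assumed budget covering the image tokens plus
# up to ~8192 generated tokens per page; tune it to your GPU memory.
vllm serve prithivMLmods/chandra-FP8 \
  --max-model-len 16384 \
  --port 8000
```

Clients can then send requests to the server's `/v1/chat/completions` endpoint with any OpenAI-compatible SDK.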

## Quick Start with Transformers

```python
from transformers import Qwen3VLForConditionalGeneration, AutoProcessor
from qwen_vl_utils import process_vision_info
import torch

# Load the FP8-compressed chandra model
model = Qwen3VLForConditionalGeneration.from_pretrained(
    "prithivMLmods/chandra-FP8",
    torch_dtype="auto",
    device_map="auto"
)

processor = AutoProcessor.from_pretrained(
    "prithivMLmods/chandra-FP8"
)

messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "image",
                "image": "https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-VL/assets/demo.jpeg",
            },
            {"type": "text", "text": "Analyze the fine-grained details in this image."},
        ],
    }
]

text = processor.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)

image_inputs, video_inputs = process_vision_info(messages)

inputs = processor(
    text=[text],
    images=image_inputs,
    videos=video_inputs,
    padding=True,
    return_tensors="pt",
).to(model.device)

generated_ids = model.generate(**inputs, max_new_tokens=256)

generated_ids_trimmed = [
    out_ids[len(in_ids):] for in_ids, out_ids in zip(inputs.input_ids, generated_ids)
]

output_text = processor.batch_decode(
    generated_ids_trimmed,
    skip_special_tokens=True,
    clean_up_tokenization_spaces=False
)

print(output_text)
```

## Intended Use

* High-precision OCR pipelines
* Financial and legal document processing
* Academic and textbook digitization
* Automated form parsing
* Enterprise document intelligence systems
* AI data ingestion pipelines

## License

Licensed under a modified **[OpenRAIL-M](https://huggingface.co/datalab-to/chandra/blob/main/LICENSE)** framework:

* Apache 2.0 for code
* Commercial-use restrictions for competitors exceeding $2M in revenue

Please review the base model license terms before commercial deployment.

## Limitations & Considerations

* FP8 requires compatible GPU hardware for optimal acceleration.
* Extremely low-resolution or heavily degraded scans may still impact recognition quality.
* Users are responsible for ensuring lawful and compliant deployment in regulated environments.