---
license: openrail
base_model:
- datalab-to/chandra
language:
- en
pipeline_tag: image-text-to-text
library_name: transformers
tags:
- text-generation-inference
- vllm
- fp8
- quantized
- llm-compressor
- ocr
- vlm
---

![1](https://cdn-uploads.huggingface.co/production/uploads/65bb837dbfb878f46c77de4c/D6NaXVEc9diE1NThfi1RK.png)

# **chandra-FP8-Latest**

> **chandra-FP8-Latest** is an FP8-compressed evolution built on top of **datalab-to/chandra**. This variant leverages **BF16 · FP8 (F8_E4M3)** precision formats to significantly reduce memory footprint and improve inference efficiency while preserving the high-precision OCR and layout-aware reasoning capabilities of the original architecture.
> The result is a highly efficient document intelligence vision-language model optimized for complex parsing, structured output generation, and production-scale deployment.

> [!important]
> FP8 (8-bit floating point) quantization of both weights and activations, hardware-accelerated on supported GPUs – see [FP8 W8A8](https://docs.vllm.ai/en/stable/features/quantization/fp8/). The model was produced with the W8A8 FP8-dynamic recipe from llm-compressor – see the [examples](https://github.com/vllm-project/llm-compressor/tree/main/examples/quantization_w8a8_fp8).

## About the Base Model

**Chandra** from datalab-to is a state-of-the-art open-source OCR vision-language model designed for complex document parsing and high-fidelity layout reconstruction.

It excels at:

* **Handwriting Recognition** across diverse styles
* **Table Structure Preservation**, including merged and nested cells
* **Mathematical Equation Rendering** into clean LaTeX
* **Form Reconstruction** with checkboxes and radio buttons
* **Multi-Column Layout Parsing**
* **40+ Language Support**
* **Precise Bounding Box Extraction** for every text block, table, and image

Chandra outputs structured **Markdown, HTML, or JSON** with layout-aware coordinates, enabling seamless integration into document intelligence pipelines.
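To make "layout-aware coordinates" concrete, here is a small sketch of consuming such output. The JSON schema below is illustrative only (the field names `blocks`, `type`, `bbox`, and `text` are assumptions, not the documented Chandra schema):

```python
import json

# Hypothetical layout-aware JSON of the kind a document-OCR model emits;
# the real Chandra schema may use different field names.
raw = """
{
  "blocks": [
    {"type": "heading", "bbox": [72, 40, 540, 72], "text": "Invoice #1042"},
    {"type": "table",   "bbox": [72, 100, 540, 320], "text": "| Item | Qty |\\n|---|---|\\n| Widget | 3 |"}
  ]
}
"""

doc = json.loads(raw)

# Walk every block together with its bounding box (x0, y0, x1, y1 in page pixels)
for block in doc["blocks"]:
    x0, y0, x1, y1 = block["bbox"]
    print(f"{block['type']:>8} @ ({x0},{y0})-({x1},{y1}): {block['text'].splitlines()[0]}")
```

Because each block carries both content and coordinates, downstream pipelines can reconstruct reading order, crop regions for re-processing, or map extracted fields back onto the source page.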

It handles challenging real-world inputs such as:

* Doctor notes
* Financial filings
* Invoices
* Textbooks
* Government forms
* Low-quality or messy scanned documents

## What FP8 Adds

The **chandra-FP8-Latest** variant introduces:

* **BF16 · FP8 (F8_E4M3) Compression**: Transformer Engine–based quantization reduces VRAM usage while maintaining OCR precision and layout fidelity.
* **Higher Throughput**: Faster document parsing at scale.
* **Lower Memory Footprint**: Improved deployment feasibility on Hopper-class and compatible GPUs.
* **Production Optimization**: Ideal for high-volume PDF ingestion and enterprise document processing.
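The savings come from storing values in the 8-bit E4M3 format (1 sign bit, 4 exponent bits, 3 mantissa bits, max finite value 448). As a rough illustration of what that precision trade-off means — this is a toy rounding model, not the Transformer Engine kernel — the following sketch snaps a Python float onto the E4M3 grid:

```python
import math

def quantize_e4m3(x: float) -> float:
    """Round x to the nearest value representable in FP8 E4M3 (saturating).

    E4M3: 1 sign bit, 4 exponent bits (bias 7), 3 mantissa bits.
    The grid spacing doubles with each power-of-two interval, so
    relative error stays roughly constant across magnitudes.
    """
    if x == 0.0 or math.isnan(x):
        return x
    sign = math.copysign(1.0, x)
    a = abs(x)
    MAX_E4M3 = 448.0            # (1 + 6/8) * 2**8, the largest finite value
    if a >= MAX_E4M3:
        return sign * MAX_E4M3  # saturate instead of overflowing
    if a < 2**-6:
        # subnormal range: multiples of 2**-9
        return sign * round(a / 2**-9) * 2**-9
    e = math.floor(math.log2(a))
    step = 2**e / 8             # mantissa step within [2**e, 2**(e+1))
    return sign * min(round(a / step) * step, MAX_E4M3)

print(quantize_e4m3(0.3))     # 0.3125 – nearest E4M3 neighbour
print(quantize_e4m3(1000.0))  # 448.0  – saturated at the format maximum
```

In the actual model, per-tensor (or per-channel) scale factors map each tensor's dynamic range onto this grid before rounding, which is why accuracy loss stays small despite only 8 bits per value.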

## Deployment Support

Chandra supports:

* **Hugging Face Transformers** for local inference
* **vLLM server deployment** for high-throughput production environments
* Layout-aware prompts such as `"ocr_layout"`
* Configurable `max_output_tokens` up to **8192 per page**
* CLI workflows with environment-based configuration
* Page-range processing for PDFs

This makes it well-suited for enterprise-scale document AI systems.
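For the vLLM path, the checkpoint can typically be served through vLLM's standard OpenAI-compatible entry point. The flag values below are assumptions sized for a workload producing up to 8192 output tokens per page, not values taken from the Chandra documentation:

```shell
# Launch an OpenAI-compatible vLLM server for the FP8 checkpoint.
# --max-model-len is an assumed budget covering the image tokens plus
# up to ~8192 generated tokens per page; tune it to your GPU memory.
vllm serve prithivMLmods/chandra-FP8 \
  --max-model-len 16384 \
  --port 8000
```

Clients can then send requests to the server's `/v1/chat/completions` endpoint with any OpenAI-compatible SDK.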

## Quick Start with Transformers

```python
from transformers import Qwen3VLForConditionalGeneration, AutoProcessor
from qwen_vl_utils import process_vision_info
import torch

# Load the FP8-compressed chandra model
model = Qwen3VLForConditionalGeneration.from_pretrained(
    "prithivMLmods/chandra-FP8",
    torch_dtype="auto",
    device_map="auto"
)

processor = AutoProcessor.from_pretrained(
    "prithivMLmods/chandra-FP8"
)

messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "image",
                "image": "https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-VL/assets/demo.jpeg",
            },
            {"type": "text", "text": "Analyze the fine-grained details in this image."},
        ],
    }
]

text = processor.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)

image_inputs, video_inputs = process_vision_info(messages)

inputs = processor(
    text=[text],
    images=image_inputs,
    videos=video_inputs,
    padding=True,
    return_tensors="pt",
).to(model.device)

generated_ids = model.generate(**inputs, max_new_tokens=256)

generated_ids_trimmed = [
    out_ids[len(in_ids):] for in_ids, out_ids in zip(inputs.input_ids, generated_ids)
]

output_text = processor.batch_decode(
    generated_ids_trimmed,
    skip_special_tokens=True,
    clean_up_tokenization_spaces=False
)

print(output_text)
```

## Intended Use

* High-precision OCR pipelines
* Financial and legal document processing
* Academic and textbook digitization
* Automated form parsing
* Enterprise document intelligence systems
* AI data ingestion pipelines

## License

Licensed under a modified **[OpenRAIL-M](https://huggingface.co/datalab-to/chandra/blob/main/LICENSE)** framework:

* Apache 2.0 for code
* Commercial-use restrictions for competitors exceeding $2M in revenue

Please review the base model license terms before commercial deployment.

## Limitations & Considerations

* FP8 requires compatible GPU hardware for optimal acceleration.
* Extremely low-resolution or heavily degraded scans may still impact recognition quality.
* Users are responsible for ensuring lawful and compliant deployment in regulated environments.