  - llm-compressor
  - ocr
  - vlm
---

![1](https://cdn-uploads.huggingface.co/production/uploads/65bb837dbfb878f46c77de4c/D6NaXVEc9diE1NThfi1RK.png)

# **chandra-FP8-Latest**

> **chandra-FP8-Latest** is an FP8-compressed variant built on top of **datalab-to/chandra**. It uses **BF16 · FP8 (F8_E4M3)** precision formats to significantly reduce memory footprint and improve inference efficiency while preserving the high-precision OCR and layout-aware reasoning capabilities of the original architecture.
> The result is a highly efficient document-intelligence vision-language model optimized for complex parsing, structured output generation, and production-scale deployment.

> [!IMPORTANT]
> FP8 (8-bit floating-point) weight and activation quantization with hardware acceleration on supported GPUs – see [FP8 W8A8](https://docs.vllm.ai/en/stable/features/quantization/fp8/). Quantized with the W8A8 FP8-dynamic recipe – see the [llm-compressor examples](https://github.com/vllm-project/llm-compressor/tree/main/examples/quantization_w8a8_fp8).

## About the Base Model

**Chandra** from datalab-to is a state-of-the-art open-source OCR vision-language model designed for complex document parsing and high-fidelity layout reconstruction.

It excels at:

* **Handwriting Recognition** across diverse styles
* **Table Structure Preservation**, including merged and nested cells
* **Mathematical Equation Rendering** into clean LaTeX
* **Form Reconstruction** with checkboxes and radio buttons
* **Multi-Column Layout Parsing**
* **40+ Language Support**
* **Precise Bounding Box Extraction** for every text block, table, and image

Chandra outputs structured **Markdown, HTML, or JSON** with layout-aware coordinates, enabling seamless integration into document intelligence pipelines.

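As an illustration of consuming such layout-aware output, here is a minimal sketch that sorts the blocks of a *hypothetical* JSON payload into reading order by bounding box. The schema below (a list of blocks with `bbox` as `[x0, y0, x1, y1]` and `text`) is an assumption for the example; the actual structure emitted by Chandra may differ.

```python
import json

# Hypothetical layout-aware JSON output: the exact schema is defined by the
# model's structured-output mode -- this shape is illustrative only.
raw = json.dumps([
    {"type": "text",  "bbox": [40, 120, 560, 160], "text": "Second paragraph."},
    {"type": "title", "bbox": [40, 30, 560, 70],   "text": "Invoice #1234"},
    {"type": "text",  "bbox": [40, 80, 560, 110],  "text": "First paragraph."},
])

def blocks_in_reading_order(payload: str) -> list[str]:
    """Sort blocks top-to-bottom, then left-to-right, by bounding-box origin."""
    blocks = json.loads(payload)
    blocks.sort(key=lambda b: (b["bbox"][1], b["bbox"][0]))
    return [b["text"] for b in blocks]

print(blocks_in_reading_order(raw))
# ['Invoice #1234', 'First paragraph.', 'Second paragraph.']
```

A simple (y, x) sort is enough for single-column pages; multi-column layouts need column grouping first.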
It handles challenging real-world inputs such as:

* Doctor notes
* Financial filings
* Invoices
* Textbooks
* Government forms
* Low-quality or messy scanned documents

## What FP8 Adds

The **chandra-FP8-Latest** variant introduces:

* **BF16 · FP8 (F8_E4M3) Compression**: Transformer Engine–based quantization reduces VRAM usage while maintaining OCR precision and layout fidelity.
* **Higher Throughput**: Faster document parsing at scale.
* **Lower Memory Footprint**: Improved deployment feasibility on Hopper-class and other FP8-capable GPUs.
* **Production Optimization**: Ideal for high-volume PDF ingestion and enterprise document processing.

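As a rough back-of-the-envelope for the memory claim: FP8 stores one byte per weight versus two for BF16, so weight-only memory approximately halves. A sketch (the parameter count below is a placeholder, not the actual chandra size, and activations and KV cache are ignored):

```python
def weight_memory_gib(num_params: int, bytes_per_param: float) -> float:
    """Approximate weight-only memory in GiB, ignoring activations/KV cache."""
    return num_params * bytes_per_param / 2**30

# Placeholder parameter count -- not the actual chandra size.
params = 8_000_000_000
bf16 = weight_memory_gib(params, 2.0)  # BF16: 2 bytes per parameter
fp8 = weight_memory_gib(params, 1.0)   # FP8 (F8_E4M3): 1 byte per parameter

print(f"BF16 ~{bf16:.1f} GiB, FP8 ~{fp8:.1f} GiB")  # roughly a 2x reduction
```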
## Deployment Support

Chandra supports:

* **Hugging Face Transformers** for local inference
* **vLLM server deployment** for high-throughput production environments
* Layout-aware prompts such as `"ocr_layout"`
* Configurable `max_output_tokens` up to **8192 per page**
* CLI workflows with environment-based configuration
* Page-range processing for PDFs

This makes it well suited for enterprise-scale document AI systems.

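For the vLLM path, a request can be expressed in the standard OpenAI-compatible chat-completions format. The sketch below only builds the JSON body; the endpoint path and port, and the use of `"ocr_layout"` as a plain text prompt, are assumptions to adapt to your deployment:

```python
import json

def build_ocr_request(model: str, image_url: str, max_tokens: int = 8192) -> str:
    """Build an OpenAI-compatible chat-completions body for a vLLM server."""
    payload = {
        "model": model,
        "max_tokens": max_tokens,  # up to 8192 output tokens per page
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "image_url", "image_url": {"url": image_url}},
                    # Assumed layout-aware prompt; adjust to your setup.
                    {"type": "text", "text": "ocr_layout"},
                ],
            }
        ],
    }
    return json.dumps(payload)

body = build_ocr_request("prithivMLmods/chandra-FP8", "https://example.com/page1.png")
# POST this body to http://<host>:8000/v1/chat/completions on a running vLLM server.
```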
## Quick Start with Transformers

```python
from transformers import Qwen3VLForConditionalGeneration, AutoProcessor
from qwen_vl_utils import process_vision_info
import torch

# Load the FP8-compressed chandra model
model = Qwen3VLForConditionalGeneration.from_pretrained(
    "prithivMLmods/chandra-FP8",
    torch_dtype="auto",
    device_map="auto",
)

processor = AutoProcessor.from_pretrained("prithivMLmods/chandra-FP8")

messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "image",
                "image": "https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-VL/assets/demo.jpeg",
            },
            {"type": "text", "text": "Analyze the fine-grained details in this image."},
        ],
    }
]

# Build the chat prompt and collect the vision inputs
text = processor.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
image_inputs, video_inputs = process_vision_info(messages)

inputs = processor(
    text=[text],
    images=image_inputs,
    videos=video_inputs,
    padding=True,
    return_tensors="pt",
).to("cuda")

# Generate, then decode only the newly produced tokens
generated_ids = model.generate(**inputs, max_new_tokens=256)
generated_ids_trimmed = [
    out_ids[len(in_ids):] for in_ids, out_ids in zip(inputs.input_ids, generated_ids)
]
output_text = processor.batch_decode(
    generated_ids_trimmed,
    skip_special_tokens=True,
    clean_up_tokenization_spaces=False,
)
print(output_text)
```

## Intended Use

* High-precision OCR pipelines
* Financial and legal document processing
* Academic and textbook digitization
* Automated form parsing
* Enterprise document intelligence systems
* AI data ingestion pipelines

## License

Licensed under a modified **[OpenRAIL-M](https://huggingface.co/datalab-to/chandra/blob/main/LICENSE)** framework:

* Apache 2.0 for code
* Commercial restrictions for competitors exceeding $2M in revenue

Please review the base model's license terms before commercial deployment.

## Limitations & Considerations

* FP8 requires compatible GPU hardware for optimal acceleration.
* Extremely low-resolution or heavily degraded scans may still reduce recognition quality.
* Users are responsible for ensuring lawful and compliant deployment in regulated environments.