---
license: openrail
base_model:
- datalab-to/chandra
language:
- en
pipeline_tag: image-text-to-text
library_name: transformers
tags:
- text-generation-inference
- vllm
- fp8
- quantized
- llm-compressor
- ocr
- vlm
---
![1](https://cdn-uploads.huggingface.co/production/uploads/65bb837dbfb878f46c77de4c/D6NaXVEc9diE1NThfi1RK.png)
# **chandra-FP8-Latest**
> **chandra-FP8-Latest** is an FP8-compressed variant of **datalab-to/chandra**. It uses **BF16 · FP8 (F8_E4M3)** precision formats to significantly reduce memory footprint and improve inference efficiency while preserving the high-precision OCR and layout-aware reasoning of the original architecture.
> The result is a highly efficient document intelligence vision-language model optimized for complex parsing, structured output generation, and production-scale deployment.
> [!IMPORTANT]
> This model applies FP8 (8-bit floating point) quantization to weights and activations, with hardware acceleration on supported GPUs: [FP8 W8A8](https://docs.vllm.ai/en/stable/features/quantization/fp8/). It was quantized with the llm-compressor W8A8 FP8-dynamic recipe: [examples](https://github.com/vllm-project/llm-compressor/tree/main/examples/quantization_w8a8_fp8).
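For reference, a W8A8 FP8-dynamic pass with llm-compressor follows the shape of the linked examples. This is a sketch, not the published recipe for this checkpoint: the `ignore` list and output directory are assumptions, and running it requires a GPU plus the base model weights.

```python
# Sketch of an FP8-dynamic (W8A8) quantization pass with llm-compressor.
# The exact recipe used to build this checkpoint is not published here,
# so treat the ignore list and output path as assumptions.
from llmcompressor import oneshot
from llmcompressor.modifiers.quantization import QuantizationModifier

recipe = QuantizationModifier(
    targets="Linear",       # quantize all Linear layers...
    scheme="FP8_DYNAMIC",   # ...to FP8 weights with dynamic FP8 activations
    ignore=["lm_head"],     # keep the output head in higher precision
)

oneshot(
    model="datalab-to/chandra",
    recipe=recipe,
    output_dir="chandra-FP8-dynamic",
)
```

FP8-dynamic needs no calibration dataset, since activation scales are computed at runtime rather than baked in during quantization.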
## About the Base Model
**Chandra** from datalab-to is a state-of-the-art open-source OCR vision-language model designed for complex document parsing and high-fidelity layout reconstruction.
It excels at:
* **Handwriting Recognition** across diverse styles
* **Table Structure Preservation**, including merged and nested cells
* **Mathematical Equation Rendering** into clean LaTeX
* **Form Reconstruction** with checkboxes and radio buttons
* **Multi-Column Layout Parsing**
* **40+ Language Support**
* **Precise Bounding Box Extraction** for every text block, table, and image
Chandra outputs structured **Markdown, HTML, or JSON** with layout-aware coordinates, enabling seamless integration into document intelligence pipelines.
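Structured JSON output with coordinates can be consumed with nothing but the standard library. The snippet below is a minimal sketch; the field names (`blocks`, `type`, `bbox`, `text`) are illustrative assumptions, not the model's documented schema, so verify against real output before relying on them.

```python
import json

# Illustrative layout-aware JSON in the spirit of Chandra's structured output.
# The schema here (blocks / type / bbox / text) is an assumption for demonstration.
sample_output = """
{
  "blocks": [
    {"type": "heading", "bbox": [40, 32, 560, 70], "text": "Invoice #1042"},
    {"type": "table", "bbox": [40, 90, 560, 300], "text": "| Item | Qty |"}
  ]
}
"""

doc = json.loads(sample_output)
for block in doc["blocks"]:
    x0, y0, x1, y1 = block["bbox"]
    # Each block carries its own coordinates, so downstream systems can
    # re-anchor extracted text onto the original page image.
    print(f"{block['type']}: ({x0},{y0})-({x1},{y1}) {block['text']!r}")
```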
It handles challenging real-world inputs such as:
* Doctor notes
* Financial filings
* Invoices
* Textbooks
* Government forms
* Low-quality or messy scanned documents
## What FP8 Adds
The **chandra-FP8-Latest** variant introduces:
* **BF16 · FP8 (F8_E4M3) Compression**: llm-compressor-based quantization reduces VRAM usage while maintaining OCR precision and layout fidelity.
* **Higher Throughput**: Faster document parsing at scale.
* **Lower Memory Footprint**: Improved deployment feasibility on Hopper-class and compatible GPUs.
* **Production Optimization**: Ideal for high-volume PDF ingestion and enterprise document processing.
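The memory claim above can be sanity-checked with back-of-the-envelope arithmetic: FP8 stores one byte per parameter versus two for BF16, roughly halving weight memory (activations and KV cache are extra). The 8B parameter count below is a hypothetical figure for illustration, not Chandra's stated size.

```python
def weight_memory_gib(n_params: float, bytes_per_param: float) -> float:
    """Rough weight-only memory footprint; ignores activations and KV cache."""
    return n_params * bytes_per_param / 1024**3

# Hypothetical 8B-parameter model (Chandra's exact size is not stated here).
n = 8e9
bf16 = weight_memory_gib(n, 2)  # BF16: 2 bytes per parameter
fp8 = weight_memory_gib(n, 1)   # FP8: 1 byte per parameter
print(f"BF16 ~ {bf16:.1f} GiB, FP8 ~ {fp8:.1f} GiB")
```

Weight memory halves exactly; real-world savings vary slightly because some layers (e.g. embeddings, `lm_head`) are typically kept in higher precision.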
## Deployment Support
Chandra supports:
* **Hugging Face Transformers** for local inference
* **vLLM server deployment** for high-throughput production environments
* Layout-aware prompts such as `"ocr_layout"`
* Configurable `max_output_tokens` up to **8192 per page**
* CLI workflows with environment-based configuration
* Page-range processing for PDFs
This makes it well-suited for enterprise-scale document AI systems.
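A vLLM deployment along the lines above might look as follows. This is a sketch: the model id mirrors this card, the flags are illustrative, and vLLM detects the FP8 (compressed-tensors) format from the checkpoint config itself.

```shell
# Serve the FP8 checkpoint with an OpenAI-compatible API
# (flags are illustrative; adjust max-model-len to your page budget).
vllm serve prithivMLmods/chandra-FP8-Latest --max-model-len 8192

# Query the running server with an image plus an OCR instruction.
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "prithivMLmods/chandra-FP8-Latest",
        "messages": [{"role": "user", "content": [
          {"type": "image_url", "image_url": {"url": "https://example.com/page.png"}},
          {"type": "text", "text": "OCR this page and preserve the layout."}
        ]}]
      }'
```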
## Quick Start with Transformers
```python
from transformers import Qwen3VLForConditionalGeneration, AutoProcessor
from qwen_vl_utils import process_vision_info

# Load the FP8-compressed chandra model
model = Qwen3VLForConditionalGeneration.from_pretrained(
    "prithivMLmods/chandra-FP8-Latest",
    torch_dtype="auto",
    device_map="auto",
)
processor = AutoProcessor.from_pretrained("prithivMLmods/chandra-FP8-Latest")

messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "image",
                "image": "https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-VL/assets/demo.jpeg",
            },
            {"type": "text", "text": "Analyze the fine-grained details in this image."},
        ],
    }
]

# Build the chat prompt and collect the vision inputs
text = processor.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
image_inputs, video_inputs = process_vision_info(messages)
inputs = processor(
    text=[text],
    images=image_inputs,
    videos=video_inputs,
    padding=True,
    return_tensors="pt",
).to(model.device)

# Generate, then strip the prompt tokens from each sequence before decoding
generated_ids = model.generate(**inputs, max_new_tokens=256)
generated_ids_trimmed = [
    out_ids[len(in_ids):] for in_ids, out_ids in zip(inputs.input_ids, generated_ids)
]
output_text = processor.batch_decode(
    generated_ids_trimmed,
    skip_special_tokens=True,
    clean_up_tokenization_spaces=False,
)
print(output_text)
```
## Intended Use
* High-precision OCR pipelines
* Financial and legal document processing
* Academic and textbook digitization
* Automated form parsing
* Enterprise document intelligence systems
* AI data ingestion pipelines
## License
Licensed under a modified **[OpenRAIL-M](https://huggingface.co/datalab-to/chandra/blob/main/LICENSE)** framework:
* Apache 2.0 for code
* Commercial restrictions for competitors exceeding $2M revenue
Please review the base model license terms before commercial deployment.
## Limitations & Considerations
* FP8 requires compatible GPU hardware for optimal acceleration.
* Extremely low-resolution or heavily degraded scans may still reduce recognition quality.
* Users are responsible for ensuring lawful and compliant deployment in regulated environments.