prithivMLmods/Nanonets-OCR2-3B-AWQ-nvfp4

@@ -17,7 +17,7 @@ tags:
 # **Nanonets-OCR2-3B-AWQ-nvfp4**
-> Nanonets-OCR2-3B-AWQ-nvfp4 model is an experimental quantized version of the [Nanonets-OCR2-3B](https://huggingface.co/nanonets/Nanonets-OCR2-3B) model, featuring 3 billion parameters and multiple tensor types including F32, BF16, F8_E4M3, and U8, optimized for efficient inference. It is based on the Qwen/Qwen2.5-VL-3B-Instruct base model and fine-tuned on Nanonets-OCR2 data, designed for advanced image-to-markdown OCR tasks such as recognizing LaTeX equations, complex tables, signatures, watermarks, checkboxes, and multilingual handwritten text, outputting documents in structured markdown with intelligent semantic tagging suitable for large language model downstream processing. Despite being experimental and not yet deployed by any inference provider, it supports image-text-to-text processing ideal for complex document workflows involving multipart content types including flowcharts and organizational charts, with applications in business, financial, and multilingual domains. This quantized variant is part of ongoing efforts to enable efficient use of this powerful OCR technology on lighter hardware while maintaining sophisticated extraction capabilities.
+> The Nanonets-OCR2-3B-AWQ-nvfp4 model is an `experimental` quantized version of the [Nanonets-OCR2-3B](https://huggingface.co/nanonets/Nanonets-OCR2-3B) model, with 3 billion parameters and multiple tensor types, including F32, BF16, F8_E4M3, and U8, optimized for efficient inference. It is built on the Qwen/Qwen2.5-VL-3B-Instruct base model, fine-tuned on Nanonets-OCR2 data, and designed for advanced image-to-markdown OCR: it recognizes LaTeX equations, complex tables, signatures, watermarks, checkboxes, and multilingual handwritten text, and outputs documents as structured markdown with semantic tags suited to downstream processing by large language models. Although it is experimental and not yet deployed by any inference provider, it supports image-text-to-text processing for complex document workflows with mixed content such as flowcharts and organizational charts, with applications in business, finance, and multilingual settings. This quantized variant aims to make the model practical on lighter hardware while preserving its extraction capabilities.
 ## Quick Start with Transformers 🤗
@@ -162,4 +162,17 @@ with gr.Blocks(css=css) as demo:
 if __name__ == "__main__":
     demo.queue(max_size=50).launch(debug=True)
-```
+```
+> This model follows the same restrictions and guidelines as the original model, [Nanonets-OCR2-3B](https://huggingface.co/nanonets/Nanonets-OCR2-3B).
+![Screenshot 2025-11-07 at 00-36-37 Gradio](https://cdn-uploads.huggingface.co/production/uploads/65bb837dbfb878f46c77de4c/nUyi_LTHlvoJM_HOvU1Kh.png)
+![Screenshot 2025-11-07 at 00-36-54 Gradio](https://cdn-uploads.huggingface.co/production/uploads/65bb837dbfb878f46c77de4c/9omEpkYn_nYnXOfGKcjSX.png)
+## Model and Resource Links
+| Resource Type | Description | Link |
+|----------------|--------------|------|
+| Original Model Card | Official release of Nanonets-OCR2-3B by Nanonets | [nanonets/Nanonets-OCR2-3B](https://huggingface.co/nanonets/Nanonets-OCR2-3B) |
+| Optimized Model (AWQ-nvfp4) | Quantized version optimized for efficient inference and deployment | [prithivMLmods/Nanonets-OCR2-3B-AWQ-nvfp4](https://huggingface.co/prithivMLmods/Nanonets-OCR2-3B-AWQ-nvfp4) |
+| Demo Space | Interactive demo hosted on Hugging Face Spaces | [Multimodal-OCR3 Demo](https://huggingface.co/spaces/prithivMLmods/Multimodal-OCR3) |