| | --- |
| | license: apache-2.0 |
| | pipeline_tag: image-text-to-text |
| | language: |
| | - en |
| | base_model: |
| | - prithivMLmods/Qwen2-VL-OCR-2B-Instruct |
| | library_name: peft |
| | tags: |
| | - ocr_test |
| | - qwen |
| | - qvq |
| | - kie |
| | - trl |
| | - text-generation-inference |
| | - qwen2_vl |
| | --- |
| | # **QvQ KiE [Key Information Extractor] Adapter for Qwen2-VL-OCR-2B-Instruct** |
| |
|
| | The **QvQ KiE adapter** is a fine-tuned version of the **Qwen/Qwen2-VL-2B-Instruct** model, specifically tailored for tasks involving **Optical Character Recognition (OCR)**, **image-to-text conversion**, and **math problem-solving** with **LaTeX formatting**. This adapter enhances the model’s performance for multi-modal tasks by integrating vision and language capabilities in a conversational framework. |
| |
|
| | # **Key Features** |
| |
|
| | ### 1. **Vision-Language Integration** |
| | - Seamlessly combines **image understanding** with **natural language processing**, enabling accurate image-to-text conversion. |
| | |
| | ### 2. **Optical Character Recognition (OCR)** |
| | - Extracts and processes textual content from images with high precision, making it ideal for document analysis and information extraction. |
| |
|
| | ### 3. **Math and LaTeX Support** |
| | - Efficiently handles complex **math problem-solving**, outputting results in **LaTeX format** for easy integration into scientific and academic workflows. |
| |
|
| | ### 4. **Conversational Capabilities** |
| | - Equipped with multi-turn conversational capabilities, providing context-aware responses during interactions. This makes it suitable for tasks requiring ongoing dialogue and clarification. |
| |
|
| | ### 5. **Image-Text-to-Text Generation** |
| | - Supports input in various forms: |
| | - **Images** |
| | - **Text** |
| | - **Image + Text (multi-modal)** |
| | - Outputs include descriptive or problem-solving text, depending on the input type. |
| |
|
| | ### 6. **Secure Weight Format** |
| | - Utilizes **Safetensors** for fast and secure model weight loading, ensuring both performance and safety during deployment. |
| |
|
| | --- |