Update README.md
README.md
CHANGED
@@ -9,4 +9,38 @@ app_file: app.py
pinned: false
---

Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

# Byaldi + Qwen2VL

## Overview

The **Byaldi + Qwen2VL** app extracts text from images using OCR (Optical Character Recognition) and natural language processing. It uses the **RAGMultiModalModel** from Byaldi to index and search the uploaded images and the **Qwen2VL** model to generate responses based on the extracted text.

The app runs on **ZeroGPU**, specifically an **NVIDIA A100** GPU, which keeps processing fast even for large and complex image inputs.

## Features

- **Image Upload**: Users upload the images from which text is extracted.
- **Text Extraction**: Uses the Byaldi and Qwen2VL models to extract text from the uploaded images.
- **Keyword Search**: Lets users search the extracted text for specific keywords and highlights the matches.
- **High Performance**: Runs on **ZeroGPU (NVIDIA A100)** for accelerated model execution.
- **User-Friendly Interface**: Built with Gradio for an interactive user experience.
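
The keyword search feature listed above can be implemented in several ways, and this README does not show the app's own code. The following is only a minimal sketch, assuming the extracted text is a plain string and that matches are wrapped in HTML `<mark>` tags. The function name `highlight_keyword` is hypothetical and not taken from the app.

```python
import html
import re


def highlight_keyword(text: str, keyword: str) -> str:
    """Return HTML-escaped text with case-insensitive keyword matches in <mark> tags.

    Hypothetical helper for illustration; not the app's actual implementation.
    """
    keyword = keyword.strip()
    if not keyword:
        # Nothing to search for: return the escaped text unchanged.
        return html.escape(text)

    # A single capturing group makes re.split keep the matches:
    # even indices hold unmatched text, odd indices hold the matches.
    pattern = re.compile(f"({re.escape(keyword)})", re.IGNORECASE)
    parts = pattern.split(text)

    # Escape each piece separately so the <mark> tags are the only markup added.
    return "".join(
        f"<mark>{html.escape(part)}</mark>" if i % 2 else html.escape(part)
        for i, part in enumerate(parts)
    )


if __name__ == "__main__":
    print(highlight_keyword("Invoice total: 42 USD", "total"))
```

The returned string could be shown in an HTML component such as Gradio's `gr.HTML`. Escaping each piece before joining keeps OCR output that contains `<` or `&` from being interpreted as markup.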

## Technologies Used

- **Gradio**: For creating the web interface.
- **Byaldi RAGMultiModalModel**: For indexing and searching images.
- **Qwen2VL**: For generating responses based on visual and textual inputs.
- **ZeroGPU**: For efficient model inference using **NVIDIA A100**.
- **PyTorch**: For deep learning functionalities.
- **Pillow**: For image handling.

## Getting Started

### Prerequisites

- Python 3.8 or later
- Required libraries:
```bash
pip install gradio byaldi transformers torch pillow