---
title: Image Text Extractor
emoji: 📄
colorFrom: blue
colorTo: indigo
sdk: docker
sdk_version: 1.28.0
app_file: streamlit_app.py
pinned: false
license: mit
---

# Image Text Extractor

This project is a Streamlit application that uses the `olmOCR` model (based on Qwen2.5-VL) to extract text from images. It provides a user-friendly interface to upload images and view the extracted text along with metadata.

## Features

-   **Image Upload**: Support for PNG, JPG, and JPEG formats.
-   **Text Extraction**: Runs OCR with the `olmOCR` vision-language model.
-   **Metadata Extraction**: Extracts additional information like primary language, rotation, and content type (table, diagram).
-   **JSON Export**: Download extraction results as JSON files.
-   **Configurable**: Adjust maximum token generation for longer documents.
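
As a rough illustration of the metadata and JSON export features, the sketch below bundles extracted text with its metadata and serializes it. The `ExtractionResult` class, its field names, and `to_json` are assumptions made for this example and may not match the structure the app actually produces.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class ExtractionResult:
    """Hypothetical container for one extraction; field names are assumptions."""
    text: str
    primary_language: str
    rotation: int  # assumed to be expressed in degrees
    is_table: bool
    is_diagram: bool


def to_json(result: ExtractionResult) -> str:
    """Serialize a result to pretty-printed JSON, keeping non-ASCII text readable."""
    return json.dumps(asdict(result), ensure_ascii=False, indent=2)


if __name__ == "__main__":
    example = ExtractionResult(
        text="Hello, world",
        primary_language="en",
        rotation=0,
        is_table=False,
        is_diagram=False,
    )
    print(to_json(example))
```

Passing `ensure_ascii=False` keeps extracted text in non-Latin scripts human-readable in the downloaded file instead of escaping it as `\uXXXX` sequences.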

## Installation

1.  **Clone the repository**:
    ```bash
    git clone <repository-url>
    cd image-text-extractor
    ```

2.  **Create a virtual environment** (recommended):
    ```bash
    python -m venv venv
    source venv/bin/activate  # On Windows: venv\Scripts\activate
    ```

3.  **Install dependencies**:
    ```bash
    pip install -r requirements.txt
    ```

## Usage

1.  **Run the Streamlit app**:
    ```bash
    streamlit run streamlit_app.py
    ```

2.  **Open your browser**:
    The app should automatically open in your default browser at `http://localhost:8501`.

## Testing

This project uses `pytest` for unit testing.

1.  **Run tests**:
    ```bash
    pytest tests/
    ```
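
A test in `tests/` might look like the sketch below, which swaps the model-backed service for a mock so the suite runs without loading any weights. The helper `extract_or_empty` and the method name `extract_text` are hypothetical and stand in for whatever the real service exposes.

```python
from unittest.mock import MagicMock


def extract_or_empty(service, image):
    """Hypothetical helper: return extracted text, or "" when no image is given."""
    if image is None:
        return ""
    return service.extract_text(image)


def test_returns_empty_string_without_image():
    service = MagicMock()
    assert extract_or_empty(service, None) == ""
    # The model must not be invoked when there is nothing to process.
    service.extract_text.assert_not_called()


def test_delegates_to_service():
    service = MagicMock()
    service.extract_text.return_value = "hello"
    assert extract_or_empty(service, b"fake-bytes") == "hello"
    service.extract_text.assert_called_once_with(b"fake-bytes")
```

Mocking the service keeps unit tests fast and deterministic; checks that exercise the real model are better kept in a separate, slower test group.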

## Project Structure

-   `streamlit_app.py`: The main entry point for the Streamlit application.
-   `service/`: Contains the backend logic for text extraction.
    -   `text_extraction_service.py`: The core service class handling model interaction.
-   `tests/`: Unit tests for the application.
-   `requirements.txt`: Python dependencies.

## License

MIT, per the `license: mit` field in the front matter of this file.