metadata
title: Image Text Extractor
emoji: 📄
colorFrom: blue
colorTo: indigo
sdk: docker
sdk_version: 1.28.0
app_file: streamlit_app.py
pinned: false
license: mit

Image Text Extractor

This project is a Streamlit application that uses the olmOCR model (based on Qwen2.5-VL) to extract text from images. It provides a user-friendly interface to upload images and view the extracted text along with metadata.

Features

  • Image Upload: Support for PNG, JPG, and JPEG formats.
  • Text Extraction: Uses the olmOCR vision-language model (based on Qwen2.5-VL) for OCR.
  • Metadata Extraction: Extracts additional information like primary language, rotation, and content type (table, diagram).
  • JSON Export: Download extraction results as JSON files.
  • Configurable: Adjust maximum token generation for longer documents.
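
To illustrate the JSON export feature, here is a minimal sketch of how an extraction result could be serialized. The `ExtractionResult` class and its field names are assumptions made for this example; they are not taken from the application's code, and the real schema may differ.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class ExtractionResult:
    # Hypothetical result shape: these field names are assumptions,
    # not the application's actual schema.
    text: str
    primary_language: str
    rotation: int
    is_table: bool
    is_diagram: bool


def to_json(result: ExtractionResult) -> str:
    """Serialize a result to indented JSON, keeping non-ASCII text readable."""
    return json.dumps(asdict(result), ensure_ascii=False, indent=2)


example = ExtractionResult(
    text="Invoice #1024",
    primary_language="en",
    rotation=0,
    is_table=False,
    is_diagram=False,
)
payload = to_json(example)
print(payload)
```

In a Streamlit app, a string like `payload` could be offered for download by passing it as the `data` argument of `st.download_button`; whether this project does exactly that is not stated here.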

Installation

  1. Clone the repository:

    git clone <repository-url>
    cd image-text-extractor
    
  2. Create a virtual environment (recommended):

    python -m venv venv
    source venv/bin/activate  # On Windows: venv\Scripts\activate
    
  3. Install dependencies:

    pip install -r requirements.txt
    

Usage

  1. Run the Streamlit app:

    streamlit run streamlit_app.py
    
  2. Open your browser: The app should automatically open in your default browser at http://localhost:8501.

Testing

This project uses pytest for unit testing.

  1. Run tests:
    pytest tests/
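
For readers new to pytest, the sketch below shows the general shape of a unit test file. The helper `parse_model_output` is hypothetical and exists only for this example; the project's actual tests and function names may look different.

```python
import json


def parse_model_output(raw: str) -> dict:
    """Hypothetical helper: parse a JSON reply, falling back to plain text."""
    try:
        parsed = json.loads(raw)
    except json.JSONDecodeError:
        return {"text": raw}
    # A bare JSON number or string is valid JSON but not the expected object.
    return parsed if isinstance(parsed, dict) else {"text": raw}


def test_valid_json_is_parsed():
    assert parse_model_output('{"text": "hello", "rotation": 90}') == {
        "text": "hello",
        "rotation": 90,
    }


def test_plain_text_falls_back():
    assert parse_model_output("not json") == {"text": "not json"}


def test_non_object_json_falls_back():
    assert parse_model_output("42") == {"text": "42"}
```

pytest collects functions whose names start with `test_` from files named `test_*.py`, so a file like this placed under `tests/` would be picked up by the command above.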
    

Project Structure

  • streamlit_app.py: The main entry point for the Streamlit application.
  • service/: Contains the backend logic for text extraction.
    • text_extraction_service.py: The core service class handling model interaction.
  • tests/: Unit tests for the application.
  • requirements.txt: Python dependencies.
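
One way to picture the split between the UI and the service layer is the sketch below. It is an assumption-laden illustration, not the contents of `text_extraction_service.py`: the `OcrBackend` protocol, the `extract` method, the `max_new_tokens` default, and the `EchoBackend` stand-in are all hypothetical names.

```python
from dataclasses import dataclass
from typing import Protocol


class OcrBackend(Protocol):
    """Hypothetical interface: anything that turns image bytes into text."""

    def generate(self, image_bytes: bytes, max_new_tokens: int) -> str: ...


@dataclass
class TextExtractionService:
    # Hypothetical sketch; the real service class may be structured differently.
    backend: OcrBackend
    max_new_tokens: int = 1024

    def extract(self, image_bytes: bytes) -> str:
        """Validate the input, delegate to the backend, and trim whitespace."""
        if not image_bytes:
            raise ValueError("image_bytes must not be empty")
        return self.backend.generate(image_bytes, self.max_new_tokens).strip()


class EchoBackend:
    """Stand-in backend for local experiments; it does not run a model."""

    def generate(self, image_bytes: bytes, max_new_tokens: int) -> str:
        return f"  {len(image_bytes)} bytes, up to {max_new_tokens} tokens  "


service = TextExtractionService(backend=EchoBackend(), max_new_tokens=256)
print(service.extract(b"\x89PNG"))
```

Injecting the backend this way would let unit tests exercise the service without loading model weights, which is one common reason to keep model interaction behind a separate class.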

License

This project is released under the MIT License, as declared in the metadata above.