Spaces:
Running
Running
metadata
title: Image Text Extractor
emoji: π
colorFrom: blue
colorTo: indigo
sdk: docker
sdk_version: 1.28.0
app_file: streamlit_app.py
pinned: false
license: mit
Image Text Extractor
This project is a Streamlit application that uses the olmOCR model (based on Qwen2.5-VL) to extract text from images. It provides a user-friendly interface to upload images and view the extracted text along with metadata.
Features
- Image Upload: Support for PNG, JPG, and JPEG formats.
- Text Extraction: Uses state-of-the-art Vision-Language Models for accurate OCR.
- Metadata Extraction: Extracts additional information like primary language, rotation, and content type (table, diagram).
- JSON Export: Download extraction results as JSON files.
- Configurable: Adjust maximum token generation for longer documents.
Installation
Clone the repository:
git clone <repository-url> cd image-text-extractorCreate a virtual environment (recommended):
python -m venv venv source venv/bin/activate # On Windows: venv\Scripts\activateInstall dependencies:
pip install -r requirements.txt
Usage
Run the Streamlit app:
streamlit run streamlit_app.pyOpen your browser: The app should automatically open in your default browser at
http://localhost:8501.
Testing
This project uses pytest for unit testing.
- Run tests:
pytest tests/
Project Structure
streamlit_app.py: The main entry point for the Streamlit application.service/: Contains the backend logic for text extraction.text_extraction_service.py: The core service class handling model interaction.
tests/: Unit tests for the application.requirements.txt: Python dependencies.
License
[Add License Here]