---
title: Image Text Extractor
emoji: 📄
colorFrom: blue
colorTo: indigo
sdk: docker
sdk_version: 1.28.0
app_file: streamlit_app.py
pinned: false
license: mit
---

# Image Text Extractor

This project is a Streamlit application that uses the `olmOCR` model (based on Qwen2.5-VL) to extract text from images. It provides a web interface for uploading images and viewing the extracted text along with its metadata.

## Features

- **Image Upload**: Supports PNG, JPG, and JPEG formats.
- **Text Extraction**: Uses the `olmOCR` vision-language model (based on Qwen2.5-VL) for OCR.
- **Metadata Extraction**: Extracts additional information such as primary language, rotation, and content type (table, diagram).
- **JSON Export**: Download extraction results as JSON files.
- **Configurable**: Adjust the maximum number of generated tokens to handle longer documents.

## Installation

1. **Clone the repository**:

   ```bash
   git clone <repository-url>
   cd image-text-extractor
   ```

2. **Create a virtual environment** (recommended):

   ```bash
   python -m venv venv
   source venv/bin/activate  # On Windows: venv\Scripts\activate
   ```

3. **Install dependencies**:

   ```bash
   pip install -r requirements.txt
   ```

## Usage

1. **Run the Streamlit app**:

   ```bash
   streamlit run streamlit_app.py
   ```

2. **Open your browser**: The app should open automatically in your default browser at `http://localhost:8501`. If it does not, open that URL manually.

## Testing

This project uses `pytest` for unit testing.

1. **Run tests**:

   ```bash
   pytest tests/
   ```

## Project Structure

- `streamlit_app.py`: The main entry point for the Streamlit application.
- `service/`: Contains the backend logic for text extraction.
  - `text_extraction_service.py`: The core service class handling model interaction.
- `tests/`: Unit tests for the application.
- `requirements.txt`: Python dependencies.

## License

[Add License Here]
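
## Example: JSON Export Sketch

The JSON export listed under Features can be illustrated with a minimal sketch. It is not taken from the project's code: the `ExtractionResult` class, its field names, and the `to_json` helper are all hypothetical, chosen only to mirror the metadata described above (primary language, rotation, table, diagram). The actual schema produced by the app may differ.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class ExtractionResult:
    """Hypothetical container; field names are illustrative, not the app's real schema."""

    text: str
    primary_language: str
    rotation: int
    is_table: bool
    is_diagram: bool


def to_json(result: ExtractionResult) -> str:
    """Serialize an extraction result to a JSON string."""
    # ensure_ascii=False keeps non-English text readable in the exported file.
    return json.dumps(asdict(result), ensure_ascii=False, indent=2)


# Made-up sample values, used only to show the shape of the output.
example = ExtractionResult(
    text="Sample extracted text",
    primary_language="en",
    rotation=0,
    is_table=False,
    is_diagram=False,
)
payload = to_json(example)
print(payload)
```

In a Streamlit app, a string like `payload` could be passed as the `data` argument of `st.download_button` to offer the result as a file download.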