πΈ OCR Screenshot Reader
Extract text from screenshots of terminal windows, log files, and computer application windows using GOT-OCR 2.0.
Features
- β Reads text from terminal/console screenshots
- β Handles log files with timestamps, error levels, paths
- β Works with any computer window screenshot
- β Supports colored text (ANSI terminal colors)
- β Batch processing (multiple files or entire folders)
- β
Runs on both GPU (
1-2s/image) and CPU (30-60s/image) - β Pipe-friendly stdout output
Model
Uses stepfun-ai/GOT-OCR-2.0-hf:
- 560M parameters β lightweight and fast
- Apache 2.0 license β fully open, commercial use allowed
- Best-in-class for document/text OCR β beats 7B-34B VLMs on plain text extraction
Installation
pip install transformers torch Pillow accelerate
Usage
# Single image
python ocr_reader.py screenshot.png
# Multiple images
python ocr_reader.py image1.png image2.png image3.png
# Process entire folder
python ocr_reader.py ./screenshots/
# Save output to file
python ocr_reader.py screenshot.png --output extracted.txt
# For very long log outputs
python ocr_reader.py long_log.png --max-tokens 8192
# Force CPU even if GPU is available
python ocr_reader.py screenshot.png --device cpu
Performance
| Setup | Speed | Memory |
|---|---|---|
| GPU (CUDA) | ~1-2 sec/image | ~4GB VRAM |
| CPU | ~30-60 sec/image | ~2GB RAM |
Supported Image Formats
PNG, JPG, JPEG, BMP, TIFF, WebP, GIF
Example
Input: A terminal screenshot with log output
Output:
user@server:~$ cat /var/log/syslog | tail -15
2024-03-15 10:23:01 INFO Starting application server v2.4.1
2024-03-15 10:23:02 INFO Loading configuration from /etc/app/config.yml
2024-03-15 10:23:03 INFO Database connection established (pool=5)
2024-03-15 10:23:03 WARN Cache directory /tmp/cache is 85% full
2024-03-15 10:23:04 INFO Listening on port 8080
2024-03-15 10:23:22 ERROR Connection timeout to redis:6379
2024-03-15 10:23:25 INFO Redis reconnected successfully
How It Works
- Loads the GOT-OCR 2.0 vision-language model (downloads automatically on first run)
- Processes your screenshot through the model's vision encoder
- Generates text token-by-token using the language decoder
- Outputs clean extracted text preserving line structure
License
Apache 2.0
Generated by ML Intern
This model repository was generated by ML Intern, an agent for machine learning research and development on the Hugging Face Hub.
- Try ML Intern: https://smolagents-ml-intern.hf.space
- Source code: https://github.com/huggingface/ml-intern
Inference Providers NEW
This model isn't deployed by any Inference Provider. π Ask for provider support