πŸ“Έ OCR Screenshot Reader

Extract text from screenshots of terminal windows, log files, and computer application windows using GOT-OCR 2.0.

Features

  • βœ… Reads text from terminal/console screenshots
  • βœ… Handles log files with timestamps, error levels, paths
  • βœ… Works with any computer window screenshot
  • βœ… Supports colored text (ANSI terminal colors)
  • βœ… Batch processing (multiple files or entire folders)
  • βœ… Runs on both GPU (1-2s/image) and CPU (30-60s/image)
  • βœ… Pipe-friendly stdout output

Model

Uses stepfun-ai/GOT-OCR-2.0-hf:

  • 560M parameters β€” lightweight and fast
  • Apache 2.0 license β€” fully open, commercial use allowed
  • Best-in-class for document/text OCR β€” beats 7B-34B VLMs on plain text extraction

Installation

pip install transformers torch Pillow accelerate

Usage

# Single image
python ocr_reader.py screenshot.png

# Multiple images
python ocr_reader.py image1.png image2.png image3.png

# Process entire folder
python ocr_reader.py ./screenshots/

# Save output to file
python ocr_reader.py screenshot.png --output extracted.txt

# For very long log outputs
python ocr_reader.py long_log.png --max-tokens 8192

# Force CPU even if GPU is available
python ocr_reader.py screenshot.png --device cpu

Performance

Setup Speed Memory
GPU (CUDA) ~1-2 sec/image ~4GB VRAM
CPU ~30-60 sec/image ~2GB RAM

Supported Image Formats

PNG, JPG, JPEG, BMP, TIFF, WebP, GIF

Example

Input: A terminal screenshot with log output

Output:

user@server:~$ cat /var/log/syslog | tail -15
2024-03-15 10:23:01 INFO  Starting application server v2.4.1
2024-03-15 10:23:02 INFO  Loading configuration from /etc/app/config.yml
2024-03-15 10:23:03 INFO  Database connection established (pool=5)
2024-03-15 10:23:03 WARN  Cache directory /tmp/cache is 85% full
2024-03-15 10:23:04 INFO  Listening on port 8080
2024-03-15 10:23:22 ERROR Connection timeout to redis:6379
2024-03-15 10:23:25 INFO  Redis reconnected successfully

How It Works

  1. Loads the GOT-OCR 2.0 vision-language model (downloads automatically on first run)
  2. Processes your screenshot through the model's vision encoder
  3. Generates text token-by-token using the language decoder
  4. Outputs clean extracted text preserving line structure

License

Apache 2.0

Generated by ML Intern

This model repository was generated by ML Intern, an agent for machine learning research and development on the Hugging Face Hub.

Downloads last month

-

Downloads are not tracked for this model. How to track
Inference Providers NEW
This model isn't deployed by any Inference Provider. πŸ™‹ Ask for provider support