Balatro OCR

Fine-tuned PaddleOCR recognition model for extracting game state text from downscaled Balatro gameplay (360p).

The model is trained on UI text cropped from frames downscaled from 1920×1080 gameplay. It is intended for pipelines that reconstruct structured game state from video, typically for imitation learning or behavior cloning.

The repository includes:

  • a trained PaddleOCR inference model
  • predefined UI bounding boxes (text_boxes.json)
  • a reference inference script
  • an example gameplay frame

Repository

balatro-ocr
├── example.png
├── inference
│   ├── inference.pdiparams
│   ├── inference.pdmodel
│   └── inference.yml
├── inference.py
├── text_boxes.json
└── README.md

Example Frame

Example 360p gameplay frame used for OCR.



How It Works

  1. Gameplay frames are downscaled to 360p.
  2. text_boxes.json defines fixed UI regions.
  3. Each region is cropped and passed through the OCR model.
  4. The per-region predictions are assembled into structured game state.

video frame
   ↓
downscale to 360p
   ↓
crop regions (text_boxes.json)
   ↓
Balatro OCR
   ↓
structured game state
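The crop step above can be sketched in plain NumPy. The box format used here (name mapped to `[x, y, w, h]` in 360p pixel coordinates) and the region names are illustrative assumptions, not the actual text_boxes.json schema:

```python
import numpy as np

# Hypothetical box format: name -> [x, y, w, h] in 360p pixel coordinates.
# The real schema and coordinates in text_boxes.json may differ.
EXAMPLE_BOXES = {
    "round_score": [210, 90, 80, 20],
    "dollars": [20, 250, 60, 22],
}

def crop_regions(frame, boxes):
    """Cut each fixed UI region out of a 360p frame (H x W x C array)."""
    crops = {}
    for name, (x, y, w, h) in boxes.items():
        crops[name] = frame[y:y + h, x:x + w]
    return crops

# Each crop would then be passed to the PaddleOCR recognition model.
frame = np.zeros((360, 640, 3), dtype=np.uint8)  # stand-in for a real frame
crops = crop_regions(frame, EXAMPLE_BOXES)
print({name: crop.shape for name, crop in crops.items()})
```

Because the UI layout is fixed, no text detection stage is needed; the same boxes apply to every frame.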

The bounding boxes correspond to important UI elements such as:

  • reroll price
  • round score
  • dollars
  • hand size
  • pack type
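A plausible shape for text_boxes.json is shown below purely as an illustration; the key names and coordinates are assumptions, not the actual file contents:

```json
{
  "reroll_price": [430, 150, 40, 18],
  "round_score": [210, 90, 80, 20],
  "dollars": [20, 250, 60, 22]
}
```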

Installation

pip install paddlepaddle paddleocr opencv-python numpy

The inference script uses internal PaddleOCR utilities, so clone the PaddleOCR repository:

git clone https://github.com/PaddlePaddle/PaddleOCR

Run the script from inside the PaddleOCR directory, or make sure the cloned repository is on your PYTHONPATH.
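If you prefer not to run from inside the clone, one option is to extend PYTHONPATH; the path below assumes PaddleOCR was cloned into the current directory:

```shell
# Assumes the PaddleOCR repo was cloned into the current directory
# (adjust the path if you cloned it elsewhere).
export PYTHONPATH="$PWD/PaddleOCR:$PYTHONPATH"
```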


Usage

Run OCR on the example frame:

python inference.py