WATERec-Models: Strong Baseline for WordArt-Oriented Scene Text Recognition

WATERec is a strong scene text recognition (STR) baseline proposed in the paper "Advancing WordArt-Oriented Scene Text Recognition: Datasets and Methods" (ECCV 2026). It couples a NaViT-like RoPE-ViT encoder, which accepts inputs of arbitrary shape, with an autoregressive (AR) Transformer decoder. This removes the fixed-template input constraint that limits conventional STR models on highly irregular WordArt.

This repository hosts the trained model checkpoints.

📄 Paper (arXiv): https://arxiv.org/abs/2606.24484
💻 Code: https://github.com/YesianRohn/WATER
🧠 Model code (OpenOCR-WATERec): https://github.com/YesianRohn/OpenOCR-WATERec
📦 Datasets (WATER-Data): https://huggingface.co/datasets/Yesianrohn/WATER-Data

Model Architecture

Encoder: 6-layer Transformer with RoPE attention, accepting arbitrary aspect ratios. Inputs are rescaled (aspect-ratio preserving) so the number of 4×4 patch tokens lies in [64, 256]; tokens are projected to d=384 and arranged in row-major order.
Decoder: 2 cross-attention AR Transformer layers, predicting characters one by one under cross-entropy loss. Max text length 25; character set of 94 tokens (digits, letters, common symbols).
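The character-by-character prediction described above can be illustrated with a minimal greedy decoding loop. This is only a sketch under stated assumptions: the `step` callable stands in for the real decoder, the 94-character set is assumed to be digits, ASCII letters, and ASCII punctuation in that order, and the `EOS` index is hypothetical. The actual vocabulary layout and special tokens are defined in the OpenOCR-WATERec configs.

```python
import string
from typing import Callable, List, Sequence

# Assumed 94-character set: 10 digits + 52 letters + 32 punctuation symbols.
CHARSET = string.digits + string.ascii_letters + string.punctuation
EOS = len(CHARSET)  # hypothetical end-of-sequence index, not taken from the repo
MAX_LEN = 25        # maximum text length stated in the model card


def greedy_decode(step: Callable[[List[int]], Sequence[float]],
                  max_len: int = MAX_LEN) -> str:
    """Pick the highest-scoring token at each step until EOS or max_len."""
    prefix: List[int] = []
    for _ in range(max_len):
        scores = step(prefix)  # one score per character, plus one for EOS
        best = max(range(len(scores)), key=scores.__getitem__)
        if best == EOS:
            break
        prefix.append(best)
    return "".join(CHARSET[i] for i in prefix)


def make_dummy_step(word: str) -> Callable[[List[int]], Sequence[float]]:
    """Toy stand-in for the decoder: always 'predicts' the next char of word."""
    ids = [CHARSET.index(c) for c in word] + [EOS]

    def step(prefix: List[int]) -> Sequence[float]:
        scores = [0.0] * (len(CHARSET) + 1)
        scores[ids[min(len(prefix), len(ids) - 1)]] = 1.0
        return scores

    return step
```

In a real pipeline, `step` would run the two decoder layers over the encoder tokens and the current prefix; the toy `make_dummy_step` exists only to make the loop runnable on its own.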

Because the encoder accepts inputs at their native aspect ratio, the model avoids the distortion introduced by fixed-template resizing and adapts better to curved, vertical, and multi-oriented artistic layouts.
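As a rough illustration of the input rule stated for the encoder (aspect-ratio-preserving rescaling so that the number of 4×4 patch tokens falls in [64, 256]), the target size could be computed along the following lines. The rounding and adjustment policy here is an assumption made for the sketch, and `target_size` is a hypothetical helper; the official preprocessing in OpenOCR-WATERec may differ.

```python
import math

PATCH = 4                          # patch side length in pixels
MIN_TOKENS, MAX_TOKENS = 64, 256   # token budget stated in the model card


def target_size(height: int, width: int) -> tuple:
    """Return (new_h, new_w) as multiples of PATCH with a token count in range.

    The aspect ratio is preserved up to rounding to whole patches.
    """
    tokens = (height / PATCH) * (width / PATCH)
    clamped = min(max(tokens, MIN_TOKENS), MAX_TOKENS)
    scale = math.sqrt(clamped / tokens)  # uniform scale for both sides
    rows = max(1, round(height * scale / PATCH))
    cols = max(1, round(width * scale / PATCH))
    # Rounding can leave the count slightly out of range; nudge the longer side.
    while rows * cols > MAX_TOKENS:
        if cols >= rows:
            cols -= 1
        else:
            rows -= 1
    while rows * cols < MIN_TOKENS:
        if cols >= rows:
            cols += 1
        else:
            rows += 1
    return rows * PATCH, cols * PATCH
```

For example, a 64×256 crop holds 1024 patches, so it is scaled by 0.5 to 32×128 (8 rows × 32 columns = 256 tokens), while a 16×16 crop is scaled up to 32×32 (64 tokens). A tall 200×40 crop keeps its vertical orientation rather than being squeezed into a horizontal template.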

Checkpoints

Each file is a standard PyTorch state_dict (~112 MB), differing only in the training data:

File	Training data	WordArt-Bench Acc.
`WATERec-R.pth`	WATER-R (real only, 3.2M)	88.55%
`WATERec-S.pth`	WATER-S (synthetic only, 2M)	80.94%
`WATERec-RS.pth`	WATER-R + WATER-S (3.2M real + 2M synthetic)	90.40%

WATERec-RS.pth is the recommended, best-performing model. It is the first reported result to exceed 90% on WordArt-Bench, and it surpasses both general-purpose and OCR-specialized VLMs by a large margin.

Usage

We recommend running these checkpoints with the official framework OpenOCR-WATERec, which provides the matching model configuration, preprocessing, and inference scripts.

Download the weights:

# Requires: pip install -U "huggingface_hub[cli]"
hf download Yesianrohn/WATERec-Models --local-dir ./WATERec-Models

Load a checkpoint:

import torch

# weights_only=True for safer loading of pickle-based .pth files
state_dict = torch.load("./WATERec-Models/WATERec-RS.pth", map_location="cpu", weights_only=True)
# Build the WATERec model from the OpenOCR-WATERec config, then:
# model.load_state_dict(state_dict)

These .pth files contain only model weights; no config is bundled. Use the configs in the OpenOCR-WATERec repository to instantiate the architecture before loading the state dict.
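Before wiring a checkpoint into a model, it can help to sanity-check what was loaded. The helper below is a small sketch, not part of OpenOCR-WATERec: `summarize_state_dict` is a hypothetical name, and it only relies on the `numel()` and `element_size()` methods that PyTorch tensors provide. If the weights are stored as 32-bit floats (an assumption; the dtype is not stated here), a file of about 112 MB corresponds to roughly 28M parameters.

```python
from typing import Any, Mapping, Tuple


def summarize_state_dict(state_dict: Mapping[str, Any]) -> Tuple[int, float]:
    """Return (total parameter count, total size in MB) for tensor entries.

    Entries that do not look like tensors (no numel method) are skipped.
    """
    total_params = 0
    total_bytes = 0
    for _name, tensor in state_dict.items():
        if not hasattr(tensor, "numel"):
            continue
        n = tensor.numel()
        total_params += n
        total_bytes += n * tensor.element_size()
    return total_params, total_bytes / 1e6


# Usage with a loaded checkpoint (requires torch and the downloaded file):
#   import torch
#   sd = torch.load("WATERec-RS.pth", map_location="cpu", weights_only=True)
#   params, size_mb = summarize_state_dict(sd)
#   print(f"{params / 1e6:.1f}M parameters, {size_mb:.0f} MB")
```

If the reported size is far from the expected ~112 MB, or the keys do not match the model built from the config, the checkpoint and config are probably mismatched.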

License

Released under the Apache 2.0 license.

Citation

If you use these models in your research, please cite our paper:

@inproceedings{water2026eccv,
  title     = {Advancing WordArt-Oriented Scene Text Recognition: Datasets and Methods},
  author    = {Ye, Xingsong and Du, Yongkun and Zhang, Jiaxin and Zhang, Haojie and Sun, Chong and Li, Chen and Lyu, Jing and Chen, Zhineng},
  booktitle = {European Conference on Computer Vision (ECCV)},
  year      = {2026}
}
