Manga OCR Mobile (Preview)

This is a lightweight OCR model built for speed and optimized for mobile/edge devices.

It achieves high-accuracy text recognition while keeping a much smaller footprint than standard models.

Check out the technical docs for more details. Source code will soon be available at the GitHub repo.

Training Details

Pretrained on ~1 million synthetic images generated with cleaned/filtered text:
- 60% anime (the corpus is not public)
- 20% webnovel
- 20% CC100
Fine-tuned on the Manga109-s dataset (random 90% split)
Trained in PyTorch and converted to TFLite with AI Edge Torch
Achieves ~7.4% character error rate (CER) and ~73% exact-match accuracy on a random 10% split of Manga109-s
- Comparable to PaddleOCR-VL-For-Manga, which has ~10% CER and ~70% exact-match accuracy
- The model seems to struggle with English letters and punctuation
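
For readers who want to reproduce metrics like the CER and exact-match figures above, here is a minimal sketch of one common way to compute them. It is not the evaluation script used for this model. It assumes a corpus-level CER (total edit distance divided by total reference characters) and no text normalization, neither of which is stated here. The function names `levenshtein` and `evaluate` are hypothetical.

```python
def levenshtein(ref: str, hyp: str) -> int:
    """Edit distance between two strings (insert, delete, substitute; cost 1 each)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # delete a reference character
                curr[j - 1] + 1,         # insert a hypothesis character
                prev[j - 1] + (r != h),  # substitute, or free match
            ))
        prev = curr
    return prev[-1]


def evaluate(references, predictions):
    """Return (CER, exact-match accuracy) for paired reference/prediction strings.

    Assumption: CER is aggregated over the whole set, not averaged per sample.
    """
    if len(references) != len(predictions):
        raise ValueError("references and predictions must have the same length")
    total_chars = sum(len(r) for r in references)
    if total_chars == 0:
        raise ValueError("references must contain at least one character")
    total_edits = sum(levenshtein(r, p) for r, p in zip(references, predictions))
    exact = sum(r == p for r, p in zip(references, predictions))
    return total_edits / total_chars, exact / len(references)


if __name__ == "__main__":
    refs = ["こんにちは", "ABC"]
    preds = ["こんにちわ", "ABC"]
    cer, acc = evaluate(refs, preds)
    print(f"CER={cer:.3f}, exact-match={acc:.3f}")
```

Averaging CER per sample instead would weight short speech bubbles more heavily, so the two conventions can give noticeably different numbers on the same predictions.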

Acknowledgments

This project makes use of:

Manga109-s dataset
CC-100 dataset - used for synthetic data
webnovels dataset - used for synthetic data

The model builds upon kha-white/manga-ocr, with a significant divergence in deployment focus and data generation.

@inproceedings{wang2024repvit,
  title={RepViT: Revisiting Mobile CNN From ViT Perspective},
  author={Wang, Ao and Chen, Hui and Lin, Zijia and Han, Jungong and Ding, Guiguang},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={15909--15920},
  year={2024}
}

@misc{wang2023repvitsam,
  title={RepViT-SAM: Towards Real-Time Segmenting Anything},
  author={Wang, Ao and Chen, Hui and Lin, Zijia and Han, Jungong and Ding, Guiguang},
  year={2023},
  eprint={2312.05760},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}
