Image-Text-to-Text
Transformers
LiteRT
Japanese
OCR
Manga
10m-parameters

Manga OCR Mobile (Preview)

This model is a lightweight OCR model built for speed and optimized for mobile and edge devices.

It achieves high-accuracy text recognition at roughly 10M parameters, a footprint much smaller than standard OCR models.

Check out the technical docs for more details. Source code will soon be available in the GitHub repo.

Training Details

  • Pretrained on ~1 million synthetic images generated with cleaned/filtered text:
    • 60% anime (the corpus is not public)
    • 20% webnovel
    • 20% CC100
  • Fine-tuned on the Manga109s dataset (random 90% split)
  • Trained in PyTorch and converted to TFLite with AI Edge Torch
  • Achieves ~7.4% CER (character error rate) and ~73% exact-match accuracy on the remaining random 10% split of Manga109s
    • Comparable to PaddleOCR-VL-For-Manga, which reaches ~10% CER and ~70% exact-match accuracy
    • The model seems to struggle with English letters and punctuation
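
For readers unfamiliar with the two metrics above, here is a minimal sketch of how CER and exact-match accuracy are commonly computed. It is not the evaluation script used for this model: the function names `edit_distance` and `evaluate` are hypothetical, and the sketch assumes a corpus-level CER (total character edits divided by total reference characters), which may differ from a per-sample average.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two strings, counted in characters."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # delete a reference character
                curr[j - 1] + 1,     # insert a predicted character
                prev[j - 1] + cost,  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


def evaluate(pairs):
    """Return (CER, exact-match accuracy) for (reference, prediction) pairs.

    Assumption: CER is aggregated over the whole corpus rather than
    averaged per sample.
    """
    pairs = list(pairs)
    total_edits = sum(edit_distance(ref, hyp) for ref, hyp in pairs)
    total_chars = sum(len(ref) for ref, _ in pairs)
    exact = sum(ref == hyp for ref, hyp in pairs)
    return total_edits / total_chars, exact / len(pairs)


# Made-up examples, not taken from Manga109s.
samples = [
    ("こんにちは", "こんにちは"),  # exact match
    ("ありがとう", "ありがとお"),  # one substituted character
    ("OK!", "0K"),                  # one substitution, one dropped character
]
cer, exact_match = evaluate(samples)
print(f"CER: {cer:.3f}, exact match: {exact_match:.3f}")
```

Because exact match counts a prediction as wrong after a single character error, it is much stricter than CER; this is why a ~7.4% CER can coexist with ~73% exact-match accuracy.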

Acknowledgments

This project was built using the following work:

The model builds upon kha-white/manga-ocr, with a significant divergence in deployment focus and data generation.

@inproceedings{wang2024repvit,
  title={RepViT: Revisiting Mobile CNN From ViT Perspective},
  author={Wang, Ao and Chen, Hui and Lin, Zijia and Han, Jungong and Ding, Guiguang},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={15909--15920},
  year={2024}
}

@misc{wang2023repvitsam,
  title={RepViT-SAM: Towards Real-Time Segmenting Anything},
  author={Wang, Ao and Chen, Hui and Lin, Zijia and Han, Jungong and Ding, Guiguang},
  year={2023},
  eprint={2312.05760},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}