---
license: apache-2.0
language:
- fa
pipeline_tag: image-to-text
widget:
- src: https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/papers/attention.png
  example_title: "Persian OCR"
---

# Persian-OCR

**Persian-OCR** is a deep learning model for **Optical Character Recognition (OCR)**, designed specifically for Persian text.
The model employs a **CNN + Transformer architecture** trained with **CTC loss** to extract text from images.
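
A model trained with CTC loss emits one class prediction per time step, so its raw output has to be collapsed into text. The sketch below illustrates generic greedy CTC decoding (merge repeated predictions, then drop the blank symbol). It is not taken from this repository's `utils.py`: the blank index of 0 and the toy vocabulary are assumptions made for illustration.

```python
from itertools import groupby

# Assumption: index 0 is the CTC blank. The real model may reserve a different index.
BLANK = 0


def ctc_greedy_decode(frame_ids, idx_to_char, blank=BLANK):
    """Collapse consecutive duplicate predictions, then remove blanks."""
    collapsed = [idx for idx, _ in groupby(frame_ids)]
    return "".join(idx_to_char[i] for i in collapsed if i != blank)


# Hypothetical toy vocabulary, not the contents of vocab.json.
toy_vocab = {1: "س", 2: "ل", 3: "ا", 4: "م"}

# Per-frame argmax indices as a CTC model might produce them.
frames = [0, 1, 1, 0, 2, 3, 3, 0, 4, 4]
decoded = ctc_greedy_decode(frames, toy_vocab)
print(decoded)
```

A blank between two identical indices keeps both characters, which is how CTC represents doubled letters.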

The model was trained on a custom dataset of approximately **600,000 synthetic Persian text images**.
These images were generated from **Wikipedia text** using **49 different Persian fonts**, with sequence lengths ranging from **0 to 150 characters**.

On this dataset, the model achieves a **sequence accuracy of 96%**.
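
The model card does not define how sequence accuracy is computed. A common reading, assumed here, is the fraction of images whose entire predicted string matches the reference exactly. The function name and sample strings below are illustrative only.

```python
def sequence_accuracy(predictions, references):
    """Fraction of samples whose predicted string equals the reference exactly."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return 0.0
    exact = sum(p == r for p, r in zip(predictions, references))
    return exact / len(references)


# Hypothetical predictions; the last one is missing its final character.
preds = ["سلام", "کتاب", "مدرسه", "دانشگا"]
refs = ["سلام", "کتاب", "مدرسه", "دانشگاه"]
score = sequence_accuracy(preds, refs)
print(score)  # 0.75
```

Under this definition a single wrong character counts the whole line as an error, so the metric is stricter than a character-level error rate.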

The model may benefit from **further fine-tuning on real-world data**, and contributions or collaborations are **warmly welcomed**.

## 🤝 Contributing
Contributions are welcome! If you have a dataset of real-world Persian text or improvements to the model, please open an issue or submit a pull request.

## 📬 Contact
For collaboration or inquiries, please reach out via farbodpya@gmail.com.

## Files

- `pytorch_model.bin`: PyTorch model weights
- `vocab.json`: Character vocabulary
- `model.py`: Python script defining the CNN + Transformer OCR model
- `utils.py`: Utility functions for OCR, including `ocr_page` and `load_vocab`
- `config.json`: Model configuration
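
Because `model.py` and `utils.py` are plain source files rather than an installed package, they have to be imported from the path they are downloaded to. The helper below is a minimal sketch of one way to do that with the standard library. `load_module_from_path` is a hypothetical name and is not part of this repository.

```python
import importlib.util
import sys


def load_module_from_path(name, path):
    """Load a Python source file as a module and register it in sys.modules."""
    spec = importlib.util.spec_from_file_location(name, path)
    if spec is None or spec.loader is None:
        raise ImportError(f"cannot load {name!r} from {path!r}")
    module = importlib.util.module_from_spec(spec)
    # Register before executing so that other files can resolve `import <name>`.
    sys.modules[name] = module
    spec.loader.exec_module(module)
    return module
```

With such a helper, each import step in the Usage snippet would reduce to one call, for example `load_module_from_path("model", model_file)`, followed by the usual `from model import CNN_Transformer_OCR`.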

## Installation
```
pip install torch torchvision huggingface_hub
```

## Usage
```
import torch
import json
import sys
import importlib.util
from huggingface_hub import hf_hub_download

# 1️⃣ Load vocab
vocab_path = hf_hub_download("farbodpya/Persian-OCR", "vocab.json")
with open(vocab_path, "r", encoding="utf-8") as f:
    vocab = json.load(f)
idx_to_char = {int(k): v for k, v in vocab["idx_to_char"].items()}

# 2️⃣ Import model.py
model_file = hf_hub_download("farbodpya/Persian-OCR", "model.py")
spec_model = importlib.util.spec_from_file_location("model", model_file)
model_module = importlib.util.module_from_spec(spec_model)
sys.modules["model"] = model_module
spec_model.loader.exec_module(model_module)
from model import CNN_Transformer_OCR

# 3️⃣ Import utils.py
utils_file = hf_hub_download("farbodpya/Persian-OCR", "utils.py")
spec_utils = importlib.util.spec_from_file_location("utils", utils_file)
utils_module = importlib.util.module_from_spec(spec_utils)
sys.modules["utils"] = utils_module
spec_utils.loader.exec_module(utils_module)
from utils import ocr_page

# 4️⃣ Load model weights
weights_path = hf_hub_download("farbodpya/Persian-OCR", "pytorch_model.bin")
model = CNN_Transformer_OCR(num_classes=len(idx_to_char)+1)
model.load_state_dict(torch.load(weights_path, map_location="cpu"))
model.eval()

# 5️⃣ Run OCR on an image
img_path = "sample.png"  # replace with your own image
text = ocr_page(img_path, model, idx_to_char)
print("\n=== Final OCR Page ===\n", text)
```