---
license: apache-2.0
language:
- fa
pipeline_tag: image-to-text
widget:
- src: https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/papers/attention.png
  example_title: "Persian OCR"
---
# Persian-OCR
**Persian-OCR** is a deep learning model for **Optical Character Recognition (OCR)**, designed specifically for Persian text.
The model employs a **CNN + Transformer architecture** trained with **CTC loss** to extract text from images.
The model was trained on a custom dataset of approximately **600,000 synthetic Persian text images**.
These images were generated from **Wikipedia text** using **49 different Persian fonts**, with sequence lengths ranging from **0 to 150 characters**.
On this synthetic dataset, the model achieves a **sequence accuracy of 96%**.
The model may benefit from **further fine-tuning on real-world data**, and contributions or collaborations are **warmly welcomed**.
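
Sequence accuracy is usually read as an exact-match rate: a prediction counts only if the whole line matches the reference, so a single wrong character makes the entire sample wrong. The sketch below assumes that definition (the card does not spell out how the 96% figure was computed). It also includes character error rate (CER), a common complementary metric that is not reported here but can help when evaluating fine-tuning on real-world data. Both function names are hypothetical helpers, not part of this repository.

```python
from typing import List


def sequence_accuracy(predictions: List[str], targets: List[str]) -> float:
    """Fraction of samples whose predicted text matches the reference exactly (assumed definition)."""
    if len(predictions) != len(targets):
        raise ValueError("predictions and targets must have the same length")
    if not targets:
        return 0.0
    exact = sum(p == t for p, t in zip(predictions, targets))
    return exact / len(targets)


def character_error_rate(prediction: str, target: str) -> float:
    """Levenshtein edit distance divided by the reference length.

    Empty references (the dataset includes 0-character lines) are divided by 1
    to avoid a division by zero.
    """
    # prev[j] = edit distance between the processed prefix of `prediction` and target[:j]
    prev = list(range(len(target) + 1))
    for i, p_ch in enumerate(prediction, start=1):
        curr = [i] + [0] * len(target)
        for j, t_ch in enumerate(target, start=1):
            curr[j] = min(
                prev[j] + 1,                   # delete p_ch
                curr[j - 1] + 1,               # insert t_ch
                prev[j - 1] + (p_ch != t_ch),  # substitute, or free if equal
            )
        prev = curr
    return prev[-1] / max(len(target), 1)
```

For example, `sequence_accuracy(["سلام", "کتاب"], ["سلام", "کتب"])` returns `0.5`, even though the second prediction differs from its reference by only one character, which is exactly the gap CER captures.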
## 🤝 Contributing
Contributions are welcome! If you have a dataset of real-world Persian text or improvements to the model, please open an issue or submit a pull request.
## 📬 Contact
For collaboration or inquiries, please reach out by email at farbodpya@gmail.com.
## Files
- `pytorch_model.bin` : PyTorch model weights
- `vocab.json` : Character vocabulary
- `model.py` : Python script defining the CNN + Transformer OCR model
- `utils.py` : Utility functions for OCR, including `ocr_page` and `load_vocab`
- `config.json` : Model configuration
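
Because the model is trained with CTC loss, its raw output is one class prediction per time step rather than one per character. Turning it into text means collapsing consecutive repeats and dropping the blank class. The bundled `ocr_page` in `utils.py` presumably handles this internally; the greedy-decoding sketch below is only an illustration. The blank index (`blank=0`), the integer-keyed `idx_to_char` mapping (as built in the Usage section), and the helper name `ctc_greedy_decode` are assumptions, not taken from this repository.

```python
from itertools import groupby
from typing import Dict, List


def ctc_greedy_decode(frame_ids: List[int], idx_to_char: Dict[int, str], blank: int = 0) -> str:
    """Collapse a per-time-step argmax sequence into text (greedy CTC decoding).

    `blank` is an ASSUMED blank index; check model.py / utils.py for the real one.
    """
    chars = []
    # groupby merges runs of identical ids, so [1, 1, 1] yields a single 1
    for idx, _ in groupby(frame_ids):
        # Skip the blank and silently drop any id missing from the vocabulary
        if idx != blank and idx in idx_to_char:
            chars.append(idx_to_char[idx])
    return "".join(chars)


if __name__ == "__main__":
    # Hypothetical toy vocabulary; the real one comes from vocab.json
    toy_vocab = {1: "س", 2: "ل", 3: "ا", 4: "م"}
    print(ctc_greedy_decode([1, 1, 0, 2, 0, 3, 3, 4], toy_vocab))  # → سلام
```

The blank is what lets CTC emit doubled letters: `[2, 0, 2]` decodes to `لل`, while `[2, 2]` collapses to a single `ل`.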
## Installation
```
pip install torch torchvision huggingface_hub
```
## Usage
```
import torch
import json
import sys
import importlib.util
from huggingface_hub import hf_hub_download
# 1️⃣ Load vocab
vocab_path = hf_hub_download("farbodpya/Persian-OCR", "vocab.json")
with open(vocab_path, "r", encoding="utf-8") as f:
vocab = json.load(f)
idx_to_char = {int(k): v for k, v in vocab["idx_to_char"].items()}
# 2️⃣ Import model.py
model_file = hf_hub_download("farbodpya/Persian-OCR", "model.py")
spec_model = importlib.util.spec_from_file_location("model", model_file)
model_module = importlib.util.module_from_spec(spec_model)
sys.modules["model"] = model_module
spec_model.loader.exec_module(model_module)
from model import CNN_Transformer_OCR
# 3️⃣ Import utils.py
utils_file = hf_hub_download("farbodpya/Persian-OCR", "utils.py")
spec_utils = importlib.util.spec_from_file_location("utils", utils_file)
utils_module = importlib.util.module_from_spec(spec_utils)
sys.modules["utils"] = utils_module
spec_utils.loader.exec_module(utils_module)
from utils import ocr_page
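# 4️⃣ Load model weights
weights_path = hf_hub_download("farbodpya/Persian-OCR", "pytorch_model.bin")
# One extra output class beyond the vocabulary (presumably the CTC blank token)
model = CNN_Transformer_OCR(num_classes=len(idx_to_char) + 1)
model.load_state_dict(torch.load(weights_path, map_location="cpu"))
model.eval()  # inference mode: disables dropout, uses stored batch-norm statistics
# 5️⃣ Run OCR on an image
img_path = "sample.png"  # replace with your own image
with torch.no_grad():  # no gradient tracking needed for inference
    text = ocr_page(img_path, model, idx_to_char)
print("\n=== Final OCR Page ===\n", text)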
```