---
license: apache-2.0
language:
  - fa
pipeline_tag: image-to-text
widget:
  - src: https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/papers/attention.png
    example_title: "Persian OCR"
---

# Persian-OCR

**Persian-OCR** is a deep learning model for **Optical Character Recognition (OCR)**, designed specifically for Persian text.  
The model employs a **CNN + Transformer architecture** trained with **CTC loss** to extract text from images.
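As a rough illustration of what CTC training implies at inference time, the sketch below shows standard greedy CTC decoding: take the most likely class per frame, collapse consecutive repeats, then drop the blank symbol. The actual decoding used by this model lives in `utils.py` and is not shown here, so the blank index (`0`), the function name `ctc_greedy_decode`, and the toy vocabulary are all assumptions made for the example.

```python
from itertools import groupby

# Assumption: index 0 is the CTC blank. Check vocab.json and utils.py
# for the convention this model actually uses.
BLANK = 0


def ctc_greedy_decode(frame_ids, idx_to_char, blank=BLANK):
    """Collapse consecutive repeated frame predictions, then drop blanks."""
    collapsed = [idx for idx, _ in groupby(frame_ids)]
    return "".join(idx_to_char[i] for i in collapsed if i != blank)


# Hypothetical toy vocabulary, not the real vocab.json.
toy_vocab = {1: "س", 2: "ل", 3: "ا", 4: "م"}

# Per-frame argmax indices, as a model trained with CTC might emit them.
frames = [0, 1, 1, 0, 2, 2, 2, 3, 0, 0, 4, 4]
print(ctc_greedy_decode(frames, toy_vocab))
```

A blank between two identical indices is what lets CTC emit a doubled character: `[1, 0, 1]` decodes to two characters, while `[1, 1]` decodes to one.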

The model was trained on a custom dataset of approximately **600,000 synthetic Persian text images**.  
These images were generated from **Wikipedia text** using **49 different Persian fonts**, with sequence lengths ranging from **0 to 150 characters**.  

On this dataset, the model achieves a **sequence accuracy of 96%**.  
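The card does not define how sequence accuracy is computed. A common reading, assumed here, is the fraction of images whose predicted text matches the reference exactly, so a single wrong character makes the whole sequence count as an error. The function name `sequence_accuracy` and the sample strings are hypothetical.

```python
def sequence_accuracy(predictions, references):
    """Fraction of predictions that match their reference string exactly."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return 0.0
    exact = sum(pred == ref for pred, ref in zip(predictions, references))
    return exact / len(references)


# Hypothetical sample: the third prediction differs by one character.
preds = ["سلام", "کتاب", "دانشکاه"]
refs = ["سلام", "کتاب", "دانشگاه"]
print(sequence_accuracy(preds, refs))
```

This metric is stricter than character-level accuracy, which would give partial credit for the third prediction.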

Because the training data is entirely synthetic, the model may benefit from **further fine-tuning on real-world data**.

## 🤝 Contributing
Contributions are welcome! If you have a dataset of real-world Persian text or improvements to the model, please open an issue or submit a pull request.




## 📬 Contact
For collaboration or inquiries, please reach out via farbodpya@gmail.com




## Files

- `pytorch_model.bin` : PyTorch model weights  
- `vocab.json` : Character vocabulary  
- `model.py` : Python script defining the CNN + Transformer OCR model  
- `utils.py` : Utility functions for OCR, including `ocr_page` and `load_vocab`  
- `config.json` : Model configuration  

## Installation
```bash
pip install torch torchvision huggingface_hub
```


## Usage
```python
import torch
import json
import sys
import importlib.util
from huggingface_hub import hf_hub_download

# 1️⃣ Load vocab
vocab_path = hf_hub_download("farbodpya/Persian-OCR", "vocab.json")
with open(vocab_path, "r", encoding="utf-8") as f:
    vocab = json.load(f)
idx_to_char = {int(k): v for k, v in vocab["idx_to_char"].items()}

# 2️⃣ Import model.py 
model_file = hf_hub_download("farbodpya/Persian-OCR", "model.py")
spec_model = importlib.util.spec_from_file_location("model", model_file)
model_module = importlib.util.module_from_spec(spec_model)
sys.modules["model"] = model_module
spec_model.loader.exec_module(model_module)
from model import CNN_Transformer_OCR

# 3️⃣ Import utils.py 
utils_file = hf_hub_download("farbodpya/Persian-OCR", "utils.py")
spec_utils = importlib.util.spec_from_file_location("utils", utils_file)
utils_module = importlib.util.module_from_spec(spec_utils)
sys.modules["utils"] = utils_module
spec_utils.loader.exec_module(utils_module)
from utils import ocr_page

# 4️⃣ Load model weights
weights_path = hf_hub_download("farbodpya/Persian-OCR", "pytorch_model.bin")
model = CNN_Transformer_OCR(num_classes=len(idx_to_char)+1)
model.load_state_dict(torch.load(weights_path, map_location="cpu"))
model.eval()

# 5️⃣ Run OCR on an image
img_path = "sample.png"  # replace with your own image
text = ocr_page(img_path, model, idx_to_char)
print("\n=== Final OCR Page ===\n", text)
```