QuickHawk committed 0e39716 (1 parent: 854b485)

Upload folder using huggingface_hub

Files changed (1): `README.md` (added, +77 -0)

---
license: mit
language:
- en
metrics:
- cer
- wer
base_model:
- facebook/deit-base-patch16-224
- ai4bharat/IndicBART
tags:
- scene-text-recognition
- text-recognition
- computer-vision
- language-model
---
# trocr-indic

This model uses the TrOCR approach to recognize **Indic text** in **cropped images** of scene text.

## Model Details

The model follows the TrOCR approach to OCR training and applies it to scene text. Because generalized scene-text recognition models are scarce for most Indian languages, this model aims to fill that gap.

![TrOCR_Architecture.jpg](https://cdn-uploads.huggingface.co/production/uploads/6868f8219c4cd7445653ada1/d6d9a0UVlL8EleZrC_ts9.jpeg)
*Courtesy: TrOCR ([original paper](https://huggingface.co/papers/2109.10282))*

The model is trained for the following languages:

- Assamese
- Bengali
- Gujarati
- Hindi
- Kannada
- Malayalam
- Marathi
- Odia
- Punjabi
- Tamil
- Telugu

### Model Description

**IMPORTANT:** Although the model covers all of these languages, IndicBART represents every language in the Devanagari script, so the model is trained on Devanagari-script text only.
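
Because IndicBART works only with Devanagari, text in the other scripts has to be mapped into Devanagari for training, and Devanagari predictions would need to be mapped back to read them in the native script. The card does not say which tool was used for this. The sketch below assumes a simple approach that exploits the parallel layout of the Unicode Indic blocks, where each script occupies 0x80 code points at matching offsets. `to_devanagari`, `from_devanagari`, and the `SCRIPT_BASE` table are hypothetical helpers, not part of this model's code.

```python
# Minimal sketch (ASSUMPTION: not necessarily how this model's data was prepared).
# Maps native-script text to/from Devanagari by shifting code points between
# the parallel Unicode Indic blocks, each of which is 0x80 code points wide.

DEVANAGARI_BASE = 0x0900
BLOCK_SIZE = 0x80

# Start of the Unicode block for each language's native script.
SCRIPT_BASE = {
    "as": 0x0980,  # Assamese (Bengali-Assamese script)
    "bn": 0x0980,  # Bengali
    "gu": 0x0A80,  # Gujarati
    "hi": 0x0900,  # Hindi (Devanagari)
    "kn": 0x0C80,  # Kannada
    "ml": 0x0D00,  # Malayalam
    "mr": 0x0900,  # Marathi (Devanagari)
    "or": 0x0B00,  # Odia
    "pa": 0x0A00,  # Punjabi (Gurmukhi)
    "ta": 0x0B80,  # Tamil
    "te": 0x0C00,  # Telugu
}


def _shift(text: str, src_base: int, dst_base: int) -> str:
    """Move each character in the source block to the same offset in the destination block."""
    out = []
    for ch in text:
        cp = ord(ch)
        if src_base <= cp < src_base + BLOCK_SIZE:
            out.append(chr(cp - src_base + dst_base))
        else:
            out.append(ch)  # spaces, ASCII digits, punctuation, etc. pass through unchanged
    return "".join(out)


def to_devanagari(text: str, lang: str) -> str:
    """Native script -> Devanagari (hypothetical helper)."""
    return _shift(text, SCRIPT_BASE[lang], DEVANAGARI_BASE)


def from_devanagari(text: str, lang: str) -> str:
    """Devanagari -> native script (hypothetical helper)."""
    return _shift(text, DEVANAGARI_BASE, SCRIPT_BASE[lang])


print(to_devanagari("কলকাতা", "bn"))  # कलकाता
```

This offset mapping is only an approximation: characters that exist in one script but not another (Tamil, for example, lacks many Devanagari consonants) can land on unassigned or unrelated code points. A dedicated transliteration library handles such cases more carefully.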

The output is in the following format:

```
<LANGUAGE TOKEN> <TEXT TOKENS> <EOS TOKEN>
```
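
The sketch below shows how a label could be wrapped in this format for training and how a decoded prediction could be split back into language and text. It assumes IndicBART's token conventions (language tokens such as `<2hi>`, and `</s>` as the EOS token); `build_target` and `parse_prediction` are hypothetical helper names, not functions shipped with this model.

```python
import re
from typing import Optional, Tuple

# ASSUMPTION: language tokens follow IndicBART's "<2xx>" convention (e.g. "<2hi>")
# and the EOS token is "</s>".
LANG_TOKEN_RE = re.compile(r"^<2([a-z]{2})>\s*")
EOS_TOKEN = "</s>"


def build_target(text: str, lang: str) -> str:
    """Format a Devanagari-script label as <LANGUAGE TOKEN> <TEXT TOKENS> <EOS TOKEN>."""
    return f"<2{lang}> {text} {EOS_TOKEN}"


def parse_prediction(decoded: str) -> Tuple[Optional[str], str]:
    """Split a decoded string into (language code, text); language is None if no token is found."""
    decoded = decoded.strip()
    lang = None
    match = LANG_TOKEN_RE.match(decoded)
    if match:
        lang = match.group(1)
        decoded = decoded[match.end():]
    # Keep only what precedes the first EOS token (drops any trailing padding tokens).
    decoded = decoded.split(EOS_TOKEN, 1)[0]
    return lang, decoded.strip()


print(parse_prediction("<2hi> नमस्ते </s><pad>"))  # ('hi', 'नमस्ते')
```

Keeping the language token in the target lets a single decoder serve all languages while still reporting which language it predicted.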

The following flowchart illustrates the training and inference pipeline for this model.

![Reworked_Implementation](https://cdn-uploads.huggingface.co/production/uploads/6868f8219c4cd7445653ada1/1KiAan55GWl9tZNOTuMs0.png)

- **Datasets used:** [IndicSTR12](https://cvit.iiit.ac.in/research/projects/cvit-projects/indicstr)
- **Developed by:** Aarya Devarla
- **Model type:** Vision-language model
- **License:** MIT
- **Fine-tuned from:** DeiT (`facebook/deit-base-patch16-224`) and IndicBART (`ai4bharat/IndicBART`)

### Results

| Metric (lower is better) | Assamese | Bengali | Gujarati | Hindi | Kannada | Malayalam | Marathi | Odia | Punjabi | Tamil | Telugu |
|--------|----------|---------|----------|-------|---------|-----------|---------|------|---------|-------|--------|
| CER | 0.069 | 0.133 | 0.058 | 0.075 | 0.212 | 0.154 | 0.082 | 0.120 | 0.097 | 0.122 | 0.220 |
| WER | 0.205 | 0.395 | 0.192 | 0.283 | 0.576 | 0.519 | 0.312 | 0.375 | 0.304 | 0.409 | 0.612 |

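The model isn't perfect, but it's a start: CER ranges from 0.058 (Gujarati) to 0.220 (Telugu), and Kannada, Malayalam, and Telugu show the highest error rates on both metrics.
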
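CER and WER are edit-distance metrics: the number of character (or word) insertions, deletions, and substitutions needed to turn the prediction into the reference, divided by the reference length. The card does not say which implementation or aggregation produced the table above, so the following is only a minimal sketch of the standard definitions; `edit_distance`, `cer`, `wer`, and `corpus_rates` are hypothetical names.

```python
from typing import Iterable, Sequence, Tuple


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance between two sequences (strings or lists of words)."""
    prev = list(range(len(hyp) + 1))  # distances from an empty ref prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # delete r
                curr[j - 1] + 1,         # insert h
                prev[j - 1] + (r != h),  # substitute (free when equal)
            ))
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character error rate for one sample (counts Unicode code points)."""
    return edit_distance(ref, hyp) / max(len(ref), 1)


def wer(ref: str, hyp: str) -> float:
    """Word error rate for one sample (whitespace tokenization)."""
    ref_words = ref.split()
    return edit_distance(ref_words, hyp.split()) / max(len(ref_words), 1)


def corpus_rates(pairs: Iterable[Tuple[str, str]]) -> Tuple[float, float]:
    """Corpus-level (CER, WER): total edits divided by total reference length."""
    char_edits = char_total = word_edits = word_total = 0
    for ref, hyp in pairs:
        char_edits += edit_distance(ref, hyp)
        char_total += len(ref)
        ref_words = ref.split()
        word_edits += edit_distance(ref_words, hyp.split())
        word_total += len(ref_words)
    return char_edits / max(char_total, 1), word_edits / max(word_total, 1)


print(cer("abcd", "abed"), wer("a b c d", "a b d"))  # 0.25 0.25
```

Because `cer` counts code points, a vowel sign or virama counts as its own character; an implementation that counts grapheme clusters instead would report different numbers for Indic scripts.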
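## Limitations

The main limitation comes from IndicBART, which is pretrained primarily on Indic-language text represented in the Devanagari script, so the decoder restricts the model to the languages and script it was pretrained on.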

### Recommendations

Because TrOCR is modular, you can swap out IndicBART for a different decoder and retrain the model. Keep in mind that the preprocessing and the output format depend on the decoder, so both must be adapted to the new model.