ljcamargo committed on
Commit
567dda7
verified
1 Parent(s): 983f843

Update README.md

Files changed (1)
  1. README.md +105 -6
README.md CHANGED
@@ -10,14 +10,113 @@ tags:
  license: apache-2.0
  language:
  - en
  ---
 
- # Uploaded model
 
- - **Developed by:** tachiwin
- - **License:** apache-2.0
- - **Finetuned from model :** unsloth/PaddleOCR-VL
 
- This paddleocr_vl model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Huggingface's TRL library.
 
- [<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="200"/>](https://github.com/unslothai/unsloth)
 
  license: apache-2.0
  language:
  - en
+ datasets:
+ - tachiwin/multilingual_ocr_llm_2
  ---
 
+ # TachiwinOCR
+ **for the Indigenous Languages of Mexico**
 
+ _32-bit full precision_
+
+ This is a PaddleOCR-VL fine-tune specialized in the 68 indigenous languages of Mexico and their diverse character and glyph repertoire, a world first for technology access and linguistic rights.
+
+ ## Inference
+ You can run inference with either the `PaddleOCR` pipeline or the `transformers` library.
+
+ #### Option A: Using PaddleOCR (Easy Pipeline)
+ ```python
+ from paddleocr import PaddleOCRVL
+
+ # Load the fine-tuned model; path_to_tachiwin_downloaded_model is the local directory holding the downloaded weights
+ pipeline = PaddleOCRVL(
+     vl_rec_model_name="PaddleOCR-VL-0.9B",
+     vl_rec_model_dir=path_to_tachiwin_downloaded_model,
+ )
+
+ # Predict on an image
+ output = pipeline.predict("test.png")
+
+ for res in output:
+     res.print()
+     res.save_to_json(save_path="output")
+     res.save_to_markdown(save_path="output")
+ ```
+
+ #### Option B: Using Transformers (Advanced Control)
+ ```python
+ from PIL import Image
+ import torch
+ from transformers import AutoModelForCausalLM, AutoProcessor
+
+ # ---- Settings ----
+ model_path = "tachiwin/PaddleOCR-VL-Tachiwin"
+ image_path = "test.png"
+ # ------------------
+
+ DEVICE = "cuda" if torch.cuda.is_available() else "cpu"
+
+ image = Image.open(image_path).convert("RGB")
+
+ model = AutoModelForCausalLM.from_pretrained(
+     model_path, trust_remote_code=True, torch_dtype=torch.bfloat16
+ ).to(DEVICE).eval()
+ processor = AutoProcessor.from_pretrained(model_path, trust_remote_code=True)
+
+ messages = [
+     {"role": "user", "content": [
+         {"type": "image", "image": image},
+         {"type": "text", "text": "OCR:"},
+     ]}
+ ]
+
+ inputs = processor.apply_chat_template(
+     messages,
+     tokenize=True,
+     add_generation_prompt=True,
+     return_dict=True,
+     return_tensors="pt"
+ ).to(DEVICE)
 
+ outputs = model.generate(**inputs, max_new_tokens=1024, min_new_tokens=1)
+ generated_text = processor.batch_decode(outputs, skip_special_tokens=True)[0]
+ print(generated_text)
+ ```
+
+ ---
+
+ ## 📊 Benchmark Results
+
+ Tachiwin-OCR was evaluated against the base PaddleOCR-VL model on a diverse subset of Indigenous language samples. Fine-tuning lowered both the character and word error rates, with the larger gain at the word level.
+
+ ### Summary Metrics
+
+ | Metric | Base Model (Raw) | Tachiwin-OCR (Fine-tuned) | Improvement |
+ | :--- | :---: | :---: | :---: |
+ | **Character Error Rate (CER)** | 7.59% | 6.80% | **10.4% (Relative Reduction)** |
+ | **Word Error Rate (WER)** | 25.17% | 17.36% | **7.81 pp (Absolute Reduction)** |
+ | **OCR Accuracy (1 - CER)** | 92.41% | 93.20% | **+0.79 pp (Absolute)** |
+
+ ### Detailed Comparison (Sample)
+
+ A subset of the evaluation results by language; tonal languages improve the most under this fine-tuning:
+
+ | Language | Raw CER | FT CER | Raw WER | FT WER | CER Reduction (Absolute) |
+ | :--- | :---: | :---: | :---: | :---: | :---: |
+ | `stp` (Tepehuán) | 10.95% | 0.00% | 43.55% | 0.00% | 10.95 pp |
+ | `maz` (Central Mazahua) | 3.29% | 0.41% | 9.09% | 0.00% | 2.88 pp |
+ | `chj` (Ojitlán Chinantec) | 16.97% | 2.21% | 52.78% | 9.72% | 14.76 pp |
+ | `maa` (Tecóatl Mazatec) | 86.70% | 8.49% | 105.08% | 10.17% | 78.21 pp |
+
+ ### Key Findings
+ - **High Accuracy Gains:** In tonal languages such as Tepehuán (`stp`) and Mazatec (`maa`), fine-tuning cut the CER from 10.95% to 0.00% and from 86.70% to 8.49%, respectively.
+ - **Robustness:** The model shows high resilience against the synthetic distortions applied during the data generation phase.
+ - **Word-Level Performance:** The drop in Word Error Rate (WER) from 25.17% to 17.36%, about a 31% relative reduction, highlights the model's improved ability to contextualize character sequences specific to these language families.
+
+ **Tachiwin** (Totonac for "language") is dedicated to bridging
+ the digital divide for the indigenous languages of Mexico through AI technology.
+
+ - **Developed by:** Tachiwin
+ - **License:** apache-2.0
+ - **Fine-tuned from model:** PaddlePaddle/PaddleOCR-VL
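
The benchmark tables in this change report CER, WER, and both relative and absolute improvements without showing how those figures are derived. As a minimal sketch, assuming the standard definitions (edit distance divided by reference length, counted over characters for CER and over words for WER), the arithmetic could look like the following. The helper names and sample strings are hypothetical and are not taken from the Tachiwin evaluation code, which the README does not show.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or lists of words)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1]


def cer(reference, hypothesis):
    """Character error rate: character edits divided by reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


def wer(reference, hypothesis):
    """Word error rate: word edits divided by the number of reference words."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def absolute_reduction(before, after):
    """Difference in percentage points between two error rates."""
    return before - after


def relative_reduction(before, after):
    """Reduction expressed as a percentage of the starting error rate."""
    return (before - after) / before * 100


# Hypothetical sample strings, for illustration only.
print(cer("kitten", "sitting"))
print(wer("a", "b c d"))

# Figures from the Summary Metrics table.
print(round(relative_reduction(7.59, 6.80), 1))    # CER, relative
print(round(absolute_reduction(25.17, 17.36), 2))  # WER, absolute
print(round(relative_reduction(25.17, 17.36), 1))  # WER, relative
```

Because insertions count as errors, WER is not capped at 100%, which is consistent with the 105.08% raw WER reported for `maa`. Under these definitions the summary row works out to roughly a 10.4% relative CER reduction (7.59% to 6.80%) and roughly a 31.0% relative WER reduction (25.17% to 17.36%), the latter being 7.81 percentage points in absolute terms.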