tachiwin/PaddleOCR-VL-Tachiwin

@@ -10,14 +10,113 @@ tags:
 license: apache-2.0
 language:
 - en
 ---
-# Uploaded  model
-- **Developed by:** tachiwin
-- **License:** apache-2.0
-- **Finetuned from model :** unsloth/PaddleOCR-VL
-This paddleocr_vl model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Huggingface's TRL library.
-[<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="200"/>](https://github.com/unslothai/unsloth)

 license: apache-2.0
 language:
 - en
+datasets:
+- tachiwin/multilingual_ocr_llm_2
 ---
+# TachiwinOCR
+**for the Indigenous Languages of Mexico**
+_32-bit full precision_
+This is a fine-tune of PaddleOCR-VL specialized in the 68 Indigenous languages of Mexico and their diverse character and glyph repertoire, a world first in technology access and linguistic rights.
+## Inference
+You can perform inference using the `PaddleOCR` pipeline or the `transformers` library.
+#### Option A: Using PaddleOCR (Easy Pipeline)
+```python
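+from paddleocr import PaddleOCRVL
+# Placeholder: local directory that holds the downloaded Tachiwin weights
+path_to_tachiwin_downloaded_model = "path/to/PaddleOCR-VL-Tachiwin"
+# Load the fine-tuned model
+pipeline = PaddleOCRVL(
+    vl_rec_model_name="PaddleOCR-VL-0.9B",
+    vl_rec_model_dir=path_to_tachiwin_downloaded_model,
+)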
+# Predict on an image
+output = pipeline.predict("test.png")
+for res in output:
+    res.print()
+    res.save_to_json(save_path="output")
+    res.save_to_markdown(save_path="output")
+```
+#### Option B: Using Transformers (Advanced Control)
+```python
+from PIL import Image
+import torch
+from transformers import AutoModelForCausalLM, AutoProcessor
+# ---- Settings ----
+model_path = "tachiwin/PaddleOCR-VL-Tachiwin"
+image_path = "test.png"
+# ------------------
+DEVICE = "cuda" if torch.cuda.is_available() else "cpu"
+image = Image.open(image_path).convert("RGB")
+model = AutoModelForCausalLM.from_pretrained(
+    model_path, trust_remote_code=True, torch_dtype=torch.bfloat16
+).to(DEVICE).eval()
+processor = AutoProcessor.from_pretrained(model_path, trust_remote_code=True)
+messages = [
+    {"role": "user", "content": [
+        {"type": "image", "image": image},
+        {"type": "text", "text": "OCR:"},
+    ]}
+]
+inputs = processor.apply_chat_template(
+    messages,
+    tokenize=True,
+    add_generation_prompt=True,
+    return_dict=True,
+    return_tensors="pt"
+).to(DEVICE)
+outputs = model.generate(**inputs, max_new_tokens=1024, min_new_tokens=1)
+generated_text = processor.batch_decode(outputs, skip_special_tokens=True)[0]
+print(generated_text)
+```
+---
+## 📊 Benchmark Results
+Tachiwin-OCR was evaluated against the base PaddleOCR-VL model on a diverse subset of Indigenous-language samples. Fine-tuning lowered the character error rate from 7.59% to 6.80% and the word error rate from 25.17% to 17.36%.
+### Summary Metrics
+| Metric | Base Model (Raw) | Tachiwin-OCR (Fine-tuned) | Improvement |
+| :--- | :---: | :---: | :---: |
+| **Character Error Rate (CER)** | 7.59% | 6.80% | **10.4% relative reduction (0.79 percentage points absolute)** |
+| **Word Error Rate (WER)** | 25.17% | 17.36% | **31.0% relative reduction (7.81 percentage points absolute)** |
+| **OCR Accuracy (1 - CER)** | 92.41% | 93.20% | **+0.79 percentage points (absolute)** |
+### Detailed Comparison (Sample)
+A subset of the per-language evaluation results. Tonal languages benefit the most from this fine-tuning:
+| Language | Raw CER | FT CER | Raw WER | FT WER | CER Improvement (Absolute) |
+| :--- | :---: | :---: | :---: | :---: | :---: |
+| `stp` (Tepehuán) | 10.95% | 0.00% | 43.55% | 0.00% | +10.95% |
+| `maz` (Central Mazahua) | 3.29% | 0.41% | 9.09% | 0.00% | +2.88% |
+| `chj` (Ojitlán Chinantec) | 16.97% | 2.21% | 52.78% | 9.72% | +14.76% |
+| `maa` (Tecóatl Mazatec) | 86.70% | 8.49% | 105.08% | 10.17% | +78.21% |
+### Key Findings
+- **High Accuracy Gains:** In many tonal languages, such as Tepehuán (`stp`) and Mazatec (`maa`), fine-tuning reduced the CER from high levels to zero or single digits (10.95% to 0.00% and 86.70% to 8.49%, respectively).
+- **Robustness:** The model shows high resilience against synthetic distortions implemented during the data generation phase.
+- **Word-Level Performance:** Word Error Rate (WER) fell from 25.17% to 17.36%, a 31.0% relative reduction, which highlights the model's improved ability to contextualize character sequences specific to these language families.
+**Tachiwin** (Totonac for "language") is dedicated to bridging the digital divide for the Indigenous languages of Mexico through AI technology.
+- **Developed by:** Tachiwin
+- **License:** apache-2.0
+- **Fine-tuned from model:** PaddlePaddle/PaddleOCR-VL
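
The benchmark tables above report CER, WER, and both absolute and relative improvements, but the card does not say how these were computed. The following is a minimal sketch of the conventional definitions (edit distance divided by reference length), not the evaluation script used for this model. The function names are hypothetical, and no text normalization is applied, which a real evaluation may well do. It also shows why a WER above 100% (as in the raw `maa` row) is possible: insertions can make the edit distance exceed the number of reference words.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or token lists)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference item
                curr[j - 1] + 1,     # insertion of a hypothesis item
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1]


def cer(reference, hypothesis):
    """Character error rate: character edits / reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


def wer(reference, hypothesis):
    """Word error rate: word edits / number of reference words."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def absolute_reduction(before, after):
    """Difference in percentage points when both inputs are percentages."""
    return before - after


def relative_reduction(before, after):
    """Fraction of the original error that was removed."""
    return (before - after) / before


if __name__ == "__main__":
    # Summary-table figures: CER 7.59% -> 6.80%, WER 25.17% -> 17.36%
    print(f"CER relative reduction: {relative_reduction(7.59, 6.80):.1%}")
    print(f"WER relative reduction: {relative_reduction(25.17, 17.36):.1%}")
    # Two reference words, four wrong hypothesis words: WER exceeds 100%
    print(f"WER with insertions: {wer('a b', 'x y z w'):.0%}")
```

Reporting the relative and absolute figures side by side avoids ambiguity, since a 7.81-point drop in WER and a 31.0% relative reduction describe the same change.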