---
language:
- grt
license: cc-by-4.0
tags:
- ocr
- florence-2
- garo
- northeast-india
- image-to-text
base_model: microsoft/Florence-2-base-ft
metrics:
- character_accuracy
model-index:
- name: MWirelabs/garo-ocr
results:
- task:
type: image-to-text
name: OCR
metrics:
- type: character_accuracy
value: 93.13
name: Character Accuracy (1000 samples)
---
# GaroOCR
![License: CC BY 4.0](https://img.shields.io/badge/License-CC%20BY%204.0-lightgrey.svg)
![Character Accuracy](https://img.shields.io/badge/Char%20Accuracy-93.13%25-brightgreen)
OCR model for the Garo (grt_Latn) language, fine-tuned from `microsoft/Florence-2-base-ft` on Garo text images.
Developed by **MWire Labs**, Shillong, Meghalaya; part of an ongoing effort to build foundational AI for Northeast Indian languages.
---
## Model Details
| | |
|---|---|
| Base model | `microsoft/Florence-2-base-ft` |
| Parameters | 231M |
| Language | Garo (Achik) |
| Task | OCR (image → text) |
| Training samples | 80,000 |
| Epochs | 5 |
| Character Accuracy | 93.13% |
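The evaluation script is not published, but character accuracy for OCR is commonly defined as one minus the Levenshtein (edit) distance normalized by reference length. A minimal sketch of that metric, assuming this definition:

```python
def levenshtein(a: str, b: str) -> int:
    # Classic dynamic-programming edit distance over characters.
    if len(a) < len(b):
        a, b = b, a
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                  # deletion
                            curr[j - 1] + 1,              # insertion
                            prev[j - 1] + (ca != cb)))    # substitution
        prev = curr
    return prev[-1]

def char_accuracy(pred: str, ref: str) -> float:
    # 1 - normalized edit distance, clipped at 0 for very poor predictions.
    if not ref:
        return 1.0 if not pred else 0.0
    return max(0.0, 1.0 - levenshtein(pred, ref) / len(ref))
```

Averaging `char_accuracy` over the 1,000 held-out samples would give the figure reported above, under this definition.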
---
## Training Setup
- **Hardware:** NVIDIA A40 (48GB)
- **Precision:** bfloat16
- **Batch size:** 4 (effective 16 with gradient accumulation)
- **Learning rate:** 3e-4 with cosine scheduler
- **Max label length:** 128 tokens
- **Task prompt:** `<OCR>` (Florence-2 task token)
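The hyperparameters above correspond roughly to a `transformers` configuration like the following. This is a sketch, not the actual training script (which is not published); `output_dir` is a placeholder:

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="garo-ocr-ft",          # hypothetical path
    num_train_epochs=5,
    per_device_train_batch_size=4,     # effective 16 via accumulation
    gradient_accumulation_steps=4,
    learning_rate=3e-4,
    lr_scheduler_type="cosine",
    bf16=True,                         # bfloat16 mixed precision
)
```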
---
## Usage
```python
from transformers import AutoProcessor, AutoModelForCausalLM
from PIL import Image
import torch

processor = AutoProcessor.from_pretrained("MWirelabs/garo-ocr", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    "MWirelabs/garo-ocr",
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
).cuda()

# Load the image and build inputs with the Florence-2 OCR task prompt.
image = Image.open("your_image.png").convert("RGB")
inputs = processor(text="<OCR>", images=image, return_tensors="pt")
inputs = {k: v.cuda() for k, v in inputs.items()}
# Cast pixel values to match the model's bfloat16 weights.
inputs["pixel_values"] = inputs["pixel_values"].to(torch.bfloat16)

with torch.no_grad():
    generated = model.generate(
        pixel_values=inputs["pixel_values"],
        input_ids=inputs["input_ids"],
        max_new_tokens=128,
    )

text = processor.tokenizer.decode(generated[0], skip_special_tokens=True)
print(text)
```
> **Note:** Use `transformers==4.38.2` for compatibility.
---
## Limitations
- Max reliable output length is ~128 tokens
- Part of MWire Labs' monolingual model series; a multilingual NE-OCR model covering more Northeast Indian languages is in development
---