---
language:
  - grt
license: cc-by-4.0
tags:
  - ocr
  - florence-2
  - garo
  - northeast-india
  - image-to-text
base_model: microsoft/Florence-2-base-ft
metrics:
  - character_accuracy
model-index:
  - name: MWirelabs/garo-ocr
    results:
      - task:
          type: image-to-text
          name: OCR
        metrics:
          - type: character_accuracy
            value: 93.13
            name: Character Accuracy (1000 samples)
---

# GaroOCR


OCR model for the Garo (grt_Latn) language, fine-tuned from microsoft/Florence-2-base-ft on Garo text images.

Developed by MWire Labs, Shillong, Meghalaya, as part of an ongoing effort to build foundational AI for Northeast Indian languages.


## Model Details

| Field | Value |
|---|---|
| Base model | `microsoft/Florence-2-base-ft` |
| Parameters | 231M |
| Language | Garo (Achik) |
| Task | OCR (image → text) |
| Training samples | 80,000 |
| Epochs | 5 |
| Character accuracy | 93.13% |
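The character-accuracy figure is not defined precisely in this card. A minimal sketch of one common definition — 1 minus the normalized character-level edit distance — is shown below; the helper names are illustrative and the exact formula behind the 93.13% number may differ:

```python
# Hypothetical sketch: character accuracy as 1 - edit_distance / len(ref).
# The exact metric definition used for the 93.13% figure is not specified
# in this card; this is one common formulation.
def edit_distance(a: str, b: str) -> int:
    """Plain Levenshtein distance via dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

def character_accuracy(pred: str, ref: str) -> float:
    """Accuracy in [0, 1]; clipped so very bad predictions don't go negative."""
    return max(0.0, 1.0 - edit_distance(pred, ref) / max(len(ref), 1))
```

Averaging this score over a held-out set of image/transcription pairs gives a corpus-level figure comparable to the one reported above.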

## Training Setup

- Hardware: NVIDIA A40 (48GB)
- Precision: bfloat16
- Batch size: 4 (effective 16 with gradient accumulation)
- Learning rate: 3e-4 with a cosine scheduler
- Max label length: 128 tokens
- Task prompt: `<OCR>` (Florence-2 uppercase token)
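For illustration, the cosine schedule above can be sketched as a simple function of the step count. Warmup behavior and the minimum learning rate are assumptions here — the card does not specify them:

```python
import math

# Illustrative sketch of a cosine learning-rate schedule decaying from the
# base LR (3e-4, as in the setup above) toward min_lr over total_steps.
# Warmup and min_lr are assumptions; the exact schedule used in training
# is not specified in this card.
def cosine_lr(step: int, total_steps: int,
              base_lr: float = 3e-4, min_lr: float = 0.0) -> float:
    progress = min(step, total_steps) / total_steps  # fraction of training done
    return min_lr + 0.5 * (base_lr - min_lr) * (1 + math.cos(math.pi * progress))
```

At step 0 this returns the full base LR, halfway through it returns half of it, and at the final step it reaches `min_lr`.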

## Usage

```python
from transformers import AutoProcessor, AutoModelForCausalLM
from PIL import Image
import torch

processor = AutoProcessor.from_pretrained("MWirelabs/garo-ocr", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    "MWirelabs/garo-ocr",
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
).cuda()

# Prepare the image and the Florence-2 OCR task prompt
image = Image.open("your_image.png").convert("RGB")
inputs = processor(text="<OCR>", images=image, return_tensors="pt")
inputs = {k: v.cuda() for k, v in inputs.items()}
inputs["pixel_values"] = inputs["pixel_values"].to(torch.bfloat16)

with torch.no_grad():
    generated = model.generate(
        pixel_values=inputs["pixel_values"],
        input_ids=inputs["input_ids"],
        max_new_tokens=128,
    )

text = processor.tokenizer.decode(generated[0], skip_special_tokens=True)
print(text)
```

**Note:** Use `transformers==4.38.2` for compatibility.


## Limitations

- Maximum reliable output length is ~128 tokens; longer passages may be truncated.
- Part of MWire Labs' single-language model series; a multilingual NE-OCR model covering more Northeast Indian languages is in development.
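Given the ~128-token limit, one practical workaround is to OCR long pages in horizontal strips and concatenate the results. The helper below is a hypothetical sketch (the function name, overlap handling, and strip sizing are not part of the model or this card):

```python
# Hypothetical helper: compute (top, bottom) crop bounds that cover an image
# of the given pixel height in horizontal strips, optionally overlapping so
# that text lines cut at a strip boundary appear in both strips.
def strip_boxes(height: int, strip_height: int, overlap: int = 0):
    boxes, top = [], 0
    while top < height:
        bottom = min(top + strip_height, height)
        boxes.append((top, bottom))
        if bottom == height:
            break
        top = bottom - overlap  # back up so boundary lines are not lost
    return boxes
```

Each `(top, bottom)` pair can then be fed to `image.crop((0, top, width, bottom))` and run through the usage snippet above, one strip at a time.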