---
license: mit
tags:
- ocr
- handwritten-text
- trocr
- pytorch
---

# Model Name: TrOCR Fine-Tuned on Custom Dataset

This model is a fine-tuned version of Microsoft's `TrOCR`, trained on a custom dataset to extract handwritten text from scanned documents.

## Model Architecture

- **Base model**: Microsoft TrOCR (base)
- **Used with**: CRAFT for text detection
- **Fine-tuned on**: a custom dataset of handwritten text from scanned documents

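Pairing a text detector with TrOCR implies a two-stage pipeline: the detector proposes text regions and the recognizer reads each one. Below is a minimal sketch of one possible glue step between the two. It assumes the detector output has already been reduced to axis-aligned `(x_min, y_min, x_max, y_max)` boxes. The helper name `sort_boxes_reading_order` and its `line_tol` parameter are hypothetical and are not part of this repository or of CRAFT.

```python
from typing import List, Tuple

# Assumed box format: (x_min, y_min, x_max, y_max) in pixel coordinates.
Box = Tuple[int, int, int, int]


def sort_boxes_reading_order(boxes: List[Box], line_tol: float = 0.5) -> List[Box]:
    """Order boxes top-to-bottom, then left-to-right within each text line.

    Two boxes are treated as the same line when their vertical centers differ
    by at most `line_tol` times the taller box's height (a heuristic).
    """
    lines: List[List[Box]] = []
    # Visit boxes by vertical center so lines are built from top to bottom.
    for box in sorted(boxes, key=lambda b: (b[1] + b[3]) / 2):
        center_y = (box[1] + box[3]) / 2
        height = box[3] - box[1]
        if lines:
            last = lines[-1][-1]
            last_center = (last[1] + last[3]) / 2
            last_height = last[3] - last[1]
            if abs(center_y - last_center) <= line_tol * max(height, last_height):
                lines[-1].append(box)
                continue
        lines.append([box])
    # Within each line, read left to right.
    return [b for line in lines for b in sorted(line, key=lambda b: b[0])]


# Hypothetical detector output: two words on one line, one word below.
boxes = [(120, 12, 200, 40), (10, 10, 100, 38), (15, 60, 90, 88)]
ordered = sort_boxes_reading_order(boxes)
print(ordered)
```

Each ordered box could then be cropped with Pillow's `image.crop(box)` and passed through the processor and model, one crop at a time. The tolerance heuristic is only a starting point and may need tuning for skewed or densely written pages.
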
## Files in this repository

- `pytorch_model.bin`: Model weights (2.1 GB)
- `config.json`, `tokenizer_config.json`, etc.
- Training and evaluation scripts (optional)

## How to Use

```python
from transformers import VisionEncoderDecoderModel, TrOCRProcessor
from PIL import Image
import torch

# Load processor and model
processor = TrOCRProcessor.from_pretrained("Gitesh2003/MESA_TrOCR")
model = VisionEncoderDecoderModel.from_pretrained("Gitesh2003/MESA_TrOCR")

# Load image
image = Image.open("sample_image.jpg").convert("RGB")

# OCR: preprocess the image, generate token IDs, decode them to text
pixel_values = processor(images=image, return_tensors="pt").pixel_values
generated_ids = model.generate(pixel_values)
generated_text = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]

print(generated_text)