---
base_model: paudelanil/trocr-devanagari-2
library_name: peft
license: mit
language:
- ne
- hi
pipeline_tag: image-to-text
tags:
- lora
- peft
- transformers
- trocr
- devanagari
- ocr
datasets:
- c3rl/IIIT-INDIC-HW-WORDS-Hindi
sdk: gradio
sdk_version: 3.50.2
python_version: "3.10"
app_file: app.py
pinned: true
---

# DevGen TrOCR Devanagari LoRA Adapter

This repository contains the **DevGen LoRA adapter** for Devanagari OCR. It is designed to be loaded on top of `paudelanil/trocr-devanagari-2` using the PEFT (Parameter-Efficient Fine-Tuning) library.

## Model Details

- **Developed by:** Manish Wagle / DevGen
- **Base model:** `paudelanil/trocr-devanagari-2`
- **Adapter type:** LoRA
- **Task:** Image-to-text OCR for Devanagari handwritten words and short phrases.
- **Library:** PEFT + Transformers

## Intended Use

Use this adapter for high-precision recognition of Devanagari text from cropped handwritten word images. It is particularly optimized for the **IIIT-INDIC-HW-WORDS** distribution but generalizes well to other handwritten Devanagari styles.

> [!IMPORTANT]
> This model is an OCR engine for word-level recognition. It does not perform page layout analysis, table extraction, or full-page segmentation.

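
Because the model expects one cropped word per image, a full page has to be split into word crops by a separate step first. The sketch below assumes the word bounding boxes already come from some external detector, which this repository does not provide. `crop_words` and its `margin` parameter are hypothetical names used only for illustration; the function relies on nothing more than the `size` attribute and `crop` method of a PIL image.

```python
def crop_words(page, boxes, margin=4):
    """Cut word crops out of a page image.

    page   : a PIL.Image.Image (only .size and .crop are used)
    boxes  : iterable of (left, top, right, bottom) pixel boxes, assumed to
             come from an external word detector
    margin : hypothetical padding in pixels added around each box
    """
    width, height = page.size
    crops = []
    for left, top, right, bottom in boxes:
        # Pad the box, then clamp it so it stays inside the page.
        padded = (
            max(left - margin, 0),
            max(top - margin, 0),
            min(right + margin, width),
            min(bottom + margin, height),
        )
        crops.append(page.crop(padded))
    return crops
```

Each returned crop can then be converted with `.convert("RGB")` and passed to the processor like any other word image.
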
## Loading and Inference

To use this model, you need `transformers`, `peft`, `torch`, and `Pillow` (imported as `PIL`) installed.

```python
import torch
from peft import PeftModel
from PIL import Image
from transformers import (
    AutoTokenizer,
    TrOCRProcessor,
    ViTImageProcessor,
    VisionEncoderDecoderModel,
)

base_model_id = "paudelanil/trocr-devanagari-2"
adapter_id = "manishwagle/devgen-trocr-devanagari-lora"  # Or local path

# Load processors
image_processor = ViTImageProcessor.from_pretrained(base_model_id)
tokenizer = AutoTokenizer.from_pretrained(base_model_id)
processor = TrOCRProcessor(image_processor=image_processor, tokenizer=tokenizer)

# Load model
device = "cuda" if torch.cuda.is_available() else "cpu"
base_model = VisionEncoderDecoderModel.from_pretrained(base_model_id)
model = PeftModel.from_pretrained(base_model, adapter_id)
model.to(device)

# Inference
image = Image.open("sample_handwritten_word.png").convert("RGB")
pixel_values = processor(image, return_tensors="pt").pixel_values.to(device)
generated_ids = model.generate(pixel_values)
generated_text = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(f"Recognized Text: {generated_text}")
```

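
The snippet above handles a single image. For several word crops at once, a batched loop along the following lines may be more convenient. This is a sketch rather than the project's own pipeline: `recognize_words`, `batch_size`, and the `max_new_tokens` default are illustrative choices, and `processor`, `model`, and `device` are assumed to be the objects created in the loading code.

```python
def recognize_words(images, processor, model, device="cpu",
                    batch_size=8, max_new_tokens=32):
    """Run OCR over a sequence of RGB word images in small batches.

    The batch size and token limit are illustrative defaults, not values
    recommended by the model authors.
    """
    images = list(images)
    texts = []
    for start in range(0, len(images), batch_size):
        batch = images[start:start + batch_size]
        # The processor accepts a list of images and returns one stacked tensor.
        pixel_values = processor(batch, return_tensors="pt").pixel_values.to(device)
        generated_ids = model.generate(pixel_values, max_new_tokens=max_new_tokens)
        texts.extend(processor.batch_decode(generated_ids, skip_special_tokens=True))
    return texts
```

With the objects from the loading code, a call could look like `recognize_words([image], processor, model, device=device)`, which returns one string per input image in the same order.
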
## Demo

A hosted Gradio demo and the full DevGen OCR Suite interface are available in the [Hugging Face Space](https://huggingface.co/spaces/manishw10/devgen-devanagari-ocr) and in the [DevGen Project Repository](https://github.com/manishwagle/DevGen).

## Limitations

- **Rotation:** Sensitive to severe text rotation (beyond 15 degrees).
- **Noise:** Performance may degrade on heavily blurred or low-contrast scans.
- **Script:** Optimized for Devanagari; may not work for other Indic scripts without additional tuning.

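
To check how much these limitations matter on your own scans, one option is to compare predictions against ground-truth labels using character error rate (CER). The sketch below is a plain-Python Levenshtein implementation, not an evaluation script from this project. It counts Unicode code points, so a Devanagari consonant and its vowel sign are scored as separate characters; the function names are illustrative.

```python
def edit_distance(reference: str, hypothesis: str) -> int:
    """Levenshtein distance between two strings, counted in code points."""
    previous = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        current = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            substitution = previous[j - 1] + (ref_char != hyp_char)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """Edit distance divided by the reference length.

    Convention chosen here: an empty reference scores 0.0 when the
    hypothesis is also empty and 1.0 otherwise.
    """
    if not reference:
        return 0.0 if not hypothesis else 1.0
    return edit_distance(reference, hypothesis) / len(reference)
```

Averaging `character_error_rate` separately over clean, rotated, and blurred samples gives a rough picture of where recognition starts to break down.
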
## Framework Versions

- PEFT 0.19.1
- Transformers 4.40.0+
- PyTorch 2.1.0+
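
A quick way to compare an environment against the versions listed above is sketched below. It uses only `importlib.metadata` from the standard library. The helper names and the simple numeric parsing, which ignores suffixes such as `+cu121` or `.dev0`, are simplifications for illustration. PEFT is listed with an exact version, so treating it as a minimum here is an assumption.

```python
import re
from importlib.metadata import PackageNotFoundError, version

# Versions taken from the list above; PEFT is treated as a minimum (assumption).
MINIMUMS = {"peft": "0.19.1", "transformers": "4.40.0", "torch": "2.1.0"}


def parse_version(text):
    """Return the leading numeric release, e.g. '2.1.0+cu121' -> (2, 1, 0)."""
    match = re.match(r"\d+(?:\.\d+)*", text)
    if match is None:
        return (0, 0, 0)
    parts = [int(part) for part in match.group().split(".")]
    parts += [0] * (3 - len(parts))  # pad '2.1' to (2, 1, 0)
    return tuple(parts)


def check_versions(minimums=MINIMUMS):
    """Map each package name to a short status string."""
    report = {}
    for package, minimum in minimums.items():
        try:
            installed = version(package)
        except PackageNotFoundError:
            report[package] = "not installed"
            continue
        ok = parse_version(installed) >= parse_version(minimum)
        report[package] = f"{installed} ({'ok' if ok else 'below ' + minimum})"
    return report
```

Calling `check_versions()` returns a dictionary with one status entry per package, which can be printed before loading the model.
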