Karez/KHLR

@@ -1,79 +1,79 @@
----
-language:
-  - ur
-license: cc-by-nc-4.0
-tags:
-  - handwritten-text-recognition
-  - urdu
-  - pucit
-  - densenet
-  - transformer
-  - transfer-learning
-  - pytorch
-  - safetensors
-datasets:
-  - PUCIT
-  - DASTNUS
-metrics:
-  - cer
-  - wer
-pipeline_tag: image-to-text
----
-# Urdu Handwritten Text Recognition: DenseNet121-Transformer (Fine-tuned on PUCIT)
-## Model Description
-A lightweight DenseNet121-Transformer architecture for Urdu handwritten line recognition,
-pre-trained on the Kurdish DASTNUS dataset and fine-tuned on the PUCIT Urdu handwritten dataset.
-Uses a triple unified vocabulary covering Kurdish, Arabic, and Urdu scripts (192 tokens).
-## Architecture
-- **CNN Backbone:** DenseNet-121 (pretrained on ImageNet)
-- **Encoder:** 3 Transformer encoder layers
-- **Decoder:** 3 Transformer decoder layers
-- **Attention Heads:** 8
-- **Hidden Size:** 256
-- **Parameters:** ~12.8M
-- **Vocabulary:** 192 tokens (Triple unified: Kurdish + Arabic + Urdu)
-## Transfer Learning Pipeline
-1. Pre-trained on Kurdish DASTNUS dataset (with unified vocabulary)
-2. Fine-tuned on PUCIT Urdu handwritten line dataset
-## Performance on PUCIT Test Set
-| Metric | Value |
-|--------|-------|
-| CER | 0.0932 |
-| WER | 0.2799 |
-| CRR | 90.68% |
-## Training Data
-- **Pre-training:** DASTNUS Kurdish handwritten dataset
-- **Fine-tuning:** PUCIT Urdu handwritten dataset (5,554 training, 935 validation, 912 testing)
-## Usage
-```python
-from safetensors.torch import load_file
-import json
-# Load model weights
-state_dict = load_file("model.safetensors")
-# Load config
-with open("config.json", "r") as f:
-    config = json.load(f)
-# Load vocabulary
-with open("vocab.json", "r", encoding="utf-8") as f:
-    vocab = json.load(f)
-# Load full unified vocabulary info
-with open("unified_vocabulary.json", "r", encoding="utf-8") as f:
-    unified_vocab = json.load(f)
-```
-## Citation
-[]
-## License
-This model is released for non-commercial scientific research purposes only.

+---
+language:
+  - ur
+license: cc-by-nc-4.0
+tags:
+  - handwritten-text-recognition
+  - urdu
+  - pucit
+  - densenet
+  - transformer
+  - transfer-learning
+  - pytorch
+  - safetensors
+datasets:
+  - PUCIT
+  - DASTNUS
+metrics:
+  - cer
+  - wer
+pipeline_tag: image-to-text
+---
+# Urdu Handwritten Text Recognition: DenseNet121-Transformer (Fine-tuned on PUCIT)
+## Model Description
+A lightweight DenseNet121-Transformer architecture for Urdu handwritten line recognition,
+pre-trained on the Kurdish DASTNUS dataset and fine-tuned on the PUCIT Urdu handwritten dataset.
+It uses a triple unified vocabulary covering Kurdish, Arabic, and Urdu scripts (192 tokens). The PUCIT-OHUL dataset is publicly available at: http://faculty.pucit.edu.pk/nazarkhan/work/urdu_ohtr/pucit_ohul_dataset.html
+## Architecture
+- **CNN Backbone:** DenseNet-121 (pre-trained on ImageNet)
+- **Encoder:** 3 Transformer encoder layers
+- **Decoder:** 3 Transformer decoder layers
+- **Attention Heads:** 8
+- **Hidden Size:** 256
+- **Parameters:** ~12.8M
+- **Vocabulary:** 192 tokens (Triple unified: Kurdish + Arabic + Urdu)
+## Transfer Learning Pipeline
+1. Pre-trained on Kurdish DASTNUS dataset (with unified vocabulary)
+2. Fine-tuned on PUCIT Urdu handwritten line dataset
+## Performance on PUCIT Test Set
+| Metric | Value |
+|--------|-------|
+| CER | 0.0932 |
+| WER | 0.2799 |
+| CRR | 90.68% |
+## Training Data
+- **Pre-training:** DASTNUS Kurdish handwritten dataset
+- **Fine-tuning:** PUCIT Urdu handwritten dataset (5,554 training, 935 validation, 912 testing)
+## Usage
+```python
+from safetensors.torch import load_file
+import json
+# Load model weights
+state_dict = load_file("model.safetensors")
+# Load config
+with open("config.json", "r") as f:
+    config = json.load(f)
+# Load vocabulary
+with open("vocab.json", "r", encoding="utf-8") as f:
+    vocab = json.load(f)
+# Load full unified vocabulary info
+with open("unified_vocabulary.json", "r", encoding="utf-8") as f:
+    unified_vocab = json.load(f)
+```
+## Citation
+[]
+## License
+This model is released for non-commercial scientific research purposes only.
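
The CER, WER, and CRR figures in the performance table can be related to one another with a small sketch. The card does not say how the metrics were computed, so everything here is an assumption: the function names (`edit_distance`, `error_rate`, `cer`, `wer`, `crr`) are hypothetical, errors are aggregated over the whole corpus rather than averaged per line, words are split on whitespace, and CRR is taken to be `1 - CER`, which is consistent with the reported values (1 - 0.0932 = 0.9068).

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (unit cost per edit)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def error_rate(refs, hyps, tokenize):
    """Total edit errors divided by total reference length (corpus-level; assumed)."""
    errors = sum(edit_distance(tokenize(r), tokenize(h)) for r, h in zip(refs, hyps))
    total = sum(len(tokenize(r)) for r in refs)
    return errors / total


def cer(refs, hyps):
    """Character error rate: tokens are individual characters."""
    return error_rate(refs, hyps, list)


def wer(refs, hyps):
    """Word error rate: tokens are whitespace-separated words (assumed)."""
    return error_rate(refs, hyps, str.split)


def crr(refs, hyps):
    """Character recognition rate, assumed here to be 1 - CER."""
    return 1.0 - cer(refs, hyps)
```

In use, `refs` would hold the ground-truth transcriptions of the test lines and `hyps` the decoded model outputs in the same order. Different normalization choices (for example, stripping diacritics or collapsing whitespace) would change the numbers, so this sketch should not be expected to reproduce the table exactly.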