VietOCR Fine-tuned for Vietnamese Handwriting Recognition
📋 Model Description
This is a fine-tuned VietOCR model optimized for recognizing Vietnamese handwritten text. It uses the VGG Transformer architecture and was fine-tuned on the UIT_HWDB_line dataset, which contains line-level images of real Vietnamese handwriting.
Best for: Recognizing handwritten Vietnamese text from images (line-level OCR)
🎯 Performance
Evaluated on the UIT_HWDB_line validation split (702 lines):
| Metric | Score | Description |
|---|---|---|
| Character Error Rate (CER) | 4.01% | Average character-level error (lower is better) |
| Word Error Rate (WER) | 11.02% | Average word-level error (lower is better) |
| Exact Match Accuracy | 34.76% | Percentage of perfectly predicted lines |
📊 Interpretation:
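- CER 4.01% means roughly 4 character edits (substitutions, deletions or insertions) per 100 ground-truth characters, so roughly 96% of characters are recognized correctly
- WER 11.02% means roughly 11 word edits per 100 ground-truth words, so roughly 89% of words are recognized correctly
- The low CER combined with a modest exact-match accuracy (34.76%) shows that most errors are small, typically a few wrong characters in an otherwise correct line, rather than completely wrong transcriptions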
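For reference, the usual definitions behind these rates are given below, where $S$, $D$ and $I$ count substitutions, deletions and insertions in the minimum edit sequence and $N$ is the ground-truth length (in characters for CER, in words for WER). The worked numbers assume a hypothetical line of 50 characters and 10 words; actual line lengths are not reported in this card.

$$
\mathrm{CER} = \frac{S_c + D_c + I_c}{N_c}, \qquad \mathrm{WER} = \frac{S_w + D_w + I_w}{N_w}
$$

$$
0.0401 \times 50 \approx 2.0 \ \text{character edits per line}, \qquad 0.1102 \times 10 \approx 1.1 \ \text{word edits per line}
$$

A line of that size would therefore still carry about one wrong word on average, which is why exact-match accuracy sits well below per-character accuracy.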
Example Predictions:
Ground Truth: "Nước ta giáp với biển Đông ở hai phía Đông và Nam"
Prediction: "Nước ta giáp với biển Đông ở hai phía Đông và Nam"
✅ Perfect match
Ground Truth: "thẳng Phú Hữu"
Prediction: "thẳng phú trước"
⚠️ Two errors: capitalization (Phú → phú) and a wrong word (Hữu → trước)
🚀 Usage
Installation
```bash
pip install vietocr
pip install pillow
pip install huggingface_hub
```
Quick Start
```python
from huggingface_hub import hf_hub_download
from PIL import Image
from vietocr.tool.config import Cfg
from vietocr.tool.predictor import Predictor

# Load the base VGG Transformer configuration
config = Cfg.load_config_from_name('vgg_transformer')

# Download the fine-tuned weights from Hugging Face
model_path = hf_hub_download(
    repo_id="DungHugging/vietocr-handwritten-finetune",
    filename="transformerocr.pth",
)
config['weights'] = model_path
config['cnn']['pretrained'] = False  # skip the ImageNet backbone download; all weights come from model_path
config['device'] = 'cuda'  # or 'cpu'

# Create the predictor
predictor = Predictor(config)

# Predict on a single cropped text line
image = Image.open('path/to/handwritten_image.jpg')
text = predictor.predict(image)
print(text)
```
Batch Processing
```python
# For multiple images (reuses `predictor` from Quick Start)
images = [Image.open(f'image_{i}.jpg') for i in range(10)]
texts = [predictor.predict(img) for img in images]
```
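The list comprehension above opens every image before predicting. For larger folders, loading and recognizing a fixed number of images at a time keeps memory bounded. A minimal sketch; `transcribe_in_chunks` is a hypothetical helper (not part of VietOCR), and `predict_many` can simply loop over `predictor.predict`:

```python
from pathlib import Path
from typing import Callable, Dict, Iterable, List, TypeVar

Img = TypeVar("Img")


def transcribe_in_chunks(
    paths: Iterable[Path],
    load: Callable[[Path], Img],
    predict_many: Callable[[List[Img]], List[str]],
    chunk_size: int = 16,
) -> Dict[str, str]:
    """Load and recognize at most `chunk_size` images at a time.

    Only one chunk of decoded images is held in memory, instead of every
    image in the folder.
    """
    results: Dict[str, str] = {}
    chunk: List[Path] = []

    def flush() -> None:
        # Decode the current chunk, predict, then empty it for the next round
        images = [load(p) for p in chunk]
        for p, text in zip(chunk, predict_many(images)):
            results[str(p)] = text
        chunk.clear()

    for path in paths:
        chunk.append(path)
        if len(chunk) == chunk_size:
            flush()
    if chunk:  # leftover partial chunk
        flush()
    return results


# Usage with the Quick Start predictor (requires PIL and vietocr):
# texts = transcribe_in_chunks(
#     sorted(Path("lines").glob("*.jpg")),
#     load=lambda p: Image.open(p).convert("RGB"),
#     predict_many=lambda imgs: [predictor.predict(img) for img in imgs],
# )
```

The loader and predictor are passed in as functions so the chunking logic does not depend on a particular image library.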
📦 Model Details
Architecture
- Base Model: VietOCR VGG Transformer
- Backbone: VGG19-BN (pretrained on ImageNet)
- Encoder: Transformer encoder (6 layers)
- Decoder: Transformer decoder with cross-attention over the encoder output
- Total Parameters: ~37.65M
- Trainable Parameters: ~37.65M (all parameters are updated during fine-tuning)
Training Configuration
- Model: vgg_transformer
- Pretrained: Yes (VietOCR pretrained weights)
- Optimizer: Adam
- Learning Rate: 0.0001
- Batch Size: 8
- Epochs: 50
- Device: NVIDIA GPU (Kaggle T4/P100)
- Training Time: ~2-3 hours
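VietOCR's trainer counts optimizer steps (iterations) rather than epochs, so a 50-epoch budget has to be converted. A minimal sketch of that arithmetic, assuming one step per batch with the final partial batch counted as a step; the `config['trainer']['iters']` key in the comment is our understanding of how VietOCR expresses this, not something stated in this card:

```python
import math


def epochs_to_iters(num_samples: int, batch_size: int, epochs: int) -> int:
    """Optimizer steps needed to pass over the training set `epochs` times.

    Assumes the last, partial batch of each epoch still counts as one step.
    """
    steps_per_epoch = math.ceil(num_samples / batch_size)
    return steps_per_epoch * epochs


# Values from the configuration above: 6,326 training lines, batch size 8, 50 epochs
total_iters = epochs_to_iters(num_samples=6326, batch_size=8, epochs=50)
print(total_iters)         # 39550
print(total_iters // 500)  # 79 validation rounds at one check every 500 iterations

# Assumed VietOCR setting for the same budget:
# config['trainer']['iters'] = total_iters
```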
Training Data
- Dataset: UIT_HWDB_line
- Training Samples: 6,326 lines
- Validation Samples: 702 lines
- Test Samples: 201 lines
- Language: Vietnamese
- Text Type: Real handwritten text (various writing styles)
Data Split:
- Train: 90% (6,326 lines)
- Validation: 10% (702 lines)
- Test: public test set (201 lines; the scores above are on the validation split)
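The reported counts fit a 90/10 split of 7,028 lines (6,326 + 702). The card does not say how the split was drawn; a minimal sketch assuming a seeded random shuffle, where `train_valid_split` and the seed value are illustrative:

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def train_valid_split(
    items: Sequence[T], valid_ratio: float = 0.1, seed: int = 42
) -> Tuple[List[T], List[T]]:
    """Shuffle a copy of `items` and hold out `valid_ratio` of them for validation."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # seeded, so the split is reproducible
    n_valid = int(len(shuffled) * valid_ratio)
    return shuffled[n_valid:], shuffled[:n_valid]


train, valid = train_valid_split(range(7028))
print(len(train), len(valid))  # 6326 702
```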
📈 Training & Evaluation
Training Process
- Data Preparation: Convert UIT_HWDB format to VietOCR annotation format
- Fine-tuning: Start from VietOCR pretrained weights
- Optimization: Adam optimizer with learning rate 1e-4
- Regularization: Weight decay 1e-5
- Validation: Evaluated every 500 iterations
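For the data-preparation step, VietOCR's examples use plain-text annotation files with one line per sample: an image path relative to `data_root`, a tab, then the transcription. Parsing UIT_HWDB itself depends on how your copy of the dataset is organized and is not shown; this sketch assumes you have already gathered `(image_path, text)` pairs, and `write_vietocr_annotation` is a hypothetical helper, not part of VietOCR:

```python
from pathlib import Path
from typing import Iterable, Tuple


def write_vietocr_annotation(pairs: Iterable[Tuple[str, str]], out_file: str) -> int:
    """Write one `image_path<TAB>transcription` line per sample; return the line count.

    Tabs or newlines inside a label would break the format, so every whitespace
    run in the transcription is collapsed to a single space.
    """
    count = 0
    with open(out_file, "w", encoding="utf-8") as f:
        for image_path, text in pairs:
            label = " ".join(text.split())
            f.write(f"{Path(image_path).as_posix()}\t{label}\n")
            count += 1
    return count
```

For example, `write_vietocr_annotation(train_pairs, './your_data/train.txt')`, where `train_pairs` is your hypothetical list of training pairs, with paths relative to `./your_data`.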
Metrics Calculation
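- CER (Character Error Rate): character-level Levenshtein distance between prediction and ground truth, divided by the number of ground-truth characters
- WER (Word Error Rate): word-level Levenshtein distance, divided by the number of ground-truth words
- Accuracy: percentage of lines whose prediction matches the ground truth exactly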
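A minimal pure-Python sketch of these three metrics for a single line, assuming case-sensitive comparison, whitespace tokenization for WER, and Unicode NFC normalization (Vietnamese diacritics can be stored precomposed or decomposed, which would otherwise inflate CER). How per-line scores were aggregated into the reported averages is not stated in this card. Applied to the second example prediction above:

```python
import unicodedata
from typing import Sequence


def levenshtein(a: Sequence, b: Sequence) -> int:
    """Minimum number of insertions, deletions and substitutions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,               # delete x
                            curr[j - 1] + 1,           # insert y
                            prev[j - 1] + (x != y)))   # substitute (free if equal)
        prev = curr
    return prev[-1]


def _nfc(s: str) -> str:
    # Compare precomposed forms so "ữ" counts as one character either way
    return unicodedata.normalize("NFC", s)


def cer(ref: str, hyp: str) -> float:
    ref, hyp = _nfc(ref), _nfc(hyp)
    return levenshtein(ref, hyp) / max(len(ref), 1)


def wer(ref: str, hyp: str) -> float:
    ref_words, hyp_words = _nfc(ref).split(), _nfc(hyp).split()
    return levenshtein(ref_words, hyp_words) / max(len(ref_words), 1)


def exact_match(ref: str, hyp: str) -> bool:
    return _nfc(ref) == _nfc(hyp)


ref, hyp = "thẳng Phú Hữu", "thẳng phú trước"
print(f"CER={cer(ref, hyp):.4f} WER={wer(ref, hyp):.4f} exact={exact_match(ref, hyp)}")
# CER=0.4615 WER=0.6667 exact=False
```

The capitalization change and the substituted word cost 6 character edits against 13 ground-truth characters, so a single line's CER can sit far above the 4.01% dataset average.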
💻 System Requirements
Minimum:
- Python 3.7+
- 4GB RAM
- CPU inference: ~1-2 seconds per image
Recommended:
- Python 3.8+
- 8GB+ RAM
- NVIDIA GPU with 4GB+ VRAM
- GPU inference: ~0.1-0.2 seconds per image
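These inference times depend on hardware and on the input lines, so it is worth measuring on your own setup. A minimal timing sketch; `measure_latency` is a hypothetical helper that accepts any single-image predict function, such as `predictor.predict` from Quick Start:

```python
import statistics
import time
from typing import Callable, Dict, Sequence, TypeVar

Img = TypeVar("Img")


def measure_latency(
    predict: Callable[[Img], str], images: Sequence[Img], warmup: int = 1
) -> Dict[str, float]:
    """Return mean and median seconds per image over a non-empty `images`.

    The first `warmup` images are run once untimed, so one-off start-up
    costs (e.g. CUDA initialization) do not skew the averages.
    """
    for img in images[:warmup]:
        predict(img)
    timings = []
    for img in images:
        start = time.perf_counter()
        predict(img)
        timings.append(time.perf_counter() - start)
    return {"mean_s": statistics.mean(timings), "median_s": statistics.median(timings)}
```

For example, `measure_latency(predictor.predict, images)` with the `images` list from Batch Processing.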
🔧 Fine-tuning for Your Data
If you want to fine-tune this model on your own dataset:
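```python
from vietocr.model.trainer import Trainer
from vietocr.tool.config import Cfg

# Start from the same base configuration as this model
config = Cfg.load_config_from_name('vgg_transformer')

# Dataset paths live under config['dataset']; annotation paths are relative to data_root
config['dataset'].update({
    'name': 'your_data',  # used to name the cached LMDB datasets
    'data_root': './your_data',
    'train_annotation': 'train.txt',
    'valid_annotation': 'val.txt',
})
config['trainer']['export'] = './weights/transformerocr_custom.pth'  # exported weights are written here
config['device'] = 'cuda'

# pretrained=True would download the original VietOCR weights instead;
# load this model's checkpoint explicitly so training starts from it
trainer = Trainer(config, pretrained=False)
trainer.load_weights('path/to/transformerocr.pth')

# Train
trainer.train()
```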
📝 Citation
If you use this model in your research, please cite:
```bibtex
@misc{vietocr-handwritten-finetune,
author = {DungHugging},
title = {VietOCR Fine-tuned for Vietnamese Handwriting Recognition},
year = {2026},
publisher = {Hugging Face},
howpublished = {\url{https://huggingface.co/DungHugging/vietocr-handwritten-finetune}}
}
```
Original VietOCR:
```bibtex
@misc{vietocr,
author = {Pham, Bao Cong},
title = {VietOCR: Optical Character Recognition for Vietnamese},
year = {2020},
publisher = {GitHub},
howpublished = {\url{https://github.com/pbcquoc/vietocr}}
}
```
UIT_HWDB Dataset:
```bibtex
@inproceedings{nguyen2018recognition,
title={Recognition of Handwritten Vietnamese Text Using Convolutional Neural Network},
author={Nguyen, Hung Tuan and Nguyen, Cong Thanh and others},
booktitle={International Conference on Future Data and Security Engineering},
year={2018}
}
```
📄 License
This model is released under the MIT License.
- ✅ Free for commercial use
- ✅ Free for academic research
- ✅ Free to modify and redistribute
🤝 Acknowledgments
- VietOCR Team for the excellent base model and framework
- UIT for the UIT_HWDB dataset
- Kaggle for providing free GPU resources for training
📬 Contact & Issues
- Hugging Face: @DungHugging
- Model Repository: vietocr-handwritten-finetune
For issues, questions, or improvements, please open a discussion on the Hugging Face model page.
🔄 Updates
v1.0 (Current)
- Initial release
- Fine-tuned on UIT_HWDB_line dataset
- CER: 4.01%, WER: 11.02%
- 50 epochs training
Made with ❤️ for the Vietnamese OCR community