VietOCR Fine-tuned for Vietnamese Handwriting Recognition

📋 Model Description

This is a fine-tuned VietOCR model specifically optimized for Vietnamese handwritten text recognition. The model is based on VGG Transformer architecture and has been trained on the UIT_HWDB_line dataset, which contains real Vietnamese handwritten text samples.

Best for: Recognizing handwritten Vietnamese text from images (line-level OCR)

🎯 Performance

Evaluated on UIT_HWDB validation set (702 samples):

| Metric | Score | Description |
|---|---|---|
| Character Error Rate (CER) | 4.01% | Average character-level error (lower is better) |
| Word Error Rate (WER) | 11.02% | Average word-level error (lower is better) |
| Exact Match Accuracy | 34.76% | Percentage of perfectly predicted lines |

📊 Interpretation:

  • CER 4.01% means the model correctly predicts ~96% of all characters
  • WER 11.02% means ~89% of words are correctly recognized
  • The low CER despite moderate exact-match accuracy indicates that errors are typically small and localized (a character or two per line) rather than catastrophic failures

Example Predictions:

Ground Truth: "Nước ta giáp với biển Đông ở hai phía Đông và Nam"
Prediction:   "Nước ta giáp với biển Đông ở hai phía Đông và Nam"
✅ Perfect match

Ground Truth: "thẳng Phú Hữu"
Prediction:   "thẳng phú trước"
⚠️ Minor errors: case ("Phú" → "phú") and word substitution ("Hữu" → "trước")

🚀 Usage

Installation

pip install vietocr
pip install pillow
pip install huggingface_hub

Quick Start

from vietocr.tool.predictor import Predictor
from vietocr.tool.config import Cfg
from PIL import Image

# Load configuration
config = Cfg.load_config_from_name('vgg_transformer')

# Download model from Hugging Face
from huggingface_hub import hf_hub_download
model_path = hf_hub_download(
    repo_id="DungHugging/vietocr-handwritten-finetune",
    filename="transformerocr.pth"
)

config['weights'] = model_path
config['device'] = 'cuda:0'  # or 'cpu' if no GPU is available

# Create predictor
predictor = Predictor(config)

# Predict
image = Image.open('path/to/handwritten_image.jpg')
text = predictor.predict(image)
print(text)
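The predictor expects a PIL image; scanned lines often come in grayscale or RGBA mode, which can cause surprises downstream. A small normalization helper avoids this (the `ensure_rgb` name is ours, not part of VietOCR):

```python
from PIL import Image

def ensure_rgb(image: Image.Image) -> Image.Image:
    """Convert grayscale/RGBA scans to the RGB mode the predictor expects."""
    return image if image.mode == "RGB" else image.convert("RGB")

# Example: a blank grayscale line scan
line = Image.new("L", (640, 48), color=255)
print(ensure_rgb(line).mode)  # RGB
```

Call `ensure_rgb` on each image before passing it to `predictor.predict`.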

Batch Processing

# For multiple images (sequential; simple but not the fastest option)
images = [Image.open(f'image_{i}.jpg') for i in range(10)]
texts = [predictor.predict(img) for img in images]
# Some VietOCR versions also provide predictor.predict_batch(images),
# which batches inference on the GPU and is usually faster.

📦 Model Details

Architecture

  • Base Model: VietOCR VGG Transformer
  • Backbone: VGG19-BN (pretrained on ImageNet)
  • Encoder: Transformer encoder (6 layers)
  • Decoder: Transformer decoder with attention
  • Total Parameters: ~37.65M
  • Trainable Parameters: ~37.65M

Training Configuration

Model: vgg_transformer
Pretrained: Yes (VietOCR pretrained weights)
Optimizer: Adam
Learning Rate: 0.0001
Batch Size: 8
Epochs: 50
Device: NVIDIA GPU (Kaggle T4/P100)
Training Time: ~2-3 hours

Training Data

  • Dataset: UIT_HWDB_line
  • Training Samples: 6,326 lines
  • Validation Samples: 702 lines
  • Test Samples: 201 lines
  • Language: Vietnamese
  • Text Type: Real handwritten text (various writing styles)

Data Split:

  • Train: 90%
  • Validation: 10%
  • Test: Public test set

📈 Training & Evaluation

Training Process

  1. Data Preparation: Convert UIT_HWDB format to VietOCR annotation format
  2. Fine-tuning: Start from VietOCR pretrained weights
  3. Optimization: Adam optimizer with learning rate 1e-4
  4. Regularization: Weight decay 1e-5
  5. Validation: Evaluated every 500 iterations

Metrics Calculation

  • CER (Character Error Rate): Levenshtein distance at character level
  • WER (Word Error Rate): Levenshtein distance at word level
  • Accuracy: Percentage of exact string matches
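The metrics above can be reproduced with a plain Levenshtein distance over characters (CER) or words (WER). A minimal sketch, not the exact evaluation script used for this model:

```python
def levenshtein(a, b):
    """Edit distance between two sequences (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        curr = [i]
        for j, y in enumerate(b, 1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (x != y)))  # substitution
        prev = curr
    return prev[-1]

def cer(ref, hyp):
    """Character Error Rate: edit distance normalized by reference length."""
    return levenshtein(ref, hyp) / max(len(ref), 1)

def wer(ref, hyp):
    """Word Error Rate: same idea, computed over whitespace-split tokens."""
    return levenshtein(ref.split(), hyp.split()) / max(len(ref.split()), 1)

# The imperfect example from above
ref = "thẳng Phú Hữu"
hyp = "thẳng phú trước"
print(f"CER={cer(ref, hyp):.2%}  WER={wer(ref, hyp):.2%}")
```

Corpus-level CER/WER is usually computed by summing edit distances and reference lengths over all samples before dividing, rather than averaging per-line rates.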

💻 System Requirements

Minimum:

  • Python 3.7+
  • 4GB RAM
  • CPU inference: ~1-2 seconds per image

Recommended:

  • Python 3.8+
  • 8GB+ RAM
  • NVIDIA GPU with 4GB+ VRAM
  • GPU inference: ~0.1-0.2 seconds per image

🔧 Fine-tuning for Your Data

If you want to fine-tune this model on your own dataset:

from vietocr.tool.config import Cfg
from vietocr.model.trainer import Trainer

# Load this model as base
config = Cfg.load_config_from_name('vgg_transformer')
config['weights'] = 'path/to/transformerocr.pth'

# Point the dataset config at your data
# (VietOCR reads these keys from config['dataset'], not config['trainer'])
config['dataset']['data_root'] = './your_data'
config['dataset']['train_annotation'] = 'train.txt'
config['dataset']['valid_annotation'] = 'val.txt'

# Train
trainer = Trainer(config, pretrained=True)
trainer.train()
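The annotation files referenced above are plain UTF-8 text with one sample per line: an image path relative to data_root and its transcription, separated by a tab. A minimal sketch for generating one (file names and labels here are illustrative):

```python
from pathlib import Path

# (image_path_relative_to_data_root, transcription) pairs -- illustrative only
samples = [
    ("lines/0001.jpg", "Nước ta giáp với biển Đông"),
    ("lines/0002.jpg", "thẳng Phú Hữu"),
]

# Write one tab-separated "path<TAB>label" line per sample
with open("train.txt", "w", encoding="utf-8") as f:
    for path, text in samples:
        f.write(f"{path}\t{text}\n")

print(Path("train.txt").read_text(encoding="utf-8"))
```

Generate `train.txt` and `val.txt` this way from your own splits before calling `Trainer`.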

📝 Citation

If you use this model in your research, please cite:

@misc{vietocr-handwritten-finetune,
  author = {DungHugging},
  title = {VietOCR Fine-tuned for Vietnamese Handwriting Recognition},
  year = {2026},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/DungHugging/vietocr-handwritten-finetune}}
}

Original VietOCR:

@misc{vietocr,
  author = {Pham, Bao Cong},
  title = {VietOCR: Optical Character Recognition for Vietnamese},
  year = {2020},
  publisher = {GitHub},
  howpublished = {\url{https://github.com/pbcquoc/vietocr}}
}

UIT_HWDB Dataset:

@inproceedings{nguyen2018recognition,
  title={Recognition of Handwritten Vietnamese Text Using Convolutional Neural Network},
  author={Nguyen, Hung Tuan and Nguyen, Cong Thanh and others},
  booktitle={International Conference on Future Data and Security Engineering},
  year={2018}
}

📄 License

This model is released under the MIT License.

  • ✅ Free for commercial use
  • ✅ Free for academic research
  • ✅ Free to modify and redistribute

🤝 Acknowledgments

  • VietOCR Team for the excellent base model and framework
  • UIT for the UIT_HWDB dataset
  • Kaggle for providing free GPU resources for training

📬 Contact & Issues

For issues, questions, or improvements, please open a discussion on the Hugging Face model page.

🔄 Updates

v1.0 (Current)

  • Initial release
  • Fine-tuned on UIT_HWDB_line dataset
  • CER: 4.01%, WER: 11.02%
  • 50 epochs training

Made with ❤️ for Vietnamese OCR Community
