VietOCR Fine-tuned for Vietnamese Handwriting Recognition

📋 Model Description

This is a fine-tuned VietOCR model specifically optimized for Vietnamese handwritten text recognition. The model is based on VGG Transformer architecture and has been trained on the UIT_HWDB_line dataset, which contains real Vietnamese handwritten text samples.

Best for: Recognizing handwritten Vietnamese text from images (line-level OCR)

🎯 Performance

Evaluated on UIT_HWDB validation set (702 samples):

| Metric | Score | Description |
|---|---|---|
| Character Error Rate (CER) | 4.01% | Average character-level error (lower is better) |
| Word Error Rate (WER) | 11.02% | Average word-level error (lower is better) |
| Exact Match Accuracy | 34.76% | Percentage of perfectly predicted lines |

📊 Interpretation:

  • CER 4.01% means the model correctly predicts ~96% of all characters
  • WER 11.02% means ~89% of words are correctly recognized
  • The low CER despite moderate exact-match accuracy indicates that errors are typically small and localized (a character or two per line) rather than catastrophic failures

Example Predictions:

Ground Truth: "Nước ta giáp với biển Đông ở hai phía Đông và Nam"
Prediction:   "Nước ta giáp với biển Đông ở hai phía Đông và Nam"
✅ Perfect match

Ground Truth: "thẳng Phú Hữu"
Prediction:   "thẳng phú trước"
⚠️ Minor errors: case ("Phú" → "phú") and word substitution ("Hữu" → "trước")

🚀 Usage

Installation

pip install vietocr
pip install pillow
pip install huggingface_hub

Quick Start

from vietocr.tool.predictor import Predictor
from vietocr.tool.config import Cfg
from PIL import Image

# Load configuration
config = Cfg.load_config_from_name('vgg_transformer')

# Download model from Hugging Face
from huggingface_hub import hf_hub_download
model_path = hf_hub_download(
    repo_id="DungHugging/vietocr-handwritten-finetune",
    filename="transformerocr.pth"
)

config['weights'] = model_path
config['device'] = 'cuda:0'  # or 'cpu' if no GPU is available

# Create predictor
predictor = Predictor(config)

# Predict
image = Image.open('path/to/handwritten_image.jpg')
text = predictor.predict(image)
print(text)
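The predictor expects a PIL image; scanned lines often come in grayscale or RGBA mode, which can cause surprises downstream. A small normalization helper avoids this (the `ensure_rgb` name is ours, not part of VietOCR):

```python
from PIL import Image

def ensure_rgb(image: Image.Image) -> Image.Image:
    """Convert grayscale/RGBA scans to the RGB mode the predictor expects."""
    return image if image.mode == "RGB" else image.convert("RGB")

# Example: a blank grayscale line scan
line = Image.new("L", (640, 48), color=255)
print(ensure_rgb(line).mode)  # RGB
```

Call `ensure_rgb` on each image before passing it to `predictor.predict`.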

Batch Processing

# For multiple images (sequential; simple but not the fastest option)
images = [Image.open(f'image_{i}.jpg') for i in range(10)]
texts = [predictor.predict(img) for img in images]
# Some VietOCR versions also provide predictor.predict_batch(images),
# which batches inference on the GPU and is usually faster.

📦 Model Details

Architecture

  • Base Model: VietOCR VGG Transformer
  • Backbone: VGG19-BN (pretrained on ImageNet)
  • Encoder: Transformer encoder (6 layers)
  • Decoder: Transformer decoder with attention
  • Total Parameters: ~37.65M
  • Trainable Parameters: ~37.65M

Training Configuration

Model: vgg_transformer
Pretrained: Yes (VietOCR pretrained weights)
Optimizer: Adam
Learning Rate: 0.0001
Batch Size: 8
Epochs: 50
Device: NVIDIA GPU (Kaggle T4/P100)
Training Time: ~2-3 hours

Training Data

  • Dataset: UIT_HWDB_line
  • Training Samples: 6,326 lines
  • Validation Samples: 702 lines
  • Test Samples: 201 lines
  • Language: Vietnamese
  • Text Type: Real handwritten text (various writing styles)

Data Split:

  • Train: 90%
  • Validation: 10%
  • Test: Public test set

📈 Training & Evaluation

Training Process

  1. Data Preparation: Convert UIT_HWDB format to VietOCR annotation format
  2. Fine-tuning: Start from VietOCR pretrained weights
  3. Optimization: Adam optimizer with learning rate 1e-4
  4. Regularization: Weight decay 1e-5
  5. Validation: Evaluated every 500 iterations

Metrics Calculation

  • CER (Character Error Rate): Levenshtein distance at character level
  • WER (Word Error Rate): Levenshtein distance at word level
  • Accuracy: Percentage of exact string matches
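The metrics above can be reproduced with a plain Levenshtein distance over characters (CER) or words (WER). A minimal sketch, not the exact evaluation script used for this model:

```python
def levenshtein(a, b):
    """Edit distance between two sequences (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        curr = [i]
        for j, y in enumerate(b, 1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (x != y)))  # substitution
        prev = curr
    return prev[-1]

def cer(ref, hyp):
    """Character Error Rate: edit distance normalized by reference length."""
    return levenshtein(ref, hyp) / max(len(ref), 1)

def wer(ref, hyp):
    """Word Error Rate: same idea, computed over whitespace-split tokens."""
    return levenshtein(ref.split(), hyp.split()) / max(len(ref.split()), 1)

# The imperfect example from above
ref = "thẳng Phú Hữu"
hyp = "thẳng phú trước"
print(f"CER={cer(ref, hyp):.2%}  WER={wer(ref, hyp):.2%}")
```

Corpus-level CER/WER is usually computed by summing edit distances and reference lengths over all samples before dividing, rather than averaging per-line rates.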

💻 System Requirements

Minimum:

  • Python 3.7+
  • 4GB RAM
  • CPU inference: ~1-2 seconds per image

Recommended:

  • Python 3.8+
  • 8GB+ RAM
  • NVIDIA GPU with 4GB+ VRAM
  • GPU inference: ~0.1-0.2 seconds per image

🔧 Fine-tuning for Your Data

If you want to fine-tune this model on your own dataset:

from vietocr.tool.config import Cfg
from vietocr.model.trainer import Trainer

# Load this model as base
config = Cfg.load_config_from_name('vgg_transformer')
config['weights'] = 'path/to/transformerocr.pth'

# Point the dataset config at your data
# (VietOCR reads these keys from config['dataset'], not config['trainer'])
config['dataset']['data_root'] = './your_data'
config['dataset']['train_annotation'] = 'train.txt'
config['dataset']['valid_annotation'] = 'val.txt'

# Train
trainer = Trainer(config, pretrained=True)
trainer.train()
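The annotation files referenced above are plain UTF-8 text with one sample per line: an image path relative to data_root and its transcription, separated by a tab. A minimal sketch for generating one (file names and labels here are illustrative):

```python
from pathlib import Path

# (image_path_relative_to_data_root, transcription) pairs -- illustrative only
samples = [
    ("lines/0001.jpg", "Nước ta giáp với biển Đông"),
    ("lines/0002.jpg", "thẳng Phú Hữu"),
]

# Write one tab-separated "path<TAB>label" line per sample
with open("train.txt", "w", encoding="utf-8") as f:
    for path, text in samples:
        f.write(f"{path}\t{text}\n")

print(Path("train.txt").read_text(encoding="utf-8"))
```

Generate `train.txt` and `val.txt` this way from your own splits before calling `Trainer`.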

📝 Citation

If you use this model in your research, please cite:

@misc{vietocr-handwritten-finetune,
  author = {DungHugging},
  title = {VietOCR Fine-tuned for Vietnamese Handwriting Recognition},
  year = {2026},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/DungHugging/vietocr-handwritten-finetune}}
}

Original VietOCR:

@misc{vietocr,
  author = {Pham, Bao Cong},
  title = {VietOCR: Optical Character Recognition for Vietnamese},
  year = {2020},
  publisher = {GitHub},
  howpublished = {\url{https://github.com/pbcquoc/vietocr}}
}

UIT_HWDB Dataset:

@inproceedings{nguyen2018recognition,
  title={Recognition of Handwritten Vietnamese Text Using Convolutional Neural Network},
  author={Nguyen, Hung Tuan and Nguyen, Cong Thanh and others},
  booktitle={International Conference on Future Data and Security Engineering},
  year={2018}
}

📄 License

This model is released under the MIT License.

  • ✅ Free for commercial use
  • ✅ Free for academic research
  • ✅ Free to modify and redistribute

🤝 Acknowledgments

  • VietOCR Team for the excellent base model and framework
  • UIT for the UIT_HWDB dataset
  • Kaggle for providing free GPU resources for training

📬 Contact & Issues

For issues, questions, or improvements, please open a discussion on the Hugging Face model page.

🔄 Updates

v1.0 (Current)

  • Initial release
  • Fine-tuned on UIT_HWDB_line dataset
  • CER: 4.01%, WER: 11.02%
  • 50 epochs training

Made with ❤️ for Vietnamese OCR Community
