# ScanOCR Identity VI (Vietnamese ID Card OCR)
scanocr-identity-vi is a comprehensive OCR solution specifically optimized for recognizing and extracting information from Vietnamese Citizen Identification Cards (CCCD) and National Identity Cards (CMND).
This model combines the power of:

- **YOLOv11 (ONNX)**: detects bounding boxes for fields such as full name, ID number, date of birth, and address.
- **VietOCR**: accurately recognizes Vietnamese characters with diacritics.
## Features
- Accurately detects multiple card types (chip-embedded CCCD, barcode CCCD, 9-digit CMND).
- Extracts detailed fields: ID number, full name, date of birth, place of origin, permanent address, and more.
- Handles tilted or rotated cards and challenging lighting conditions.
- Reorders reversed address lines (a common issue in multi-line OCR).
## Install
Requires Python 3.9 or higher. Install the necessary libraries:

```shell
pip install ultralytics vietocr opencv-python pillow numpy
```
## Instructions

Below is sample code that runs the model using the exported `.onnx` (YOLO) weights and VietOCR:
```python
import cv2
import torch
from PIL import Image
from ultralytics import YOLO
from vietocr.tool.predictor import Predictor
from vietocr.tool.config import Cfg

# 1. Initialize VietOCR (for Vietnamese text recognition)
config = Cfg.load_config_from_name('vgg_transformer')
config['device'] = 'cuda' if torch.cuda.is_available() else 'cpu'
ocr_predictor = Predictor(config)

# 2. Load the YOLO model (exported to ONNX)
yolo_model = YOLO("yolo_v11_best.onnx")

def scan_id_card(image_path):
    img = cv2.imread(image_path)
    detections = yolo_model(img)[0]
    results = {"type": "Unknown", "data": {}}
    box_list = []

    # Collect the detected boxes
    for box in detections.boxes:
        cls = int(box.cls[0])
        label = detections.names[cls]
        x1, y1, x2, y2 = map(int, box.xyxy[0])
        crop = img[y1:y2, x1:x2]
        box_list.append({"label": label, "y1": y1, "crop": crop})

    # Sort by y-coordinate so multi-line fields read top to bottom
    box_list = sorted(box_list, key=lambda x: x["y1"])

    for item in box_list:
        label = item["label"]
        crop_pil = Image.fromarray(cv2.cvtColor(item["crop"], cv2.COLOR_BGR2RGB))
        text = ocr_predictor.predict(crop_pil)
        if label in results["data"]:
            results["data"][label] += " " + text
        else:
            results["data"][label] = text

    return results

# Test run
print(scan_id_card("path_to_your_image.jpg"))
```
## Output JSON structure
```json
{
  "success": true,
  "result": {
    "type": "CCCD Gắn Chip",
    "data": {
      "Số ID": "026xxxxxxxxx",
      "Họ tên": "NGUYỄN QUỐC VIỆT",
      "Ngày sinh": "21/02/1989",
      "Quê quán": "Sơn Lôi, Bình Xuyên, Vĩnh Phúc",
      "Địa chỉ thường trú": "Ấp An Viễn, Bình An, Long Thành, Đồng Nai"
    }
  }
}
```
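The `scan_id_card` example returns only the inner type/data dict; the `success`/`result` envelope shown here can be produced with a thin wrapper. The helper below is a hypothetical sketch, not part of the shipped model:

```python
import json

def to_response(scan_result: dict) -> str:
    """Wrap a raw scan dict in the success/result envelope."""
    return json.dumps({"success": True, "result": scan_result},
                      ensure_ascii=False, indent=2)

print(to_response({"type": "CCCD Gắn Chip",
                   "data": {"Số ID": "026xxxxxxxxx"}}))
```

`ensure_ascii=False` keeps the Vietnamese diacritics readable instead of escaping them to `\uXXXX` sequences.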
## Architectural pipeline
1. **Input**: photo of the Citizen Identification Card.
2. **Detection**: YOLOv11 identifies the location of text regions.
3. **Sorting**: boxes are arranged vertically (y-axis) to ensure the correct reading order.
4. **Recognition**: VietOCR converts each cropped image region into text.
5. **Post-processing**: strings are normalized and formatted as JSON.
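The post-processing step is not specified in this card; as one example of string normalization, a hypothetical date-cleaning helper (not the shipped logic) might look like:

```python
import re

def normalize_date(text: str) -> str:
    """Extract a date like '21-02-1989' or '21.02.1989' and return DD/MM/YYYY."""
    m = re.search(r"(\d{1,2})[\s./-]+(\d{1,2})[\s./-]+(\d{4})", text)
    if not m:
        return text  # leave unrecognized strings untouched
    day, month, year = m.groups()
    return f"{int(day):02d}/{int(month):02d}/{year}"

print(normalize_date("21-02-1989"))  # → 21/02/1989
```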
## Note
- **Lighting**: the model performs best under even lighting, without glare on the ID number area.
- **Coordinates**: if address lines come out reversed (e.g., province before hamlet), make sure the y1 sort is applied before the crops are passed to VietOCR.
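To illustrate the y1 sort on its own, here is a self-contained sketch; the labels and address fragments below are made up for the example:

```python
# Boxes as produced by the detection loop, arriving out of order
boxes = [
    {"label": "address", "y1": 300, "text": "Bình An, Long Thành, Đồng Nai"},
    {"label": "address", "y1": 260, "text": "Ấp An Viễn,"},
]
boxes.sort(key=lambda b: b["y1"])          # top line first
merged = " ".join(b["text"] for b in boxes)
print(merged)  # → Ấp An Viễn, Bình An, Long Thành, Đồng Nai
```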
## Author & License
Developed by: PHGROUP TECHNOLOGY SOLUTIONS CO., LTD
HuggingFace: phgrouptechs/scanocr-identity-vi
License: Apache License 2.0