πŸ“‘ ScanOCR Identity VI (Vietnamese ID Card OCR)

scanocr-identity-vi is a comprehensive OCR solution specifically optimized for recognizing and extracting information from Vietnamese Citizen Identification Cards (CCCD) and National Identity Cards (CMND).

This model combines the power of:

  • YOLOv11 (ONNX): detects bounding boxes containing information such as full name, ID number, date of birth, and address.

  • VietOCR: accurately recognizes Vietnamese characters with diacritics.

πŸš€ Features

  • Accurately detects multiple card types (chip-embedded CCCD, barcode CCCD, 9-digit CMND).

  • Extracts detailed information: ID number, full name, date of birth, place of origin, permanent address, and more.

  • Handles tilted, rotated, or poorly lit images effectively.

  • Supports reordering reversed address lines (commonly encountered in multi-line OCR).
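The address-reordering idea can be shown with a minimal, self-contained sketch. The fragment coordinates and texts below are hypothetical example data, not output from the model:

```python
def order_address_lines(fragments):
    """Sort detected address fragments top-to-bottom before joining.

    fragments: list of (y_top, text) pairs as a detector might emit them,
    possibly in reversed order.
    """
    return ", ".join(text for _, text in sorted(fragments))

# The detector returned the bottom line first; sorting by y restores reading order.
frags = [(310, "Long Thành, Đồng Nai"), (280, "Ấp An Viễn, Bình An")]
print(order_address_lines(frags))
```

Sorting on the tuple works because the y-coordinate comes first, so fragments compare by vertical position.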

πŸ›  Install

Requires Python 3.9 or higher. Install the necessary libraries:

pip install ultralytics vietocr opencv-python pillow numpy

πŸ’» Instructions

Below is sample code that runs the model using the exported .onnx (YOLO) weights together with VietOCR:

import cv2
import torch
from PIL import Image
from ultralytics import YOLO
from vietocr.tool.predictor import Predictor
from vietocr.tool.config import Cfg

# 1. Initialize VietOCR (For Vietnamese language)
config = Cfg.load_config_from_name('vgg_transformer')
config['device'] = 'cuda' if torch.cuda.is_available() else 'cpu'
ocr_predictor = Predictor(config)

# 2. Load the YOLO model (exported to ONNX)
yolo_model = YOLO("yolo_v11_best.onnx")

def scan_id_card(image_path):
    img = cv2.imread(image_path)
    detections = yolo_model(img)[0]
    
    results = {"type": "Unknown", "data": {}}
    box_list = []

    # Collect the discovered boxes
    for box in detections.boxes:
        cls = int(box.cls[0])
        label = detections.names[cls]
        x1, y1, x2, y2 = map(int, box.xyxy[0])
        
        crop = img[y1:y2, x1:x2]
        box_list.append({"label": label, "y1": y1, "crop": crop})

    # Sort by Y-coordinate to avoid address inversion
    box_list = sorted(box_list, key=lambda x: x["y1"])

    for item in box_list:
        label = item["label"]
        crop_pil = Image.fromarray(cv2.cvtColor(item["crop"], cv2.COLOR_BGR2RGB))
        text = ocr_predictor.predict(crop_pil)
        
        if label in results["data"]:
            results["data"][label] += " " + text
        else:
            results["data"][label] = text

    return results

# Test run
print(scan_id_card("path_to_your_image.jpg"))

πŸ“Š Output JSON structure

{
    "success": true,
    "result": {
        "type": "CCCD GαΊ―n Chip",
        "data": {
            "Sα»‘ ID": "026xxxxxxxxx",
            "Họ tΓͺn": "NGUYα»„N QUỐC VIỆT",
            "NgΓ y sinh": "21/02/1989",
            "QuΓͺ quΓ‘n": "SΖ‘n LΓ΄i, BΓ¬nh XuyΓͺn, VΔ©nh PhΓΊc",
            "Địa chỉ thường trΓΊ": "αΊ€p An Viα»…n, BΓ¬nh An, Long ThΓ nh, Đồng Nai"
        }
    }
}
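Note that `scan_id_card` returns the inner `{"type", "data"}` dict, while the structure above adds a `success`/`result` envelope. A small wrapper can bridge the two; the function below is a hypothetical sketch (the model card does not define how `success` is computed, so treating "at least one field extracted" as success is an assumption):

```python
import json

def to_response(scan_result):
    # Wrap the raw scan_id_card() output in the documented envelope.
    # "success" here means "at least one field was extracted" -- an
    # assumption, since the model card does not define it.
    return {"success": bool(scan_result["data"]), "result": scan_result}

raw = {"type": "CCCD Gắn Chip", "data": {"Số ID": "026xxxxxxxxx"}}
print(json.dumps(to_response(raw), ensure_ascii=False, indent=4))
```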

πŸ— Architectural pipeline

  1. Input: Photo of the Citizen Identification Card.

  2. Detection: YOLOv11 identifies the location of text regions.

  3. Sorting: Arrange the boxes vertically (y-axis) to ensure the correct reading order.

  4. Recognition: VietOCR converts the cropped image area into text.

  5. Post-processing: Normalize strings and format JSON.
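Step 5 is model-agnostic string cleanup. The exact normalization rules used by the model are not published; the helpers below are an illustrative sketch of what this stage might do:

```python
import re

def normalize_field(text):
    # Collapse repeated whitespace and strip stray punctuation left by OCR.
    return re.sub(r"\s+", " ", text).strip(" .,;:")

def normalize_date(text):
    # Accept common OCR variants such as 21/02/1989, 21-02-1989, 21.02.1989
    # and reformat them as zero-padded DD/MM/YYYY.
    m = re.search(r"(\d{1,2})[/\-.](\d{1,2})[/\-.](\d{4})", text)
    if not m:
        return text
    day, month, year = m.groups()
    return f"{int(day):02d}/{int(month):02d}/{year}"
```

For example, `normalize_date("21.2.1989")` returns `"21/02/1989"`, and unparseable input is passed through unchanged.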

πŸ“ Note

  • Lighting: The model performs best under even lighting, without glare on the ID number area.

  • Coordinates: If address lines come out reversed (e.g., province before hamlet), make sure the boxes are sorted by y1 before the crops are passed to VietOCR.

πŸ’³ Author & License

Developed by: PHGROUP TECHNOLOGY SOLUTIONS CO., LTD

HuggingFace: phgrouptechs/scanocr-identity-vi

License: Apache License 2.0
