# ScanOCR Identity VI (Vietnamese ID Card OCR)
scanocr-identity-vi is a comprehensive OCR solution specifically optimized for recognizing and extracting information from Vietnamese Citizen Identification Cards (CCCD) and National Identity Cards (CMND).
This model combines the power of:

- **YOLOv11 (ONNX)**: detects bounding boxes for fields such as full name, ID number, date of birth, and address.
- **VietOCR**: accurately recognizes Vietnamese characters with diacritics.
## Features
- Accurately detects multiple card types (chip-embedded CCCD, barcode CCCD, 9-digit CMND).
- Extracts detailed fields: ID number, full name, date of birth, place of origin, permanent address, and more.
- Handles tilted or rotated cards and challenging lighting conditions.
- Reorders reversed address lines (a common issue in multi-line OCR).
## Install
Requires Python 3.9 or higher. Install the necessary libraries:

```shell
pip install ultralytics vietocr opencv-python pillow numpy
```
## Instructions

Below is sample code that runs the model using the exported `.onnx` (YOLO) weights and VietOCR:
```python
import cv2
import torch
from PIL import Image
from ultralytics import YOLO
from vietocr.tool.predictor import Predictor
from vietocr.tool.config import Cfg

# 1. Initialize VietOCR (for Vietnamese text recognition)
config = Cfg.load_config_from_name('vgg_transformer')
config['device'] = 'cuda' if torch.cuda.is_available() else 'cpu'
ocr_predictor = Predictor(config)

# 2. Load the YOLO model (exported to ONNX)
yolo_model = YOLO("yolo_v11_best.onnx")

def scan_id_card(image_path):
    img = cv2.imread(image_path)
    detections = yolo_model(img)[0]
    results = {"type": "Unknown", "data": {}}
    box_list = []

    # Collect the detected boxes
    for box in detections.boxes:
        cls = int(box.cls[0])
        label = detections.names[cls]
        x1, y1, x2, y2 = map(int, box.xyxy[0])
        crop = img[y1:y2, x1:x2]
        box_list.append({"label": label, "y1": y1, "crop": crop})

    # Sort by y-coordinate so multi-line fields read top to bottom
    box_list = sorted(box_list, key=lambda x: x["y1"])

    for item in box_list:
        label = item["label"]
        crop_pil = Image.fromarray(cv2.cvtColor(item["crop"], cv2.COLOR_BGR2RGB))
        text = ocr_predictor.predict(crop_pil)
        if label in results["data"]:
            results["data"][label] += " " + text
        else:
            results["data"][label] = text

    return results

# Test run
print(scan_id_card("path_to_your_image.jpg"))
```
## Output JSON structure
```json
{
  "success": true,
  "result": {
    "type": "CCCD Gắn Chip",
    "data": {
      "Số ID": "026xxxxxxxxx",
      "Họ tên": "NGUYỄN QUỐC VIỆT",
      "Ngày sinh": "21/02/1989",
      "Quê quán": "Sơn Lôi, Bình Xuyên, Vĩnh Phúc",
      "Địa chỉ thường trú": "Ấp An Viễn, Bình An, Long Thành, Đồng Nai"
    }
  }
}
```
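The `scan_id_card` example returns only the inner type/data dict; the `success`/`result` envelope shown here can be produced with a thin wrapper. The helper below is a hypothetical sketch, not part of the shipped model:

```python
import json

def to_response(scan_result: dict) -> str:
    """Wrap a raw scan dict in the success/result envelope."""
    return json.dumps({"success": True, "result": scan_result},
                      ensure_ascii=False, indent=2)

print(to_response({"type": "CCCD Gắn Chip",
                   "data": {"Số ID": "026xxxxxxxxx"}}))
```

`ensure_ascii=False` keeps the Vietnamese diacritics readable instead of escaping them to `\uXXXX` sequences.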
## Architectural pipeline
1. **Input**: photo of the Citizen Identification Card.
2. **Detection**: YOLOv11 identifies the location of text regions.
3. **Sorting**: boxes are arranged vertically (y-axis) to ensure the correct reading order.
4. **Recognition**: VietOCR converts each cropped image region into text.
5. **Post-processing**: strings are normalized and formatted as JSON.
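The post-processing step is not specified in this card; as one example of string normalization, a hypothetical date-cleaning helper (not the shipped logic) might look like:

```python
import re

def normalize_date(text: str) -> str:
    """Extract a date like '21-02-1989' or '21.02.1989' and return DD/MM/YYYY."""
    m = re.search(r"(\d{1,2})[\s./-]+(\d{1,2})[\s./-]+(\d{4})", text)
    if not m:
        return text  # leave unrecognized strings untouched
    day, month, year = m.groups()
    return f"{int(day):02d}/{int(month):02d}/{year}"

print(normalize_date("21-02-1989"))  # → 21/02/1989
```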
## Note
- **Lighting**: the model performs best under even lighting, without glare on the ID number area.
- **Coordinates**: if address lines come out reversed (e.g., province before hamlet), make sure the y1 sort is applied before the crops are passed to VietOCR.
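To illustrate the y1 sort on its own, here is a self-contained sketch; the labels and address fragments below are made up for the example:

```python
# Boxes as produced by the detection loop, arriving out of order
boxes = [
    {"label": "address", "y1": 300, "text": "Bình An, Long Thành, Đồng Nai"},
    {"label": "address", "y1": 260, "text": "Ấp An Viễn,"},
]
boxes.sort(key=lambda b: b["y1"])          # top line first
merged = " ".join(b["text"] for b in boxes)
print(merged)  # → Ấp An Viễn, Bình An, Long Thành, Đồng Nai
```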
## Author & License
Developed by: PHGROUP TECHNOLOGY SOLUTIONS CO., LTD
HuggingFace: phgrouptechs/scanocr-identity-vi
License: Apache License 2.0