Upload full package: YOLO + ArcFace + Scripts
Browse files- README.md +114 -3
- config.json +8 -0
- finetune.py +260 -0
- inference.py +210 -0
- pytorch_model.bin +3 -0
- yolov8s-face-lindevs.onnx +3 -0
- yolov8s-face-lindevs.onnx:Zone.Identifier +0 -0
README.md
CHANGED
|
@@ -1,3 +1,114 @@
|
|
| 1 |
-
|
| 2 |
-
|
| 3 |
-
---
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
import os
|
| 2 |
+
|
| 3 |
+
readme_content = """---
|
| 4 |
+
tags:
|
| 5 |
+
- face-recognition
|
| 6 |
+
- yolo
|
| 7 |
+
- pytorch
|
| 8 |
+
- computer-vision
|
| 9 |
+
- arcface
|
| 10 |
+
- metric-learning
|
| 11 |
+
- biometrics
|
| 12 |
+
- 100m-parameters
|
| 13 |
+
library_name: generic
|
| 14 |
+
license: mit
|
| 15 |
+
pipeline_tag: image-feature-extraction
|
| 16 |
+
---
|
| 17 |
+
|
| 18 |
+
# 🧠 Face Recognition System (ArcFace + YOLOv8)
|
| 19 |
+
|
| 20 |
+

|
| 21 |
+

|
| 22 |
+

|
| 23 |
+

|
| 24 |
+
|
| 25 |
+
## 📖 Overview
|
| 26 |
+
|
| 27 |
+
This repository hosts a production-ready **Face Recognition Pipeline** designed for high-accuracy biometric identification. Unlike standard recognizers, this system integrates **YOLOv8** for robust face detection and alignment before feature extraction.
|
| 28 |
+
|
| 29 |
+
The core recognition model is built upon a **Wide ResNet-101-2** backbone, trained with a hybrid loss function (**ArcFace + Center Loss**) to generate highly discriminative 512-dimensional embeddings.
|
| 30 |
+
|
| 31 |
+
### 🌟 Key Features
|
| 32 |
+
- **Robust Detection**: Uses **YOLOv8 (ONNX)** to detect faces even in challenging lighting or angles.
|
| 33 |
+
- **High Accuracy**: Achieves **90.5%** accuracy on the LFW (Labeled Faces in the Wild) dataset and 90% on Validation.
|
| 34 |
+
- **Discriminative Embeddings**: 512-dim vectors optimized for Cosine Similarity.
|
| 35 |
+
- **Easy-to-Use API**: Includes a wrapper (`inference.py`) for 3-line code implementation.
|
| 36 |
+
- **Fine-tuning Ready**: Includes scripts to retrain the model on your custom dataset.
|
| 37 |
+
|
| 38 |
+
---
|
| 39 |
+
|
| 40 |
+
## 🛠️ Installation
|
| 41 |
+
|
| 42 |
+
To run the pipeline, you need to install the necessary dependencies. We recommend using a virtual environment.
|
| 43 |
+
|
| 44 |
+
```bash
|
| 45 |
+
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118 # For CUDA support
|
| 46 |
+
pip install opencv-python onnxruntime-gpu huggingface_hub pillow tqdm numpy
|
| 47 |
+
```
|
| 48 |
+
## Step 1: Download the Wrapper
|
| 49 |
+
- **Download our helper script inference.py which handles model downloading and YOLO detection automatically.**
|
| 50 |
+
```bash
|
| 51 |
+
wget https://huggingface.co/biometric-ai-lab/Face_Recognition/resolve/main/inference.py
|
| 52 |
+
```
|
| 53 |
+
---
|
| 54 |
+
## Step 2: Create & Run Python Script
|
| 55 |
+
- **Create a new file named run_demo.py.**
|
| 56 |
+
- **Copy and paste the code below into it.**
|
| 57 |
+
- **Make sure you have 2 images to test (e.g., face1.jpg and face2.jpg).**
|
| 58 |
+
```python
|
| 59 |
+
# File: run_demo.py
|
| 60 |
+
from inference import FaceAnalysis
|
| 61 |
+
|
| 62 |
+
# 1. Initialize the AI (Downloads models automatically on first run)
|
| 63 |
+
print("⏳ Initializing models...")
|
| 64 |
+
app = FaceAnalysis()
|
| 65 |
+
|
| 66 |
+
# 2. Define your images
|
| 67 |
+
img1_path = "face1.jpg" # <--- Change this to your image path
|
| 68 |
+
img2_path = "face2.jpg" # <--- Change this to your image path
|
| 69 |
+
|
| 70 |
+
# 3. Run Comparison
|
| 71 |
+
print(f"🔍 Comparing {img1_path} vs {img2_path}...")
|
| 72 |
+
|
| 73 |
+
try:
|
| 74 |
+
# Get similarity score and boolean result
|
| 75 |
+
similarity, is_same = app.compare(img1_path, img2_path)
|
| 76 |
+
|
| 77 |
+
print("-" * 30)
|
| 78 |
+
print(f"🔹 Similarity Score: {similarity:.4f}")
|
| 79 |
+
print("-" * 30)
|
| 80 |
+
|
| 81 |
+
if is_same:
|
| 82 |
+
print("✅ RESULT: SAME PERSON")
|
| 83 |
+
else:
|
| 84 |
+
print("❌ RESULT: DIFFERENT PERSON")
|
| 85 |
+
|
| 86 |
+
except Exception as e:
|
| 87 |
+
print(f"Error: {e}")
|
| 88 |
+
print("Tip: Make sure the image paths are correct!")
|
| 89 |
+
```
|
| 90 |
+
---
|
| 91 |
+
## 🎓 Training Guide
|
| 92 |
+
Option: Full Training (Advanced): Use finetune.py to train the model from scratch (ImageNet weights) on a large dataset.
|
| 93 |
+
**Step 1: Prepare Dataset**
|
| 94 |
+
- **Organize images in ImageFolder format**
|
| 95 |
+
```bash
|
| 96 |
+
dataset/
|
| 97 |
+
├── person_1/
|
| 98 |
+
│ ├── img1.jpg
|
| 99 |
+
│ └── ...
|
| 100 |
+
└── person_2/
|
| 101 |
+
└── img1.jpg
|
| 102 |
+
```
|
| 103 |
+
**Step 2: Run Training**
|
| 104 |
+
```bash
|
| 105 |
+
python finetune.py \\
|
| 106 |
+
--data_dir ./dataset \\
|
| 107 |
+
--output_dir ./checkpoints \\
|
| 108 |
+
--epochs 50 \\
|
| 109 |
+
--batch_size 64 \\
|
| 110 |
+
--lr_backbone 8e-6 \\
|
| 111 |
+
--lr_head 8e-5
|
| 112 |
+
```
|
| 113 |
+
|
| 114 |
+
|
config.json
ADDED
|
@@ -0,0 +1,8 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
{
|
| 2 |
+
"model_type": "face_recognition",
|
| 3 |
+
"backbone": "wide_resnet101_2",
|
| 4 |
+
"embedding_size": 512,
|
| 5 |
+
"num_classes": 100,
|
| 6 |
+
"test_accuracy": 90.5,
|
| 7 |
+
"test_dataset": "LFW"
|
| 8 |
+
}
|
finetune.py
ADDED
|
@@ -0,0 +1,260 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
import argparse
|
| 2 |
+
import torch
|
| 3 |
+
import torch.nn as nn
|
| 4 |
+
import torch.nn.functional as F
|
| 5 |
+
from torchvision import transforms, datasets
|
| 6 |
+
import torchvision.models as models
|
| 7 |
+
from torch.utils.data import DataLoader, random_split
|
| 8 |
+
from tqdm import tqdm
|
| 9 |
+
import os
|
| 10 |
+
import numpy as np
|
| 11 |
+
|
| 12 |
+
|
| 13 |
+
# ==========================================
|
| 14 |
+
# 1. MODEL ARCHITECTURE
|
| 15 |
+
# ==========================================
|
| 16 |
+
class FaceRecognitionModel(nn.Module):
    """Wide ResNet-101-2 backbone with a projection head producing
    L2-normalized 512-d face embeddings."""

    def __init__(self):
        super().__init__()
        # ImageNet-pretrained backbone; the classifier layer is replaced by
        # an identity so the forward pass yields raw 2048-d pooled features.
        print("🏗️ Loading Backbone: Wide ResNet-101-2...")
        self.backbone = models.wide_resnet101_2(weights='IMAGENET1K_V2')
        self.backbone.fc = nn.Identity()

        # Projection head: 2048-d backbone features -> 512-d embedding space.
        self.embed = nn.Sequential(
            nn.Linear(2048, 512),
            nn.BatchNorm1d(512),
            nn.ReLU(inplace=True),
        )

    def forward(self, img):
        """Return unit-norm 512-d embeddings for a batch of images."""
        pooled = self.backbone(img)
        projected = self.embed(pooled)
        # Project onto the unit hypersphere so downstream cosine similarity
        # reduces to a plain dot product.
        return F.normalize(projected, p=2, dim=1)
|
| 36 |
+
|
| 37 |
+
|
| 38 |
+
# ==========================================
|
| 39 |
+
# 2. LOSS FUNCTIONS
|
| 40 |
+
# ==========================================
|
| 41 |
+
class ArcFaceLoss(nn.Module):
    """Additive angular margin (ArcFace) logit layer.

    Holds one learnable prototype vector per class and produces scaled
    class logits in which the ground-truth class logit has an angular
    margin added before scaling.
    """

    def __init__(self, num_classes, embedding_size=512, margin=0.5, scale=64):
        super().__init__()
        self.margin = margin
        self.scale = scale
        # One prototype per identity, Xavier-initialized.
        self.weight = nn.Parameter(torch.Tensor(num_classes, embedding_size))
        nn.init.xavier_uniform_(self.weight)

    def forward(self, embeddings, labels):
        """Return margin-adjusted logits of shape (batch, num_classes)."""
        prototypes = F.normalize(self.weight, dim=1)
        feats = F.normalize(embeddings, dim=1)

        # Cosine similarity of every sample against every class prototype,
        # clamped away from ±1 so acos stays numerically stable.
        cos_sim = torch.matmul(feats, prototypes.t())
        cos_sim = cos_sim.clamp(-1 + 1e-7, 1 - 1e-7)

        # cos(theta + m): the additive angular margin, applied to all entries
        # here but only selected for the target class below.
        margin_logits = torch.cos(torch.acos(cos_sim) + self.margin)

        # One-hot mask selecting each sample's ground-truth class.
        target_mask = torch.zeros_like(cos_sim)
        target_mask.scatter_(1, labels.view(-1, 1), 1.0)

        # Target entries get the margin-adjusted value; all others keep
        # the plain cosine. Finally scale to sharpen the softmax.
        blended = margin_logits * target_mask + cos_sim * (1 - target_mask)
        return blended * self.scale
|
| 65 |
+
|
| 66 |
+
|
| 67 |
+
class CenterLoss(nn.Module):
    """Cosine-distance center loss: pulls each embedding toward a learnable
    per-class center on the unit hypersphere."""

    def __init__(self, num_classes, embedding_size=512):
        super().__init__()
        # One learnable center per class; re-normalized on every forward pass.
        self.centers = nn.Parameter(torch.randn(num_classes, embedding_size))
        nn.init.xavier_uniform_(self.centers)

    def forward(self, embeddings, labels):
        """Return mean (1 - cosine similarity) between each embedding and its
        class center.

        NOTE(review): the row-wise dot product equals cosine similarity only
        if `embeddings` are already unit-norm — FaceRecognitionModel in this
        file produces normalized outputs, so that holds for this pipeline.
        """
        unit_centers = F.normalize(self.centers, p=2, dim=1)
        matched_centers = unit_centers[labels]
        cos = torch.sum(embeddings * matched_centers, dim=1)
        return torch.mean(1.0 - cos)
|
| 79 |
+
|
| 80 |
+
|
| 81 |
+
# ==========================================
|
| 82 |
+
# 3. DATA LOADER
|
| 83 |
+
# ==========================================
|
| 84 |
+
def get_dataloader(data_dir, batch_size=64, num_workers=4, split_ratio=0.9):
    """Build train/val DataLoaders from an ImageFolder-style directory.

    Args:
        data_dir: Root folder laid out as ``data_dir/<class_name>/<image>``.
        batch_size: Batch size for both loaders.
        num_workers: Worker processes per DataLoader.
        split_ratio: Fraction of images assigned to the training split.

    Returns:
        (train_loader, val_loader, num_classes)
    """
    print(f"📂 Loading Data from: {data_dir}")

    # Strong Augmentation for Training
    transform_train = transforms.Compose([
        transforms.Resize((256, 256)),
        transforms.RandomCrop((224, 224)),
        transforms.RandomHorizontalFlip(p=0.5),
        transforms.ColorJitter(brightness=0.3, contrast=0.3, saturation=0.25, hue=0.08),
        transforms.RandomGrayscale(p=0.1),
        transforms.RandomRotation(degrees=10),
        transforms.RandomAffine(degrees=0, translate=(0.08, 0.08), scale=(0.92, 1.08)),
        transforms.RandomApply([transforms.GaussianBlur(kernel_size=5, sigma=(0.1, 2.0))], p=0.3),
        transforms.RandomPerspective(distortion_scale=0.2, p=0.3),
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
        transforms.RandomErasing(p=0.25, scale=(0.02, 0.15), ratio=(0.3, 3.3)),
    ])

    # Standard Transform for Validation
    transform_val = transforms.Compose([
        transforms.Resize((224, 224)),
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
    ])

    # BUGFIX: the previous implementation created ONE ImageFolder and then set
    # `val_set.dataset.transform = transform_val`. random_split subsets share
    # the same underlying dataset object, so that assignment silently replaced
    # the training transform too, disabling every augmentation above. Using
    # two ImageFolder instances over the same directory lets each split carry
    # its own transform.
    train_base = datasets.ImageFolder(root=data_dir, transform=transform_train)
    val_base = datasets.ImageFolder(root=data_dir, transform=transform_val)
    num_classes = len(train_base.classes)

    # Split Train/Val. Identical generator seeds guarantee both splits use the
    # exact same index partition, so no image appears in both loaders.
    train_size = int(split_ratio * len(train_base))
    val_size = len(train_base) - train_size
    train_set, _ = random_split(train_base, [train_size, val_size], generator=torch.Generator().manual_seed(42))
    _, val_set = random_split(val_base, [train_size, val_size], generator=torch.Generator().manual_seed(42))

    print(f"   ✅ Classes: {num_classes}")
    print(f"   ✅ Train Images: {len(train_set)}")
    print(f"   ✅ Val Images: {len(val_set)}")

    train_loader = DataLoader(train_set, batch_size=batch_size, shuffle=True, num_workers=num_workers, pin_memory=True)
    val_loader = DataLoader(val_set, batch_size=batch_size, shuffle=False, num_workers=num_workers, pin_memory=True)

    return train_loader, val_loader, num_classes
|
| 129 |
+
|
| 130 |
+
|
| 131 |
+
# ==========================================
|
| 132 |
+
# 4. TRAINING ENGINE
|
| 133 |
+
# ==========================================
|
| 134 |
+
def evaluate(model, arcface, val_loader, criterion, device):
    """Run one validation pass.

    Returns:
        (mean batch loss, top-1 accuracy in percent) over `val_loader`.
    """
    model.eval()
    arcface.eval()
    running_loss = 0
    n_correct = 0
    n_seen = 0

    # Gradients are not needed for evaluation.
    with torch.no_grad():
        for batch_imgs, batch_labels in tqdm(val_loader, desc=" 🧪 Evaluating"):
            batch_imgs = batch_imgs.to(device)
            batch_labels = batch_labels.to(device)
            logits = arcface(model(batch_imgs), batch_labels)
            running_loss += criterion(logits, batch_labels).item()

            # Top-1 prediction per sample.
            preds = torch.max(logits.data, 1)[1]
            n_seen += batch_labels.size(0)
            n_correct += (preds == batch_labels).sum().item()

    return running_loss / len(val_loader), 100 * n_correct / n_seen
|
| 154 |
+
|
| 155 |
+
|
| 156 |
+
def main(args):
    """Train the face-recognition model with ArcFace + Center Loss.

    Expects `args` with: data_dir, batch_size, num_workers, resume,
    lr_backbone, lr_head, lambda_center, epochs, output_dir.
    Saves `last_checkpoint.bin` every epoch and `pytorch_model.bin`
    whenever validation accuracy improves.
    """
    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
    print(f"🚀 Device: {device}")

    # Data
    train_loader, val_loader, num_classes = get_dataloader(args.data_dir, args.batch_size, args.num_workers)

    # Models: backbone+head, ArcFace classifier head, and the auxiliary
    # center-loss module (both loss modules hold learnable parameters).
    model = FaceRecognitionModel().to(device)
    arcface = ArcFaceLoss(num_classes=num_classes).to(device)
    center_loss = CenterLoss(num_classes=num_classes).to(device)

    # Load Checkpoint (Resume).
    # NOTE(review): optimizer state is neither saved nor restored, so a
    # resumed run restarts Adam's moment estimates from scratch — confirm
    # this is intended before long multi-stage trainings.
    start_epoch = 0
    if args.resume and os.path.exists(args.resume):
        print(f"🔄 Resuming from {args.resume}...")
        checkpoint = torch.load(args.resume, map_location=device)
        model.load_state_dict(checkpoint['model_state_dict'])
        arcface.load_state_dict(checkpoint['arcface_state_dict'])
        if 'center_loss_state_dict' in checkpoint:
            center_loss.load_state_dict(checkpoint['center_loss_state_dict'])
        start_epoch = checkpoint.get('epoch', 0)

    # Optimizer: discriminative learning rates — small for the pretrained
    # backbone, larger for the fresh head/ArcFace weights, fixed 1e-4 for
    # the center-loss centers.
    optimizer = torch.optim.Adam([
        {'params': model.backbone.parameters(), 'lr': args.lr_backbone},
        {'params': model.embed.parameters(), 'lr': args.lr_head},
        {'params': arcface.parameters(), 'lr': args.lr_head},
        {'params': center_loss.parameters(), 'lr': 1e-4}
    ], weight_decay=1e-3)

    criterion = nn.CrossEntropyLoss()
    best_acc = 0.0

    # Training Loop
    print("\n🔥 START TRAINING...")
    for epoch in range(start_epoch, args.epochs):
        model.train()
        total_loss = 0

        pbar = tqdm(train_loader, desc=f"Epoch {epoch + 1}/{args.epochs}")
        for imgs, labels in pbar:
            imgs, labels = imgs.to(device), labels.to(device)

            # Forward
            embeddings = model(imgs)
            logits = arcface(embeddings, labels)

            # Loss: cross-entropy over ArcFace logits plus the center loss
            # weighted by lambda_center.
            loss_ce = criterion(logits, labels)
            loss_center = center_loss(embeddings, labels)
            loss = loss_ce + (args.lambda_center * loss_center)

            # Backward
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()

            total_loss += loss.item()
            pbar.set_postfix({'Loss': f"{loss.item():.4f}", 'CE': f"{loss_ce.item():.4f}"})

        # Checkpoint payload: all three parameter sets plus metadata needed
        # to rebuild the ArcFace head on resume.
        save_dict = {
            'epoch': epoch + 1,
            'model_state_dict': model.state_dict(),
            'arcface_state_dict': arcface.state_dict(),
            'center_loss_state_dict': center_loss.state_dict(),
            'num_classes': num_classes
        }

        # Save Last (overwritten every epoch)
        torch.save(save_dict, os.path.join(args.output_dir, "last_checkpoint.bin"))

        # Evaluate & Save Best
        val_loss, val_acc = evaluate(model, arcface, val_loader, criterion, device)
        print(f"   🏆 Epoch {epoch + 1} | Val Loss: {val_loss:.4f} | Accuracy: {val_acc:.2f}%")

        if val_acc > best_acc:
            best_acc = val_acc
            print(f"   💾 Saving New Best Model (Acc: {best_acc:.2f}%)")
            torch.save(save_dict, os.path.join(args.output_dir, "pytorch_model.bin"))
|
| 237 |
+
|
| 238 |
+
|
| 239 |
+
if __name__ == "__main__":
    # CLI entry point: parse hyperparameters and launch training.
    parser = argparse.ArgumentParser(description="Train Face Recognition Model (ArcFace + CenterLoss)")

    # Required
    parser.add_argument('--data_dir', type=str, required=True, help="Path to ImageFolder dataset")

    # Optional
    parser.add_argument('--output_dir', type=str, default=".", help="Where to save .bin files")
    parser.add_argument('--resume', type=str, default=None, help="Path to checkpoint to resume")
    parser.add_argument('--epochs', type=int, default=20)
    parser.add_argument('--batch_size', type=int, default=64)
    parser.add_argument('--num_workers', type=int, default=4)

    # Hyperparameters: low backbone LR preserves pretrained features;
    # lambda_center weights the auxiliary center loss.
    parser.add_argument('--lr_backbone', type=float, default=8e-6)
    parser.add_argument('--lr_head', type=float, default=8e-5)
    parser.add_argument('--lambda_center', type=float, default=0.18)

    args = parser.parse_args()

    # Ensure the checkpoint directory exists before training starts.
    os.makedirs(args.output_dir, exist_ok=True)
    main(args)
|
inference.py
ADDED
|
@@ -0,0 +1,210 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
import os
|
| 2 |
+
import cv2
|
| 3 |
+
import numpy as np
|
| 4 |
+
import torch
|
| 5 |
+
import torch.nn as nn
|
| 6 |
+
import torch.nn.functional as F
|
| 7 |
+
from torchvision import transforms
|
| 8 |
+
import torchvision.models as models
|
| 9 |
+
from PIL import Image
|
| 10 |
+
import onnxruntime as ort
|
| 11 |
+
from huggingface_hub import hf_hub_download
|
| 12 |
+
|
| 13 |
+
# ==========================================
|
| 14 |
+
# CẤU HÌNH REPO
|
| 15 |
+
# ==========================================
|
| 16 |
+
REPO_ID = "biometric-ai-lab/Face_Recognition"
|
| 17 |
+
RECOG_FILENAME = "pytorch_model.bin"
|
| 18 |
+
YOLO_FILENAME = "yolov8s-face-lindevs.onnx"
|
| 19 |
+
|
| 20 |
+
|
| 21 |
+
# ==========================================
|
| 22 |
+
# 1. MODEL ARCHITECTURE (Giống hệt code bạn)
|
| 23 |
+
# ==========================================
|
| 24 |
+
class FaceRecognitionModel(nn.Module):
    """Inference copy of the recognition network: Wide ResNet-101-2 backbone
    plus a 512-d projection head. Must mirror the training architecture in
    finetune.py so checkpoints load cleanly."""

    def __init__(self):
        super().__init__()
        # weights=None: the trained checkpoint supplies all parameters, so
        # downloading ImageNet weights here would be wasted work.
        self.backbone = models.wide_resnet101_2(weights=None)
        self.backbone.fc = nn.Identity()
        self.embed = nn.Sequential(
            nn.Linear(2048, 512),
            nn.BatchNorm1d(512),
            nn.ReLU(inplace=True),
        )

    def forward(self, img):
        """Return unit-norm 512-d embeddings for a batch of images."""
        backbone_out = self.backbone(img)
        head_out = self.embed(backbone_out)
        return F.normalize(head_out, p=2, dim=1)
|
| 40 |
+
|
| 41 |
+
|
| 42 |
+
# ==========================================
|
| 43 |
+
# 2. YOLO DETECTOR (Logic chuẩn của bạn)
|
| 44 |
+
# ==========================================
|
| 45 |
+
class YOLOFaceDetector:
    """Single-face detector backed by a YOLOv8-face ONNX model.

    Runs the ONNX session on a 640x640 letterbox-free resize and returns the
    largest detected face crop from the original image.
    """

    def __init__(self, model_path, conf_threshold=0.5):
        # Prefer CUDA when the onnxruntime-gpu build is installed; ORT falls
        # back to CPU automatically otherwise.
        self.session = ort.InferenceSession(model_path, providers=['CUDAExecutionProvider', 'CPUExecutionProvider'])
        self.input_name = self.session.get_inputs()[0].name
        self.output_names = [output.name for output in self.session.get_outputs()]
        self.conf_threshold = conf_threshold
        # Fixed square network input size (pixels).
        self.input_size = 640

    def detect_extract_face(self, image_pil, expand_ratio=0.0):
        """
        Input: PIL Image
        Output: PIL Image (Cropped Face)

        If no face clears `conf_threshold`, the full input image is returned
        unchanged (with a console warning). `expand_ratio` pads the box by
        that fraction of its larger side, clamped to image bounds.
        """
        # Convert PIL -> OpenCV (BGR) to match the original pipeline's logic.
        image_np = np.array(image_pil)
        image_bgr = cv2.cvtColor(image_np, cv2.COLOR_RGB2BGR)
        img_height, img_width = image_bgr.shape[:2]

        # Preprocess (Resize -> RGB -> Norm -> Transpose).
        # NOTE: plain resize (no aspect-preserving letterbox), so boxes are
        # rescaled per-axis below.
        img_resized = cv2.resize(image_bgr, (self.input_size, self.input_size))
        # YOLO models are typically trained on RGB input.
        img_rgb = cv2.cvtColor(img_resized, cv2.COLOR_BGR2RGB)
        img_normalized = img_rgb.astype(np.float32) / 255.0
        img_transposed = np.transpose(img_normalized, (2, 0, 1))
        img_batch = np.expand_dims(img_transposed, axis=0)

        # Inference
        outputs = self.session.run(self.output_names, {self.input_name: img_batch})
        predictions = outputs[0]

        # YOLOv8 exports (1, attrs, boxes); transpose to (boxes, attrs).
        if len(predictions.shape) == 3:
            predictions = predictions[0].T

        best_face = None
        max_area = 0

        # Post-process: each row is [x_center, y_center, w, h, conf].
        for pred in predictions:
            conf = pred[4]
            if conf > self.conf_threshold:
                x_center, y_center, w, h = pred[:4]

                # Scale box back to original image coordinates (per axis,
                # since the resize above does not preserve aspect ratio).
                x_center = x_center * img_width / self.input_size
                y_center = y_center * img_height / self.input_size
                w = w * img_width / self.input_size
                h = h * img_height / self.input_size

                x1 = int(x_center - w / 2)
                y1 = int(y_center - h / 2)
                x2 = int(x_center + w / 2)
                y2 = int(y_center + h / 2)

                # Clamp to image bounds.
                x1 = max(0, x1)
                y1 = max(0, y1)
                x2 = min(img_width, x2)
                y2 = min(img_height, y2)

                area = (x2 - x1) * (y2 - y1)

                # Keep only the largest face (no NMS needed for one box).
                if area > max_area:
                    max_area = area
                    best_face = (x1, y1, x2, y2)

        # Crop the winning box.
        if best_face:
            x1, y1, x2, y2 = best_face

            # Apply expand_ratio padding (if requested).
            if expand_ratio != 0:
                w_box = x2 - x1
                h_box = y2 - y1
                pad = int(expand_ratio * max(w_box, h_box))
                x1 = max(0, x1 - pad)
                y1 = max(0, y1 - pad)
                x2 = min(img_width, x2 + pad)
                y2 = min(img_height, y2 + pad)

            # Crop from the original PIL image to preserve full quality.
            return image_pil.crop((x1, y1, x2, y2))

        print("⚠️ Warning: No face detected. Using full image.")
        return image_pil
|
| 129 |
+
|
| 130 |
+
|
| 131 |
+
# ==========================================
|
| 132 |
+
# 3. FACE ANALYSIS WRAPPER
|
| 133 |
+
# ==========================================
|
| 134 |
+
class FaceAnalysis:
    """End-to-end face comparison pipeline.

    Downloads the YOLO detector and recognition checkpoint from the Hub,
    crops the largest face from each input, embeds it, and compares
    embeddings by cosine similarity.
    """

    def __init__(self, device=None):
        self.device = device if device else ('cuda' if torch.cuda.is_available() else 'cpu')
        print(f"🚀 Initializing Face Analysis on {self.device}...")

        # 1. Fetch model files from the Hub (cached locally after first run).
        try:
            print(f"📥 Checking models from {REPO_ID}...")
            recog_path = hf_hub_download(repo_id=REPO_ID, filename=RECOG_FILENAME)
            yolo_path = hf_hub_download(repo_id=REPO_ID, filename=YOLO_FILENAME)
        except Exception as e:
            raise RuntimeError(f"❌ Failed to download models. Check internet or Repo ID.\nError: {e}")

        # 2. Init YOLO face detector (ONNX Runtime).
        self.yolo = YOLOFaceDetector(yolo_path, conf_threshold=0.5)

        # 3. Init recognition network.
        self.model = FaceRecognitionModel().to(self.device)

        # Load weights. BUGFIX: the training script (finetune.py) saves the
        # backbone under 'model_state_dict', but this loader previously only
        # checked 'model' and then fell through to loading the WHOLE
        # checkpoint dict as a state_dict, which fails on training
        # checkpoints. Accept both keys, with a bare state_dict fallback.
        checkpoint = torch.load(recog_path, map_location=self.device)
        if isinstance(checkpoint, dict) and 'model_state_dict' in checkpoint:
            self.model.load_state_dict(checkpoint['model_state_dict'])
        elif isinstance(checkpoint, dict) and 'model' in checkpoint:
            self.model.load_state_dict(checkpoint['model'])
        else:
            # Fallback: the file contains a plain state_dict.
            self.model.load_state_dict(checkpoint)

        self.model.eval()

        # 4. Preprocessing — identical to the validation transform used at
        # training time, so embeddings match training statistics.
        self.transform = transforms.Compose([
            transforms.Resize((224, 224)),
            transforms.ToTensor(),
            transforms.Normalize(
                mean=[0.485, 0.456, 0.406],
                std=[0.229, 0.224, 0.225],
            ),
        ])
        print("✅ System Ready!")

    def process_image(self, image_source, expand_ratio=0.0):
        """Detect, crop, and embed a single face.

        Args:
            image_source: file path, PIL Image, or BGR numpy array.
            expand_ratio: extra padding around the detected face box.

        Returns:
            (1, 512) L2-normalized embedding tensor on `self.device`.
        """
        # Normalize the input to an RGB PIL image.
        if isinstance(image_source, str):
            if not os.path.exists(image_source):
                raise FileNotFoundError(f"Image not found: {image_source}")
            img_pil = Image.open(image_source).convert('RGB')
        elif isinstance(image_source, Image.Image):
            img_pil = image_source.convert('RGB')
        elif isinstance(image_source, np.ndarray):
            # numpy inputs are assumed BGR (OpenCV convention).
            img_pil = Image.fromarray(cv2.cvtColor(image_source, cv2.COLOR_BGR2RGB))
        else:
            raise ValueError("Input must be filepath, PIL Image, or Numpy Array")

        # 1. YOLO detect & crop (falls back to the full image if no face).
        face_crop = self.yolo.detect_extract_face(img_pil, expand_ratio=expand_ratio)

        # 2. Transform & embed.
        img_tensor = self.transform(face_crop).unsqueeze(0).to(self.device)

        with torch.no_grad():
            embedding = self.model(img_tensor)

        return embedding

    def compare(self, img1, img2, threshold=0.45, expand_ratio=0.01):
        """Compare two face images.

        Args:
            img1, img2: any input accepted by `process_image`.
            threshold: cosine-similarity decision boundary.
            expand_ratio: face-box padding passed to detection.

        Returns:
            (similarity, is_same): cosine similarity in [-1, 1] and whether
            it exceeds `threshold`.
        """
        emb1 = self.process_image(img1, expand_ratio)
        emb2 = self.process_image(img2, expand_ratio)

        # Cosine similarity between the two unit-norm embeddings.
        similarity = F.cosine_similarity(emb1, emb2).item()
        is_same = similarity > threshold

        return similarity, is_same
|
pytorch_model.bin
ADDED
|
@@ -0,0 +1,3 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
version https://git-lfs.github.com/spec/v1
|
| 2 |
+
oid sha256:64d4b0af1cbe947a0bcf2996d362cf973bd30f277f55e67a748a409fd733385a
|
| 3 |
+
size 529070510
|
yolov8s-face-lindevs.onnx
ADDED
|
@@ -0,0 +1,3 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
version https://git-lfs.github.com/spec/v1
|
| 2 |
+
oid sha256:0a6d19f2f68d7f0cc8104ab5c9eaa54b63e298f91dcfefd4be897f94a1561d02
|
| 3 |
+
size 44731626
|
yolov8s-face-lindevs.onnx:Zone.Identifier
ADDED
|
Binary file (25 Bytes). View file
|
|
|