πŸ›‘οΈ Face Anti-Spoofing Detection System


📖 Overview

This repository presents a Video-Based Face Anti-Spoofing System designed to detect presentation attacks (photo, video replay, 3D masks) in facial recognition systems. The model analyzes temporal patterns across video frames to distinguish between genuine faces and spoofing attempts.

πŸ—οΈ Architecture

The system employs a two-stage architecture:

  1. Spatial Feature Extraction: Wide ResNet-101-2 (pretrained on face recognition) extracts discriminative features from individual frames
  2. Temporal Analysis: Transformer encoder captures motion patterns and temporal inconsistencies across frame sequences

This hybrid approach combines two complementary strengths:

  • Strong spatial features from face recognition pretraining
  • Temporal modeling to detect subtle motion artifacts in spoofing attacks

🌟 Key Features

  • Temporal Modeling: Analyzes 10-frame sequences to detect liveness patterns
  • Transfer Learning: Leverages pretrained face recognition backbone (Wide ResNet-101-2)
  • Strong Augmentation: Geometric transforms, color jitter, and perspective distortion for robust training
  • Efficient Training: Parallel batch processing with gradient checkpointing
  • Full Checkpoint System: Single-file model with all weights (backbone + temporal + classifier)
  • Resume Capability: Continue training from checkpoints with full optimizer state recovery
  • Real-time Detection: YOLO-based face detection + temporal analysis for webcam inference

📊 Performance

Metric                   Training Set   Validation Set
Accuracy                 80.69%         99.75%
EER (Equal Error Rate)   20.83%         1%
AUC-ROC                  TBD            TBD
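AUC-ROC is listed as TBD; both EER and AUC can be computed from per-video attack probabilities with scikit-learn. A minimal sketch (variable names are illustrative; label 1 = attack, matching the training labels used later in this README):

```python
import numpy as np
from sklearn.metrics import roc_auc_score, roc_curve

def compute_eer_auc(labels, scores):
    """labels: 1 = attack, 0 = genuine; scores: predicted attack probabilities."""
    fpr, tpr, _ = roc_curve(labels, scores)
    fnr = 1 - tpr
    # Approximate EER: operating point where false-accept and false-reject rates meet
    idx = np.nanargmin(np.abs(fnr - fpr))
    eer = (fpr[idx] + fnr[idx]) / 2
    return eer, roc_auc_score(labels, scores)

eer, auc = compute_eer_auc([0, 0, 0, 1, 1, 1], [0.1, 0.2, 0.6, 0.4, 0.8, 0.9])
print(f"EER: {eer:.2%}, AUC: {auc:.4f}")
```

The same call applied to the validation set's scores and labels would fill in the TBD entries.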

πŸ› οΈ Installation

Requirements

# Core dependencies
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
pip install scikit-learn pillow tqdm numpy opencv-python

# For inference (YOLO face detection)
pip install onnxruntime-gpu huggingface_hub

Dataset Structure (For Training)

Organize your dataset in the following format:

dataset_antispoofing_cropped/
├── real_frame_cropped/          # Genuine face videos
│   ├── video_001/
│   │   ├── frame_0001.jpg
│   │   ├── frame_0002.jpg
│   │   └── ...
│   └── video_002/
│       └── ...
└── attack_frame_cropped2/       # Spoofing attack videos
    ├── video_001/
    │   ├── frame_0001.jpg
    │   └── ...
    └── video_002/
        └── ...
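Before training, it can help to sanity-check that the layout matches this structure. A small sketch (directory names follow the tree above; adjust paths as needed):

```python
from pathlib import Path

def summarize_split(root):
    """Count frames per video directory under one dataset split."""
    root = Path(root)
    videos = sorted(p for p in root.iterdir() if p.is_dir())
    return {v.name: len(list(v.glob("frame_*.jpg"))) for v in videos}

for split in ("real_frame_cropped", "attack_frame_cropped2"):
    path = Path("dataset_antispoofing_cropped") / split
    if path.exists():
        counts = summarize_split(path)
        print(f"{split}: {len(counts)} videos, {sum(counts.values())} frames")
    else:
        print(f"{split}: missing")
```

Videos with fewer than 10 frames cannot fill a full clip, so any suspiciously low counts are worth checking before training.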

🚀 Quick Start

Option 1: Use Pre-trained Model (Inference Only)

Step 1: Download Models

# Download inference script
wget https://huggingface.co/YOUR_USERNAME/Face_AntiSpoofing/resolve/main/inference.py

# Download FULL trained model (includes backbone + temporal + classifier)
wget https://huggingface.co/YOUR_USERNAME/Face_AntiSpoofing/resolve/main/antispoofing_full.pth

# Download YOLO face detector (or auto-download on first run)
wget https://huggingface.co/arnabdhar/YOLOv8-Face-Detection/resolve/main/model.onnx -O yolov8s-face-lindevs.onnx

Step 2: Test with Image

from inference import AntiSpoofingDetector

# Initialize detector (NEW: single file!)
detector = AntiSpoofingDetector(
    model_path="antispoofing_full.pth",  # ← Full checkpoint
    yolo_model_path="yolov8s-face-lindevs.onnx",
    device="cuda",
    threshold=0.5
)

# Test single image
result = detector.predict_image("test.jpg")
print(result)
# Example output: {'prediction': 'GENUINE', 'confidence': 0.32, 'is_attack': False}

Step 3: Test with Video

# Test video file
result = detector.predict_video("test_video.mp4", sample_frames=30)
print(f"Prediction: {result['prediction']}")
print(f"Confidence: {result['confidence']:.4f}")
print(f"Frame scores: {result['frame_scores']}")

Step 4: Test with Webcam (Real-time)

# Run webcam detection
detector.run_webcam(camera_id=0, frame_skip=2)
# Press 'q' to quit, 's' to show statistics

Option 2: Train Your Own Model

Step 1: Prepare Dataset

Organize your dataset following the structure above with real_frame_cropped and attack_frame_cropped2 folders.

Step 2: Download Required Files

# Download training script
wget https://huggingface.co/YOUR_USERNAME/Face_AntiSpoofing/resolve/main/train_antispoofing.py

# Download face recognition backbone (ONLY needed for first training)
wget https://huggingface.co/biometric-ai-lab/Face_Recognition/resolve/main/faceRecognition_arcface_ckpt.pth

Step 3: Train from Scratch

# First time training (will load backbone from faceRecognition_arcface_ckpt.pth)
python train_antispoofing.py

The script will:

  1. Check if antispoofing_full.pth exists
  2. If NO → Load backbone from faceRecognition_arcface_ckpt.pth
  3. Train model
  4. Save to antispoofing_full.pth (includes backbone + temporal + classifier)

Step 4: Resume Training

# Next time (will load full model from antispoofing_full.pth)
python train_antispoofing.py

The script will:

  1. Check if antispoofing_full.pth exists
  2. If YES → Load full model (no need for separate backbone file!)
  3. Resume training from saved epoch

Custom Training Code:

import os
import torch
from train_antispoofing import (
    DeepFakeModel, FaceVideoDataset, train_model,
    transforms, DataLoader, ConcatDataset
)

# Define augmentation
transform_train = transforms.Compose([
    transforms.Resize((256, 256)),
    transforms.RandomRotation(45),
    transforms.RandomAffine(0, translate=(0.1, 0.1), scale=(0.9, 1.1), shear=10),
    transforms.RandomPerspective(0.2, p=0.3),
    transforms.RandomCrop((224, 224)),
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.ColorJitter(0.3, 0.3, 0.3, 0.15),
    transforms.ToTensor(),
    transforms.RandomErasing(p=0.3),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
])

# Load datasets
real_train = FaceVideoDataset(
    root_dir="path/to/real_frame_cropped",
    label=0, num_frames=10, clips_per_video=70,
    transform=transform_train
)

fake_train = FaceVideoDataset(
    root_dir="path/to/attack_frame_cropped2",
    label=1, num_frames=10, clips_per_video=70,
    transform=transform_train
)

train_dataset = ConcatDataset([real_train, fake_train])
train_loader = DataLoader(train_dataset, batch_size=32, shuffle=True)

# ============================================================
# SMART MODEL CREATION
# ============================================================
resume_checkpoint = "antispoofing_full.pth"
backbone_file = "faceRecognition_arcface_ckpt.pth"

# Check if full checkpoint exists
if os.path.exists(resume_checkpoint):
    print("✓ Found checkpoint, loading full model...")
    model = DeepFakeModel(
        backbone_ckpt=None,  # Don't load backbone separately
        freeze_backbone=True
    )
else:
    print("✓ No checkpoint, loading backbone separately...")
    model = DeepFakeModel(
        backbone_ckpt=backbone_file,  # Load backbone for first training
        freeze_backbone=True
    )

# val_loader: build it the same way as train_loader from a held-out split
# Train (will auto-resume if checkpoint exists)
train_model(
    model=model,
    train_loader=train_loader,
    val_loader=val_loader,
    epochs=15,
    lr_temporal=1e-4,
    lr_head=3e-4,
    device="cuda",
    save_path="antispoofing_full.pth",
    resume_from=resume_checkpoint  # Auto-load if exists
)

📋 Complete Example Scripts

Example 1: Test Single Image

Create test_image.py:

from inference import AntiSpoofingDetector

detector = AntiSpoofingDetector(
    model_path="antispoofing_full.pth",  # ← Single file!
    yolo_model_path="yolov8s-face-lindevs.onnx",
    threshold=0.5
)

result = detector.predict_image("test_photo.jpg")

print(f"\n{'='*50}")
print(f"Prediction: {result['prediction']}")
print(f"Confidence: {result['confidence']:.4f}")
print(f"Is Attack: {result['is_attack']}")
if result['bbox']:
    x1, y1, x2, y2, conf = result['bbox']
    print(f"Face Location: ({x1}, {y1}) to ({x2}, {y2})")
print(f"{'='*50}")

Example 2: Batch Test Multiple Videos

Create test_videos.py:

from inference import AntiSpoofingDetector
import glob

detector = AntiSpoofingDetector(
    model_path="antispoofing_full.pth",
    yolo_model_path="yolov8s-face-lindevs.onnx",
    threshold=0.5
)

video_files = glob.glob("test_videos/*.mp4")

results = []
for video_path in video_files:
    result = detector.predict_video(video_path, sample_frames=30)
    results.append({
        'video': video_path,
        'prediction': result['prediction'],
        'confidence': result['confidence']
    })

# Print summary
print("\n" + "="*60)
print("BATCH TEST RESULTS")
print("="*60)
for r in results:
    status = "✓ REAL" if r['prediction'] == "GENUINE" else "✗ FAKE"
    print(f"{status:10} | {r['confidence']:.4f} | {r['video']}")
print("="*60)

Example 3: Real-time Webcam Detection

Create run_webcam.py:

from inference import AntiSpoofingDetector

detector = AntiSpoofingDetector(
    model_path="antispoofing_full.pth",
    yolo_model_path="yolov8s-face-lindevs.onnx",
    threshold=0.2,  # Lower threshold for real-time (more sensitive)
    num_frames=10
)

# Run webcam
# Press 'q' to quit, 's' to show statistics
detector.run_webcam(camera_id=0, frame_skip=2)

Run:

python run_webcam.py

📦 Model Architecture

Input: (Batch, 10 frames, 3, 224, 224)
  ↓
Backbone: Wide ResNet-101-2 [Layer 1-2 only]
  - Transfer learning from face recognition
  - Output: (Batch, 10, 512, H, W)
  ↓
Temporal Encoder: Transformer (8 heads, 4 layers)
  - Positional encoding for temporal order
  - Output: (Batch, 512)
  ↓
Classifier: Linear(512→256→1)
  - GELU activation + Dropout
  - Output: (Batch, 1) logits → sigmoid → probability
  ↓
Decision: probability > threshold → FAKE, else REAL
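The diagram can be sketched as a PyTorch module. This is an illustrative stand-in, not the code in train_antispoofing.py: a single conv layer replaces the frozen Wide ResNet-101-2 backbone, while the transformer sizes (8 heads, 4 layers, 512-dim features) follow the description above:

```python
import torch
import torch.nn as nn

class TemporalClassifierSketch(nn.Module):
    """Illustrative stand-in for the two-stage model described above."""

    def __init__(self, feat_dim=512, num_frames=10):
        super().__init__()
        # Stand-in for the frozen backbone: maps each frame to a feat_dim vector
        self.frame_encoder = nn.Sequential(
            nn.Conv2d(3, feat_dim, kernel_size=7, stride=4),
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
        )
        # Learned positional embedding stands in for the positional encoding
        self.pos_embed = nn.Parameter(torch.zeros(1, num_frames, feat_dim))
        layer = nn.TransformerEncoderLayer(d_model=feat_dim, nhead=8,
                                           batch_first=True)
        self.temporal = nn.TransformerEncoder(layer, num_layers=4)
        self.head = nn.Sequential(
            nn.Linear(feat_dim, 256), nn.GELU(), nn.Dropout(0.2),
            nn.Linear(256, 1),
        )

    def forward(self, x):                       # x: (B, T, 3, 224, 224)
        b, t = x.shape[:2]
        feats = self.frame_encoder(x.flatten(0, 1)).view(b, t, -1)
        feats = self.temporal(feats + self.pos_embed)
        return self.head(feats.mean(dim=1))     # (B, 1) logits
```

A (B, 10, 3, 224, 224) clip comes out as (B, 1) logits; sigmoid plus the threshold then gives the REAL/FAKE decision.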

Parameter Count:

  • Backbone (frozen): ~35M parameters
  • Temporal Encoder: ~8M parameters
  • Classifier Head: ~130K parameters
  • Total Trainable: ~8.1M parameters
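These figures can be checked for any nn.Module with a small helper (sketch; `model` here would be the loaded DeepFakeModel):

```python
import torch.nn as nn

def count_params(model: nn.Module):
    """Return (trainable, frozen) parameter counts."""
    trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
    frozen = sum(p.numel() for p in model.parameters() if not p.requires_grad)
    return trainable, frozen
```

Because the backbone is frozen, only the temporal encoder and classifier head should show up in the trainable count.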

📊 Understanding Predictions

Output Format

{
    'prediction': 'GENUINE' or 'SPOOFING_ATTACK',
    'confidence': 0.0-1.0,  # Probability of being ATTACK
    'is_attack': True/False,
    'bbox': (x1, y1, x2, y2, conf) or None
}

Confidence Score Interpretation

  • 0.0 - 0.3: High confidence GENUINE face
  • 0.3 - 0.5: Low confidence GENUINE (borderline)
  • 0.5 - 0.7: Low confidence ATTACK (borderline)
  • 0.7 - 1.0: High confidence ATTACK
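These bands are a human-readable reading of the raw score; the detector's binary decision depends only on `threshold`. A sketch of the mapping (band boundaries taken from the list above, helper name is illustrative):

```python
def interpret_score(confidence, threshold=0.5):
    """Map an attack probability to a decision plus a readability band."""
    is_attack = confidence > threshold
    if confidence < 0.3:
        band = "high-confidence GENUINE"
    elif confidence < 0.5:
        band = "borderline GENUINE"
    elif confidence < 0.7:
        band = "borderline ATTACK"
    else:
        band = "high-confidence ATTACK"
    return {"is_attack": is_attack, "band": band}
```

Note that with a non-default threshold (e.g. 0.2 for webcam mode) the binary decision and the band label can disagree for borderline scores.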

Adjusting Threshold

# Strict mode (fewer false negatives, more false positives)
detector = AntiSpoofingDetector(
    model_path="antispoofing_full.pth",
    threshold=0.2
)

# Balanced mode (default)
detector = AntiSpoofingDetector(
    model_path="antispoofing_full.pth",
    threshold=0.5
)

# Permissive mode (fewer false positives, more false negatives)
detector = AntiSpoofingDetector(
    model_path="antispoofing_full.pth",
    threshold=0.7
)

Checkpoint System Explained

First Training:

Load: faceRecognition_arcface_ckpt.pth (backbone only)
  ↓
Train: temporal encoder + classifier
  ↓
Save: antispoofing_full.pth (backbone + temporal + classifier)

Resume Training:

Load: antispoofing_full.pth (everything)
  ↓
Resume: from saved epoch
  ↓
Save: antispoofing_full.pth (updated)
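The save/resume flow above follows the standard PyTorch checkpoint pattern. A sketch with assumed key names (the actual keys live in train_antispoofing.py):

```python
import torch

def save_full_checkpoint(path, model, optimizer, epoch):
    """Write one file holding every weight plus optimizer state."""
    torch.save({
        "model_state": model.state_dict(),          # backbone + temporal + classifier
        "optimizer_state": optimizer.state_dict(),  # enables exact resume
        "epoch": epoch,
    }, path)

def load_full_checkpoint(path, model, optimizer, device="cpu"):
    """Restore everything and return the epoch to resume from."""
    ckpt = torch.load(path, map_location=device)
    model.load_state_dict(ckpt["model_state"])
    optimizer.load_state_dict(ckpt["optimizer_state"])
    return ckpt["epoch"] + 1
```

Since the backbone weights are inside `model_state`, the separate faceRecognition_arcface_ckpt.pth file is never needed again once the first full checkpoint exists.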

📂 Repository Structure

Face_AntiSpoofing/
├── train_antispoofing.py           # Training script
├── inference.py                    # Inference API
├── README.md                       # This file
├── requirements.txt                # Dependencies
├── models/
│   ├── antispoofing_full.pth              # ← FULL checkpoint (all weights)
│   ├── faceRecognition_arcface_ckpt.pth   # ← Backbone (only for first training)
│   └── yolov8s-face-lindevs.onnx          # ← Face detector
├── examples/
│   ├── test_image.py
│   ├── test_videos.py
│   └── run_webcam.py
└── dataset_antispoofing_cropped/
    ├── real_frame_cropped/
    └── attack_frame_cropped2/

🎓 About This Project

This research is conducted by undergraduate students from
Ho Chi Minh City University of Technology and Education (HCMUTE),
Cohort K23, as part of our Computer Vision and Biometric Systems coursework.
