---
language:
  - en
library_name: timm
tags:
  - vision
  - image-classification
  - vit
  - anti-spoofing
  - face-liveness
  - celeba-spoof
  - deep-learning
  - pytorch
  - huggingface
datasets:
  - celeba-spoof
model_name: ViT-Base-Patch16-224 Face Anti-Spoofing (CelebA Spoof PDA)
license: mit
tasks:
  - name: Face Anti-Spoofing
    type: image-classification
inference: true
metrics:
  - accuracy
  - f1
  - auc
  - precision
  - recall
  - specificity
  - far
  - frr
  - eer
model-index:
  - name: ViT-Base-Patch16-224 Anti-Spoofing (CelebA Spoof PDA)
    results:
      - task:
          type: image-classification
          name: Face Anti-Spoofing
        dataset:
          name: CelebA Spoof (PDA Splits 19–21)
          type: celeba-spoof
          split: test
          size: 1747
        metrics:
          - name: Accuracy
            type: accuracy
            value: 0.8329
          - name: F1-score
            type: f1
            value: 0.878
          - name: AUC-ROC
            type: auc
            value: 0.9561
          - name: Precision (PPV)
            type: precision
            value: 0.7974
          - name: Recall (TPR)
            type: recall
            value: 0.9768
          - name: Specificity
            type: specificity
            value: 0.6021
          - name: FAR
            type: far
            value: 0.3979
          - name: FRR
            type: frr
            value: 0.0232
          - name: EER
            type: eer
            value: 0.1083
---

# Vision Transformer for Face Anti-Spoofing on CelebA Spoof (PDA)

This model is a fine-tuned Vision Transformer face anti-spoofing system designed to distinguish live faces from spoof attacks under real-world conditions.
It is based on ViT-Base-Patch16-224 and evaluated on the CelebA Spoof PDA benchmark following the official protocol.

The model achieves strong discriminative performance, with an AUC-ROC of 0.9561 on the test splits.

Source code and training pipeline are available at:
https://github.com/ArchitRastogi20/vit-spoof-detection-pda

## Model Summary

- Architecture: Vision Transformer Base, patch size 16, 224 × 224 input (ViT-Base-Patch16-224)
- Task: binary face anti-spoofing (live vs. spoof)
- Dataset: CelebA Spoof PDA
- Training splits: 1–18
- Evaluation splits: 19–21
- Framework: PyTorch with timm
- Pretraining: ImageNet

## Intended Use

This model is intended for research and benchmarking in face anti-spoofing and face liveness detection.
Potential application domains include biometric authentication systems, access control, and academic evaluation of transformer-based approaches to spoof detection.

The model is not intended for deployment in high-risk security environments without additional validation, calibration, and fairness analysis.

## Dataset

CelebA Spoof is a large-scale face anti-spoofing dataset containing diverse spoof attack types such as print, replay, and mask attacks.

- Test samples: 1,747
- Live samples: 1,076
- Spoof samples: 671
- Protocol: official PDA split

Dataset reference:
https://github.com/Davidzhangyuanhan/CelebA-Spoof

## Data Augmentation

To improve generalization, a GPU-accelerated augmentation pipeline was implemented using Kornia.
Augmentations target variations in illumination, pose, blur, and camera artifacts.

Augmentation strategy:

- Live samples: 8 augmented variants per image
- Spoof samples: 2 augmented variants per image

Applied transformations include:

- Random horizontal flip
- Random rotation
- Color jitter
- Gaussian blur and noise
- Perspective distortion
- Elastic deformation
- Sharpness adjustment

Normalization follows the ImageNet statistics used by ViT models.

## Model Architecture

The base Vision Transformer encoder is initialized with ImageNet-pretrained weights.
A custom classification head is appended for binary classification.

Classification head:

`LayerNorm -> Dropout(0.1) -> Linear(512) -> GELU -> Dropout(0.1) -> Linear(2)`

Key configuration details:

- Patch size: 16
- Input resolution: 224 × 224
- Dropout: 0.1
- Mixed-precision training enabled

## Training Procedure

The model was trained on augmented CelebA Spoof data using focal loss to address class imbalance. Hyperparameters were optimized using Weights & Biases sweeps.

Training configuration:

| Parameter | Value |
| --- | --- |
| Optimizer | AdamW |
| Learning rate | 3e-4 |
| Weight decay | 0.05 |
| Batch size | 128 |
| Epochs | 50 |
| Loss function | Focal loss (alpha = 0.25, gamma = 2.0) |
| Scheduler | Cosine annealing with warmup |
| Early stopping | Enabled |
| Device | NVIDIA RTX A5000 |
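
The focal loss with the configuration above (alpha = 0.25, gamma = 2.0) can be written in a few lines of PyTorch. This is a generic sketch of the standard formulation, not the repository's exact implementation; in particular, the alpha-weighting convention varies between codebases.

```python
import torch
import torch.nn.functional as F

def focal_loss(logits, targets, alpha=0.25, gamma=2.0):
    """Focal loss for class-imbalanced binary classification.

    (1 - p_t)^gamma down-weights easy, well-classified examples;
    alpha weights the positive class (targets == 1).
    """
    ce = F.cross_entropy(logits, targets, reduction="none")
    pt = torch.exp(-ce)  # model probability assigned to the true class
    t = targets.float()
    alpha_t = alpha * t + (1.0 - alpha) * (1.0 - t)
    return (alpha_t * (1.0 - pt) ** gamma * ce).mean()

logits = torch.tensor([[2.0, -1.0], [0.2, 0.1], [-1.5, 2.5]])
targets = torch.tensor([0, 1, 1])
loss = focal_loss(logits, targets)
```

With gamma = 0 and alpha = 0.5 the expression reduces to a scaled cross-entropy, which is a convenient sanity check; gamma = 2 shrinks the contribution of confidently correct samples.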

## Evaluation

Evaluation follows the CelebA Spoof PDA protocol using splits 19–21. Threshold optimization was applied to balance false acceptance and false rejection rates.

Reported metrics include accuracy, F1-score, AUC-ROC, precision, recall, specificity, FAR, FRR, and EER.
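
For reference, FAR, FRR, and EER can be computed from per-sample liveness scores by sweeping the decision threshold. This is a generic sketch of that procedure (assuming higher score means more live), not the repository's evaluation script.

```python
import numpy as np

def far_frr_eer(live_scores, spoof_scores):
    """Sweep thresholds over all observed scores.

    FAR: fraction of spoof samples accepted as live.
    FRR: fraction of live samples rejected as spoof.
    EER: the rate at which FAR and FRR (approximately) cross.
    """
    thresholds = np.unique(np.concatenate([live_scores, spoof_scores]))
    far = np.array([(spoof_scores >= t).mean() for t in thresholds])
    frr = np.array([(live_scores < t).mean() for t in thresholds])
    i = int(np.argmin(np.abs(far - frr)))  # closest crossing point
    return far[i], frr[i], (far[i] + frr[i]) / 2.0

live = np.array([0.95, 0.9, 0.85, 0.8])
spoof = np.array([0.1, 0.2, 0.3, 0.6])
far_i, frr_i, eer = far_frr_eer(live, spoof)  # perfectly separated -> EER 0
```

On a finite score set the two curves rarely intersect exactly, so the EER is taken as the midpoint at the closest crossing, a common approximation.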

## Results

### Overall Performance

| Metric | Value |
| --- | --- |
| Accuracy | 0.8329 |
| F1-score | 0.8780 |
| AUC-ROC | 0.9561 |

### Detection Metrics

| Metric | Value |
| --- | --- |
| Precision | 0.7974 |
| Recall | 0.9768 |
| Specificity | 0.6021 |
| NPV | 0.9417 |

### Error Rates

| Metric | Value |
| --- | --- |
| FAR | 0.3979 |
| FRR | 0.0232 |
| EER | 0.1083 |

### Confusion Matrix

|              | Predicted Spoof | Predicted Live |
| ------------ | --------------- | -------------- |
| Actual Spoof | 404             | 267            |
| Actual Live  | 25              | 1051           |
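
As a sanity check, the reported metrics can be re-derived from these four counts, treating live as the positive class (which matches the reported recall and FRR):

```python
# Confusion-matrix counts from the table above (positive class = live).
tp, fn = 1051, 25    # actual live:  predicted live / predicted spoof
tn, fp = 404, 267    # actual spoof: predicted spoof / predicted live

accuracy    = (tp + tn) / (tp + tn + fp + fn)               # 0.8329
precision   = tp / (tp + fp)                                # 0.7974
recall      = tp / (tp + fn)                                # 0.9768
specificity = tn / (tn + fp)                                # 0.6021
npv         = tn / (tn + fn)                                # 0.9417
f1          = 2 * precision * recall / (precision + recall) # 0.8780
far         = fp / (fp + tn)  # 0.3979: spoof accepted as live
frr         = fn / (fn + tp)  # 0.0232: live rejected as spoof
```

All eight values round to exactly the figures reported in the tables above, so the results are internally consistent.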

## Limitations

- Performance may degrade on datasets with significantly different capture conditions
- The high FAR indicates sensitivity to certain spoof patterns
- No cross-dataset evaluation is included

## Citation

If you use this model in your research, please cite the CelebA Spoof dataset and reference the repository:

https://github.com/ArchitRastogi20/vit-spoof-detection-pda