---
language:
- en
library_name: timm
tags:
- vision
- image-classification
- vit
- anti-spoofing
- face-liveness
- celeba-spoof
- deep-learning
- pytorch
- huggingface
datasets:
- celeba-spoof
model_name: ViT-Base-Patch16-224 Face Anti-Spoofing (CelebA Spoof PDA)
license: mit
tasks:
- name: Face Anti-Spoofing
  type: image-classification
inference: true
metrics:
- accuracy
- f1
- auc
- precision
- recall
- specificity
- far
- frr
- eer
model-index:
- name: ViT-Base-Patch16-224 Anti-Spoofing (CelebA Spoof PDA)
  results:
  - task:
      type: image-classification
      name: Face Anti-Spoofing
    dataset:
      name: CelebA Spoof (PDA Splits 19–21)
      type: celeba-spoof
      split: test
      size: 1747
    metrics:
    - name: Accuracy
      type: accuracy
      value: 0.8329
    - name: F1-score
      type: f1
      value: 0.878
    - name: AUC-ROC
      type: auc
      value: 0.9561
    - name: Precision (PPV)
      type: precision
      value: 0.7974
    - name: Recall (TPR)
      type: recall
      value: 0.9768
    - name: Specificity
      type: specificity
      value: 0.6021
    - name: FAR
      type: far
      value: 0.3979
    - name: FRR
      type: frr
      value: 0.0232
    - name: EER
      type: eer
      value: 0.1083
---
# Vision Transformer for Face Anti-Spoofing on CelebA Spoof (PDA)

This model is a fine-tuned Vision Transformer face anti-spoofing system designed to distinguish live faces from spoof attacks under real-world conditions.
It is based on ViT-Base-Patch16-224 and evaluated on the CelebA Spoof PDA benchmark following the official protocol.
The model achieves strong discriminative performance with an AUC-ROC of 0.9561 on the test splits.

Source code and training pipeline are available at:
https://github.com/ArchitRastogi20/vit-spoof-detection-pda
## Model Summary

- Architecture: Vision Transformer Base, Patch16, 224 x 224 input
- Task: Binary face anti-spoofing (Live vs. Spoof)
- Dataset: CelebA Spoof PDA
- Training splits: 1 to 18
- Evaluation splits: 19 to 21
- Framework: PyTorch with timm
- Pretraining: ImageNet
## Intended Use

This model is intended for research and benchmarking in face anti-spoofing and face-liveness detection.
Potential application domains include biometric authentication systems, access control, and academic evaluation of transformer-based approaches to spoof detection.
The model is not intended for deployment in high-risk security environments without additional validation, calibration, and fairness analysis.
## Dataset

CelebA Spoof is a large-scale face anti-spoofing dataset containing diverse spoof attack types such as print, replay, and mask attacks.

- Test samples: 1,747
- Live samples: 1,076
- Spoof samples: 671
- Protocol: PDA official split

Dataset reference:
https://github.com/Davidzhangyuanhan/CelebA-Spoof
## Data Augmentation

To improve generalization, a GPU-accelerated augmentation pipeline was implemented using Kornia.
Augmentations target variations in illumination, pose, blur, and camera artifacts.

Augmentation strategy:

- Live samples: 8 augmented variants per image
- Spoof samples: 2 augmented variants per image

Applied transformations include:

- Random horizontal flip
- Random rotation
- Color jitter
- Gaussian blur and noise
- Perspective distortion
- Elastic deformation
- Sharpness adjustment

Normalization follows the ImageNet statistics used by ViT models.
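The ImageNet normalization mentioned above can be sketched in plain PyTorch; the channel-first `(3, H, W)` tensor layout and the `[0, 1]` input range are assumptions about the preprocessing convention:

```python
import torch

# Standard ImageNet statistics, as used by timm's ViT models.
IMAGENET_MEAN = torch.tensor([0.485, 0.456, 0.406]).view(3, 1, 1)
IMAGENET_STD = torch.tensor([0.229, 0.224, 0.225]).view(3, 1, 1)

def normalize(img: torch.Tensor) -> torch.Tensor:
    """Normalize a float image tensor in [0, 1], shape (3, H, W)."""
    return (img - IMAGENET_MEAN) / IMAGENET_STD

# Sanity check: a pixel exactly at the channel means maps to zero.
img = IMAGENET_MEAN.expand(3, 224, 224)
assert torch.allclose(normalize(img), torch.zeros(3, 224, 224))
```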
## Model Architecture

The base Vision Transformer encoder is initialized with ImageNet-pretrained weights, and a custom classification head is appended for binary classification.

Architecture of the classification head:

`LayerNorm -> Dropout(0.1) -> Linear(512) -> GELU -> Dropout(0.1) -> Linear(2)`

Key configuration details:

- Patch size: 16
- Input resolution: 224 x 224
- Dropout: 0.1
- Mixed precision training enabled
## Training Procedure

The model was trained on augmented CelebA Spoof data using focal loss to address class imbalance. Hyperparameters were optimized using Weights & Biases sweeps.

Training configuration:

| Parameter | Value |
|---|---|
| Optimizer | AdamW |
| Learning rate | 3e-4 |
| Weight decay | 0.05 |
| Batch size | 128 |
| Epochs | 50 |
| Loss function | Focal Loss (alpha = 0.25, gamma = 2.0) |
| Scheduler | Cosine annealing with warmup |
| Early stopping | Enabled |
| Device | NVIDIA RTX A5000 |
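A minimal focal-loss sketch matching the table's alpha = 0.25 and gamma = 2.0; treating label 1 (live) as the positive class weighted by alpha is an assumption:

```python
import torch
import torch.nn.functional as F

def focal_loss(logits: torch.Tensor, targets: torch.Tensor,
               alpha: float = 0.25, gamma: float = 2.0) -> torch.Tensor:
    """Focal loss over two-way logits; targets are class indices {0, 1}."""
    ce = F.cross_entropy(logits, targets, reduction="none")  # -log(p_t)
    pt = torch.exp(-ce)                                      # p_t
    # alpha weights the positive class, (1 - alpha) the negative class.
    alpha_t = alpha * targets.float() + (1 - alpha) * (1 - targets.float())
    # (1 - p_t)^gamma down-weights easy, well-classified examples.
    return (alpha_t * (1 - pt) ** gamma * ce).mean()

logits = torch.tensor([[3.0, -3.0], [-3.0, 3.0]])  # confident, correct predictions
targets = torch.tensor([0, 1])
print(focal_loss(logits, targets))  # small loss for easy examples
```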
## Evaluation

Evaluation follows the CelebA Spoof PDA protocol using splits 19 to 21. The decision threshold was tuned to balance the false acceptance rate (FAR) and false rejection rate (FRR).
Reported metrics include accuracy, F1-score, AUC-ROC, precision, recall, specificity, FAR, FRR, and EER.
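FAR, FRR, and EER can be computed from per-image liveness scores with a simple threshold sweep. The score convention here (higher = more live, labels 1 = live / 0 = spoof) is an assumption:

```python
import numpy as np

def far_frr(scores: np.ndarray, labels: np.ndarray, threshold: float):
    """FAR = spoof accepted as live; FRR = live rejected as spoof."""
    far = np.mean(scores[labels == 0] >= threshold)
    frr = np.mean(scores[labels == 1] < threshold)
    return far, frr

def equal_error_rate(scores: np.ndarray, labels: np.ndarray) -> float:
    """Approximate EER: evaluate at the threshold where FAR and FRR are closest."""
    best = min(np.unique(scores),
               key=lambda t: abs(np.subtract(*far_frr(scores, labels, t))))
    far, frr = far_frr(scores, labels, best)
    return (far + frr) / 2

scores = np.array([0.1, 0.2, 0.8, 0.9])
labels = np.array([0, 0, 1, 1])          # perfectly separable -> EER = 0
print(equal_error_rate(scores, labels))  # 0.0
```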
## Results

### Overall Performance

| Metric | Value |
|---|---|
| Accuracy | 0.8329 |
| F1-score | 0.8780 |
| AUC-ROC | 0.9561 |

### Detection Metrics

| Metric | Value |
|---|---|
| Precision | 0.7974 |
| Recall | 0.9768 |
| Specificity | 0.6021 |
| NPV | 0.9417 |

### Error Rates

| Metric | Value |
|---|---|
| FAR | 0.3979 |
| FRR | 0.0232 |
| EER | 0.1083 |
### Confusion Matrix

| | Predicted Spoof | Predicted Live |
|---|---|---|
| Actual Spoof | 404 | 267 |
| Actual Live | 25 | 1051 |
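As a quick consistency check, the reported metrics follow directly from the confusion matrix above, with live as the positive class:

```python
# Confusion matrix counts (live = positive class).
tp, fn = 1051, 25    # live correctly accepted / wrongly rejected
tn, fp = 404, 267    # spoof correctly rejected / wrongly accepted

accuracy    = (tp + tn) / (tp + tn + fp + fn)  # 0.8329
precision   = tp / (tp + fp)                   # 0.7974
recall      = tp / (tp + fn)                   # 0.9768 (= 1 - FRR)
specificity = tn / (tn + fp)                   # 0.6021 (= 1 - FAR)
npv         = tn / (tn + fn)                   # 0.9417
far         = fp / (fp + tn)                   # 0.3979
frr         = fn / (fn + tp)                   # 0.0232
f1          = 2 * precision * recall / (precision + recall)  # 0.8780

print(round(accuracy, 4), round(f1, 3))  # 0.8329 0.878
```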
## Limitations

- Performance may degrade on datasets with significantly different capture conditions
- The relatively high FAR (0.3979) indicates sensitivity to certain spoof patterns
- No cross-dataset evaluation is included
## Citation

If you use this model in your research, please cite the CelebA Spoof dataset and reference the repository:
https://github.com/ArchitRastogi20/vit-spoof-detection-pda