---
license: apache-2.0
base_model:
- keras/xception_41_imagenet
---

# Model Summary

This model detects deepfake content in images and video frames. It uses a lightweight Convolutional Neural Network (CNN) trained on the **FaceForensics++ dataset**, focusing on face manipulations at the c23 (high-quality) compression level. The model classifies a face in an input image as **real or fake**.

* Architecture: CNN-based binary classifier
* Input: Aligned and cropped face images (224x224 RGB)
* Output: Real or Fake label with confidence
* Accuracy: ~92% on the held-out FaceForensics++ test set

## Usage

```python
from keras.models import load_model
import cv2
import numpy as np

# Load the trained classifier from disk.
model = load_model('deepfake_cnn_model.h5')

def preprocess(img_path):
    """Load an aligned face crop and shape it into a (1, 224, 224, 3) batch."""
    img = cv2.imread(img_path)
    if img is None:
        raise FileNotFoundError(f"Could not read image: {img_path}")
    # OpenCV loads images as BGR, while this card specifies RGB input.
    # Drop this line if the model was trained on raw OpenCV (BGR) arrays.
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
    img = cv2.resize(img, (224, 224))
    img = img.astype(np.float32) / 255.0  # scale pixel values to [0, 1]
    return np.expand_dims(img, axis=0)    # add the batch dimension

input_img = preprocess('test_face.jpg')
pred = model.predict(input_img)
print("Fake" if pred[0][0] > 0.5 else "Real")
```

**Input shape**: `(1, 224, 224, 3)`

**Output**: Probability that the face is fake

⚠️ *The model may fail on very low-resolution images or occluded faces.*

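The summary lists the output as a label with confidence, while the usage snippet prints only the label. A minimal sketch of one way to derive both from the fake-probability, assuming the output is a single sigmoid value in [0, 1]; the helper name `label_with_confidence` and its `threshold` parameter are hypothetical and not part of the released model:

```python
def label_with_confidence(p_fake, threshold=0.5):
    """Map a fake-probability to a (label, confidence) pair.

    Assumption: p_fake is a single sigmoid output in [0, 1]. The default
    threshold mirrors the 0.5 cut-off used in the usage snippet.
    """
    if not 0.0 <= p_fake <= 1.0:
        raise ValueError("p_fake must lie in [0, 1]")
    if p_fake > threshold:
        return "Fake", p_fake
    # Confidence in "Real" is the complement of the fake-probability.
    return "Real", 1.0 - p_fake


label, confidence = label_with_confidence(0.83)
print(f"{label} ({confidence:.0%} confidence)")  # Fake (83% confidence)
```

With a real prediction, pass `float(pred[0][0])` as `p_fake`.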
## System

This model is **standalone** and can be used in any face verification system or deepfake detection pipeline. Inputs must be properly aligned face crops. The output can feed moderation systems or alerts.

**Dependencies**: Keras/TensorFlow, and OpenCV for preprocessing

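For video input, a moderation system needs one decision per clip rather than per frame. The sketch below shows one possible aggregation: average the per-frame fake-probabilities and flag the clip for human review when the mean exceeds a threshold. The function `flag_for_review`, the mean aggregation, and the 0.5 threshold are illustrative assumptions, not part of this model:

```python
from statistics import mean


def flag_for_review(frame_probs, threshold=0.5):
    """Aggregate per-frame fake-probabilities into a review decision.

    frame_probs: iterable of per-frame model outputs in [0, 1].
    Returns (should_flag, mean_probability). The mean is one simple
    choice; a max or a percentile would be stricter alternatives.
    """
    probs = list(frame_probs)
    if not probs:
        raise ValueError("at least one frame probability is required")
    score = mean(probs)
    return score > threshold, score


should_flag, score = flag_for_review([0.9, 0.7, 0.8])
print(should_flag, round(score, 2))  # True 0.8
```

Flagged clips should go to a human reviewer rather than trigger automatic action, in line with the Ethics section.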
## Implementation requirements

* Trained on Google Colab with a single NVIDIA T4 GPU
* Training time: ~6 hours over 30 epochs
* Model inference: <50 ms per image on GPU
* Memory requirement: ~150 MB RAM at inference

# Model Characteristics

## Model initialization

The model was **trained from scratch** using CNN layers, ReLU activations, dropout, and batch normalization.

## Model stats

* Size: ~10 MB
* Layers: ~8 convolutional layers + dense head
* Inference latency: ~40 ms on GPU, ~200 ms on CPU

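Latency depends heavily on hardware, so these figures are worth re-measuring on the target machine. A minimal timing sketch, assuming any callable can stand in for the model's predict function; `mean_latency_ms` and its `warmup` and `runs` parameters are hypothetical names introduced here:

```python
import time


def mean_latency_ms(predict_fn, sample, warmup=3, runs=20):
    """Return the mean wall-clock latency of predict_fn(sample) in milliseconds."""
    for _ in range(warmup):
        predict_fn(sample)  # warm-up calls are excluded from the timing
    start = time.perf_counter()
    for _ in range(runs):
        predict_fn(sample)
    return (time.perf_counter() - start) * 1000.0 / runs


# Stand-in workload so the sketch runs without the model file.
print(f"{mean_latency_ms(sum, range(10_000)):.3f} ms")
```

To time the model itself, pass `model.predict` as `predict_fn` and a `(1, 224, 224, 3)` batch as `sample`.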
## Other details

* Not pruned or quantized
* No differential privacy was used during training

# Data Overview

## Training data

* Dataset: FaceForensics++ (c23 compression level)
* Preprocessing: face alignment (using Dlib), resize to 224x224, normalization
* Augmentations: horizontal flip, brightness variation

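The two augmentations listed above can be sketched with NumPy as follows. This illustrates the technique and is not the training code: the 50% flip probability, the additive brightness shift, and the `max_brightness_delta` default of 0.2 are assumptions, since the card does not state the actual parameters.

```python
import numpy as np


def augment(img, rng, max_brightness_delta=0.2):
    """Randomly flip and brightness-shift an image scaled to [0, 1].

    img: float array of shape (H, W, 3).
    rng: numpy.random.Generator, passed in for reproducibility.
    """
    if rng.random() < 0.5:
        img = img[:, ::-1, :]  # horizontal flip: mirror the width axis
    # Additive brightness shift (assumed form), clipped to the valid range.
    delta = rng.uniform(-max_brightness_delta, max_brightness_delta)
    return np.clip(img + delta, 0.0, 1.0)


rng = np.random.default_rng(0)
face = rng.random((224, 224, 3))  # random stand-in for a face crop
augmented = augment(face, rng)
print(augmented.shape)  # (224, 224, 3)
```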
## Demographic groups

The dataset contains celebrity faces scraped from YouTube. It includes a mix of ethnicities and genders, but it is **not balanced or explicitly labeled** by demographic.

## Evaluation data

* Train/Val/Test: 70% / 15% / 15%
* The test set includes unseen identities and manipulations (Deepfakes, FaceSwap, NeuralTextures)

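To keep test identities unseen, a split can be made over identities rather than individual frames. The sketch below shows one way to do this with the 70/15/15 proportions above. The card does not describe the actual procedure, so `split_by_identity`, the seed, and the integer identity IDs are hypothetical:

```python
import random


def split_by_identity(identity_ids, seed=0, fractions=(0.70, 0.15, 0.15)):
    """Split identities (not frames) into disjoint train/val/test sets."""
    ids = sorted(set(identity_ids))
    random.Random(seed).shuffle(ids)  # seeded shuffle for reproducibility
    n_train = round(len(ids) * fractions[0])
    n_val = round(len(ids) * fractions[1])
    return {
        "train": set(ids[:n_train]),
        "val": set(ids[n_train:n_train + n_val]),
        "test": set(ids[n_train + n_val:]),  # remainder goes to test
    }


splits = split_by_identity(range(100))
print({name: len(members) for name, members in splits.items()})
# {'train': 70, 'val': 15, 'test': 15}
```

Every frame then inherits the split of its identity, so no face appears in both training and test data.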
# Evaluation Results

## Summary

* Accuracy: ~92%
* F1 Score: 0.91
* ROC-AUC: 0.95

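For reference, the three metrics above can be computed from labels and predicted probabilities as sketched below. The sketch assumes "fake" is the positive class and a 0.5 decision threshold, neither of which the card states; the toy labels and scores are made up for illustration and are not evaluation data.

```python
def binary_metrics(y_true, p_fake, threshold=0.5):
    """Compute accuracy, F1 (positive class = fake) and ROC-AUC.

    y_true: 0/1 labels (1 = fake); both classes must be present.
    p_fake: predicted fake-probabilities, same length as y_true.
    """
    y_pred = [1 if p > threshold else 0 for p in p_fake]
    tp = sum(1 for t, y in zip(y_true, y_pred) if t == 1 and y == 1)
    fp = sum(1 for t, y in zip(y_true, y_pred) if t == 0 and y == 1)
    fn = sum(1 for t, y in zip(y_true, y_pred) if t == 1 and y == 0)
    correct = sum(1 for t, y in zip(y_true, y_pred) if t == y)
    accuracy = correct / len(y_true)
    f1 = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
    # ROC-AUC as the probability that a random fake sample outscores a
    # random real one, counting ties as half (Mann-Whitney U statistic).
    pos = [p for t, p in zip(y_true, p_fake) if t == 1]
    neg = [p for t, p in zip(y_true, p_fake) if t == 0]
    wins = sum((a > b) + 0.5 * (a == b) for a in pos for b in neg)
    auc = wins / (len(pos) * len(neg))
    return {"accuracy": accuracy, "f1": f1, "roc_auc": auc}


labels = [1, 1, 1, 0, 0, 0]              # toy labels: 1 = fake, 0 = real
scores = [0.9, 0.8, 0.4, 0.6, 0.2, 0.1]  # toy model outputs
print(binary_metrics(labels, scores))
```

Accuracy and F1 depend on the chosen threshold, while ROC-AUC does not.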
## Subgroup evaluation results

No explicit subgroup evaluation was conducted, but performance dropped slightly on:

* Low-light images
* Images with occlusions (masks, hands)

## Fairness

No fairness metrics were computed because the dataset lacks demographic labels. Predictions may nonetheless be biased, since demographic groups are unevenly represented in the training data.

## Usage limitations

* Struggles with low-resolution or occluded faces
* Does not detect audio-based or voice deepfakes
* Requires good lighting and clear facial visibility
* Not suitable for legal or forensics-grade use cases without further testing

## Ethics

This model is intended for **educational and research purposes only**. It should not be used to make real-world judgments (legal, political, etc.) without human oversight. Deepfake detection systems must be transparent about their limitations and avoid misuse in surveillance or personal targeting.