Deepfake Detection via LoRA Fine-Tuned ViT

Binary classifier distinguishing real portrait photos from AI-generated faces. Fine-tunes a pre-trained ViT-B/16 using LoRA adapters (PEFT), keeping more than 99% of the parameters frozen while adapting only the attention query and value projections. The LoRA adapters are merged into the backbone weights before export, so there is no PEFT dependency at inference time.
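Why merging removes the adapter at inference time can be checked numerically. The sketch below is not the export code used for this model; it is a minimal NumPy illustration of the standard LoRA update W' = W + (alpha / r) * B @ A on one projection. The weight values are random stand-ins, and `alpha = 16` is an assumed scaling value (the card only states r=16). In PEFT this step is normally done with `merge_and_unload()`.

```python
import numpy as np

rng = np.random.default_rng(0)

d = 768        # hidden size of ViT-B/16
r = 16         # LoRA rank, as stated in the model card
alpha = 16     # ASSUMPTION: LoRA scaling numerator, not given in the card

# Stand-in values: a frozen projection weight and a trained adapter pair.
W = rng.standard_normal((d, d)) * 0.02   # frozen pretrained weight
A = rng.standard_normal((r, d)) * 0.02   # LoRA down-projection
B = rng.standard_normal((d, r)) * 0.02   # LoRA up-projection
x = rng.standard_normal((4, d))          # a small batch of token embeddings

scale = alpha / r

# Forward pass with the adapter kept separate (training-time form).
y_adapter = x @ W.T + scale * (x @ A.T) @ B.T

# Forward pass after folding the adapter into the weight (export-time form).
W_merged = W + scale * (B @ A)
y_merged = x @ W_merged.T

max_abs_diff = float(np.max(np.abs(y_adapter - y_merged)))
print(f"max abs difference: {max_abs_diff:.2e}")
```

The two outputs agree up to floating-point rounding, which is why the merged model can be exported as a plain ViT with no extra adapter layers.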

Primary dataset: 140K Real and Fake Faces (140,000 images, perfectly balanced, predefined train/valid/test split). Real faces come from Flickr; fake faces were generated with StyleGAN2.

Model Details

| Property | Value |
| --- | --- |
| Backbone | ViT-B/16 (google/vit-base-patch16-224) |
| Adapter | LoRA r=16, target: query + value |
| Trained params | 590k / 86M (0.68%) |
| Input | RGB image, 224×224, ImageNet normalisation |
| Output | Single logit (sigmoid → fake probability) |
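The 590k adapter figure can be reproduced from the architecture. The sketch below assumes the standard ViT-B/16 shape (12 encoder layers, hidden size 768) and counts only the LoRA matrices on the query and value projections; any trainable classification head would add a small amount on top.

```python
hidden_size = 768        # ViT-B/16 hidden dimension
num_layers = 12          # ViT-B/16 encoder layers
rank = 16                # LoRA rank from the table above
targets_per_layer = 2    # query and value projections

# Each adapted projection gets A (rank x hidden) and B (hidden x rank).
params_per_projection = rank * hidden_size + hidden_size * rank
lora_params = num_layers * targets_per_layer * params_per_projection

print(lora_params)  # 589824, i.e. the ~590k listed above
```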

Usage

import numpy as np
import onnxruntime as ort
from PIL import Image
from torchvision.transforms import CenterCrop, Compose, Normalize, Resize, ToTensor

transform = Compose([
    Resize(256),
    CenterCrop(224),
    ToTensor(),
    Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

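# Load the quantized (UINT8) ONNX export
session = ort.InferenceSession("model_quint8.onnx")

img = Image.open("face.jpg").convert("RGB")
# Add a batch dimension: (3, 224, 224) -> (1, 3, 224, 224)
x = transform(img).unsqueeze(0).numpy()
# The model returns a single logit per image
logit = session.run(None, {"input": x})[0][0, 0]
# Sigmoid turns the logit into the probability that the face is fake
prob_fake = float(1 / (1 + np.exp(-logit)))
print(f"Fake probability: {prob_fake:.3f}")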
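The snippet above applies the sigmoid directly, which is fine for typical logits. For very negative float32 logits, `np.exp(-logit)` can emit an overflow warning, so a branch-based form may be preferable. The helper names below and the 0.5 decision threshold are illustrative assumptions, not part of the released model.

```python
import math


def fake_probability(logit: float) -> float:
    """Numerically stable sigmoid: never exponentiates a large positive number."""
    if logit >= 0:
        return 1.0 / (1.0 + math.exp(-logit))
    z = math.exp(logit)
    return z / (1.0 + z)


def classify(logit: float, threshold: float = 0.5) -> str:
    """Map a logit to a label. ASSUMPTION: 0.5 threshold, tune it for your use case."""
    return "fake" if fake_probability(logit) >= threshold else "real"


print(classify(3.0), classify(-3.0))
```

Raising the threshold trades missed fakes for fewer false alarms on real photos; the right value depends on the deployment.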

Results

Dataset: 140K Real and Fake Faces, 100k train / 20k val / 20k test, perfectly balanced.
Model: ViT-B/16 + LoRA (r=16, target: query + value projections).
Training: 10 epochs, AdamW, cosine LR with warmup, batch size 128.
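For readers unfamiliar with the schedule, a cosine learning rate with linear warmup can be sketched as follows. This is a generic illustration, not the training script: the peak learning rate and warmup length are hypothetical values, and the step count assumes the last partial batch of each epoch is kept.

```python
import math


def lr_at(step: int, total_steps: int, warmup_steps: int, base_lr: float) -> float:
    """Linear warmup from 0 to base_lr, then cosine decay back to 0."""
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * progress))


# 100k training images at batch size 128 for 10 epochs.
steps_per_epoch = math.ceil(100_000 / 128)
total_steps = 10 * steps_per_epoch

base_lr = 1e-4        # HYPOTHETICAL peak learning rate
warmup_steps = 500    # HYPOTHETICAL warmup length

for step in (0, warmup_steps, total_steps // 2, total_steps):
    print(step, lr_at(step, total_steps, warmup_steps, base_lr))
```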

Classification (test set):

| Model | Accuracy | AUROC | F1 |
| --- | --- | --- | --- |
| PyTorch FP32 | 99.29% | 99.98% | 99.28% |
| ONNX FP32 | 99.29% | 99.98% | 99.28% |
| ONNX INT8 | 99.13% | 99.97% | 99.14% |
| ONNX UINT8 | 99.18% | 99.97% | 99.17% |
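As a reminder of how the accuracy and F1 columns are defined, the sketch below computes both from binary labels (1 = fake, 0 = real). The label lists are made-up toy data for illustration only and have nothing to do with the test set; the evaluation code actually used is not shown in this card.

```python
def accuracy_and_f1(y_true, y_pred):
    """Accuracy and F1 for the positive class (1 = fake)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return accuracy, f1


# TOY DATA: invented labels, not model outputs.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]

acc, f1 = accuracy_and_f1(y_true, y_pred)
print(f"accuracy={acc:.2f} f1={f1:.2f}")
```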

Quantization benchmark (CPU, 100 inference runs, batch size 1):

| Model | Size (MB) | Latency mean (ms) | Latency std (ms) | Size Δ | Latency Δ |
| --- | --- | --- | --- | --- | --- |
| ONNX FP32 | 327.5 | 136.3 | 36.8 | n/a | n/a |
| ONNX INT8 | 82.9 | 46.9 | 10.2 | -74.7% | -65.6% |
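A latency benchmark of this kind can be sketched with the standard library alone. The harness below is an illustration, not the script that produced the table: the function name and the warmup phase are assumptions, and the timed workload is a placeholder that stands in for a single-image `session.run(...)` call.

```python
import statistics
import time


def benchmark(run_once, runs: int = 100, warmup: int = 10):
    """Return (mean_ms, std_ms) over `runs` timed calls of `run_once`.

    ASSUMPTION: a few untimed warmup calls are made first so one-off
    initialisation cost does not skew the statistics.
    """
    for _ in range(warmup):
        run_once()

    timings_ms = []
    for _ in range(runs):
        start = time.perf_counter()
        run_once()
        timings_ms.append((time.perf_counter() - start) * 1000.0)

    return statistics.mean(timings_ms), statistics.stdev(timings_ms)


# Placeholder workload; with ONNX Runtime this would be something like
# lambda: session.run(None, {"input": x})
mean_ms, std_ms = benchmark(lambda: sum(range(10_000)), runs=100)
print(f"mean={mean_ms:.3f} ms, std={std_ms:.3f} ms")
```

Absolute numbers depend heavily on the CPU, thread settings and system load, so comparisons are only meaningful between models measured on the same machine.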

The model converges rapidly: accuracy already reaches 96.8% after epoch 2, with diminishing gains thereafter. LoRA keeps more than 99% of the parameters frozen throughout, training only about 0.68% of the total (590k adapter parameters on top of the 86M-parameter ViT-B/16 backbone).

Dynamic INT8 quantization reduces model size by roughly 4× (327.5 MB to 82.9 MB) and mean CPU latency by roughly 3× (136.3 ms to 46.9 ms), at the cost of a 0.16 percentage point drop in accuracy (99.29% to 99.13%).
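The headline ratios follow directly from the benchmark and classification tables; the short calculation below reproduces them from the reported values.

```python
# Values copied from the tables above.
fp32 = {"size_mb": 327.5, "latency_ms": 136.3, "accuracy": 99.29}
int8 = {"size_mb": 82.9, "latency_ms": 46.9, "accuracy": 99.13}


def relative_change(new: float, old: float) -> float:
    """Percentage change from old to new (negative means a reduction)."""
    return (new - old) / old * 100.0


size_delta = relative_change(int8["size_mb"], fp32["size_mb"])
latency_delta = relative_change(int8["latency_ms"], fp32["latency_ms"])
size_ratio = fp32["size_mb"] / int8["size_mb"]
latency_ratio = fp32["latency_ms"] / int8["latency_ms"]
accuracy_drop = fp32["accuracy"] - int8["accuracy"]

print(f"size: {size_delta:.1f}% ({size_ratio:.2f}x smaller)")
print(f"latency: {latency_delta:.1f}% ({latency_ratio:.2f}x faster)")
print(f"accuracy drop: {accuracy_drop:.2f} percentage points")
```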
