DAViD: Checkpoints (ViFi-CLIP encoder + classification head)
Model weights for DAViD, a deepfake & AI-generated video/image detector.
This repo hosts two checkpoints:
| File | Size | Description |
|---|---|---|
| k400_clip_complete_finetuned_30_epochs.pth | ~1.6 GB | ViFi-CLIP (ViT-B/16) image encoder, fine-tuned on Kinetics-400 for 30 epochs |
| best_detector_model.pt | ~3 MB | MLP classification head (dense → dense1 → dense2), trained on the DAViD dataset + CDDB |
How they fit together
- Encoder: a ViFi-CLIP (ViT-B/16) visual backbone fine-tuned on Kinetics-400. Each frame (or image) is encoded into a 512-dim embedding.
- Classification head: a lightweight MLP that maps the (averaged) 512-dim embedding to 3 classes: real, deepfake, ai_gen. It was trained on a mix of the DAViD video dataset and CDDB (an image-based deepfake benchmark), so it supports both video and single-image input.
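The averaging step is what lets one head serve both input types: a video contributes N frame embeddings that are averaged into one vector, while a single image is the N = 1 case and passes through unchanged. A minimal pure-Python sketch of that pooling follows. It uses toy 4-dim vectors in place of the real 512-dim embeddings, and `mean_pool` is a hypothetical helper name, not a function from the DAViD repository.

```python
# Illustrative sketch only; not code from the DAViD repository.

def mean_pool(frame_embeddings):
    """Average N per-frame embeddings of equal length into one vector."""
    if not frame_embeddings:
        raise ValueError("need at least one frame embedding")
    dim = len(frame_embeddings[0])
    if any(len(vec) != dim for vec in frame_embeddings):
        raise ValueError("all embeddings must have the same dimension")
    n = len(frame_embeddings)
    return [sum(vec[i] for vec in frame_embeddings) / n for i in range(dim)]


# Toy 4-dim stand-ins for the real 512-dim embeddings.
video = [[1.0, 0.0, 2.0, 4.0], [3.0, 0.0, 2.0, 0.0]]  # two frames
image = [[0.5, 0.25, 0.0, 1.0]]                       # single image, N = 1

pooled_video = mean_pool(video)  # [2.0, 0.0, 2.0, 2.0]
pooled_image = mean_pool(image)  # same values as the single input embedding
```

In both cases the head receives one vector of the same dimension, so it does not need to know whether the input was a video or an image.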
Why these live on the Hub
The DAViD Space downloads these at Docker build time. They were previously hosted on Google Drive, but Drive throttles datacenter IPs, which broke the Space build. The HF Hub serves them reliably to HF's build infrastructure.
Usage
1. Get the model code from GitHub
The model definitions (model.py, encoder.py, and the clip/ package) are not in this weights repo; they live in the training repo aitf-its-tim3-dfk/david (branch feat-cddb). Clone it first and run from inside it:
git clone -b feat-cddb https://github.com/aitf-its-tim3-dfk/david
cd david
pip install -r requirements.txt
Running from inside the clone is what makes the imports in step 3 (from model import ... and from encoder import ...) resolve.
2. Download the checkpoints (no auth needed; public repo)
from huggingface_hub import hf_hub_download
REPO = "aitf-its-tim3-dfk/david-encoder"
encoder_ckpt = hf_hub_download(REPO, "k400_clip_complete_finetuned_30_epochs.pth")
classifier_ckpt = hf_hub_download(REPO, "best_detector_model.pt")
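For a build step (such as a Docker build) it can be convenient to place both files in a known directory instead of the default cache. The sketch below is one possible way to do that, not the script the DAViD Space actually uses. `fetch_checkpoints` and its `download` parameter are hypothetical names; the parameter exists only so the function can be exercised without network access. It relies on the `local_dir` argument of `hf_hub_download`.

```python
import os

REPO = "aitf-its-tim3-dfk/david-encoder"
FILES = (
    "k400_clip_complete_finetuned_30_epochs.pth",
    "best_detector_model.pt",
)


def fetch_checkpoints(target_dir, download=None):
    """Download every checkpoint into target_dir; return {filename: local path}."""
    if download is None:
        # Imported lazily so the module can be loaded without huggingface_hub.
        from huggingface_hub import hf_hub_download
        download = hf_hub_download
    os.makedirs(target_dir, exist_ok=True)
    return {
        name: download(repo_id=REPO, filename=name, local_dir=target_dir)
        for name in FILES
    }


# Example call (performs real downloads, about 1.6 GB in total):
# paths = fetch_checkpoints("weights")  # "weights" is an arbitrary directory name
```

The returned paths can then be passed as encoder_ckpt and classifier_ckpt in the loading step.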
3. Load and run
import torch
from encoder import load_feature_extractor # from the cloned GitHub repo
from model import ClassificationHead # from the cloned GitHub repo
feature_extractor = load_feature_extractor(
    arch="ViT-B/16",
    class_names=("real", "deepfake", "ai_gen"),
    checkpoint_path=encoder_ckpt,
).eval()
classifier = ClassificationHead(input_dim=512, num_classes=3)
classifier.load_state_dict(torch.load(classifier_ckpt, map_location="cpu", weights_only=False))
classifier.eval()
# feats = feature_extractor.image_encoder(frames) # (N, 512)
# logits = classifier(feats.mean(dim=0, keepdim=True)) # (1, 3)
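The (1, 3) logits still need to be turned into a label. A minimal sketch of that last step is shown here in plain Python, assuming the logit order matches the class_names tuple passed above (real, deepfake, ai_gen); that ordering is an assumption and should be checked against the training code. `decode_logits` is a hypothetical helper, and the input values are made up for illustration.

```python
import math

# Assumed to match the order of the head's three output logits.
CLASS_NAMES = ("real", "deepfake", "ai_gen")


def decode_logits(logits, class_names=CLASS_NAMES):
    """Apply a numerically stable softmax and return (top label, {label: prob})."""
    if len(logits) != len(class_names):
        raise ValueError("expected one logit per class")
    peak = max(logits)  # subtract the max before exponentiating to avoid overflow
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(probs)), key=probs.__getitem__)
    return class_names[best], dict(zip(class_names, probs))


# Made-up logits; with the tensors above, logits[0].tolist() would yield such a list.
label, probs = decode_logits([0.2, 2.5, -1.0])
print(label)  # deepfake
```

Reporting the full probability dictionary alongside the top label makes low-confidence predictions easier to spot.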
Training
- Encoder: CLIP ViT-B/16 (ViFi-CLIP), fine-tuned on Kinetics-400, 30 epochs, output dim 512.
- Classification head: MLP trained on the DAViD video dataset + CDDB images (branch feat-cddb).
Related
- Space: aitf-its-tim3-dfk/David
- Training code: aitf-its-tim3-dfk/david (branch feat-cddb)
License
Set the appropriate license for these weights (currently other). The CLIP
backbone, Kinetics-400, and CDDB carry their own upstream licenses/terms.