Document Forgery Detector

A fine-tuned Vision Transformer (ViT) model for detecting forged or tampered documents. It classifies a document image as either real or forged and reaches about 91% accuracy on a 500-sample held-out test set.

This model was developed as a Final Year Project (FYP) at Sir Syed University of Engineering & Technology (SSUET), Karachi, Pakistan.


Model Details

Model Description

  • Model type: Vision Transformer (ViT) fine-tuned for binary image classification
  • Base model: google/vit-base-patch16-224
  • Developed by: M. Umair Khan, Computer Engineering Technology, SSUET Karachi
  • Institution: Sir Syed University of Engineering & Technology (SSUET), Karachi, Pakistan
  • Project type: Final Year Project (FYP)
  • Language(s): English
  • License: MIT
  • Finetuned from: google/vit-base-patch16-224

Uses

Direct Use

This model can be used to detect whether a scanned or photographed document has been tampered with or forged. Suitable for:

  • Identity document verification (ID cards, passports)
  • Academic certificate authentication
  • Invoice and financial document fraud detection
  • General document integrity checks

Downstream Use

Can be integrated into document verification pipelines, KYC (Know Your Customer) systems, HR onboarding tools, or any workflow that requires document authenticity checks.

Out-of-Scope Use

  • This model is not designed for pixel-level forgery localization (it predicts a document-level label only)
  • Not suitable for handwriting verification or signature authentication
  • Should not be used as the sole verification mechanism in high-stakes legal or financial decisions without human review
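
One way to keep a person in the loop is to auto-accept only confident "real" predictions and send everything else to manual review. The sketch below assumes the prediction dict has the same shape as the one returned by `predict` in "How to Get Started", and that the authentic class is labelled 'real' as in that example. The function name `route_prediction` and the 0.90 threshold are hypothetical and not part of this model.

```python
def route_prediction(result, threshold=0.90):
    """Decide what to do with one prediction dict.

    `result` is expected to look like {'label': str, 'confidence': float}.
    The default threshold is an illustrative value, not a tuned one.
    """
    label = result['label']
    confidence = result['confidence']
    # Anything not labelled 'real', and any low-confidence call, goes to a person.
    if label != 'real' or confidence < threshold:
        return 'human_review'
    return 'auto_accept'


decision = route_prediction({'label': 'real', 'confidence': 0.97})
print(decision)
```

The threshold would need to be chosen on validation data for the document types being screened.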

How to Get Started

from transformers import ViTForImageClassification, ViTImageProcessor
from PIL import Image, ImageChops
import torch
import torch.nn.functional as F
import io

# Load model and processor
model     = ViTForImageClassification.from_pretrained('zodumair/document-forgery-detector')
processor = ViTImageProcessor.from_pretrained('zodumair/document-forgery-detector')

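def compute_ela(image_path, quality=90, scale=15):
    """Return an Error Level Analysis (ELA) map for the image at image_path."""
    original = Image.open(image_path).convert('RGB')
    # Re-save the image as JPEG in memory at a known quality
    buf = io.BytesIO()
    original.save(buf, 'JPEG', quality=quality)
    buf.seek(0)
    recompressed = Image.open(buf).convert('RGB')
    # Per-pixel absolute difference between the original and the recompressed copy
    ela = ImageChops.difference(original, recompressed)
    # Stretch the differences to the 0-255 range, then amplify by scale / 10
    max_diff = max(ex[1] for ex in ela.getextrema()) or 1
    ela = ela.point(lambda px: min(255, int(px * (255.0 / max_diff) * (scale / 10.0))))
    return ela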

def predict(image_path):
    """Classify one document image and return its label and confidence."""
    img = Image.open(image_path).convert('RGB')
    ela = compute_ela(image_path)
    # Blend the ELA map into the image (30% ELA, 70% original), as in training
    blended = Image.blend(img, ela, alpha=0.3)
    inputs = processor(images=blended, return_tensors='pt')
    with torch.no_grad():
        logits = model(**inputs).logits
        probs  = F.softmax(logits, dim=-1)
        pred   = torch.argmax(probs, dim=-1).item()
    return {'label': model.config.id2label[pred], 'confidence': probs[0][pred].item()}

result = predict('your_document.jpg')
print(result)  # example output: {'label': 'real', 'confidence': 0.97}

Training Details

Training Data

The model was trained on a combined dataset of 2000 real and 2000 forged document images:

  • Real documents: Sourced from chainyo/rvl-cdip (the RVL-CDIP dataset), which contains real scanned documents across 16 categories including invoices, letters, forms, emails, resumes, and more
  • Synthetic real documents: Faker-generated documents (invoices, ID cards, certificates, passports, transcripts) rendered using PIL
  • Forged documents: Programmatically generated by applying forgery attack functions to real documents, including:
    • Copy-move attack (region duplication)
    • Text replacement (erase and rewrite field values)
    • Stamp overlay (fake verification stamps)
    • JPEG compression artifacts (double-compression of regions)
    • Splicing (pasting regions from different documents)
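
The attack functions themselves are not published in this card. As a rough illustration of two of the listed attacks, the sketch below implements a copy-move and a splice with PIL on synthetic images. The function names, boxes, and offsets are hypothetical and are not the author's implementation.

```python
from PIL import Image


def copy_move(image, box, offset):
    """Duplicate the region `box` (left, upper, right, lower) at a shifted position."""
    forged = image.copy()
    patch = image.crop(box)
    forged.paste(patch, (box[0] + offset[0], box[1] + offset[1]))
    return forged


def splice(target, donor, box, position):
    """Paste the region `box` of `donor` into a copy of `target` at `position`."""
    forged = target.copy()
    forged.paste(donor.crop(box), position)
    return forged


# Synthetic stand-ins for document scans
page = Image.new('RGB', (200, 100), 'white')
page.paste(Image.new('RGB', (40, 20), 'black'), (10, 10))  # a dark "field"
donor = Image.new('RGB', (200, 100), 'red')

moved = copy_move(page, box=(10, 10, 50, 30), offset=(100, 40))
spliced = splice(page, donor, box=(0, 0, 30, 30), position=(150, 60))
```

Both helpers work on a copy, so the source document stays untouched and can be kept as the matching "real" sample.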

Preprocessing

Each image undergoes Error Level Analysis (ELA) blending before being passed to the model. ELA re-saves the image as JPEG at a fixed quality and measures how much each pixel changes; regions whose compression history differs from the rest of the image stand out, which is a common sign of tampering. The ELA map is blended with the original image at alpha=0.3, and the result is then resized to 224x224.
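
To see why ELA separates edited regions from untouched ones, the sketch below builds a synthetic case: a flat grey image that has already been through JPEG once, with a fresh noise patch pasted in. The helper names are hypothetical, the ELA map is left unscaled for simplicity, and the images are synthetic rather than taken from the training set.

```python
import io
import random

from PIL import Image, ImageChops, ImageStat


def jpeg_roundtrip(image, quality=90):
    """Return `image` after one in-memory JPEG save/load cycle."""
    buf = io.BytesIO()
    image.save(buf, 'JPEG', quality=quality)
    buf.seek(0)
    return Image.open(buf).convert('RGB')


def ela_map(image, quality=90):
    """Absolute per-pixel difference between an image and its recompressed copy."""
    return ImageChops.difference(image, jpeg_roundtrip(image, quality))


def mean_level(image, box):
    """Mean intensity inside `box`, averaged over the RGB bands."""
    return sum(ImageStat.Stat(image.crop(box)).mean) / 3


# A flat grey "scan" that has already been JPEG-compressed once
scan = jpeg_roundtrip(Image.new('RGB', (128, 128), (128, 128, 128)))

# Tamper with it: paste a never-compressed random-noise patch
rng = random.Random(0)
noise_bytes = bytes(rng.getrandbits(8) for _ in range(32 * 32 * 3))
tampered = scan.copy()
tampered.paste(Image.frombytes('RGB', (32, 32), noise_bytes), (48, 48))

ela = ela_map(tampered)
patch_level = mean_level(ela, (48, 48, 80, 80))   # inside the pasted patch
clean_level = mean_level(ela, (0, 0, 32, 32))     # untouched corner

# Same blend ratio as described above: 70% image, 30% ELA map
blended = Image.blend(tampered, ela, alpha=0.3)
print(patch_level > clean_level)
```

Real forgeries are usually far subtler than a noise patch, so the contrast in practice is smaller than in this toy case.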

Training Hyperparameters

| Parameter | Value |
|---|---|
| Base model | google/vit-base-patch16-224 |
| Epochs | 20 (best at epoch 13) |
| Batch size | 32 |
| Learning rate | 1e-5 |
| LR scheduler | Cosine |
| Weight decay | 0.05 |
| Warmup steps | 200 |
| Label smoothing | 0.1 |
| Classifier dropout | 0.4 |
| Mixed precision | FP16 |
| Hardware | Google Colab T4 GPU |
| Training time | ~28 minutes |
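
The learning-rate settings above (peak 1e-5, cosine scheduler, 200 warmup steps) can be visualised with a small function. This sketch assumes the common shape of linear warmup followed by cosine decay to zero. The card does not state the total number of optimizer steps, so `TOTAL_STEPS` is a hypothetical value.

```python
import math


def cosine_lr(step, total_steps, base_lr=1e-5, warmup_steps=200):
    """Learning rate at `step`: linear warmup, then cosine decay to zero."""
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


TOTAL_STEPS = 2000  # hypothetical; depends on the training split size

for step in (0, 100, 200, 1100, 2000):
    print(step, f"{cosine_lr(step, TOTAL_STEPS):.2e}")
```

The rate climbs to its peak at step 200, falls to half the peak midway through the remaining steps, and reaches zero at the end.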

Training Configuration

| Parameter | Value |
|---|---|
| Base model | google/vit-base-patch16-224 |
| Epochs | 15 |
| Batch size | 32 |
| Learning rate | 1e-5 |
| Scheduler | Cosine |
| Weight decay | 0.05 |
| Warmup steps | 200 |
| Label smoothing | 0.1 |
| Dropout | 0.4 |
| Precision | FP16 |
| Hardware | Google Colab T4 GPU |
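
Label smoothing of 0.1 changes the training target from a hard 0/1 label to a soft distribution. The sketch below assumes the usual convention, in which the smoothing mass is spread uniformly over all classes, and shows the effect on cross-entropy for two classes. The function names and logits are illustrative.

```python
import math


def smoothed_targets(true_class, num_classes=2, smoothing=0.1):
    """Put (1 - s) on the true class, then add s / K to every class."""
    base = smoothing / num_classes
    return [base + (1.0 - smoothing if k == true_class else 0.0)
            for k in range(num_classes)]


def cross_entropy(logits, targets):
    """Cross-entropy between softmax(logits) and a target distribution."""
    peak = max(logits)
    log_norm = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return -sum(t * (z - log_norm) for z, t in zip(logits, targets))


targets = smoothed_targets(true_class=1)           # roughly [0.05, 0.95]
confident = cross_entropy([-4.0, 4.0], targets)    # very sure of class 1
moderate = cross_entropy([-1.5, 1.5], targets)     # about 95% sure of class 1
print(f"{confident:.3f} {moderate:.3f}")
```

The very confident prediction gets the higher loss, which is how smoothing discourages over-confidence. If the reported losses are computed with smoothing enabled, they cannot fall much below about 0.2 for two classes.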

Evaluation Results

Verified Test Performance (500 random samples)

| Metric | Score |
|---|---|
| Accuracy | ~91% |
| F1 Score | ~0.91 |

This result is based on randomized evaluation over 500 unseen test samples.
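
With 500 samples, the reported accuracy carries some sampling uncertainty. The sketch below computes a normal-approximation (Wald) 95% interval for an accuracy of 0.91. It assumes the samples are independent, and the function name is hypothetical.

```python
import math


def accuracy_interval(accuracy, n, z=1.96):
    """Normal-approximation interval for an accuracy measured on n samples."""
    stderr = math.sqrt(accuracy * (1.0 - accuracy) / n)
    return accuracy - z * stderr, accuracy + z * stderr


low, high = accuracy_interval(0.91, 500)
print(f"{low:.3f} to {high:.3f}")  # 0.885 to 0.935
```

A difference of one or two points between runs or checkpoints therefore falls within the noise of a 500-sample test.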


Training Progress

| Epoch | Train Loss | Val Loss | Accuracy | F1 |
|---|---|---|---|---|
| 1 | 0.715 | 0.688 | 0.543 | 0.539 |
| 2 | 0.574 | 0.546 | 0.749 | 0.700 |
| 3 | 0.449 | 0.405 | 0.870 | 0.868 |
| 4 | 0.389 | 0.375 | 0.886 | 0.886 |
| 5 | 0.392 | 0.374 | 0.881 | 0.875 |
| 6 | 0.359 | 0.365 | 0.887 | 0.885 |
| 7 | 0.334 | 0.374 | 0.888 | 0.883 |
| 8 | 0.328 | 0.358 | 0.894 | 0.893 |
| 9 | 0.328 | 0.371 | 0.891 | 0.888 |
| 10 | 0.308 | 0.369 | 0.901 | 0.900 |
| 11 | 0.306 | 0.364 | 0.907 | 0.907 |
| 12 | 0.296 | 0.364 | 0.903 | 0.902 |
| 13 | 0.265 | 0.370 | 0.901 | 0.900 |
| 14 | 0.276 | 0.374 | 0.901 | 0.899 |
| 15 | 0.262 | 0.383 | 0.894 | 0.890 |
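
The table can be read programmatically to pick a checkpoint. The sketch below copies the rows above and finds the epoch with the lowest validation loss and the epoch with the highest accuracy. Which criterion was actually used to select the published weights is not stated in this card.

```python
# (epoch, train_loss, val_loss, accuracy, f1) copied from the table above
history = [
    (1, 0.715, 0.688, 0.543, 0.539),
    (2, 0.574, 0.546, 0.749, 0.700),
    (3, 0.449, 0.405, 0.870, 0.868),
    (4, 0.389, 0.375, 0.886, 0.886),
    (5, 0.392, 0.374, 0.881, 0.875),
    (6, 0.359, 0.365, 0.887, 0.885),
    (7, 0.334, 0.374, 0.888, 0.883),
    (8, 0.328, 0.358, 0.894, 0.893),
    (9, 0.328, 0.371, 0.891, 0.888),
    (10, 0.308, 0.369, 0.901, 0.900),
    (11, 0.306, 0.364, 0.907, 0.907),
    (12, 0.296, 0.364, 0.903, 0.902),
    (13, 0.265, 0.370, 0.901, 0.900),
    (14, 0.276, 0.374, 0.901, 0.899),
    (15, 0.262, 0.383, 0.894, 0.890),
]

best_by_val_loss = min(history, key=lambda row: row[2])
best_by_accuracy = max(history, key=lambda row: row[3])

# Validation loss minus training loss; a widening gap suggests overfitting
gap_at_best = best_by_val_loss[2] - best_by_val_loss[1]
gap_at_end = history[-1][2] - history[-1][1]

print(best_by_val_loss[0], best_by_accuracy[0])  # 8 11
```

Validation loss bottoms out at epoch 8 and accuracy peaks at epoch 11, while training loss keeps falling through epoch 15, so the later epochs mostly widen the train/validation gap.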

Bias, Risks, and Limitations

  • The forgery attacks used in training are programmatic, so the model may not generalise perfectly to sophisticated AI-generated forgeries (e.g. deepfake documents, inpainting-based edits)
  • Performance may vary on document types not well represented in RVL-CDIP
  • The model predicts a document-level label only; it does not localise which region was forged
  • Should be used as a screening tool, not as a definitive legal verdict

Environmental Impact

  • Hardware: Google Colab T4 GPU (NVIDIA Tesla T4, 16GB VRAM)
  • Cloud provider: Google Colab
  • Training time: ~28 minutes
  • Compute region: Google Cloud (us-central1)
  • Carbon emissions can be estimated using the ML Impact Calculator
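
A back-of-the-envelope estimate follows the same arithmetic such calculators use: energy drawn times the carbon intensity of the grid. The sketch below uses the T4's 70 W board power as an upper bound on GPU draw and the ~28 minute training time listed above. The grid intensity is a hypothetical placeholder rather than a measured figure for the listed region, and CPU, memory, and data-centre overhead are ignored.

```python
def training_emissions_kg(power_watts, minutes, kg_co2_per_kwh):
    """Energy in kWh times grid carbon intensity, giving kg CO2e."""
    energy_kwh = (power_watts / 1000.0) * (minutes / 60.0)
    return energy_kwh * kg_co2_per_kwh


T4_POWER_WATTS = 70      # NVIDIA T4 board power, used as an upper bound
TRAINING_MINUTES = 28    # from the training details in this card
GRID_INTENSITY = 0.5     # kg CO2e per kWh; hypothetical placeholder

estimate = training_emissions_kg(T4_POWER_WATTS, TRAINING_MINUTES, GRID_INTENSITY)
print(f"{estimate:.4f} kg CO2e")
```

Under these assumptions a single training run uses roughly 0.033 kWh, so its footprint is on the order of tens of grams of CO2e.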

Citation

If you use this model in your research or project, please cite:

@misc{umair2025forgerydetector,
  author    = {M. Umair Khan},
  title     = {Document Forgery Detector: A Fine-tuned ViT for Document Authenticity Classification},
  year      = {2026},
  publisher = {HuggingFace},
  institution = {Sir Syed University of Engineering \& Technology, Karachi, Pakistan},
  url       = {https://huggingface.co/zodumair/document-forgery-detector}
}

Model Card Authors

M. Umair Khan, Computer Engineering Technology (Final Year), Sir Syed University of Engineering & Technology (SSUET), Karachi, Pakistan


This model was developed as part of a Final Year Project (FYP) at SSUET Karachi. Built using HuggingFace Transformers, PyTorch, and Google Colab.
