Document Forgery Detector

A fine-tuned Vision Transformer (ViT) model for detecting forged or tampered documents. It classifies a document image as either real or forged, reaching roughly 91% accuracy on 500 unseen test samples.

This model was developed as a Final Year Project (FYP) at Sir Syed University of Engineering & Technology (SSUET), Karachi, Pakistan.

Model Details

Model Description

Model type: Vision Transformer (ViT) fine-tuned for binary image classification
Base model: google/vit-base-patch16-224
Developed by: M. Umair Khan, Computer Engineering Technology, SSUET Karachi
Institution: Sir Syed University of Engineering & Technology (SSUET), Karachi, Pakistan
Project type: Final Year Project (FYP)
Language(s): English
License: MIT
Finetuned from: google/vit-base-patch16-224

Uses

Direct Use

This model can be used to detect whether a scanned or photographed document has been tampered with or forged. Suitable for:

Identity document verification (ID cards, passports)
Academic certificate authentication
Invoice and financial document fraud detection
General document integrity checks

Downstream Use

Can be integrated into document verification pipelines, KYC (Know Your Customer) systems, HR onboarding tools, or any workflow that requires document authenticity checks.

Out-of-Scope Use

This model is not designed for pixel-level forgery localization (it predicts a document-level label only)
Not suitable for handwriting verification or signature authentication
Should not be used as the sole verification mechanism in high-stakes legal or financial decisions without human review

How to Get Started

```python
from transformers import ViTForImageClassification, ViTImageProcessor
from PIL import Image, ImageChops
import torch
import torch.nn.functional as F
import io

# Load the fine-tuned model and its matching image processor from the Hugging Face Hub
model     = ViTForImageClassification.from_pretrained('zodumair/document-forgery-detector')
processor = ViTImageProcessor.from_pretrained('zodumair/document-forgery-detector')

def compute_ela(image_path, quality=90, scale=15):
    """Error Level Analysis: re-save as JPEG and amplify the per-pixel difference."""
    original = Image.open(image_path).convert('RGB')
    buf = io.BytesIO()
    original.save(buf, 'JPEG', quality=quality)   # in-memory JPEG re-compression
    buf.seek(0)
    recompressed = Image.open(buf).convert('RGB')
    ela = ImageChops.difference(original, recompressed)
    # Largest difference across all channels; fall back to 1 for identical images
    max_diff = max(ex[1] for ex in ela.getextrema()) or 1
    # Stretch differences to the 0-255 range, boost by scale/10, and clip at 255
    ela = ela.point(lambda px: min(255, int(px * (255.0 / max_diff) * (scale / 10.0))))
    return ela

def predict(image_path):
    img = Image.open(image_path).convert('RGB')
    ela = compute_ela(image_path)
    # Same preprocessing as training: 70% original + 30% ELA map
    blended = Image.blend(img, ela, alpha=0.3)
    inputs = processor(images=blended, return_tensors='pt')   # resize to 224x224 and normalise
    with torch.no_grad():
        logits = model(**inputs).logits
    probs = F.softmax(logits, dim=-1)[0]   # single image: shape (num_labels,)
    pred  = int(probs.argmax(dim=-1))
    return {'label': model.config.id2label[pred], 'confidence': probs[pred].item()}

result = predict('your_document.jpg')
print(result)  # e.g. {'label': 'real', 'confidence': 0.97}
```

Training Details

Training Data

The model was trained on a combined dataset of 2000 real and 2000 forged document images:

Real documents: Sourced from chainyo/rvl-cdip (RVL-CDIP dataset) — real scanned documents across 16 categories including invoices, letters, forms, emails, resumes, and more
Synthetic real documents: Faker-generated documents (invoices, ID cards, certificates, passports, transcripts) rendered using PIL
Forged documents: Programmatically generated by applying forgery attack functions to real documents, including:
- Copy-move attack (region duplication)
- Text replacement (erase and rewrite field values)
- Stamp overlay (fake verification stamps)
- JPEG compression artifacts (double-compression of regions)
- Splicing (pasting regions from different documents)
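Copy-move and splicing differ only in where the pasted region comes from. The sketch below is a hypothetical illustration, not the project's attack code: it treats an image as a 2D list of grayscale values and uses made-up helpers (`paste_region`, `copy_move`, `splice`). Copy-move takes the region from the same document; splicing takes it from a different donor document.

```python
from copy import deepcopy
from typing import List, Tuple

Grid = List[List[int]]  # rows of grayscale pixel values (stand-in for an image array)

def paste_region(target: Grid, source: Grid,
                 box: Tuple[int, int, int, int], dest: Tuple[int, int]) -> Grid:
    """Copy source[top:bottom][left:right] into a copy of target at dest=(top, left).

    box is (top, left, bottom, right) with exclusive bottom/right bounds.
    """
    top, left, bottom, right = box
    dest_top, dest_left = dest
    out = deepcopy(target)  # work on a copy so the genuine document is untouched
    for dy in range(bottom - top):
        for dx in range(right - left):
            # Read from the unmodified source, so overlapping boxes do not smear
            out[dest_top + dy][dest_left + dx] = source[top + dy][left + dx]
    return out

def copy_move(image: Grid, box, dest) -> Grid:
    """Copy-move: duplicate a region of the document elsewhere in the same document."""
    return paste_region(image, image, box, dest)

def splice(target: Grid, donor: Grid, box, dest) -> Grid:
    """Splicing: paste a region taken from a different (donor) document."""
    return paste_region(target, donor, box, dest)

doc = [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11]]
print(copy_move(doc, (0, 0, 1, 2), (2, 2)))  # [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 0, 1]]
```

On real images the same idea applies to pixel arrays; text replacement, stamp overlay and regional double JPEG compression would need their own generators.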

Preprocessing

Each image undergoes Error Level Analysis (ELA) blending before being passed to the model. ELA re-saves the image as JPEG and measures how much each pixel changes; regions with inconsistent compression levels stand out, which is a common signal of tampering. The ELA map is blended with the original image at alpha=0.3 (70% original, 30% ELA) before the image processor resizes it to 224x224.
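The two numeric steps of this preprocessing can be checked on single channel values. The sketch below restates, in plain Python, the scaling used in `compute_ela` from "How to Get Started" and the per-channel formula Pillow documents for `Image.blend` (`image1 * (1.0 - alpha) + image2 * alpha`). The helper names `ela_scale` and `blend_value` are hypothetical, and the floats are not guaranteed to match Pillow's 8-bit rounding exactly.

```python
def ela_scale(diff: int, max_diff: int, scale: int = 15) -> int:
    """Per-channel brightness scaling used by compute_ela."""
    max_diff = max_diff or 1  # identical images give max_diff == 0; avoid dividing by zero
    return min(255, int(diff * (255.0 / max_diff) * (scale / 10.0)))

def blend_value(original: float, ela: float, alpha: float = 0.3) -> float:
    """Per-channel form of Image.blend: original * (1 - alpha) + ela * alpha."""
    return original * (1.0 - alpha) + ela * alpha

# scale=15 is a 1.5x boost, so any difference >= 2/3 of max_diff saturates at 255
print(ela_scale(10, 20))      # 191
print(ela_scale(20, 20))      # 255 (clipped from 382)
print(blend_value(200, 100))  # about 170: 70% original + 30% ELA
```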

Training Hyperparameters

Parameter	Value
Base model	google/vit-base-patch16-224
Epochs	20 (best at epoch 13)
Batch size	32
Learning rate	1e-5
LR scheduler	Cosine
Weight decay	0.05
Warmup steps	200
Label smoothing	0.1
Classifier dropout	0.4
Mixed precision	FP16
Hardware	Google Colab T4 GPU
Training time	~28 minutes
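Two of these settings are easier to read as formulas. The sketch below assumes that "Cosine" means linear warmup over the 200 warmup steps followed by a half-cosine decay to zero (the shape of Transformers' `get_cosine_schedule_with_warmup`), and that label smoothing follows the usual convention of `(1 - eps)` on the true class plus `eps / K` on every class. `lr_at_step`, `smoothed_targets` and the total step count are hypothetical, since the card does not state the size of the training split.

```python
import math
from typing import List

def lr_at_step(step: int, total_steps: int, base_lr: float = 1e-5,
               warmup_steps: int = 200) -> float:
    """Linear warmup to base_lr, then half-cosine decay to zero (assumed schedule shape)."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))

def smoothed_targets(true_class: int, num_classes: int = 2,
                     epsilon: float = 0.1) -> List[float]:
    """Label-smoothed target distribution over num_classes classes."""
    return [(1.0 - epsilon) * (k == true_class) + epsilon / num_classes
            for k in range(num_classes)]

TOTAL_STEPS = 1500  # hypothetical; depends on the training-split size and batch size 32
for step in (0, 100, 200, 850, 1500):
    print(step, lr_at_step(step, TOTAL_STEPS))  # rises to 1e-5 at step 200, then decays to 0
print(smoothed_targets(1))  # about [0.05, 0.95] for two classes with epsilon=0.1
```

With two classes and smoothing 0.1, the model is trained towards 95% rather than 100% confidence, which discourages over-confident outputs.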

Model Details

Model type: Vision Transformer (ViT) for image classification
Base model: google/vit-base-patch16-224
Task: Binary classification (Real vs Forged documents)
Developed by: M. Umair Khan, Computer Engineering Technology
Institution: SSUET Karachi, Pakistan
License: MIT
Frameworks: PyTorch, HuggingFace Transformers


Training Configuration

Parameter	Value
Base model	google/vit-base-patch16-224
Epochs	15
Batch size	32
Learning rate	1e-5
Scheduler	Cosine
Weight decay	0.05
Warmup steps	200
Label smoothing	0.1
Dropout	0.4
Precision	FP16
Hardware	Google Colab T4 GPU

Evaluation Results

Verified Test Performance (500 random samples)

Metric	Score
Accuracy	~91%
F1 Score	~0.91

This result is based on randomized evaluation over 500 unseen test samples.
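For reference, both metrics can be computed from the predicted and true labels of the test samples. This is a minimal sketch: the label strings `"real"` and `"forged"` are assumed from the example output above, and the card does not say whether its F1 is binary (with `"forged"` as the positive class) or macro-averaged, so both are shown. The function names are hypothetical.

```python
from typing import Sequence, Tuple

def accuracy(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def f1_for_label(y_true: Sequence[str], y_pred: Sequence[str], positive: str) -> float:
    """Binary F1 treating `positive` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0  # precision or recall is zero (or undefined)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def macro_f1(y_true, y_pred, labels: Tuple[str, ...] = ("real", "forged")) -> float:
    """Unweighted mean of the per-class F1 scores."""
    return sum(f1_for_label(y_true, y_pred, lab) for lab in labels) / len(labels)

y_true = ["real", "forged", "forged", "real"]
y_pred = ["real", "forged", "real", "real"]
print(accuracy(y_true, y_pred))                # 0.75
print(f1_for_label(y_true, y_pred, "forged"))  # about 0.667
print(macro_f1(y_true, y_pred))                # about 0.733
```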

Training Progress

Epoch	Train Loss	Val Loss	Accuracy	F1
1	0.715	0.688	0.543	0.539
2	0.574	0.546	0.749	0.700
3	0.449	0.405	0.870	0.868
4	0.389	0.375	0.886	0.886
5	0.392	0.374	0.881	0.875
6	0.359	0.365	0.887	0.885
7	0.334	0.374	0.888	0.883
8	0.328	0.358	0.894	0.893
9	0.328	0.371	0.891	0.888
10	0.308	0.369	0.901	0.900
11	0.306	0.364	0.907	0.907
12	0.296	0.364	0.903	0.902
13	0.265	0.370	0.901	0.900
14	0.276	0.374	0.901	0.899
15	0.262	0.383	0.894	0.890
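Which epoch counts as "best" depends on the selection metric. The sketch below copies the Training Progress table and reports the epoch with the highest validation accuracy and the epoch with the lowest validation loss; `best_epoch` is a hypothetical helper, and the metric used to select the released checkpoint is not given in this table.

```python
# (epoch, train_loss, val_loss, accuracy, f1), copied from the table above
HISTORY = [
    (1, 0.715, 0.688, 0.543, 0.539), (2, 0.574, 0.546, 0.749, 0.700),
    (3, 0.449, 0.405, 0.870, 0.868), (4, 0.389, 0.375, 0.886, 0.886),
    (5, 0.392, 0.374, 0.881, 0.875), (6, 0.359, 0.365, 0.887, 0.885),
    (7, 0.334, 0.374, 0.888, 0.883), (8, 0.328, 0.358, 0.894, 0.893),
    (9, 0.328, 0.371, 0.891, 0.888), (10, 0.308, 0.369, 0.901, 0.900),
    (11, 0.306, 0.364, 0.907, 0.907), (12, 0.296, 0.364, 0.903, 0.902),
    (13, 0.265, 0.370, 0.901, 0.900), (14, 0.276, 0.374, 0.901, 0.899),
    (15, 0.262, 0.383, 0.894, 0.890),
]
VAL_LOSS, ACC, F1 = 2, 3, 4  # column indices

def best_epoch(history, column: int, higher_is_better: bool) -> int:
    """Return the epoch whose value in `column` is best (first one on ties)."""
    sign = 1 if higher_is_better else -1
    return max(history, key=lambda row: sign * row[column])[0]

print(best_epoch(HISTORY, ACC, True))        # 11 (accuracy 0.907)
print(best_epoch(HISTORY, F1, True))         # 11 (F1 0.907)
print(best_epoch(HISTORY, VAL_LOSS, False))  # 8 (val loss 0.358)
```

Validation loss bottoms out at epoch 8 while training loss keeps falling, so loss-based and accuracy-based selection pick different checkpoints.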

Bias, Risks, and Limitations

The forgery attacks used in training are programmatic — the model may not generalise perfectly to sophisticated AI-generated forgeries (e.g. deepfake documents, inpainting-based edits)
Performance may vary on document types not well represented in RVL-CDIP
The model predicts a document-level label only — it does not localise which region was forged
Should be used as a screening tool, not as a definitive legal verdict
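One way to use the model as a screening tool rather than a verdict is to let only confident "real" predictions pass automatically and send everything else to a person. This is a hypothetical sketch: `route`, the decision names and the 0.90 threshold are illustrative assumptions, and a real threshold should be tuned on your own validation data. The input is a dictionary shaped like the output of `predict()` in "How to Get Started", with label strings assumed to be `"real"` and `"forged"`.

```python
from typing import Mapping, Union

REVIEW_THRESHOLD = 0.90  # hypothetical cut-off; tune on your own data

def route(prediction: Mapping[str, Union[str, float]],
          threshold: float = REVIEW_THRESHOLD) -> str:
    """Map a predict() result to a screening decision; the model never rejects on its own."""
    label = prediction["label"]
    confidence = float(prediction["confidence"])
    if label == "real" and confidence >= threshold:
        return "auto_accept"
    # Suspected forgeries and low-confidence cases both go to a human reviewer
    return "human_review"

print(route({"label": "real", "confidence": 0.97}))    # auto_accept
print(route({"label": "real", "confidence": 0.60}))    # human_review
print(route({"label": "forged", "confidence": 0.99}))  # human_review
```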

Environmental Impact

Hardware: Google Colab T4 GPU (NVIDIA Tesla T4, 16GB VRAM)
Cloud provider: Google Colab
Training time: ~28 minutes
Compute region: Google Cloud (us-central1)
Carbon emissions can be estimated using the ML Impact Calculator
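Calculators of this kind multiply power draw, run time and the grid's carbon intensity. The sketch below applies that product to the figures above, assuming the T4 ran at its 70 W rated board power for the full ~28 minutes; this is an upper bound for the GPU alone and ignores CPU, host and data-centre overhead. The grid intensity for the compute region is left as a parameter rather than filled in, and the helper names are hypothetical.

```python
T4_BOARD_POWER_W = 70.0  # NVIDIA Tesla T4 rated board power

def energy_kwh(power_w: float, minutes: float) -> float:
    """Energy in kWh for a constant power draw over the given number of minutes."""
    return power_w * (minutes / 60.0) / 1000.0

def co2_kg(kwh: float, grid_kg_per_kwh: float) -> float:
    """Emissions in kg CO2-eq for a given grid carbon intensity (kg per kWh)."""
    return kwh * grid_kg_per_kwh

kwh = energy_kwh(T4_BOARD_POWER_W, 28)
print(f"{kwh:.4f} kWh")  # 0.0327 kWh
# co2_kg(kwh, <carbon intensity of the compute region in kg CO2-eq per kWh>)
```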

Citation

If you use this model in your research or project, please cite:

@misc{umair2025forgerydetector,
  author    = {M. Umair Khan},
  title     = {Document Forgery Detector: A Fine-tuned ViT for Document Authenticity Classification},
  year      = {2026},
  publisher = {HuggingFace},
  institution = {Sir Syed University of Engineering & Technology, Karachi, Pakistan},
  url       = {https://huggingface.co/zodumair/document-forgery-detector}
}

Model Card Authors

M. Umair Khan, final-year student, Computer Engineering Technology, Sir Syed University of Engineering & Technology (SSUET), Karachi, Pakistan

This model was developed as part of a Final Year Project (FYP) at SSUET Karachi. Built using HuggingFace Transformers, PyTorch, and Google Colab.


Format: Safetensors
Model size: 85.8M params
Tensor type: F32
