metadata
base_model: facebook/dinov3-vitl16-pretrain-lvd1689m
datasets:
- DocTamperV1
- vankey/RealText-V2
- Jason37437/RealText-V2-Syn25k
language: en
library_name: pytorch
license: mit
metrics:
- f1
pipeline_tag: image-segmentation
tags:
- document-forgery-detection
- tampering-detection
- image-manipulation
- vision-transformer
- lora
SEED Detector
This repository contains the official detector model for SEED, presented in the paper SEED: Simple ViT and Evolving Harness for Explainable Text Forgery Detection.
SEED Detector is a lightweight vision transformer model for document forgery detection. It localizes tampered regions in document images and classifies images as real or forged.
Architecture
| Component | Detail |
|---|---|
| Backbone | DINOv3 ViT-L/16 |
| Finetuning | LoRA (rank=1, attention + MLP) |
| Queries | 1 mask query |
| Decoder blocks | 4 |
| Input size | 512 × 512 |
| Parameters | ~304M (only ~1M trainable with LoRA) |
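As a rough sanity check on the table above, the number of weights added by LoRA can be estimated by hand. The sketch below assumes the standard ViT-L layout (24 blocks, embedding width 1024, MLP width 4096) and separate rank-1 adapters on the query, key, value, and output projections plus both MLP layers. These placement details are assumptions, not taken from the SEED code. The count covers adapter weights only and ignores any other trainable parts (for example the mask query or prediction heads), so it is not expected to reproduce the ~1M figure exactly.

```python
# Assumed ViT-L/16 dimensions (standard ViT-L values, not read from the checkpoint).
EMBED_DIM = 1024
MLP_DIM = 4096
NUM_BLOCKS = 24
RANK = 1          # LoRA rank from the table above
INPUT_SIZE = 512  # input resolution from the table above
PATCH_SIZE = 16   # the "/16" in ViT-L/16


def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """Weights added by one LoRA adapter: A is (rank x d_in), B is (d_out x rank)."""
    return rank * (d_in + d_out)


# Assumption: four separate attention projections (q, k, v, out), each 1024 -> 1024.
attn = 4 * lora_params(EMBED_DIM, EMBED_DIM, RANK)
# Assumption: a plain two-layer MLP (1024 -> 4096 -> 1024).
mlp = lora_params(EMBED_DIM, MLP_DIM, RANK) + lora_params(MLP_DIM, EMBED_DIM, RANK)

per_block = attn + mlp                # 18432
total_lora = NUM_BLOCKS * per_block   # 442368

# Patch tokens for a 512 x 512 input with 16 x 16 patches.
num_patches = (INPUT_SIZE // PATCH_SIZE) ** 2  # 1024

print(f"LoRA weights per block: {per_block}")
print(f"LoRA weights in backbone: {total_lora}")
print(f"Patch tokens per image: {num_patches}")
```

Under these assumptions the adapters contribute about 0.44M weights, which is the same order of magnitude as the trainable count quoted in the table.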
Usage
Repository: GitHub | Checkpoint: Jason37437/SEED / Google Drive
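# model.hf_wrapper is assumed to come from the SEED GitHub repository, so run this from a clone of it.
from model.hf_wrapper import EoMTForTamperingDetection
# Load the pretrained checkpoint by its Hugging Face repository ID.
model = EoMTForTamperingDetection.from_pretrained("Jason37437/SEED")
model.eval()  # switch to inference mode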
# The model outputs:
# - mask_logits: per-query segmentation masks
# - class_logits: per-query foreground/background scores
# - image_logits: image-level real vs forged classification
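One plausible way to turn the three outputs listed above into a tamper mask and an image-level score is sketched below with NumPy on dummy arrays. The tensor shapes, the class ordering (`fg_index`, `forged_index`), the 0.5 threshold, and the `postprocess` helper itself are all hypothetical. The repository's own inference code may combine the outputs differently.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def postprocess(mask_logits, class_logits, image_logits,
                fg_index=0, forged_index=1, threshold=0.5):
    """Hypothetical post-processing for a single image.

    Assumed shapes (not confirmed by the model card):
      mask_logits:  (Q, H, W) per-query mask logits
      class_logits: (Q, 2)    per-query foreground/background logits
      image_logits: (2,)      real vs forged logits
    """
    query_fg = softmax(class_logits, axis=-1)[:, fg_index]  # (Q,)
    pixel_prob = sigmoid(mask_logits)                        # (Q, H, W)
    # Weight each query's mask by its foreground score, then take the max over queries.
    tamper_prob = (query_fg[:, None, None] * pixel_prob).max(axis=0)  # (H, W)
    tamper_mask = tamper_prob >= threshold
    forged_prob = float(softmax(image_logits)[forged_index])
    return tamper_mask, tamper_prob, forged_prob


# Dummy logits standing in for real model outputs: one query, a 4 x 4 mask.
mask_logits = np.full((1, 4, 4), -10.0)
mask_logits[0, 1:3, 1:3] = 10.0            # a 2 x 2 "tampered" patch
class_logits = np.array([[5.0, -5.0]])      # confident foreground query
image_logits = np.array([-2.0, 2.0])        # leans towards "forged"

tamper_mask, tamper_prob, forged_prob = postprocess(
    mask_logits, class_logits, image_logits
)
print(tamper_mask.astype(int))
print(f"forged probability: {forged_prob:.3f}")
```

With a single mask query, the max over queries is a no-op; it is kept so the sketch also works if more queries are used.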
Performance
Localization (pixel-level F1)
| Dataset | F1 |
|---|---|
| T-SROIE | 0.782 |
| OSTF | 0.718 |
| TPIC-13 | 0.798 |
| RTM | 0.178 |
| Avg | 0.619 |
Detection (image-level F1)
| Dataset | F1 |
|---|---|
| T-SROIE | 0.738 |
| OSTF | 0.832 |
| TPIC-13 | 0.930 |
| RTM | 0.207 |
| Avg | 0.677 |
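For reference, the F1 score used in both tables is the harmonic mean of precision and recall, and the Avg rows are consistent with an unweighted mean over the four datasets. The sketch below shows the metric on a toy example and recomputes the averages from the table values. The helper names are hypothetical, and the paper's exact evaluation protocol (thresholding, per-image versus dataset-level aggregation) is not specified here.

```python
def f1_from_counts(tp: int, fp: int, fn: int) -> float:
    """F1 = 2*TP / (2*TP + FP + FN); returns 0.0 when there are no positives at all."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def binary_f1(pred, target) -> float:
    """F1 over two equal-length sequences of 0/1 labels (pixels or images)."""
    pairs = list(zip(pred, target))
    tp = sum(1 for p, t in pairs if p and t)
    fp = sum(1 for p, t in pairs if p and not t)
    fn = sum(1 for p, t in pairs if not p and t)
    return f1_from_counts(tp, fp, fn)


# Toy example: 2 true positives, 1 false positive, 1 false negative.
pred = [1, 1, 0, 0, 1, 0]
target = [1, 0, 0, 1, 1, 0]
toy_f1 = binary_f1(pred, target)
print(f"toy F1: {toy_f1:.3f}")

# Scores copied from the tables above.
localization = {"T-SROIE": 0.782, "OSTF": 0.718, "TPIC-13": 0.798, "RTM": 0.178}
detection = {"T-SROIE": 0.738, "OSTF": 0.832, "TPIC-13": 0.930, "RTM": 0.207}

loc_avg = round(sum(localization.values()) / len(localization), 3)
det_avg = round(sum(detection.values()) / len(detection), 3)
print(f"localization avg: {loc_avg}")  # 0.619
print(f"detection avg: {det_avg}")     # 0.677
```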
Citation
@article{wong2026seed,
title={SEED: Simple ViT and Evolving Harness for Explainable Text Forgery Detection},
author={Wong, Kahim and others},
journal={arXiv preprint arXiv:2606.21138},
year={2026}
}
License
MIT License.