---
base_model: facebook/dinov3-vitl16-pretrain-lvd1689m
datasets:
- DocTamperV1
- vankey/RealText-V2
- Jason37437/RealText-V2-Syn25k
language: en
library_name: pytorch
license: mit
metrics:
- f1
pipeline_tag: image-segmentation
tags:
- document-forgery-detection
- tampering-detection
- image-manipulation
- vision-transformer
- lora
---

# SEED Detector

This repository contains the official detector model for **SEED**, presented in the paper [SEED: Simple ViT and Evolving Harness for Explainable Text Forgery Detection](https://huggingface.co/papers/2606.21138).

**SEED Detector** is a lightweight vision transformer model for document forgery detection. It localizes tampered regions in document images and classifies each image as real or forged.

## Architecture

| Component | Detail |
|-----------|--------|
| Backbone | [DINOv3 ViT-L/16](https://huggingface.co/facebook/dinov3-vitl16-pretrain-lvd1689m) |
| Finetuning | LoRA (rank=1, attention + MLP) |
| Queries | 1 mask query |
| Decoder blocks | 4 |
| Input size | 512 × 512 |
| Parameters | ~304M (only ~1M trainable with LoRA) |
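
The input-size and LoRA rows can be sanity-checked with a little arithmetic. The sketch below assumes the standard ViT-L configuration (24 blocks, hidden width 1024, MLP width 4096) and assumes that rank-1 adapters sit on the four attention projections and the two MLP layers of every block. The exact set of adapted layers is an assumption, and `lora_params` and `patch_tokens` are hypothetical helper names. Parameters outside the backbone adapters are not counted.

```python
# Assumed ViT-L/16 dimensions (standard ViT-L; not stated in the table above).
NUM_BLOCKS = 24
HIDDEN = 1024
MLP_HIDDEN = 4096
PATCH = 16
INPUT_SIZE = 512
LORA_RANK = 1


def lora_params(in_features: int, out_features: int, rank: int) -> int:
    """A LoRA adapter adds two matrices: A (rank x in) and B (out x rank)."""
    return rank * (in_features + out_features)


def patch_tokens(input_size: int, patch: int) -> int:
    """Number of non-overlapping square patches in a square input."""
    side = input_size // patch
    return side * side


# Assumption: adapters on the query, key, value and output projections.
attention = 4 * lora_params(HIDDEN, HIDDEN, LORA_RANK)
# Assumption: adapters on both linear layers of the MLP.
mlp = lora_params(HIDDEN, MLP_HIDDEN, LORA_RANK) + lora_params(MLP_HIDDEN, HIDDEN, LORA_RANK)
backbone_lora = NUM_BLOCKS * (attention + mlp)

print(patch_tokens(INPUT_SIZE, PATCH))  # 1024
print(backbone_lora)  # 442368
```

Under these assumptions the backbone adapters come to about 0.44M weights. The table's ~1M figure presumably also includes trainable parameters outside the adapters, which this sketch does not model.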

## Usage

**Repository:** [GitHub](https://github.com/KahimWong/GenText-Forensics-3rd-Place) | **Checkpoint:** [Jason37437/SEED](https://huggingface.co/Jason37437/SEED) / [Google Drive](https://drive.google.com/file/d/1XRbcE2eEdSBdQbyiImg5w9Dn5oMRMKhv/view?usp=drive_link)

```python
# `model.hf_wrapper` is assumed to ship with the GitHub repository linked above,
# so run this from a checkout of that repository.
from model.hf_wrapper import EoMTForTamperingDetection

model = EoMTForTamperingDetection.from_pretrained("Jason37437/SEED")
model.eval()  # switch to inference mode

# The model outputs:
# - mask_logits: per-query segmentation masks
# - class_logits: per-query foreground/background scores
# - image_logits: image-level real vs forged classification
```
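
One plausible way to turn the three outputs into a binary tamper mask and an image-level score is sketched below. It is not the repository's post-processing: `postprocess` is a hypothetical helper, and the tensor shapes, the class order, and the 0.5 threshold are all assumptions. It works on NumPy arrays for a single image, so PyTorch outputs would first need something like `.detach().cpu().numpy()`.

```python
import numpy as np


def sigmoid(x: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-x))


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    shifted = x - x.max(axis=axis, keepdims=True)  # subtract max for stability
    e = np.exp(shifted)
    return e / e.sum(axis=axis, keepdims=True)


def postprocess(mask_logits, class_logits, image_logits, threshold=0.5):
    """Hypothetical helper; shapes and class order are assumptions.

    mask_logits:  (Q, H, W)  one mask per query
    class_logits: (Q, 2)     assumed order [foreground, background]
    image_logits: (2,)       assumed order [real, forged]
    """
    fg_prob = softmax(class_logits, axis=-1)[:, 0]  # (Q,)
    # Weight each query's mask by its foreground probability, then merge queries.
    pixel_prob = (fg_prob[:, None, None] * sigmoid(mask_logits)).max(axis=0)
    tamper_mask = pixel_prob > threshold  # (H, W) boolean
    forged_prob = float(softmax(image_logits)[1])
    return tamper_mask, forged_prob


# Dummy logits for a 2 x 2 image and a single query.
mask, forged = postprocess(
    mask_logits=np.array([[[4.0, -4.0], [4.0, -4.0]]]),
    class_logits=np.array([[3.0, -3.0]]),
    image_logits=np.array([-1.0, 2.0]),
)
print(mask.tolist())  # [[True, False], [True, False]]
```

With a single mask query the `max` over queries is a no-op; it is kept only so the sketch still makes sense if more queries are used.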

## Performance

### Localization (pixel-level F1)

| Dataset | F1 |
|---------|-----|
| T-SROIE | 0.782 |
| OSTF | 0.718 |
| TPIC-13 | 0.798 |
| RTM | 0.178 |
| Avg | 0.619 |
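
Pixel-level F1 treats every pixel as a binary tampered/untouched decision. A minimal sketch of that computation follows, assuming per-image scoring and a score of 1.0 when both masks are empty. The evaluation protocol is not spelled out here, so both choices are assumptions, and `pixel_f1` is a hypothetical helper name.

```python
from itertools import chain
from typing import Sequence


def pixel_f1(pred: Sequence[Sequence[int]], target: Sequence[Sequence[int]]) -> float:
    """F1 over pixels of two same-sized binary masks (truthy = tampered)."""
    tp = fp = fn = 0
    for p, t in zip(chain.from_iterable(pred), chain.from_iterable(target)):
        if p and t:
            tp += 1  # tampered pixel correctly flagged
        elif p:
            fp += 1  # clean pixel flagged as tampered
        elif t:
            fn += 1  # tampered pixel missed
    denom = 2 * tp + fp + fn
    # Assumption: two empty masks count as a perfect match.
    return 2 * tp / denom if denom else 1.0


pred = [[1, 1, 0], [0, 0, 0]]
target = [[1, 0, 0], [1, 0, 0]]
print(pixel_f1(pred, target))  # 0.5
```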

### Detection (image-level F1)

| Dataset | F1 |
|---------|-----|
| T-SROIE | 0.738 |
| OSTF | 0.832 |
| TPIC-13 | 0.930 |
| RTM | 0.207 |
| Avg | 0.677 |
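
The Avg rows in both tables match the unweighted mean of the four per-dataset scores. The short check below reproduces them from the table values, using `Decimal` with half-up rounding to three places (the rounding mode is an assumption); `macro_average` is a hypothetical helper name.

```python
from decimal import Decimal, ROUND_HALF_UP

# Per-dataset F1 scores copied from the two tables.
localization = {"T-SROIE": "0.782", "OSTF": "0.718", "TPIC-13": "0.798", "RTM": "0.178"}
detection = {"T-SROIE": "0.738", "OSTF": "0.832", "TPIC-13": "0.930", "RTM": "0.207"}


def macro_average(scores: dict) -> Decimal:
    """Unweighted mean over datasets, rounded half-up to three decimals."""
    total = sum(Decimal(value) for value in scores.values())
    return (total / len(scores)).quantize(Decimal("0.001"), rounding=ROUND_HALF_UP)


print(macro_average(localization))  # 0.619
print(macro_average(detection))  # 0.677
```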

## Citation

```bibtex
@article{wong2026seed,
  title={SEED: Simple ViT and Evolving Harness for Explainable Text Forgery Detection},
  author={Wong, Kahim and others},
  journal={arXiv preprint arXiv:2606.21138},
  year={2026}
}
```

## License

MIT License.