| --- |
| license: bsd-3-clause |
| library_name: lavis |
| pipeline_tag: visual-question-answering |
| tags: |
| - explainable-ai |
| - deepfake-detection |
| - vlm |
| - instructblip |
| - forensic-explanation |
| - acl-2026 |
| --- |
| |
| # DFF: InstructBLIP-based Explainable DeepFake Detection |
|
|
| ## Model Description |
| This is the core **DFF (DeepFake Detection and Forensic Explanation Framework)** model as described in the ACL 2026 paper: |
| *"Generating Attribution Reports for Manipulated Facial Images: A Dataset and Baseline"*. |
|
|
| DFF is built upon the **InstructBLIP (Flan-T5 XL)** architecture. By integrating the Face-ViT auxiliary classifier, it achieves state-of-the-art performance in both **forgery localization (mask generation)** and **forensic explanation (captioning)**. |
|
|
| ## Key Capabilities |
| 1. **Forgery Localization**: Generates high-resolution binary masks highlighting manipulated facial regions. |
| 2. **Natural Language Explanation**: Produces detailed text describing why a specific image is considered a forgery (e.g., "The texture around the eyes is unnatural due to GAN-based blending"). |
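
As an illustration of how a localization output might be consumed downstream, the sketch below thresholds a per-pixel score map into a binary mask, then reports the flagged-area fraction and a bounding box. This card does not specify DFF's output format, so the 2-D float score map, the 0.5 threshold, and the helper name `summarize_mask` are assumptions made for illustration only.

```python
import numpy as np


def summarize_mask(scores, threshold=0.5):
    """Binarize a per-pixel score map and summarize the flagged region.

    Hypothetical helper: assumes `scores` is a 2-D float array in [0, 1],
    which may not match the model's actual output format.
    """
    scores = np.asarray(scores, dtype=np.float32)
    if scores.ndim != 2:
        raise ValueError("expected a 2-D score map")

    mask = scores >= threshold          # boolean binary mask
    fraction = float(mask.mean())       # share of pixels flagged as manipulated
    if not mask.any():
        return mask, fraction, None     # nothing flagged, so no bounding box

    rows = np.flatnonzero(mask.any(axis=1))
    cols = np.flatnonzero(mask.any(axis=0))
    # Bounding box as (top, left, bottom, right), inclusive pixel indices.
    bbox = (int(rows[0]), int(cols[0]), int(rows[-1]), int(cols[-1]))
    return mask, fraction, bbox


# Toy 4x4 score map with a 2x2 high-score patch.
demo = np.zeros((4, 4), dtype=np.float32)
demo[1:3, 2:4] = 0.9
mask, fraction, bbox = summarize_mask(demo)
print(fraction, bbox)  # 0.25 (1, 2, 2, 3)
```

The threshold is a free parameter here; a real pipeline would pick it to match however the released checkpoint was evaluated.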
|
|
| ## Model Details |
| - **Base LLM**: Flan-T5 XL. |
| - **Visual Encoder**: EVA-ViT-G. |
| - **Auxiliary Module**: Face-ViT (Multi-label perception). |
| - **Task**: Explainable Detection & Multi-modal Attribution Reporting. |
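
For orientation, the backbone listed above can be instantiated through LAVIS. The sketch below loads only the base InstructBLIP (Flan-T5 XL) model, not the DFF weights or the Face-ViT module. It assumes that LAVIS registers this model as `blip2_t5_instruct` with model type `flant5xl`; the helper name `load_base_instructblip` is hypothetical, and the official code repository remains the reference for loading DFF itself.

```python
# Assumption: LAVIS registers InstructBLIP (Flan-T5 XL) under these names.
BASE_MODEL = {"name": "blip2_t5_instruct", "model_type": "flant5xl"}


def load_base_instructblip(device="cpu"):
    """Load the base InstructBLIP model via LAVIS (not the DFF checkpoint).

    Hypothetical helper. Returns (model, vis_processors, txt_processors).
    Calling it downloads several gigabytes of pretrained weights.
    """
    # Imported lazily so this module can be read without LAVIS installed.
    from lavis.models import load_model_and_preprocess

    return load_model_and_preprocess(
        name=BASE_MODEL["name"],
        model_type=BASE_MODEL["model_type"],
        is_eval=True,
        device=device,
    )
```

Calling `load_base_instructblip("cuda")` would return the model together with its image and text preprocessors; applying DFF-specific weights on top is left to the official code.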
|
|
| ## Links |
| - **Official Code**: [Generating-Attribution-Reports](https://github.com/NattyLianJc/Generating-Attribution-Reports) |
| - **Auxiliary Classifier**: [LianJC/Face-ViT-MultiLabel](https://huggingface.co/LianJC/Face-ViT-MultiLabel) |
| - **Dataset (MMTT)**: [LianJC/MMTT-Dataset](https://huggingface.co/datasets/LianJC/MMTT-Dataset) |
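
The two Hugging Face resources linked above can be fetched programmatically. This is a minimal sketch using `snapshot_download` from `huggingface_hub`; the repository IDs are taken from the links, while the `RESOURCES` table and the `fetch` helper are names introduced here for illustration. The files inside each repository and their sizes are not described in this card.

```python
# Repository IDs copied from the links above; the keys are illustrative labels.
RESOURCES = {
    "face_vit": {"repo_id": "LianJC/Face-ViT-MultiLabel", "repo_type": "model"},
    "mmtt": {"repo_id": "LianJC/MMTT-Dataset", "repo_type": "dataset"},
}


def fetch(resource, local_dir=None):
    """Download one linked repository and return its local path.

    Hypothetical helper around huggingface_hub.snapshot_download.
    """
    spec = RESOURCES[resource]  # raises KeyError for unknown labels
    # Imported lazily so the table above is usable without the dependency.
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id=spec["repo_id"],
        repo_type=spec["repo_type"],
        local_dir=local_dir,
    )
```

For example, `fetch("mmtt", local_dir="data/mmtt")` would place the dataset files under `data/mmtt`, assuming network access and enough disk space.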
|
|
| ## Citation |
| ```bibtex |
| @inproceedings{lian2026generating, |
| title={Generating Attribution Reports for Manipulated Facial Images: A Dataset and Baseline}, |
| author={Lian, Jingchun and others}, |
| booktitle={Proceedings of ACL}, |
| year={2026}, |
| note={To appear} |
| } |