---
license: bsd-3-clause
library_name: lavis
pipeline_tag: visual-question-answering
tags:
- explainable-ai
- deepfake-detection
- vlm
- instructblip
- forensic-explanation
- acl-2026
---

# DFF: InstructBLIP-based Explainable DeepFake Detection

## 📖 Model Description

This is the core **DFF (DeepFake Detection and Forensic Explanation Framework)** model as described in the ACL 2026 paper: *"Generating Attribution Reports for Manipulated Facial Images: A Dataset and Baseline"*.

DFF is built upon the **InstructBLIP (Flan-T5 XL)** architecture. By integrating the Face-ViT auxiliary classifier, it achieves state-of-the-art performance in both **forgery localization (mask generation)** and **forensic explanation (captioning)**.

## 🌟 Key Capabilities

1. **Forgery Localization**: Generates high-resolution binary masks highlighting manipulated facial regions.
2. **Natural Language Explanation**: Produces detailed text describing why a specific image is considered a forgery (e.g., "The texture around the eyes is unnatural due to GAN-based blending").

## 🛠️ Model Details

- **Base LLM**: Flan-T5 XL.
- **Visual Encoder**: EVA-ViT-G.
- **Auxiliary Module**: Face-ViT (Multi-label perception).
- **Task**: Explainable Detection & Multi-modal Attribution Reporting.

## 🚀 Links

- **Official Code**: [Generating-Attribution-Reports](https://github.com/NattyLianJc/Generating-Attribution-Reports)
- **Auxiliary Classifier**: [LianJC/Face-ViT-MultiLabel](https://huggingface.co/LianJC/Face-ViT-MultiLabel)
- **Dataset (MMTT)**: [LianJC/MMTT-Dataset](https://huggingface.co/datasets/LianJC/MMTT-Dataset)

## 📜 Citation

```bibtex
@inproceedings{lian2026generating,
  title={Generating Attribution Reports for Manipulated Facial Images: A Dataset and Baseline},
  author={Lian, Jingchun and others},
  booktitle={Proceedings of ACL},
  year={2026},
  note={To appear}
}