metadata
license: bsd-3-clause
library_name: lavis
pipeline_tag: visual-question-answering
tags:
  - explainable-ai
  - deepfake-detection
  - vlm
  - instructblip
  - forensic-explanation
  - acl-2026

DFF: InstructBLIP-based Explainable DeepFake Detection

📖 Model Description

This is the core DFF (DeepFake Detection and Forensic Explanation Framework) model as described in the ACL 2026 paper: "Generating Attribution Reports for Manipulated Facial Images: A Dataset and Baseline".

DFF builds on the InstructBLIP (Flan-T5 XL) architecture. With an added Face-ViT auxiliary classifier, it is reported to achieve state-of-the-art performance in both forgery localization (mask generation) and forensic explanation (captioning).

🌟 Key Capabilities

  1. Forgery Localization: Generates high-resolution binary masks highlighting manipulated facial regions.
  2. Natural Language Explanation: Produces detailed text describing why a specific image is considered a forgery (e.g., "The texture around the eyes is unnatural due to GAN-based blending").
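
A binary mask like the one described above can be reduced to a short summary for reporting. The sketch below is illustrative only: this card does not specify the mask output format, so it assumes a per-pixel score map given as nested lists, and the names `binarize`, `summarize_mask`, and `MaskSummary` are hypothetical helpers, not part of DFF.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Mask = List[List[int]]  # rows of 0/1 values; 1 marks a manipulated pixel


@dataclass
class MaskSummary:
    manipulated_fraction: float
    # (top, left, bottom, right), inclusive pixel indices; None if nothing is flagged
    bounding_box: Optional[Tuple[int, int, int, int]]


def binarize(score_map: List[List[float]], threshold: float = 0.5) -> Mask:
    """Turn per-pixel manipulation scores (assumed to lie in [0, 1]) into a 0/1 mask."""
    return [[1 if s >= threshold else 0 for s in row] for row in score_map]


def summarize_mask(mask: Mask) -> MaskSummary:
    """Report how much of the image is flagged and the box that encloses it."""
    coords = [(r, c) for r, row in enumerate(mask) for c, v in enumerate(row) if v]
    total = sum(len(row) for row in mask)
    if not coords or total == 0:
        return MaskSummary(0.0, None)
    rows = [r for r, _ in coords]
    cols = [c for _, c in coords]
    return MaskSummary(len(coords) / total, (min(rows), min(cols), max(rows), max(cols)))


# Toy 3x4 score map standing in for a model output (made-up values).
scores = [
    [0.1, 0.2, 0.1, 0.0],
    [0.1, 0.9, 0.8, 0.1],
    [0.0, 0.7, 0.6, 0.1],
]
summary = summarize_mask(binarize(scores))
print(summary)
```

The 0.5 threshold is an arbitrary choice for the example; a real deployment would pick it on validation data.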

🛠️ Model Details

  • Base LLM: Flan-T5 XL.
  • Visual Encoder: EVA-ViT-G.
  • Auxiliary Module: Face-ViT (Multi-label perception).
  • Task: Explainable Detection & Multi-modal Attribution Reporting.
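
For orientation, the base architecture listed above can be loaded through LAVIS as sketched below. This loads the stock InstructBLIP (Flan-T5 XL) weights only: how the DFF checkpoint and the Face-ViT module are loaded is not documented in this card, so they are left out. The prompt text and the `explain_image` helper are assumptions made for illustration.

```python
# Hypothetical prompt; the prompts used to train DFF are not given in this card.
DEFAULT_PROMPT = "Is this face manipulated? Explain which regions look forged and why."


def explain_image(image_path: str, prompt: str = DEFAULT_PROMPT, device: str = "cpu") -> str:
    """Generate a text answer for one image with stock InstructBLIP from LAVIS."""
    # Heavy dependencies are imported lazily so this module imports without them.
    from PIL import Image
    from lavis.models import load_model_and_preprocess

    # Stock InstructBLIP weights; the DFF weights and Face-ViT branch are NOT loaded here.
    model, vis_processors, _ = load_model_and_preprocess(
        name="blip2_t5_instruct", model_type="flant5xl", is_eval=True, device=device
    )
    image = Image.open(image_path).convert("RGB")
    batch = vis_processors["eval"](image).unsqueeze(0).to(device)
    # generate() returns a list with one string per image in the batch.
    return model.generate({"image": batch, "prompt": prompt})[0]
```

Calling `explain_image("face.jpg")` downloads the base weights on first use and needs the `salesforce-lavis` package installed; without the DFF checkpoint the output reflects the base model, not the forensic behavior described here.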

🚀 Links

📜 Citation

@inproceedings{lian2026generating,
  title={Generating Attribution Reports for Manipulated Facial Images: A Dataset and Baseline},
  author={Lian, Jingchun and others},
  booktitle={Proceedings of ACL},
  year={2026},
  note={To appear}
}