---
license: bsd-3-clause
library_name: pytorch
pipeline_tag: image-classification
tags:
- facial-forgery-detection
- multi-label-classification
- vit
- deepfake
- acl-2026
---

# Face-ViT: Multi-Label Facial Forgery Region Classifier

## 📖 Model Description
This is Face-ViT, the auxiliary perception module proposed in the ACL 2026 paper "Generating Attribution Reports for Manipulated Facial Images: A Dataset and Baseline".

Face-ViT is a multi-label classifier built on the ViT-H/14 architecture and trained to recognize 21 types of facial manipulation (e.g., eye modification, skin smoothing, mouth tampering). Within the DFF framework, its predictions serve as fine-grained visual cues that guide the large language model toward accurate forensic explanations.
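
Because the classifier is multi-label, each of the 21 outputs is scored independently instead of competing in a softmax. The sketch below shows one common way to turn such outputs into a set of predicted manipulation types. It assumes the model emits one raw logit per class and that a fixed 0.5 threshold is used; neither detail is stated in this card. The label names are hypothetical placeholders, since the actual class list is not given here.

```python
import math

# Hypothetical placeholder names; the real 21-class list is not given in this card.
LABELS = [f"manipulation_{i}" for i in range(21)]


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def decode_multilabel(logits, labels=LABELS, threshold=0.5):
    """Return (label, probability) pairs whose sigmoid score meets the threshold.

    Each class is decided on its own, so zero, one, or many labels may be returned.
    The 0.5 default is an assumption, not a value taken from the paper.
    """
    if len(logits) != len(labels):
        raise ValueError(f"expected {len(labels)} logits, got {len(logits)}")
    probs = [sigmoid(z) for z in logits]
    return [(name, p) for name, p in zip(labels, probs) if p >= threshold]


# Example: two classes receive positive logits, the rest are strongly negative.
logits = [-4.0] * 21
logits[2] = 3.0
logits[7] = 1.2
predicted = decode_multilabel(logits)
for name, prob in predicted:
    print(f"{name}: {prob:.3f}")
```

In practice the threshold is often tuned per class on a validation set, because rare manipulation types tend to receive lower scores.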

## 🛠️ Model Details
- Architecture: ViT-H/14 with an additional CNN branch and max-pooling for multi-label support.
- Input Size: 224×224 RGB images.
- Number of Classes: 21 (facial attributes/manipulation types).
- Training Objective: Joint loss combining BCE, Focal, Dice, and Jaccard losses.
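
To make the training objective concrete, here is a minimal sketch of how the four loss terms could be combined for one image with 21 binary targets. It is written in plain Python so the arithmetic is visible; a training implementation would use PyTorch tensor operations instead. The equal weights, the focal `gamma` of 2.0, and the smoothing constant are assumptions for illustration, as this card does not state the values used in the paper.

```python
import math

EPS = 1e-7  # keeps log() away from zero


def _clip(p: float) -> float:
    return min(max(p, EPS), 1.0 - EPS)


def bce_loss(probs, targets):
    """Mean binary cross-entropy over classes."""
    total = 0.0
    for p, t in zip(probs, targets):
        p = _clip(p)
        total -= t * math.log(p) + (1 - t) * math.log(1 - p)
    return total / len(probs)


def focal_loss(probs, targets, gamma=2.0):
    """Mean focal loss: down-weights classes that are already well classified."""
    total = 0.0
    for p, t in zip(probs, targets):
        p_t = _clip(p if t == 1 else 1 - p)  # probability assigned to the true outcome
        total -= (1 - p_t) ** gamma * math.log(p_t)
    return total / len(probs)


def dice_loss(probs, targets, smooth=1.0):
    """One minus the soft Dice coefficient between predictions and targets."""
    inter = sum(p * t for p, t in zip(probs, targets))
    return 1 - (2 * inter + smooth) / (sum(probs) + sum(targets) + smooth)


def jaccard_loss(probs, targets, smooth=1.0):
    """One minus the soft Jaccard (IoU) index between predictions and targets."""
    inter = sum(p * t for p, t in zip(probs, targets))
    union = sum(probs) + sum(targets) - inter
    return 1 - (inter + smooth) / (union + smooth)


def joint_loss(probs, targets, weights=(1.0, 1.0, 1.0, 1.0)):
    """Weighted sum of the four terms; equal weights are an assumption."""
    terms = (
        bce_loss(probs, targets),
        focal_loss(probs, targets),
        dice_loss(probs, targets),
        jaccard_loss(probs, targets),
    )
    return sum(w * term for w, term in zip(weights, terms))


# Example: three of the 21 classes are present in the ground truth.
targets = [0] * 21
for i in (1, 5, 9):
    targets[i] = 1
good = [0.9 if t else 0.1 for t in targets]  # mostly correct prediction
bad = [0.1 if t else 0.9 for t in targets]   # mostly wrong prediction
print(f"good: {joint_loss(good, targets):.4f}  bad: {joint_loss(bad, targets):.4f}")
```

BCE and Focal score each class separately, while Dice and Jaccard measure overlap across the whole label set, which is one plausible reason to combine them when most of the 21 labels are negative for any given image.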

## 🚀 Links
- Official Code: [Generating-Attribution-Reports](https://github.com/NattyLianJc/Generating-Attribution-Reports)
- Main Framework (DFF): [LianJC/DFF-InstructBLIP-Detection](https://huggingface.co/LianJC/DFF-InstructBLIP-Detection)
- Dataset (MMTT): [LianJC/MMTT-Dataset](https://huggingface.co/datasets/LianJC/MMTT-Dataset)

## 📜 Citation
If you find this model useful, please cite:
```bibtex
@inproceedings{lian2026generating,
  title={Generating Attribution Reports for Manipulated Facial Images: A Dataset and Baseline},
  author={Lian, Jingchun and others},
  booktitle={Proceedings of ACL},
  year={2026},
  note={To appear}
}
```