# PIXAR-7B_lite

PIXAR-7B_lite is a Vision-Language Model (VLM) for image tampering analysis, introduced in the paper "From Masks to Pixels and Meaning: A New Taxonomy, Benchmark, and Metrics for VLM Image Tampering". It was trained on a subset of the training set that contains only mask-tampered images.
Given a query image, PIXAR-7B_lite jointly performs:
- Binary classification → real or tampered
- Object classification → identifies which of 81 COCO categories was modified
- Pixel-level localization → generates a segmentation mask over the tampered region
- Natural language description → describes what was changed and how
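The four outputs above can be pictured as one joint prediction per image. A minimal sketch of such a result structure (the field names and types are illustrative assumptions, not the model's actual API):

```python
from dataclasses import dataclass
from typing import List

# Hypothetical container for the model's joint prediction on one query image;
# names and types are assumptions for illustration only.
@dataclass
class TamperingAnalysis:
    is_tampered: bool            # binary real/tampered verdict
    tampered_objects: List[str]  # subset of the 81 COCO categories
    mask: List[List[int]]        # binary segmentation mask over tampered pixels
    description: str             # natural-language account of the edit

result = TamperingAnalysis(
    is_tampered=True,
    tampered_objects=["dog"],
    mask=[[0, 1], [1, 1]],
    description="The dog was replaced with a generated one.",
)
print(result.is_tampered, result.tampered_objects)
```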
## Model Description
- Developed by: MBZUAI VILA Lab
- Model type: Multimodal Vision-Language Model for Image Tampering Detection
- License: MIT
- Base model: SIDA-7B (LLaVA + LLaMA-2)
- Paper: From Masks to Pixels and Meaning: A New Taxonomy, Benchmark, and Metrics for VLM Image Tampering
## Architecture
PIXAR-7B_lite is built on a LLaVA + LLaMA-2 backbone with LoRA fine-tuning (rank 8), integrated with SAM ViT-H for pixel-level decoding and CLIP ViT-L/14 for visual-language alignment. Three special tokens are inserted into the token sequence to anchor multi-task prediction heads:
| Token | Role |
|---|---|
| `[CLS]` | 3-way classification (real / fully synthetic / tampered) via a linear head |
| `[OBJ]` | Multi-label object recognition over 81 COCO categories via a linear head |
| `[SEG]` | Pixel-level segmentation mask generation via SAM, optionally fused with the generated text embedding |
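The three special-token heads described above can be sketched as small modules over the hidden states at the `[CLS]`, `[OBJ]`, and `[SEG]` token positions. This is a minimal PyTorch sketch under assumptions: the hidden size (4096 for LLaMA-2-7B), the SAM prompt-embedding width (256), and the class names are illustrative, not the released architecture.

```python
import torch
import torch.nn as nn

HIDDEN = 4096  # LLaMA-2-7B hidden size (assumed)

class MultiTaskHeads(nn.Module):
    """Hypothetical heads anchored at the [CLS], [OBJ], and [SEG] tokens."""

    def __init__(self, hidden=HIDDEN, num_objects=81):
        super().__init__()
        self.cls_head = nn.Linear(hidden, 3)            # real / fully synthetic / tampered
        self.obj_head = nn.Linear(hidden, num_objects)  # multi-label over 81 COCO categories
        self.seg_proj = nn.Linear(hidden, 256)          # prompt embedding fed to the SAM decoder

    def forward(self, h_cls, h_obj, h_seg):
        cls_logits = self.cls_head(h_cls)   # (B, 3)
        obj_logits = self.obj_head(h_obj)   # (B, 81), sigmoid applied per label downstream
        seg_prompt = self.seg_proj(h_seg)   # (B, 256), consumed by the SAM mask decoder
        return cls_logits, obj_logits, seg_prompt

heads = MultiTaskHeads()
h = torch.randn(2, HIDDEN)  # stand-in for hidden states at the special-token positions
cls_logits, obj_logits, seg_prompt = heads(h, h, h)
print(cls_logits.shape, obj_logits.shape, seg_prompt.shape)
```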
## How to Get Started
For interactive inference, see the project repository:
```shell
python chat.py --version jiachengcui888/PIXAR-7B_lite --precision bf16 --seg_prompt_mode seg_only
```
## Training Details

### Training Procedure
Fine-tuned with DeepSpeed on a LLaVA + LLaMA-2 backbone using LoRA (rank 8). Key hyperparameters:
| Hyperparameter | Value |
|---|---|
| LoRA rank | 8 |
| Learning rate | 1e-4 |
| Batch size | 2 |
| Precision | bf16 |
| Threshold τ | 0.05 |
| Loss weight (text) | 3.0 |
| Loss weight (cls) | 1.0 |
| Loss weight (bce) | 1.0 |
| Loss weight (dice) | 1.0 |
| Loss weight (sem) | 0.5 |
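Assuming the text, cls, bce, dice, and sem entries are per-term loss weights, the training objective would combine them as a weighted sum. A minimal sketch under that assumption (the combination rule and names are illustrative, not taken from the released training code):

```python
# Assumed weights for the weighted multi-task objective, from the table above.
WEIGHTS = {"text": 3.0, "cls": 1.0, "bce": 1.0, "dice": 1.0, "sem": 0.5}

def total_loss(losses: dict) -> float:
    """Combine unweighted per-term loss values into one scalar objective."""
    return sum(WEIGHTS[name] * value for name, value in losses.items())

# Example with dummy per-term values:
# 3.0*1.0 + 1.0*0.5 + 1.0*0.2 + 1.0*0.3 + 0.5*0.4 = 4.2
print(total_loss({"text": 1.0, "cls": 0.5, "bce": 0.2, "dice": 0.3, "sem": 0.4}))
```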