PIXAR-7B_lite

PIXAR-7B_lite is a Vision-Language Model (VLM) for image tampering analysis, introduced in the paper "From Masks to Pixels and Meaning: A New Taxonomy, Benchmark, and Metrics for VLM Image Tampering". It was trained on a subset of the training set that contains only mask-tampered images.

Given a query image, PIXAR-7B_lite jointly performs:

  • Binary classification: real or tampered
  • Object classification: identifies which of 81 COCO categories was modified
  • Pixel-level localization: generates a segmentation mask over the tampered region
  • Natural language description: describes what was changed and how

Model Description

  • Developed by: MBZUAI VILA Lab
  • Model type: Multimodal Vision-Language Model for Image Tampering Detection
  • License: MIT
  • Base model: SIDA-7B (LLaVA + LLaMA-2)
  • Paper: From Masks to Pixels and Meaning: A New Taxonomy, Benchmark, and Metrics for VLM Image Tampering

Architecture

PIXAR-7B_lite is built on a LLaVA + LLaMA-2 backbone with LoRA fine-tuning (rank 8), integrated with SAM ViT-H for pixel-level decoding and CLIP ViT-L/14 for visual-language alignment. Three special tokens are inserted into the token sequence to anchor multi-task prediction heads:

| Token | Role |
|-------|------|
| [CLS] | 3-way classification (real / fully synthetic / tampered) via a linear head |
| [OBJ] | Multi-label object recognition over 81 COCO categories via a linear head |
| [SEG] | Pixel-level segmentation mask generation via SAM, optionally fused with the generated text embedding |
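The role of the [CLS] and [OBJ] tokens can be illustrated with a small, dependency-free sketch. It is not the repository's implementation: the helper names, the toy 2-dimensional hidden states, the 4-category object head (standing in for the 81 COCO categories), the class-index order, and the 0.5 sigmoid threshold are all assumptions made for illustration. It shows only the general pattern of locating a special token, reading the hidden state at that position, and passing it through a linear head.

```python
import math

SPECIAL_TOKENS = ("[CLS]", "[OBJ]", "[SEG]")

def find_special_positions(tokens):
    """Map each special token to the index of its first occurrence."""
    return {tok: tokens.index(tok) for tok in SPECIAL_TOKENS if tok in tokens}

def linear(weights, bias, x):
    """Apply y = W x + b, with W given as a list of rows."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]

def classify_3way(logits, labels=("real", "fully synthetic", "tampered")):
    """Pick the label with the largest logit (label order is an assumption)."""
    best = max(range(len(logits)), key=logits.__getitem__)
    return labels[best]

def multilabel(logits, threshold=0.5):
    """Return indices whose sigmoid probability reaches the threshold."""
    return [i for i, z in enumerate(logits) if 1.0 / (1.0 + math.exp(-z)) >= threshold]

# Toy sequence: prompt tokens followed by the three special tokens.
tokens = ["<image>", "Is", "this", "tampered", "?", "[CLS]", "[OBJ]", "[SEG]"]
# Hypothetical 2-dimensional hidden states, one per token.
hidden = [[0.0, 0.0]] * 5 + [[1.0, -1.0], [0.5, 2.0], [0.0, 1.0]]

positions = find_special_positions(tokens)

# Hypothetical 3-way classification head applied at the [CLS] position.
cls_W, cls_b = [[0.0, 1.0], [0.0, 0.0], [1.0, 0.0]], [0.0, 0.0, 0.0]
cls_label = classify_3way(linear(cls_W, cls_b, hidden[positions["[CLS]"]]))

# Hypothetical multi-label head applied at the [OBJ] position (4 toy categories).
obj_W = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]
obj_b = [0.0, 0.0, 0.0, 0.0]
obj_ids = multilabel(linear(obj_W, obj_b, hidden[positions["[OBJ]"]]))

print(positions, cls_label, obj_ids)
```

The [SEG] hidden state is left unused here because it feeds the SAM mask decoder, which cannot be reduced to a few lines without inventing details.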

How to Get Started

For interactive inference, see the project repository:

python chat.py --version jiachengcui888/PIXAR-7B_lite --precision bf16 --seg_prompt_mode seg_only
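To launch the same command from a script, one option is to assemble the argument list in Python. The sketch below uses only the flags and values shown above; the function name `build_chat_command` is hypothetical, and it assumes `chat.py` sits in the current working directory (the project repository root).

```python
import shlex

def build_chat_command(version="jiachengcui888/PIXAR-7B_lite",
                       precision="bf16",
                       seg_prompt_mode="seg_only"):
    """Return the chat.py invocation as an argument list."""
    return [
        "python", "chat.py",
        "--version", version,
        "--precision", precision,
        "--seg_prompt_mode", seg_prompt_mode,
    ]

cmd = build_chat_command()
# Print the shell-quoted command for inspection.
print(shlex.join(cmd))
# To actually run it from the repository root:
#   import subprocess; subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell quoting problems if the model path ever contains spaces.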

Training Details

Training Procedure

Fine-tuned with DeepSpeed on a LLaVA + LLaMA-2 backbone using LoRA (rank 8). Key hyperparameters:

| Hyperparameter | Value |
|----------------|-------|
| LoRA rank | 8 |
| Learning rate | 1e-4 |
| Batch size | 2 |
| Precision | bf16 |
| Threshold τ | 0.05 |
| text | 3.0 |
| cls | 1.0 |
| bce | 1.0 |
| dice | 1.0 |
| sem | 0.5 |
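The last five rows (text, cls, bce, dice, sem) read like per-term loss weights, although the card does not say so explicitly. Under that assumption, the sketch below shows how such weights would typically combine into one training objective, with textbook binary cross-entropy and Dice losses for the mask terms. The function names and toy values are illustrative and are not taken from the training code; the threshold τ is left out because the card does not state what it applies to.

```python
import math

# Assumed interpretation: the table rows are weights on individual loss terms.
LOSS_WEIGHTS = {"text": 3.0, "cls": 1.0, "bce": 1.0, "dice": 1.0, "sem": 0.5}

def bce_loss(probs, targets, eps=1e-7):
    """Mean binary cross-entropy over flattened mask probabilities."""
    total = 0.0
    for p, t in zip(probs, targets):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -(t * math.log(p) + (1 - t) * math.log(1.0 - p))
    return total / len(probs)

def dice_loss(probs, targets, eps=1e-6):
    """Soft Dice loss: 1 - 2|P∩T| / (|P| + |T|)."""
    inter = sum(p * t for p, t in zip(probs, targets))
    denom = sum(probs) + sum(targets)
    return 1.0 - (2.0 * inter + eps) / (denom + eps)

def total_loss(terms, weights=LOSS_WEIGHTS):
    """Weighted sum of the individual loss terms."""
    return sum(weights[name] * value for name, value in terms.items())

# Toy 4-pixel mask: predicted probabilities versus ground truth.
probs = [0.9, 0.8, 0.1, 0.2]
targets = [1, 1, 0, 0]

bce = bce_loss(probs, targets)
dice = dice_loss(probs, targets)

# The text, cls, and sem values are placeholders standing in for the other terms.
terms = {"text": 1.0, "cls": 0.5, "bce": bce, "dice": dice, "sem": 0.2}
total = total_loss(terms)
print(f"bce={bce:.4f} dice={dice:.4f} total={total:.4f}")
```

With these weights, the text term contributes three times as much per unit of loss as the classification or mask terms, and the sem term half as much.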

Citation

