PIXAR-7B_lite

PIXAR-7B_lite is a Vision-Language Model (VLM) for image tampering analysis, introduced in the paper "From Masks to Pixels and Meaning: A New Taxonomy, Benchmark, and Metrics for VLM Image Tampering". It was trained on a subset of the training set that contains only mask-tampered images.

Given a query image, PIXAR-7B_lite jointly performs:

  • Binary classification: real or tampered
  • Object classification: identifies which of 81 COCO categories was modified
  • Pixel-level localization: generates a segmentation mask over the tampered region
  • Natural language description: describes what was changed and how

Model Description

  • Developed by: MBZUAI VILA Lab
  • Model type: Multimodal Vision-Language Model for Image Tampering Detection
  • License: MIT
  • Base model: SIDA-7B (LLaVA + LLaMA-2)
  • Paper: From Masks to Pixels and Meaning: A New Taxonomy, Benchmark, and Metrics for VLM Image Tampering

Architecture

PIXAR-7B_lite is built on a LLaVA + LLaMA-2 backbone with LoRA fine-tuning (rank 8), integrated with SAM ViT-H for pixel-level decoding and CLIP ViT-L/14 for visual-language alignment. Three special tokens are inserted into the token sequence to anchor multi-task prediction heads:

Token   Role
-----   ----
[CLS]   3-way classification (real / fully synthetic / tampered) via a linear head
[OBJ]   Multi-label object recognition over 81 COCO categories via a linear head
[SEG]   Pixel-level segmentation mask generation via SAM, optionally fused with the generated text embedding
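
To make the roles of the [CLS] and [OBJ] tokens concrete, the following is a minimal, standard-library-only sketch of two linear heads applied to token hidden states. It is not the model's actual code: the toy hidden size, the random weights, the class index order, the 0.5 decision threshold, and all function names are assumptions made for illustration. The [SEG] path through SAM is omitted.

```python
import math
import random

HIDDEN = 16        # toy hidden size (assumption); the real backbone is far larger
NUM_OBJECTS = 81   # COCO categories, as stated in the table above
# Class order is an assumption; the card only lists the three class names.
CLASS_NAMES = ["real", "fully synthetic", "tampered"]


def make_linear(in_dim, out_dim, rng):
    """Return (weights, bias) for a randomly initialised linear layer."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)]
               for _ in range(out_dim)]
    return weights, [0.0] * out_dim


def linear(x, layer):
    """Compute W x + b for a single hidden-state vector x."""
    weights, bias = layer
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def classify(cls_hidden, cls_head):
    """3-way, single-label decision from the [CLS] hidden state."""
    probs = softmax(linear(cls_hidden, cls_head))
    best = max(range(len(probs)), key=probs.__getitem__)
    return CLASS_NAMES[best], probs


def recognise_objects(obj_hidden, obj_head, threshold=0.5):
    """Multi-label decision from the [OBJ] hidden state.

    Each category gets an independent sigmoid, so several
    (or none) can be selected at once.
    """
    probs = [sigmoid(z) for z in linear(obj_hidden, obj_head)]
    return [i for i, p in enumerate(probs) if p >= threshold], probs


rng = random.Random(0)
cls_head = make_linear(HIDDEN, len(CLASS_NAMES), rng)
obj_head = make_linear(HIDDEN, NUM_OBJECTS, rng)

# Stand-ins for the hidden states at the [CLS] and [OBJ] positions.
cls_hidden = [rng.gauss(0, 1) for _ in range(HIDDEN)]
obj_hidden = [rng.gauss(0, 1) for _ in range(HIDDEN)]

label, cls_probs = classify(cls_hidden, cls_head)
object_ids, obj_probs = recognise_objects(obj_hidden, obj_head)
print(label, len(object_ids))
```

The contrast worth noting is softmax for the mutually exclusive 3-way decision versus per-category sigmoids for the multi-label object head.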

How to Get Started

For interactive inference, run chat.py from the project repository:

python chat.py --version jiachengcui888/PIXAR-7B_lite --precision bf16 --seg_prompt_mode seg_only
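
If the command needs to be launched from a script, it can be assembled as an argument list. The sketch below is illustrative only: the helper name `build_chat_command` is hypothetical, it uses only the three flags shown above, and it assumes it is run from a checkout of the project repository where `chat.py` exists. The actual launch is left commented out.

```python
import shlex
import sys


def build_chat_command(version="jiachengcui888/PIXAR-7B_lite",
                       precision="bf16",
                       seg_prompt_mode="seg_only",
                       script="chat.py"):
    """Build the argument list for the chat.py invocation shown above.

    Hypothetical helper; only flags that appear in this model card are used.
    """
    return [
        sys.executable, script,
        "--version", version,
        "--precision", precision,
        "--seg_prompt_mode", seg_prompt_mode,
    ]


command = build_chat_command()
printable = shlex.join(command)  # shell-quoted string, handy for logging
print(printable)

# To actually start the session (requires the project repository and weights):
# import subprocess
# subprocess.run(command, check=True)
```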

Training Details

Training Procedure

PIXAR-7B_lite was fine-tuned with DeepSpeed on a LLaVA + LLaMA-2 backbone using LoRA (rank 8). Key hyperparameters:

Hyperparameter   Value
--------------   -----
LoRA rank        8
Learning rate    1e-4
Batch size       2
Precision        bf16
Threshold τ      0.05
text             3.0
cls              1.0
bce              1.0
dice             1.0
sem              0.5
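
The card does not say how the last five values are used. One plausible reading, stated here as an assumption, is that `text`, `cls`, `bce`, `dice`, and `sem` are weights on individual loss terms that are summed into a single training objective. The sketch below illustrates that reading with toy numbers. The function names, the example loss values, and the specific BCE and Dice formulations are illustrative and not taken from the project code.

```python
import math

# Values copied from the hyperparameter table; treating them as
# loss weights is an assumption, not something the card states.
LOSS_WEIGHTS = {"text": 3.0, "cls": 1.0, "bce": 1.0, "dice": 1.0, "sem": 0.5}


def bce_loss(probs, targets, eps=1e-7):
    """Mean binary cross-entropy over per-pixel mask probabilities."""
    total = 0.0
    for p, t in zip(probs, targets):
        p = min(max(p, eps), 1.0 - eps)  # clamp to keep log() finite
        total += -(t * math.log(p) + (1 - t) * math.log(1.0 - p))
    return total / len(probs)


def dice_loss(probs, targets, eps=1e-7):
    """Soft Dice loss: 1 - 2|P∩T| / (|P| + |T|)."""
    intersection = sum(p * t for p, t in zip(probs, targets))
    denom = sum(probs) + sum(targets)
    return 1.0 - (2.0 * intersection + eps) / (denom + eps)


def total_loss(terms, weights=LOSS_WEIGHTS):
    """Weighted sum of the individual loss terms."""
    missing = set(weights) - set(terms)
    if missing:
        raise KeyError(f"missing loss terms: {sorted(missing)}")
    return sum(weights[name] * terms[name] for name in weights)


# Toy 4-pixel mask: predicted probabilities vs. ground truth.
pred_mask = [0.9, 0.8, 0.2, 0.1]
true_mask = [1, 1, 0, 0]

terms = {
    "text": 2.0,   # placeholder language-modelling loss
    "cls": 0.5,    # placeholder classification loss
    "bce": bce_loss(pred_mask, true_mask),
    "dice": dice_loss(pred_mask, true_mask),
    "sem": 0.4,    # placeholder for the term weighted by `sem`
}
loss = total_loss(terms)
print(round(loss, 4))
```

Under this reading, the text term dominates the objective (weight 3.0) and the `sem` term contributes at half weight.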

Citation


