# PIXAR-7B_lite

PIXAR-7B_lite is a Vision-Language Model (VLM) for image tampering analysis, introduced in the paper "From Masks to Pixels and Meaning: A New Taxonomy, Benchmark, and Metrics for VLM Image Tampering". It was trained on a subset of the training set that contains only mask-tampered images.
Given a query image, PIXAR-7B_lite jointly performs:
- Binary classification → real or tampered
- Object classification → identifies which of 81 COCO categories was modified
- Pixel-level localization → generates a segmentation mask over the tampered region
- Natural language description → describes what was changed and how
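The four outputs above can be pictured as one joint prediction per image. A minimal sketch of such a result structure (the field names and types are illustrative assumptions, not the model's actual API):

```python
from dataclasses import dataclass
from typing import List

# Hypothetical container for the model's joint prediction on one query image;
# names and types are assumptions for illustration only.
@dataclass
class TamperingAnalysis:
    is_tampered: bool            # binary real/tampered verdict
    tampered_objects: List[str]  # subset of the 81 COCO categories
    mask: List[List[int]]        # binary segmentation mask over tampered pixels
    description: str             # natural-language account of the edit

result = TamperingAnalysis(
    is_tampered=True,
    tampered_objects=["dog"],
    mask=[[0, 1], [1, 1]],
    description="The dog was replaced with a generated one.",
)
print(result.is_tampered, result.tampered_objects)
```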
## Model Description
- Developed by: MBZUAI VILA Lab
- Model type: Multimodal Vision-Language Model for Image Tampering Detection
- License: MIT
- Base model: SIDA-7B (LLaVA + LLaMA-2)
- Paper: From Masks to Pixels and Meaning: A New Taxonomy, Benchmark, and Metrics for VLM Image Tampering
## Architecture
PIXAR-7B_lite is built on a LLaVA + LLaMA-2 backbone with LoRA fine-tuning (rank 8), integrated with SAM ViT-H for pixel-level decoding and CLIP ViT-L/14 for visual-language alignment. Three special tokens are inserted into the token sequence to anchor multi-task prediction heads:
| Token | Role |
|---|---|
| `[CLS]` | 3-way classification (real / fully synthetic / tampered) via a linear head |
| `[OBJ]` | Multi-label object recognition over 81 COCO categories via a linear head |
| `[SEG]` | Pixel-level segmentation mask generation via SAM, optionally fused with the generated text embedding |
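The three special-token heads described above can be sketched as small modules over the hidden states at the `[CLS]`, `[OBJ]`, and `[SEG]` token positions. This is a minimal PyTorch sketch under assumptions: the hidden size (4096 for LLaMA-2-7B), the SAM prompt-embedding width (256), and the class names are illustrative, not the released architecture.

```python
import torch
import torch.nn as nn

HIDDEN = 4096  # LLaMA-2-7B hidden size (assumed)

class MultiTaskHeads(nn.Module):
    """Hypothetical heads anchored at the [CLS], [OBJ], and [SEG] tokens."""

    def __init__(self, hidden=HIDDEN, num_objects=81):
        super().__init__()
        self.cls_head = nn.Linear(hidden, 3)            # real / fully synthetic / tampered
        self.obj_head = nn.Linear(hidden, num_objects)  # multi-label over 81 COCO categories
        self.seg_proj = nn.Linear(hidden, 256)          # prompt embedding fed to the SAM decoder

    def forward(self, h_cls, h_obj, h_seg):
        cls_logits = self.cls_head(h_cls)   # (B, 3)
        obj_logits = self.obj_head(h_obj)   # (B, 81), sigmoid applied per label downstream
        seg_prompt = self.seg_proj(h_seg)   # (B, 256), consumed by the SAM mask decoder
        return cls_logits, obj_logits, seg_prompt

heads = MultiTaskHeads()
h = torch.randn(2, HIDDEN)  # stand-in for hidden states at the special-token positions
cls_logits, obj_logits, seg_prompt = heads(h, h, h)
print(cls_logits.shape, obj_logits.shape, seg_prompt.shape)
```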
## How to Get Started
For interactive inference, see the project repository:
```shell
python chat.py --version jiachengcui888/PIXAR-7B_lite --precision bf16 --seg_prompt_mode seg_only
```
## Training Details

### Training Procedure
Fine-tuned with DeepSpeed on a LLaVA + LLaMA-2 backbone using LoRA (rank 8). Key hyperparameters:
| Hyperparameter | Value |
|---|---|
| LoRA rank | 8 |
| Learning rate | 1e-4 |
| Batch size | 2 |
| Precision | bf16 |
| Threshold τ | 0.05 |
| Loss weight (text) | 3.0 |
| Loss weight (cls) | 1.0 |
| Loss weight (bce) | 1.0 |
| Loss weight (dice) | 1.0 |
| Loss weight (sem) | 0.5 |
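Assuming the text, cls, bce, dice, and sem entries are per-term loss weights, the training objective would combine them as a weighted sum. A minimal sketch under that assumption (the combination rule and names are illustrative, not taken from the released training code):

```python
# Assumed weights for the weighted multi-task objective, from the table above.
WEIGHTS = {"text": 3.0, "cls": 1.0, "bce": 1.0, "dice": 1.0, "sem": 0.5}

def total_loss(losses: dict) -> float:
    """Combine unweighted per-term loss values into one scalar objective."""
    return sum(WEIGHTS[name] * value for name, value in losses.items())

# Example with dummy per-term values:
# 3.0*1.0 + 1.0*0.5 + 1.0*0.2 + 1.0*0.3 + 0.5*0.4 = 4.2
print(total_loss({"text": 1.0, "cls": 0.5, "bce": 0.2, "dice": 0.3, "sem": 0.4}))
```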