MTS (Multi-scale Token Selection) - Attention Analysis Tools

This repository contains scripts and custom models for analyzing attention patterns in vision-language models, specifically for the MTS (Multi-scale Token Selection) project based on Qwen2.5-VL.

📊 Contents

Analysis Scripts

  • scripts/run_fg_bg_analysis.sh: Foreground vs Background token attention analysis

    • Analyzes attention distribution on foreground vs background image regions
    • Supports multiple layers: 2, 6, 15, 27
    • Configurable bins for histogram analysis
  • scripts/qsub_save_paco_attention_50samples.pbs: PBS job script for saving attention

    • Saves attention data for PACO dataset (50 samples)
    • Configured for large memory nodes (128GB)
    • Layers: 2, 6, 15, 27

Custom Qwen2.5-VL Implementation

  • modeling_qwen2_5_vl.py: Main model with MTS token selection
  • modeling_qwen2_5_vl_fast.py: Optimized version
  • multiscale_image_processor.py: Multi-scale image processing
  • multiscale_processor_fast.py: Fast processor
  • configuration_qwen2_5_vl.py: Model configuration
  • image_processing_qwen2_vl.py: Image preprocessing

🚀 Key Features

Multi-scale Token Selection (MTS)

  • Dynamic token selection based on importance scores
  • Reduces computational cost while maintaining performance
  • Supports foreground and background token analysis
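As a rough illustration only (not the repository's actual implementation), selecting a dynamic subset of tokens by importance score can be sketched with NumPy; the scoring input and the 50% keep ratio here are assumptions:

```python
import numpy as np

def select_tokens(tokens: np.ndarray, scores: np.ndarray, keep_ratio: float = 0.5):
    """Keep the top-`keep_ratio` fraction of tokens ranked by importance score.

    tokens: (N, D) array of token embeddings
    scores: (N,)   importance score per token (e.g. attention mass)
    """
    n_keep = max(1, int(round(len(scores) * keep_ratio)))
    keep_idx = np.argsort(scores)[::-1][:n_keep]  # highest scores first
    keep_idx.sort()                               # preserve original token order
    return tokens[keep_idx], keep_idx

# Toy example: 8 tokens with 4 dimensions each, random importance scores
rng = np.random.default_rng(0)
tokens = rng.normal(size=(8, 4))
scores = rng.random(8)
kept, idx = select_tokens(tokens, scores, keep_ratio=0.5)
print(kept.shape)  # (4, 4)
```

Keeping the selected indices in their original order (the `keep_idx.sort()` step) matters for sequence models, where token position carries meaning.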

Attention Visualization

  • Saves attention weights for selected layers
  • Analyzes attention patterns
  • Foreground/background distribution analysis
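The foreground/background distribution analysis can be illustrated as follows; this is a sketch under assumed shapes (a per-image-token attention vector plus a binary foreground mask), not the scripts' exact code:

```python
import numpy as np

def fg_bg_attention_fraction(attn: np.ndarray, fg_mask: np.ndarray):
    """Fraction of attention mass on foreground vs background image tokens.

    attn:    (N,) attention weights over N image tokens (need not be normalized)
    fg_mask: (N,) boolean mask, True where the token lies on the foreground
    """
    total = attn.sum()
    fg = attn[fg_mask].sum() / total
    return fg, 1.0 - fg

# Toy example: 6 image tokens, the first three on the foreground object
attn = np.array([0.4, 0.1, 0.2, 0.1, 0.1, 0.1])
fg_mask = np.array([True, True, True, False, False, False])
fg, bg = fg_bg_attention_fraction(attn, fg_mask)
print(f"foreground: {fg:.2f}, background: {bg:.2f}")  # foreground: 0.70, background: 0.30
```

Binning these per-sample fractions into a histogram (per layer) is the kind of analysis the configurable bins above refer to.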

💻 Usage

Attention Analysis

```bash
bash scripts/run_fg_bg_analysis.sh
```

Save Attention Data (PBS)

```bash
qsub scripts/qsub_save_paco_attention_50samples.pbs
```

🔧 Environment Setup

Required packages:

  • PyTorch
  • Transformers
  • Hugging Face Hub
  • NumPy, Matplotlib, Pillow (PIL)

Environment variables:

  • SAVE_MTS_ATTENTION=1: Enable attention-weight saving
  • MTS_ATTENTION_DIR: Output directory for saved attention weights
  • MTS_ATTENTION_LAYERS: Comma-separated layer indices (e.g. 2,6,15,27)
  • MTS_ATTENTION_MAX_SAMPLES: Maximum number of samples to process
  • USE_GT_MASK=1: Use ground-truth segmentation masks for the foreground/background split
  • DISABLE_FAST_ATTENTION=1: Force the vanilla (non-optimized) attention implementation
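For example, these variables could be set from Python before the analysis code is imported (the values below are illustrative, not defaults):

```python
import os

# Illustrative values; adjust the output path and layer list for your setup
os.environ["SAVE_MTS_ATTENTION"] = "1"
os.environ["MTS_ATTENTION_DIR"] = "./attention_out"
os.environ["MTS_ATTENTION_LAYERS"] = "2,6,15,27"
os.environ["MTS_ATTENTION_MAX_SAMPLES"] = "50"
os.environ["USE_GT_MASK"] = "1"

# The comma-separated layer list parses back into integer indices
layers = [int(x) for x in os.environ["MTS_ATTENTION_LAYERS"].split(",")]
print(layers)  # [2, 6, 15, 27]
```

Environment variables must be set before the model code reads them, so in a PBS job they would typically go in the job script (as in the provided `.pbs` file) rather than mid-run.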

📖 Model Architecture

Based on Qwen2.5-VL-3B with:

  • Multi-scale token selection mechanism
  • Efficient attention computation
  • Ground truth mask integration
  • Foreground/background analysis support

📄 License

Apache 2.0
