# MTS (Multi-scale Token Selection) - Attention Analysis Tools
This repository contains scripts and custom models for analyzing attention patterns in vision-language models, specifically for the MTS (Multi-scale Token Selection) project based on Qwen2.5-VL.
## Contents

### Analysis Scripts
**`scripts/run_fg_bg_analysis.sh`**: Foreground vs. background token attention analysis
- Analyzes the attention distribution over foreground vs. background image regions
- Supports multiple layers: 2, 6, 15, 27
- Configurable histogram bins
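The foreground/background split can be sketched with NumPy as below; the function name, array shapes, and binning here are illustrative assumptions, not the script's actual interface:

```python
import numpy as np

def fg_bg_attention_split(attn, fg_mask, bins=10):
    """Split image-token attention into foreground/background mass (sketch).

    attn:    (num_image_tokens,) attention weights over image tokens
    fg_mask: (num_image_tokens,) boolean, True where a token is foreground
    bins:    number of histogram bins (configurable, as in the script)
    """
    fg_total = float(attn[fg_mask].sum())
    bg_total = float(attn[~fg_mask].sum())
    # Per-region histograms over a shared range so the bins are comparable
    fg_hist, edges = np.histogram(attn[fg_mask], bins=bins, range=(0.0, float(attn.max())))
    bg_hist, _ = np.histogram(attn[~fg_mask], bins=bins, range=(0.0, float(attn.max())))
    return fg_total, bg_total, fg_hist, bg_hist
```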
**`scripts/qsub_save_paco_attention_50samples.pbs`**: PBS job script for saving attention
- Saves attention data for the PACO dataset (50 samples)
- Configured for large-memory nodes (128 GB)
- Layers: 2, 6, 15, 27
### Custom Qwen2.5-VL Implementation
- `modeling_qwen2_5_vl.py`: Main model with MTS token selection
- `modeling_qwen2_5_vl_fast.py`: Optimized version
- `multiscale_image_processor.py`: Multi-scale image processing
- `multiscale_processor_fast.py`: Fast processor
- `configuration_qwen2_5_vl.py`: Model configuration
- `image_processing_qwen2_vl.py`: Image preprocessing
## Key Features

### Multi-scale Token Selection (MTS)
- Dynamic token selection based on importance scores
- Reduces computational cost while maintaining performance
- Supports foreground and background token analysis
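A minimal sketch of importance-score-based token selection; the function name and the `keep_ratio` parameter are hypothetical, and the actual mechanism lives in `modeling_qwen2_5_vl.py`:

```python
import numpy as np

def select_tokens(tokens, scores, keep_ratio=0.5):
    """Keep the highest-scoring visual tokens (hypothetical MTS-style sketch).

    tokens:     (N, D) visual token embeddings
    scores:     (N,) importance score per token (e.g. derived from attention)
    keep_ratio: fraction of tokens to retain
    """
    k = max(1, int(len(scores) * keep_ratio))
    # Take the top-k indices, then sort so the kept tokens stay in original order
    top = np.sort(np.argsort(scores)[-k:])
    return tokens[top], top
```

Keeping the surviving tokens in their original order preserves positional structure for the downstream language model.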
### Attention Visualization
- Saves attention weights for selected layers
- Analyzes attention patterns
- Foreground/background distribution analysis
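Saving per-layer attention might look like the following sketch; the file naming and the dict layout are assumptions, not the repo's actual on-disk format:

```python
import os
import numpy as np

def save_layer_attention(attn_by_layer, out_dir, layers=(2, 6, 15, 27)):
    """Save attention maps for selected layers as .npy files (sketch).

    attn_by_layer: dict mapping layer index -> attention array
    out_dir:       target directory (e.g. wherever MTS_ATTENTION_DIR points)
    layers:        which layers to persist; others are skipped
    """
    os.makedirs(out_dir, exist_ok=True)
    for layer in layers:
        if layer in attn_by_layer:
            np.save(os.path.join(out_dir, f"attn_layer{layer}.npy"), attn_by_layer[layer])
```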
## Usage

### Attention Analysis

```bash
bash scripts/run_fg_bg_analysis.sh
```

### Save Attention Data (PBS)

```bash
qsub scripts/qsub_save_paco_attention_50samples.pbs
```
## Environment Setup
Required packages:
- PyTorch
- Transformers
- Hugging Face Hub
- NumPy, Matplotlib, PIL
Environment variables:
- `SAVE_MTS_ATTENTION=1`: Enable attention saving
- `MTS_ATTENTION_DIR`: Directory for attention weights
- `MTS_ATTENTION_LAYERS`: Comma-separated layer indices
- `MTS_ATTENTION_MAX_SAMPLES`: Maximum number of samples
- `USE_GT_MASK=1`: Use ground-truth masks
- `DISABLE_FAST_ATTENTION=1`: Force vanilla attention
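One plausible way the model code could parse these variables, shown as a sketch; the defaults here are illustrative, not the repo's actual defaults:

```python
import os

def read_mts_config():
    """Read the MTS attention-saving configuration from the environment (sketch)."""
    layers_raw = os.environ.get("MTS_ATTENTION_LAYERS", "2,6,15,27")
    return {
        "save_attention": os.environ.get("SAVE_MTS_ATTENTION") == "1",
        "attention_dir": os.environ.get("MTS_ATTENTION_DIR", "./attention"),
        "layers": [int(x) for x in layers_raw.split(",")],
        "max_samples": int(os.environ.get("MTS_ATTENTION_MAX_SAMPLES", "50")),
        "use_gt_mask": os.environ.get("USE_GT_MASK") == "1",
        "fast_attention": os.environ.get("DISABLE_FAST_ATTENTION") != "1",
    }
```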
## Model Architecture
Based on Qwen2.5-VL-3B with:
- Multi-scale token selection mechanism
- Efficient attention computation
- Ground truth mask integration
- Foreground/background analysis support
## License
Apache 2.0