MTS (Multi-scale Token Selection) - Attention Analysis Tools

This repository contains scripts and custom models for analyzing attention patterns in vision-language models, specifically for the MTS (Multi-scale Token Selection) project based on Qwen2.5-VL.

📊 Contents

Analysis Scripts

  • scripts/run_fg_bg_analysis.sh: Foreground vs Background token attention analysis

    • Analyzes attention distribution on foreground vs background image regions
    • Supports multiple layers: 2, 6, 15, 27
    • Configurable bins for histogram analysis
  • scripts/qsub_save_paco_attention_50samples.pbs: PBS job script for saving attention

    • Saves attention data for PACO dataset (50 samples)
    • Configured for large memory nodes (128GB)
    • Layers: 2, 6, 15, 27

Custom Qwen2.5-VL Implementation

  • modeling_qwen2_5_vl.py: Main model with MTS token selection
  • modeling_qwen2_5_vl_fast.py: Optimized version
  • multiscale_image_processor.py: Multi-scale image processing
  • multiscale_processor_fast.py: Fast processor
  • configuration_qwen2_5_vl.py: Model configuration
  • image_processing_qwen2_vl.py: Image preprocessing

🚀 Key Features

Multi-scale Token Selection (MTS)

  • Dynamic token selection based on importance scores
  • Reduces computational cost while maintaining performance
  • Supports foreground and background token analysis
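As a rough illustration only (not the repository's actual implementation), selecting a dynamic subset of tokens by importance score can be sketched with NumPy; the scoring input and the 50% keep ratio here are assumptions:

```python
import numpy as np

def select_tokens(tokens: np.ndarray, scores: np.ndarray, keep_ratio: float = 0.5):
    """Keep the top-`keep_ratio` fraction of tokens ranked by importance score.

    tokens: (N, D) array of token embeddings
    scores: (N,)   importance score per token (e.g. attention mass)
    """
    n_keep = max(1, int(round(len(scores) * keep_ratio)))
    keep_idx = np.argsort(scores)[::-1][:n_keep]  # highest scores first
    keep_idx.sort()                               # preserve original token order
    return tokens[keep_idx], keep_idx

# Toy example: 8 tokens with 4 dimensions each, random importance scores
rng = np.random.default_rng(0)
tokens = rng.normal(size=(8, 4))
scores = rng.random(8)
kept, idx = select_tokens(tokens, scores, keep_ratio=0.5)
print(kept.shape)  # (4, 4)
```

Keeping the selected indices in their original order (the `keep_idx.sort()` step) matters for sequence models, where token position carries meaning.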

Attention Visualization

  • Saves attention weights for selected layers
  • Analyzes attention patterns
  • Foreground/background distribution analysis
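The foreground/background distribution analysis can be illustrated as follows; this is a sketch under assumed shapes (a per-image-token attention vector plus a binary foreground mask), not the scripts' exact code:

```python
import numpy as np

def fg_bg_attention_fraction(attn: np.ndarray, fg_mask: np.ndarray):
    """Fraction of attention mass on foreground vs background image tokens.

    attn:    (N,) attention weights over N image tokens (need not be normalized)
    fg_mask: (N,) boolean mask, True where the token lies on the foreground
    """
    total = attn.sum()
    fg = attn[fg_mask].sum() / total
    return fg, 1.0 - fg

# Toy example: 6 image tokens, the first three on the foreground object
attn = np.array([0.4, 0.1, 0.2, 0.1, 0.1, 0.1])
fg_mask = np.array([True, True, True, False, False, False])
fg, bg = fg_bg_attention_fraction(attn, fg_mask)
print(f"foreground: {fg:.2f}, background: {bg:.2f}")  # foreground: 0.70, background: 0.30
```

Binning these per-sample fractions into a histogram (per layer) is the kind of analysis the configurable bins above refer to.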

💻 Usage

Attention Analysis

```bash
bash scripts/run_fg_bg_analysis.sh
```

Save Attention Data (PBS)

```bash
qsub scripts/qsub_save_paco_attention_50samples.pbs
```

🔧 Environment Setup

Required packages:

  • PyTorch
  • Transformers
  • Hugging Face Hub
  • NumPy, Matplotlib, Pillow (PIL)

Environment variables:

  • SAVE_MTS_ATTENTION=1: Enable attention-weight saving
  • MTS_ATTENTION_DIR: Output directory for saved attention weights
  • MTS_ATTENTION_LAYERS: Comma-separated layer indices (e.g. 2,6,15,27)
  • MTS_ATTENTION_MAX_SAMPLES: Maximum number of samples to process
  • USE_GT_MASK=1: Use ground-truth segmentation masks for the foreground/background split
  • DISABLE_FAST_ATTENTION=1: Force the vanilla (non-optimized) attention implementation
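For example, these variables could be set from Python before the analysis code is imported (the values below are illustrative, not defaults):

```python
import os

# Illustrative values; adjust the output path and layer list for your setup
os.environ["SAVE_MTS_ATTENTION"] = "1"
os.environ["MTS_ATTENTION_DIR"] = "./attention_out"
os.environ["MTS_ATTENTION_LAYERS"] = "2,6,15,27"
os.environ["MTS_ATTENTION_MAX_SAMPLES"] = "50"
os.environ["USE_GT_MASK"] = "1"

# The comma-separated layer list parses back into integer indices
layers = [int(x) for x in os.environ["MTS_ATTENTION_LAYERS"].split(",")]
print(layers)  # [2, 6, 15, 27]
```

Environment variables must be set before the model code reads them, so in a PBS job they would typically go in the job script (as in the provided `.pbs` file) rather than mid-run.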

📖 Model Architecture

Based on Qwen2.5-VL-3B with:

  • Multi-scale token selection mechanism
  • Efficient attention computation
  • Ground truth mask integration
  • Foreground/background analysis support

📄 License

Apache 2.0
