---
title: "DAM vs DAM-QA Comparison Demo"
emoji: "🤖"
colorFrom: "blue"
colorTo: "red"
sdk: "gradio"
sdk_version: "5.38.0"
app_file: "app.py"
pinned: false
---

# 🤖 DAM vs DAM-QA Visual Question Answering Demo

An interactive demo that compares the DAM (Original) and DAM-QA (Sliding Window) models on Visual Question Answering tasks for text-rich images.

## 🚀 Quick Start

### Local Installation

```bash
git clone
cd DAM-QA-Demo
pip install -r requirements.txt
python app.py
```

### Usage

1. **Check your GPU**: The models run best on a CUDA-compatible GPU with 8GB+ memory (CPU fallback is much slower)
2. Launch the app: `python app.py`
3. Wait for the models to load (the status updates automatically)
4. Choose a sample from the dropdown or upload your own image
5. Enter a question about the image (or use the auto-filled sample question)
6. Click "Compare Models" to see both the DAM Original and DAM-QA results
7. Analyze the detailed voting breakdown for DAM-QA's sliding window approach

### ⚠️ Hardware Requirements

- **GPU**: CUDA-compatible with 8GB+ VRAM recommended
- **CPU**: Multi-core processor for fallback (much slower)
- **RAM**: 16GB+ system memory recommended

## 🧠 Technical Highlights

- **DAM Original**: Uses the full image with NVIDIA's DAM-3B-Self-Contained model
- **DAM-QA Sliding Window**: Implements a sliding window approach with weighted voting aggregation
- **Model Architecture**: Transformer-based visual language model with attention mechanisms
- **Inference**: Supports both GPU and CPU inference with automatic device selection
- **UI Framework**: Built with Gradio and a custom VLAI template for professional presentation

## 📋 Requirements

- Python 3.10+
- PyTorch 2.0+
- Transformers 4.30+
- Gradio 5.38+
- CUDA-compatible GPU (recommended)
- 8GB+ GPU memory for optimal performance

## 🎨 Theming & Branding

The UI is powered by `vlai_template.py` and can be customized programmatically:

```python
import vlai_template as vt

vt.configure(
    project_name="DAM vs DAM-QA Comparison Demo",
    year="2025",
    module="DAM",
    description=(
        "Compare DAM (Original) and DAM-QA (Sliding Window) performance "
        "on Visual Question Answering tasks"
    ),
    colors={
        "primary": "#0F6CBD",
        "accent": "#C4314B",
        "bg1": "#F0F7FF",
        "bg2": "#E8F0FA",
        "bg3": "#DDE7F8",
    },
    font_family=(
        "'Inter', -apple-system, BlinkMacSystemFont, 'Segoe UI', Roboto, "
        "'Helvetica Neue', Arial, 'Noto Sans', 'Liberation Sans', sans-serif"
    ),
    meta_items=[
        ("Original DAM", "Full image processing"),
        ("DAM-QA", "Sliding window + voting"),
        ("Datasets", "DocVQA, InfographicVQA, TextVQA, ChartQA, VQAv2"),
    ],
)
```

## 📊 Datasets Used

This demo includes sample images and questions from:

- **DocVQA**: Document visual question answering
- **InfographicVQA**: Infographic-based questions
- **TextVQA**: Scene text visual question answering
- **ChartQA**: Chart and graph question answering
- **VQAv2**: General visual question answering