---
title: "DAM vs DAM-QA Comparison Demo"
emoji: "🤖"
colorFrom: "blue"
colorTo: "red"
sdk: "gradio"
sdk_version: "5.38.0"
app_file: "app.py"
pinned: false
---
# 🤖 DAM vs DAM-QA Visual Question Answering Demo
An interactive demo that compares the original DAM model with DAM-QA (sliding-window inference) on Visual Question Answering (VQA) tasks for text-rich images.
## 🚀 Quick Start
### Local Installation
```bash
git clone <repository-url>
cd DAM-QA-Demo
pip install -r requirements.txt
python app.py
```
### Usage
1. **Check your GPU**: the models run best on a CUDA-compatible GPU with 8GB+ memory (CPU fallback works but is much slower)
2. Launch the app: `python app.py`
3. Wait for the models to load (the status updates automatically)
4. Choose a sample from the dropdown or upload your own image
5. Enter a question about the image (or use the auto-filled sample question)
6. Click "Compare Models" to see both the DAM Original and DAM-QA results
7. Review the detailed voting breakdown for DAM-QA's sliding-window approach
### ⚠️ Hardware Requirements
- **GPU**: CUDA-compatible with 8GB+ VRAM recommended
- **CPU**: Multi-core processor for fallback (much slower)
- **RAM**: 16GB+ system memory recommended
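
The GPU-first, CPU-fallback behavior described above can be sketched as follows. This is an illustrative helper, not the code in `app.py`: the function name `pick_device` and the `min_vram_gb` parameter are hypothetical, and the sketch assumes the first visible GPU (index 0) is the one that will be used.

```python
def pick_device(min_vram_gb: float = 8.0):
    """Return (device, note). Hypothetical helper, not part of the demo's API."""
    try:
        import torch  # Imported lazily so the check also works without PyTorch.
    except ImportError:
        return "cpu", "PyTorch is not installed; falling back to CPU."

    if not torch.cuda.is_available():
        return "cpu", "No CUDA GPU detected; CPU inference will be much slower."

    # total_memory is reported in bytes; convert to (decimal) gigabytes.
    vram_gb = torch.cuda.get_device_properties(0).total_memory / 1e9
    if vram_gb < min_vram_gb:
        # 8GB+ is a recommendation, so warn instead of refusing the GPU.
        return "cuda", f"GPU has {vram_gb:.1f} GB VRAM, below the recommended {min_vram_gb:.0f} GB."
    return "cuda", f"GPU with {vram_gb:.1f} GB VRAM detected."


device, note = pick_device()
print(device, "-", note)
```

The helper only reports what it finds; whether a smaller GPU can actually hold the model depends on precision and image size, which this sketch does not check.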
## 🔧 Technical Highlights
- **DAM Original**: Runs NVIDIA's DAM-3B-Self-Contained model on the full image
- **DAM-QA Sliding Window**: Applies the model over sliding windows and aggregates the per-window answers by weighted voting
- **Model Architecture**: Transformer-based vision-language model
- **Inference**: Supports both GPU and CPU inference with automatic device selection
- **UI Framework**: Built with Gradio and a custom VLAI template (`vlai_template.py`)
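
To make the sliding-window and weighted-voting idea concrete, here is a minimal sketch in plain Python. It is not the DAM-QA implementation: the window size (512), stride (256), answer normalization, and the example weights are all assumptions chosen for illustration, and `sliding_windows` and `weighted_vote` are hypothetical names.

```python
from collections import defaultdict


def sliding_windows(width, height, window=512, stride=256):
    """Yield (left, top, right, bottom) boxes that cover the whole image."""
    xs = list(range(0, max(width - window, 0) + 1, stride))
    ys = list(range(0, max(height - window, 0) + 1, stride))
    # Add a final window flush with the right/bottom edge if needed.
    if xs[-1] + window < width:
        xs.append(width - window)
    if ys[-1] + window < height:
        ys.append(height - window)
    for top in ys:
        for left in xs:
            yield (left, top, min(left + window, width), min(top + window, height))


def weighted_vote(predictions):
    """Aggregate (answer, weight) pairs; return (best_answer, scores)."""
    scores = defaultdict(float)
    for answer, weight in predictions:
        # Assumed normalization: trim whitespace and ignore case.
        scores[answer.strip().lower()] += weight
    if not scores:
        return None, {}
    best = max(scores, key=scores.get)
    return best, dict(scores)


# Made-up per-window answers and weights, for illustration only.
votes = [("42", 1.0), ("42 ", 1.0), ("unanswerable", 0.5), ("17", 1.0)]
best, scores = weighted_vote(votes)
print(best, scores)
```

With these made-up votes, "42" wins with a total weight of 2.0. In a real pipeline each box from `sliding_windows` would be cropped, passed to the model with the question, and the resulting answer fed into the vote.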
## 📋 Requirements
- Python 3.10+
- PyTorch 2.0+
- Transformers 4.30+
- Gradio 5.38+
- CUDA-compatible GPU (recommended)
- 8GB+ GPU memory for optimal performance
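
A quick way to compare an environment against the minimum versions listed above is sketched below. It assumes the packages are published under the distribution names `torch`, `transformers`, and `gradio`; `check_requirements` and `parse_major_minor` are hypothetical helpers, and only the major and minor version numbers are compared.

```python
import re
import sys
from importlib import metadata

# Minimum (major, minor) versions taken from the requirements list.
MINIMUMS = {"torch": (2, 0), "transformers": (4, 30), "gradio": (5, 38)}


def parse_major_minor(version):
    """Extract (major, minor) from strings such as '2.1.0+cu118'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    return (int(match.group(1)), int(match.group(2))) if match else (0, 0)


def check_requirements(minimums=MINIMUMS):
    """Return a dict mapping each requirement to 'ok', 'missing', or 'too old (...)'."""
    report = {"python": "ok" if sys.version_info >= (3, 10) else "too old"}
    for package, minimum in minimums.items():
        try:
            installed = metadata.version(package)
        except metadata.PackageNotFoundError:
            report[package] = "missing"
            continue
        if parse_major_minor(installed) >= minimum:
            report[package] = "ok"
        else:
            report[package] = f"too old ({installed})"
    return report


for name, status in check_requirements().items():
    print(f"{name}: {status}")
```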
## 🎨 Theming & Branding
The UI is powered by `vlai_template.py` and can be customized programmatically:
```python
import vlai_template as vt

vt.configure(
    project_name="DAM vs DAM-QA Comparison Demo",
    year="2025",
    module="DAM",
    description=(
        "Compare DAM (Original) and DAM-QA (Sliding Window) performance "
        "on Visual Question Answering tasks"
    ),
    colors={
        "primary": "#0F6CBD",
        "accent": "#C4314B",
        "bg1": "#F0F7FF",
        "bg2": "#E8F0FA",
        "bg3": "#DDE7F8",
    },
    font_family=(
        "'Inter', -apple-system, BlinkMacSystemFont, 'Segoe UI', Roboto, "
        "'Helvetica Neue', Arial, 'Noto Sans', 'Liberation Sans', sans-serif"
    ),
    meta_items=[
        ("Original DAM", "Full image processing"),
        ("DAM-QA", "Sliding window + voting"),
        ("Datasets", "DocVQA, InfographicVQA, TextVQA, ChartQA, VQAv2"),
    ],
)
```
## 📊 Datasets Used
This demo includes sample images and questions from:
- **DocVQA**: Document visual question answering
- **InfographicVQA**: Infographic-based questions
- **TextVQA**: Scene text visual question answering
- **ChartQA**: Chart and graph question answering
- **VQAv2**: General visual question answering