metadata
title: DAM vs DAM-QA Comparison Demo
emoji: 🤖
colorFrom: blue
colorTo: red
sdk: gradio
sdk_version: 5.38.0
app_file: app.py
pinned: false

🤖 DAM vs DAM-QA Visual Question Answering Demo

An interactive demo that compares DAM (Original) and DAM-QA (Sliding Window) models on Visual Question Answering tasks for text-rich images.

🚀 Quick Start

Local Installation

git clone <repository-url>
cd DAM-QA-Demo
pip install -r requirements.txt
python app.py

Usage

  1. Check your GPU: a CUDA-compatible GPU with 8GB+ memory is recommended; CPU fallback works but is much slower
  2. Launch the app: python app.py
  3. Wait for the models to load (the status updates automatically)
  4. Choose a sample from the dropdown OR upload your own image
  5. Enter a question about the image (or use the auto-filled sample question)
  6. Click "Compare Models" to see both the DAM Original and DAM-QA results
  7. Analyze the detailed voting breakdown for DAM-QA's sliding window approach

⚠️ Hardware Requirements

  • GPU: CUDA-compatible with 8GB+ VRAM recommended
  • CPU: Multi-core processor for fallback (much slower)
  • RAM: 16GB+ system memory recommended
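
The GPU-first, CPU-fallback behaviour described above could be sketched roughly as follows. This is an illustration rather than the code in app.py: the helper name pick_device and the way the 8GB threshold is applied are assumptions, and only the standard torch.cuda calls are taken as given.

```python
def pick_device(recommended_vram_gb: float = 8.0):
    """Return (device, note). Hypothetical helper, not part of app.py."""
    try:
        import torch  # imported lazily so the sketch also runs without PyTorch
    except ImportError:
        return "cpu", "PyTorch is not installed; falling back to CPU"

    if not torch.cuda.is_available():
        return "cpu", "No CUDA device found; CPU inference will be much slower"

    # total_memory is reported in bytes; convert to decimal gigabytes
    total_gb = torch.cuda.get_device_properties(0).total_memory / 1e9
    if total_gb < recommended_vram_gb:
        return "cuda", f"GPU has {total_gb:.1f} GB, below the recommended {recommended_vram_gb:.0f} GB"
    return "cuda", f"GPU has {total_gb:.1f} GB"


device, note = pick_device()
print(device, "-", note)
```

The sketch only warns when the GPU is smaller than recommended instead of forcing CPU, since whether the models still fit depends on precision and image size.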

🧠 Technical Highlights

  • DAM Original: Runs NVIDIA's DAM-3B-Self-Contained model on the full image
  • DAM-QA Sliding Window: Implements a sliding window approach with weighted voting aggregation
  • Model Architecture: Transformer-based visual language model with attention mechanisms
  • Inference: Supports both GPU and CPU inference with automatic device selection
  • UI Framework: Built with Gradio and a custom VLAI template for professional presentation
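
To make the sliding window and weighted voting idea concrete, here is a minimal sketch in plain Python. The window size, stride, answer normalisation, and weights are all assumptions chosen for illustration; the README does not state the values or the weighting scheme DAM-QA actually uses, and the function names are hypothetical.

```python
from collections import defaultdict


def sliding_windows(width, height, window=512, stride=256):
    """Return (left, top, right, bottom) crops that together cover the image."""
    def starts(size):
        if size <= window:
            return [0]
        positions = list(range(0, size - window + 1, stride))
        if positions[-1] != size - window:
            positions.append(size - window)  # extra window flush with the far edge
        return positions

    return [
        (left, top, min(left + window, width), min(top + window, height))
        for top in starts(height)
        for left in starts(width)
    ]


def weighted_vote(predictions):
    """Sum the weights per normalised answer and return (winner, totals)."""
    totals = defaultdict(float)
    for answer, weight in predictions:
        totals[answer.strip().lower()] += weight
    if not totals:
        raise ValueError("no predictions to aggregate")
    winner = max(totals, key=totals.get)
    return winner, dict(totals)


# One (answer, weight) pair per view; the answers and weights are made up.
views = [("42", 1.0), ("41", 0.4), ("42 ", 0.5), ("41", 0.9)]
print(sliding_windows(1024, 512))
print(weighted_vote(views))
```

In a real pipeline each crop would be passed to the model together with the question, and the per-view weights might reflect, for example, whether the view is the full image or a local window.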

📋 Requirements

  • Python 3.10+
  • PyTorch 2.0+
  • Transformers 4.30+
  • Gradio 5.38+
  • CUDA-compatible GPU (recommended)
  • 8GB+ GPU memory for optimal performance
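
The version minimums above can be checked before launching the app. The sketch below uses only the standard library; the helper names are hypothetical, and the PyPI distribution names torch, transformers, and gradio are assumed to match what requirements.txt installs.

```python
import re
import sys
from importlib import metadata

# Minimum versions taken from the list above
MINIMUMS = {"torch": (2, 0), "transformers": (4, 30), "gradio": (5, 38)}


def version_tuple(text, parts=2):
    """'2.1.0+cu118' -> (2, 1); local tags and pre-release suffixes are ignored."""
    numbers = []
    for piece in text.split("+")[0].split(".")[:parts]:
        match = re.match(r"\d+", piece)
        numbers.append(int(match.group()) if match else 0)
    return tuple(numbers)


def check_requirements():
    """Return a list of human-readable problems; an empty list means all good."""
    problems = []
    if sys.version_info < (3, 10):
        problems.append("Python 3.10+ is required")
    for name, minimum in MINIMUMS.items():
        try:
            installed = version_tuple(metadata.version(name))
        except metadata.PackageNotFoundError:
            problems.append(f"{name} is not installed")
            continue
        if installed < minimum:
            problems.append(f"{name} {installed} is older than {minimum}")
    return problems


for problem in check_requirements():
    print("WARNING:", problem)
```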

🎨 Theming & Branding

The UI is powered by vlai_template.py and can be customized programmatically:

import vlai_template as vt

vt.configure(
    project_name="DAM vs DAM-QA Comparison Demo",
    year="2025",
    module="DAM",
    description=(
        "Compare DAM (Original) and DAM-QA (Sliding Window) performance "
        "on Visual Question Answering tasks"
    ),
    colors={
        "primary": "#0F6CBD",
        "accent": "#C4314B",
        "bg1": "#F0F7FF",
        "bg2": "#E8F0FA",
        "bg3": "#DDE7F8",
    },
    font_family=(
        "'Inter', -apple-system, BlinkMacSystemFont, 'Segoe UI', Roboto, "
        "'Helvetica Neue', Arial, 'Noto Sans', 'Liberation Sans', sans-serif"
    ),
    meta_items=[
        ("Original DAM", "Full image processing"),
        ("DAM-QA", "Sliding window + voting"),
        ("Datasets", "DocVQA, InfographicVQA, TextVQA, ChartQA, VQAv2"),
    ],
)

📊 Datasets Used

This demo includes sample images and questions from:

  • DocVQA: Document visual question answering
  • InfographicVQA: Infographic-based questions
  • TextVQA: Scene text visual question answering
  • ChartQA: Chart and graph question answering
  • VQAv2: General visual question answering