duongtruongbinh

Initial commit

3fd9d26 3 months ago


---
title: DAM vs DAM-QA Comparison Demo
emoji: 🤖
colorFrom: blue
colorTo: red
sdk: gradio
sdk_version: 5.38.0
app_file: app.py
pinned: false
---

🤖 DAM vs DAM-QA Visual Question Answering Demo

An interactive demo that compares DAM (Original) and DAM-QA (Sliding Window) models on Visual Question Answering tasks for text-rich images.

🚀 Quick Start

Local Installation

git clone <repository-url>
cd DAM-QA-Demo
pip install -r requirements.txt
python app.py

Usage

Check your GPU: a CUDA-compatible GPU with 8GB+ memory is recommended; CPU inference is supported as a fallback but is much slower
Launch the app: python app.py
Wait for the models to load (the status updates automatically)
Choose a sample from the dropdown or upload your own image
Enter a question about the image (or use the auto-filled sample question)
Click "Compare Models" to see both the DAM Original and DAM-QA results
Review the detailed voting breakdown for DAM-QA's sliding window approach

⚠️ Hardware Requirements

GPU: CUDA-compatible with 8GB+ VRAM recommended
CPU: Multi-core processor for fallback (much slower)
RAM: 16GB+ system memory recommended
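
The GPU-first, CPU-fallback behaviour listed above could be checked before launching the app with something like the sketch below. This is an illustration only, not the demo's actual code: `pick_device` and its `min_vram_gb` parameter are hypothetical names, and the 8 GB threshold is simply the recommendation from this README.

```python
def pick_device(min_vram_gb=8.0):
    """Return (device, note), where device is "cuda" or "cpu".

    Hypothetical helper: falls back to CPU when PyTorch or CUDA is missing,
    and adds a warning when the GPU has less VRAM than recommended.
    """
    try:
        import torch
    except ImportError:
        return "cpu", "PyTorch is not installed"

    if not torch.cuda.is_available():
        return "cpu", "no CUDA device visible; inference will be much slower"

    # total_memory is reported in bytes for the given device index.
    total_gb = torch.cuda.get_device_properties(0).total_memory / 1024**3
    if total_gb < min_vram_gb:
        return "cuda", f"{total_gb:.1f} GB VRAM is below the recommended {min_vram_gb:g} GB"
    return "cuda", f"{total_gb:.1f} GB VRAM available"


if __name__ == "__main__":
    device, note = pick_device()
    print(f"Using {device}: {note}")
```

Running the file directly prints the chosen device and a short note, which is a quick way to see whether the machine meets the recommendation.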

🧠 Technical Highlights

DAM Original: Uses the full image with NVIDIA's DAM-3B-Self-Contained model
DAM-QA Sliding Window: Runs the model on sliding-window crops of the image and aggregates the per-window answers by weighted voting
Model Architecture: Transformer-based vision-language model
Inference: Supports both GPU and CPU inference with automatic device selection
UI Framework: Built with Gradio and custom VLAI template for professional presentation
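
To make the sliding window and weighted voting idea concrete, here is a minimal sketch in plain Python. It is not the demo's implementation: the window size, stride, vote weights, and the function names `sliding_windows` and `weighted_vote` are all assumptions chosen for illustration, and the model call that would produce each per-window answer is left out.

```python
from collections import defaultdict


def sliding_windows(width, height, window=512, stride=256):
    """Yield (left, top, right, bottom) crop boxes covering the whole image.

    window and stride are illustrative defaults, not the demo's settings.
    """
    def starts(size):
        if size <= window:
            return [0]
        positions = list(range(0, size - window + 1, stride))
        # Add a final window flush with the edge so no pixels are skipped.
        if positions[-1] != size - window:
            positions.append(size - window)
        return positions

    for top in starts(height):
        for left in starts(width):
            yield (left, top, min(left + window, width), min(top + window, height))


def weighted_vote(votes):
    """Pick the answer with the largest total weight.

    votes is a list of (answer, weight) pairs, one per window (and optionally
    one for the full image). Answers are compared case-insensitively.
    """
    totals = defaultdict(float)
    for answer, weight in votes:
        totals[answer.strip().lower()] += weight
    if not totals:
        raise ValueError("no votes to aggregate")
    winner = max(totals, key=totals.get)
    return winner, dict(totals)


if __name__ == "__main__":
    boxes = list(sliding_windows(1024, 512))
    print(boxes)
    # Hypothetical per-window answers; "unanswerable" is given a lower weight here.
    votes = [("42", 1.0), ("unanswerable", 0.2), ("42 ", 1.0), ("17", 1.0)]
    print(weighted_vote(votes))
```

In a real pipeline each box would be cropped from the image and passed to the model along with the question; the returned totals correspond to the kind of voting breakdown the demo displays.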

📋 Requirements

Python 3.10+
PyTorch 2.0+
Transformers 4.30+
Gradio 5.38+
CUDA-compatible GPU (recommended)
8GB+ GPU memory for optimal performance
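
A quick way to confirm an environment against the list above is a small version check such as the sketch below. The helper names (`parse_version`, `meets_minimum`, `check_requirements`) are hypothetical, and the version parsing is deliberately simple: it only compares the first three numeric fields.

```python
import re
import sys
from importlib import metadata

# Minimum versions copied from the requirements list in this README.
MINIMUMS = {"torch": "2.0", "transformers": "4.30", "gradio": "5.38"}


def parse_version(text):
    """Turn '2.1.0+cu118' into (2, 1, 0) using the first three numeric fields."""
    numbers = re.findall(r"\d+", text.split("+")[0])[:3]
    parts = [int(n) for n in numbers]
    return tuple(parts + [0] * (3 - len(parts)))


def meets_minimum(installed, minimum):
    return parse_version(installed) >= parse_version(minimum)


def check_requirements(minimums=MINIMUMS):
    """Return {package: status} where status is 'ok', 'missing', or 'too old (...)'."""
    report = {}
    for package, minimum in minimums.items():
        try:
            installed = metadata.version(package)
        except metadata.PackageNotFoundError:
            report[package] = "missing"
            continue
        if meets_minimum(installed, minimum):
            report[package] = "ok"
        else:
            report[package] = f"too old ({installed} < {minimum})"
    return report


if __name__ == "__main__":
    python_ok = sys.version_info >= (3, 10)
    print(f"python: {'ok' if python_ok else 'too old'}")
    for package, status in check_requirements().items():
        print(f"{package}: {status}")
```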

🎨 Theming & Branding

The UI is powered by vlai_template.py and can be customized programmatically:

import vlai_template as vt

vt.configure(
    project_name="DAM vs DAM-QA Comparison Demo",
    year="2025",
    module="DAM",
    description=(
        "Compare DAM (Original) and DAM-QA (Sliding Window) performance "
        "on Visual Question Answering tasks"
    ),
    colors={
        "primary": "#0F6CBD",
        "accent": "#C4314B",
        "bg1": "#F0F7FF",
        "bg2": "#E8F0FA",
        "bg3": "#DDE7F8",
    },
    font_family=(
        "'Inter', -apple-system, BlinkMacSystemFont, 'Segoe UI', Roboto, "
        "'Helvetica Neue', Arial, 'Noto Sans', 'Liberation Sans', sans-serif"
    ),
    meta_items=[
        ("Original DAM", "Full image processing"),
        ("DAM-QA", "Sliding window + voting"),
        ("Datasets", "DocVQA, InfographicVQA, TextVQA, ChartQA, VQAv2"),
    ],
)

📊 Datasets Used

This demo includes sample images and questions from:

DocVQA: Document visual question answering
InfographicVQA: Infographic-based questions
TextVQA: Scene text visual question answering
ChartQA: Chart and graph question answering
VQAv2: General visual question answering