---
title: DAM vs DAM-QA Comparison Demo
emoji: π€
colorFrom: blue
colorTo: red
sdk: gradio
sdk_version: 5.38.0
app_file: app.py
pinned: false
---
DAM vs DAM-QA Visual Question Answering Demo
An interactive demo that compares DAM (Original) and DAM-QA (Sliding Window) models on Visual Question Answering tasks for text-rich images.
Quick Start
Local Installation
```bash
git clone <repository-url>
cd DAM-QA-Demo
pip install -r requirements.txt
python app.py
```
Usage
- Check your hardware: a CUDA-compatible GPU with 8 GB+ of memory is recommended. The app can fall back to the CPU, but inference is much slower.
- Launch the app with `python app.py`.
- Wait for the models to load; the status message updates automatically.
- Choose a sample from the dropdown, or upload your own image.
- Enter a question about the image, or keep the auto-filled sample question.
- Click "Compare Models" to see the answers from both DAM (Original) and DAM-QA.
- Inspect the voting breakdown to see how DAM-QA's sliding windows contributed to its final answer.
Hardware Requirements
- GPU: CUDA-compatible with 8GB+ VRAM recommended
- CPU: Multi-core processor for fallback (much slower)
- RAM: 16GB+ system memory recommended
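The app's own device-selection code is not shown in this README. As a minimal sketch, assuming PyTorch is the inference backend, the GPU-first, CPU-fallback behavior and the 8 GB VRAM recommendation could be checked as below. `pick_device` and `detect_cuda` are hypothetical helpers; `torch.cuda.is_available()` and `torch.cuda.get_device_properties()` are standard PyTorch calls.

```python
from __future__ import annotations


def pick_device(cuda_available: bool, vram_gb: float | None,
                recommended_gb: float = 8.0) -> tuple[str, str | None]:
    """Return (device, warning): prefer CUDA, fall back to CPU (hypothetical helper)."""
    if not cuda_available:
        return "cpu", "No CUDA GPU detected; falling back to CPU (much slower)."
    if vram_gb is not None and vram_gb < recommended_gb:
        # Still use the GPU, but flag that it is below the recommended VRAM.
        return "cuda", f"GPU has {vram_gb:.1f} GB VRAM; {recommended_gb:.0f} GB+ is recommended."
    return "cuda", None


def detect_cuda() -> tuple[bool, float | None]:
    """Ask PyTorch whether CUDA is usable and how much memory device 0 has (in GB)."""
    try:
        import torch  # assumption: PyTorch is installed, as listed under Requirements
    except ImportError:
        return False, None
    if not torch.cuda.is_available():
        return False, None
    total_bytes = torch.cuda.get_device_properties(0).total_memory
    return True, total_bytes / 1024**3


if __name__ == "__main__":
    device, warning = pick_device(*detect_cuda())
    print(f"Using device: {device}")
    if warning:
        print(f"Warning: {warning}")
```

Separating the pure decision (`pick_device`) from the hardware query (`detect_cuda`) keeps the policy easy to test on machines without a GPU.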
Technical Highlights
- DAM Original: Runs NVIDIA's DAM-3B-Self-Contained model on the full image
- DAM-QA Sliding Window: Runs the model on sliding-window crops of the image and aggregates their answers by weighted voting
- Model Architecture: Transformer-based vision-language model
- Inference: Supports both GPU and CPU inference with automatic device selection
- UI Framework: Built with Gradio and a custom VLAI template (vlai_template.py)
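The README describes DAM-QA only as a "sliding window approach with weighted voting aggregation". The sketch below illustrates that general technique under explicit assumptions that are not taken from the demo: square 512-pixel windows with a 256-pixel stride, one extra vote from the full image, uniform vote weights, answers normalized by lowercasing, and "unanswerable" treated as an abstention. `ask(box)` is a hypothetical callback standing in for running DAM on a crop.

```python
from __future__ import annotations

from collections import defaultdict
from collections.abc import Callable

Box = tuple[int, int, int, int]  # (left, top, right, bottom) in pixels


def _starts(length: int, window: int, stride: int) -> list[int]:
    """Window start offsets along one axis; the last window is shifted to touch the far edge."""
    if length <= window:
        return [0]
    starts = list(range(0, length - window + 1, stride))
    if starts[-1] != length - window:
        starts.append(length - window)  # make sure the border is covered
    return starts


def sliding_windows(width: int, height: int, window: int = 512, stride: int = 256) -> list[Box]:
    """Tile the image with square windows (clipped when the image is smaller than `window`)."""
    return [
        (x, y, min(x + window, width), min(y + window, height))
        for y in _starts(height, window, stride)
        for x in _starts(width, window, stride)
    ]


def weighted_vote(votes: list[tuple[str, float]], abstain: str = "unanswerable") -> str:
    """Sum weights per normalized answer, skip abstentions, return the top-scoring answer."""
    scores: dict[str, float] = defaultdict(float)
    for answer, weight in votes:
        key = answer.strip().lower()
        if key and key != abstain:
            scores[key] += weight
    if not scores:
        return abstain
    return max(scores, key=scores.__getitem__)  # ties go to the answer seen first


def answer_with_windows(
    width: int,
    height: int,
    ask: Callable[[Box], str],  # hypothetical: runs the model on the given crop
    window: int = 512,
    stride: int = 256,
    full_image_weight: float = 1.0,
    window_weight: float = 1.0,
) -> str:
    """Query the full image and every window, then aggregate by weighted voting."""
    votes = [(ask((0, 0, width, height)), full_image_weight)]
    votes += [(ask(box), window_weight) for box in sliding_windows(width, height, window, stride)]
    return weighted_vote(votes)
```

Abstaining windows matter for text-rich images: a crop that does not contain the relevant text can decline to answer instead of outvoting the crops that do. A real implementation may weight votes differently (for example by model confidence).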
Requirements
- Python 3.10+
- PyTorch 2.0+
- Transformers 4.30+
- Gradio 5.38+
- CUDA-compatible GPU (recommended)
- 8GB+ GPU memory for optimal performance
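A quick way to check an environment against the list above is sketched below. It reads installed versions with `importlib.metadata` from the standard library and compares only the leading numeric components of each version string. `parse_version`, `unmet_requirements`, and `installed_versions` are hypothetical helper names, not part of the demo.

```python
from __future__ import annotations

import re
import sys
from importlib.metadata import PackageNotFoundError, version

# Minimum versions from the Requirements list (pip distribution names).
MINIMUMS = {"torch": (2, 0), "transformers": (4, 30), "gradio": (5, 38)}


def parse_version(text: str) -> tuple[int, ...]:
    """Keep leading numeric components only, e.g. '2.1.0+cu118' -> (2, 1, 0)."""
    match = re.match(r"\d+(?:\.\d+)*", text)
    if match is None:
        raise ValueError(f"unparseable version: {text!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def unmet_requirements(installed: dict[str, str | None], python: tuple[int, int]) -> list[str]:
    """Return a human-readable list of requirements that are missing or too old."""
    problems = []
    if python < (3, 10):
        problems.append(f"python {python[0]}.{python[1]} < 3.10")
    for name, minimum in MINIMUMS.items():
        found = installed.get(name)
        if found is None:
            problems.append(f"{name} not installed")
        elif parse_version(found) < minimum:
            problems.append(f"{name} {found} < {'.'.join(map(str, minimum))}")
    return problems


def installed_versions() -> dict[str, str | None]:
    """Look up installed distribution versions; None means not installed."""
    result: dict[str, str | None] = {}
    for name in MINIMUMS:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            result[name] = None
    return result


if __name__ == "__main__":
    issues = unmet_requirements(installed_versions(), sys.version_info[:2])
    print("\n".join(issues) or "All requirements met.")
```

Tuple comparison handles differing lengths correctly here: `(2, 0, 0)` is not less than `(2, 0)`, so PyTorch 2.0.0 passes. GPU memory is not checked by this sketch.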
Theming & Branding
The UI is powered by vlai_template.py and can be customized programmatically:
```python
import vlai_template as vt

vt.configure(
    project_name="DAM vs DAM-QA Comparison Demo",
    year="2025",
    module="DAM",
    description=(
        "Compare DAM (Original) and DAM-QA (Sliding Window) performance "
        "on Visual Question Answering tasks"
    ),
    colors={
        "primary": "#0F6CBD",
        "accent": "#C4314B",
        "bg1": "#F0F7FF",
        "bg2": "#E8F0FA",
        "bg3": "#DDE7F8",
    },
    font_family=(
        "'Inter', -apple-system, BlinkMacSystemFont, 'Segoe UI', Roboto, "
        "'Helvetica Neue', Arial, 'Noto Sans', 'Liberation Sans', sans-serif"
    ),
    meta_items=[
        ("Original DAM", "Full image processing"),
        ("DAM-QA", "Sliding window + voting"),
        ("Datasets", "DocVQA, InfographicVQA, TextVQA, ChartQA, VQAv2"),
    ],
)
```
Datasets Used
This demo includes sample images and questions from:
- DocVQA: Document visual question answering
- InfographicVQA: Infographic-based questions
- TextVQA: Scene text visual question answering
- ChartQA: Chart and graph question answering
- VQAv2: General visual question answering