---
title: "DAM vs DAM-QA Comparison Demo"
emoji: "🤖"
colorFrom: "blue"
colorTo: "red"
sdk: "gradio"
sdk_version: "5.38.0"
app_file: "app.py"
pinned: false
---

# 🤖 DAM vs DAM-QA Visual Question Answering Demo

An interactive demo that compares DAM (Original) and DAM-QA (Sliding Window) models on Visual Question Answering tasks for text-rich images.


## 🚀 Quick Start

### Local Installation
```bash
git clone <repository-url>
cd DAM-QA-Demo
pip install -r requirements.txt
python app.py
```

### Usage
1. Make sure a CUDA-compatible GPU with 8GB+ memory is available (CPU inference works as a fallback but is much slower)
2. Launch the app: `python app.py`
3. Wait for the models to load (the status updates automatically)
4. Choose a sample from the dropdown or upload your own image
5. Enter a question about the image (or use the auto-filled sample question)
6. Click "Compare Models" to see both the DAM Original and DAM-QA results
7. Review the detailed voting breakdown for DAM-QA's sliding window approach

### ⚠️ Hardware Requirements
- **GPU**: CUDA-compatible with 8GB+ VRAM recommended
- **CPU**: Multi-core processor for fallback (much slower)
- **RAM**: 16GB+ system memory recommended
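
The GPU-with-CPU-fallback behavior described above can be sketched as follows. This is a minimal illustration, not the demo's actual code: the function name `select_device` is hypothetical, and the only assumption about PyTorch is the standard `torch.cuda.is_available()` check.

```python
def select_device() -> str:
    """Return "cuda" when a CUDA GPU is usable, otherwise "cpu".

    Hypothetical helper for illustration; app.py may select its device differently.
    """
    try:
        import torch  # imported lazily so the check also works without PyTorch
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


if __name__ == "__main__":
    print(f"Inference device: {select_device()}")
```

Running this before launching the app is a quick way to see whether inference would fall back to the slower CPU path.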

## 🧠 Technical Highlights

- **DAM Original**: Runs NVIDIA's DAM-3B-Self-Contained model on the full image
- **DAM-QA Sliding Window**: Applies a sliding-window approach and aggregates the per-window answers by weighted voting
- **Model Architecture**: Transformer-based vision-language model
- **Inference**: Supports both GPU and CPU inference with automatic device selection
- **UI Framework**: Built with Gradio and a custom VLAI template
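
To make the sliding-window and weighted-voting idea concrete, here is a minimal sketch in plain Python. It is not the DAM-QA implementation: the window size, stride, vote weights, and the function names `sliding_windows` and `weighted_vote` are all assumptions chosen for illustration, and the model call that would produce each answer is left out.

```python
from collections import defaultdict


def sliding_windows(width, height, window, stride):
    """Return (left, top, right, bottom) boxes that cover the whole image."""
    def starts(size):
        if size <= window:
            return [0]  # image is smaller than the window: one box is enough
        positions = list(range(0, size - window + 1, stride))
        if positions[-1] != size - window:
            positions.append(size - window)  # extra box so the far edge is covered
        return positions

    return [
        (x, y, min(x + window, width), min(y + window, height))
        for y in starts(height)
        for x in starts(width)
    ]


def weighted_vote(answers):
    """Pick the answer with the largest total weight.

    `answers` is a list of (answer_text, weight) pairs; the weights are
    hypothetical and would come from the aggregation scheme in use.
    """
    if not answers:
        raise ValueError("no answers to aggregate")
    totals = defaultdict(float)
    for text, weight in answers:
        totals[text.strip().lower()] += weight  # normalize before counting
    return max(totals, key=totals.get)


if __name__ == "__main__":
    boxes = sliding_windows(width=1000, height=500, window=512, stride=256)
    print(boxes)
    # One full-image answer plus two window answers (weights are made up).
    print(weighted_vote([("42", 1.0), ("41", 0.6), ("41 ", 0.6)]))
```

In this sketch, two windows that agree can outvote the full-image answer, which is the general behavior a voting breakdown would expose.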

## 📋 Requirements

- Python 3.10+
- PyTorch 2.0+
- Transformers 4.30+
- Gradio 5.38+
- CUDA-compatible GPU (recommended)
- 8GB+ GPU memory for optimal performance
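
One way to check an environment against this list is sketched below, using only the standard library. The helper name `installed_versions` is hypothetical, and the distribution names `torch`, `transformers`, and `gradio` are assumed to be the packages that `requirements.txt` installs.

```python
import sys
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    print("Python 3.10+:", sys.version_info >= (3, 10))
    for name, ver in installed_versions(["torch", "transformers", "gradio"]).items():
        print(f"{name}: {ver or 'not installed'}")
```

The script only reports versions; comparing them against the minimums above is left to the reader.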

## 🎨 Theming & Branding

The UI is powered by `vlai_template.py` and can be customized programmatically:

```python
import vlai_template as vt

vt.configure(
    project_name="DAM vs DAM-QA Comparison Demo",
    year="2025",
    module="DAM",
    description=(
        "Compare DAM (Original) and DAM-QA (Sliding Window) performance "
        "on Visual Question Answering tasks"
    ),
    colors={
        "primary": "#0F6CBD",
        "accent": "#C4314B",
        "bg1": "#F0F7FF",
        "bg2": "#E8F0FA",
        "bg3": "#DDE7F8",
    },
    font_family=(
        "'Inter', -apple-system, BlinkMacSystemFont, 'Segoe UI', Roboto, "
        "'Helvetica Neue', Arial, 'Noto Sans', 'Liberation Sans', sans-serif"
    ),
    meta_items=[
        ("Original DAM", "Full image processing"),
        ("DAM-QA", "Sliding window + voting"),
        ("Datasets", "DocVQA, InfographicVQA, TextVQA, ChartQA, VQAv2"),
    ],
)
```

## 📊 Datasets Used

This demo includes sample images and questions from:

- **DocVQA**: Document visual question answering
- **InfographicVQA**: Infographic-based questions
- **TextVQA**: Scene text visual question answering
- **ChartQA**: Chart and graph question answering
- **VQAv2**: General visual question answering