---
title: "DAM vs DAM-QA Comparison Demo"
emoji: "🤖"
colorFrom: "blue"
colorTo: "red"
sdk: "gradio"
sdk_version: "5.38.0"
app_file: "app.py"
pinned: false
---
# 🤖 DAM vs DAM-QA Visual Question Answering Demo
An interactive demo that compares DAM (Original) and DAM-QA (Sliding Window) models on Visual Question Answering tasks for text-rich images.
## 🚀 Quick Start
### Local Installation
```bash
git clone <repository-url>
cd DAM-QA-Demo
pip install -r requirements.txt
python app.py
```
### Usage
1. **Check your GPU**: A CUDA-compatible GPU with 8GB+ memory is recommended; CPU fallback works but is much slower
2. Launch the app: `python app.py`
3. Wait for models to load (status will update automatically)
4. Choose a sample from the dropdown OR upload your own image
5. Enter a question about the image (or use the auto-filled sample question)
6. Click "Compare Models" to see both DAM Original and DAM-QA results
7. Analyze the detailed voting breakdown for DAM-QA's sliding window approach
### ⚠️ Hardware Requirements
- **GPU**: CUDA-compatible with 8GB+ VRAM recommended
- **CPU**: Multi-core processor for fallback (much slower)
- **RAM**: 16GB+ system memory recommended
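The GPU-first, CPU-fallback behaviour described above can be sketched roughly as follows. This is a minimal illustration, not the code in `app.py`: the helper names `pick_device`, `vram_warning`, and `detect_device` are hypothetical, and the 8 GB threshold simply mirrors the recommendation in this README.

```python
from typing import Optional


def pick_device(cuda_available: bool) -> str:
    """Prefer the GPU when CUDA is usable, otherwise fall back to the CPU."""
    return "cuda" if cuda_available else "cpu"


def vram_warning(vram_gb: float, recommended_gb: float = 8.0) -> Optional[str]:
    """Return a warning string when the GPU is below the recommended VRAM."""
    if vram_gb >= recommended_gb:
        return None
    return (
        f"GPU has {vram_gb:.1f} GB VRAM; {recommended_gb:.0f} GB+ is recommended, "
        "so loading both models may fail or be slow."
    )


def detect_device() -> str:
    """Probe the machine with PyTorch if it is installed (assumed helper)."""
    try:
        import torch
    except ImportError:
        return "cpu"
    device = pick_device(torch.cuda.is_available())
    if device == "cuda":
        # total_memory is reported in bytes; convert to decimal gigabytes.
        total_gb = torch.cuda.get_device_properties(0).total_memory / 1e9
        warning = vram_warning(total_gb)
        if warning:
            print(warning)
    return device
```

Calling `detect_device()` before loading the models gives a string that can be passed to `model.to(...)`; on a machine without PyTorch or CUDA it returns `"cpu"`.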
## 🧠 Technical Highlights
- **DAM Original**: Uses the full image with NVIDIA's DAM-3B-Self-Contained model
- **DAM-QA Sliding Window**: Implements a sliding-window approach with weighted voting aggregation
- **Model Architecture**: Transformer-based visual language model with attention mechanisms
- **Inference**: Supports both GPU and CPU inference with automatic device selection
- **UI Framework**: Built with Gradio and a custom VLAI template for professional presentation
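To make the sliding-window and weighted-voting idea more concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not the DAM-QA implementation: the window size, stride, vote weights, and the function names `sliding_windows` and `weighted_vote` are all hypothetical, and the per-window answers would come from the model in practice.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom) in pixels


def _starts(length: int, window: int, stride: int) -> List[int]:
    """Window start offsets along one axis, always covering the far edge."""
    if length <= window:
        return [0]
    starts = list(range(0, length - window + 1, stride))
    if starts[-1] != length - window:
        starts.append(length - window)  # extra window flush with the edge
    return starts


def sliding_windows(width: int, height: int, window: int, stride: int) -> List[Box]:
    """Enumerate square crops that tile the image with overlap."""
    boxes = []
    for top in _starts(height, window, stride):
        for left in _starts(width, window, stride):
            boxes.append((left, top, min(left + window, width), min(top + window, height)))
    return boxes


def weighted_vote(votes: List[Tuple[str, float]]) -> Tuple[str, Dict[str, float]]:
    """Sum the weights per normalized answer and return the top answer plus totals."""
    if not votes:
        raise ValueError("at least one vote is required")
    totals: Dict[str, float] = defaultdict(float)
    for answer, weight in votes:
        totals[answer.strip().lower()] += weight
    winner = max(totals, key=totals.get)
    return winner, dict(totals)


# Example: a 1000x500 image with assumed 512-pixel windows and a 256-pixel stride.
boxes = sliding_windows(1000, 500, window=512, stride=256)
# Assumed weights: one full-image vote plus one vote per window.
winner, totals = weighted_vote([("42", 1.0), ("41", 0.4), ("42 ", 0.5), ("41", 0.9)])
```

The `totals` dictionary is the kind of per-answer breakdown that a voting view could display next to the winning answer.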
## 📋 Requirements
- Python 3.10+
- PyTorch 2.0+
- Transformers 4.30+
- Gradio 5.38+
- CUDA-compatible GPU (recommended)
- 8GB+ GPU memory for optimal performance
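A quick way to check an environment against the minimum versions listed above is sketched below. It uses only the standard library; the helper names (`parse_version`, `at_least`, `check_requirements`) are hypothetical, and the version parsing is deliberately simple, so treat it as a sanity check rather than a full version comparator.

```python
import sys
from importlib import metadata
from typing import Dict, Tuple

# Minimum versions taken from the requirements list in this README.
MINIMUMS = {"torch": "2.0", "transformers": "4.30", "gradio": "5.38"}


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn '2.1.0+cu118' into (2, 1, 0), stopping at the first non-numeric part."""
    parts = []
    for piece in text.split("+")[0].split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def at_least(installed: str, minimum: str) -> bool:
    """Compare two versions after padding the shorter one with zeros."""
    a, b = parse_version(installed), parse_version(minimum)
    size = max(len(a), len(b))
    pad = lambda v: v + (0,) * (size - len(v))
    return pad(a) >= pad(b)


def check_requirements(minimums: Dict[str, str] = MINIMUMS) -> Dict[str, str]:
    """Report 'ok', 'missing', or 'too old' for each required package."""
    report = {"python": "ok" if sys.version_info >= (3, 10) else "too old"}
    for name, minimum in minimums.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "missing"
            continue
        ok = at_least(installed, minimum)
        report[name] = "ok" if ok else f"too old ({installed} < {minimum})"
    return report
```

Running `print(check_requirements())` after `pip install -r requirements.txt` shows at a glance which packages are missing or outdated.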
## 🎨 Theming & Branding
The UI is powered by `vlai_template.py` and can be customized programmatically:
```python
import vlai_template as vt
vt.configure(
    project_name="DAM vs DAM-QA Comparison Demo",
    year="2025",
    module="DAM",
    description=(
        "Compare DAM (Original) and DAM-QA (Sliding Window) performance "
        "on Visual Question Answering tasks"
    ),
    colors={
        "primary": "#0F6CBD",
        "accent": "#C4314B",
        "bg1": "#F0F7FF",
        "bg2": "#E8F0FA",
        "bg3": "#DDE7F8",
    },
    font_family=(
        "'Inter', -apple-system, BlinkMacSystemFont, 'Segoe UI', Roboto, "
        "'Helvetica Neue', Arial, 'Noto Sans', 'Liberation Sans', sans-serif"
    ),
    meta_items=[
        ("Original DAM", "Full image processing"),
        ("DAM-QA", "Sliding window + voting"),
        ("Datasets", "DocVQA, InfographicVQA, TextVQA, ChartQA, VQAv2"),
    ],
)
```
## 📊 Datasets Used
This demo includes sample images and questions from:
- **DocVQA**: Document visual question answering
- **InfographicVQA**: Infographic-based questions
- **TextVQA**: Scene text visual question answering
- **ChartQA**: Chart and graph question answering
- **VQAv2**: General visual question answering