---
title: "DAM vs DAM-QA Comparison Demo"
emoji: "🤖"
colorFrom: "blue"
colorTo: "red"
sdk: "gradio"
sdk_version: "5.38.0"
app_file: "app.py"
pinned: false
---

# 🤖 DAM vs DAM-QA Visual Question Answering Demo

An interactive demo that compares DAM (Original) and DAM-QA (Sliding Window) on Visual Question Answering (VQA) tasks for text-rich images.

## 🚀 Quick Start

### Local Installation
```bash
git clone <repository-url>
cd DAM-QA-Demo
pip install -r requirements.txt
python app.py
```

### Usage
1. Check your hardware: a CUDA-compatible GPU with 8 GB+ VRAM is recommended (CPU inference works but is much slower).
2. Launch the app: `python app.py`
3. Wait for the models to load; the status message updates automatically.
4. Choose a sample from the dropdown or upload your own image.
5. Enter a question about the image, or keep the auto-filled sample question.
6. Click "Compare Models" to see the answers from both DAM Original and DAM-QA.
7. Review the voting breakdown to see how DAM-QA's sliding windows contributed to its final answer.

### ⚠️ Hardware Requirements
- GPU: CUDA-compatible, 8 GB+ VRAM recommended
- CPU: Multi-core processor for CPU-only fallback (much slower than GPU)
- RAM: 16 GB+ system memory recommended
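Before launching, you can check whether the local machine meets the GPU recommendation above. The snippet below is a minimal sketch, assuming PyTorch is installed; `pick_device`, `has_enough_vram`, and `MIN_VRAM_GB` are hypothetical helpers, not part of `app.py`. It prefers CUDA when a GPU is visible, warns if that GPU has less than the recommended memory, and otherwise falls back to CPU. It compares against decimal gigabytes (10**9 bytes) because cards sold as "8 GB" can report slightly less than 8 GiB.

```python
MIN_VRAM_GB = 8  # recommendation from the Hardware Requirements list


def has_enough_vram(total_bytes: int, min_gb: float = MIN_VRAM_GB) -> bool:
    """True if `total_bytes` of device memory meets the minimum (decimal GB, 10**9 bytes)."""
    return total_bytes >= min_gb * 10**9


def pick_device() -> str:
    """Return "cuda" if a CUDA GPU is visible (warning if it is below the VRAM minimum), else "cpu"."""
    try:
        import torch  # assumption: PyTorch is installed as listed in Requirements
    except ImportError:
        return "cpu"
    if not torch.cuda.is_available():
        return "cpu"
    total = torch.cuda.get_device_properties(0).total_memory  # bytes on GPU 0
    if not has_enough_vram(total):
        print(f"Warning: GPU 0 has {total / 10**9:.1f} GB; {MIN_VRAM_GB} GB+ is recommended.")
    return "cuda"


if __name__ == "__main__":
    print("Selected device:", pick_device())
```

How the app itself selects its device may differ; this only mirrors the recommendation in the list above.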

## 🧠 Technical Highlights

- DAM Original: Runs NVIDIA's DAM-3B-Self-Contained model on the full image
- DAM-QA Sliding Window: Runs the model on sliding-window crops of the image and aggregates the per-window answers with weighted voting
- Model Architecture: Transformer-based vision-language model
- Inference: Runs on GPU or CPU, with the device selected automatically
- UI Framework: Built with Gradio and a custom VLAI template (`vlai_template.py`)
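The sliding-window-plus-voting idea can be illustrated without loading the model. The sketch below is not DAM-QA's actual implementation: the window size, stride, and area-based vote weight are assumptions chosen for illustration, and `ask_model` is a hypothetical stand-in for running DAM-3B-Self-Contained on one crop (a real app would pass the cropped image, not just its box). It tiles the image into crops, collects one answer per crop, and returns the answer with the largest total weight.

```python
from collections import defaultdict
from typing import Callable, Iterator

Box = tuple[int, int, int, int]  # (left, top, right, bottom) in pixels


def _starts(length: int, window: int, stride: int) -> list[int]:
    """Start offsets along one axis; the last window is shifted flush to the edge."""
    if length <= window:
        return [0]
    starts = list(range(0, length - window + 1, stride))
    if starts[-1] != length - window:
        starts.append(length - window)  # cover the leftover strip at the edge
    return starts


def sliding_windows(width: int, height: int, window: int, stride: int) -> Iterator[Box]:
    """Yield crop boxes that together cover the whole image (overlapping if stride < window)."""
    for top in _starts(height, window, stride):
        for left in _starts(width, window, stride):
            yield (left, top, min(left + window, width), min(top + window, height))


def weighted_vote(votes: list[tuple[str, float]]) -> str:
    """Return the normalized answer whose votes have the largest summed weight."""
    totals: dict[str, float] = defaultdict(float)
    for answer, weight in votes:
        totals[answer.strip().lower()] += weight  # merge trivial casing/spacing variants
    return max(totals, key=totals.get)


def answer_with_windows(
    width: int,
    height: int,
    question: str,
    ask_model: Callable[[Box, str], str],  # hypothetical: runs the VLM on one crop
    window: int = 512,  # assumed crop size, not DAM-QA's actual setting
    stride: int = 256,  # assumed step; stride < window gives overlapping crops
) -> str:
    """Ask the model about every crop and combine the answers by area-weighted voting."""
    votes = []
    for box in sliding_windows(width, height, window, stride):
        area = (box[2] - box[0]) * (box[3] - box[1])
        votes.append((ask_model(box, question), float(area)))  # assumed weight: crop area
    return weighted_vote(votes)


if __name__ == "__main__":
    # Toy stand-in for the model: every crop except the bottom-right one reads "42".
    def toy_model(box: Box, question: str) -> str:
        return "unknown" if box[:2] == (512, 256) else "42"

    print(answer_with_windows(1024, 768, "What is the total?", toy_model))  # prints: 42
```

Because the last window on each axis is shifted flush to the image edge, every crop here has the same size, so area weighting reduces to a plain majority vote unless the image is smaller than the window. A different weighting scheme (such as the one DAM-QA actually uses) would plug in where `area` is computed.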

## 📋 Requirements

- Python 3.10+
- PyTorch 2.0+
- Transformers 4.30+
- Gradio 5.38+
- CUDA-compatible GPU with 8 GB+ memory (recommended for optimal performance)
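To confirm that an existing environment satisfies these minimums, a small script can compare installed versions against the list. This is a sketch using only the standard library's `importlib.metadata`; the `MINIMUMS` table is transcribed from the list above, and the simple numeric comparison in the hypothetical `as_tuple` helper is an assumption that ignores pre-release and dev suffixes (the `packaging` library handles those properly).

```python
import sys
from importlib.metadata import PackageNotFoundError, version

# Minimum versions transcribed from the Requirements list (PyPI distribution names).
MINIMUMS = {"torch": "2.0", "transformers": "4.30", "gradio": "5.38"}


def as_tuple(v: str) -> tuple[int, ...]:
    """'2.1.0+cu121' -> (2, 1, 0): keep leading digits of each part, stop at a part with none."""
    parts = []
    for piece in v.split("+")[0].split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def meets_minimum(installed: str, minimum: str) -> bool:
    """True if `installed` is at least `minimum`, comparing numeric components only."""
    return as_tuple(installed) >= as_tuple(minimum)


def check_environment() -> list[str]:
    """Return human-readable problems; an empty list means everything listed is satisfied."""
    problems = []
    if sys.version_info < (3, 10):
        problems.append(f"Python {sys.version.split()[0]} < 3.10")
    for package, minimum in MINIMUMS.items():
        try:
            installed = version(package)
        except PackageNotFoundError:
            problems.append(f"{package} is not installed")
            continue
        if not meets_minimum(installed, minimum):
            problems.append(f"{package} {installed} < {minimum}")
    return problems


if __name__ == "__main__":
    issues = check_environment()
    print("\n".join(issues) if issues else "All listed requirements are satisfied.")
```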

## 🎨 Theming & Branding

The UI is powered by `vlai_template.py` and can be customized programmatically:

```python
import vlai_template as vt

vt.configure(
    project_name="DAM vs DAM-QA Comparison Demo",
    year="2025",
    module="DAM",
    description=(
        "Compare DAM (Original) and DAM-QA (Sliding Window) performance "
        "on Visual Question Answering tasks"
    ),
    colors={
        "primary": "#0F6CBD",
        "accent": "#C4314B",
        "bg1": "#F0F7FF",
        "bg2": "#E8F0FA",
        "bg3": "#DDE7F8",
    },
    font_family=(
        "'Inter', -apple-system, BlinkMacSystemFont, 'Segoe UI', Roboto, "
        "'Helvetica Neue', Arial, 'Noto Sans', 'Liberation Sans', sans-serif"
    ),
    meta_items=[
        ("Original DAM", "Full image processing"),
        ("DAM-QA", "Sliding window + voting"),
        ("Datasets", "DocVQA, InfographicVQA, TextVQA, ChartQA, VQAv2"),
    ],
)
```

## 📊 Datasets Used

This demo includes sample images and questions from:

- DocVQA: Question answering on document images
- InfographicVQA: Question answering on infographics
- TextVQA: Question answering on scene text in natural images
- ChartQA: Question answering on charts and graphs
- VQAv2: General visual question answering