	---
	title: CaptionIQ
	emoji: 🧠
	colorFrom: indigo
	colorTo: purple
	sdk: streamlit
	sdk_version: 1.42.0
	python_version: "3.10"
	app_file: app.py
	pinned: false
	---
	# 🧠 CaptionIQ — AI Image Captioning

	> Generate natural-language captions for images with a VGG16/VGG19 encoder and a Bahdanau-attention LSTM decoder trained on the Flickr8K dataset.

	---

	## ✨ Features

	- Dual CNN Backbones — VGG16 and VGG19 for spatial feature extraction (7×7×512)
	- Bahdanau Attention LSTM — Attends to specific image regions per word
	- Ensemble Mode (BLIP) — High-quality captions from Salesforce BLIP model
	- Beam Search — Top-5 diverse captions with confidence bars
	- 🔥 Attention Heatmap — Interactive word-by-word gradient saliency overlay
	- ☁️ Word Cloud — Live word distribution from beam candidates
	- 🔄 Model Comparison — VGG16 vs VGG19 vs Ensemble side-by-side with 🏆 winner
	- 📋 Session History — Track all generated captions, export as JSON/CSV
	- 🎲 Surprise Me — Random Flickr8K image with one click
	- BLEU Evaluation — Per-image BLEU-1 through BLEU-4 scoring
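
The beam search feature listed above can be sketched in plain Python. This is a minimal illustration and not the code in `src/inference.py`: the `next_word_probs` callback, the `<start>`/`<end>` token names, and the toy probability table are all assumptions made for the example. In the real app the callback would wrap the trained decoder.

```python
import math
from typing import Callable, Dict, List, Tuple

Hypothesis = Tuple[List[str], float]  # (tokens, summed log-probability)


def beam_search(
    next_word_probs: Callable[[List[str]], Dict[str, float]],
    beam_width: int = 5,
    max_len: int = 20,
    start: str = "<start>",
    end: str = "<end>",
) -> List[Hypothesis]:
    """Return up to `beam_width` hypotheses, highest log-probability first."""
    beams: List[Hypothesis] = [([start], 0.0)]
    finished: List[Hypothesis] = []
    for _ in range(max_len):
        # Expand every live hypothesis by every possible next word.
        candidates: List[Hypothesis] = []
        for tokens, score in beams:
            for word, prob in next_word_probs(tokens).items():
                if prob > 0.0:
                    candidates.append((tokens + [word], score + math.log(prob)))
        if not candidates:
            break
        # Keep only the best `beam_width` expansions.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for tokens, score in candidates[:beam_width]:
            if tokens[-1] == end:
                finished.append((tokens, score))
            else:
                beams.append((tokens, score))
        if not beams:
            break
    finished.extend(beams)  # hypotheses that hit max_len without <end>
    finished.sort(key=lambda c: c[1], reverse=True)
    return finished[:beam_width]


def toy_model(tokens: List[str]) -> Dict[str, float]:
    """Hypothetical stand-in for the decoder: depends only on the last token."""
    table = {
        "<start>": {"a": 0.6, "the": 0.4},
        "a": {"dog": 0.7, "cat": 0.3},
        "the": {"dog": 0.5, "cat": 0.5},
        "dog": {"<end>": 1.0},
        "cat": {"<end>": 1.0},
    }
    return table[tokens[-1]]


results = beam_search(toy_model, beam_width=3)
best_tokens, best_log_prob = results[0]
```

Exponentiating a hypothesis score (`math.exp(best_log_prob)`) gives its sequence probability, which is one plausible source for a confidence bar. Summed log-probabilities favor shorter captions, so a length-normalized score may be preferable in practice.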

	---

	## 🏗️ Architecture

	```
	Image → VGG16/19 block5_pool → (49 × 512) spatial map
	↓
	Bahdanau Attention
	↓
	Caption tokens → Embedding(256) → LSTM(512) → Softmax(vocab)
	```
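
The attention step in the diagram can be illustrated with a small pure-Python sketch of additive (Bahdanau) attention: each region is scored as `v · tanh(W_feat · region + W_hid · hidden)`, the scores are normalized with softmax, and the context vector is the weighted sum of the regions. The function and parameter names and the tiny dimensions are assumptions for illustration. The actual model in `src/model.py` would use Keras layers over 49 regions of 512 features each.

```python
import math
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(matrix: Matrix, vector: Vector) -> Vector:
    """Multiply a (rows x cols) matrix by a length-cols vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def bahdanau_attention(
    features: Matrix,  # (regions x feat_dim), e.g. 49 x 512
    hidden: Vector,    # decoder state, length hid_dim
    w_feat: Matrix,    # (units x feat_dim)
    w_hid: Matrix,     # (units x hid_dim)
    v: Vector,         # length units
) -> Tuple[Vector, Vector]:
    """Return (context vector, attention weights over regions)."""
    proj_hidden = matvec(w_hid, hidden)
    scores = []
    for region in features:
        proj_region = matvec(w_feat, region)
        energy = [math.tanh(a + b) for a, b in zip(proj_region, proj_hidden)]
        scores.append(sum(vi * ei for vi, ei in zip(v, energy)))
    weights = softmax(scores)
    feat_dim = len(features[0])
    context = [
        sum(w * region[d] for w, region in zip(weights, features))
        for d in range(feat_dim)
    ]
    return context, weights


# Toy example: 4 regions with 2 features each, identity projections.
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
hidden = [0.5, -0.5]
identity = [[1.0, 0.0], [0.0, 1.0]]
context, weights = bahdanau_attention(features, hidden, identity, identity, [1.0, 1.0])
```

The weights sum to 1 and are recomputed at every decoding step, so each generated word can attend to a different part of the image.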


	---

	## 🚀 Quick Start

	### 1. Install Dependencies

	```bash
	pip install -r requirements.txt
	```

	### 2. Preprocess Dataset

	```bash
	python src/preprocess.py
	```

	Downloads Flickr8K, cleans the captions, builds the vocabulary, and creates train/val/test splits.
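
As a rough illustration of the caption cleaning and vocabulary steps, the sketch below lowercases a caption, strips non-letters, wraps it in start/end tokens, and builds a word-to-index map. The exact rules in `src/preprocess.py` are not documented here, so the token names (`<start>`, `<end>`, `<pad>`, `<unk>`), the `min_count` threshold, and the function names are assumptions.

```python
import re
from collections import Counter
from typing import Dict, Iterable


def clean_caption(text: str) -> str:
    """Lowercase, drop anything that is not a letter, add boundary tokens."""
    text = re.sub(r"[^a-z\s]", " ", text.lower())
    words = text.split()  # split() also collapses repeated whitespace
    return " ".join(["<start>"] + words + ["<end>"])


def build_vocab(captions: Iterable[str], min_count: int = 1) -> Dict[str, int]:
    """Map each word seen at least `min_count` times to an integer id."""
    counts = Counter(word for caption in captions for word in caption.split())
    vocab = {"<pad>": 0, "<unk>": 1}  # reserved ids for padding and rare words
    for word in sorted(w for w, c in counts.items() if c >= min_count):
        vocab[word] = len(vocab)
    return vocab


cleaned = clean_caption("A dog runs, FAST!")
captions = [clean_caption("A dog runs."), clean_caption("A dog sleeps.")]
vocab = build_vocab(captions, min_count=2)
```

Words below the threshold (`runs` and `sleeps` in this toy input) are left out of the vocabulary and would be mapped to `<unk>` at encoding time.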

	### 3. Extract Features

	```bash
	python src/extract_features.py --backbone both
	```

	Extracts 7×7×512 spatial feature maps (49 × 512 once flattened) from the `block5_pool` layer of VGG16 and VGG19 and saves them as `.pkl` files.
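
The Features and Architecture sections describe a 7×7×512 map from `block5_pool` that is used as 49 regions of 512 values. The sketch below shows that reshape in plain Python, plus the inverse index mapping that an attention heatmap needs to place a region weight back on the 7×7 grid. Both helper names are hypothetical, and row-major ordering is an assumption (it matches the default reshape order in NumPy and TensorFlow).

```python
from typing import List, Tuple


def flatten_spatial(feature_map: List[List[List[float]]]) -> List[List[float]]:
    """Turn an (H, W, C) nested list into (H*W, C), row by row."""
    return [list(cell) for row in feature_map for cell in row]


def region_to_grid(index: int, width: int = 7) -> Tuple[int, int]:
    """Map a flat region index back to its (row, col) grid position."""
    return divmod(index, width)


# Dummy 7 x 7 x 512 map in which every channel stores the flat region index.
fmap = [[[float(r * 7 + c)] * 512 for c in range(7)] for r in range(7)]
flat = flatten_spatial(fmap)
row_col = region_to_grid(10)
```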

	### 4. Train Models

	```bash
	python src/train.py --backbone both --epochs 20
	```

	Trains both VGG16 and VGG19 captioning models. Saves checkpoints and loss plots.
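
A data generator for caption training commonly turns each caption into (prefix, next word) pairs so the decoder learns to predict one token at a time. Whether `src/train.py` does exactly this is an assumption. The sketch below shows the idea with left padding, which is also the default of Keras `pad_sequences`. The function name and the example token ids are hypothetical.

```python
from typing import List, Tuple


def make_training_pairs(
    token_ids: List[int], max_len: int, pad_id: int = 0
) -> List[Tuple[List[int], int]]:
    """Expand one encoded caption into (left-padded prefix, target id) pairs."""
    pairs = []
    for i in range(1, len(token_ids)):
        prefix = token_ids[:i][-max_len:]  # keep at most max_len tokens
        padded = [pad_id] * (max_len - len(prefix)) + prefix
        pairs.append((padded, token_ids[i]))
    return pairs


# Example caption encoded as ids: <start>=3, "a"=4, "dog"=5, <end>=2.
pairs = make_training_pairs([3, 4, 5, 2], max_len=4)
```

A caption of `n` tokens yields `n - 1` pairs, which is why a generator is useful: materializing every pair for every caption at once can be memory-heavy.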

	### 5. Evaluate

	```bash
	python src/evaluate.py --backbone both
	```

	Computes BLEU-1 to BLEU-4 on the test set and prints a VGG16 vs VGG19 comparison table.
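
For readers unfamiliar with the metric, here is a minimal pure-Python sketch of sentence-level BLEU: clipped n-gram precision for n = 1..N, combined by geometric mean and multiplied by a brevity penalty. The project itself uses NLTK for scoring. This sketch is not NLTK's implementation, applies no smoothing, and returns 0 whenever any n-gram order has no match.

```python
import math
from collections import Counter
from typing import List, Sequence

Tokens = Sequence[str]


def ngrams(tokens: Tokens, n: int) -> Counter:
    """Count all contiguous n-grams in a token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def modified_precision(references: List[Tokens], candidate: Tokens, n: int) -> float:
    """Candidate n-gram counts clipped by the max count in any reference."""
    cand_counts = ngrams(candidate, n)
    if not cand_counts:
        return 0.0
    max_ref: Counter = Counter()
    for ref in references:
        for gram, count in ngrams(ref, n).items():
            max_ref[gram] = max(max_ref[gram], count)
    clipped = sum(min(count, max_ref[gram]) for gram, count in cand_counts.items())
    return clipped / sum(cand_counts.values())


def sentence_bleu(references: List[Tokens], candidate: Tokens, max_n: int = 4) -> float:
    """Cumulative BLEU-`max_n` with uniform weights and no smoothing."""
    precisions = [modified_precision(references, candidate, n) for n in range(1, max_n + 1)]
    if min(precisions) == 0.0:
        return 0.0
    log_mean = sum(math.log(p) for p in precisions) / max_n
    # Brevity penalty uses the reference length closest to the candidate length.
    c = len(candidate)
    r = min((len(ref) for ref in references), key=lambda length: (abs(length - c), length))
    penalty = 1.0 if c > r else math.exp(1 - r / c)
    return penalty * math.exp(log_mean)


refs = [["a", "dog", "runs", "on", "the", "grass"]]
cand = ["a", "dog", "sits", "on", "the", "grass"]
bleu1 = sentence_bleu(refs, cand, max_n=1)
bleu4_exact = sentence_bleu(refs, refs[0], max_n=4)
clipped = modified_precision([["the", "cat"]], ["the", "the", "the"], 1)
```

Calling `sentence_bleu` with `max_n` from 1 to 4 gives the cumulative BLEU-1 through BLEU-4 values of the kind reported in the Results table.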

	### 6. Generate Captions

	```bash
	python src/inference.py --image path/to/image.jpg --backbone vgg16
	```

	### 7. Launch Web App

	```bash
	streamlit run app.py
	```

	---

	## 📁 Project Structure

	```
	├── data/ # Dataset & preprocessed files
	├── models/ # Trained model checkpoints (.h5)
	├── outputs/ # Loss plots, BLEU results
	├── src/
	│ ├── config.py # Paths & hyperparameters
	│ ├── preprocess.py # Caption cleaning & tokenization
	│ ├── extract_features.py # VGG feature extraction
	│ ├── model.py # CNN-LSTM architecture
	│ ├── train.py # Training with data generator
	│ ├── inference.py # Greedy & beam search
	│ ├── evaluate.py # BLEU score evaluation
	│ └── utils.py # Shared utilities
	├── app.py # Streamlit web app
	├── requirements.txt # Dependencies
	└── README.md
	```

	---

	## 📊 Results

	| Metric | VGG16 | VGG19 |
	|--------|-------|-------|
	| BLEU-1 | — | — |
	| BLEU-2 | — | — |
	| BLEU-3 | — | — |
	| BLEU-4 | — | — |

	> Results will be populated after training and evaluation.

	---

	## 🛠️ Tech Stack

	- Deep Learning: TensorFlow / Keras
	- Feature Extraction: VGG16, VGG19 (ImageNet pretrained)
	- Text Processing: NLTK, Keras Tokenizer
	- Evaluation: NLTK BLEU
	- Web App: Streamlit
	- Dataset: Flickr8K (8,000 images, 5 captions each)

	---

	## 📄 License

	This project is for educational purposes.