metadata

title: CaptionIQ
emoji: 🧠
colorFrom: indigo
colorTo: purple
sdk: streamlit
sdk_version: 1.42.0
python_version: '3.10'
app_file: app.py
pinned: false

🧠 CaptionIQ — AI Image Captioning

Generate natural-language captions for images with a VGG16/VGG19 encoder and a Bahdanau-attention LSTM decoder trained on the Flickr8K dataset.

✨ Features

Dual CNN Backbones — VGG16 and VGG19 for spatial feature extraction (7×7×512)
Bahdanau Attention LSTM — Attends to specific image regions per word
Ensemble Mode (BLIP) — High-quality captions from Salesforce BLIP model
Beam Search — Top-5 diverse captions with confidence bars
🔥 Attention Heatmap — Interactive word-by-word gradient saliency overlay
☁️ Word Cloud — Live word distribution from beam candidates
🔄 Model Comparison — VGG16 vs VGG19 vs Ensemble side-by-side with 🏆 winner
📋 Session History — Track all generated captions, export as JSON/CSV
🎲 Surprise Me — Random Flickr8K image with one click
BLEU Evaluation — Per-image BLEU-1 through BLEU-4 scoring
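
The session-history export in the feature list can be sketched with the standard library alone. The record fields (`image`, `backbone`, `caption`, `score`) and the function names are assumptions for illustration, not the app's actual schema.

```python
import csv
import io
import json

# Hypothetical record layout; the real app may store different fields.
FIELDS = ["image", "backbone", "caption", "score"]


def history_to_json(history):
    """Serialize a list of caption records to a JSON string."""
    return json.dumps(history, indent=2, ensure_ascii=False)


def history_to_csv(history):
    """Serialize a list of caption records to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(history)
    return buffer.getvalue()


history = [
    {"image": "dog.jpg", "backbone": "vgg16",
     "caption": "a dog runs on grass", "score": 0.42},
]
print(history_to_csv(history))
print(history_to_json(history))
```

In a Streamlit app, strings like these could be handed to a download widget; how `app.py` actually does this is not shown here.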

🏗️ Architecture

Image → VGG16/19 block5_pool → (49 × 512) spatial map
                                      ↓
                          Bahdanau Attention
                                      ↓
Caption tokens → Embedding(256) → LSTM(512) → Softmax(vocab)
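
To make the attention step in the diagram concrete, here is a minimal pure-Python sketch of additive (Bahdanau-style) attention over region features. It is not the code in `src/model.py`: the weight names, the toy dimensions, and the random initialization are all assumptions, and a real model would use learned Keras layers on 49 × 512 features.

```python
import math
import random


def matvec(matrix, vector):
    """Multiply a (rows x cols) matrix by a length-cols vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def bahdanau_attention(features, hidden, w_feat, w_hid, v):
    """Additive attention: score_i = v . tanh(W_feat f_i + W_hid h)."""
    projected_hidden = matvec(w_hid, hidden)
    scores = []
    for feature in features:
        projected = matvec(w_feat, feature)
        energy = [math.tanh(a + b) for a, b in zip(projected, projected_hidden)]
        scores.append(sum(vi * ei for vi, ei in zip(v, energy)))
    weights = softmax(scores)  # one weight per image region
    # Context vector: attention-weighted sum of the region features.
    dim = len(features[0])
    context = [sum(w * f[d] for w, f in zip(weights, features)) for d in range(dim)]
    return context, weights


def random_matrix(rng, rows, cols):
    """Toy stand-in for learned weights."""
    return [[rng.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]


# Toy sizes: 49 regions as in the diagram, but small feature/hidden dims.
rng = random.Random(0)
REGIONS, FEAT_DIM, HID_DIM, ATT_DIM = 49, 8, 6, 4
features = random_matrix(rng, REGIONS, FEAT_DIM)
hidden = random_matrix(rng, 1, HID_DIM)[0]
context, weights = bahdanau_attention(
    features,
    hidden,
    random_matrix(rng, ATT_DIM, FEAT_DIM),
    random_matrix(rng, ATT_DIM, HID_DIM),
    random_matrix(rng, 1, ATT_DIM)[0],
)
print(len(weights), len(context))
```

The 49 weights sum to 1, so reshaping them to 7 × 7 gives the kind of per-word map that a heatmap overlay can display.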

🚀 Quick Start

1. Install Dependencies

pip install -r requirements.txt

2. Preprocess Dataset

python src/preprocess.py

Downloads Flickr8K, cleans captions, builds vocabulary, creates train/val/test splits.
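
As a rough illustration of the cleaning and vocabulary steps, the sketch below uses only the standard library. The `startseq`/`endseq` markers, the rule that drops one-character tokens, and the function names are assumptions; `src/preprocess.py` may do this differently (for example with the Keras Tokenizer).

```python
import re
from collections import Counter

START, END = "startseq", "endseq"  # hypothetical marker tokens


def clean_caption(text):
    """Lowercase, replace non-letters with spaces, drop one-character tokens."""
    text = re.sub(r"[^a-z ]", " ", text.lower())
    words = [w for w in text.split() if len(w) > 1]
    return " ".join([START, *words, END])


def build_vocab(captions, min_count=1):
    """Map each word seen at least min_count times to an id (0 is left for padding)."""
    counts = Counter(word for caption in captions for word in caption.split())
    kept = sorted(w for w, c in counts.items() if c >= min_count)
    return {word: index for index, word in enumerate(kept, start=1)}


raw = ["A dog runs across the grass.", "Two dogs play in the park!"]
cleaned = [clean_caption(c) for c in raw]
vocab = build_vocab(cleaned)
print(cleaned[0])
print(len(vocab))
```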

3. Extract Features

python src/extract_features.py --backbone both

Extracts 7×7×512 block5_pool feature maps from VGG16 and VGG19, reshaped to 49 × 512 (saved as .pkl).

4. Train Models

python src/train.py --backbone both --epochs 20

Trains both VGG16 and VGG19 captioning models. Saves checkpoints and loss plots.
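
A common way to train this kind of decoder is to expand each caption into (prefix, next word) pairs and feed them in batches. The sketch below shows that idea on token ids only; the function names are hypothetical, and the image features that a real generator in `src/train.py` would pair with each prefix are left out.

```python
def make_training_pairs(token_ids, max_len, pad_id=0):
    """Turn one caption into (left-padded prefix, next token) pairs."""
    pairs = []
    for i in range(1, len(token_ids)):
        prefix = token_ids[:i][-max_len:]
        padded = [pad_id] * (max_len - len(prefix)) + prefix
        pairs.append((padded, token_ids[i]))
    return pairs


def batch_generator(captions, max_len, batch_size):
    """Yield lists of (prefix, target) pairs, batch_size at a time."""
    batch = []
    for token_ids in captions:
        for pair in make_training_pairs(token_ids, max_len):
            batch.append(pair)
            if len(batch) == batch_size:
                yield batch
                batch = []
    if batch:  # emit the final partial batch
        yield batch


caption = [1, 5, 9, 2]  # e.g. startseq, dog, runs, endseq as ids
for prefix, target in make_training_pairs(caption, max_len=4):
    print(prefix, "->", target)
```

Generating pairs lazily keeps memory use flat, which matters because each caption of n tokens produces n - 1 training examples.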

5. Evaluate

python src/evaluate.py --backbone both

Computes BLEU-1 to BLEU-4 on the test set. Prints VGG16 vs VGG19 comparison table.
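
For readers unfamiliar with the metric, here is a small unsmoothed sentence-level BLEU sketch in plain Python. It is meant to show what BLEU-1 through BLEU-4 measure, not to reproduce the project's evaluation, which according to the Tech Stack section uses NLTK; scores from NLTK can differ, for example when smoothing is enabled or when corpus-level BLEU is used.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def modified_precision(references, hypothesis, n):
    """Clipped n-gram precision: each hypothesis n-gram counts at most as
    often as it appears in the reference that contains it most."""
    hyp_counts = ngrams(hypothesis, n)
    if not hyp_counts:
        return 0.0
    max_ref = Counter()
    for ref in references:
        for gram, count in ngrams(ref, n).items():
            max_ref[gram] = max(max_ref[gram], count)
    clipped = sum(min(count, max_ref[gram]) for gram, count in hyp_counts.items())
    return clipped / sum(hyp_counts.values())


def bleu(references, hypothesis, max_n):
    """Unsmoothed sentence BLEU with uniform weights over 1..max_n grams."""
    precisions = [modified_precision(references, hypothesis, n)
                  for n in range(1, max_n + 1)]
    if min(precisions) == 0:
        return 0.0
    # Brevity penalty uses the reference length closest to the hypothesis length.
    hyp_len = len(hypothesis)
    ref_len = min((abs(len(r) - hyp_len), len(r)) for r in references)[1]
    penalty = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return penalty * math.exp(sum(math.log(p) for p in precisions) / max_n)


references = ["a dog runs on the grass".split(), "a dog is running on grass".split()]
hypothesis = "a dog runs on grass".split()
for n in range(1, 5):
    print(f"BLEU-{n}: {bleu(references, hypothesis, n):.4f}")
```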

6. Generate Captions

python src/inference.py --image path/to/image.jpg --backbone vgg16
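
The beam search used at inference time can be sketched as follows. This is an illustrative version over a toy bigram table, not the code in `src/inference.py`: `next_token_probs`, `TABLE`, and `toy_model` are stand-ins for the trained decoder, and no length normalization is applied.

```python
import math


def beam_search(next_token_probs, start, end, beam_width=5, max_len=20):
    """Return up to beam_width (tokens, log_prob) candidates, best first."""
    beams = [([start], 0.0)]
    finished = []
    for _ in range(max_len):
        candidates = []
        for tokens, score in beams:
            for token, prob in next_token_probs(tokens).items():
                if prob > 0:
                    candidates.append((tokens + [token], score + math.log(prob)))
        if not candidates:
            break
        candidates.sort(key=lambda item: item[1], reverse=True)
        beams = []
        for tokens, score in candidates[:beam_width]:
            # Completed captions leave the beam; the rest keep expanding.
            (finished if tokens[-1] == end else beams).append((tokens, score))
        if not beams:
            break
    finished.extend(beams)  # keep unfinished hypotheses if max_len was hit
    finished.sort(key=lambda item: item[1], reverse=True)
    return finished[:beam_width]


# Toy stand-in for the decoder: next-word probabilities given the last word.
TABLE = {
    "startseq": {"a": 0.9, "dog": 0.1},
    "a": {"dog": 0.6, "cat": 0.4},
    "dog": {"runs": 0.7, "endseq": 0.3},
    "cat": {"sits": 0.5, "endseq": 0.5},
    "runs": {"endseq": 1.0},
    "sits": {"endseq": 1.0},
}


def toy_model(tokens):
    return TABLE[tokens[-1]]


results = beam_search(toy_model, "startseq", "endseq", beam_width=3)
for tokens, log_prob in results:
    print(" ".join(tokens[1:-1]), round(math.exp(log_prob), 3))
```

Exponentiating each log-probability gives a value between 0 and 1, which is one plausible basis for per-caption confidence bars.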

7. Launch Web App

streamlit run app.py

📁 Project Structure

├── data/                    # Dataset & preprocessed files
├── models/                  # Trained model checkpoints (.h5)
├── outputs/                 # Loss plots, BLEU results
├── src/
│   ├── config.py            # Paths & hyperparameters
│   ├── preprocess.py        # Caption cleaning & tokenization
│   ├── extract_features.py  # VGG feature extraction
│   ├── model.py             # CNN-LSTM architecture
│   ├── train.py             # Training with data generator
│   ├── inference.py         # Greedy & beam search
│   ├── evaluate.py          # BLEU score evaluation
│   └── utils.py             # Shared utilities
├── app.py                   # Streamlit web app
├── requirements.txt         # Dependencies
└── README.md
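
As an illustration of what a central `config.py` might hold, the sketch below collects the directory names from the tree and the sizes quoted elsewhere in this README. The class, its field names, and the checkpoint naming scheme are assumptions, not the contents of the real `src/config.py`.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class Config:
    # Directories named in the project tree.
    data_dir: Path = Path("data")
    models_dir: Path = Path("models")
    outputs_dir: Path = Path("outputs")
    # Sizes taken from the architecture diagram.
    num_regions: int = 49
    feature_dim: int = 512
    embedding_dim: int = 256
    lstm_units: int = 512
    # Values from the commands and feature list in this README.
    epochs: int = 20
    beam_width: int = 5

    def checkpoint_path(self, backbone: str) -> Path:
        """Hypothetical naming scheme for a backbone's .h5 checkpoint."""
        return self.models_dir / f"{backbone}_caption_model.h5"


config = Config()
print(config.checkpoint_path("vgg16"))
```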

📊 Results

| Metric | VGG16 | VGG19 |
|--------|-------|-------|
| BLEU-1 | —     | —     |
| BLEU-2 | —     | —     |
| BLEU-3 | —     | —     |
| BLEU-4 | —     | —     |

Results will be populated after training and evaluation.

🛠️ Tech Stack

Deep Learning: TensorFlow / Keras
Feature Extraction: VGG16, VGG19 (ImageNet pretrained)
Text Processing: NLTK, Keras Tokenizer
Evaluation: NLTK BLEU
Web App: Streamlit
Dataset: Flickr8K (8,000 images, 5 captions each)

📄 License

This project is for educational purposes.