title: CaptionIQ
emoji: 🧠
colorFrom: indigo
colorTo: purple
sdk: streamlit
sdk_version: 1.42.0
python_version: '3.10'
app_file: app.py
pinned: false

🧠 CaptionIQ: AI Image Captioning

Generate natural-language captions for images using a VGG16/VGG19 encoder and an LSTM decoder with Bahdanau attention, trained on the Flickr8K dataset.


✨ Features

  • Dual CNN Backbones: VGG16 and VGG19 for spatial feature extraction (7×7×512)
  • Bahdanau Attention LSTM: Attends to specific image regions per word
  • Ensemble Mode (BLIP): High-quality captions from Salesforce BLIP model
  • Beam Search: Top-5 diverse captions with confidence bars
  • 🔥 Attention Heatmap: Interactive word-by-word gradient saliency overlay
  • ☁️ Word Cloud: Live word distribution from beam candidates
  • 🔄 Model Comparison: VGG16 vs VGG19 vs Ensemble side-by-side with 🏆 winner
  • 📋 Session History: Track all generated captions, export as JSON/CSV
  • 🎲 Surprise Me: Random Flickr8K image with one click
  • BLEU Evaluation: Per-image BLEU-1 through BLEU-4 scoring

🏗️ Architecture

Image → VGG16/19 block5_pool → (49 × 512) spatial map
                                      ↓
                          Bahdanau Attention
                                      ↓
Caption tokens → Embedding(256) → LSTM(512) → Softmax(vocab)
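
The attention step in the diagram can be sketched in NumPy. This is a minimal illustration of additive (Bahdanau) attention, not the code in `src/model.py`: the weight names `W1`, `W2`, `v`, the attention width of 64, and the random inputs are all assumptions made for the example.

```python
import numpy as np

def softmax(x):
    # Subtract the max before exponentiating for numerical stability.
    e = np.exp(x - np.max(x))
    return e / e.sum()

def bahdanau_attention(features, hidden, W1, W2, v):
    """Additive attention over image regions.

    features: (49, 512) region vectors, hidden: (512,) decoder state.
    W1: (512, a), W2: (512, a), v: (a,) are hypothetical learned weights.
    """
    # score_i = v . tanh(f_i W1 + h W2): one scalar per image region
    scores = np.tanh(features @ W1 + hidden @ W2) @ v
    weights = softmax(scores)       # (49,), non-negative, sums to 1
    context = weights @ features    # (512,), weighted sum of region vectors
    return context, weights

# Random stand-ins for a real feature map and LSTM state (assumption).
rng = np.random.default_rng(0)
features = rng.normal(size=(49, 512))
hidden = rng.normal(size=512)
W1 = rng.normal(size=(512, 64)) * 0.01
W2 = rng.normal(size=(512, 64)) * 0.01
v = rng.normal(size=64)

context, weights = bahdanau_attention(features, hidden, W1, W2, v)
print(context.shape, weights.shape)  # (512,) (49,)
```

Reshaping `weights` to 7×7 gives one value per spatial cell, which is the kind of grid an attention overlay can be drawn from.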

🚀 Quick Start

1. Install Dependencies

pip install -r requirements.txt

2. Preprocess Dataset

python src/preprocess.py

Downloads Flickr8K, cleans the captions, builds the vocabulary, and creates train/val/test splits.
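
A rough sketch of the caption cleaning and vocabulary steps follows. The exact rules in `src/preprocess.py` are not shown in this README, so the `startseq`/`endseq` markers, the removal of single-letter tokens, and the function names are assumptions borrowed from common Flickr8K pipelines.

```python
import re
from collections import Counter

def clean_caption(text):
    # Lowercase, replace anything that is not a letter or space,
    # and drop one-letter tokens (assumed cleaning rules).
    text = re.sub(r"[^a-z ]", " ", text.lower())
    words = [w for w in text.split() if len(w) > 1]
    return "startseq " + " ".join(words) + " endseq"

def build_vocab(captions, min_count=1):
    # Count words across all cleaned captions and keep the frequent ones.
    counts = Counter(w for c in captions for w in c.split())
    words = sorted(w for w, n in counts.items() if n >= min_count)
    # Index 0 is left free for padding.
    return {w: i for i, w in enumerate(words, start=1)}

cleaned = [clean_caption(c) for c in ["A dog runs, fast!", "The dog sits."]]
vocab = build_vocab(cleaned)
print(cleaned[0])  # startseq dog runs fast endseq
```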

3. Extract Features

python src/extract_features.py --backbone both

Extracts 7×7×512 block5_pool feature maps (49 × 512 per image) from VGG16 and VGG19 and saves them as .pkl files.
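
As a sketch of the storage format, assume each image's block5_pool output is flattened to the (49 × 512) spatial map described under Architecture and kept in a dict keyed by image filename. The dict layout, the key `example.jpg`, and the helper name are assumptions, not taken from `src/extract_features.py`.

```python
import pickle
import numpy as np

def to_region_matrix(feature_map):
    # (7, 7, 512) -> (49, 512): one 512-d vector per spatial location
    h, w, c = feature_map.shape
    return feature_map.reshape(h * w, c)

# Zero array standing in for a real block5_pool output (assumption).
fmap = np.zeros((7, 7, 512), dtype=np.float32)
features = {"example.jpg": to_region_matrix(fmap)}

# pickle.dumps produces the same bytes a .pkl file would contain.
blob = pickle.dumps(features)
restored = pickle.loads(blob)
print(restored["example.jpg"].shape)  # (49, 512)
```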

4. Train Models

python src/train.py --backbone both --epochs 20

Trains both VGG16 and VGG19 captioning models. Saves checkpoints and loss plots.
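
One common way to feed captions to a next-word decoder is to expand each caption into (prefix, next word) pairs. The sketch below assumes that scheme and left-padding with 0; whether the data generator in `src/train.py` works this way is not stated here, and `make_training_pairs` is a hypothetical helper.

```python
def make_training_pairs(token_ids, max_len):
    """Expand one tokenized caption into (padded prefix, next word) pairs."""
    pairs = []
    for i in range(1, len(token_ids)):
        prefix = token_ids[:i][-max_len:]                 # truncate long prefixes
        padded = [0] * (max_len - len(prefix)) + prefix   # left-pad with 0
        pairs.append((padded, token_ids[i]))
    return pairs

# Token ids for "startseq dog runs endseq" under a hypothetical vocabulary.
pairs = make_training_pairs([5, 1, 3, 2], max_len=4)
for prefix, target in pairs:
    print(prefix, "->", target)
# [0, 0, 0, 5] -> 1
# [0, 0, 5, 1] -> 3
# [0, 5, 1, 3] -> 2
```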

5. Evaluate

python src/evaluate.py --backbone both

Computes BLEU-1 to BLEU-4 on the test set. Prints VGG16 vs VGG19 comparison table.
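
To show what BLEU-1 to BLEU-4 measure, here is a simplified sentence-level BLEU in plain Python. The project itself uses NLTK's BLEU; this standalone version has no smoothing and is only meant to illustrate clipped n-gram precision and the brevity penalty.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def modified_precision(references, hypothesis, n):
    hyp_counts = Counter(ngrams(hypothesis, n))
    if not hyp_counts:
        return 0.0
    # Each n-gram is credited at most as often as it appears in any one reference.
    max_ref = Counter()
    for ref in references:
        for gram, count in Counter(ngrams(ref, n)).items():
            max_ref[gram] = max(max_ref[gram], count)
    clipped = sum(min(c, max_ref[g]) for g, c in hyp_counts.items())
    return clipped / sum(hyp_counts.values())

def bleu(references, hypothesis, max_n):
    """Cumulative BLEU-max_n with uniform weights and no smoothing."""
    precisions = [modified_precision(references, hypothesis, n)
                  for n in range(1, max_n + 1)]
    if min(precisions) == 0:
        return 0.0
    # Brevity penalty uses the reference length closest to the hypothesis length.
    ref_len = min((len(r) for r in references),
                  key=lambda l: (abs(l - len(hypothesis)), l))
    bp = 1.0 if len(hypothesis) > ref_len else math.exp(1 - ref_len / len(hypothesis))
    return bp * math.exp(sum(math.log(p) for p in precisions) / max_n)

refs = [["a", "dog", "runs", "on", "grass"]]
hyp = ["a", "dog", "runs", "on", "sand"]
for n in range(1, 5):
    print(f"BLEU-{n}: {bleu(refs, hyp, n):.3f}")
```

With one wrong word out of five, BLEU-1 is 4/5 and BLEU-2 is the geometric mean of 4/5 and 3/4; higher orders drop further because fewer long n-grams match.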

6. Generate Captions

python src/inference.py --image path/to/image.jpg --backbone vgg16
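
The project structure lists greedy and beam search for inference. A minimal beam search sketch follows, with a toy lookup table standing in for the trained decoder; `step_fn`, `TABLE`, and the probabilities are hypothetical and only illustrate the search itself.

```python
import math

def beam_search(step_fn, start, end, beam_width=5, max_len=20):
    """step_fn(sequence) returns {token: probability} for the next token."""
    beams = [([start], 0.0)]  # (sequence, summed log-probability)
    for _ in range(max_len):
        candidates = []
        for seq, score in beams:
            if seq[-1] == end:  # finished captions carry over unchanged
                candidates.append((seq, score))
                continue
            for token, prob in step_fn(seq).items():
                if prob > 0:
                    candidates.append((seq + [token], score + math.log(prob)))
        # Keep only the beam_width highest-scoring sequences.
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_width]
        if all(seq[-1] == end for seq, _ in beams):
            break
    return beams

# Toy next-word distribution keyed on the last token (assumption).
TABLE = {
    "startseq": {"a": 0.6, "the": 0.4},
    "a": {"dog": 0.5, "cat": 0.5},
    "the": {"dog": 0.9, "cat": 0.1},
    "dog": {"endseq": 1.0},
    "cat": {"endseq": 1.0},
}

def toy_step(seq):
    return TABLE[seq[-1]]

results = beam_search(toy_step, "startseq", "endseq", beam_width=2)
for seq, score in results:
    print(" ".join(seq), round(math.exp(score), 2))
```

In this toy table a greedy decoder would pick "a" first (0.6) and end with probability 0.30, while the beam keeps "the" alive and finds "the dog" at 0.36.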

7. Launch Web App

streamlit run app.py

📁 Project Structure

├── data/                    # Dataset & preprocessed files
├── models/                  # Trained model checkpoints (.h5)
├── outputs/                 # Loss plots, BLEU results
├── src/
│   ├── config.py            # Paths & hyperparameters
│   ├── preprocess.py        # Caption cleaning & tokenization
│   ├── extract_features.py  # VGG feature extraction
│   ├── model.py             # CNN-LSTM architecture
│   ├── train.py             # Training with data generator
│   ├── inference.py         # Greedy & beam search
│   ├── evaluate.py          # BLEU score evaluation
│   └── utils.py             # Shared utilities
├── app.py                   # Streamlit web app
├── requirements.txt         # Dependencies
└── README.md

📊 Results

| Metric | VGG16 | VGG19 |
|--------|-------|-------|
| BLEU-1 | TBD   | TBD   |
| BLEU-2 | TBD   | TBD   |
| BLEU-3 | TBD   | TBD   |
| BLEU-4 | TBD   | TBD   |

Results will be populated after training and evaluation.


🛠️ Tech Stack

  • Deep Learning: TensorFlow / Keras
  • Feature Extraction: VGG16, VGG19 (ImageNet pretrained)
  • Text Processing: NLTK, Keras Tokenizer
  • Evaluation: NLTK BLEU
  • Web App: Streamlit
  • Dataset: Flickr8K (about 8,000 images, 5 captions each)

📄 License

This project is for educational purposes.