---
title: CaptionIQ
emoji: 🧠
colorFrom: indigo
colorTo: purple
sdk: streamlit
sdk_version: 1.42.0
python_version: '3.10'
app_file: app.py
pinned: false
---
🧠 CaptionIQ: AI Image Captioning
Generate natural-language captions for images using a VGG16/VGG19 encoder and a Bahdanau-attention LSTM decoder, trained on the Flickr8K dataset.
✨ Features
- Dual CNN Backbones: VGG16 and VGG19 for spatial feature extraction (7×7×512)
- Bahdanau Attention LSTM: attends to specific image regions for each generated word
- Ensemble Mode (BLIP): high-quality captions from the Salesforce BLIP model
- Beam Search: top-5 diverse captions with confidence bars
- 🔥 Attention Heatmap: interactive word-by-word gradient saliency overlay
- ☁️ Word Cloud: live word distribution from the beam candidates
- Model Comparison: VGG16 vs VGG19 vs Ensemble side by side, with the winner highlighted
- Session History: track all generated captions and export them as JSON/CSV
- 🎲 Surprise Me: a random Flickr8K image with one click
- BLEU Evaluation: per-image BLEU-1 through BLEU-4 scoring
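The JSON/CSV export of the session history needs only the standard library. Below is a minimal sketch; the `CaptionRecord` fields and the `export_json`/`export_csv` helpers are hypothetical names for illustration, not the app's actual history schema.

```python
import csv
import io
import json
from dataclasses import asdict, dataclass, fields


@dataclass
class CaptionRecord:
    # Hypothetical history entry; the app's real fields may differ.
    image_name: str
    backbone: str
    caption: str


def export_json(history: list[CaptionRecord]) -> str:
    """Serialize the session history as a pretty-printed JSON array."""
    return json.dumps([asdict(r) for r in history], indent=2)


def export_csv(history: list[CaptionRecord]) -> str:
    """Serialize the session history as CSV text with a header row."""
    buffer = io.StringIO()
    columns = [f.name for f in fields(CaptionRecord)]
    writer = csv.DictWriter(buffer, fieldnames=columns)
    writer.writeheader()
    for record in history:
        writer.writerow(asdict(record))  # captions containing commas are quoted
    return buffer.getvalue()
```

Both functions return plain strings, so either result could be passed as the `data` argument of Streamlit's `st.download_button`.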
🏗️ Architecture
Image → VGG16/19 block5_pool → (49 × 512) spatial map
↓
Bahdanau Attention
↓
Caption tokens → Embedding(256) → LSTM(512) → Softmax(vocab)
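The attention step in the diagram uses additive (Bahdanau) scoring: each of the 49 regions is scored against the decoder's previous hidden state, the scores are softmax-normalized, and the weighted sum of region features becomes a context vector, which is typically fed to the LSTM together with the word embedding. Below is a minimal NumPy sketch of one decoding step. The weight names `W1`, `W2`, `v` and the attention width are illustrative assumptions, not taken from `src/model.py`.

```python
import numpy as np


def bahdanau_attention(features, hidden, W1, W2, v):
    """Additive (Bahdanau) attention over spatial image regions.

    features: (49, 512) array, the 7x7x512 block5_pool map reshaped to 49 regions
    hidden:   (512,) previous LSTM hidden state
    W1, W2:   (512, units) projection matrices; v: (units,) scoring vector
    Returns the context vector (512,) and the attention weights (49,).
    """
    # Score each region i: e_i = v^T tanh(W1 f_i + W2 h)
    scores = np.tanh(features @ W1 + hidden @ W2) @ v
    # Softmax over the 49 regions (shift by the max for numerical stability)
    exp_scores = np.exp(scores - scores.max())
    weights = exp_scores / exp_scores.sum()
    # Context vector: attention-weighted sum of the region features
    context = weights @ features
    return context, weights
```

Reshaped to 7 × 7, the 49 weights form a coarse per-word map that can be upsampled over the input image.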
Quick Start
1. Install Dependencies
pip install -r requirements.txt
2. Preprocess Dataset
python src/preprocess.py
Downloads Flickr8K, cleans the captions, builds the vocabulary, and creates the train/val/test splits.
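The cleaning and vocabulary steps can be illustrated with a short sketch. The specific rules here (lowercasing, keeping alphabetic words only, dropping one-letter words other than "a", and wrapping captions in `startseq`/`endseq` markers) are assumptions borrowed from common Flickr8K pipelines; `src/preprocess.py` may differ. The Keras `Tokenizer` listed in the Tech Stack could fill the same role as `build_vocab`.

```python
import re
from collections import Counter


def clean_caption(caption: str) -> str:
    """Lowercase, keep alphabetic words, and add start/end markers (assumed rules)."""
    words = re.findall(r"[a-z]+", caption.lower())
    # Drop one-letter fragments such as the "s" from "child's", but keep "a"
    words = [w for w in words if len(w) > 1 or w == "a"]
    return "startseq " + " ".join(words) + " endseq"


def build_vocab(captions: list[str], min_count: int = 1) -> dict[str, int]:
    """Map each word seen at least min_count times to an id; 0 is reserved for padding."""
    counts = Counter(w for c in captions for w in c.split())
    kept = sorted(w for w, n in counts.items() if n >= min_count)
    return {w: i + 1 for i, w in enumerate(kept)}
```

Raising `min_count` trims rare words, which shrinks the decoder's softmax layer.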
3. Extract Features
python src/extract_features.py --backbone both
Extracts 4096-d features from VGG16 and VGG19 (saved as .pkl).
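Step 3 describes 4096-d vectors, which is the width of VGG's fully connected fc1/fc2 layers, whereas the attention decoder in the Architecture section reads the 7×7×512 block5_pool map. The sketch below assumes the spatial block5_pool variant: it flattens the map into 49 region vectors and pickles an `{image_id: array}` dict. The helper names are hypothetical, and the Keras extractor appears only as a comment so the sketch runs without TensorFlow.

```python
import pickle

import numpy as np

# With TensorFlow installed, a block5_pool extractor could be built like this:
#   from tensorflow.keras.applications import VGG16
#   from tensorflow.keras.models import Model
#   base = VGG16(weights="imagenet")
#   extractor = Model(inputs=base.input, outputs=base.get_layer("block5_pool").output)
#   feature_map = extractor.predict(preprocessed_batch)  # (1, 7, 7, 512) for one 224x224 image


def to_regions(feature_map: np.ndarray) -> np.ndarray:
    """Flatten a (1, 7, 7, 512) block5_pool output into (49, 512) region vectors."""
    batch, height, width, channels = feature_map.shape
    if batch != 1:
        raise ValueError("expected the output for a single image")
    # Row-major reshape: region index = row * width + col
    return feature_map.reshape(height * width, channels)


def save_features(features: dict[str, np.ndarray], path: str) -> None:
    """Pickle an {image_id: (49, 512) array} dict, e.g. one file per backbone."""
    with open(path, "wb") as fh:
        pickle.dump(features, fh)


def load_features(path: str) -> dict[str, np.ndarray]:
    """Load a dict previously written by save_features."""
    with open(path, "rb") as fh:
        return pickle.load(fh)
```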
4. Train Models
python src/train.py --backbone both --epochs 20
Trains both VGG16 and VGG19 captioning models. Saves checkpoints and loss plots.
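`train.py` is described as training with a data generator. A common pattern for caption models, assumed here rather than confirmed from `src/train.py`, is teacher forcing: each encoded caption is split into (prefix, next word) pairs, with prefixes left-padded to a fixed length the way Keras `pad_sequences` pads by default. The names `caption_to_pairs` and `data_generator` below are hypothetical.

```python
import numpy as np


def caption_to_pairs(seq: list[int], max_len: int, vocab_size: int):
    """Split one encoded caption into (prefix, next-word) teacher-forcing pairs.

    Prefixes are left-padded with 0 to max_len (like Keras pad_sequences'
    default padding="pre"); targets are one-hot vectors of length vocab_size.
    """
    prefixes, targets = [], []
    for i in range(1, len(seq)):
        prefix = seq[:i][-max_len:]  # keep the most recent max_len tokens
        prefixes.append([0] * (max_len - len(prefix)) + prefix)
        one_hot = np.zeros(vocab_size, dtype=np.float32)
        one_hot[seq[i]] = 1.0
        targets.append(one_hot)
    X = np.array(prefixes, dtype=np.int32).reshape(-1, max_len)
    y = np.array(targets, dtype=np.float32).reshape(-1, vocab_size)
    return X, y


def data_generator(captions: dict, features: dict, max_len: int, vocab_size: int,
                   images_per_batch: int = 4):
    """Yield ((image_features, word_prefixes), next_words) batches forever.

    captions: {image_id: [encoded caption, ...]}; features: {image_id: (49, 512) array}.
    """
    ids = list(captions)
    while True:  # Keras-style generators loop over the data indefinitely
        for start in range(0, len(ids), images_per_batch):
            imgs, seqs, tgts = [], [], []
            for image_id in ids[start:start + images_per_batch]:
                for seq in captions[image_id]:
                    X, y = caption_to_pairs(seq, max_len, vocab_size)
                    # Repeat the image's features once per (prefix, target) pair
                    imgs.append(np.repeat(features[image_id][None], len(X), axis=0))
                    seqs.append(X)
                    tgts.append(y)
            yield (np.concatenate(imgs), np.concatenate(seqs)), np.concatenate(tgts)
```

Generating batches lazily keeps memory bounded; with a large vocabulary, integer targets and `sparse_categorical_crossentropy` would avoid materializing the one-hot vectors.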
5. Evaluate
python src/evaluate.py --backbone both
Computes BLEU-1 to BLEU-4 on the test set. Prints VGG16 vs VGG19 comparison table.
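BLEU-n here is the cumulative score: the geometric mean of clipped 1- to n-gram precisions, multiplied by a brevity penalty. The pure-Python sketch below computes corpus-level BLEU with uniform weights and no smoothing, and is intended to match NLTK's `corpus_bleu` with weights such as `(0.25, 0.25, 0.25, 0.25)` in that unsmoothed setting. The tokenization, smoothing, and corpus- versus sentence-level choice in `src/evaluate.py` may differ.

```python
import math
from collections import Counter


def ngram_counts(tokens: list[str], n: int) -> Counter:
    """Count the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(references: list[list[list[str]]], hypotheses: list[list[str]],
                max_n: int = 4) -> float:
    """Cumulative corpus BLEU-max_n with uniform weights and no smoothing.

    references: for each image, a list of tokenized reference captions
    hypotheses: one tokenized generated caption per image
    """
    clipped = [0] * max_n  # matched n-gram counts, clipped by reference counts
    totals = [0] * max_n   # n-gram counts in the hypotheses
    hyp_len = ref_len = 0
    for refs, hyp in zip(references, hypotheses):
        hyp_len += len(hyp)
        # Effective reference length: closest length, ties broken toward shorter
        ref_len += min((abs(len(r) - len(hyp)), len(r)) for r in refs)[1]
        for n in range(1, max_n + 1):
            hyp_counts = ngram_counts(hyp, n)
            max_ref_counts = Counter()
            for r in refs:
                max_ref_counts |= ngram_counts(r, n)  # element-wise max over references
            clipped[n - 1] += sum(min(c, max_ref_counts[g]) for g, c in hyp_counts.items())
            totals[n - 1] += sum(hyp_counts.values())
    if min(clipped) == 0:
        return 0.0  # NLTK's unsmoothed default warns and returns an effectively-zero score
    log_precision = sum(math.log(c / t) for c, t in zip(clipped, totals)) / max_n
    brevity_penalty = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return brevity_penalty * math.exp(log_precision)
```

Calling it with `max_n` set to 1, 2, 3, and 4 gives the four rows of the Results table for one backbone.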
6. Generate Captions
python src/inference.py --image path/to/image.jpg --backbone vgg16
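Beam search can be sketched independently of the model. This minimal version assumes a hypothetical `next_word_probs(tokens)` callback that returns a word-to-probability dict for the next position (in the app this would wrap the trained decoder conditioned on one image's features); it is not the code in `src/inference.py`. Scores are summed log-probabilities, and `beam_width=1` reduces it to greedy decoding.

```python
import math


def beam_search(next_word_probs, start: str, end: str, beam_width: int = 5,
                max_len: int = 20) -> list[tuple[list[str], float]]:
    """Return up to beam_width (tokens, log_probability) pairs, best first."""
    beams = [([start], 0.0)]   # partial captions still being extended
    finished = []              # captions that have emitted the end token
    for _ in range(max_len):
        candidates = []
        for tokens, score in beams:
            for word, p in next_word_probs(tokens).items():
                if p > 0:
                    # Sum log-probabilities instead of multiplying probabilities
                    candidates.append((tokens + [word], score + math.log(p)))
        # Keep only the beam_width highest-scoring expansions
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for tokens, score in candidates[:beam_width]:
            (finished if tokens[-1] == end else beams).append((tokens, score))
        if not beams:
            break
    finished.extend(beams)  # unfinished beams when max_len is reached
    finished.sort(key=lambda c: c[1], reverse=True)
    return finished[:beam_width]
```

Raw log-probability favors short captions; dividing each score by its length is a common adjustment, and `math.exp(score)` turns a score back into a sequence probability suitable for a confidence bar.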
7. Launch Web App
streamlit run app.py
Project Structure
├── data/ # Dataset & preprocessed files
├── models/ # Trained model checkpoints (.h5)
├── outputs/ # Loss plots, BLEU results
├── src/
│   ├── config.py # Paths & hyperparameters
│   ├── preprocess.py # Caption cleaning & tokenization
│   ├── extract_features.py # VGG feature extraction
│   ├── model.py # CNN-LSTM architecture
│   ├── train.py # Training with data generator
│   ├── inference.py # Greedy & beam search
│   ├── evaluate.py # BLEU score evaluation
│   └── utils.py # Shared utilities
├── app.py # Streamlit web app
├── requirements.txt # Dependencies
└── README.md
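`config.py` is described only as holding paths and hyperparameters. A minimal sketch of such a module, using the numbers stated elsewhere in this README (Embedding(256), LSTM(512), 49 × 512 features, the `--epochs 20` from step 4, top-5 beams) and the directory names from the tree; the dataclass layout and the checkpoint file name are hypothetical placeholders.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class Config:
    # Hyperparameters taken from this README
    embedding_dim: int = 256
    lstm_units: int = 512
    num_regions: int = 49       # 7 x 7 block5_pool grid
    feature_dim: int = 512      # channels per region
    epochs: int = 20
    beam_width: int = 5
    # Directory names from the project tree
    data_dir: Path = Path("data")
    models_dir: Path = Path("models")
    outputs_dir: Path = Path("outputs")

    def checkpoint_path(self, backbone: str) -> Path:
        """Hypothetical naming scheme for a backbone's .h5 checkpoint."""
        return self.models_dir / f"caption_{backbone}.h5"
```

A frozen dataclass keeps the settings immutable at runtime while still allowing overrides such as `Config(epochs=5)` for quick experiments.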
Results
| Metric | VGG16 | VGG19 |
|---|---|---|
| BLEU-1 | – | – |
| BLEU-2 | – | – |
| BLEU-3 | – | – |
| BLEU-4 | – | – |
Results will be populated after training and evaluation.
🛠️ Tech Stack
- Deep Learning: TensorFlow / Keras
- Feature Extraction: VGG16, VGG19 (ImageNet pretrained)
- Text Processing: NLTK, Keras Tokenizer
- Evaluation: NLTK BLEU
- Web App: Streamlit
- Dataset: Flickr8K (8,000 images, 5 captions each)
License
This project is for educational purposes.