---
title: CaptionIQ
emoji: 🧠
colorFrom: indigo
colorTo: purple
sdk: streamlit
sdk_version: 1.42.0
python_version: "3.10"
app_file: app.py
pinned: false
---
# 🧠 CaptionIQ: AI Image Captioning
> Generate natural-language captions for images using VGG16/VGG19 features and a Bahdanau-attention LSTM decoder trained on the Flickr8K dataset.
---
## ✨ Features
- **Dual CNN Backbones**: VGG16 and VGG19 for spatial feature extraction (7×7×512)
- **Bahdanau Attention LSTM**: Attends to different image regions as each word is generated
- **Ensemble Mode (BLIP)**: High-quality captions from the Salesforce BLIP model
- **Beam Search**: Top-5 diverse captions with confidence bars
- **🔥 Attention Heatmap**: Interactive word-by-word gradient saliency overlay
- **☁️ Word Cloud**: Live word distribution from beam candidates
- **🔄 Model Comparison**: VGG16 vs VGG19 vs Ensemble side by side, with a 🏆 winner
- **📋 Session History**: Track all generated captions and export them as JSON/CSV
- **🎲 Surprise Me**: A random Flickr8K image with one click
- **BLEU Evaluation**: Per-image BLEU-1 through BLEU-4 scoring
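The confidence bars and the word cloud both start from a list of beam candidates. The sketch below shows one plausible way to derive them, assuming each candidate is a `(caption, cumulative log-probability)` pair. That data shape, the softmax normalization across beams, and the small stopword list are illustrative assumptions, not the app's actual code.

```python
import math
from collections import Counter

# Assumption: beam search returns (caption, cumulative log-probability) pairs.
STOPWORDS = frozenset({"a", "an", "the", "in", "on", "of", "is", "and"})


def beam_confidences(candidates):
    """Softmax over the beams' log-probabilities, returned as percentages summing to 100."""
    scores = [score for _, score in candidates]
    top = max(scores)
    # Subtracting the max before exp() avoids underflow for very negative scores.
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [(caption, 100.0 * w / total) for (caption, _), w in zip(candidates, weights)]


def word_distribution(candidates, stopwords=STOPWORDS):
    """Count non-stopword tokens across all beam candidates (word-cloud input)."""
    counts = Counter()
    for caption, _ in candidates:
        counts.update(w for w in caption.lower().split() if w not in stopwords)
    return counts


if __name__ == "__main__":
    beams = [
        ("a dog runs on the grass", -2.1),
        ("a brown dog runs through the grass", -2.8),
        ("a dog plays in the park", -3.5),
    ]
    for caption, pct in beam_confidences(beams):
        print(f"{pct:5.1f}%  {caption}")
    print(word_distribution(beams).most_common(3))
```

Normalizing only across the returned beams makes the bars relative to each other; they are not calibrated probabilities of the caption being correct.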
---
## 🏗️ Architecture
```
Image → VGG16/19 block5_pool → (49 × 512) spatial map
                                     ↓
                            Bahdanau Attention
                                     ↓
Caption tokens → Embedding(256) → LSTM(512) → Softmax(vocab)
```
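The attention step in the diagram can be illustrated with the standard additive (Bahdanau) formulation: each of the 49 region vectors `h_i` is scored against the previous decoder state `s_{t-1}` as `e_i = vᵀ tanh(W_enc·h_i + W_dec·s_{t-1})`, the scores are softmax-normalized into weights, and the context vector is the weighted sum of the regions. This plain-Python sketch follows that textbook formula; the weight names, the attention size `A`, and the shrunken demo dimensions are assumptions, not taken from `src/model.py`.

```python
import math
import random


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, vector)) for row in matrix]


def softmax(scores):
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def bahdanau_attention(features, hidden, w_enc, w_dec, v):
    """Additive attention over image regions.

    features: L region vectors of size D (L = 49, D = 512 for a 7x7x512 block5_pool map)
    hidden:   previous decoder state s_{t-1}, size H
    w_enc:    A x D matrix, w_dec: A x H matrix, v: length-A vector (A = attention units)
    Returns (context vector of size D, attention weights of length L).
    """
    dec_proj = matvec(w_dec, hidden)
    scores = []
    for region in features:
        enc_proj = matvec(w_enc, region)
        # e_i = v^T tanh(W_enc h_i + W_dec s_{t-1})
        scores.append(sum(vj * math.tanh(a + b) for vj, a, b in zip(v, enc_proj, dec_proj)))
    alphas = softmax(scores)
    # Context vector: attention-weighted sum of the region vectors.
    context = [sum(a * region[d] for a, region in zip(alphas, features))
               for d in range(len(features[0]))]
    return context, alphas


def random_matrix(rng, rows, cols):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


if __name__ == "__main__":
    rng = random.Random(0)
    L, D, H, A = 49, 8, 6, 4  # D and H shrunk from 512 to keep the demo fast
    features = random_matrix(rng, L, D)
    hidden = [rng.uniform(-1, 1) for _ in range(H)]
    context, alphas = bahdanau_attention(
        features, hidden, random_matrix(rng, A, D), random_matrix(rng, A, H), [1.0] * A)
    best = max(range(L), key=alphas.__getitem__)
    # Regions are flattened row-major, so index // 7 is the grid row.
    print(f"most attended region: row {best // 7}, col {best % 7}, weight {alphas[best]:.4f}")
```

The 49 weights per generated word are also what an attention heatmap would reshape back onto the 7×7 grid.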
---
## 🚀 Quick Start
### 1. Install Dependencies
```bash
pip install -r requirements.txt
```
### 2. Preprocess Dataset
```bash
python src/preprocess.py
```
Downloads Flickr8K, cleans captions, builds vocabulary, creates train/val/test splits.
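As a rough illustration of the cleaning, vocabulary, and split steps, the sketch below lowercases captions, replaces non-letters with spaces, wraps each caption in boundary tokens, keeps words above a frequency threshold, and makes a seeded split by image. The `startseq`/`endseq` token names, the `min_count` threshold, and the split fractions are hypothetical; the real `src/preprocess.py` may differ (for example, by using the official Flickr8K split files).

```python
import random
import re
from collections import Counter


def clean_caption(caption):
    """Lowercase, replace non-letters with spaces, and add hypothetical boundary tokens."""
    words = re.sub(r"[^a-z ]", " ", caption.lower()).split()
    return " ".join(["startseq", *words, "endseq"])


def build_vocab(captions, min_count=1):
    """Sorted list of words that occur at least `min_count` times across cleaned captions."""
    counts = Counter(word for caption in captions for word in caption.split())
    return sorted(word for word, n in counts.items() if n >= min_count)


def split_ids(image_ids, val_frac=0.1, test_frac=0.1, seed=42):
    """Shuffle image IDs with a fixed seed and cut them into train/val/test lists.

    Splitting by image (not by caption) keeps all captions of one image in the same split.
    """
    ids = sorted(image_ids)
    random.Random(seed).shuffle(ids)
    n_test = int(len(ids) * test_frac)
    n_val = int(len(ids) * val_frac)
    test, val, train = ids[:n_test], ids[n_test:n_test + n_val], ids[n_test + n_val:]
    return train, val, test


if __name__ == "__main__":
    raw = ["A black dog is running in the grass.", "Two kids play soccer!"]
    cleaned = [clean_caption(c) for c in raw]
    print(cleaned)
    print(build_vocab(cleaned))
```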
### 3. Extract Features
```bash
python src/extract_features.py --backbone both
```
Extracts `block5_pool` spatial features (7×7×512, i.e. 49 regions × 512 channels) from VGG16 and VGG19 and saves them as `.pkl` files.
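For the spatial features used by the attention decoder, Keras' `VGG16(weights="imagenet", include_top=False)` (and likewise `VGG19`) ends at `block5_pool`, which yields a 7 × 7 × 512 map for a 224 × 224 input; with NumPy, `fmap.reshape(49, 512)` turns it into 49 region vectors. The plain-Python sketch below mirrors that row-major flattening plus a pickle round trip so it runs without TensorFlow. The `{image_id: regions}` layout of the `.pkl` file and the file name are assumptions.

```python
import os
import pickle
import tempfile


def flatten_spatial(feature_map):
    """Flatten a (7, 7, 512) nested list into 49 region vectors, row-major like numpy.reshape."""
    return [list(cell) for row in feature_map for cell in row]


def save_features(features, path):
    """Pickle a {image_id: regions} dict (hypothetical layout)."""
    with open(path, "wb") as f:
        pickle.dump(features, f)


def load_features(path):
    with open(path, "rb") as f:
        return pickle.load(f)


if __name__ == "__main__":
    # Fake block5_pool output: each value encodes (row, col, channel) so the ordering is visible.
    fmap = [[[r * 10_000 + c * 1_000 + ch for ch in range(512)] for c in range(7)] for r in range(7)]
    features = {"example_image": flatten_spatial(fmap)}  # hypothetical image ID
    path = os.path.join(tempfile.mkdtemp(), "features_vgg16.pkl")  # hypothetical file name
    save_features(features, path)
    regions = load_features(path)["example_image"]
    print(len(regions), len(regions[0]))
```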
### 4. Train Models
```bash
python src/train.py --backbone both --epochs 20
```
Trains both VGG16 and VGG19 captioning models. Saves checkpoints and loss plots.
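One common way to feed a word-level captioning decoder is teacher forcing with prefix expansion: each encoded caption becomes several `(image, prefix, next word)` examples. The generator below sketches that idea in plain Python, left-padding and truncating from the front like the Keras `pad_sequences` defaults (`padding="pre"`, `truncating="pre"`). Whether `src/train.py` builds its batches this way is an assumption, and the function names and token IDs are hypothetical.

```python
import itertools


def caption_to_pairs(token_ids, max_len, pad_id=0):
    """Expand one encoded caption into (left-padded prefix, next token) pairs."""
    pairs = []
    for i in range(1, len(token_ids)):
        prefix = token_ids[:i][-max_len:]                     # keep the most recent max_len tokens
        padded = [pad_id] * (max_len - len(prefix)) + prefix  # pad on the left
        pairs.append((padded, token_ids[i]))
    return pairs


def data_generator(captions_by_image, max_len, batch_size):
    """Endlessly yield batches of (image_id, prefix, target) triples, one epoch after another."""
    while True:
        batch = []
        for image_id, captions in captions_by_image.items():
            for token_ids in captions:
                for prefix, target in caption_to_pairs(token_ids, max_len):
                    batch.append((image_id, prefix, target))
                    if len(batch) == batch_size:
                        yield batch
                        batch = []
        if batch:  # flush the last partial batch of the epoch
            yield batch


if __name__ == "__main__":
    # Hypothetical IDs: 1 = startseq, 2 = endseq, other values are words.
    data = {"img1": [[1, 5, 6, 2]], "img2": [[1, 7, 2]]}
    for batch in itertools.islice(data_generator(data, max_len=3, batch_size=2), 3):
        print(batch)
```

In a real pipeline the `image_id` would be swapped for its cached features and the targets one-hot encoded or fed to a sparse cross-entropy loss.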
### 5. Evaluate
```bash
python src/evaluate.py --backbone both
```
Computes BLEU-1 through BLEU-4 on the test set and prints a VGG16 vs VGG19 comparison table.
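To make the BLEU numbers concrete, here is a small plain-Python version of cumulative corpus-level BLEU-N: clipped n-gram precision, a closest-reference brevity penalty, uniform weights, and no smoothing. The project's stack uses NLTK, whose `nltk.translate.bleu_score.corpus_bleu` is typically called with `weights=(1, 0, 0, 0)` for BLEU-1 up to `(0.25, 0.25, 0.25, 0.25)` for BLEU-4. This sketch is meant to explain that computation, not to replace `src/evaluate.py`.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(references, hypotheses, max_n=4):
    """Cumulative corpus BLEU-max_n with uniform weights and no smoothing.

    references: one list of reference token lists per image (Flickr8K has 5 per image)
    hypotheses: one generated token list per image
    """
    clipped, totals = [0] * max_n, [0] * max_n
    hyp_len = ref_len = 0
    for refs, hyp in zip(references, hypotheses):
        hyp_len += len(hyp)
        # Closest reference length; ties go to the shorter reference.
        ref_len += min((abs(len(r) - len(hyp)), len(r)) for r in refs)[1]
        for n in range(1, max_n + 1):
            hyp_counts = ngrams(hyp, n)
            max_ref = Counter()
            for r in refs:
                max_ref |= ngrams(r, n)  # per-n-gram max count over the references
            clipped[n - 1] += sum(min(c, max_ref[g]) for g, c in hyp_counts.items())
            totals[n - 1] += sum(hyp_counts.values())
    if hyp_len == 0 or min(clipped) == 0:
        return 0.0  # some n-gram order has no match, so the geometric mean is 0
    log_precision = sum(math.log(c / t) for c, t in zip(clipped, totals)) / max_n
    brevity = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return brevity * math.exp(log_precision)


if __name__ == "__main__":
    refs = [["a dog runs across the grass".split(), "a brown dog is running outside".split()]]
    hyp = ["a dog runs on the grass".split()]
    for n in range(1, 5):
        print(f"BLEU-{n}: {corpus_bleu(refs, hyp, max_n=n):.4f}")
```

The demo shows why BLEU-4 is often 0 for a single short caption without smoothing: one missing 4-gram match zeroes the whole score.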
### 6. Generate Captions
```bash
python src/inference.py --image path/to/image.jpg --backbone vgg16
```
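Beam search keeps the `beam_width` highest-scoring partial captions at each step instead of committing to the single best word as greedy decoding does. The sketch below is model-agnostic: `next_log_probs` is a hypothetical callback standing in for the decoder (image features plus attention) that returns `{token_id: log_prob}` for the next position. Scores are summed log-probabilities without length normalization, which is a simplifying assumption; `src/inference.py` may differ.

```python
import math


def beam_search(next_log_probs, start_id, end_id, beam_width=5, max_len=20):
    """Return up to beam_width (token_ids, log_prob) hypotheses, best first."""
    beams = [([start_id], 0.0)]
    finished = []
    for _ in range(max_len):
        # Expand every active hypothesis by every possible next token.
        candidates = [
            (tokens + [tok], score + lp)
            for tokens, score in beams
            for tok, lp in next_log_probs(tokens).items()
        ]
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for tokens, score in candidates[:beam_width]:
            if tokens[-1] == end_id:
                finished.append((tokens, score))  # complete caption leaves the active beam
            else:
                beams.append((tokens, score))
        if not beams:
            break
    finished.extend(beams)  # hypotheses still open when max_len is reached
    return sorted(finished, key=lambda c: c[1], reverse=True)[:beam_width]


# Toy next-token table: 0 = startseq, 1 = endseq, 2 = "dog", 3 = "cat" (hypothetical IDs).
TOY = {
    (0,): {2: math.log(0.6), 3: math.log(0.4)},
    (0, 2): {1: math.log(0.9), 3: math.log(0.1)},
    (0, 3): {1: math.log(0.5), 2: math.log(0.5)},
}


def toy_model(prefix):
    return TOY.get(tuple(prefix), {1: 0.0})  # any other prefix: end the caption


if __name__ == "__main__":
    for tokens, score in beam_search(toy_model, start_id=0, end_id=1, beam_width=2):
        print(tokens, round(math.exp(score), 3))
```

Mapping token IDs back to words through the tokenizer is left out here; with `beam_width=1` the same function reduces to greedy decoding.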
### 7. Launch Web App
```bash
streamlit run app.py
```
---
## 📁 Project Structure
```
├── data/                   # Dataset & preprocessed files
├── models/                 # Trained model checkpoints (.h5)
├── outputs/                # Loss plots, BLEU results
├── src/
│   ├── config.py           # Paths & hyperparameters
│   ├── preprocess.py       # Caption cleaning & tokenization
│   ├── extract_features.py # VGG feature extraction
│   ├── model.py            # CNN-LSTM architecture
│   ├── train.py            # Training with data generator
│   ├── inference.py        # Greedy & beam search
│   ├── evaluate.py         # BLEU score evaluation
│   └── utils.py            # Shared utilities
├── app.py                  # Streamlit web app
├── requirements.txt        # Dependencies
└── README.md
```
---
## 📊 Results
| Metric | VGG16 | VGG19 |
|--------|-------|-------|
| BLEU-1 | TBD   | TBD   |
| BLEU-2 | TBD   | TBD   |
| BLEU-3 | TBD   | TBD   |
| BLEU-4 | TBD   | TBD   |
> Results will be populated after training and evaluation.
---
## 🛠️ Tech Stack
- **Deep Learning**: TensorFlow / Keras
- **Feature Extraction**: VGG16, VGG19 (ImageNet pretrained)
- **Text Processing**: NLTK, Keras Tokenizer
- **Evaluation**: NLTK BLEU
- **Web App**: Streamlit
- **Dataset**: Flickr8K (8,000 images, 5 captions each)
---
## 📄 License
This project is for educational purposes.