---
title: CaptionIQ
emoji: 🧠
colorFrom: indigo
colorTo: purple
sdk: streamlit
sdk_version: 1.42.0
python_version: "3.10"
app_file: app.py
pinned: false
---
# 🧠 CaptionIQ: AI Image Captioning
> Generate natural-language captions for images using a VGG16/VGG19 encoder and a Bahdanau-attention LSTM decoder trained on the Flickr8K dataset.
---
## ✨ Features
- **Dual CNN Backbones**: VGG16 and VGG19 for spatial feature extraction (7×7×512)
- **Bahdanau Attention LSTM**: Attends to specific image regions for each generated word
- **Ensemble Mode (BLIP)**: High-quality captions from the Salesforce BLIP model
- **Beam Search**: Top-5 diverse captions with confidence bars
- **🔥 Attention Heatmap**: Interactive word-by-word gradient saliency overlay
- **☁️ Word Cloud**: Live word distribution from beam candidates
- **📊 Model Comparison**: VGG16 vs VGG19 vs Ensemble side by side, with a 🏆 for the winner
- **📜 Session History**: Track all generated captions, export as JSON/CSV
- **🎲 Surprise Me**: Random Flickr8K image with one click
- **BLEU Evaluation**: Per-image BLEU-1 through BLEU-4 scoring
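
The beam search feature listed above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the code in `src/inference.py`: `next_token_probs` and `TOY_MODEL` are hypothetical stand-ins for the trained decoder, and the probabilities are invented for the example. It shows why a beam can beat greedy decoding: greedy would commit to "a" first, while the beam keeps "the" alive and ends up with a higher-scoring caption.

```python
import math

# Hypothetical toy "decoder": maps a partial caption to next-token probabilities.
TOY_MODEL = {
    (): {"a": 0.6, "the": 0.4},
    ("a",): {"dog": 0.5, "cat": 0.5},
    ("the",): {"dog": 0.9, "cat": 0.1},
}

def next_token_probs(prefix):
    """Return {token: probability}; emits '<end>' once the toy table runs out."""
    return TOY_MODEL.get(tuple(prefix), {"<end>": 1.0})

def beam_search(step_fn, beam_width=5, max_len=10, end_token="<end>"):
    """Keep the `beam_width` best partial captions by summed log-probability."""
    beams = [([], 0.0)]  # (tokens so far, log-probability)
    finished = []
    for _ in range(max_len):
        candidates = []
        for tokens, score in beams:
            for tok, p in step_fn(tokens).items():
                if p <= 0.0:
                    continue
                new_score = score + math.log(p)
                if tok == end_token:
                    finished.append((tokens, new_score))
                else:
                    candidates.append((tokens + [tok], new_score))
        if not candidates:
            break
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_width]
    else:
        finished.extend(beams)  # step limit reached: keep the unfinished beams
    finished.sort(key=lambda c: c[1], reverse=True)
    # Convert log-probabilities back to probabilities for display.
    return [(" ".join(t), math.exp(s)) for t, s in finished[:beam_width]]

captions = beam_search(next_token_probs, beam_width=3)
for text, prob in captions:
    print(f"{prob:.2f}  {text}")
```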
---
## 🏗️ Architecture
```
Image → VGG16/19 block5_pool → (49 × 512) spatial map
                                      ↓
                              Bahdanau Attention
                                      ↓
Caption tokens → Embedding(256) → LSTM(512) → Softmax(vocab)
```
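
The attention step in the diagram can be sketched with NumPy. This is an illustration of additive (Bahdanau) attention over a 49 × 512 map, not the layer defined in `src/model.py`: the weights are random, and the names `W_feat`, `W_hid`, `v` and the attention size of 256 are assumptions made for the example.

```python
import numpy as np

def bahdanau_attention(features, hidden, W_feat, W_hid, v):
    """Additive attention: score_i = v . tanh(W_feat f_i + W_hid h).

    features: (49, 512) spatial map, hidden: (512,) decoder state.
    Returns the (512,) context vector and the (49,) attention weights.
    """
    scores = np.tanh(features @ W_feat + hidden @ W_hid) @ v  # (49,)
    scores -= scores.max()                                    # numerical stability
    weights = np.exp(scores) / np.exp(scores).sum()           # softmax over regions
    context = weights @ features                              # weighted sum, (512,)
    return context, weights

rng = np.random.default_rng(0)
attn_dim = 256  # assumed attention size, not taken from the project
features = rng.standard_normal((49, 512))
hidden = rng.standard_normal(512)
W_feat = rng.standard_normal((512, attn_dim)) * 0.01
W_hid = rng.standard_normal((512, attn_dim)) * 0.01
v = rng.standard_normal(attn_dim)

context, weights = bahdanau_attention(features, hidden, W_feat, W_hid, v)
grid = weights.reshape(7, 7)  # one weight per cell of the 7 x 7 spatial grid
```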
---
## 🚀 Quick Start
### 1. Install Dependencies
```bash
pip install -r requirements.txt
```
### 2. Preprocess Dataset
```bash
python src/preprocess.py
```
Downloads Flickr8K, cleans the captions, builds the vocabulary, and creates train/val/test splits.
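
As a rough sketch of what the cleaning, vocabulary, and split steps might look like (the download is left out), the following uses only the standard library. The function names, the `<start>`/`<end>` markers, the rule that drops one-character tokens, and the 80/10/10 split are assumptions for illustration; `src/preprocess.py` may do this differently.

```python
import random
import re
from collections import Counter

def clean_caption(text):
    """Lowercase, drop non-letters and one-character tokens, add start/end markers."""
    text = re.sub(r"[^a-z ]", " ", text.lower())
    words = [w for w in text.split() if len(w) > 1]
    return "<start> " + " ".join(words) + " <end>"

def build_vocab(captions, min_count=1):
    """Map each sufficiently frequent word to an integer id; 0 is left for padding."""
    counts = Counter(w for c in captions for w in c.split())
    words = sorted(w for w, n in counts.items() if n >= min_count)
    return {w: i + 1 for i, w in enumerate(words)}

def split_ids(image_ids, val_frac=0.1, test_frac=0.1, seed=42):
    """Shuffle image ids deterministically and cut them into train/val/test."""
    ids = sorted(image_ids)
    random.Random(seed).shuffle(ids)
    n_val = int(len(ids) * val_frac)
    n_test = int(len(ids) * test_frac)
    return ids[n_val + n_test:], ids[:n_val], ids[n_val:n_val + n_test]

# Made-up captions standing in for the Flickr8K caption file.
raw = {"img1.jpg": "A dog runs, fast!", "img2.jpg": "Two dogs play in the park."}
cleaned = {name: clean_caption(text) for name, text in raw.items()}
vocab = build_vocab(cleaned.values())
train, val, test = split_ids([f"img{i}.jpg" for i in range(10)])
```

Splitting by image id rather than by caption keeps all five captions of an image in the same split, which avoids leaking test images into training.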
### 3. Extract Features
```bash
python src/extract_features.py --backbone both
```
Extracts `block5_pool` spatial features (49 × 512 per image) from VGG16 and VGG19 and saves them as `.pkl` files.
### 4. Train Models
```bash
python src/train.py --backbone both --epochs 20
```
Trains both VGG16 and VGG19 captioning models. Saves checkpoints and loss plots.
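
One common way to feed captions to a decoder during training is to expand each caption into (prefix, next word) pairs and yield them in batches. The sketch below shows that idea only; whether `src/train.py` builds its batches this way is an assumption, the function names are hypothetical, and the image features that would accompany each pair are left out to keep it short.

```python
def caption_to_pairs(token_ids, max_len):
    """Expand one caption into (left-padded prefix, next token) training pairs."""
    pairs = []
    for i in range(1, len(token_ids)):
        prefix = token_ids[:i][-max_len:]
        padded = [0] * (max_len - len(prefix)) + prefix  # id 0 is padding
        pairs.append((padded, token_ids[i]))
    return pairs

def batch_generator(captions, max_len, batch_size):
    """Yield (prefixes, targets) batches; the last batch may be smaller."""
    prefixes, targets = [], []
    for token_ids in captions:
        for padded, target in caption_to_pairs(token_ids, max_len):
            prefixes.append(padded)
            targets.append(target)
            if len(prefixes) == batch_size:
                yield prefixes, targets
                prefixes, targets = [], []
    if prefixes:
        yield prefixes, targets

# Hypothetical ids: 1 = <start>, 2 = <end>, everything else is a word.
batches = list(batch_generator([[1, 5, 7, 2], [1, 9, 2]], max_len=4, batch_size=3))
```

A generator like this avoids holding every expanded pair in memory at once, which matters because each caption produces one pair per word.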
### 5. Evaluate
```bash
python src/evaluate.py --backbone both
```
Computes BLEU-1 to BLEU-4 on the test set and prints a VGG16 vs VGG19 comparison table.
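
To make the metric concrete, here is a small, unsmoothed BLEU written from the standard definition (clipped n-gram precision, geometric mean, brevity penalty). It is a sketch for understanding the scores, not NLTK's implementation and not the code in `src/evaluate.py`; the function names and the example sentences are made up.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu(references, hypothesis, max_n):
    """Unsmoothed BLEU-max_n for one hypothesis against several references."""
    if not hypothesis:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        hyp_counts = ngrams(hypothesis, n)
        max_ref = Counter()
        for ref in references:
            for gram, count in ngrams(ref, n).items():
                max_ref[gram] = max(max_ref[gram], count)
        # Clip each hypothesis n-gram count by its maximum count in any reference.
        clipped = sum(min(c, max_ref[g]) for g, c in hyp_counts.items())
        total = sum(hyp_counts.values())
        if clipped == 0 or total == 0:
            return 0.0
        log_precisions.append(math.log(clipped / total))
    # Brevity penalty uses the reference length closest to the hypothesis length.
    ref_len = min((len(r) for r in references),
                  key=lambda n: (abs(n - len(hypothesis)), n))
    hyp_len = len(hypothesis)
    bp = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return bp * math.exp(sum(log_precisions) / max_n)

refs = ["a dog runs on the grass".split(), "the dog is running on grass".split()]
hyp = "a dog runs on grass".split()
scores = {f"BLEU-{n}": bleu(refs, hyp, n) for n in range(1, 5)}
```

Without smoothing, a caption that shares no 4-gram with any reference scores 0 for BLEU-4, which is common for short captions; that is one reason per-image scores are usually read alongside the corpus-level numbers.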
### 6. Generate Captions
```bash
python src/inference.py --image path/to/image.jpg --backbone vgg16
```
### 7. Launch Web App
```bash
streamlit run app.py
```
---
## 📁 Project Structure
```
├── data/                    # Dataset & preprocessed files
├── models/                  # Trained model checkpoints (.h5)
├── outputs/                 # Loss plots, BLEU results
├── src/
│   ├── config.py            # Paths & hyperparameters
│   ├── preprocess.py        # Caption cleaning & tokenization
│   ├── extract_features.py  # VGG feature extraction
│   ├── model.py             # CNN-LSTM architecture
│   ├── train.py             # Training with data generator
│   ├── inference.py         # Greedy & beam search
│   ├── evaluate.py          # BLEU score evaluation
│   └── utils.py             # Shared utilities
├── app.py                   # Streamlit web app
├── requirements.txt         # Dependencies
└── README.md
```
---
## 📊 Results
| Metric | VGG16 | VGG19 |
|---------|--------|--------|
| BLEU-1 | TBD | TBD |
| BLEU-2 | TBD | TBD |
| BLEU-3 | TBD | TBD |
| BLEU-4 | TBD | TBD |
> Results will be populated after training and evaluation.
---
## 🛠️ Tech Stack
- **Deep Learning**: TensorFlow / Keras
- **Feature Extraction**: VGG16, VGG19 (ImageNet pretrained)
- **Text Processing**: NLTK, Keras Tokenizer
- **Evaluation**: NLTK BLEU
- **Web App**: Streamlit
- **Dataset**: Flickr8K (8,000 images, 5 captions each)
---
## 📄 License
This project is for educational purposes.