---
title: CaptionIQ
emoji: 🧠
colorFrom: indigo
colorTo: purple
sdk: streamlit
sdk_version: 1.42.0
python_version: "3.10"
app_file: app.py
pinned: false
---
# 🧠 CaptionIQ: AI Image Captioning

> Generate natural-language captions for images using a VGG16/VGG19 encoder and a Bahdanau-attention LSTM decoder on the Flickr8K dataset.

---
## ✨ Features

- **Dual CNN Backbones**: VGG16 and VGG19 for spatial feature extraction (7×7×512)
- **Bahdanau Attention LSTM**: Attends to specific image regions per word
- **Ensemble Mode (BLIP)**: High-quality captions from the Salesforce BLIP model
- **Beam Search**: Top-5 diverse captions with confidence bars
- **🔥 Attention Heatmap**: Interactive word-by-word gradient saliency overlay
- **☁️ Word Cloud**: Live word distribution from beam candidates
- **Model Comparison**: VGG16 vs VGG19 vs Ensemble side by side, with a winner marked
- **Session History**: Track all generated captions, export as JSON/CSV
- **🎲 Surprise Me**: Random Flickr8K image with one click
- **BLEU Evaluation**: Per-image BLEU-1 through BLEU-4 scoring
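The Beam Search feature above returns up to five ranked candidates. As a rough illustration only, here is a minimal standard-library sketch of beam search. `TOY_MODEL` and `next_token_log_probs` are hypothetical stand-ins for the trained decoder, and turning beam scores into confidence values with a softmax is an assumption, not necessarily how the app computes its confidence bars. With this toy table, `captions[0]` is the most probable caption and `confidences(results)` sums to 1.

```python
import math

# Hypothetical toy decoder: maps the last token to next-token log-probabilities.
# The real model would condition on image features and the full prefix.
TOY_MODEL = {
    "<start>": {"a": math.log(0.6), "the": math.log(0.4)},
    "a": {"dog": math.log(0.7), "cat": math.log(0.3)},
    "the": {"dog": math.log(0.5), "cat": math.log(0.5)},
    "dog": {"<end>": math.log(1.0)},
    "cat": {"<end>": math.log(1.0)},
}


def next_token_log_probs(prefix):
    """Stand-in for one decoder step (assumption, not the project's API)."""
    return TOY_MODEL[prefix[-1]]


def beam_search(beam_width=5, max_len=10):
    """Keep the beam_width best partial captions at every step."""
    beams = [(["<start>"], 0.0)]  # (tokens, summed log-probability)
    finished = []
    for _ in range(max_len):
        candidates = []
        for tokens, score in beams:
            for token, logp in next_token_log_probs(tokens).items():
                candidates.append((tokens + [token], score + logp))
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for tokens, score in candidates[:beam_width]:
            if tokens[-1] == "<end>":
                finished.append((tokens, score))
            else:
                beams.append((tokens, score))
        if not beams:
            break
    finished.sort(key=lambda c: c[1], reverse=True)
    return finished[:beam_width]


def confidences(results):
    """Softmax over beam scores (one possible way to draw confidence bars)."""
    best = max(score for _, score in results)
    weights = [math.exp(score - best) for _, score in results]
    total = sum(weights)
    return [w / total for w in weights]


results = beam_search()
captions = [" ".join(tokens[1:-1]) for tokens, _ in results]
```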
---

## 🏗️ Architecture

```
Image → VGG16/19 block5_pool → (49 × 512) spatial map
                  ↓
          Bahdanau Attention
                  ↓
Caption tokens → Embedding(256) → LSTM(512) → Softmax(vocab)
```
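To make the attention step in the diagram concrete, here is a minimal standard-library sketch of additive (Bahdanau) attention over the 49 image regions. The function and parameter names (`bahdanau_attention`, `W_feat`, `W_hid`, `v`) are illustrative assumptions, the weights are random, and the feature and hidden sizes are shrunk from 512 so the example runs instantly. It shows the technique, not the code in `src/model.py`. The returned `weights` form a distribution over the 49 regions, and `context` is their weighted average, which the decoder would consume at each word.

```python
import math
import random


def matvec(matrix, vector):
    """Multiply a (rows x cols) list-of-lists matrix by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def bahdanau_attention(features, hidden, W_feat, W_hid, v):
    """Additive attention: score_i = v . tanh(W_feat f_i + W_hid h).

    features: list of region vectors (49 of them for a 7x7 map)
    hidden:   current decoder hidden state
    Returns (context vector, attention weights).
    """
    hid_proj = matvec(W_hid, hidden)
    scores = []
    for region in features:
        feat_proj = matvec(W_feat, region)
        energy = [math.tanh(a + b) for a, b in zip(feat_proj, hid_proj)]
        scores.append(sum(vi * e for vi, e in zip(v, energy)))
    weights = softmax(scores)
    dim = len(features[0])
    context = [sum(w * f[d] for w, f in zip(weights, features)) for d in range(dim)]
    return context, weights


# Toy sizes (assumptions): 49 regions, 8-d features instead of 512,
# 6-d hidden state instead of 512, 4 attention units.
random.seed(0)
REGIONS, FEAT_DIM, HID_DIM, ATT_UNITS = 49, 8, 6, 4
features = [[random.random() for _ in range(FEAT_DIM)] for _ in range(REGIONS)]
hidden = [random.random() for _ in range(HID_DIM)]
W_feat = [[random.uniform(-1, 1) for _ in range(FEAT_DIM)] for _ in range(ATT_UNITS)]
W_hid = [[random.uniform(-1, 1) for _ in range(HID_DIM)] for _ in range(ATT_UNITS)]
v = [random.uniform(-1, 1) for _ in range(ATT_UNITS)]

context, weights = bahdanau_attention(features, hidden, W_feat, W_hid, v)
```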
---

## Quick Start

### 1. Install Dependencies

```bash
pip install -r requirements.txt
```
### 2. Preprocess Dataset

```bash
python src/preprocess.py
```

Downloads Flickr8K, cleans the captions, builds the vocabulary, and creates train/val/test splits.
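As a rough picture of what caption cleaning and vocabulary building involve, here is a minimal standard-library sketch. The function names, the `startseq`/`endseq` markers, and the `min_count` threshold are assumptions chosen for illustration; `src/preprocess.py` may apply different rules.

```python
import re
from collections import Counter


def clean_caption(text):
    """Lowercase, drop everything except letters, and add sequence markers."""
    text = re.sub(r"[^a-z\s]", " ", text.lower())
    words = text.split()
    # startseq/endseq are a common convention, assumed here.
    return "startseq " + " ".join(words) + " endseq"


def build_vocab(captions, min_count=1):
    """Map each sufficiently frequent word to an integer id (0 is padding)."""
    counts = Counter(word for caption in captions for word in caption.split())
    kept = sorted(word for word, n in counts.items() if n >= min_count)
    return {word: index for index, word in enumerate(kept, start=1)}


raw = ["A dog runs, fast!", "Two dogs play."]
cleaned = [clean_caption(c) for c in raw]
vocab = build_vocab(cleaned)
```

Raising `min_count` drops rare words, which shrinks the softmax layer at the cost of more out-of-vocabulary tokens.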
### 3. Extract Features

```bash
python src/extract_features.py --backbone both
```

Extracts `block5_pool` feature maps (7×7×512, flattened to 49×512) from VGG16 and VGG19 and saves them as `.pkl` files.
### 4. Train Models

```bash
python src/train.py --backbone both --epochs 20
```

Trains the VGG16 and VGG19 captioning models and saves checkpoints and loss plots.
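One common way a captioning data generator feeds the decoder is to expand each caption into (prefix, next word) pairs, so the model learns to predict every word from the words before it. The sketch below assumes that scheme along with left-padding with zeros. `make_training_pairs` is a hypothetical helper, and `src/train.py` may batch its data differently.

```python
def make_training_pairs(token_ids, max_len):
    """Expand one tokenized caption into (padded prefix, next token) pairs."""
    pairs = []
    for i in range(1, len(token_ids)):
        prefix = token_ids[:i][-max_len:]  # truncate overly long prefixes
        padded = [0] * (max_len - len(prefix)) + prefix  # 0 = padding id (assumed)
        pairs.append((padded, token_ids[i]))
    return pairs


# Example: ids for "startseq a dog endseq" under a made-up vocabulary.
pairs = make_training_pairs([1, 5, 7, 2], max_len=4)
```

Each image's feature map would be paired with every prefix of each of its captions, which is why a generator is typically used instead of materializing all pairs in memory.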
### 5. Evaluate

```bash
python src/evaluate.py --backbone both
```

Computes BLEU-1 to BLEU-4 on the test set and prints a VGG16 vs VGG19 comparison table.
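For readers unfamiliar with the metric, here is a minimal standard-library sketch of sentence-level BLEU: clipped n-gram precision, uniform weights, a brevity penalty, and no smoothing. The project itself uses NLTK for evaluation. This reimplementation is only an illustration, and its handling of edge cases may differ from NLTK's.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(references, candidate, max_n=4):
    """Sentence-level BLEU-max_n with uniform weights and no smoothing."""
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = ngrams(candidate, n)
        total = sum(cand_counts.values())
        if total == 0:
            return 0.0
        # Clip each candidate n-gram count by its maximum count in any reference.
        max_ref = Counter()
        for ref in references:
            for gram, count in ngrams(ref, n).items():
                max_ref[gram] = max(max_ref[gram], count)
        clipped = sum(min(count, max_ref[gram]) for gram, count in cand_counts.items())
        if clipped == 0:
            return 0.0
        log_precisions.append(math.log(clipped / total))
    # Brevity penalty uses the reference length closest to the candidate length.
    c = len(candidate)
    r = min((abs(len(ref) - c), len(ref)) for ref in references)[1]
    penalty = 1.0 if c > r else math.exp(1 - r / c)
    return penalty * math.exp(sum(log_precisions) / max_n)


refs = [
    "a dog runs on the grass".split(),
    "the dog is running".split(),
]
perfect = bleu(refs, "a dog runs on the grass".split())
short = bleu(refs, "a dog runs".split(), max_n=1)
```

Here `perfect` is 1.0 because the candidate matches a reference exactly, while `short` has perfect unigram precision but is scaled down by the brevity penalty.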
### 6. Generate Captions

```bash
python src/inference.py --image path/to/image.jpg --backbone vgg16
```
### 7. Launch Web App

```bash
streamlit run app.py
```

---
## Project Structure

```
├── data/                     # Dataset & preprocessed files
├── models/                   # Trained model checkpoints (.h5)
├── outputs/                  # Loss plots, BLEU results
├── src/
│   ├── config.py             # Paths & hyperparameters
│   ├── preprocess.py         # Caption cleaning & tokenization
│   ├── extract_features.py   # VGG feature extraction
│   ├── model.py              # CNN-LSTM architecture
│   ├── train.py              # Training with data generator
│   ├── inference.py          # Greedy & beam search
│   ├── evaluate.py           # BLEU score evaluation
│   └── utils.py              # Shared utilities
├── app.py                    # Streamlit web app
├── requirements.txt          # Dependencies
└── README.md
```

---
## Results

| Metric | VGG16 | VGG19 |
|--------|-------|-------|
| BLEU-1 | -     | -     |
| BLEU-2 | -     | -     |
| BLEU-3 | -     | -     |
| BLEU-4 | -     | -     |

> Results will be populated after training and evaluation.

---
## 🛠️ Tech Stack

- **Deep Learning**: TensorFlow / Keras
- **Feature Extraction**: VGG16, VGG19 (ImageNet pretrained)
- **Text Processing**: NLTK, Keras Tokenizer
- **Evaluation**: NLTK BLEU
- **Web App**: Streamlit
- **Dataset**: Flickr8K (8,000 images, 5 captions each)

---
## License

This project is for educational purposes.