---
title: Caption Gen
emoji: 📸
sdk: streamlit
sdk_version: 1.43.0
app_file: app.py
---

# AI Image Caption Generator

A deep learning–based image captioning system built using a **ResNet50 encoder** and an **LSTM decoder**. The model generates natural language descriptions for uploaded images.

## Architecture

* **Encoder:** ResNet50 (frozen backbone)

* **Decoder:** LSTM-based sequence generator

* **Training Dataset:** Flickr8k

* **Inference UI:** Streamlit

* **Evaluation Metric:** SacreBLEU

The encoder extracts high-level visual features, which are then passed to the decoder to generate captions word by word.


## How It Works

1. The user uploads an image.

2. The image is preprocessed and passed through the ResNet50 encoder.

3. The extracted feature vector is fed into the LSTM decoder.

4. The decoder generates the caption word by word using temperature-based sampling.

5. If the image belongs to the Flickr8k dataset, SacreBLEU scores are computed against its reference captions and displayed.
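
Steps 3 and 4 can be illustrated with a minimal, dependency-free sketch of temperature-based sampling. This is not the code in `app.py`: `sample_with_temperature`, `generate_caption`, `toy_step`, the `<start>`/`<end>` tokens, and the tiny vocabulary are all hypothetical names chosen for illustration, and `toy_step` merely stands in for one step of the real LSTM decoder.

```python
import math
import random


def sample_with_temperature(logits, temperature=1.0, rng=random):
    """Sample an index from unnormalized scores after temperature scaling.

    Low temperatures sharpen the distribution (close to argmax);
    high temperatures flatten it (more varied captions).
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    top = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(s - top) for s in scaled]
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]


def generate_caption(step_fn, vocab, temperature=1.0, max_len=20, rng=random):
    """Sample tokens one at a time until <end> or max_len is reached.

    step_fn(prev_token_id, state) -> (logits, new_state) is a hypothetical
    stand-in for a single decoder step.
    """
    token = vocab.index("<start>")
    end = vocab.index("<end>")
    state = None
    words = []
    for _ in range(max_len):
        logits, state = step_fn(token, state)
        token = sample_with_temperature(logits, temperature, rng)
        if token == end:
            break
        words.append(vocab[token])
    return " ".join(words)


# Toy stand-in for the decoder: strongly prefers a fixed next word.
VOCAB = ["<start>", "<end>", "a", "dog", "runs"]


def toy_step(prev_token, state):
    target = {0: 2, 2: 3, 3: 4, 4: 1}[prev_token]
    logits = [0.0] * len(VOCAB)
    logits[target] = 10.0
    return logits, state


print(generate_caption(toy_step, VOCAB, temperature=0.01))  # a dog runs
```

At a temperature this low the sampler behaves almost like greedy decoding. Raising the temperature toward 1.0 or above gives the less likely words a real chance of being picked.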


## Features

* Temperature-controlled caption generation

* SacreBLEU evaluation

* N-gram precision breakdown (1–4 gram)

* Clean Streamlit interface

* Fully CPU-compatible deployment
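
To make the 1–4 gram precision breakdown concrete, here is a simplified sketch of clipped n-gram precision, the quantity that BLEU-style metrics are built from. It is an illustration only, not the app's code: the function names and example captions are made up, and it uses plain whitespace tokenization. SacreBLEU applies its own tokenization and also a brevity penalty, so its reported numbers can differ.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_precisions(hypothesis, references, max_n=4):
    """Clipped n-gram precision for n = 1..max_n.

    Each hypothesis n-gram is credited at most as many times as it
    appears in the single reference where it is most frequent.
    """
    hyp = hypothesis.lower().split()
    refs = [r.lower().split() for r in references]
    precisions = []
    for n in range(1, max_n + 1):
        hyp_counts = Counter(ngrams(hyp, n))
        max_ref = Counter()
        for ref in refs:
            for gram, count in Counter(ngrams(ref, n)).items():
                max_ref[gram] = max(max_ref[gram], count)
        matched = sum(min(c, max_ref[g]) for g, c in hyp_counts.items())
        total = sum(hyp_counts.values())
        precisions.append(matched / total if total else 0.0)
    return precisions


scores = ngram_precisions(
    "a dog runs on the grass",
    [
        "a dog runs through the grass",
        "a brown dog is running on the grass",
    ],
)
print(scores)  # [1.0, 0.8, 0.5, 0.0]
```

Every word of the generated caption appears in some reference, so the 1-gram precision is perfect. The longer n-grams match less often because the word order differs.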


## Project Structure


    app.py
    models/
        encoder.pth
        decoder.pth
    models/
        encoder.py
        decoder.py
    utils/
        transforms.py
        vocab.py
        helpers.py
    vocabulary.json
    requirements.txt

## Model Details

* Encoder weights size: ~92 MB

* Decoder weights size: ~32 MB

* The encoder `state_dict` includes the full ResNet50 backbone

* Inference runs on CPU
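
As a rough sanity check on the encoder file size, the sketch below estimates how much space a ResNet50 backbone takes when stored as 32-bit floats. It rests on assumptions that this README does not confirm: that the weights are saved as float32, and that the 1000-class ImageNet head is dropped while any extra projection layer is ignored. The parameter count is the one published for torchvision's `resnet50`. `float32_size_mb` is a hypothetical helper.

```python
def float32_size_mb(num_params):
    """Approximate size in MB (10**6 bytes) at 4 bytes per parameter."""
    return num_params * 4 / 1e6


# Published parameter count for torchvision's resnet50, classifier included.
RESNET50_TOTAL_PARAMS = 25_557_032
# Final fully connected layer: 2048 inputs x 1000 classes, plus 1000 biases.
FC_HEAD_PARAMS = 2048 * 1000 + 1000

backbone_params = RESNET50_TOTAL_PARAMS - FC_HEAD_PARAMS

print(backbone_params)                             # 23508032
print(round(float32_size_mb(backbone_params), 1))  # 94.0
```

Under these assumptions the estimate lands in the same range as the ~92 MB listed above. The exact figure depends on serialization overhead, on any additional layers in the encoder, and on whether sizes are reported in MB or MiB.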


## Limitations

* Trained on Flickr8k only (roughly 8,000 images)

* Performs best on outdoor scenes, people, and animals

* May generalize poorly to unseen domains

* CPU inference can be slow (2–5 seconds per image)


## Setup (Local)


    pip install -r requirements.txt
    streamlit run app.py

## Deployment

This project is deployed on **Hugging Face Spaces** using Streamlit.


## License

MIT License

