
metadata
license: mit
title: Image_captioning
sdk: streamlit
emoji: 🚀
colorFrom: red
colorTo: red
pinned: true

Image Captioning App

This Streamlit app uses a pre-trained model to generate captions for uploaded images.

Challenges Faced

  1. Image Processing: Ensuring correct image preprocessing to match the model's input requirements.
  2. Tensor Conversion: Handling the conversion of image data to the appropriate tensor format.
  3. Error Handling: Implementing error handling and logging for failures during image loading and caption generation.
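
One way to approach the error-handling challenge is to wrap the captioning call so that failures are logged and turned into a message the UI can display. This is a minimal sketch, not the app's actual code: `safe_caption`, the `caption_fn` callable (image in, caption string out), and the logger name are all hypothetical.

```python
import logging

logger = logging.getLogger("image_captioning")  # logger name is an assumption


def safe_caption(caption_fn, image):
    """Run caption_fn(image) and return (caption, error_message).

    Exactly one of the two returned values is None.
    """
    try:
        caption = caption_fn(image)
    except Exception as exc:  # broad on purpose: every failure should reach the log
        logger.exception("Caption generation failed")
        return None, f"Could not generate a caption: {exc}"
    # Treat a non-string or blank result as a failure too.
    if not isinstance(caption, str) or not caption.strip():
        logger.warning("Model returned an empty caption")
        return None, "The model returned an empty caption."
    return caption.strip(), None
```

Returning the error as a value rather than re-raising keeps the UI code simple: it can show either the caption or the message without its own `try` block.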

Models Used

The app uses the following pre-trained model from Hugging Face:

  • Model: nlpconnect/vit-gpt2-image-captioning
  • Architecture: Vision Encoder-Decoder Model
  • Vision Encoder: ViT (Vision Transformer)
  • Text Decoder: GPT-2
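
For orientation, the components listed above map onto `transformers` classes roughly as follows. This is a sketch rather than the contents of `image_to_text.py`: the function names, the lazy import, and the generation settings (`max_length=16`, `num_beams=4`) are assumptions chosen for illustration.

```python
MODEL_ID = "nlpconnect/vit-gpt2-image-captioning"


def load_captioner(model_id=MODEL_ID):
    """Load the image processor, the encoder-decoder model, and the tokenizer."""
    # Imported lazily so this module can be imported without the heavy dependency.
    from transformers import AutoTokenizer, ViTImageProcessor, VisionEncoderDecoderModel

    processor = ViTImageProcessor.from_pretrained(model_id)
    model = VisionEncoderDecoderModel.from_pretrained(model_id)
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model.eval()  # inference only
    return processor, model, tokenizer


def caption_image(image, processor, model, tokenizer, max_length=16, num_beams=4):
    """Return a caption for a single PIL image."""
    # The ViT encoder expects 3-channel input, so grayscale or RGBA images are converted.
    if image.mode != "RGB":
        image = image.convert("RGB")
    # The processor resizes and normalizes the image and returns a batch of one tensor.
    pixel_values = processor(images=image, return_tensors="pt").pixel_values
    output_ids = model.generate(pixel_values, max_length=max_length, num_beams=num_beams)
    captions = tokenizer.batch_decode(output_ids, skip_special_tokens=True)
    return captions[0].strip()
```

A typical call would be `caption_image(Image.open(path), *load_captioner())`, with `Image` imported from Pillow. The first call to `load_captioner` downloads the model weights, so it is worth loading once and reusing the result.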

Steps for Deployment

  1. Set up the environment:

    python -m venv venv
    source venv/bin/activate  # On Windows, use `venv\Scripts\activate`
    pip install -r requirements.txt
    
  2. Prepare the files:

    • Ensure app.py, image_to_text.py, and requirements.txt are in the project directory.
  3. Run the app locally:

    streamlit run app.py
    
  4. Deploy to Streamlit Cloud (optional):

    • Push your code to a GitHub repository.
    • Connect your GitHub account to Streamlit Cloud.
    • Select the repository and branch to deploy.
    • Configure the app settings and deploy.
  5. Alternative Deployment Options:

    • Deploy to Heroku using a Procfile and runtime.txt.
    • Use Docker to containerize the app for deployment on platforms like AWS, Google Cloud, or Azure.

Requirements

See requirements.txt for a full list of dependencies. Key libraries include:

  • streamlit
  • torch
  • transformers
  • Pillow
  • numpy
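
To confirm that the key libraries are available in the active environment, a small check like the one below can help. It is an illustrative helper, not part of the project; `KEY_LIBRARIES` and `missing_libraries` are hypothetical names. Note that Pillow is installed under the name `Pillow` but imported as `PIL`.

```python
import importlib.util

# Distribution name -> import name (they differ for Pillow).
KEY_LIBRARIES = {
    "streamlit": "streamlit",
    "torch": "torch",
    "transformers": "transformers",
    "Pillow": "PIL",
    "numpy": "numpy",
}


def missing_libraries(libraries=KEY_LIBRARIES):
    """Return the distribution names whose import module cannot be found."""
    return [
        dist
        for dist, module in libraries.items()
        if importlib.util.find_spec(module) is None  # looks up the module without importing it
    ]


if __name__ == "__main__":
    missing = missing_libraries()
    print("All key libraries found." if not missing else f"Missing: {', '.join(missing)}")
```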

Usage

  1. Run the Streamlit app.
  2. Upload an image using the file uploader.
  3. Click the "Generate Caption" button.
  4. View the generated caption below the image.
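
The flow above corresponds to a handful of Streamlit calls. The sketch below is an assumption about how `app.py` could be laid out, not a copy of it: `render_app` and `caption_fn` are hypothetical names, the accepted file types are a guess, and the Streamlit module is passed in as `st` only so the function can be exercised without a running server.

```python
def render_app(st, caption_fn):
    """Draw the upload and caption UI; return the caption, or None if none was produced.

    st:         the streamlit module (import streamlit as st)
    caption_fn: hypothetical callable taking the uploaded file-like object
                and returning a caption string
    """
    st.title("Image Captioning App")
    uploaded = st.file_uploader("Upload an image", type=["jpg", "jpeg", "png"])
    if uploaded is None:
        st.info("Upload an image to get started.")
        return None

    # Show the image first so the caption appears below it.
    st.image(uploaded, caption="Uploaded image")

    if not st.button("Generate Caption"):
        return None

    uploaded.seek(0)  # make sure caption_fn reads the file from the start
    with st.spinner("Generating caption..."):
        caption = caption_fn(uploaded)
    st.write(caption)
    return caption
```

In `app.py` this would be called as `render_app(st, caption_fn)` after `import streamlit as st`, with `caption_fn` opening the upload with Pillow and passing it to the model.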

Future Improvements

  • Implement multiple model options for comparison.
  • Evaluate stronger captioning models in place of the current one.
  • Add support for batch processing of images.
  • Improve the UI with additional styling and user feedback.