---
license: mit
title: Image_captioning
sdk: streamlit
emoji: 🚀
colorFrom: red
colorTo: red
pinned: true
---
# Image Captioning App

This Streamlit app uses a pre-trained model to generate captions for uploaded images.

## Challenges Faced

1. **Image Processing**: Ensuring correct image preprocessing to match the model's input requirements.
2. **Tensor Conversion**: Handling the conversion of image data to the appropriate tensor format.
3. **Error Handling**: Implementing error handling and logging.
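The preprocessing and tensor-conversion challenges can be sketched with plain Pillow and NumPy. This is a minimal illustration, not the app's actual `image_to_text.py`. It assumes the model expects 224×224 RGB input normalized with a per-channel mean and standard deviation of 0.5; these are believed to match the model's preprocessor config, and in practice `ViTImageProcessor` applies them for you. The helper names `load_image` and `to_pixel_values` are hypothetical. Decode failures are logged instead of raised, which is one way to approach the error-handling challenge.

```python
import io
import logging
from typing import Optional

import numpy as np
from PIL import Image, UnidentifiedImageError

logger = logging.getLogger(__name__)

# Assumed model input settings (check the model's preprocessor config).
IMAGE_SIZE = 224
MEAN = np.array([0.5, 0.5, 0.5], dtype=np.float32)
STD = np.array([0.5, 0.5, 0.5], dtype=np.float32)


def load_image(data: bytes) -> Optional[Image.Image]:
    """Decode uploaded bytes into an RGB image; log and return None if decoding fails."""
    try:
        image = Image.open(io.BytesIO(data))
        # Uploads may be RGBA (PNG), palette, or grayscale; the ViT encoder expects 3 channels.
        return image.convert("RGB")
    except UnidentifiedImageError:
        logger.error("Uploaded file is not a readable image")
        return None


def to_pixel_values(image: Image.Image) -> np.ndarray:
    """Resize, scale, normalize, and reorder an RGB image into a (1, 3, H, W) float32 batch."""
    resized = image.resize((IMAGE_SIZE, IMAGE_SIZE), Image.BILINEAR)
    arr = np.asarray(resized, dtype=np.float32) / 255.0  # (H, W, 3), values in [0, 1]
    arr = (arr - MEAN) / STD  # per-channel normalization, values in [-1, 1]
    arr = arr.transpose(2, 0, 1)  # HWC -> CHW, the layout PyTorch models expect
    return arr[np.newaxis, ...]  # add a batch dimension
```

`torch.from_numpy(...)` on the returned array then gives the tensor a PyTorch model consumes.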

## Models Used

The app uses the following pre-trained model from Hugging Face:

- **Model**: `nlpconnect/vit-gpt2-image-captioning`
- **Architecture**: Vision Encoder-Decoder Model
- **Vision Encoder**: ViT (Vision Transformer)
- **Text Decoder**: GPT-2
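The encoder-decoder split above maps onto three `transformers` classes. `VisionEncoderDecoderModel` holds the ViT encoder and GPT-2 decoder, `ViTImageProcessor` turns an image into `pixel_values`, and the GPT-2 tokenizer (loaded via `AutoTokenizer`) turns generated token IDs back into text. The sketch below is a hedged illustration, not the app's own code. The function names `load_captioner` and `generate_caption` and the generation settings (`max_length=16`, `num_beams=4`) are assumptions chosen for illustration.

```python
from typing import Any, Tuple

MODEL_ID = "nlpconnect/vit-gpt2-image-captioning"


def load_captioner(model_id: str = MODEL_ID) -> Tuple[Any, Any, Any]:
    """Load the ViT + GPT-2 model together with its image processor and tokenizer."""
    # Imported lazily so this module can be imported without loading transformers.
    from transformers import AutoTokenizer, ViTImageProcessor, VisionEncoderDecoderModel

    model = VisionEncoderDecoderModel.from_pretrained(model_id)
    processor = ViTImageProcessor.from_pretrained(model_id)
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model.eval()  # inference mode (disables dropout)
    return model, processor, tokenizer


def generate_caption(image, model, processor, tokenizer,
                     max_length: int = 16, num_beams: int = 4) -> str:
    """Encode an RGB PIL image with ViT, decode a caption with GPT-2, and return the text."""
    # The processor resizes and normalizes the image into a (1, 3, 224, 224) tensor.
    pixel_values = processor(images=image, return_tensors="pt").pixel_values
    # Beam search over the GPT-2 decoder, conditioned on the ViT encoder outputs.
    output_ids = model.generate(pixel_values, max_length=max_length, num_beams=num_beams)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True).strip()
```

Passing the components into `generate_caption` keeps it independent of how often they are loaded. In a Streamlit app, `load_captioner` would typically be wrapped in `st.cache_resource` so the weights load only once.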

## Steps for Deployment

1. **Set up the environment**:
   ```
   python -m venv venv
   source venv/bin/activate  # On Windows, use `venv\Scripts\activate`
   pip install -r requirements.txt
   ```

2. **Prepare the files**:
   - Ensure `app.py`, `image_to_text.py`, and `requirements.txt` are in the project directory.

3. **Run the app locally**:
   ```
   streamlit run app.py
   ```

4. **Deploy to Streamlit Cloud** (optional):
   - Push your code to a GitHub repository.
   - Connect your GitHub account to Streamlit Cloud.
   - Select the repository and branch to deploy.
   - Configure the app settings and deploy.

5. **Alternative Deployment Options**:
   - Deploy to Heroku using a Procfile and runtime.txt.
   - Use Docker to containerize the app for deployment on platforms like AWS, Google Cloud, or Azure.

## Requirements

See `requirements.txt` for a full list of dependencies. Key libraries include:

- streamlit
- torch
- transformers
- Pillow
- numpy

## Usage

1. Run the Streamlit app.
2. Upload an image using the file uploader.
3. Click the "Generate Caption" button.
4. View the generated caption below the image.

## Future Improvements

- Implement multiple model options for comparison.
- Use more accurate captioning models.
- Add support for batch processing of images.
- Improve the UI with additional styling and user feedback.