---
license: mit
title: Image_captioning
sdk: streamlit
emoji: 🚀
colorFrom: red
colorTo: red
pinned: true
---
# Image Captioning App

This Streamlit app uses a pre-trained model to generate captions for uploaded images.
## Challenges Faced

- **Image Processing:** Ensuring correct image preprocessing to match the model's input requirements.
- **Tensor Conversion:** Handling the conversion of image data to the appropriate tensor format.
- **Error Handling:** Implementing robust error handling and logging throughout the pipeline.
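To make the preprocessing and tensor-conversion challenges concrete, here is a minimal sketch of the transformation a ViT encoder expects. The 224x224 size and per-channel mean/std of 0.5 are the defaults for this model's image processor; in the app itself, `ViTImageProcessor` from `transformers` would normally handle this for you.

```python
import numpy as np
import torch
from PIL import Image

def preprocess(image: Image.Image) -> torch.Tensor:
    """Convert a PIL image into the pixel-value tensor a ViT encoder expects."""
    image = image.convert("RGB").resize((224, 224))
    arr = np.asarray(image).astype("float32") / 255.0  # HWC, scaled to [0, 1]
    arr = (arr - 0.5) / 0.5                            # normalize to [-1, 1]
    tensor = torch.from_numpy(arr).permute(2, 0, 1)    # HWC -> CHW
    return tensor.unsqueeze(0)                         # add batch dim: (1, 3, 224, 224)

pixel_values = preprocess(Image.new("RGB", (640, 480), "white"))
print(pixel_values.shape)  # torch.Size([1, 3, 224, 224])
```

Getting the channel order (CHW vs. HWC) and the batch dimension right is exactly where the tensor-conversion errors mentioned above tend to appear.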
## Models Used

The app uses the following pre-trained model from Hugging Face:

- Model: `nlpconnect/vit-gpt2-image-captioning`
- Architecture: Vision Encoder-Decoder Model
  - Vision Encoder: ViT (Vision Transformer)
  - Text Decoder: GPT-2
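As a sketch, this model can be exercised end-to-end with the `transformers` pipeline API. The weights are downloaded from the Hugging Face Hub on first run, and the solid-color test image is just a stand-in for a real upload.

```python
from PIL import Image
from transformers import pipeline

# "image-to-text" wraps the ViT encoder and GPT-2 decoder behind a single call;
# the first run downloads the model weights from the Hugging Face Hub.
captioner = pipeline("image-to-text", model="nlpconnect/vit-gpt2-image-captioning")

image = Image.new("RGB", (224, 224), "blue")  # stand-in for an uploaded photo
result = captioner(image)
print(result[0]["generated_text"])
```

The pipeline returns a list of dicts with a `generated_text` key; the app only needs the first entry.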
## Steps for Deployment

1. Set up the environment:

       python -m venv venv
       source venv/bin/activate  # On Windows, use `venv\Scripts\activate`
       pip install -r requirements.txt

2. Prepare the files:
   - Ensure `app.py`, `image_to_text.py`, and `requirements.txt` are in the project directory.

3. Run the app locally:

       streamlit run app.py

4. Deploy to Streamlit Cloud (optional):
   - Push your code to a GitHub repository.
   - Connect your GitHub account to Streamlit Cloud.
   - Select the repository and branch to deploy.
   - Configure the app settings and deploy.
**Alternative Deployment Options:**

- Deploy to Heroku using a `Procfile` and `runtime.txt`.
- Use Docker to containerize the app for deployment on platforms like AWS, Google Cloud, or Azure.
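For the Docker route, a minimal Dockerfile along these lines should work. The Python version, port, and server flags here are assumptions for illustration, not taken from the repository.

```dockerfile
FROM python:3.11-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
# Streamlit serves on 8501 by default.
EXPOSE 8501
CMD ["streamlit", "run", "app.py", "--server.port=8501", "--server.address=0.0.0.0"]
```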
## Requirements

See `requirements.txt` for the full list of dependencies. Key libraries include:

- streamlit
- torch
- transformers
- Pillow
- numpy
## Usage

1. Run the Streamlit app.
2. Upload an image using the file uploader.
3. Click the "Generate Caption" button.
4. View the generated caption below the image.
## Future Improvements

- Implement multiple model options for comparison.
- Evaluate newer or larger captioning models for higher-quality captions.
- Add support for batch processing of images.
- Improve the UI with additional styling and user feedback.