---
license: mit
title: Image_captioning
sdk: streamlit
emoji: 🚀
colorFrom: red
colorTo: red
pinned: true
---
# Image Captioning App
This Streamlit app uses a pre-trained model to generate captions for uploaded images.
## Challenges Faced
1. **Image Processing**: Ensuring correct image preprocessing to match the model's input requirements.
2. **Tensor Conversion**: Handling the conversion of image data to the appropriate tensor format.
3. **Error Handling**: Adding error handling and logging so that failures are reported to the user instead of crashing the app.
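
As an illustration of the error-handling point, here is a minimal sketch of one way to wrap caption generation. It is not taken from `app.py`: the helper name `safe_caption`, the logger name, and the shape of the return value are all assumptions made for this example.

```python
import logging

# Hypothetical logger name; the real app may configure logging differently.
logger = logging.getLogger("image_captioning")


def safe_caption(caption_fn, image):
    """Call caption_fn(image) and return (caption, error_message).

    Exactly one of the two returned values is None.
    """
    try:
        caption = caption_fn(image)
    except Exception as exc:
        # Broad on purpose: any failure is logged with its traceback
        # and turned into a short message the UI can display.
        logger.exception("Caption generation failed")
        return None, f"Could not generate a caption: {exc}"
    return caption, None
```

Returning a message instead of re-raising lets the UI show the error next to the uploaded image while the full traceback still goes to the log.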
## Models Used
The app uses the following pre-trained model from Hugging Face:
- **Model**: `nlpconnect/vit-gpt2-image-captioning`
- **Architecture**: Vision Encoder-Decoder Model
- **Vision Encoder**: ViT (Vision Transformer)
- **Text Decoder**: GPT-2
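
The sketch below shows how this model is commonly loaded and called through the `transformers` library. It is an assumption about how `image_to_text.py` might work, not a copy of it: the function names `load_captioner` and `caption_image` are hypothetical, and the `max_length` and `num_beams` defaults are illustrative values. Converting the upload to RGB first is one way to handle grayscale or RGBA images before the processor turns them into tensors.

```python
MODEL_ID = "nlpconnect/vit-gpt2-image-captioning"


def load_captioner(model_id=MODEL_ID):
    """Download (or load from cache) the model, image processor, and tokenizer."""
    # Imported lazily so that importing this module stays cheap.
    from transformers import (
        AutoTokenizer,
        VisionEncoderDecoderModel,
        ViTImageProcessor,
    )

    model = VisionEncoderDecoderModel.from_pretrained(model_id)
    processor = ViTImageProcessor.from_pretrained(model_id)
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    return model, processor, tokenizer


def caption_image(image, model, processor, tokenizer, max_length=16, num_beams=4):
    """Generate a caption for a PIL image."""
    # The ViT encoder expects 3-channel input, so normalize the mode first.
    rgb = image.convert("RGB")
    # The processor resizes, normalizes, and returns a batched PyTorch tensor.
    pixel_values = processor(images=rgb, return_tensors="pt").pixel_values
    # Beam search over the GPT-2 decoder; values here are illustrative.
    output_ids = model.generate(
        pixel_values, max_length=max_length, num_beams=num_beams
    )
    return tokenizer.decode(output_ids[0], skip_special_tokens=True).strip()
```

A typical call would be `caption_image(Image.open(path), *load_captioner())`, with `Image` imported from Pillow. In a Streamlit app the loaded objects are usually cached so the model is not reloaded on every rerun.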
## Steps for Deployment
1. **Set up the environment**:
```
python -m venv venv
source venv/bin/activate # On Windows, use `venv\Scripts\activate`
pip install -r requirements.txt
```
2. **Prepare the files**:
- Ensure `app.py`, `image_to_text.py`, and `requirements.txt` are in the project directory.
3. **Run the app locally**:
```
streamlit run app.py
```
4. **Deploy to Streamlit Cloud** (optional):
- Push your code to a GitHub repository.
- Connect your GitHub account to Streamlit Cloud.
- Select the repository and branch to deploy.
- Configure the app settings and deploy.
5. **Alternative Deployment Options**:
- Deploy to Heroku using a `Procfile` and `runtime.txt`.
- Use Docker to containerize the app for deployment on platforms like AWS, Google Cloud, or Azure.
## Requirements
See `requirements.txt` for a full list of dependencies. Key libraries include:
- streamlit
- torch
- transformers
- Pillow
- numpy
## Usage
1. Run the Streamlit app.
2. Upload an image using the file uploader.
3. Click the "Generate Caption" button.
4. View the generated caption below the image.
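
The flow above could be sketched roughly as follows. This is a hypothetical layout, not the actual `app.py`: the function `render`, its `caption_fn` parameter, and every widget label except "Generate Caption" are assumptions. The Streamlit module is passed in as `st` so the flow can be exercised without starting a server.

```python
def render(st, caption_fn):
    """Draw the upload -> button -> caption flow.

    st is the streamlit module (or a stand-in with the same methods);
    caption_fn takes the uploaded file's raw bytes and returns a string.
    """
    st.title("Image Captioning App")
    uploaded = st.file_uploader("Upload an image", type=["jpg", "jpeg", "png"])
    if uploaded is None:
        return None  # nothing to show until a file is chosen

    st.image(uploaded)
    if not st.button("Generate Caption"):
        return None  # wait for the user to click the button

    with st.spinner("Generating caption..."):
        # getvalue() returns the full file contents regardless of read position.
        caption = caption_fn(uploaded.getvalue())
    st.success(caption)  # rendered below the image
    return caption
```

In a real script this would be invoked at the top level, for example `import streamlit as st` followed by `render(st, my_caption_fn)`, where `my_caption_fn` is whatever function wraps the model.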
## Future Improvements
- Implement multiple model options for comparison.
- Evaluate stronger captioning models to improve caption quality.
- Add support for batch processing of images.
- Improve the UI with additional styling and user feedback.