---
license: mit
title: Image_captioning
sdk: streamlit
emoji: 🚀
colorFrom: red
colorTo: red
pinned: true
---

# Image Captioning App

This Streamlit app uses a pre-trained model to generate captions for uploaded images.

## Challenges Faced

1. **Image Processing**: Ensuring correct image preprocessing to match the model's input requirements.
2. **Tensor Conversion**: Handling the conversion of image data to the appropriate tensor format.
3. **Error Handling**: Implementing error handling and logging.

## Models Used

The app uses the following pre-trained model from Hugging Face:

- **Model**: `nlpconnect/vit-gpt2-image-captioning`
- **Architecture**: Vision Encoder-Decoder Model
  - **Vision Encoder**: ViT (Vision Transformer)
  - **Text Decoder**: GPT-2

## Steps for Deployment

1. **Set up the environment**:

   ```
   python -m venv venv
   source venv/bin/activate  # On Windows, use `venv\Scripts\activate`
   pip install -r requirements.txt
   ```

2. **Prepare the files**:
   - Ensure `app.py`, `image_to_text.py`, and `requirements.txt` are in the project directory.

3. **Run the app locally**:

   ```
   streamlit run app.py
   ```

4. **Deploy to Streamlit Cloud** (optional):
   - Push your code to a GitHub repository.
   - Connect your GitHub account to Streamlit Cloud.
   - Select the repository and branch to deploy.
   - Configure the app settings and deploy.

5. **Alternative Deployment Options**:
   - Deploy to Heroku using a `Procfile` and `runtime.txt`.
   - Use Docker to containerize the app for deployment on platforms such as AWS, Google Cloud, or Azure.

## Requirements

See `requirements.txt` for the full list of dependencies. Key libraries include:

- streamlit
- torch
- transformers
- Pillow
- numpy

## Usage

1. Run the Streamlit app.
2. Upload an image using the file uploader.
3. Click the "Generate Caption" button.
4. View the generated caption below the image.

## Future Improvements

- Implement multiple model options for comparison.
- Upgrade to more capable captioning models.
- Add support for batch processing of images.
- Improve the UI with additional styling and user feedback.
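
As a reference for the image-processing and tensor-conversion challenges noted above, the sketch below shows the kind of preprocessing ViT expects, using only Pillow and numpy. It assumes the ViT defaults (224×224 input, pixel values rescaled to [0, 1], then normalized with mean and std of 0.5); in the app itself, the model's bundled image processor from `transformers` handles this, so this is an illustrative approximation rather than the exact pipeline.

```python
import numpy as np
from PIL import Image

def preprocess(image: Image.Image) -> np.ndarray:
    """Approximate ViT preprocessing: returns a (1, 3, 224, 224) float32 array."""
    image = image.convert("RGB").resize((224, 224))        # match ViT's input size
    pixels = np.asarray(image).astype(np.float32) / 255.0  # rescale to [0, 1]
    pixels = (pixels - 0.5) / 0.5                          # normalize to [-1, 1]
    pixels = pixels.transpose(2, 0, 1)                     # HWC -> CHW
    return pixels[np.newaxis, ...]                         # add batch dimension
```

The resulting array can be wrapped with `torch.from_numpy(...)` before being passed to the model, which is where the tensor-conversion step from the challenges list comes in.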