	---
	license: mit
	title: Image_captioning
	sdk: streamlit
	emoji: 🚀
	colorFrom: red
	colorTo: red
	pinned: true
	---
	# Image Captioning App

	This Streamlit app uses a pre-trained model to generate captions for uploaded images.

	## Challenges Faced

	1. Image Processing: Ensuring correct image preprocessing to match the model's input requirements.
	2. Tensor Conversion: Handling the conversion of image data to the appropriate tensor format.
	3. Error Handling: Implementing error handling and logging for failures during image loading and caption generation.
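
The error-handling challenge above can be illustrated with a small wrapper. This is only a sketch, not the code in `app.py` or `image_to_text.py`: the names `safe_caption` and `captioner` are hypothetical, and `captioner` stands in for whatever function the app uses to produce a caption.

```python
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("image_captioning")


def safe_caption(image, captioner):
    """Run `captioner(image)` and return a (caption, error_message) pair.

    `captioner` is a hypothetical callable standing in for the app's
    captioning function. Exactly one element of the result is None.
    """
    if image is None:
        return None, "No image was provided."
    try:
        caption = captioner(image)
    except Exception:
        # Log the full traceback for debugging; show the user a short message.
        logger.exception("Caption generation failed")
        return None, "Could not generate a caption for this image."
    logger.info("Caption generated: %s", caption)
    return caption, None
```

Returning a pair instead of raising lets the UI decide how to present the failure, while the traceback still lands in the logs.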

	## Models Used

	The app uses the following pre-trained model from Hugging Face:

	- Model: `nlpconnect/vit-gpt2-image-captioning`
	- Architecture: Vision Encoder-Decoder Model
	- Vision Encoder: ViT (Vision Transformer)
	- Text Decoder: GPT-2
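
As a rough illustration of how this model is typically used with the `transformers` library, the sketch below loads the encoder-decoder, its image processor, and its tokenizer, then generates a caption for a PIL image. It is not taken from `image_to_text.py`; the function names and the `max_length` and `num_beams` defaults are assumptions chosen for the example.

```python
MODEL_ID = "nlpconnect/vit-gpt2-image-captioning"


def load_captioner(model_id=MODEL_ID):
    """Load the model, image processor, and tokenizer (downloads on first use)."""
    # Imported lazily so that importing this module stays cheap.
    from transformers import (
        AutoTokenizer,
        ViTImageProcessor,
        VisionEncoderDecoderModel,
    )

    model = VisionEncoderDecoderModel.from_pretrained(model_id)
    processor = ViTImageProcessor.from_pretrained(model_id)
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model.eval()  # inference only
    return model, processor, tokenizer


def generate_caption(image, model, processor, tokenizer, max_length=16, num_beams=4):
    """Return a caption string for a PIL image."""
    # The ViT encoder expects 3-channel input; convert grayscale or RGBA uploads.
    rgb = image.convert("RGB")
    # The processor resizes and normalizes the image and returns a PyTorch tensor.
    pixel_values = processor(images=rgb, return_tensors="pt").pixel_values
    # Beam search over the GPT-2 decoder; the defaults here are assumed values.
    output_ids = model.generate(pixel_values, max_length=max_length, num_beams=num_beams)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True).strip()
```

A typical call would be `generate_caption(Image.open(path), *load_captioner())`, with `Image` imported from Pillow. Letting the image processor do the resizing, normalization, and tensor conversion avoids most hand-written preprocessing.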

	## Steps for Deployment

	1. Set up the environment:
	```
	python -m venv venv
	source venv/bin/activate # On Windows, use `venv\Scripts\activate`
	pip install -r requirements.txt
	```

	2. Prepare the files:
	- Ensure `app.py`, `image_to_text.py`, and `requirements.txt` are in the project directory.

	3. Run the app locally:
	```
	streamlit run app.py
	```

	4. Deploy to Streamlit Community Cloud (optional):
	- Push your code to a GitHub repository.
	- Connect your GitHub account to Streamlit Community Cloud.
	- Select the repository and branch to deploy.
	- Configure the app settings and deploy.

	5. Alternative Deployment Options:
	- Deploy to Heroku using a `Procfile` and `runtime.txt`.
	- Use Docker to containerize the app for deployment on platforms like AWS, Google Cloud, or Azure.

	## Requirements

	See `requirements.txt` for a full list of dependencies. Key libraries include:

	- streamlit
	- torch
	- transformers
	- Pillow
	- numpy

	## Usage

	1. Run the Streamlit app.
	2. Upload an image using the file uploader.
	3. Click the "Generate Caption" button.
	4. View the generated caption below the image.
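
The usage steps above map onto a short Streamlit flow. The sketch below is an assumption about how such a page could be wired, not the contents of `app.py`: `render_app` and `captioner` are hypothetical names, and the `streamlit` module is passed in as `st` so the flow can be exercised without a running server.

```python
def render_app(st, captioner):
    """Draw the upload -> button -> caption flow and return the caption, if any.

    `st` is the imported `streamlit` module. `captioner` is a hypothetical
    callable that takes the uploaded file and returns a caption string.
    """
    st.title("Image Captioning App")
    uploaded = st.file_uploader("Upload an image", type=["jpg", "jpeg", "png"])
    if uploaded is None:
        st.info("Upload an image to get started.")
        return None

    # Show the image first so the caption appears below it.
    st.image(uploaded)
    if not st.button("Generate Caption"):
        return None

    caption = captioner(uploaded)
    st.write(caption)
    return caption
```

In a real script this would be called as `render_app(st, my_captioner)` after `import streamlit as st`, where `my_captioner` is whatever captioning function the app provides.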

	## Future Improvements

	- Implement multiple model options for comparison.
	- Evaluate stronger captioning models to improve caption quality.
	- Add support for batch processing of images.
	- Improve the UI with additional styling and user feedback.