Spaces:
Sleeping
Update README.md
Browse files# πΌοΈ Multi-Model Image Caption Generator
A powerful Streamlit application that generates captions for images using multiple AI models (OpenAI GPT-4o, Google Gemini, and GROQ Vision) with advanced image processing capabilities using OpenCV and LangChain for history management.
## β¨ Features
- **Multi-Model Support**: Choose from OpenAI GPT-4o, Google Gemini, or GROQ Vision models
- **Smart Caption Generation**: Clean, professional captions (10-50 words, no emojis/symbols)
- **Advanced Image Processing**: Two caption overlay methods using OpenCV
- **LangChain Integration**: Comprehensive history management and conversation memory
- **Custom Typography**: Uses Poppins font with intelligent fallbacks
- **Interactive UI**: Modern Streamlit interface with real-time preview
- **Export Functionality**: Download processed images with captions
## π Quick Start
### Prerequisites
- Python 3.8+
- API keys for at least one of the supported models
### Installation
1. **Clone the repository**
```bash
git clone <your-repo-url>
cd multi-model-caption-generator
```
2. **Install dependencies**
```bash
pip install streamlit opencv-python pillow openai google-generativeai groq langchain python-dotenv
```
3. **Set up environment variables**
Create a `.env` file in the project root:
```env
OPENAI_API_KEY_IC=your_openai_api_key_here
GEMINI_API_KEY_IC=your_gemini_api_key_here
GROQ_API_KEY_IC=your_groq_api_key_here
```
4. **Set up fonts (optional)**
Place your font file at:
```
fonts/Poppins-Regular.ttf
```
5. **Run the application**
```bash
streamlit run main.py
```
## π Project Structure
```
multi-model-caption-generator/
βββ main.py # Main Streamlit application
βββ caption_generation.py # Multi-model caption generation
βββ caption_history.py # LangChain history management
βββ caption_overlay.py # OpenCV image processing
βββ fonts/ # Font directory
β βββ Poppins-Regular.ttf # Custom font (optional)
βββ .env # Environment variables
βββ caption_history.json # Auto-generated history file
βββ README.md # This file
```
## π€ Supported AI Models
### OpenAI GPT-4o
- **Model**: `gpt-4o`
- **Strengths**: Detailed image analysis, high accuracy
- **API**: OpenAI Vision API
### Google Gemini
- **Model**: `gemini-1.5-flash`
- **Strengths**: Fast processing, multimodal understanding
- **API**: Google Generative AI
### GROQ Vision
- **Model**: `llama-3.2-11b-vision-preview`
- **Strengths**: High-speed inference, efficient processing
- **API**: GROQ API
## π¨ Caption Overlay Options
### 1. Overlay on Image
- Position: Top, Center, or Bottom
- Customizable font size and thickness
- Auto text wrapping for long captions
- Semi-transparent background for readability
### 2. Background Behind Image
- Caption appears above the image
- Customizable background and text colors
- Adjustable margins
- Uses Poppins font with fallbacks
## π Caption History Management
The application uses LangChain for sophisticated history management:
- **Persistent Storage**: All captions saved to `caption_history.json`
- **Memory Integration**: LangChain ConversationBufferMemory
- **Search & Filter**: Find previous captions by image name or content
- **Export History**: View and manage generation history
## π§ Configuration
### API Keys Setup
Get your API keys from:
- **OpenAI**: [https://platform.openai.com/api-keys](https://platform.openai.com/api-keys)
- **Google Gemini**: [https://makersuite.google.com/app/apikey](https://makersuite.google.com/app/apikey)
- **GROQ**: [https://console.groq.com/keys](https://console.groq.com/keys)
### Font Configuration
The app automatically uses fonts in this priority:
1. Custom font path (if specified in UI)
2. `fonts/Poppins-Regular.ttf` (if available)
3. System default font
### Caption Settings
- **Word Limit**: 10-50 words maximum
- **Format**: Plain text only (no emojis or special characters)
- **Style**: Descriptive but concise
## π₯οΈ Usage
1. **Configure APIs**: Add your API keys to `.env` file and click "Configure APIs"
2. **Upload Image**: Choose PNG, JPG, JPEG, BMP, or TIFF files
3. **Select Model**: Choose from OpenAI, Gemini, or GROQ
4. **Generate Caption**: Click to generate and see real-time preview
5. **Customize Overlay**: Adjust position, colors, and styling
6. **Download**: Save the final image with caption
## π― Key Features Explained
### Smart Caption Generation
- All models generate clean, professional captions
- Consistent 10-50 word length
- No emojis or special characters
- Perfect for image overlays
### Advanced Image Processing
- OpenCV-powered text rendering
- Automatic text wrapping
- High-quality font rendering with PIL
- Multiple overlay styles
### History Management
- LangChain integration for conversation memory
- Searchable history with timestamps
- Model tracking for each generation
- Easy history clearing and management
## π οΈ Technical Details
### Dependencies
```
streamlit>=1.28.0
opencv-python>=4.8.0
pillow>=10.0.0
openai>=1.0.0
google-generativeai>=0.3.0
groq>=0.4.0
langchain>=0.1.0
python-dotenv>=1.0.0
numpy>=1.24.0
```
### Performance Optimizations
- Efficient base64 encoding for API calls
- Optimized image processing with OpenCV
- Smart memory management with LangChain
- Reduced token limits for faster generation
## π Troubleshooting
### Common Issues
**API Key Errors**
- Ensure all API keys are correctly set in `.env` file
- Check API key validity and quotas
- Restart the application after adding keys
**Font Loading Issues**
- Verify font file exists at `fonts/Poppins-Regular.ttf`
- Check file permissions
- App will fallback to default font if custom font fails
**Image Processing Errors**
- Ensure uploaded images are valid formats
- Check image file size (very large images may cause issues)
- Try different image formats if problems persist
**Model-Specific Issues**
- **OpenAI**: Verify you have access to GPT-4o vision model
- **Gemini**: Ensure Gemini API is enabled in your Google Cloud project
- **GROQ**: Check that vision models are available in your region
### Error Messages
| Error | Solution |
|-------|----------|
| "API key not configured" | Add the required API key to `.env` file |
| "Model not available" | Check model name and API access |
| "Image processing failed" | Try a different image format or size |
| "Font loading error" | Check font file path or use default font |
## π€ Contributing
1. Fork the repository
2. Create a feature branch: `git checkout -b feature-name`
3. Make your changes and commit: `git commit -m 'Add feature'`
4. Push to the branch: `git push origin feature-name`
5. Submit a pull request
## π License
This project is licensed under the MIT License - see the [MIT LICENSE](https://mit-license.org/) file for details.
## π Acknowledgments
- **Streamlit** for the amazing web app framework
- **OpenCV** for powerful image processing capabilities
- **LangChain** for conversation memory management
- **OpenAI, Google, and GROQ** for providing excellent vision APIs
- **Poppins Font** for beautiful typography
## π Support
If you encounter any issues or have questions:
1. Check the troubleshooting section above
2. Review the [Issues](https://github.com/your-repo/issues) page
3. Create a new issue with detailed information
4. Provide error messages and steps to reproduce
---
**Built with β€οΈ using Streamlit, LangChain, OpenCV, and multi-model AI APIs**
|
@@ -1,3 +1,15 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
# πΌοΈ Multi-Model Image Caption Generator
|
| 2 |
|
| 3 |
A powerful Streamlit application that generates captions for images using multiple AI models (OpenAI GPT-4o, Google Gemini, and GROQ Vision) with advanced image processing capabilities using OpenCV and LangChain for history management.
|
|
|
|
| 1 |
+
---
|
| 2 |
+
license: mit
|
| 3 |
+
title: Multi_LLM_Image_Captioning
|
| 4 |
+
sdk: streamlit
|
| 5 |
+
emoji: π»
|
| 6 |
+
colorFrom: purple
|
| 7 |
+
colorTo: indigo
|
| 8 |
+
pinned: true
|
| 9 |
+
thumbnail: >-
|
| 10 |
+
https://cdn-uploads.huggingface.co/production/uploads/662234af4dd89a733b09e612/gnrlvy8935CNe0fcx0HZs.png
|
| 11 |
+
short_description: A powerful Streamlit application that generates captions for
|
| 12 |
+
---
|
| 13 |
# πΌοΈ Multi-Model Image Caption Generator
|
| 14 |
|
| 15 |
A powerful Streamlit application that generates captions for images using multiple AI models (OpenAI GPT-4o, Google Gemini, and GROQ Vision) with advanced image processing capabilities using OpenCV and LangChain for history management.
|