# 🖼️ Multi-Model Image Caption Generator
A Streamlit application that generates image captions with your choice of AI model (OpenAI GPT-4o, Google Gemini, or GROQ Vision), overlays them on the image with OpenCV, and tracks generation history with LangChain.
## ✨ Features
- **Multi-Model Support**: Choose from OpenAI GPT-4o, Google Gemini, or GROQ Vision models
- **Smart Caption Generation**: Clean, professional captions (10-50 words, no emojis/symbols)
- **Advanced Image Processing**: Two caption overlay methods using OpenCV
- **LangChain Integration**: Comprehensive history management and conversation memory
- **Custom Typography**: Uses Poppins font with intelligent fallbacks
- **Interactive UI**: Modern Streamlit interface with real-time preview
- **Export Functionality**: Download processed images with captions
## 🚀 Quick Start
### Prerequisites
- Python 3.10+ (the code uses `X | Y` type annotations)
- API keys for at least one of the supported models
### Installation
1. **Clone the repository**
```bash
git clone <your-repo-url>
cd multi-model-caption-generator
```
2. **Install dependencies**
```bash
pip install streamlit opencv-python pillow openai google-generativeai groq langchain python-dotenv
```
3. **Set up environment variables**
Create a `.env` file in the project root:
```env
OPENAI_API_KEY_IC=your_openai_api_key_here
GEMINI_API_KEY_IC=your_gemini_api_key_here
GROQ_API_KEY_IC=your_groq_api_key_here
```
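The app reads these variables at startup via `python-dotenv`. A minimal sketch of that lookup (the `configured_models` helper is illustrative, not part of the app):

```python
import os

try:
    from dotenv import load_dotenv  # pip install python-dotenv
    load_dotenv()  # copies .env entries into os.environ, if the file exists
except ImportError:
    pass

def configured_models(env) -> list:
    """Report which providers have an API key set in the given mapping."""
    keys = {"OpenAI": "OPENAI_API_KEY_IC",
            "Gemini": "GEMINI_API_KEY_IC",
            "GROQ": "GROQ_API_KEY_IC"}
    return [name for name, var in keys.items() if env.get(var)]

available = configured_models(os.environ)
```

At least one key must resolve, or there is no model to call.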
4. **Set up fonts (optional)**
Place your font file at:
```
fonts/Poppins-Regular.ttf
```
5. **Run the application**
```bash
streamlit run main.py
```
## 📁 Project Structure
```
multi-model-caption-generator/
├── main.py                  # Main Streamlit application
├── caption_generation.py    # Multi-model caption generation
├── caption_history.py       # LangChain history management
├── caption_overlay.py       # OpenCV image processing
├── fonts/                   # Font directory
│   └── Poppins-Regular.ttf  # Custom font (optional)
├── .env                     # Environment variables
├── caption_history.json     # Auto-generated history file
└── README.md                # This file
```
## 🤖 Supported AI Models
### OpenAI GPT-4o
- **Model**: `gpt-4o-mini` (default; `gpt-4o` also supported)
- **Strengths**: Detailed image analysis, high accuracy
- **API**: OpenAI Vision API
### Google Gemini
- **Model**: `gemini-2.5-flash`
- **Strengths**: Fast processing, multimodal understanding
- **API**: Google Generative AI
### GROQ Vision
- **Model**: `meta-llama/llama-4-scout-17b-16e-instruct`
- **Strengths**: High-speed inference, efficient processing
- **API**: GROQ API
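The OpenAI and GROQ chat endpoints both accept images as base64-encoded `data:` URLs inside the message content. A minimal sketch of that shared payload (helper names are illustrative):

```python
import base64

def to_data_url(png_bytes: bytes) -> str:
    """Wrap raw PNG bytes in the data: URL format the vision endpoints expect."""
    return "data:image/png;base64," + base64.b64encode(png_bytes).decode()

def vision_message(prompt: str, data_url: str) -> dict:
    """One user message pairing a text prompt with an image, in chat-completions form."""
    return {"role": "user", "content": [
        {"type": "text", "text": prompt},
        {"type": "image_url", "image_url": {"url": data_url}},
    ]}
```

Gemini differs: `google-generativeai` accepts a PIL image object directly, so no base64 step is needed there.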
## 🎨 Caption Overlay Options
### 1. Overlay on Image
- Position: Top, Center, or Bottom
- Customizable font size and thickness
- Auto text wrapping for long captions
- Semi-transparent background for readability
### 2. Background Behind Image
- Caption appears above the image
- Customizable background and text colors
- Adjustable margins
- Uses Poppins font with fallbacks
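The wrapping and positioning logic reduces to stdlib `textwrap` plus a y-origin calculation; the app renders the resulting lines with OpenCV and PIL, but the geometry is the same (function names are illustrative):

```python
import textwrap

def wrap_caption(caption: str, width: int = 40) -> list:
    """Break a long caption into lines no wider than `width` characters."""
    return textwrap.wrap(caption, width=width)

def block_origin_y(position: str, image_h: int, block_h: int, margin: int = 10) -> int:
    """Top y-coordinate of the caption block for each overlay position."""
    if position == "top":
        return margin
    if position == "center":
        return (image_h - block_h) // 2
    return image_h - block_h - margin  # "bottom"
```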
## 📝 Caption History Management
The application uses LangChain for sophisticated history management:
- **Persistent Storage**: All captions saved to `caption_history.json`
- **Memory Integration**: LangChain ConversationBufferMemory
- **Search & Filter**: Find previous captions by image name or content
- **Export History**: View and manage generation history
## 🔧 Configuration
### API Keys Setup
Get your API keys from:
- **OpenAI**: [https://platform.openai.com/api-keys](https://platform.openai.com/api-keys)
- **Google Gemini**: [https://makersuite.google.com/app/apikey](https://makersuite.google.com/app/apikey)
- **GROQ**: [https://console.groq.com/keys](https://console.groq.com/keys)
### Font Configuration
The app automatically uses fonts in this priority:
1. Custom font path (if specified in UI)
2. `fonts/Poppins-Regular.ttf` (if available)
3. System default font
### Caption Settings
- **Word Limit**: 10-50 words maximum
- **Format**: Plain text only (no emojis or special characters)
- **Style**: Descriptive but concise
## 🖥️ Usage
1. **Configure APIs**: Add your API keys to the `.env` file and click "Configure APIs"
2. **Upload Image**: Choose PNG, JPG, JPEG, BMP, or TIFF files
3. **Select Model**: Choose from OpenAI, Gemini, or GROQ
4. **Generate Caption**: Click to generate and see real-time preview
5. **Customize Overlay**: Adjust position, colors, and styling
6. **Download**: Save the final image with caption
## 🎯 Key Features Explained
### Smart Caption Generation
- All models generate clean, professional captions
- Consistent 10-50 word length
- No emojis or special characters
- Perfect for image overlays
### Advanced Image Processing
- OpenCV-powered text rendering
- Automatic text wrapping
- High-quality font rendering with PIL
- Multiple overlay styles
### History Management
- LangChain integration for conversation memory
- Searchable history with timestamps
- Model tracking for each generation
- Easy history clearing and management
## 🛠️ Technical Details
### Dependencies
```
streamlit>=1.28.0
opencv-python>=4.8.0
pillow>=10.0.0
openai>=1.0.0
google-generativeai>=0.3.0
groq>=0.4.0
langchain>=0.1.0
python-dotenv>=1.0.0
numpy>=1.24.0
```
### Performance Optimizations
- Efficient base64 encoding for API calls
- Optimized image processing with OpenCV
- Smart memory management with LangChain
- Reduced token limits for faster generation
## 🐛 Troubleshooting
### Common Issues
**API Key Errors**
- Ensure all API keys are correctly set in the `.env` file
- Check API key validity and quotas
- Restart the application after adding keys
**Font Loading Issues**
- Verify font file exists at `fonts/Poppins-Regular.ttf`
- Check file permissions
- The app falls back to the default font if the custom font fails to load
**Image Processing Errors**
- Ensure uploaded images are valid formats
- Check image file size (very large images may cause issues)
- Try different image formats if problems persist
**Model-Specific Issues**
- **OpenAI**: Verify you have access to GPT-4o vision model
- **Gemini**: Ensure Gemini API is enabled in your Google Cloud project
- **GROQ**: Check that vision models are available in your region
### Error Messages
| Error | Solution |
|-------|----------|
| "API key not configured" | Add the required API key to the `.env` file |
| "Model not available" | Check model name and API access |
| "Image processing failed" | Try a different image format or size |
| "Font loading error" | Check font file path or use default font |
## 🤝 Contributing
1. Fork the repository
2. Create a feature branch: `git checkout -b feature-name`
3. Make your changes and commit: `git commit -m 'Add feature'`
4. Push to the branch: `git push origin feature-name`
5. Submit a pull request
## 📄 License
This project is licensed under the [MIT License](https://mit-license.org/).
## 🙏 Acknowledgments
- **Streamlit** for the amazing web app framework
- **OpenCV** for powerful image processing capabilities
- **LangChain** for conversation memory management
- **OpenAI, Google, and GROQ** for providing excellent vision APIs
- **Poppins Font** for beautiful typography
## 📞 Support
If you encounter any issues or have questions:
1. Check the troubleshooting section above
2. Review the [Issues](https://github.com/your-repo/issues) page
3. Create a new issue with detailed information
4. Provide error messages and steps to reproduce
---
**Built with ❤️ using Streamlit, LangChain, OpenCV, and multi-model AI APIs**
- .env +3 -0
- .gitignore +160 -0
- README.md +242 -17
- caption_generation.py +116 -0
- caption_history.json +110 -0
- caption_history.py +71 -0
- caption_overlay.py +154 -0
- main.py +294 -0
- requirements.txt +8 -2
|
@@ -0,0 +1,3 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
OPENAI_API_KEY_IC = "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"
|
| 2 |
+
GEMINI_API_KEY_IC ="AIzaSyC2zOnkcC5bK0zmWgszdjO8bhMf3sRHZsM"
|
| 3 |
+
GROQ_API_KEY_IC = "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"
|
|
@@ -0,0 +1,160 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
# Byte-compiled / optimized / DLL files
|
| 2 |
+
__pycache__/
|
| 3 |
+
*.py[cod]
|
| 4 |
+
*$py.class
|
| 5 |
+
|
| 6 |
+
# C extensions
|
| 7 |
+
*.so
|
| 8 |
+
|
| 9 |
+
# Distribution / packaging
|
| 10 |
+
.Python
|
| 11 |
+
build/
|
| 12 |
+
develop-eggs/
|
| 13 |
+
dist/
|
| 14 |
+
downloads/
|
| 15 |
+
eggs/
|
| 16 |
+
.eggs/
|
| 17 |
+
lib/
|
| 18 |
+
lib64/
|
| 19 |
+
parts/
|
| 20 |
+
sdist/
|
| 21 |
+
var/
|
| 22 |
+
wheels/
|
| 23 |
+
share/python-wheels/
|
| 24 |
+
*.egg-info/
|
| 25 |
+
.installed.cfg
|
| 26 |
+
*.egg
|
| 27 |
+
MANIFEST
|
| 28 |
+
|
| 29 |
+
# PyInstaller
|
| 30 |
+
# Usually these files are written by a python script from a template
|
| 31 |
+
# before PyInstaller builds the exe, so as to inject date/other infos into it.
|
| 32 |
+
*.manifest
|
| 33 |
+
*.spec
|
| 34 |
+
|
| 35 |
+
# Installer logs
|
| 36 |
+
pip-log.txt
|
| 37 |
+
pip-delete-this-directory.txt
|
| 38 |
+
|
| 39 |
+
# Unit test / coverage reports
|
| 40 |
+
htmlcov/
|
| 41 |
+
.tox/
|
| 42 |
+
.nox/
|
| 43 |
+
.coverage
|
| 44 |
+
.coverage.*
|
| 45 |
+
.cache
|
| 46 |
+
nosetests.xml
|
| 47 |
+
coverage.xml
|
| 48 |
+
*.cover
|
| 49 |
+
*.py,cover
|
| 50 |
+
.hypothesis/
|
| 51 |
+
.pytest_cache/
|
| 52 |
+
cover/
|
| 53 |
+
|
| 54 |
+
# Translations
|
| 55 |
+
*.mo
|
| 56 |
+
*.pot
|
| 57 |
+
|
| 58 |
+
# Django stuff:
|
| 59 |
+
*.log
|
| 60 |
+
local_settings.py
|
| 61 |
+
db.sqlite3
|
| 62 |
+
db.sqlite3-journal
|
| 63 |
+
|
| 64 |
+
# Flask stuff:
|
| 65 |
+
instance/
|
| 66 |
+
.webassets-cache
|
| 67 |
+
|
| 68 |
+
# Scrapy stuff:
|
| 69 |
+
.scrapy
|
| 70 |
+
|
| 71 |
+
# Sphinx documentation
|
| 72 |
+
docs/_build/
|
| 73 |
+
|
| 74 |
+
# PyBuilder
|
| 75 |
+
.pybuilder/
|
| 76 |
+
target/
|
| 77 |
+
|
| 78 |
+
# Jupyter Notebook
|
| 79 |
+
.ipynb_checkpoints
|
| 80 |
+
|
| 81 |
+
# IPython
|
| 82 |
+
profile_default/
|
| 83 |
+
ipython_config.py
|
| 84 |
+
|
| 85 |
+
# pyenv
|
| 86 |
+
# For a library or package, you might want to ignore these files since the code is
|
| 87 |
+
# intended to run in multiple environments; otherwise, check them in:
|
| 88 |
+
# .python-version
|
| 89 |
+
|
| 90 |
+
# pipenv
|
| 91 |
+
# According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
|
| 92 |
+
# However, in case of collaboration, if having platform-specific dependencies or dependencies
|
| 93 |
+
# having no cross-platform support, pipenv may install dependencies that don't work, or not
|
| 94 |
+
# install all needed dependencies.
|
| 95 |
+
#Pipfile.lock
|
| 96 |
+
|
| 97 |
+
# poetry
|
| 98 |
+
# Similar to Pipfile.lock, it is generally recommended to include poetry.lock in version control.
|
| 99 |
+
# This is especially recommended for binary packages to ensure reproducibility, and is more
|
| 100 |
+
# commonly ignored for libraries.
|
| 101 |
+
# https://python-poetry.org/docs/basic-usage/#commit-your-poetrylock-file-to-version-control
|
| 102 |
+
#poetry.lock
|
| 103 |
+
|
| 104 |
+
# pdm
|
| 105 |
+
# Similar to Pipfile.lock, it is generally recommended to include pdm.lock in version control.
|
| 106 |
+
#pdm.lock
|
| 107 |
+
# pdm stores project-wide configurations in .pdm.toml, but it is recommended to not include it
|
| 108 |
+
# in version control.
|
| 109 |
+
# https://pdm.fming.dev/#use-with-ide
|
| 110 |
+
.pdm.toml
|
| 111 |
+
|
| 112 |
+
# PEP 582; used by e.g. github.com/David-OConnor/pyflow and github.com/pdm-project/pdm
|
| 113 |
+
__pypackages__/
|
| 114 |
+
|
| 115 |
+
# Celery stuff
|
| 116 |
+
celerybeat-schedule
|
| 117 |
+
celerybeat.pid
|
| 118 |
+
|
| 119 |
+
# SageMath parsed files
|
| 120 |
+
*.sage.py
|
| 121 |
+
|
| 122 |
+
# Environments
|
| 123 |
+
.env
|
| 124 |
+
.venv
|
| 125 |
+
env/
|
| 126 |
+
venv/
|
| 127 |
+
ENV/
|
| 128 |
+
env.bak/
|
| 129 |
+
venv.bak/
|
| 130 |
+
|
| 131 |
+
# Spyder project settings
|
| 132 |
+
.spyderproject
|
| 133 |
+
.spyproject
|
| 134 |
+
|
| 135 |
+
# Rope project settings
|
| 136 |
+
.ropeproject
|
| 137 |
+
|
| 138 |
+
# mkdocs documentation
|
| 139 |
+
/site
|
| 140 |
+
|
| 141 |
+
# mypy
|
| 142 |
+
.mypy_cache/
|
| 143 |
+
.dmypy.json
|
| 144 |
+
dmypy.json
|
| 145 |
+
|
| 146 |
+
# Pyre type checker
|
| 147 |
+
.pyre/
|
| 148 |
+
|
| 149 |
+
# pytype static type analyzer
|
| 150 |
+
.pytype/
|
| 151 |
+
|
| 152 |
+
# Cython debug symbols
|
| 153 |
+
cython_debug/
|
| 154 |
+
|
| 155 |
+
# PyCharm
|
| 156 |
+
# JetBrains specific template is maintained in a separate JetBrains.gitignore that can
|
| 157 |
+
# be found at https://github.com/github/gitignore/blob/main/Global/JetBrains.gitignore
|
| 158 |
+
# and can be added to the global gitignore or merged into this file. For a more nuclear
|
| 159 |
+
# option (not recommended) you can uncomment the following to ignore the entire idea folder.
|
| 160 |
+
#.idea/
|
|
@@ -1,20 +1,245 @@
|
|
| 1 |
-
|
| 2 |
-
|
| 3 |
-
|
| 4 |
-
|
| 5 |
-
|
| 6 |
-
|
| 7 |
-
|
| 8 |
-
|
| 9 |
-
-
|
| 10 |
-
|
| 11 |
-
|
| 12 |
-
|
| 13 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 14 |
|
| 15 |
-
|
|
|
|
|
|
|
|
|
|
| 16 |
|
| 17 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 18 |
|
| 19 |
-
|
| 20 |
-
forums](https://discuss.streamlit.io).
|
|
|
|
| 1 |
+
# πΌοΈ Multi-Model Image Caption Generator
|
| 2 |
+
|
| 3 |
+
A powerful Streamlit application that generates captions for images using multiple AI models (OpenAI GPT-4o, Google Gemini, and GROQ Vision) with advanced image processing capabilities using OpenCV and LangChain for history management.
|
| 4 |
+
|
| 5 |
+
## β¨ Features
|
| 6 |
+
|
| 7 |
+
- **Multi-Model Support**: Choose from OpenAI GPT-4o, Google Gemini, or GROQ Vision models
|
| 8 |
+
- **Smart Caption Generation**: Clean, professional captions (10-50 words, no emojis/symbols)
|
| 9 |
+
- **Advanced Image Processing**: Two caption overlay methods using OpenCV
|
| 10 |
+
- **LangChain Integration**: Comprehensive history management and conversation memory
|
| 11 |
+
- **Custom Typography**: Uses Poppins font with intelligent fallbacks
|
| 12 |
+
- **Interactive UI**: Modern Streamlit interface with real-time preview
|
| 13 |
+
- **Export Functionality**: Download processed images with captions
|
| 14 |
+
|
| 15 |
+
## π Quick Start
|
| 16 |
+
|
| 17 |
+
### Prerequisites
|
| 18 |
+
|
| 19 |
+
- Python 3.8+
|
| 20 |
+
- API keys for at least one of the supported models
|
| 21 |
+
|
| 22 |
+
### Installation
|
| 23 |
+
|
| 24 |
+
1. **Clone the repository**
|
| 25 |
+
```bash
|
| 26 |
+
git clone <your-repo-url>
|
| 27 |
+
cd multi-model-caption-generator
|
| 28 |
+
```
|
| 29 |
+
|
| 30 |
+
2. **Install dependencies**
|
| 31 |
+
```bash
|
| 32 |
+
pip install streamlit opencv-python pillow openai google-generativeai groq langchain python-dotenv
|
| 33 |
+
```
|
| 34 |
+
|
| 35 |
+
3. **Set up environment variables**
|
| 36 |
+
Create a `.env` file in the project root:
|
| 37 |
+
```env
|
| 38 |
+
OPENAI_API_KEY_IC=your_openai_api_key_here
|
| 39 |
+
GEMINI_API_KEY_IC=your_gemini_api_key_here
|
| 40 |
+
GROQ_API_KEY_IC=your_groq_api_key_here
|
| 41 |
+
```
|
| 42 |
+
|
| 43 |
+
4. **Set up fonts (optional)**
|
| 44 |
+
Place your font file at:
|
| 45 |
+
```
|
| 46 |
+
fonts/Poppins-Regular.ttf
|
| 47 |
+
```
|
| 48 |
+
|
| 49 |
+
5. **Run the application**
|
| 50 |
+
```bash
|
| 51 |
+
streamlit run main.py
|
| 52 |
+
```
|
| 53 |
+
|
| 54 |
+
## π Project Structure
|
| 55 |
+
|
| 56 |
+
```
|
| 57 |
+
multi-model-caption-generator/
|
| 58 |
+
βββ main.py # Main Streamlit application
|
| 59 |
+
βββ caption_generation.py # Multi-model caption generation
|
| 60 |
+
βββ caption_history.py # LangChain history management
|
| 61 |
+
βββ caption_overlay.py # OpenCV image processing
|
| 62 |
+
βββ fonts/ # Font directory
|
| 63 |
+
β βββ Poppins-Regular.ttf # Custom font (optional)
|
| 64 |
+
βββ .env # Environment variables
|
| 65 |
+
βββ caption_history.json # Auto-generated history file
|
| 66 |
+
βββ README.md # This file
|
| 67 |
+
```
|
| 68 |
+
|
| 69 |
+
## π€ Supported AI Models
|
| 70 |
+
|
| 71 |
+
### OpenAI GPT-4o
|
| 72 |
+
- **Model**: `gpt-4o`
|
| 73 |
+
- **Strengths**: Detailed image analysis, high accuracy
|
| 74 |
+
- **API**: OpenAI Vision API
|
| 75 |
+
|
| 76 |
+
### Google Gemini
|
| 77 |
+
- **Model**: `gemini-1.5-flash`
|
| 78 |
+
- **Strengths**: Fast processing, multimodal understanding
|
| 79 |
+
- **API**: Google Generative AI
|
| 80 |
+
|
| 81 |
+
### GROQ Vision
|
| 82 |
+
- **Model**: `llama-3.2-11b-vision-preview`
|
| 83 |
+
- **Strengths**: High-speed inference, efficient processing
|
| 84 |
+
- **API**: GROQ API
|
| 85 |
+
|
| 86 |
+
## π¨ Caption Overlay Options
|
| 87 |
+
|
| 88 |
+
### 1. Overlay on Image
|
| 89 |
+
- Position: Top, Center, or Bottom
|
| 90 |
+
- Customizable font size and thickness
|
| 91 |
+
- Auto text wrapping for long captions
|
| 92 |
+
- Semi-transparent background for readability
|
| 93 |
+
|
| 94 |
+
### 2. Background Behind Image
|
| 95 |
+
- Caption appears above the image
|
| 96 |
+
- Customizable background and text colors
|
| 97 |
+
- Adjustable margins
|
| 98 |
+
- Uses Poppins font with fallbacks
|
| 99 |
+
|
| 100 |
+
## π Caption History Management
|
| 101 |
+
|
| 102 |
+
The application uses LangChain for sophisticated history management:
|
| 103 |
+
|
| 104 |
+
- **Persistent Storage**: All captions saved to `caption_history.json`
|
| 105 |
+
- **Memory Integration**: LangChain ConversationBufferMemory
|
| 106 |
+
- **Search & Filter**: Find previous captions by image name or content
|
| 107 |
+
- **Export History**: View and manage generation history
|
| 108 |
+
|
| 109 |
+
## π§ Configuration
|
| 110 |
+
|
| 111 |
+
### API Keys Setup
|
| 112 |
+
|
| 113 |
+
Get your API keys from:
|
| 114 |
+
- **OpenAI**: [https://platform.openai.com/api-keys](https://platform.openai.com/api-keys)
|
| 115 |
+
- **Google Gemini**: [https://makersuite.google.com/app/apikey](https://makersuite.google.com/app/apikey)
|
| 116 |
+
- **GROQ**: [https://console.groq.com/keys](https://console.groq.com/keys)
|
| 117 |
+
|
| 118 |
+
### Font Configuration
|
| 119 |
|
| 120 |
+
The app automatically uses fonts in this priority:
|
| 121 |
+
1. Custom font path (if specified in UI)
|
| 122 |
+
2. `fonts/Poppins-Regular.ttf` (if available)
|
| 123 |
+
3. System default font
|
| 124 |
|
| 125 |
+
### Caption Settings
|
| 126 |
+
|
| 127 |
+
- **Word Limit**: 10-50 words maximum
|
| 128 |
+
- **Format**: Plain text only (no emojis or special characters)
|
| 129 |
+
- **Style**: Descriptive but concise
|
| 130 |
+
|
| 131 |
+
## π₯οΈ Usage
|
| 132 |
+
|
| 133 |
+
1. **Configure APIs**: Add your API keys to `.env` file and click "Configure APIs"
|
| 134 |
+
2. **Upload Image**: Choose PNG, JPG, JPEG, BMP, or TIFF files
|
| 135 |
+
3. **Select Model**: Choose from OpenAI, Gemini, or GROQ
|
| 136 |
+
4. **Generate Caption**: Click to generate and see real-time preview
|
| 137 |
+
5. **Customize Overlay**: Adjust position, colors, and styling
|
| 138 |
+
6. **Download**: Save the final image with caption
|
| 139 |
+
|
| 140 |
+
## π― Key Features Explained
|
| 141 |
+
|
| 142 |
+
### Smart Caption Generation
|
| 143 |
+
- All models generate clean, professional captions
|
| 144 |
+
- Consistent 10-50 word length
|
| 145 |
+
- No emojis or special characters
|
| 146 |
+
- Perfect for image overlays
|
| 147 |
+
|
| 148 |
+
### Advanced Image Processing
|
| 149 |
+
- OpenCV-powered text rendering
|
| 150 |
+
- Automatic text wrapping
|
| 151 |
+
- High-quality font rendering with PIL
|
| 152 |
+
- Multiple overlay styles
|
| 153 |
+
|
| 154 |
+
### History Management
|
| 155 |
+
- LangChain integration for conversation memory
|
| 156 |
+
- Searchable history with timestamps
|
| 157 |
+
- Model tracking for each generation
|
| 158 |
+
- Easy history clearing and management
|
| 159 |
+
|
| 160 |
+
## π οΈ Technical Details
|
| 161 |
+
|
| 162 |
+
### Dependencies
|
| 163 |
+
```
|
| 164 |
+
streamlit>=1.28.0
|
| 165 |
+
opencv-python>=4.8.0
|
| 166 |
+
pillow>=10.0.0
|
| 167 |
+
openai>=1.0.0
|
| 168 |
+
google-generativeai>=0.3.0
|
| 169 |
+
groq>=0.4.0
|
| 170 |
+
langchain>=0.1.0
|
| 171 |
+
python-dotenv>=1.0.0
|
| 172 |
+
numpy>=1.24.0
|
| 173 |
+
```
|
| 174 |
+
|
| 175 |
+
### Performance Optimizations
|
| 176 |
+
- Efficient base64 encoding for API calls
|
| 177 |
+
- Optimized image processing with OpenCV
|
| 178 |
+
- Smart memory management with LangChain
|
| 179 |
+
- Reduced token limits for faster generation
|
| 180 |
+
|
| 181 |
+
## π Troubleshooting
|
| 182 |
+
|
| 183 |
+
### Common Issues
|
| 184 |
+
|
| 185 |
+
**API Key Errors**
|
| 186 |
+
- Ensure all API keys are correctly set in `.env` file
|
| 187 |
+
- Check API key validity and quotas
|
| 188 |
+
- Restart the application after adding keys
|
| 189 |
+
|
| 190 |
+
**Font Loading Issues**
|
| 191 |
+
- Verify font file exists at `fonts/Poppins-Regular.ttf`
|
| 192 |
+
- Check file permissions
|
| 193 |
+
- App will fallback to default font if custom font fails
|
| 194 |
+
|
| 195 |
+
**Image Processing Errors**
|
| 196 |
+
- Ensure uploaded images are valid formats
|
| 197 |
+
- Check image file size (very large images may cause issues)
|
| 198 |
+
- Try different image formats if problems persist
|
| 199 |
+
|
| 200 |
+
**Model-Specific Issues**
|
| 201 |
+
- **OpenAI**: Verify you have access to GPT-4o vision model
|
| 202 |
+
- **Gemini**: Ensure Gemini API is enabled in your Google Cloud project
|
| 203 |
+
- **GROQ**: Check that vision models are available in your region
|
| 204 |
+
|
| 205 |
+
### Error Messages
|
| 206 |
+
|
| 207 |
+
| Error | Solution |
|
| 208 |
+
|-------|----------|
|
| 209 |
+
| "API key not configured" | Add the required API key to `.env` file |
|
| 210 |
+
| "Model not available" | Check model name and API access |
|
| 211 |
+
| "Image processing failed" | Try a different image format or size |
|
| 212 |
+
| "Font loading error" | Check font file path or use default font |
|
| 213 |
+
|
| 214 |
+
## π€ Contributing
|
| 215 |
+
|
| 216 |
+
1. Fork the repository
|
| 217 |
+
2. Create a feature branch: `git checkout -b feature-name`
|
| 218 |
+
3. Make your changes and commit: `git commit -m 'Add feature'`
|
| 219 |
+
4. Push to the branch: `git push origin feature-name`
|
| 220 |
+
5. Submit a pull request
|
| 221 |
+
|
| 222 |
+
## π License
|
| 223 |
+
|
| 224 |
+
This project is licensed under the MIT License - see the [MIT LICENSE](https://mit-license.org/) file for details.
|
| 225 |
+
|
| 226 |
+
## π Acknowledgments
|
| 227 |
+
|
| 228 |
+
- **Streamlit** for the amazing web app framework
|
| 229 |
+
- **OpenCV** for powerful image processing capabilities
|
| 230 |
+
- **LangChain** for conversation memory management
|
| 231 |
+
- **OpenAI, Google, and GROQ** for providing excellent vision APIs
|
| 232 |
+
- **Poppins Font** for beautiful typography
|
| 233 |
+
|
| 234 |
+
## π Support
|
| 235 |
+
|
| 236 |
+
If you encounter any issues or have questions:
|
| 237 |
+
|
| 238 |
+
1. Check the troubleshooting section above
|
| 239 |
+
2. Review the [Issues](https://github.com/your-repo/issues) page
|
| 240 |
+
3. Create a new issue with detailed information
|
| 241 |
+
4. Provide error messages and steps to reproduce
|
| 242 |
+
|
| 243 |
+
---
|
| 244 |
|
| 245 |
+
**Built with β€οΈ using Streamlit, LangChain, OpenCV, and multi-model AI APIs**
|
|
|
|
@@ -0,0 +1,116 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
import base64
|
| 2 |
+
import io
|
| 3 |
+
import os
|
| 4 |
+
from PIL import Image
|
| 5 |
+
|
| 6 |
+
# API Imports
|
| 7 |
+
import openai
|
| 8 |
+
import google.generativeai as genai
|
| 9 |
+
from groq import Groq
|
| 10 |
+
|
| 11 |
+
from dotenv import load_dotenv
|
| 12 |
+
|
| 13 |
+
load_dotenv()
|
| 14 |
+
|
| 15 |
+
openai_key = os.getenv("OPENAI_API_KEY_IC")
|
| 16 |
+
gemini_key = os.getenv("GEMINI_API_KEY_IC")
|
| 17 |
+
groq_key = os.getenv("GROQ_API_KEY_IC")
|
| 18 |
+
|
| 19 |
+
class MultiModelCaptionGenerator:
|
| 20 |
+
"""Handles caption generation using multiple models."""
|
| 21 |
+
def __init__(self):
|
| 22 |
+
self.openai_client = None
|
| 23 |
+
self.groq_client = None
|
| 24 |
+
self.gemini_configured = False
|
| 25 |
+
|
| 26 |
+
def configure_apis(self, openai_key: str|None = openai_key, groq_key: str|None = groq_key,
|
| 27 |
+
gemini_key: str|None = gemini_key):
|
| 28 |
+
|
| 29 |
+
if openai_key:
|
| 30 |
+
self.openai_client = openai.OpenAI(api_key=openai_key)
|
| 31 |
+
|
| 32 |
+
if groq_key:
|
| 33 |
+
self.groq_client = Groq(api_key=groq_key)
|
| 34 |
+
|
| 35 |
+
if gemini_key:
|
| 36 |
+
genai.configure(api_key=gemini_key)
|
| 37 |
+
self.gemini_configured = True
|
| 38 |
+
|
| 39 |
+
def encode_image_base64(self, image: Image.Image) -> str:
|
| 40 |
+
buffered = io.BytesIO()
|
| 41 |
+
image.save(buffered, format="PNG")
|
| 42 |
+
return base64.b64encode(buffered.getvalue()).decode()
|
| 43 |
+
|
| 44 |
+
def generate_caption_openai(self, image: Image.Image, model: str = "gpt-4o-mini") -> str:
|
| 45 |
+
"""Fixed OpenAI caption generation with correct model and image_url format"""
|
| 46 |
+
if not self.openai_client:
|
| 47 |
+
raise ValueError("OpenAI API key not configured.")
|
| 48 |
+
|
| 49 |
+
base64_image = self.encode_image_base64(image)
|
| 50 |
+
|
| 51 |
+
response = self.openai_client.chat.completions.create(
|
| 52 |
+
model=model, # Use gpt-4o or gpt-4o-mini for vision
|
| 53 |
+
messages=[
|
| 54 |
+
{
|
| 55 |
+
"role": "user",
|
| 56 |
+
"content": [
|
| 57 |
+
{
|
| 58 |
+
"type": "text",
|
| 59 |
+
"text": "Generate the caption for this image. IMPORTANT: Use 10 words or 50 characters maximum. Use only plain text - no emojis, special character but can use ASCII punctuations if you want. Be descriptive but concise."
|
| 60 |
+
},
|
| 61 |
+
{
|
| 62 |
+
"type": "image_url",
|
| 63 |
+
"image_url": {
|
| 64 |
+
"url": f"data:image/png;base64,{base64_image}" # Fixed: removed space after comma
|
| 65 |
+
}
|
| 66 |
+
}
|
| 67 |
+
]
|
| 68 |
+
}
|
| 69 |
+
],
|
| 70 |
+
max_tokens=300
|
| 71 |
+
)
|
| 72 |
+
return response.choices[0].message.content
|
| 73 |
+
|
| 74 |
+
def generate_caption_gemini(self, image: Image.Image,
|
| 75 |
+
model: str = "gemini-2.5-flash") -> str: # Fixed: use correct model name
|
| 76 |
+
"""Fixed Gemini caption generation with correct model name"""
|
| 77 |
+
if not self.gemini_configured:
|
| 78 |
+
raise ValueError("Gemini API key not configured!")
|
| 79 |
+
|
| 80 |
+
model_instance = genai.GenerativeModel(model)
|
| 81 |
+
prompt = "Generate the caption for this image. IMPORTANT: Use 10 words or 50 characters maximum. Use only plain text - no emojis, special character but can use ASCII punctuations if you want. Be descriptive but concise."
|
| 82 |
+
|
| 83 |
+
response = model_instance.generate_content([prompt, image])
|
| 84 |
+
return response.text
|
| 85 |
+
|
| 86 |
+
def generate_caption_groq(self, image: Image.Image,
|
| 87 |
+
model: str = "meta-llama/llama-4-scout-17b-16e-instruct") -> str:
|
| 88 |
+
"""Fixed GROQ caption generation with correct model name and API structure"""
|
| 89 |
+
if not self.groq_client:
|
| 90 |
+
raise ValueError("GROQ API key is not configured!")
|
| 91 |
+
|
| 92 |
+
base64_image = self.encode_image_base64(image)
|
| 93 |
+
|
| 94 |
+
completion = self.groq_client.chat.completions.create(
|
| 95 |
+
model=model, # Fixed: added missing model parameter
|
| 96 |
+
messages=[
|
| 97 |
+
{
|
| 98 |
+
"role": "user",
|
| 99 |
+
"content": [
|
| 100 |
+
{
|
| 101 |
+
"type": "text",
|
| 102 |
+
"text": "Generate the caption for this image. IMPORTANT: Use 10 words or 50 characters maximum. Use only plain text - no emojis, special character but can use ASCII punctuations if you want. Be descriptive but concise."
|
| 103 |
+
},
|
| 104 |
+
{
|
| 105 |
+
"type": "image_url",
|
| 106 |
+
"image_url": {
|
| 107 |
+
"url": f"data:image/png;base64,{base64_image}" # Fixed: proper format with url key
|
| 108 |
+
}
|
| 109 |
+
}
|
| 110 |
+
]
|
| 111 |
+
}
|
| 112 |
+
],
|
| 113 |
+
max_tokens=300,
|
| 114 |
+
temperature=0.7
|
| 115 |
+
)
|
| 116 |
+
return completion.choices[0].message.content
|

**caption_history.json** (auto-generated history file):

```json
[
  {
    "timestamp": "2025-06-28T11:06:08.409242",
    "image_name": "ChatGPT Image Jun 27, 2025, 09_59_51 AM.png",
    "model": "Google Gemini",
    "caption": "Here are a few options, choose the one that best fits your tone:\n\n**Option 1 (Focus on Spectacle):**\n\"Witness the vibrant energy of a grand procession! Thousands of devotees in traditional orange and white attire pull three towering, ornate chariots, as a shower of golden petals fills the sky. A powerful display of collective devotion.\"\n\n**Option 2 (More direct):**\n\"A sea of devotees in orange and white tirelessly pull magnificent, ornate chariots under a shower of golden petals. This grand spectacle of faith and collective effort is truly awe-inspiring.\"\n\n**Option 3 (Slightly shorter):**\n\"Thousands of devotees in vibrant orange and white attire pull three magnificent chariots, as a cascade of golden petals rains from the sky. A powerful scene of spiritual celebration and unity.\""
  },
  {
    "timestamp": "2025-06-28T11:17:53.982671",
    "image_name": "ChatGPT Image Jun 27, 2025, 09_59_51 AM.png",
    "model": "Google Gemini",
    "caption": "A magnificent procession unfolds as a vast multitude, many in vibrant orange attire, collectively pulls enormous, ornate chariots. Golden and orange leaves rain down from the sky, creating a breathtaking and powerful spectacle of devotion and tradition."
  },
  {
    "timestamp": "2025-06-28T11:31:29.073833",
    "image_name": "ChatGPT Image Jun 24, 2025, 09_02_19 AM.png",
    "model": "Google Gemini",
    "caption": "Here's a detailed and engaging caption for the image:\n\nNostalgia activated! \u2728 Dive back into the pixel-perfect world of Super Mario with this iconic scene. From the classic question block and a gleaming coin to that instantly recognizable mushroom, it's all set against the vibrant blue sky and green hills we know and love. Pure retro gaming joy! What's your favorite Super Mario memory or level?\n\n#SuperMario #Nintendo #RetroGaming #GamingNostalgia #ClassicGames #MarioBros #PixelArt #ChildhoodMemories #MushroomKingdom #VideoGames"
  },
  {
    "timestamp": "2025-06-28T11:55:38.783106",
    "image_name": "ChatGPT Image Jun 27, 2025, 09_59_51 AM.png",
    "model": "Google Gemini",
    "caption": "Here are a few options for a detailed and engaging caption, keeping it descriptive yet concise:\n\n**Option 1 (Focus on atmosphere & common knowledge):**\nA breathtaking display of devotion and collective energy, as a vast multitude of devotees in vibrant orange pull three magnificent, ornate chariots. A shower of colorful petals or confetti rains down from the sky, highlighting the grandeur and joyous spirit of this spiritual procession.\n\n**Option 2 (More specific, if Ratha Yatra is identified):**\nThe vibrant spectacle of a Ratha Yatra festival, with thousands of devotees pulling the colossal, richly decorated chariots. A cascade of orange and yellow petals fills the air, adding a festive and sacred touch to this powerful demonstration of faith.\n\n**Option 3 (Concise and evocative):**\nA powerful scene of celebration and spiritual fervor, where a sea of people in saffron hues pulls three majestic, golden chariots. The sky above showers down a joyous burst of colorful confetti, capturing the dynamic energy of this grand procession."
  },
  {
    "timestamp": "2025-06-28T11:56:31.347041",
    "image_name": "ChatGPT Image Jun 27, 2025, 09_59_51 AM.png",
    "model": "Google Gemini",
    "caption": "Here are a few options for a detailed and engaging caption, playing with slightly different focuses:\n\n**Option 1 (Focus on spectacle & devotion):**\n\"A breathtaking display of devotion at the Ratha Yatra festival! Thousands of devotees unite to pull the majestic, ornate chariots, as a shower of auspicious petals blesses the vibrant procession. An incredible testament to faith and community.\"\n\n**Option 2 (Focus on energy & scale):**\n\"Experience the electrifying energy of the Ratha Yatra! A sea of devotees, clad in traditional orange and white, meticulously pull the colossal, brightly adorned chariots under a sky alive with falling blossoms. A truly grand spiritual spectacle.\"\n\n**Option 3 (More concise & evocative):**\n\"Majestic chariots, propelled by a vibrant sea of devotees, embark on their sacred journey during Ratha Yatra. The air shimmers with collective devotion and a shower of auspicious blessings from above.\""
  },
  {
    "timestamp": "2025-06-28T12:37:12.999773",
    "image_name": "ChatGPT Image Jun 27, 2025, 09_59_51 AM.png",
    "model": "Google Gemini",
    "caption": "Here are a few options, choose the one that best fits your platform/tone:\n\n**Option 1 (Concise & Evocative):**\nA powerful scene of devotion unfolds as thousands pull the colossal, ornately decorated chariots of a grand procession. Golden and fiery petals rain down from the sky, enhancing the vibrant, spiritual atmosphere of this ancient tradition.\n\n**Option 2 (Slightly more descriptive):**\nWitness the immense energy of a traditional festival, where countless devotees in vibrant attire pull majestic, multi-tiered chariots. The sky above showers golden and orange petals, adding a sacred and celebratory feel to this grand display of faith.\n\n**Option 3 (Short & Sweet):**\nThousands of devotees unite to pull magnificent chariots in a grand procession, as a shower of colorful petals blesses the vibrant spiritual atmosphere."
  },
  {
    "timestamp": "2025-06-28T12:37:35.022969",
    "image_name": "ChatGPT Image Jun 27, 2025, 09_59_51 AM.png",
    "model": "Google Gemini",
    "caption": "Here's a detailed yet concise and engaging caption for the image:\n\n\"A magnificent Ratha Yatra procession unfolds, as grand, ornate chariots of red and gold are pulled through a vast sea of devoted pilgrims dressed in vibrant orange. The sky above is showered with countless golden petals, adding a breathtaking, celebratory energy to this ancient and spiritual spectacle.\""
  },
  {
    "timestamp": "2025-06-28T12:38:09.032392",
    "image_name": "ChatGPT Image Jun 27, 2025, 09_59_51 AM.png",
    "model": "Google Gemini",
    "caption": "A vibrant spectacle of devotion unfolds as hundreds of people, primarily in saffron and white attire, collectively pull colossal, intricately designed chariots across a vast open ground. The sky above showers them with a beautiful cascade of golden and red leaves or petals, adding a festive and sacred ambiance to this powerful display of community and faith."
  },
  {
    "timestamp": "2025-06-29T10:30:43.414914",
    "image_name": "ChatGPT Image Jun 29, 2025, 01_58_05 AM.png",
    "model": "OpenAI GPT-4o",
    "caption": "\"Unlocking the power of AI with a few lines of code! This snippet showcases an asynchronous chat function using `httpx` and Google's Generative AI. The `chat` function establishes a client connection, initiates the Gemini Pro model, and sends a simple greeting \u2014 'Hello!' \u2014 to demonstrate seamless interaction. It's a perfect blend of modern programming practices and innovative technology, paving the way for dynamic, automated conversations!\""
  },
  {
    "timestamp": "2025-06-29T10:31:19.761048",
    "image_name": "ChatGPT Image Jun 29, 2025, 01_58_05 AM.png",
    "model": "Google Gemini",
    "caption": "Here are a few options for a detailed and engaging caption, keeping it descriptive yet concise:\n\n**Option 1 (Concise & Impactful):**\n\n> Say 'Hello!' to Google Gemini Pro with just a few lines of Python! This asynchronous snippet shows how incredibly simple it is to kickstart a conversational AI using `google.generativeai` and `httpx`. Ready to build your next intelligent app?\n>\n> \\#Python #GenerativeAI #GeminiPro #AIdevelopment\n\n**Option 2 (Slightly More Descriptive, Developer-Focused):**\n\n> Powering conversations with Python and Google Gemini Pro! \ud83d\ude80 This clean code snippet demonstrates how to asynchronously connect to the `gemini-pro` model, initiate a chat, and send your first message. Leverages `httpx` for efficient network calls, making AI integration smoother than ever. What brilliant bot will you create?\n>\n> \\#AI #Python #GeminiAPI #AsyncPython #MachineLearning\n\n**Option 3 (Engaging Tone):**\n\n> Witness the magic of Generative AI in action! \u2728 This elegant Python code reveals how effortlessly you can start a conversation with Google's powerful `gemini-pro` model. Using `async/await` for modern, non-blocking operations, it's never been easier to infuse your applications with intelligent chat capabilities. Send your first 'Hello!' to the future!\n>\n> \\#Code #AI #GoogleAI #PythonDev #Chatbot"
  },
  {
    "timestamp": "2025-06-29T10:36:09.609008",
    "image_name": "ChatGPT Image Jun 29, 2025, 01_58_05 AM.png",
    "model": "OpenAI GPT-4o",
    "caption": "\ud83d\ude80 Dive into the exciting realm of asynchronous programming with this Python snippet! \ud83d\udcbb\u2728\n\nHere, we harness the power of the Google Generative AI model, 'gemini-pro', to create an interactive chat experience. With the `httpx` library, we're establishing a streamlined, asynchronous connection, allowing for efficient network calls. The code showcases the seamless integration of AI, where it awaits a response after sending a greeting message. \n\nWhether you're a budding coder or an experienced developer, this snippet is a gateway to explore dynamic chat applications and the fascinating possibilities of AI integration. Let's code the future! \ud83c\udf10\ud83e\udd16"
  },
  {
    "timestamp": "2025-06-29T10:45:45.277825",
    "image_name": "ChatGPT Image Jun 22, 2025, 12_23_42 AM.png",
    "model": "OpenAI GPT-4o",
    "caption": "\"Unraveling patterns in complex data: This illustration captures the essence of logistic regression applied to cancer datasets. As the curve artfully bridges two distinct groups represented by orange and blue scatter points, it highlights the critical role of data analysis in healthcare. The hospital icon signifies the ultimate goal: utilizing data-driven insights to improve patient outcomes and advance cancer research. A powerful reminder of how statistics can illuminate pathways to better health and innovation!\""
  },
  {
    "timestamp": "2025-06-29T10:46:09.997416",
    "image_name": "ChatGPT Image Jun 22, 2025, 12_23_42 AM.png",
    "model": "Google Gemini",
    "caption": "Here's a detailed and engaging caption, descriptive yet concise:\n\n---\n\n**Harnessing the power of Logistic Regression for vital cancer detection! \ud83d\udcca\ud83e\ude7a**\n\nThis visual beautifully illustrates how this machine learning algorithm classifies data, using the characteristic sigmoid curve to distinguish between different patient outcomes (e.g., healthy vs. diseased). A crucial tool in advancing precision medicine and improving healthcare, empowering medical professionals with data-driven insights.\n\n#LogisticRegression #MachineLearning #AIinHealthcare #CancerDetection #DataScience #HealthcareInnovation #MedicalAI #PredictiveAnalytics"
  },
  {
    "timestamp": "2025-06-29T10:46:27.426996",
    "image_name": "ChatGPT Image Jun 22, 2025, 12_23_42 AM.png",
    "model": "GROQ Vision",
    "caption": "The image presents a visual representation of logistic regression on a cancer dataset, featuring a graph and an illustration of a hospital.\n\n* **Title**\n    * The title \"LOGISTIC REGRESSION ON CANCER DATASET\" is prominently displayed at the top of the image in large, dark blue text.\n* **Graph**\n    * A line graph is situated to the left of the hospital illustration, showcasing a curved line that increases as it moves from left to right.\n    * The graph features orange dots on the upper left side, representing one group of data points, and blue X's on the lower right side, representing another group.\n    * The curved line begins on the lower left side of the graph, gradually rising to intersect with the orange dots and then leveling off as it approaches the upper right side.\n* **Hospital Illustration**\n    * A light blue hospital building with a dark blue cross on its roof is depicted on the right side of the image.\n    * The hospital features a central section with a door and windows, accompanied by two smaller sections on either side.\n* **Background**\n    * The background of the image is a pale yellow color.\n\nIn summary, the image effectively illustrates the concept of logistic regression on a cancer dataset through a clear and concise visual representation. The graph and hospital illustration work together to convey the relationship between the data points and the predicted outcomes, making it easier for viewers to understand the concept."
  },
  {
    "timestamp": "2025-06-29T11:03:21.461222",
    "image_name": "Mistral_AI.png",
    "model": "OpenAI GPT-4o",
    "caption": "Mistral AI logo featuring modern pixel art design elements."
  },
  {
    "timestamp": "2025-06-29T11:03:49.005152",
    "image_name": "ChatGPT Image Jun 29, 2025, 01_58_05 AM.png",
    "model": "OpenAI GPT-4o",
    "caption": "Code snippet for asynchronous chat with Google Generative AI."
  },
  {
    "timestamp": "2025-06-29T11:04:36.328163",
    "image_name": "ChatGPT Image Jun 29, 2025, 01_58_05 AM.png",
    "model": "Google Gemini",
    "caption": "Python async code for Gemini Pro AI chat."
  },
  {
    "timestamp": "2025-06-29T11:04:53.452858",
    "image_name": "ChatGPT Image Jun 29, 2025, 01_58_05 AM.png",
    "model": "GROQ Vision",
    "caption": "Python code for an asynchronous chat function using Google's Gemini AI."
  }
]
```
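Every record in the history file shares the same four keys, so the file is easy to post-process. For example, tallying generations per model with `collections.Counter` (the inline data here is illustrative, not the real file):

```python
import json
from collections import Counter

# A miniature stand-in for caption_history.json
history_json = """
[
  {"timestamp": "2025-06-28T11:06:08", "image_name": "a.png",
   "model": "Google Gemini", "caption": "..."},
  {"timestamp": "2025-06-29T10:30:43", "image_name": "b.png",
   "model": "OpenAI GPT-4o", "caption": "..."},
  {"timestamp": "2025-06-29T10:46:27", "image_name": "b.png",
   "model": "Google Gemini", "caption": "..."}
]
"""

records = json.loads(history_json)
per_model = Counter(r["model"] for r in records)
print(per_model["Google Gemini"])  # 2
```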

**caption_history.py**

```python
import datetime
import json
import os
from typing import Dict, List, Optional

from langchain.memory import ConversationBufferMemory


class CaptionHistory:
    """Manages caption generation history using LangChain."""

    def __init__(self):
        self.memory = ConversationBufferMemory(
            return_messages=True,
            memory_key="chat_history"
        )
        self.history_file = "caption_history.json"
        self.load_history()  # Replay existing history on initialization

    def add_interaction(self, image_name: str, model: str,
                        caption: str, timestamp: Optional[str] = None) -> None:
        # Optional[str] rather than `str | None`: the union syntax needs
        # Python 3.10+, while the project targets 3.8+
        if not timestamp:
            timestamp = datetime.datetime.now().isoformat()

        interaction = {
            "timestamp": timestamp,
            "image_name": image_name,
            "model": model,
            "caption": caption
        }

        # Add to LangChain memory
        self.memory.chat_memory.add_user_message(
            f"Generate caption for {image_name} using {model}"
        )
        self.memory.chat_memory.add_ai_message(caption)

        # Persist to disk
        self.save_interaction(interaction)

    def get_history(self) -> List[Dict[str, str]]:
        try:
            with open(self.history_file, mode="r") as f:
                return json.load(f)
        except (FileNotFoundError, json.JSONDecodeError):
            # A missing or empty/corrupt file simply means no history yet
            return []

    def save_interaction(self, interaction: Dict[str, str]) -> None:
        history = self.get_history()
        history.append(interaction)
        with open(self.history_file, mode="w") as f:
            json.dump(history, f, indent=2)

    def load_history(self) -> None:
        """Replay saved interactions into LangChain memory."""
        for item in self.get_history():
            self.memory.chat_memory.add_user_message(
                f"Generate caption for {item['image_name']} using {item['model']}"
            )
            self.memory.chat_memory.add_ai_message(item["caption"])

    def clear_history(self) -> None:
        self.memory.clear()
        if os.path.exists(self.history_file):
            os.remove(self.history_file)
```
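`CaptionHistory` persists by re-reading the whole JSON file, appending, and rewriting it. The same pattern in isolation, with hypothetical standalone helpers and a temp directory instead of the real `caption_history.json`:

```python
import json
import os
import tempfile


def load_history(path: str) -> list:
    """Return the saved interaction list, or [] if the file does not exist."""
    try:
        with open(path) as f:
            return json.load(f)
    except FileNotFoundError:
        return []


def append_interaction(path: str, interaction: dict) -> None:
    """Read-modify-write the JSON history file."""
    history = load_history(path)
    history.append(interaction)
    with open(path, "w") as f:
        json.dump(history, f, indent=2)


path = os.path.join(tempfile.mkdtemp(), "caption_history.json")
append_interaction(path, {"image_name": "demo.png", "model": "Google Gemini", "caption": "test"})
append_interaction(path, {"image_name": "demo2.png", "model": "GROQ Vision", "caption": "test2"})
print(len(load_history(path)))  # 2
```

Rewriting the full file on every append is fine at this scale; a very large history would call for an append-only format such as JSON Lines.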

**caption_overlay.py**

```python
import os
from typing import Optional, Tuple

import cv2
import numpy as np
from PIL import Image, ImageDraw, ImageFont


class ImageCaptionOverlay:
    """Handles adding captions to images using OpenCV"""

    @staticmethod
    def add_caption_overlay(image: np.ndarray, caption: str, position: str = "bottom",
                            font_size: float = 1.0, thickness: int = 2) -> np.ndarray:
        """Add the caption as a text overlay drawn onto the image."""
        # font_size is a float: the UI slider passes values between 0.5 and 3.0
        img_copy = image.copy()
        height, width = img_copy.shape[:2]

        font = cv2.FONT_HERSHEY_SIMPLEX

        # Measure the full caption to decide whether wrapping is needed
        text_size = cv2.getTextSize(caption, font, font_size, thickness)[0]

        # Wrap text if too long
        max_width = width - 40
        if text_size[0] > max_width:
            words = caption.split()
            lines = []
            current_line = ""

            for word in words:
                test_line = current_line + " " + word if current_line else word
                test_size = cv2.getTextSize(test_line, font, font_size, thickness)[0]

                if test_size[0] <= max_width:
                    current_line = test_line
                else:
                    if current_line:
                        lines.append(current_line)
                    current_line = word

            if current_line:
                lines.append(current_line)
        else:
            lines = [caption]

        # Calculate vertical placement
        line_height = cv2.getTextSize("A", font, font_size, thickness)[0][1] + 10
        total_height = len(lines) * line_height

        if position == "bottom":
            start_y = height - total_height - 20
        elif position == "top":
            start_y = 30
        else:  # center
            start_y = (height - total_height) // 2

        for i, line in enumerate(lines):
            text_size = cv2.getTextSize(line, font, font_size, thickness)[0]
            text_x = (width - text_size[0]) // 2
            text_y = start_y + (i * line_height) + text_size[1]

            # Background rectangle for better readability
            cv2.rectangle(img_copy,
                          (text_x - 10, text_y - text_size[1] - 5),
                          (text_x + text_size[0] + 10, text_y + 5),
                          (0, 0, 0), -1)

            # Text
            cv2.putText(img_copy, line, (text_x, text_y), font, font_size,
                        (255, 255, 255), thickness)

        return img_copy

    @staticmethod
    def add_caption_background(image: np.ndarray, caption: str,
                               font_path: Optional[str] = None,
                               background_color: Tuple[int, int, int] = (0, 0, 0),
                               text_color: Tuple[int, int, int] = (255, 255, 255),
                               margin: int = 50) -> np.ndarray:
        """Add the caption on a background band behind the image."""
        height, width = image.shape[:2]

        # Use PIL for better text rendering
        pil_image = Image.fromarray(cv2.cvtColor(image, cv2.COLOR_BGR2RGB))

        # Font priority: custom path, then bundled Poppins, then PIL's default
        try:
            if font_path and os.path.exists(font_path):
                font = ImageFont.truetype(font_path, 24)
            elif os.path.exists("fonts/Poppins-Regular.ttf"):
                font = ImageFont.truetype("fonts/Poppins-Regular.ttf", 24)
            else:
                font = ImageFont.load_default()
        except Exception:
            # If anything fails, fall back to the default font
            font = ImageFont.load_default()

        # Calculate text dimensions
        draw = ImageDraw.Draw(pil_image)
        bbox = draw.textbbox((0, 0), caption, font=font)
        text_width = bbox[2] - bbox[0]
        text_height = bbox[3] - bbox[1]

        # Wrap text if necessary
        max_width = width - (2 * margin)
        if text_width > max_width:
            words = caption.split()
            lines = []
            current_line = ""

            for word in words:
                test_line = current_line + " " + word if current_line else word
                test_bbox = draw.textbbox((0, 0), test_line, font=font)
                test_width = test_bbox[2] - test_bbox[0]

                if test_width <= max_width:
                    current_line = test_line
                else:
                    if current_line:
                        lines.append(current_line)
                    current_line = word

            if current_line:
                lines.append(current_line)
        else:
            lines = [caption]

        # Total height of the caption band
        total_text_height = len(lines) * text_height + (len(lines) - 1) * 10

        # Create a new canvas with room for the text above the image
        new_height = height + total_text_height + (2 * margin)
        new_image = Image.new('RGB', (width, new_height), background_color)

        # Paste the original image below the caption band
        new_image.paste(pil_image, (0, total_text_height + (2 * margin)))

        # Draw each line centered horizontally
        draw = ImageDraw.Draw(new_image)
        y_offset = margin

        for line in lines:
            bbox = draw.textbbox((0, 0), line, font=font)
            line_width = bbox[2] - bbox[0]
            x_position = (width - line_width) // 2

            draw.text((x_position, y_offset), line, fill=text_color, font=font)
            y_offset += text_height + 10

        # Convert back to OpenCV (BGR) format
        return cv2.cvtColor(np.array(new_image), cv2.COLOR_RGB2BGR)
```
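Both overlay methods duplicate the same greedy word-wrap loop. It can be factored into one helper with a pluggable width-measure function standing in for `cv2.getTextSize` or `draw.textbbox` (the `wrap_caption` name is illustrative, not part of the module):

```python
from typing import Callable, List


def wrap_caption(caption: str, max_width: int,
                 measure: Callable[[str], int]) -> List[str]:
    """Greedy word wrap: keep appending words while the measured width fits."""
    lines: List[str] = []
    current = ""
    for word in caption.split():
        candidate = f"{current} {word}" if current else word
        if measure(candidate) <= max_width:
            current = candidate
        else:
            if current:
                lines.append(current)
            current = word  # start a new line with the overflowing word
    if current:
        lines.append(current)
    return lines or [caption]


# With a character-count measure, a 10-unit width wraps like this:
print(wrap_caption("a sunny day at the beach", 10, len))
# ['a sunny', 'day at the', 'beach']
```

Note that a single word wider than `max_width` still gets its own line; neither wrap loop splits within words.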

**main.py**

```python
from caption_history import CaptionHistory
from caption_generation import MultiModelCaptionGenerator
from caption_overlay import ImageCaptionOverlay

import io
import os

import cv2
import numpy as np
from PIL import Image
import streamlit as st
from dotenv import load_dotenv

load_dotenv()

openai_key = os.getenv("OPENAI_API_KEY_IC")
gemini_key = os.getenv("GEMINI_API_KEY_IC")
groq_key = os.getenv("GROQ_API_KEY_IC")

def main():
    st.set_page_config(
        page_title="Multi-Model Image Caption Generator",
        page_icon="🖼️",
        layout="wide"
    )

    st.title("🖼️ Multi-Model Image Caption Generator")
    st.markdown("Generate captions using OpenAI GPT-4o, Google Gemini, and GROQ Vision models")

    # Initialize session state
    if 'caption_history' not in st.session_state:
        st.session_state.caption_history = CaptionHistory()

    if 'caption_generator' not in st.session_state:
        st.session_state.caption_generator = MultiModelCaptionGenerator()

    # Sidebar for API configuration
    with st.sidebar:
        st.header("🔑 API Configuration")

        # Show API status
        if openai_key:
            st.success("✅ OpenAI API Key loaded from .env")
        else:
            st.warning("⚠️ OpenAI API Key not found in .env")

        if gemini_key:
            st.success("✅ Gemini API Key loaded from .env")
        else:
            st.warning("⚠️ Gemini API Key not found in .env")

        if groq_key:
            st.success("✅ GROQ API Key loaded from .env")
        else:
            st.warning("⚠️ GROQ API Key not found in .env")

        if st.button("Configure APIs"):
            try:
                st.session_state.caption_generator.configure_apis(
                    openai_key=openai_key,
                    gemini_key=gemini_key,
                    groq_key=groq_key
                )
                st.success("APIs configured successfully!")
            except Exception as e:
                st.error(f"Error configuring APIs: {str(e)}")

        st.markdown("---")

        # Caption overlay settings
        st.header("🎨 Caption Settings")
        caption_method = st.selectbox(
            "Caption Method",
            ["Overlay on Image", "Background Behind Image"]
        )

        if caption_method == "Overlay on Image":
            position = st.selectbox("Position", ["bottom", "top", "center"])
            font_size = st.slider("Font Size", 0.5, 3.0, 1.0, 0.1)
            thickness = st.slider("Thickness", 1, 5, 2)
        else:
            bg_color = st.color_picker("Background Color", "#000000")
            text_color = st.color_picker("Text Color", "#FFFFFF")
            margin = st.slider("Margin", 20, 100, 50)

            # Optional: custom font path
            custom_font = st.text_input(
                "Custom Font Path (optional)",
                placeholder="e.g., fonts/Poppins-Regular.ttf"
            )

        st.markdown("---")

        # History management
        st.header("📝 Caption History")
        if st.button("View History"):
            st.session_state.show_history = True

        if st.button("Hide History"):
            st.session_state.show_history = False

        if st.button("Clear History"):
            st.session_state.caption_history.clear_history()
            st.success("History cleared!")

    # Main content area
    col1, col2 = st.columns([1, 1])

    with col1:
        st.header("📤 Upload Image")
        uploaded_file = st.file_uploader(
            "Choose an image...",
            type=['png', 'jpg', 'jpeg', 'bmp', 'tiff']
        )

        if uploaded_file is not None:
            # Display original image
            image = Image.open(uploaded_file)
            st.image(image, caption="Original Image", use_container_width=True)

            # Model selection
            st.header("🤖 Select Model")
            models = {
                "OpenAI GPT-4o": "openai",
                "Google Gemini": "gemini",
                "GROQ Vision": "groq"
            }

            selected_model = st.selectbox("Choose a model", list(models.keys()))

            # Show model-specific info
            model_info = {
                "OpenAI GPT-4o": "Uses GPT-4o vision model for detailed image analysis",
                "Google Gemini": "Uses Gemini-1.5-flash for fast and accurate captions",
                "GROQ Vision": "Uses Llama-3.2-11b-vision for high-speed processing"
            }
            st.info(model_info[selected_model])

            if st.button("Generate Caption", type="primary"):
                # Check if APIs are configured
                if not any([openai_key, gemini_key, groq_key]):
                    st.error("Please add API keys to your .env file and click 'Configure APIs'")
                    return

                try:
                    model_key = models[selected_model]

                    # Check specific API availability
                    if model_key == "openai" and not openai_key:
                        st.error("OpenAI API key not available. Please add it to your .env file.")
                        return
                    elif model_key == "gemini" and not gemini_key:
                        st.error("Gemini API key not available. Please add it to your .env file.")
                        return
                    elif model_key == "groq" and not groq_key:
                        st.error("GROQ API key not available. Please add it to your .env file.")
                        return

                    with st.spinner(f"Generating caption with {selected_model}..."):
                        if model_key == "openai":
                            caption = st.session_state.caption_generator.generate_caption_openai(image)
                        elif model_key == "gemini":
                            caption = st.session_state.caption_generator.generate_caption_gemini(image)
                        elif model_key == "groq":
                            caption = st.session_state.caption_generator.generate_caption_groq(image)

                        st.session_state.current_caption = caption
                        st.session_state.current_image = image
                        st.session_state.current_model = selected_model

                        # Add to history
                        st.session_state.caption_history.add_interaction(
                            uploaded_file.name,
                            selected_model,
                            caption
                        )

                        st.success(f"Caption generated successfully with {selected_model}!")

                except Exception as e:
                    st.error(f"Error generating caption: {str(e)}")
                    st.error("Please check your API keys and internet connection.")

    with col2:
        st.header("✨ Generated Caption & Preview")

        if hasattr(st.session_state, 'current_caption'):
            # Editable caption
            edited_caption = st.text_area(
                "Generated Caption (editable)",
                st.session_state.current_caption,
                height=100,
                help="You can edit the caption before applying it to the image"
            )

            # Update the caption if edited
            if edited_caption != st.session_state.current_caption:
                st.session_state.current_caption = edited_caption

            # Generate preview with caption
            if hasattr(st.session_state, 'current_image'):
                # Convert PIL to OpenCV format
                cv_image = cv2.cvtColor(np.array(st.session_state.current_image), cv2.COLOR_RGB2BGR)

                try:
                    if caption_method == "Overlay on Image":
                        result_image = ImageCaptionOverlay.add_caption_overlay(
                            cv_image,
                            st.session_state.current_caption,
                            position=position,
                            font_size=font_size,
                            thickness=thickness
                        )
                    else:
                        # Convert hex colors to RGB tuples
                        bg_rgb = tuple(int(bg_color[i:i+2], 16) for i in (1, 3, 5))
                        text_rgb = tuple(int(text_color[i:i+2], 16) for i in (1, 3, 5))

                        # Use custom font if provided
                        font_path = custom_font if custom_font and os.path.exists(custom_font) else None

                        result_image = ImageCaptionOverlay.add_caption_background(
                            cv_image,
                            st.session_state.current_caption,
                            font_path=font_path,
```
|
| 226 |
+
background_color=bg_rgb,
|
| 227 |
+
text_color=text_rgb,
|
| 228 |
+
margin=margin
|
| 229 |
+
)
|
| 230 |
+
|
| 231 |
+
# Convert back to PIL for display
|
| 232 |
+
result_pil = Image.fromarray(cv2.cvtColor(result_image, cv2.COLOR_BGR2RGB))
|
| 233 |
+
st.image(result_pil, caption="Image with Caption", use_container_width=True)
|
| 234 |
+
|
| 235 |
+
# Download button
|
| 236 |
+
img_buffer = io.BytesIO()
|
| 237 |
+
result_pil.save(img_buffer, format='PNG')
|
| 238 |
+
|
| 239 |
+
st.download_button(
|
| 240 |
+
label="π₯ Download Image with Caption",
|
| 241 |
+
data=img_buffer.getvalue(),
|
| 242 |
+
file_name=f"captioned_{uploaded_file.name if uploaded_file else 'image'}.png",
|
| 243 |
+
mime="image/png"
|
| 244 |
+
)
|
| 245 |
+
|
| 246 |
+
except Exception as e:
|
| 247 |
+
st.error(f"Error processing image: {str(e)}")
|
| 248 |
+
else:
|
| 249 |
+
st.info("π Upload an image and generate a caption to see the preview here")
|
| 250 |
+
|
| 251 |
+
# History display
|
| 252 |
+
if getattr(st.session_state, 'show_history', False):
|
| 253 |
+
st.markdown("---")
|
| 254 |
+
st.header("π Caption Generation History")
|
| 255 |
+
|
| 256 |
+
history = st.session_state.caption_history.get_history()
|
| 257 |
+
|
| 258 |
+
if history:
|
| 259 |
+
# Add search/filter functionality
|
| 260 |
+
search_term = st.text_input("π Search history", placeholder="Search by image name or caption...")
|
| 261 |
+
|
| 262 |
+
filtered_history = history
|
| 263 |
+
if search_term:
|
| 264 |
+
filtered_history = [
|
| 265 |
+
item for item in history
|
| 266 |
+
if search_term.lower() in item['image_name'].lower()
|
| 267 |
+
or search_term.lower() in item['caption'].lower()
|
| 268 |
+
or search_term.lower() in item['model'].lower()
|
| 269 |
+
]
|
| 270 |
+
|
| 271 |
+
if filtered_history:
|
| 272 |
+
for i, item in enumerate(reversed(filtered_history[-20:])): # Show last 20 items
|
| 273 |
+
with st.expander(f"{item['timestamp'][:19]} - {item['image_name']} ({item['model']})"):
|
| 274 |
+
st.write(f"**Model:** {item['model']}")
|
| 275 |
+
st.write(f"**Image:** {item['image_name']}")
|
| 276 |
+
st.write(f"**Caption:** {item['caption']}")
|
| 277 |
+
st.write(f"**Timestamp:** {item['timestamp']}")
|
| 278 |
+
else:
|
| 279 |
+
st.info("No matching history found.")
|
| 280 |
+
else:
|
| 281 |
+
st.info("No caption history available.")
|
| 282 |
+
|
| 283 |
+
# Footer
|
| 284 |
+
st.markdown("---")
|
| 285 |
+
st.markdown("""
|
| 286 |
+
<div style='text-align: center'>
|
| 287 |
+
<p>Built with Streamlit, LangChain, OpenCV, and multi-model AI APIs</p>
|
| 288 |
+
<p>Supports OpenAI GPT-4o, Google Gemini, and GROQ Vision models</p>
|
| 289 |
+
<p><small>Make sure to add your API keys to the .env file</small></p>
|
| 290 |
+
</div>
|
| 291 |
+
""", unsafe_allow_html=True)
|
| 292 |
+
|
| 293 |
+
if __name__ == "__main__":
|
| 294 |
+
main()
|
|
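The hex-to-RGB conversion used before `add_caption_background` assumes a `#RRGGBB` string (the format Streamlit color pickers return). A minimal standalone sketch of that expression, with the validation the inline version omits; the `hex_to_rgb` name is illustrative, not part of the app:

```python
def hex_to_rgb(hex_color: str) -> tuple:
    """Convert a '#RRGGBB' string to an (R, G, B) tuple of ints."""
    if not (hex_color.startswith("#") and len(hex_color) == 7):
        raise ValueError(f"Expected '#RRGGBB', got {hex_color!r}")
    # Offsets 1, 3, 5 skip the '#' and read each two-digit channel.
    return tuple(int(hex_color[i:i + 2], 16) for i in (1, 3, 5))

print(hex_to_rgb("#FF8800"))  # → (255, 136, 0)
```

Note that OpenCV drawing functions expect BGR order, so if `add_caption_background` draws with `cv2` directly it would need to reverse these tuples internally.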
`requirements.txt` (`@@ -1,3 +1,9 @@`):

```
google-generativeai
groq
langchain
python-dotenv
openai
opencv-python
numpy
pillow
streamlit
```
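The history search above filters case-insensitively across the image name, caption, and model fields. That logic can be exercised in isolation; this sketch assumes history items are dicts with the `image_name`, `model`, `caption`, and `timestamp` keys the UI reads, and the `filter_history` name is illustrative:

```python
def filter_history(history: list, search_term: str) -> list:
    """Case-insensitive substring match across the fields the history view displays."""
    if not search_term:
        return history
    term = search_term.lower()
    return [
        item for item in history
        if term in item['image_name'].lower()
        or term in item['caption'].lower()
        or term in item['model'].lower()
    ]

sample = [
    {"image_name": "cat.png", "model": "Google Gemini",
     "caption": "A cat resting on a sofa", "timestamp": "2024-01-01T12:00:00"},
    {"image_name": "dog.jpg", "model": "GROQ Vision",
     "caption": "A dog running in a park", "timestamp": "2024-01-01T12:05:00"},
]
print([item["image_name"] for item in filter_history(sample, "gemini")])  # → ['cat.png']
```

An empty search term returns the full history unchanged, matching the app's behavior of only filtering when the text input is non-empty.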