HarshitX committed on
Commit 8a8f3ed · verified · 1 Parent(s): 53e5164

Upload 9 files

# 🖼️ Multi-Model Image Caption Generator

A Streamlit application that generates image captions with multiple AI models (OpenAI GPT-4o, Google Gemini, and GROQ Vision), overlays them on images with OpenCV, and manages caption history with LangChain.

## ✨ Features

- **Multi-Model Support**: Choose from OpenAI GPT-4o, Google Gemini, or GROQ Vision models
- **Smart Caption Generation**: Clean, professional captions (10-50 words, no emojis/symbols)
- **Advanced Image Processing**: Two caption overlay methods using OpenCV
- **LangChain Integration**: Comprehensive history management and conversation memory
- **Custom Typography**: Uses Poppins font with intelligent fallbacks
- **Interactive UI**: Modern Streamlit interface with real-time preview
- **Export Functionality**: Download processed images with captions

## 🚀 Quick Start

### Prerequisites

- Python 3.10+ (the code uses `str | None` type hints)
- API keys for at least one of the supported models

### Installation

1. **Clone the repository**
```bash
git clone <your-repo-url>
cd multi-model-caption-generator
```

2. **Install dependencies**
```bash
pip install streamlit opencv-python pillow openai google-generativeai groq langchain python-dotenv
```

3. **Set up environment variables**
Create a `.env` file in the project root:
```env
OPENAI_API_KEY_IC=your_openai_api_key_here
GEMINI_API_KEY_IC=your_gemini_api_key_here
GROQ_API_KEY_IC=your_groq_api_key_here
```

4. **Set up fonts (optional)**
Place your font file at:
```
fonts/Poppins-Regular.ttf
```

5. **Run the application**
```bash
streamlit run main.py
```
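
With the `.env` in place, you can sanity-check which providers are configured before launching. The key names below match the `.env` template above; the helper itself is an illustrative sketch, not code from this repository:

```python
import os

# Environment variable names as used in this project's .env template.
KEY_NAMES = {
    "OpenAI": "OPENAI_API_KEY_IC",
    "Gemini": "GEMINI_API_KEY_IC",
    "GROQ": "GROQ_API_KEY_IC",
}


def configured_models(env=os.environ):
    """Return the providers whose API key is set and non-empty."""
    return [name for name, var in KEY_NAMES.items() if env.get(var, "").strip()]
```

At least one provider must be configured for the app to generate captions.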

## 📁 Project Structure

```
multi-model-caption-generator/
β”œβ”€β”€ main.py # Main Streamlit application
β”œβ”€β”€ caption_generation.py # Multi-model caption generation
β”œβ”€β”€ caption_history.py # LangChain history management
β”œβ”€β”€ caption_overlay.py # OpenCV image processing
β”œβ”€β”€ fonts/ # Font directory
β”‚ └── Poppins-Regular.ttf # Custom font (optional)
β”œβ”€β”€ .env # Environment variables
β”œβ”€β”€ caption_history.json # Auto-generated history file
└── README.md # This file
```

## 🤖 Supported AI Models

### OpenAI GPT-4o
- **Model**: `gpt-4o` (the code defaults to `gpt-4o-mini`)
- **Strengths**: Detailed image analysis, high accuracy
- **API**: OpenAI Vision API

### Google Gemini
- **Model**: `gemini-2.5-flash`
- **Strengths**: Fast processing, multimodal understanding
- **API**: Google Generative AI

### GROQ Vision
- **Model**: `meta-llama/llama-4-scout-17b-16e-instruct`
- **Strengths**: High-speed inference, efficient processing
- **API**: GROQ API
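
Routing a UI selection to the right provider is a simple name-to-function dispatch. The labels below mirror the model names used in this project's history file, but the factory itself is a hypothetical sketch, not the app's actual code:

```python
def make_dispatcher(openai_fn, gemini_fn, groq_fn):
    """Map UI model labels to their caption-generation callables."""
    table = {
        "OpenAI GPT-4o": openai_fn,
        "Google Gemini": gemini_fn,
        "GROQ Vision": groq_fn,
    }

    def generate(model_name, image):
        # Fail loudly on unknown labels instead of a bare KeyError.
        if model_name not in table:
            raise ValueError(f"Unknown model: {model_name}")
        return table[model_name](image)

    return generate
```

In the real app each callable would be one of the `generate_caption_*` methods from `caption_generation.py`.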

## 🎨 Caption Overlay Options

### 1. Overlay on Image
- Position: Top, Center, or Bottom
- Customizable font size and thickness
- Auto text wrapping for long captions
- Semi-transparent background for readability

### 2. Background Behind Image
- Caption appears above the image
- Customizable background and text colors
- Adjustable margins
- Uses Poppins font with fallbacks
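
The auto text wrapping used by both overlay modes can be sketched as a greedy, width-budgeted word wrap. Real rendering would measure text with `cv2.getTextSize` or PIL; this sketch substitutes a fixed per-character pixel width, so the function name and numbers are illustrative only:

```python
def wrap_caption(caption, max_width_px, char_width_px=12):
    """Greedy word wrap: start a new line when the next word would overflow."""
    lines, current = [], ""
    for word in caption.split():
        candidate = f"{current} {word}".strip()
        # Keep the word on this line if it fits, or if the line is still empty
        # (a single over-long word gets a line to itself).
        if len(candidate) * char_width_px <= max_width_px or not current:
            current = candidate
        else:
            lines.append(current)
            current = word
    if current:
        lines.append(current)
    return lines
```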

## 📝 Caption History Management

The application uses LangChain for sophisticated history management:

- **Persistent Storage**: All captions saved to `caption_history.json`
- **Memory Integration**: LangChain ConversationBufferMemory
- **Search & Filter**: Find previous captions by image name or content
- **Export History**: View and manage generation history

## 🔧 Configuration

### API Keys Setup

Get your API keys from:
- **OpenAI**: [https://platform.openai.com/api-keys](https://platform.openai.com/api-keys)
- **Google Gemini**: [https://makersuite.google.com/app/apikey](https://makersuite.google.com/app/apikey)
- **GROQ**: [https://console.groq.com/keys](https://console.groq.com/keys)

### Font Configuration

The app automatically uses fonts in this priority:
1. Custom font path (if specified in UI)
2. `fonts/Poppins-Regular.ttf` (if available)
3. System default font

### Caption Settings

- **Word Limit**: 10-50 words maximum
- **Format**: Plain text only (no emojis or special characters)
- **Style**: Descriptive but concise

## 🖥️ Usage

1. **Configure APIs**: Add your API keys to `.env` file and click "Configure APIs"
2. **Upload Image**: Choose PNG, JPG, JPEG, BMP, or TIFF files
3. **Select Model**: Choose from OpenAI, Gemini, or GROQ
4. **Generate Caption**: Click to generate and see real-time preview
5. **Customize Overlay**: Adjust position, colors, and styling
6. **Download**: Save the final image with caption

## 🎯 Key Features Explained

### Smart Caption Generation
- All models generate clean, professional captions
- Consistent 10-50 word length
- No emojis or special characters
- Perfect for image overlays

### Advanced Image Processing
- OpenCV-powered text rendering
- Automatic text wrapping
- High-quality font rendering with PIL
- Multiple overlay styles

### History Management
- LangChain integration for conversation memory
- Searchable history with timestamps
- Model tracking for each generation
- Easy history clearing and management

## 🛠️ Technical Details

### Dependencies
```
streamlit>=1.28.0
opencv-python>=4.8.0
pillow>=10.0.0
openai>=1.0.0
google-generativeai>=0.3.0
groq>=0.4.0
langchain>=0.1.0
python-dotenv>=1.0.0
numpy>=1.24.0
```

### Performance Optimizations
- Efficient base64 encoding for API calls
- Optimized image processing with OpenCV
- Smart memory management with LangChain
- Reduced token limits for faster generation
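
The base64 step mentioned above amounts to wrapping the image bytes in a data URL before sending them to a vision API. The repository's `encode_image_base64` goes through Pillow; this sketch works on raw bytes to stay self-contained:

```python
import base64


def to_data_url(image_bytes, mime="image/png"):
    """Encode raw image bytes as a base64 data URL for vision-model APIs."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime};base64,{encoded}"
```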

## 🔍 Troubleshooting

### Common Issues

**API Key Errors**
- Ensure all API keys are correctly set in `.env` file
- Check API key validity and quotas
- Restart the application after adding keys

**Font Loading Issues**
- Verify font file exists at `fonts/Poppins-Regular.ttf`
- Check file permissions
- The app falls back to the default font if the custom font fails to load

**Image Processing Errors**
- Ensure uploaded images are valid formats
- Check image file size (very large images may cause issues)
- Try different image formats if problems persist

**Model-Specific Issues**
- **OpenAI**: Verify you have access to GPT-4o vision model
- **Gemini**: Ensure Gemini API is enabled in your Google Cloud project
- **GROQ**: Check that vision models are available in your region

### Error Messages

| Error | Solution |
|-------|----------|
| "API key not configured" | Add the required API key to `.env` file |
| "Model not available" | Check model name and API access |
| "Image processing failed" | Try a different image format or size |
| "Font loading error" | Check font file path or use default font |

## 🤝 Contributing

1. Fork the repository
2. Create a feature branch: `git checkout -b feature-name`
3. Make your changes and commit: `git commit -m 'Add feature'`
4. Push to the branch: `git push origin feature-name`
5. Submit a pull request

## 📄 License

This project is licensed under the MIT License; see the [MIT License](https://mit-license.org/) text for details.

## 🙏 Acknowledgments

- **Streamlit** for the amazing web app framework
- **OpenCV** for powerful image processing capabilities
- **LangChain** for conversation memory management
- **OpenAI, Google, and GROQ** for providing excellent vision APIs
- **Poppins Font** for beautiful typography

## 📞 Support

If you encounter any issues or have questions:

1. Check the troubleshooting section above
2. Review the [Issues](https://github.com/your-repo/issues) page
3. Create a new issue with detailed information
4. Provide error messages and steps to reproduce

---

**Built with ❤️ using Streamlit, LangChain, OpenCV, and multi-model AI APIs**

Files changed (9)
  1. .env +3 -0
  2. .gitignore +160 -0
  3. README.md +242 -17
  4. caption_generation.py +116 -0
  5. caption_history.json +110 -0
  6. caption_history.py +71 -0
  7. caption_overlay.py +154 -0
  8. main.py +294 -0
  9. requirements.txt +8 -2
.env ADDED
@@ -0,0 +1,3 @@
+ OPENAI_API_KEY_IC = "sk-proj-REDACTED"
+ GEMINI_API_KEY_IC = "AIzaREDACTED"
+ GROQ_API_KEY_IC = "gsk_REDACTED"
.gitignore ADDED
@@ -0,0 +1,160 @@
+ # Byte-compiled / optimized / DLL files
+ __pycache__/
+ *.py[cod]
+ *$py.class
+
+ # C extensions
+ *.so
+
+ # Distribution / packaging
+ .Python
+ build/
+ develop-eggs/
+ dist/
+ downloads/
+ eggs/
+ .eggs/
+ lib/
+ lib64/
+ parts/
+ sdist/
+ var/
+ wheels/
+ share/python-wheels/
+ *.egg-info/
+ .installed.cfg
+ *.egg
+ MANIFEST
+
+ # PyInstaller
+ # Usually these files are written by a python script from a template
+ # before PyInstaller builds the exe, so as to inject date/other infos into it.
+ *.manifest
+ *.spec
+
+ # Installer logs
+ pip-log.txt
+ pip-delete-this-directory.txt
+
+ # Unit test / coverage reports
+ htmlcov/
+ .tox/
+ .nox/
+ .coverage
+ .coverage.*
+ .cache
+ nosetests.xml
+ coverage.xml
+ *.cover
+ *.py,cover
+ .hypothesis/
+ .pytest_cache/
+ cover/
+
+ # Translations
+ *.mo
+ *.pot
+
+ # Django stuff:
+ *.log
+ local_settings.py
+ db.sqlite3
+ db.sqlite3-journal
+
+ # Flask stuff:
+ instance/
+ .webassets-cache
+
+ # Scrapy stuff:
+ .scrapy
+
+ # Sphinx documentation
+ docs/_build/
+
+ # PyBuilder
+ .pybuilder/
+ target/
+
+ # Jupyter Notebook
+ .ipynb_checkpoints
+
+ # IPython
+ profile_default/
+ ipython_config.py
+
+ # pyenv
+ # For a library or package, you might want to ignore these files since the code is
+ # intended to run in multiple environments; otherwise, check them in:
+ # .python-version
+
+ # pipenv
+ # According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
+ # However, in case of collaboration, if having platform-specific dependencies or dependencies
+ # having no cross-platform support, pipenv may install dependencies that don't work, or not
+ # install all needed dependencies.
+ #Pipfile.lock
+
+ # poetry
+ # Similar to Pipfile.lock, it is generally recommended to include poetry.lock in version control.
+ # This is especially recommended for binary packages to ensure reproducibility, and is more
+ # commonly ignored for libraries.
+ # https://python-poetry.org/docs/basic-usage/#commit-your-poetrylock-file-to-version-control
+ #poetry.lock
+
+ # pdm
+ # Similar to Pipfile.lock, it is generally recommended to include pdm.lock in version control.
+ #pdm.lock
+ # pdm stores project-wide configurations in .pdm.toml, but it is recommended to not include it
+ # in version control.
+ # https://pdm.fming.dev/#use-with-ide
+ .pdm.toml
+
+ # PEP 582; used by e.g. github.com/David-OConnor/pyflow and github.com/pdm-project/pdm
+ __pypackages__/
+
+ # Celery stuff
+ celerybeat-schedule
+ celerybeat.pid
+
+ # SageMath parsed files
+ *.sage.py
+
+ # Environments
+ .env
+ .venv
+ env/
+ venv/
+ ENV/
+ env.bak/
+ venv.bak/
+
+ # Spyder project settings
+ .spyderproject
+ .spyproject
+
+ # Rope project settings
+ .ropeproject
+
+ # mkdocs documentation
+ /site
+
+ # mypy
+ .mypy_cache/
+ .dmypy.json
+ dmypy.json
+
+ # Pyre type checker
+ .pyre/
+
+ # pytype static type analyzer
+ .pytype/
+
+ # Cython debug symbols
+ cython_debug/
+
+ # PyCharm
+ # JetBrains specific template is maintained in a separate JetBrains.gitignore that can
+ # be found at https://github.com/github/gitignore/blob/main/Global/JetBrains.gitignore
+ # and can be added to the global gitignore or merged into this file. For a more nuclear
+ # option (not recommended) you can uncomment the following to ignore the entire idea folder.
+ #.idea/
README.md CHANGED
@@ -1,20 +1,245 @@
- ---
- title: Multi LLM Image Captioning
- emoji: 🚀
- colorFrom: red
- colorTo: red
- sdk: docker
- app_port: 8501
- tags:
- - streamlit
- pinned: false
- short_description: A comprehensive multi-model image caption generator.
- license: mit
- ---
-
- # Welcome to Streamlit!
-
- Edit `/src/streamlit_app.py` to customize this app to your heart's desire. :heart:
-
- If you have any questions, checkout our [documentation](https://docs.streamlit.io) and [community
- forums](https://discuss.streamlit.io).
 
caption_generation.py ADDED
@@ -0,0 +1,116 @@
```python
import base64
import io
import os
from PIL import Image

# API Imports
import openai
import google.generativeai as genai
from groq import Groq

from dotenv import load_dotenv

load_dotenv()

openai_key = os.getenv("OPENAI_API_KEY_IC")
gemini_key = os.getenv("GEMINI_API_KEY_IC")
groq_key = os.getenv("GROQ_API_KEY_IC")


class MultiModelCaptionGenerator:
    """Handles caption generation using multiple models."""

    def __init__(self):
        self.openai_client = None
        self.groq_client = None
        self.gemini_configured = False

    def configure_apis(self, openai_key: str | None = openai_key,
                       groq_key: str | None = groq_key,
                       gemini_key: str | None = gemini_key):
        if openai_key:
            self.openai_client = openai.OpenAI(api_key=openai_key)

        if groq_key:
            self.groq_client = Groq(api_key=groq_key)

        if gemini_key:
            genai.configure(api_key=gemini_key)
            self.gemini_configured = True

    def encode_image_base64(self, image: Image.Image) -> str:
        buffered = io.BytesIO()
        image.save(buffered, format="PNG")
        return base64.b64encode(buffered.getvalue()).decode()

    def generate_caption_openai(self, image: Image.Image, model: str = "gpt-4o-mini") -> str:
        """OpenAI caption generation via the chat completions vision format."""
        if not self.openai_client:
            raise ValueError("OpenAI API key not configured.")

        base64_image = self.encode_image_base64(image)

        response = self.openai_client.chat.completions.create(
            model=model,  # gpt-4o and gpt-4o-mini both support vision input
            messages=[
                {
                    "role": "user",
                    "content": [
                        {
                            "type": "text",
                            "text": "Generate the caption for this image. IMPORTANT: Use 10 words or 50 characters maximum. Use only plain text - no emojis or special characters, though ASCII punctuation is fine. Be descriptive but concise."
                        },
                        {
                            "type": "image_url",
                            "image_url": {
                                "url": f"data:image/png;base64,{base64_image}"
                            }
                        }
                    ]
                }
            ],
            max_tokens=300
        )
        return response.choices[0].message.content

    def generate_caption_gemini(self, image: Image.Image,
                                model: str = "gemini-2.5-flash") -> str:
        """Gemini caption generation."""
        if not self.gemini_configured:
            raise ValueError("Gemini API key not configured!")

        model_instance = genai.GenerativeModel(model)
        prompt = "Generate the caption for this image. IMPORTANT: Use 10 words or 50 characters maximum. Use only plain text - no emojis or special characters, though ASCII punctuation is fine. Be descriptive but concise."

        response = model_instance.generate_content([prompt, image])
        return response.text

    def generate_caption_groq(self, image: Image.Image,
                              model: str = "meta-llama/llama-4-scout-17b-16e-instruct") -> str:
        """GROQ caption generation via its OpenAI-compatible chat completions API."""
        if not self.groq_client:
            raise ValueError("GROQ API key is not configured!")

        base64_image = self.encode_image_base64(image)

        completion = self.groq_client.chat.completions.create(
            model=model,
            messages=[
                {
                    "role": "user",
                    "content": [
                        {
                            "type": "text",
                            "text": "Generate the caption for this image. IMPORTANT: Use 10 words or 50 characters maximum. Use only plain text - no emojis or special characters, though ASCII punctuation is fine. Be descriptive but concise."
                        },
                        {
                            "type": "image_url",
                            "image_url": {
                                "url": f"data:image/png;base64,{base64_image}"
                            }
                        }
                    ]
                }
            ],
            max_tokens=300,
            temperature=0.7
        )
        return completion.choices[0].message.content
```
caption_history.json ADDED
@@ -0,0 +1,110 @@
+ [
+ {
+ "timestamp": "2025-06-28T11:06:08.409242",
+ "image_name": "ChatGPT Image Jun 27, 2025, 09_59_51 AM.png",
+ "model": "Google Gemini",
+ "caption": "Here are a few options, choose the one that best fits your tone:\n\n**Option 1 (Focus on Spectacle):**\n\"Witness the vibrant energy of a grand procession! Thousands of devotees in traditional orange and white attire pull three towering, ornate chariots, as a shower of golden petals fills the sky. A powerful display of collective devotion.\"\n\n**Option 2 (More direct):**\n\"A sea of devotees in orange and white tirelessly pull magnificent, ornate chariots under a shower of golden petals. This grand spectacle of faith and collective effort is truly awe-inspiring.\"\n\n**Option 3 (Slightly shorter):**\n\"Thousands of devotees in vibrant orange and white attire pull three magnificent chariots, as a cascade of golden petals rains from the sky. A powerful scene of spiritual celebration and unity.\""
+ },
+ {
+ "timestamp": "2025-06-28T11:17:53.982671",
+ "image_name": "ChatGPT Image Jun 27, 2025, 09_59_51 AM.png",
+ "model": "Google Gemini",
+ "caption": "A magnificent procession unfolds as a vast multitude, many in vibrant orange attire, collectively pulls enormous, ornate chariots. Golden and orange leaves rain down from the sky, creating a breathtaking and powerful spectacle of devotion and tradition."
+ },
+ {
+ "timestamp": "2025-06-28T11:31:29.073833",
+ "image_name": "ChatGPT Image Jun 24, 2025, 09_02_19 AM.png",
+ "model": "Google Gemini",
+ "caption": "Here's a detailed and engaging caption for the image:\n\nNostalgia activated! \u2728 Dive back into the pixel-perfect world of Super Mario with this iconic scene. From the classic question block and a gleaming coin to that instantly recognizable mushroom, it's all set against the vibrant blue sky and green hills we know and love. Pure retro gaming joy! What's your favorite Super Mario memory or level?\n\n#SuperMario #Nintendo #RetroGaming #GamingNostalgia #ClassicGames #MarioBros #PixelArt #ChildhoodMemories #MushroomKingdom #VideoGames"
+ },
+ {
+ "timestamp": "2025-06-28T11:55:38.783106",
+ "image_name": "ChatGPT Image Jun 27, 2025, 09_59_51 AM.png",
+ "model": "Google Gemini",
+ "caption": "Here are a few options for a detailed and engaging caption, keeping it descriptive yet concise:\n\n**Option 1 (Focus on atmosphere & common knowledge):**\nA breathtaking display of devotion and collective energy, as a vast multitude of devotees in vibrant orange pull three magnificent, ornate chariots. A shower of colorful petals or confetti rains down from the sky, highlighting the grandeur and joyous spirit of this spiritual procession.\n\n**Option 2 (More specific, if Ratha Yatra is identified):**\nThe vibrant spectacle of a Ratha Yatra festival, with thousands of devotees pulling the colossal, richly decorated chariots. A cascade of orange and yellow petals fills the air, adding a festive and sacred touch to this powerful demonstration of faith.\n\n**Option 3 (Concise and evocative):**\nA powerful scene of celebration and spiritual fervor, where a sea of people in saffron hues pulls three majestic, golden chariots. The sky above showers down a joyous burst of colorful confetti, capturing the dynamic energy of this grand procession."
+ },
+ {
+ "timestamp": "2025-06-28T11:56:31.347041",
+ "image_name": "ChatGPT Image Jun 27, 2025, 09_59_51 AM.png",
+ "model": "Google Gemini",
+ "caption": "Here are a few options for a detailed and engaging caption, playing with slightly different focuses:\n\n**Option 1 (Focus on spectacle & devotion):**\n\"A breathtaking display of devotion at the Ratha Yatra festival! Thousands of devotees unite to pull the majestic, ornate chariots, as a shower of auspicious petals blesses the vibrant procession. An incredible testament to faith and community.\"\n\n**Option 2 (Focus on energy & scale):**\n\"Experience the electrifying energy of the Ratha Yatra! A sea of devotees, clad in traditional orange and white, meticulously pull the colossal, brightly adorned chariots under a sky alive with falling blossoms. A truly grand spiritual spectacle.\"\n\n**Option 3 (More concise & evocative):**\n\"Majestic chariots, propelled by a vibrant sea of devotees, embark on their sacred journey during Ratha Yatra. The air shimmers with collective devotion and a shower of auspicious blessings from above.\""
+ },
+ {
+ "timestamp": "2025-06-28T12:37:12.999773",
+ "image_name": "ChatGPT Image Jun 27, 2025, 09_59_51 AM.png",
+ "model": "Google Gemini",
+ "caption": "Here are a few options, choose the one that best fits your platform/tone:\n\n**Option 1 (Concise & Evocative):**\nA powerful scene of devotion unfolds as thousands pull the colossal, ornately decorated chariots of a grand procession. Golden and fiery petals rain down from the sky, enhancing the vibrant, spiritual atmosphere of this ancient tradition.\n\n**Option 2 (Slightly more descriptive):**\nWitness the immense energy of a traditional festival, where countless devotees in vibrant attire pull majestic, multi-tiered chariots. The sky above showers golden and orange petals, adding a sacred and celebratory feel to this grand display of faith.\n\n**Option 3 (Short & Sweet):**\nThousands of devotees unite to pull magnificent chariots in a grand procession, as a shower of colorful petals blesses the vibrant spiritual atmosphere."
+ },
+ {
+ "timestamp": "2025-06-28T12:37:35.022969",
+ "image_name": "ChatGPT Image Jun 27, 2025, 09_59_51 AM.png",
+ "model": "Google Gemini",
+ "caption": "Here's a detailed yet concise and engaging caption for the image:\n\n\"A magnificent Ratha Yatra procession unfolds, as grand, ornate chariots of red and gold are pulled through a vast sea of devoted pilgrims dressed in vibrant orange. The sky above is showered with countless golden petals, adding a breathtaking, celebratory energy to this ancient and spiritual spectacle.\""
+ },
+ {
+ "timestamp": "2025-06-28T12:38:09.032392",
+ "image_name": "ChatGPT Image Jun 27, 2025, 09_59_51 AM.png",
+ "model": "Google Gemini",
+ "caption": "A vibrant spectacle of devotion unfolds as hundreds of people, primarily in saffron and white attire, collectively pull colossal, intricately designed chariots across a vast open ground. The sky above showers them with a beautiful cascade of golden and red leaves or petals, adding a festive and sacred ambiance to this powerful display of community and faith."
+ },
+ {
+ "timestamp": "2025-06-29T10:30:43.414914",
+ "image_name": "ChatGPT Image Jun 29, 2025, 01_58_05 AM.png",
+ "model": "OpenAI GPT-4o",
+ "caption": "\"Unlocking the power of AI with a few lines of code! This snippet showcases an asynchronous chat function using `httpx` and Google's Generative AI. The `chat` function establishes a client connection, initiates the Gemini Pro model, and sends a simple greeting \u2014 'Hello!' \u2014 to demonstrate seamless interaction. It's a perfect blend of modern programming practices and innovative technology, paving the way for dynamic, automated conversations!\""
+ },
+ {
+ "timestamp": "2025-06-29T10:31:19.761048",
+ "image_name": "ChatGPT Image Jun 29, 2025, 01_58_05 AM.png",
+ "model": "Google Gemini",
+ "caption": "Here are a few options for a detailed and engaging caption, keeping it descriptive yet concise:\n\n**Option 1 (Concise & Impactful):**\n\n> Say 'Hello!' to Google Gemini Pro with just a few lines of Python! This asynchronous snippet shows how incredibly simple it is to kickstart a conversational AI using `google.generativeai` and `httpx`. Ready to build your next intelligent app?\n>\n> \\#Python #GenerativeAI #GeminiPro #AIdevelopment\n\n**Option 2 (Slightly More Descriptive, Developer-Focused):**\n\n> Powering conversations with Python and Google Gemini Pro! \ud83d\ude80 This clean code snippet demonstrates how to asynchronously connect to the `gemini-pro` model, initiate a chat, and send your first message. Leverages `httpx` for efficient network calls, making AI integration smoother than ever. What brilliant bot will you create?\n>\n> \\#AI #Python #GeminiAPI #AsyncPython #MachineLearning\n\n**Option 3 (Engaging Tone):**\n\n> Witness the magic of Generative AI in action! \u2728 This elegant Python code reveals how effortlessly you can start a conversation with Google's powerful `gemini-pro` model. Using `async/await` for modern, non-blocking operations, it's never been easier to infuse your applications with intelligent chat capabilities. Send your first 'Hello!' to the future!\n>\n> \\#Code #AI #GoogleAI #PythonDev #Chatbot"
61
+ },
62
+ {
63
+ "timestamp": "2025-06-29T10:36:09.609008",
64
+ "image_name": "ChatGPT Image Jun 29, 2025, 01_58_05 AM.png",
65
+ "model": "OpenAI GPT-4o",
66
+ "caption": "\ud83d\ude80 Dive into the exciting realm of asynchronous programming with this Python snippet! \ud83d\udcbb\u2728\n\nHere, we harness the power of the Google Generative AI model, 'gemini-pro', to create an interactive chat experience. With the `httpx` library, we're establishing a streamlined, asynchronous connection, allowing for efficient network calls. The code showcases the seamless integration of AI, where it awaits a response after sending a greeting message. \n\nWhether you're a budding coder or an experienced developer, this snippet is a gateway to explore dynamic chat applications and the fascinating possibilities of AI integration. Let's code the future! \ud83c\udf10\ud83e\udd16"
67
+ },
68
+ {
69
+ "timestamp": "2025-06-29T10:45:45.277825",
70
+ "image_name": "ChatGPT Image Jun 22, 2025, 12_23_42 AM.png",
71
+ "model": "OpenAI GPT-4o",
72
+ "caption": "\"Unraveling patterns in complex data: This illustration captures the essence of logistic regression applied to cancer datasets. As the curve artfully bridges two distinct groups represented by orange and blue scatter points, it highlights the critical role of data analysis in healthcare. The hospital icon signifies the ultimate goal: utilizing data-driven insights to improve patient outcomes and advance cancer research. A powerful reminder of how statistics can illuminate pathways to better health and innovation!\""
73
+ },
74
+ {
75
+ "timestamp": "2025-06-29T10:46:09.997416",
76
+ "image_name": "ChatGPT Image Jun 22, 2025, 12_23_42 AM.png",
77
+ "model": "Google Gemini",
78
+ "caption": "Here's a detailed and engaging caption, descriptive yet concise:\n\n---\n\n**Harnessing the power of Logistic Regression for vital cancer detection! \ud83d\udcca\ud83e\ude7a**\n\nThis visual beautifully illustrates how this machine learning algorithm classifies data, using the characteristic sigmoid curve to distinguish between different patient outcomes (e.g., healthy vs. diseased). A crucial tool in advancing precision medicine and improving healthcare, empowering medical professionals with data-driven insights.\n\n#LogisticRegression #MachineLearning #AIinHealthcare #CancerDetection #DataScience #HealthcareInnovation #MedicalAI #PredictiveAnalytics"
79
+ },
80
+ {
81
+ "timestamp": "2025-06-29T10:46:27.426996",
82
+ "image_name": "ChatGPT Image Jun 22, 2025, 12_23_42 AM.png",
83
+ "model": "GROQ Vision",
84
+ "caption": "The image presents a visual representation of logistic regression on a cancer dataset, featuring a graph and an illustration of a hospital.\n\n* **Title**\n * The title \"LOGISTIC REGRESSION ON CANCER DATASET\" is prominently displayed at the top of the image in large, dark blue text.\n* **Graph**\n * A line graph is situated to the left of the hospital illustration, showcasing a curved line that increases as it moves from left to right.\n * The graph features orange dots on the upper left side, representing one group of data points, and blue X's on the lower right side, representing another group.\n * The curved line begins on the lower left side of the graph, gradually rising to intersect with the orange dots and then leveling off as it approaches the upper right side.\n* **Hospital Illustration**\n * A light blue hospital building with a dark blue cross on its roof is depicted on the right side of the image.\n * The hospital features a central section with a door and windows, accompanied by two smaller sections on either side.\n* **Background**\n * The background of the image is a pale yellow color.\n\nIn summary, the image effectively illustrates the concept of logistic regression on a cancer dataset through a clear and concise visual representation. The graph and hospital illustration work together to convey the relationship between the data points and the predicted outcomes, making it easier for viewers to understand the concept."
85
+ },
86
+ {
87
+ "timestamp": "2025-06-29T11:03:21.461222",
88
+ "image_name": "Mistral_AI.png",
89
+ "model": "OpenAI GPT-4o",
90
+ "caption": "Mistral AI logo featuring modern pixel art design elements."
91
+ },
92
+ {
93
+ "timestamp": "2025-06-29T11:03:49.005152",
94
+ "image_name": "ChatGPT Image Jun 29, 2025, 01_58_05 AM.png",
95
+ "model": "OpenAI GPT-4o",
96
+ "caption": "Code snippet for asynchronous chat with Google Generative AI."
97
+ },
98
+ {
99
+ "timestamp": "2025-06-29T11:04:36.328163",
100
+ "image_name": "ChatGPT Image Jun 29, 2025, 01_58_05 AM.png",
101
+ "model": "Google Gemini",
102
+ "caption": "Python async code for Gemini Pro AI chat."
103
+ },
104
+ {
105
+ "timestamp": "2025-06-29T11:04:53.452858",
106
+ "image_name": "ChatGPT Image Jun 29, 2025, 01_58_05 AM.png",
107
+ "model": "GROQ Vision",
108
+ "caption": "Python code for an asynchronous chat function using Google's Gemini AI."
109
+ }
110
+ ]
caption_history.py ADDED
@@ -0,0 +1,71 @@
+ import datetime
+ import json
+ import os
+ from typing import Dict, List
+ from langchain.memory import ConversationBufferMemory
+
+ class CaptionHistory:
+     """
+     Manages caption generation history using LangChain
+     """
+     def __init__(self):
+         self.memory = ConversationBufferMemory(
+             return_messages=True,
+             memory_key="chat_history"
+         )
+         self.history_file = "caption_history.json"
+         self.load_history()  # Load existing history on initialization
+
+     def add_interaction(self, image_name: str, model: str,
+                         caption: str, timestamp: str | None = None):
+         if not timestamp:
+             timestamp = datetime.datetime.now().isoformat()
+
+         interaction = {
+             "timestamp": timestamp,
+             "image_name": image_name,
+             "model": model,
+             "caption": caption
+         }
+
+         # Mirror the interaction into LangChain memory
+         self.memory.chat_memory.add_user_message(
+             f"Generate caption for {image_name} using {model}"
+         )
+         self.memory.chat_memory.add_ai_message(caption)
+
+         # Persist to the history file
+         self.save_interaction(interaction)
+
+     def get_history(self) -> List[Dict[str, str]]:
+         try:
+             with open(self.history_file, mode="r") as f:
+                 return json.load(f)
+         except (FileNotFoundError, json.JSONDecodeError):
+             return []
+
+     def save_interaction(self, interaction: Dict[str, str]) -> None:
+         history = self.get_history()
+         history.append(interaction)
+         with open(self.history_file, mode="w") as f:
+             json.dump(history, f, indent=2)
+
+     def load_history(self):
+         """Rebuild LangChain memory from the saved history file"""
+         for item in self.get_history():
+             self.memory.chat_memory.add_user_message(
+                 f"Generate caption for {item['image_name']} using {item['model']}"
+             )
+             self.memory.chat_memory.add_ai_message(item["caption"])
+
+     def clear_history(self):
+         self.memory.clear()
+         if os.path.exists(self.history_file):
+             os.remove(self.history_file)
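
For reference, the JSON persistence pattern that `save_interaction` and `get_history` implement can be sketched without the LangChain dependency. This is a simplified, self-contained sketch: the function name `append_interaction` and the temp-file path are illustrative, not part of the module.

```python
import datetime
import json
import os
import tempfile

def append_interaction(path, image_name, model, caption):
    """Load the existing JSON list (or start fresh), append one entry, rewrite the file."""
    try:
        with open(path) as f:
            history = json.load(f)
    except (FileNotFoundError, json.JSONDecodeError):
        history = []
    history.append({
        "timestamp": datetime.datetime.now().isoformat(),
        "image_name": image_name,
        "model": model,
        "caption": caption,
    })
    with open(path, "w") as f:
        json.dump(history, f, indent=2)
    return history

# Usage: successive appends accumulate in the same file
path = os.path.join(tempfile.mkdtemp(), "caption_history.json")
append_interaction(path, "cat.png", "Google Gemini", "A cat resting on a sofa.")
history = append_interaction(path, "dog.png", "GROQ Vision", "A dog playing in a park.")
```

The real class additionally mirrors every entry into `ConversationBufferMemory`, so the conversation context is rebuilt from this file on each start.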
caption_overlay.py ADDED
@@ -0,0 +1,154 @@
+ import os
+ from typing import Optional
+
+ import cv2
+ import numpy as np
+ from PIL import Image, ImageDraw, ImageFont
+
+ class ImageCaptionOverlay:
+     """Handles adding captions to images using OpenCV"""
+
+     @staticmethod
+     def add_caption_overlay(image: np.ndarray, caption: str, position: str = "bottom",
+                             font_size: float = 1.0, thickness: int = 2) -> np.ndarray:
+         """Add caption as overlay on the image"""
+         img_copy = image.copy()
+         height, width = img_copy.shape[:2]
+
+         # Prepare text
+         font = cv2.FONT_HERSHEY_SIMPLEX
+
+         # Calculate text size and position
+         text_size = cv2.getTextSize(caption, font, font_size, thickness)[0]
+
+         # Wrap text if too long
+         max_width = width - 40
+         if text_size[0] > max_width:
+             words = caption.split()
+             lines = []
+             current_line = ""
+
+             for word in words:
+                 test_line = current_line + " " + word if current_line else word
+                 test_size = cv2.getTextSize(test_line, font, font_size, thickness)[0]
+
+                 if test_size[0] <= max_width:
+                     current_line = test_line
+                 else:
+                     if current_line:
+                         lines.append(current_line)
+                     current_line = word
+
+             if current_line:
+                 lines.append(current_line)
+         else:
+             lines = [caption]
+
+         # Calculate positions
+         line_height = cv2.getTextSize("A", font, font_size, thickness)[0][1] + 10
+         total_height = len(lines) * line_height
+
+         if position == "bottom":
+             start_y = height - total_height - 20
+         elif position == "top":
+             start_y = 30
+         else:  # center
+             start_y = (height - total_height) // 2
+
+         # Draw each line over a background rectangle for better readability
+         for i, line in enumerate(lines):
+             text_size = cv2.getTextSize(line, font, font_size, thickness)[0]
+             text_x = (width - text_size[0]) // 2
+             text_y = start_y + (i * line_height) + text_size[1]
+
+             # Background rectangle
+             cv2.rectangle(img_copy,
+                           (text_x - 10, text_y - text_size[1] - 5),
+                           (text_x + text_size[0] + 10, text_y + 5),
+                           (0, 0, 0), -1)
+
+             # Text
+             cv2.putText(img_copy, line, (text_x, text_y), font, font_size, (255, 255, 255), thickness)
+
+         return img_copy
+
+     @staticmethod
+     def add_caption_background(image: np.ndarray, caption: str,
+                                font_path: Optional[str] = None,
+                                background_color: tuple = (0, 0, 0),
+                                text_color: tuple = (255, 255, 255),
+                                margin: int = 50) -> np.ndarray:
+         """Add caption on a background band above the image"""
+         height, width = image.shape[:2]
+
+         # Use PIL for better text rendering
+         pil_image = Image.fromarray(cv2.cvtColor(image, cv2.COLOR_BGR2RGB))
+
+         # Try to use Poppins font first, then fall back to the default
+         try:
+             # First priority: custom font path if provided
+             if font_path and os.path.exists(font_path):
+                 font = ImageFont.truetype(font_path, 24)
+             # Second priority: check for Poppins font in fonts directory
+             elif os.path.exists("fonts/Poppins-Regular.ttf"):
+                 font = ImageFont.truetype("fonts/Poppins-Regular.ttf", 24)
+             else:
+                 # Fallback to default font
+                 font = ImageFont.load_default()
+         except Exception:
+             # If anything fails, use default font
+             font = ImageFont.load_default()
+
+         # Calculate text dimensions
+         draw = ImageDraw.Draw(pil_image)
+         bbox = draw.textbbox((0, 0), caption, font=font)
+         text_width = bbox[2] - bbox[0]
+         text_height = bbox[3] - bbox[1]
+
+         # Wrap text if necessary
+         max_width = width - (2 * margin)
+         if text_width > max_width:
+             words = caption.split()
+             lines = []
+             current_line = ""
+
+             for word in words:
+                 test_line = current_line + " " + word if current_line else word
+                 test_bbox = draw.textbbox((0, 0), test_line, font=font)
+                 test_width = test_bbox[2] - test_bbox[0]
+
+                 if test_width <= max_width:
+                     current_line = test_line
+                 else:
+                     if current_line:
+                         lines.append(current_line)
+                     current_line = word
+
+             if current_line:
+                 lines.append(current_line)
+         else:
+             lines = [caption]
+
+         # Calculate total text height
+         total_text_height = len(lines) * text_height + (len(lines) - 1) * 10
+
+         # Create new image with space for text
+         new_height = height + total_text_height + (2 * margin)
+         new_image = Image.new('RGB', (width, new_height), background_color)
+
+         # Paste original image below the text band
+         new_image.paste(pil_image, (0, total_text_height + (2 * margin)))
+
+         # Add text
+         draw = ImageDraw.Draw(new_image)
+         y_offset = margin
+
+         for line in lines:
+             bbox = draw.textbbox((0, 0), line, font=font)
+             line_width = bbox[2] - bbox[0]
+             x_position = (width - line_width) // 2
+
+             draw.text((x_position, y_offset), line, fill=text_color, font=font)
+             y_offset += text_height + 10
+
+         # Convert back to OpenCV format
+         return cv2.cvtColor(np.array(new_image), cv2.COLOR_RGB2BGR)
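
Both overlay methods repeat the same greedy word-wrapping pass. As a standalone sketch, with the width measure injected as a parameter — a stand-in for the pixel-width calls (`cv2.getTextSize` in one method, `draw.textbbox` in the other), defaulting to character count here for illustration:

```python
def wrap_caption(caption, max_width, measure=len):
    """Greedily pack words into lines whose measured width stays within max_width."""
    if measure(caption) <= max_width:
        return [caption]
    lines, current = [], ""
    for word in caption.split():
        test = f"{current} {word}" if current else word
        if measure(test) <= max_width:
            current = test  # word fits on the current line
        else:
            if current:
                lines.append(current)  # flush the full line
            current = word  # start a new line with the overflowing word
    if current:
        lines.append(current)
    return lines

lines = wrap_caption("a powerful display of devotion and community", 16)
# → ["a powerful", "display of", "devotion and", "community"]
```

Note that greedy wrapping can still exceed `max_width` when a single word is wider than the limit, which mirrors the behavior of the in-class loops.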
main.py ADDED
@@ -0,0 +1,294 @@
+ from caption_history import CaptionHistory
+ from caption_generation import MultiModelCaptionGenerator
+ from caption_overlay import ImageCaptionOverlay
+
+ import io
+ import os
+
+ import cv2
+ import numpy as np
+ from PIL import Image
+ import streamlit as st
+ from dotenv import load_dotenv
+
+ load_dotenv()
+
+ openai_key = os.getenv("OPENAI_API_KEY_IC")
+ gemini_key = os.getenv("GEMINI_API_KEY_IC")
+ groq_key = os.getenv("GROQ_API_KEY_IC")
+
+ def main():
+     st.set_page_config(
+         page_title="Multi-Model Image Caption Generator",
+         page_icon="🖼️",
+         layout="wide"
+     )
+
+     st.title("🖼️ Multi-Model Image Caption Generator")
+     st.markdown("Generate captions using OpenAI GPT-4o, Google Gemini, and GROQ Vision models")
+
+     # Initialize session state
+     if 'caption_history' not in st.session_state:
+         st.session_state.caption_history = CaptionHistory()
+
+     if 'caption_generator' not in st.session_state:
+         st.session_state.caption_generator = MultiModelCaptionGenerator()
+
+     # Sidebar for API configuration
+     with st.sidebar:
+         st.header("🔑 API Configuration")
+
+         # Show API status
+         if openai_key:
+             st.success("✅ OpenAI API Key loaded from .env")
+         else:
+             st.warning("⚠️ OpenAI API Key not found in .env")
+
+         if gemini_key:
+             st.success("✅ Gemini API Key loaded from .env")
+         else:
+             st.warning("⚠️ Gemini API Key not found in .env")
+
+         if groq_key:
+             st.success("✅ GROQ API Key loaded from .env")
+         else:
+             st.warning("⚠️ GROQ API Key not found in .env")
+
+         if st.button("Configure APIs"):
+             try:
+                 st.session_state.caption_generator.configure_apis(
+                     openai_key=openai_key,
+                     gemini_key=gemini_key,
+                     groq_key=groq_key
+                 )
+                 st.success("APIs configured successfully!")
+             except Exception as e:
+                 st.error(f"Error configuring APIs: {str(e)}")
+
+         st.markdown("---")
+
+         # Caption overlay settings
+         st.header("🎨 Caption Settings")
+         caption_method = st.selectbox(
+             "Caption Method",
+             ["Overlay on Image", "Background Behind Image"]
+         )
+
+         if caption_method == "Overlay on Image":
+             position = st.selectbox("Position", ["bottom", "top", "center"])
+             font_size = st.slider("Font Size", 0.5, 3.0, 1.0, 0.1)
+             thickness = st.slider("Thickness", 1, 5, 2)
+         else:
+             bg_color = st.color_picker("Background Color", "#000000")
+             text_color = st.color_picker("Text Color", "#FFFFFF")
+             margin = st.slider("Margin", 20, 100, 50)
+
+             # Optional: Custom font path
+             custom_font = st.text_input(
+                 "Custom Font Path (optional)",
+                 placeholder="e.g., fonts/Poppins-Regular.ttf"
+             )
+
+         st.markdown("---")
+
+         # History management
+         st.header("📝 Caption History")
+         if st.button("View History"):
+             st.session_state.show_history = True
+
+         if st.button("Hide History"):
+             st.session_state.show_history = False
+
+         if st.button("Clear History"):
+             st.session_state.caption_history.clear_history()
+             st.success("History cleared!")
+
+     # Main content area
+     col1, col2 = st.columns([1, 1])
+
+     with col1:
+         st.header("📤 Upload Image")
+         uploaded_file = st.file_uploader(
+             "Choose an image...",
+             type=['png', 'jpg', 'jpeg', 'bmp', 'tiff']
+         )
+
+         if uploaded_file is not None:
+             # Display original image
+             image = Image.open(uploaded_file)
+             st.image(image, caption="Original Image", use_container_width=True)
+
+             # Model selection
+             st.header("🤖 Select Model")
+             models = {
+                 "OpenAI GPT-4o": "openai",
+                 "Google Gemini": "gemini",
+                 "GROQ Vision": "groq"
+             }
+
+             selected_model = st.selectbox("Choose a model", list(models.keys()))
+
+             # Show model-specific info
+             model_info = {
+                 "OpenAI GPT-4o": "Uses GPT-4o vision model for detailed image analysis",
+                 "Google Gemini": "Uses Gemini-1.5-flash for fast and accurate captions",
+                 "GROQ Vision": "Uses Llama-3.2-11b-vision for high-speed processing"
+             }
+             st.info(model_info[selected_model])
+
+             if st.button("Generate Caption", type="primary"):
+                 # Check if APIs are configured
+                 if not any([openai_key, gemini_key, groq_key]):
+                     st.error("Please add API keys to your .env file and click 'Configure APIs'")
+                     return
+
+                 try:
+                     model_key = models[selected_model]
+
+                     # Check specific API availability
+                     if model_key == "openai" and not openai_key:
+                         st.error("OpenAI API key not available. Please add it to your .env file.")
+                         return
+                     elif model_key == "gemini" and not gemini_key:
+                         st.error("Gemini API key not available. Please add it to your .env file.")
+                         return
+                     elif model_key == "groq" and not groq_key:
+                         st.error("GROQ API key not available. Please add it to your .env file.")
+                         return
+
+                     with st.spinner(f"Generating caption with {selected_model}..."):
+                         if model_key == "openai":
+                             caption = st.session_state.caption_generator.generate_caption_openai(image)
+                         elif model_key == "gemini":
+                             caption = st.session_state.caption_generator.generate_caption_gemini(image)
+                         elif model_key == "groq":
+                             caption = st.session_state.caption_generator.generate_caption_groq(image)
+
+                         st.session_state.current_caption = caption
+                         st.session_state.current_image = image
+                         st.session_state.current_model = selected_model
+
+                         # Add to history
+                         st.session_state.caption_history.add_interaction(
+                             uploaded_file.name,
+                             selected_model,
+                             caption
+                         )
+
+                         st.success(f"Caption generated successfully with {selected_model}!")
+
+                 except Exception as e:
+                     st.error(f"Error generating caption: {str(e)}")
+                     st.error("Please check your API keys and internet connection.")
+
+     with col2:
+         st.header("✨ Generated Caption & Preview")
+
+         if hasattr(st.session_state, 'current_caption'):
+             # Editable caption
+             edited_caption = st.text_area(
+                 "Generated Caption (editable)",
+                 st.session_state.current_caption,
+                 height=100,
+                 help="You can edit the caption before applying it to the image"
+             )
+
+             # Update the caption if edited
+             if edited_caption != st.session_state.current_caption:
+                 st.session_state.current_caption = edited_caption
+
+             # Generate preview with caption
+             if hasattr(st.session_state, 'current_image'):
+                 # Convert PIL to OpenCV format
+                 cv_image = cv2.cvtColor(np.array(st.session_state.current_image), cv2.COLOR_RGB2BGR)
+
+                 try:
+                     if caption_method == "Overlay on Image":
+                         result_image = ImageCaptionOverlay.add_caption_overlay(
+                             cv_image,
+                             st.session_state.current_caption,
+                             position=position,
+                             font_size=font_size,
+                             thickness=thickness
+                         )
+                     else:
+                         # Convert hex colors to RGB tuples
+                         bg_rgb = tuple(int(bg_color[i:i+2], 16) for i in (1, 3, 5))
+                         text_rgb = tuple(int(text_color[i:i+2], 16) for i in (1, 3, 5))
+
+                         # Use custom font if provided
+                         font_path = custom_font if custom_font and os.path.exists(custom_font) else None
+
+                         result_image = ImageCaptionOverlay.add_caption_background(
+                             cv_image,
+                             st.session_state.current_caption,
+                             font_path=font_path,
+                             background_color=bg_rgb,
+                             text_color=text_rgb,
+                             margin=margin
+                         )
+
+                     # Convert back to PIL for display
+                     result_pil = Image.fromarray(cv2.cvtColor(result_image, cv2.COLOR_BGR2RGB))
+                     st.image(result_pil, caption="Image with Caption", use_container_width=True)
+
+                     # Download button
+                     img_buffer = io.BytesIO()
+                     result_pil.save(img_buffer, format='PNG')
+
+                     st.download_button(
+                         label="📥 Download Image with Caption",
+                         data=img_buffer.getvalue(),
+                         file_name=f"captioned_{uploaded_file.name if uploaded_file else 'image'}.png",
+                         mime="image/png"
+                     )
+
+                 except Exception as e:
+                     st.error(f"Error processing image: {str(e)}")
+         else:
+             st.info("👆 Upload an image and generate a caption to see the preview here")
+
+     # History display
+     if getattr(st.session_state, 'show_history', False):
+         st.markdown("---")
+         st.header("📋 Caption Generation History")
+
+         history = st.session_state.caption_history.get_history()
+
+         if history:
+             # Add search/filter functionality
+             search_term = st.text_input("🔍 Search history", placeholder="Search by image name or caption...")
+
+             filtered_history = history
+             if search_term:
+                 filtered_history = [
+                     item for item in history
+                     if search_term.lower() in item['image_name'].lower()
+                     or search_term.lower() in item['caption'].lower()
+                     or search_term.lower() in item['model'].lower()
+                 ]
+
+             if filtered_history:
+                 for item in reversed(filtered_history[-20:]):  # Show the 20 most recent items
+                     with st.expander(f"{item['timestamp'][:19]} - {item['image_name']} ({item['model']})"):
+                         st.write(f"**Model:** {item['model']}")
+                         st.write(f"**Image:** {item['image_name']}")
+                         st.write(f"**Caption:** {item['caption']}")
+                         st.write(f"**Timestamp:** {item['timestamp']}")
+             else:
+                 st.info("No matching history found.")
+         else:
+             st.info("No caption history available.")
+
+     # Footer
+     st.markdown("---")
+     st.markdown("""
+     <div style='text-align: center'>
+         <p>Built with Streamlit, LangChain, OpenCV, and multi-model AI APIs</p>
+         <p>Supports OpenAI GPT-4o, Google Gemini, and GROQ Vision models</p>
+         <p><small>Make sure to add your API keys to the .env file</small></p>
+     </div>
+     """, unsafe_allow_html=True)
+
+ if __name__ == "__main__":
+     main()
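
The preview branch converts the `#RRGGBB` hex strings returned by `st.color_picker` into RGB tuples with an inline generator expression. Isolated as a standalone helper (the name `hex_to_rgb` is illustrative, not part of the app):

```python
def hex_to_rgb(hex_color: str) -> tuple:
    """Convert a '#RRGGBB' string (as returned by st.color_picker) to an (R, G, B) tuple."""
    # Slice out each two-character channel, skipping the leading '#', and parse as base 16
    return tuple(int(hex_color[i:i + 2], 16) for i in (1, 3, 5))

bg_rgb = hex_to_rgb("#000000")    # → (0, 0, 0)
text_rgb = hex_to_rgb("#FFFFFF")  # → (255, 255, 255)
accent = hex_to_rgb("#1A2B3C")    # → (26, 43, 60)
```

PIL's `Image.new('RGB', ...)` and `draw.text(fill=...)` accept these tuples directly, which is why `add_caption_background` takes RGB rather than OpenCV's BGR ordering.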
requirements.txt CHANGED
@@ -1,3 +1,9 @@
- altair
- pandas
  streamlit
+ google-generativeai
+ groq
+ langchain
+ python-dotenv
+ openai
+ opencv-python
+ numpy
+ pillow