---
title: AI Image Caption Generator
emoji: 🤖
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 4.8.0
app_file: app.py
pinned: false
license: mit
---

# 🖼️ AI Image Caption Generator

[![Python 3.10+](https://img.shields.io/badge/python-3.10+-blue.svg)](https://www.python.org/downloads/)
[![PyTorch](https://img.shields.io/badge/PyTorch-2.1.0-EE4C2C.svg)](https://pytorch.org/)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
[![Hugging Face](https://img.shields.io/badge/🤗%20Hugging%20Face-Spaces-blue)](https://huggingface.co/spaces/ChinmayM06/ai-image-caption-generator)

> Generate AI-powered image captions with multiple style options, completely free, with no API costs.

A lightweight, GPU-accelerated image captioning tool using state-of-the-art vision-language models (BLIP & GIT) with style customization powered by Groq's free LLM API.

---

## ✨ Features

- 🎯 **Dual Model Support**: Both BLIP-base (fast) and GIT-large (high quality) run simultaneously
- 🎨 **5 Caption Styles**: None, Creative, Social Media, Professional, Technical
- ⚡ **GPU Accelerated**: Optimized for NVIDIA GPUs (works on CPU too)
- 📊 **Analytics Tracking**: Built-in usage statistics and performance metrics
- 🖼️ **Image Processing**: Automatic validation, resizing, and format conversion
- 🔄 **Fallback Mechanisms**: Graceful degradation when the API is unavailable
- 💰 **100% Free**: No OpenAI credits, no hidden costs
- 🔒 **Privacy First**: Local inference option available

---

## 🚀 Live Demo

Try it out without any installation:

**[🎮 Launch Live Demo →](https://huggingface.co/spaces/CXM06/ai-image-caption-generation)**

*The demo may be a little slow because it runs on a CPU rather than a GPU.*

---

## 🛠️ Tech Stack

| Component | Technology |
|-----------|-----------|
| **Vision Models** | BLIP-base, GIT-large (Hugging Face) |
| **Style LLM** | Groq API (free tier) |
| **Framework** | PyTorch 2.1.0 + CUDA 11.8 |
| **Interface** | Gradio 4.8.0 |
| **Deployment** | Hugging Face Spaces (T4 GPU) |

---

## 📦 Quick Start

### Prerequisites

- Python 3.10+
- NVIDIA GPU with 4GB+ VRAM (recommended) or CPU
- CUDA 11.8 (for GPU acceleration)

### Installation

```bash
# Clone repository
git clone https://github.com/ChinmayM06/ai-image-caption-generator.git
cd ai-image-caption-generator

# Create virtual environment
python -m venv venv
source venv/bin/activate  # Windows: venv\Scripts\activate

# Install PyTorch with CUDA support
pip install torch==2.1.0 torchvision==0.16.0 --index-url https://download.pytorch.org/whl/cu118

# Install dependencies
pip install -r requirements.txt

# Set up environment variables (optional)
# Create a .env file in the project root with:
# GROQ_API_KEY=your_groq_api_key_here
# Get your free API key at https://console.groq.com
# Note: The app works without an API key, but styling features will use fallback templates

# Run the application
python app.py
```

Access at `http://localhost:7860`

---

## 🎯 Usage

### Basic Usage

```python
from src.models import get_model_manager, get_style_model
from src.utils import get_image_processor
from PIL import Image

# Initialize components (singleton pattern)
model_manager = get_model_manager()
style_model = get_style_model()
image_processor = get_image_processor()

# Load models (BLIP and GIT)
blip_success, git_success = model_manager.load_all_models()

# Load and preprocess image
image = Image.open("your_image.jpg")
processed_img, metadata = image_processor.preprocess_image(image)

# Generate captions from both models
captions = model_manager.generate_captions(processed_img)
blip_caption = captions["blip"]
git_caption = captions["git"]

# Apply style (optional)
styled_blip = style_model.style_caption(blip_caption, style="Professional")
styled_git = style_model.style_caption(git_caption, style="Creative")
```

### Available Models

Both models run simultaneously so their captions can be compared:

- **BLIP-base**: Fast inference (~1-2s), good quality, efficient
- **GIT-large**: Slower (~3-4s), superior caption quality, more detailed

### Caption Styles

| Style | Use Case | Example |
|-------|----------|---------|
| **None** | Raw model output | "A dog sitting on grass" |
| **Creative** | Artistic, imaginative | "A joyful golden retriever basking in nature's embrace" |
| **Social Media** | Engaging, hashtag-ready | "Meet this good boy enjoying sunny vibes! 🐕☀️ #DogLife" |
| **Professional** | Business, formal | "Canine subject positioned in outdoor environment" |
| **Technical** | Detailed, analytical | "Golden retriever breed, seated posture, natural lighting, outdoor setting" |

---

## 🐳 Docker Deployment

```bash
# Build image
docker build -t caption-generator .

# Run container (with GPU)
docker run --gpus all -p 7860:7860 caption-generator

# Run container (CPU only)
docker run -p 7860:7860 -e DEVICE=cpu caption-generator
```

---

## ⚙️ Configuration

### Environment Variables

Create a `.env` file in the project root (optional):

```bash
# Groq API key (needed for LLM-based styling; fallback templates are used without it)
GROQ_API_KEY=your_groq_api_key_here

# Hardware configuration (optional, defaults to 'cuda' if available)
DEVICE=cuda  # or 'cpu'

# Logging level (optional)
LOG_LEVEL=INFO  # DEBUG, INFO, WARNING, ERROR
```

---

## 🎓 Why This Project?

Built as a learning project to explore:

- **GenAI Fundamentals**: Vision-language models, prompt engineering
- **Practical ML Skills**: GPU optimization, model deployment, API integration
- **Cost Optimization**: Demonstrating production-quality AI without expensive APIs
- **Software Architecture**: Caching, analytics, error handling, thread safety

Perfect for understanding how modern image captioning works under the hood while keeping infrastructure costs at zero.

---

## 🤝 Contributing

Contributions welcome!
Feel free to:

- Report bugs
- Suggest features
- Submit pull requests
- Improve documentation
- Add new caption styles
- Optimize performance

---

## 📝 License

This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.

---

## 🙏 Acknowledgments

- [Salesforce BLIP](https://github.com/salesforce/BLIP) - Image captioning model
- [Microsoft GIT](https://github.com/microsoft/GenerativeImage2Text) - High-quality captions
- [Groq](https://groq.com) - Free LLM inference API
- [Hugging Face](https://huggingface.co) - Model hosting & deployment

---

## 📬 Contact

**Chinmay M** - [@ChinmayM06](https://github.com/ChinmayM06)

Project Link: [https://github.com/ChinmayM06/ai-image-caption-generator](https://github.com/ChinmayM06/ai-image-caption-generator)

---
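
The environment variables described under Configuration could be resolved at startup roughly as follows. This is a minimal sketch, not the project's actual code: `AppConfig`, `resolve_config`, the fallback to CPU when CUDA is requested but unavailable, and the default of `INFO` for unknown log levels are all assumptions made for illustration. Only the variable names `GROQ_API_KEY`, `DEVICE`, and `LOG_LEVEL` come from this README.

```python
import os
from typing import Mapping, NamedTuple, Optional

VALID_LOG_LEVELS = {"DEBUG", "INFO", "WARNING", "ERROR"}


class AppConfig(NamedTuple):
    """Hypothetical container for settings read from the environment."""

    groq_api_key: Optional[str]
    device: str
    log_level: str

    @property
    def llm_styling_enabled(self) -> bool:
        # Without a key, styling is expected to use fallback templates.
        return self.groq_api_key is not None


def resolve_config(env: Mapping[str, str], cuda_available: bool) -> AppConfig:
    """Resolve settings from an environment mapping such as os.environ."""
    # Treat an empty or whitespace-only key as "not set".
    api_key = env.get("GROQ_API_KEY", "").strip() or None

    # DEVICE defaults to 'cuda' when a GPU is available, otherwise 'cpu'.
    device = env.get("DEVICE", "").strip().lower()
    if device not in {"cuda", "cpu"}:
        device = "cuda" if cuda_available else "cpu"
    # Assumption: fall back to CPU if CUDA is requested but unavailable.
    if device == "cuda" and not cuda_available:
        device = "cpu"

    # Assumption: missing or unknown levels fall back to INFO.
    log_level = env.get("LOG_LEVEL", "INFO").strip().upper()
    if log_level not in VALID_LOG_LEVELS:
        log_level = "INFO"

    return AppConfig(api_key, device, log_level)


if __name__ == "__main__":
    # In a real app, cuda_available would likely come from
    # torch.cuda.is_available(); it is hard-coded here to avoid the import.
    print(resolve_config(os.environ, cuda_available=False))
```

Passing the environment in as a mapping, rather than reading `os.environ` inside the function, keeps the logic easy to test without touching real environment variables or requiring a GPU.

---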
**[⭐ Star this repo](https://github.com/ChinmayM06/ai-image-caption-generator)** if you find it helpful!

Made with ❤️ and lots of ☕