Spaces:
Runtime error
Runtime error
| title: JoyCaption Reliable | |
| emoji: π | |
| colorFrom: blue | |
| colorTo: purple | |
| sdk: gradio | |
| sdk_version: 4.44.0 | |
| app_file: app.py | |
| pinned: false | |
| license: apache-2.0 | |
| # π JoyCaption Reliable | |
| **Ultra-optimized JoyCaption for ZeroGPU - No more stuck generations!** | |
| This is a streamlined version of JoyCaption designed specifically for reliable performance on Hugging Face's ZeroGPU infrastructure. It prioritizes **consistency and speed** over advanced features. | |
| ## β Key Optimizations | |
| - **45-second GPU limit** - Prevents ZeroGPU timeouts | |
| - **Aggressive memory cleanup** - Immediate model deletion after each generation | |
| - **Fast loading** - Optimized with `low_cpu_mem_usage=True` | |
| - **Progress tracking** - Timestamps show exactly where processing is at | |
| - **Emergency cleanup** - Graceful error handling with memory clearing | |
| ## π― Features | |
| - **Multiple Styles**: Engaging, Descriptive, SEO-Friendly, Creative | |
| - **Length Control**: Short (100 tokens), Medium (200 tokens), Long (300 tokens) | |
| - **Fast Processing**: Typically completes in 15-25 seconds | |
| - **No Freezing**: Designed to avoid the common ZeroGPU stuck generation issue | |
| ## π Performance | |
| - **Loading**: 5-10 seconds | |
| - **Generation**: 10-20 seconds | |
| - **Total Time**: 15-30 seconds | |
| - **Memory Usage**: Aggressively cleaned after each request | |
| ## π‘ Why This Version is More Reliable | |
| Unlike complex dual-model setups that can timeout or freeze, this version: | |
| 1. Uses only the JoyCaption model (no secondary Venice model) | |
| 2. Limits GPU duration to prevent ZeroGPU timeouts | |
| 3. Performs immediate cleanup to prevent memory issues | |
| 4. Has simplified prompts for faster processing | |
| 5. Includes progress timestamps to track performance | |
| ## π§ Technical Details | |
| - **Model**: `fancyfeast/llama-joycaption-beta-one-hf-llava` | |
| - **Framework**: Transformers + PyTorch | |
| - **Optimization**: `torch.bfloat16`, `device_map="auto"` | |
| - **GPU Duration**: 45 seconds maximum | |
| - **Token Limits**: 100-300 based on length setting | |
| ## π Trade-offs | |
| **Gained**: | |
| - β Consistent, reliable performance | |
| - β Fast loading and generation | |
| - β No stuck generations or timeouts | |
| - β Predictable timing | |
| **Sacrificed**: | |
| - β No secondary Venice model integration | |
| - β No advanced keyword injection | |
| - β No complex correction systems | |
| - β Reduced maximum output length | |
| This version is perfect if you want **reliable, fast captions** without the complexity and potential issues of multi-model systems. | |
| ## π¨ Caption Styles | |
| - **Engaging**: Creative, captivating descriptions that avoid "A photo of" | |
| - **Descriptive**: Focused on people, poses, clothing, and setting details | |
| - **SEO-Friendly**: Optimized for search with engaging language | |
| - **Creative**: Witty captions with interesting, unique language | |
| Perfect for content creators, social media managers, and anyone who needs consistent, quality image captions without waiting or worrying about system freezes! |