---
title: JoyCaption Reliable
emoji: 🔍
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 4.44.0
app_file: app.py
pinned: false
license: apache-2.0
---
# 🔍 JoyCaption Reliable
**Ultra-optimized JoyCaption for ZeroGPU - No more stuck generations!**
This is a streamlined version of JoyCaption designed specifically for reliable performance on Hugging Face's ZeroGPU infrastructure. It prioritizes **consistency and speed** over advanced features.
## ✅ Key Optimizations
- **45-second GPU limit** - Prevents ZeroGPU timeouts
- **Aggressive memory cleanup** - Immediate model deletion after each generation
- **Fast loading** - Optimized with `low_cpu_mem_usage=True`
- **Progress tracking** - Timestamps show which stage processing has reached
- **Emergency cleanup** - Graceful error handling with memory clearing
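
The timestamp and cleanup behaviour listed above could look roughly like the sketch below. It is an illustration under stated assumptions, not the Space's actual `app.py`: the helper names `log_step`, `free_memory`, and `run_with_cleanup` are hypothetical, and the `torch` import is optional so the snippet also runs on a machine without PyTorch or a GPU.

```python
import gc
import time


def log_step(message, start):
    """Return a progress line prefixed with the seconds elapsed since `start`."""
    elapsed = time.perf_counter() - start
    return f"[{elapsed:5.1f}s] {message}"


def free_memory():
    """Run garbage collection and, if PyTorch with CUDA is present, empty its cache."""
    gc.collect()
    try:
        import torch  # optional: only present on a machine with PyTorch installed
    except ImportError:
        return
    if torch.cuda.is_available():
        torch.cuda.empty_cache()


def run_with_cleanup(load_model, generate):
    """Load, generate, and always release the model, even if a step fails.

    `load_model` and `generate` are caller-supplied callables (hypothetical names).
    Returns `(result, log)`; `result` is None when an error occurred.
    """
    start = time.perf_counter()
    log = [log_step("loading model", start)]
    model = None
    try:
        model = load_model()
        log.append(log_step("generating", start))
        result = generate(model)
        log.append(log_step("done", start))
        return result, log
    except Exception as exc:
        # "Emergency cleanup": record the failure; the finally block frees memory.
        log.append(log_step(f"error: {exc}", start))
        return None, log
    finally:
        del model  # drop the local reference so the model can be collected
        free_memory()
```

Putting the cleanup in `finally` means memory is released on both the success path and the error path, which is the property the list above describes.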
## 🎯 Features
- **Multiple Styles**: Engaging, Descriptive, SEO-Friendly, Creative
- **Length Control**: Short (100 tokens), Medium (200 tokens), Long (300 tokens)
- **Fast Processing**: Typically completes in 15-25 seconds
- **No Freezing**: Designed to avoid the common ZeroGPU stuck-generation issue
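
As a small sketch of how the length setting might translate into a generation limit, the mapping below uses the token counts listed above. The names `MAX_NEW_TOKENS` and `token_budget` are hypothetical and not taken from the app's source.

```python
# Token budgets taken from the Length Control setting above.
MAX_NEW_TOKENS = {"Short": 100, "Medium": 200, "Long": 300}


def token_budget(length):
    """Map a length label to a token limit (hypothetical helper)."""
    if length not in MAX_NEW_TOKENS:
        raise ValueError(
            f"unknown length {length!r}; expected one of {sorted(MAX_NEW_TOKENS)}"
        )
    return MAX_NEW_TOKENS[length]
```

The returned value would typically be passed as `max_new_tokens` to the model's `generate` call.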
## 🚀 Performance
- **Loading**: 5-10 seconds
- **Generation**: 10-20 seconds
- **Total Time**: 15-30 seconds
- **Memory Usage**: Aggressively cleaned after each request
## 💡 Why This Version is More Reliable
Unlike complex dual-model setups that can time out or freeze, this version:
1. Uses only the JoyCaption model (no secondary Venice model)
2. Limits GPU duration to prevent ZeroGPU timeouts
3. Performs immediate cleanup to prevent memory issues
4. Has simplified prompts for faster processing
5. Includes progress timestamps to track performance
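
Point 2 is usually expressed on ZeroGPU with the `GPU` decorator from the `spaces` package, which accepts a `duration` in seconds. The sketch below assumes that approach; the function name `caption_image` and its placeholder body are hypothetical, and the fallback decorator exists only so the snippet runs outside a Space.

```python
try:
    import spaces  # available on Hugging Face Spaces

    # Request the GPU for at most 45 seconds per call.
    gpu = spaces.GPU(duration=45)
except ImportError:

    def gpu(func):
        """Fallback for local runs: leave the function unchanged."""
        return func


@gpu
def caption_image(image, style="Descriptive", length="Medium"):
    """Placeholder body (hypothetical); a real app would load the model and generate here."""
    return f"{style} caption, {length} length"
```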
## 🔧 Technical Details
- **Model**: `fancyfeast/llama-joycaption-beta-one-hf-llava`
- **Framework**: Transformers + PyTorch
- **Optimization**: `torch.bfloat16`, `device_map="auto"`
- **GPU Duration**: 45 seconds maximum
- **Token Limits**: 100-300 based on length setting
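
A loading routine consistent with these settings might look like the sketch below. It assumes the checkpoint loads through `LlavaForConditionalGeneration` and `AutoProcessor` from Transformers, which the `-hf-llava` suffix suggests but this README does not state; the function name `load_model_and_processor` is hypothetical.

```python
MODEL_NAME = "fancyfeast/llama-joycaption-beta-one-hf-llava"


def load_model_and_processor():
    """Load the model with the settings listed above (assumed model class)."""
    # Heavy imports stay inside the function so this module imports without PyTorch.
    import torch
    from transformers import AutoProcessor, LlavaForConditionalGeneration

    processor = AutoProcessor.from_pretrained(MODEL_NAME)
    model = LlavaForConditionalGeneration.from_pretrained(
        MODEL_NAME,
        torch_dtype=torch.bfloat16,  # half-precision weights
        device_map="auto",           # let Accelerate place the weights
        low_cpu_mem_usage=True,      # avoid a second full copy in CPU RAM
    )
    model.eval()
    return model, processor
```

Calling this inside the GPU-decorated function, rather than at import time, keeps the download and placement work within the same request that later frees the model.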
## 📊 Trade-offs
**Gained**:
- ✅ Consistent, reliable performance
- ✅ Fast loading and generation
- ✅ No stuck generations or timeouts
- ✅ Predictable timing
**Sacrificed**:
- ❌ No secondary Venice model integration
- ❌ No advanced keyword injection
- ❌ No complex correction systems
- ❌ Reduced maximum output length
This version is perfect if you want **reliable, fast captions** without the complexity and potential issues of multi-model systems.
## 🎨 Caption Styles
- **Engaging**: Creative, captivating descriptions that avoid "A photo of"
- **Descriptive**: Focused on people, poses, clothing, and setting details
- **SEO-Friendly**: Optimized for search with engaging language
- **Creative**: Witty captions with interesting, unique language
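
One simple way to implement the styles above is a lookup from style name to instruction text. The prompt wording below is invented for illustration and is not the app's actual prompt text; `STYLE_PROMPTS` and `build_prompt` are hypothetical names.

```python
# Hypothetical prompt wording, paraphrasing the style descriptions above.
STYLE_PROMPTS = {
    "Engaging": "Write an engaging, captivating caption for this image. "
                "Do not start with 'A photo of'.",
    "Descriptive": "Describe the people, poses, clothing, and setting in this image.",
    "SEO-Friendly": "Write a search-friendly caption for this image using engaging language.",
    "Creative": "Write a witty caption for this image using interesting, unique language.",
}


def build_prompt(style):
    """Return the instruction text for a style, rejecting unknown names."""
    if style not in STYLE_PROMPTS:
        raise ValueError(f"unknown style {style!r}; expected one of {sorted(STYLE_PROMPTS)}")
    return STYLE_PROMPTS[style]
```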
Perfect for content creators, social media managers, and anyone who needs consistent, quality image captions without waiting or worrying about system freezes!