joycaption-reliable / README.md

nickdigger

Fix app_file reference to app.py

a53c16e verified 3 months ago

metadata

title: JoyCaption Reliable
emoji: 🔍
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 4.44.0
app_file: app.py
pinned: false
license: apache-2.0

🔍 JoyCaption Reliable

Ultra-optimized JoyCaption for ZeroGPU - No more stuck generations!

This is a streamlined version of JoyCaption designed specifically for reliable performance on Hugging Face's ZeroGPU infrastructure. It prioritizes consistency and speed over advanced features.

✅ Key Optimizations

45-second GPU limit - Prevents ZeroGPU timeouts
Aggressive memory cleanup - Immediate model deletion after each generation
Fast loading - Optimized with low_cpu_mem_usage=True
Progress tracking - Timestamps show which stage each request has reached
Emergency cleanup - Graceful error handling with memory clearing
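
The GPU limit and the cleanup steps listed above can be combined in a single request handler. The following is a minimal sketch under stated assumptions, not the Space's actual app.py: `make_caption_handler`, `load_model`, and `run_model` are hypothetical names, and the fallback decorator exists only so the snippet also runs where the `spaces` package is not installed.

```python
import gc

try:
    import spaces  # provided on Hugging Face ZeroGPU Spaces
    gpu_limited = spaces.GPU(duration=45)  # request the GPU for at most 45 s
except ImportError:
    def gpu_limited(fn):
        # Assumption: outside a Space, run the function unchanged.
        return fn


def free_memory():
    """Run the garbage collector and, if CUDA is available, clear its cache."""
    gc.collect()
    try:
        import torch
        if torch.cuda.is_available():
            torch.cuda.empty_cache()
    except ImportError:
        pass  # torch not installed; nothing GPU-side to clear


def make_caption_handler(load_model, run_model):
    """Build a handler from two hypothetical callables:
    load_model() -> model and run_model(model, image) -> caption."""

    @gpu_limited
    def handler(image):
        model = None
        try:
            model = load_model()
            return run_model(model, image)
        except Exception as exc:
            # Emergency path: report the failure instead of hanging.
            return f"Error: {exc}"
        finally:
            del model      # drop the only reference to the model
            free_memory()  # runs on success and on failure

    return handler
```

Because the cleanup sits in the `finally` clause, it runs whether generation succeeds or raises, which is the behavior the "Emergency cleanup" item describes.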

🎯 Features

Multiple Styles: Engaging, Descriptive, SEO-Friendly, Creative
Length Control: Short (100 tokens), Medium (200 tokens), Long (300 tokens)
Fast Processing: Typically completes in 15-30 seconds
No Freezing: Designed to avoid the common ZeroGPU stuck generation issue
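
The length setting above maps directly onto a token budget. A minimal sketch of that lookup follows; the names `MAX_NEW_TOKENS` and `token_budget` are illustrative assumptions, and only the three values come from this README.

```python
# Token budgets taken from the Length Control settings above.
MAX_NEW_TOKENS = {"Short": 100, "Medium": 200, "Long": 300}


def token_budget(length: str) -> int:
    """Return the generation token budget for a length setting."""
    try:
        return MAX_NEW_TOKENS[length]
    except KeyError:
        valid = ", ".join(MAX_NEW_TOKENS)
        raise ValueError(f"Unknown length {length!r}; expected one of: {valid}") from None
```

In a Transformers-based app, the returned value would typically be passed as `max_new_tokens` to `model.generate`.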

🚀 Performance

Loading: 5-10 seconds
Generation: 10-20 seconds
Total Time: 15-30 seconds
Memory Usage: Aggressively cleaned after each request
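
Stage timings like the ones above can be captured with timestamped progress messages. This is a small sketch of one way to do it; `ProgressLog` is a hypothetical helper, not a class from the Space's code.

```python
import time


class ProgressLog:
    """Collect progress messages prefixed with seconds since the log was created."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock      # injectable so the timing can be tested
        self._start = clock()
        self.lines = []

    def mark(self, message: str) -> str:
        elapsed = self._clock() - self._start
        line = f"[{elapsed:5.1f}s] {message}"
        self.lines.append(line)
        print(line)
        return line
```

Calling `log.mark("model loaded")` and `log.mark("generation finished")` at the matching points in a request shows how the loading and generation phases split the total time.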

💡 Why This Version is More Reliable

Unlike complex dual-model setups that can time out or freeze, this version:

Uses only the JoyCaption model (no secondary Venice model)
Limits GPU duration to prevent ZeroGPU timeouts
Performs immediate cleanup to prevent memory issues
Has simplified prompts for faster processing
Includes progress timestamps to track performance

🔧 Technical Details

Model: fancyfeast/llama-joycaption-beta-one-hf-llava
Framework: Transformers + PyTorch
Optimization: torch.bfloat16, device_map="auto"
GPU Duration: 45 seconds maximum
Token Limits: 100-300 based on length setting
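
The settings above correspond to a standard Transformers loading call. The sketch below assumes the checkpoint loads with `LlavaForConditionalGeneration` and `AutoProcessor`; the function name `load_joycaption` and the constants are illustrative, and the real app.py may be organized differently.

```python
MODEL_ID = "fancyfeast/llama-joycaption-beta-one-hf-llava"
DTYPE_NAME = "bfloat16"  # resolved to torch.bfloat16 at load time
LOAD_OPTIONS = {"device_map": "auto", "low_cpu_mem_usage": True}


def load_joycaption():
    """Return (processor, model).

    Imports are deferred so this module can be imported without
    torch or transformers installed.
    """
    import torch
    from transformers import AutoProcessor, LlavaForConditionalGeneration

    processor = AutoProcessor.from_pretrained(MODEL_ID)
    model = LlavaForConditionalGeneration.from_pretrained(
        MODEL_ID,
        torch_dtype=getattr(torch, DTYPE_NAME),
        **LOAD_OPTIONS,
    )
    model.eval()  # inference only
    return processor, model
```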

📊 Trade-offs

Gained:

✅ Consistent, reliable performance
✅ Fast loading and generation
✅ No stuck generations or timeouts
✅ Predictable timing

Sacrificed:

❌ No secondary Venice model integration
❌ No advanced keyword injection
❌ No complex correction systems
❌ Reduced maximum output length

This version is perfect if you want reliable, fast captions without the complexity and potential issues of multi-model systems.

🎨 Caption Styles

Engaging: Creative, captivating descriptions that avoid "A photo of"
Descriptive: Focused on people, poses, clothing, and setting details
SEO-Friendly: Optimized for search with engaging language
Creative: Witty captions with interesting, unique language
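
Each style above can be expressed as a short instruction sent to the model. The prompt wording below is an assumption derived from the style descriptions, not the text used by the Space; `STYLE_INSTRUCTIONS` and `build_prompt` are hypothetical names.

```python
# Hypothetical instruction text, paraphrased from the style descriptions above.
STYLE_INSTRUCTIONS = {
    "Engaging": 'Write a creative, captivating caption. Do not start with "A photo of".',
    "Descriptive": "Describe the people, poses, clothing, and setting in detail.",
    "SEO-Friendly": "Write a search-optimized caption in engaging language.",
    "Creative": "Write a witty caption with interesting, unique language.",
}


def build_prompt(style: str) -> str:
    """Return the instruction for a caption style, or raise on an unknown style."""
    if style not in STYLE_INSTRUCTIONS:
        valid = ", ".join(STYLE_INSTRUCTIONS)
        raise ValueError(f"Unknown style {style!r}; expected one of: {valid}")
    return STYLE_INSTRUCTIONS[style]
```

Keeping each instruction to one or two sentences is in line with the simplified prompts this version relies on.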

Perfect for content creators, social media managers, and anyone who needs consistent, quality image captions without waiting or worrying about system freezes!