metadata
title: JoyCaption Reliable
emoji: 🔍
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 4.44.0
app_file: app.py
pinned: false
license: apache-2.0

πŸ” JoyCaption Reliable

Ultra-optimized JoyCaption for ZeroGPU - No more stuck generations!

This is a streamlined version of JoyCaption designed specifically for reliable performance on Hugging Face's ZeroGPU infrastructure. It prioritizes consistency and speed over advanced features.

✅ Key Optimizations

  • 45-second GPU limit - Prevents ZeroGPU timeouts
  • Aggressive memory cleanup - Immediate model deletion after each generation
  • Fast loading - Optimized with low_cpu_mem_usage=True
  • Progress tracking - Timestamps show which stage processing has reached
  • Emergency cleanup - Graceful error handling with memory clearing
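
The GPU limit and cleanup behaviour listed above might look roughly like the following. This is a minimal sketch, not the Space's actual app.py: `gpu_limit`, `free_memory`, `run_with_cleanup`, `load_model`, and `generate` are hypothetical names, and applying the limit through the `spaces.GPU(duration=45)` decorator is an assumption. The fallbacks let the snippet run on a machine without `spaces` or `torch` installed.

```python
import gc

try:
    import spaces  # Hugging Face ZeroGPU helper; normally only present on Spaces
    gpu_limit = spaces.GPU(duration=45)  # assumed: request a 45-second GPU window
except ImportError:
    def gpu_limit(fn):
        """No-op stand-in so the sketch also runs outside a Space."""
        return fn


def free_memory():
    """Collect garbage and, if PyTorch with CUDA is present, empty its cache."""
    gc.collect()
    try:
        import torch
    except ImportError:
        return
    if torch.cuda.is_available():
        torch.cuda.empty_cache()


@gpu_limit
def run_with_cleanup(load_model, generate, image):
    """Load, generate, then always drop the model, even if generation fails."""
    model = None
    try:
        model = load_model()
        return generate(model, image)
    finally:
        del model      # drop our reference as soon as the request is done
        free_memory()  # errors take this path too (the "emergency cleanup")
```

Putting the cleanup in `finally` means the success path and the error path release memory the same way, so a failed request should not leave a model behind for the next one.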

🎯 Features

  • Multiple Styles: Engaging, Descriptive, SEO-Friendly, Creative
  • Length Control: Short (100 tokens), Medium (200 tokens), Long (300 tokens)
  • Fast Processing: Typically completes in 15-30 seconds
  • No Freezing: Designed to avoid the common ZeroGPU stuck generation issue
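
As an illustration of the length control above, the three labels can be mapped to token caps with a small lookup. The token values come from the list; the names `LENGTH_TO_MAX_NEW_TOKENS` and `max_new_tokens_for` are hypothetical and not taken from the Space's code.

```python
# Token budgets from the "Length Control" bullet above.
LENGTH_TO_MAX_NEW_TOKENS = {"Short": 100, "Medium": 200, "Long": 300}


def max_new_tokens_for(length: str) -> int:
    """Return the generation cap for a length label; reject unknown labels."""
    try:
        return LENGTH_TO_MAX_NEW_TOKENS[length]
    except KeyError:
        raise ValueError(
            f"Unknown length {length!r}; expected one of "
            f"{sorted(LENGTH_TO_MAX_NEW_TOKENS)}"
        ) from None
```

In a Transformers pipeline the returned value would typically be passed as `max_new_tokens` to `model.generate(...)`, which caps the output length and therefore the generation time.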

🚀 Performance

  • Loading: 5-10 seconds
  • Generation: 10-20 seconds
  • Total Time: 15-30 seconds
  • Memory Usage: Aggressively cleaned after each request

💡 Why This Version is More Reliable

Unlike complex dual-model setups that can time out or freeze, this version:

  1. Uses only the JoyCaption model (no secondary Venice model)
  2. Limits GPU duration to prevent ZeroGPU timeouts
  3. Performs immediate cleanup to prevent memory issues
  4. Has simplified prompts for faster processing
  5. Includes progress timestamps to track performance
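
The progress timestamps in point 5 could be produced by a small helper like the one below. It is a sketch under assumptions: `ProgressLog` and its methods are hypothetical names, and comparing elapsed time with the 45-second limit is one possible way to see how much of the GPU window is left.

```python
import time


class ProgressLog:
    """Collects stage messages stamped with seconds since the request began."""

    def __init__(self, gpu_limit_s: float = 45.0, clock=time.monotonic):
        self._clock = clock          # injectable so the class is easy to test
        self._start = clock()
        self.gpu_limit_s = gpu_limit_s
        self.lines = []

    def elapsed(self) -> float:
        """Seconds since this log was created."""
        return self._clock() - self._start

    def mark(self, message: str) -> str:
        """Record a stage message prefixed with the elapsed time."""
        line = f"[{self.elapsed():5.1f}s] {message}"
        self.lines.append(line)
        return line

    def remaining(self) -> float:
        """Seconds left before the assumed GPU limit, never below zero."""
        return max(0.0, self.gpu_limit_s - self.elapsed())
```

Calling `log.mark("Model loaded")` after loading and `log.mark("Generation done")` after generation gives a per-request timeline that can be shown in the UI or written to the Space logs.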

🔧 Technical Details

  • Model: fancyfeast/llama-joycaption-beta-one-hf-llava
  • Framework: Transformers + PyTorch
  • Optimization: torch.bfloat16, device_map="auto"
  • GPU Duration: 45 seconds maximum
  • Token Limits: 100-300 based on length setting
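
The settings above translate into a Transformers loading call along these lines. This is a sketch, not the Space's code: the function name `load_joycaption` is hypothetical, and the choice of `AutoProcessor` and `LlavaForConditionalGeneration` is an assumption based on the `-hf-llava` suffix of the model name. It also assumes `torch`, `transformers`, and `accelerate` are installed.

```python
MODEL_ID = "fancyfeast/llama-joycaption-beta-one-hf-llava"


def load_joycaption(model_id: str = MODEL_ID):
    """Load processor and model with the options listed under Technical Details."""
    # Imported lazily so the heavy dependencies are only needed when loading.
    import torch
    from transformers import AutoProcessor, LlavaForConditionalGeneration

    processor = AutoProcessor.from_pretrained(model_id)
    model = LlavaForConditionalGeneration.from_pretrained(
        model_id,
        torch_dtype=torch.bfloat16,  # half-precision weights
        device_map="auto",           # automatic device placement (uses accelerate)
        low_cpu_mem_usage=True,      # avoid an extra full copy in CPU RAM
    )
    model.eval()  # inference only
    return processor, model
```

The first call downloads the model weights, so it is best run on the Space itself or on a machine with enough disk space and memory.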

📊 Trade-offs

Gained:

  • ✅ Consistent, reliable performance
  • ✅ Fast loading and generation
  • ✅ No stuck generations or timeouts
  • ✅ Predictable timing

Sacrificed:

  • ❌ No secondary Venice model integration
  • ❌ No advanced keyword injection
  • ❌ No complex correction systems
  • ❌ Reduced maximum output length

This version is perfect if you want reliable, fast captions without the complexity and potential issues of multi-model systems.

🎨 Caption Styles

  • Engaging: Creative, captivating descriptions that avoid "A photo of"
  • Descriptive: Focused on people, poses, clothing, and setting details
  • SEO-Friendly: Optimized for search with engaging language
  • Creative: Witty captions with interesting, unique language
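
One way to wire the four styles into the simplified prompts is a plain lookup table. Only the style names and their descriptions come from the list above; the prompt wording and the names `STYLE_PROMPTS` and `build_prompt` are hypothetical and will differ from what the Space actually sends to the model.

```python
# Hypothetical prompt wording, paraphrased from the style descriptions above.
STYLE_PROMPTS = {
    "Engaging": (
        "Write an engaging, captivating caption for this image. "
        "Do not start with 'A photo of'."
    ),
    "Descriptive": (
        "Describe this image, focusing on the people, poses, "
        "clothing, and setting."
    ),
    "SEO-Friendly": (
        "Write a search-friendly caption for this image using engaging language."
    ),
    "Creative": (
        "Write a witty caption for this image using interesting, unique language."
    ),
}


def build_prompt(style: str) -> str:
    """Return the prompt for a style name; reject names that are not listed."""
    if style not in STYLE_PROMPTS:
        raise ValueError(
            f"Unknown style {style!r}; expected one of {sorted(STYLE_PROMPTS)}"
        )
    return STYLE_PROMPTS[style]
```

Keeping each prompt to one or two sentences matches the "simplified prompts" goal: shorter instructions mean fewer input tokens per request.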

Perfect for content creators, social media managers, and anyone who needs consistent, quality image captions without waiting or worrying about system freezes!