Spaces:

nickdigger
/

joycaption-reliable

Runtime error

App Files Files Community

joycaption-reliable / README.md

nickdigger

Fix app_file reference to app.py

a53c16e verified 3 months ago

preview code

raw

history blame contribute delete

3 kB

	---
	title: JoyCaption Reliable
	emoji: 🔍
	colorFrom: blue
	colorTo: purple
	sdk: gradio
	sdk_version: 4.44.0
	app_file: app.py
	pinned: false
	license: apache-2.0
	---

	# 🔍 JoyCaption Reliable

	Ultra-optimized JoyCaption for ZeroGPU - No more stuck generations!

	This is a streamlined version of JoyCaption designed specifically for reliable performance on Hugging Face's ZeroGPU infrastructure. It prioritizes consistency and speed over advanced features.

	## ✅ Key Optimizations

	- 45-second GPU limit - Prevents ZeroGPU timeouts
	- Aggressive memory cleanup - Immediate model deletion after each generation
	- Fast loading - Optimized with `low_cpu_mem_usage=True`
	- Progress tracking - Timestamps show exactly where processing is at
	- Emergency cleanup - Graceful error handling with memory clearing

	## 🎯 Features

	- Multiple Styles: Engaging, Descriptive, SEO-Friendly, Creative
	- Length Control: Short (100 tokens), Medium (200 tokens), Long (300 tokens)
	- Fast Processing: Typically completes in 15-25 seconds
	- No Freezing: Designed to avoid the common ZeroGPU stuck generation issue

	## 🚀 Performance

	- Loading: 5-10 seconds
	- Generation: 10-20 seconds
	- Total Time: 15-30 seconds
	- Memory Usage: Aggressively cleaned after each request

	## 💡 Why This Version is More Reliable

	Unlike complex dual-model setups that can timeout or freeze, this version:

	1. Uses only the JoyCaption model (no secondary Venice model)
	2. Limits GPU duration to prevent ZeroGPU timeouts
	3. Performs immediate cleanup to prevent memory issues
	4. Has simplified prompts for faster processing
	5. Includes progress timestamps to track performance

	## 🔧 Technical Details

	- Model: `fancyfeast/llama-joycaption-beta-one-hf-llava`
	- Framework: Transformers + PyTorch
	- Optimization: `torch.bfloat16`, `device_map="auto"`
	- GPU Duration: 45 seconds maximum
	- Token Limits: 100-300 based on length setting

	## 📊 Trade-offs

	Gained:
	- ✅ Consistent, reliable performance
	- ✅ Fast loading and generation
	- ✅ No stuck generations or timeouts
	- ✅ Predictable timing

	Sacrificed:
	- ❌ No secondary Venice model integration
	- ❌ No advanced keyword injection
	- ❌ No complex correction systems
	- ❌ Reduced maximum output length

	This version is perfect if you want reliable, fast captions without the complexity and potential issues of multi-model systems.

	## 🎨 Caption Styles

	- Engaging: Creative, captivating descriptions that avoid "A photo of"
	- Descriptive: Focused on people, poses, clothing, and setting details
	- SEO-Friendly: Optimized for search with engaging language
	- Creative: Witty captions with interesting, unique language

	Perfect for content creators, social media managers, and anyone who needs consistent, quality image captions without waiting or worrying about system freezes!