videoAI / README_LOCAL.md

sravya

🎬 Local AI Video Generator

Generate AI videos entirely on your own computer using the CogVideoX-2B model!

🌟 Features

✅ 100% Local - No API keys, no cloud services, runs on your computer
🚀 CogVideoX-2B - State-of-the-art text-to-video model by Tsinghua University
🎥 6-second videos - Generate 49 frames at 8 fps (720×480 resolution)
💻 GPU or CPU - Works on both (GPU recommended for speed)
🎨 Simple UI - Clean web interface for easy video generation

📋 Requirements

Hardware Requirements

Minimum (CPU):

16GB RAM
10GB free disk space
Generation time: 5-10 minutes per video

Recommended (GPU):

NVIDIA GPU with 8GB+ VRAM (RTX 3060 or better)
16GB RAM
10GB free disk space
Generation time: 30-120 seconds per video
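
The disk-space requirement can be checked before the first run with a short standard-library sketch like the one below. The 10 GB threshold comes from the lists above; the function name `has_free_space` and the choice to inspect the current directory are illustrative assumptions (the model cache may live on a different drive).

```python
import shutil

REQUIRED_GB = 10  # free disk space listed in the requirements above


def has_free_space(path=".", required_gb=REQUIRED_GB):
    """Return (ok, free_gb) for the filesystem that contains `path`."""
    free_gb = shutil.disk_usage(path).free / 1024**3
    return free_gb >= required_gb, free_gb


ok, free_gb = has_free_space()
status = "OK" if ok else f"below the {REQUIRED_GB} GB requirement"
print(f"Free space: {free_gb:.1f} GB ({status})")
```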

Software Requirements

Python 3.9 or higher
CUDA 11.8+ (for GPU acceleration)
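
To confirm these software requirements, a small check along the following lines may help. It is a sketch rather than part of the project: `check_environment` is a hypothetical helper, and the PyTorch import is deferred so the script also runs before PyTorch is installed.

```python
import importlib.util
import sys


def check_environment(min_python=(3, 9)):
    """Collect basic facts about the interpreter and, if installed, PyTorch."""
    report = {
        "python_ok": sys.version_info >= min_python,
        "torch_installed": importlib.util.find_spec("torch") is not None,
        "cuda_available": False,
    }
    if report["torch_installed"]:
        import torch  # imported lazily so the check works before installation

        report["cuda_available"] = bool(torch.cuda.is_available())
    return report


print(check_environment())
```

If `cuda_available` is False on a machine with an NVIDIA GPU, the CPU-only PyTorch build may have been installed instead of the CUDA one.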

🚀 Quick Start

1. Install Dependencies

# Install PyTorch with CUDA support (for GPU)
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118

# Or install PyTorch for CPU only
pip install torch torchvision torchaudio

# Install other requirements
pip install -r requirements_local.txt

2. Run the Backend

python backend_local.py

The server will start on http://localhost:5000

First Run Notes:

The model (~5GB) will be downloaded automatically
This happens only once
Subsequent runs will be much faster

3. Open the Web Interface

Open index_local.html in your browser:

# On macOS
open index_local.html

# On Linux
xdg-open index_local.html

# On Windows
start index_local.html

Alternatively, open http://localhost:5000 in your browser and navigate to index_local.html
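
The three OS-specific commands above can also be replaced by one cross-platform Python call. The sketch below uses the standard `webbrowser` module; `page_uri` and `open_ui` are illustrative names, and the code assumes it is run from the directory that contains index_local.html.

```python
import webbrowser
from pathlib import Path


def page_uri(filename="index_local.html"):
    """Build a file:// URI for the page, resolved against the current directory."""
    return Path(filename).resolve().as_uri()


def open_ui(filename="index_local.html"):
    """Open the page in the default browser; returns False if none could be launched."""
    return webbrowser.open(page_uri(filename))


print(page_uri())  # call open_ui() to actually launch the browser
```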

4. Initialize the Model

Click the "🚀 Initialize Model" button in the UI
Wait 2-5 minutes for the model to load
Once loaded, you can start generating videos!

5. Generate Videos

Enter a descriptive prompt (e.g., "A cat playing with a ball of yarn")
Click "🎬 Generate Video"
Wait 30-120 seconds (GPU) or 5-10 minutes (CPU)
Download or share your video!

📝 Example Prompts

"A golden retriever running through a field of flowers at sunset"
"Ocean waves crashing on a beach, aerial view"
"A bird flying through clouds, slow motion"
"City street with cars at night, neon lights"
"Flowers blooming in a garden, time-lapse"

🎯 Tips for Best Results

Be Descriptive - Include details about lighting, camera angle, movement
Keep it Simple - Focus on one main subject or action
Use Cinematic Terms - "aerial view", "close-up", "slow motion", etc.
GPU Recommended - Much faster generation (30-120s vs 5-10min)
First Generation - May take longer as model initializes
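
The first three tips can be turned into a small prompt-building helper. This is only an illustration of the advice above: `build_prompt` and its parameters are hypothetical and are not part of backend_local.py.

```python
def build_prompt(subject, camera=None, lighting=None, motion=None):
    """Join one main subject with optional cinematic details into a single prompt."""
    if not subject or not subject.strip():
        raise ValueError("subject must not be empty")
    details = [d.strip() for d in (camera, lighting, motion) if d and d.strip()]
    return ", ".join([subject.strip(), *details])


print(build_prompt(
    "A golden retriever running through a field of flowers",
    camera="aerial view",
    lighting="at sunset",
    motion="slow motion",
))
```

Keeping the subject as a single required argument mirrors the "Keep it Simple" tip: one main subject, with the cinematic terms as optional extras.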

🔧 Troubleshooting

Model Not Loading

Issue: Model fails to download or load
Solution: Check internet connection, ensure 10GB free disk space

Out of Memory (GPU)

Issue: CUDA out of memory error
Solution: Close other GPU applications, or use CPU mode
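
Before falling back to CPU mode, it may be worth trying the memory-saving options that the diffusers library provides for its pipelines. The sketch below assumes you have access to the loaded pipeline object inside backend_local.py (how that file names or exposes it is not shown here); `apply_memory_savers` is a hypothetical helper, and `enable_model_cpu_offload` requires the `accelerate` package.

```python
def apply_memory_savers(pipe):
    """Turn on diffusers options that trade speed for lower peak VRAM."""
    # Keep sub-models on the CPU and move each to the GPU only while it runs.
    # Use this instead of pipe.to("cuda"), not in addition to it.
    pipe.enable_model_cpu_offload()
    # Decode the latent video in slices and tiles rather than in one pass.
    pipe.vae.enable_slicing()
    pipe.vae.enable_tiling()
    return pipe
```

These options usually make generation slower, so they are best treated as a fallback for GPUs near the 8GB minimum.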

Slow Generation (CPU)

Issue: Takes 5-10 minutes per video
Solution: This is normal for CPU. Consider using a GPU for faster generation

Server Won't Start

Issue: Port 5000 already in use
Solution: Change the port in backend_local.py (line 33), for example FLASK_PORT = 5001
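
To find out whether port 5000 is actually taken, and which nearby port is free, a standard-library check like this can be used. It is a sketch: `is_port_free` and `first_free_port` are illustrative names, and the test binds to 127.0.0.1 only, which may differ from the address the backend listens on.

```python
import socket


def is_port_free(port, host="127.0.0.1"):
    """Return True if `port` can be bound on `host`, i.e. nothing is using it."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            return False
    return True


def first_free_port(start=5000, attempts=20):
    """Return the first free port in [start, start + attempts), or None."""
    for port in range(start, start + attempts):
        if is_port_free(port):
            return port
    return None


print(first_free_port())
```

The returned value is a candidate for FLASK_PORT; the web interface would also need to point at the same port.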

Video Quality Issues

Issue: Video looks blurry or low quality
Solution: This is expected for the 2B model. For better quality, upgrade to CogVideoX-5B (requires more VRAM)

📊 Performance Benchmarks

| Hardware | Model Load Time | Generation Time | Quality |
|---|---|---|---|
| RTX 4090 | 1-2 min | 30-45 sec | Excellent |
| RTX 3060 | 2-3 min | 60-90 sec | Good |
| CPU (16GB) | 3-5 min | 5-10 min | Good |

🔄 Model Information

Model: CogVideoX-2B
Developer: Tsinghua University (THUDM)
License: Apache 2.0
Size: ~5GB
Output: 49 frames, 720×480, 8 fps (~6 seconds)
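
The "~6 seconds" figure follows directly from the frame count and frame rate. A one-function sketch of the arithmetic (the helper name is illustrative):

```python
def clip_duration_seconds(num_frames=49, fps=8):
    """Playback length of a clip: frame count divided by frame rate."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return num_frames / fps


print(clip_duration_seconds())  # 6.125
```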

📁 File Structure

hailuo-clone/
├── backend_local.py          # Local backend server
├── index_local.html          # Web interface for local backend
├── requirements_local.txt    # Python dependencies
├── README_LOCAL.md          # This file
└── generated_videos/        # Output directory (auto-created)

🆚 Comparison with Cloud Backends

| Feature | Local (backend_local.py) | Cloud (backend_enhanced.py) |
|---|---|---|
| Setup | Complex (install PyTorch, download model) | Simple (just API keys) |
| Cost | Free (one-time setup) | Pay per generation |
| Speed | 30-120s (GPU) or 5-10min (CPU) | 30-60s |
| Privacy | 100% private | Data sent to cloud |
| Quality | Good (2B model) | Excellent (5B+ models) |
| Internet | Only for first download | Required for every generation |

🛠️ Advanced Configuration

Change Model

Edit lines 54-56 of backend_local.py to use a different model:

# For better quality (requires 16GB+ VRAM)
# The CogVideoX-5b model card recommends bfloat16 rather than float16
pipeline = CogVideoXPipeline.from_pretrained(
    "THUDM/CogVideoX-5b",
    torch_dtype=torch.bfloat16
)

Adjust Generation Parameters

Edit backend_local.py lines 126-132:

num_frames = 49          # More frames = longer video
guidance_scale = 6.0     # Higher = more prompt adherence
num_inference_steps = 50 # More steps = better quality (slower)
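
For context, these three values are the kind of arguments a diffusers text-to-video pipeline accepts when it is called. The sketch below shows one plausible way they fit together; it is not the code in backend_local.py. The function names are hypothetical, `pipe` stands for an already loaded `CogVideoXPipeline`, and `export_to_video` is the helper from `diffusers.utils`.

```python
def generate_frames(pipe, prompt, num_frames=49, guidance_scale=6.0,
                    num_inference_steps=50):
    """Run a loaded text-to-video pipeline and return the frames of the first video."""
    result = pipe(
        prompt=prompt,
        num_frames=num_frames,
        guidance_scale=guidance_scale,
        num_inference_steps=num_inference_steps,
    )
    return result.frames[0]


def save_video(frames, path="output.mp4", fps=8):
    """Write frames to an MP4 file using the diffusers helper."""
    from diffusers.utils import export_to_video  # requires diffusers to be installed

    return export_to_video(frames, path, fps=fps)
```

Lowering `num_inference_steps` is the simplest way to shorten generation time, at some cost in quality.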

Pre-load Model on Startup

Uncomment lines 232-233 in backend_local.py:

logger.info("Pre-loading model...")
initialize_model()

📚 Resources

🤝 Support

If you encounter issues:

Check the console logs in the terminal
Check browser console (F12) for errors
Ensure all dependencies are installed correctly
Verify GPU drivers are up to date (for GPU mode)

📄 License

This project uses CogVideoX-2B, which is licensed under Apache 2.0.

Happy Video Generation! 🎬✨