
🎬 Local AI Video Generator

Generate AI videos completely locally on your computer using CogVideoX-2B model!

🌟 Features

  • ✅ 100% Local - No API keys, no cloud services, runs on your computer
  • 🚀 CogVideoX-2B - State-of-the-art text-to-video model by Tsinghua University
  • 🎥 6-second videos - Generate 49 frames at 8 fps (720×480 resolution)
  • 💻 GPU or CPU - Works on both (GPU recommended for speed)
  • 🎨 Simple UI - Clean web interface for easy video generation

📋 Requirements

Hardware Requirements

Minimum (CPU):

  • 16GB RAM
  • 10GB free disk space
  • Generation time: 5-10 minutes per video

Recommended (GPU):

  • NVIDIA GPU with 8GB+ VRAM (RTX 3060 or better)
  • 16GB RAM
  • 10GB free disk space
  • Generation time: 30-120 seconds per video

Software Requirements

  • Python 3.9 or higher
  • CUDA 11.8+ (for GPU acceleration)

🚀 Quick Start

1. Install Dependencies

# Install PyTorch with CUDA support (for GPU)
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118

# Or install PyTorch for CPU only
pip install torch torchvision torchaudio

# Install other requirements
pip install -r requirements_local.txt
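The exact contents of requirements_local.txt are not reproduced here; for this stack it would typically list something like the following (an illustrative sketch, not the actual file):

```text
diffusers
transformers
accelerate
imageio[ffmpeg]
flask
flask-cors
```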

2. Run the Backend

python backend_local.py

The server will start on http://localhost:5000

First Run Notes:

  • The model (~5GB) will be downloaded automatically
  • This happens only once
  • Subsequent runs will be much faster

3. Open the Web Interface

Open index_local.html in your browser:

# On macOS
open index_local.html

# On Linux
xdg-open index_local.html

# On Windows
start index_local.html

Or open index_local.html directly from your file manager; the page talks to the backend at http://localhost:5000

4. Initialize the Model

  1. Click the "🚀 Initialize Model" button in the UI
  2. Wait 2-5 minutes for the model to load
  3. Once loaded, you can start generating videos!

5. Generate Videos

  1. Enter a descriptive prompt (e.g., "A cat playing with a ball of yarn")
  2. Click "🎬 Generate Video"
  3. Wait 30-120 seconds (GPU) or 5-10 minutes (CPU)
  4. Download or share your video!
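The same flow can be scripted against the backend's HTTP API instead of the UI. A minimal client sketch, assuming a JSON /generate route and a {"prompt": ...} payload (both are assumptions; check backend_local.py for the actual route and response shape):

```python
import json
import urllib.request

def build_payload(prompt: str) -> dict:
    """Package the prompt as the JSON body for the backend (assumed shape)."""
    return {"prompt": prompt.strip()}

def generate_video(prompt: str, host: str = "http://localhost:5000") -> dict:
    """POST a prompt to the local backend and return its JSON response."""
    req = urllib.request.Request(
        f"{host}/generate",  # assumed route; verify in backend_local.py
        data=json.dumps(build_payload(prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    # Generation is slow (30-120s on GPU, minutes on CPU), so be patient.
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

With the backend running, calling generate_video("A cat playing with a ball of yarn") should return the same JSON the web UI consumes.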

πŸ“ Example Prompts

  • "A golden retriever running through a field of flowers at sunset"
  • "Ocean waves crashing on a beach, aerial view"
  • "A bird flying through clouds, slow motion"
  • "City street with cars at night, neon lights"
  • "Flowers blooming in a garden, time-lapse"

🎯 Tips for Best Results

  1. Be Descriptive - Include details about lighting, camera angle, movement
  2. Keep it Simple - Focus on one main subject or action
  3. Use Cinematic Terms - "aerial view", "close-up", "slow motion", etc.
  4. GPU Recommended - Much faster generation (30-120s vs 5-10min)
  5. First Generation - May take longer as model initializes

🔧 Troubleshooting

Model Not Loading

  • Issue: Model fails to download or load
  • Solution: Check internet connection, ensure 10GB free disk space

Out of Memory (GPU)

  • Issue: CUDA out of memory error
  • Solution: Close other GPU applications, or use CPU mode
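Before falling back to CPU, it may be worth trying the memory-saving switches diffusers exposes on a loaded pipeline. A sketch (method names taken from the diffusers API; verify they exist in your installed version):

```python
def apply_low_vram_settings(pipe):
    """Trade generation speed for lower peak VRAM on a CogVideoX pipeline."""
    # Keep only the active submodule (text encoder, transformer, VAE)
    # on the GPU; park the rest in system RAM.
    pipe.enable_model_cpu_offload()
    # Decode the latent video in tiles instead of all at once.
    pipe.vae.enable_tiling()
```

Applied right after from_pretrained(), these settings may help the 2B model fit in 8GB of VRAM, at some cost in speed.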

Slow Generation (CPU)

  • Issue: Takes 5-10 minutes per video
  • Solution: This is normal for CPU. Consider using a GPU for faster generation

Server Won't Start

  • Issue: Port 5000 already in use
  • Solution: Change port in backend_local.py (line 33): FLASK_PORT = 5001

Video Quality Issues

  • Issue: Video looks blurry or low quality
  • Solution: This is expected for the 2B model. For better quality, upgrade to CogVideoX-5B (requires more VRAM)

📊 Performance Benchmarks

| Hardware   | Model Load Time | Generation Time | Quality   |
|------------|-----------------|-----------------|-----------|
| RTX 4090   | 1-2 min         | 30-45 sec       | Excellent |
| RTX 3060   | 2-3 min         | 60-90 sec       | Good      |
| CPU (16GB) | 3-5 min         | 5-10 min        | Good      |

🔄 Model Information

  • Model: CogVideoX-2B
  • Developer: Tsinghua University (THUDM)
  • License: Apache 2.0
  • Size: ~5GB
  • Output: 49 frames, 720×480, 8 fps (~6 seconds)
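The ~6-second duration follows directly from the frame count and frame rate:

```python
num_frames = 49
fps = 8
duration_s = num_frames / fps
print(f"{duration_s:.3f} s")  # → 6.125 s
```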

πŸ“ File Structure

hailuo-clone/
├── backend_local.py          # Local backend server
├── index_local.html          # Web interface for local backend
├── requirements_local.txt    # Python dependencies
├── README_LOCAL.md           # This file
└── generated_videos/         # Output directory (auto-created)

🆚 Comparison with Cloud Backends

| Feature  | Local (backend_local.py)                  | Cloud (backend_enhanced.py)   |
|----------|-------------------------------------------|-------------------------------|
| Setup    | Complex (install PyTorch, download model) | Simple (just API keys)        |
| Cost     | Free (one-time setup)                     | Pay per generation            |
| Speed    | 30-120s (GPU) or 5-10min (CPU)            | 30-60s                        |
| Privacy  | 100% private                              | Data sent to cloud            |
| Quality  | Good (2B model)                           | Excellent (5B+ models)        |
| Internet | Only for first download                   | Required for every generation |

πŸ› οΈ Advanced Configuration

Change Model

Edit backend_local.py lines 54-56 to use a different model:

# For better quality (requires 16GB+ VRAM)
pipeline = CogVideoXPipeline.from_pretrained(
    "THUDM/CogVideoX-5b",
    torch_dtype=torch.float16
)

Adjust Generation Parameters

Edit backend_local.py lines 126-132:

num_frames = 49          # More frames = longer video
guidance_scale = 6.0     # Higher = more prompt adherence
num_inference_steps = 50 # More steps = better quality (slower)
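These values are typically forwarded as keyword arguments to the pipeline call. A small helper (hypothetical; the actual call site in backend_local.py may differ) makes the trade-offs explicit:

```python
def generation_kwargs(num_frames: int = 49,
                      guidance_scale: float = 6.0,
                      num_inference_steps: int = 50) -> dict:
    """Bundle the tunables passed through to the diffusers pipeline."""
    return {
        "num_frames": num_frames,                    # length: frames / 8 fps
        "guidance_scale": guidance_scale,            # higher = closer to prompt
        "num_inference_steps": num_inference_steps,  # higher = slower, cleaner
    }

# e.g. video = pipeline(prompt=prompt, **generation_kwargs()).frames[0]
```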

Pre-load Model on Startup

Uncomment lines 232-233 in backend_local.py:

logger.info("Pre-loading model...")
initialize_model()

📚 Resources

🤝 Support

If you encounter issues:

  1. Check the console logs in the terminal
  2. Check browser console (F12) for errors
  3. Ensure all dependencies are installed correctly
  4. Verify GPU drivers are up to date (for GPU mode)

📄 License

This project uses CogVideoX-2B which is licensed under Apache 2.0.


Happy Video Generation! 🎬✨